Is COVID-19 sensitive to weather?

Study 1: data by country

Countries have not been hit equally by COVID-19.
Some countries have declared no deaths at all; others have declared tens of thousands.
Why such differences between countries?

For each country, we collected the following data:
- number of inhabitants
- population density
- wealth per inhabitant
- life expectancy (from 52 years in Angola to 84 years in Japan)
- average age (from 16 years in Chad to 43 years in Italy and Japan)
- quality of the health system
- press freedom (from 0 in North Korea to 77 in Sweden)
- average temperature in April 2020 in the country's largest city
- number of COVID-19 deaths (as declared!) per million inhabitants in April and May 2020

Let's analyze the correlation matrix.
(A correlation matrix is a table of correlation coefficients between variables: each cell shows the correlation between two variables. It is used to summarize data, as an input to more advanced analyses, and as a diagnostic for them. A correlation coefficient ranges from -1 to +1: +1 means a perfect positive correlation, -1 a perfect negative correlation, and 0 no linear correlation.)
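Each cell of such a matrix is usually a Pearson correlation coefficient. Below is a minimal sketch in plain Python (standard library only), assuming the data are stored as one list per variable. The column names and values are invented for illustration and are not the study's data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Return {(name_a, name_b): r} for every pair of columns, including each column with itself."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Hypothetical toy values (NOT the study's data), one entry per imaginary country.
columns = {
    "average_age": [18, 25, 31, 38, 43],
    "deaths_per_million": [2, 10, 40, 150, 400],
    "temperature_c": [28, 26, 20, 14, 12],
}

matrix = correlation_matrix(columns)
for (a, b), r in matrix.items():
    if a < b:  # print each unordered pair once
        print(f"{a} vs {b}: {r:+.2f}")
```

The matrix is symmetric and its diagonal is always 1, which is why heat maps of it look mirrored around the diagonal.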


Four features are highly correlated with one another: wealth per inhabitant, life expectancy, average age and quality of the health system.
This makes sense: in a prosperous country, people live longer, the average age is higher and the health system is better.

The number of declared deaths per million inhabitants is strongly correlated with these four features.
It may sound paradoxical, but the better the health care system, the more COVID-19 deaths. This is an indirect effect:
a good health care system leads to a long life expectancy and a high proportion of older people, and an older population in turn suffers a higher death rate from COVID-19.

Press freedom and the number of declared COVID-19 deaths are positively correlated: the less press freedom, the lower the declared death rate.

Temperature and wealth per inhabitant are negatively correlated: the poorer a country, the hotter its climate.

Temperature is also negatively correlated with the number of deaths.

But beware: an effect is correlated with its cause, but two effects of the same cause are also correlated with each other. Two explanations are possible:
1) Does a higher temperature reduce the number of deaths?
2) Or are there fewer deaths in hot countries simply because they are poor countries with a young population and a low life expectancy?
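The second explanation is a classic confounding pattern. The simulation below is a hypothetical illustration, not the study's data or method: it assumes that average age drives both temperature and deaths, while temperature has no direct effect at all. The raw correlation between temperature and deaths still comes out clearly negative, whereas a partial correlation (controlling for age) removes it. The variable names and effect sizes are assumptions chosen for the sketch.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def partial_correlation(x, y, z):
    """Correlation between x and y once the linear effect of z is removed from both."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

random.seed(0)
n = 10_000
# Hypothetical common cause: a standardized "average age" score per simulated country.
age = [random.gauss(0, 1) for _ in range(n)]
# By construction, older countries are cooler (mirroring the pattern described above)...
temperature = [-a + random.gauss(0, 1) for a in age]
# ...and deaths depend on age only: temperature has NO direct effect in this simulation.
deaths = [a + random.gauss(0, 1) for a in age]

r_raw = pearson(temperature, deaths)
r_partial = partial_correlation(temperature, deaths, age)
print(f"corr(temperature, deaths)       = {r_raw:+.2f}")      # theoretical value: -0.5
print(f"corr(temperature, deaths | age) = {r_partial:+.2f}")  # theoretical value: 0
```

Under these assumptions, a strong raw correlation between temperature and deaths says nothing about a direct effect of temperature.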

We cannot answer this question by analyzing the correlation matrix alone.
We need to go further and train a machine learning algorithm.

Gradient boosting is widely regarded as one of the most accurate machine learning methods for tabular data such as ours.
We trained it on our dataset.

The algorithm reports which features it considers important for explaining the differences in the number of COVID-19 deaths between countries.
The most important of our variables is the average age of the population.
Temperature has virtually no impact and is not retained by the algorithm.
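The page does not name the gradient boosting library or its settings. As a rough, hypothetical sketch of the technique, the plain-Python code below fits least-squares gradient boosting with one-split trees (stumps) and computes a gain-based feature importance: the total reduction in squared error contributed by each feature's splits, normalized to sum to 1. The data are simulated, not the study's: the target depends only on age, while temperature is independent noise, so temperature should end up with little importance. The function names, learning rate (0.1) and number of rounds (50) are illustrative choices.

```python
import random

def fit_stump(X, residuals, feature):
    """Best single split on one feature for squared loss.
    Returns (gain, threshold, left_value, right_value), or None if no split exists."""
    pairs = sorted((row[feature], r) for row, r in zip(X, residuals))
    n = len(pairs)
    total = sum(r for _, r in pairs)
    best, left_sum = None, 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal feature values
        n_left, n_right = i, n - i
        left_mean, right_mean = left_sum / n_left, (total - left_sum) / n_right
        # Reduction in squared error versus predicting a single mean for all points
        gain = n_left * n_right / n * (left_mean - right_mean) ** 2
        if best is None or gain > best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (gain, threshold, left_mean, right_mean)
    return best

def gradient_boost(X, y, n_features, n_rounds=50, learning_rate=0.1):
    """Least-squares gradient boosting with stumps, plus gain-based feature importance."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps, gains = [], [0.0] * n_features
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [t - p for t, p in zip(y, pred)]
        best = None
        for f in range(n_features):
            stump = fit_stump(X, residuals, f)
            if stump is not None and (best is None or stump[0] > best[1][0]):
                best = (f, stump)
        if best is None:
            break
        feature, (gain, threshold, left, right) = best
        gains[feature] += gain
        stumps.append((feature, threshold, left, right))
        pred = [p + learning_rate * (left if row[feature] <= threshold else right)
                for p, row in zip(pred, X)]
    total_gain = sum(gains) or 1.0
    return {"base": base, "learning_rate": learning_rate, "stumps": stumps,
            "importance": [g / total_gain for g in gains]}

def predict(model, row):
    """Sum the base value and the shrunken contributions of all stumps."""
    out = model["base"]
    for feature, threshold, left, right in model["stumps"]:
        out += model["learning_rate"] * (left if row[feature] <= threshold else right)
    return out

random.seed(1)
FEATURES = ["average_age", "temperature"]  # hypothetical, rescaled to [0, 1]
X = [[random.uniform(0, 1), random.uniform(0, 1)] for _ in range(300)]
# Simulated target: depends on age only; temperature plays no role by construction.
y = [10 * age + random.gauss(0, 0.5) for age, _ in X]

model = gradient_boost(X, y, n_features=len(FEATURES))
importance = dict(zip(FEATURES, model["importance"]))
mean_y = sum(y) / len(y)
var_y = sum((t - mean_y) ** 2 for t in y) / len(y)
mse = sum((t - predict(model, row)) ** 2 for row, t in zip(X, y)) / len(y)
print({name: round(v, 3) for name, v in importance.items()})
```

Production libraries use deeper trees, regularization and faster split search, but the importance they can report under a "gain" setting is built on a similar idea.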

Conclusion: temperature has no significant impact on the course of the pandemic. What a pity!

Our algorithm can be improved by adding other features to our dataset: we are going to do that in the next few days.

Our data sources:

data about COVID-19:
press freedom:
quality of the health system:
life expectancy:
average age, by country:
wealth per inhabitant, by country:
population density:
number of inhabitants:
temperature in April 2020:

Study 2: data by US county

For each of the 3,242 counties in the US, we collected data on:
- number of inhabitants
- area
- population density
- distribution by age group
- percentage of graduates
- containment index, derived from Google Mobility data
- mean temperature in March 2020
- mean humidity (%) in March 2020
- evolution of the epidemic: number of cases and number of deaths in March and April 2020

We trained our algorithms on these datasets.
The objective is to identify which features are important, that is, which have an impact on the evolution of the epidemic.

We took care to guard against biases.

Our data sources:


Bad news: there is no correlation between temperature and the speed at which COVID-19 spreads.
The United States is a very large country, so the data cover a vast range of temperatures: in March 2020, between 20 °F and 90 °F (about -7 °C to 32 °C).

The same holds for humidity and atmospheric pressure: there is no correlation.

We built a correlation-matrix heat map: the correlation coefficients between the weather variables and the evolution of the epidemic are insignificant.
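The page does not say how the "speed of evolution" was measured. One common choice, assumed here, is the exponential growth rate: the least-squares slope of log(cumulative cases) against time. The sketch estimates that rate for each county and then correlates it with temperature. The counties are simulated and hypothetical, with growth rates drawn independently of temperature, so it only shows what "no correlation" looks like under that assumption; the function name `growth_rate` is illustrative.

```python
import math
import random

def growth_rate(cumulative_cases):
    """Exponential growth rate per day: least-squares slope of log(cases) vs. day index.
    Days with zero cumulative cases are skipped because log(0) is undefined."""
    points = [(day, math.log(c)) for day, c in enumerate(cumulative_cases) if c > 0]
    if len(points) < 2:
        raise ValueError("need at least two days with cases")
    n = len(points)
    mean_d = sum(d for d, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    num = sum((d - mean_d) * (v - mean_v) for d, v in points)
    den = sum((d - mean_d) ** 2 for d, _ in points)
    return num / den

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

random.seed(2)
temperatures_f, rates = [], []
for _ in range(3000):  # hypothetical counties
    temp_f = random.uniform(20, 90)         # March mean temperature in °F (range quoted above)
    true_rate = random.uniform(0.05, 0.35)  # drawn independently of temperature, by construction
    cases = [10 * math.exp(true_rate * day) for day in range(21)]  # 21 days of cumulative cases
    temperatures_f.append(temp_f)
    rates.append(growth_rate(cases))

r = pearson(temperatures_f, rates)
print(f"corr(temperature, growth rate) = {r:+.3f}")
```

With real county data, the same two steps apply: compute one growth rate per county from its case series, then correlate those rates with the weather variables.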