Coronavirus
Update 5/10/2020: I no longer find it as personally useful to make these graphs every day as I did when I started two months ago. The information is now abundant elsewhere, and I don’t find it as helpful any more. I’ll still try to update them periodically.
There is excellent information and data about COVID-19 online, particularly from Johns Hopkins.
I find data helpful, so here are a number of time series figures and animations.
Initially, I found the Coronavirus pandemic both deeply troubling and fascinating. It was hard to find up-to-date time series figures of U.S. cases at first (which is no longer a problem), so I made some. This mostly started as an excuse to practice automating data acquisition and visualization, while satisfying my curiosity about this disease. I’m primarily using data compiled by Johns Hopkins, the New York Times, and the COVID Tracking Project. While I can’t attest to the veracity of this data, I am trying to reproduce it accurately. I’ll probably update it about once a day. These data sets are updated at different times, so the numbers for a given region won’t always match. I’ve noted which data set each figure is using.
To be absolutely clear, I don’t know anything about epidemiology or public health. For real, trustworthy information, best to go to the above resources. The CDC and WHO also have useful resources. Our World in Data has an excellent site with lots of information and figures. More data and nicer graphs can be found here and here and here and here and here and here and here. This article is really useful. The reports from the Imperial College London MRC Center are important. A group at UW has an influential statistical model for projecting the growth of the disease that updates daily (though potential caveats to this model and how to interpret it are probably important to keep in mind).
The most important caveat: This data should be viewed as what we know, not what there is. Most of the plots shown here are of confirmed cases of the disease or deaths. While there are several important caveats to looking at this data, the most important is that the growth of confirmed cases through time and their distribution through space (e.g. between regions, states, countries) does not necessarily tell you the prevalence of the disease itself. The mapping between those two things, confirmed cases and actual presence, is not well known, as I understand it. It depends on many things, both in space and time. This article explains the issue well.
United States
Coronavirus (COVID-19) cases in the U.S. on linear (top) and log (bottom) axes.
Another disclaimer: the reported numbers are changing very fast. By the time these figures are made, they are out of date. These are the numbers of confirmed (and presumptive) cases. The actual number of infections is likely (much) larger.
The numbers of confirmed cases are surely linked to the prevalence of testing in the United States.
Below is the time series of confirmed cases in several U.S. States on linear (top) and log (bottom) axes.
Here is how the number of cases in each state has evolved since the day that state’s cases exceeded 50.
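One way to line the states up like this is to drop every day before a state’s count first exceeds the threshold. The following is a minimal sketch of that idea, not necessarily the code behind the figure. The function name and the case counts are made up for illustration.

```python
def days_since_threshold(series, threshold):
    """Return the part of a daily cumulative series starting on the
    first day the count exceeds `threshold` (index 0 = that day)."""
    for i, count in enumerate(series):
        if count > threshold:
            return series[i:]
    return []  # the threshold was never exceeded


# Hypothetical cumulative case counts for two states (not real data).
state_a = [1, 10, 40, 60, 120, 200]
state_b = [0, 0, 5, 30, 55, 90]

aligned_a = days_since_threshold(state_a, 50)  # starts at 60
aligned_b = days_since_threshold(state_b, 50)  # starts at 55
```

After this step, index 0 means "the first day above 50 cases" for every state, so the curves can be plotted on a shared axis.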
Here is the time series of deaths in the U.S.
Here is a map of North American Cases:
Note that the size and the color of the circle reflect the number of cases, the size on a linear scale and the color on a logarithmic scale.
The U.S. data is displayed here by County, while all other data is displayed by Province (where available) or Country.
Here is an animation of North American cases over time:
Here’s a look at just the continental U.S.
Here’s an animation of deaths due to COVID-19 in the continental U.S.
And finally, the density of deaths in the U.S. This is the deaths per 100,000 people in the population. Because this data is at the county level, and some counties are very small, this can be skewed a bit.
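To make the small-county caveat concrete, here is a minimal sketch of the per-100,000 calculation. The function name and both counties are hypothetical; the numbers are chosen only to show how a handful of deaths in a tiny population produces a very large rate.

```python
def deaths_per_100k(deaths, population):
    """Deaths per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths * 100_000 / population


# Hypothetical counties (not real data).
small_county = deaths_per_100k(2, 1_000)        # 2 deaths, 1,000 people
large_county = deaths_per_100k(200, 1_000_000)  # 200 deaths, 1,000,000 people
```

The small county ends up with a rate ten times higher than the large one on the strength of two deaths, which is the kind of skew to keep in mind when reading the county-level map.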
Here we can see the time series of deaths per 100,000 in the population at the state level.
Around the Globe
Global coronavirus cases on linear (left) and log (right) axes. Also showing cases in Mainland China:
The recovered cases number may not be super reliable according to Johns Hopkins.
Here are the time series of confirmed COVID-19 cases in several highly-affected countries, as well as the global sum and a couple other less-affected countries for reference. Linear (left) and log (right) axes:
The following shows the cases in these countries in the days since confirmed cases in each country exceeded 100.
All other countries are shown in thin grey lines. The thin black line is a fit to the U.S. cases (red with black circles) for all days since March 3rd (the day U.S. cases exceeded 100). This fit is just for reference and is in no way a prediction. In the countries with the largest number of cases, the growth rate has slowed after the initial steep rise, often due to mitigation measures.
Note that the first data for China has several hundred cases, so I’ve shifted it by 5 days for better reference to other countries. The un-shifted line for China is in light grey/red to the left, for reference.
The black fit line is roughly equivalent to a doubling time of 2.6 days (as of 4/5). This is not a prediction. A fit to just the last 7 days in the U.S. is less steep compared to the whole record, roughly equivalent to a doubling time of 5.5 days (as of 4/5). The growth rate is slowing in the US, and has been slowing for the last week or so.
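For anyone curious how a doubling time can be read off a fit like this, here is a minimal sketch. It assumes the fit is an ordinary least-squares line through log2(cases) against day number, which may differ from what was actually used for the figure. The function name is made up, and the data is synthetic (generated with a known 2.6-day doubling time), not real case counts.

```python
import math


def doubling_time(cases):
    """Fit a straight line to log2(cases) versus day index by least
    squares and return the implied doubling time in days."""
    n = len(cases)
    days = range(n)
    logs = [math.log2(c) for c in cases]
    day_mean = sum(days) / n
    log_mean = sum(logs) / n
    slope = (
        sum((d - day_mean) * (y - log_mean) for d, y in zip(days, logs))
        / sum((d - day_mean) ** 2 for d in days)
    )
    return 1 / slope  # slope is doublings per day


# Synthetic series that doubles every 2.6 days (illustration only).
synthetic = [100 * 2 ** (d / 2.6) for d in range(14)]
estimate = doubling_time(synthetic)  # recovers roughly 2.6
```

Fitting only the last 7 days, as described above, amounts to calling the same function on `cases[-7:]`.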
The trajectory of the U.S. does not look great compared to other countries at the same point on the growth curve. In fact, it is the worst of all countries in the world. Unless something changes, using the fit from the last 7 days, the U.S. will have over a million cases within 9 days (of 4/5).
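The arithmetic behind an extrapolation like this is short. The sketch below assumes the doubling time stays fixed, which is exactly the assumption that is unlikely to hold. The function names are hypothetical and no real case counts are used.

```python
import math


def projected_cases(current, days, doubling_days):
    """Cases after `days` if growth continues at a fixed doubling time."""
    return current * 2 ** (days / doubling_days)


def days_until(current, target, doubling_days):
    """Days needed to grow from `current` to `target` at a fixed doubling time."""
    return doubling_days * math.log2(target / current)


# Growth factor over 9 days at a 5.5-day doubling time (a bit over 3x).
factor = projected_cases(1, 9, 5.5)
```

Put differently, at a 5.5-day doubling time the count a little more than triples in 9 days.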
On a personal note, I work with exponential curves regularly in my normal research. I thought I understood them and their implications pretty intuitively. But I had no idea what they felt like. Here is, I think, the most sobering implication of exponential growth. The number of current cases in the U.S. is a pretty staggeringly large number. Unless the growth rate slows significantly more than it already has, it won’t be long before the daily increase in cases is larger than the total sum of cases up to today. Further, you could make the same statement at any time in the future (or past) and it would still be true.
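Here is a small sketch of why that statement holds at any moment. If cases follow N(t) = N0 * 2^(t/T) with a fixed doubling time T, the one-day increase is N(t) * (2^(1/T) - 1), and the wait until that increase exceeds today’s total N0 depends only on T, not on N0. The function name is made up, and the constant doubling time is an idealization.

```python
import math


def days_until_daily_increase_exceeds_total(doubling_days):
    """Days until the one-day increase is larger than today's total,
    assuming pure exponential growth with a fixed doubling time.
    A result of zero or less means it is already true."""
    daily_growth = 2 ** (1 / doubling_days) - 1  # fractional increase per day
    return -doubling_days * math.log2(daily_growth)


# With the 5.5-day doubling time quoted above, this is roughly 16 days.
wait = days_until_daily_increase_exceeds_total(5.5)
```

Because today’s case count never appears in the formula, the same wait applies no matter which day you start counting from, as long as the doubling time does not change.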
Here are the time series of deaths due to COVID-19 in some highly affected countries, as well as the global sum, on linear (left) and log (right) axes:
We can make a similar normalized figure to the one above, but this time for deaths in the days since a country’s deaths exceeded 50. The superficial picture here actually varies quite a bit, depending on the arbitrary threshold one chooses (e.g. >50 deaths, or 10 or 100). But some differences are robust, like the differences in the curves of China, Italy, and Korea for example.
Here is a map of Global Cases.
Note that the size of the circle is linear with the number of cases and the color is logarithmic with the number of cases. The data in the U.S. is presented at the county level, a much finer scale of jurisdiction than in any other country.
Here’s another animation, this time just of the Northern Hemisphere. Again, the size of the circles scales linearly with the number of cases, while the color scale is logarithmic.
Mortality rate
Here is the long term mortality rate (# deaths / # confirmed cases). This is probably not super accurate since there may be a large number of existing but unconfirmed cases, and/or unreported deaths. Again, I don’t know anything about epidemiology.
The large initial spike in the U.S. “mortality” rate is likely just a reflection of small-number statistics.
Another issue with the above estimate of mortality rate (in addition to the large uncertainty in the denominator) is that the disease takes a while to run its course. So there may be lags in the comparison of deaths and confirmed cases.
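To illustrate the lag issue, here is a minimal sketch that computes the ratio two ways: deaths over same-day confirmed cases, and deaths over confirmed cases from some number of days earlier. The function name, the 2-day lag, and the data are all made up for illustration. I am not claiming any particular lag is correct for this disease.

```python
def case_fatality_ratio(deaths, cases, lag_days=0):
    """Ratio of cumulative deaths on each day to cumulative confirmed
    cases `lag_days` earlier. Entries are None where undefined."""
    ratios = []
    for day, death_count in enumerate(deaths):
        ref = day - lag_days
        if ref < 0 or cases[ref] == 0:
            ratios.append(None)
        else:
            ratios.append(death_count / cases[ref])
    return ratios


# Hypothetical cumulative series (not real data).
cases = [10, 20, 40, 80, 160]
deaths = [0, 0, 1, 2, 4]

same_day = case_fatality_ratio(deaths, cases)            # ends at 4/160
lagged = case_fatality_ratio(deaths, cases, lag_days=2)  # ends at 4/40
```

In this toy example the same-day ratio settles at 2.5% while the lagged ratio is 10%, so during fast growth the choice of lag alone can change the apparent rate severalfold.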
I thought the following was an interesting figure. It is the number of deaths as a function of the size of the outbreak (i.e. # of confirmed cases) rather than as a function of time. I have no comment on the interpretation of this figure; I just found it interesting. The diagonal black lines show percentages for reference to the figure above.
Testing for COVID-19 in the United States
The following is data about testing for COVID-19 in the United States. This data is from the COVID Tracking Project. There are some caveats to this data (visit the link for details), and it is not updated at exactly the same time as the Johns Hopkins data above. The total number of people tested over time is in blue. The number tested positive is in red (i.e. those with the virus) and the number tested negative is in green (those without). Deaths are in black.
When there has been testing, the virus is found something like 10% of the time, on average. This might be important for how to think about the above time series of confirmed cases. Again, I don’t know anything about epidemiology.
Below you can see a map of cases that have tested positive for COVID-19 (i.e. people that have the virus):
The next map shows the total reported deaths due to COVID-19.
The next two maps show the total people tested (blue) followed by those tested negative (green).
As more data becomes available, I’ll add the numbers of people hospitalized to the above plot. At the moment, data is only available for a few states.
If any of the two people reading this want the code or have suggestions of other things to look at, please feel free to get in touch:
This page was started 3/11/2020