6th July 2020
A visual thank you to our contributing sites
The COVID-19 Critical Care Consortium is an international collaboration of hundreds of hospital sites from dozens of countries around the world. Our sites have been slowly but steadily gathering data for critical cases of COVID-19 since earlier in the year, and we thought it would be good to show just how international the collaboration is.
To that end, in my role as one of the technical leads (responsible for the data pipeline and ingestion), I created a simple (and no-risk) data product containing only a list of our participating sites and how many patients each has enrolled in our study at a given date. The details of those patients are, of course, not contained in this data product in any way.
This is what we’re going to be making.
So let’s start things up, and load in the enrolment data.
date_enrolment | 00543-Medical University of Vienna | 00544-Lancaster General Health | 00546-Penn Medicine | 00547-Oklahoma Heart Institute | 00548-UH Cleveland Hospital
---|---|---|---|---|---
2020-09-24 | 9 | 39 | 42 | 2 | 26
2020-09-25 | 9 | 39 | 42 | 2 | 26
2020-09-26 | 9 | 39 | 42 | 2 | 26
2020-09-27 | 9 | 39 | 42 | 2 | 26
2020-09-28 | 9 | 39 | 42 | 2 | 26
Great, so you can see the vertical axis represents the date, and each site is a column, with its numeric identifier first. Let’s remove those IDs: they’re useful in a database, but not for us.
sites = [c.split("-", 1)[1] if "-" in c else c for c in data.columns]
print(sites[60:70])
['Fukuoka University', 'Mater Dei Hospital', 'Yokohama City University Medical Center', 'Nagoya University Hospital', 'PICU Saiful Anwar Hospital', 'Adult ICU Saiful Anwar Hospital', 'KourituTouseiHospital', 'HokkaidoUniversityHospital', 'ChibaUniversityHospital', 'UniversityofAlabamaatBirminghamHospital']
Looking good so far. However, I know that some of the sites have, for some reason, had all their whitespace removed. So let’s write a small parser to go from CamelCase back to normal words.
['Fukuoka University', 'Mater Dei Hospital', 'Yokohama City University Medical Center', 'Nagoya University Hospital', 'PICU Saiful Anwar Hospital', 'Adult ICU Saiful Anwar Hospital', 'Kouritu Tousei Hospital', 'Hokkaido University Hospital', 'Chiba University Hospital', 'Universityof Alabamaat Birmingham Hospital']
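The loop used for this (`fix_site` in the full listing at the end) puts a space before every capital letter. As an alternative sketch, a zero-width regular expression can insert a space only where a lowercase letter runs straight into an uppercase one, which keeps a run of capitals such as an acronym together. The helper name `split_camel_case` is my own, not part of the original code.

```python
import re

# Zero-width match at every point where a lowercase letter is followed by an uppercase one
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")


def split_camel_case(site):
    """Add spaces back into a site name that has had its whitespace stripped."""
    if " " in site:
        # Names that already contain spaces are assumed to be fine
        return site
    return _CAMEL_BOUNDARY.sub(" ", site)


print(split_camel_case("KourituTouseiHospital"))
```

Neither approach can recover a lowercase word that was glued onto the previous one, which is why “Universityof Alabamaat Birmingham Hospital” survives either way.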
This is probably as good as we can get it right now. So let’s copy the dataframe so we don’t clobber the original, and update the columns.
data_fixed = data.copy()
data_fixed.columns = sites_fixed
data_fixed.iloc[-5:, :5]
date_enrolment | Medical University of Vienna | Lancaster General Health | Penn Medicine | Oklahoma Heart Institute | UH Cleveland Hospital
---|---|---|---|---|---
2020-09-24 | 9 | 39 | 42 | 2 | 26
2020-09-25 | 9 | 39 | 42 | 2 | 26
2020-09-26 | 9 | 39 | 42 | 2 | 26
2020-09-27 | 9 | 39 | 42 | 2 | 26
2020-09-28 | 9 | 39 | 42 | 2 | 26
Now, I want to make an animation out of this with more frames than there are days, so we’ll do a super simple interpolation to add extra, evenly spaced datetimes, one for each frame (30 frames per second for 12 seconds, so 360 frames). In addition, I’ll start the clock ticking from the 1st of February.
I also want the animation to flash or brighten a bit when sites add new patients. To pick that up, we’ll simply take the difference between consecutive rows (and use fillna to put zero in the first row).
data_fixed_change = data_fixed.diff().fillna(0)
Each site obviously represents a specific physical location on the planet. Alas, I don’t know those locations; all I have is a name. So let’s use OpenCage to search for each site name, and extract the latitude and longitude for each site where we can find it. I don’t expect this to work for all of them, but I’d rather manually look up ten sites than a hundred.
Let’s set up the library with our token to start with:
from opencage.geocoder import OpenCageGeocode
key = ""  # Add your API key here; the trial version allows you to do all of this for free
geocoder = OpenCageGeocode(key)
And then write a little function that, when given a query, will try to find the latitude, longitude and country. If it can’t find anything, we’ll return None and I’ll do it myself.
def get_lat_long_from_site(query):
    results = geocoder.geocode(query)
    if not len(results):
        print(f"{query} unable to be located")
        return None
    lat = results[0]['geometry']['lat']
    long = results[0]['geometry']['lng']
    country = results[0]["components"]["country"]
    return (lat, long, country)
And to make sure I don’t spam this API over and over, we’ll run this once, save it out to JSON, and when I run this again in the future we can just read the file in.
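A minimal sketch of that cache, written as a hypothetical `load_or_geocode` helper that takes the lookup function as an argument (for example `get_lat_long_from_site`), so it only calls the API for names it hasn’t seen before:

```python
import json
import os


def load_or_geocode(names, filename, lookup):
    """Return {name: (lat, long, country) or None}, calling `lookup` only for uncached names."""
    cache = {}
    if os.path.exists(filename):
        with open(filename) as f:
            cache = json.load(f)
    missing = [name for name in names if name not in cache]
    for name in missing:
        cache[name] = lookup(name)
    # Only rewrite the file when something new was looked up
    if missing:
        with open(filename, "w") as f:
            json.dump(cache, f)
    return cache
```

Two things to be aware of with this design: JSON has no tuples, so entries read back from the file are lists (which `.str[0]` handles just as well), and a failed lookup is cached as `null`, so it isn’t retried on the next run unless you delete or hand-fill that entry.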
We now have 148 sites ready to go!
Great! Onto the next part…
Our dataframe is now broken up into a lot of rows, where each row represents a frame in the animation. Let’s write a function to extract a row and put it into something easier to work with when plotting.
def get_row(date):
    # Cumulative enrolment and per-frame change for every site at this timestamp
    row = data_fixed.loc[date].to_frame().reset_index()
    change = data_fixed_change.loc[date].to_frame().reset_index()
    row.columns = ["site", "enrolment"]
    change.columns = ["site", "change"]
    row = row.merge(change, on="site")
    row["date"] = date
    # Look up each site's (lat, long, country) and split it into separate columns
    row["coord"] = row["site"].map(coords)
    row["lat"] = row["coord"].str[0]
    row["long"] = row["coord"].str[1]
    row["country"] = row["coord"].str[2]
    row = row.drop(columns="coord")
    # Manually fix up the issues to separate HK and China
    hk = np.abs(row.lat - 22.3) < 0.2
    row.loc[hk, "country"] = "Hong Kong"
    # Jitter the Hong Kong sites apart; re-seeding means every frame gets the same offsets
    np.random.seed(1)
    row.loc[hk, "lat"] += np.random.normal(scale=0.5, size=hk.sum())
    row.loc[hk, "long"] += np.random.normal(scale=0.5, size=hk.sum())
    return row
test_row = get_row(data_fixed.index.max())
test_row
site | enrolment | change | date | lat | long | country | |
---|---|---|---|---|---|---|---|
0 | Medical University of Vienna | 9.0 | 0.0 | 2020-09-28 | 48.208490 | 16.372080 | Austria |
1 | Lancaster General Health | 39.0 | 0.0 | 2020-09-28 | 54.016293 | -2.793612 | United Kingdom |
2 | Penn Medicine | 42.0 | 0.0 | 2020-09-28 | 39.957043 | -75.197520 | United States of America |
3 | Oklahoma Heart Institute | 2.0 | 0.0 | 2020-09-28 | 36.029075 | -95.869532 | United States of America |
4 | UH Cleveland Hospital | 26.0 | 0.0 | 2020-09-28 | 41.504861 | -81.605748 | United States of America |
5 | Ohio State University | 121.0 | 0.0 | 2020-09-28 | 40.005709 | -83.028663 | United States of America |
6 | North Estonia Medical Centre, Tallin | 20.0 | 0.0 | 2020-09-28 | 59.396168 | 24.698524 | Estonia |
7 | Tartu University Hospital, Tartu | 14.0 | 0.0 | 2020-09-28 | 58.369456 | 26.700090 | Estonia |
8 | National Taiwan University Hospital | 1.0 | 0.0 | 2020-09-28 | 25.016828 | 121.538469 | Taiwan |
9 | Hospital for Tropical Diseases, Vietnam | 1.0 | 0.0 | 2020-09-28 | 10.753047 | 106.678478 | Vietnam |
10 | Keimyung University Dong San Hospital | 2.0 | 0.0 | 2020-09-28 | 52.473060 | -8.430560 | Ireland |
11 | Groote Schuur Hospital | 207.0 | 0.0 | 2020-09-28 | -33.941170 | 18.462639 | South Africa |
12 | The Heart Hospital Baylor Plano | 1.0 | 0.0 | 2020-09-28 | 33.014781 | -96.789958 | United States of America |
13 | Baylor University Medical Centre | 8.0 | 0.0 | 2020-09-28 | 32.786683 | -96.781894 | United States of America |
14 | Baylor Scott | 3.0 | 0.0 | 2020-09-28 | 31.077858 | -97.362730 | United States of America |
15 | Hospital Nuestra Señora de Gracia Zaragoza | 20.0 | 0.0 | 2020-09-28 | 32.502700 | -117.003710 | Mexico |
16 | São João Hospital Centre, Portugal | 1.0 | 0.0 | 2020-09-28 | 39.980168 | -8.318848 | Portugal |
17 | Piedmont Atlanta Hospital, USA | 40.0 | 0.0 | 2020-09-28 | 33.789048 | -84.371886 | United States of America |
18 | Washington University in St. Louis, USA | 20.0 | 0.0 | 2020-09-28 | 38.647240 | -90.308402 | United States of America |
19 | Medical College of Wisconsin, USA | 31.0 | 0.0 | 2020-09-28 | 43.043867 | -88.022453 | United States of America |
20 | Policlinico di S. Orsola, Università di Bologn... | 5.0 | 0.0 | 2020-09-28 | 44.496530 | 11.353080 | Italy |
21 | Fundación Cardiovascular de Colombia, Colombia | 2.0 | 0.0 | 2020-09-28 | -20.175830 | -48.688890 | Brazil |
22 | INOVA Fairfax Medical Center, USA | 3.0 | 0.0 | 2020-09-28 | 41.620447 | -86.228800 | United States of America |
23 | Hospital de Clínicas, Argentina | 7.0 | 0.0 | 2020-09-28 | -34.599502 | -58.400386 | Argentina |
24 | Allegheny General Hospital | 4.0 | 0.0 | 2020-09-28 | 40.456969 | -80.003311 | United States of America |
25 | Clinica Alemana De Santiago | 1.0 | 0.0 | 2020-09-28 | -33.391999 | -70.572619 | Chile |
26 | Kyoto Medical Centre | 1.0 | 0.0 | 2020-09-28 | -1.246694 | 116.862195 | Indonesia |
27 | Hiroshima University | 4.0 | 0.0 | 2020-09-28 | 34.401977 | 132.712320 | Japan |
28 | Stanford University | 1.0 | 0.0 | 2020-09-28 | 37.431314 | -122.169365 | United States of America |
29 | Tufts Medical Centre | 1.0 | 0.0 | 2020-09-28 | 42.349559 | -71.063411 | United States of America |
30 | Carilion Clinic | 13.0 | 0.0 | 2020-09-28 | 37.088584 | -80.505764 | United States of America |
31 | Beth Israel Deaconess Medical Center | 11.0 | 0.0 | 2020-09-28 | 36.858403 | -76.305431 | United States of America |
32 | Clinica Las Condez, Chile | 52.0 | 0.0 | 2020-09-28 | -30.000000 | -71.000000 | Chile |
33 | Hyogo Prefectural Kakogawa Medical Center | 16.0 | 0.0 | 2020-09-28 | 36.858403 | -76.305431 | United States of America |
34 | University of California San Francisco - Fresno | 30.0 | 0.0 | 2020-09-28 | 36.747730 | -119.772370 | United States of America |
35 | Uniklinik (University Hospital Frankfurt) | 1.0 | 0.0 | 2020-09-28 | 50.093620 | 8.650671 | Germany |
36 | Seoul National University Bundang Hospital | 12.0 | 0.0 | 2020-09-28 | 37.349249 | 127.123941 | South Korea |
37 | University of Iowa | 12.0 | 0.0 | 2020-09-28 | 41.665850 | -91.573107 | United States of America |
38 | University of Cincinnati | 9.0 | 0.0 | 2020-09-28 | 39.131853 | -84.515762 | United States of America |
39 | Rio Hortega University Hospital | 12.0 | 0.0 | 2020-09-28 | 33.470630 | -81.984130 | United States of America |
40 | Hamad General Hospital | 52.0 | 0.0 | 2020-09-28 | 25.293518 | 51.502231 | Qatar |
41 | Presbyterian Hospital Services | 51.0 | 0.0 | 2020-09-28 | 35.635979 | -105.962692 | United States of America |
42 | Clinica Valle de Lilli | 46.0 | 0.0 | 2020-09-28 | 59.369440 | 25.359170 | Estonia |
43 | University Hospital in Krakow | 15.0 | 0.0 | 2020-09-28 | 50.061430 | 19.936580 | Poland |
44 | The University of Utah | 53.0 | 0.0 | 2020-09-28 | 40.762814 | -111.836872 | United States of America |
45 | Ospedale di Arco | 1.0 | 0.0 | 2020-09-28 | 45.915643 | 10.879765 | Italy |
46 | Ospedale San Paolo | 58.0 | 0.0 | 2020-09-28 | 41.117631 | 16.779988 | Italy |
47 | Hospital Universitario Sant Joan d'Alacant | 3.0 | 0.0 | 2020-09-28 | 38.401480 | -0.436230 | Spain |
48 | Kimitsu Chuo Hospital | 2.0 | 0.0 | 2020-09-28 | 35.327470 | 139.907261 | Japan |
49 | Fatmawati Hospital | 43.0 | 0.0 | 2020-09-28 | -6.292454 | 106.792423 | Indonesia |
50 | Rinku General Medical Center | 1.0 | 0.0 | 2020-09-28 | 34.411921 | 135.302686 | Japan |
51 | Hospital Universitari Sagrat Cor | 2.0 | 0.0 | 2020-09-28 | -31.413500 | -64.181050 | Argentina |
52 | Cleveland Clinic - Florida | 3.0 | 0.0 | 2020-09-28 | 26.080382 | -80.364079 | United States of America |
53 | San Martino Hospital | 4.0 | 0.0 | 2020-09-28 | 43.713059 | 10.404481 | Italy |
54 | Hospital Alemán | 6.0 | 0.0 | 2020-09-28 | -34.591840 | -58.401984 | Argentina |
55 | San Pedro de Alcantara Hospital | 2.0 | 0.0 | 2020-09-28 | 36.486635 | -4.990532 | Spain |
56 | Legacy Emanuel Medical Center | 11.0 | 0.0 | 2020-09-28 | 45.543893 | -122.670033 | United States of America |
57 | Kyoto Prefectural University of Medicine | 20.0 | 0.0 | 2020-09-28 | 35.024341 | 135.768228 | Japan |
58 | Lankenau Institute of Medical Research | 36.0 | 0.0 | 2020-09-28 | -37.700000 | 145.183330 | Australia |
59 | Providence Saint John's Health Centre | 8.0 | 0.0 | 2020-09-28 | 34.030577 | -118.479544 | United States of America |
60 | Fukuoka University | 1.0 | 0.0 | 2020-09-28 | 33.548443 | 130.364514 | Japan |
61 | Mater Dei Hospital | 1.0 | 0.0 | 2020-09-28 | 35.901807 | 14.476600 | Malta |
62 | Yokohama City University Medical Center | 3.0 | 0.0 | 2020-09-28 | 32.240500 | -110.945940 | United States of America |
63 | Nagoya University Hospital | 27.0 | 0.0 | 2020-09-28 | 35.153309 | 136.967781 | Japan |
64 | PICU Saiful Anwar Hospital | 21.0 | 0.0 | 2020-09-28 | 52.473060 | -8.430560 | Ireland |
65 | Adult ICU Saiful Anwar Hospital | 13.0 | 0.0 | 2020-09-28 | 52.473060 | -8.430560 | Ireland |
66 | Kouritu Tousei Hospital | 2.0 | 0.0 | 2020-09-28 | 52.473060 | -8.430560 | Ireland |
67 | Hokkaido University Hospital | 2.0 | 0.0 | 2020-09-28 | 43.079008 | 141.337729 | Japan |
68 | Chiba University Hospital | 2.0 | 0.0 | 2020-09-28 | 35.627869 | 140.103466 | Japan |
69 | Universityof Alabamaat Birmingham Hospital | 11.0 | 0.0 | 2020-09-28 | 52.473060 | -8.430560 | Ireland |
70 | Universityof Florida | 2.0 | 0.0 | 2020-09-28 | 27.945565 | -82.463843 | United States of America |
71 | Saiseikai Senri Hospital | 5.0 | 0.0 | 2020-09-28 | 52.473060 | -8.430560 | Ireland |
72 | Rush University | 26.0 | 0.0 | 2020-09-28 | 41.873644 | -87.669498 | United States of America |
73 | University of Chicago | 14.0 | 0.0 | 2020-09-28 | 41.784977 | -87.590524 | United States of America |
74 | Johns Hopkins University | 1.0 | 0.0 | 2020-09-28 | 44.494770 | 11.355897 | Italy |
75 | Hospitaldel Torax | 5.0 | 0.0 | 2020-09-28 | 41.594067 | 2.007054 | Spain |
76 | Persahabatan Hospital | 92.0 | 0.0 | 2020-09-28 | 55.650621 | 37.501444 | Russia |
77 | Universityof Oklahoma Health Sciences Center | 2.0 | 0.0 | 2020-09-28 | 7.250000 | 2.166670 | Benin |
78 | The Christ Hospital | 1.0 | 0.0 | 2020-09-28 | 39.120839 | -84.511113 | United States of America |
79 | Hasan Sadikin Hospital ( Adult ) | 19.0 | 0.0 | 2020-09-28 | 52.473060 | -8.430560 | Ireland |
80 | Kyung Pook National University Chilgok Hospital | 8.0 | 0.0 | 2020-09-28 | 52.473060 | -8.430560 | Ireland |
81 | Hospital Mount Sinai Medical Center | 7.0 | 0.0 | 2020-09-28 | 25.813381 | -80.140873 | United States of America |
82 | Hospital Vergedela Cintade Tortosa | 28.0 | 0.0 | 2020-09-28 | 40.812490 | 0.521600 | Spain |
83 | Prof Dr R. D. Kandou Central Hospital - Paedia... | 4.0 | 0.0 | 2020-09-28 | 1.453734 | 124.805662 | Indonesia |
84 | Prof Dr R. D. Kandou Central Hospital - Adult | 13.0 | 0.0 | 2020-09-28 | 1.450000 | 124.800000 | Indonesia |
85 | Tokyo Metropolitan Tama Medical Center | 3.0 | 0.0 | 2020-09-28 | 36.858403 | -76.305431 | United States of America |
86 | Universityof Maryland | 1.0 | 0.0 | 2020-09-28 | 38.992516 | -76.991021 | United States of America |
87 | Mar del Plata Medical Foundation Private Commu... | 25.0 | 0.5 | 2020-09-28 | 52.473060 | -8.430560 | Ireland |
88 | Dr Sardjito Government Hospital - Paediatric | 3.0 | 0.0 | 2020-09-28 | -7.768611 | 110.371285 | Indonesia |
89 | London Health Sciences Centre | 2.0 | 0.0 | 2020-09-28 | 4.750000 | 11.833330 | Cameroon |
90 | Hospital du Sacre Coeur | 1.0 | 0.0 | 2020-09-28 | 48.886806 | 2.343015 | France |
91 | Mayo Clinic College of Medicine - Arizona | 8.0 | 0.0 | 2020-09-28 | 32.973525 | -111.515363 | United States of America |
92 | Shizuoka Children's Hospital | 1.0 | 0.0 | 2020-09-28 | 52.473060 | -8.430560 | Ireland |
93 | Rochester General Hospital | 2.0 | 0.0 | 2020-09-28 | 43.192043 | -77.588289 | United States of America |
94 | Siriraj Hospital | 2.0 | 0.0 | 2020-09-28 | 13.757829 | 100.485379 | Thailand |
95 | Obihiro-Kosei General Hospital | 2.0 | 0.0 | 2020-09-28 | 52.473060 | -8.430560 | Ireland |
96 | King Faisal Specialist Hospital and Research C... | 22.0 | 0.0 | 2020-09-28 | 21.560019 | 39.148056 | Saudi Arabia |
97 | University of Nebraska Medical Center | 36.0 | 0.0 | 2020-09-28 | 41.256303 | -95.977842 | United States of America |
98 | Foothills Hospital | 71.0 | 0.0 | 2020-09-28 | 40.016004 | -105.236631 | United States of America |
99 | Queen Mary Hospital the University of Hong Kong | 14.0 | 0.0 | 2020-09-28 | 23.093693 | 114.571774 | Hong Kong |
100 | Galway University Hospital | 67.0 | 0.0 | 2020-09-28 | 53.277087 | -9.066559 | Ireland |
101 | Teine Keijinkai Hospital | 16.0 | 0.0 | 2020-09-28 | 52.473060 | -8.430560 | Ireland |
102 | Fondazione IRCCS Ca | 75.0 | 0.0 | 2020-09-28 | 37.250220 | -119.751260 | United States of America |
103 | Fondazione Policlinico Universitario Agostino ... | 12.0 | 0.0 | 2020-09-28 | 41.891930 | 12.511330 | Italy |
104 | Ospedale Molinette Torino | 20.0 | 0.0 | 2020-09-28 | 45.039464 | 7.674405 | Italy |
105 | Oregon Health and Science University Hospital | 2.0 | 0.0 | 2020-09-28 | 45.499038 | -122.685695 | United States of America |
106 | Hospital Clinic, Barcelona | 58.0 | 0.0 | 2020-09-28 | 41.388062 | 2.150639 | Spain |
107 | Columbia University | 1.0 | 0.0 | 2020-09-28 | 40.807949 | -73.961797 | United States of America |
108 | Klinik für Innere Medizin II | 55.0 | 0.0 | 2020-09-28 | 54.089022 | 12.109247 | Germany |
109 | Hospital Vall D Hebron | 22.0 | 0.0 | 2020-09-28 | 48.600046 | 1.675945 | France |
110 | Pamela Youde Nethersole Eastern Hospital | 5.0 | 0.0 | 2020-09-28 | 21.964346 | 113.086001 | Hong Kong |
111 | Maastricht University Medical Centre | 1.0 | 0.0 | 2020-09-28 | 50.857985 | 5.696988 | The Netherlands |
112 | Sozialmedizinisches Zentrum S | 7.0 | 0.0 | 2020-09-28 | 48.173496 | 16.350574 | Austria |
113 | Saiseikai Utsunomiya Hospital | 2.0 | 0.0 | 2020-09-28 | 52.473060 | -8.430560 | Ireland |
114 | Chonnam National University Hospital | 1.0 | 0.0 | 2020-09-28 | 35.176906 | 126.906909 | South Korea |
115 | Ospedale San Gerardo | 55.0 | 0.0 | 2020-09-28 | 45.602154 | 9.260360 | Italy |
116 | Policlinico of Padova, Padova | 13.0 | 0.0 | 2020-09-28 | 45.349274 | 11.786716 | Italy |
117 | Barmherzige Brüder Regensburg | 1.0 | 0.0 | 2020-09-28 | 49.052150 | 12.079680 | Germany |
118 | Civil Hospital Marie Curie, Brussels | 44.0 | 0.0 | 2020-09-28 | 50.850450 | 4.348780 | Belgium |
119 | St. Marianna University School of Medicine | 1.0 | 0.0 | 2020-09-28 | 35.600212 | 139.548866 | Japan |
120 | ISMETT | 1.0 | 0.0 | 2020-09-28 | 38.108440 | 13.361333 | Italy |
121 | Mater Misericordiae University Hospital, Ireland | 31.0 | 0.0 | 2020-09-28 | 53.359704 | -6.267077 | Ireland |
122 | Harapan Kita National Heart Centre Hospital (P... | 10.0 | 0.0 | 2020-09-28 | 52.473060 | -8.430560 | Ireland |
123 | Princess Margaret Hospital, Hong Kong | 6.0 | 0.0 | 2020-09-28 | 22.077152 | 115.006972 | Hong Kong |
124 | Queen Elizabeth Hospital, Hong Kong | 7.0 | 0.0 | 2020-09-28 | 21.772608 | 113.794131 | Hong Kong |
125 | Klinikum Passau | 2.0 | 0.0 | 2020-09-28 | 48.565036 | 13.445336 | Germany |
126 | Hartford HealthCare | 31.0 | 0.0 | 2020-09-28 | 41.681286 | -71.912487 | United States of America |
127 | Hospitales Puerta de Hierro, Mexico | 4.0 | 0.0 | 2020-09-28 | 23.000000 | -102.000000 | Mexico |
128 | Kuwait ECLS program, Al-Amiri & Jaber Al-Ahmed... | 17.0 | 0.0 | 2020-09-28 | 29.387697 | 47.988127 | Kuwait |
129 | Hospital Universitario Virgen de Valme | 1.0 | 0.0 | 2020-09-28 | 37.318825 | -5.971218 | Spain |
130 | Severance Hospital, Seoul | 1.0 | 0.0 | 2020-09-28 | 37.562258 | 126.940570 | South Korea |
131 | Al-Adan Hospital | 8.0 | 0.0 | 2020-09-28 | 5.258065 | 96.007263 | Indonesia |
132 | Medizinische Klinik und Poliklinik II, Munich | 21.0 | 0.0 | 2020-09-28 | 48.137430 | 11.575490 | Germany |
133 | Barwon Health, VIC | 0.0 | 0.0 | 2020-09-28 | -38.152080 | 144.365610 | Australia |
134 | Box Hill Hospital, VIC | 0.0 | 0.0 | 2020-09-28 | -37.813614 | 145.118405 | Australia |
135 | Gold Coast Hospital, QLD | 0.0 | 0.0 | 2020-09-28 | -28.002373 | 153.414599 | Australia |
136 | Launceston Hospital, TAS | 0.0 | 0.0 | 2020-09-28 | -41.434081 | 147.137350 | Australia |
137 | Royal Adelaide Hospital, SA | 0.0 | 0.0 | 2020-09-28 | -34.920724 | 138.586599 | Australia |
138 | Royal Children's Hospital, VIC | 0.0 | 0.0 | 2020-09-28 | -37.793427 | 144.949575 | Australia |
139 | Royal North Shore Hospital, NSW | 0.0 | 0.0 | 2020-09-28 | -33.821411 | 151.191138 | Australia |
140 | Royal Prince Alfred Hospital, NSW | 0.0 | 0.0 | 2020-09-28 | -33.889744 | 151.181500 | Australia |
141 | St George Hospital, NSW | 0.0 | 0.0 | 2020-09-28 | -33.967165 | 151.134025 | Australia |
142 | St Vincent's Hospital Sydney, NSW | 0.0 | 0.0 | 2020-09-28 | -33.880568 | 151.220564 | Australia |
143 | The Alfred Hospital, VIC | 0.0 | 0.0 | 2020-09-28 | -37.846075 | 144.982554 | Australia |
144 | Westmead Hospital, NSW | 0.0 | 0.0 | 2020-09-28 | -33.802939 | 150.987761 | Australia |
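Looking down that table, a fair number of sites have landed in the wrong country, and several unrelated sites share exactly the same point (52.473060, -8.430560, in Ireland). As a quick sanity check before plotting, here’s a sketch of a hypothetical helper that lists every site whose coordinates coincide with another site’s; it assumes the `site`, `lat`, `long` and `country` columns produced by `get_row`.

```python
import pandas as pd


def flag_shared_coordinates(rows, decimals=4):
    """Return the sites whose (lat, long), rounded to `decimals` places, matches another site."""
    rounded = rows[["lat", "long"]].round(decimals)
    # keep=False marks every member of a duplicate group, not just the repeats
    shared = rounded.duplicated(keep=False)
    return rows.loc[shared, ["site", "lat", "long", "country"]].sort_values(["lat", "long"])
```

Running `flag_shared_coordinates(test_row)` gives a short list of candidates to check by hand. A shared point isn’t always wrong (two units of the same hospital could legitimately sit on one spot), so treat the output as a to-do list for manual lookups rather than a verdict.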
This is the hard part now. Not coding-wise, but installing Basemap can be a real pain, so I’m going to skip over that. We start by defining a nice dark base figure:
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
def get_base_fig():
    # Let's define some colours
    bg_color = "#000000"
    coast_color = "#333333"
    country_color = "#222222"
    fig = plt.figure(figsize=(12, 6))
    m = Basemap(projection='cyl', llcrnrlat=-70, urcrnrlat=90,
                llcrnrlon=-170, urcrnrlon=190, area_thresh=10000.)
    m.fillcontinents(color=bg_color, lake_color=bg_color, zorder=-2)
    m.drawcoastlines(color=coast_color, linewidth=0.7, zorder=-1)
    m.drawcountries(color=country_color, linewidth=0.7, zorder=-1)
    m.drawmapboundary(fill_color=bg_color, zorder=-2)
    return fig, m
get_base_fig();
Instead of randomly assigning colours, I’ve tried to make the colour for each country somewhat related to the country itself, normally via the primary colour of its flag, although the fact that most flags use only a very small subset of colours makes this quite difficult.
Let’s just get a snapshot for the final day in our dataset.
I mean… it’s nice. But cool graphics glow and have changing colours. So I’ll define a colormap that allows me to brighten the colour of each site when new patients come in, so that they flicker, glow and grow as they add patients into the system.
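The core of that idea, pulled out of `get_shaded` in the full listing into a standalone sketch (`glow_colours` is my name for it): build a two-colour `LinearSegmentedColormap` running from a country’s base colour to white, and index it by each site’s change relative to the largest change in the frame.

```python
import numpy as np
from matplotlib.colors import LinearSegmentedColormap


def glow_colours(base_colour, change, max_change):
    """Map each site's new-patient count to an RGBA colour between its base colour and white."""
    # Two-stop colormap: 0 -> base colour, 1 -> white
    cmap = LinearSegmentedColormap.from_list("fade", [base_colour, "#FFFFFF"], N=100)
    # Doubling the ratio means a site reaches full white at about half the frame's largest change
    intensity = np.clip(2 * np.asarray(change, dtype=float) / (max_change + 1), 0, 1)
    return cmap(intensity)
```

The original passes the unclipped ratio straight to the colormap and relies on Matplotlib mapping anything above 1 to the top colour; the explicit `np.clip` here just makes that behaviour visible.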
Now let’s test the whole thing animated. We’ll loop over every row and output a frame, which I’ll stitch together using ffmpeg. Normally I’d use joblib to parallelise this… but basemap really doesn’t like that.
def plot_date(i, date):
    data = get_row(date)
    get_shaded(data, date, i)

for i, date in enumerate(data_fixed.index):
    plot_date(i, date)
Now that we have a bunch of frames, let’s turn them into a nice MP4 video. But let’s be fancy and have this bad boy glow. To do this, I’m going to load in a mask (to make sure the title doesn’t glow) and run it through a filter complex that took me 4 hours to debug before it worked. It will also add a few seconds of pause at the end, so that on looping players people can still see the final result.
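For reference, a plain encode without the glow could look like the sketch below. It does not reproduce the mask or the filter complex described above; it assumes a recent ffmpeg build with the `tpad` filter, frames named like the `{frame:04d}.png` files `get_shaded` writes, and a hypothetical `build_ffmpeg_command` helper. The pause at the end comes from `tpad` cloning the last frame.

```python
def build_ffmpeg_command(frame_pattern, output, fps=30, pause_seconds=3):
    """Build an ffmpeg argument list that turns numbered PNG frames into an H.264 MP4."""
    video_filter = ",".join([
        # libx264 with yuv420p needs even dimensions; bbox_inches="tight" can produce odd ones
        "scale=trunc(iw/2)*2:trunc(ih/2)*2",
        # Hold the final frame so looping players still show the end state
        f"tpad=stop_mode=clone:stop_duration={pause_seconds}",
    ])
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),
        "-i", frame_pattern,
        "-vf", video_filter,
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",
        output,
    ]
```

It would be run with something like `subprocess.run(build_ffmpeg_command("cccc_enrolment/output/%04d.png", "enrolment.mp4"), check=True)`, where the output filename is only an example.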
And there it is! Perhaps soon I’ll go through and manually add all the site names in, but for now, I feel this does a pretty good job of showing just how international our collaboration is.
For your convenience, here’s the code in one block:
import pandas as pd
import numpy as np
data = pd.read_csv("cccc_enrolment/enrolment_site.csv", parse_dates=[0], index_col=0)
data = data.fillna(0).astype(int)
data.iloc[-5:, :5]
sites = [c.split("-", 1)[1] if "-" in c else c for c in data.columns]
print(sites[60:70])
# Some of the sites are missing spaces for some reason
def fix_site(site):
    if " " not in site:
        # Don't fix a site name that is an acronym
        if site != site.upper():
            # Put a space before every character that isn't lowercase (i.e. before each capital)
            site = ''.join(map(lambda x: x if x.islower() else " " + x, site))
    return site.strip()
sites_fixed = [fix_site(s) for s in sites]
print(sites_fixed[60:70])
data_fixed = data.copy()
data_fixed.columns = sites_fixed
data_fixed.iloc[-5:, :5]
# Interpolation
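fr = 30  # frame rate
t = 12  # seconds
new_index = pd.date_range("2020-02-01", data_fixed.index.max(), periods=fr * t)
# Combine the indexes, interpolate by elapsed time (the combined index is unevenly spaced),
# treat dates before a site's first record as zero, then keep only the frame timestamps
data_fixed = (data_fixed.reindex(new_index.union(data_fixed.index))
              .interpolate(method="time")
              .fillna(0)
              .loc[new_index])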
data_fixed_change = data_fixed.diff().fillna(0)
from opencage.geocoder import OpenCageGeocode
key = ""  # Add your API key here; the trial version allows you to do all of this for free
geocoder = OpenCageGeocode(key)
def get_lat_long_from_site(query):
    results = geocoder.geocode(query)
    if not len(results):
        print(f"{query} unable to be located")
        return None
    lat = results[0]['geometry']['lat']
    long = results[0]['geometry']['lng']
    country = results[0]["components"]["country"]
    return (lat, long, country)
import os
import json
filename = "cccc_enrolment/site_locations.json"
# Check if file exists
if os.path.exists(filename):
    with open(filename) as f:
        coords = json.load(f)
else:
    # First run: start with an empty cache
    coords = {}
# Add manual ones that I know won't be found
coords["Uniklinik (University Hospital Frankfurt)"] = 50.0936204, 8.6506709, "Germany"
coords["Prof Dr R. D. Kandou Central Hospital - Paediatric"] = 1.453734, 124.8056623, "Indonesia"
coords["Prof Dr R. D. Kandou Central Hospital - Adult"] = 1.45, 124.80, "Indonesia"
coords["Kyoto Prefectural University of Medicine"] = 35.0243414, 135.7682285, "Japan"
coords["ISMETT"] = 38.1084401, 13.3613329, "Italy"
coords["Kuwait ECLS program, Al-Amiri & Jaber Al-Ahmed Hospitals"] = 29.3876968, 47.9881274, "Kuwait"
coords["Dr Sardjito Government Hospital - Paediatric"] = -7.768611, 110.3712855, "Indonesia"
coords["Hospitaldel Torax"] = 41.594067, 2.007054, "Spain"
# Check we have all the sites we need
save = False
for s in sites_fixed:
    if s not in coords:
        coords[s] = get_lat_long_from_site(s)
        save = True
# If we've updated, save it out
if save:
    with open(filename, "w") as f:
        json.dump(coords, f)
print(f"We now have {len(coords.keys())} sites ready to go!")
def get_row(date):
    row = data_fixed.loc[date].to_frame().reset_index()
    change = data_fixed_change.loc[date].to_frame().reset_index()
    row.columns = ["site", "enrolment"]
    change.columns = ["site", "change"]
    row = row.merge(change, on="site")
    row["date"] = date
    row["coord"] = row["site"].map(coords)
    row["lat"] = row["coord"].str[0]
    row["long"] = row["coord"].str[1]
    row["country"] = row["coord"].str[2]
    row = row.drop(columns="coord")
    # Manually fix up the issues to separate HK and China
    hk = np.abs(row.lat - 22.3) < 0.2
    row.loc[hk, "country"] = "Hong Kong"
    np.random.seed(1)
    row.loc[hk, "lat"] += np.random.normal(scale=0.5, size=hk.sum())
    row.loc[hk, "long"] += np.random.normal(scale=0.5, size=hk.sum())
    return row
test_row = get_row(data_fixed.index.max())
test_row
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
def get_base_fig():
    # Let's define some colours
    bg_color = "#000000"
    coast_color = "#333333"
    country_color = "#222222"
    fig = plt.figure(figsize=(12, 6))
    m = Basemap(projection='cyl', llcrnrlat=-70, urcrnrlat=90,
                llcrnrlon=-170, urcrnrlon=190, area_thresh=10000.)
    m.fillcontinents(color=bg_color, lake_color=bg_color, zorder=-2)
    m.drawcoastlines(color=coast_color, linewidth=0.7, zorder=-1)
    m.drawcountries(color=country_color, linewidth=0.7, zorder=-1)
    m.drawmapboundary(fill_color=bg_color, zorder=-2)
    return fig, m
get_base_fig();
import numpy as np
# Colours based roughly on the primary colour in each country's flag
colors = {
"United States of America": "#1e88e5",
"United Kingdom": "#4FC3F7",
"Estonia": "#1E88E5",
"Taiwan": "#E53935",
"Vietnam": "#C62828",
"Ireland": "#FFA726",
"Brazil": "#4CAF50",
"Argentina": "#4FC3F7",
"Chile": "#F44336",
"Indonesia": "#FF8A80",
"Japan": "#C62828",
"Germany": "#E040FB",
"South Korea": "#BBDEFB",
"Qatar": "#AD1457",
"Poland": "#E53935",
"Spain": "#FFB300",
"Australia": "#FFCA28",
"Russia": "#3F51B5",
"Benin": "#558B2F",
"Saudi Arabia": "#1B5E20",
"Hong Kong": "#D84315",
"France": "#01579B",
"The Netherlands": "#B71C1C",
"Belgium": "#FDD835",
"Kuwait": "#4CAF50",
"Yemen": "#D81B60",
"Italy": "#8BC34A",
"Austria": "#C62828",
"Mexico": "#4CAF50",
"Portugal": "#F44336",
"South Africa": "#8BC34A",
"Cameroon": "#2bab7a",
"Malta": "#ed2f3c",
"Thailand": "#458eed",
}
def get_scatter(data):
    fig, m = get_base_fig()
    # Loop over each country and its institutions
    for country in np.unique(data.country):
        c = colors.get(country, "#FF99FF")
        subset = data.loc[(data.country == country) & (data.enrolment > 0), :]
        s = 10 + subset.enrolment
        m.scatter(subset.long, subset.lat, latlon=True, c=c, s=s, zorder=1)
    return m
get_scatter(test_row);
from matplotlib.colors import LinearSegmentedColormap as LSC
def get_shaded(data, date, frame=0, show=False):
    fig, m = get_base_fig()
    # Loop over each country and its institutions
    max_v = data.change.max() + 1
    for country in np.unique(data.country):
        c = colors.get(country)
        if c is None:
            c = "#FF99FF"
            print(f"Cannot find colour for country {country}")
        # Fade from the base colour towards white as a site adds more patients in this frame
        cmap = LSC.from_list("fade", [c, "#FFFFFF"], N=100)
        subset = data.loc[(data.country == country) & (data.enrolment > 0), :]
        s = 10 + subset.enrolment
        cs = cmap(2 * subset.change / max_v)
        m.scatter(subset.long, subset.lat, latlon=True, c=cs, s=s, zorder=1)
    # Set the title, and make the background black
    plt.title("CCCC Patient Contributions", fontsize=16,
              color="#EEEEEE", fontname="Open Sans", y=1.03)
    d = pd.to_datetime(date).strftime("%d - %B")
    ax = fig.get_axes()[0]
    plt.text(0.5, 1.02, d, c="#AAAAAA", fontsize=14,
             verticalalignment="top", horizontalalignment="center",
             transform=ax.transAxes)
    fig.patch.set_facecolor("#000000")
    if show:
        return fig
    else:
        name = f"cccc_enrolment/output/{frame:04d}.png"
        # pad_inches (not padding) is the savefig argument for the border around a tight bbox
        fig.savefig(name, bbox_inches="tight", pad_inches=0, facecolor=fig.get_facecolor(), transparent=False, dpi=300)
        plt.close(fig)
get_shaded(test_row, "2020-03-23", show=True);
def plot_date(i, date):
    data = get_row(date)
    get_shaded(data, date, i)

for i, date in enumerate(data_fixed.index):
    plot_date(i, date)