Visualising Patient Contributions for the CCCC

2020-07-31

A visual thank you to our contributing sites

The COVID-19 Critical Care Consortium is an international collaboration of hundreds of hospital sites from dozens of countries around the world. Our sites have been slowly but steadily gathering data for critical cases of COVID-19 since earlier in the year, and we thought it would be good to show just how international the collaboration is.

To that end, in my role as one of the technical leads (responsible for the data pipeline and ingestion), I created a simple (and no-risk) data product containing only a list of our participant sites and how many patients each has enrolled in our study at a given date. The details for those patients are - of course - not contained in this data product in any way.

This is what we’re going to be making.

Basic Dataset Prep

So let’s start things up, and load in the enrolment data.

import pandas as pd
import numpy as np
data = pd.read_csv("cccc_enrolment/enrolment_site.csv", parse_dates=[0], index_col=0)
data = data.fillna(0).astype(int)
data.iloc[-5:, :5]
date_enrolment  00543-Medical University of Vienna  00544-Lancaster General Health  00546-Penn Medicine  00547-Oklahoma Heart Institute  00548-UH Cleveland Hospital
2020-09-24      9                                   39                              42                   2                               26
2020-09-25      9                                   39                              42                   2                               26
2020-09-26      9                                   39                              42                   2                               26
2020-09-27      9                                   39                              42                   2                               26
2020-09-28      9                                   39                              42                   2                               26

Great, so you can see we have the vertical axis representing the date, and each site is a column, with its numeric identifier first. Let's remove those IDs, because they are useful in a database, but not for us.

sites = [c.split("-", 1)[1] if "-" in c else c for c in data.columns]
print(sites[60:70])
['Fukuoka University', 'Mater Dei Hospital', 'Yokohama City University Medical Center', 'Nagoya University Hospital', 'PICU Saiful Anwar Hospital', 'Adult ICU Saiful Anwar Hospital', 'KourituTouseiHospital', 'HokkaidoUniversityHospital', 'ChibaUniversityHospital', 'UniversityofAlabamaatBirminghamHospital']

Looking good so far. However, I know that some of the sites have - for some reason - had all whitespace removed. So let's write a small parser to go from CamelCase to normal words.

# Some of the sites are missing spaces for some reason
def fix_site(site):
    if " " not in site:
        # Don't fix a site name which is an acronym
        if site != site.upper():
            site = ''.join(map(lambda x: x if x.islower() else " " + x, site))
    return site.strip()

sites_fixed = [fix_site(s) for s in sites]
print(sites_fixed[60:70])
['Fukuoka University', 'Mater Dei Hospital', 'Yokohama City University Medical Center', 'Nagoya University Hospital', 'PICU Saiful Anwar Hospital', 'Adult ICU Saiful Anwar Hospital', 'Kouritu Tousei Hospital', 'Hokkaido University Hospital', 'Chiba University Hospital', 'Universityof Alabamaat Birmingham Hospital']
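
As an aside, the same split can be sketched with a regular expression that only inserts a space where a lowercase letter is directly followed by an uppercase one. This is an illustrative variant, not the function used for the rest of this post, and the helper name split_camel_case is made up for the example. It shares the same blind spot: lowercase joins such as "Universityof" cannot be recovered from capitalisation alone.

```python
import re

def split_camel_case(site):
    """Insert a space at every lowercase-to-uppercase boundary (hypothetical helper)."""
    if " " in site:
        # Names that already contain spaces are left untouched
        return site
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", site).strip()

examples = ["HokkaidoUniversityHospital", "ISMETT", "UniversityofFlorida", "Mater Dei Hospital"]
for name in examples:
    print(f"{name!r} -> {split_camel_case(name)!r}")
```

Because the pattern needs a lowercase letter before the boundary, all-caps acronyms pass through unchanged without a separate check.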

This is probably as good as we can get it right now, since the parser has no way to split lowercase joins like "Universityof". So let's copy the dataframe so we don’t clobber the original, and update the columns.

data_fixed = data.copy()
data_fixed.columns = sites_fixed
data_fixed.iloc[-5:, :5]
date_enrolment  Medical University of Vienna  Lancaster General Health  Penn Medicine  Oklahoma Heart Institute  UH Cleveland Hospital
2020-09-24      9                             39                        42             2                         26
2020-09-25      9                             39                        42             2                         26
2020-09-26      9                             39                        42             2                         26
2020-09-27      9                             39                        42             2                         26
2020-09-28      9                             39                        42             2                         26

Now, I do want to make an animation out of this, with more frames than days, so we’ll just do a super simple interpolation to add extra evenly spaced datetimes that will correspond to each frame. In addition, I’ll start the clock ticking from February first.

# Interpolation
fr = 30  # frame rate
t = 12  # seconds
new_index = pd.date_range("2020-02-01", data_fixed.index.max(), periods=fr * t)

# Combine index, interp, remove original index
# (Index.union is used because `|` is no longer a set union in recent pandas)
data_fixed = data_fixed.reindex(new_index.union(data_fixed.index)).interpolate().loc[new_index]
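
To make the reindex-then-interpolate trick concrete, here is a minimal sketch on a toy three-day series (the site name and numbers are invented). One assumption differs from the line above: it passes method="time" so the interpolation is weighted by the actual timestamps, whereas the default linear method treats every row as equally spaced.

```python
import pandas as pd

# Toy daily enrolment for one invented site
daily = pd.DataFrame(
    {"Site A": [0, 2, 4]},
    index=pd.date_range("2020-02-01", periods=3, freq="D"),
)

# Five evenly spaced frame timestamps spanning the same period
frames = pd.date_range(daily.index.min(), daily.index.max(), periods=5)

# Union the two indexes, fill the gaps, then keep only the frame timestamps
smooth = daily.reindex(frames.union(daily.index)).interpolate(method="time").loc[frames]
print(smooth)
```

Each half-day frame lands midway between the two neighbouring daily values, which is the behaviour we want for a smooth animation.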

And I also want to have the animation flash or brighten a bit when sites add new patients, so to get a feel for that, we’ll simply take the difference in rows (and fillna to put zero in the first row).

data_fixed_change = data_fixed.diff().fillna(0)
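
A minimal sketch of what that difference looks like, using invented numbers for two made-up sites: each row holds the increase since the previous frame, and the first row (which has no predecessor) is filled with zero.

```python
import pandas as pd

enrolment = pd.DataFrame({
    "Site A": [0.0, 1.0, 1.0, 3.0],
    "Site B": [2.0, 2.0, 2.5, 2.5],
})

# Row-to-row increase; the first row would be NaN, so replace it with 0
change = enrolment.diff().fillna(0)
print(change)
```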

Getting coordinates

Each site obviously represents a specific physical location on the planet. Alas, I do not know where that is - all I have is a name. So, let's use opencage to do a search for each site name, and extract the latitude and longitude for each site if we can find it. I don’t expect this to work for them all, but I’d rather manually look up ten sites than a hundred.

Let’s set up the library with our token to start with:

from opencage.geocoder import OpenCageGeocode
key = ""  # The trial version allows you to do this all for free
geocoder = OpenCageGeocode(key)

And then write a little function that - when given a query - will try and find the latitude, longitude and country. If it can’t find anything, we’ll return None and I’ll do it myself.

def get_lat_long_from_site(query):
    results = geocoder.geocode(query)
    if not len(results):
        print(f"{query} unable to be located")
        return None
    lat = results[0]['geometry']['lat']
    long = results[0]['geometry']['lng']
    country = results[0]["components"]["country"]
    return (lat, long, country)
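
The function above indexes straight into the first result. As a hedged sketch of a more defensive variant, the helper below (parse_first_result is a made-up name) reads the same three fields but falls back to None when one is absent. Whether results without a country component actually occur for these sites is an assumption on my part, and the sample data is hand-written rather than real API output.

```python
def parse_first_result(results):
    """Return (lat, long, country) from an OpenCage-style result list, or None."""
    if not results:
        return None
    first = results[0]
    geometry = first.get("geometry", {})
    lat, long = geometry.get("lat"), geometry.get("lng")
    if lat is None or long is None:
        return None
    # Fall back to None when no country component is present
    country = first.get("components", {}).get("country")
    return (lat, long, country)

# Hand-written samples shaped like the fields used above (not real API output)
sample = [{"geometry": {"lat": 48.20849, "lng": 16.37208},
           "components": {"country": "Austria"}}]
no_country = [{"geometry": {"lat": 22.3, "lng": 114.2}, "components": {}}]

print(parse_first_result(sample))
print(parse_first_result(no_country))
print(parse_first_result([]))
```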

And to make sure I don’t spam this API over and over, we’ll run this once, save it out to JSON, and when I run this again in the future we can just read the file in.

import os
import json

filename = "cccc_enrolment/site_locations.json"

# Load the cached locations if the file exists, otherwise start fresh
if os.path.exists(filename):
    with open(filename) as f:
        coords = json.load(f)
else:
    coords = {}

# Add manual ones that I know won't be found
coords["Uniklinik (University Hospital Frankfurt)"] = 50.0936204, 8.6506709, "Germany"
coords["Prof Dr R. D. Kandou Central Hospital - Paediatric"] = 1.453734, 124.8056623, "Indonesia"
coords["Prof Dr R. D. Kandou Central Hospital - Adult"] = 1.45, 124.80, "Indonesia"
coords["Kyoto Prefectural University of Medicine"] = 35.0243414, 135.7682285, "Japan"
coords["ISMETT"] = 38.1084401, 13.3613329, "Italy"
coords["Kuwait ECLS program, Al-Amiri & Jaber Al-Ahmed Hospitals"] = 29.3876968, 47.9881274, "Kuwait"
coords["Dr Sardjito Government Hospital - Paediatric"] = -7.768611, 110.3712855, "Indonesia"
coords["Hospitaldel Torax"] = 41.594067, 2.007054, "Spain"

# Check we have all the sites we need
save = False
for s in sites_fixed:
    if s not in coords:
        coords[s] = get_lat_long_from_site(s)
        save = True

# If we've updated, save it out (once, after the loop)
if save:
    with open(filename, "w") as f:
        json.dump(coords, f)

print(f"We now have {len(coords.keys())} sites ready to go!")
We now have 148 sites ready to go!
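
The caching pattern itself can be sketched without touching the real API. Everything here is illustrative: load_cache, fill_cache and fake_lookup are hypothetical names, the lookup is a stub that just records its calls, and the file lives in a temporary directory.

```python
import json
import os
import tempfile

def load_cache(path):
    """Read the JSON cache if it exists, otherwise start with an empty dict."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}

def fill_cache(path, names, lookup):
    """Look up only the names missing from the cache; write once if anything changed."""
    cache = load_cache(path)
    missing = [n for n in names if n not in cache]
    for name in missing:
        cache[name] = lookup(name)
    if missing:
        with open(path, "w") as f:
            json.dump(cache, f)
    return cache

calls = []

def fake_lookup(name):
    # Stand-in for the geocoder; records each call so we can see the cache working
    calls.append(name)
    return [0.0, 0.0, "Nowhere"]

path = os.path.join(tempfile.mkdtemp(), "site_locations.json")
first = fill_cache(path, ["Site A", "Site B"], fake_lookup)
second = fill_cache(path, ["Site A", "Site B", "Site C"], fake_lookup)
print(calls)  # each site is looked up exactly once across both runs
```

On the second run only the new site triggers a lookup, which is the whole point of saving the results out.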

Great! Onto the next part…

Plotting a specific datetime

Our dataframe is broken up into a lot of rows, where each row now represents a frame in the animation. Let's write a function to extract a row and put it into something easier to work with when plotting.

def get_row(date):
    row = data_fixed.loc[date].to_frame().reset_index()
    change = data_fixed_change.loc[date].to_frame().reset_index()
    row.columns = ["site", "enrolment"]
    change.columns = ["site", "change"]
    row = row.merge(change, on="site")
    row["date"] = date
    row["coord"] = row["site"].map(coords)
    row["lat"] = row["coord"].str[0]
    row["long"] = row["coord"].str[1]
    row["country"] = row["coord"].str[2]
    row = row.drop(columns="coord")

    # Manually fix up the issues to separate HK and China
    hk = np.abs(row.lat - 22.3) < 0.2
    row.loc[hk, "country"] = "Hong Kong"
    np.random.seed(1)
    row.loc[hk, "lat"] += np.random.normal(scale=0.5, size=hk.sum())
    row.loc[hk, "long"] += np.random.normal(scale=0.5, size=hk.sum())


    return row

test_row = get_row(data_fixed.index.max())
test_row
site enrolment change date lat long country
0 Medical University of Vienna 9.0 0.0 2020-09-28 48.208490 16.372080 Austria
1 Lancaster General Health 39.0 0.0 2020-09-28 54.016293 -2.793612 United Kingdom
2 Penn Medicine 42.0 0.0 2020-09-28 39.957043 -75.197520 United States of America
3 Oklahoma Heart Institute 2.0 0.0 2020-09-28 36.029075 -95.869532 United States of America
4 UH Cleveland Hospital 26.0 0.0 2020-09-28 41.504861 -81.605748 United States of America
5 Ohio State University 121.0 0.0 2020-09-28 40.005709 -83.028663 United States of America
6 North Estonia Medical Centre, Tallin 20.0 0.0 2020-09-28 59.396168 24.698524 Estonia
7 Tartu University Hospital, Tartu 14.0 0.0 2020-09-28 58.369456 26.700090 Estonia
8 National Taiwan University Hospital 1.0 0.0 2020-09-28 25.016828 121.538469 Taiwan
9 Hospital for Tropical Diseases, Vietnam 1.0 0.0 2020-09-28 10.753047 106.678478 Vietnam
10 Keimyung University Dong San Hospital 2.0 0.0 2020-09-28 52.473060 -8.430560 Ireland
11 Groote Schuur Hospital 207.0 0.0 2020-09-28 -33.941170 18.462639 South Africa
12 The Heart Hospital Baylor Plano 1.0 0.0 2020-09-28 33.014781 -96.789958 United States of America
13 Baylor University Medical Centre 8.0 0.0 2020-09-28 32.786683 -96.781894 United States of America
14 Baylor Scott 3.0 0.0 2020-09-28 31.077858 -97.362730 United States of America
15 Hospital Nuestra Señora de Gracia Zaragoza 20.0 0.0 2020-09-28 32.502700 -117.003710 Mexico
16 São João Hospital Centre, Portugal 1.0 0.0 2020-09-28 39.980168 -8.318848 Portugal
17 Piedmont Atlanta Hospital, USA 40.0 0.0 2020-09-28 33.789048 -84.371886 United States of America
18 Washington University in St. Louis, USA 20.0 0.0 2020-09-28 38.647240 -90.308402 United States of America
19 Medical College of Wisconsin, USA 31.0 0.0 2020-09-28 43.043867 -88.022453 United States of America
20 Policlinico di S. Orsola, Università di Bologn... 5.0 0.0 2020-09-28 44.496530 11.353080 Italy
21 Fundación Cardiovascular de Colombia, Colombia 2.0 0.0 2020-09-28 -20.175830 -48.688890 Brazil
22 INOVA Fairfax Medical Center, USA 3.0 0.0 2020-09-28 41.620447 -86.228800 United States of America
23 Hospital de Clínicas, Argentina 7.0 0.0 2020-09-28 -34.599502 -58.400386 Argentina
24 Allegheny General Hospital 4.0 0.0 2020-09-28 40.456969 -80.003311 United States of America
25 Clinica Alemana De Santiago 1.0 0.0 2020-09-28 -33.391999 -70.572619 Chile
26 Kyoto Medical Centre 1.0 0.0 2020-09-28 -1.246694 116.862195 Indonesia
27 Hiroshima University 4.0 0.0 2020-09-28 34.401977 132.712320 Japan
28 Stanford University 1.0 0.0 2020-09-28 37.431314 -122.169365 United States of America
29 Tufts Medical Centre 1.0 0.0 2020-09-28 42.349559 -71.063411 United States of America
30 Carilion Clinic 13.0 0.0 2020-09-28 37.088584 -80.505764 United States of America
31 Beth Israel Deaconess Medical Center 11.0 0.0 2020-09-28 36.858403 -76.305431 United States of America
32 Clinica Las Condez, Chile 52.0 0.0 2020-09-28 -30.000000 -71.000000 Chile
33 Hyogo Prefectural Kakogawa Medical Center 16.0 0.0 2020-09-28 36.858403 -76.305431 United States of America
34 University of California San Francisco - Fresno 30.0 0.0 2020-09-28 36.747730 -119.772370 United States of America
35 Uniklinik (University Hospital Frankfurt) 1.0 0.0 2020-09-28 50.093620 8.650671 Germany
36 Seoul National University Bundang Hospital 12.0 0.0 2020-09-28 37.349249 127.123941 South Korea
37 University of Iowa 12.0 0.0 2020-09-28 41.665850 -91.573107 United States of America
38 University of Cincinnati 9.0 0.0 2020-09-28 39.131853 -84.515762 United States of America
39 Rio Hortega University Hospital 12.0 0.0 2020-09-28 33.470630 -81.984130 United States of America
40 Hamad General Hospital 52.0 0.0 2020-09-28 25.293518 51.502231 Qatar
41 Presbyterian Hospital Services 51.0 0.0 2020-09-28 35.635979 -105.962692 United States of America
42 Clinica Valle de Lilli 46.0 0.0 2020-09-28 59.369440 25.359170 Estonia
43 University Hospital in Krakow 15.0 0.0 2020-09-28 50.061430 19.936580 Poland
44 The University of Utah 53.0 0.0 2020-09-28 40.762814 -111.836872 United States of America
45 Ospedale di Arco 1.0 0.0 2020-09-28 45.915643 10.879765 Italy
46 Ospedale San Paolo 58.0 0.0 2020-09-28 41.117631 16.779988 Italy
47 Hospital Universitario Sant Joan d'Alacant 3.0 0.0 2020-09-28 38.401480 -0.436230 Spain
48 Kimitsu Chuo Hospital 2.0 0.0 2020-09-28 35.327470 139.907261 Japan
49 Fatmawati Hospital 43.0 0.0 2020-09-28 -6.292454 106.792423 Indonesia
50 Rinku General Medical Center 1.0 0.0 2020-09-28 34.411921 135.302686 Japan
51 Hospital Universitari Sagrat Cor 2.0 0.0 2020-09-28 -31.413500 -64.181050 Argentina
52 Cleveland Clinic - Florida 3.0 0.0 2020-09-28 26.080382 -80.364079 United States of America
53 San Martino Hospital 4.0 0.0 2020-09-28 43.713059 10.404481 Italy
54 Hospital Alemán 6.0 0.0 2020-09-28 -34.591840 -58.401984 Argentina
55 San Pedro de Alcantara Hospital 2.0 0.0 2020-09-28 36.486635 -4.990532 Spain
56 Legacy Emanuel Medical Center 11.0 0.0 2020-09-28 45.543893 -122.670033 United States of America
57 Kyoto Prefectural University of Medicine 20.0 0.0 2020-09-28 35.024341 135.768228 Japan
58 Lankenau Institute of Medical Research 36.0 0.0 2020-09-28 -37.700000 145.183330 Australia
59 Providence Saint John's Health Centre 8.0 0.0 2020-09-28 34.030577 -118.479544 United States of America
60 Fukuoka University 1.0 0.0 2020-09-28 33.548443 130.364514 Japan
61 Mater Dei Hospital 1.0 0.0 2020-09-28 35.901807 14.476600 Malta
62 Yokohama City University Medical Center 3.0 0.0 2020-09-28 32.240500 -110.945940 United States of America
63 Nagoya University Hospital 27.0 0.0 2020-09-28 35.153309 136.967781 Japan
64 PICU Saiful Anwar Hospital 21.0 0.0 2020-09-28 52.473060 -8.430560 Ireland
65 Adult ICU Saiful Anwar Hospital 13.0 0.0 2020-09-28 52.473060 -8.430560 Ireland
66 Kouritu Tousei Hospital 2.0 0.0 2020-09-28 52.473060 -8.430560 Ireland
67 Hokkaido University Hospital 2.0 0.0 2020-09-28 43.079008 141.337729 Japan
68 Chiba University Hospital 2.0 0.0 2020-09-28 35.627869 140.103466 Japan
69 Universityof Alabamaat Birmingham Hospital 11.0 0.0 2020-09-28 52.473060 -8.430560 Ireland
70 Universityof Florida 2.0 0.0 2020-09-28 27.945565 -82.463843 United States of America
71 Saiseikai Senri Hospital 5.0 0.0 2020-09-28 52.473060 -8.430560 Ireland
72 Rush University 26.0 0.0 2020-09-28 41.873644 -87.669498 United States of America
73 University of Chicago 14.0 0.0 2020-09-28 41.784977 -87.590524 United States of America
74 Johns Hopkins University 1.0 0.0 2020-09-28 44.494770 11.355897 Italy
75 Hospitaldel Torax 5.0 0.0 2020-09-28 41.594067 2.007054 Spain
76 Persahabatan Hospital 92.0 0.0 2020-09-28 55.650621 37.501444 Russia
77 Universityof Oklahoma Health Sciences Center 2.0 0.0 2020-09-28 7.250000 2.166670 Benin
78 The Christ Hospital 1.0 0.0 2020-09-28 39.120839 -84.511113 United States of America
79 Hasan Sadikin Hospital ( Adult ) 19.0 0.0 2020-09-28 52.473060 -8.430560 Ireland
80 Kyung Pook National University Chilgok Hospital 8.0 0.0 2020-09-28 52.473060 -8.430560 Ireland
81 Hospital Mount Sinai Medical Center 7.0 0.0 2020-09-28 25.813381 -80.140873 United States of America
82 Hospital Vergedela Cintade Tortosa 28.0 0.0 2020-09-28 40.812490 0.521600 Spain
83 Prof Dr R. D. Kandou Central Hospital - Paedia... 4.0 0.0 2020-09-28 1.453734 124.805662 Indonesia
84 Prof Dr R. D. Kandou Central Hospital - Adult 13.0 0.0 2020-09-28 1.450000 124.800000 Indonesia
85 Tokyo Metropolitan Tama Medical Center 3.0 0.0 2020-09-28 36.858403 -76.305431 United States of America
86 Universityof Maryland 1.0 0.0 2020-09-28 38.992516 -76.991021 United States of America
87 Mar del Plata Medical Foundation Private Commu... 25.0 0.5 2020-09-28 52.473060 -8.430560 Ireland
88 Dr Sardjito Government Hospital - Paediatric 3.0 0.0 2020-09-28 -7.768611 110.371285 Indonesia
89 London Health Sciences Centre 2.0 0.0 2020-09-28 4.750000 11.833330 Cameroon
90 Hospital du Sacre Coeur 1.0 0.0 2020-09-28 48.886806 2.343015 France
91 Mayo Clinic College of Medicine - Arizona 8.0 0.0 2020-09-28 32.973525 -111.515363 United States of America
92 Shizuoka Children's Hospital 1.0 0.0 2020-09-28 52.473060 -8.430560 Ireland
93 Rochester General Hospital 2.0 0.0 2020-09-28 43.192043 -77.588289 United States of America
94 Siriraj Hospital 2.0 0.0 2020-09-28 13.757829 100.485379 Thailand
95 Obihiro-Kosei General Hospital 2.0 0.0 2020-09-28 52.473060 -8.430560 Ireland
96 King Faisal Specialist Hospital and Research C... 22.0 0.0 2020-09-28 21.560019 39.148056 Saudi Arabia
97 University of Nebraska Medical Center 36.0 0.0 2020-09-28 41.256303 -95.977842 United States of America
98 Foothills Hospital 71.0 0.0 2020-09-28 40.016004 -105.236631 United States of America
99 Queen Mary Hospital the University of Hong Kong 14.0 0.0 2020-09-28 23.093693 114.571774 Hong Kong
100 Galway University Hospital 67.0 0.0 2020-09-28 53.277087 -9.066559 Ireland
101 Teine Keijinkai Hospital 16.0 0.0 2020-09-28 52.473060 -8.430560 Ireland
102 Fondazione IRCCS Ca 75.0 0.0 2020-09-28 37.250220 -119.751260 United States of America
103 Fondazione Policlinico Universitario Agostino ... 12.0 0.0 2020-09-28 41.891930 12.511330 Italy
104 Ospedale Molinette Torino 20.0 0.0 2020-09-28 45.039464 7.674405 Italy
105 Oregon Health and Science University Hospital 2.0 0.0 2020-09-28 45.499038 -122.685695 United States of America
106 Hospital Clinic, Barcelona 58.0 0.0 2020-09-28 41.388062 2.150639 Spain
107 Columbia University 1.0 0.0 2020-09-28 40.807949 -73.961797 United States of America
108 Klinik für Innere Medizin II 55.0 0.0 2020-09-28 54.089022 12.109247 Germany
109 Hospital Vall D Hebron 22.0 0.0 2020-09-28 48.600046 1.675945 France
110 Pamela Youde Nethersole Eastern Hospital 5.0 0.0 2020-09-28 21.964346 113.086001 Hong Kong
111 Maastricht University Medical Centre 1.0 0.0 2020-09-28 50.857985 5.696988 The Netherlands
112 Sozialmedizinisches Zentrum S 7.0 0.0 2020-09-28 48.173496 16.350574 Austria
113 Saiseikai Utsunomiya Hospital 2.0 0.0 2020-09-28 52.473060 -8.430560 Ireland
114 Chonnam National University Hospital 1.0 0.0 2020-09-28 35.176906 126.906909 South Korea
115 Ospedale San Gerardo 55.0 0.0 2020-09-28 45.602154 9.260360 Italy
116 Policlinico of Padova, Padova 13.0 0.0 2020-09-28 45.349274 11.786716 Italy
117 Barmherzige Brüder Regensburg 1.0 0.0 2020-09-28 49.052150 12.079680 Germany
118 Civil Hospital Marie Curie, Brussels 44.0 0.0 2020-09-28 50.850450 4.348780 Belgium
119 St. Marianna University School of Medicine 1.0 0.0 2020-09-28 35.600212 139.548866 Japan
120 ISMETT 1.0 0.0 2020-09-28 38.108440 13.361333 Italy
121 Mater Misericordiae University Hospital, Ireland 31.0 0.0 2020-09-28 53.359704 -6.267077 Ireland
122 Harapan Kita National Heart Centre Hospital (P... 10.0 0.0 2020-09-28 52.473060 -8.430560 Ireland
123 Princess Margaret Hospital, Hong Kong 6.0 0.0 2020-09-28 22.077152 115.006972 Hong Kong
124 Queen Elizabeth Hospital, Hong Kong 7.0 0.0 2020-09-28 21.772608 113.794131 Hong Kong
125 Klinikum Passau 2.0 0.0 2020-09-28 48.565036 13.445336 Germany
126 Hartford HealthCare 31.0 0.0 2020-09-28 41.681286 -71.912487 United States of America
127 Hospitales Puerta de Hierro, Mexico 4.0 0.0 2020-09-28 23.000000 -102.000000 Mexico
128 Kuwait ECLS program, Al-Amiri & Jaber Al-Ahmed... 17.0 0.0 2020-09-28 29.387697 47.988127 Kuwait
129 Hospital Universitario Virgen de Valme 1.0 0.0 2020-09-28 37.318825 -5.971218 Spain
130 Severance Hospital, Seoul 1.0 0.0 2020-09-28 37.562258 126.940570 South Korea
131 Al-Adan Hospital 8.0 0.0 2020-09-28 5.258065 96.007263 Indonesia
132 Medizinische Klinik und Poliklinik II, Munich 21.0 0.0 2020-09-28 48.137430 11.575490 Germany
133 Barwon Health, VIC 0.0 0.0 2020-09-28 -38.152080 144.365610 Australia
134 Box Hill Hospital, VIC 0.0 0.0 2020-09-28 -37.813614 145.118405 Australia
135 Gold Coast Hospital, QLD 0.0 0.0 2020-09-28 -28.002373 153.414599 Australia
136 Launceston Hospital, TAS 0.0 0.0 2020-09-28 -41.434081 147.137350 Australia
137 Royal Adelaide Hospital, SA 0.0 0.0 2020-09-28 -34.920724 138.586599 Australia
138 Royal Children's Hospital, VIC 0.0 0.0 2020-09-28 -37.793427 144.949575 Australia
139 Royal North Shore Hospital, NSW 0.0 0.0 2020-09-28 -33.821411 151.191138 Australia
140 Royal Prince Alfred Hospital, NSW 0.0 0.0 2020-09-28 -33.889744 151.181500 Australia
141 St George Hospital, NSW 0.0 0.0 2020-09-28 -33.967165 151.134025 Australia
142 St Vincent's Hospital Sydney, NSW 0.0 0.0 2020-09-28 -33.880568 151.220564 Australia
143 The Alfred Hospital, VIC 0.0 0.0 2020-09-28 -37.846075 144.982554 Australia
144 Westmead Hospital, NSW 0.0 0.0 2020-09-28 -33.802939 150.987761 Australia

Basemap time

This is the hard part now. Not coding wise, but installing basemap can be a real pain. So I’m going to skip over it. We start by defining a nice dark base figure:

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

def get_base_fig():
    # Let's define some colors
    bg_color = "#000000"
    coast_color = "#333333"
    country_color = "#222222"
    fig = plt.figure(figsize=(12, 6))
    m = Basemap(projection='cyl', llcrnrlat=-70,urcrnrlat=90,
                llcrnrlon=-170, urcrnrlon=190, area_thresh=10000.)
    m.fillcontinents(color=bg_color, lake_color=bg_color, zorder=-2)
    m.drawcoastlines(color=coast_color, linewidth=0.7, zorder=-1)
    m.drawcountries(color=country_color, linewidth=0.7, zorder=-1)
    m.drawmapboundary(fill_color=bg_color, zorder=-2)
    return fig, m

get_base_fig();


Add each site

Instead of randomly assigning colours, I’ve tried to make the colour for each country somewhat related to the country itself, normally via its primary flag colour. That is harder than it sounds, because most flags draw on a very small subset of colours.

Let’s just get a snapshot for the final day in our dataset.

import numpy as np

# Colours based roughly on the primary colour in each country's flag
# (the earlier duplicate "Australia" entry is dropped; the later one below was the one in effect)
colors = {
    "United States of America": "#1e88e5",
    "United Kingdom": "#4FC3F7",
    "Estonia": "#1E88E5",
    "Taiwan": "#E53935",
    "Vietnam": "#C62828",
    "Ireland": "#FFA726",
    "Brazil": "#4CAF50",
    "Argentina": "#4FC3F7",
    "Chile": "#F44336",
    "Indonesia": "#FF8A80",
    "Japan": "#C62828",
    "Germany": "#E040FB",
    "South Korea": "#BBDEFB",
    "Qatar": "#AD1457",
    "Poland": "#E53935",
    "Spain": "#FFB300",
    "Australia": "#FFCA28",
    "Russia": "#3F51B5",
    "Benin": "#558B2F",
    "Saudi Arabia": "#1B5E20",
    "Hong Kong": "#D84315",
    "France": "#01579B",
    "The Netherlands": "#B71C1C",
    "Belgium": "#FDD835",
    "Kuwait": "#4CAF50",
    "Yemen": "#D81B60",
    "Italy": "#8BC34A",
    "Austria": "#C62828",
    "Mexico": "#4CAF50",
    "Portugal": "#F44336",
    "South Africa": "#8BC34A",
    "Cameroon": "#2bab7a",
    "Malta": "#ed2f3c",
    "Thailand": "#458eed",
}

def get_scatter(data):
    fig, m = get_base_fig()
    # Loop over each country and its institutions

    for country in np.unique(data.country):
        c = colors.get(country, "#FF99FF")
        subset = data.loc[(data.country == country) & (data.enrolment > 0), :]
        s = 10 + subset.enrolment
        m.scatter(subset.long, subset.lat, latlon=True, c=c, s=s, zorder=1)
    return m

get_scatter(test_row);


I mean… it’s nice. But cool graphics glow and have changing colours. So I’ll define a colormap that allows me to brighten the colour of each site when new patients come in, so that they flicker, glow and grow as they add patients into the system.
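
Before the full function, here is a minimal sketch of the fade on its own, using the blue from the colour table as the base. It shows that 0 maps to the base colour, 1 maps to white, and anything above 1 is clipped to white as well (which matters because the scaling used below can exceed 1). The variable names base and fade are just for this example.

```python
from matplotlib.colors import LinearSegmentedColormap, to_hex

base = "#1e88e5"  # the blue used for the United States in the colour table
fade = LinearSegmentedColormap.from_list("fade", [base, "#FFFFFF"], N=100)

# 0 is the base colour, 1 is white, and values beyond 1 saturate at white
for value in (0.0, 0.5, 1.0, 1.7):
    print(value, to_hex(fade(value)))
```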

from matplotlib.colors import LinearSegmentedColormap as LSC

def get_shaded(data, date, frame=0, show=False):
    fig, m = get_base_fig()
    # Loop over each country and its institutions

    max_v = data.change.max() + 1

    for country in np.unique(data.country):
        c = colors.get(country)
        if c is None:
            c = "#FF99FF"
            print(f"Cannot find colour for country {country}")
        # From base colour, increase intensity of patients added today
        cmap = LSC.from_list("fade", [c, "#FFFFFF"], N=100)
        subset = data.loc[(data.country == country) & (data.enrolment > 0), :]
        s = 10 + subset.enrolment
        cs = cmap(2 * subset.change / max_v)
        m.scatter(subset.long, subset.lat, latlon=True, c=cs, s=s, zorder=1)

    # Set the title, and make the background black
    plt.title("CCCC Patient Contributions", fontsize=16,
              color="#EEEEEE", fontname="Open Sans", y=1.03)
    d = pd.to_datetime(date).strftime("%d - %B")
    ax = fig.get_axes()[0]
    plt.text(0.5, 1.02, d, c="#AAAAAA", fontsize=14,
             verticalalignment="top", horizontalalignment="center",
             transform=ax.transAxes)

    fig.patch.set_facecolor("#000000")
    if show:
        return fig
    else:
        name = f"cccc_enrolment/output/{frame:04d}.png"
        fig.savefig(name, bbox_inches="tight", pad_inches=0, facecolor=fig.get_facecolor(), transparent=False, dpi=300)
        plt.close(fig)

get_shaded(test_row, "2020-03-23", show=True);


Now let’s test this whole thing animated. We'll loop over every row and output a frame, and I’ll stitch the frames together using ffmpeg. Normally I’d use joblib for this… but basemap really doesn’t like that.

def plot_date(i, date):
    data = get_row(date)
    get_shaded(data, date, i)

for i, date in enumerate(data_fixed.index):
    plot_date(i, date)

Now that we have a bunch of frames, let's turn them into a nice MP4 video. But let's be fancy, and have this bad boy glow. To do this, I'm going to load in a mask (to make sure the title doesn't glow), and run it through a filter complex that took me 4 hours to debug until it worked. It will also add a few seconds of pause at the end, so on looping players people can still see the final result.

ffmpeg -r 30 -i cccc_enrolment/output/%04d.png -i cccc_enrolment/mask.png -filter_complex "      [1]setsar=sar=0[p],
[0]split[a][b],
[a][p]overlay,lumakey=0:tolerance=0.3:softness=0.3[x];
color=black,format=rgb24[c];
[c][x]scale2ref[c][i];
[c][i]overlay=format=auto:shortest=1,
setsar=sar=1,
gblur=30:3,
curves=all='0/0 0.5/0.9 1/0.9'[d],
[b]setsar=sar=1[e],
[d][e]blend=all_mode=addition,
scale=1920:-2,
tpad=stop_mode=clone:stop_duration=4
" -vcodec libx264 -crf 23 -movflags faststart -pix_fmt yuv420p cccc_enrolment/contributions.mp4
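
As a quick sanity check on the numbers, here is the arithmetic for the expected clip length. It assumes the output keeps the 30 fps input rate and that tpad appends exactly the four seconds requested; the variable names are only for this example.

```python
frame_rate = 30         # matches fr above and ffmpeg's -r 30
animation_seconds = 12  # matches t above
pause_seconds = 4       # matches tpad's stop_duration=4

n_frames = frame_rate * animation_seconds
total_seconds = n_frames / frame_rate + pause_seconds
print(f"{n_frames} frames, {total_seconds:.0f} s of video")
```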

And there it is! Perhaps soon I’ll go through and manually add all the site names in, but for now, I feel this does a pretty good job of showing just how international our collaboration is.


For your convenience, here’s the code in one block:

import pandas as pd
import numpy as np
data = pd.read_csv("cccc_enrolment/enrolment_site.csv", parse_dates=[0], index_col=0)
data = data.fillna(0).astype(int)
data.iloc[-5:, :5]
sites = [c.split("-", 1)[1] if "-" in c else c for c in data.columns]
print(sites[60:70])
# Some of the sites are missing spaces for some reason
def fix_site(site):
    if " " not in site:
        # Don't fix a site name which is an acronym
        if site != site.upper():
            site = ''.join(map(lambda x: x if x.islower() else " " + x, site))
    return site.strip()

sites_fixed = [fix_site(s) for s in sites]
print(sites_fixed[60:70])
data_fixed = data.copy()
data_fixed.columns = sites_fixed
data_fixed.iloc[-5:, :5]
# Interpolation
fr = 30  # frame rate
t = 12  # seconds
new_index = pd.date_range("2020-02-01", data_fixed.index.max(), periods=fr * t)

# Combine index, interp, remove original index
# (Index.union is used because `|` is no longer a set union in recent pandas)
data_fixed = data_fixed.reindex(new_index.union(data_fixed.index)).interpolate().loc[new_index]
data_fixed_change = data_fixed.diff().fillna(0)
from opencage.geocoder import OpenCageGeocode
key = ""  # The trial version allows you to do this all for free
geocoder = OpenCageGeocode(key)
def get_lat_long_from_site(query):
    results = geocoder.geocode(query)
    if not len(results):
        print(f"{query} unable to be located")
        return None
    lat = results[0]['geometry']['lat']
    long = results[0]['geometry']['lng']
    country = results[0]["components"]["country"]
    return (lat, long, country)
import os
import json

filename = "cccc_enrolment/site_locations.json"

# Load the cached locations if the file exists, otherwise start fresh
if os.path.exists(filename):
    with open(filename) as f:
        coords = json.load(f)
else:
    coords = {}

# Add manual ones that I know won't be found
coords["Uniklinik (University Hospital Frankfurt)"] = 50.0936204, 8.6506709, "Germany"
coords["Prof Dr R. D. Kandou Central Hospital - Paediatric"] = 1.453734, 124.8056623, "Indonesia"
coords["Prof Dr R. D. Kandou Central Hospital - Adult"] = 1.45, 124.80, "Indonesia"
coords["Kyoto Prefectural University of Medicine"] = 35.0243414, 135.7682285, "Japan"
coords["ISMETT"] = 38.1084401, 13.3613329, "Italy"
coords["Kuwait ECLS program, Al-Amiri & Jaber Al-Ahmed Hospitals"] = 29.3876968, 47.9881274, "Kuwait"
coords["Dr Sardjito Government Hospital - Paediatric"] = -7.768611, 110.3712855, "Indonesia"
coords["Hospitaldel Torax"] = 41.594067, 2.007054, "Spain"

# Check we have all the sites we need
save = False
for s in sites_fixed:
    if s not in coords:
        coords[s] = get_lat_long_from_site(s)
        save = True

# If we've updated, save it out (once, after the loop)
if save:
    with open(filename, "w") as f:
        json.dump(coords, f)

print(f"We now have {len(coords.keys())} sites ready to go!")
def get_row(date):
    row = data_fixed.loc[date].to_frame().reset_index()
    change = data_fixed_change.loc[date].to_frame().reset_index()
    row.columns = ["site", "enrolment"]
    change.columns = ["site", "change"]
    row = row.merge(change, on="site")
    row["date"] = date
    row["coord"] = row["site"].map(coords)
    row["lat"] = row["coord"].str[0]
    row["long"] = row["coord"].str[1]
    row["country"] = row["coord"].str[2]
    row = row.drop(columns="coord")

    # Manually fix up the issues to separate HK and China
    hk = np.abs(row.lat - 22.3) < 0.2
    row.loc[hk, "country"] = "Hong Kong"
    np.random.seed(1)
    row.loc[hk, "lat"] += np.random.normal(scale=0.5, size=hk.sum())
    row.loc[hk, "long"] += np.random.normal(scale=0.5, size=hk.sum())


    return row

test_row = get_row(data_fixed.index.max())
test_row
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

def get_base_fig():
    # Let's define some colors
    bg_color = "#000000"
    coast_color = "#333333"
    country_color = "#222222"
    fig = plt.figure(figsize=(12, 6))
    m = Basemap(projection='cyl', llcrnrlat=-70,urcrnrlat=90,
                llcrnrlon=-170, urcrnrlon=190, area_thresh=10000.)
    m.fillcontinents(color=bg_color, lake_color=bg_color, zorder=-2)
    m.drawcoastlines(color=coast_color, linewidth=0.7, zorder=-1)
    m.drawcountries(color=country_color, linewidth=0.7, zorder=-1)
    m.drawmapboundary(fill_color=bg_color, zorder=-2)
    return fig, m

get_base_fig();
import numpy as np

# Colours based roughly on the primary colour in each country's flag
# (the earlier duplicate "Australia" entry is dropped; the later one below was the one in effect)
colors = {
    "United States of America": "#1e88e5",
    "United Kingdom": "#4FC3F7",
    "Estonia": "#1E88E5",
    "Taiwan": "#E53935",
    "Vietnam": "#C62828",
    "Ireland": "#FFA726",
    "Brazil": "#4CAF50",
    "Argentina": "#4FC3F7",
    "Chile": "#F44336",
    "Indonesia": "#FF8A80",
    "Japan": "#C62828",
    "Germany": "#E040FB",
    "South Korea": "#BBDEFB",
    "Qatar": "#AD1457",
    "Poland": "#E53935",
    "Spain": "#FFB300",
    "Australia": "#FFCA28",
    "Russia": "#3F51B5",
    "Benin": "#558B2F",
    "Saudi Arabia": "#1B5E20",
    "Hong Kong": "#D84315",
    "France": "#01579B",
    "The Netherlands": "#B71C1C",
    "Belgium": "#FDD835",
    "Kuwait": "#4CAF50",
    "Yemen": "#D81B60",
    "Italy": "#8BC34A",
    "Austria": "#C62828",
    "Mexico": "#4CAF50",
    "Portugal": "#F44336",
    "South Africa": "#8BC34A",
    "Cameroon": "#2bab7a",
    "Malta": "#ed2f3c",
    "Thailand": "#458eed",
}

def get_scatter(data):
    fig, m = get_base_fig()
    # Loop over each country and its institutions

    for country in np.unique(data.country):
        c = colors.get(country, "#FF99FF")
        subset = data.loc[(data.country == country) & (data.enrolment > 0), :]
        s = 10 + subset.enrolment
        m.scatter(subset.long, subset.lat, latlon=True, c=c, s=s, zorder=1)
    return m

get_scatter(test_row);
from matplotlib.colors import LinearSegmentedColormap as LSC

def get_shaded(data, date, frame=0, show=False):
    fig, m = get_base_fig()
    # Loop over each country and its institutions

    max_v = data.change.max() + 1

    for country in np.unique(data.country):
        c = colors.get(country)
        if c is None:
            c = "#FF99FF"
            print(f"Cannot find colour for country {country}")
        # From base colour, increase intensity of patients added today
        cmap = LSC.from_list("fade", [c, "#FFFFFF"], N=100)
        subset = data.loc[(data.country == country) & (data.enrolment > 0), :]
        s = 10 + subset.enrolment
        cs = cmap(2 * subset.change / max_v)
        m.scatter(subset.long, subset.lat, latlon=True, c=cs, s=s, zorder=1)

    # Set the title, and make the background black
    plt.title("CCCC Patient Contributions", fontsize=16,
              color="#EEEEEE", fontname="Open Sans", y=1.03)
    d = pd.to_datetime(date).strftime("%d - %B")
    ax = fig.get_axes()[0]
    plt.text(0.5, 1.02, d, c="#AAAAAA", fontsize=14,
             verticalalignment="top", horizontalalignment="center",
             transform=ax.transAxes)

    fig.patch.set_facecolor("#000000")
    if show:
        return fig
    else:
        name = f"cccc_enrolment/output/{frame:04d}.png"
        fig.savefig(name, bbox_inches="tight", pad_inches=0, facecolor=fig.get_facecolor(), transparent=False, dpi=300)
        plt.close(fig)

get_shaded(test_row, "2020-03-23", show=True);
def plot_date(i, date):
    data = get_row(date)
    get_shaded(data, date, i)

for i, date in enumerate(data_fixed.index):
    plot_date(i, date)