Visualising Patient Contributions for the CCCC

6th July 2020

A visual thank you to our contributing sites

The COVID-19 Critical Care Consortium is an international collaboration of hundreds of hospital sites from dozens of countries around the world. Our sites have been slowly but steadily gathering data for critical cases of COVID-19 since earlier in the year, and we thought it would be good to show just how international the collaboration is.

To that end, in my role as one of the technical leads (responsible for the data pipeline and ingestion), I created a simple (and no-risk) data product containing only a list of our participant sites and how many patients each had enrolled in our study by a given date. The details of those patients are, of course, not contained in this data product in any way.

This is what we’re going to be making.

Basic Dataset Prep

So let’s start things up, and load in the enrolment data.

import pandas as pd
import numpy as np
data = pd.read_csv("cccc_enrolment/enrolment_site.csv", parse_dates=[0], index_col=0)
data = data.fillna(0).astype(int)
data.iloc[-5:, :5]
date_enrolment | 00543-Medical University of Vienna | 00544-Lancaster General Health | 00546-Penn Medicine | 00547-Oklahoma Heart Institute | 00548-UH Cleveland Hospital
2020-09-24 | 9 | 39 | 42 | 2 | 26
2020-09-25 | 9 | 39 | 42 | 2 | 26
2020-09-26 | 9 | 39 | 42 | 2 | 26
2020-09-27 | 9 | 39 | 42 | 2 | 26
2020-09-28 | 9 | 39 | 42 | 2 | 26

Great, so you can see that the vertical axis (the index) represents the date, and each site is a column, with its numeric identifier first. Let’s remove those IDs: they’re useful in a database, but not for us.

sites = [c.split("-", 1)[1] if "-" in c else c for c in data.columns]
print(sites[60:70])
['Fukuoka University', 'Mater Dei Hospital', 'Yokohama City University Medical Center', 'Nagoya University Hospital', 'PICU Saiful Anwar Hospital', 'Adult ICU Saiful Anwar Hospital', 'KourituTouseiHospital', 'HokkaidoUniversityHospital', 'ChibaUniversityHospital', 'UniversityofAlabamaatBirminghamHospital']

Looking good so far. However, I know that some of the sites have, for some reason, had all their whitespace removed. So let’s write a small parser to go from CamelCase back to normal words.

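# Some of the sites are missing spaces for some reason
def fix_site(site):
    if " " not in site:
        # Don't "fix" a site name that is an acronym (all capitals)
        if site != site.upper():
            # Put a space before every non-lowercase character: "ChibaUniversity" -> " Chiba University"
            site = ''.join(map(lambda x: x if x.islower() else " " + x, site))
    # strip() removes the leading space added before the first capital
    return site.strip()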

sites_fixed = [fix_site(s) for s in sites]
print(sites_fixed[60:70])
['Fukuoka University', 'Mater Dei Hospital', 'Yokohama City University Medical Center', 'Nagoya University Hospital', 'PICU Saiful Anwar Hospital', 'Adult ICU Saiful Anwar Hospital', 'Kouritu Tousei Hospital', 'Hokkaido University Hospital', 'Chiba University Hospital', 'Universityof Alabamaat Birmingham Hospital']
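`fix_site` puts a space before every capital letter. A run of capitals inside a name would therefore be split letter by letter: a hypothetical "UHClevelandHospital" would become "U H Cleveland Hospital". Here is a minimal regex sketch using a hypothetical `split_camel` helper. It splits only at lower-to-upper boundaries and before the last capital of an acronym run. Like the original, it can’t recover "Universityof", because the lowercase "of" leaves nothing to split on.

```python
import re

# Zero-width matches: a space goes (1) between a lowercase and an uppercase letter,
# and (2) before the last capital of an upper-case run that starts a new word.
_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")

def split_camel(site: str) -> str:
    """Hypothetical alternative to fix_site that keeps embedded acronyms intact."""
    if " " in site:
        return site  # already spaced, leave it alone
    return _BOUNDARY.sub(" ", site)

print(split_camel("HokkaidoUniversityHospital"))  # Hokkaido University Hospital
print(split_camel("UHClevelandHospital"))         # UH Cleveland Hospital
```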

This is probably as good as we can get it right now. So let’s copy the dataframe so we don’t clobber the original, and update the columns.

data_fixed = data.copy()
data_fixed.columns = sites_fixed
data_fixed.iloc[-5:, :5]
date_enrolment | Medical University of Vienna | Lancaster General Health | Penn Medicine | Oklahoma Heart Institute | UH Cleveland Hospital
2020-09-24 | 9 | 39 | 42 | 2 | 26
2020-09-25 | 9 | 39 | 42 | 2 | 26
2020-09-26 | 9 | 39 | 42 | 2 | 26
2020-09-27 | 9 | 39 | 42 | 2 | 26
2020-09-28 | 9 | 39 | 42 | 2 | 26

Now, I do want to make an animation out of this, with more frames than days, so we’ll just do a super simple interpolation to add extra evenly spaced datetimes that will correspond to each frame. In addition, I’ll start the clock ticking from February first.

# Interpolation
fr = 30  # frame rate
t = 12  # seconds
# fr * t evenly spaced timestamps, one per frame
new_index = pd.date_range("2020-02-01", data_fixed.index.max(), periods=fr * t)

# Combine indexes (set union), interpolate by elapsed time, keep only the frame timestamps
data_fixed = (
    data_fixed.reindex(new_index.union(data_fixed.index))
    .interpolate(method="time")
    .loc[new_index]
)
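The method choice matters once the frame timestamps are merged with the daily observations. The default `interpolate()` treats rows as equally spaced, while `method="time"` weights each value by the actual gap between timestamps. Also, recent pandas versions treat `|` on an Index as an element-wise logical operation rather than a set union, so `.union()` is the explicit way to combine the indexes. Here is a toy sketch with made-up numbers (one site, three observations) that shows the difference:

```python
import pandas as pd

# Made-up cumulative enrolment for one site, observed on unevenly spaced days
obs = pd.DataFrame({"site": [0, 10, 40]},
                   index=pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-05"]))
frames = pd.date_range("2020-03-01", "2020-03-05", periods=3)  # 03-01, 03-03, 03-05

combined = obs.reindex(frames.union(obs.index))  # adds 03-03 as a NaN row
by_position = combined.interpolate().loc[frames]            # ignores the timestamps
by_time = combined.interpolate(method="time").loc[frames]   # uses elapsed time

print(by_position["site"].tolist())  # [0.0, 25.0, 40.0]
print(by_time["site"].tolist())      # approximately [0.0, 20.0, 40.0]
```

On 03-03, one day into the three-day gap between 10 and 40, time-based interpolation gives 20. The positional version gives 25, because it only sees "halfway between two rows".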

And I also want to have the animation flash or brighten a bit when sites add new patients, so to get a feel for that, we’ll simply take the difference in rows (and fillna to put zero in the first row).

data_fixed_change = data_fixed.diff().fillna(0)
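On toy numbers, the change table looks like the example below. Each value is how many patients a site added since the previous frame (fractional once interpolated). The first row becomes zero because it has nothing to difference against. The site names and counts are made up.

```python
import pandas as pd

# Made-up cumulative enrolment for two sites across four frames
cumulative = pd.DataFrame({"Site A": [0, 2, 2, 5], "Site B": [1, 1, 4, 4]})
added = cumulative.diff().fillna(0)  # first row has no predecessor: NaN -> 0

print(added["Site A"].tolist())  # [0.0, 2.0, 0.0, 3.0]
print(added["Site B"].tolist())  # [0.0, 0.0, 3.0, 0.0]
```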

Getting coordinates

Each site obviously represents a specific physical location on the planet. Alas, I don’t have those locations; all I have is a name. So let’s use OpenCage to search for each site name and extract the latitude and longitude for each site, if we can find it. I don’t expect this to work for them all, but I’d rather manually look up ten sites than a hundred.

Let’s set up the library with our token to start with:

from opencage.geocoder import OpenCageGeocode
key = ""  # Your OpenCage API key; the trial version lets you do all of this for free
geocoder = OpenCageGeocode(key)

And then write a little function that - when given a query - will try and find the latitude, longitude and country. If it can’t find anything, we’ll return None and I’ll do it myself.

def get_lat_long_from_site(query):
    results = geocoder.geocode(query)
    if not len(results):
        print(f"{query} unable to be located")
        return None
    lat = results[0]['geometry']['lat']
    long = results[0]['geometry']['lng']
    country = results[0]["components"]["country"]
    return (lat, long, country)
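If a match comes back without a `country` component, the `["components"]["country"]` lookup above raises `KeyError`. Below is a sketch of a more forgiving parser, the hypothetical `parse_first_result`. It reads only the fields the function above already uses (`geometry.lat`, `geometry.lng`, `components.country`). The sample results are made up, so it runs offline without an API key.

```python
from typing import Optional, Tuple

def parse_first_result(results) -> Optional[Tuple[float, float, Optional[str]]]:
    """Hypothetical helper: (lat, long, country) from the first match, or None.

    A missing country component yields None for the country instead of a KeyError.
    """
    if not results:
        return None
    best = results[0]
    geometry = best["geometry"]
    country = best.get("components", {}).get("country")
    return geometry["lat"], geometry["lng"], country

# Made-up results in the same shape as the geocoder output used above
fake = [{"geometry": {"lat": 48.2, "lng": 16.37}, "components": {"country": "Austria"}}]
print(parse_first_result(fake))  # (48.2, 16.37, 'Austria')
```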

And to make sure I don’t spam this API over and over, we’ll run this once, save it out to JSON, and when I run this again in the future we can just read the file in.

import os
import json

filename = "cccc_enrolment/site_locations.json"

# Start from the cached locations if the file exists
coords = {}
if os.path.exists(filename):
    with open(filename) as f:
        coords = json.load(f)

# Add manual ones that I know won't be found
coords["Uniklinik (University Hospital Frankfurt)"] = 50.0936204, 8.6506709, "Germany"
coords["Prof Dr R. D. Kandou Central Hospital - Paediatric"] = 1.453734, 124.8056623, "Indonesia"
coords["Prof Dr R. D. Kandou Central Hospital - Adult"] = 1.45, 124.80, "Indonesia"
coords["Kyoto Prefectural University of Medicine"] = 35.0243414, 135.7682285, "Japan"
coords["ISMETT"] = 38.1084401, 13.3613329, "Italy"
coords["Kuwait ECLS program, Al-Amiri & Jaber Al-Ahmed Hospitals"] = 29.3876968, 47.9881274, "Kuwait"
coords["Dr Sardjito Government Hospital - Paediatric"] = -7.768611, 110.3712855, "Indonesia"
coords["Hospitaldel Torax"] = 41.594067, 2.007054, "Spain"

# Look up any sites we don't have yet
save = False
for s in sites_fixed:
    if s not in coords:
        coords[s] = get_lat_long_from_site(s)
        save = True

# If we've updated, save it out once at the end
if save:
    with open(filename, "w") as f:
        json.dump(coords, f)

print(f"We now have {len(coords.keys())} sites ready to go!")
We now have 148 sites ready to go!
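The caching step boils down to a small, reusable pattern: load what’s on disk, look up only what’s missing, and write the file once at the end. Below is a sketch with a hypothetical `load_or_build_cache` helper. A fake lookup stands in for OpenCage, so it runs without a key or network access. JSON has no tuple type, so cached coordinates come back as lists; pandas’ `.str[0]` indexing works on either.

```python
import json
import os
import tempfile

def load_or_build_cache(path, keys, lookup):
    """Hypothetical helper: read cached results, look up only missing keys, save once."""
    cache = {}
    if os.path.exists(path):
        with open(path) as f:
            cache = json.load(f)
    missing = [k for k in keys if k not in cache]
    for k in missing:
        cache[k] = lookup(k)  # a failed lookup (None) is cached too, so it isn't retried
    if missing:
        with open(path, "w") as f:
            json.dump(cache, f)
    return cache

# Usage with a stand-in lookup (no network), writing to a temporary directory
calls = []
def fake_lookup(site):
    calls.append(site)
    return [0.0, 0.0, "Nowhere"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "site_locations.json")
    first = load_or_build_cache(path, ["Site A", "Site B"], fake_lookup)
    second = load_or_build_cache(path, ["Site A", "Site B", "Site C"], fake_lookup)

print(calls)  # ['Site A', 'Site B', 'Site C']: each site is looked up only once
```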

Great! Onto the next part…

Plotting a specific datetime

Our dataframe is broken up into a lot of rows, where each row now represents a frame in the animation. Let’s write a function to extract a row and reshape it into something easier to work with when plotting.

def get_row(date):
    row = data_fixed.loc[date].to_frame().reset_index()
    change = data_fixed_change.loc[date].to_frame().reset_index()
    row.columns = ["site", "enrolment"]
    change.columns = ["site", "change"]
    row = row.merge(change, on="site")
    row["date"] = date
    row["coord"] = row["site"].map(coords)
    row["lat"] = row["coord"].str[0]
    row["long"] = row["coord"].str[1]
    row["country"] = row["coord"].str[2]
    row = row.drop(columns="coord")

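    # Manually separate Hong Kong from China: relabel sites near 22.3 N as Hong Kong,
    # and jitter them so the points don't sit on top of each other.
    # Re-seeding on every call keeps the jitter identical from frame to frame.
    hk = np.abs(row.lat - 22.3) < 0.2
    row.loc[hk, "country"] = "Hong Kong"
    np.random.seed(1)
    row.loc[hk, "lat"] += np.random.normal(scale=0.5, size=hk.sum())
    row.loc[hk, "long"] += np.random.normal(scale=0.5, size=hk.sum())

    return row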

test_row = get_row(data_fixed.index.max())
test_row
  | site | enrolment | change | date | lat | long | country
0 | Medical University of Vienna | 9.0 | 0.0 | 2020-09-28 | 48.208490 | 16.372080 | Austria
1 | Lancaster General Health | 39.0 | 0.0 | 2020-09-28 | 54.016293 | -2.793612 | United Kingdom
2 | Penn Medicine | 42.0 | 0.0 | 2020-09-28 | 39.957043 | -75.197520 | United States of America
3 | Oklahoma Heart Institute | 2.0 | 0.0 | 2020-09-28 | 36.029075 | -95.869532 | United States of America
4 | UH Cleveland Hospital | 26.0 | 0.0 | 2020-09-28 | 41.504861 | -81.605748 | United States of America
5 | Ohio State University | 121.0 | 0.0 | 2020-09-28 | 40.005709 | -83.028663 | United States of America
6 | North Estonia Medical Centre, Tallin | 20.0 | 0.0 | 2020-09-28 | 59.396168 | 24.698524 | Estonia
7 | Tartu University Hospital, Tartu | 14.0 | 0.0 | 2020-09-28 | 58.369456 | 26.700090 | Estonia
8 | National Taiwan University Hospital | 1.0 | 0.0 | 2020-09-28 | 25.016828 | 121.538469 | Taiwan
9 | Hospital for Tropical Diseases, Vietnam | 1.0 | 0.0 | 2020-09-28 | 10.753047 | 106.678478 | Vietnam
10 | Keimyung University Dong San Hospital | 2.0 | 0.0 | 2020-09-28 | 52.473060 | -8.430560 | Ireland
11 | Groote Schuur Hospital | 207.0 | 0.0 | 2020-09-28 | -33.941170 | 18.462639 | South Africa
12 | The Heart Hospital Baylor Plano | 1.0 | 0.0 | 2020-09-28 | 33.014781 | -96.789958 | United States of America
13 | Baylor University Medical Centre | 8.0 | 0.0 | 2020-09-28 | 32.786683 | -96.781894 | United States of America
14 | Baylor Scott | 3.0 | 0.0 | 2020-09-28 | 31.077858 | -97.362730 | United States of America
15 | Hospital Nuestra Señora de Gracia Zaragoza | 20.0 | 0.0 | 2020-09-28 | 32.502700 | -117.003710 | Mexico
16 | São João Hospital Centre, Portugal | 1.0 | 0.0 | 2020-09-28 | 39.980168 | -8.318848 | Portugal
17 | Piedmont Atlanta Hospital, USA | 40.0 | 0.0 | 2020-09-28 | 33.789048 | -84.371886 | United States of America
18 | Washington University in St. Louis, USA | 20.0 | 0.0 | 2020-09-28 | 38.647240 | -90.308402 | United States of America
19 | Medical College of Wisconsin, USA | 31.0 | 0.0 | 2020-09-28 | 43.043867 | -88.022453 | United States of America
20 | Policlinico di S. Orsola, Università di Bologn... | 5.0 | 0.0 | 2020-09-28 | 44.496530 | 11.353080 | Italy
21 | Fundación Cardiovascular de Colombia, Colombia | 2.0 | 0.0 | 2020-09-28 | -20.175830 | -48.688890 | Brazil
22 | INOVA Fairfax Medical Center, USA | 3.0 | 0.0 | 2020-09-28 | 41.620447 | -86.228800 | United States of America
23 | Hospital de Clínicas, Argentina | 7.0 | 0.0 | 2020-09-28 | -34.599502 | -58.400386 | Argentina
24 | Allegheny General Hospital | 4.0 | 0.0 | 2020-09-28 | 40.456969 | -80.003311 | United States of America
25 | Clinica Alemana De Santiago | 1.0 | 0.0 | 2020-09-28 | -33.391999 | -70.572619 | Chile
26 | Kyoto Medical Centre | 1.0 | 0.0 | 2020-09-28 | -1.246694 | 116.862195 | Indonesia
27 | Hiroshima University | 4.0 | 0.0 | 2020-09-28 | 34.401977 | 132.712320 | Japan
28 | Stanford University | 1.0 | 0.0 | 2020-09-28 | 37.431314 | -122.169365 | United States of America
29 | Tufts Medical Centre | 1.0 | 0.0 | 2020-09-28 | 42.349559 | -71.063411 | United States of America
30 | Carilion Clinic | 13.0 | 0.0 | 2020-09-28 | 37.088584 | -80.505764 | United States of America
31 | Beth Israel Deaconess Medical Center | 11.0 | 0.0 | 2020-09-28 | 36.858403 | -76.305431 | United States of America
32 | Clinica Las Condez, Chile | 52.0 | 0.0 | 2020-09-28 | -30.000000 | -71.000000 | Chile
33 | Hyogo Prefectural Kakogawa Medical Center | 16.0 | 0.0 | 2020-09-28 | 36.858403 | -76.305431 | United States of America
34 | University of California San Francisco - Fresno | 30.0 | 0.0 | 2020-09-28 | 36.747730 | -119.772370 | United States of America
35 | Uniklinik (University Hospital Frankfurt) | 1.0 | 0.0 | 2020-09-28 | 50.093620 | 8.650671 | Germany
36 | Seoul National University Bundang Hospital | 12.0 | 0.0 | 2020-09-28 | 37.349249 | 127.123941 | South Korea
37 | University of Iowa | 12.0 | 0.0 | 2020-09-28 | 41.665850 | -91.573107 | United States of America
38 | University of Cincinnati | 9.0 | 0.0 | 2020-09-28 | 39.131853 | -84.515762 | United States of America
39 | Rio Hortega University Hospital | 12.0 | 0.0 | 2020-09-28 | 33.470630 | -81.984130 | United States of America
40 | Hamad General Hospital | 52.0 | 0.0 | 2020-09-28 | 25.293518 | 51.502231 | Qatar
41 | Presbyterian Hospital Services | 51.0 | 0.0 | 2020-09-28 | 35.635979 | -105.962692 | United States of America
42 | Clinica Valle de Lilli | 46.0 | 0.0 | 2020-09-28 | 59.369440 | 25.359170 | Estonia
43 | University Hospital in Krakow | 15.0 | 0.0 | 2020-09-28 | 50.061430 | 19.936580 | Poland
44 | The University of Utah | 53.0 | 0.0 | 2020-09-28 | 40.762814 | -111.836872 | United States of America
45 | Ospedale di Arco | 1.0 | 0.0 | 2020-09-28 | 45.915643 | 10.879765 | Italy
46 | Ospedale San Paolo | 58.0 | 0.0 | 2020-09-28 | 41.117631 | 16.779988 | Italy
47 | Hospital Universitario Sant Joan d'Alacant | 3.0 | 0.0 | 2020-09-28 | 38.401480 | -0.436230 | Spain
48 | Kimitsu Chuo Hospital | 2.0 | 0.0 | 2020-09-28 | 35.327470 | 139.907261 | Japan
49 | Fatmawati Hospital | 43.0 | 0.0 | 2020-09-28 | -6.292454 | 106.792423 | Indonesia
50 | Rinku General Medical Center | 1.0 | 0.0 | 2020-09-28 | 34.411921 | 135.302686 | Japan
51 | Hospital Universitari Sagrat Cor | 2.0 | 0.0 | 2020-09-28 | -31.413500 | -64.181050 | Argentina
52 | Cleveland Clinic - Florida | 3.0 | 0.0 | 2020-09-28 | 26.080382 | -80.364079 | United States of America
53 | San Martino Hospital | 4.0 | 0.0 | 2020-09-28 | 43.713059 | 10.404481 | Italy
54 | Hospital Alemán | 6.0 | 0.0 | 2020-09-28 | -34.591840 | -58.401984 | Argentina
55 | San Pedro de Alcantara Hospital | 2.0 | 0.0 | 2020-09-28 | 36.486635 | -4.990532 | Spain
56 | Legacy Emanuel Medical Center | 11.0 | 0.0 | 2020-09-28 | 45.543893 | -122.670033 | United States of America
57 | Kyoto Prefectural University of Medicine | 20.0 | 0.0 | 2020-09-28 | 35.024341 | 135.768228 | Japan
58 | Lankenau Institute of Medical Research | 36.0 | 0.0 | 2020-09-28 | -37.700000 | 145.183330 | Australia
59 | Providence Saint John's Health Centre | 8.0 | 0.0 | 2020-09-28 | 34.030577 | -118.479544 | United States of America
60 | Fukuoka University | 1.0 | 0.0 | 2020-09-28 | 33.548443 | 130.364514 | Japan
61 | Mater Dei Hospital | 1.0 | 0.0 | 2020-09-28 | 35.901807 | 14.476600 | Malta
62 | Yokohama City University Medical Center | 3.0 | 0.0 | 2020-09-28 | 32.240500 | -110.945940 | United States of America
63 | Nagoya University Hospital | 27.0 | 0.0 | 2020-09-28 | 35.153309 | 136.967781 | Japan
64 | PICU Saiful Anwar Hospital | 21.0 | 0.0 | 2020-09-28 | 52.473060 | -8.430560 | Ireland
65 | Adult ICU Saiful Anwar Hospital | 13.0 | 0.0 | 2020-09-28 | 52.473060 | -8.430560 | Ireland
66 | Kouritu Tousei Hospital | 2.0 | 0.0 | 2020-09-28 | 52.473060 | -8.430560 | Ireland
67 | Hokkaido University Hospital | 2.0 | 0.0 | 2020-09-28 | 43.079008 | 141.337729 | Japan
68 | Chiba University Hospital | 2.0 | 0.0 | 2020-09-28 | 35.627869 | 140.103466 | Japan
69 | Universityof Alabamaat Birmingham Hospital | 11.0 | 0.0 | 2020-09-28 | 52.473060 | -8.430560 | Ireland
70 | Universityof Florida | 2.0 | 0.0 | 2020-09-28 | 27.945565 | -82.463843 | United States of America
71 | Saiseikai Senri Hospital | 5.0 | 0.0 | 2020-09-28 | 52.473060 | -8.430560 | Ireland
72 | Rush University | 26.0 | 0.0 | 2020-09-28 | 41.873644 | -87.669498 | United States of America
73 | University of Chicago | 14.0 | 0.0 | 2020-09-28 | 41.784977 | -87.590524 | United States of America
74 | Johns Hopkins University | 1.0 | 0.0 | 2020-09-28 | 44.494770 | 11.355897 | Italy
75 | Hospitaldel Torax | 5.0 | 0.0 | 2020-09-28 | 41.594067 | 2.007054 | Spain
76 | Persahabatan Hospital | 92.0 | 0.0 | 2020-09-28 | 55.650621 | 37.501444 | Russia
77 | Universityof Oklahoma Health Sciences Center | 2.0 | 0.0 | 2020-09-28 | 7.250000 | 2.166670 | Benin
78 | The Christ Hospital | 1.0 | 0.0 | 2020-09-28 | 39.120839 | -84.511113 | United States of America
79 | Hasan Sadikin Hospital ( Adult ) | 19.0 | 0.0 | 2020-09-28 | 52.473060 | -8.430560 | Ireland
80 | Kyung Pook National University Chilgok Hospital | 8.0 | 0.0 | 2020-09-28 | 52.473060 | -8.430560 | Ireland
81 | Hospital Mount Sinai Medical Center | 7.0 | 0.0 | 2020-09-28 | 25.813381 | -80.140873 | United States of America
82 | Hospital Vergedela Cintade Tortosa | 28.0 | 0.0 | 2020-09-28 | 40.812490 | 0.521600 | Spain
83 | Prof Dr R. D. Kandou Central Hospital - Paedia... | 4.0 | 0.0 | 2020-09-28 | 1.453734 | 124.805662 | Indonesia
84 | Prof Dr R. D. Kandou Central Hospital - Adult | 13.0 | 0.0 | 2020-09-28 | 1.450000 | 124.800000 | Indonesia
85 | Tokyo Metropolitan Tama Medical Center | 3.0 | 0.0 | 2020-09-28 | 36.858403 | -76.305431 | United States of America
86 | Universityof Maryland | 1.0 | 0.0 | 2020-09-28 | 38.992516 | -76.991021 | United States of America
87 | Mar del Plata Medical Foundation Private Commu... | 25.0 | 0.5 | 2020-09-28 | 52.473060 | -8.430560 | Ireland
88 | Dr Sardjito Government Hospital - Paediatric | 3.0 | 0.0 | 2020-09-28 | -7.768611 | 110.371285 | Indonesia
89 | London Health Sciences Centre | 2.0 | 0.0 | 2020-09-28 | 4.750000 | 11.833330 | Cameroon
90 | Hospital du Sacre Coeur | 1.0 | 0.0 | 2020-09-28 | 48.886806 | 2.343015 | France
91 | Mayo Clinic College of Medicine - Arizona | 8.0 | 0.0 | 2020-09-28 | 32.973525 | -111.515363 | United States of America
92 | Shizuoka Children's Hospital | 1.0 | 0.0 | 2020-09-28 | 52.473060 | -8.430560 | Ireland
93 | Rochester General Hospital | 2.0 | 0.0 | 2020-09-28 | 43.192043 | -77.588289 | United States of America
94 | Siriraj Hospital | 2.0 | 0.0 | 2020-09-28 | 13.757829 | 100.485379 | Thailand
95 | Obihiro-Kosei General Hospital | 2.0 | 0.0 | 2020-09-28 | 52.473060 | -8.430560 | Ireland
96 | King Faisal Specialist Hospital and Research C... | 22.0 | 0.0 | 2020-09-28 | 21.560019 | 39.148056 | Saudi Arabia
97 | University of Nebraska Medical Center | 36.0 | 0.0 | 2020-09-28 | 41.256303 | -95.977842 | United States of America
98 | Foothills Hospital | 71.0 | 0.0 | 2020-09-28 | 40.016004 | -105.236631 | United States of America
99 | Queen Mary Hospital the University of Hong Kong | 14.0 | 0.0 | 2020-09-28 | 23.093693 | 114.571774 | Hong Kong
100 | Galway University Hospital | 67.0 | 0.0 | 2020-09-28 | 53.277087 | -9.066559 | Ireland
101 | Teine Keijinkai Hospital | 16.0 | 0.0 | 2020-09-28 | 52.473060 | -8.430560 | Ireland
102 | Fondazione IRCCS Ca | 75.0 | 0.0 | 2020-09-28 | 37.250220 | -119.751260 | United States of America
103 | Fondazione Policlinico Universitario Agostino ... | 12.0 | 0.0 | 2020-09-28 | 41.891930 | 12.511330 | Italy
104 | Ospedale Molinette Torino | 20.0 | 0.0 | 2020-09-28 | 45.039464 | 7.674405 | Italy
105 | Oregon Health and Science University Hospital | 2.0 | 0.0 | 2020-09-28 | 45.499038 | -122.685695 | United States of America
106 | Hospital Clinic, Barcelona | 58.0 | 0.0 | 2020-09-28 | 41.388062 | 2.150639 | Spain
107 | Columbia University | 1.0 | 0.0 | 2020-09-28 | 40.807949 | -73.961797 | United States of America
108 | Klinik für Innere Medizin II | 55.0 | 0.0 | 2020-09-28 | 54.089022 | 12.109247 | Germany
109 | Hospital Vall D Hebron | 22.0 | 0.0 | 2020-09-28 | 48.600046 | 1.675945 | France
110 | Pamela Youde Nethersole Eastern Hospital | 5.0 | 0.0 | 2020-09-28 | 21.964346 | 113.086001 | Hong Kong
111 | Maastricht University Medical Centre | 1.0 | 0.0 | 2020-09-28 | 50.857985 | 5.696988 | The Netherlands
112 | Sozialmedizinisches Zentrum S | 7.0 | 0.0 | 2020-09-28 | 48.173496 | 16.350574 | Austria
113 | Saiseikai Utsunomiya Hospital | 2.0 | 0.0 | 2020-09-28 | 52.473060 | -8.430560 | Ireland
114 | Chonnam National University Hospital | 1.0 | 0.0 | 2020-09-28 | 35.176906 | 126.906909 | South Korea
115 | Ospedale San Gerardo | 55.0 | 0.0 | 2020-09-28 | 45.602154 | 9.260360 | Italy
116 | Policlinico of Padova, Padova | 13.0 | 0.0 | 2020-09-28 | 45.349274 | 11.786716 | Italy
117 | Barmherzige Brüder Regensburg | 1.0 | 0.0 | 2020-09-28 | 49.052150 | 12.079680 | Germany
118 | Civil Hospital Marie Curie, Brussels | 44.0 | 0.0 | 2020-09-28 | 50.850450 | 4.348780 | Belgium
119 | St. Marianna University School of Medicine | 1.0 | 0.0 | 2020-09-28 | 35.600212 | 139.548866 | Japan
120 | ISMETT | 1.0 | 0.0 | 2020-09-28 | 38.108440 | 13.361333 | Italy
121 | Mater Misericordiae University Hospital, Ireland | 31.0 | 0.0 | 2020-09-28 | 53.359704 | -6.267077 | Ireland
122 | Harapan Kita National Heart Centre Hospital (P... | 10.0 | 0.0 | 2020-09-28 | 52.473060 | -8.430560 | Ireland
123 | Princess Margaret Hospital, Hong Kong | 6.0 | 0.0 | 2020-09-28 | 22.077152 | 115.006972 | Hong Kong
124 | Queen Elizabeth Hospital, Hong Kong | 7.0 | 0.0 | 2020-09-28 | 21.772608 | 113.794131 | Hong Kong
125 | Klinikum Passau | 2.0 | 0.0 | 2020-09-28 | 48.565036 | 13.445336 | Germany
126 | Hartford HealthCare | 31.0 | 0.0 | 2020-09-28 | 41.681286 | -71.912487 | United States of America
127 | Hospitales Puerta de Hierro, Mexico | 4.0 | 0.0 | 2020-09-28 | 23.000000 | -102.000000 | Mexico
128 | Kuwait ECLS program, Al-Amiri & Jaber Al-Ahmed... | 17.0 | 0.0 | 2020-09-28 | 29.387697 | 47.988127 | Kuwait
129 | Hospital Universitario Virgen de Valme | 1.0 | 0.0 | 2020-09-28 | 37.318825 | -5.971218 | Spain
130 | Severance Hospital, Seoul | 1.0 | 0.0 | 2020-09-28 | 37.562258 | 126.940570 | South Korea
131 | Al-Adan Hospital | 8.0 | 0.0 | 2020-09-28 | 5.258065 | 96.007263 | Indonesia
132 | Medizinische Klinik und Poliklinik II, Munich | 21.0 | 0.0 | 2020-09-28 | 48.137430 | 11.575490 | Germany
133 | Barwon Health, VIC | 0.0 | 0.0 | 2020-09-28 | -38.152080 | 144.365610 | Australia
134 | Box Hill Hospital, VIC | 0.0 | 0.0 | 2020-09-28 | -37.813614 | 145.118405 | Australia
135 | Gold Coast Hospital, QLD | 0.0 | 0.0 | 2020-09-28 | -28.002373 | 153.414599 | Australia
136 | Launceston Hospital, TAS | 0.0 | 0.0 | 2020-09-28 | -41.434081 | 147.137350 | Australia
137 | Royal Adelaide Hospital, SA | 0.0 | 0.0 | 2020-09-28 | -34.920724 | 138.586599 | Australia
138 | Royal Children's Hospital, VIC | 0.0 | 0.0 | 2020-09-28 | -37.793427 | 144.949575 | Australia
139 | Royal North Shore Hospital, NSW | 0.0 | 0.0 | 2020-09-28 | -33.821411 | 151.191138 | Australia
140 | Royal Prince Alfred Hospital, NSW | 0.0 | 0.0 | 2020-09-28 | -33.889744 | 151.181500 | Australia
141 | St George Hospital, NSW | 0.0 | 0.0 | 2020-09-28 | -33.967165 | 151.134025 | Australia
142 | St Vincent's Hospital Sydney, NSW | 0.0 | 0.0 | 2020-09-28 | -33.880568 | 151.220564 | Australia
143 | The Alfred Hospital, VIC | 0.0 | 0.0 | 2020-09-28 | -37.846075 | 144.982554 | Australia
144 | Westmead Hospital, NSW | 0.0 | 0.0 | 2020-09-28 | -33.802939 | 150.987761 | Australia
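The Hong Kong jitter in `get_row` only works as intended because the random seed is reset on every call: each frame then gets the same offsets, and the points don’t wobble across the animation. Below is a minimal sketch of the same wide-to-long reshaping on made-up data, using a hypothetical `frame_rows` helper. It swaps the global `np.random.seed` for a local `np.random.default_rng(seed)` generator. The idea is the same, but the random numbers differ, so it would not reproduce the exact jittered coordinates in the table above.

```python
import numpy as np
import pandas as pd

def frame_rows(wide, changes, coords, date, seed=1):
    """Hypothetical helper: one date of a wide (date x site) table -> one row per site.

    coords maps site name -> (lat, long, country); unknown sites get NaN coordinates.
    """
    row = pd.DataFrame({
        "site": wide.columns,
        "enrolment": wide.loc[date].to_numpy(),
        "change": changes.loc[date].to_numpy(),
    })
    coord = row["site"].map(coords)  # Series of tuples (NaN where missing)
    row["lat"] = coord.str[0]
    row["long"] = coord.str[1]
    row["country"] = coord.str[2]

    # Relabel and jitter points near 22.3 N; a fresh generator with a fixed seed
    # produces the same offsets on every call, so points stay put between frames
    hk = (row["lat"] - 22.3).abs() < 0.2
    rng = np.random.default_rng(seed)
    row.loc[hk, "country"] = "Hong Kong"
    row.loc[hk, "lat"] += rng.normal(scale=0.5, size=int(hk.sum()))
    row.loc[hk, "long"] += rng.normal(scale=0.5, size=int(hk.sum()))
    return row

# Usage on a made-up two-site table
dates = pd.to_datetime(["2020-03-01", "2020-03-02"])
wide = pd.DataFrame({"Site A": [1, 3], "Site B": [0, 2]}, index=dates)
changes = wide.diff().fillna(0)
coords = {"Site A": (22.3, 114.2, "China"), "Site B": (48.2, 16.4, "Austria")}

first = frame_rows(wide, changes, coords, dates[1])
again = frame_rows(wide, changes, coords, dates[1])
print(first)
```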

Basemap time

This is the hard part now. Not coding-wise, but installing Basemap can be a real pain, so I’m going to skip over the installation. We start by defining a nice dark base figure:

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

def get_base_fig():
    # Let's define some colours
    bg_color = "#000000"
    coast_color = "#333333"
    country_color = "#222222"
    fig = plt.figure(figsize=(12, 6))
    m = Basemap(projection='cyl', llcrnrlat=-70,urcrnrlat=90, 
                llcrnrlon=-170, urcrnrlon=190, area_thresh=10000.)
    m.fillcontinents(color=bg_color, lake_color=bg_color, zorder=-2)
    m.drawcoastlines(color=coast_color, linewidth=0.7, zorder=-1)
    m.drawcountries(color=country_color, linewidth=0.7, zorder=-1)
    m.drawmapboundary(fill_color=bg_color, zorder=-2)
    return fig, m

get_base_fig();


Add each site

Instead of randomly assigning colours, I’ve tried to make the colour for each country somewhat related to the country itself, usually via the primary colour of its flag. That’s harder than it sounds, because most flags draw on a very small set of colours.

Let’s just get a snapshot for the final day in our dataset.

import numpy as np

# Colours based roughly on the primary colour in each country's flag
colors = {
    "United States of America": "#1e88e5",
    "United Kingdom": "#4FC3F7",
    "Estonia": "#1E88E5",
    "Taiwan": "#E53935",
    "Vietnam": "#C62828",
    "Ireland": "#FFA726",
    "Brazil": "#4CAF50",
    "Argentina": "#4FC3F7",
    "Chile": "#F44336",
    "Indonesia": "#FF8A80",
    "Japan": "#C62828",
    "Germany": "#E040FB",
    "South Korea": "#BBDEFB",
    "Qatar": "#AD1457",
    "Poland": "#E53935",
    "Spain": "#FFB300",
    "Australia": "#FFCA28",
    "Russia": "#3F51B5",
    "Benin": "#558B2F",
    "Saudi Arabia": "#1B5E20",
    "Hong Kong": "#D84315",
    "France": "#01579B",
    "The Netherlands": "#B71C1C",
    "Belgium": "#FDD835",
    "Kuwait": "#4CAF50",
    "Yemen": "#D81B60",
    "Italy": "#8BC34A",
    "Austria": "#C62828",
    "Mexico": "#4CAF50",
    "Portugal": "#F44336",
    "South Africa": "#8BC34A",
    "Cameroon": "#2bab7a",
    "Malta": "#ed2f3c",
    "Thailand": "#458eed",
}

def get_scatter(data):
    fig, m = get_base_fig()
    # Loop over each country and its institutions
    
    for country in np.unique(data.country.dropna()):  # skip sites that couldn't be geocoded
        c = colors.get(country, "#FF99FF")
        subset = data.loc[(data.country == country) & (data.enrolment > 0), :]
        s = 10 + subset.enrolment
        m.scatter(subset.long, subset.lat, latlon=True, c=c, s=s, zorder=1)
    return m

get_scatter(test_row);


I mean… it’s nice. But cool graphics glow and have changing colours. So I’ll define a colormap that allows me to brighten the colour of each site when new patients come in, so that they flicker, glow and grow as they add patients into the system.

from matplotlib.colors import LinearSegmentedColormap as LSC

def get_shaded(data, date, frame=0, show=False):
    fig, m = get_base_fig()
    # Loop over each country and its institutions
    
    max_v = data.change.max() + 1
    
    for country in np.unique(data.country.dropna()):  # skip sites that couldn't be geocoded
        c = colors.get(country)
        if c is None:
            c = "#FF99FF"
            print(f"Cannot find colour for country {country}")
        # Blend from the country colour towards white: more patients added this frame means a brighter point
        cmap = LSC.from_list("fade", [c, "#FFFFFF"], N=100)
        subset = data.loc[(data.country == country) & (data.enrolment > 0), :]
        s = 10 + subset.enrolment
        cs = cmap(2 * subset.change / max_v)
        m.scatter(subset.long, subset.lat, latlon=True, c=cs, s=s, zorder=1)
        
    # Set the title, and make the background black
    plt.title("CCCC Patient Contributions", fontsize=16, 
              color="#EEEEEE", fontname="Open Sans", y=1.03)
    d = pd.to_datetime(date).strftime("%d - %B")
    ax = fig.get_axes()[0]
    plt.text(0.5, 1.02, d, c="#AAAAAA", fontsize=14, 
             verticalalignment="top", horizontalalignment="center",
             transform=ax.transAxes)

    fig.patch.set_facecolor("#000000")
    if show:
        return fig
    else:
        name = f"cccc_enrolment/output/{frame:04d}.png"
        fig.savefig(name, bbox_inches="tight", pad_inches=0, facecolor=fig.get_facecolor(), transparent=False, dpi=300)
        plt.close(fig)

get_shaded(test_row, "2020-03-23", show=True);
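Here is a small standalone sketch of what `cmap(2 * subset.change / max_v)` does, with made-up change values. The base colour is Japan’s entry from the `colors` dict. Because the input is doubled, a site that adds at least half of `max_v` in a frame is already drawn fully white. Anything above 1.0 falls outside the colormap’s range, and matplotlib by default gives such values the top (white) colour.

```python
from matplotlib.colors import LinearSegmentedColormap, to_rgba

base = "#C62828"  # Japan's colour from the dict above
fade = LinearSegmentedColormap.from_list("fade", [base, "#FFFFFF"], N=100)

change = [0.0, 1.0, 4.0]   # made-up patients added this frame, one value per site
max_v = max(change) + 1    # same normalisation as get_shaded: 5.0
shades = [fade(2 * c / max_v) for c in change]  # inputs 0.0, 0.4, 1.6

print(shades[0])  # the base colour as an RGBA tuple
print(shades[2])  # white: 1.6 is out of range, so it gets the top colour
```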


Now let’s test the whole thing animated: we’ll loop over every row and output a frame, and I’ll stitch the frames together using ffmpeg. Normally I’d use joblib for this… but Basemap really doesn’t like that.

import os
os.makedirs("cccc_enrolment/output", exist_ok=True)  # get_shaded saves frames here

def plot_date(i, date):
    data = get_row(date)
    get_shaded(data, date, i)

for i, date in enumerate(data_fixed.index):
    plot_date(i, date)

Now that we have a bunch of frames, let’s turn them into a nice MP4 video. But let’s be fancy and make this bad boy glow. To do this, I’m going to load in a mask (to make sure the title doesn’t glow) and run everything through a filter complex that took me 4 hours of debugging to get working. It also adds a few seconds of pause at the end, so people watching on looping players can still see the final result.

ffmpeg -r 30 -i cccc_enrolment/output/%04d.png -i cccc_enrolment/mask.png -filter_complex "      [1]setsar=sar=0[p],
[0]split[a][b],
[a][p]overlay,lumakey=0:tolerance=0.3:softness=0.3[x];
color=black,format=rgb24[c];
[c][x]scale2ref[c][i];
[c][i]overlay=format=auto:shortest=1,
setsar=sar=1,
gblur=30:3,
curves=all='0/0 0.5/0.9 1/0.9'[d],
[b]setsar=sar=1[e],
[d][e]blend=all_mode=addition,
scale=1920:-2,
tpad=stop_mode=clone:stop_duration=4
" -vcodec libx264 -crf 23 -movflags faststart -pix_fmt yuv420p cccc_enrolment/contributions.mp4

And there it is! Perhaps soon I’ll go through and manually add all the site names in, but for now, I feel this does a pretty good job of showing just how international our collaboration is.
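If the glow filter graph misbehaves, it can help to check the frames with a plain stitch first. Below is a sketch that builds such a command from Python. It reuses only options from the command above (`-r`, `-vcodec libx264`, `-crf`, `-movflags faststart`, `-pix_fmt yuv420p`, and the `scale=1920:-2` step, which keeps the height even as yuv420p output requires), plus ffmpeg’s `-y` to overwrite an existing file. The `stitch_cmd` helper and the `preview.mp4` output name are hypothetical.

```python
import shlex

def stitch_cmd(pattern, output, fps=30):
    """Hypothetical helper: a plain, glow-free ffmpeg command as an argument list."""
    return [
        "ffmpeg", "-y",                   # overwrite the output without asking
        "-r", str(fps), "-i", pattern,    # numbered PNG frames at the animation frame rate
        "-vf", "scale=1920:-2",           # 1920 wide, height rounded to an even number
        "-vcodec", "libx264", "-crf", "23",
        "-movflags", "faststart", "-pix_fmt", "yuv420p",
        output,
    ]

cmd = stitch_cmd("cccc_enrolment/output/%04d.png", "cccc_enrolment/preview.mp4")
print(shlex.join(cmd))
# To run it (requires ffmpeg on PATH and the frames on disk):
# import subprocess; subprocess.run(cmd, check=True)
```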


For your convenience, here’s the code in one block:

import pandas as pd
import numpy as np
data = pd.read_csv("cccc_enrolment/enrolment_site.csv", parse_dates=[0], index_col=0)
data = data.fillna(0).astype(int)
data.iloc[-5:, :5]
sites = [c.split("-", 1)[1] if "-" in c else c for c in data.columns]
print(sites[60:70])
# Some of the sites are missing spaces for some reason
def fix_site(site):
    if " " not in site:
        # Don't "fix" a site name that is an acronym (all capitals)
        if site != site.upper():
            site = ''.join(map(lambda x: x if x.islower() else " " + x, site))
    return site.strip()

sites_fixed = [fix_site(s) for s in sites]
print(sites_fixed[60:70])
data_fixed = data.copy()
data_fixed.columns = sites_fixed
data_fixed.iloc[-5:, :5]
# Interpolation
fr = 30  # frame rate
t = 12  # seconds
new_index = pd.date_range("2020-02-01", data_fixed.index.max(), periods=fr * t)

# Combine indexes (set union), interpolate by elapsed time, keep only the frame timestamps
data_fixed = (
    data_fixed.reindex(new_index.union(data_fixed.index))
    .interpolate(method="time")
    .loc[new_index]
)
data_fixed_change = data_fixed.diff().fillna(0)
from opencage.geocoder import OpenCageGeocode
key = ""  # Your OpenCage API key; the trial version lets you do all of this for free
geocoder = OpenCageGeocode(key)
def get_lat_long_from_site(query):
    results = geocoder.geocode(query)
    if not len(results):
        print(f"{query} unable to be located")
        return None
    lat = results[0]['geometry']['lat']
    long = results[0]['geometry']['lng']
    country = results[0]["components"]["country"]
    return (lat, long, country)
import os
import json

filename = "cccc_enrolment/site_locations.json"

# Start from the cached locations if the file exists
coords = {}
if os.path.exists(filename):
    with open(filename) as f:
        coords = json.load(f)

# Add manual ones that I know won't be found
coords["Uniklinik (University Hospital Frankfurt)"] = 50.0936204, 8.6506709, "Germany"
coords["Prof Dr R. D. Kandou Central Hospital - Paediatric"] = 1.453734, 124.8056623, "Indonesia"
coords["Prof Dr R. D. Kandou Central Hospital - Adult"] = 1.45, 124.80, "Indonesia"
coords["Kyoto Prefectural University of Medicine"] = 35.0243414, 135.7682285, "Japan"
coords["ISMETT"] = 38.1084401, 13.3613329, "Italy"
coords["Kuwait ECLS program, Al-Amiri & Jaber Al-Ahmed Hospitals"] = 29.3876968, 47.9881274, "Kuwait"
coords["Dr Sardjito Government Hospital - Paediatric"] = -7.768611, 110.3712855, "Indonesia"
coords["Hospitaldel Torax"] = 41.594067, 2.007054, "Spain"

# Look up any sites we don't have yet
save = False
for s in sites_fixed:
    if s not in coords:
        coords[s] = get_lat_long_from_site(s)
        save = True

# If we've updated, save it out once at the end
if save:
    with open(filename, "w") as f:
        json.dump(coords, f)

print(f"We now have {len(coords.keys())} sites ready to go!")
def get_row(date):
    row = data_fixed.loc[date].to_frame().reset_index()
    change = data_fixed_change.loc[date].to_frame().reset_index()
    row.columns = ["site", "enrolment"]
    change.columns = ["site", "change"]
    row = row.merge(change, on="site")
    row["date"] = date
    row["coord"] = row["site"].map(coords)
    row["lat"] = row["coord"].str[0]
    row["long"] = row["coord"].str[1]
    row["country"] = row["coord"].str[2]
    row = row.drop(columns="coord")

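    # Manually separate Hong Kong from China: relabel sites near 22.3 N as Hong Kong,
    # and jitter them so the points don't sit on top of each other.
    # Re-seeding on every call keeps the jitter identical from frame to frame.
    hk = np.abs(row.lat - 22.3) < 0.2
    row.loc[hk, "country"] = "Hong Kong"
    np.random.seed(1)
    row.loc[hk, "lat"] += np.random.normal(scale=0.5, size=hk.sum())
    row.loc[hk, "long"] += np.random.normal(scale=0.5, size=hk.sum())

    return row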

test_row = get_row(data_fixed.index.max())
test_row
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

def get_base_fig():
    # Lets define some colors
    bg_color = "#000000"
    coast_color = "#333333"
    country_color = "#222222"
    fig = plt.figure(figsize=(12, 6))
    m = Basemap(projection='cyl', llcrnrlat=-70,urcrnrlat=90, 
                llcrnrlon=-170, urcrnrlon=190, area_thresh=10000.)
    m.fillcontinents(color=bg_color, lake_color=bg_color, zorder=-2)
    m.drawcoastlines(color=coast_color, linewidth=0.7, zorder=-1)
    m.drawcountries(color=country_color, linewidth=0.7, zorder=-1)
    m.drawmapboundary(fill_color=bg_color, zorder=-2)
    return fig, m

get_base_fig();
import numpy as np

# Colours based roughly on the primary colour in each country's flag
colors = {
    "United States of America": "#1e88e5",
    "United Kingdom": "#4FC3F7",
    "Estonia": "#1E88E5",
    "Taiwan": "#E53935",
    "Vietnam": "#C62828",
    "Ireland": "#FFA726",
    "Brazil": "#4CAF50",
    "Argentina": "#4FC3F7",
    "Chile": "#F44336",
    "Indonesia": "#FF8A80",
    "Japan": "#C62828",
    "Germany": "#E040FB",
    "South Korea": "#BBDEFB",
    "Qatar": "#AD1457",
    "Poland": "#E53935",
    "Spain": "#FFB300",
    "Australia": "#FFCA28",
    "Russia": "#3F51B5",
    "Benin": "#558B2F",
    "Saudi Arabia": "#1B5E20",
    "Hong Kong": "#D84315",
    "France": "#01579B",
    "The Netherlands": "#B71C1C",
    "Belgium": "#FDD835",
    "Kuwait": "#4CAF50",
    "Yemen": "#D81B60",
    "Italy": "#8BC34A",
    "Austria": "#C62828",
    "Mexico": "#4CAF50",
    "Portugal": "#F44336",
    "South Africa": "#8BC34A",
    "Cameroon": "#2bab7a",
    "Malta": "#ed2f3c",
    "Thailand": "#458eed",
}

def get_scatter(data):
    fig, m = get_base_fig()
    # Loop over each country and its institutions
    
    for country in np.unique(data.country.dropna()):  # skip sites that couldn't be geocoded
        c = colors.get(country, "#FF99FF")
        subset = data.loc[(data.country == country) & (data.enrolment > 0), :]
        s = 10 + subset.enrolment
        m.scatter(subset.long, subset.lat, latlon=True, c=c, s=s, zorder=1)
    return m

get_scatter(test_row);
from matplotlib.colors import LinearSegmentedColormap as LSC

def get_shaded(data, date, frame=0, show=False):
    fig, m = get_base_fig()
    # Loop over each country and its institutions
    
    max_v = data.change.max() + 1
    
    for country in np.unique(data.country.dropna()):  # skip sites that couldn't be geocoded
        c = colors.get(country)
        if c is None:
            c = "#FF99FF"
            print(f"Cannot find colour for country {country}")
        # Blend from the country colour towards white: more patients added this frame means a brighter point
        cmap = LSC.from_list("fade", [c, "#FFFFFF"], N=100)
        subset = data.loc[(data.country == country) & (data.enrolment > 0), :]
        s = 10 + subset.enrolment
        cs = cmap(2 * subset.change / max_v)
        m.scatter(subset.long, subset.lat, latlon=True, c=cs, s=s, zorder=1)
        
    # Set the title, and make the background black
    plt.title("CCCC Patient Contributions", fontsize=16, 
              color="#EEEEEE", fontname="Open Sans", y=1.03)
    d = pd.to_datetime(date).strftime("%d - %B")
    ax = fig.get_axes()[0]
    plt.text(0.5, 1.02, d, c="#AAAAAA", fontsize=14, 
             verticalalignment="top", horizontalalignment="center",
             transform=ax.transAxes)

    fig.patch.set_facecolor("#000000")
    if show:
        return fig
    else:
        name = f"cccc_enrolment/output/{frame:04d}.png"
        fig.savefig(name, bbox_inches="tight", pad_inches=0, facecolor=fig.get_facecolor(), transparent=False, dpi=300)
        plt.close(fig)

get_shaded(test_row, "2020-03-23", show=True);
os.makedirs("cccc_enrolment/output", exist_ok=True)  # get_shaded saves frames here

def plot_date(i, date):
    data = get_row(date)
    get_shaded(data, date, i)

for i, date in enumerate(data_fixed.index):
    plot_date(i, date)