Here at DataHub and Datopian, we recently celebrated Open Data Day 2020. If you’re not familiar with Open Data Day, it’s an annual worldwide celebration of open data.

For part of our day, we decided to clean up and package some data on COVID-19 (coronavirus). The data includes province/state, country/region, latitude, longitude, date, confirmed, recovered, and deaths. Our source was the Data Repository by Johns Hopkins CSSE, which is updated daily by the Johns Hopkins Whiting School of Engineering.

To clean up the data, we used a Python library called dataflows, which is available on PyPI and on GitHub. We used this library to unpivot the data, accumulate the daily cases, and consolidate our three sources (Johns Hopkins publishes separate CSV files for confirmed cases, recoveries, and deaths).
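To give a feel for what "unpivot" and "consolidate" mean here, the following is a minimal sketch in plain Python. It is not our dataflows pipeline and does not use the dataflows API. It assumes each source file is wide, with one row per location and one column per date, and that the location columns are named `Province/State`, `Country/Region`, `Lat`, and `Long`. The helper names `unpivot` and `consolidate`, the date format, and the toy values are all hypothetical, and the accumulation step is left out.

```python
from collections import defaultdict

# Assumed location columns; every other column is treated as a date.
KEY_COLUMNS = ("Province/State", "Country/Region", "Lat", "Long")


def unpivot(rows, value_name):
    """Turn one wide row per location into one long row per (location, date)."""
    for row in rows:
        keys = {k: row[k] for k in KEY_COLUMNS}
        for column, value in row.items():
            if column not in KEY_COLUMNS:
                yield {**keys, "Date": column, value_name: int(value)}


def consolidate(**sources):
    """Merge several wide tables into one long table keyed by location and date."""
    merged = defaultdict(dict)
    for value_name, rows in sources.items():
        for record in unpivot(rows, value_name):
            key = tuple(record[k] for k in KEY_COLUMNS) + (record["Date"],)
            merged[key].update(record)
    return list(merged.values())


# Made-up toy values for illustration only (not real case counts).
location = {"Province/State": "A", "Country/Region": "X", "Lat": "0.0", "Long": "0.0"}
confirmed = [{**location, "1/22/20": "1", "1/23/20": "3"}]
recovered = [{**location, "1/22/20": "0", "1/23/20": "1"}]
deaths = [{**location, "1/22/20": "0", "1/23/20": "0"}]

table = consolidate(Confirmed=confirmed, Recovered=recovered, Deaths=deaths)
for record in table:
    print(record)
```

Each record in `table` carries the location fields, a `Date`, and one value per source, which matches the shape of the fields listed earlier in this post.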

The source code and results can be found on GitHub, and a published dataset can be found here on DataHub. Our next step is to release a visualization.

Whether or not you’ve participated in Open Data Day before, we hope to see you participate next year!

If you have questions, comments, or feedback, join our Discord chat channel or open an issue on our GitHub tracker.