Providing and plotting data with free services is easy

Collecting and extracting data from an open data provider, putting it together in a new dataset, then publishing and visualizing it using free services? It's easier than you think.

Carsten Sandtner
Towards Data Science

--

Photo by Isaac Smith on Unsplash

As the Corona pandemic produces tons of freely accessible data, I had the idea to visualize the 7-day incidence¹ for the rural district in Germany I live in. The Robert Koch Institute (RKI) provides an open dataset² with the numbers for every county in Germany. For the visualization part, I wanted to try some easy-to-use services. Recently, I stumbled upon some great visualizations on German news websites. They (and many others) use a service from a German startup called Datawrapper, which offers a free tier. Great, let's use it! And finally, I wanted to publish my collected data using Qri, an open platform for publishing datasets.

Let's take a look at the steps it took to create and publish a visualization.

  1. Collect and save the data regularly
  2. Push the data into a repository
  3. Connect the data and create a visualization

Collect and save the data regularly

Unfortunately, RKI doesn't publish time series data for the numbers I wanted to use, just the numbers for the current day. I wrote a small Python script that fetches the current data, extracts the numbers I want, and appends them to my CSV file. The script runs every day as a cron job on a simple Raspberry Pi in my home office. I am using the built-in power of Pandas to open a remote CSV. The script is pretty small and almost self-explanatory.

import pandas as pd
import os.path

# Get Data
df = pd.read_csv("https://opendata.arcgis.com/datasets/917fc37a709542548cc3be077a786c17_0.csv")

# Get desired row (128 - Rheingau-Taunus)
rtk = df.loc[[128], ['last_update', 'cases','cases_per_100k', 'cases7_per_100k', 'deaths']]

# Update data type
rtk['last_update'] = pd.to_datetime(rtk['last_update'], format="%d.%m.%Y, %H:%M Uhr")

# Add row to CSV
use_header = not os.path.isfile('rtk.csv')
rtk.to_csv('rtk.csv', mode='a', header=use_header, index=False)

I figured out that the row index of my county is 128, so it's easy to extract the data. As the dataset uses a peculiar German date format, I converted the column using Pandas' to_datetime.
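To illustrate what to_datetime does here, a minimal sketch; the sample value below is made up to match the format string, not taken from the real dataset:

```python
import pandas as pd

# A made-up value in the German date format the dataset uses
sample = pd.Series(["25.03.2021, 00:00 Uhr"])
parsed = pd.to_datetime(sample, format="%d.%m.%Y, %H:%M Uhr")
print(parsed[0])  # 2021-03-25 00:00:00
```

The format string spells out every literal character, including the comma and the word "Uhr", which is exactly what makes the parse unambiguous.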

Finally, I append my data to a CSV. That's it! The next step is providing the data using Qri.
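The append logic is worth a second look: the header row is written only on the very first run, so every later cron run appends pure data rows. A self-contained sketch of that behavior, using a temp file and dummy numbers instead of rtk.csv:

```python
import os
import tempfile
import pandas as pd

# Temp file stands in for rtk.csv; the numbers are dummy values
path = os.path.join(tempfile.mkdtemp(), "rtk.csv")
row = pd.DataFrame({"last_update": ["2021-03-25"], "cases7_per_100k": [94.7]})

for _ in range(2):  # simulate two daily runs
    use_header = not os.path.isfile(path)
    row.to_csv(path, mode="a", header=use_header, index=False)

df = pd.read_csv(path)
print(len(df))  # 2 -> header written once, data appended on both runs
```

Without the guard, the second run would append the header line again and corrupt the file as a data row.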

Push the data into a repository

Qri is "an open-source project to build software for dataset synching, versioning, storing and collaboration". It's a bit like GitHub for datasets. They provide a GUI client and CLI tools. I'm using the CLI tools so I can version my dataset and push it regularly to their (free) cloud space, and share it as an open and free resource for other people interested in doing cool stuff. You can provide a README file and add some meta information to your dataset (descriptions of fields etc.). For setup instructions, see their CLI quick-start guide. After initializing and publishing the data with the CLI tools, your data will be accessible in their cloud space. My data is located here.

When I have an updated version of my data, I simply save the new version and push it to the cloud.

$ qri save --body ../inzidenz-rtk.csv
$ qri push

This process is still manual, as the CLI tools do not run on a Raspberry Pi. My plan is to add these steps to my cron job for fully automated publishing. For now, doing it manually is sufficient.
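For the collection script, that automation is already in place. A crontab entry along these lines does the job; the path and time are purely illustrative, not the actual ones from my Pi:

```shell
# Run the collector every morning at 06:00 (illustrative path)
0 6 * * * /usr/bin/python3 /home/pi/collect_incidence.py
```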

I won't go into the details of Qri here; see their documentation for more insights. The process above works for me, but it may not be the best one. If there is anything I could improve, please comment!

The best thing about Qri is the integration part. You get a link to the latest .csv to use in your favorite data analysis tools. Furthermore, other Qri users can fork your dataset.
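Consuming the published dataset is then a one-liner, since Pandas' read_csv accepts a URL directly. The sketch below uses an in-memory stand-in for the remote file so it runs without network access; in practice you would pass the "latest .csv" link from Qri:

```python
import io
import pandas as pd

# In practice: df = pd.read_csv("<your dataset's CSV link on qri.cloud>")
# Stand-in data (dummy values) so the example is self-contained:
remote_csv = io.StringIO(
    "last_update,cases7_per_100k\n"
    "2021-03-24,91.3\n"
    "2021-03-25,94.7\n"
)
df = pd.read_csv(remote_csv)
print(df.shape)  # (2, 2)
```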

Finally, I used the dataset to create a plot with Datawrapper.

Connect the data and create a visualization

Datawrapper is an awesome visualization tool. It's not like Tableau, Mode or Datorama for building complex dashboards; Datawrapper's main focus is crafting clean, distinctive visualizations. You can use standard plots like bar charts, line charts, pie charts, donut charts etc., and every plot looks great out of the box. Datawrapper features several awesome map plots, too. With a free account you can use all of them, but you are limited in styling. Their default styles and templates, however, are already great and ready to use.

The best feature: you can link an external dataset by providing a URL to your data. Of course, I'm using the URL to my dataset at Qri.

Using a URL for your CSV in Datawrapper

The next step is to check and describe your dataset for later use in charts.

Select fields to be used in your visualization

In my example I’m using two columns: last_update and cases7_per_100k. Now we can create the visualization.

Design and annotate your visualization in Datawrapper

You can refine, annotate and design the visualization. In the free version, designing is limited to choosing a layout. You can choose colors for some parts, like lines or annotations, but you can't create visualizations matching your company's CI; that's part of their paid product. For me, the defaults are more than enough.

The best thing about connecting your data using Qri (or any other service):

If your data has been updated, you simply republish your chart and it’s up-to-date!

As I've used some annotations, like a date and a highlighted range, I need to do a bit of manual work. Everything else is already there; no need to copy new data, choose fields etc.

My final visualization looks like this (yes, Medium can embed Datawrapper visualizations!):

Final visualization embedded in Medium

Conclusion

Collecting data, providing it to others and creating visualizations is easier than it seems. You don't need to be an expert in data science or data engineering, or a designer. There are many great tools you can use without being a developer. Are you focused on nice visualizations? Keep an eye on Datawrapper. It's excellent at helping you create embeddable charts, download them as images, or even allow reuse via their service called River. With Qri you can provide datasets to large audiences, and their desktop client helps people who dislike the command line. You focus on the output instead of the process of creating it!

Disclaimer: I’m neither involved with Datawrapper nor with Qri!

[1]: The 7-day incidence is the number of new infections within seven days per 100,000 residents.
[2]: Data license: Data licence Germany - attribution - Version 2.0 / dl-de/by-2-0


Tech, Travel, and Life. Apple addict who loves travelling with his camper van and writing about mentioned topics.