Over the past six months, Vox Media has hired multiple embedded engineers for the sole purpose of building editorial content on each vertical. We hope that you’ve noticed an increasing amount of interactive data visualization on our sites. In keeping with the data journalism manifesto penned by Melissa Bell earlier this year, we’re excited to announce that we have created a GitHub repository for the storage of miscellaneous data used in interactives across our verticals. We’re not the first to take this route toward transparency, but we aim to encourage an open and communicative approach to our visualization process from here on out.
The first project in our repo concerns Sarah Kliff's story on medical harm at Vox.com. The original dataset includes hospital scores for multiple types of infections. We narrowed it down to central lines, on which Sarah focused her piece. We also gathered data on the hospitals themselves to provide context. Finally, we cobbled together some shell scripts using csvkit and Python to automate the process of generating the final dataset for production.
We’re still discussing loose guidelines for how we should use the repo. Do you have any ideas for how we should make our data more transparent and accessible? Get in touch with us at firstname.lastname@example.org or drop us a note in the issues.