Protocol China Data Analyses and Articles

data visualization
research
I produced several data analyses and reports as the Lead Data Scientist and Researcher for the Protocol China team.
Author

Clara Wang

Published

January 27, 2021

Image source here; data visualization created by Clara Wang (me)

From January 2021 to November 2021, I was the Lead Data Scientist and Researcher for the Protocol China team. Protocol was a tech-focused news site from the publisher of POLITICO, and it was acquired by Axel Springer in October 2021. Protocol ended publication in November 2022 after about three years of operation.

The Protocol China team published a weekly newsletter on China’s tech space, and I was hired to provide a data-driven look at China’s tech industry by creating a client-facing data intelligence platform, along with data-driven articles and reports. For the launch of Protocol China on January 27, 2021, I wrote an analytical piece on China’s TC260 (also known as the National Information Security Standardization Technical Committee), which issues technical standards related to information security and cybersecurity. I collected, cleaned, and analyzed all the data, translated the information into English, and wrote the article within 2.5 weeks of starting my role at Protocol. I created all the data visualizations using R and ggplot2, and I made a custom R package based on the Protocol style guide. I also contacted and interviewed several experts for this article. An archived version of my launch article can be found here.

After the launch of Protocol China, I contributed a blurb to the newsletter every week and pivoted to creating a subscription data intelligence platform that would help government affairs practitioners and investors understand China’s tech industry. I built the platform using Shiny, and I used AWS to run scrapers and data pipelines, as well as to store and manage our data assets. I did most of the setup and maintenance of our data infrastructure, and I completed a working prototype of the platform in five months. To create the platform, I identified publicly available data sources (most of them in Chinese), wrote scrapers for these sources, set up data pipelines, and then cleaned, translated, and visualized the data. In addition to creating the dashboard and updating the data on a regular basis, I also wrote two English-language research reports every month based on the Chinese-language data. It was a tremendous amount of work. Luckily, I had some very talented folks helping me out with the technical work: my intern, Theo Lebryk; Protocol’s jack-of-all-trades product person, Shakeel Hashim; and later our new data engineer, Eric Blom. A screenshot of the data intelligence platform was featured on Protocol’s “About” page; it is shown on the screen of the laptop image.

Image source here

When time allowed, I also collaborated with Protocol China’s journalists to create data journalism pieces. I worked with Shen Lu to write an article on tech IPOs on Shanghai’s Star Market; an archived version of the article can be found here. I scraped, cleaned, analyzed, and translated the data, and I used ggplot2 to create the data visualizations for the piece. One of the plots I am particularly proud of can be seen below.

Image source here; data visualization created by Clara Wang (me)