
Data Voids


Our team explored the impact of data voids during breaking news events, when a surge of interest in a topic can be exploited by malicious actors to spread harmful content, toxic speech, or other forms of mis-, dis-, and malinformation. Building on the theory originally proposed in a brief paper from Data & Society, we hoped to add rigor by mapping the lifecycle of several real-life case studies of data voids "in the wild" across search engines and Wikipedia topic pages, and by suggesting potential intervention opportunities.


The team wrote a white paper that describes a typology of data voids, presents a harms framework for evaluating when data voids are more likely to cause societal harm, and analyzes traffic data from search queries and Wikipedia page edits to gauge engagement with a data void across a few case studies. We presented the final paper as an interactive webpage, which I designed and coded.

Harms Framework

This framework draws inspiration from other attempts to define harmful speech, ranging from legal exceptions to free speech protections (e.g. the oft-cited public-endangerment example of "shouting fire in a crowded theater," or incitement to "imminent lawless action"4) to the Dangerous Speech Project, which identifies a subset of hate speech with greater potential to incite violence.5 Notably, the question of "why" is absent from this framework, as intent is notoriously difficult to assess online. Search results are filtered through an opaque ranking algorithm, making coordinated manipulation attempts even more difficult to spot.

Data void lifecycles

We plotted the peak week of search activity for each term (in red) and layered in the specific times that authoritative media articles entered the discussion (in blue) and the specific times that edits were made to relevant Wikipedia articles (in yellow). This gives us a timeline of when credible news sources published about a term relative to when searches for that data void were spiking.
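The alignment step behind these timelines can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the weekly counts, dates, and variable names below are hypothetical placeholders standing in for the real search-traffic, media-article, and Wikipedia-edit data.

```python
from datetime import date, timedelta

# Hypothetical weekly search-interest counts for one data-void term
week_start = date(2020, 3, 2)  # Monday of the first observed week
weekly_searches = [12, 15, 40, 310, 190, 80, 30]

# Peak week of search activity (the red marker in the charts)
peak_index = max(range(len(weekly_searches)), key=weekly_searches.__getitem__)
peak_week = week_start + timedelta(weeks=peak_index)

# Hypothetical event dates: authoritative media articles (blue)
# and relevant Wikipedia edits (yellow)
media_articles = [date(2020, 3, 24), date(2020, 3, 30)]
wiki_edits = [date(2020, 3, 20), date(2020, 3, 26)]

def weeks_from_peak(d):
    """Signed offset in whole weeks between an event and the peak week."""
    return (d - peak_week).days // 7

# Merge both event streams into one chronological timeline,
# annotated with each event's offset from the search-activity peak
timeline = sorted(
    [("media", d, weeks_from_peak(d)) for d in media_articles]
    + [("wiki edit", d, weeks_from_peak(d)) for d in wiki_edits],
    key=lambda row: row[1],
)
for kind, d, offset in timeline:
    print(f"{d}  {kind:9s}  {offset:+d} weeks from peak")
```

A negative offset means credible coverage or a Wikipedia edit landed before searches peaked (the void was being filled in time); a positive offset means the surge of interest outpaced authoritative content.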