Photo: Example of a heatmap showing high-risk areas (Covid Watch/Creative Commons)

[BigDataSur-COVID] Making the virus visible with (responsible) mobile data

Technologies to track COVID-19 spread with data generated by mobile devices bring the discussion about responsible use of data to a global level, creating an opportunity to push for regulatory frameworks.


by Peter Füssy


Although the Covid-19 outbreak has been known since December, the first images of the virus were published in late January, and more detailed captures came only in mid-February. The reason is that most viruses are much smaller than bacteria and cannot be seen with an ordinary light microscope, only with a more sophisticated and expensive electron microscope.

Another method to make the virus visible is by following the hosts, or rather, the “smart” devices carried by the hosts everywhere. These devices display and generate data, which has mostly been used by corporations to predict and steer human behavior for profit. The same data, repurposed, can be used to see the movement of the disease at different levels, from populations to individuals, but not without generating other risks to the hosts.

These visualizations are not accurate pictures of the virus, but probabilities converted into coloured areas, dots or maps overlaid with statistics. However, they are powerful tools to support decision-making in medical practice and health policy. In addition, these infographics can create large-scale awareness when available to a broader audience: they can show how far the virus is from you, sometimes with staggering precision.

Photo: Transmission electron microscope image of the virus that causes Covid-19 (NIAID-RML/Creative Commons)

Tracking the virus (and people)

In China, for instance, the two largest telecom operators, covering 80% of the market, provided data about the number of mobile phones travelling from Wuhan to other regions before the lockdown. The data was used to better estimate the latent infection ratio (transmission before symptoms) in the pandemic epicenter, and can be visualized in an infographic from The New York Times.

Private mobile carriers are also sharing data with health authorities in countries like Italy, Germany, Austria and Brazil to monitor social distancing measures and detect crowds. In those cases, the visual representation shows concentrations and movements of large groups. Examples that handle much more personal data are not difficult to find though – and are even seen as models for limiting the spread of the virus.

In Pakistan, some residents of Karachi received text messages saying that they may have come in contact with someone diagnosed with COVID-19. According to local reports, authorities are using call detail records (CDRs) from telecom operators to find phone numbers that were recently near a phone owned by someone infected. In Australia, the police said the “threshold” that legitimizes phone tracking of specific individuals – a practice usually restricted to criminal investigations – has been met in the case of a Chinese couple who allegedly carried the virus from Wuhan to Adelaide.
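The local reports do not describe how the matching is done, so the following is only a minimal sketch of the general idea, written in Python. It assumes each CDR is reduced to a phone identifier, a cell tower identifier and a timestamp, and it flags phones that used the same tower as the infected person's phone within a time window. The record layout, the `possible_contacts` helper and the 30-minute window are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class CDR(NamedTuple):
    """Hypothetical, simplified call detail record."""
    phone: str
    tower_id: str
    timestamp: datetime


def possible_contacts(records, infected_phone, window=timedelta(minutes=30)):
    """Return phones seen at the same tower as infected_phone within `window`."""
    infected_events = [r for r in records if r.phone == infected_phone]
    contacts = set()
    for r in records:
        if r.phone == infected_phone:
            continue
        for e in infected_events:
            same_tower = r.tower_id == e.tower_id
            close_in_time = abs(r.timestamp - e.timestamp) <= window
            if same_tower and close_in_time:
                contacts.add(r.phone)
                break
    return contacts


# Toy data: only phone B shares a tower with A within 30 minutes.
records = [
    CDR("A", "tower-1", datetime(2020, 3, 20, 10, 0)),
    CDR("B", "tower-1", datetime(2020, 3, 20, 10, 20)),
    CDR("C", "tower-1", datetime(2020, 3, 20, 15, 0)),
    CDR("D", "tower-2", datetime(2020, 3, 20, 10, 5)),
]
print(possible_contacts(records, "A"))
```

A single tower can cover a wide area, so a match of this kind says little about actual physical proximity, which is one reason such messages can only say that someone "may" have been in contact.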

Photo: Map produced by the Hong Kong Government showing the buildings where possible cases of Covid-19 lived for the past 14 days.

In Taiwan, people under mandatory quarantine are “geofenced” in real-time by their cell phone signal – they cannot even let the battery die without risking police officers knocking on the door. In the same direction, Hong Kong created a map of the city where you can see in which buildings the virus has been spotted, along with the infected person’s gender and age. In those cases, each SIM card represents a human being who is a possible host for the virus.

The list of states adopting digital surveillance methods to picture where the virus is could continue for several paragraphs. Those methods for visualizing the virus are shaping actions to “flatten the curve”, controlling citizens in quarantine, detecting social distancing levels and performing contact tracing.

Given the urgency of the situation, the arrangements for data sharing are made quickly and, in most cases, without transparency for users and the public, which raises questions about future consequences for privacy and freedom. To prevent harm, researchers, activists, institutions and NGOs have been building on past experiences to come up with guidelines for responsible practices.

Data sharing and epidemics

The role of visualizations during pandemics has grown in importance since British physician John Snow plotted fatal cholera cases on a map of London’s Soho in 1854 and traced the source of the disease to the water pumps. During the twentieth century, the introduction of computer-assisted analysis and the development of visualization tools further increased the reliance on data and graphics to prevent and control infections. More recently, the popularization of mobile phones opened a new range of possibilities to predict or track the spread of diseases.

Photo: John Snow’s map of cholera in London (Public domain)

Given the sensitive and commercially valuable data that cell phones generate, including call and location history obtained by triangulating cell tower positions, mobile operators are generally reluctant to share this information. Nevertheless, exceptions have been made in research and humanitarian responses during outbreaks of dengue, cholera, malaria and Ebola in low- and middle-income countries at the beginning of the decade.

As in the current COVID-19 crisis, in those cases the perceived benefits of data sharing overshadowed the potential risks, revealing an optimistic view of technological solutions. As McDonald argues, the push for experimenting with mobile data for contact tracing in Liberia overlooked data protection laws, undermined the coordination ability of local actors and raised tensions with the government, resulting in a “big data disaster”.

Potential risks

From the medical viewpoint, call detail records (CDRs) and contact tracing can be somewhat helpful to control the virus, but data alone does not solve anything. Moreover, once shared, the data does not disappear with the virus, which puts privacy and human rights at risk. Even when the data is aggregated and anonymized, researchers have demonstrated that four points (e.g., home address, workplace address, or geo-localized tweets or pictures) are enough to identify 95% of the individuals. If used for political ends, such data makes it possible to track separatists, migrants or dissidents.
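To see why a handful of points can be enough, consider the following Python sketch. It is not the researchers' method or data, only a toy illustration: each "anonymous" trace is assumed to be a set of (place, hour) points, and the hypothetical helpers `matching_users` and `unicity` check how many traces are compatible with a few points that an outsider already knows.

```python
import random


def matching_users(traces, known_points):
    """Return the pseudonyms whose trace contains every known point."""
    known = set(known_points)
    return [user for user, points in traces.items() if known <= points]


def unicity(traces, k, trials=200, seed=0):
    """Estimate the share of users singled out by k points from their own trace."""
    rng = random.Random(seed)
    users = sorted(traces)
    unique = 0
    for _ in range(trials):
        user = rng.choice(users)
        points = sorted(traces[user])
        sample = rng.sample(points, min(k, len(points)))
        if matching_users(traces, sample) == [user]:
            unique += 1
    return unique / trials


# Toy traces (invented): pseudonym -> set of (place, hour) points.
traces = {
    "u1": {("home-A", 8), ("office-X", 9), ("cafe", 13), ("gym", 18)},
    "u2": {("home-A", 8), ("office-X", 9), ("cafe", 13), ("park", 18)},
    "u3": {("home-B", 8), ("office-X", 9), ("gym", 18), ("bar", 21)},
}

# Home and workplace alone still leave two candidates...
print(matching_users(traces, [("home-A", 8), ("office-X", 9)]))
# ...but one more point singles out a single trace.
print(matching_users(traces, [("home-A", 8), ("office-X", 9), ("gym", 18)]))
print(unicity(traces, k=4))
```

Removing names from a dataset does not help here: the pattern of places and times itself acts as the identifier.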

Sharing personal data exposes not only the disease but also the private life of the host. For corporations, sharing data is not only a matter of doing public good but also a decision based on economic and strategic considerations. Once data is out, it can be misused by other companies; for instance, private health insurers can set or deny coverage by combining different datasets. This is why we need ethical sharing of data to protect citizens from unintended outcomes of well-meaning initiatives.

In this direction, the Responsible Data community is one of the groups trying to draw up a global framework that covers social, legal, privacy and economic issues. According to the collective, projects that involve data sharing should consider power dynamics (how are the least powerful actors affected?), have a strong justification, care about more than speed, include diverse perspectives and re-evaluations to understand possible harms, and create checks and balances to flag unexpected effects.

Global opportunity

So far, economic interests, political contexts, local legislation and cultural differences have made it challenging to implement responsible guidelines for data sharing. As a global issue, the COVID-19 crisis could be an opportunity to push for regulatory frameworks. In this regard, more groups have come forward with suggestions for technical and social evaluations of actions to track the virus.

The EU Commission recommended a coordinated approach for the use of tracing apps, which includes an intermediary (the Joint Research Centre) for processing and storing data from European mobile operators for as long as the COVID-19 crisis is ongoing. In addition, the institution that represents the interests of mobile operators worldwide (GSMA) released a privacy guide to deal with data requests. The document recommends that operators proactively implement best practices, encompassing transparency to the public, the prohibition of re-identification of individuals, limits to the scope of use and accountability.

In the United States, the American Civil Liberties Union (ACLU) released a white paper which discusses the limits and the effectiveness of location tracking in epidemics, while another group of researchers suggests that an open-source app that crowdsources the data with user consent might be the most effective way to slow the spread of the virus while publishing less sensitive data.

The inclusion of big tech companies like Apple and Google certainly brings another dimension to the discussion, raising both the capacity for development and the amount of data collected, but also the stakes if something goes wrong. In a partnership, Apple and Google announced a system for contact tracing using Bluetooth which will be embedded in the operating systems (iOS and Android) and promises to keep data anonymized.
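How can Bluetooth contact tracing work without sharing identities or locations? The Python sketch below is not the Apple and Google specification; it is a deliberately simplified illustration of the general idea behind decentralized designs, in which phones broadcast short-lived identifiers derived from a random daily key, and matching happens on the device once an infected user agrees to publish their keys. The function names, the HMAC-based derivation and the ten-minute rotation are assumptions made for the example.

```python
import hashlib
import hmac
import secrets

# Assumption for this sketch: one identifier per 10-minute interval.
INTERVALS_PER_DAY = 144


def new_daily_key():
    """Generate a random key that never leaves the phone unless published."""
    return secrets.token_bytes(16)


def ephemeral_id(daily_key, interval):
    """Derive the identifier broadcast over Bluetooth for one interval."""
    message = interval.to_bytes(2, "big")
    return hmac.new(daily_key, message, hashlib.sha256).digest()[:16]


def ids_for_day(daily_key):
    """Re-derive every identifier a published key produced that day."""
    return {ephemeral_id(daily_key, i) for i in range(INTERVALS_PER_DAY)}


def was_exposed(heard_ids, published_keys):
    """Check locally whether any identifier heard matches a published key."""
    return any(heard_ids & ids_for_day(key) for key in published_keys)


# Alice later tests positive and publishes her daily key.
alice_key = new_daily_key()
# Bob's phone was near Alice's during interval 42 and stored what it heard.
bob_heard = {ephemeral_id(alice_key, 42)}
# Carol's phone only heard a stranger who never published a key.
carol_heard = {ephemeral_id(new_daily_key(), 42)}

print(was_exposed(bob_heard, [alice_key]))
print(was_exposed(carol_heard, [alice_key]))
```

In a design like this, no central server learns who met whom, but the promise of anonymity still depends on how keys are published and on what other data the app or operating system collects.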

The responsible data debate has never received as much attention as it does now. It is time for legal and regulatory frameworks “to catch up to the real-world effects of data and technology” (Responsible Data). It is also a time when mistakes can have global consequences.

About the author

Peter is a journalist trying to explore new media in depth, from everyday digital practices to the undesired consequences of a highly connected environment. After more than 10 years of reporting for the most relevant digital outlets in Brazil, he is now a second-year Research Master’s student in Media Studies at the University of Amsterdam.