Source: Brasil.IO

[BigDataSur-COVID] Liberating COVID-19 data with volunteers in Brazil

While the government limits access to information, data activists are consolidating and structuring COVID-19 data for open access.


by Peter Füssy

While healthcare workers fight the coronavirus pandemic with drugs and ventilators, journalists and data activists try to tackle the infodemic with numbers and visualizations. In Brazil, it is difficult to tell who is winning those battles as the death toll continues to rise and President Jair Bolsonaro “continues to sow confusion by openly flouting and discouraging the sensible measures of physical distancing and lockdown”, according to an editorial in the journal The Lancet. Combining a historical lack of capacity with an attempt to control the narrative of the crisis, the federal government provides poor-quality numbers about COVID-19. This reinforces the point that, beyond producing knowledge to deal with a problem, data (or the lack of data) “can also shape perceived realities”, as Renzi and Langlois claim.

As a result of decisions by the federal administration, Brazil is among the countries that test the least for the virus relative to population. It also has a peculiar COVID-19 dashboard that shows the number of recovered cases first, in a larger font, followed by new cases and deaths in smaller fonts. For one week in June, the total numbers of cases and deaths were removed from the dashboard entirely and reappeared only after intense criticism. The Brazilian government also cancelled the daily press conferences and started to release pandemic reports only after the most-watched TV news program in the country had aired. The media and independent organizations see all of these decisions as authoritarian, insensitive attempts to make COVID-19 deaths invisible.

Brazil’s Ministry of Health dashboard highlights recovered cases estimation (Source:

Beyond that, the Ministry of Health breaks the numbers down only by state (in Brazil, one state can be as large as France, Spain and Sweden combined), which means that the local dimensions of the problem are mostly ignored. The government promised city-level data but never delivered it and, to make matters worse, switched the format of the full report from an open one (CSV) to a closed one (Microsoft Excel) in the middle of the pandemic. As I discussed in a previous blog post, Bolsonaro’s regime has been limiting access to information since the beginning. The health crisis has only made this institutional resistance to transparency more evident.

In response, data activism has assumed governmental functions during the pandemic, providing the numbers that substantiate decisions at a variety of levels, from NGOs (1, 2) to policymakers (1, 2) that use open data from the activist group Brasil.IO. To reduce the “data gap” characteristic of countries in the Global South, data activists and other civil society organizations are collecting and structuring COVID-19 data. Besides Brasil.IO, journalists from six major newspapers and news portals are working together to provide independent totals of COVID-19 deaths and confirmed cases, while a data intelligence consultant is crowdfunding another COVID-19 monitor. Initiatives like these work with primary sources, allowing news production and research to depend less on problematic federal reports.

In the case of Brazil, data activism does more than make people visible and represented through the concept of data justice, or advocate for social change: it is essential to challenge the state narrative about the pandemic and to prevent more deaths from COVID-19. If it is up to data activism and data journalism, Brazilian democracy will not die in the dark.

Brasil.IO case

One group of volunteers taking over the Brazilian government’s responsibilities is part of the Brasil.IO initiative. Their COVID-19 project includes data from more than 5,500 municipalities and from other sources, such as notaries, making the underreporting of cases and deaths visible. The data have been used by major newspapers and news broadcasters, including The New York Times, CNN and the BBC. Scientific researchers also rely on the data to produce comparative studies and forecasts.

Figures from the project point not only to underreporting but also to delays in the official tallies of COVID-19 deaths. For instance, the Ministry of Health announced that the country had reached a thousand deaths on April 11, while Brasil.IO’s platform had pointed to the same number six days earlier. If undercounting feeds the discourse that minimizes the health crisis, data activism can liberate the numbers for public scrutiny.

Social media call and operationalization

On March 20, seeing the lack of structured data about COVID-19, Álvaro Justen, founder of the non-profit organization Brasil.IO, tweeted a call for volunteers to help collect data manually from all 27 federative units. Soon, 34 volunteers had answered the tweet, most of them data journalists or software developers. “Fortunately, I have several friends and contacts who work with journalism and data and it was not difficult to find volunteers”, said Justen in an interview via email.

Photo credits: @turicas/Twitter

The group spent one whole weekend manually tabulating hundreds of epidemiological bulletins that state health departments had published since the beginning of the pandemic. Because of the urgency, they started by consolidating the data in Google Spreadsheets. After the first round, the spreadsheets with the most recent numbers began to feed directly into the reformulated Brasil.IO platform, which uses Python, Django and PostgreSQL on the backend.

All communication takes place on an open-source chat platform (Rocket.Chat), while updates and new insights are published on Twitter and in a Telegram group. Scripts for automated processes, such as scraping, monitoring, checking data, generating internal reports, and consolidating data, are available on GitHub. For example, the group uses a robot to send a notification to the chat whenever a state health secretariat updates its COVID-19 numbers.
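The article does not describe how the notification robot works internally, so the following is only a minimal sketch of one common approach: fingerprint each downloaded bulletin page and alert when the fingerprint changes. The names `fingerprint` and `check_for_update`, the message text, and the `notify` callback (which in practice might post to a chat room) are all hypothetical and are not taken from the Brasil.IO code.

```python
import hashlib
from typing import Callable, Dict


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 digest of a downloaded bulletin page."""
    return hashlib.sha256(content).hexdigest()


def check_for_update(
    state: str,
    content: bytes,
    seen: Dict[str, str],
    notify: Callable[[str], None],
) -> bool:
    """Call `notify` when the content for `state` differs from the last one seen.

    `seen` maps a state code to the digest recorded on the previous check
    (hypothetical in-memory storage; a real bot would persist it).
    Returns True only when a notification was sent.
    """
    digest = fingerprint(content)
    if seen.get(state) == digest:
        return False  # nothing new since the last check
    first_sighting = state not in seen
    seen[state] = digest
    if first_sighting:
        return False  # record a baseline without alerting
    notify(f"{state}: health secretariat page changed, please review the new bulletin")
    return True


if __name__ == "__main__":
    messages = []  # stands in for the chat room
    seen = {}
    # Placeholder byte strings, not real bulletins.
    check_for_update("SP", b"bulletin v1", seen, messages.append)
    check_for_update("SP", b"bulletin v1", seen, messages.append)
    check_for_update("SP", b"bulletin v2", seen, messages.append)
    print(messages)
```

Passing the notifier in as a callback keeps the change detection independent of any particular chat service, so the same check could feed a chat room, a log file or a test list.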

Improving data quality

Because of the issues with data quality, a considerable part of the data collection is done manually by the volunteers. Besides collecting and checking data, volunteers also contact health secretariats to recommend good data practices and ask for changes that make the data more accessible to automated processes. “Not all respond satisfactorily but most of them are willing to collaborate in some way. With time and pressure, some things are improving but not at the speed that a pandemic requires”, Justen points out.
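The article does not say which checks the volunteers run, so the sketch below only illustrates one plausible kind of consistency check on hand-collected figures: a cumulative death count for a city should not go down from one day to the next. The `Row` fields and the function name `find_decreases` are assumptions made for this example, not the project's schema.

```python
from datetime import date
from typing import Dict, Iterable, List, NamedTuple


class Row(NamedTuple):
    """One hand-collected record (hypothetical layout)."""
    city: str
    day: date
    deaths: int  # cumulative deaths reported up to `day`


def find_decreases(rows: Iterable[Row]) -> List[str]:
    """Return a message for every day on which a city's cumulative count fell."""
    problems: List[str] = []
    last: Dict[str, Row] = {}
    # Sort so that each city's records are visited in chronological order.
    for row in sorted(rows, key=lambda r: (r.city, r.day)):
        prev = last.get(row.city)
        if prev is not None and row.deaths < prev.deaths:
            problems.append(
                f"{row.city} {row.day.isoformat()}: cumulative deaths fell "
                f"from {prev.deaths} to {row.deaths}"
            )
        last[row.city] = row
    return problems


if __name__ == "__main__":
    # Made-up placeholder rows for illustration only, not real figures.
    sample = [
        Row("City A", date(2020, 4, 1), 10),
        Row("City A", date(2020, 4, 2), 8),
        Row("City B", date(2020, 4, 1), 3),
        Row("City B", date(2020, 4, 2), 3),
    ]
    for message in find_decreases(sample):
        print(message)
```

A flagged row is not necessarily a typing mistake, since a secretariat may revise its own figures, so a check like this is best read as a prompt for a volunteer to look at the source bulletin again.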

To help create an open data culture in Brazil, Justen has been working on databases and improving tools for extracting data from inaccessible formats since 2013. One emblematic example is a dataset of more than 500 thousand companies, and their shareholders, registered with the Brazilian Internal Revenue Service (Receita Federal). Under the 2011 Freedom of Information (FOI) Act, that information should be publicly available and machine-readable. However, the page hosting the data used captchas to limit access. After several FOI requests in 2018, the request was denied with a link to a system that sold the data for R$ 506,000. Justen and other data activists pressured the Receita Federal, which finally sent the dataset on a USB drive.

Because everything is done on a voluntary basis and the data are available free of charge to everyone, funding is one of the project’s challenges. “We don’t have that much time to work on the project and, therefore, not everything advances at the speed we would like”, says Justen. To address this, they started a crowdfunding campaign to hire developers, which would make it possible to add new datasets that can help “flatten the curve” of coronavirus in Brazil.

Brasil.IO’s manifesto defines ‘data liberation’ as the process of collecting, converting, cleaning and making data available in a structured and open format. As stated in the manifesto, “liberating” access to public data is a way to make democracy less elitist. In the exceptional circumstances of the COVID-19 response in Brazil, however, liberating data seems to be fundamental to preserving democracy and saving lives.