Niels’ research featured in the New York Times

TL;DR: have a look at the piece in the New York Times that covers Niels’ work.

During the research Niels did for DATACTIVE, which culminated in his thesis and a recent paper in New Media & Society, he actively participated in the Internet Engineering Task Force (IETF). The IETF is one of the main standards and governance bodies of the Internet. While working there, Niels worked together with others, such as Mallory Knodel and Corinne Cath, on addressing exclusionary language in technical standards. An important part of that work was publishing this document, which sparked an extensive discussion in the IETF that remains unresolved to this day. You can read more about it in the New York Times piece.

BigBang Sprint at IETF110 Hackathon

When: March 1-3, 2021

The BigBang project will be working on improving its tool for mailing list analysis at the IETF 110 Hackathon.

BigBang is an open source research project that studies collaboration and contention in digital infrastructure projects and governance institutions. We do this by combining data science techniques with qualitative methods. For example, with BigBang you can analyze participation, affiliation, gender, and networks in the IETF, ICANN, RIPE, IEEE, or the 3GPP.

We very much welcome both technical and non-technical contributors! BigBang is built on the scientific Python stack, and we use Jupyter notebooks to make the analysis transparent and accessible.
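To give newcomers a feel for the kind of question involved, here is a minimal sketch of a participation count over mailing list messages. It is not BigBang's own code or API: it uses only the Python standard library, the sample messages and the helper names `count_participation` and `count_by_domain` are made up for illustration, and treating an email domain as a proxy for affiliation is a deliberately crude assumption.

```python
from collections import Counter
from email import message_from_string
from email.utils import parseaddr

# Hypothetical sample messages. A real archive would typically be read
# from an mbox file, for example with the standard `mailbox` module.
RAW_MESSAGES = [
    "From: Alice Example <alice@example.org>\nSubject: Draft review\n\nLooks good.",
    "From: Bob Example <bob@example.net>\nSubject: Re: Draft review\n\nI disagree.",
    "From: Alice Example <alice@example.org>\nSubject: Re: Draft review\n\nWhy?",
]


def count_participation(raw_messages):
    """Count how many messages each sender address contributed."""
    counts = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        _name, address = parseaddr(msg.get("From", ""))
        if address:
            counts[address.lower()] += 1
    return counts


def count_by_domain(counts):
    """Aggregate message counts by email domain (a crude affiliation proxy)."""
    domains = Counter()
    for address, n in counts.items():
        domains[address.rsplit("@", 1)[-1]] += n
    return domains


participation = count_participation(RAW_MESSAGES)
for address, n in participation.most_common():
    print(address, n)
```

Real list archives need more care than this (one person often posts from several addresses, for instance), which is exactly the kind of problem the project works on.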

To join the IETF 110 Hackathon, please register using the link from the Hackathon website. Registration is free!

We intend to work on (some of) the following issues during the hackathon:

– Integration and analysis of 3GPP and IEEE mailing lists
– Integration with the INDELab conversationkg tool
– Produce instructional videos
– Improve linking across datasets (such as the datatracker and mailing lists)
– Query/notebook design to support projects from research community
– Discussion of Star’s boundary object vs. Luhmann’s structural coupling
– The operationalization of _your_ research question!

The BigBang project will have a one-hour team meeting Friday February 26 – 9:00 ET / 14:00 GMT / 15:00 CET before the Hackathon which all are welcome to attend if they are curious about the project. You can join via this link:

Please don’t hesitate to write Seb (sbenthall at gmail dot com) if you have any questions about the BigBang project or the IETF 110 sprint, or if you have suggestions for research questions!


DATACTIVE protests lack of ethical review in the UvA-Huawei collaboration

DATACTIVE, together with Bits of Freedom, the Data Justice Project and many individual scientists, signed the Funding Matters statement, which protests the collaboration between Huawei and a project at the University of Amsterdam and the Vrije Universiteit Amsterdam. While collaboration with companies is not problematic per se, it is important that such collaborations undergo careful ethical scrutiny. Standards for structural reviews of the societal impact of such collaborations are currently not in place. Huawei has been accused of collaborating with the Chinese government in human rights violations against the Uyghur people, as well as of facilitating surveillance in Uganda.

[blogpost] Teaching Students to Question Mr. Robot: Working to Prevent Algorithmic Bias in Educational Artificial Intelligence

Author: Erinne Paisley


With the onset of the COVID-19 pandemic, classrooms around the world have moved online. Students from K-12 to university level are turning to their computers to stay connected to teachers and continue their education. This move online raises questions about the appropriateness of technologies in the classroom and how close to a utopian “Mr. Robot” we can, or should, get. One of the most contested technological uses in the classroom is the adoption of Artificial Intelligence (AI) to teach.

AI in Education

AI includes many practices that process information in a way similar to how humans process information. Human intelligence is not one-dimensional and neither is AI, meaning AI includes many different techniques and addresses a multitude of tasks. Two of the main AI techniques that have been adapted into potential educational AI are automation and adaptive learning. Automation means computers are pre-programmed to complete tasks without the input of a human. Adaptive learning indicates that these automated systems can adjust themselves based on use and become more personalized.

The potential of combining these AI techniques into some type of robot teacher, or “Mr. Robot”, sounds like something out of a sci-fi cartoon, but it is already a reality for some. A combination of these AI techniques has already been used to assess students’ prior and ongoing learning levels, place students in appropriate subject levels, schedule classes, and individualize instruction. In the Mississippi Department of Education, in the United States, a shortage of teachers has been addressed through the use of an AI-powered online learning program called “Edgenuity”. This program automates lesson plans following the format of: warm-up, instruction, summary, assignment, and quiz.

A screenshot from an Edgenuity lesson plan.

Despite how utopian an AI-powered classroom may sound, there are some significant issues of inequality and social injustice that are supported by these technologies. In September 2019, the United Nations Educational, Scientific and Cultural Organization (UNESCO), along with a number of other organizations, hosted a conference titled “Where Does Artificial Intelligence Fit in the Classroom?” that explored these issues. One of the main concerns raised was algorithmic bias.

Algorithmic Bias in Education AI

The mainstream attitude towards AI is still one of faith – faith that these technologies are, as the name says, intelligent as well as objective. However, AI bias illustrates how these new technologies are very far from neutral. Joy Buolamwini explains that the biases, conscious or subconscious, of those who create the code become part of the digital systems themselves. This creates systems especially skewed against people of colour, women, and other minorities who are statistically underrepresented in the process of creating this code, including AI code. For instance, the latest AI application pool for Stanford University in the United States was 71% male.

Joy Buolamwini’s TEDx talk on algorithmic bias.

In the educational sector, people of colour, girls, and other minorities are already marginalized. Because of this, there is the concern that AI in the classroom that has encoded biases would further these inequalities, for instance by trapping low-income and minority students in low-achievement tracks. This would create a cycle of poverty, supported by this educational framework, instead of having human teachers address students on an individual level and offer specialized support and attention to those facing adversity.

However, the educational field already has its own biases embedded in it – both within individual teachers and throughout the system more generally. Viewed in this way, the increased use of AI in classrooms offers an opportunity to reduce bias, if it is designed in a way that directly aims to address these issues. The work of designing AI that addresses these issues and aims to be progressive has been taken on by teachers, librarians, students and anyone in between.

Learning to Fight Algorithmic Bias

By including more voices and perspectives in the process of coding AI technologies, algorithmic bias can be prevented and, instead, a technological system that supports a socially just classroom can be built. In this final section, I will highlight two existing educational projects aimed at teaching students of all ages to identify and fight algorithmic bias while creating technology that makes for a more equal classroom.

Algorithmic Bias Lesson Plans

The use of AI to create a more socially just educational system can start in the classroom, as Blakeley Payne showed when she ran a week-long ethics in AI course for 10 to 14-year-olds in 2019. The course included lessons on open-source coding and AI ethics, and ultimately taught students to both understand and fight against algorithmic bias. The lesson plans themselves are available for free online for any classroom to incorporate into their own lesson plans – even from home.

Students learn how to identify algorithmic bias during the one-week course.

Blakeley Payne’s one-week program focuses on ages 10-14 to encourage students to become interested in and passionate about issues of algorithmic bias, and the STEM field more broadly, from a young age. Students work on simple activities such as writing an algorithm for the “best peanut butter and jelly sandwich” in order to practice questioning algorithmic bias. This activity in particular has them question what “best” means. Does it mean best looking? Best tasting? Who decides what this means and what are the implications of this?

Non-profits such as Girls Who Code are also actively working to design lesson plans and activities for young audiences that teach critical thinking and design when it comes to algorithms, including those used in AI. The organization runs after-school clubs for girls in grades 3-12, college programs for alumni of the program, and summer intensives. Their programs focus technically on developing coding skills but also have a large focus on diversifying the STEM fields and creating equitable code.


The future of AI in the classroom is inevitable. This may not mean every teacher becomes robotic, but the use of AI and other technologies in the educational field is already happening. Although this raises concerns about algorithmic bias in the education system, it also creates more opportunities to re-think how technologies can be used to create a more socially just educational system. As we have seen through existing educational programs that teach algorithmic bias, even at the kindergarten age, interest in learning, questioning, and re-thinking algorithms can easily be nurtured. The answer to how we create a more socially just educational system through AI is simple: just ask the students.


About the Author

Erinne Paisley is a current Research Media Masters student at the University of Amsterdam and completed her BA at the University of Toronto in Peace, Conflict and Justice & Book and Media Studies. She is the author of three books on social media activism for youth with Orca Book Publishing.

[BigDataSur] Data activism in action: The gigantic size and impact of China’s fishing fleet revealed using big data analytics and algorithms

Author: Miren Gutiérrez

As we grapple with the COVID-19 pandemic, another crisis looms over our future: overfishing. Fishing fleets and unsustainable practices have been emptying the oceans of fish and filling them with plastic. Although other countries are also responsible for overfishing, China has a greater responsibility. Why is looking at the Chinese fleet important? China has almost 17,000 vessels capable of distant water fishing, as an investigative report published by the Overseas Development Institute, London, reveals for the first time.


As part of a team of researchers at the Overseas Development Institute, London, I had access to the world’s largest database of fishing vessels. Combining these data with satellite data from the automatic identification system (which indicates their movements), we were able to observe their behaviour for two years (2017 and 2018). To do this, we employed big data analytical techniques, machine learning algorithms, and geographic information systems to describe the fleet and analyze how it behaved.
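The core of that combination can be sketched in a few lines. What follows is an illustration only, not the code behind the report: the vessel records, the position reports and the rectangular "home waters" box are all hypothetical, and a real analysis would test positions against actual exclusive economic zone polygons in a geographic information system rather than against a rectangle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vessel:
    mmsi: str  # identifier broadcast over AIS (hypothetical values below)
    name: str
    gear: str  # e.g. "trawler", "longliner"


@dataclass(frozen=True)
class AisPosition:
    mmsi: str
    lat: float
    lon: float


# Hypothetical bounding box standing in for "home waters".
HOME_BOX = {"lat_min": 0.0, "lat_max": 10.0, "lon_min": 0.0, "lon_max": 10.0}


def in_home_box(pos, box=HOME_BOX):
    """True if the position falls inside the stand-in home-waters rectangle."""
    return (box["lat_min"] <= pos.lat <= box["lat_max"]
            and box["lon_min"] <= pos.lon <= box["lon_max"])


def vessels_seen_in_distant_waters(registry, positions):
    """Registry vessels with at least one AIS position outside the home box."""
    by_mmsi = {v.mmsi: v for v in registry}
    distant = {p.mmsi for p in positions
               if p.mmsi in by_mmsi and not in_home_box(p)}
    return [by_mmsi[m] for m in sorted(distant)]


REGISTRY = [
    Vessel("100000001", "Hypothetical One", "trawler"),
    Vessel("100000002", "Hypothetical Two", "longliner"),
]
POSITIONS = [
    AisPosition("100000001", 5.0, 5.0),     # inside the box
    AisPosition("100000001", 20.0, -15.0),  # outside the box
    AisPosition("100000002", 3.0, 4.0),     # inside the box
    AisPosition("999999999", 50.0, 50.0),   # not in the registry, ignored
]

for vessel in vessels_seen_in_distant_waters(REGISTRY, POSITIONS):
    print(vessel.name, vessel.gear)
```

Joining on the broadcast identifier is the simplest possible linkage; in practice identifiers can be missing, shared or changed, which is one reason the machine learning step is needed.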

And the first thing we noticed is that China’s fishing fleet is five to eight times larger than any previous estimation. We identified a total of 16,966 Chinese fishing vessels able to fish in “distant waters”, that is, outside its exclusive economic zone, including some 12,500 vessels observed outside Chinese waters during the same period.

Why is this important? If China’s distant-water fishing (DWF) fleet is 5-8 times larger than previous estimates, its impacts are inevitably more significant than previously estimated. This is important for two reasons. First, because millions of people in coastal areas of developing countries depend on fishery resources for their subsistence and food security. Second, because of this extraordinary increase, it is difficult to monitor and control China’s fishing activities in distant waters.

The other thing that we observed is that the most frequent type of fishing vessel is the trawler. Most of these Chinese trawlers can practice bottom trawling, which is the most damaging fishing technique available. We identified some 1,800 Chinese trawlers, more than double what was previously thought.

Furthermore, only 148 Chinese ships were registered in countries commonly regarded as flags of convenience. This shows that the incentives to adopt flags of convenience are few given the relatively lax regulation of the Chinese authorities.

Finally, of the nearly 1,000 vessels registered outside of China, more than half have African flags, especially in West Africa, where law enforcement is limited and fishing rights are often limited to vessels registered in the country, which explains why these Chinese ships have adopted local flags.

What can be said about the ownership of these fishing vessels? It is very complex. We analyzed a subsample of approximately 6,100 vessels to discover that only eight companies owned or operated more than 50 vessels each. That is, there are very few large Chinese fishing companies since small or medium-sized companies own most of them. However, this is only a facade, as many of these companies appear to be subsidiaries of larger corporations, suggesting some form of more centralized control. The lack of transparency hampers monitoring efforts and attempts to hold those responsible for malpractice accountable.

But another exciting facet of the ownership structure is that half of the 183 vessels suspected of involvement in illegal, unreported or unregulated fishing are owned by a handful of companies, and also that several of them are parastatal. This means that focusing on them could solve many problems because these companies own other ships.

There has been an extraordinary boom in Chinese fishing activities that is difficult to control. Chinese companies are free to operate and negotiate their access to fisheries in coastal states of developing countries without being monitored, especially in West Africa. This laxity contrasts with the policy of the European Union to reduce its fishing fleet and exercise greater control over its global operations.

This report is a data activist project that aims at redressing the unfair situation for nations, especially in West Africa, that cannot monitor and police their waters.


This is a version of an op-ed published in Spanish by


About the author: Miren Gutiérrez is passionate about human rights, journalism and the environment (with a weakness for fish), and optimistic about what can be done with data-based research, knowledge and communication. Prof. at the University of Deusto and Research Associate at the Overseas Development Institute. Miren is Research Associate @DATACTIVE.


[BigDataSur] The Challenge of Decolonizing Big Data through Citizen Data Audits [3/3]


Author: Katherine Reilly, Simon Fraser University, School of Communication

Data Stewardship through Citizen Centered Data Audits

In my previous two posts (the first & the second), I talked about the nature of data audits, and how they might be applied by citizens. Audits, I explained, check whether people are carrying out practices according to established standards or criteria with the goal of ensuring effective use of resources. As citizens we have many tools available at our disposal to audit companies, but when we audit companies according to their criteria, then we risk losing sight of our own needs in the community. The question addressed by this post is how to do data audits from a citizen point of view.

Thinking about data as a resource is a first step in changing our perspective on data audits. Our current data regime is an extractive data regime. As I explained in my first post, in the current regime, governments accept the central audit criteria of businesses, and on top of this, they establish the minimal protections necessary to ensure a steady flow of personal data to those same corporate actors.

I would like to suggest that we rethink our data regime in terms of data stewardship. The term ‘stewardship’ is usually applied to the natural environment. A forest might be governed by a stewardship plan which lays out the rights and responsibilities of resource use. Stewardship implies a plan for the management of those resources, both so that they can be sustained, and also so that everyone can enjoy them.

If the raw material produced by the data forest is our personal information, then we are the trees, and we are being harvested. Our data stewardship regime is organized to support that process, and audits are the means to enforce it. The main beneficiaries of the current data stewardship regime are companies who harvest and process our data. Our own benefits – our right to walk through the forest and enjoy the birds, or our right to profit from the forest materially – are not contemplated in the current stewardship regime.

It is tempting to conclude that audits are to blame, but really, evaluation is an agnostic concept. What matters are the criteria – the standards to which we hold corporate actors. If we change the standards of the data regime, then we change the system. We can introduce principles of stewardship that reflect the needs of community members. To do this, we need to start from the audit criteria that represent the localized concerns of situated peoples.

To this end, I have started a new project in collaboration with 5 fellow data justice organizations in 5 countries in Latin America: Derechos Digitales in Chile, Karisma in Colombia, TEDIC in Paraguay, HiperDerecho in Peru and ObservaTIC in Uruguay. We will also enjoy the technical support of Sula Batsu in Costa Rica.

Our focus will be on identifying alternative starting points for data audits. We won’t start from the law, or the technology, or corporate policy. Instead, we will start from people’s lived experiences, and use these as a basis to establish criteria for auditing corporate use of personal data.

We will work with small groups who share a common identity and/or experience, and who are directly affected by corporate use of their personal data. For example, people with chronic health issues have a stake in how personal data, loyalty programs and platform delivery services mediate their relationship with pharmacies and pharmaceutical companies. The project will identify community collaborators who are interested in working with us to establish alternative criteria for evaluating those companies.

Our emerging methodology will use a funnel-like approach, starting from broad discussions about the nature of data, passing through explorations of personal practices and the role of data in them, and then landing on more specific and detailed explorations of specific moments or processes in which people share their personal data.

Once the group has learned something about the reality of data in their daily lives – and in particular the instances where data is of particular concern to them – we will facilitate group activities that help them identify their data needs, as well as the behaviors that would satisfy those needs. An example of a data need might be “I need to feel valued as a person and as a woman when I interact with the pharmacy.” A statement of how that need might be satisfied could be, for example, “I would feel more valued as a person and as a woman if the company changed its data collection categories.”

We are particularly interested in thinking through the application of community criteria to companies that have grown in power and influence during the Covid-19 pandemic. Companies like InstaCart, SkipTheDishes, Rappi, Zoom, and Amazon are uniquely empowered to control urban distribution chains that affect the welfare of millions. What do community members require from these companies in terms of their data practices, and how would they fare against an audit based on those criteria?

We find inspiration for alternative audit criteria in data advocacy projects that have been covered by DATACTIVE’s Big Data from the South Blog. For example, the First Nations Information Governance Centre (FNIGC) of Canada has established the principles of ownership, control, access and possession for the management of First Nations data, and New Zealand has adopted Maori knowledge protocols for information systems used in primary health care provision (as reported by Anna Carlson). Meanwhile, the Mexican organization Controla tu Gobierno argues that we need to view data “less as a commodity – which is the narrative that constantly tries to make us understand data as the new oil – and more as a source of meaning” (Guillen Torres and Mayli Sepulveda, 2017).

From examples like these, and given the concept of data stewardship, we can begin to see that data is only as valuable as the criteria used to assess it, and so we urgently need alternative criteria that reflect the desires, needs and rights of communities.

How would corporate actors fare in an audit based on these alternative criteria? How would such a process reposition the value of data within the community? Who should carry out these evaluative processes, and how can they work together to create a more equitable data stewardship regime that better serves the needs of communities?

By answering these questions, we can move past creating data literate subjects for the existing data stewardship regime. Instead, we can open space for discussion about how we actually want our data resources to be used. In a recent Guardian piece, Hare argued that “The GDPR protects data. To protect people, we need a bill of rights, one that protects our civil liberties in the age of AI.” The content of that bill of rights requires careful contemplation. Citizen data audits allow us to think creatively about how data stewardship regimes can serve the needs of communities, and from there we can build out the legal frameworks to protect those rights.


About the author: Dr. Katherine Reilly is Associate Professor in the School of Communication at Simon Fraser University in Vancouver, Canada. She is the recipient of a SSHRC Partnership Grant and an International Development Research Centre grant to explore citizen data audit methodologies alongside Derechos Digitales in Chile, Fundacion Karisma in Colombia, Sula Batsu in Costa Rica, TEDIC in Paraguay, HiperDerecho in Peru, and ObservaTIC in Uruguay.

WomenonWeb censored in Spain as reported by Magma

Author: Vasilis Ververis

The Magma project just published new research on censorship concerning Women on Web, a non-profit organization providing support to women and pregnant people. The article describes how the major ISPs in Spain are blocking the Women on Web website. Spanish ISPs have been blocking this website by means of DNS manipulation, TCP reset, and HTTP blocking with the use of a Deep Packet Inspection (DPI) infrastructure. Our data analysis is based on network measurements from OONI data. This is the first time that we observe Women on Web being blocked in Spain.
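For readers who want to try a similar analysis, here is a minimal sketch of how measurements might be tallied per network. It is not Magma's pipeline. It assumes each record has already been downloaded and trimmed to three fields that, to my understanding, appear in OONI Web Connectivity measurements: `input` (the tested URL), `probe_asn` (the network of the probe) and `test_keys.blocking` (for example "dns", "tcp_ip", "http-failure", "http-diff", or false when nothing anomalous was detected). Treat those field names and values as assumptions to check against the OONI documentation; the sample records, the documentation-range AS numbers and the helper name `tally_blocking` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical, heavily trimmed measurement records.
MEASUREMENTS = [
    {"input": "https://www.example.org/", "probe_asn": "AS64500",
     "test_keys": {"blocking": "dns"}},
    {"input": "https://www.example.org/", "probe_asn": "AS64500",
     "test_keys": {"blocking": "dns"}},
    {"input": "https://www.example.org/", "probe_asn": "AS64501",
     "test_keys": {"blocking": "http-failure"}},
    {"input": "https://www.example.org/", "probe_asn": "AS64502",
     "test_keys": {"blocking": False}},
    {"input": "https://other.example.net/", "probe_asn": "AS64500",
     "test_keys": {"blocking": "tcp_ip"}},
]


def tally_blocking(measurements, url):
    """Count blocking verdicts for one URL, grouped by the probe's network."""
    per_asn = defaultdict(Counter)
    for m in measurements:
        if m.get("input") != url:
            continue
        verdict = m.get("test_keys", {}).get("blocking")
        # False or a missing value means no blocking was flagged.
        label = verdict if verdict else "none"
        per_asn[m.get("probe_asn", "unknown")][label] += 1
    return {asn: dict(counts) for asn, counts in per_asn.items()}


for asn, counts in sorted(tally_blocking(MEASUREMENTS, "https://www.example.org/").items()):
    print(asn, counts)
```

A single anomalous measurement is weak evidence on its own; consistent verdicts across many measurements on the same network are what make a blocking claim credible.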

About Magma: Magma aims to build a scalable, reproducible, standard methodology on measuring, documenting and circumventing internet censorship, information controls, internet blackouts and surveillance in a way that will be streamlined and used in practice by researchers, front-line activists, field-workers, human rights defenders, organizations and journalists.

About the author: Vasilis Ververis is a research associate with DATACTIVE and a practitioner of the principles ~ undo / rebuild ~ the current centralization model of the internet. Their research deals with internet censorship and investigation of collateral damage via information controls and surveillance. Some recent affiliations: Humboldt-Universität zu Berlin, Germany; Universidade Estadual do Piaui, Brazil; University Institute of Lisbon, Portugal.

[BigDataSur] The Challenge of Decolonizing Big Data through Citizen Data Audits [2/3]


A First Attempt at Citizen Data Audits

Author: Katherine Reilly, Simon Fraser University, School of Communication

In the first post in this series, I explained that audits are used to check whether people are carrying out practices according to established standards or criteria. They are meant to ensure effective use of resources. Corporations audit their internal processes to make sure that they comply with corporate policy, while governments audit corporations to make sure that they comply with the law.

There is no reason why citizens or watchdogs can’t carry out audits as well. In fact, data privacy laws include some interesting frameworks that can facilitate this type of work. In particular, the EU’s General Data Protection Regulation (GDPR) gives you the right to know how corporations are using your personal data, and also the ability to access the personal data that companies hold about you. This right is reproduced in the privacy legislation of many countries around the world from Canada and Chile to Costa Rica and Peru, to name just a few.

With this in mind, several years ago the Citizen Lab at the University of Toronto set up a website called Access My Info which helps people access the personal data that companies hold about them. Access My Info was set up as an experiment, so the site only includes a fixed roster of Canadian telecommunications companies, fitness trackers, and dating apps. It walks users through the process of submitting a personal data request to one of these companies, and then tracks whether the companies respond. The goal of this project was to crowdsource insights from citizens that would help researchers learn what companies know about their clients, how companies manage personal data, and who companies share data with. The results of this work have been used to advocate for changes to digital privacy laws.

Using this model as a starting point, in 2019, my team at SFU, and a team from the Peruvian digital rights advocate HiperDerecho, set up a website called SonMisDatos (Son Mis Datos translates as “It’s My Data”.) Son Mis Datos riffed on the open source platform developed by Access My Info, but made several important modifications. In particular, HiperDerecho’s Director, Miguel Morachimo, made the site database-driven so that it was easier to update the roster of corporate actors or their contact details. Miguel also decided to focus on companies that have a more direct material impact on the daily lives of Peruvians – such as gas stations, grocery stores and pharmacies. These companies have loyalty programs that are involved in collecting personal data about users.

Then we took things one step further. We used SonMisDatos to organize citizen data audits of Peruvian companies. HiperDerecho mobilized a team of people who work on digital rights in Peru, and we brought them together at two workshops. At the first workshop, we taught participants about their rights under Peru’s personal data protection laws, introduced SonMisDatos, and asked everyone to use the site to ask companies for access to their personal data. Companies need time to fulfill those requests, so then we waited for two months. At our second workshop, participants reported back on the results of their data requests, and then I shared a series of techniques for auditing companies on the basis of the personal data people had been able to access.

Our audit techniques explored the quality of the data provided, corporate compliance with data laws, how responsive companies were to data requests, the quality of their informed consent process, and several other factors. My favorite audit technique reflected a special feature of the data protection laws of Peru. In that country, companies are required to register databases of personal information with a state entity. The registry, which is published online, includes lists of companies, the titles of their databases, as well as the categories of data collected by each database. (The government does not collect the contents of the databases, it only registers their existence.)

With this information, our auditors were able to verify whether the data they got back from corporate actors was complete and accurate. In one case, the registry told us that a pharmaceutical company was collecting data about whether clients had children. However, in response to an access request, the company only provided lists of purchases organized by date, SKU number, quantity and price. Our auditors were really bothered by this discovery, because it suggested that the company was making inferences about clients without telling them. Participants wondered how the company was using these inferences, and whether it might affect pricing, customer experience, access to coupons, or the like.
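The comparison at the heart of this technique is a simple set difference: the categories the registry says a company collects, minus the categories that actually came back in the access response. The sketch below illustrates that idea only. The category names are hypothetical labels loosely modelled on the pharmacy case, the helper name `undisclosed_categories` is made up, and a real audit would first have to map the registry's wording onto the fields in the company's response by hand.

```python
def normalize(items):
    """Lower-case and trim category labels so trivial differences do not matter."""
    return {item.strip().lower() for item in items}


def undisclosed_categories(registered, disclosed):
    """Categories listed in the public registry but absent from the access response."""
    return sorted(normalize(registered) - normalize(disclosed))


# Hypothetical labels: what the registry lists versus what the company sent back.
REGISTERED = ["Purchase date", "Product code", "Quantity", "Price", "Has children"]
DISCLOSED = ["purchase date", "product code", "quantity", "price"]

print(undisclosed_categories(REGISTERED, DISCLOSED))  # ['has children']
```

An empty result does not prove the response was complete, since the registry only records categories and not contents, but a non-empty one gives auditors a concrete question to put to the company.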

In another case, one of our auditors subscribed to DirecTV. To complete this process, he needed to provide his cell phone number plus his national ID number. He later realized that he had accidentally typed in the wrong ID number, because he began receiving cell phone spam addressed to another person. This was exciting, because it allowed us to learn which companies were buying personal data from DirecTV. It also demonstrated that DirecTV was doing a poor job of managing their customers’ privacy and security! However, during the audit we also looked back at DirecTV’s terms of service. We discovered that they were completely up front about their intention to sell personal information to advertisers. Our auditors were sheepish about not reading the terms of the deal, but they also felt it was wrong that they had no option but to accept these terms if they wanted to access the service.

On the basis of this experience, we wrote a guidebook that explains how to use Son Mis Datos, and how to carry out an audit on the basis of the ‘access’ provisions in personal data laws. The guide helps users think through questions like: Is the data complete, precise, unmodified, timely, accessible, machine-readable, non-discriminatory, and free? Has this company respected your data rights? What does the company’s response to your data request suggest about its data use and data management practices?

We learned a tonne from carrying out these audits! We know, for instance, that the more specific the request, the more data a company provides. If you ask a company for “all of the personal data you hold about me” you will get less data than if you ask for “all of my personal information, all of my IP data, all of my mousing behaviour data, all of my transaction data, etc.”

Our experiments with citizen data audits also allow us to make claims about how companies define the term “personal data.” Often companies define personal data very narrowly to mean registration information (name, address, phone number, identification number, etc.). This lies in extreme contrast to the academic definition of personal data, which is any information that can lead to the identification of an individual person. In the age of big data, that means pretty much any digital traces you produce while logged in. Observations like these allow us to open up larger discussions about corporate data use practices, which helps to build citizen data literacy.

However, we were disappointed to discover that our citizen data audits worked to validate a data regime that is organized around the expropriation of resources from our communities. In my first blog post I explained that the 5 criteria driving data audits are profitability, risk, consent, security and privacy.

Since our audit originated with the law, with technology, and with corporate practices, we ended up using the audit criteria established by businesses and governments to assess corporate data practices. And this meant that we were checking to see if they were using our personal and community resources according to policies and laws that drive an efficient expropriation of those very same resources!

The concept of privacy was particularly difficult to escape. The idea that personal data must be private has been ingrained into all of us, so much so that the notion of pooled data or community data falls outside the popular imagination.

As a result, we felt that our citizen data audits did other people’s data audit work for them. We became watchdogs in the service of government oversight offices. We became the backers of corporate efficiencies. I’ve got nothing personal against watchdogs — they do important work — but what if the laws and policies aren’t worth protecting?

We have struggled greatly with the question of how to generate a conversation that moves beyond established parameters, and that situates our work in the community. With this in mind, we’ve begun to explore alternative approaches to thinking about and carrying out citizen data audits. That’s the subject of the final post in this series.


About the author: Dr. Katherine Reilly is Associate Professor in the School of Communication at Simon Fraser University in Vancouver, Canada. She is the recipient of a SSHRC Partnership Grant and an International Development Research Centre grant to explore citizen data audit methodologies alongside Derechos Digitales in Chile, Fundacion Karisma in Colombia, Sula Batsu in Costa Rica, TEDIC in Paraguay, HiperDerecho in Peru, and ObservaTIC in Uruguay.

[BigDataSur] The Challenge of Decolonizing Big Data through Citizen Data Audits [1/3]

Author: Katherine Reilly, Simon Fraser University, School of Communication

A curious thing happened in Europe after the creation of the GDPR. A whole new wave of data audit companies came into existence to service companies that use personal data. This is because, under the GDPR, private companies must audit their personal data management practices. An entire industry emerged around this requirement. If you enter “GDPR data audit” into Google, you’ll discover article after article covering topics like “the 7 habits of highly effective data managers” and “a checklist for personal data audits.”

Corporate data audits are central to the personal data protection frameworks that have emerged in the past few years. But among citizen groups, and in the community, data audits are rarely discussed. The word “audit” is just not very sexy. It brings to mind green eyeshades, piles of ledgers, and a judge-y disposition. Also, audits seem like they might be a tool of datafication and domination. If data colonization “encloses the very substance of life” (Halkort), then wouldn’t data auditing play into these processes?

In these three blog posts, I suggest that this is not necessarily the case. In fact, we need to develop the field of citizen data audits precisely because they offer us an indispensable tool for the decolonization of big data. The posts look at how audits contribute to upholding our current data regimes, an early attempt to realize a citizen data audit in Peru, and emerging alternative approaches. The following posts in the series will be published in the coming weeks:

  1. The Current Reality of Personal Data Audits [find below]

  2. A First Attempt at Citizen Data Audits [link]

  3. Data Stewardship through Citizen Centered Data Audits [link]


The Current Reality of Personal Data Audits

Before we can talk about citizen data audits, it is helpful to first introduce the idea of auditing in general, and then unpack the current reality of personal data audits. In this post, I’ll explain what audits are, the dominant approach to data audits in the world right now, and finally, the role that audits play in normalizing the current corporate-focused data regime.

The aim of any audit is to check whether people are carrying out practices according to established standards or criteria that ensure proper, efficient and effective management of resources.

By their nature, audits are twice removed from reality. In one sense, this is because auditors look for evidence of tasks rather than engaging directly in them. An auditor shows up after data has been collected, processed, stored or applied, and they study the processes used, as well as their impacts. They ask questions like “How were these tasks completed, and, were they done properly?”

Auditors are removed from reality in a second sense, because they use standards established by other people. An auditor might ask “Were these tasks done according to corporate policy, professional standards, or the law?” Auditors might gain insights into how policies, standards or laws might be changed, but their main job is to report on compliance with standards set by others.

Because auditors are removed from the reality of data work, and because they focus on compliance, their work can come across as distant, prescribed – and therefore somewhat boring. But when you step back and look at the bigger picture, audits raise many important questions. Who do auditors report to and why? Who sets the standards by which personal data audits are carried out? What processes does a personal data audit enforce? How might audits normalize corporate use of personal data?

We can start to answer these questions by digging into the criteria that currently drive corporate audits of personal data. These can be divided into two main aspects: corporate policy and government regulation.

On the corporate side, audits are driven by two main criteria: risk management and profitability. From a corporate point of view, personal data audits are no exception. Companies want to make sure that personal data doesn’t expose them to liabilities, and that use of this resource is contributing effectively and efficiently to the corporate bottom line.

That means that when they audit their use of personal data, they will check to see whether the cost of warehousing and managing data is worth the reward in terms of efficiencies or returns. They will also check to see whether the use of personal data exposes them to risk, given existing legal requirements, social norms or professional practices. For example, poor data management may expose a company to the risk of being sued, or the risk of alienating its clientele. Companies want to ensure that their internal practices limit exposure to risks that may damage their brand, harm their reputation, incur costs, or undermine productivity.

In total, corporate data audits are driven by, and respond to, corporate policies, and those policies are organized around ensuring the viability and success of the corporation.

Of course, the success of a corporation does not always align with the well-being of the community. We see this clearly in the world of personal data. Corporate hunger for personal data resources has often come at the expense of personal or community rights.

Because of this, governments insist that companies enforce three additional regulatory data audit criteria: informed consent, personal data security, and personal data privacy.

We can see these criteria reflected clearly in the EU’s General Data Protection Regulation. Under the GDPR, companies must ask customers for permission to access their data, and when they do so, they must provide clear information about how they intend to use that data.

They must also account for the personal data they hold, how it was gathered, from whom, to what end, where it is held, and who accesses it for what business processes. The purpose of these rules is to ensure companies develop clear internal data management policies and practices, and this, in turn, is meant to ensure companies are thinking carefully about how to protect personal privacy and data security. The GDPR requires companies to audit their data management practices on the basis of these criteria.

Taking corporate policy and government regulation together, personal data audits are currently informed by 5 criteria – profitability, risk, consent, security and privacy. What does this tell us about the management of data resources in our current data regime?

In a recent Guardian piece Stephanie Hare pointed out that “the GDPR could have … [made] privacy the default and requir[ed] us to opt in if we want to have our data collected. But this would hurt the ability of governments and companies to know about us and predict and manipulate our behaviour.” Instead, in the current regime, governments accept the central audit criteria of businesses, and on top of this, they establish the minimal protections necessary to ensure a steady flow of personal data to those same corporate actors. This means that the current data regime (at least in the West) privileges the idea that data resides with the individual, and also the idea that corporate success requires access to personal data.

Audits work to enforce the collection of personal data by private companies, by ensuring that companies are efficient, effective and risk averse in the collection of personal data. They also normalize corporate collection of personal data by providing a built-in response to security threats and privacy concerns. When the model fails, for instance when there is a security breach or privacy is disrespected, audits can be used to identify the glitch so that the system can continue its forward march.

And this means that audits can, indeed, serve as tools of datafication and domination. But I don’t think this necessarily needs to be the case. In the next post, I’ll explore what we’ve learned from experimenting with citizen data audits, before turning to the question of how they can contribute to the decolonization of big data in the final post.


About the author: Dr. Katherine Reilly is Associate Professor in the School of Communication at Simon Fraser University in Vancouver, Canada. She is the recipient of a SSHRC Partnership Grant and an International Development Research Centre grant to explore citizen data audit methodologies alongside Derechos Digitales in Chile, Fundacion Karisma in Colombia, Sula Batsu in Costa Rica, TEDIC in Paraguay, HiperDerecho in Peru, and ObservaTIC in Uruguay.

[BigDataSur] Data journalism without data: challenges from a Brazilian perspective

Author: Peter Füssy

For the last decade, data journalism has attracted attention from scholars, some of whom have provided distinct definitions in order to understand the changes in journalistic practices. Each definition emphasizes a particular aspect of data journalism, from new forms of collaboration to open-source culture (Coddington, 2014). Yet, even among clashing definitions, they all agree that there is no data journalism without data. But which data? Relevant data does not generate itself, and it is usually tied to power, economic, and/or political struggles (De Maeyer et al., 2014). While journalists in the Global North mostly benefit from open government mechanisms for public scrutiny, journalists working in countries with less transparency and democratic tradition still face infrastructural issues when putting together data and journalism (Borges-Rey, 2019; Wright, Zamith & Bebawi, 2019).

In the following paragraphs, I draw on academic research, reports, projects, and my own experience to briefly problematize one of the most recurrent challenges to data journalism in Brazil: access to information. Since relevant data is rarely available immediately, a considerable part of data-driven investigative projects in Brazil relies on the Freedom of Information (FOI) law, which obliges governments to provide data of public interest. Also known as Access to Information or Right to Information, these acts are an essential tool to increase transparency, accountability, citizens’ agency, and trust. Yet implementation of and compliance with the regulation in Brazil are inefficient at all levels of government (Michener, 2018; Abraji, 2019; Fonseca, 2020; Venturini, 2017).

More than just a bureaucratic issue inherited from years of dictatorship and lack of competences, this inefficiency is also a political act. As Torres argued, taking Mexico as an example, institutional resistance to transparency is carried out through subtle and non-political actions that diminish data activists’ agency and have the effect of producing or reinforcing inequalities (Torres, 2020). In the case of Brazil, however, recent reports imply that institutional resistance to transparency is not necessarily subtle. It may also be a political flag.

Opacity and Freedom of Information

According to Berliner, the first FOI act was passed in Sweden in 1766, but the recent wave follows the example of the United States’ act from 1966. After the US, there is no clear pattern for adoption; for example, Colombia passed a law in 1985, while the United Kingdom did so only in 2000. FOI acts are more likely to pass when there is a highly competitive domestic political environment, rather than pressure from civil society or international institutions (Berliner, 2014).

Sanctioned in 2011, the Brazilian FOI law came into effect only in 2012. In its first six years, 611.3 thousand requests were filed with the federal government alone (excluding state and municipal bodies). The average of 279 requests per day, or about 11 per hour, suggests how eager the population was to decentralise information. Although public authorities often give insufficient responses while recording the request as granted, it is possible to say the law was about to “stick”. Of the total requests, 458.4 thousand (75%) resulted in partial or full access to the requested information (Valente, 2018).
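
The averages quoted above can be checked with a quick calculation. The following is only a minimal sketch: the totals come from the figures cited from Valente (2018), while counting the six-year window as 6 × 365 days is an assumption, since the exact start and end dates are not given here.

```python
# Totals as quoted in the text (Valente, 2018).
total_requests = 611_300      # requests filed with the federal government
granted_requests = 458_400    # requests that resulted in partial or full access

# Assumption: the "first six years" are counted as 6 x 365 days.
days = 6 * 365

per_day = total_requests / days
per_hour = per_day / 24
granted_share = granted_requests / total_requests

print(f"{per_day:.0f} requests per day")
print(f"{per_hour:.1f} requests per hour")
print(f"{granted_share:.0%} of requests granted in part or in full")
```

Under this assumption the daily average and the share of granted requests match the figures in the text, and the hourly average comes out slightly above the rounded-down figure of 11.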

At the beginning of 2019, while President Jair Bolsonaro was making his first international appearance as the Brazilian head of state in Davos, Vice President General Hamilton Mourão signed a decree limiting access to information by allowing government employees to classify public data up to the top-secret level, which makes documents unavailable for 25 years (Folha de S.Paulo, 2019). Until then, this could be done only by the president and vice president, ministers of state, commanders of the armed forces and heads of diplomatic missions abroad. Facing a backlash from civil society, Bolsonaro lost support in Congress for the measure and withdrew the decree a few weeks later. Nonetheless, reports show that the issues regarding FOI requests are growing under his presidency.

Data collected from the Brazilian FOI electronic system by Agência Pública revealed that the federal government’s denials of requests on the grounds of a “fishing expedition” increased from 8 in 2018 to 45 in the first year of Bolsonaro’s presidency (Fonseca, 2020). The term “fishing expedition” is pejorative and usually related to secret or non-stated purposes, like using an unrelated investigation or questioning to find evidence to be used against an adversary in a different context. However, according to the Brazilian FOI law, the reason behind a request must not be taken into account when deciding whether to provide information.

At the same time, journalists’ perception of difficulties in retrieving information via FOI peaked in 2019, when 89% of the journalists interviewed described issues like answers after the legal deadline, missing information, data in closed formats, and denial of information (Abraji, 2019). In 2013, 60% reported difficulties, and the number dropped to 57% in 2015.

For example, after more than one year in office, Bolsonaro’s presidency still refuses to make public the guest list of his inauguration reception. In addition to the guest list, the government keeps secret more than R$ 15 million in expenses charged to corporate cards from the Presidency and Vice President’s Office. The secrecy remains even after the Supreme Court overturned it in November last year.

More from less

Despite the challenges, Brazilian journalists are following the quantitative turn in the field and creating innovative data-driven projects. As reported by the Brazilian Association of Investigative Journalism (Abraji), at least 1,289 news stories built on data from FOI requests were published from 2012 to 2019. In 2017, the “Ctrl+X” project, which scraped thousands of lawsuits to expose politicians trying to silence journalists in courts, won a prize in the Global Editors’ Data Journalism Awards.

In the following year, G1 won the public choice award with a project that tracked every single murder in the country for a week. The results from the “Violence Monitor” showed a total of 1,195 deaths, one every eight minutes. However, this project did not rely on FOI requests but on an unprecedented collaboration of 230 journalists employed by the biggest media group in Brazil, Globo. They gathered the data from scratch at police stations all over the country to tell the stories of the victims. Besides that, G1 partnered with Universidade de São Paulo for analysis and launched a campaign on TV and social media so that people could identify some of the victims.

Despite the lack of resources, freedom, and safety, these projects show that data journalism can be a tool to rebuild audiences’ trust. However, activism to break the resistance to transparency becomes an even greater challenge when opacity seems to be encouraged by institutional actors.


About the author

Peter is a journalist trying to explore new media in depth, from everyday digital practices to the undesired consequences of a highly connected environment. After more than 10 years of writing and multimedia reporting for some of the most relevant news outlets in Brazil, he is now a second-year Research Master’s student in Media Studies at the University of Amsterdam.



References

Berliner, Daniel. “The political origins of transparency.” The Journal of Politics 76.2 (2014): 479-491.

Borges-Rey, Eddy. “Data Journalism in Latin America: Community, Development and Contestation.” Data Journalism in the Global South. Palgrave Macmillan, Cham, 2019. 257-283.

Coddington, Mark. “Clarifying journalism’s quantitative turn: A typology for evaluating data journalism, computational journalism, and computer-assisted reporting.” Digital Journalism 3.3 (2015): 331-348.

De Maeyer, Juliette, et al. “Waiting for data journalism: A qualitative assessment of the anecdotal take-up of data journalism in French-speaking Belgium.” Digital Journalism 3.3 (2015): 432-446.

Fonseca, Bruno. “Governo Bolsonaro acusa cidadãos de ‘pescarem’ dados ao negar pedidos de informação pública.” Agência Pública. 6 Feb. 2020.

Michener, Gregory, Evelyn Contreras, and Irene Niskier. “From opacity to transparency? Evaluating access to information in Brazil five years later.” Revista de Administração Pública 52.4 (2018): 610-629.

Michener, Gregory, et al. “Googling the requester: Identity‐questing and discrimination in public service provision.” Governance (2019).

Valente, Jonas. “LAI: governo federal recebeu mais de 600 mil pedidos de informação.” Agência Brasil. 16 May 2018.

Venturini, Lilian. “Se transparência é regra, por que é preciso mandar divulgar salários de juízes?” Nexo Jornal. São Paulo, 3 Sept. 2017.

Wright, Kate, Rodrigo Zamith, and Saba Bebawi. “Data Journalism beyond Majority World Countries: Challenges and Opportunities.” Digital Journalism 7.9 (2019): 1295-1302.