Art as data-activism

The second day of the DATACTIVE closing workshop, hosted online by SPUI25, focused on artistic responses to datafication and mass data collection. The DATACTIVE team has interviewed many civil society actors from the field of digital rights, privacy, and technology activism. Artists take part in this field, but they often don’t figure as core actors in what is highlighted as data activism. In this event, we wanted to stage artistic interventions in particular, in order to tease out what artists do and can do, and what insights and further questions they generate. We spoke with Karla Zavala and Adriaan Odendaal from The Internet Teapot Design Studio, Manu Luksch, and Viola van Alphen about their work and ideas.

The Internet Teapot Design Studio is a Rotterdam-based collaboration that focuses on speculative and critical design projects and research. Karla and Adriaan explained how their work is based on the idea that ephemeral data processes have material effects in the world, and that it is necessary to focus on the conditions of their production. To bring this focus to their audiences, Karla and Adriaan organize co-creation workshops. In these workshops they aim to create counter-discourses, critical practices, and algorithmic literacy. Part of their approach is working with so-called ‘political action cards’, a way to design pathways through the datafied society. In this way, they stimulate creative responses and make people aware of processes of datafication and, for instance, machine learning. One example of such a creative response is participants writing a diary entry from the perspective of a biased machine vision system. By taking the position of the machine, they imagine processes such as inputs, black boxes, and outputs. Through these workshops, the audience engages with major conceptual themes such as Digital Citizenry, Surveillance Capitalism, and Digital Feminism.

Manu Luksch is an intermedia artist and filmmaker who interrogates conceptions of progress and scrutinises the effects of network technologies on social relations, urban space, and political structures. She talked about her work on predictive analysis and how she tries to involve publics in matters of algorithmic decision making. She showed us a part of ‘Algo-Rhythm’, a film that “scrutinizes the limitations, errors and abuses of algorithmic representations”. The film, which was shot in Dakar in collaboration with leading Senegalese hip-hop performers, addresses practices of political microtargeting. As she explained, the film is an example of how she frames her findings in a speculative narrative on the basis of observations and analyses. The film has been translated into eight languages and has been included in curricula across schools in Germany, which shows how her work finds a place outside the more classic art settings and operates as a societal intervention.

Viola van Alphen is an activist, writer, and the creative director and curator of Manifestations, an annual Art & Tech festival in Eindhoven, the Netherlands. Viola showed us trailers of Manifestations, and explained how ‘fun’ is an important element in getting the message across at an art and technology festival. She provided many examples of how artists try to materialize datafication and concerns around the digital economy. These included a baby outfit with an integrated smartphone, data poker games, and candy machines that give candies in return for personal data. She also told us about her experiences of hosting the exhibition online in virtual worlds, and how artists typically managed to push the boundaries of the platform and got kicked off it. This exhibits ‘the rule of platforms’, but it also shows how artists found countermeasures via alternative self-hosted and decentralized servers. Other examples included 3D-printed face masks that would confuse the Instagram facial recognition system, and a film that disclosed how corporations, including ones that sponsored the exhibition, take part in the weapons industry. For her, artists are important in making complex issues about datafication simple. They can boil them down to a key problem and make it tangible.

In the discussion, we touched upon a variety of issues. In our DATACTIVE workshop, we have talked about the question of whether the context of datafication has changed over the last five to six years. This question is important to us, because the project started in the wake of the Snowden disclosures, when questions about mass data collection and security were relatively new to the larger audience. The Internet Teapot Design Studio addressed how the practices of data tracing and identification are seemingly much more present in the public domain now. Adriaan mentioned how tactics of ‘gaming the system’ are present on social media, and not only amongst the typical tech activists. According to him, algorithmic awareness has become more a part of public discourse, as shown by Instagram influencers talking about gaming the algorithms. Karla added how, during social protests in Colombia, tips were being shared about how beauty filters can be repurposed to prevent online facial recognition software from recognizing people. They find it interesting to see user-generated content emerging that is critical of algorithms.

In response to the question about societal change, Manu pointed to the fact that datafication also existed before the digital, and that for years, fears of being outpaced by technological competition hindered data regulation. She stated that it is an urgent task to remind ourselves that data is not immaterial, and that it is not some substrate that we sweat out. She commented that, looking back, the notion of the ‘data shadow’, a concept that has been used to explain our ‘data profiles’, was maybe an attractive but ‘unlucky choice’. Data is rather an extension that opens and closes doors. In other words, data has much more agency than being just a trace that we leave behind.

We also talked about the question of whether the artists follow up with their audiences. All participants work on awareness raising. But are people really empowered in the longer term? According to Viola, who regularly ‘tests out’ ideas for her exhibition with neighbors and friends, it is important to break out of one’s bubble. Art can touch individual people in their heart, and they might remember single art projects for years, but one needs to invest in speaking a variety of languages. Amongst her visitors are professionals, kids, refugees, and corporate stakeholders. Sustaining awareness is both a continuous and a customized process.

The Teapot Design Studio does see communities emerging that keep in touch via social media after workshops. The studio can function as a stepping stone for people to get familiar with the topic, after which they might hopefully become interested in bigger events such as Ars Electronica or Mozilla Fest.

We concluded the event with the following question: Looking to the future, what methods are needed? What approaches would you teach art students? The Teapot studio stated that one shouldn’t be intimidated by tech in a material way. They added that digital media is not new: people need to work on understanding what the post-digital is and what its aesthetics are. Manu advises people to take their time to become data literate, develop their sense for values (including values and skills associated with the analogue space and time), and never stop dreaming. Viola states that art projects need to be easy and digestible, with only one headline. If people don’t understand it in one minute, they are off again.

There is much more to know. Watch the video of our event to hear Karla and Adriaan explain what ‘teapots’ have to do with the internet, to understand how Manu has investigated the way legal regimes co-shape what is returned as ‘an image’ after doing FOIA requests in the context of CCTV surveillance, and to hear Viola reflect upon how robots can provide multi-sensory experiences and raise questions about war. The DATACTIVE team is looking forward to following the work of the speakers in the coming years. Some of the work discussed in the event is also accessible through our website.

The first day of DATACTIVE’s final event also featured a more condensed, albeit exciting, panel dedicated to the intersections between data / art / activism. In addition to the artists already mentioned above, we also had the opportunity to get a peek at the work of Joana Moll, a Barcelona/Berlin-based artist and researcher whose work critically explores the way techno-capitalist narratives affect the alphabetization of machines, humans and ecosystems. Stay tuned for more info on this event in an upcoming post!

Niels’ research featured in the New York Times

TL;DR: have a look at the piece in the New York Times that covers Niels’ work.

During the research Niels did for DATACTIVE, which culminated in his thesis and a recent paper in New Media & Society, he actively participated in the Internet Engineering Task Force (IETF). The IETF is one of the main standards and governance bodies of the Internet. While working there, Niels worked together with others, such as Mallory Knodel and Corinne Cath, on addressing exclusionary language in technical standards. An important part of that work was publishing this document, which sparked an extensive discussion in the IETF that has not been resolved to this day. You can read more about it in the New York Times piece.

BigBang Sprint at IETF110 Hackathon

When: March 1-3, 2021

The BigBang project will be working on improving its tool for mailing list analysis at the IETF 110 Hackathon.

BigBang is an open source research project that studies collaboration and contention in digital infrastructure projects and governance institutions. We do this by combining data science techniques with qualitative methods. For example, with BigBang you can analyze participation, affiliation, gender, and networks in the IETF, ICANN, RIPE, IEEE, or the 3GPP.

We very much welcome both technical and non-technical contributors! BigBang is built on the scientific Python stack, and we use Jupyter notebooks to make the analysis transparent and accessible.
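To give newcomers a feel for what "analyzing participation" in a mailing list can mean, here is a minimal sketch that counts messages per sender. It uses only the Python standard library and is not BigBang's own API; the function name `participation_by_sender` and the sample messages are hypothetical, made up for illustration.

```python
from collections import Counter
from email import message_from_string
from email.utils import parseaddr


def participation_by_sender(messages):
    """Count how many messages each sender address contributed."""
    counts = Counter()
    for msg in messages:
        # parseaddr splits 'Name <addr>' into (name, addr).
        _, address = parseaddr(msg.get("From", ""))
        if address:
            counts[address.lower()] += 1
    return counts


# Hypothetical sample messages standing in for a mailing list archive.
raw_messages = [
    "From: Alice <alice@example.org>\nSubject: Draft review\n\nComments inline.",
    "From: Bob <bob@example.net>\nSubject: Re: Draft review\n\nThanks.",
    "From: alice@example.org\nSubject: Re: Draft review\n\nOne more point.",
]
messages = [message_from_string(raw) for raw in raw_messages]
counts = participation_by_sender(messages)
print(counts.most_common())
```

A real archive would be far messier (changing addresses, aliases, missing headers), which is part of what makes this kind of analysis a research problem rather than a one-liner.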

To join the IETF 110 Hackathon, please register using the link from the Hackathon website. Registration is free!

We intend to work on (some of) the following issues during the hackathon:

– Integration and analysis of 3GPP and IEEE mailing lists
– Integration with the INDELab conversationkg tool
– Produce instructional videos
– Improve linking across datasets (such as the datatracker and mailing lists)
– Query/notebook design to support projects from the research community
– Discussion of Star’s boundary object vs. Luhmann’s structural coupling
– The operationalization of _your_ research question!

Before the Hackathon, the BigBang project will have a one-hour team meeting on Friday, February 26, at 9:00 ET / 14:00 GMT / 15:00 CET. All who are curious about the project are welcome to attend. You can join via this link: https://uva-live.zoom.us/s/6365963924

Please don’t hesitate to write Seb (sbenthall at gmail dot com) if you have any questions about the BigBang project or the IETF 110 sprint, or if you have suggestions for research questions!

DATACTIVE protests lack of ethical review in the UvA-Huawei collaboration

DATACTIVE, together with Bits of Freedom, the Data Justice Project and many individual scientists, signed the Funding Matters statement protesting the collaboration of a joint University of Amsterdam and Vrije Universiteit project with Huawei. While collaboration with companies is not problematic per se, it is important that such collaborations undergo careful ethical scrutiny. Standards for structural reviews of the societal impact of such collaborations are currently not in place. Huawei has been accused of collaborating with the Chinese government in human rights violations against the Uyghur people as well as facilitating surveillance in Uganda.

[blogpost] Teaching Students to Question Mr. Robot: Working to Prevent Algorithmic Bias in Educational Artificial Intelligence

Author: Erinne Paisley

Introduction

With the onset of the COVID-19 pandemic, classrooms around the world have moved online. Students from K-12 to university level are turning to their computers to stay connected to teachers and continue their education. This move online raises questions about the appropriateness of technologies in the classroom and how close to a utopian “Mr. Robot” we can, or should, get. One of the most contested technological uses in the classroom is the adoption of Artificial Intelligence (AI) to teach.

AI in Education

AI includes many practices that process information in a way similar to how humans process information. Human intelligence is not one-dimensional and neither is AI, meaning AI includes many different techniques and addresses a multitude of tasks. Two of the main AI techniques that have been adapted into potential educational AI are automation and adaptive learning. Automation means computers are pre-programmed to complete tasks without the input of a human. Adaptive learning indicates that these automated systems can adjust themselves based on use and become more personalized.

The potential of combining these AI techniques into some type of robot teacher, or “Mr. Robot”, sounds like something out of a sci-fi cartoon, but it is already a reality for some. A combination of these AI techniques has already been used to assess students’ prior and ongoing learning levels, place students in appropriate subject levels, schedule classes, and individualize instruction. In the Mississippi Department of Education, in the United States, a shortage of teachers has been addressed through the use of an AI-powered online learning program called “Edgenuity”. This program automates lesson plans following the format of warm-up, instruction, summary, assignment, and quiz.

A screenshot from an Edgenuity lesson plan.

Despite how utopian an AI-powered classroom may sound, there are some significant issues of inequality and social injustice that are supported by these technologies. In September 2019, the United Nations Educational, Scientific and Cultural Organization (UNESCO), along with a number of other organizations, hosted a conference titled “Where Does Artificial Intelligence Fit in the Classroom?” that explored these issues. One of the main concerns raised was algorithmic bias.

Algorithmic Bias in Education AI

The mainstream attitude towards AI is still one of faith – faith that these technologies are, as the name says, intelligent as well as objective. However, AI bias illustrates how these new technologies are very far from neutral. Joy Buolamwini explains that the biases, conscious or subconscious, of those who write the code become part of the digital systems themselves. This creates systems especially skewed against people of colour, women, and other minorities, who are statistically underrepresented in the process of creating this code, including AI code. For instance, the latest AI application pool for Stanford University in the United States was 71% male.

Joy Buolamwini’s TEDx talk on algorithmic bias.

In the educational sector, people of colour, girls, and other minorities are already marginalized. Because of this, there is the concern that AI in the classroom that has encoded biases would further these inequalities, for instance by trapping low-income and minority students in low-achievement tracks. This would create a cycle of poverty, supported by this educational framework, instead of having human teachers address students on an individual level and offer specialized support and attention to those facing adversity.

However, the educational field already has its own biases embedded in it – both within individual teachers and throughout the system more generally. Viewed in this way, the increased use of AI in classrooms creates an opportunity to reduce bias, if the technology is designed in a way that directly aims to address these issues. The work of designing AI that addresses these issues and aims to create progressive technologies has been taken on by teachers, librarians, students, and anyone in between.

Learning to Fight Algorithmic Bias

By including more voices and perspectives in the process of coding AI technologies, algorithmic bias can be prevented and, instead, a technological system that supports a socially just classroom can be built. In this final section, I will highlight two existing educational projects aimed at teaching students of all ages to identify and fight algorithmic bias while creating technology that makes for a more equal classroom.

Algorithmic Bias Lesson Plans

The use of AI to create a more socially just educational system can start in the classroom, as Blakeley Payne showed when she ran a week-long ethics in AI course for 10 to 14-year-olds in 2019. The course included lessons on open-source coding and AI ethics, and ultimately taught students to both understand and fight against algorithmic bias. The lesson plans themselves are available for free online for any classroom to incorporate into their own lesson plans – even from home.

Students learn how to identify algorithm bias during the one-week course.

Blakeley Payne’s one-week program focuses on ages 10-14 to encourage students to become interested in and passionate about issues of algorithmic bias, and the STEM field more broadly, from a young age. Students work on simple activities such as writing an algorithm for the “best peanut butter and jelly sandwich” in order to practice questioning algorithmic bias. This activity in particular has them question what “best” means. Does it mean best looking? Best tasting? Who decides what this means and what are the implications of this?
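The point of the sandwich exercise can be made concrete in a few lines of Python. The sketch below is not taken from the course materials; the sandwiches, scores, and judges are entirely made up. It simply shows that the same "algorithm" picks a different winner depending on whose definition of "best" is built into it.

```python
# Hypothetical sandwiches with made-up scores on a 1-10 scale.
sandwiches = {
    "classic": {"taste": 7, "looks": 5, "cost": 9},
    "gourmet": {"taste": 9, "looks": 8, "cost": 3},
    "rainbow": {"taste": 5, "looks": 10, "cost": 6},
}


def best_sandwich(options, weights):
    """Return the option with the highest weighted score."""
    def score(name):
        return sum(weights.get(key, 0) * value
                   for key, value in options[name].items())
    return max(options, key=score)


# Each "judge" encodes a different idea of what "best" means.
tasting_judge = {"taste": 1.0}
photo_judge = {"looks": 1.0}
budget_judge = {"taste": 0.5, "cost": 0.5}

print(best_sandwich(sandwiches, tasting_judge))  # gourmet
print(best_sandwich(sandwiches, photo_judge))    # rainbow
print(best_sandwich(sandwiches, budget_judge))   # classic
```

The code never changes between the three calls; only the weights do. Whoever chooses the weights decides the outcome, which is exactly the question the activity asks students to raise.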

Non-profits such as Girls Who Code are also actively working to design lesson plans and activities for young audiences that teach critical thinking and design when it comes to algorithms, including those used in AI. The organization runs after-school clubs for girls in grades 3-12, summer intensives, and college programs for alumni. Their programs focus technically on developing coding skills but also place great emphasis on diversifying the STEM fields and creating equitable code.

Conclusion

A future with AI in the classroom is inevitable. This may not mean every teacher becomes robotic, but the use of AI and other technologies in the educational field is already happening. Although this raises concerns about algorithmic bias in the education system, it also creates more opportunities to re-think how technologies can be used to create a more socially just educational system. As we have seen through existing educational programs that teach algorithmic bias, even at the kindergarten age, interest in learning, questioning, and re-thinking algorithms can easily be nurtured. The answer to how we create a more socially just educational system through AI is simple: just ask the students.

About the Author

Erinne Paisley is a current Research Media Masters student at the University of Amsterdam and completed her BA at the University of Toronto in Peace, Conflict and Justice & Book and Media Studies. She is the author of three books on social media activism for youth with Orca Book Publishing.

[BigDataSur] Data activism in action: The gigantic size and impact of China’s fishing fleet revealed using big data analytics and algorithms

Author: Miren Gutiérrez

As we grapple with the COVID-19 pandemic, another crisis looms over our future: overfishing. Fishing fleets and unsustainable practices have been emptying the oceans of fish and filling them with plastic. Although other countries are also responsible for overfishing, China has a greater responsibility. Why is looking at the Chinese fleet important? China has almost 17,000 vessels capable of distant-water fishing, as revealed for the first time by an investigative report published by the Overseas Development Institute, London.

As part of a team of researchers at the Overseas Development Institute, London, I had access to the world’s largest database of fishing vessels. Combining these data with satellite data from the automatic identification system (which indicates the vessels’ movements), we were able to observe their behaviour for two years (2017 and 2018). To do this, we employed big data analytical techniques, machine learning algorithms, and geographic information systems to describe the fleet and analyze how it behaved.

The first thing we noticed is that China’s fishing fleet is five to eight times larger than any previous estimate. We identified a total of 16,966 Chinese fishing vessels able to fish in “distant waters”, that is, outside its exclusive economic zone, including some 12,500 vessels observed outside Chinese waters during the same period.
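For readers curious about what combining a vessel database with position data can look like, here is a deliberately simplified sketch. It is not the method used in the report: the vessel IDs, the positions, and above all the rectangular `HOME_BOX` are hypothetical stand-ins, and a real analysis would test positions against actual exclusive economic zone boundaries in a geographic information system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ping:
    """One position report for a vessel (hypothetical, simplified record)."""
    vessel_id: str
    lat: float
    lon: float


# Crude rectangle standing in for "home waters". This is an assumption made
# for illustration; real EEZ boundaries are irregular polygons.
HOME_BOX = {"lat_min": 0.0, "lat_max": 10.0, "lon_min": 0.0, "lon_max": 10.0}


def in_home_waters(ping, box=HOME_BOX):
    """True if the ping falls inside the rectangular home-waters box."""
    return (box["lat_min"] <= ping.lat <= box["lat_max"]
            and box["lon_min"] <= ping.lon <= box["lon_max"])


def vessels_seen_outside(registry, pings):
    """IDs of registered vessels with at least one ping outside home waters."""
    return {p.vessel_id for p in pings
            if p.vessel_id in registry and not in_home_waters(p)}


# Made-up registry (vessel ID -> gear type) and made-up position reports.
registry = {"V1": "trawler", "V2": "longliner", "V3": "trawler"}
pings = [
    Ping("V1", 5.0, 5.0),
    Ping("V1", 20.0, 5.0),   # V1 leaves the box
    Ping("V2", 3.0, 4.0),    # V2 stays inside
    Ping("V9", 50.0, 50.0),  # not in the registry, so ignored
]
outside = vessels_seen_outside(registry, pings)
print(sorted(outside))  # ['V1']
```

Note that V3 never appears in the result simply because it never transmitted a position, which hints at why vessels that switch off their transponders are so hard to monitor.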

Why is this important? If China’s distant-water fishing (DWF) fleet is five to eight times larger than previous estimates, its impacts are inevitably more significant than previously estimated. This matters for two reasons. First, millions of people in coastal areas of developing countries depend on fishery resources for their subsistence and food security. Second, given this extraordinary increase, it is difficult to monitor and control China’s fishing activities in distant waters.

We also observed that the most frequent type of fishing vessel is the trawler. Most of these Chinese trawlers can practice bottom trawling, which is the most damaging fishing technique available. We identified some 1,800 Chinese trawlers, more than double the number previously thought.

Furthermore, only 148 Chinese ships were registered in countries commonly regarded as flags of convenience. This shows that the incentives to adopt flags of convenience are few given the relatively lax regulation of the Chinese authorities.

Finally, of the nearly 1,000 vessels registered outside China, more than half have African flags, especially in West Africa, where law enforcement is limited and fishing rights are often restricted to vessels registered in the country, which explains why these Chinese ships have adopted local flags.

What can be said about the ownership of these fishing vessels? It is very complex. We analyzed a subsample of approximately 6,100 vessels and found that only eight companies owned or operated more than 50 vessels each. That is, there appear to be very few large Chinese fishing companies, since small or medium-sized companies own most of the vessels. However, this is only a facade, as many of these companies appear to be subsidiaries of larger corporations, suggesting some form of more centralized control. The lack of transparency hampers monitoring efforts and attempts to hold those responsible for malpractice accountable.
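The "facade" effect described above can be illustrated with a toy example. In this sketch (made-up company names, a made-up threshold of 2 instead of 50, and a hypothetical `parent_of` mapping that a real study would have to piece together from corporate records), no company looks large when vessels are counted per direct owner, yet a large owner emerges once subsidiaries are rolled up to their parent.

```python
from collections import Counter


def large_owners(vessel_owners, threshold, parent_of=None):
    """Owners with more than `threshold` vessels.

    If `parent_of` is given, each direct owner is first replaced by its
    parent company, so subsidiaries are counted together.
    """
    parent_of = parent_of or {}
    counts = Counter(parent_of.get(owner, owner)
                     for owner in vessel_owners.values())
    return {owner: n for owner, n in counts.items() if n > threshold}


# Made-up data: vessel ID -> direct owner.
vessel_owners = {
    "V1": "A Fishing", "V2": "A Fishing",
    "V3": "B Marine", "V4": "C Ocean", "V5": "D Seafood",
}
# Hypothetical subsidiary -> parent links.
parent_of = {"B Marine": "Holding X", "C Ocean": "Holding X",
             "D Seafood": "Holding X"}

direct = large_owners(vessel_owners, threshold=2)
rolled_up = large_owners(vessel_owners, threshold=2, parent_of=parent_of)
print(direct)     # {}
print(rolled_up)  # {'Holding X': 3}
```

The hard part in practice is not the counting but establishing the ownership links in the first place, which is where the lack of transparency bites.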

Another striking facet of the ownership structure is that half of the 183 vessels suspected of involvement in illegal, unreported or unregulated fishing are owned by a handful of companies, several of which are parastatal. This means that focusing on these companies could solve many problems, because they also own other ships.

There has been an extraordinary boom in Chinese fishing activities that is difficult to control. Chinese companies are free to operate and negotiate their access to fisheries in coastal states of developing countries without being monitored, especially in West Africa. This laxity contrasts with the policy of the European Union to reduce its fishing fleet and exercise greater control over its global operations.

This report is a data activist project that aims at redressing the unfair situation for nations, especially in West Africa, that cannot monitor and police their waters.

This is a version of an op-ed published in Spanish by eldiario.es.

 

About the author: Miren Gutiérrez is passionate about human rights, journalism and the environment (with a weakness for fish), and optimistic about what can be done with data-based research, knowledge and communication. She is a professor at the University of Deusto and a Research Associate at the Overseas Development Institute. Miren is Research Associate @DATACTIVE.

[BigDataSur] The Challenge of Decolonizing Big Data through Citizen Data Audits [3/3]

 

Author: Katherine Reilly, Simon Fraser University, School of Communication

Data Stewardship through Citizen Centered Data Audits

In my previous two posts (the first & the second), I talked about the nature of data audits, and how they might be applied by citizens. Audits, I explained, check whether people are carrying out practices according to established standards or criteria with the goal of ensuring effective use of resources. As citizens we have many tools at our disposal to audit companies, but when we audit companies according to their criteria, we risk losing sight of our own needs in the community. The question addressed by this post is how to do data audits from a citizen point of view.

Thinking about data as a resource is a first step in changing our perspective on data audits. Our current data regime is an extractive data regime. As I explained in my first post, in the current regime, governments accept the central audit criteria of businesses, and on top of this, they establish the minimal protections necessary to ensure a steady flow of personal data to those same corporate actors.

I would like to suggest that we rethink our data regime in terms of data stewardship. The term ‘stewardship’ is usually applied to the natural environment. A forest might be governed by a stewardship plan which lays out the rights and responsibilities of resource use. Stewardship implies a plan for the management of those resources, both so that they can be sustained, and also so that everyone can enjoy them.

If the raw material produced by the data forest is our personal information, then we are the trees, and we are being harvested. Our data stewardship regime is organized to support that process, and audits are the means to enforce it. The main beneficiaries of the current data stewardship regime are companies who harvest and process our data. Our own benefits – our right to walk through the forest and enjoy the birds, or our right to profit from the forest materially – are not contemplated in the current stewardship regime.

It is tempting to conclude that audits are to blame, but really, evaluation is an agnostic concept. What matters are the criteria – the standards to which we hold corporate actors. If we change the standards of the data regime, then we change the system. We can introduce principles of stewardship that reflect the needs of community members. To do this, we need to start from the audit criteria that represent the localized concerns of situated peoples.

To this end, I have started a new project in collaboration with five fellow data justice organizations in five countries in Latin America: Derechos Digitales in Chile, Karisma in Colombia, TEDIC in Paraguay, HiperDerecho in Peru and ObservaTIC in Uruguay. We will also enjoy the technical support of Sula Batsu in Costa Rica.

Our focus will be on identifying alternative starting points for data audits. We won’t start from the law, or the technology, or corporate policy. Instead, we will start from people’s lived experiences, and use these as a basis to establish criteria for auditing corporate use of personal data.

We will work with small groups who share a common identity and/or experience, and who are directly affected by corporate use of their personal data. For example, people with chronic health issues have a stake in how personal data, loyalty programs and platform delivery services mediate their relationship with pharmacies and pharmaceutical companies. The project will identify community collaborators who are interested in working with us to establish alternative criteria for evaluating those companies.

Our emerging methodology will use a funnel-like approach, starting from broad discussions about the nature of data, passing through explorations of personal practices and the role of data in them, and then landing on more specific and detailed explorations of specific moments or processes in which people share their personal data.

Once the group has learned something about the reality of data in their daily lives – and in particular the instances where data is of particular concern to them – we will facilitate group activities that help them identify their data needs, as well as the behaviors that would satisfy those needs. An example of a data need might be “I need to feel valued as a person and as a woman when I interact with the pharmacy.” A statement of how that need might be satisfied could be, for example, “I would feel more valued as a person and as a woman if the company changed its data collection categories.”

We are particularly interested in thinking through the application of community criteria to companies that have grown in power and influence during the Covid-19 pandemic. Companies like InstaCart, SkipTheDishes, Rappi, Zoom, and Amazon are uniquely empowered to control urban distribution chains that affect the welfare of millions. What do community members require from these companies in terms of their data practices, and how would they fare against an audit based on those criteria?

We find inspiration for alternative audit criteria in data advocacy projects that have been covered by DATACTIVE’s Big Data from the South Blog. For example, the First Nations Information Governance Centre (FNIGC) of Canada has established the principles of ownership, control, access and possession for the management of First Nations data, and New Zealand has adopted Maori knowledge protocols for information systems used in primary health care provision (as reported by Anna Carlson). Meanwhile, the Mexican organization Controla tu Gobierno argues that we need to view data “less as a commodity – which is the narrative that constantly tries to make us understand data as the new oil – and more as a source of meaning” (Guillen Torres and Mayli Sepulveda, 2017).

From examples like these, and given the concept of data stewardship, we can begin to see that data is only as valuable as the criteria used to assess it, and so we urgently need alternative criteria that reflect the desires, needs and rights of communities.

How would corporate actors fare in an audit based on these alternative criteria? How would such a process reposition the value of data within the community? Who should carry out these evaluative processes, and how can they work together to create a more equitable data stewardship regime that better serves the needs of communities?

By answering these questions, we can move past creating data literate subjects for the existing data stewardship regime. Instead, we can open space for discussion about how we actually want our data resources to be used. In a recent Guardian piece, Hare argued that “The GDPR protects data. To protect people, we need a bill of rights, one that protects our civil liberties in the age of AI.” The content of that bill of rights requires careful contemplation. Citizen data audits allow us to think creatively about how data stewardship regimes can serve the needs of communities, and from there we can build out the legal frameworks to protect those rights.

About the author: Dr. Katherine Reilly is Associate Professor in the School of Communication at Simon Fraser University in Vancouver, Canada. She is the recipient of an SSHRC Partnership Grant and an International Development Research Centre grant to explore citizen data audit methodologies alongside Derechos Digitales in Chile, Fundacion Karisma in Colombia, Sula Batsu in Costa Rica, TEDIC in Paraguay, HiperDerecho in Peru, and ObservaTIC in Uruguay.

WomenonWeb censored in Spain as reported by Magma

Author: Vasilis Ververis

The Magma project just published new research on censorship concerning womenonweb.org, a non-profit organization providing support to women and pregnant people. The article describes how the major ISPs in Spain are blocking womenonweb.org’s website. Spanish ISPs have been blocking this website by means of DNS manipulation, TCP resets, and HTTP blocking with the use of Deep Packet Inspection (DPI) infrastructure. Our analysis is based on OONI network measurement data. This is the first time that we have observed Women on Web being blocked in Spain.

About Magma: Magma aims to build a scalable, reproducible, standard methodology for measuring, documenting and circumventing internet censorship, information controls, internet blackouts and surveillance in a way that will be streamlined and used in practice by researchers, front-line activists, field-workers, human rights defenders, organizations and journalists.

About the author: Vasilis Ververis is a research associate with DATACTIVE and a practitioner of the principles ~ undo / rebuild ~ the current centralization model of the internet. Their research deals with internet censorship and investigation of collateral damage via information controls and surveillance. Some recent affiliations: Humboldt-Universität zu Berlin, Germany; Universidade Estadual do Piaui, Brazil; University Institute of Lisbon, Portugal.

[BigDataSur] The Challenge of Decolonizing Big Data through Citizen Data Audits [2/3]

 

A First Attempt at Citizen Data Audits

Author: Katherine Reilly, Simon Fraser University, School of Communication

In the first post in this series, I explained that audits are used to check whether people are carrying out practices according to established standards or criteria. They are meant to ensure effective use of resources. Corporations audit their internal processes to make sure that they comply with corporate policy, while governments audit corporations to make sure that they comply with the law.

There is no reason why citizens or watchdogs can’t carry out audits as well. In fact, data privacy laws include some interesting frameworks that can facilitate this type of work. In particular, the EU’s General Data Protection Regulation (GDPR) gives you the right to know how corporations are using your personal data, and also the ability to access the personal data that companies hold about you. This right is reproduced in the privacy legislation of many countries around the world from Canada and Chile to Costa Rica and Peru, to name just a few.

With this in mind, several years ago the Citizen Lab at the University of Toronto set up a website called Access My Info which helps people access the personal data that companies hold about them. Access My Info was set up as an experiment, so the site only includes a fixed roster of Canadian telecommunications companies, fitness trackers, and dating apps. It walks users through the process of submitting a personal data request to one of these companies, and then tracks whether the companies respond. The goal of this project was to crowdsource insights from citizens that would help researchers learn what companies know about their clients, how companies manage personal data, and who companies share data with. The results of this work have been used to advocate for changes to digital privacy laws.

Using this model as a starting point, in 2019 my team at SFU and a team from the Peruvian digital rights advocate HiperDerecho set up a website called SonMisDatos (Son Mis Datos translates as “It’s My Data”). Son Mis Datos riffed on the open source platform developed by Access My Info, but made several important modifications. In particular, HiperDerecho’s Director, Miguel Morachimo, made the site database-driven so that it was easier to update the roster of corporate actors or their contact details. Miguel also decided to focus on companies that have a more direct material impact on the daily lives of Peruvians – such as gas stations, grocery stores and pharmacies. These companies have loyalty programs that are involved in collecting personal data about users.

Then we took things one step further. We used SonMisDatos to organize citizen data audits of Peruvian companies. HiperDerecho mobilized a team of people who work on digital rights in Peru, and we brought them together at two workshops. At the first workshop, we taught participants about their rights under Peru’s personal data protection laws, introduced SonMisDatos, and asked everyone to use the site to ask companies for access to their personal data. Companies need time to fulfill those requests, so then we waited for two months. At our second workshop, participants reported back on the results of their data requests, and then I shared a series of techniques for auditing companies on the basis of the personal data people had been able to access.

Our audit techniques explored the quality of the data provided, corporate compliance with data laws, how responsive companies were to data requests, the quality of their informed consent process, and several other factors. My favorite audit technique reflected a special feature of the data protection laws of Peru. In that country, companies are required to register databases of personal information with a state entity. The registry, which is published online, includes lists of companies, the titles of their databases, as well as the categories of data collected by each database. (The government does not collect the contents of the databases; it only registers their existence.)

With this information, our auditors were able to verify whether the data they got back from corporate actors was complete and accurate. In one case, the registry told us that a pharmaceutical company was collecting data about whether clients had children. However, in response to an access request, the company only provided lists of purchases organized by date, SKU number, quantity and price. Our auditors were really bothered by this discovery, because it suggested that the company was making inferences about clients without telling them. Participants wondered how the company was using these inferences, and whether it might affect pricing, customer experience, access to coupons, or the like.
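
At its core, this registry check is a comparison between two lists: the categories a company has declared in the public registry, and the categories that actually appear in its response to an access request. The short Python sketch below illustrates the idea. It is not the tooling used in the workshops; the function name and the category labels are hypothetical and only loosely modelled on the pharmacy case.

```python
def undisclosed_categories(registered, returned):
    """Return registry categories that are absent from the company's response."""
    def normalize(labels):
        # Ignore differences in capitalization and stray whitespace.
        return {label.strip().lower() for label in labels}

    return sorted(normalize(registered) - normalize(returned))


# Hypothetical labels, loosely modelled on the pharmacy case described above.
registry_entry = ["Purchase history", "Contact details", "Has children"]
access_response = ["purchase history", "contact details"]

print(undisclosed_categories(registry_entry, access_response))
```

In practice the labels in a registry and in a company's response rarely match word for word, so an auditor would still need to map them onto each other by hand before a comparison like this becomes meaningful.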

In another case, one of our auditors subscribed to DirecTV. To complete this process, he needed to provide his cell phone number plus his national ID number. He later realized that he had accidentally typed in the wrong ID number, because he began receiving cell phone spam addressed to another person. This was exciting, because it allowed us to learn which companies were buying personal data from DirecTV. It also demonstrated that DirecTV was doing a poor job of managing its customers’ privacy and security! However, during the audit we also looked back at DirecTV’s terms of service. We discovered that they were completely up front about their intention to sell personal information to advertisers. Our auditors were sheepish about not reading the terms of the deal, but they also felt it was wrong that they had no option but to accept these terms if they wanted to access the service.

On the basis of this experience, we wrote a guidebook that explains how to use Son Mis Datos, and how to carry out an audit on the basis of the ‘access’ provisions in personal data laws. The guide helps users think through questions like: Is the data complete, precise, unmodified, timely, accessible, machine-readable, non-discriminatory, and free? Has this company respected your data rights? What does the company’s response to your data request suggest about its data use and data management practices?
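
The data quality questions in the guide can be treated as a simple checklist that an auditor fills in for each company response. The Python sketch below shows one way to record and summarize the answers. It is an illustration under our own assumptions, not part of the guidebook: the function name, the three summary groups, and the sample answers are hypothetical, and only the eight criteria come from the guide.

```python
# The eight data quality criteria named in the guide.
CRITERIA = (
    "complete", "precise", "unmodified", "timely",
    "accessible", "machine-readable", "non-discriminatory", "free",
)


def summarize_audit(answers):
    """Group criteria by an auditor's answers.

    answers maps a criterion to True (met) or False (not met).
    Criteria that are missing or set to None count as not assessed.
    """
    unknown = set(answers) - set(CRITERIA)
    if unknown:
        raise ValueError(f"Unknown criteria: {sorted(unknown)}")

    summary = {"met": [], "not met": [], "not assessed": []}
    for criterion in CRITERIA:
        answer = answers.get(criterion)
        if answer is True:
            summary["met"].append(criterion)
        elif answer is False:
            summary["not met"].append(criterion)
        else:
            summary["not assessed"].append(criterion)
    return summary


# Hypothetical answers for a single company response.
example = summarize_audit({"complete": False, "timely": True, "free": True})
print(example["not met"])
```

Keeping "not assessed" separate from "not met" matters here, because a participant often cannot judge a criterion such as "unmodified" from the response alone.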

We learned a tonne from carrying out these audits! We know, for instance, that the more specific the request, the more data a company provides. If you ask a company for “all of the personal data you hold about me” you will get less data than if you ask for “all of my personal information, all of my IP data, all of my mousing behaviour data, all of my transaction data, etc.”

Our experiments with citizen data audits also allow us to make claims about how companies define the term “personal data.” Often companies define personal data very narrowly to mean registration information (name, address, phone number, identification number, etc.). This lies in extreme contrast to the academic definition of personal data, which is any information that can lead to the identification of an individual person. In the age of big data, that means pretty much any digital traces you produce while logged in. Observations like these allow us to open up larger discussions about corporate data use practices, which helps to build citizen data literacy.

However, we were disappointed to discover that our citizen data audits worked to validate a data regime that is organized around the expropriation of resources from our communities. In my first blog post I explained that the 5 criteria driving data audits are profitability, risk, consent, security and privacy.

Since our audit originated with the law, with technology, and with corporate practices, we ended up using the audit criteria established by businesses and governments to assess corporate data practices. And this meant that we were checking to see if they were using our personal and community resources according to policies and laws that drive an efficient expropriation of those very same resources!

The concept of privacy was particularly difficult to escape. The idea that personal data must be private has been ingrained into all of us, so much so that the notion of pooled data or community data falls outside the popular imagination.

As a result, we felt that our citizen data audits did other people’s data audit work for them. We became watchdogs in the service of government oversight offices. We became the backers of corporate efficiencies. I’ve got nothing personal against watchdogs — they do important work — but what if the laws and policies aren’t worth protecting?

We have struggled greatly with the question of how to generate a conversation that moves beyond established parameters, and that situates our work in the community. With this in mind, we’ve begun to explore alternative approaches to thinking about and carrying out citizen data audits. That’s the subject of the final post in this series.

 

About the author: Dr. Katherine Reilly is Associate Professor in the School of Communication at Simon Fraser University in Vancouver, Canada. She is the recipient of a SSHRC Partnership Grant and an International Development Research Centre grant to explore citizen data audit methodologies alongside Derechos Digitales in Chile, Fundacion Karisma in Colombia, Sula Batsu in Costa Rica, TEDIC in Paraguay, HiperDerecho in Peru, and ObservaTIC in Uruguay.

[BigDataSur] The Challenge of Decolonizing Big Data through Citizen Data Audits [1/3]

Author: Katherine Reilly, Simon Fraser University, School of Communication

A curious thing happened in Europe after the creation of the GDPR. A whole new wave of data audit companies came into existence to service companies that use personal data. This is because, under the GDPR, private companies must audit their personal data management practices. An entire industry emerged around this requirement. If you enter “GDPR data audit” into Google, you’ll discover article after article covering topics like “the 7 habits of highly effective data managers” and “a checklist for personal data audits.”

Corporate data audits are central to the personal data protection frameworks that have emerged in the past few years. But among citizen groups, and in the community, data audits are very little discussed. The word “audit” is just not very sexy. It brings to mind green eyeshades, piles of ledgers, and a judge-y disposition. Also, audits seem like they might be a tool of datafication and domination. If data colonization “encloses the very substance of life” (Halkort), then wouldn’t data auditing play into these processes?

In these three blog posts, I suggest that this is not necessarily the case. In fact, we need to develop the field of citizen data audits precisely because they offer us an indispensable tool for the decolonization of big data. The posts look at how audits contribute to upholding our current data regimes, an early attempt to realize a citizen data audit in Peru, and emerging alternative approaches. The following posts in the series will be published in the coming weeks:

  1. The Current Reality of Personal Data Audits [find below]

  2. A First Attempt at Citizen Data Audits [link]

  3. Data Stewardship through Citizen Centered Data Audits [link]

 

The Current Reality of Personal Data Audits

Before we can talk about citizen data audits, it is helpful to first introduce the idea of auditing in general, and then unpack the current reality of personal data audits. In this post, I’ll explain what audits are, the dominant approach to data audits in the world right now, and finally, the role that audits play in normalizing the current corporate-focused data regime.

The aim of any audit is to check whether people are carrying out practices according to established standards or criteria that ensure proper, efficient and effective management of resources.

By their nature, audits are twice removed from reality. In one sense, this is because auditors look for evidence of tasks rather than engaging directly in them. An auditor shows up after data has been collected, processed, stored or applied, and they study the processes used, as well as their impacts. They ask questions like “How were these tasks completed, and were they done properly?”

Auditors are removed from reality in a second sense, because they use standards established by other people. An auditor might ask “Were these tasks done according to corporate policy, professional standards, or the law?” Auditors might gain insights into how policies, standards or laws might be changed, but their main job is to report on compliance with standards set by others.

Because auditors are removed from the reality of data work, and because they focus on compliance, their work can come across as distant, prescribed – and therefore somewhat boring. But when you step back and look at the bigger picture, audits raise many important questions. Who do auditors report to and why? Who sets the standards by which personal data audits are carried out? What processes does a personal data audit enforce? How might audits normalize corporate use of personal data?

We can start to answer these questions by digging into the criteria that currently drive corporate audits of personal data. These can be divided into two main aspects: corporate policy and government regulation.

On the corporate side, audits are driven by two main criteria: risk management and profitability. From a corporate point of view, personal data audits are no exception. Companies want to make sure that personal data doesn’t expose them to liabilities, and that use of this resource is contributing effectively and efficiently to the corporate bottom line.

That means that when they audit their use of personal data, they will check to see whether the costs of warehousing and managing data are worth the reward in terms of efficiencies or returns. They will also check to see whether the use of personal data exposes them to risk, given existing legal requirements, social norms or professional practices. For example, poor data management may expose a company to the risk of being sued, or the risk of alienating their clientele. Companies want to ensure that their internal practices limit exposure to risks that may damage their brand, harm their reputation, incur costs, or undermine productivity.

In sum, corporate data audits are driven by, and respond to, corporate policies, and those policies are organized around ensuring the viability and success of the corporation.

Of course, the success of a corporation does not always align with the well-being of the community. We see this clearly in the world of personal data. Corporate hunger for personal data resources has often come at the expense of personal or community rights.

Because of this, governments insist that companies enforce three additional regulatory data audit criteria: informed consent, personal data security, and personal data privacy.

We can see these criteria reflected clearly in the EU’s General Data Protection Regulation. Under the GDPR, companies must ask customers for permission to access their data, and when they do so, they must provide clear information about how they intend to use that data.

They must also account for the personal data they hold, how it was gathered, from whom, to what end, where it is held, and who accesses it for what business processes. The purpose of these rules is to ensure companies develop clear internal data management policies and practices, and this, in turn, is meant to ensure companies are thinking carefully about how to protect personal privacy and data security. The GDPR requires companies to audit their data management practices on the basis of these criteria.

Taking corporate policy and government regulation together, personal data audits are currently informed by 5 criteria – profitability, risk, consent, security and privacy. What does this tell us about the management of data resources in our current data regime?

In a recent Guardian piece Stephanie Hare pointed out that “the GDPR could have … [made] privacy the default and requir[ed] us to opt in if we want to have our data collected. But this would hurt the ability of governments and companies to know about us and predict and manipulate our behaviour.” Instead, in the current regime, governments accept the central audit criteria of businesses, and on top of this, they establish the minimal protections necessary to ensure a steady flow of personal data to those same corporate actors. This means that the current data regime (at least in the West) privileges the idea that data resides with the individual, and also the idea that corporate success requires access to personal data.

Audits work to enforce the collection of personal data by private companies, by ensuring that companies are efficient, effective and risk averse in the collection of personal data. They also normalize corporate collection of personal data by providing a built-in response to security threats and privacy concerns. When the model fails – when there is a security breach or privacy is disrespected – audits can be used to identify the glitch so that the system can continue its forward march.

And this means that audits can, indeed, serve as tools of datafication and domination. But I don’t think this necessarily needs to be the case. In the next post, I’ll explore what we’ve learned from experimenting with citizen data audits, before turning to the question of how they can contribute to the decolonization of big data in the final post.

 

About the author: Dr. Katherine Reilly is Associate Professor in the School of Communication at Simon Fraser University in Vancouver, Canada. She is the recipient of a SSHRC Partnership Grant and an International Development Research Centre grant to explore citizen data audit methodologies alongside Derechos Digitales in Chile, Fundacion Karisma in Colombia, Sula Batsu in Costa Rica, TEDIC in Paraguay, HiperDerecho in Peru, and ObservaTIC in Uruguay.