
[blogpost] Teaching Students to Question Mr. Robot: Working to Prevent Algorithmic Bias in Educational Artificial Intelligence

Author: Erinne Paisley

Introduction

With the onset of the COVID-19 pandemic, classrooms around the world have moved online. Students from K-12 through university are turning to their computers to stay connected to teachers and continue their education. This move online raises questions about the appropriateness of technologies in the classroom and how close to a utopian "Mr. Robot" we can, or should, get. One of the most contested technological uses in the classroom is the adoption of Artificial Intelligence (AI) to teach.

AI in Education

AI encompasses many practices that process information in ways similar to how humans process it. Human intelligence is not one-dimensional, and neither is AI: it spans many different techniques and addresses a multitude of tasks. Two of the main techniques that have been adapted for educational AI are automation and adaptive learning. Automation means computers are pre-programmed to complete tasks without human input. Adaptive learning means these automated systems can adjust themselves based on use and become more personalized.

The prospect of combining these techniques into some type of robot teacher, or "Mr. Robot", sounds like something out of a sci-fi cartoon, but it is already a reality for some. Combinations of these techniques have already been used to assess students' prior and ongoing learning levels, place students in appropriate subject levels, schedule classes, and individualize instruction. In the United States, the Mississippi Department of Education has addressed a shortage of teachers through an AI-powered online learning program called "Edgenuity". This program automates lesson plans following the format of warm-up, instruction, summary, assignment, and quiz.

A screenshot from an Edgenuity lesson plan.

Despite how utopian an AI-powered classroom may sound, these technologies can reinforce significant inequalities and social injustices. In September 2019, the United Nations Educational, Scientific and Cultural Organization (UNESCO), along with a number of other organizations, hosted a conference titled "Where Does Artificial Intelligence Fit in the Classroom?" that explored these issues. One of the main concerns raised was algorithmic bias.

Algorithmic Bias in Education AI

The mainstream attitude towards AI is still one of faith: faith that these technologies are, as the name suggests, intelligent as well as objective. However, algorithmic bias illustrates how far from neutral these new technologies are. Joy Buolamwini explains that the biases, conscious or subconscious, of those who write the code become part of the digital systems themselves. This creates systems that are especially skewed against people of colour, women, and other minorities, who are statistically underrepresented in the process of creating these systems, including AI code. For instance, the latest AI applicant pool at Stanford University in the United States was 71% male.

Joy Buolamwini's TEDx talk on algorithmic bias.

In the educational sector, people of colour, girls, and other minorities are already marginalized. Because of this, there is concern that classroom AI with encoded biases would deepen these inequalities, for instance by trapping low-income and minority students in low-achievement tracks. This would create a cycle of poverty supported by the educational framework itself, instead of having human teachers address students individually and offer specialized support and attention to those facing adversity.

However, the educational field already has biases embedded in it, both within individual teachers and throughout the system more generally. Viewed in this way, the increased use of AI in classrooms creates an opportunity to reduce bias, if the technology is designed in a way that directly addresses these issues. The work of designing such progressive technologies has been taken on by teachers, librarians, students, and everyone in between.

Learning to Fight Algorithmic Bias

By including more voices and perspectives in the process of coding AI technologies, algorithmic bias can be prevented and replaced with technological systems that support a socially just classroom. In this final section, I highlight two existing educational projects aimed at teaching students of all ages to identify and fight algorithmic bias while building technology for a more equal classroom.

Algorithmic Bias Lesson Plans

The use of AI to create a more socially just educational system can start in the classroom, as Blakeley Payne showed when she ran a week-long AI ethics course for 10- to 14-year-olds in 2019. The course included lessons on open-source coding and AI ethics, and ultimately taught students both to understand and to fight against algorithmic bias. The lesson plans themselves are available for free online for any classroom to use, even from home.

Students learn how to identify algorithmic bias during the one-week course.

Blakeley Payne's one-week program focuses on ages 10-14 to encourage students to become interested and passionate about issues of algorithmic bias, and the STEM field more broadly, from a young age. Students work on simple activities such as writing an algorithm for the "best peanut butter and jelly sandwich" in order to practice questioning algorithmic bias. This activity in particular has them ask what "best" means. Does it mean best looking? Best tasting? Who decides, and what are the implications of that decision?

Non-profits such as Girls Who Code are also actively designing lesson plans and activities for young audiences that teach critical thinking and design when it comes to algorithms, including those behind AI. The organization runs after-school clubs for girls in grades 3-12, college programs for alumni, and summer intensives. Its programs focus on developing technical coding skills but also place a strong emphasis on diversifying the STEM fields and creating equitable code.

Conclusion

The arrival of AI in the classroom is inevitable. This may not mean every teacher becomes robotic, but the use of AI and other technologies in the educational field is already happening. Although this raises concerns about algorithmic bias in the education system, it also creates more opportunities to rethink how technologies can be used to create a more socially just educational system. As existing educational programs that teach about algorithmic bias show, even at kindergarten age an interest in learning, questioning, and rethinking algorithms can easily be nurtured. The answer to how we create a more socially just educational system through AI is simple: just ask the students.

 

About the Author

Erinne Paisley is currently a Research Master's student in Media Studies at the University of Amsterdam and completed her BA at the University of Toronto in Peace, Conflict and Justice & Book and Media Studies. She is the author of three books on social media activism for youth with Orca Book Publishers.

[BigDataSur] Data activism in action: The gigantic size and impact of China’s fishing fleet revealed using big data analytics and algorithms

Author: Miren Gutiérrez

As we grapple with the COVID-19 pandemic, another crisis looms over our future: overfishing. Fishing fleets and unsustainable practices have been emptying the oceans of fish and filling them with plastic. Although other countries are also responsible for overfishing, China bears a greater responsibility. Why is looking at the Chinese fleet important? China has almost 17,000 vessels capable of distant-water fishing, as an investigative report published by the Overseas Development Institute, London, reveals for the first time.

 

As part of a team of researchers at the Overseas Development Institute, London, I had access to the world's largest database of fishing vessels. Combining these data with satellite data from the automatic identification system (AIS), which indicates vessels' movements, we were able to observe their behaviour over two years (2017 and 2018). To do this, we employed big data analytics, machine learning algorithms, and geographic information systems to describe the fleet and analyze how it behaved.
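To give a concrete sense of what this kind of fleet analysis can involve, here is a minimal, illustrative sketch in Python. The file names, column names, and the pre-computed outside_home_eez flag are hypothetical; the report's actual pipeline combined a proprietary vessel registry, AIS feeds, machine learning and GIS tools, none of which is reproduced here.

```python
# Illustrative sketch only: join a vessel registry with AIS position reports
# and count vessels observed outside the home exclusive economic zone (EEZ).
# File names and column names are hypothetical.
import pandas as pd

# One row per vessel: MMSI identifier, flag state, vessel type.
registry = pd.read_csv("vessel_registry.csv")
# One row per AIS ping: MMSI, timestamp, position, plus a boolean produced by
# a GIS step (point-in-polygon test against EEZ boundaries, not shown here).
positions = pd.read_csv("ais_positions_2017_2018.csv")

# Keep Chinese-flagged vessels and join their position reports.
chinese = registry[registry["flag"] == "CHN"]
obs = positions.merge(chinese, on="mmsi", how="inner")

# Vessels seen at least once outside the home EEZ during the study period.
outside = obs[obs["outside_home_eez"]]
print("distinct vessels observed outside home waters:", outside["mmsi"].nunique())

# Fleet composition by vessel type (e.g. how many are trawlers).
print(outside.groupby("vessel_type")["mmsi"].nunique().sort_values(ascending=False))
```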

The first thing we noticed is that China's fishing fleet is five to eight times larger than any previous estimate. We identified a total of 16,966 Chinese fishing vessels able to fish in "distant waters", that is, outside China's exclusive economic zone, including some 12,500 vessels observed outside Chinese waters during that period.

Why is this important? If China's distant-water fishing (DWF) fleet is five to eight times larger than previous estimates, its impacts are inevitably more significant than previously thought. This matters for two reasons. First, millions of people in coastal areas of developing countries depend on fishery resources for their subsistence and food security. Second, this extraordinary increase makes China's distant-water fishing activities difficult to monitor and control.

The other thing we observed is that the most common type of fishing vessel is the trawler. Most of these Chinese trawlers can practice bottom trawling, the most damaging fishing technique available. We identified some 1,800 Chinese trawlers, more than double the previous estimate.

Furthermore, only 148 Chinese ships were registered in countries commonly regarded as flags of convenience. This suggests that, given the relatively lax regulation by the Chinese authorities, the incentives to adopt flags of convenience are few.

Finally, of the nearly 1,000 vessels registered outside China, more than half fly African flags, especially in West Africa, where law enforcement is limited and fishing rights are often restricted to vessels registered in the country, which explains why these Chinese ships have adopted local flags.

What can be said about the ownership of these fishing vessels? It is very complex. We analyzed a subsample of approximately 6,100 vessels and discovered that only eight companies owned or operated more than 50 vessels each. In other words, there appear to be very few large Chinese fishing companies, with small or medium-sized companies owning most of the vessels. However, this is only a facade, as many of these companies appear to be subsidiaries of larger corporations, suggesting some form of more centralized control. The lack of transparency hampers monitoring efforts and attempts to hold those responsible for malpractice accountable.

Another striking facet of the ownership structure is that half of the 183 vessels suspected of involvement in illegal, unreported or unregulated fishing are owned by a handful of companies, several of which are parastatal. This means that focusing enforcement on these companies, which also own other ships, could solve many problems.

There has been an extraordinary boom in Chinese fishing activities that is difficult to control. Chinese companies are free to operate and negotiate their access to fisheries in coastal states of developing countries without being monitored, especially in West Africa. This laxity contrasts with the policy of the European Union to reduce its fishing fleet and exercise greater control over its global operations.

This report is a data activist project that aims to redress the unfair situation of nations, especially in West Africa, that cannot monitor and police their waters.

 

This is a version of an op-ed published in Spanish by eldiario.es.

 

About the author: Miren Gutiérrez is passionate about human rights, journalism and the environment (with a weakness for fish), and optimistic about what can be done with data-based research, knowledge and communication. She is a professor at the University of Deusto, a Research Associate at the Overseas Development Institute, and a Research Associate at DATACTIVE.

 

[BigDataSur] The Challenge of Decolonizing Big Data through Citizen Data Audits [3/3]

 

Author: Katherine Reilly, Simon Fraser University, School of Communication

Data Stewardship through Citizen Centered Data Audits

In my previous two posts (the first and the second), I talked about the nature of data audits and how they might be applied by citizens. Audits, I explained, check whether people are carrying out practices according to established standards or criteria, with the goal of ensuring effective use of resources. As citizens we have many tools at our disposal to audit companies, but when we audit companies according to their criteria, we risk losing sight of our own needs as a community. The question addressed by this post is how to do data audits from a citizen point of view.

Thinking about data as a resource is a first step in changing our perspective on data audits. Our current data regime is an extractive data regime. As I explained in my first post, in the current regime, governments accept the central audit criteria of businesses, and on top of this, they establish the minimal protections necessary to ensure a steady flow of personal data to those same corporate actors.

I would like to suggest that we rethink our data regime in terms of data stewardship. The term ‘stewardship’ is usually applied to the natural environment. A forest might be governed by a stewardship plan which lays out the rights and responsibilities of resource use. Stewardship implies a plan for the management of those resources, both so that they can be sustained, and also so that everyone can enjoy them.

If the raw material produced by the data forest is our personal information, then we are the trees, and we are being harvested. Our data stewardship regime is organized to support that process, and audits are the means to enforce it. The main beneficiaries of the current data stewardship regime are companies who harvest and process our data. Our own benefits – our right to walk through the forest and enjoy the birds, or our right to profit from the forest materially – are not contemplated in the current stewardship regime.

It is tempting to conclude that audits are to blame, but really, evaluation is an agnostic concept. What matters are the criteria – the standards to which we hold corporate actors. If we change the standards of the data regime, then we change the system. We can introduce principles of stewardship that reflect the needs of community members. To do this, we need to start from the audit criteria that represent the localized concerns of situated peoples.

To this end, I have started a new project in collaboration with five fellow data justice organizations in five Latin American countries: Derechos Digitales in Chile, Karisma in Colombia, TEDIC in Paraguay, HiperDerecho in Peru, and ObservaTIC in Uruguay. We will also enjoy the technical support of Sula Batsu in Costa Rica.

Our focus will be on identifying alternative starting points for data audits. We won’t start from the law, or the technology, or corporate policy. Instead, we will start from people’s lived experiences, and use these as a basis to establish criteria for auditing corporate use of personal data.

We will work with small groups who share a common identity and/or experience, and who are directly affected by corporate use of their personal data. For example, people with chronic health issues have a stake in how personal data, loyalty programs and platform delivery services mediate their relationship with pharmacies and pharmaceutical companies. The project will identify community collaborators who are interested in working with us to establish alternative criteria for evaluating those companies.

Our emerging methodology will use a funnel-like approach, starting from broad discussions about the nature of data, passing through explorations of personal practices and the role of data in them, and then landing on detailed explorations of specific moments or processes in which people share their personal data.

Once the group has learned something about the reality of data in their daily lives – and in particular the instances where data is of most concern to them – we will facilitate group activities that help them identify their data needs, as well as the behaviors that would satisfy those needs. An example of a data need might be "I need to feel valued as a person and as a woman when I interact with the pharmacy." A statement of how that need might be satisfied could be, for example, "I would feel more valued as a person and as a woman if the company changed its data collection categories."

We are particularly interested in thinking through the application of community criteria to companies that have grown in power and influence during the Covid-19 pandemic. Companies like InstaCart, SkipTheDishes, Rappi, Zoom, and Amazon are uniquely empowered to control urban distribution chains that affect the welfare of millions. What do community members require from these companies in terms of their data practices, and how would they fare in an audit based on those criteria?

We find inspiration for alternative audit criteria in data advocacy projects that have been covered by DATACTIVE's Big Data from the South blog. For example, the First Nations Information Governance Centre (FNIGC) of Canada has established the principles of ownership, control, access and possession for the management of First Nations data, and New Zealand has adopted Māori knowledge protocols for information systems used in primary health care provision (as reported by Anna Carlson). Meanwhile, the Mexican organization Controla tu Gobierno argues that we need to view data "less as a commodity – which is the narrative that constantly tries to make us understand data as the new oil – and more as a source of meaning" (Guillen Torres and Mayli Sepulveda, 2017).

From examples like these, and given the concept of data stewardship, we can begin to see that data is only as valuable as the criteria used to assess it, and so we urgently need alternative criteria that reflect the desires, needs and rights of communities.

How would corporate actors fare in an audit based on these alternative criteria? How would such a process reposition the value of data within the community? Who should carry out these evaluative processes, and how can they work together to create a more equitable data stewardship regime that better serves the needs of communities?

By answering these questions, we can move past creating data-literate subjects for the existing data stewardship regime. Instead, we can open space for discussion about how we actually want our data resources to be used. In a recent Guardian piece, Stephanie Hare argued that "The GDPR protects data. To protect people, we need a bill of rights, one that protects our civil liberties in the age of AI." The content of that bill of rights requires careful contemplation. Citizen data audits allow us to think creatively about how data stewardship regimes can serve the needs of communities, and from there we can build out the legal frameworks to protect those rights.

 

About the author: Dr. Katherine Reilly is Associate Professor in the School of Communication at Simon Fraser University in Vancouver, Canada. She is the recipient of a SSHRC Partnership Grant and an International Development Research Centre grant to explore citizen data audit methodologies alongside Derechos Digitales in Chile, Fundacion Karisma in Colombia, Sula Batsu in Costa Rica, TEDIC in Paraguay, HiperDerecho in Peru, and ObservaTIC in Uruguay.

WomenonWeb censored in Spain as reported by Magma

Author: Vasilis Ververis

The Magma project has just published new research on censorship concerning womenonweb.org, a non-profit organization providing support to women and pregnant people. The article describes how major ISPs in Spain are blocking womenonweb.org's website. Spanish ISPs have been blocking it by means of DNS manipulation, TCP resets, and HTTP blocking using Deep Packet Inspection (DPI) infrastructure. Our analysis is based on network measurements from OONI data. This is the first time we have observed Women on Web being blocked in Spain.
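As an illustration of how such measurements can be retrieved, the sketch below queries OONI's public measurements API for anomalous web_connectivity results for womenonweb.org from probes in Spain. It is a minimal example, not the Magma analysis itself; treat the exact parameters and response fields as assumptions based on OONI's documented API.

```python
# Minimal sketch: list anomalous OONI web_connectivity measurements for
# womenonweb.org collected from networks in Spain (ES).
import requests

API = "https://api.ooni.io/api/v1/measurements"
params = {
    "probe_cc": "ES",              # country code of the measuring probe
    "domain": "womenonweb.org",    # target domain
    "test_name": "web_connectivity",
    "anomaly": "true",             # only results flagged as anomalous
    "limit": 50,
}

resp = requests.get(API, params=params, timeout=30)
resp.raise_for_status()

for m in resp.json().get("results", []):
    # probe_asn identifies the network (ISP); grouping by it shows which
    # Spanish providers return anomalous results for the site.
    print(m.get("measurement_start_time"), m.get("probe_asn"), m.get("input"))
```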

About Magma: Magma aims to build a scalable, reproducible, standard methodology for measuring, documenting and circumventing internet censorship, information controls, internet blackouts and surveillance, in a way that can be streamlined and used in practice by researchers, front-line activists, field workers, human rights defenders, organizations and journalists.

About the author: Vasilis Ververis is a research associate with DATACTIVE and a practitioner of the principles ~ undo / rebuild ~ the current centralization model of the internet. Their research deals with internet censorship and investigation of collateral damage via information controls and surveillance. Some recent affiliations: Humboldt-Universität zu Berlin, Germany; Universidade Estadual do Piaui, Brazil; University Institute of Lisbon, Portugal.

[BigDataSur] The Challenge of Decolonizing Big Data through Citizen Data Audits [2/3]

 

A First Attempt at Citizen Data Audits

Author: Katherine Reilly, Simon Fraser University, School of Communication

In the first post in this series, I explained that audits are used to check whether people are carrying out practices according to established standards or criteria. They are meant to ensure effective use of resources. Corporations audit their internal processes to make sure that they comply with corporate policy, while governments audit corporations to make sure that they comply with the law.

There is no reason why citizens or watchdogs can't carry out audits as well. In fact, data privacy laws include some interesting frameworks that can facilitate this type of work. In particular, the EU's General Data Protection Regulation (GDPR) gives you the right to know how corporations are using your personal data, as well as the ability to access the personal data that companies hold about you. This right is reproduced in the privacy legislation of many countries around the world, from Canada and Chile to Costa Rica and Peru, to name just a few.

With this in mind, several years ago the Citizen Lab at the University of Toronto set up a website called Access My Info which helps people access the personal data that companies hold about them. Access My Info was set up as an experiment, so the site only includes a fixed roster of Canadian telecommunications companies, fitness trackers, and dating apps. It walks users through the process of submitting a personal data request to one of these companies, and then tracks whether the companies respond. The goal of this project was to crowdsource insights from citizens that would help researchers learn what companies know about their clients, how companies manage personal data, and who companies share data with. The results of this work have been used to advocate for changes to digital privacy laws.

Using this model as a starting point, in 2019 my team at SFU and a team from the Peruvian digital rights advocate HiperDerecho set up a website called SonMisDatos (Son Mis Datos translates as "It's My Data"). Son Mis Datos riffed on the open-source platform developed for Access My Info, but made several important modifications. In particular, HiperDerecho's Director, Miguel Morachimo, made the site database-driven so that it was easier to update the roster of corporate actors and their contact details. Miguel also decided to focus on companies that have a more direct material impact on the daily lives of Peruvians, such as gas stations, grocery stores and pharmacies. These companies run loyalty programs that collect personal data about users.

Then we took things one step further. We used SonMisDatos to organize citizen data audits of Peruvian companies. HiperDerecho mobilized a team of people who work on digital rights in Peru, and we brought them together at two workshops. At the first workshop, we taught participants about their rights under Peru's personal data protection laws, introduced SonMisDatos, and had everyone use the site to request access to their personal data from companies. Companies need time to fulfill those requests, so we then waited two months. At our second workshop, participants reported back on the results of their data requests, and I shared a series of techniques for auditing companies on the basis of the personal data people had been able to access.

Our audit techniques explored the quality of the data provided, corporate compliance with data laws, how responsive companies were to data requests, the quality of their informed consent process, and several other factors. My favorite audit technique reflected a special feature of the data protection laws of Peru. In that country, companies are required to register databases of personal information with a state entity. The registry, which is published online, includes lists of companies, the titles of their databases, and the categories of data collected by each database. (The government does not collect the contents of the databases; it only registers their existence.)

With this information, our auditors were able to verify whether the data they got back from corporate actors was complete and accurate. In one case, the registry told us that a pharmaceutical company was collecting data about whether clients had children. However, in response to an access request, the company only provided lists of purchases organized by date, SKU number, quantity and price. Our auditors were really bothered by this discovery, because it suggested that the company was making inferences about clients without telling them. Participants wondered how the company was using these inferences, and whether they might affect pricing, customer experience, access to coupons, or the like.
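A simple way to operationalize this registry check, sketched below, is to compare the categories a company declared in the public registry with the categories actually present in its response to an access request. The category names here are hypothetical examples, not real registry entries.

```python
# Illustrative sketch of the registry-based audit check: which data categories
# declared in the public registry were missing from the company's response?
declared_in_registry = {"name", "national_id", "purchase_history", "has_children"}
returned_to_requester = {"name", "national_id", "purchase_history"}

missing = declared_in_registry - returned_to_requester
print("declared in the registry but not returned:", sorted(missing))
```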

In another case, one of our auditors subscribed to DirecTV. To complete this process, he needed to provide his cell phone number plus his national ID number. He later realized that he had accidentally typed in the wrong ID number, because he began receiving cell phone spam addressed to another person. This was exciting, because it allowed us to learn which companies were buying personal data from DirecTV. It also demonstrated that DirecTV was doing a poor job of managing its customers' privacy and security! However, during the audit we also looked back at DirecTV's terms of service. We discovered that they were completely up front about their intention to sell personal information to advertisers. Our auditors were sheepish about not reading the terms of the deal, but they also felt it was wrong that they had no option but to accept these terms if they wanted to access the service.

On the basis of this experience, we wrote a guidebook that explains how to use Son Mis Datos and how to carry out an audit based on the "access" provisions in personal data laws. The guide helps users think through questions like: Is the data complete, precise, unmodified, timely, accessible, machine-readable, non-discriminatory, and free? Has the company respected your data rights? What does the company's response to your data request suggest about its data use and data management practices?

We learned a tonne from carrying out these audits! We know, for instance, that the more specific the request, the more data a company provides. If you ask a company for "all of the personal data you hold about me" you will get less data than if you ask for "all of my personal information, all of my IP data, all of my mousing behaviour data, all of my transaction data, etc."

Our experiments with citizen data audits also allow us to make claims about how companies define the term "personal data." Often companies define personal data very narrowly to mean registration information (name, address, phone number, identification number, etc.). This stands in stark contrast to the academic definition of personal data, which is any information that can lead to the identification of an individual person. In the age of big data, that means pretty much any digital traces you produce while logged in. Observations like these allow us to open up larger discussions about corporate data use practices, which helps to build citizen data literacy.

However, we were disappointed to discover that our citizen data audits worked to validate a data regime that is organized around the expropriation of resources from our communities. In my first blog post I explained that the 5 criteria driving data audits are profitability, risk, consent, security and privacy.

Since our audit originated with the law, with technology, and with corporate practices, we ended up using the audit criteria established by businesses and governments to assess corporate data practices. And this meant that we were checking to see if they were using our personal and community resources according to policies and laws that drive an efficient expropriation of those very same resources!

The concept of privacy was particularly difficult to escape. The idea that personal data must be private has been ingrained into all of us, so much so that the notion of pooled data or community data falls outside the popular imagination.

As a result, we felt that our citizen data audits did other people’s data audit work for them. We became watchdogs in the service of government oversight offices. We became the backers of corporate efficiencies. I’ve got nothing personal against watchdogs — they do important work — but what if the laws and policies aren’t worth protecting?

We have struggled greatly with the question of how to generate a conversation that moves beyond established parameters, and that situates our work in the community. With this in mind, we’ve begun to explore alternative approaches to thinking about and carrying out citizen data audits. That’s the subject of the final post in this series.

 

About the author: Dr. Katherine Reilly is Associate Professor in the School of Communication at Simon Fraser University in Vancouver, Canada. She is the recipient of a SSHRC Partnership Grant and an International Development Research Centre grant to explore citizen data audit methodologies alongside Derechos Digitales in Chile, Fundacion Karisma in Colombia, Sula Batsu in Costa Rica, TEDIC in Paraguay, HiperDerecho in Peru, and ObservaTIC in Uruguay.

[BigDataSur] The Challenge of Decolonizing Big Data through Citizen Data Audits [1/3]

Author: Katherine Reilly, Simon Fraser University, School of Communication

A curious thing happened in Europe after the creation of the GDPR. A whole new wave of data audit companies came into existence to service companies that use personal data. This is because, under the GDPR, private companies must audit their personal data management practices. An entire industry emerged around this requirement. If you enter “GDPR data audit” into Google, you’ll discover article after article covering topics like “the 7 habits of highly effective data managers” and “a checklist for personal data audits.”

Corporate data audits are central to the personal data protection frameworks that have emerged in the past few years. But among citizen groups, and in the community, data audits are very little discussed. The word “audit” is just not very sexy. It brings to mind green eyeshades, piles of ledgers, and a judge-y disposition. Also, audits seem like they might be a tool of datafication and domination. If data colonization “encloses the very substance of life” (Halkort), then wouldn’t data auditing play into these processes?

In these three blog posts, I suggest that this is not necessarily the case. In fact, we need precisely to develop the field of citizen data audits, because they offer us an indispensable tool for the decolonization of big data. The posts look at how audits contribute to upholding our current data regimes, an early attempt to realize a citizen data audit in Peru, and emerging alternative approaches. The posts in this series will be published over the coming weeks:

  1. The Current Reality of Personal Data Audits [find below]

  2. A First Attempt at Citizen Data Audits [link]

  3. Data Stewardship through Citizen Centered Data Audits [link]

 

The Current Reality of Personal Data Audits

Before we can talk about citizen data audits, it is helpful to first introduce the idea of auditing in general, and then unpack the current reality of personal data audits. In this post, I’ll explain what audits are, the dominant approach to data audits in the world right now, and finally, the role that audits play in normalizing the current corporate-focused data regime.

The aim of any audit is to check whether people are carrying out practices according to established standards or criteria that ensure proper, efficient and effective management of resources.

By their nature, audits are twice removed from reality. In one sense, this is because auditors look for evidence of tasks rather than engaging directly in them. An auditor shows up after data has been collected, processed, stored or applied, and they study the processes used, as well as their impacts. They ask questions like “How were these tasks completed, and, were they done properly?”

Auditors are removed from reality in a second sense, because they use standards established by other people. An auditor might ask “Were these tasks done according to corporate policy, professional standards, or the law?” Auditors might gain insights into how policies, standards or laws might be changed, but their main job is to report on compliance with standards set by others.

Because auditors are removed from the reality of data work, and because they focus on compliance, their work can come across as distant, prescribed – and therefore somewhat boring. But when you step back and look at the bigger picture, audits raise many important questions. Who do auditors report to and why? Who sets the standards by which personal data audits are carried out? What processes does a personal data audit enforce? How might audits normalize corporate use of personal data?

We can start to answer these questions by digging into the criteria that currently drive corporate audits of personal data. These can be divided into two main aspects: corporate policy and government regulation.

On the corporate side, audits are driven by two main criteria: risk management and profitability. From a corporate point of view, personal data audits are no exception. Companies want to make sure that personal data doesn’t expose them to liabilities, and that use of this resource is contributing effectively and efficiently to the corporate bottom line.

That means that when companies audit their use of personal data, they will check to see whether the costs of warehousing and managing data are worth the reward in terms of efficiencies or returns. They will also check to see whether the use of personal data exposes them to risk, given existing legal requirements, social norms or professional practices. For example, poor data management may expose a company to the risk of being sued, or the risk of alienating its clientele. Companies want to ensure that their internal practices limit exposure to risks that may damage their brand, harm their reputation, incur costs, or undermine productivity.

In sum, corporate data audits are driven by, and respond to, corporate policies, and those policies are organized around ensuring the viability and success of the corporation.

Of course, the success of a corporation does not always align with the well-being of the community. We see this clearly in the world of personal data. Corporate hunger for personal data resources has often come at the expense of personal or community rights.

Because of this, governments insist that companies enforce three additional regulatory data audit criteria: informed consent, personal data security, and personal data privacy.

We can see these criteria reflected clearly in the EU's General Data Protection Regulation. Under the GDPR, companies must ask customers for permission to access their data, and when they do so, they must provide clear information about how they intend to use that data.

They must also account for the personal data they hold, how it was gathered, from whom, to what end, where it is held, and who accesses it for what business processes. The purpose of these rules is to ensure companies develop clear internal data management policies and practices, and this, in turn, is meant to ensure companies are thinking carefully about how to protect personal privacy and data security. The GDPR requires companies to audit their data management practices on the basis of these criteria.

Taking corporate policy and government regulation together, personal data audits are currently informed by 5 criteria – profitability, risk, consent, security and privacy. What does this tell us about the management of data resources in our current data regime?

In a recent Guardian piece Stephanie Hare pointed out that “the GDPR could have … [made] privacy the default and requir[ed] us to opt in if we want to have our data collected. But this would hurt the ability of governments and companies to know about us and predict and manipulate our behaviour.” Instead, in the current regime, governments accept the central audit criteria of businesses, and on top of this, they establish the minimal protections necessary to ensure a steady flow of personal data to those same corporate actors. This means that the current data regime (at least in the West) privileges the idea that data resides with the individual, and also the idea that corporate success requires access to personal data.

Audits work to enforce the collection of personal data by private companies, by ensuring that companies are efficient, effective and risk-averse in that collection. They also normalize corporate collection of personal data by providing a built-in response to security threats and privacy concerns. When the model fails – when there is a security breach or privacy is disrespected – audits can be used to identify the glitch so that the system can continue its forward march.

And this means that audits can, indeed, serve as tools of datafication and domination. But I don’t think this necessarily needs to be the case. In the next post, I’ll explore what we’ve learned from experimenting with citizen data audits, before turning to the question of how they can contribute to the decolonization of big data in the final post.

 

About the author: Dr. Katherine Reilly is Associate Professor in the School of Communication at Simon Fraser University in Vancouver, Canada. She is the recipient of a SSHRC Partnership Grant and an International Development Research Centre grant to explore citizen data audit methodologies alongside Derechos Digitales in Chile, Fundacion Karisma in Colombia, Sula Batsu in Costa Rica, TEDIC in Paraguay, HiperDerecho in Peru, and ObservaTIC in Uruguay.

[BigDataSur] Data journalism without data: challenges from a Brazilian perspective

Author: Peter Füssy

For the last decade, data journalism has attracted attention from scholars, some of whom have provided distinct definitions in order to understand the changes in journalistic practices. Each one emphasizes a particular aspect of data journalism, from new forms of collaboration to open-source culture (Coddington, 2014). Yet, even among clashing definitions, it is possible to say they all agree that there is no data journalism without data. But which data? Relevant data does not generate itself, and it is usually tied to power, economic, and/or political struggles (De Maeyer et al., 2014). While journalists in the Global North mostly benefit from open government mechanisms for public scrutiny, journalists working in countries with less transparency and weaker democratic traditions still face infrastructural issues when putting together data and journalism (Borges-Rey, 2019; Wright, Zamith & Bebawi, 2019).

In the next paragraphs, I draw from academic research, reports, projects, and my own experience to briefly problematize one of the most recurrent challenges to data journalism in Brazil: access to information. Since relevant data is rarely available immediately, a considerable part of data-driven investigative projects in Brazil relies on the Freedom of Information (FOI) law that forces governments to provide data of public interest. Also known as Access to Information or Right to Information, these acts are an essential tool for increasing transparency, accountability, citizens' agency, and trust. Yet implementation of and compliance with the regulation in Brazil are inefficient at all levels of government (Michener, 2018; Abraji, 2019; Fonseca, 2020; Venturini, 2017).

More than just a bureaucratic issue inherited from years of dictatorship and a lack of capacity, this inefficiency is also a political act. As Torres argued, taking Mexico as an example, institutional resistance to transparency is carried out through subtle and non-political actions that diminish data activists' agency and have the effect of producing or reinforcing inequalities (Torres, 2020). In the case of Brazil, however, recent reports suggest that institutional resistance to transparency is not necessarily subtle. It may also be a political flag.

Opacity and Freedom of Information

According to Berliner, the first FOI act was passed in Sweden in 1766, but the recent wave follows the example of the United States’ act from 1966. After the US, there is no clear pattern for adoption; for example, Colombia passed a law in 1985, while the United Kingdom did so only in 2000. FOI acts are more likely to pass when there is a highly competitive domestic political environment, rather than pressure from civil society or international institutions (Berliner, 2014).

Sanctioned in 2011, the Brazilian FOI law came into effect only in 2012. In the first six years, 611.3 thousand requests were filed with the federal government alone (excluding state and municipal bodies). The average of 279 requests per day, or 11 per hour, suggests how eager the population was to decentralise information. Although public authorities often give insufficient responses while claiming that the request was granted, it is possible to say the law was beginning to "stick". Of the total requests, 458.4 thousand (75%) resulted in partial or full access to the requested information (Valente, 2018).

At the beginning of 2019, while President Jair Bolsonaro was making his first international appearance as Brazilian head of state in Davos, Vice President General Hamilton Mourão signed a decree to limit access to information by allowing government employees to declare public data confidential up to the top-secret level, which makes documents unavailable for 25 years (Folha de S.Paulo, 2019). Until then, this could be done only by the president and vice president, ministers of state, commanders of the armed forces and heads of diplomatic missions abroad. Facing a backlash from civil society, Bolsonaro lost support in Congress to pass the measure and withdrew the decree a few weeks later. Nonetheless, reports show that problems with FOI requests have grown under his presidency.

Data collected from the Brazilian FOI electronic system by Agência Pública revealed that federal government denials of requests justified as "fishing expeditions" increased from 8 in 2018 to 45 in the first year of Bolsonaro's presidency (Fonseca, 2020). The term "fishing expedition" is pejorative and usually refers to secret or unstated purposes, like using an unrelated investigation or questioning to find evidence to be used against an adversary in a different context. However, according to the Brazilian FOI law, the reason behind a request must not be taken into account when deciding whether to provide information.

At the same time, journalists' perception of difficulties in retrieving information via FOI reached its highest level in 2019, when 89% of interviewed journalists described issues such as answers arriving after the legal deadline, missing information, data in closed formats, and denial of information (Abraji, 2019). In 2013, 60% reported difficulties, and the number dropped to 57% in 2015.

For example, after more than one year in office, Bolsonaro's presidency still refuses to make public the guest list of his inauguration reception. In addition to the guest list, the government keeps secret more than R$15 million in expenses made with corporate cards of the Presidency and the Vice President's Office. The secrecy remains even after a Supreme Court decision overturned the confidentiality in November last year.

More from less

Despite the challenges, Brazilian journalists are following the quantitative turn in the field and creating innovative data-driven projects. As reported by the Brazilian Association of Investigative Journalism (Abraji), at least 1,289 news stories built on data from FOI requests were published from 2012 to 2019. In 2017, the "Ctrl+X" project, which scraped thousands of lawsuits to expose politicians trying to silence journalists in the courts, won a prize at the Global Editors Network's Data Journalism Awards.

The following year, G1 won the public choice award with a project that tracked every single murder in the country for a week. The results from the "Violence Monitor" showed a total of 1,195 deaths, one every eight minutes. This project did not rely on FOI requests but on an unprecedented collaboration of 230 journalists employed by Brazil's biggest media group, Globo. They gathered the data from scratch at police stations all over the country in order to tell the victims' stories. In addition, G1 partnered with the Universidade de São Paulo for analysis and launched a campaign on TV and social media so that people could help identify some of the victims.

Despite the lack of resources, freedom, and safety, these projects show that data journalism can be a tool for rebuilding audiences' trust. However, activism to break down resistance to transparency becomes an even more prominent challenge when opacity seems to be encouraged by institutional actors.

 

About the author

Peter is a journalist trying to explore new media in depth, from everyday digital practices to the undesired consequences of a highly connected environment. After more than 10 years of writing and multimedia reporting for some of the most relevant news outlets in Brazil, he is now a second-year Research Master's student in Media Studies at the University of Amsterdam.

 

References

Berliner, Daniel. "The political origins of transparency." The Journal of Politics 76.2 (2014): 479-491.

Borges-Rey, Eddy. “Data Journalism in Latin America: Community, Development and Contestation.” Data Journalism in the Global South. Palgrave Macmillan, Cham, 2019. 257-283.

Coddington, Mark. "Clarifying journalism's quantitative turn: A typology for evaluating data journalism, computational journalism, and computer-assisted reporting." Digital Journalism 3.3 (2015): 331-348.

De Maeyer, Juliette, et al. "Waiting for data journalism: A qualitative assessment of the anecdotal take-up of data journalism in French-speaking Belgium." Digital Journalism 3.3 (2015): 432-446.

Fonseca, Bruno. Governo Bolsonaro acusa cidadãos de “pescarem” dados ao negar pedidos de informação pública. Agência Pública. 6 Feb, 2020. 

Michener, Gregory, Evelyn Contreras, and Irene Niskier. “From opacity to transparency? Evaluating access to information in Brazil five years later.” Revista de Administração Pública 52.4 (2018): 610-629.

Michener, Gregory, et al. “Googling the requester: Identity‐questing and discrimination in public service provision.” Governance (2019).

Valente, Jonas. “LAI: governo federal recebeu mais de 600 mil pedidos de informação”. Agência Brasil. May 16, 2018. 

Venturini, Lilian. “Se transparência é regra, por que é preciso mandar divulgar salários de juízes?”. Nexo Jornal. São Paulo, 3 Sept. 2017.

Wright, Kate, Rodrigo Zamith, and Saba Bebawi. “Data Journalism beyond Majority World Countries: Challenges and Opportunities.” Digital Journalism 7.9 (2019): 1295-1302.

[blog] The four (almost) invisible enemies in the first pandemic of the data society era

by

originally published on Il Manifesto, 24 April 2020

Big Data and Covid. The pandemic is bringing to the surface phenomena and characteristics of the data society that, in emergency circumstances like those of these weeks, risk turning what until recently could be considered only extreme scenarios, unexpected consequences, or side effects into reality.

The global COVID-19 pandemic is the first to unfold on such a vast scale, and in such severe forms, at an advanced stage of the so-called "data society". We find ourselves, in fact, at a watershed moment for how we define what it means to live in an era in which human activity in virtually every domain is turned into data.

An extreme situation like the one we are living through in these weeks of near-total lockdown inevitably reveals every shade of this phenomenon, from the most virtuous to the most potentially disturbing.

No event of similar scope in recent history can compete with the current global pandemic when it comes to defining the contemporary moment. We have to go back two decades, to September 11, 2001, to find another comparable moment of all-encompassing stress-testing of the cultural assumptions and foundations of our society as a whole. 2001 and 2020, however, have little in common in terms of technological ecosystems, digital infrastructures and, consequently, the social and political impacts of these technological arrangements.

The data society puts the production of data, and their use to create added value, at its centre: from traffic management to the improvement of public services, from personalized digital advertising to contact-tracing apps against COVID-19.

The paradox is that, even in normal circumstances, we ourselves generate most of these data, for example through smartphones, credit cards, online shopping and social media. The monetization, on a very large scale, of data about our preferences and behaviours has generated the value on which companies like Google and Amazon have been built, companies whose greatest strengths, or monopolies, lie in analysis and prediction.

But we citizens also produce data when we use the public health system or simply walk around our cities, by now populated by a myriad of surveillance cameras and "intelligent" facial recognition systems. Many of these data then end up in private hands, even when they appear to be under the control of state entities: the servers are often managed by companies like Accenture, IBM or Microsoft.

This variable geography of data, infrastructures, and public and private entities is a potentially explosive cocktail, above all because of its lack of transparency towards users and the risks it poses to individual and collective privacy. The data society is also the cradle of what the American scholar Shoshana Zuboff has called "surveillance capitalism", whose engine is the commodification of personal information, even at the cost of reducing our capacity to act independently and make free choices. In other words, it is our very way of being citizens that changes, and not necessarily for the better.

The drama unleashed by the COVID-19 pandemic on multiple levels – human, economic, social – helps reveal the darkest and most controversial sides of this system of data commodification.

The pandemic is indeed bringing to the surface phenomena and characteristics of the data society that, in emergency circumstances like those of these weeks, risk turning what until recently could be considered only extreme scenarios, unexpected consequences, or side effects into reality.

As far as the social sciences are concerned, there are at least four areas in which the pandemic is acting as an accelerator of potentially dangerous dynamics that had so far remained largely latent.

Setting aside any determinism, whether technological or epidemiological, at least four distinct tendencies have emerged so far: uncritical positivism, information disorder, vigilantism and the normalization of surveillance. These are four enemies rendered almost invisible by the human drama of the pandemic, yet they wound the collective almost as much as the virus itself, and they are bound to have long-term consequences that are dangerous to say the least. Let us look at them one by one.

Uncritical positivism

The first invisible enemy is associated with a common verb, "to count", an action that these days is rightly presented to us as an ally. "Let the numbers speak", we often hear. Who does not hold their breath waiting for the civil protection agency's tables announcing the number of deaths, recoveries and hospitalizations? Counting, and even more so counting ourselves, is an important moment of collective self-awareness for any society: just think of censuses, which play a crucial role in defining the nation state. What is more, counting has to do with the very essence of pandemics: large numbers.

As a rule, we tend to believe statistical data more than words, because we associate them with a kind of higher-order truth. This is a phenomenon also known as "dataism", an ideology that places excessive faith in the solutionist and predictive power of data.

Faith in numbers has deep roots, traceable to the days of nineteenth-century positivism, which postulated trust in science and in scientific and technological progress. In his "A Discourse on the Positive Spirit" (1844), the philosopher Auguste Comte explains how positivism puts "the real, as opposed to the chimerical" back at the centre, and how it sets out to "oppose the precise to the vague", presenting itself as "the contrary of negative", that is, as a proactive attitude of confidence in the future.

Certainly, bringing concrete facts back to the centre of the narrative about the virus and the search for solutions can only be a good thing after a dark season for science, in which even vaccines were called into question. Unfortunately, however, faith in numbers is often misplaced because, as has often been pointed out in recent weeks, official figures tend to tell a limited and often misleading portion of the pandemic's reality.

Nonetheless, numbers and data are at the heart of the narrative about the virus. It is, however, a narrative that is not very accurate, often decontextualized, and no less anxiety-inducing for that. The result is an uncritical positivism that tends to ignore context and does not explain how the counting is done or why. Decisions involving entire nations are taken and justified on the basis of numbers that do not necessarily rest on reliable data.

Walmart's Intelligent Retail Lab in the US – AP photo

L’information disorder nella pandemia

Il contesto informazionale di una pandemia è stato equiparato ad un’”infodemia”, un’espressione utilizzata in primis dalla stessa Organizzazione mondiale della sanità per definire circostanze in cui vi è una sovrabbondanza di informazioni—accurate o meno—che rendono molto difficile orientarsi tra le notizie o anche solo distinguere le fonti affidabili da quelle che affidabili non sono. La pandemia è di conseguenza anche una situazione particolarmente rischiosa per quanto riguarda il diffondersi di varie tipologie di “information disorder”—letteralmente disturbi dell’informazione—come varie forme di disinformazione o misinformazione.

Nell’infodemia da COVID-19 la cattiva informazione si è manifestata in vari modi. Il Reuters Institute for the Study of Journalism (RISJ) dell’Università di Oxford ha pubblicato uno dei primi studi sulle caratteristiche del fenomeno in questa pandemia, concentrandosi su un campione di notizie in lingua inglese vagliate da iniziative di fact-checking come il network non profit First Draft. Lo studio, che è un primo tentativo esplorativo di analisi del problema, rivela come la varietà delle fonti di disinformazione sulla pandemia possano essere sia “top-down” (quando sono promosse dalla politica o da altre personalità pubbliche) o “bottom-up”, ossia quando partono dagli utenti comuni.

While the first type accounts for 20% of the total sample analysed by the RISJ, top-down disinformation tends to generate far more buzz on social media than content produced from below. The RISJ also notes that the largest share of the misinformation that has emerged in recent weeks consists of "reconfigured" content, that is, existing content modified in some of its parts. Only a minority (around 38%) consists of content invented entirely from scratch.

The scholar Thomas Rid, one of the world's leading experts on disinformation campaigns in the field of national security (to whose history he has devoted a much-awaited forthcoming book, "Active Measures"), has also pointed out in the New York Times that a pandemic can constitute particularly fertile ground for potential "information warfare" operations aimed at sowing confusion and tension in the public opinion of the affected countries, in the wake of what was seen in the US during the 2016 presidential election. Nor should we forget the misinformation that spills over into racism and feeds xenophobic impulses, such as the false news, circulated in various circles, that Africans would be immune to the virus.

A Bitcoin data center in Virginia – photo AP

Vigilantism (digital and otherwise)

In these times, many runners have found themselves harshly berated, and in some cases even physically attacked, by fellow citizens irritated by the potential danger to public health that an individual out and about may represent.

People on their way to work have reported being insulted in various ways for not having "stayed at home." Countless videos have been uploaded to social media to denounce those who had allegedly gone out for a stroll in disregard of the lockdown. This phenomenon is known in criminology and sociology as "vigilantism," as the criminologist Les Johnston explained back in 1996.

Vigilantism involves private citizens who voluntarily take on roles that are not theirs to take, such as policing the behaviour of others and publicly denouncing their misdeeds, real or presumed. Through such actions in defence of social norms, the vigilante seeks to provide guarantees of security for themselves and for others.

The advent of social media and mobile devices has favoured the large-scale spread of a "digital vigilantism" which, as the University of Rotterdam researcher Daniel Trottier explains, aims to attack and shame the alleged rule-breaker through exposure to public ridicule, an exposure that is often long-lasting and disrespectful of others' privacy, and which fuels aggression and feelings of retaliation.

If the phenomenon is typical of historical moments in which the established order is at risk, or is perceived as such, its appearance and spread in the days of the Coronavirus emergency seems almost inevitable. COVID-19 digital vigilantism is, however, particularly risky, for at least two reasons. First of all, this urge to hate "those who leave the house" creates exclusion and social stigma, singling out and exposing individuals on the basis of purely visual clues that cannot discriminate between those who are actually breaking the rules and those who have a good reason to be out (for example, because they are going to work).

This highly dangerous creation of "enemies of the people" results in severe psychological damage, from a sense of loneliness to feeling misunderstood to a desire for retaliation, that will most likely outlive the Coronavirus emergency.

This phenomenon also ends up justifying similar transgressive behaviours, on the basis of the flawed reasoning that "if others do it, so can I." Secondly, vigilantism, digital or otherwise, divides the community, with serious and lasting effects in terms of social divisions between presumed good and bad citizens, between the deserving and the undeserving. It ends up undermining the much-needed narrative of a community that is united, and strong precisely because of its unity, capable of facing the emergency rationally, at the very moment when there is an extreme need to know that individual sacrifice feeds the collective effort.

Facial recognition systems in Germany – photo AP

Privacy and the normalization of surveillance

The pandemic has also reignited the debate on the role of privacy in the data society and, in particular, in a health emergency such as that of recent weeks. From many quarters, following the example (or the presumed "models") offered by variably democratic or non-democratic Asian countries such as China, Singapore and South Korea, calls have been made to adopt technological solutions for surveillance and digital monitoring of citizens, in various forms, in an attempt to slow the spread of the virus.

In Europe too, several governments have begun working on possible technical solutions and, on the whole, the debate has turned towards the development of "deconfinement" applications that could exploit various smartphone functions to perform "contact tracing," that is, to monitor the social contacts of people who are infected or potentially exposed to outbreaks of contagion.
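To give a sense of what a decentralized, Bluetooth-based approach to contact tracing looks like in practice, here is a minimal, purely illustrative sketch in Python, loosely inspired by decentralized proposals such as DP-3T. It is not the actual implementation of Immuni or of any real app, and every function name and parameter below is hypothetical.

```python
# Toy sketch of decentralized Bluetooth contact tracing (hypothetical, illustrative only).
import hashlib
import secrets

def daily_key() -> bytes:
    # Each phone generates a fresh random key every day; it never leaves the
    # device unless the user tests positive and chooses to upload it.
    return secrets.token_bytes(32)

def ephemeral_ids(key: bytes, slots: int = 96) -> list:
    # Derive the short-lived identifiers broadcast over Bluetooth
    # (here, one per 15-minute slot of the day).
    return [hashlib.sha256(key + slot.to_bytes(2, "big")).digest()[:16]
            for slot in range(slots)]

# Alice's and Bob's phones exchange ephemeral IDs when they are close to each other.
alice_key = daily_key()
bob_heard = set(ephemeral_ids(alice_key))      # IDs Bob's phone overheard nearby

# If Alice later tests positive, only her daily key is published by the server.
published_keys = [alice_key]

# Bob's phone downloads the published keys, re-derives the IDs locally and
# checks for a match, so the matching happens on the device, not on a server.
exposed = any(eid in bob_heard
              for key in published_keys
              for eid in ephemeral_ids(key))
print("Possible exposure:", exposed)
```

The design choice worth noting is that raw contact data never leaves the phone: only the keys of users who test positive are shared, which is what makes this family of solutions less invasive than centralized, location-based tracking.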

What these solutions have in common, in any case, are complex and dangerous repercussions in terms of rights, privacy and security, issues that are all too easy to lose sight of if one looks at technology through excessively determinist or solutionist lenses, or from the standpoint of the "uncritical positivism" discussed above.

It is difficult to summarize the polyphonic Italian debate around "Immuni," the app chosen by the government for this purpose, but numerous elements indicate that, from the outset, attempts were made to portray privacy as an obstacle to the implementation of essential measures.

This was taken to the extreme in France, the first country to officially ask Google and Apple to loosen their privacy protections in order to facilitate the adoption of contact tracing apps. In Italy, "Immuni" will follow a Bluetooth-based, decentralized approach, certainly less invasive than other options that have been on the table of the various government task forces, but some interesting indications emerge from the debate that accompanied these decisions.

Although on paper this solution appears less invasive than others, several questions about its appropriateness remain open in this case too. Claudio "Nex" Guarnieri, one of the world's leading experts in information security, has commented on the various technical solutions put forward, recalling that not even Bluetooth offers guarantees in terms of effectiveness.

The social sciences, and various studies on journalism and surveillance, show how the "normalization" of surveillance is a recurrent phenomenon in public debates on the subject. Something similar could be felt in the Italian and European debate in the midst of the pandemic: the concerns of experts (both technical and legal) were often hastily dismissed as secondary issues, while a false dichotomy between privacy and the defence of public health was pushed, as if the former invariably obstructed the latter. In reality, as Yuval Noah Harari, the writer and author of the acclaimed Homo Deus (2017), has also argued in the Financial Times, framing the two issues as antithetical is incorrect: citizens should not be asked to choose between two fundamental rights that are by no means mutually exclusive.

The question to ask is how many and which rights we are, and will be, willing to give up, even only in part, and for what goals. An overly deterministic view of the potential of these technical solutions could also lead us to overestimate their actual capacity to help in this scenario.

Too often, moreover, the debate around privacy has been trivialized by dishonestly placing users' online habits, often frivolous ones, on the same level as a state-run public health monitoring programme. Privacy is not dead, as has been claimed in many quarters, and however much it has been partly eroded by the deeply problematic commercial exploitation under way on the web, this debate cannot be reduced to a matter of individual choices, to be waived with a click.

The other question that remains open, finally, is that of the return to normality: once the emergency is over, how do we make sure that the tracing technologies and control infrastructures designed for times of crisis are actually deactivated (and their data deleted)? The European Commission has spoken clearly on this point, issuing a set of recommendations and a toolbox and calling on member states to adopt a pan-European approach to the defence of privacy and data protection, together with shared and maximally decentralized technical standards.

The great absentee in the Italian scenario, however, remains a parliamentary debate on the issue, as is instead taking place, for example, in the Netherlands as we write. Such a debate would be necessary to ensure democratic oversight, accountability and respect for basic democratic norms and values in such delicate choices.

The antibodies

But how do we fight these four insidious enemies? Unfortunately, the solution is neither simple nor immediate. And there is no vaccine, nor will there ever be one, capable of magically immunizing the community against uncritical positivism, information disorder, digital vigilantism and the normalization of surveillance. We can, however, work on the antibodies and make sure that they spread as widely as possible in our communities. The data society needs critical and aware users who know how to use and contextualize both digital and statistical tools, who understand the risks invariably associated with them but can also harness their potential benefits, and who can help the less digitized segments of the population navigate their digital presence.

In this process, a central role is played by so-called "data literacy," that is, digital literacy extended to the data society. Such literacy must take into account the question of citizenship in the era of big data and artificial intelligence, and it must enable us to make informed choices about the contours of our action on the web, including the complex considerations around the protection of personal data.

It must help us distinguish between sources of information and find our way among the content-personalization algorithms that constrain our free action on the web. The challenge is open but also particularly urgent, given that Italy ranks last among the 34 OECD (Organisation for Economic Co-operation and Development) countries for digital literacy. A recent (2019) OECD study found that only 36% of Italians are capable of making "complex and diversified use of the internet," which creates fertile ground for the four enemies we have identified.

The world of education certainly has a key role to play, pairing a renewed education in web citizenship with the much-neglected civic education. This requires serious training for teachers, but also dedicated funds for tools, infrastructure and preparation. It is, however, a medium- to long-term project, one that can hardly be implemented during the pandemic. The point not to lose sight of is that the "post-Coronavirus" world is being built right now, in the vortex of the pandemic.

The choices made today will inevitably shape the future scenarios of the data society. More than ever, these choices must be guided by an inclusive, transparent and honest approach, so that we do not end up in a future dominated by technological "black boxes" that are opaque, discriminatory and potentially anti-democratic.

The authors

Philip Di Salvo is a post-doctoral researcher and lecturer at the Institute of Media and Journalism of the Università della Svizzera italiana (USI) in Lugano. His work focuses on leaks, investigative journalism and Internet surveillance. "Leaks. Whistleblowing e hacking nell'età senza segreti" (LUISS University Press) is his latest book.

Stefania Milan is Associate Professor of New Media and Digital Culture at the University of Amsterdam, where she teaches courses on data journalism and digital activism and leads the DATACTIVE research project, funded by the European Research Council (Horizon2020, Grant Agreement no. 639379).

[blog] The true cost of human rights witnessing

Author: Alexandra Elliott – Header image: Troll Patrol India, Amnesty Decoders

Witnessing is widely accepted as an established element of enforcing justice, and the recent increase in the accessibility of big data is revolutionizing this process. Data witnessing can now be conducted by remote actors using digital tools to code large amounts of information, a process exemplified by Amnesty International's Amnesty Decoders. Gray presents an account of the Amnesty Decoders initiative and provides examples of its cases, such as "Decode Darfur" (977), in which volunteers successfully identified the destruction of villages during war by comparing before-and-after satellite imagery. A critical, yet under-discussed consequence of this type of work is the significant mental toll of engaging with so much confronting material. The nature of human rights exposés means witnesses are working with disturbing imagery, often depicting violence and devastation, which can lead to secondary trauma and must be managed accordingly.

This blog post should be read as an overview of completed research into the mental health effects of data witnessing and the initiatives that should be put in place to mitigate them. It concludes by highlighting Berkeley's Investigations Lab as an example of the effective implementation of protective measures in human rights research. The text below presents, however, only the tip of the iceberg of detailed scholarship, and I recommend turning to the Human Rights Resilience Project for a more thorough inventory.

The Human Rights Resilience Project is an "interdisciplinary research initiative […] working to document, awareness-raising, and the development of culturally-sensitive training programs to promote well-being and resilience among human rights workers" ("Human Rights Resilience Project – NYU School Of Law – CHRGJ"). Whilst not undertaking any human rights witnessing itself, it functions as a toolbox for those who do. It provides an excellent example of bringing the issue to the forefront of discourse, advocating for the psychological risks of human rights witnessing to receive the attention their severity demands, so that both workers and institutions can prepare and manage accordingly.

Data Witnessing and Mental Health

We have reached a point in the research at which the correlation between declining mental health and exposure to confronting material in data witnessing work is undeniable. There is a large body of published work evidencing the harmful impact of this exposure on mental wellbeing within the human rights sector.

Dubberley, Griffin and Mert Bal's research provides a clear overview of "the impact that viewing traumatic eyewitness media has upon the mental health of staff working for news, human rights and humanitarian organisations" (4). They introduce the notion of a "digital frontline" (5), as online data witnessing relocates the confrontation with graphic, disturbing material, previously encountered exclusively in the physical field, to an office desk far removed from the scene of the crime. 55% of the humanitarian workers and data witnesses observed in the research viewed distressing eyewitness media at least weekly. Carried along with this shift is the psychological impact of engaging with disturbing content: workers "developed a negative view of the world, feel isolated, experience flashbacks, nightmares and stress-related medical conditions" (5).

Over the past few years, a range of similar research has been undertaken, of which I present merely a selection, all confirming a correlation between human rights witnessing and poorer mental health. Knuckey, Satterthwaite, and Brown list human rights work practices that can contribute to deteriorating mental states: trauma exposure, a sensation of hopelessness, high standards and self-criticism, and inflexibility towards coping mechanisms. Similarly, Reiter and Koenig discuss the impact of humanitarian research on workers' mental health. Flores Morales et al. conducted a study of human rights defenders and journalists in Mexico who are consistently exposed to traumatic content in their work; they detected strong secondary traumatic stress symptoms amongst 36.4% of participants. Finally, in one of the earlier investigations into the concern, Joscelyne et al. surveyed international human rights workers to determine the consequences their work had on their psychological wellbeing. The results showed participant levels of 19.4% for PTSD and 18.8% for subthreshold PTSD, while depression was present amongst 14.7% of workers surveyed. Shockingly, these proportions are very similar to those observed amongst combat veterans, reiterating the severity of the matter and emphasising the need for action.

A Call to Action

Several strands of the literature on the relationship between data witnessing and mental health focus on the initiatives currently adopted by organisations to identify, prevent and counteract trauma and depression amongst researchers, or propose new, potentially effective strategies.

Satterthwaite et al. offer an example of a study that aims to map established techniques for recognizing and reacting to mental health concerns within human rights work. They ultimately conclude that current organisational action is weak, and suggest targeted training programmes and further academic discourse. Observations of negligence appear to be a trend, with Dubberley et al. also reporting a lack of protective processes amongst the majority of organisations studied. In what is dubbed a "tough up or get out" culture (7), humanitarian organisations deny proper recognition of the effects of trauma upon their researchers and thus offer no support or compensation. Additionally, new employees are not told how graphic their daily work material will be and are consequently ill-prepared.

Acknowledging this gap in current support structures, academics have sought to develop strategies for detecting, preventing and reducing declining mental health amongst data witnesses. For instance, Reiter and Koenig's "Challenges and Strategies for Researching Trauma" describes protective techniques that aim to strengthen resilience, e.g. explicitly acknowledging the psychological consequences and subsequently fostering a supportive workplace community.

Academics also urge the need for tools for self-care. Distinct from the pampering sessions and beauty treatments commonly associated with the term, here self-care practices are put to use to strengthen mental health. Pyles (2018) promotes self-care within the work of data witnessing for its ability to "cultivate the conditions that might allow them to feel more connected to themselves, their clients, colleagues and communities" (xix). This sense of community and grounding within a greater environment is important to counteract any feelings of isolation. Kanter and Sherman likewise encourage human rights organisations to adopt a "culture of self-care" to mitigate the risk of mental burnout, and Pigni's book "The Idealist's Survival Kit" was written to provide human rights researchers and witnesses with an arsenal of 75 self-care techniques.

As mentioned by Satterthwaite et al., it is important to acknowledge that the lack of mitigating practices in place may well be due to a lack of funding rather than an act of negligence. Dependency on external fundraisers introduces a complex network in which responsibility is distributed amongst a range of actors with varying motivations.

Berkeley: Leading by Example

The tendency for human rights organisations to neglect their workers' mental wellbeing is fortunately not universal. There are instances of organisations hiring counselors and enforcing regular breaks and rotations (Dubberley et al.); one standout initiative is that of UC Berkeley's Human Rights Center Investigations Lab.

Following a similar format to the Amnesty Decoders, workers at the Investigations Lab "use social media and other publicly available, internet-based sources to develop evidence for advocacy and legal accountability" ("HRC Investigations Lab | Human Rights Center"). What sets the Lab apart is its dedication to "resiliency resources", a programme of training and tools aiming to support the witnesses' wellbeing. Upon orientation to the Lab, workers receive resiliency training with small practical tips for avoiding secondary trauma, for example to "use post-its to block out graphic material when viewing a video repeatedly" ("Resiliency Resources | Human Rights Center"). Additionally, they are encouraged to check in regularly with an allocated resiliency manager.

Concluding Thoughts

The material human rights witnesses engage with is horrific, and the protection of their mental health must be prioritized by the institutions for which they work. However, it is also important to remember the necessity of their work in detecting human rights violations and war crimes. The role of data witnessing is admirable and cannot simply be abandoned. The way forward, therefore, is for human rights institutions to guarantee a support network of education, tools and community, so that witnesses can continue to strengthen humanitarian action without detrimental personal consequences.

About the author

Alexandra grew up in Sydney, Australia before moving to England to complete her Bachelor's degree at Warwick University. She is currently undertaking a Research Master's in Media Studies at the University of Amsterdam. It is through this course that she became involved with the Good Data tutorial and the DATACTIVE project.

References

Dubberley, Sam, Elizabeth Griffin, and Haluk Mert Bal. “Making secondary trauma a primary issue: A study of eyewitness media and vicarious trauma on the digital frontline.” Eyewitness Media Hub (2015).

Flores Morales, Rogelio et al. “Estrés Traumático Secundario (ETS) En Periodistas Mexicanos Y Defensores De Derechos Humanos”. Summa Psicológica, vol 13, no. 1, 2016, pp. 101-111. Summa Psicologica UST, doi:10.18774/448x.2016.13.290.

Gray, Jonathan. “Data Witnessing: Attending To Injustice With Data In Amnesty International’s Decoders Project”. Information, Communication & Society, vol 22, no. 7, 2019, pp. 971-991. Informa UK Limited, doi:10.1080/1369118x.2019.1573915.

“HRC Investigations Lab | Human Rights Center”. Humanrights.Berkeley.Edu, https://humanrights.berkeley.edu/students/hrc-investigations-lab.

“Human Rights Resilience Project – NYU School Of Law – CHRGJ”. Chrgj.Org, https://chrgj.org/focus-areas/human-rights-resilience-project/.

Joscelyne, Amy et al. “Mental Health Functioning In The Human Rights Field: Findings From An International Internet-Based Survey”. PLOS ONE, vol 10, no. 12, 2015, p. e0145188. Public Library Of Science (Plos), doi:10.1371/journal.pone.0145188.

Kanter, Beth, and Aliza Sherman. “Updating The Nonprofit Work Ethic”. Stanford Social Innovation Review, 2016, https://ssir.org/articles/entry/updating_the_nonprofit_work_ethic?utm_source=Enews&utm_medium=Email&utm_campaign=SSIR_Now&utm_content=Title

Knuckey, Sarah, Margaret Satterthwaite, and Adam Brown. “Trauma, depression, and burnout in the human rights field: Identifying barriers and pathways to resilient advocacy.” HRLR Online 2 (2018): 267.

Pigni, Alessandra. The Idealist’s Survival Kit: 75 Simple Ways to Avoid Burnout. Parallax Press, 2016.

Pyles, Loretta. Healing justice: Holistic self-care for change makers. Oxford University Press, 2018.

Reiter, Keramet, and Alexa Koenig. “Reiter And Koenig On Researching Trauma”. Www.Palgrave.Com, 2017, https://www.palgrave.com/gp/blogs/social-sciences/reiter-and-koenig-on-researching-trauma.

“Resiliency Resources | Human Rights Center”. Humanrights.Berkeley.Edu, https://humanrights.berkeley.edu/programs-projects/tech-human-rights-program/investigations-lab/resiliency-resources.

Satterthwaite, Margaret, et al. “From a Culture of Unwellness to Sustainable Advocacy: Organizational Responses to Mental Health Risks in the Human Rights Field.” S. Cal. Rev. L. & Soc. Just. 28 (2019): 443.

Image References

Berkeley. “Human Rights Investigations Lab: Where Facts Matter”. Human Rights Centre, https://humanrights.berkeley.edu/programs-projects/tech/investigations-lab.

Perpetual Media Group. “14 Things Marketers Should Never Do On Twitter”. Perpetual Media Group, https://www.perpetualmediagroup.ca/14-things-marketers-should-never-do-on-twitter/.

[BigDataSur] A widening data divide: COVID-19 and the Global South

COVID-19 shows the need for a global alliance of experts who can fast-track the capacity building of developing countries in the business of counting.

Stefania Milan & Emiliano Treré

The COVID-19 pandemic is sweeping the world. First identified in mainland China in December 2019, it has rapidly reached the four corners of the globe, to the point that the only “corona-free” land is reportedly Antarctica. News reports globally are filled with numbers and figures of various kinds. We count the number of tests, we follow the rise in the total number of individuals who have tested positive for the virus, we mourn the dead as we scan the daily death toll. These numbers are deeply embedded in their socio-economic and political geography, because the virus follows distinct diffusion curves, but also because distinct countries and institutions count differently (and these distinct ways of counting are often not even made apparent). What is clear is that what gets counted exists, in both state policies and people’s imaginaries. Numbers affect our ability to care, to share empathy, and to donate to relief efforts and emergency services. Numbers are the condition of existence of the problem, and of a country or a given social reality on the global map of concerns. Yet most countries of the so-called Global South are virtually absent from this number-based narration of the pandemic. Why, and with what consequences?

Data availability and statistical capacity in developing countries

If numbers are the condition of existence of the COVID-19 problem, we ought to pay attention to the actual (in)ability of many countries in the South to test their population for the virus, and to produce reliable population statistics more generally, let alone to adequately care for them. It is a matter of a “data gap” as well as of data quality, which even in “normal” times hampers “evidence-based policy making, tracking progress and development, and increasing government accountability” (Chen et al., 2013). And while the World Health Organization issues warnings about the “dramatic situation” concerning the spread of COVID-19 in the African continent, to name just one of the blind spots in our datasets on the global pandemic, the World Economic Forum calls for “flattening the curve” in developing countries. Progress has been made since the revision of the United Nations’ Millennium Development Goals in 2005, with countries in the Global South invited (and supported) to devise National Strategies for the Development of Statistics. Yet a cursory look at the NYU GovLab’s valuable repository of “data collaboratives” addressing the COVID-19 pandemic reveals the virtual absence of data collection and monitoring projects in the Southern hemisphere. The next obvious step is the dangerous equation “no data = no problem”.

Disease and “whiteness”

Epidemiology and pharmacogenetics (i.e. the study of the genetic basis of how people respond to pharmaceuticals), to name but two of the life sciences concerned, are largely based on the “inclusion of white/Caucasians in studies and the exclusion of other ethnic groups” (Tutton, 2007). In other words, models of disease evolution and the related solutions are based on datasets that take into account primarily, and in fact almost exclusively, the Caucasian population. This is a known problem in the field, which derives from the “assumption that a Black person could be thought of as being White”, dismissing specificities and differences. This problem has been linked to the “lack of social theory development, due mainly to the reluctance of epidemiologists to think about social mechanisms (e.g., racial exploitation)” (Muntaner, 1999, p. 121). While COVID-19 represents a slight variation on this trend, having been first identified in China, the problem remains on a large scale, and in times of a health emergency as global as this one it risks being reinforced and perpetuated.

A lucrative market for the industry

In the absence of national testing capacity, the developing world might fall prey to the booming industry of genetic and disease testing, on the one hand, and of telecom-enabled population monitoring on the other. Private companies might be able to fill the gap left by the state, mapping populations at risk, while monetizing their data in the process. The case of 23andMe is symptomatic of this rise of industry-led testing, which constitutes a double-edged sword. On the one hand, private actors might supply key services that resource-poor or failing states are unable to provide. On the other hand, the distorted and often hidden agendas of profit-led players reveal the shortcomings and dangers of this arrangement. If we look at the telecom industry, we note how it has contributed to tracking disease propagation in a number of health emergencies, such as Ebola. And while the global open data community has called for smoother data exchange between the private and the public sector to collectively address the spread of the virus, in the absence of adequate regulatory frameworks in the Global South, for example in the field of privacy and data retention, local authorities might fall prey to outside interventions of a dubious nature.

The populism and racism factors

Lack of reliable numbers to accurately portray the COVID-19 pandemic as it spreads to the Southern hemisphere also offers fertile ground for distorted and malicious narratives mobilized for political reasons. To name just one, it allows populist leaders like Brazil’s Jair Bolsonaro to announce the “return to normality” in the country, dismissing the harsh reality as collective “hysteria”. In Italy, the ‘fake news’ that migrant populations of African origin would be “immune” to the disease swept social media, unleashing racist comments and anti-migrant calls for action. While the same rumor has reportedly been circulating in the African continent as well, and populism has also been hitting Western democracies hard, it might have more dramatic consequences in the more populous countries of the South. In Mexico, left-wing populist president Andrés Manuel López Obrador responded to the coronavirus emergency by insisting that Mexicans should “keep living life as usual”. He did not stop his tour in the south of the country and frequently contradicted the advice of public health officials, systematically ignoring social distancing by touching, hugging and kissing his supporters, and going as far as to consider the pandemic a plot to derail his presidency. These dangerous comments, assumptions and attitudes are a byproduct of the lack of reliable data and testing that we highlight in this article.

The risk of universalising the problem

Luckily, long experience and hard-won familiarity with coping with disasters, catastrophes and emergencies have also prompted various countries in the Global South to deploy effective containment measures more quickly than many countries in the Global North.

In the absence of reliable data from the South, however, modeling the diffusion of the disease may be difficult. The temptation will likely be to “import” models and “appropriate” predictions from other countries and socio-economic realities, and then base domestic measures and policies on them. “Universalizing” the problem as well as the solutions, as we warned in a 2019 article, is tempting, especially in these times of global uncertainty. Universalizing entails erroneously thinking that the problem manifests itself in exactly the same manner everywhere, disregarding local features and dismissing “other” approaches. Coupled with the “whiteness” observed earlier, this gives rise to an explosive cocktail that is likely to create more problems than it solves.
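To see why “importing” epidemiological parameters can be misleading, consider a deliberately simplified, hypothetical sketch of a discrete-time SIR model (susceptible, infectious, recovered). The parameter values below are invented for illustration only and do not describe any real country; the point is merely that projections are highly sensitive to locally estimated contact patterns.

```python
# Toy SIR model: illustrative only, with hypothetical parameter values.
def sir(beta, gamma, population, infected0, days):
    """Return the daily number of infectious people in a discrete-time SIR model."""
    s, i, r = population - infected0, float(infected0), 0.0
    curve = []
    for _ in range(days):
        new_infections = beta * s * i / population   # contacts between S and I
        new_recoveries = gamma * i                   # I moving to R
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        curve.append(i)
    return curve

population = 10_000_000
# A transmission rate "imported" from elsewhere vs. a (hypothetical) locally estimated one.
imported = sir(beta=0.30, gamma=0.10, population=population, infected0=100, days=180)
local    = sir(beta=0.22, gamma=0.10, population=population, infected0=100, days=180)

print(f"Peak infectious (imported parameters): {max(imported):,.0f}")
print(f"Peak infectious (local parameters):    {max(local):,.0f}")
```

Even in this toy example the two projected peaks differ substantially, which is why borrowing curves estimated on very different societies, without local data to calibrate them, can badly misinform domestic policy.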

Beyond the blind spot? 

While many have enough to worry about “at home”, the largest portion of the world population today resides in the so-called Global South, with all the very concrete challenges that this entails. For instance, for a good portion of the 1.3 billion Indian citizens now on lockdown, staying at home might mean starving. How can the global community (open data experts, researchers, life science scholars, digital rights activists, to name but a few) contribute to “fixing” the widening data divide that risks severely weakening any local effort to curb the expansion of COVID-19 among populations that are often already at the margins? We argue that the issue at stake here is not simply whether we pump in the much-needed resources or how we collaborate, but also where we turn our gaze, in other words, where we decide to look. COVID-19 will likely make apparent the need for a global alliance of experts of various kinds who, jointly with civil society organizations, can fast-track the capacity building of developing countries in the business of counting.

This article has been published simultaneously on the Big Data from the South blog and on Open Movements / Open Democracy.

Cover image credits: Martin Sanchez on Unsplash

Acknowledgements. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 639379-DATACTIVE; https://data-activism.net).