May 2, 2022

[BigDataSur] Open Data Sovereignty? Lessons from Hong Kong

Rolien Hoyng and Sherry Shek discuss the multiple relations between open data and data sovereignty, interrogating their compatibility.

by Rolien Hoyng and Sherry Shek

Open Data and Data Sovereignty both seem desirable principles in data politics. But are they compatible? For different reasons, these principles might be mutually exclusive in so-called open societies with free markets of information as well as in restrictive contexts such as China. Hong Kong, which combines traits of both societies, shows the promises and dangers of both principles and the need for a new vision beyond current laws and protocols.

What are Open Data and Data Sovereignty

Open Data refers to free and unrestrained access to data, to be used by anyone for whatever purpose. Open Data initiatives are usually adopted and led by governments who open up data they collect or generate related to census, economy, government budget and operation, among others.

Open Data initiatives have been touted to democratize information and enhance citizen participation. Individuals, communities, and intermediaries such as journalists, data activists, and civic hackers can produce critical insights on the basis of Open Data, to hold governments accountable, or self-organize to undertake community projects for their own betterment. This approach to Open Data reflects a democratic ideal. Individuals and communities possess political rights, which include control over their own data, as well as access to data about their communities, their government, and society at large. This is also how Data Sovereignty can be realized.

Open Data’s irreverence

But a more critical look shows that Open Data can also violate Data Sovereignty. The democratic ideal of Open Data does not easily realize itself. In actuality, Open Data initiatives often primarily serve economic strategies to stimulate innovation and data-driven industries, which becomes clear in the kinds of datasets that are released. The practice of Open Data also seems more problematic when placed in the context of indigenous peoples’ struggles. In Australia and elsewhere, colonial struggles have historically involved control over knowledge and information from and about tribes. Such struggles continue today in the age of data-driven innovation, which produces new forms of data extraction and discrimination especially at the social margins of current societies. Given that Open Data may aid or intensify such processes, it is not always, or only, innocent.

To observe the principle of Data Sovereignty, it is important that data is managed in a way that aligns with the laws, ethical sensibilities, and customs of a nation state, group, or tribe. Though the concern around data sharing, sales, and (re)use is commonly framed in terms of privacy, the principle of Data Sovereignty demands a more expansive view. Societies that do not question data usage in light of the rights of the people whom the data is about may be considered free markets for information, but they don’t offer Data Sovereignty.

Data politics in China

For China, Open Data is vital to its pursuit of global leadership in Artificial Intelligence (AI) and developing commercial and government platforms that function as the infrastructures of everyday life. But the country also has recently made global news with path-breaking laws constraining the unlimited use of data and development of AI, which seem in some respects to exceed the EU’s privacy-oriented GDPR and enhance Data Sovereignty. So far, however, Data Sovereignty in China has been interpreted foremost in a statist way: the state rather than the individual or the community controls usage of data.

The tension between Open Data and Data Sovereignty is reconciled in China by formal procedures that create room for withholding data from the public in the name of protecting sensitive information and state secrets. For instance, according to the definition of Open Data by the Open Knowledge Foundation, all datasets should be provided under the same “open” license, allowing individuals to use the datasets with the least restrictions possible. But in the Chinese context, user registration is required for access to certain datasets. Providing differentiated access to data is seen by local experts as a preferable and advanced solution to the security concerns that Open Data brings to bear. For instance, Shanghai’s recently introduced Data Regulation categorizes certain public data to only be “conditionally open”, including those that require high data security and processing capabilities, are time-sensitive, or need continuous access. Worries remain though because seemingly “innocent” data such as average income levels can always be repurposed and rearticulated, and hence become a threat.

Data politics in Hong Kong

During the 2010s, rendering datasets openly available was key to the endeavor to transform Hong Kong into Asia’s “smart” world city, serving the goal of building “a world-famed Smart Hong Kong characterized by a strong economy and high quality of living”, as the Hong Kong government framed it. But the smart-city turn also gave new momentum to old struggles over data and information. Right before the 1997 Handover of Hong Kong from Britain to mainland China, the colonial regime resisted calls for a Freedom of Information Bill, and until today there is no law in Hong Kong to provide the public the legal right to access government-held information. With the government’s turn to Open Data, data advocacy flourished and the struggle for access to information found new means.

The momentum did not last in the face of new political struggles. In her election manifesto in 2017, Carrie Lam stated that she held a positive view of the Archives Law and would follow up upon taking office. She also promised “a new style of governance,” making government information and data more transparent for the sake of social policy research and public participation. In 2018, Lam’s Policy Address announced a review of the Code on Access to Information, which provides a non-legally binding framework for public access to government-held information. However, since the turbulent events of 2019-2020, there has been no mention of Open Data, the Archives Law, or the Code on Access to Information in addresses by Lam. The Law Reform Commission Sub-committee has yet to come forward with a follow-up on its 2018 consultation paper.

Neither open nor closed

Reconciling an Open Data regime that supports data-driven economic development and a Data Sovereignty regime that is increasingly statist, data in Hong Kong is neither truly “open” nor “restricted.” A systematic and codified approach to Data Sovereignty along the lines of mainland China’s is lacking. But some recent events suggest a more ad hoc approach to incidents in which otherwise mundane data suddenly turn out to be politically sensitive. For instance, the journalist Choy Yuk-ling was arrested in November 2020 and later found guilty of making false statements to obtain vehicle ownership records. She collected the data for an investigative journalism project related to the gang violence that unfolded in the district of Yuen Long on the night of July 21, 2019, and that targeted protesters and members of the public.

The application for access to vehicle records asks requesters to state their purpose in applying for a Certificate of Particulars of Motor Vehicle. It used to be the case that, next to legal proceedings and sale and purchase of vehicles, there was a third option, namely “other, please specify,” though the fine print already restricted usage to traffic and transportation matters. But since October 2019, this third option has changed, now explicitly excluding anything other than traffic and transport related matters. Such checkbox politics indicate that seemingly mundane data can suddenly become the center of controversy.

Other seemingly minor, bureaucratic changes likewise seek to affix data and constrain their free use: the Company and Land Registry has started to request formal identification of data requesters, something that members of the press say puts journalists at risk.

These changes suggest that while a full-fledged strategy serving statist Data Sovereignty remains absent in Hong Kong and the stated reason for new restrictions is personal privacy instead, the use of data is not entirely free and open either. Interestingly though, recent political and legal developments in Hong Kong have so far not prevented the city’s climb in the ranks of the Open Data Inventory, conducted by Open Data Watch. In the past year, Hong Kong moved from 14th place to 12th, while mainland China fell to 155th. Yet some civic Open Data advocates have drawn their own conclusions after the implementation of the National Security Law in 2020. The organizer of a now disbanded civic data group argued that the law draws an invisible red line and practitioners can’t bear the risk that their use of data is found illegal.

Envisioning Open Data Sovereignty

As a place at the crossroads of diverse geopolitical and technological influences, Hong Kong offers a critical lens on the data politics of both so-called open societies and controlling ones. The unrestricted availability of data as customary in open societies that are pushing for smart cities and data-driven industries can undermine Data Sovereignty by ignoring the rights of individuals and communities to whom the data pertain. But restrictions or conditions regarding data usage enacted in the name of Data Sovereignty can hinder freedoms, too. While all too obvious in the case of regimes abiding by statist Data Sovereignty, the tension between the latter principle and Open Data runs deeper. After all, the potential of data to be repurposed, recontextualized, and rearticulated always implies both threat and possibility. Current Open Data rankings and Data Sovereignty laws alike are insufficient to guide us through this conundrum. More critical nuance and imagination is needed to conceive something as elusive as Open Data Sovereignty.

About the authors

Rolien Hoyng is an Assistant Professor in the School of Journalism and Communication at The Chinese University of Hong Kong. Her work is primarily situated in Hong Kong and Istanbul and addresses digital infrastructures, technological practices, and urban and ecological politics.

Sherry Shek is a graduate student in the School of Journalism and Communication at The Chinese University of Hong Kong. Her research addresses global internet governance and China. She has previously worked for the Internet Society of Hong Kong and specialized in Open Data.

September 16, 2021

Art as data-activism

The second day of the DATACTIVE closing workshop, hosted online by SPUI25, focused on artistic responses to datafication and mass data collection. The DATACTIVE team has interviewed many civil society actors from the field of digital rights, privacy, and technology activism. Artists take part in this field, but often they don’t figure as the core actors in what is being highlighted as data activism. In this event, we wanted to stage artistic interventions in particular, in order to tease out what artists do and can do, and what insights and further questions they generate. We spoke with Karla Zavala and Adriaan Odendaal from The Internet Teapot Design Studio, Manu Luksch, and Viola van Alphen about their work and ideas:

The Internet Teapot Design Studio is a Rotterdam-based collaboration that focuses on speculative and critical design projects and research. Karla and Adriaan explained how their work is based on the idea that ephemeral data processes have material effects in the world, and that it is necessary to focus on the conditions of their production. To bring this focus to their audiences, Karla and Adriaan organize co-creation workshops. In these workshops they aim to create counter-discourses, critical practices, and algorithmic literacy. Part of their approach is working with so-called ‘political action cards’, a way to design pathways through the datafied society. In this way, they stimulate creative responses and make people aware of processes of datafication and, for instance, machine learning. One example of such a creative response is participants writing a diary entry from the perspective of a biased machine vision system. By taking the position of the machine, they imagine processes such as inputs, black boxes, and outputs. Through their workshops, their audience engages with major conceptual themes such as Digital Citizenry, Surveillance Capitalism, and Digital Feminism.

Manu Luksch is an intermedia artist and filmmaker who interrogates conceptions of progress and scrutinises the effects of network technologies on social relations, urban space, and political structures. She talked about her work on predictive analysis and how, through her work, she tries to involve publics in matters of algorithmic decision making. She showed us a part of ‘Algo-Rhythm’, a film that “scrutinizes the limitations, errors and abuses of algorithmic representations”. The film, which was shot in Dakar in collaboration with leading Senegalese hip-hop performers, addresses practices of political microtargeting. As she explained, the film is an example of how she frames her findings in a speculative narrative on the basis of observations and analyses. The film has been translated into eight languages and has been included in curricula across schools in Germany, which shows how her work finds a place outside of the more classic art settings and operates as a societal intervention.

Viola van Alphen is an activist, writer, and the creative director and curator of Manifestations, an annual Art & Tech festival in Eindhoven, the Netherlands. Viola showed us trailers of Manifestations, and explained how ‘fun’ is an important element in getting the message across at an art and technology festival. She provided many examples of how artists try to materialize datafication and concerns around the digital economy. Some of these examples included a baby outfit with an integrated smartphone, data poker games, and candy machines that give candies in return for personal data. She also told us about her experiences of hosting the exhibition online in virtual worlds, and how artists typically managed to push the boundaries of the platform and get kicked off it. This, in turn, exposed ‘the rule of platforms’, but also showed how artists found countermeasures via alternative self-hosted and decentralized servers. Other examples included 3D printed face masks that would confuse the Instagram facial recognition system, and a film that disclosed how corporations, including ones that sponsored the exhibition, take part in the weapons industry. For her, artists are important in making complex issues about datafication simple. They can boil them down to a key problem and make that sensible.

In the discussion, we touched upon a variety of issues. In our DATACTIVE workshop, we talked about whether the context of datafication has changed over the last 5-6 years. This question is important to us, because the project started in the wake of the Snowden disclosures, when questions about mass data collection and security were relatively new to the larger audience. The Internet Teapot Design Studio addressed how the practices of data tracing and identification are seemingly much more present in the public domain now. Adriaan mentioned how tactics of ‘gaming the system’ are present on social media, and not only amongst the typical tech activists. According to him, algorithmic awareness has become more a part of public discourse, as shown by Instagram influencers talking about gaming the algorithms. Karla added how, during social protests in Colombia, tips were being shared about how beauty filters can be repurposed to prevent online facial recognition software from recognizing people. They find it interesting to see user-generated content emerging that is critical of algorithms.

In response to the question about societal change, Manu pointed to the fact that datafication also existed before the digital, and that for years, fears of being outpaced by technological competition hindered data regulation. She stated that it is an urgent task to remind ourselves that data is not immaterial, and that it is not some substrate that we sweat out. She commented that, when looking back, the notion of the ‘data shadow’, a concept that has been used to explain our ‘data profiles’, was maybe an attractive but an ‘unlucky choice’. Data is rather an extension that opens and closes doors. In other words, data has much more agency than being just a trace that we leave behind.

We also talked about whether the artists follow up with their audiences. All participants work on awareness raising. But are people really empowered in the longer term? According to Viola, who regularly ‘tests out’ ideas for her exhibition with neighbors and friends, it is important to break out of one’s bubble. Art can touch individual people in their heart, and they might remember single art projects for years, but one needs to invest in speaking a variety of languages. Amongst her visitors are professionals, kids, refugees, and corporate stakeholders. Sustaining awareness is both a continuous and customized process.

The Teapot Design Studio does see communities emerging that keep in touch via social media after workshops. The studio can function as a stepping stone for people to get familiar with the topic, after which they might hopefully become interested in bigger events such as Ars Electronica or Mozilla Fest.

We concluded the event with the following question: If you were looking forward to the future, what methods are needed? What approaches would you teach art students? The Teapot studio stated that one shouldn’t be intimidated by tech in a material way. And also: Digital media is not new: people need to work on understanding what the post-digital is and what its aesthetics are. Manu advises people to take their time to become data literate, develop their sense for values (including values and skills associated with analogue space and time), and never stop dreaming. Viola states that art projects need to be easy and digestible with only one headline. If people don’t understand it in one minute, they are off again.

There is much more to know. Watch the video of our event to hear Karla and Adriaan about what ‘teapots’ have to do with the internet, to understand how Manu has investigated the way legal regimes co-shape what is returned as ‘an image’ after doing FOIA requests in the context of CCTV surveillance, and to hear Viola reflect upon how robots can provide multi-sensory experiences and raise questions about war. The DATACTIVE team is looking forward to following the work of the speakers in the coming years. Some of the work discussed in the event is also accessible through our website.

The first day of DATACTIVE’s final event also featured a more condensed, albeit exciting, panel dedicated to the intersections between data / art / activism. Next to the artists already mentioned above, we also had the opportunity to have a peek at the work of Joana Moll, a Barcelona/Berlin based artist and researcher whose work critically explores the way techno-capitalist narratives affect the alphabetization of machines, humans and ecosystems. Stay tuned for more info on this event in an upcoming post!

April 30, 2021

Niels’ research featured in the New York Times

TL;DR: have a look at the piece in the New York Times that covers Niels’ work.

During the research Niels did for DATACTIVE, which culminated in his thesis and a recent paper in New Media & Society, he actively participated in the Internet Engineering Task Force (IETF). The IETF is one of the main standards and governance bodies of the Internet. While working there, Niels worked together with others, such as Mallory Knodel and Corinne Cath, on addressing exclusionary language in technical standards. An important part of that work was publishing this document, which sparked an extensive discussion in the IETF that to this day has not been resolved. You can read more about it in the New York Times piece.

February 16, 2021

BigBang Sprint at IETF110 Hackathon

When: March 1-3, 2021

The BigBang project will be working on improving its tool for mailing list analysis at the IETF 110 Hackathon.

BigBang is an open source research project that studies collaboration and contention in digital infrastructure projects and governance institutions. We do this by combining data science techniques with qualitative methods. For example, with BigBang you can analyze participation, affiliation, gender, and networks in the IETF, ICANN, RIPE, IEEE, or the 3GPP.

We very much welcome both technical and non-technical contributors! BigBang is built on the scientific Python stack, and we use Jupyter notebooks to make the analysis transparent and accessible.
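For readers curious what "analyzing participation and affiliation" on a mailing list can look like in practice, here is a minimal sketch using only the Python standard library. It does not use BigBang's own API; the messages, addresses, and function names are hypothetical and chosen purely for illustration, and using the email domain as a stand-in for affiliation is a rough assumption.

```python
from collections import Counter
from email import message_from_string
from email.utils import parseaddr

# Hypothetical messages standing in for a mailing-list archive.
RAW_MESSAGES = [
    "From: Alice Example <alice@example.org>\nSubject: Draft review\n\nLooks good.",
    "From: Bob Example <bob@example.net>\nSubject: Re: Draft review\n\nOne comment.",
    "From: Alice Example <alice@example.org>\nSubject: Re: Draft review\n\nThanks.",
]


def count_participation(raw_messages):
    """Count messages per sender address (lower-cased)."""
    counts = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        _, address = parseaddr(msg.get("From", ""))
        if address:
            counts[address.lower()] += 1
    return counts


def count_by_domain(counts):
    """Aggregate per-sender counts by email domain (a rough proxy for affiliation)."""
    domains = Counter()
    for address, n in counts.items():
        domains[address.rsplit("@", 1)[-1]] += n
    return domains


participation = count_participation(RAW_MESSAGES)
affiliation = count_by_domain(participation)
print(participation.most_common())
print(affiliation.most_common())
```

A real archive would be read from mbox files or a list server rather than from inline strings, and sender identities would need de-duplication across multiple addresses.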

To join the IETF 110 Hackathon, please register using the link from the Hackathon website. Registration is free!

We intend to work on (some of) the following issues during the hackathon:

– Integration and analysis of 3GPP and IEEE mailing lists
– Integration with the INDELab conversationkg tool
– Produce instructional videos
– Improve linking across datasets (such as the datatracker and mailing lists)
– Query/notebook design to support projects from research community
– Discussion of Star’s boundary object vs. Luhmann’s structural coupling
– The operationalization of _your_ research question!

The BigBang project will have a one-hour team meeting Friday February 26 – 9:00 ET / 14:00 GMT / 15:00 CET before the Hackathon which all are welcome to attend if they are curious about the project. You can join via this link: https://uva-live.zoom.us/s/6365963924

Please don’t hesitate to write Seb (sbenthall at gmail dot com) if you have any questions about the BigBang project or the IETF 110 sprint, or if you have suggestions for research questions!

October 16, 2020

DATACTIVE protests lack of ethical review in the UvA-Huawei collaboration

DATACTIVE, together with Bits of Freedom, the Data Justice Project and many individual scientists, signed the Funding Matters statement that protests the collaboration of a project at the University of Amsterdam and the Vrije Universiteit with Huawei. While collaboration with companies is not problematic per se, it is important that such collaborations undergo careful ethical scrutiny. Standards for structural reviews of the societal impact of such collaborations are currently not in place. Huawei has been accused of collaborating with the Chinese government in human rights violations against the Uyghur people as well as facilitating surveillance in Uganda.

August 6, 2020

[blogpost] Teaching Students to Question Mr. Robot: Working to Prevent Algorithmic Bias in Educational Artificial Intelligence

Author: Erinne Paisley

Introduction

With the onset of the COVID-19 pandemic, classrooms around the world have moved online. Students from K-12 through university level are turning to their computers to stay connected to teachers and continue their education. This move online raises questions about the appropriateness of technologies in the classroom and how close to a utopian “Mr. Robot” we can, or should, get. One of the most contested technological uses in the classroom is the adoption of Artificial Intelligence (AI) to teach.

AI in Education

AI includes many practices that process information in a way similar to how humans process information. Human intelligence is not one-dimensional and neither is AI, meaning AI includes many different techniques and addresses a multitude of tasks. Two of the main AI techniques that have been adapted into potential educational AI are automation and adaptive learning. Automation means computers are pre-programmed to complete tasks without the input of a human. Adaptive learning indicates that these automated systems can adjust themselves based on use and become more personalized.

The potential of combining these AI techniques into some type of robot teacher, or “Mr. Robot”, sounds like something out of a sci-fi cartoon, but it is already a reality for some. A combination of these AI techniques has already been used to assess students’ prior and ongoing learning levels, place students in appropriate subject levels, schedule classes, and individualize instruction. In the Mississippi Department of Education, in the United States, a shortage of teachers has been addressed through the use of an AI-powered online learning program called “Edgenuity”. This program automates lesson plans following the format of: warm-up, instruction, summary, assignment, and quiz.

A screenshot from an Edgenuity lesson plan.

Despite how utopian an AI-powered classroom may sound, there are some significant issues of inequality and social injustice that are supported by these technologies. In September 2019, the United Nations Educational, Scientific and Cultural Organization (UNESCO), along with a number of other organizations, hosted a conference titled “Where Does Artificial Intelligence Fit in the Classroom?” that explored these issues. One of the main concerns raised was algorithmic bias.

Algorithmic Bias in Education AI

The mainstream attitude towards AI is still one of faith: faith that these technologies are, as the name says, intelligent as well as objective. However, AI bias illustrates how these new technologies are very far from neutral. Joy Buolamwini explains that the biases, conscious or subconscious, present in those who create the code then become part of the digital systems themselves. This creates systems especially skewed against people of colour, women, and other minorities who are statistically less included in the process of creating this code, including AI code. For instance, the latest AI application pool for Stanford University in the United States was 71% male.

Joy Buolamwini’s TEDx talk on algorithmic bias.

In the educational sector, people of colour, girls, and other minorities are already marginalized. Because of this, there is the concern that AI in the classroom that has encoded biases would further these inequalities, for instance by trapping low-income and minority students in low-achievement tracks. This would create a cycle of poverty, supported by this educational framework, instead of having human teachers address students on an individual level and offer specialized support and attention to those facing adversity.

However, the educational field already has its own biases embedded in it, both within individual teachers and throughout the system more generally. Viewed in this way, the increased use of AI in classrooms offers an opportunity to reduce bias, if the technology is designed in a way that directly aims to address these issues. The work of designing AI that addresses these issues and aims to create progressive technologies has been taken on by teachers, librarians, students, and everyone in between.

Learning to Fight Algorithmic Bias

By including more voices and perspectives in the process of coding AI technologies, algorithmic bias can be prevented and, instead, a technological system that supports a socially just classroom can be built. In this final section, I will highlight two pre-existing educational projects aimed at teaching students of all ages to identify and fight algorithmic bias while creating technology that makes for a more equal classroom.

Algorithmic Bias Lesson Plans

The use of AI to create a more socially just educational system can start in the classroom, as Blakeley Payne showed when she ran a week-long ethics in AI course for 10 to 14-year-olds in 2019. The course included lessons on creating open-source code and on AI ethics, and ultimately taught students to both understand and fight against algorithmic bias. The lesson plans themselves are available for free online for any classroom to incorporate into their own lesson plans, even from home.

Students learn how to identify algorithmic bias during the one-week course.

Blakeley Payne’s one-week program focuses on ages 10-14 to encourage students to become interested in and passionate about issues of algorithmic bias, and the STEM field more broadly, from a young age. Students work on simple activities such as writing an algorithm for the “best peanut butter and jelly sandwich” in order to practice questioning algorithmic bias. This activity in particular has them question what “best” means. Does it mean best looking? Best tasting? Who decides what this means, and what are the implications of this?
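The point of the sandwich exercise can be sketched in a few lines of code: the same ranking procedure returns a different "best" sandwich depending on whose definition of "best" is encoded in the weights. The sandwiches, criteria, and scores below are entirely made up for illustration and are not taken from the course materials.

```python
# Hypothetical sandwiches scored 1-10 on three made-up criteria.
SANDWICHES = {
    "classic": {"taste": 8, "looks": 5, "cost": 9},
    "gourmet": {"taste": 9, "looks": 9, "cost": 3},
    "quick": {"taste": 5, "looks": 4, "cost": 10},
}


def best_sandwich(sandwiches, weights):
    """Return the sandwich name with the highest weighted score.

    `weights` encodes what "best" means; criteria not listed count as zero.
    """
    def score(name):
        return sum(weights.get(criterion, 0) * value
                   for criterion, value in sandwiches[name].items())

    return max(sandwiches, key=score)


# Three different designers, three different definitions of "best".
taste_first = best_sandwich(SANDWICHES, {"taste": 1})
budget_first = best_sandwich(SANDWICHES, {"cost": 1})
balanced = best_sandwich(SANDWICHES, {"taste": 1, "looks": 1, "cost": 1})

print(taste_first, budget_first, balanced)
```

Nothing in the algorithm itself is "biased" in a technical sense; the outcome changes only because a person chose the weights, which is exactly the question the activity asks students to notice.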

Non-profits such as Girls Who Code are also actively working to design lesson plans and activities for young audiences that teach critical thinking and design when it comes to algorithms, including those behind AI. The organization runs after-school clubs for girls in grades 3-12, college programs for alumni of the program, and summer intensives. Their programs focus technically on developing coding skills but also have a large focus on diversifying the STEM fields and creating equitable code.

Conclusion

The future of AI in the classroom is inevitable. This may not mean every teacher becomes robotic, but the use of AI and other technologies in the educational field is already happening. Although this raises concerns about algorithmic bias in the education system, it also creates more opportunities to re-think how technologies can be used to create a more socially just educational system. As we have seen through existing educational programs that teach algorithmic bias, even at the kindergarten age, interest in learning, questioning, and re-thinking algorithms can easily be nurtured. The answer to how we create a more socially just educational system through AI is simple: just ask the students.

About the Author

Erinne Paisley is a current Research Media Masters student at the University of Amsterdam and completed her BA at the University of Toronto in Peace, Conflict and Justice & Book and Media Studies. She is the author of three books on social media activism for youth with Orca Book Publishing.

June 2, 2020

[BigDataSur] Data activism in action: The gigantic size and impact of China’s fishing fleet revealed using big data analytics and algorithms

Author: Miren Gutiérrez

As we grapple with the COVID-19 pandemic, another crisis looms over our future: overfishing. Fishing fleets and unsustainable practices have been emptying the oceans of fish and filling them with plastic. Although other countries are also responsible for overfishing, China has a greater responsibility. Why is looking at the Chinese fleet important? China has almost 17,000 vessels capable of distant water fishing, as an investigative report published by the Overseas Development Institute, London, reveals for the first time.

As part of a team of researchers at the Overseas Development Institute, London, I had access to the world’s largest database of fishing vessels. Combining these data with satellite data from the automatic identification system (which indicates the vessels’ movements), we were able to observe their behaviour for two years (2017 and 2018). To do this, we employed big data analytical techniques, machine learning algorithms, and geographic information systems to describe the fleet and analyze how it behaved.
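The general idea of combining a vessel registry with position reports can be sketched as follows. This is not the report's actual pipeline: the vessel identifiers, coordinates, and the rectangular "home waters" box are hypothetical placeholders, and a real analysis would use proper exclusive economic zone polygons and a GIS library rather than a bounding box.

```python
from collections import defaultdict

# Hypothetical registry rows: vessel identifier -> basic attributes.
REGISTRY = {
    "V001": {"flag": "CHN", "type": "trawler"},
    "V002": {"flag": "CHN", "type": "longliner"},
}

# Hypothetical AIS-style position reports: (vessel_id, latitude, longitude).
AIS_POSITIONS = [
    ("V001", 30.0, 123.0),
    ("V001", 5.0, -20.0),
    ("V002", 31.0, 124.0),
    ("V999", 0.0, 0.0),  # not in the registry, so it is skipped
]

# Placeholder rectangle standing in for "home waters" (an assumption, not a real EEZ).
HOME_BOX = {"lat_min": 20.0, "lat_max": 40.0, "lon_min": 110.0, "lon_max": 130.0}


def in_home_box(lat, lon, box=HOME_BOX):
    """Return True if the position falls inside the placeholder rectangle."""
    return (box["lat_min"] <= lat <= box["lat_max"]
            and box["lon_min"] <= lon <= box["lon_max"])


def distant_water_sightings(registry, positions):
    """Count, per registered vessel, the position reports outside the home box."""
    sightings = defaultdict(int)
    for vessel_id, lat, lon in positions:
        if vessel_id not in registry:
            continue  # keep only vessels that can be matched to the registry
        if not in_home_box(lat, lon):
            sightings[vessel_id] += 1
    return dict(sightings)


distant = distant_water_sightings(REGISTRY, AIS_POSITIONS)
print(distant)
```

Matching on a shared vessel identifier is the simplest possible join; in practice, identifiers can be missing or inconsistent across sources, which is part of what makes this kind of analysis difficult.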

The first thing we noticed is that China’s fishing fleet is five to eight times larger than any previous estimate. We identified a total of 16,966 Chinese fishing vessels able to fish in “distant waters”, that is, outside China’s exclusive economic zone, including some 12,500 vessels observed outside Chinese waters during the same period.

Why is this important? If China’s distant-water fishing (DWF) fleet is five to eight times larger than previous estimates, its impacts are inevitably more significant than previously estimated. This matters for two reasons. First, millions of people in coastal areas of developing countries depend on fishery resources for their subsistence and food security. Second, given this extraordinary increase, it is difficult to monitor and control China’s fishing activities in distant waters.

The other thing that we observed is that the most frequent type of fishing vessel is the trawler. Most of these Chinese trawlers can practice bottom trawling, which is the most damaging fishing technique available. We identified some 1,800 Chinese trawlers, which is more than double what was previously thought.

Furthermore, only 148 Chinese ships were registered in countries commonly regarded as flags of convenience. This shows that the incentives to adopt flags of convenience are few given the relatively lax regulation of the Chinese authorities.

Finally, of the nearly 1,000 vessels registered outside of China, more than half have African flags, especially in West Africa, where law enforcement is limited and fishing rights are often restricted to vessels registered in the country, which explains why these Chinese ships have adopted local flags.

What can be said about the ownership of these fishing vessels? It is very complex. We analyzed a subsample of approximately 6,100 vessels to discover that only eight companies owned or operated more than 50 vessels each. That is, there are very few large Chinese fishing companies since small or medium-sized companies own most of them. However, this is only a facade, as many of these companies appear to be subsidiaries of larger corporations, suggesting some form of more centralized control. The lack of transparency hampers monitoring efforts and attempts to hold those responsible for malpractice accountable.

But another exciting facet of the ownership structure is that half of the 183 vessels suspected of involvement in illegal, unreported or unregulated fishing are owned by a handful of companies, and also that several of them are parastatal. This means that focusing on them could solve many problems because these companies own other ships.

There has been an extraordinary boom in Chinese fishing activities that is difficult to control. Chinese companies are free to operate and negotiate their access to fisheries in coastal states of developing countries without being monitored, especially in West Africa. This laxity contrasts with the policy of the European Union to reduce its fishing fleet and exercise greater control over its global operations.

This report is a data activist project that aims at redressing the unfair situation for nations, especially in West Africa, that cannot monitor and police their waters.

This is a version of an op-ed published in Spanish by eldiario.es.

About the author: Miren Gutiérrez is passionate about human rights, journalism and the environment (with a weakness for fish), and optimistic about what can be done with data-based research, knowledge and communication. Prof. at the University of Deusto and Research Associate at the Overseas Development Institute. Miren is Research Associate @DATACTIVE.

May 14, 2020

[BigDataSur] The Challenge of Decolonizing Big Data through Citizen Data Audits [3/3]

Author: Katherine Reilly, Simon Fraser University, School of Communication

Data Stewardship through Citizen Centered Data Audits

In my previous two posts (the first & the second), I talked about the nature of data audits, and how they might be applied by citizens. Audits, I explained, check whether people are carrying out practices according to established standards or criteria with the goal of ensuring effective use of resources. As citizens we have many tools available at our disposal to audit companies, but when we audit companies according to their criteria, then we risk losing sight of our own needs in the community. The question addressed by this post is how to do data audits from a citizen point of view.

Thinking about data as a resource is a first step in changing our perspective on data audits. Our current data regime is an extractive data regime. As I explained in my first post, in the current regime, governments accept the central audit criteria of businesses, and on top of this, they establish the minimal protections necessary to ensure a steady flow of personal data to those same corporate actors.

I would like to suggest that we rethink our data regime in terms of data stewardship. The term ‘stewardship’ is usually applied to the natural environment. A forest might be governed by a stewardship plan which lays out the rights and responsibilities of resource use. Stewardship implies a plan for the management of those resources, both so that they can be sustained, and also so that everyone can enjoy them.

If the raw material produced by the data forest is our personal information, then we are the trees, and we are being harvested. Our data stewardship regime is organized to support that process, and audits are the means to enforce it. The main beneficiaries of the current data stewardship regime are companies who harvest and process our data. Our own benefits – our right to walk through the forest and enjoy the birds, or our right to profit from the forest materially – are not contemplated in the current stewardship regime.

It is tempting to conclude that audits are to blame, but really, evaluation is an agnostic concept. What matters are the criteria – the standards to which we hold corporate actors. If we change the standards of the data regime, then we change the system. We can introduce principles of stewardship that reflect the needs of community members. To do this, we need to start from the audit criteria that represent the localized concerns of situated peoples.

To this end, I have started a new project in collaboration with 5 fellow data justice organizations in 5 countries in Latin America: Derechos Digitales in Chile, Karisma in Colombia, TEDIC in Paraguay, HiperDerecho in Peru and ObservaTIC in Uruguay. We will also enjoy the technical support of Sula Batsu in Costa Rica.

Our focus will be on identifying alternative starting points for data audits. We won’t start from the law, or the technology, or corporate policy. Instead, we will start from people’s lived experiences, and use these as a basis to establish criteria for auditing corporate use of personal data.

We will work with small groups who share a common identity and/or experience, and who are directly affected by corporate use of their personal data. For example, people with chronic health issues have a stake in how personal data, loyalty programs and platform delivery services mediate their relationship with pharmacies and pharmaceutical companies. The project will identify community collaborators who are interested in working with us to establish alternative criteria for evaluating those companies.

Our emerging methodology will use a funnel-like approach, starting from broad discussions about the nature of data, passing through explorations of personal practices and the role of data in them, and then landing on more specific and detailed explorations of specific moments or processes in which people share their personal data.

Once the group has learned something about the reality of data in their daily lives – and in particular the instances where data is of particular concern to them – we will facilitate group activities that help them identify their data needs, as well as the behaviors that would satisfy those needs. An example of a data need might be “I need to feel valued as a person and as a woman when I interact with the pharmacy.” A statement of how that need might be satisfied could be, for example, “I would feel more valued as a person and as a woman if the company changed its data collection categories.”

We are particularly interested to think through the application of community criteria to companies who have grown in power and influence during the Covid-19 pandemic. Companies like Instacart, SkipTheDishes, Rappi, Zoom, and Amazon are uniquely empowered to control urban distribution chains that affect the welfare of millions. What do community members require from these companies in terms of their data practices, and how would they fare against an audit based on those criteria?

We find inspiration for alternative audit criteria in data advocacy projects that have been covered by DATACTIVE’s Big Data from the South Blog. For example, the First Nations Information Governance Centre (FNIGC) of Canada has established the principles of ownership, control, access and possession for the management of First Nations data, and New Zealand has adopted Maori knowledge protocols for information systems used in primary health care provision (as reported by Anna Carlson). Meanwhile, the Mexican organization Controla tu Gobierno argues that we need to view data “less as a commodity – which is the narrative that constantly tries to make us understand data as the new oil – and more as a source of meaning” (Guillen Torres and Mayli Sepulveda, 2017).

From examples like these, and given the concept of data stewardship, we can begin to see that data is only as valuable as the criteria used to assess it, and so we urgently need alternative criteria that reflect the desires, needs and rights of communities.

How would corporate actors fare in an audit based on these alternative criteria? How would such a process reposition the value of data within the community? Who should carry out these evaluative processes, and how can they work together to create a more equitable data stewardship regime that better serves the needs of communities?

By answering these questions, we can move past creating data literate subjects for the existing data stewardship regime. Instead, we can open space for discussion about how we actually want our data resources to be used. In a recent Guardian piece, Hare argued that “The GDPR protects data. To protect people, we need a bill of rights, one that protects our civil liberties in the age of AI.” The content of that bill of rights requires careful contemplation. Citizen data audits allow us to think creatively about how data stewardship regimes can serve the needs of communities, and from there we can build out the legal frameworks to protect those rights.

About the author: Dr. Katherine Reilly is Associate Professor in the School of Communication at Simon Fraser University in Vancouver, Canada. She is the recipient of a SSHRC Partnership Grant and an International Development Research Centre grant to explore citizen data audit methodologies alongside Derechos Digitales in Chile, Fundacion Karisma in Colombia, Sula Batsu in Costa Rica, TEDIC in Paraguay, HiperDerecho in Peru, and ObservaTIC in Uruguay.

May 13, 2020

WomenonWeb censored in Spain as reported by Magma

Author: Vasilis Ververis

The Magma project just published new research on censorship concerning womenonweb.org, a non-profit organization providing support to women and pregnant people. The article describes how the major ISPs in Spain are blocking womenonweb.org’s website. Spanish ISPs have been blocking this website by means of DNS manipulation, TCP resets, and HTTP blocking that relies on Deep Packet Inspection (DPI) infrastructure. Our data analysis is based on network measurements from OONI data. This is the first time that we observe Women on Web being blocked in Spain.
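To give a sense of what an analysis of this kind can look like, here is a minimal sketch that tallies blocking verdicts per network. It is not Magma's pipeline, which the post does not describe. The field names `probe_asn` and `test_keys.blocking` are assumed to follow OONI's Web Connectivity measurement format, and the records are hand-written placeholders that use documentation-reserved AS numbers, not real measurements from Spain.

```python
from collections import Counter, defaultdict

# Hypothetical records shaped like OONI Web Connectivity measurements.
# The ASNs are reserved for documentation; none of this is real data.
measurements = [
    {"probe_asn": "AS64500", "test_keys": {"blocking": "dns"}},
    {"probe_asn": "AS64500", "test_keys": {"blocking": "dns"}},
    {"probe_asn": "AS64501", "test_keys": {"blocking": "http-failure"}},
    {"probe_asn": "AS64501", "test_keys": {"blocking": False}},
    {"probe_asn": "AS64502", "test_keys": {"blocking": None}},
]


def label_for(verdict):
    """Turn the raw verdict into a readable label."""
    if verdict is False:
        return "accessible"    # the test found no sign of blocking
    if verdict is None:
        return "inconclusive"  # the test could not reach a verdict
    return verdict             # e.g. "dns", "tcp_ip", "http-failure"


def summarize_blocking(records):
    """Count verdicts for each network (ASN)."""
    summary = defaultdict(Counter)
    for record in records:
        verdict = record.get("test_keys", {}).get("blocking")
        summary[record["probe_asn"]][label_for(verdict)] += 1
    return summary


summary = summarize_blocking(measurements)
for asn in sorted(summary):
    print(asn, dict(summary[asn]))
```

Grouping by network matters because different ISPs may use different blocking techniques, so a single country-wide count would hide which provider does what.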

About Magma: Magma aims to build a scalable, reproducible, standard methodology on measuring, documenting and circumventing internet censorship, information controls, internet blackouts and surveillance in a way that will be streamlined and used in practice by researchers, front-line activists, field-workers, human rights defenders, organizations and journalists.

About the author: Vasilis Ververis is a research associate with DATACTIVE and a practitioner of the principles ~ undo / rebuild ~ the current centralization model of the internet. Their research deals with internet censorship and investigation of collateral damage via information controls and surveillance. Some recent affiliations: Humboldt-Universität zu Berlin, Germany; Universidade Estadual do Piaui, Brazil; University Institute of Lisbon, Portugal.

May 7, 2020

[BigDataSur] The Challenge of Decolonizing Big Data through Citizen Data Audits [2/3]

A First Attempt at Citizen Data Audits

Author: Katherine Reilly, Simon Fraser University, School of Communication

In the first post in this series, I explained that audits are used to check whether people are carrying out practices according to established standards or criteria. They are meant to ensure effective use of resources. Corporations audit their internal processes to make sure that they comply with corporate policy, while governments audit corporations to make sure that they comply with the law.

There is no reason why citizens or watchdogs can’t carry out audits as well. In fact, data privacy laws include some interesting frameworks that can facilitate this type of work. In particular, the EU’s General Data Protection Regulation (GDPR) gives you the right to know how corporations are using your personal data, and also the ability to access the personal data that companies hold about you. This right is reproduced in the privacy legislation of many countries around the world from Canada and Chile to Costa Rica and Peru, to name just a few.

With this in mind, several years ago the Citizen Lab at the University of Toronto set up a website called Access My Info which helps people access the personal data that companies hold about them. Access My Info was set up as an experiment, so the site only includes a fixed roster of Canadian telecommunications companies, fitness trackers, and dating apps. It walks users through the process of submitting a personal data request to one of these companies, and then tracks whether the companies respond. The goal of this project was to crowdsource insights from citizens that would help researchers learn what companies know about their clients, how companies manage personal data, and who companies share data with. The results of this work have been used to advocate for changes to digital privacy laws.

Using this model as a starting point, in 2019, my team at SFU, and a team from the Peruvian digital rights advocate HiperDerecho, set up a website called SonMisDatos (Son Mis Datos translates as “It’s My Data”.) Son Mis Datos riffed on the open source platform developed by Access My Info, but made several important modifications. In particular, HiperDerecho’s Director, Miguel Morachimo, made the site database-driven so that it was easier to update the roster of corporate actors or their contact details. Miguel also decided to focus on companies that have a more direct material impact on the daily lives of Peruvians – such as gas stations, grocery stores and pharmacies. These companies have loyalty programs that are involved in collecting personal data about users.

Then we took things one step further. We used SonMisDatos to organize citizen data audits of Peruvian companies. HiperDerecho mobilized a team of people who work on digital rights in Peru, and we brought them together at two workshops. At the first workshop, we taught participants about their rights under Peru’s personal data protection laws, introduced SonMisDatos, and asked everyone to use the site to ask companies for access to their personal data. Companies need time to fulfill those requests, so then we waited for two months. At our second workshop, participants reported back on the results of their data requests, and then I shared a series of techniques for auditing companies on the basis of the personal data people had been able to access.

Our audit techniques explored the quality of the data provided, corporate compliance with data laws, how responsive companies were to data requests, the quality of their informed consent process, and several other factors. My favorite audit technique reflected a special feature of the data protection laws of Peru. In that country, companies are required to register databases of personal information with a state entity. The registry, which is published online, includes lists of companies, the titles of their databases, as well as the categories of data collected by each database. (The government does not collect the contents of the databases, it only registers their existence.)

With this information, our auditors were able to verify whether the data they got back from corporate actors was complete and accurate. In one case, the registry told us that a pharmaceutical company was collecting data about whether clients had children. However, in response to an access request, the company only provided lists of purchases organized by date, SKU number, quantity and price. Our auditors were really bothered by this discovery, because it suggested that the company was making inferences about clients without telling them. Participants wondered how the company was using these inferences, and whether it might affect pricing, customer experience, access to coupons, or the like.
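The comparison our auditors made by hand can be sketched in a few lines of code. This is only an illustration of the idea, not a tool we used in the workshops: the category names, the field names, and the mapping between them are all hypothetical, loosely modelled on the pharmacy case described above.

```python
# Categories the company declared in the public registry (hypothetical labels).
registered_categories = {"purchase history", "has children"}

# Fields the company actually returned after the access request, each mapped
# by the auditor to the registry category it belongs to (hypothetical mapping).
returned_fields = {
    "date": "purchase history",
    "sku number": "purchase history",
    "quantity": "purchase history",
    "price": "purchase history",
}


def undisclosed_categories(registered, returned):
    """Registry categories that no returned field accounts for."""
    covered = set(returned.values())
    return sorted(registered - covered)


missing = undisclosed_categories(registered_categories, returned_fields)
print("Registered but not returned:", missing)
```

Any category left in the result is a prompt for follow-up: either the company holds data it did not hand over, or it is inferring that category from other data.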

In another case, one of our auditors subscribed to DirecTV. To complete this process, he needed to provide his cell phone number plus his national ID number. He later realized that he had accidentally typed in the wrong ID number, because he began receiving cell phone spam addressed to another person. This was exciting, because it allowed us to learn which companies were buying personal data from DirecTV. It also demonstrated that DirecTV was doing a poor job of managing its customers’ privacy and security! However, during the audit we also looked back at DirecTV’s terms of service. We discovered that they were completely up front about their intention to sell personal information to advertisers. Our auditors were sheepish about not reading the terms of the deal, but they also felt it was wrong that they had no option but to accept these terms if they wanted to access the service.

On the basis of this experience, we wrote a guidebook that explains how to use Son Mis Datos, and how to carry out an audit on the basis of the ‘access’ provisions in personal data laws. The guide helps users think through questions like: Is the data complete, precise, unmodified, timely, accessible, machine-readable, non-discriminatory, and free? Has this company respected your data rights? What does the company’s response to your data request suggest about its data use and data management practices?

We learned a tonne from carrying out these audits! We know, for instance, that the more specific the request, the more data a company provides. If you ask a company for “all of the personal data you hold about me” you will get less data than if you ask for “all of my personal information, all of my IP data, all of my mousing behaviour data, all of my transaction data, etc.”

Our experiments with citizen data audits also allow us to make claims about how companies define the term “personal data.” Often companies define personal data very narrowly to mean registration information (name, address, phone number, identification number, etc.). This lies in extreme contrast to the academic definition of personal data, which is any information that can lead to the identification of an individual person. In the age of big data, that means pretty much any digital traces you produce while logged in. Observations like these allow us to open up larger discussions about corporate data use practices, which helps to build citizen data literacy.

However, we were disappointed to discover that our citizen data audits worked to validate a data regime that is organized around the expropriation of resources from our communities. In my first blog post I explained that the 5 criteria driving data audits are profitability, risk, consent, security and privacy.

Since our audit originated with the law, with technology, and with corporate practices, we ended up using the audit criteria established by businesses and governments to assess corporate data practices. And this meant that we were checking to see if they were using our personal and community resources according to policies and laws that drive an efficient expropriation of those very same resources!

The concept of privacy was particularly difficult to escape. The idea that personal data must be private has been ingrained into all of us, so much so that the notion of pooled data or community data falls outside the popular imagination.

As a result, we felt that our citizen data audits did other people’s data audit work for them. We became watchdogs in the service of government oversight offices. We became the backers of corporate efficiencies. I’ve got nothing personal against watchdogs — they do important work — but what if the laws and policies aren’t worth protecting?

We have struggled greatly with the question of how to generate a conversation that moves beyond established parameters, and that situates our work in the community. With this in mind, we’ve begun to explore alternative approaches to thinking about and carrying out citizen data audits. That’s the subject of the final post in this series.