Category: Uncategorized

“Public OSINT”: new Working Paper by Lonneke van der Velden

A new Working Paper in our Working Paper Series is out:

Van der Velden, Lonneke (2021). “Public OSINT: What is Open Source in Open Source Investigations?”, DATACTIVE Working Paper Series, No. 1/2021, ISSN: 2666-0733.



About the DATACTIVE working paper series
The DATACTIVE Working Paper Series presents results of the DATACTIVE research project. The series aims to disseminate research results in an accessible manner to a wider audience. An editorial committee consisting of the DATACTIVE PI and postdoctoral fellows reviews the quality of the Working Papers. Readers are encouraged to provide the authors with feedback and/or questions.


[BigDataSur-COVID] Towards Civic Data Policies: Participatory Safeguards in COVID-19 Times

By Arne Hintz

The pervasive tracing, tracking, and analysing of citizens and populations has emerged as the tradeoff of an increasingly datafied world. Citizens are becoming more transparent to the major data-collecting institutions of the platform economy and the state, while they have limited possibilities to intervene into processes of data governance, control the data that is collected about them, and affect how they are profiled and assessed through data assemblages. The COVID-19 pandemic has highlighted the centrality of these dynamics. Contact tracing and detailed identification of outbreak clusters have been essential responses to COVID-19. Yet, detailed data about our movements, interactions and pastimes is now tracked, stored, and analysed, both “online” through the use of contact-tracing apps and “offline” (e.g., when we fill in a form at a bar or restaurant). The rise of tracking raises the question of how exactly data is collected and analysed, by whom, for what purposes, and with what limitations. Essentially, it signals the necessity of legal safeguards to ensure that data analytics fulfil their purpose while preventing privacy infringements, discrimination, and the misuse of data. The COVID-19 pandemic thus alerts us to the importance of effective regulatory frameworks that protect the rights and freedoms of digital citizens. It also demands public involvement in a debate that affects our lives during the pandemic and beyond.

The wider context of data policy in the wake of major data controversies by both public and commercial institutions—from the Snowden revelations to Cambridge Analytica—is currently ambiguous. On the one hand, it reflects a deeply entrenched commitment to expansive data collection. On the other hand, it increasingly recognises the need for enhanced data protection and citizens’ data rights. In many countries, the possibilities for monitoring people’s data traces (particularly by state agencies) have significantly expanded. The UK Investigatory Powers Act from 2016 serves as a stark example, because it legalised a broad range of measures, including the “bulk collection” of people’s data and communication; the collection of “internet connection records” (i.e., records of people’s web browsing habits); and “computer network exploitation” (i.e., state-sponsored hacking into the networks of companies and other governments as well as the devices of individual citizens).

At the same time as these encroachments, we have also seen the strengthening of data protection rules, most prominently by the European Union General Data Protection Regulation (GDPR) in 2018. The GDPR enhances citizen control over data by providing rights to access and withdraw personal data, request an explanation for data use, and deny consent to data tracking by platforms. It requires that data be collected only for specific purposes to reduce indiscriminate data sharing and trading. The GDPR also limits the processing of sensitive personal data. While some elements of the GDPR have been controversial and the regulation overall is often described as insufficient, it has been recognised as an important building block towards a citizen-oriented data policy framework. The emerging policy environment of data collection and data use has been significant in societies that are increasingly governed through data analysis and processes of automated decision-making. Profiling citizens and segmenting populations through detailed analysis of personal and behavioural data are now at the core of governance processes and shape state-citizen relations.

What does the shifting data environment mean during COVID-19 times? How should regulatory frameworks enable and constrain the tracking and tracing of virus outbreaks, and what boundaries should exist? If we accept that some data collection and analysis is useful to address the pandemic and its serious health implications, the purpose limitation of this data (as highlighted by the GDPR) becomes crucial. In some countries, contact-tracing apps were designed to track a much wider range of data than initially necessary for tracing infection chains and enable government agencies to use that data for non-medical tracking purposes. In order to avoid contact-tracing becoming a Trojan Horse for widespread citizen surveillance, strict purpose limitation would be an essential cornerstone of a robust regulatory framework. Similarly, limitations to the collection of sensitive data and the deletion of all data at fixed times during or after the pandemic would be core components of such a framework. While it may be debatable whether wider data collection and sharing would be acceptable as long as the affected individuals give their consent, a consent model often leads to pressures and incentives for citizens to hand over data against their will and interest, which would make strict prohibitions seem a more appropriate mechanism. The COVID-19 contact-tracing case thus points to some of the elements that are increasingly discussed and regulated as part of policy reforms such as the GDPR, and it highlights the challenges of indiscriminate data collection.

Indiscriminate data collection also poses questions about who should develop such policy, and whether broader public involvement would be desirable or even necessary. The COVID-19 pandemic helps us explore the role of citizens as policy actors. Contributions to the regulatory and legislative environment by civic actors outside the realm of traditional “policymakers” have received increased attention in recent years. These range from the role of civil society in multi-stakeholder policy processes to policy influences by social movements and to the development of specific legislation by citizens in the form of what has been called crowd law and policy hacking. The COVID-19 case demonstrates multiple dimensions of these kinds of public engagement. It shows the strong normative role of technical developers arguing for decentralised data storage options in contact-tracing apps (e.g., the Decentralised Privacy-Preserving Proximity Tracing project), who have prevailed in many cases over the initial government intention to centralise data handling. Further, we have seen legal scholars taking the lead in proposing relevant legislative frameworks, for example, by developing a dedicated Coronavirus Safeguards Bill for the UK (which has not, so far, been adopted by the UK government but has still influenced the debate on contact-tracing). The public discourse on COVID-19 responses in many countries has also considered the problem of data collection and possible privacy infringements, thus placing data analytics firmly on the public agenda.

The current pandemic has shown that emergency situations require the rapid adoption of legal safeguards, and a wider public debate on what data analyses are acceptable and where boundaries lie. Policy components from recent regulatory frameworks such as the GDPR can be an important part of this endeavour, as should critical reflection on data extraction laws such as the Investigatory Powers Act. Expert proposals from civil society have promoted rules that address problems raised by the pandemic while protecting civic rights. At the “margins” of established policy processes, these interventions by civil society and the public play a significant role in advancing normative pressure on civic data policies.


About the author

Arne Hintz is Reader at Cardiff University’s School of Journalism, Media and Culture and Co-Director of its Data Justice Lab. His research focuses on digital citizenship and the future of democracy and participation in the age of datafication. He is Co-Chair of the Global Media Policy Working Group of the International Association for Media and Communication Research and co-author of Digital Citizenship in a Datafied Society (Polity, 2019).


how to contribute to “COVID-19 from the margins”

Thanks for your interest in contributing to the blog COVID-19 from the margins! Here you can find information about how to prepare your submission. Please read it carefully.

What we publish

  • We invite contributions engaging with the various forms of impact of COVID-19 on the Souths, including its economic, infrastructural, and redistributional consequences in relation to the datafied society (e.g., surveillance, statistics, grassroots efforts to counter narratives). In particular, we seek to publish blog posts that explore such consequences and the ways people and communities across the Souths respond to them.

To be considered for the blog, your post should
1) explicitly reflect on one or more aspects of the datafied society at the time of the pandemic (e.g., surveillance, data production, data-based narratives, technological solutions or obstacles, data justice, data activism…);
2) explicitly take a human-centred perspective in exploring the consequences of the pandemic (e.g., how it is affecting people and communities on the ground, its impact on data privacy, redistribution of resources, access to key services, inclusion/exclusion from service provision…).

  • What is our definition of the South(s)? The South(s) is a composite and plural entity, including but also going beyond the geographical connotation (i.e., “global South”). In this understanding, the South(s) is a place of (and a proxy for) alterity, resistance, subversion, and creativity (Milan and Treré 2019, p. 235).
  • A standard remuneration (€50) will be allocated to authors of accepted posts in the following categories: students, unemployed or precarious workers, in particular from the so-called Global South. Requests will be evaluated on a case-by-case basis. Please note that we are currently fundraising, and to date we have secured remuneration for the first twenty posts only.

How to prepare your manuscript

  • Length: between 600 and 1,200 words (max 1,500), blog style accessible to a wider audience. Longer posts might be published as a series of “episodes” linked to each other.
  • Please follow the blog stylesheet to prepare your manuscript. Don’t forget to include a title and a teaser, and a picture to accompany your post.
  • When ready, send to




Magma guide release announcement

January 29, 2020

By Vasilis Ververis, DATACTIVE

We are very pleased to announce that the magma guide has been released.

What is the magma guide?

The magma guide is an open-licensed, collaborative repository that provides the first publicly available research framework for people working to measure information controls and online censorship activities. In it, users can find the resources they need to perform their research more effectively and efficiently.

It is available at the following website:

The content of the guide represents industry best practices, developed in consultation with networking researchers, activists, and technologists. And it’s evergreen, too: constantly updated with new content, resources, and tutorials. The host website is regularly updated and synced to a version control repository (Git) that can be used by members of the network measurements community to review, translate, and revise the content of the guide.

If you or someone you know can contribute such content, please get in touch with us or read about how you can directly contribute to the guide.

All content of the magma guide (unless otherwise mentioned) is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).

Many thanks to everyone who helped make the magma guide a reality.

You may use any of the communication channels (listed in contact page) to get in touch with us.


Vasilis Ververis is a research associate with DATACTIVE and a practitioner of the principle of undoing and rebuilding the current centralization model of the internet. Their research deals with internet censorship and the investigation of collateral damage via information controls and surveillance. Some recent affiliations: Humboldt-Universität zu Berlin, Germany; Universidade Estadual do Piaui, Brazil; University Institute of Lisbon, Portugal.

Davide in Lugano with paper on algorithms as online discourse (January 30)

Davide Beraldo will be in Lugano, Switzerland, to present a paper on ‘Algorithms as Online Discourse. Exploring topic modeling and network analysis to study algorithmic imaginaries’, co-authored with Massimo Airoldi (Lifestyle Research Center, EMLYON Business School). The paper is a contribution to the ‘Rethinking Digital Myths. Mediation, narratives and mythopoiesis in the digital age’ workshop hosted at the Università della Svizzera Italiana.

[blog] Why Psychologists need New Media Theory

by Salvatore Romano


I’m a graduate student at the University of Padova, Italy. I’m studying Social Psychology, and I spent four months doing an Erasmus Internship with the DATACTIVE team in Amsterdam.


It’s not so common to find a psychology student in a Media Studies department, and some of my Italian colleagues asked me the reason for my choice. So I would like to offer four good reasons for a student of psychology to get interested in New Media Theory and Digital Humanities. In doing so, I will cite some articles as a starting point for other colleagues who would like to study similar issues.

I participated in the “Digital Methods Summer School,” which was an excellent way to get a general overview of the different topics and methodologies in use in the department. In just two weeks, we discussed many things: from a sociological point of view on the Syrian war to an anthropological reading of alt-right memes, by way of semantic analysis and data-scraping tools. In the following months, I had the chance to deepen my critical approach and the activist’s point of view by collaborating with the Tracking Exposed project. The main question driving my engagement for the whole period was: “what reflections should we make before using the so-called ‘big data’ made available by digital media?”

The first important point to note is: research through media should always also be research about media. It is possible to use these data to investigate the human mind, and not just to make claims about the medium itself; however, doing so still requires specific knowledge about the medium. New Media theory is interesting not only because it tells you what new media are, but because it is crucial for understanding how to use new media data to answer questions coming from various fields of study. That’s why we, as psychologists, can also benefit from the discussion.

The second compelling reason is that you need specific, in-depth knowledge to deal with the technical problems related to digital media and their data. I experienced some of the difficulties you can face while researching social media data: most of the time you need to build your own research tools, because no one had your exact question before you, or at least you need to be able to adapt someone else’s tool to your needs. And this is just the beginning: to keep your (or others’) tools working, you need to update them very often, sometimes while fighting a company that tries to obstruct independent research as much as possible. In general, the world of digital media changes much faster than traditional media; a new trendy platform can appear every year, staying up to date is a real challenge, and we cannot turn a blind eye to any of this.

Precisely for that reason, the third reflection I made is about the reliability of the data we use for psychological research. Especially in social psychology, students are familiar with using questionnaires and experiments to validate their hypotheses. With those kinds of methodologies, measurement error is mostly controlled by the investigator, who creates the sample and ensures that the experimental conditions are respected. Big data, by contrast, offer the social sciences the possibility of tracing significant collective dynamics down to single interactions, as long as you can get those data and analyze them properly. To make use of this opportunity, we analyze databases that were not recorded by us and that lack an experimental environment (for example, when using the Facebook API). This lack of independence can introduce distortions attributable to the standardization operated by social media platforms and not monitorable by the researcher. Moreover, using APIs without general knowledge of what kind of medium recorded those data is really dangerous, as the chances of misunderstanding the authentic meaning of the communication we analyze are high.

Even if we don’t administer a test directly to the subjects, or don’t draw conclusions from an experimental set-up, we still need to reproduce scientific accuracy when analyzing the big data produced by digital media. It is essential to build our own tools to create databases independently, and it is necessary to know the medium in order to reduce misunderstandings; all of this is something we, as psychologists, can also learn from a Media Studies approach.

The fourth point is about how digital media implement psychological theory to shape their design. Those platforms use psychology to increase engagement (and profits), while psychologists very rarely use the data stored by those same platforms to improve psychological knowledge. Most of the time, omnipotent multinational corporations play with targeted advertising, escalating to psychological manipulation, while many psychologists struggle to understand the real potential of those data.

Concrete examples of what we could do include analyzing the hidden effects of the dark patterns adopted by Facebook to glue you to the screen; the “Research Personas” method to uncover the affective charge created by apps like Tinder; and the graphical representation of the personalization process at work in the YouTube algorithm.


In general, I think it’s essential for us, as academic psychologists, to test all the possible effects of these new communication platforms rather than relying only on the analyses the companies make of themselves; we need to produce independent and public research. The fundamental discussion about how to build our collective communication systems should be driven by these kinds of investigations, and should not just follow uncritically what is “good” for the companies themselves.


Stefania in Tel Aviv for the workshop “Algorithmic Knowledge in Culture and in the Media” (October 23-25)

On October 23-25, Stefania will be in Tel Aviv to take part in the international workshop “Algorithmic Knowledge in Culture and in the Media” at the Open University of Israel. The invitation-only workshop is organized by Eran Fisher, Anat Ben-David and Norma Musih. Stefania will present a paper on the ALEX project, DATACTIVE’s spin-off, as an experiment into algorithmic knowledge.

Unpacking the Effects of Personalization Algorithms: Experimental Methodologies and Their Ethical Challenges

Stefania Milan, University of Amsterdam

With social media platforms playing an ever-prominent role in today’s public sphere, concerns have been raised by multiple parties regarding the role of personalization algorithms in shaping people’s perception of the world around them. Personalization algorithms are accused of promoting the so-called ‘filter bubble’ (Pariser 2011) and suspected of intensifying political polarization. What’s more, said algorithms are shielded behind trade secrets, which contributes to their technical undecipherability (Pasquale 2015). Against this backdrop, the ALgorithms EXposed (ALEX) project has set off to unpack the effects of personalization algorithms, experimenting with methodologies, software development, and collaborations with hackers, nongovernmental organizations, and small enterprises. In this presentation, I will reflect on four aspects of the ALEX project as an experiment into algorithmic knowledge, namely: i) software development, illustrating the working of the browser extensions; ii) experimental collaborations within and beyond academia; iii) methodological challenges, including the use of bots; and iv) ethical challenges, in particular the development of data reuse protocols allowing users to volunteer their data for scientific research while safeguarding individual data sovereignty.

YouTube Algorithm Exposed: DMI Summer School project week 1

DATACTIVE participated in the first week of the Digital Methods Initiative summer school 2019 with a data sprint related to the side project ALEX. DATACTIVE’s insiders Davide and Jeroen, together with research associate and ALEX software developer Claudio Agosti, pitched a project aimed at exploring the logic of YouTube’s recommendation algorithm using ytTREX, the ALEX-related browser extension, which allows you to produce copies of the set of recommended videos, with the main purpose of investigating the logic of personalization and tracking behind the algorithm. During the week, together with a number of highly motivated students and researchers, we engaged in collective reflection, experiments, and analysis, fueled by Brexit talks, Gangnam Style beats, and the secret life of octopuses. Our main findings (previewed below, and detailed later in a wiki report) look into which factors (language settings, browsing behavior, previous views, domain of videos, etc.) help trigger the highest level of personalization in the recommended results.
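As a rough illustration of how this kind of analysis can work (not the actual ytTREX code, and with made-up video IDs), the degree of personalization between two collection conditions can be estimated by comparing their lists of recommended videos with a simple set-similarity measure: identical lists suggest no personalization, disjoint lists suggest full personalization.

```python
def jaccard_similarity(recs_a, recs_b):
    """Jaccard similarity between two lists of recommended video IDs.

    Returns 1.0 when both conditions received identical recommendations
    (no personalization detected) and 0.0 when they share no videos at
    all (maximal personalization between the two conditions).
    """
    set_a, set_b = set(recs_a), set(recs_b)
    if not set_a and not set_b:
        return 1.0  # two empty lists are trivially identical
    return len(set_a & set_b) / len(set_a | set_b)

# Hypothetical recommendation lists collected for the same seed video
# under two different profiles (e.g., different language settings):
profile_1 = ["vid_a", "vid_b", "vid_c", "vid_d"]
profile_2 = ["vid_a", "vid_c", "vid_e", "vid_f"]

print(jaccard_similarity(profile_1, profile_2))  # 2 shared / 6 total ≈ 0.33
```

Repeating such a comparison across conditions (language, watch history, logged-in vs. clean browser, etc.) is one simple way to rank which factors move the recommendations furthest apart.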


Algorithms Exposed: investigating YouTube – slides