[BigDataSur] Challenges and opportunities of studying big data in the Global South

Technical University of Munich, School of Governance, Professorship of Computational Social Science and Big Data

In a pilot project at the Technical University of Munich’s School of Governance, we are currently analyzing big data-related government policies. The broader goals of the project are:

to shed light on how governments conceptualize big data in the Global North and Global South;
to explain these narratives, particularly in reference to how they reflect and/or shape popular opinion; and
to identify the political and economic implications of big data, both within national contexts (e.g. in supporting/hindering development) as well as among Northern and Southern countries (e.g. in reference to global governance of data).

In this post, we will, first, briefly introduce our project and describe some preliminary results. Subsequently, we will discuss the challenges we have faced in examining big data narratives within a Southern context. We conclude by making some suggestions for overcoming these issues.

Our aim is to stimulate more interest in this and related topics, which we see as vital for understanding how the world economy and diverse world polities will develop in the coming years. Before turning to these tasks, however, we would like to underline our appreciation for the establishment of this blog and mailing list, which we hope will facilitate widespread cooperation among data studies scholars, data activists, data governance and policy experts across the globe. We are grateful for the opportunity to present our reflections here.

Preliminary results

Our current project starts from the recognition that big data evokes controversy in the international relations literature and that this controversy centers on how government actors understand (big) data as useful in achieving political and economic goals. For instance, the handling of data can be conceived as a force for liberation, enabling greater public participation in decision-making (e.g. via open government data), or as a force for repression, empowering governments and corporations to more effectively monitor and control citizen behavior (e.g. via logging transactional or behavioral data). Likewise, scholars have highlighted both the opportunities and challenges of data in stimulating economic development in poorer countries. What does the “big” stand for in big data related policies? How is social change envisioned by using and regulating the growing amounts of data generated by the government, research, and industry? What social values are attached to imaginaries affiliated with big data, among them innovation, growth, transparency, and efficiency?

By performing a content analysis of laws and strategy documents from three emerging power states (Brazil, India and China; henceforth the BICs), we examine these controversies in more detail. Our intention is to determine the relevance of diverse policy narratives within and across countries, to evaluate the extent to which narratives are likely to become policy, and to make pragmatic suggestions regarding how Southern governments can optimize the use of big data to achieve desired outcomes. A first publication from this project, written with Jürgen Pfeffer and entitled “Policy Visions of Big Data: Views from the Global South,” has recently appeared in Third World Quarterly. In it, we find the BICs promote three visions of big data: as a force for political liberation or repression, for improving the efficiency and effectiveness of public services and for facilitating development more broadly. The analysis suggests successful BIC cooperation is likely related to development and government services, but less probably related to the liberation/repression vision.

These similarities and differences are reflected in the word clouds below, which visualize the frequencies with which words appear in the strategic documents examined. For example, public access to information appears more important in Brazil and India than China. At the same time, though, the Brazilian government’s vision appears relatively more focused on this topic than the Indian government’s, which places greater emphasis on the role of the government (“central”, “government”, “Commissioner”, etc.) and less on principles of data governance (“public”, “access”, “information”).

BRAZIL

INDIA

CHINA

The dataset employed, including all documents in original language and English-language translation, as well as documentation of the analysis, including our codebook and guidelines for figure creation, are available at Harvard Dataverse.

Issues of studying big data visions and practices in a Southern context

We have faced four issues in our work on big data in the Global South to date.

The first was linguistic. While official versions of many documents were available in English, this was not the case for all documents examined. This problem is not unique to the Global South – policy documents in the Global North are written in a variety of languages as well! – but it remains a hurdle for researchers without the networks, language skills or financial capacity to overcome it.

Second, while there was ample literature on related topics (e.g. privacy and surveillance, economic development, democracy, etc.) within the context of the Global South, as far as we can tell, there is not yet an established network of scholars working on data and technology policies specifically within and from a Southern context. As a result, the literature to date seems to be quite thin, and we agree with Nir Kshetri, who recently wrote, “much initial research in this area” is still needed. Admittedly, perhaps we have simply failed to access literature in languages other than English or German. However, this makes the problem no less serious: such hurdles – and particularly the lack of contacts in areas being researched – can undermine the validity of findings and discourage researchers from engaging in such research at all.

Third, in studying (big) data in any location, we are faced with a rapidly changing context. From developments in China’s social credit system to privacy regulations in Europe to new approaches to big data mining and analysis in the United States, scholars must grapple with a moving target which is technologically innovative and politically sensitive and which interacts with a variety of disciplines, including economics, political science, development studies, law and computer science. Nowhere is this truer than in the Global South, where the hopes and experiments already underway with using data for development highlight both the worthiness and the challenges of keeping up with this changing context.

Finally, and bringing together all issues mentioned before, our research on the Global South follows a rather traditional framing of national states and economic policymaking. However, a truly regional perspective is lacking. Put differently, we are studying the Global South from the Global North, which can lead to significant problems if not controlled for through genuine research partnerships with scholars from the Global South. These include, among other things, the exclusion of critical perspectives, incorrect interpretation of results or notable gaps in the datasets upon which we base our analyses. The next section presents our suggestions for addressing these challenges. In general, we call for a reflective approach to scholarship related to data in the Global South characterized by engagement with diverse local actors.

What to do? Suggestions for overcoming these challenges

Starting with the linguistic challenge, we combined human and digital translation as necessary to enable access to crucial primary documents. Documents were first translated using multiple online, automatic translation tools (e.g., Google Translate). By comparing the resulting translations, we could identify passages where no clear, single meaning was apparent. These were then double-checked with native speakers. We were fortunate in this respect, having access to native speakers and scholars in all three languages used in our study. For those less fortunate, however, we think this procedure could be equally advantageous, for instance, helping scholars save on translation costs by whittling down the amount of text requiring translation in any given study. In our case, native speakers additionally performed spot checks to confirm the accuracy of our automatic translations. Finally, we drew on reputable, crowdsourced translations from websites such as chinalawtranslate.com. However, translation is certainly not the only challenge.

Regarding the issue of limited expert knowledge and literature, we began by drawing on our own, professional networks. Friends and colleagues were excited about our new project and enthusiastic in helping us contact relevant local experts, in checking that our data was an accurate reflection of government policies and in making suggestions which enriched our research process. For younger scholars or scholars new to this line of research, scholarly networks – such as this one! – can be equally helpful. The Big Data Sur network could serve such a purpose, becoming a platform of expertise which enables researchers from all over the world to find – and connect! – with one another. We hope that it will help to “engage with theoretical and empirical content from the regions on its own terms” as Amrita Narlikar demands in her call for the recognition of diversity and globalization of social scientific research.

Finally, we see the rapidly changing context of (big) data in the Global South as an opportunity as well as a challenge. Lucky us: we have embarked on a fascinating field of study where there is still lots to do! The pace of current events urges us to constantly update our theoretical and empirical understandings of the (digital) world and the close link between these events and individuals’ well-being makes our efforts both intellectually satisfying and morally motivating. Clearly, keeping in touch with scholars working on similar topics is crucial in accommodating the crippling pace of digital developments. We must support one another. Along these lines, transparency about ongoing research efforts, sharing of data and discussions of the challenges involved can help improve our methodologies and build a solid foundation of knowledge upon which to explore the implications of these developments. Such collaboration will additionally enable us to reflect on what is unique about data developments in the Global North and thus also more capable of viewing worldwide developments from diverse perspectives.

However, collaboration with local scholars is not enough. Rather, we should also include insights from non-governmental actors, politicians, journalists, scholars and citizens in these research efforts and networks. This could be accomplished via traditional research methods, such as interviews or surveys. Alternatively, it could involve participatory or citizen science projects which get the people affected by these developments involved in co-creating projects, gathering data, and analyzing and interpreting findings.

In conclusion, future advances in research on big data in the Global South will require new paths for collaborative research and creative thinking about how to structure research projects and allocate funding in ways which maximize international exchange. We hope that readers interested in exploring and comparing socio-economic, political and cultural effects of big data polices and data governance strategies across the globe will not hesitate to get in touch. You can reach us via laura.mahrenbach@hfp.tum.de and katja.mayer@hfp.tum.de

Katja Mayer works as a post-doc at the TUM School of Governance. Trained in Social Studies of Science and Technology (STS), she is currently studying open data practices in the computational social sciences.

Laura C. Mahrenbach is a post-doc at the TUM School of Governance. Trained in international political economy (IPE), Laura’s current work focuses on the intersection of changing power relations, new technologies and global economic governance.