Provocations
Introduction
Chair: Jarmo de Vries (University of Edinburgh)
Content
Miguel García-Sancho, University of Edinburgh
The day started with two provocations by Miguel García-Sancho and Liliana Doganova, followed by general discussion. García-Sancho proposed the following diagram to investigate data platforms across different knowledge domains. Watch the video for a discussion of the diagram as it is elaborated:
- Biography

Miguel García-Sancho is a Senior Lecturer at the University of Edinburgh. His research interests are in the history of contemporary biomedicine, with special emphasis on the transition between molecular biology and new forms of knowledge production at the end of the 20th century: biotechnology, bioinformatics and genomics. He is leading a project entitled Medical Translation in the History of Modern Genomics (TRANSGENE), with funding from the European Research Council. His research focuses on the history of concerted DNA mapping and sequencing initiatives, with special attention to the human and other whole-genome projects that proliferated in the 1980s and 90s. Along with his team, he has produced a large dataset documenting the practice of DNA sequencing across three different species: yeast, Homo sapiens and the domestic pig Sus scrofa. He has also investigated the emergence of agricultural biotechnology and the cloning of Dolly the sheep. His book Biology, Computing and the History of Molecular Sequencing: From Proteins to DNA was published by Palgrave Macmillan. He previously worked as a journalist and is interested in science communication and public engagement.
Liliana Doganova, CSI Mines ParisTech
Provocation
In her provocation, Doganova's starting point was the conclusion formulated by Miguel García-Sancho in his introduction to this workshop: that the affordances and limitations of large datasets in genomics are set by the communities that produce them, the ways they were organised, the people they included or excluded, the tools and discourses created, and the goals and ambitions motivating the work. This raised the question of whether this model of dynamic interaction between communities and a dataset could be exported to other datasets. Doganova's answer was a tentative yes, subject to two conditions. First, the model has to be generalised: indeed, the sociology of science and technology teaches us that this model of dynamic interaction holds for the production of scientific knowledge and technology more generally. Second, the model has to be specified: if dynamic interaction between knowledge, technology and user communities is the general model, then what are the specificities of particular kinds of data, and how are communities that ‘use’ data defined?
To make this clear, Doganova discussed the workshop presentations and showed how they present different types of communities and the relationships that they have with the data(sets). Based on this, her assessment was that the definition of communities needs to be clarified and extended. It should not just include the actors involved in the production of science and data, but also various users and end-users of it; it needs to include all actors from production to consumption, as Robert Bud suggested in a previous commentary during the workshop. Often, it is unclear who the users and consumers of data are, for both scientific research and commercial endeavours. Users and consumers are omnipresent in narratives about data but at the same time appear intangible. Therefore, an important task for scholars investigating data is to look for users and specify who they are.
- Biography

Dr Liliana Doganova is an Associate Professor at the Centre de sociologie de l’innovation, Mines-ParisTech, PSL University. At the intersection of economic sociology and Science and Technology Studies, her work has focused on business models, the valorization of public research and markets for bio- and clean-technologies. She has published in journals such as Economy and Society, the Journal of Cultural Economy, Research Policy, Science as Culture, and Science and Public Policy. She is the author of Valoriser la science (2012) and a co-author of Capitalization: A Cultural Guide (2017). She is currently completing a monograph on the historical sociology of discounting the future as a valuation technique, and conducting research on the pricing of drugs and the valuation of forests.
Discussion following presentations
Following the provocations, three questions (and follow-up queries) were raised that started a free-flowing discussion:
- Is it possible to jointly analyse and compare data and data platforms produced in different domains or are they somehow incommensurable? Are the categories that scholars use to understand them transposable?
- How can the origins and trajectories of data in the life sciences and finance be traced and assessed once they have been integrated into repositories? What are the tools that scholars can use to do this?
- Are there some general observations that can be made about the relationship between local data repositories and infrastructures on the one hand, and ones with more global ambition and scope on the other?
The figure of the user became an important focus of the discussion. Data users played a much more important role in the presentations of the second panel than in those of the first, an absence that was highlighted by Robert Bud’s commentary at the end of the first day of the workshop. Participants addressed the rhetoric concerning users, and how the term may obscure the fact that users are often the producers as well as the consumers of data. Relatedly, studying the rhetoric, discourse, and narratives of data was offered as a way of comparing and gaining insight into different data platforms. The platforms and trajectories discussed in both panels are extremely varied. It was proposed that discourses and narratives around ownership, openness, closedness, privacy and security across different domains of data may offer a way into analysing how commensurable these data platforms are.
The matter of how datasets acquire value was also raised. In this discussion, as well as in the Q&A following Kean Birch’s presentation, it was noted that the valuation of many platform companies was based on expectations of future monopolies and the perceived value of their databases. It was also suggested that producers often cannot anticipate which data or datasets will become valuable, how they may be used in the future, and which other data they may be combined with. This is, in part, because what counts as data and as appropriate uses depends on local contexts and changing circumstances. As each dataset likely reflects the needs of the community that produced it, additional work needs to be done to combine datasets, make them interoperable or change their context of production. Another problem is that taking care of datasets is expensive and there is only limited funding to maintain them. To better understand these matters, it was suggested that scholars examine how datasets become valuable, as well as the life-cycles of both extant and extinct datasets.
García-Sancho and James Lowe used the concept of communities in their analysis of the pig, yeast, and human genome sequencing projects. Other participants challenged the appropriateness of the term for wider use in the analysis of data platforms. Some found it too friendly a term, pointing out that it overlooks processes of inclusion and exclusion and the labour that is done with the data. The concepts of publics or communities of practice were seen as potentially more applicable. Additionally, it was suggested that the concept of communities may not be valid for discussions of platform companies and the economic relationships that are implicated in their operation. However, it was contended that communities may be an apposite term for capturing how, for example, Facebook users interact. It was also observed that many early internet communities later turned into commercial ventures. Similarly, platforms in plant science often started out as community endeavours, and were later developed as non-profit organisations. The discussants concluded that, regardless of the concept used to describe the actors involved in particular data platforms, there is a need to look at who benefits from these platforms, and who does so to a lesser extent.
The participants also shared their ideas about what further studies of data platforms should look like. Many of the presentations were micro-social studies of data production and use, of knowledge production, of transformations, and the specific circumstances in which these occurred. They were seen as valuable starting points for making comparisons between cases and then trying to generalise from them. However, the discussions of the last three days also made clear that there is a need for broader studies that investigate longer-term histories of data and infrastructures (including ones deemed to have failed), and the political economy in which these platforms arose and developed.