Media Coverage in

I was recently interviewed for a magazine article about the lab’s work on digital humanities, which has been funded for the last three years by a MIS (Mandat d’Impulsion Scientifique) from the FNRS, the Wallonian funding agency for scientific projects. Read on for the interview! The original was published in French; I’ve translated it quickly here.

Many thanks to Adrian Dewer, the author for the magazine, who patiently interviewed me despite my poor French and produced a really lovely article at the end of the process! You can also find a PDF file of the original magazine article (in French) here.

Statistical Analysis in the Service of Our Scientific Heritage

Over time, disagreements arise in the scientific community and in society at large. Notably, these include disagreements about the meaning of certain words and terms, which has evolved over the course of history. To evaluate such disagreements about part of our scientific heritage, philosophical research is trying its hand at a new discipline: statistical analysis.

The development of science has been accompanied by an increasing production of scientific articles. “If you take only the journal Nature, which started publication in 1869,” points out Charles Pence, Chargé de cours at UCLouvain and beneficiary of an FNRS Mandat d’Impulsion Scientifique, “you would have to read ten articles a day for a hundred years to get a complete knowledge of these archives! Faced with this problem, my former doctoral advisor and I devised an alternative to the ‘traditional’ method.”

Gathering Large Quantities of Articles

To work on the meaning of the concept of “fitness” at the heart of debates concerning Darwin’s theory of evolution, Charles Pence set about assembling as many scientific articles as possible connected to scientific debates at the turn of the 20th century. “At first, the journals were reluctant to give us access to their publications. We had to enter into very precise contracts, and we worked closely with the general counsel of the university many times. It was the start of this kind of research; today the journals are more inclined to offer access.” With the contracts signed, Charles Pence ended up with an impressive database of 300,000 articles [misunderstanding in the interview, today it nears 1.5M], including those from the journal Nature. Now all that was left was to analyze the content of this heritage…

“I first tried to do this research with huge Excel spreadsheets, which quickly failed,” admits the philosopher of science. “As I’ve liked programming since I was a kid, I ended up developing a digital tool enabling us to perform searches in the text of the articles. Simple searches by date or author aren’t very difficult. The challenge, however, lies in finding information about the meaning of particular terms, in order to follow the evolution of a term over time or its connections with a term of interest, and thus to analyze how it is used.”
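To give a rough idea of what "following the evolution of a term over time" can look like in practice, here is a minimal Python sketch. It is not the lab's actual tool: the toy corpus `ARTICLES` and the function `term_frequency_by_year` are hypothetical names invented for this illustration, and a real corpus would need far more careful tokenization.

```python
import re
from collections import Counter

# Hypothetical toy corpus: (publication year, article text) pairs.
ARTICLES = [
    (1895, "The fitness of the organism depends on its environment."),
    (1895, "Variation and heredity in plants."),
    (1905, "Fitness, selection, and fitness again in statistical terms."),
]


def term_frequency_by_year(articles, term):
    """Return {year: occurrences of `term` per 1,000 words}, sorted by year."""
    hits = Counter()    # occurrences of the term in each year
    totals = Counter()  # total word count in each year
    term = term.lower()
    for year, text in articles:
        # Crude tokenization: lowercase runs of ASCII letters only.
        words = re.findall(r"[a-z]+", text.lower())
        totals[year] += len(words)
        hits[year] += words.count(term)
    # Normalize by yearly volume so that busier years do not dominate.
    return {
        year: 1000 * hits[year] / totals[year]
        for year in sorted(totals)
        if totals[year]
    }


if __name__ == "__main__":
    for year, rate in term_frequency_by_year(ARTICLES, "fitness").items():
        print(year, round(rate, 1))
```

Normalizing by the number of words published each year matters because the volume of scientific publication grows over time; raw counts alone would mostly track that growth rather than any change in usage.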

Preparing the Data

Obviously, Pence’s method is not without risks. At the start of a data mining project, “a large part of the work consists of preparing the data, cleaning them, and making sure that we have a set of information that permits us to formulate and test a hypothesis,” he emphasizes. “Further, a large amount of data doesn’t necessarily result in objectivity. We never have all of the articles, but only a selection, however large it might be, which thus entails a bias. We have to be careful not to do work that does nothing but confirm the hypothesis that we started with.”

Although this new statistical approach doesn’t aim to replace the reading of a smaller number of texts by a more limited number of authors, or around a specific concept, “it allows us to formulate new hypotheses and to test them in parallel with a more conventional method,” its inventor assures us.

This approach allows us, for example, to evaluate the disagreements that appear in the scientific community or in society at large. “The concept of ‘species’ is an excellent example of this disagreement,” Pence notes. “It goes back to Aristotle, but science has of course evolved since then and undergone radical changes, in particular after the development of DNA sequencing. How can we bring together birds and bacteria under a single concept of species that still works? In this case, certainly, they’re too different for the concept to work!”

The classification of life into classes, orders, families, etc., and the quantification of its diversity do indeed rest on these terms with old meanings. “We often find ourselves with words that have a very long history and whose meanings have become obscure as a result of scientific discoveries. For instance, in the scientific community, some argue that we should abandon the concept of biodiversity while others argue in its favor. There are a dozen different definitions of the term ‘species’ today. Our statistical approach can give us a view of the distribution of these different meanings attached to a concept.” It can even let us build a map of a term with its most common or relevant associations!
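One very simple way to approximate such a map of a term's associations is to count which words appear near it. The Python sketch below is only an illustration under stated assumptions: `cooccurring_terms`, `SAMPLE`, and the tiny `STOPWORDS` list are hypothetical names made up here, and counting neighbours within a fixed window is a stand-in technique, not necessarily the method used in the project.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the illustration.
STOPWORDS = {"the", "of", "a", "and", "in", "is", "to", "by"}

# Hypothetical sample sentences standing in for article text.
SAMPLE = [
    "The species concept is defined by interbreeding populations.",
    "A phylogenetic species concept groups lineages by ancestry.",
]


def cooccurring_terms(texts, term, window=5):
    """Count non-stopwords found within `window` words of each use of `term`."""
    counts = Counter()
    term = term.lower()
    for text in texts:
        # Crude tokenization: lowercase runs of ASCII letters only.
        words = re.findall(r"[a-z]+", text.lower())
        for i, word in enumerate(words):
            if word != term:
                continue
            # Words just before and just after this occurrence of the term.
            neighbours = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
            counts.update(
                w for w in neighbours if w not in STOPWORDS and w != term
            )
    return counts


if __name__ == "__main__":
    for word, n in cooccurring_terms(SAMPLE, "species", window=3).most_common():
        print(word, n)
```

Comparing these neighbour counts across decades or across journals is one way to get a first, rough picture of how the uses of a term are distributed, which can then be checked against close reading of the texts.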

Playing a Role in Scientific and Societal Debate

“These concepts are at the heart of debates and social problems. Biodiversity is a nice example. The classic measure of biodiversity, which as we know is related to so many ecological problems, is based on the number of species. But that’s not enough! We have to add phylogenetic links between species, or even clades or families.”

A statistical approach in philosophy will, Pence hopes, enlarge the role of the philosopher in the scientific community as well as in society. “Today, philosophy of science can play a descriptive and a normative role. This last approach is our challenge in the coming years: how can we intervene in the scientific process? The scientist is more concerned with her research and its results than with problems of definition. That is where our work can be most interesting and where we can find a role as philosophers. Research on the meaning and history of certain terms can allow us to clarify the sense of concepts which have been used by researchers. By standardizing some of our results (this is work which will be started by a postdoc quite soon), we could play a concrete role in scientific practice. All that while having an impact on debates and social questions around biodiversity.”