Uphold the code: How complete, detailed and Open code can enhance understanding, improve reproducibility, and change the shape of the research article

May 24, 2022 PLOS Open Code Open Science

Just over a year ago, PLOS Computational Biology introduced a new journal policy requiring authors to make public any code directly related to the results of their article upon publication of the work. We look forward to sharing the outcomes of the policy in a future post. Today, we explore the drivers behind this policy change, including community demand, research integrity, and the way we consume information.

The importance of Open Code

Like all methodological tools and documentation, researcher-generated scripts and code are scientific assets with real and lasting value to the research community. Augmenting a research article with Open Code contextualizes results and enhances understanding, supports reproducibility and reuse, and increases efficiency across the entire scientific ecosystem.

Researchers certainly appreciate the strength and relevance of Open Code. In a recent survey of computational biology researchers, 75% of respondents reported looking at code associated with articles they are reading, with 39% consulting code ‘frequently’ or ‘very frequently.’ When asked why they looked at code, 70% of respondents expressed a desire to improve their understanding of the study, and 48% hoped to reuse or adapt the code. Even before the introduction of the code policy in 2021, PLOS Computational Biology authors voluntarily shared code at a high rate (61% in 2020), indicating that this community not only consumes code, but also actively contributes to it.

Jason Papin, co-Editor-in-Chief of PLOS Computational Biology, nicely summarizes the sentiment among researchers in the discipline: “Code is the life-blood of research in our field. In order to truly build on the discoveries and advances in our computational biology, we must be able to build on the work of others. Computational models that are readily accessible and shared can be probed in new ways to further drive biological insight; the implementation of newly developed methods with readily accessible code can be applied to new data sets to generate new hypotheses.”

Making the decision to share code signals confidence and integrity on the part of the authors, which in turn supports trust in the community. It suggests that the authors want their work to be scrutinized and understood on its real merits, and that they are ready and willing to open doors for fellow researchers by enabling them to reuse or adapt scripts more easily for their own studies, helping the whole field to advance more quickly.

Code as the crux of research

In a recent perspective article published in PLOS Computational Biology, DuPree and colleagues draw a distinction between the scholarly article, which they argue functions as a kind of preview or entry point to the research, and the real scholarly work: the methods researchers develop (in this case software and code) and the data they gather and analyze in order to produce a result. These data and code form the core of the research, the fulcrum on which everything else turns. For that reason, code not only deserves a place in the literature; it is required in order to fully understand and appreciate the scholarship.

Just as importantly, code shouldn’t be flattened or diminished in order to fit within the confines of a classic research article. Instead, it’s the joint responsibility of publishers and the scientific community to search out the most effective ways to capture and communicate different types of knowledge—whether that means rich text and linking, integrated and interactive research objects, or other tools.

Reflecting on DuPree et al.’s perspective, Feilim Mac Gabhann, co-Editor-in-Chief of PLOS Computational Biology, noted, “the concept of integrating more complex research products into publications has been around for some time (from embedding videos in papers to interactive visualizations and runnable code) and given some recent progress in the field this seemed like a timely perspective. Particularly in relation to computational biology, which generates very shareable and interactive materials, and has a culture of sharing and open science that matches PLOS’ mission. Computational biology and computational sciences more broadly can be at the vanguard of enhancing or rethinking what a research paper can be and can include, which enhances scientific communication as well as the reach and uptake of the tools developed.”

Why now?

Why is Open Code so important to researchers today? For that matter, why have methods documentation and data, in all their varied forms and formats, risen to such prominence in recent years?

There are many reasons, of course: a growing awareness that published research may not be reproducible, and the lived experience of researchers attempting to replicate results, conduct systematic reviews, or adapt methods for use in another study. Among the most intriguing is the increasing size and complexity of scientific investigations, along with our evolving standards for burden of proof.

Nikola Stikov of the University of Montreal explains, “work has gotten more collaborative, the barriers to new knowledge are higher, and in many fields, ideas are getting harder to find…so I believe the burden of proof is higher, and as a result code and data need to enter the picture.”

Looking ahead

Each editor and researcher we spoke with pointed out that, when it comes to Open Code, simple linking is just the beginning of what’s possible. From code notebooks, to repositories, to containerization, there are myriad ways to preserve, share and understand Open Code, prompting us to consider what an optimized, modern research article might look like.

At the same time, as popular as Open Code is in computational fields, there is still a long way to go before it becomes generally accepted best practice across the scientific landscape.

Watch our full conversation with Nikola Stikov of the University of Montreal and Jean-Baptiste Poline of McGill University. We explore subjects relating to Open Code practice, the role of data and code in communicating reproducible research, and what the future may hold.