Trends in preprint, data, and code sharing, 2019-2022

Explore the latest Open Science Indicators dataset

April 4, 2023

Written by Lauren Cadwallader, Lindsay Morton, and Iain Hrynaszkiewicz

PLOS recently introduced Open Science Indicators (OSIs), a large public dataset identifying and quantifying Open Science practices like preprint posting, data sharing, and code sharing in PLOS articles, as well as in a selection of comparator articles published elsewhere. Now, we are delighted to release another six months of data, covering the second half of 2022 and giving a view of researchers’ Open Science practices across four years. The latest results continue to show incremental growth in all three areas. Read on for more details on the project, detailed numbers, and a closer look at preprints.

The why and how of Open Science Indicators

PLOS’ founding mission centers on breaking down barriers to publishing, reading, and reusing trusted scholarly research, with the overall aim of accelerating progress across science and medicine. We do that by making Open Access publishing options and Open Science best practices more accessible to all researchers, in all disciplines, all around the world.

We aren’t alone in this work: across the scholarly ecosystem, researchers, funders, institutions, librarians, professional organizations, and other Open Access publishers are embracing Open practices that support equity and inclusion, research integrity and reproducibility, and collaboration across distance and discipline. But to understand whether we are making progress towards increased adoption, we first have to understand where researchers are today. Open Science Indicators allow us to establish benchmarks, and to track changes over time.

After developing a set of principles and definitions for Open Science practices, PLOS partnered with DataSeer to create “indicators” which identify and measure specific Open Science practices in published research articles. In December we introduced our first three indicators: data sharing, code sharing, and preprint posting. We are currently working on a fourth indicator for protocol sharing. In the future we aim to continue refining our methods and expanding the dataset. We appreciate your comments and feedback on the latest data, on how you are using it, and on the data points and features you’d find most valuable in the future. Comment below, or email us at community [at] plos.org.

A closer look at the latest Open Science Indicators

About the dataset

The March 2023 dataset includes all 71,109 PLOS research articles published during the four-year period from January 2019 to December 2022, along with a comparator set of 7,635 publicly available research articles from PubMed Central, an increase of 16% over the previous dataset.

A few caveats to bear in mind: Open Science Indicators report only machine-detectable traits. Unclear labeling or missing metadata may mean that some practices are underrepresented in the dataset. Accuracy rates for the dataset have been updated and expanded in this second release, offering a more nuanced view of the quality of the dataset for advanced users.

Updates through the end of 2022

Open Science Indicators results continued to follow established patterns through the remainder of 2022. Specifically for PLOS articles:

Rates of data repository use continued to rise, from 26% in 2021 to 28% for articles published in 2022. Overall rates of data sharing also rose, to 75%.
Code sharing rates rose slightly from 2021, reaching 15% of all articles published in 2022.
The rate of preprints associated with published articles held steady at 24% across 2021 and 2022.
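
For readers who want to reproduce year-over-year rates like these from the dataset, the calculation is a per-year share of articles in which a practice was detected. The following is a minimal sketch, not the method used by PLOS or DataSeer. The records are toy values made up for illustration, and the field names (`year`, `data_in_repository`, `code_shared`, `has_preprint`) are hypothetical stand-ins for whatever columns the downloaded file actually uses.

```python
from collections import defaultdict

# Toy records for illustration only: these are NOT rows from the real dataset,
# and the field names are hypothetical.
rows = [
    {"year": 2021, "data_in_repository": True,  "code_shared": True,  "has_preprint": True},
    {"year": 2021, "data_in_repository": False, "code_shared": False, "has_preprint": True},
    {"year": 2021, "data_in_repository": False, "code_shared": False, "has_preprint": False},
    {"year": 2021, "data_in_repository": False, "code_shared": False, "has_preprint": False},
    {"year": 2022, "data_in_repository": True,  "code_shared": True,  "has_preprint": True},
    {"year": 2022, "data_in_repository": True,  "code_shared": True,  "has_preprint": True},
    {"year": 2022, "data_in_repository": False, "code_shared": False, "has_preprint": False},
    {"year": 2022, "data_in_repository": False, "code_shared": False, "has_preprint": False},
]


def yearly_rate(records, field):
    """Return {year: percentage of that year's articles where `field` is true}."""
    totals = defaultdict(int)  # articles published per year
    hits = defaultdict(int)    # articles per year with the practice detected
    for record in records:
        totals[record["year"]] += 1
        if record[field]:
            hits[record["year"]] += 1
    return {year: 100 * hits[year] / totals[year] for year in sorted(totals)}


for field in ("data_in_repository", "code_shared", "has_preprint"):
    print(field, yearly_rate(rows, field))
```

Because the indicators report only machine-detectable traits, rates computed this way are best read as lower bounds on actual practice.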

A closer look at preprint posting

To complement the summary of data- and code-sharing results that accompanied our first data release, here we take a more in-depth look at preprint posting.

While the proportion of published articles with an associated preprint remained the same across 2021 and 2022, a different pattern emerges if we look at preprints by the date they were posted rather than the date the associated article was published. For PLOS articles, 57% more preprints were posted in Q2 2020 than in the previous quarter, likely in part in response to the SARS-CoV-2 pandemic. Since then, rates have gradually normalized until, by Q4 2021, they were slightly above pre-pandemic levels, in keeping with the gradual upward trend observed in prior years. Comparator data show a less marked pandemic increase, and levels remain more stable throughout 2021 (although the comparator sample is much smaller). Future iterations of the dataset will provide greater clarity on the ongoing trends for preprints.

*Preprints posted in 2022 are excluded, as many of these papers have not yet gone on to publication.
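
The switch from publication date to preprint posting date, and the quarter-over-quarter comparison, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the analysis behind the figures above. The dates are invented toy values, and the helper names (`quarter_of`, `quarterly_counts`, `percent_change`) are hypothetical.

```python
from collections import Counter
from datetime import date


def quarter_of(d):
    """Map a date to a (year, quarter) pair, e.g. 2020-05-05 -> (2020, 2)."""
    return (d.year, (d.month - 1) // 3 + 1)


def quarterly_counts(posting_dates, exclude_years=()):
    """Count preprints per quarter of posting, skipping any excluded years."""
    return Counter(
        quarter_of(d) for d in posting_dates if d.year not in exclude_years
    )


def percent_change(previous, current):
    """Percentage change from one quarter's count to the next."""
    return 100 * (current - previous) / previous


# Toy posting dates for illustration only (not real data).
toy_dates = [
    # Four preprints posted in Q1 2020
    date(2020, 1, 15), date(2020, 2, 3), date(2020, 3, 20), date(2020, 3, 31),
    # Six preprints posted in Q2 2020
    date(2020, 4, 1), date(2020, 4, 18), date(2020, 5, 5),
    date(2020, 5, 30), date(2020, 6, 9), date(2020, 6, 30),
    # One preprint posted in 2022, dropped below to mirror the footnote
    date(2022, 2, 1),
]

counts = quarterly_counts(toy_dates, exclude_years={2022})
change = percent_change(counts[(2020, 1)], counts[(2020, 2)])
print(f"Q1 2020 to Q2 2020: {change:+.0f}%")
```

Excluding the most recent year mirrors the footnote: recently posted preprints whose articles are not yet published cannot appear in a dataset built from published articles, so recent quarters would look artificially low.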

One benefit of the Open Science Indicators dataset is that it provides a more complete view of preprint posting than has previously been possible for PLOS. Among the most popular preprint servers for both PLOS and comparators were the community- or discipline-specific servers bioRxiv and medRxiv. Differences in server use between PLOS and comparators are likely correlated with the preprint servers that different publishers work most closely with.

Using the Open Science Indicators

There are many ways to view and analyze Open Science Indicators. Take a closer look at the data to identify the most popular data and code repositories, look for patterns in how researchers use repositories vs Supporting Information files to share data, or cross-reference to investigate differences in disciplinary or regional norms. Just last month, Robyn Price of Imperial College London wrote on The Bibliomagician about how institutions might use the Indicators to better understand their researchers’ publishing patterns, taking a deeper look at the data for a specific institution. However you use Open Science Indicators, we want to hear about it! Share your interpretations, thoughts, and questions, and be sure to let us know what additional data points would be most useful to you.
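
As one possible starting point for the analyses suggested above, the sketch below ranks repositories by use and compares repository sharing with Supporting Information sharing. It assumes a simplified record layout: the field names `data_location` and `repository`, their values, and the placeholder repository names are all hypothetical, and the records are toy values rather than real results.

```python
from collections import Counter

# Toy records for illustration only; field names and values are hypothetical.
toy_articles = [
    {"data_location": "repository", "repository": "Repository A"},
    {"data_location": "repository", "repository": "Repository A"},
    {"data_location": "repository", "repository": "Repository B"},
    {"data_location": "supporting_information", "repository": None},
    {"data_location": "supporting_information", "repository": None},
    {"data_location": "supporting_information", "repository": None},
    {"data_location": "supporting_information", "repository": None},
    {"data_location": "none", "repository": None},
]


def top_repositories(articles, n=5):
    """Return the n most frequently used repositories as (name, count) pairs."""
    names = (a["repository"] for a in articles if a["repository"])
    return Counter(names).most_common(n)


def location_shares(articles):
    """Return the percentage of articles sharing data at each kind of location."""
    counts = Counter(a["data_location"] for a in articles)
    total = sum(counts.values())
    return {location: 100 * count / total for location, count in counts.items()}


print(top_repositories(toy_articles))
print(location_shares(toy_articles))
```

The same grouping idea extends to disciplinary or regional comparisons: add the relevant field to the key being counted and compute shares within each group.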

1.	bioRxiv	57%*
2.	Research Square	16%
3.	medRxiv	15%
4.	arXiv	4%
5.	PsyArXiv	3%

1.	Research Square	46%*
2.	bioRxiv	31%
3.	JMIR	8%
4.	medRxiv	7%
5.	Preprints.org	3%