Trends in preprint, data, and code sharing, 2019-2022
Written by Lauren Cadwallader, Lindsay Morton, and Iain Hrynaszkiewicz
PLOS recently introduced Open Science Indicators (OSIs), a large public dataset identifying and quantifying Open Science practices, such as preprint posting, data sharing, and code sharing, in PLOS articles, as well as in a selection of comparator articles published elsewhere. Now, we are delighted to release another six months of data from the second half of 2022, providing a view of Open Science practices by researchers over four years. The latest results continue to show incremental growth in all three areas. Read on for more about the project, the detailed numbers, and a closer look at preprints.
The why and how of Open Science Indicators
PLOS’ founding mission centers on breaking down barriers to publishing, reading, and reusing trusted scholarly research, with the overall aim of accelerating progress across science and medicine. We do that by making Open Access publishing options and Open Science best practices more accessible to all researchers, in all disciplines, all around the world.
We aren’t alone in this work: across the scholarly ecosystem, researchers, funders, institutions, librarians, professional organizations, and other Open Access publishers are embracing Open practices that support equity and inclusion, research integrity and reproducibility, and collaboration across distance and discipline. But to understand whether we are making progress towards increased adoption, we first have to understand where researchers are today. Open Science Indicators allow us to establish benchmarks, and to track changes over time.
After developing a set of principles and definitions for Open Science practices, PLOS partnered with DataSeer to create “indicators” that identify and measure specific Open Science practices in published research articles. In December we introduced our first three indicators: data sharing, code sharing, and preprint posting. We are currently working on a fourth indicator for protocol sharing, and in the future we aim to continue refining our methods and expanding the dataset. We appreciate your comments and feedback: on the latest data, on how you are using it, and on the data points and features you’d find most valuable in the future. Comment below, or email us at community [at] plos.org.
A closer look at the latest Open Science Indicators
About the dataset
The March 2023 dataset includes all 71,109 PLOS research articles published during the four-year period from January 2019 to December 2022, along with a comparator set of 7,635 publicly available research articles from PubMed Central, a 16% increase over the previous release.
A few caveats to bear in mind: Open Science Indicators report only machine-detectable traits, so unclear labeling or missing metadata may mean that some practices are underrepresented in the dataset. Accuracy rates for the dataset have been updated and expanded in this second release, offering a more nuanced view of the dataset’s quality for advanced users.
Updates through the end of 2022
Open Science Indicators results continued to follow established patterns through the remainder of 2022. Specifically for PLOS articles:
- Rates of data repository use continued to rise, from 26% for articles published in 2021 to 28% for articles published in 2022. Overall rates of data sharing also rose, to 75%.
- Code sharing rates rose slightly in 2022 compared to 2021, reaching 15% for all articles published in 2022.
- The rate of preprints associated with published articles held steady at 24% across 2021 and 2022.
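For readers who want to reproduce figures like these from the public dataset, the computation is essentially a group-by on publication year. Here is a minimal sketch in Python with pandas; the file name and column names are illustrative assumptions, so check the data dictionary accompanying the release for the real field names:

```python
import pandas as pd

# A minimal sketch, not the official analysis. The file name and column
# names below are assumptions; consult the data dictionary shipped with
# the OSI release for the real field names.
osi = pd.read_csv("open-science-indicators.csv", parse_dates=["publication_date"])

# Restrict to PLOS research articles and bucket by publication year.
plos = osi[osi["publisher"] == "PLOS"].copy()
plos["year"] = plos["publication_date"].dt.year

# Share of articles per year sharing data in a repository, sharing data
# anywhere, and sharing code (assumed to be boolean columns).
yearly = (
    plos.groupby("year")[["data_in_repository", "data_shared", "code_shared"]]
    .mean()
    .mul(100)
    .round(1)
)
print(yearly)  # one row per year, values in percent
```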
A closer look at preprint posting
To complement the summary of data- and code-sharing results that accompanied our first data release, here we take a more in-depth look at preprint posting.
While the proportion of published articles with an associated preprint remained the same across 2021 and 2022, a different pattern emerges if we look at preprint habits based on the date the preprint was posted, rather than the date the associated article was published. For PLOS articles, 57% more preprints were posted in Q2 2020 compared to the previous quarter, likely in part in response to the SARS-CoV-2 pandemic. Since then, rates have gradually normalized until, by Q4 2021, they were slightly above pre-pandemic levels, in keeping with the gradual upward trend observed in prior years. The comparator data show a less marked pandemic increase, and levels remain more stable throughout 2021 (although this is a much smaller sample). Future iterations of the dataset will provide greater clarity on the ongoing trends for preprints.
*Preprints posted in 2022 are excluded, as many of these papers have not yet gone on to publication.
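If you’d like to reproduce this view yourself, the key step is grouping preprints by the quarter they were posted rather than by the quarter the associated article was published. A minimal sketch with pandas follows; again, the column names are illustrative assumptions rather than the dataset’s actual schema:

```python
import pandas as pd

# A minimal sketch; column names are assumptions, not the real schema.
osi = pd.read_csv("open-science-indicators.csv", parse_dates=["preprint_posted_date"])

# Keep articles with a detected preprint, then count preprints by the
# quarter they were posted, not the quarter the article was published.
preprints = osi.dropna(subset=["preprint_posted_date"])
by_quarter = (
    preprints["preprint_posted_date"]
    .dt.to_period("Q")
    .value_counts()
    .sort_index()
)

# Quarter-over-quarter percentage change, e.g. the Q1-to-Q2 2020 jump.
print(by_quarter.pct_change().mul(100).round(0))
```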
One benefit of the Open Science Indicators dataset is that it provides a more complete view of preprint posting than has previously been possible for PLOS. Among the most popular preprint servers for both PLOS and comparator content were the community- and discipline-specific servers bioRxiv and medRxiv. Differences in server use between PLOS and comparator articles likely reflect the preprint servers that different publishers work most closely with.
Top preprint servers for PLOS content

| Rank | Preprint server | Share of preprints |
|------|-----------------|--------------------|
| 1 | bioRxiv | 57%* |
| 2 | Research Square | 16% |
| 3 | medRxiv | 15% |
| 4 | arXiv | 4% |
| 5 | PsyArXiv | 3% |

Top preprint servers for comparator content

| Rank | Preprint server | Share of preprints |
|------|-----------------|--------------------|
| 1 | Research Square | 46%* |
| 2 | bioRxiv | 31% |
| 3 | JMIR | 8% |
| 4 | medRxiv | 7% |
| 5 | Preprints.org | 3% |
*Calculated as a percentage of total preprints posted in a detectable repository
The prevalence of preprints also varies by region. Looking at broad geographic areas, we see the highest preprint rate (27%) in the Americas. Africa (excluding North Africa), Europe, and Australasia all post preprints at similar rates (21-23%), while Asia and North Africa/Middle East (MENA) have lower adoption, at 15% and 17% respectively. We haven’t yet sought to interpret why these regional differences in preprint posting exist, but we welcome comments. These latest Open Science Indicator results may complement the work of others who have begun to look at regional preprint adoption rates in the context of equity in scientific publication.
Using the Open Science Indicators
There are many ways to view and analyze Open Science Indicators: identify the most popular data and code repositories, look for patterns in how researchers use repositories versus Supporting Information files to share data, or cross-reference to investigate differences in disciplinary or regional norms. Just last month, Robyn Price of Imperial College London wrote on The Bibliomagician about how institutions might use the Indicators to better understand their researchers’ publishing patterns, taking a deeper look at the data for a specific institution. However you use Open Science Indicators, we want to hear about it! Share your interpretations, thoughts, and questions, and be sure to let us know what additional data points would be most useful to you.
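As a starting point, here is a minimal sketch of the first of those analyses, finding the most popular data repositories among PLOS articles, in Python with pandas. The file and column names are illustrative assumptions rather than the dataset’s actual schema; consult the data dictionary that accompanies the release:

```python
import pandas as pd

# A minimal sketch with assumed column names; the real dataset may encode
# repositories differently (see its data dictionary).
osi = pd.read_csv("open-science-indicators.csv")
plos = osi[osi["publisher"] == "PLOS"]

# Most-used data repositories among PLOS articles that deposited data,
# assuming multiple repositories are recorded as a ";"-separated string.
top_repos = (
    plos.loc[plos["data_in_repository"], "data_repository"]
    .dropna()
    .str.split(";")
    .explode()
    .str.strip()
    .value_counts()
    .head(10)
)
print(top_repos)
```

The same pattern, filter, explode a delimited column, then count, applies equally well to code repositories or preprint servers.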