PLOS BLOGS The Official PLOS Blog

Go forth and reproduce

Sorry for the slight hiatus in blog posts. Things have been a bit lively in the office over the last couple of days. Not least of the things that have been occupying our minds has been the Nature news piece about us. We have been taking some deep breaths before writing a response, but you can read it here first. So please be patient, Glyn.

Before that though I wanted to mention reproducibility.

This was prompted by an email that came to me from Roger Peng of Johns Hopkins Bloomberg School of Public Health. The salient parts are:

“I am very excited about the introduction of the new PLoS ONE journal. It seems that you have a number of interesting features planned that will make the journal stand apart from traditional paper journals. I would like to encourage you to allow authors to make their articles *reproducible* if they so wish.
That is, allowing authors to also provide their analytic data (or a pointer to it), analytic computer code, and documentation along with the article. That way, readers could reproduce and conduct alternative analyses of primary findings (tables, figures, etc.).”

This is a really important point. It is of course crucial that the validity of a piece of scientific work can be fully assessed from what is published. In this regard it has long been the expectation that methods in papers should be described in sufficient detail to allow the experiments to be repeated by other researchers, given sufficient skill and resources. That may be the expectation but it is a little depressing how rarely this is either achieved or enforced.

But this isn’t what Roger means by ‘reproducible’.

A figure in a paper is a way of representing the raw data so as to best illustrate the point the author is making. A figure, then, is the product of an operation upon the raw data, and that operation results in a loss of information.

The raw data could have been presented in a host of different ways, possibly supporting other conclusions not thought of by the author. Equally, if a reader had raw data compatible with what the author obtained, wouldn’t it be useful if it could be processed in the same way for comparison? Wouldn’t it be much better for readers to have access not only to the figures in a paper but also to the underlying data and the transform that created them? In this way no information, implicit or explicit, is lost.
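To make the idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the measurements are invented, and the names `raw_data` and `summarise` are hypothetical rather than part of any PLoS system. It shows a "figure" as the output of a transform applied to raw data, and how a reader holding both the data and the transform can rerun the analysis with a different choice.

```python
from statistics import mean, median

# Hypothetical raw measurements, invented purely for illustration.
raw_data = {
    "control": [4.1, 3.9, 4.3, 4.0, 9.8],  # note the single outlying value
    "treated": [5.2, 5.0, 5.4, 5.1, 5.3],
}

def summarise(data, statistic):
    """The 'transform': reduce each group of raw values to one number."""
    return {group: round(statistic(values), 2) for group, values in data.items()}

# What the author's figure might plot: one value per group (here, the mean).
figure_values = summarise(raw_data, mean)

# What a reader can do only if the raw data and the transform are published:
# rerun the same analysis with a different statistic.
alternative_values = summarise(raw_data, median)

print("As published (mean):  ", figure_values)
print("Alternative (median): ", alternative_values)
```

With these made-up numbers the mean puts the control group slightly above the treated group, while the median puts it well below. The outlier that explains the difference is invisible in either summary and only becomes apparent when the raw values are available.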

This idea isn’t new. I first came across it in the work of Jon Claerbout at Stanford. At the time he was working with some software called Cake but has since moved into less arcane systems and has developed a number of GNU tools for making reproducible documents.

Another initiative in this area is the R project for statistical computing, an endeavour in which Wolfgang Huber, who has been advising us on PLoS ONE, is involved.

I can’t promise any specific initiatives here but we are very keen to support reproducibility where possible. It is very important to us that Open Access publishing should facilitate the use and reuse of data, not just the conventional paper. PLoS ONE, when it is launched, should have the flexibility to accommodate reproducibility. Add to that the open source publishing platform which we are developing and I hope that we will have a medium to drive forward the development of the scientific paper as well as a great sandbox in which to play.

Discussion
  1. I know that you are still trying to work out the details of PLoS ONE but based on this post I am getting the impression that you are thinking of including raw experimental data. Is this basically along the lines of supplementary materials for each paper or do you envision publishing raw data prior to integration and analysis into a formal paper? And is this something that will cost? From my perspective as a researcher, in order to really play in your sandbox, it would have to be free. Is there any chance that you could get sufficient outside funding to make publication of any kind in PLoS ONE free for the contributors?

  2. Well, that would be something to shoot for. Right at this moment we aren’t looking at publishing raw data straight from the bench. I hope that PLoS ONE will be able to capture the evolution of a paper, and we are looking at enabling revisions of papers to be published, but I don’t think we can be as radical as Jean-Claude would like just yet. As for making PLoS ONE free for contributors as well as readers, to achieve that we will have to find some new ways to fund ourselves. Any suggestions gratefully received.

  3. At some point in the future I’d like to see PLoS ONE allow authors to include data, but not in the style of supplementary materials in other journals. I think it’s important that the data be linked to the specific results in the paper so that we can understand the complex analysis that makes up a paper.

    I like to think of reproducible research as the scientific research analogue of the open source idea in software. Simply giving someone raw data is like giving someone the source code to a large project without giving them the Makefile. Yes, it’s possible to figure it all out eventually, but we can surely do better to achieve the goal of reproducibility.

  4. Absolutely.

    Supplementary information is an easy option but also the least effective and user-friendly. From day one PLoS ONE can allow as many SI files as anyone could want just so long as no individual file is larger than 10MB–that is no more than we have available on all PLoS journals–but we want to be able to do better than that.

    This is of course one of the reasons why the TOPAZ publishing platform is being developed as an Open Source project. We want to enable the people who best understand specific publishing problems to create solutions that can be easily shared.

  5. This kind of data archiving would help in reproducing data analysis. But it’s often the case that data collection is what is hard to reproduce. I suspect that, at least for some articles, it would be worthwhile to attach video of the experimental methods being performed to an article. That way, everything involved in generating the raw data will be documented.

    I don’t think this will ever be commonly done. And even if investigators were willing to do it now, the bandwidth and storage requirements would probably be too great. But at some time in the future I expect that it will be done, especially in the event of a dispute.

  6. “From day one PLoS ONE can allow as many SI files as anyone could want just so long as no individual file is larger than 10MB–that is no more than we have available on all PLoS journals”

    The relevant policy section actually says “All supporting material…should be smaller than 10 MB in size because of the difficulties that some users will experience in loading or downloading files of a greater size.” This implied to me that the TOTAL should be less than 10 MB, not that each individual file should be less than 10 MB. For the sake of clarity you might want to change it to something like “All supporting material… should be composed of individual files no larger than 10 MB…”.
