Go forth and reproduce

June 22, 2006 Chris Surridge Publishing

Sorry for the slight hiatus in blog posts. Things have been a bit lively in the office the last couple of days. Not least among the things occupying our minds has been the Nature news piece about us. We have been taking some deep breaths before writing a response, but you will be able to read it here first. So please be patient, Glyn.

Before that, though, I wanted to mention reproducibility.

This was prompted by an email that came to me from Roger Peng of Johns Hopkins Bloomberg School of Public Health. The salient parts are:

“I am very excited about the introduction of the new PLoS ONE journal. It seems that you have a number of interesting features planned that will make the journal stand apart from traditional paper journals. I would like to encourage you to allow authors to make their articles *reproducible* if they so wish.
That is, allowing authors to also provide their analytic data (or a pointer to it), analytic computer code, and documentation along with the article. That way, readers could reproduce and conduct alternative analyses of primary findings (tables, figures, etc.).”

This is a really important point. It is of course crucial that the validity of a piece of scientific work can be fully assessed from what is published. In this regard it has long been the expectation that methods in papers should be described in sufficient detail to allow the experiments to be repeated by other researchers, given sufficient skill and resources. That may be the expectation but it is a little depressing how rarely this is either achieved or enforced.

But this isn’t what Roger means by ‘reproducible’.

A figure in a paper represents the raw data in the way that best illustrates the point the author is making. A figure, then, is the product of an operation upon the raw data, and that operation results in a loss of information.

The raw data could have been presented in a host of different ways, possibly supporting other conclusions not thought of by the author. Equally, if a reader had raw data comparable to what the author obtained, wouldn’t it be useful if it could be processed in the same way for comparison? Wouldn’t it be much better for readers to have access not only to the figures in a paper but also to the underlying data and the transform that created them? In this way no information, implicit or explicit, is lost.
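To make that concrete, here is a minimal sketch in Python of what shipping "the data and the transform" alongside a figure might look like. Everything in it is an assumption for illustration: the numbers are made up, and the names `RAW_DATA`, `author_transform` and `reader_transform` are hypothetical rather than anything Roger or PLoS ONE has proposed.

```python
import statistics

# Hypothetical raw measurements, invented purely for illustration.
RAW_DATA = {
    "control": [4.1, 3.9, 4.3, 4.0, 9.8],
    "treated": [5.0, 5.2, 4.9, 5.1, 5.3],
}


def author_transform(raw):
    """The operation behind the published figure: one mean per group."""
    return {group: statistics.mean(values) for group, values in raw.items()}


def reader_transform(raw):
    """An alternative analysis a reader might try: one median per group."""
    return {group: statistics.median(values) for group, values in raw.items()}


# The values the author would plot, regenerated from the raw data.
figure_values = author_transform(RAW_DATA)

# The same raw data seen through a different transform.
alternative_values = reader_transform(RAW_DATA)

# A reader with comparable data (also made up) can reuse the author's
# transform unchanged to get directly comparable numbers.
reader_data = {"control": [4.2, 4.0, 4.4], "treated": [5.1, 5.0, 5.4]}
reader_figure_values = author_transform(reader_data)
```

With these invented numbers, the group means put "control" above "treated", while the medians put it below, because a single large value dominates the control mean. A reader who sees only the plotted means cannot discover that; a reader who has the data and the transform can.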

This idea isn’t new. I first came across it in the work of Jon Claerbout at Stanford. At the time he was working with some software called Cake, but he has since moved on to less arcane systems and has developed a number of GNU tools for making reproducible documents.

Another initiative in this area is the R project for statistical computing, an endeavour in which Wolfgang Huber, who has been advising us on PLoS ONE, is involved.

I can’t promise any specific initiatives here but we are very keen to support reproducibility where possible. It is very important to us that Open Access publishing should facilitate the use and reuse of data, not just the conventional paper. PLoS ONE, when it is launched, should have the flexibility to accommodate reproducibility. Add to that the open source publishing platform which we are developing and I hope that we will have a medium to drive forward the development of the scientific paper as well as a great sandbox in which to play.