
PLOS BLOGS The Official PLOS Blog

Implications of Data Sharing for the Neuroimaging Research Community – My thoughts on the new PLOS Data Policy

By Thomas E. Nichols

Studying the brain with Functional Magnetic Resonance Imaging (fMRI) is a laborious and highly multidisciplinary task. In a typical study, psychologists or neuroscientists design the study and recruit the participants, programmers develop the experimental paradigms, MRI physicists develop the acquisition sequences and ensure good data quality from the magnet, and, finally, statisticians fit models to the complex data and find signals in the noise.

All told, an fMRI study requires hundreds of man-hours, costly scanner time, and laborious data analysis to process gigabytes of image data.  Yet, what is the quantitative result that is the core of a published paper?  A list of x, y, z brain atlas coordinates of activation, a dataset that can be recorded on a Post-it note!  While figures may show the pattern of brain activation, if any quantitative result (point estimate +/- standard error) is given, it is only for a selected region of the brain.  Is it really acceptable practice that, of the gigabytes of raw and processed data that are generated, only centibytes of data are ultimately shared? The issue of reproducibility is everywhere right now (type “social psychology reproducibility” into a search engine to see particularly spirited discussions), but it brings attention to some particular weaknesses of neuroimaging science, and fMRI in particular.

Let’s face it:  In brain imaging, we don’t just have a data sharing problem, we have a results reporting problem.

In virtually every discipline of science it is expected that your measurement of interest will be reported with a point estimate (e.g. a mean), a measure of uncertainty (e.g. standard error of the mean, or estimated population standard deviation) and a P-value and/or confidence interval.  In fMRI, “a point estimate” is a picture, a 3D volume image, as is the standard error, and we spend hours studying their ratio, a T-test statistic image.  Yet these images rarely leave the lab, and only the scrap-paper-sized summary, the x,y,z locations, make it into publication.  But the images are just the start of it.
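To make the relationship between these three images concrete, here is a minimal Python sketch (using NumPy, which the post does not mention) of a voxelwise one-sample t-test across subjects. The function name `one_sample_t_map` and the array layout (subjects along the first axis) are assumptions for illustration; real analysis packages fit more general models than this simple test, and converting voxel indices to atlas x,y,z coordinates would additionally need the image's affine transform, which is omitted here.

```python
import numpy as np


def one_sample_t_map(contrast_images):
    """Voxelwise one-sample t-test across subjects (hypothetical helper).

    contrast_images: array of shape (n_subjects, X, Y, Z), one contrast
    image per subject. Returns (mean, standard error, t), each (X, Y, Z).
    """
    data = np.asarray(contrast_images, dtype=float)
    n = data.shape[0]
    mean = data.mean(axis=0)              # point-estimate image
    sd = data.std(axis=0, ddof=1)         # inter-subject SD (n - 1 denominator)
    se = sd / np.sqrt(n)                  # standard-error image
    with np.errstate(divide="ignore", invalid="ignore"):
        # T statistic image = point estimate / standard error; set to 0
        # wherever the standard error is exactly zero.
        t = np.where(se > 0, mean / se, 0.0)
    return mean, se, t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.normal(size=(12, 4, 4, 4))  # 12 simulated subjects
    images[:, 1, 2, 3] += 2.0                # simulated effect at one voxel
    mean, se, t = one_sample_t_map(images)
    peak = np.unravel_index(np.argmax(t), t.shape)
    # The peak location is typically all that gets published; the full
    # mean, se and t images are what could be shared instead.
    print("peak voxel index:", peak)
```

Sharing `mean` and `se` (not just the peak of `t`) is what would let others recompute effect sizes or pool the study in a meta-analysis.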

Everyone can step up

Everyone involved in brain imaging can help increase reproducibility and transparency of the research they conduct.

  • Investigators should plan on data sharing from the outset, ensuring that their ethics paperwork allows them to share suitably anonymised versions of their data.
  • The trainees or staff actually collecting the data can make sure it is organized in a way that will make sharing easy; see e.g. OpenfMRI.org and Data Organization.
  • Developers of analysis software should make data export, along with relevant provenance of what was done to the data, as easy and interoperable as possible.
NIDM Component Layer Cake, developed by the INCF Neuroimaging Datasharing Task Force (NIDASH)

Two important initiatives in this area are Nipype, a glue for multiple analysis packages that includes provenance tracking, and NIDM, an ongoing effort to establish standards for communicating neuroimaging analyses and results.

  • Data repositories can make it as easy as possible for users to upload their data and provide stable URLs to reference the data; OpenfMRI and Neurovault are two places to upload individual studies, while LORIS and COINS offer more comprehensive project-level solutions for sharing data.
  • Finally, journals can provide guidelines that require minimal data sharing and encourage sharing as much data as possible; in this, PLOS has led the way with its policy on sharing of data, materials, and software.

Neuroimaging data are big, but they almost don’t qualify as “big data”.

Brain image data are highly structured, and the processed data that go into group-level analysis are measured in MB, not GB.  Given that some electrophysiology experiments generate TBs in a single session, we really can’t use the size of the data as an excuse.  The fMRI community is also fortunate to have the widely accepted NIfTI file format for storing image data.
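A back-of-the-envelope calculation supports the "MB, not GB" point. The sketch below assumes, for illustration only, that each per-subject image has been resampled to FSL's 2 mm MNI152 template grid (91 × 109 × 91 voxels) and is stored uncompressed as 32-bit floats; the function names are hypothetical.

```python
# Illustrative assumptions (not from the post): images on the 2 mm MNI152
# grid used by FSL (91 x 109 x 91 voxels), stored uncompressed as float32.
GRID = (91, 109, 91)
BYTES_PER_VOXEL = 4  # float32


def image_megabytes(grid=GRID, bytes_per_voxel=BYTES_PER_VOXEL):
    """Uncompressed size of one 3D image in megabytes (10**6 bytes)."""
    n_voxels = grid[0] * grid[1] * grid[2]
    return n_voxels * bytes_per_voxel / 1e6


def group_input_megabytes(n_subjects, images_per_subject=1):
    """Total size of the per-subject images feeding a group-level analysis."""
    return n_subjects * images_per_subject * image_megabytes()


if __name__ == "__main__":
    print(f"one image: {image_megabytes():.1f} MB")
    print(f"20 subjects: {group_input_megabytes(20):.1f} MB")
```

Under these assumptions it prints `one image: 3.6 MB` and `20 subjects: 72.2 MB`. Passing `images_per_subject=2` (say, a contrast image plus its variance image) doubles the total, which is still far from gigabytes.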

While these work in our favor, we suffer from a lack of standards for describing the metadata, that is, the things that surround the data: the experimental design, the precise details of the statistical model fit, and how the inference (aka thresholding) was conducted.
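As a rough illustration of what such metadata might contain, the Python sketch below bundles the three items listed above (design, model, inference) into a JSON record and refuses to produce it if any field is missing. Every field name and example value here is hypothetical and chosen for illustration; this is not the NIDM schema or any existing standard.

```python
import json

# Hypothetical required fields, grouped by the three kinds of metadata the
# post lists. Illustrative only; these names are NOT the NIDM schema.
REQUIRED_FIELDS = {
    "design": ["task", "conditions", "n_subjects"],
    "model": ["software", "contrast", "hrf"],
    "inference": ["statistic", "threshold_type", "threshold"],
}


def make_stat_map_metadata(design, model, inference):
    """Return a JSON string describing a statistic map, or raise ValueError
    if any of the hypothetical required fields is missing."""
    record = {"design": design, "model": model, "inference": inference}
    for section, keys in REQUIRED_FIELDS.items():
        missing = [k for k in keys if k not in record[section]]
        if missing:
            raise ValueError(f"{section} is missing: {', '.join(missing)}")
    return json.dumps(record, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    print(make_stat_map_metadata(
        design={"task": "n-back", "conditions": ["0-back", "2-back"],
                "n_subjects": 20},
        model={"software": "example-package 1.0",
               "contrast": "2-back > 0-back", "hrf": "canonical"},
        inference={"statistic": "T", "threshold_type": "cluster-level FWE",
                   "threshold": 0.05},
    ))
```

Failing loudly on a missing field is the design choice worth noting: a statistic map whose thresholding method is unrecorded is hard to reuse, so the check forces that information to travel with the image.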

The NIDM project is trying to address some of these standards issues, but that doesn’t stop individual investigators from starting at the beginning:  Share those statistic maps! Upload them to Neurovault.

From the outset, think: The day my paper appears in print I will get an email asking for my full image data, and analysis details; how will I respond to this?

If we all do our part, and plan on data sharing from the get-go, we will make reproducible science happen.

Thomas E. Nichols is a Professor at the University of Warwick, holding a joint position between the Department of Statistics and WMG.  He is a statistician with a 20-year focus on neuroimaging, known for his contributions to the modelling of functional and structural MRI data.  His current focus is on the meta-analysis of neuroimaging data and tools for data sharing. He is also a PLOS ONE Academic Editor.


Discussion
  1. “In virtually every discipline of science it is expected that your measurement of interest will be reported with a point estimate (e.g. a mean), a measure of uncertainty …”

    In fMRI, the uncertainty to be reported can at least be reasonably estimated. This should be contrasted with EEG and MEG activity maps, wherein the uncertainty associated with the mathematically enforced constraints necessary to solve the inverse problem cannot be reasonably estimated. Wishful thinking has no error bounds.

    Yet thousands of papers with EEG and MEG activity maps have been published. Does anybody do science anymore?

  2. Thanks for your comment. I agree that in some situations it might be challenging to get an accurate (or any!) estimate of uncertainty. But usually scientists are interested in inference over a group of subjects. In that case, no matter how complicated the measure, you can always compute a standard deviation over subjects. Thus, whether the method is fMRI, PET, EEG or MEG, we should be routinely sharing the inter-subject means and standard deviations.

  3. I am assuming that you are suggesting that the inter-subject means and SD of the raw data be shared. In the case of MEG and EEG activity maps what good would that do when the error due to the wishfully conjured mathematical constraints has no known reasonable estimate and is without question large compared to raw data variance?
