
PLOS BLOGS The Official PLOS Blog

Sunday Outage Sunday

We experienced a hardware malfunction yesterday that took the Topaz-hosted journals offline from 4pm to 10pm PST.

James sent an email late yesterday afternoon reporting site errors on the PLoS journal websites. Soon after his email, the IT team started receiving SMS alerts. I assumed that something had gone wrong with the Topaz framework and started looking through the relevant log files, but couldn’t find anything. I spent the requisite amount of time banging my head against the “site error” wall without success, then called Russ for assistance. After a bit of digging through the server logs, he found the culprit: a drive had failed on the Mulgara server. The drive is part of a RAID 5 configuration, so we didn’t lose any data, but we also mysteriously lost the connection from the Mulgara server to the DAS array (the disk storage for the Mulgara data).

We restarted the server but couldn’t confirm that it was rebuilding the RAID correctly. I drove down to the colo, confirmed the drive failure and babysat the server until the platform was healthy. We’ll swap out the defective drive on Wednesday during the migration to a pre-release of Topaz 0.9.

In case you missed it, the title is a reference to U2’s Sunday Bloody Sunday.

Discussion
  1. I first noticed that PLoS Genetics articles were unavailable on Saturday evening, not Sunday afternoon, probably about 7pm PDT. I think I checked the other PLoS journals then too and found a lot of them unavailable. I posted a message about this to the PLoS One discussion pages on Sunday morning.

    If the hardware didn’t die until Sunday afternoon, something else must have been at fault earlier.
