There is a fun paper in this month's Physical Review E (that’s the one that covers “Statistical, Nonlinear and Soft Matter Physics”) which looks at exactly how fleeting interest is in items posted on websites. Specifically it looks at news items on a Hungarian news site (a number of the authors are based in Hungary) and the headline figure is that, on average, news items have been read by half of the people who are ever going to read them within 36 hours of posting.
The paper “Dynamics of information access on the web” (doi 10.1103/PhysRevE.73.066132) comes from a group headed by Albert-László Barabási, who is one of my all-time favourite authors. One of my small claims to fame is that I accepted for publication the first paper on which he was an author that ‘got the cover’ of Nature. That was about the wetting of granular material and attempted to answer the question “What keeps sandcastles standing?” (there is a copy of the pdf available from the University of Notre Dame). László claims that an earlier paper exists but let’s not let the facts spoil a good story.
Anyway, in this new paper, on which there is a nice general language report on physicsweb, the team used cookies to look at how often news and other items on a Hungarian news and entertainment portal were accessed, by whom, and what other pages those readers looked at. If you want to read the paper, the Phys Rev E version is, of course, locked up behind subscription barriers but I’m glad to say that a nearly identical version, “Fifteen Minutes of Fame: The Dynamics of information access on the web”, is available on the arXiv preprint server.
As I mentioned above, the headline figure is that the average news item has a half-life of about a day and a half. The surprise here is not that this is a short length of time, but rather that it is even that long. Who reads yesterday’s news? To explain this distribution the authors had to consider the behaviour of individual users. They don’t, as you might naively expect, visit the site on a fairly regular basis but instead “numerous frequent downloads are followed by long periods of inactivity, a bursting, non-Poisson activity pattern that is a generic feature of human behaviour”.
If this is how we use news items on web sites there is a very good chance that it is how we use the scientific literature as well, though with longer time scales. I’ve done a rough and ready analysis of the page downloads on 383 PLoS Biology research papers and that gives a half-life of just a touch over 4 months. The spread is quite large: some papers’ half-lives were less than a month while others were over 18 months.
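For anyone who wants to try the same sort of back-of-the-envelope calculation, here is a minimal sketch in Python. It assumes you have download counts per month for a paper and takes the half-life to be the first month by which half of all downloads seen so far have happened. The function name and the numbers are made up for illustration; this is not the method used in the Phys Rev E paper, nor necessarily exactly what I did with the PLoS Biology data.

```python
from itertools import accumulate


def half_life(counts):
    """Return the first period (1-based) by which the cumulative
    count reaches at least half of the total observed count."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("need at least one download to define a half-life")
    for period, running in enumerate(accumulate(counts), start=1):
        # Integer comparison avoids floating-point rounding at exactly half.
        if 2 * running >= total:
            return period


# Hypothetical monthly download counts for one paper (illustrative only).
monthly_downloads = [40, 25, 15, 10, 5, 3, 2]
print(half_life(monthly_downloads))
```

One caveat with a sketch like this: the total only covers downloads to date, so for recently published papers it will tend to understate the eventual half-life.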
It would be great to be able to supply these sorts of figures for all published papers as some form of measure of the immediate buzz about a paper. It would also be interesting to see how this initial interest in papers relates to the much longer timescales of citation. I can’t find the original source of the attached graph but it comes from ISI and shows citations for papers published in journals classified as ‘biology’ in 2004. Here there is a lag (it takes time to write, peer review and publish papers) with peak citations occurring in the fourth year and a half-life of about 7-8 years.
I haven’t any conclusions to draw from all this, I’m afraid. I’m just a sucker for a thorough mathematical analysis.