Availability decay of Bioinformatics web resources : Yes widgets can change it

2010 March 6
by abhishektiwari
The quality and availability of bioinformatics resources are a perennial matter of debate. HTTP 404 Not Found is a frequent experience for bioinformatics researchers trying to use a published web-accessible database or analysis tool. A 4-year follow-up study on the lack of persistence of bioinformatics resources, published in 2008 by Jonathan D. Wren, suggests that approximately 20% of URLs published in MEDLINE abstracts are now inaccessible, and that the most common types of inaccessible content are computer programs (43%) and databases (19%). A similar study was undertaken by Veretnik et al., who analysed the availability of web resources published in the Nucleic Acids Research (NAR) Web Server issues between 2004 and 2007. They found that a significant number of web servers go offline two years after their date of publication. I suspect the real numbers are higher than what we are seeing here, and that roughly 50% of published bioinformatics web resources have a lifespan of two years or less.

Many people see this availability decay as a huge problem for the bioinformatics community, but I suggest that the sudden death of the majority of bioinformatics resources simply reflects their unpopularity among potential users, and that it is part of the solution because it filters out all but the reliable, high-quality resources. Jonathan D. Wren used the term data tombs for this kind of web resource: either they are rarely accessed because of their narrow scope,

Data tombs, in large part, seem to have resulted from a ‘build it and they will come’ philosophy, which is OK as a means of justifying database creation, but not publication.

or they have major design flaws and lack a user perspective, in which case users will fall back on an alternative that is less sophisticated but easier to use. In simple words,

Developers often equate the power of their software with the number of options, but users usually equate the number of options with the number of barriers between them and their results.

One of the major problems with dead web resources is their ghosts: citations of the published paper whose sole purpose is to describe the resource. People keep citing dead bioinformatics resources without checking them, and for all the wrong reasons. We definitely need a mechanism to curb these ghost citations.

Apart from quality and narrow scope, several other factors can affect the accessibility of web resources. Frequent changes of URL, funding problems, and a lack of time and resources to maintain published applications are some of them. From time to time people have suggested various solutions, and the two most compelling options are:

1. Archiving of web resources in public repositories. This can be done through a third party such as SourceForge, Google Code, GitHub or Bioconductor. For me the best example is the BioModels database, a database of published systems biology models: for each release they keep a public dump in a SourceForge repository. Resource developers should consider how they will archive the database or tool before they start developing the application. In the recent past a lot of shops have closed due to lack of funding, and I think this is inevitable for most of the bioinformatics resources currently available, except a few very big ones such as NCBI and PDB. I have said it before and I will suggest it again: rather than depending on funding agencies for a sustainable funding model for bioinformatics resources, project managers should consider these publicly available repositories and make regular dumps of their web resources.

2. Use of PURLs, or persistent uniform resource locators. This is a good alternative for resources whose URLs change. Whenever the URL changes, the resource creator has to manually update the PURL to point to the new URL.
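To make the PURL idea concrete, here is a minimal sketch of the lookup a PURL service performs. It is only an illustration: the `PurlRegistry` class, the `/lab/mytool` path and the `example.org` hosts are all made up for this post, and a real PURL server answers over HTTP rather than from an in-memory dictionary.

```python
class PurlRegistry:
    """Toy in-memory PURL table: persistent path -> current target URL.

    Hypothetical illustration only; not the API of any real PURL service.
    """

    def __init__(self):
        self._targets = {}

    def register(self, purl_path, target_url):
        """Create a new persistent path pointing at the current URL."""
        if purl_path in self._targets:
            raise ValueError(f"{purl_path} is already registered")
        self._targets[purl_path] = target_url

    def update(self, purl_path, new_target_url):
        """The manual step: the resource creator repoints the PURL."""
        if purl_path not in self._targets:
            raise KeyError(purl_path)
        self._targets[purl_path] = new_target_url

    def resolve(self, purl_path):
        """Return (status, location), mimicking an HTTP redirect answer."""
        target = self._targets.get(purl_path)
        if target is None:
            return 404, None
        return 302, target


registry = PurlRegistry()
registry.register("/lab/mytool", "http://old-host.example.org/mytool")
# The tool moves to a new server; only the registry entry changes.
registry.update("/lab/mytool", "http://new-host.example.org/tools/mytool")
print(registry.resolve("/lab/mytool"))
```

The published paper only ever cites the persistent path, so the citation keeps working as long as somebody remembers to call `update` after each move. That manual step is also the weak point of the approach.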

In a recent PLoS Computational Biology paper, Bourne et al. argue that widget-centric web resources can change the way computational biologists currently engage with online databases and analysis tools (thanks to Sushant for the pointer to this article, which was published very recently). The authors suggest that widgets are a compelling idea for the long-term availability of bioinformatics resources, and I also think they can curb the growth of unwanted data tombs. A web widget is nothing but a code snippet, commonly a chunk of JavaScript, which an end user can embed and execute within any separate HTML-based web page. Web widgets have been around for quite a long time now; anyone who maintains a blog knows very well about widgets and the functionality they bring to a web page. So what is the big fuss about widgets in bioinformatics? In simple words, if bioinformatics is an Apple platform (iPhone or iPad), then widgets are like iPhone applications, extending the functionality of bioinformatics resources in a modular way. Bourne et al. summarize it very well:

First, it brings the application to you; it is an example of drop technology (simply drop the application into your Web page) and it facilitates use. You do not have to remember where to go and possibly be faced by a series of complex choices—the widget can offer a simplified interface to a subset of features. Second, and more importantly, assuming the use of widgets takes off, you can customize your own Web page to take advantage of work done by a variety of other scientists each producing widgets. So for example, you could aggregate a variety of remote methods that perform sequence and structure comparison using a variety of widgets from a variety of reputable sources, thereby creating a single point of reference. Taking this a step further, you can create and customize workflows composed of different widgets in a plug and play environment.

Just think about the potential implications of this technology. For instance, the database or analysis tool described in a bioinformatics paper could be embedded in the online version of the paper itself. While reading and citing, a user could check the current accessibility or availability of the resource described in the publication, and if the resource is not available they could move on to an alternative without wasting time reading the whole paper.
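The availability check behind such an embedded widget could be as simple as the sketch below. This is my own rough illustration, not something taken from Bourne et al.: the function names and the status labels are assumptions, and a single HEAD request is a crude test that some servers reject even when the resource is alive.

```python
import urllib.error
import urllib.request


def classify_status(status):
    """Map an HTTP status code (or None for no response) to a coarse label."""
    if status is None:
        return "unreachable"
    if 200 <= status < 300:
        return "available"
    if 300 <= status < 400:
        return "moved"
    if status in (404, 410):
        return "gone"
    return "error"


def check_resource(url, timeout=10):
    """Issue one HEAD request and classify the outcome.

    urlopen follows redirects on its own, so "moved" is only reported
    when a redirect could not be followed.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout and similar.
        return "unreachable"


# Example call (needs network access; the URL is a placeholder):
# print(check_resource("http://example.org/"))
```

Keeping `classify_status` separate from the network call means the labelling rules can be changed, or tested, without touching a live server.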

Widgets not just for people, but for programs too
If used in its purest form, another advantage of the widget-centric approach is the ability of widgets to communicate with other widgets or programs, which is nothing but an API-centric model. For that we will need standards for widget development on both the server side and the client side; maybe we need a community-wide effort, something like "the minimum information required for publishing a computer application". I genuinely believe this is a very nice suggestion from Bourne et al.
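As a rough sketch of what "for people and for programs" could mean on the server side, the same record can be rendered as an HTML fragment for a web page or as JSON for another widget or script. Everything here is hypothetical: the `render_widget` function, the field names and the `HYPOTHETICAL:0001` identifier are invented for illustration and do not follow any existing widget standard.

```python
import html
import json


def render_widget(record, fmt="html"):
    """Render one record for a person (HTML) or for a program (JSON)."""
    if fmt == "json":
        # Machine-readable view: another widget or script can parse this.
        return json.dumps(record, sort_keys=True)
    if fmt == "html":
        # Human-readable view: an escaped definition list, ready to embed.
        rows = "".join(
            f"<dt>{html.escape(str(key))}</dt><dd>{html.escape(str(value))}</dd>"
            for key, value in sorted(record.items())
        )
        return f'<dl class="widget">{rows}</dl>'
    raise ValueError(f"unsupported format: {fmt}")


# Made-up record standing in for a database entry.
record = {"id": "HYPOTHETICAL:0001", "name": "example protein", "length": 154}

print(render_widget(record, "html"))
print(render_widget(record, "json"))
```

Both views come from one data source, so the embeddable widget and the API cannot drift apart. That is roughly the property a community standard would have to guarantee.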

References:
Bourne, P., Beran, B., Bi, C., Bluhm, W., Dunbrack, R., Prlić, A., Quinn, G., Rose, P., Shah, R., Tao, W., Weitzner, B., & Yukich, B. (2010). Will Widgets and Semantic Tagging Change Computational Biology? PLoS Computational Biology, 6 (2). DOI: 10.1371/journal.pcbi.1000673

Wren, J., & Bateman, A. (2008). Databases, data tombs and dust in the wind. Bioinformatics, 24 (19), 2127-2128. DOI: 10.1093/bioinformatics/btn464

Veretnik, S., Fink, J., & Bourne, P. (2008). Computational Biology Resources Lack Persistence and Usability. PLoS Computational Biology, 4 (7). DOI: 10.1371/journal.pcbi.1000136

Wren, J. (2008). URL decay in MEDLINE–a 4-year follow-up study. Bioinformatics, 24 (11), 1381-1385. DOI: 10.1093/bioinformatics/btn127

Responses
  9. John Manus
    March 7, 2010

    Well said, Abhishek. The issue of inaccessible software and databases is quite old now; unfortunately, bioinformatics as a community never learned from past experience.

  10. Pradeep
    March 7, 2010

    Ohh man, all bioinfo softwares published these days are useless, only difference is some are less useless than others.

  11. March 8, 2010

    I agree with the comments; the question is what the solution is for developers. Most bioinformatics web servers are developed by biologists. Developing a web server in Perl/PHP is no problem for a biologist, but it is difficult to reach high standards. It is difficult to use a complex language like Java, which may provide platform-independent or standard code usable by others. Second, biological databases are growing, so a method trained on old data becomes obsolete in two or three years, and it is not worth maintaining a web server for long. In most cases the students and staff working in a group do not stay long, so it becomes impossible to maintain a web server or database. In addition, computers change very fast, so you have to move your servers from an old computer to a new one every two years on average, which is a big headache. Above all, you hardly get any credit for maintaining your servers; most people publish and forget.

  12. March 8, 2010

    Thanks for your comment, Dr Raghava. In fact, your group is an ideal example of how to maintain bioinformatics servers for a long time, and always with very limited resources. You are correct that updated data sets make old models and methods obsolete, and at some point it makes no sense to use or maintain such web resources. Any data-intensive discipline has to cope with the expiry of methods and tools, but not within 2 years.
