Availability decay of bioinformatics web resources: yes, widgets can change it
Many people see this availability decay as a huge problem for the bioinformatics community, but I suggest that the sudden death of the majority of bioinformatics resources simply reflects their unpopularity among potential users, and that it is part of the solution: it filters out everything except the reliable, high-quality resources. Jonathan D. Wren used the term "data tombs" for this kind of web resource. Either they are rarely accessed because of their narrow scope
Data tombs, in large part, seem to have resulted from a ‘build it and they will come’ philosophy, which is OK as a means of justifying database creation, but not publication.
or they have major design flaws and lack a user perspective, in which case users fall back on an alternative that is less complicated and easier to use. In simple words,
Developers often equate the power of their software with the number of options, but users usually equate the number of options with the number of barriers between them and their results.
One of the major problems with dead web resources is their ghost: the citations to the published paper whose sole purpose was to describe the resource. People keep citing dead bioinformatics resources without checking them, and for all the wrong reasons. We definitely need a mechanism to curb these ghost citations.
Apart from quality and narrow scope, several other factors affect the accessibility of web resources: frequent URL changes, funding issues, and a lack of time and resources to maintain published applications are some of them. People have suggested various solutions from time to time, and the two most compelling options are
1. Archiving of web resources in public repositories. This can be done through a third party such as SourceForge, Google Code, GitHub or Bioconductor. For me the best example is the BioModels database, a database of published systems biology models: for each release they keep a public dump in a SourceForge repository. Resource developers should consider how they will archive the database or tool before they start developing the application. In the recent past a lot of shops have closed due to lack of funding, and I think this is inevitable for most bioinformatics resources currently available, except a few very big ones such as NCBI, PDB etc. I have said it before and I will suggest it again: rather than depending on funding agencies for a sustainable funding model, project managers should consider these publicly available repositories and make regular dumps of their web resources.
2. Use of PURLs, or persistent uniform resource locators. This is a good alternative for resources with changing URLs. Whenever the URL changes, the resource creator has to manually update the PURL to point to the new URL.
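To make the PURL idea concrete, here is a minimal sketch in Python of the indirection involved: a stable path is mapped to whatever the current URL is, and only that mapping is edited when the resource moves. The class name `PurlRegistry`, its methods, and the example paths and URLs are all hypothetical; real PURL services do much more (ownership, history, different redirect types), so treat this as an illustration only.

```python
from typing import Dict, Optional, Tuple


class PurlRegistry:
    """Hypothetical in-memory table mapping persistent paths to current URLs."""

    def __init__(self) -> None:
        self._targets: Dict[str, str] = {}

    def register(self, purl_path: str, target_url: str) -> None:
        # Registering an existing path again simply overwrites the target.
        # This is the "manual update" the resource creator performs
        # whenever the resource moves to a new URL.
        self._targets[purl_path] = target_url

    def resolve(self, purl_path: str) -> Tuple[int, Optional[str]]:
        # Return an HTTP-like answer: 302 plus the current location for a
        # known path, 404 and no location for an unknown one.
        target = self._targets.get(purl_path)
        if target is None:
            return 404, None
        return 302, target


if __name__ == "__main__":
    registry = PurlRegistry()
    # Example values only; neither the path nor the URLs are real.
    registry.register("/bioinfo/mytool", "http://old-lab.example.org/mytool")
    registry.register("/bioinfo/mytool", "http://new-lab.example.org/mytool")
    print(registry.resolve("/bioinfo/mytool"))
```

Papers would cite only the persistent path, so a move costs one call to `register` instead of leaving a dead link in every citation.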
In a recent PLoS Computational Biology paper, Bourne et al. argue that widget-centric web resources can change the way computational biologists currently engage with online databases and analysis tools (thanks to Sushant for the pointer to this article, which was published very recently). The authors suggest that widgets are a compelling idea for the long-term availability of bioinformatics resources, and I also think they can curb the growth of unwanted data tombs. A web widget is nothing but a code snippet, commonly a chunk of JavaScript, that an end user can embed and execute within any separate HTML-based web page. Web widgets have been around for quite a long time now; anyone who maintains a blog knows very well about widgets and the functionality they bring to a webpage. So what is the big fuss about widgets in bioinformatics? In simple words, if bioinformatics is an Apple platform (iPhone or iPad), then widgets are just like iPhone applications, extending the functionality of bioinformatics resources in a modular way. Bourne et al. summarize it very well
First, it brings the application to you; it is an example of drop technology (simply drop the application into your Web page) and it facilitates use. You do not have to remember where to go and possibly be faced by a series of complex choices—the widget can offer a simplified interface to a subset of features. Second, and more importantly, assuming the use of widgets takes off, you can customize your own Web page to take advantage of work done by a variety of other scientists each producing widgets. So for example, you could aggregate a variety of remote methods that perform sequence and structure comparison using a variety of widgets from a variety of reputable sources, thereby creating a single point of reference. Taking this a step further, you can create and customize workflows composed of different widgets in a plug and play environment.
Just think about the potential implications of this technology. For instance, the database or analysis tool described in a bioinformatics paper could be embedded in the online version of the paper itself. While reading and citing, users could check the current accessibility or availability of the resource described in the publication, and if the resource is not available they could move on to an alternative without wasting further time reading the whole paper.
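The availability check described above could be as simple as asking the resource's URL for an HTTP status. Below is a minimal sketch in Python, assuming that "available" just means the server answers with a 2xx or 3xx status; the function names `http_status` and `check_availability` are my own, not taken from Bourne et al., and a real widget would more likely do this in JavaScript inside the page.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Union


def http_status(url: str, timeout: float = 10.0) -> int:
    """Send a HEAD request and return the HTTP status code."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers arrive as exceptions; the code is still informative.
        return err.code


def check_availability(
    url: str, fetch_status: Callable[[str], int] = http_status
) -> Dict[str, Union[str, bool]]:
    """Report whether a resource answers, under the 2xx/3xx assumption."""
    try:
        status = fetch_status(url)
    except (OSError, ValueError) as err:
        # OSError covers DNS failures, refused connections and timeouts;
        # ValueError covers malformed URLs.
        return {"url": url, "available": False, "detail": str(err)}
    return {
        "url": url,
        "available": 200 <= status < 400,
        "detail": f"HTTP {status}",
    }


if __name__ == "__main__":
    # Placeholder URL; replace with the resource cited in the paper.
    print(check_availability("http://example.org/"))
```

The `fetch_status` parameter exists only so the check can be exercised without network access. Note that a live server does not prove the tool behind it still works, so this is a lower bound on what "available" should mean.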
Widgets not just for people, but for programs too
If used in its purest form, another advantage of the widget-centric approach is the ability of widgets to communicate with other widgets or programs, which is nothing but an API-centric model. For that we will need standards for widget development on both the server side and the client side; maybe we need a community-wide effort, something like "The minimum information required for publishing a computer application". I genuinely believe this is a very nice suggestion from Bourne et al.
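As a rough illustration of widgets talking to each other, here is a small publish/subscribe sketch in Python. Everything in it is an assumption made for the example: the `WidgetBus` class, the topic name `sequence.selected` and the payload fields are hypothetical, and no such standard is defined in the paper. The point is only that two independently written widgets can cooperate once they agree on a topic and a message shape, which is exactly the part a community standard would have to fix.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, str]], None]


class WidgetBus:
    """Hypothetical message bus shared by the widgets on one page."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # A widget (or any other program) registers interest in a topic.
        self._handlers[topic].append(handler)

    def publish(self, topic: str, payload: Dict[str, str]) -> int:
        # Deliver the payload to every subscriber of the topic and
        # return how many handlers were called.
        handlers = list(self._handlers.get(topic, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)


if __name__ == "__main__":
    bus = WidgetBus()
    received: List[Dict[str, str]] = []

    # A pretend "alignment widget" listens for sequences chosen elsewhere.
    bus.subscribe("sequence.selected", received.append)

    # A pretend "sequence viewer widget" announces the user's selection.
    bus.publish("sequence.selected", {"id": "example-1", "residues": "MKT"})
    print(received)
```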
References:
Bourne, P., Beran, B., Bi, C., Bluhm, W., Dunbrack, R., Prlić, A., Quinn, G., Rose, P., Shah, R., Tao, W., Weitzner, B., & Yukich, B. (2010). Will Widgets and Semantic Tagging Change Computational Biology? PLoS Computational Biology, 6 (2). DOI: 10.1371/journal.pcbi.1000673
Wren, J., & Bateman, A. (2008). Databases, data tombs and dust in the wind. Bioinformatics, 24 (19), 2127-2128. DOI: 10.1093/bioinformatics/btn464
Veretnik, S., Fink, J., & Bourne, P. (2008). Computational Biology Resources Lack Persistence and Usability. PLoS Computational Biology, 4 (7). DOI: 10.1371/journal.pcbi.1000136
Wren, J. (2008). URL decay in MEDLINE–a 4-year follow-up study. Bioinformatics, 24 (11), 1381-1385. DOI: 10.1093/bioinformatics/btn127




Well said, Abhishek. The issue of inaccessible software and databases is quite old now; unfortunately, bioinformatics as a community never learned from past experience.
Ohh man, all bioinfo softwares published these days are useless, only difference is some are less useless than others.
I agree with the comments; the question is what the solution is for developers. Most bioinformatics web servers are developed by biologists. For a biologist, developing a web server in Perl/PHP is no problem, but it is difficult to reach high standards. It is difficult to use a complex language like Java, which may provide platform-independent or standard code usable by others. Second, biological databases are growing, so a method trained on old data becomes obsolete in two or three years, and it is not worth maintaining a web server for long. In most cases the students/staff working in a group do not stay for long, so it becomes impossible to maintain a web server/database. In addition, computers are changing very fast, so you have to move your servers from an old computer to a new one every two years (on average), which is a big headache. Above all, you hardly get any credit for maintaining your servers; most people publish and forget.
Thanks for your comment, Dr Raghava. In fact your group is an ideal example of how to maintain bioinformatics servers for a long time, always with very limited resources. You are correct that updated data sets make old models/methods obsolete, and at some point it makes no sense to use or maintain such web resources. Any data-intensive discipline has to cope with the expiry of methods/tools, but not in 2 years.
Been a bit slow to stumble upon this, but nice piece on decay of Bioinformatics web resources by @abhishektiwari http://bit.ly/cz0jny