Bio-Mirrors: To be, or not to be
2009 February 26
Yesterday I got an email from the SBML mailing list about the release of BioModels Database's first mirror site at Caltech. I was very surprised, and especially wondered why BioModels Database needs a mirror site, considering that the whole database is only around 3.46 MB (compressed twelfth release) and consists of 293 models across both the curated and non-curated branches.

Nowadays it is a very common trend to have multiple mirrors for biological databases; for example, there are about 17 official BNL PDB web site mirrors. A mirror resource can be a database mirror (such as ClamAV), a web site mirror (such as PDB or BioModels), or a repository mirror (such as SourceForge). One of the primary reasons to maintain mirrors is to increase resource availability: mirrors help to maintain a minimum level of service during failures and high-traffic situations. Normally a user should be automatically redirected to the closest mirror, which ensures both the accessibility and the speed of a resource. Geographic distribution of web content via mirror sites is critical for balancing traffic between all mirrors.

Most biological database mirrors are web site mirrors, and they are not mirrors in the true sense either; they are more like isolated replicas of the principal database. The principal server (URL) never redirects users to their nearest geographical mirror, not even in high-traffic situations, and believe it or not, most of us never use these mirror sites. If I have to browse PDB structures, I just point my browser to the main RCSB website, not to one of those 17 PDB mirrors. Maintaining these bio-mirrors ties up a lot of resources without serving any clear purpose; at least I am not convinced why we need this kind of mirror. BioModels is a special case: because of its small size, it does not need huge resources to maintain a mirror, but I am not sure about the others.
9 Responses




Bio-Mirrors: To be, or not to be: Yesterday I got email from SBML mailing list about release of BioModels Databa.. http://tinyurl.com/ddkrzy
Abhishek, I think you are mainly right, but the situation you describe is not general in the life sciences. A good example is UniProt. If you are in the US, you are redirected to UniProt@PIR, in Germany to UniProt@SIB, and in the UK to UniProt@EBI. We very much intend to move toward a similar system for the EBI. But these mirrors play another role. If the central domain is down, you still need to be able to access the other resources independently. Over the last few years, there were half a dozen times when the EBI had to shut down, for between 1 hour and 3 days. Because we had no data replication, BioModels DB was effectively unavailable.
And just FYI, the 3.5 MB tarball is not the whole of BioModels Database, but only the export of the models. Multiply that by 3000 to get the current size at the EBI (for instance, with all the metadata used to search the models, and all the versions of the models).
I guess I am mentioning (dragging) BioModels a lot in my posts!
Nicholas, thanks for the comment and the information. We need well-established mirrors with timely synchronization and good interconnectivity; just copying the whole website to different locations does not make sense (unless there are funding and political issues), particularly for big databases such as PDB. UniProt, as you mentioned, is a perfect example of how we should set up biological mirrors, and I was not really aware of that. I hope BioModels will make similar arrangements.
Abhishek,
As mentioned, the UniProt architecture is a classic "no single point of failure" design, and serves to provide fast response to requests.
The life sciences can learn a lot from the web in general, where people are serving up information globally. The use cases are different, but there are many lessons there.
Yes, that's very true; it is better for life sciences informatics to stay in learning mode.
Looking up http://www.uniprot.org simply returns multiple A records (i.e. one for each mirror) in a round-robin manner. So if one of the sites is unreachable, browsers (or other HTTP clients) should automatically try another mirror. Also, all sites are behind reverse proxies, so if a site is down, e.g. because it is being updated, requests are proxied to another mirror. But there's no geo-affinity. That was tried for a while and it sort of worked, but at the cost of the failover. In either case, having three mirrors (two of them in Europe) rather than two is useless and even a bit counterproductive, from a technical point of view.
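To make the round-robin failover described above a bit more concrete, here is a minimal Python sketch of what a client could do: ask the resolver for every A record behind a hostname, then try the addresses in order until one accepts a connection. The function names (`resolve_a_records`, `tcp_reachable`, `first_reachable`) are made up for illustration, and this is only an approximation of the idea, not how any particular browser or the UniProt setup actually implements it.

```python
import socket
from typing import Callable, Iterable, List, Optional


def resolve_a_records(host: str, port: int = 80) -> List[str]:
    """Return the distinct IPv4 addresses the resolver reports for host."""
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    addresses: List[str] = []
    for info in infos:
        address = info[4][0]  # sockaddr is (ip, port) for AF_INET
        if address not in addresses:
            addresses.append(address)
    return addresses


def tcp_reachable(address: str, port: int = 80, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to address:port can be opened."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(
    addresses: Iterable[str],
    probe: Callable[[str], bool] = tcp_reachable,
) -> Optional[str]:
    """Try each address in order; return the first one the probe accepts."""
    for address in addresses:
        if probe(address):
            return address
    return None  # every mirror failed the probe


# Example use (needs network access, so it is left as a comment):
#   mirrors = resolve_a_records("www.uniprot.org")
#   print(first_reachable(mirrors))
```

The probe is passed in as a parameter so the failover logic can be exercised without touching the network. Note that nothing here prefers a nearby mirror; it only captures the "try the next record if one is down" part.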
Thanks Eric, your whole comment is self-explanatory.
I guess there is nothing left to say.