Mosaic of biological standards and domain-specific languages

2009 April 13
by abhishektiwari
One of the positive outcomes of the CellML et al. workshop was a community consensus on the possibility of well-woven interoperability and connectivity among biological standards, ontologies and domain-specific languages. In the last few years we have seen various domain-specific standards emerge; in the systems biology community, for example, there are SBML, CellML, BioPAX and many others, each specializing in a different aspect of computational modeling. Over time people have realized that, unlike in other areas, in biology it will never be possible to have a single domain-specific standard that covers every aspect of a domain and fulfills everyone's requirements. Take the simple example of biological sequence data, which is the simplest data model we have. There are at least a couple of dozen sequence formats in existence at the moment, such as FASTA, EMBL, GenBank, SwissProt and PIR, and they can be ranked by how much information and annotation they incorporate. Although there are so many sequence standards, each with its own pros and cons, the bioinformatics community has never had to argue about which one should be used preferentially, because excellent tools provide interoperability by converting one format to another. So no one has ever asked which of these formats is the primary standard for the bioinformatics community.
That was the simplest data model in biology. There are more complex data sets, such as those generated by microarray and proteomics experiments, and it seems unreasonable to expect these communities to use one standard preferentially. I have always advocated the view that multiple domain standards with overlapping scope do no harm, provided there is sufficient interoperability and connectivity between them and information loss during conversion is minimal.
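The point about information loss during conversion can be made concrete with a small sketch. It is only an illustration: `SeqRecord`, `to_fasta` and `from_fasta` are hypothetical names written for this post and are not taken from any existing library. The record carries an annotation that FASTA has no place for, so a round trip through FASTA preserves the sequence but drops the annotation.

```python
from dataclasses import dataclass, field


@dataclass
class SeqRecord:
    """Hypothetical in-memory record, not any real library's class."""
    identifier: str
    description: str
    sequence: str
    annotations: dict = field(default_factory=dict)


def to_fasta(record, width=60):
    """Serialize to FASTA. Annotations cannot be represented and are dropped."""
    header = f">{record.identifier} {record.description}".rstrip()
    # Wrap the sequence into fixed-width lines, as FASTA files usually do.
    lines = [record.sequence[i:i + width]
             for i in range(0, len(record.sequence), width)]
    return "\n".join([header] + lines) + "\n"


def from_fasta(text):
    """Parse a single-record FASTA string back into a SeqRecord."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    # Everything up to the first space is the identifier, the rest is free text.
    identifier, _, description = lines[0][1:].partition(" ")
    return SeqRecord(identifier, description, "".join(lines[1:]))


original = SeqRecord("seq1", "example protein", "MKT" * 30,
                     {"organism": "Homo sapiens"})
round_tripped = from_fasta(to_fasta(original))
```

After the round trip, `round_tripped.sequence` equals `original.sequence`, while `round_tripped.annotations` is empty. A converter between two richer formats would need an explicit mapping for each annotation field to keep such loss minimal.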
On the World Wide Web, HTTP is the only standard for communication, but that will never be the case for biological standards, and it seems people are starting to accept the philosophy that multiple views of a domain through multiple standards are a good thing in the end, at least from the user's perspective. The view that there should be one standard per domain can be supported only when it is a critical requirement, as in the case of internet communication; otherwise it is an unnecessary restriction on everyone. To take an unrelated example, in mobile communication two parallel protocols, GSM and CDMA, work side by side, and it is the end user who has enjoyed the benefits of competition between the two technologies. I think the Systems Biology Graphical Notation (SBGN) is a more convincing example for my argument. It graphically represents quantitative models, biochemical pathways and genetic interactions, and includes three orthogonal languages (Process diagrams, Entity Relationship diagrams and Activity Flow diagrams) to describe biochemical and cellular events unambiguously as graph structures. Each of these languages provides a different view of the same biological network at a different level of granularity: Process diagrams show the biochemistry view, Entity Relationship diagrams show the molecular biology view, and Activity Flow diagrams are planned for the physiology and genetics view.
Mosaic of biological standards (image from the talk "MIBBI, MIASE and all that" by Nicolas Le Novère, EMBL-EBI)

Extending this discussion within the systems biology domain, computational modelling can be seen as three levels: model creation, model simulation and simulation results. The community has developed multiple standards, guidelines and ontologies for each of these levels. There is interoperability between standards (such as CellML to SBML or SBML to CellML), connectivity between standards and ontologies (such as the use of SBO terms in SBML), and there are ongoing discussions about nesting ontologies at various levels, for example linking SBO (Systems Biology Ontology) terms in OPB (Ontology of Physics for Biology). At the simulation level, people associated with SBML have developed SED-ML, while the CellML community has its own RDF-based CellML simulation metadata standard; again, each has its own advantages and disadvantages. (I am not sure whether the CellML community has decided to drop its simulation metadata standard and adopt SED-ML as one of the outcomes of this workshop.)
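As a rough illustration of the connectivity between a standard and an ontology, the sketch below reads `sboTerm` attributes from a tiny SBML-like fragment using only the Python standard library. The fragment is an assumption made for this example: it is not a complete, valid model, the reaction ids are invented, and the SBO identifier is used purely for illustration. `sbo_terms` is a hypothetical helper, not part of any SBML library.

```python
import xml.etree.ElementTree as ET

# Namespace of SBML Level 2 Version 4, assumed here for the toy fragment.
SBML_NS = "http://www.sbml.org/sbml/level2/version4"

# Illustrative fragment only; a real model would also declare species,
# compartments, reactants and products.
SBML_FRAGMENT = f"""<sbml xmlns="{SBML_NS}" level="2" version="4">
  <model id="toy_model">
    <listOfReactions>
      <reaction id="r1" sboTerm="SBO:0000176"/>
      <reaction id="r2"/>
    </listOfReactions>
  </model>
</sbml>
"""


def sbo_terms(xml_text):
    """Map each reaction id to its sboTerm, or None if it has no annotation."""
    root = ET.fromstring(xml_text)
    # ElementTree addresses namespaced tags as "{namespace}localname".
    return {reaction.get("id"): reaction.get("sboTerm")
            for reaction in root.iter(f"{{{SBML_NS}}}reaction")}


terms = sbo_terms(SBML_FRAGMENT)
```

Here `terms` maps `r1` to its SBO identifier and `r2` to `None`. A tool reading the same model could use such terms to decide how each reaction should be interpreted, independently of the format the model was written in.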
Now that we know interoperability is imperative for these standards, the big questions are how they can address different domain-specific interoperability needs, whether there are minimum interoperability requirements they should fulfill, and whether specifications that do not support interoperability should be kept out of the standards.