
Manually Curated Databases: How Reliable Are They?

2009 January 22
by abhishektiwari
A recent study published in Nature Methods reveals that manually curated protein-protein interaction (PPI) datasets can be highly error-prone and possibly of lower quality than previously thought. The study found that the proportion of reported interactions that are valid (and hence reproducible) is discouragingly low. My personal observations about the quality of curated databases are even more alarming than what the article reports, so I am not surprised by these findings. Biological curation is now a well-established practice, supported by curation, annotation and reporting guidelines such as MIRIAM (Minimum Information Requested In the Annotation of biochemical Models) and MIMIx (the Minimum Information required for reporting a Molecular Interaction eXperiment). Scientists have also assumed that literature-curated PPI data are more reliable than high-throughput datasets, but nobody had actually tested that assumption, and the current findings show that it is not supported. Another major issue raised by the authors is the surprisingly low overlap between different curated datasets for the same organism. For yeast PPIs in particular (MINT, IntAct and DIP), the overlap is so small after years of intensive curation that it is a cause for concern, and it suggests that coverage of the curated literature is far from comprehensive.
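To make the overlap concern concrete, here is a minimal sketch, assuming each curated dataset can be reduced to a list of interacting protein pairs. The helper names, the choice of the Jaccard index as the overlap measure and the placeholder protein IDs are all hypothetical; this is not the method used by Cusick et al., and the toy data are not real MINT, IntAct or DIP records.

```python
from itertools import combinations


def as_edge_set(interactions):
    """Normalize (protein_a, protein_b) pairs into undirected edges.

    (A, B) and (B, A) describe the same interaction, so each pair is stored
    as a frozenset; duplicate records collapse automatically in the set.
    """
    return {frozenset(pair) for pair in interactions}


def jaccard(a, b):
    """Shared interactions divided by all distinct interactions in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(datasets):
    """Return {(name1, name2): (shared_count, jaccard_index)} for every dataset pair."""
    edge_sets = {name: as_edge_set(pairs) for name, pairs in datasets.items()}
    report = {}
    for (n1, e1), (n2, e2) in combinations(edge_sets.items(), 2):
        report[(n1, n2)] = (len(e1 & e2), jaccard(e1, e2))
    return report


# Hypothetical toy data: placeholder protein IDs, NOT real database records.
datasets = {
    "curated_A": [("P1", "P2"), ("P2", "P3"), ("P4", "P5")],
    "curated_B": [("P2", "P1"), ("P3", "P6")],
    "curated_C": [("P4", "P5"), ("P7", "P8"), ("P1", "P2")],
}

for (n1, n2), (shared, j) in pairwise_overlap(datasets).items():
    print(f"{n1} vs {n2}: {shared} shared, Jaccard = {j:.2f}")
# curated_A vs curated_B: 1 shared, Jaccard = 0.25
# curated_A vs curated_C: 2 shared, Jaccard = 0.50
# curated_B vs curated_C: 1 shared, Jaccard = 0.25
```

Treating each interaction as an unordered pair matters: one database may record (A, B) where another records (B, A), and comparing ordered tuples would undercount the overlap. Real databases also use different protein identifier schemes, so identifiers would have to be mapped to a common namespace before any such comparison; this sketch ignores that step.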
The databases investigated in this study are mostly publicly funded projects, so the study opens a Pandora's box. Indeed, some curators of these databases are already claiming that the study's findings are flawed. Be that as it may, the later section of the study, which estimates curation reliability by recuration, provides a reality check on how these errors could have been avoided. I suspect that so-called curation project leaders are more concerned with the number of entries than with quality, which is understandable given the rat race for funding and publications. The proclaimed Minimum Information (MI) standards and reporting guidelines can greatly improve curation by reducing errors, but who will make sure that these standards are themselves flawless?

Reference:

Michael E Cusick, Haiyuan Yu, Alex Smolyar, Kavitha Venkatesan, Anne-Ruxandra Carvunis, Nicolas Simonis, Jean-François Rual, Heather Borick, Pascal Braun, Matija Dreze, Jean Vandenhaute, Mary Galli, Junshi Yazaki, David E Hill, Joseph R Ecker, Frederick P Roth, Marc Vidal (2009). Literature-curated protein interaction datasets. Nature Methods, 6(1), 39-46. DOI: 10.1038/nmeth.1284

Responses
  1. anilbioma
    January 22, 2009

    I think this is a very common problem with community curation projects; curation products from companies have similar issues too. Developing more quality-controlled curation workflows with standard guidelines is the only solution.

  2. Adar
    July 23, 2009

    It is a highly worrying article. We need to seriously rethink how we handle data.
