A Warning Sign for Biomedical Databases

Users of the highly popular OMIM database (Online Mendelian Inheritance in Man) [1] may have noticed that NCBI [2] is not providing further funds to sustain OMIM’s development. One reason for halting the funding may be that curation work is not deemed worthy of funds. Funding agencies may thus have started a trend of declining to dedicate funds to the curation of database entries.

The flip side of this is the nascent trend to outsource database annotation to the general public. Databases like Rfam [3] and Pfam [4], two popular RNA and protein family databases, have adopted the strategy of outsourcing their annotation to Wikipedia. Realizing that it is impossible to keep up with the literature, the Rfam team seeded Wikipedia with database-specific information. They then developed a system that periodically collects the Wikipedia text from the created entries and uses it to repopulate the corresponding RNA entries in the database. The price they had to pay was losing control over what gets entered into the Wikipedia entry. However, the benefits seem to outweigh this loss of control: ready access to an army of casual annotators and dramatically increased exposure of the database itself (Wikipedia consistently ranks at the top of Google results for most RNA family searches). This increases their chances of having up-to-date content and raises awareness of the resource, both of which help justify future cycles of funding.
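The periodic collection step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Rfam's actual pipeline: `FamilyEntry`, `sync_annotations`, and `fetch_text` are hypothetical names, and the function that retrieves article text is passed in as an argument so that no particular Wikipedia interface is assumed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class FamilyEntry:
    """One database record linked to a Wikipedia article (hypothetical schema)."""
    accession: str        # database identifier for the family
    wiki_title: str       # title of the linked Wikipedia article
    annotation: str = ""  # free-text annotation mirrored from Wikipedia


def sync_annotations(
    entries: Dict[str, FamilyEntry],
    fetch_text: Callable[[str], Optional[str]],
) -> List[str]:
    """Refresh each entry from its article and return the accessions that changed.

    `fetch_text` is an assumed callable that maps an article title to its
    current text, or to None when the article cannot be retrieved.
    """
    changed = []
    for accession, entry in entries.items():
        text = fetch_text(entry.wiki_title)
        # Keep the existing annotation if the article is missing or empty.
        if not text:
            continue
        if text != entry.annotation:
            entry.annotation = text
            changed.append(accession)
    return changed
```

Run on a schedule, a function like this would return the list of entries whose text changed since the last pass, which is also a natural place to put a human review step before the new text goes live, given the loss of control mentioned above.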

Something that started as an experiment in Rfam seems to be spreading to other databases as they begin to assess how to address their annotation bottleneck. Outsourcing the annotation of biomedical databases to Wikipedia seems a solution worth considering as curation practices continue to evolve to cope with current funding shortages. A general lack of research funding, combined with the establishment of community wiki-style annotation practices, may make funding agencies ever more reluctant to pay for database curation. Perhaps this is the time for those of us who care about biological databases and their contents to start rethinking our plans. Is the time now ripe for embracing Wikipedia to the full?






Categories: Bioinformatics, Databases

9 replies »

  1. Another emblematic biomedical database whose funding is being affected.

    I just learnt that KEGG will be requesting payment from academics to download data from its FTP site. KEGG’s PI, Minoru Kanehisa, says that his current grant is not “sufficient to continue to hire [his] talented crew of KEGG curators and software developers.”

    Here is his plea document for support:


  2. Every time I look at this problem I wish it could be solved technologically e.g. by linked data/semantic web. I blame the publishing industry for not doing better at making scientific publications machine readable rather than the funding agencies for stepping away from funding of long-term manual curation exercises.


  3. I’m new to this subject and might be stepping out of line.
    But @Kevin, what stops the curation experts from continuing to peer review the database? They would need to add less data themselves and do a lot more checking of new data coming in.
    @John I would expect the same pharmaceutical companies that use the data to be interested in keeping it accurate. The market need would then mean that pharmaceutical companies (or whatever company, research institute or university needs the data) would hire curators to peer review on their behalf and to check for fraud by other companies. That said, I do not think that falsifying data benefits anybody, and I think companies know this. Plus, since it’s Wikipedia, you can subscribe to the areas in which you are an expert.

    If it works for other lines of research I don’t see why not for this. I find that this kind of information is better out in the open.


    • @Andres
      Although things are changing, pharma don’t necessarily have an interest in providing useful information to the wider community; generally they will want to keep useful annotations in-house. In the UK, which I know best, universities will never fund curators out of their core funding, and it is difficult if not impossible to fund them from grant funding. Institutes may fund curators for data in their immediate area of interest but this is unlikely to cover the entire area.


  4. One big risk for biomedical databases is deliberate but subtle fraud. There are huge amounts of money flowing around drug development, and scientific fraud is a much bigger problem in biomedical research than in other fields. Opening up annotation completely, without peer review or curation by experts, leaves room for a lot of fraud.

    Databases a little further from the money pots are probably a bit safer from fraud. Still, there are many databases out there, and little incentive for people to add information to them. Without funding for curation, many databases will fade in value rapidly.

    Funding agencies have always preferred creating new knowledge (and throwing it away) to maintaining databases. Manual curation is the most expensive part of most databases, and the price tag is too high for review panels that value hypothesis-driven wet-lab work above all.

    Of course, there are a lot of databases out there that are not worth the expense of maintaining, since they are dominated by a closely related database. I don’t know enough about OMIM to know whether it falls into this class.


  5. Although OMIM has its weaknesses, its major strength has always been that the information in it is curated by experts. There are core databases in any field that researchers rely on to provide definitive information, and when these are hand curated, researchers have more confidence in the content. I agree that the wiki-based models have the potential to provide similar levels of confidence (although many people are suspicious of wikimedia generally, often unfairly), but the trick is in motivating the community to carry out the curation. I’m thinking, for example, of an organism database like the Mouse Genome Database, which employs dozens of curators and, in principle, curates every gene in the mouse genome. How does one motivate the entire population of researchers who use the mouse, many of whom have only a passing interest in the organism as a whole and may only be using the mouse as a way of studying a human gene, to enter useful information into such a database?


    • The argument that I put forward to encourage people to edit Wikipedia entries is the following. We, scientists and researchers alike, have the responsibility to make sure that whatever information is in Wikipedia is accurate.

      This is not just a question of being ‘nice’. If we are to spend taxpayers’ money on our undertakings, then unless we feed back an accurate account of our findings in an accessible manner, we are arguably wasting that money. Wikipedia, as the ‘de facto’ top result of Google searches, is inevitably the first place people will look. Ultimately researchers will have to realise this, or their funding will not be renewed.

