Biomedical Community-Wide Annotation Using Wikipedia

The pace of data generation is outstripping our ability to convert that data into usable knowledge. Even well-funded biomedical databases find it increasingly difficult to keep up. To tackle this problem, some databases have opted to automate more of the way data is deposited, reducing the time needed to interpret results. The problem with this approach is that the resulting knowledge is less accurate, and of lower quality, than manually annotated entries. Another potential solution has been to engage leading experts, creating a sort of consortium in which they give some of their time to curate data entries that match their specialties. Unfortunately, engaging world experts in curating biomedical resources has not had much success: a few contribute a lot, and many hardly ever dedicate any time to curation, no matter how hard they are chased.

A revolutionary new idea has come from Alex Bateman's group: engage not just the community of experts but the whole of the Internet, using Wikipedia. One of his group’s databases, Rfam, which characterises RNA families, now provides all of its annotation via Wikipedia. Wikipedia is already the leading reference resource for all kinds of information. It has the know-how and the capability to mediate the curation of database entries, and it has been resoundingly successful at gathering knowledge of reasonably high quality.

After a persuasive discussion with Alex, I decided to give it a try myself and add my very first entry to Wikipedia. I thought this could potentially help the database I develop to outsource the annotation of its public, non-sensitive data.

I copied, edited and formatted parts of a non-sensitive entry (a Syndrome description) to Wikipedia. I learnt, contrary to what I expected, that as long as one has an account and no entry exists on the topic, a page can be added on the fly. So I added a page and started editing, copying and pasting.

It took me a while to get used to some of the conventions and formatting tags used by Wikipedia, but very early on I had help from Wikipedia ‘agents’. It really surprised me how quickly these agents picked up my entry and let me know the criteria a Wikipedia entry must meet to achieve a high standard.

I learnt about important concepts in the Wikipedia context, such as Notability and Conflict of Interest. Apparently one cannot write about oneself, for example, and personal opinions or articles are not accepted. So far this was OK for me, but problems came when one of these agents pointed out some copyright issues: I was trying to copy an entry from a website/database.

Blatantly copying public content from another website is considered a copyright violation unless a suitable license is in place and one ‘owns’ the data. In our case, the Creative Commons License we hold was not OK: although it permits public use of the information, it does not allow alteration. This means that people would not be able to edit my Wikipedia entry.
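To make the licensing rule concrete, here is a minimal sketch in Python. It assumes the wiki's text is released under a CC BY-SA style license, so any text copied in must allow both modification and commercial reuse. The function name `wiki_compatible` and the license-code strings are my own illustration and are not part of any Wikipedia tool.

```python
# Hypothetical helper: could text under a given license be copied into a
# wiki whose own text is licensed CC BY-SA? (Illustrative assumption only.)
# Creative Commons element abbreviations:
#   by = attribution, sa = share-alike, nc = non-commercial, nd = no derivatives

def wiki_compatible(license_code: str) -> bool:
    """Return True if the license permits both editing and commercial reuse."""
    code = license_code.lower().strip()
    # Public-domain material carries no restrictions on reuse.
    if code in {"public-domain", "cc0"}:
        return True
    parts = code.split("-")
    if parts and parts[0] == "cc":
        parts = parts[1:]
    elements = set(parts)
    # Must be a recognisable CC attribution license with no ND or NC element.
    return "by" in elements and not ({"nd", "nc"} & elements)


for code in ["CC-BY-SA", "CC-BY-ND", "CC-BY-NC-SA", "public-domain"]:
    print(code, "->", "usable" if wiki_compatible(code) else "not usable")
```

A no-derivatives license fails the check for the same reason ours did: the text may be read and shared, but nobody else would be allowed to edit it.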

I must admit I felt intimidated at this point. Despite that, I was extremely impressed with how effectively the agents acted and how quickly they responded to my queries. I can understand why they have to be so tough: it is how they prevent abuse.

Overall I feel quite satisfied with what I have learnt in the process, and I am extremely eager to keep exploring the use of Wikipedia for database curation. Of course this is just an experiment, and the solution we eventually adopt for keeping up with annotation may be something different. However, it is worth a try.

9 replies »

    • Yes, this is a project on the back burner at the moment, but it is something that could happen in the next year or so for all public data that we have for Syndrome entries.


  1. There are also other wikis collecting content too specialized for Wikipedia.

    You might want to look into the Proteopedia project, for example.


    • Hi Kevin

      Fortunately, it seems there is a high chance Wikipedia will accept our Syndrome entries. Of course some are already there (DiGeorge Syndrome), but others, like 17q21.3 Recurrent Microdeletion Syndrome, are not.

      As medical practice embraces more and more sequencing technologies, we expect the rate of discovery of new Genomic Disorders to accelerate dramatically.

      Although the ones we deal with are classified as “Rare Genetic Disorders” (their frequency is <1/300), when considering the whole of the world's population some may affect more than a million people.

      Moreover, there are lots of people who are either carriers or do not show symptoms. So they may not be so "rare" after all.


    • For the reasons above, and according to the Wikipedia agents, our entries would comply with Wikipedia’s Notability, Neutrality and Conflict of Interest regulations.


    • Thanks for your comment. That is precisely what I have just done: we have made it public domain.

