Protein “Dark Matter” and the Annotation of Complete Proteomes
Within the next few years, as sequencing technologies become faster and cheaper, tens of thousands of newly sequenced genomes will become available (see, for example, [1, 2]). In contrast, experimental characterisation of the functional elements of genomes, which is expected to lead to breakthroughs in understanding molecular and cellular processes [3], remains in most cases an expensive and painstakingly slow process. In this scenario, computational biology offers a scalable way to make sense of the vast amount of sequence data being deposited daily in the public domain. While computational predictions cannot substitute for experiments, they can generate hypotheses to guide targeted experimental validation. In this seminar, I will illustrate ongoing work that aims to shed some light on the “Dark Matter” of the protein sequence universe [4]. Additionally, I will discuss ideas to improve the annotation of pathways, protein complexes and organelles in fully sequenced organisms.
References:
1. Genome 10K: a proposal to obtain whole-genome sequence for 10,000 vertebrate species. J Hered 2009, 100(6):659-674.
2. Rinke C, Schwientek P, Sczyrba A, Ivanova NN, Anderson IJ, Cheng JF, Darling A, Malfatti S, Swan BK, Gies EA et al: Insights into the phylogeny and coding potential of microbial dark matter. Nature 2013, 499(7459):431-437.
3. Reeves GA, Talavera D, Thornton JM: Genome and proteome annotation: organization, interpretation and integration. J R Soc Interface 2009, 6(31):129-147.
4. Levitt M: Nature of the protein universe. Proc Natl Acad Sci U S A 2009, 106(27):11079-11084.