Novel software tools for very fast protein sequence searching and for the discovery of regulatory motifs in nucleotide sequences

Date: 
Wednesday, May 27, 2015 - 16:00
Speaker: 
Johannes Söding
Address: 
Salle des Thèses, Campus des Cordeliers 15, rue de l'école de médecine 75006 Paris
Affiliation: 
Computational Genome and Protein Biology, Gene Center of the University of Munich (LMU), Munich (Germany)
Abstract: 

The talk will cover results from three as yet unpublished projects. It will start with an introduction to protein sequence searching and to our software package HH-suite for very sensitive remote homology detection. I will show how we can significantly boost the sensitivity of searches in HH-suite using a homology-enriched profile-HMM database.

Second, I will present our new software package MMseqs (Many-against-Many sequence searching) for very fast batch protein sequence searches and clustering of huge protein sequence data sets, such as UniProt or sets of predicted open reading frames from large metagenomics experiments. MMseqs achieves sensitivity comparable to that of protein BLAST (blastpgp) at 400x its speed.

Third, I will present a method for the discovery of regulatory motifs in nucleotide sequences that goes beyond the standard position weight matrix model by learning the dependence of binding affinities on neighboring nucleotides. Importantly, our method automatically learns the dependencies up to the optimum order (i.e. k-mer length) supported by the data, without risk of overtraining.
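
As a rough illustration of why dependencies on neighboring nucleotides can matter, the sketch below compares a standard position weight matrix (each position modeled independently) with a fixed first-order model (each position conditioned on the preceding base). This is not the method from the talk: the fixed order of 1, the pseudocount smoothing, the function names, and the four toy binding sites are all assumptions made for the example, and the automatic selection of the optimum order is not reproduced here.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def train_pwm(sites, pseudocount=1.0):
    """Order-0 model: an independent base distribution for each position."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = Counter(site[pos] for site in sites)
        total = len(sites) + pseudocount * len(ALPHABET)
        pwm.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return pwm

def train_first_order(sites, pseudocount=1.0):
    """Order-1 model: P(base at pos | base at pos - 1) for pos >= 1."""
    model = []
    for pos in range(1, len(sites[0])):
        pairs = Counter((site[pos - 1], site[pos]) for site in sites)
        cond = {}
        for prev in ALPHABET:
            total = sum(pairs[(prev, b)] for b in ALPHABET)
            total += pseudocount * len(ALPHABET)
            for b in ALPHABET:
                cond[(prev, b)] = (pairs[(prev, b)] + pseudocount) / total
        model.append(cond)
    return model

def log_score_pwm(pwm, seq):
    """Log-probability of seq under the position-independent model."""
    return sum(math.log(pwm[i][b]) for i, b in enumerate(seq))

def log_score_first_order(pwm, model, seq):
    """Log-probability of seq: order-0 term for the first base,
    then conditional terms for every following base."""
    score = math.log(pwm[0][seq[0]])
    for i in range(1, len(seq)):
        score += math.log(model[i - 1][(seq[i - 1], seq[i])])
    return score

# Hypothetical toy sites: the last two bases are always "CG" or "TA",
# never "CA" or "TG", so positions 3 and 4 are correlated.
sites = ["TACG", "TACG", "TATA", "TATA"]
pwm = train_pwm(sites)
first_order = train_first_order(sites)

for seq in ("TACG", "TACA"):
    print(seq,
          round(log_score_pwm(pwm, seq), 3),
          round(log_score_first_order(pwm, first_order, seq), 3))
```

In this toy setting the position weight matrix cannot distinguish the observed site TACG from the never-observed TACA, because each position is scored on its own, whereas the first-order model scores TACG higher. Higher orders capture longer dependencies but need more data per parameter, which is the overtraining risk the abstract refers to.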

Type: 
Interdisciplinary Seminar
