Novel software tools for very fast protein sequence searching and for the discovery of regulatory motifs in nucleotide sequences
The talk will cover results from three as yet unpublished projects. It will start with an introduction to protein sequence searching and to our software package HH-suite for very sensitive remote homology detection. I will show how we can significantly boost the sensitivity of searches in HH-suite using a homology-enriched profile-HMM database.
Second, I will present our new software package MMseqs (Many-against-Many sequence searching) for very fast batch protein sequence searches and clustering of huge protein sequence data sets, such as UniProt or sets of predicted open reading frames from large metagenomics experiments. MMseqs achieves a sensitivity comparable to that of protein BLAST (blastpgp) at 400x its speed.
Third, I will present a method for the discovery of regulatory motifs in nucleotide sequences that goes beyond the standard position weight matrix model by learning the dependence of binding affinities on neighboring nucleotides. Importantly, our method automatically learns the dependencies up to the optimum order (i.e., k-mer length) supported by the data, without risk of overtraining.
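To illustrate what "going beyond the position weight matrix" can mean, the following minimal sketch contrasts an order-0 model (a plain position weight matrix, each position independent) with a fixed first-order model in which each base is conditioned on its left neighbor. This is not the method presented in the talk: the fixed order, the pseudocount smoothing, the toy binding sites, and all function names are assumptions chosen for illustration only, and the sketch does not attempt to learn the optimal order from the data.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def pwm_log_probs(sites, pseudo=1.0):
    """Order-0 model: an independent base distribution per position."""
    total = len(sites) + pseudo * len(ALPHABET)
    model = []
    for i in range(len(sites[0])):
        counts = Counter(s[i] for s in sites)
        model.append({b: math.log((counts[b] + pseudo) / total) for b in ALPHABET})
    return model

def first_order_log_probs(sites, pseudo=1.0):
    """Order-1 model: log P(base at i | base at i-1) for positions i >= 1."""
    model = []
    for i in range(1, len(sites[0])):
        pair_counts = Counter((s[i - 1], s[i]) for s in sites)
        ctx_counts = Counter(s[i - 1] for s in sites)
        model.append({
            (a, b): math.log(
                (pair_counts[(a, b)] + pseudo)
                / (ctx_counts[a] + pseudo * len(ALPHABET))
            )
            for a in ALPHABET
            for b in ALPHABET
        })
    return model

def score_pwm(seq, pwm):
    """Log-probability of seq under the order-0 model."""
    return sum(pwm[i][b] for i, b in enumerate(seq))

def score_first_order(seq, pwm, cond):
    """Log-probability of seq: order-0 for the first base, order-1 afterwards."""
    score = pwm[0][seq[0]]
    for i in range(1, len(seq)):
        score += cond[i - 1][(seq[i - 1], seq[i])]
    return score

# Hypothetical toy sites in which positions 0 and 1 co-vary (AC or TG).
sites = ["ACGT", "ACGT", "TGGT", "TGGT"]
pwm = pwm_log_probs(sites)
cond = first_order_log_probs(sites)
```

With these toy sites, the order-0 model cannot distinguish the observed site `ACGT` from the unobserved hybrid `AGGT`, because the per-position base frequencies are identical; the first-order model scores `ACGT` higher because it captures the dependence between the first two positions.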