Learning from biological sequences: promises and challenges in functional and evolutionary genomics
I will give an overview of my recent, ongoing, and future work on machine learning for biological sequences. This will cover in particular:
- Microbial GWAS: tools to identify the genetic determinants of phenotypic traits such as antimicrobial resistance. Because accessory genes play an important role in microbes, the usual focus on SNPs is ill-suited. I am developing methods based on k-mers, i.e., on the presence of short sequences of length k in the genomes.
- Prediction from biological sequences: neural networks that take a biological sequence as input and predict a property of that sequence. This applies, e.g., to regulatory genomics and fold prediction. I am working on regularization and on statistical inference for the features extracted by these networks.
- Machine learning for evolutionary genomics: neural networks that infer the parameters of sequence evolution models. For some complex models, maximizing the likelihood is too hard, but sampling from the model is easy. I use large amounts of simulated data to learn a function that inverts the model, going, e.g., from a gene family to a phylogeny.
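The k-mer representation mentioned for microbial GWAS can be illustrated with a small Python sketch: each genome is encoded by which k-mers it contains, giving a presence/absence matrix that can then be related to a phenotype. The function names (`kmers`, `presence_matrix`, `frequency_difference`), the toy genomes and the phenotype labels are all hypothetical, and the frequency-difference score is a deliberately naive stand-in, not the method presented in the talk. For simplicity the sketch also ignores reverse complements.

```python
from itertools import chain


def kmers(seq: str, k: int) -> set[str]:
    """Distinct k-mers (substrings of length k) occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def presence_matrix(genomes: dict[str, str], k: int):
    """Return (vocabulary, rows): the sorted k-mers seen in any genome and,
    for each genome, a 0/1 vector marking which of them it contains."""
    per_genome = {name: kmers(seq, k) for name, seq in genomes.items()}
    vocabulary = sorted(set(chain.from_iterable(per_genome.values())))
    rows = {name: [int(km in present) for km in vocabulary]
            for name, present in per_genome.items()}
    return vocabulary, rows


def frequency_difference(vocabulary, rows, phenotype: dict[str, int]):
    """Naive per-k-mer score: presence frequency among phenotype-1 genomes
    minus presence frequency among phenotype-0 genomes.
    Hypothetical illustration only: no statistical test, no correction
    for population structure."""
    cases = [rows[g] for g, y in phenotype.items() if y == 1]
    controls = [rows[g] for g, y in phenotype.items() if y == 0]
    scores = {}
    for j, km in enumerate(vocabulary):
        freq_cases = sum(r[j] for r in cases) / len(cases)
        freq_controls = sum(r[j] for r in controls) / len(controls)
        scores[km] = freq_cases - freq_controls
    return scores


# Toy, made-up data: three short "genomes" and a hypothetical binary
# phenotype (1 = resistant, 0 = susceptible).
genomes = {"g1": "ACGTAC", "g2": "ACGTTT", "g3": "TTTACG"}
phenotype = {"g1": 1, "g2": 1, "g3": 0}

vocab, rows = presence_matrix(genomes, k=3)
scores = frequency_difference(vocab, rows, phenotype)
print(vocab)                         # ['ACG', 'CGT', 'GTA', 'GTT', 'TAC', 'TTA', 'TTT']
print(rows["g1"])                    # [1, 1, 1, 0, 1, 0, 0]
print(scores["CGT"], scores["TTA"])  # 1.0 -1.0
```

Because k-mer features are defined without a reference genome, they can reflect the presence or absence of whole genes as well as point variants. A real analysis would replace the naive score with a proper association test that accounts for population structure.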