High Throughput Unsupervised Genetic Sequence Analysis

Daniel Boley - University of Minnesota
Date and time
Wednesday, May 14, 2014 at 16:30 - 16:15 refreshments; 16:30 seminar starts
Ca' Vignal 2, 1st floor, Room L
Alessandro Farinelli
External contact
Publication date
May 2, 2014


The rapid growth of genome sequence data in recent years has offered a new dimension in big data visualization and interpretation. An application paradigm is presented here to visualize the evolution of the influenza virus using an unsupervised machine learning approach to non-numeric genetic sequence data based on Principal Component Analysis. Two influenza virus cases are presented in this talk: (1) human A/H3N2 vs. avian H5 evolutionary history and (2) North American swine influenza virus since the swine H1N1 pandemic of 2009. The results in the first case suggest a hypothesis that vaccination could be one of the driving forces in the evolution of the human A/H3N2 influenza virus. The evolution in the second case shows a strong correlation between the diversification of the North American swine influenza virus and the mutations at two specific sites in the hemagglutinin protein. By using unsupervised methods, we minimize the need to make assumptions about the relationships among the viruses.
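The abstract does not say how the non-numeric sequences are turned into numbers before Principal Component Analysis is applied. As a rough illustration only, the sketch below assumes one common choice: each site of an aligned nucleotide sequence is one-hot encoded, the resulting vectors are centered, and the leading principal directions are found by power iteration with deflation (chosen here only to keep the example dependency-free; a library SVD would normally be used). The function names, the `ACGT` alphabet, and the four toy sequences are all hypothetical and are not taken from the talk.

```python
from math import sqrt

ALPHABET = "ACGT"  # assumed nucleotide alphabet; any other symbol encodes as all zeros


def one_hot(seq):
    """Encode an aligned sequence as a flat 0/1 vector, four entries per site."""
    return [1.0 if base == letter else 0.0
            for base in seq.upper() for letter in ALPHABET]


def dot(u, v):
    """Plain inner product of two equal-length sequences of numbers."""
    return sum(a * b for a, b in zip(u, v))


def principal_directions(rows, k, iters=200):
    """Top-k eigenvectors of X^T X for centered rows X, by power iteration."""
    cols = list(zip(*rows))
    found = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(len(cols))]  # fixed, deterministic start
        for _ in range(iters):
            # Deflation: strip the directions that were already extracted.
            for c in found:
                overlap = dot(v, c)
                v = [vi - overlap * ci for vi, ci in zip(v, c)]
            # One multiplication by X^T X, computed as X^T (X v).
            scores = [dot(row, v) for row in rows]
            w = [dot(scores, col) for col in cols]
            norm = sqrt(dot(w, w))
            if norm < 1e-12:  # no variance left to explain; stop early
                break
            v = [wi / norm for wi in w]
        found.append(v)
    return found


def project(sequences, k=2):
    """Map {name: aligned sequence} to {name: [PC1 score, ..., PCk score]}."""
    names = list(sequences)
    if len({len(sequences[n]) for n in names}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    rows = [one_hot(sequences[n]) for n in names]
    means = [sum(col) / len(rows) for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    directions = principal_directions(centered, k)
    return {n: [dot(row, d) for d in directions]
            for n, row in zip(names, centered)}


# Toy data (invented): two groups that differ at three sites,
# plus one within-group difference at the last site.
toy = {"a1": "ACGTACGT", "a2": "ACGTACGA", "b1": "TTGTACCT", "b2": "TTGTACCA"}
for name, coords in project(toy).items():
    print(name, [round(c, 3) for c in coords])
```

On this toy input the first coordinate separates the `a` sequences from the `b` sequences, and the second coordinate separates sequences by their last site; the sign of each axis is arbitrary. Plotting the two coordinates gives the kind of unsupervised two-dimensional view of sequence relationships that the abstract describes, without any labels being supplied in advance.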

