The Institute of Immunology and Experimental Therapy of the Polish Academy of Sciences has been investing in IT infrastructure, including a computing center, for several years. The culmination of these efforts was the establishment in 2019 of the Genomics and Bioinformatics Laboratory, whose task is to develop bioinformatic analysis solutions based on artificial intelligence (AI) and machine learning (ML) algorithms. The Institute has equipment for omics analyses (genetic material sequencers, electron microscopes, NMR spectrometers, mass spectrometers), which is used to obtain the data needed for multivariate analyses.

The Institute is currently planning a project that uses artificial intelligence algorithms to fully characterize pathogens through the analysis of omics data. High-throughput sequencers make it possible to obtain complete genome and transcriptome maps. MALDI-TOF mass spectrometry provides proteomic data, while cryo-electron microscopy and NMR enable the analysis of protein and/or glycan structures.

High-throughput sequencing yields the complete genome sequence of a given pathogen. Classification algorithms can then determine the degree of similarity between the genetic sequences of the tested microorganism and reference gene sequences. Such an analysis characterizes the pathogen in terms of the genes it carries, for example genes involved in the synthesis of various metabolites. In particular, it shows at the molecular level whether the pathogen carries genes that confer antibiotic resistance or encode toxins. It should be emphasized that clustering algorithms also make it possible to identify sequences with previously unknown properties.
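As a rough illustration of the comparison and clustering steps, the sketch below uses an alignment-free stand-in: cosine similarity between k-mer count profiles, rather than the alignment-based search tools (such as BLAST) a production pipeline would more likely rely on. A query is assigned to its most similar reference only above a similarity threshold; sequences with no close reference are then grouped by single-linkage clustering. The function names, the k-mer length k = 4, the 0.8 threshold and the toy sequences are all hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def kmer_profile(seq: str, k: int = 4) -> Counter:
    """Count overlapping k-mers in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two k-mer count vectors (0 = no shared k-mers)."""
    dot = sum(a[kmer] * b[kmer] for kmer in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(query: str, references: dict[str, str], k: int = 4, min_similarity: float = 0.8):
    """Return (best-matching reference name, or None if below threshold; best similarity)."""
    profile = kmer_profile(query, k)
    best_name, best_sim = None, 0.0
    for name, ref_seq in references.items():
        sim = cosine_similarity(profile, kmer_profile(ref_seq, k))
        if sim > best_sim:
            best_name, best_sim = name, sim
    return (best_name if best_sim >= min_similarity else None), best_sim


def cluster_unassigned(seqs: dict[str, str], k: int = 4, threshold: float = 0.8):
    """Single-linkage clustering: link any pair with similarity >= threshold (union-find)."""
    profiles = {name: kmer_profile(s, k) for name, s in seqs.items()}
    parent = {name: name for name in seqs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if cosine_similarity(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy sequences (hypothetical, not real genes).
REFERENCES = {
    "ref_A": "ATGGCTAGCTAGGCTTACGATCGATCGGATCCTAGGCTAACGT",
    "ref_B": "TTTTGGGGCCCCAAAATTTTGGGGCCCCAAAATTTTGGGGCCCC",
}
unknown = "GCGCGCGCATATATATGCGCGCGCATATATAT"

print(classify(REFERENCES["ref_A"], REFERENCES))  # best match is ref_A
print(classify(unknown, REFERENCES))              # (None, 0.0): no close reference
clusters = cluster_unassigned(
    {"read_1": unknown, "read_2": unknown[:-1], "read_3": REFERENCES["ref_B"]}
)
print(clusters)  # [['read_1', 'read_2'], ['read_3']]
```

The similarity threshold is the main design lever here: raising it makes reference assignment stricter and pushes more sequences into the unassigned pool, where clustering may reveal groups without a known counterpart.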

NMR spectroscopy provides fundamental data on the structure of biological macromolecules at the atomic level. These data are quantitative, are obtained in a repeatable and non-destructive manner, and, as large data sets, carry rich information on the structural complexity of the analyzed molecules. NMR spectra are unique profiles (“fingerprints”) that allow bacterial strains to be compared, differentiated and classified based on differences in their glycan structures. Data from one- and multidimensional NMR experiments therefore lend themselves to mathematical methods of pattern recognition and multivariate analysis, such as principal component analysis (PCA). These methods largely overlap with techniques currently being developed using neural networks. Extending them with AI-based analyses (machine learning, deep learning) of NMR data creates a starting point for developing methods and automated procedures for verifying and predicting structures by correlation with real and simulated NMR data sets. For bacterial glycans, AI techniques can be used for (1) classification (chemotyping) and automatic recognition of glycan structural elements and (2) reconstruction of NMR spectra from a limited amount of data (e.g. acquired with non-uniform sampling, NUS), which increases measurement efficiency and optimizes the use of NMR spectrometer time.
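A minimal sketch of PCA-based fingerprinting, assuming each 1D spectrum has already been aligned, normalized and binned onto a common chemical-shift axis. The spectra below are synthetic (unit-height Lorentzian lines plus Gaussian noise), and the two “chemotypes” and their peak positions are hypothetical. PCA is computed via SVD of the mean-centred data matrix, and a new spectrum is assigned to the nearest class centroid in principal-component space; nearest-centroid is a simple stand-in for whatever classifier a real chemotyping workflow would use.

```python
import numpy as np


def simulate_spectrum(peaks_ppm, ppm_axis, rng, width=0.02, noise=0.01):
    """Toy 1D spectrum: unit-height Lorentzian lines plus Gaussian noise (synthetic data)."""
    spectrum = np.zeros_like(ppm_axis)
    for center in peaks_ppm:
        spectrum += width**2 / ((ppm_axis - center) ** 2 + width**2)
    return spectrum + rng.normal(0.0, noise, ppm_axis.size)


def fit_pca(X, n_components=2):
    """PCA via SVD of the mean-centred data matrix (rows = spectra, columns = ppm bins)."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # fraction of total variance per component
    return mean, vt[:n_components], explained[:n_components]


def project(X, mean, components):
    """Scores of spectra X on the principal axes."""
    return (X - mean) @ components.T


def nearest_centroid(train_scores, train_labels, query_scores):
    """Assign each query to the class whose mean score vector is closest."""
    labels = np.asarray(train_labels)
    classes = sorted(set(train_labels))
    centroids = np.array([train_scores[labels == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(query_scores[:, None, :] - centroids[None, :, :], axis=2)
    return [classes[i] for i in dists.argmin(axis=1)]


# Hypothetical anomeric-region peak positions (ppm) for two chemotypes.
CHEMOTYPES = {"A": [5.38, 5.02, 4.55], "B": [5.21, 4.95, 4.62]}

rng = np.random.default_rng(0)
ppm = np.linspace(4.0, 6.0, 1000)
X = np.vstack([simulate_spectrum(p, ppm, rng) for p in CHEMOTYPES.values() for _ in range(10)])
y = [label for label in CHEMOTYPES for _ in range(10)]

mean, components, explained = fit_pca(X, n_components=2)
scores = project(X, mean, components)

query = simulate_spectrum(CHEMOTYPES["B"], ppm, rng)[None, :]
prediction = nearest_centroid(scores, y, project(query, mean, components))
print(prediction, explained.round(3))
```

Because the toy chemotypes differ in peak positions far more than the noise level, the first component alone captures almost all of the variance; real spectra would show subtler, overlapping differences, which is where the machine-learning extensions described above become relevant.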

By using artificial intelligence algorithms, we can integrate the analytical areas described above into a general molecular model of pathogens.