Bioinformatics Facility

Group Leader

Ivan Arisi

Zainab Nazari

The Bioinformatics Facility provides wide-ranging support for research, modelling and data analysis, and collaborates with research groups at EBRI and with laboratories and health institutions in Italy and abroad. The facility analyses experimental data from microarray and Next Generation Sequencing (NGS) technologies, such as RNA-Seq, smallRNA-Seq, ChIP-Seq and DNA mutation mapping, across several research fields: neurodegeneration, neuropathic pain, neurotrophins and cancer.

The facility currently works in the following areas:

  1. Analysis of -omics data

We provide up-to-date analysis protocols for large NGS datasets, mainly based on open-source platforms such as R/Bioconductor and tailored to specific research needs, covering both first-level analysis and the subsequent biological interpretation of the data.

  2. Systems Biology and modelling

The Systems Biology approach is crucial to understanding cells and tissues. Using this strategy, the facility interprets data within a framework of biological interactions, where the collective function of genes and proteins is more meaningful than that of single entities, which makes it possible to propose predictive models for diseases or processes of interest. We combine transcriptomic profiles with neurological, physiological or psychological phenotypes, for example from clinical data records, using multivariate statistical approaches and Machine Learning methods.

The network models are connected to the biology of systems: the dynamics of protein interaction networks can be simulated to test the effect of external perturbations, such as drug administration. Mathematical models are also useful for specific NGS profiles, such as the distribution of molecular species in a sequenced antibody library, to estimate the complexity of the immune response.
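As an illustration of estimating library complexity from the distribution of molecular species, the sketch below applies the bias-corrected Chao1 richness estimator to per-sequence read counts. Chao1 is a generic estimator chosen here for illustration; it is an assumption, not necessarily the model used by the facility, and the function name and example counts are hypothetical.

```python
from collections import Counter


def chao1(read_counts):
    """Bias-corrected Chao1 estimate of the total number of distinct species.

    read_counts: iterable with the number of reads observed for each
    distinct sequence (hypothetical input format).
    """
    observed = [c for c in read_counts if c > 0]
    s_obs = len(observed)                 # distinct sequences actually seen
    freq = Counter(observed)
    f1 = freq[1]                          # sequences seen exactly once
    f2 = freq[2]                          # sequences seen exactly twice
    # Many singletons relative to doubletons suggest many unseen species.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Toy example: seven distinct sequences with these read counts.
estimate = chao1([1, 1, 1, 2, 2, 5, 10])
print(estimate)
```

The estimate grows with the share of sequences seen only once, which is why sequencing errors that inflate singletons need to be compensated before applying an estimator of this kind.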

  3. Biostatistics

Biostatistics is a cross-cutting approach that we use in different contexts (genomics, environmental monitoring, biochemical assays) to design and optimize experimental protocols for pre-clinical and clinical research and to estimate the required sample size, as requested by Ethics Committees.
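A sample size estimate of the kind mentioned above can be sketched for the simplest case: comparing the means of two equally sized groups with a two-sided test, using the normal approximation n = 2 (z_{1-alpha/2} + z_{power})^2 / d^2 per group, where d is the standardized effect size. The function name, the default significance level and the default power are illustrative assumptions; real protocols usually require a design-specific calculation.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Subjects per group for a two-sided, two-sample comparison of means.

    effect_size: standardized difference (difference in means divided by
    the common standard deviation). Uses the normal approximation, which
    slightly underestimates the size given by an exact t-test calculation.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = standard_normal.inv_cdf(power)          # quantile for target power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# A medium effect (d = 0.5) at 5% significance and 80% power.
print(sample_size_per_group(0.5))  # 63
```

Halving the expected effect size roughly quadruples the required number of subjects, which is why a realistic effect estimate matters when a protocol is submitted for ethical review.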

  4. Machine Learning

Using automatic learning systems, we can analyse large-scale genomic and clinical datasets, typically composed of hundreds to thousands of cases, both to classify subjects, as in diagnostic models, and to predict numerical measures through regression models. Machine Learning normally requires large computational resources, but makes it possible to integrate clinical and molecular variables within the same model. With such large-scale datasets, conventional statistical methods are often ill-suited to describing the non-linearity of systems and suffer from poor statistical power.

Selected Publications

Sarkar, Atay Y, Erickson AL, Arisi I, Saltini C and Kahveci T, “An efficient algorithm for identifying mutated subnetworks associated with survival in cancer”, IEEE/ACM Trans Comput Biol Bioinform 2019 Apr 15. doi: 10.1109/TCBB.2019.2911069.

Fantini M, Pandolfini L, Lisi S, Chirichella M, Arisi I, Terrigno M, Goracci M, Cremisi F, Cattaneo A, “Assessment of Antibody Library Diversity through Next Generation Sequencing and Technical Error Compensation”, PLoS One 2017 May 15;12(5):e0177574. doi: 10.1371/journal.pone.0177574. eCollection 2017.

Reichwald K, Petzold A, Koch P, Downie BR, Hartmann N, Pietsch S, Baumgart M, Chalopin D, Felder M, Bens M, Sahm A, Szafranski K, Taudien S, Groth M, Arisi I, Weise A, Bhatt SS, Sharma V, Kraus J, Schmid F, Priebe S, Liehr T, Goerlach M, Than M, Hiller M, Kestler HA, Volff JN, Schartl M, Cellerino A, Englert C, Platzer M, “The genome of a short-lived vertebrate provides insights into the early stages of XY sex chromosome evolution and the genetic control of life-history traits”, Cell 2015 Dec 3;163(6):1527-38.