Some of my research articles from the period 2020-2023 are summarized below.
Reviews and models on viral data. These articles describe our progressive understanding of viral data and knowledge. Publications appeared in Briefings in Bioinformatics vol. 22 (2021), International Conference on Conceptual Modeling (ER) 2020 and 2022, and Scientific Data vol. 9 (2022).
Bioinformatic tools for viral surveillance. A progression of tools that support researchers in understanding the properties of the viral genome, partially funded by the GeCo ERC AdG and partially by the EIT ‘DATA against COVID-19’ sprint innovation activity n. 20663. We first supported basic search (ViruSurf) and comparative data visualization (VirusViz), then the storage and analysis of viral sequences in privacy-protected contexts, such as hospitals and biotech companies (VirusLab). We next moved to integrated search for viral sequences and epitopes, relevant to immunological research (EpiSurf), to user-friendly tools for data analytics through clustering (ViruClust), and to tools for data annotation by experts (CoVEffect); this last tool makes heavy use of deep learning models. The most recent tool (VariantHunter) focuses on effective viral surveillance over arbitrary regions and periods. Publications appeared in Nucleic Acids Research vol. 49(D1 and 15) (2021), Database vol. 2021, BioTech vol. 10 (2021), Bioinformatics vol. 38 (2022), GigaScience 2023, and Database 2023.
Big Data-driven models of viral evolution. These models use a big-data approach to reveal viral evolution. The approach is radically new, as it substitutes for phylogenesis and yet obtains very credible (perhaps more credible) results. Publications appeared in Scientific Reports vol. 11 (2021) and Computational and Structural Biotechnology Journal vol. 20 (2022). A new model for detecting variant recombination appeared in Nature Communications vol. 15(1) (2024).
Clinical research with human or viral genomic integration. Our research is premised on the importance of linking genomic data to clinical data. Collaborations with the groups of Prof. Alessandra Renieri (Siena University), linking the patient to human genetics, and Prof. Giuliano Rizzardini (Sacco Hospital), linking the patient to viral load, provided evidence that this approach can yield significant results. Our group has coordinated clinical data design (http://gmql.eu/phenotype/) in the COVID-19 Host Genetics Initiative, which has so far produced three major publications in Nature, each with thousands of contributors: Nature 600, 472–477 (2021), Nature 607, 97–103 (2022), and Nature 608, E1–E10 (2022). Publications [40,42,43,44] appeared in BioMed vol. 2 (2022), European Journal of Human Genetics vol. 29 (2021), Cardiology and Cardiovascular Medicine vol. 5 (2021), European Journal of Human Genetics 2021, and PLOS ONE vol. 18 (2023).
Socio-economic models. The first model describes the effects of the first lockdown using a mix of macro-economic and mobility data; the second model describes how social and economic variables explain COVID-19 diffusion in European regions. Publications appeared in Scientific Reports 2021 and 2023.
Social Analytics. This research investigates social media, with special emphasis on how social communication was affected by COVID-19. Publications appeared in PLOS ONE 2020, Information Processing and Management 2020, ICWSM 2021, and Journal of Communication Inquiry 2023.
User-friendly dialogic interfaces. This research reports efforts to design new multi-modal interfaces that integrate data and speech to improve the quality of interaction with users. Publications appeared in CONVERSATIONS 2020, ACM Transactions on Computing for Healthcare 2021, and IEEE Access 2023.
Data management for knowledge graphs. This work is performed jointly with the Computer Science group of the Bank of Italy (also through executive PhD students) and uses Vadalog, a rule-based knowledge management system under development at the Bank of Italy, over large graphs of company ownership at Italian and European scale. The most recent article describes optimization techniques based on rule rewriting. Publications appeared in ICDE 2021, Data and Knowledge Engineering 2022, EDBT 2023, and ICDE 2023 (best paper, Industrial Track).
Topological Domains in Genomics. This research is concerned with the fundamental organization of the human genome; topological domains are studied by considering the impact of CTCF binding mutational processes on cancer types, the structure of CTCF bindings and its impact on the strength and conservation of topological domains, the correlation of chromatin conformation and gene co-expression, and the organization of DNA within compartments. Publications appeared in PLOS ONE vol. 15 (2020), Genome Biology 2020, Bioinformatics 2021, and Nature Communications 2021.
Drug Prediction using matrix tri-factorization. We have used matrix tri-factorization as a prediction model for various aspects of drug repurposing and synergism. Publications appeared in IEEE Journal of Biomedical and Health Informatics 2020 and IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021.
Synthetic Lethality and Transcription Factor Interaction. This research has studied synthetic lethality in the context of DNA-damage response, with a publication in BMC Bioinformatics 2022, and the interaction between transcription factors, published in Biology Direct 2020.
Architectural aspects of large systems for human genomics. We have completed our GeCo research on the infrastructures for building, storing, distributing and querying repositories for human genomics, with works in IEEE-Data Engineering 2020, IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020, Applied Sciences 2020, and Briefings in Bioinformatics 2021.
Models of Twitter diffusion. This work is focused on understanding how the topology of information spreading can reveal misleading information, with works in Scientific Reports 2020 and EPJ Data Science 2020.