Thesis Proposals – Master Level

Master Thesis Proposal: Sound of Genome

The human genome emits a variety of “signals”, produced by applying different methods and technologies for extracting information from it. These include mutations, gene expression, peaks revealing protein binding, copy number alterations, 3D contacts, and so on. Most genome signals can be shown aligned on the genome using a “genome browser”, which displays information along the genome at different levels of resolution (from each individual base up to compact representations where an entire chromosome fits on the screen). However, perceiving these signals collectively, in order to interpret their meaning and answer specific research or clinical questions, is far from trivial. Among the most intriguing questions is how to distinguish the signals of healthy (wild-type) human genomes from those of genomes affected by disease, particularly cancer.

Sonification has proven effective in integrating a variety of signals and in providing auditory information that, once perceived as sound, is very informative about global properties of the underlying reality. The objective of this thesis is to apply sonification methods to genome signals, displayed and selected visually, so as to produce a new and interesting form of investigation. The thesis will build on preliminary work in which sound production is integrated within a sonification platform driven by the Integrated Genome Browser (IGB). It requires openness to interdisciplinary work, but not much prior background in any of the disciplines mentioned. The thesis will benefit from an interdisciplinary group of tutors, including experts in genomics, music informatics and sonification, as well as professional musicians.
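To make the idea concrete, here is a minimal parameter-mapping sketch: each value of a genome signal, binned along a region, is mapped linearly to a pitch and rendered as a short sine tone in a WAV file. The signal values, pitch range, note length and sample rate are illustrative assumptions; this is not the IGB-driven platform mentioned above, and it uses only the Python standard library.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 22050  # samples per second (assumed)


def values_to_frequencies(values, f_min=220.0, f_max=880.0):
    """Map each binned signal value linearly onto a pitch range in Hz."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat signal
    return [f_min + (v - lo) / span * (f_max - f_min) for v in values]


def synthesize(frequencies, note_seconds=0.2, amplitude=0.3):
    """Render one sine tone per genomic bin as 16-bit mono PCM samples."""
    n = int(SAMPLE_RATE * note_seconds)
    samples = []
    for f in frequencies:
        for i in range(n):
            samples.append(int(amplitude * 32767 * math.sin(2 * math.pi * f * i / SAMPLE_RATE)))
    return samples


def to_wav_bytes(samples):
    """Pack the samples into an in-memory WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(SAMPLE_RATE)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()


# Hypothetical signal, e.g. copy-number values in consecutive genomic bins.
signal = [2, 2, 3, 4, 2, 1, 2]
freqs = values_to_frequencies(signal)
audio = to_wav_bytes(synthesize(freqs))
```

A linear mapping keeps the sketch simple; a real design would likely map values to a musical scale and combine several signals on different sound parameters (pitch, loudness, timbre).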

Tutors involved:

Coordination: Stefano Ceri.
Music informatics (UniMI): Goffredo Haus, https://soundcloud.com/goffredo-haus; Giorgio Valentinini, https://avanzini.di.unimi.it/teaching.php.
Sound engineering: Augusto Sarti, https://sarti.faculty.polimi.it/Augusto_Sarti/CV_and_publications.html; Alberto Bernardini, https://www.deib.polimi.it/eng/people/details/936408.
Genomics: Giorgio Valentini, https://valentini.di.unimi.it/; Pietro Pinoli, https://scholar.google.com/citations?hl=en&user=tWtkmawAAAAJ; Luca Nanni, https://scholar.google.com/citations?hl=en&user=3HNjL0gAAAAJ.
Data science: Anna Bernasconi, https://annabernasconi.faculty.polimi.it/; Francesco Invernici (PhD student).
Composer (to be involved once a demonstrator is available): Luca Francesconi, https://lucafrancesconi.com/

Master theses on Causal Loop Diagrams

Systemic Design is an emerging field driven by the ambitious objective of understanding, making sense of, and addressing complex problems in terms of “relationship and global dynamics” rather than isolated components. An important instrument of systemic design is the description of complex systems’ dynamics using Causal Loop Diagrams (CLDs), which illustrate system behavior at an abstract level. CLDs include nodes, describing factors that cause or affect the problem, and edges, expressing causal relationships and the influence between nodes in qualitative terms. Diagrams are inspected to find loops, i.e., cyclic paths leading from a variable back to the same variable, and each loop is characterized as either balancing (B) or reinforcing (R) through a simple inspection of the edges involved. In recent work, we described a model for CLDs; we aim to integrate CLD modeling within the information system design lifecycle, to provide high-level insights into causality that can be very relevant for motivating choices before delving into technical solutions. Providing a model is a first step towards the systematic study of CLDs; each of the following directions, taken from a technological, methodological, or application perspective, corresponds to a master thesis proposal.
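The loop characterization can be sketched as follows, assuming a CLD stored as a signed directed graph (a plain Python dictionary here, not our metamodel). The code enumerates the elementary cycles and applies the usual system-dynamics convention: a loop with an even number of negative links (including none) is reinforcing, one with an odd number is balancing. The population example is hypothetical.

```python
from typing import Dict, List, Tuple

# A CLD as a signed directed graph: (cause, effect) -> polarity (+1 or -1).
Edges = Dict[Tuple[str, str], int]


def simple_cycles(edges: Edges) -> List[List[str]]:
    """Enumerate elementary cycles by DFS; each cycle is reported once,
    starting from its smallest node (adequate for small, hand-drawn CLDs)."""
    succ: Dict[str, List[str]] = {}
    for (a, b) in edges:
        succ.setdefault(a, []).append(b)
    cycles: List[List[str]] = []

    def dfs(start, node, path, visited):
        for nxt in succ.get(node, []):
            if nxt == start:
                cycles.append(path[:])          # closed a loop back to start
            elif nxt > start and nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                dfs(start, nxt, path, visited)
                path.pop()
                visited.remove(nxt)

    for s in sorted(succ):
        dfs(s, s, [s], {s})
    return cycles


def classify(cycle: List[str], edges: Edges) -> str:
    """'R' if the loop has an even number of negative links, 'B' if odd."""
    links = zip(cycle, cycle[1:] + cycle[:1])
    negatives = sum(1 for a, b in links if edges[(a, b)] < 0)
    return "R" if negatives % 2 == 0 else "B"


# Hypothetical CLD: births grow the population, deaths reduce it.
edges: Edges = {
    ("Births", "Population"): +1,
    ("Population", "Births"): +1,
    ("Population", "Deaths"): +1,
    ("Deaths", "Population"): -1,
}
loops = [(c, classify(c, edges)) for c in simple_cycles(edges)]
```

Here the Births/Population loop is reinforcing and the Deaths/Population loop is balancing. For large diagrams, a dedicated cycle-enumeration algorithm (e.g., Johnson's) would be preferable to this plain DFS.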

1.  Exploring CLD repositories with data science methods. The model allows us to read, store, and manipulate any CLD; a (graph) database implementing the physical store of our metamodel can be filled with many instances corresponding to CLDs that have been proposed in the literature or are used in consulting practice, so as to provide catalogs of solutions. Interactive tools can assist designers in exploring these catalogs, and simple search algorithms can extract patterns in response to user interactions.

2.  Integrating CLDs within the Information System Lifecycle. For a given CLD stored as a graph, it would be possible to extract all feedback loops (for instance, those involving target variables that belong to certain classes) or all causal routes connecting pairs of selected variables; alternative loops or routes could then be systematically compared to explore whether they agree or disagree. This could provide important input to the subsequent design of information systems.

3.  Studying complex scenarios (such as “climate change” or “one health” challenges) using CLDs.
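For the second direction, here is a minimal sketch of route extraction and comparison, assuming a CLD stored as a signed directed graph (a hypothetical encoding, not our metamodel). It lists all simple causal routes between two variables, computes each route's net polarity as the product of its link polarities, and reports whether the alternative routes agree. The tourism example is invented for illustration.

```python
from typing import Dict, List, Tuple

Edges = Dict[Tuple[str, str], int]  # (cause, effect) -> +1 or -1


def causal_routes(edges: Edges, source: str, target: str) -> List[List[str]]:
    """All simple directed paths from source to target."""
    succ: Dict[str, List[str]] = {}
    for (a, b) in edges:
        succ.setdefault(a, []).append(b)
    routes: List[List[str]] = []

    def walk(path: List[str]) -> None:
        node = path[-1]
        if node == target:
            routes.append(list(path))
            return
        for nxt in succ.get(node, []):
            if nxt not in path:  # keep routes simple: no repeated variables
                path.append(nxt)
                walk(path)
                path.pop()

    walk([source])
    return routes


def route_sign(route: List[str], edges: Edges) -> int:
    """Net polarity of a route: the product of its link polarities."""
    sign = 1
    for a, b in zip(route, route[1:]):
        sign *= edges[(a, b)]
    return sign


def routes_agree(edges: Edges, source: str, target: str) -> bool:
    """True if every route from source to target has the same net polarity."""
    signs = {route_sign(r, edges) for r in causal_routes(edges, source, target)}
    return len(signs) <= 1


# Hypothetical CLD: tourism raises income directly, but lowers it via pollution.
edges: Edges = {
    ("Tourism", "Income"): +1,
    ("Tourism", "Pollution"): +1,
    ("Pollution", "Attractiveness"): -1,
    ("Attractiveness", "Income"): +1,
}
routes = causal_routes(edges, "Tourism", "Income")
signs = [route_sign(r, edges) for r in routes]
agree = routes_agree(edges, "Tourism", "Income")
```

The two routes have opposite polarities (+1 and -1), so they disagree: exactly the kind of tension a designer would want surfaced before committing to a solution.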

Master theses on Reactive Knowledge Graph Management

Today’s large knowledge graphs are conceived mainly to support search and e-commerce within large companies such as Google or Amazon, and they rely on well-crafted knowledge creation rules.

Our recent experience of the COVID-19 pandemic, when knowledge grew at unprecedented rates and was often contradictory, revealed a major gap in existing concepts and technology: today’s knowledge management does not adequately support such a disruptive process. A recent master’s thesis was dedicated to the design and small-scale prototyping of knowledge management systems that treat domain diversity and scientific evolution as foundational ingredients. Change management is based on a reactive approach, well established in database systems but so far lacking in knowledge systems. In this approach, knowledge hubs “own” portions of a common knowledge representation, and reactive rules cross the hubs’ borders, creating the premises for a disciplined evolution of knowledge, even under the pressure of crises. This thesis opens several interesting follow-up directions.
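As a rough illustration of the reactive approach, the following sketch models knowledge hubs as sets of triples and rules as event-condition-action triples that fire when a fact is inserted into one hub and write into another. All class names, hub names and facts are hypothetical; the prototype from the earlier thesis is not reproduced here, and propagation terminates only if the rules cannot generate unboundedly many new facts.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Fact = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class Rule:
    """Event-condition-action rule fired by insertions into one hub (hypothetical design)."""
    name: str
    source_hub: str  # event: a fact is inserted into this hub...
    predicate: str   # ...with this predicate
    condition: Callable[[Fact, "KnowledgeBase"], bool]
    action: Callable[[Fact], Tuple[str, Fact]]  # returns (target hub, new fact)


@dataclass
class KnowledgeBase:
    hubs: Dict[str, Set[Fact]] = field(default_factory=dict)  # each hub owns a set of facts
    rules: List[Rule] = field(default_factory=list)

    def insert(self, hub: str, fact: Fact) -> None:
        """Insert a fact, then propagate rule reactions until no new facts arise."""
        pending = [(hub, fact)]
        while pending:
            h, f = pending.pop(0)
            owned = self.hubs.setdefault(h, set())
            if f in owned:
                continue  # nothing changed, so no event is raised
            owned.add(f)
            for rule in self.rules:
                if rule.source_hub == h and rule.predicate == f[1] and rule.condition(f, self):
                    pending.append(rule.action(f))


# Hypothetical hubs: a virology hub and a clinical hub that owns vaccine guidance.
kb = KnowledgeBase(hubs={"clinical": {("VaccineA", "status", "recommended")}})
kb.rules.append(Rule(
    name="review_vaccine_on_escape",
    source_hub="virology",
    predicate="escapes",
    condition=lambda f, base: (f[2], "status", "recommended") in base.hubs.get("clinical", set()),
    action=lambda f: ("clinical", (f[2], "under_review_due_to", f[0])),
))
kb.insert("virology", ("VariantX", "escapes", "VaccineA"))
```

The virology hub never writes clinical facts directly: the cross-hub effect happens only through the declared rule, which is what makes the evolution disciplined and inspectable.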

(1) Stronger support for reactive knowledge management on top of graph databases (such as Neo4j, the market leader among graph databases), by defining the knowledge hub and reactive processing components in a distributed/federated setting, from both a computational and an organizational point of view.

(2) Studying rule design principles and practices by applying this approach to other complex crisis scenarios beyond COVID-19, such as climate change.

(3) Experimenting with how reactive processing can support what-if scenarios, by designing different rule reactions and showing their effects, aiming at a tight integration of reactive processing with hypothetical reasoning.
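For direction (3), one simple way to frame what-if reasoning is to apply a hypothetical change to a copy of the knowledge state under alternative rule sets and compare the resulting effects, leaving the real state untouched. The sketch below is insert-only (no retraction of superseded facts), and its rules and facts are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Set, Tuple

Fact = Tuple[str, str, str]                               # (subject, predicate, object)
Rule = Callable[[Fact, FrozenSet[Fact]], Iterable[Fact]]  # (new fact, state) -> reactions


def react(state: Set[Fact], new_facts: Iterable[Fact], rules: List[Rule]) -> Set[Fact]:
    """Return a NEW state with new_facts inserted and all reactions applied."""
    result = set(state)  # work on a copy, so the real state is untouched
    queue = list(new_facts)
    while queue:
        fact = queue.pop()
        if fact in result:
            continue
        result.add(fact)
        snapshot = frozenset(result)
        for rule in rules:
            queue.extend(rule(fact, snapshot))
    return result


def what_if(state: Set[Fact], hypothesis: List[Fact],
            scenarios: Dict[str, List[Rule]]) -> Dict[str, Set[Fact]]:
    """For each named rule set, return the facts that the hypothesis would add."""
    return {name: react(state, hypothesis, rules) - state
            for name, rules in scenarios.items()}


# Two hypothetical policies reacting to the same reported side effect.
def suspend(fact: Fact, state: FrozenSet[Fact]) -> List[Fact]:
    return [(fact[0], "status", "suspended")] if fact[1] == "reported_side_effect" else []


def monitor(fact: Fact, state: FrozenSet[Fact]) -> List[Fact]:
    return [(fact[0], "status", "monitored")] if fact[1] == "reported_side_effect" else []


state = {("VaccineA", "status", "recommended")}
effects = what_if(state, [("VaccineA", "reported_side_effect", "myocarditis")],
                  {"strict": [suspend], "lenient": [monitor]})
```

Each scenario yields its own set of derived facts, so the consequences of alternative rule designs can be compared side by side before any rule is actually deployed.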

Master theses on adding stronger trigger support to graph databases

Graph databases are emerging as the leading data management technology for storing large knowledge graphs; significant efforts are ongoing to produce new standards (such as the Graph Query Language, GQL) and to enrich graph data models with properties, types, schemas, and keys. In this context, and in collaboration with the IT Group of Banca d’Italia, we recently introduced PG-Triggers, a complete proposal for adding triggers to Property Graphs, following the direction set by the SQL3 standard, with a simple but effective syntax and semantics.

Neo4j is the most widely adopted graph database; it introduced Cypher, a declarative language for querying and manipulating graph data that has become the de facto standard in graph databases. Neo4j does not support triggers natively; however, triggers are included in APOC (Awesome Procedures on Cypher), a community-provided extension library for Neo4j. Several master theses can extend this work, in order to (1) define a new collection of APOC procedures that respects the new standard proposal and add them to the APOC implementation (an initial inspection of the APOC code has shown that this can be done by locally rewriting some critical parts of the community-provided code); (2) define rule analysis methods for detecting conflicts among rules that may lead to non-termination or lack of confluence.
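For direction (2), a natural starting point is the classical triggering-graph analysis from active databases: draw an edge from trigger t1 to trigger t2 whenever t1's action may produce an event that t2 reacts to. If the graph is acyclic, trigger processing is guaranteed to terminate; a cycle only signals possible non-termination. The sketch abstracts each trigger as (events it listens to, events it may produce); this abstraction and the banking-style triggers are assumptions, not PG-Triggers syntax or the APOC API, and confluence analysis is not covered.

```python
from typing import Dict, List, Optional, Set, Tuple

Event = Tuple[str, str]  # (operation, label), e.g. ("CREATE", "Transfer"); assumed abstraction
Trigger = Tuple[Set[Event], Set[Event]]  # (events listened to, events its action may produce)


def triggering_graph(triggers: Dict[str, Trigger]) -> Dict[str, List[str]]:
    """Edge t1 -> t2 if some event that t1's action may produce is one t2 listens to."""
    return {
        t1: sorted(t2 for t2, (listens, _) in triggers.items() if produces & listens)
        for t1, (_, produces) in triggers.items()
    }


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of the triggering graph, or None if it is acyclic.
    Acyclicity guarantees termination; a cycle signals only POSSIBLE non-termination."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {n: WHITE for n in graph}
    stack: List[str] = []

    def visit(n: str) -> Optional[List[str]]:
        color[n] = GREY
        stack.append(n)
        for m in graph[n]:
            if color[m] == GREY:          # back edge: the stack holds a cycle
                return stack[stack.index(m):]
            if color[m] == WHITE:
                found = visit(m)
                if found:
                    return found
        stack.pop()
        color[n] = BLACK
        return None

    for n in sorted(graph):
        if color[n] == WHITE:
            found = visit(n)
            if found:
                return found
    return None


# Hypothetical triggers on a banking graph.
triggers: Dict[str, Trigger] = {
    "audit_transfer": ({("CREATE", "Transfer")}, {("CREATE", "AuditLog")}),
    "flag_large": ({("CREATE", "Transfer")}, {("SET", "Account.flagged")}),
    "notify_flag": ({("SET", "Account.flagged")}, {("CREATE", "Alert")}),
}
acyclic = find_cycle(triggering_graph(triggers))  # None: processing terminates
triggers["reverse_on_alert"] = ({("CREATE", "Alert")}, {("CREATE", "Transfer")})
cycle = find_cycle(triggering_graph(triggers))
```

Adding the hypothetical `reverse_on_alert` trigger closes the chain flag_large → notify_flag → reverse_on_alert → flag_large, which the analysis reports as a potential source of non-termination.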

Master Thesis on Automatic generation of reasoning tasks on knowledge graphs (Honors Program, DEIB)
Declarative query languages based on logic programming, such as Datalog and its extensions, have recently found successful applications in modeling complex real systems in specialized domains such as finance. While their declarative nature increases transparency in modeling, fully achieving transparency requires building comprehensible, clear, and relevant explanations of the conclusions inferred by such deductive AI systems. The thesis aims to develop a pipeline that combines logic programming (Datalog) and NLP techniques into a user-friendly interface, allowing non-technical users to perform reasoning tasks on a knowledge graph starting from questions in natural language, using state-of-the-art AI translation models. This thesis takes advantage of our cooperation with the IT Department of the Bank of Italy; under the cooperation agreement, data about Italian companies can be used to solve real problems.
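To give a flavor of the reasoning tasks involved, consider a classic example from financial knowledge graphs, company control: company X controls company Y if X, directly or through companies it already controls, holds more than 50% of Y's shares. In Datalog this is a recursive rule with aggregation; the sketch below evaluates it in plain Python by naive bottom-up iteration to a fixpoint, not with a Datalog engine. The ownership data is invented; in the envisioned pipeline, a question such as “Which companies does A control?” would be translated into such a task.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def company_control(own: Dict[Tuple[str, str], float]) -> Set[Tuple[str, str]]:
    """Naive fixpoint for: X controls Y if the shares of Y held by X, plus those
    held by the companies X already controls, exceed 0.5."""
    companies = {c for pair in own for c in pair}
    control: Set[Tuple[str, str]] = set()
    changed = True
    while changed:  # iterate until no new control facts are derived
        changed = False
        for x in companies:
            # X acts through itself and through every company it already controls.
            agents = {x} | {z for (a, z) in control if a == x}
            holdings: Dict[str, float] = defaultdict(float)
            for (z, y), share in own.items():
                if z in agents:
                    holdings[y] += share
            for y, total in holdings.items():
                if y != x and total > 0.5 and (x, y) not in control:
                    control.add((x, y))
                    changed = True
    return control


# Invented ownership shares: own[(owner, owned)] = fraction of shares.
own = {("A", "B"): 0.6, ("A", "C"): 0.3, ("B", "C"): 0.3, ("C", "D"): 0.51}
result = company_control(own)
```

A controls B directly, then C jointly with B (0.3 + 0.3), and finally D through C; the chain of derivation steps is also the natural raw material for the user-facing explanations the thesis targets.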

Master Thesis on understanding and predicting the duration of judicial trials: a data-driven analysis

Europe is concerned about the length of trials in Italy; hence, several EU-funded projects focus on monitoring the performance of both terminated cases and ongoing trials. The increased use of information systems has made it possible to analyze key performance indicators in increasing depth, such as the Disposition Time (DT) and the Clearance Rate (CR) of judicial trials.
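The two indicators follow the standard CEPEJ definitions: the Clearance Rate compares resolved and incoming cases in a period, and the Disposition Time estimates how many days it would take to resolve the pending backlog at the current pace. A minimal sketch with invented yearly figures (not data from the Court of Appeal of Milan):

```python
def clearance_rate(resolved: int, incoming: int) -> float:
    """CR (%) = resolved cases / incoming cases in the same period x 100."""
    return 100.0 * resolved / incoming


def disposition_time(pending_end: int, resolved: int, days: int = 365) -> float:
    """DT (days) = pending cases at the end of the period / resolved cases x days."""
    return days * pending_end / resolved


# Invented yearly figures for one court, for illustration only.
cr = clearance_rate(resolved=1100, incoming=1000)       # 110.0
dt = disposition_time(pending_end=1650, resolved=1100)  # 547.5 days
```

A CR above 100% means the court resolves more cases than it receives, so its backlog shrinks; DT translates that backlog into an expected waiting time.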

In particular, we focus on analyzing and discussing the factors that have the greatest impact on the duration of trials. We propose an analysis framework that takes advantage of a large dataset, collected in an earlier phase of this work, describing five years of trials at the Court of Appeal of Milan. We examine both the individual phases and the total length of the trials, and we propose techniques to identify potentially critical events, i.e., events that have a major impact on trial duration. In particular, we aim at predicting the future evolution of ongoing trials, given the information available from past process executions and the current state.
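As a baseline for the prediction task, one can use a simple state-based estimator (a generic process-mining baseline, not necessarily the technique adopted in this work): for each event type, average the remaining time to closure observed after that event in closed trials, then predict an ongoing trial's remaining time from its most recent event. Event names and durations below are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# (event name, day since filing), in time order; the last event closes the trial.
Trace = List[Tuple[str, int]]


def fit_remaining_time(closed: List[Trace]) -> Dict[str, float]:
    """For each state (= last observed event), average remaining days to closure."""
    samples: Dict[str, List[int]] = defaultdict(list)
    for trace in closed:
        end = trace[-1][1]
        for event, day in trace[:-1]:  # the closing event has nothing left to predict
            samples[event].append(end - day)
    return {event: float(mean(v)) for event, v in samples.items()}


def predict_remaining(model: Dict[str, float], ongoing: Trace, default: float) -> float:
    """Predict remaining days for an ongoing trial from its current state."""
    return model.get(ongoing[-1][0], default)


# Hypothetical closed trials and one ongoing trial.
closed = [
    [("filing", 0), ("first_hearing", 100), ("decision", 300)],
    [("filing", 0), ("first_hearing", 120), ("expert_appointed", 200), ("decision", 600)],
]
model = fit_remaining_time(closed)
ongoing = [("filing", 0), ("first_hearing", 90), ("expert_appointed", 150)]
remaining = predict_remaining(model, ongoing, default=0.0)
```

Comparing the averages per event type also hints at potentially critical events: in this toy data, an expert appointment is followed by a longer remaining time than the first hearing.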

MS Thesis Proposal: Sound of Genome and 3D Genome Structure
