# 2021 REU Project Descriptions

# Analyzing data from patients with pulmonary hypertension (Summer 2021)

**Prerequisites:** Differential equations, interest in biology, programming experience

**Outline:** Pulmonary hypertension is a rare but deadly disease that requires both imaging and invasive measurements to diagnose. The disease is often detected late because it shares symptoms with several other diseases, and it is difficult to determine how successful a given treatment is. Doing so requires integrating imaging data with dynamic measurements.

**Research objectives:** To design and validate a fluid mechanics model integrating CT images and dynamic blood pressure measurements from right heart catheterization that can predict the load on the heart both at rest and during exercise.

**Outcomes**: Mathematical model integrating imaging data with dynamic measurements; local and global sensitivity analysis determining what model parameters impact predictions of blood pressure and flow at rest and during exercise (5 min walk test); a study of what parameters can be identified given the model and data; and simulations predicting effects of vasodilatory treatment at rest and during exercise.
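As a rough illustration of what a local sensitivity analysis computes, the sketch below replaces the project's CT-based fluid mechanics model with a much simpler two-element Windkessel model, C dP/dt = Q(t) - P/R. This substitution is an assumption, not the project's model, and the inflow waveform, parameter values, and function names are hypothetical. The sketch estimates relative sensitivities d log(output) / d log(parameter) of mean and peak pressure by central differences.

```python
import numpy as np
from scipy.integrate import solve_ivp

T_CYCLE = 1.0       # cardiac period [s] (illustrative)
T_SYS = 0.3         # ejection duration [s] (illustrative)
STROKE_VOL = 70.0   # stroke volume [mL] (illustrative)
Q_MAX = STROKE_VOL * np.pi / (2 * T_SYS)   # peak inflow so one half-sine ejects STROKE_VOL

def inflow(t):
    """Half-sine ejection during systole, zero during diastole [mL/s]."""
    tc = t % T_CYCLE
    return Q_MAX * np.sin(np.pi * tc / T_SYS) if tc < T_SYS else 0.0

def simulate(R, C, n_cycles=10):
    """Two-element Windkessel C dP/dt = Q(t) - P/R; returns pressure over the last cycle."""
    rhs = lambda t, p: [(inflow(t) - p[0] / R) / C]
    t_last = np.linspace((n_cycles - 1) * T_CYCLE, n_cycles * T_CYCLE, 2001)
    sol = solve_ivp(rhs, (0.0, n_cycles * T_CYCLE), [0.0], t_eval=t_last,
                    max_step=T_SYS / 50, rtol=1e-8, atol=1e-10)
    return sol.t, sol.y[0]

def outputs(R, C):
    """Mean and peak pressure [mmHg] over one cycle in (approximately) periodic steady state."""
    _, p = simulate(R, C)
    return np.array([p[:-1].mean(), p.max()])   # rectangle rule on a periodic signal

def relative_sensitivities(R, C, h=1e-3):
    """Central differences of log(outputs) with respect to log(R) and log(C)."""
    base = {"R": R, "C": C}
    sens = {}
    for name in base:
        up, down = dict(base), dict(base)
        up[name] *= 1 + h
        down[name] *= 1 - h
        sens[name] = (np.log(outputs(**up)) - np.log(outputs(**down))) / (np.log1p(h) - np.log1p(-h))
    return sens

S = relative_sensitivities(R=0.2, C=2.0)   # R in mmHg*s/mL, C in mL/mmHg (illustrative)
print("d log(mean, peak) / d log R:", S["R"])
print("d log(mean, peak) / d log C:", S["C"])
```

In this toy model the mean pressure in periodic steady state equals R times the mean inflow, so its relative sensitivity is about 1 for R and about 0 for C, while the peak pressure falls as compliance increases. A global analysis would instead vary parameters over whole ranges (for example with Sobol indices).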

# Natural language processing of medical records (Summer 2021)

**Prerequisites:** Linear algebra, basic statistics

**Outline**: Important information in medical records often takes the form of unstructured text, such as a doctor’s notes. Natural language processing (NLP) is needed to translate this text into structured variables that can then be analyzed using statistical or machine learning methods. While standard methods are well developed and straightforward to implement, they rely on simplifying assumptions that may limit their effectiveness. In this project, students will adapt conformal methods to improve NLP. Conformal prediction is an exciting idea from machine learning that is closely tied to the foundations of statistics; it is flexible, easy to interpret, and easy to code.

**Research objectives:** To evaluate the effectiveness of conformal prediction algorithms on unstructured text data. Students will compare the novel NLP method extensively with standard NLP methods on simulated data and develop these ideas in an application to publicly available medical transcription data (www.mtsamples.com).

**Outcomes**: Techniques for analyzing unstructured text data; an understanding of why conformal prediction matters and how to construct a conformal prediction algorithm; and code implementing these methods.
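A minimal sketch of split conformal prediction for classification. It assumes synthetic feature vectors as a stand-in for features extracted from text (for example word counts) and a simple nearest-centroid classifier with softmax probabilities; neither is the project's NLP method, and all names are hypothetical. The nonconformity score is 1 minus the predicted probability of the true label, a common choice, and a held-out calibration set fixes the threshold so that prediction sets contain the true label with probability at least 1 - alpha when the data are exchangeable.

```python
import numpy as np

rng = np.random.default_rng(0)
K, ALPHA, DIM = 3, 0.1, 5   # classes, miscoverage level, feature dimension (hypothetical)

def make_data(n):
    """Synthetic stand-in for text feature vectors (e.g. word counts) with class labels."""
    centers = 0.8 * np.arange(K)[:, None] * np.ones(DIM)
    y = rng.integers(K, size=n)
    return centers[y] + rng.normal(size=(n, DIM)), y

def predict_proba(centroids, X):
    """Softmax over negative squared distances to class centroids (a simple classifier)."""
    logits = -0.5 * ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def conformal_threshold(probs_cal, y_cal, alpha):
    """Split conformal: finite-sample corrected (1 - alpha) quantile of calibration scores."""
    n = len(y_cal)
    scores = 1.0 - probs_cal[np.arange(n), y_cal]   # nonconformity: 1 - p(true label)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level, method="higher")

# Disjoint training, calibration, and test sets drawn from the same distribution
X_tr, y_tr = make_data(1000)
X_cal, y_cal = make_data(500)
X_te, y_te = make_data(2000)

centroids = np.stack([X_tr[y_tr == k].mean(axis=0) for k in range(K)])
qhat = conformal_threshold(predict_proba(centroids, X_cal), y_cal, ALPHA)
sets = (1.0 - predict_proba(centroids, X_te)) <= qhat   # sets[i, k]: label k in set for item i
coverage = sets[np.arange(len(y_te)), y_te].mean()
print(f"empirical coverage {coverage:.3f} (target {1 - ALPHA}), "
      f"mean set size {sets.sum(axis=1).mean():.2f}")
```

Any classifier that outputs class probabilities could replace the centroid model; the coverage guarantee comes from the calibration step, not from the model being correct.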

# Probabilistic prediction of extreme events in dynamical systems (Summer 2021)

**Prerequisites:** Differential equations, programming experience, elementary probability.

**Outline:** Extreme events occur in many natural and engineering systems. Examples include ocean rogue waves, extreme weather patterns, tsunamis, and stock market crashes, among many others. These systems can be described as chaotic dynamical systems. In practice, however, we are often unable to measure all degrees of freedom of the dynamical system. This project seeks to address the following question: given a dynamical system, which degrees of freedom must be measured to ensure accurate predictions of upcoming extreme events?

**Research objectives:** To formulate and solve an appropriate optimization problem whose solution is a reliable precursor of upcoming extreme events, predicting these events with minimal false-positive and false-negative rates.

**Outcomes**: A computational toolbox with a graphical user interface (GUI). The toolbox takes observational data as input, solves an optimization problem to discover the extreme-event precursor, and produces graphs and tables summarizing the prediction results.
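A toy version of the precursor question, assuming two scalar AR(1) time series in place of a chaotic dynamical system: extremes occur in `x`, and the question is which observed degree of freedom (`x` itself or an unrelated one) yields a threshold alarm with the smallest sum of false-positive and false-negative rates. The process, event level, and prediction horizon are hypothetical choices, and the project's optimization problem is more general than this one-parameter grid search.

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(1)
N, PHI, LEAD = 200_000, 0.9, 5   # series length, AR(1) coefficient, prediction horizon (steps)

def ar1(n, phi):
    """AR(1) process x[t] = phi * x[t-1] + noise[t], generated with a linear filter."""
    return lfilter([1.0], [1.0, -phi], rng.normal(size=n))

x = ar1(N, PHI)           # degree of freedom in which the extreme events occur
unrelated = ar1(N, PHI)   # a degree of freedom carrying no information about x

# event[t] is True if x exceeds a high level at some step in t+1, ..., t+LEAD
event_level = 2.5 * x.std()
future_max = np.max([x[k:N - LEAD + k] for k in range(1, LEAD + 1)], axis=0)
event = future_max > event_level

def best_alarm_threshold(obs, event, grid):
    """Grid search for the threshold c (alarm when obs > c) minimizing FP rate + FN rate."""
    obs = obs[: len(event)]
    best_score, best_c = np.inf, None
    for c in grid:
        alarm = obs > c
        fn_rate = np.mean(~alarm[event])   # fraction of events with no alarm
        fp_rate = np.mean(alarm[~event])   # fraction of non-events with an alarm
        if fp_rate + fn_rate < best_score:
            best_score, best_c = fp_rate + fn_rate, c
    return best_score, best_c

grid = np.linspace(-3.0, 3.0, 121) * x.std()
results = {name: best_alarm_threshold(obs, event, grid)
           for name, obs in [("x", x), ("unrelated", unrelated)]}
for name, (score, c) in results.items():
    print(f"observe {name:9s}: min FP+FN = {score:.2f} at threshold {c:.2f}")
```

An alarm that is independent of the events always has FP + FN = 1, so that value is a natural baseline for judging whether a measured degree of freedom carries predictive information.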

# A functional data analysis of disease outbreak data

**Prerequisites:** Linear algebra, basic statistics

**Outline:** Since the onset of the COVID-19 pandemic, curves depicting the number of infected people in a population have become a familiar part of most citizens’ lives. Functional data analysis (FDA) is a field of statistics that models distributions of functions, particularly curves. FDA methods are excellent for extracting low-dimensional summaries of functional data. They can be used for visualization, to cluster similar curves, and to determine the factors that contribute to the curves’ shapes and magnitudes. In this project, we will apply FDA methods to freely available daily county-level COVID-19 data.

**Research objectives:** To provide new insights into the factors that determine the spread of the disease and to contribute to the development of mitigation strategies. This includes analyzing the effects of demographic, socioeconomic, and environmental variables on the shape and magnitude of the outbreak curves, and investigating the effects of government interventions, such as school closings and shelter-in-place orders, as well as of citizen mobility data provided by Google (https://www.google.com/covid19/mobility/).

**Outcomes**: Estimates of intervention effects and improved understanding of the conditions that are indicative of a disease hot spot.
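A minimal functional PCA sketch, assuming all curves are sampled on a common daily grid (real county data would first need smoothing and alignment). The synthetic bell-shaped outbreak curves are hypothetical stand-ins for the COVID-19 data. On a uniform grid, the SVD of the centered data matrix gives the principal component functions and the per-curve scores that serve as low-dimensional summaries.

```python
import numpy as np

rng = np.random.default_rng(2)
days = np.arange(200)
n_counties = 300

# Hypothetical stand-in for smoothed county-level daily case curves:
# bell-shaped outbreaks with random height, peak day, and width
height = rng.lognormal(mean=3.0, sigma=0.5, size=n_counties)
peak = rng.normal(100.0, 15.0, size=n_counties)
width = rng.uniform(10.0, 25.0, size=n_counties)
curves = height[:, None] * np.exp(-0.5 * ((days[None, :] - peak[:, None]) / width[:, None]) ** 2)

def fpca(curves, n_components=3):
    """Functional PCA for curves on a common uniform grid, via SVD of the centered data matrix."""
    mean_curve = curves.mean(axis=0)
    U, s, Vt = np.linalg.svd(curves - mean_curve, full_matrices=False)
    explained = s**2 / np.sum(s**2)                   # fraction of variance per component
    scores = U[:, :n_components] * s[:n_components]   # low-dimensional summary of each curve
    return mean_curve, Vt[:n_components], scores, explained[:n_components]

mean_curve, components, scores, explained = fpca(curves)
print("variance explained by the first 3 components:", np.round(explained, 3))
```

The scores could then be clustered, or regressed on county-level covariates, to relate curve shape and magnitude to demographic, socioeconomic, or intervention variables.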

# Mathematical optimization approaches to radiation therapy treatments of brain metastases (Summer 2021)

**Prerequisites**: Algorithms; basic optimization or linear algebra

**Outline**: Radiation therapy (RT) is one of the primary forms of treatment for a wide variety of cancers including brain tumors. During RT, patients receive several identical doses of ionizing radiation on consecutive days with the goal of destroying the tumor cells while sparing the adjacent healthy tissue from radiation damage as much as possible. The ideal treatment is personalized for each patient, and it is computed using mathematical optimization techniques.

**Research objectives**: The students will investigate mathematical optimization models and solution approaches for the treatment of multiple lesions in the brain. Recent studies show that patients with this type of tumor can potentially benefit from deviating from the conventional clinical practice and receiving different doses of radiation on each treatment day. However, the corresponding optimization problems cannot be solved with the algorithms used in conventional RT.

**Outcomes**: Mathematical models and techniques for formulating radiotherapy optimization problems for the treatment of multiple brain metastases that can be solved to optimality in a clinically reasonable time.
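A toy fluence-map optimization, assuming a one-dimensional geometry with two lesions and a Gaussian dose-influence matrix in place of a real dose calculation (all values hypothetical). It chooses nonnegative beamlet intensities x so that the dose D x is close to the prescription, weighting lesion voxels more heavily, and solves the result as a nonnegative least-squares problem with `scipy.optimize.nnls`. This is a conventional single-plan formulation, not the project's model.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical 1D geometry: voxels along a line, two lesions (targets), healthy tissue elsewhere
voxels = np.linspace(0.0, 10.0, 201)
target = ((voxels > 2.0) & (voxels < 3.0)) | ((voxels > 6.5) & (voxels < 7.5))
prescription = np.where(target, 60.0, 0.0)   # Gy: full dose to lesions, ideally none elsewhere

# Dose-influence matrix D[i, j] = dose to voxel i per unit intensity of beamlet j
# (Gaussian profiles as a stand-in for a real dose calculation)
beamlets = np.linspace(0.0, 10.0, 41)
D = np.exp(-0.5 * ((voxels[:, None] - beamlets[None, :]) / 0.4) ** 2)

# Weighted least squares, min sum_i w_i (D x - d)_i^2 subject to x >= 0,
# rewritten as an ordinary NNLS problem by scaling each row by sqrt(w_i)
w = np.where(target, 10.0, 1.0)
sqrt_w = np.sqrt(w)
x, rnorm = nnls(sqrt_w[:, None] * D, sqrt_w * prescription)

dose = D @ x
print(f"mean lesion dose {dose[target].mean():.1f} Gy, "
      f"mean healthy-tissue dose {dose[~target].mean():.1f} Gy")
```

Under the linear-quadratic model commonly used for fractionation, the biological effect of a daily dose d scales with d + d^2/(alpha/beta), so letting the dose differ from day to day introduces nonlinear, generally nonconvex terms that this convex sketch does not capture.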

# Goal-oriented data acquisition for parameter estimation

**Prerequisites:** Multivariable calculus, differential equations, and basic programming skills

**Outline:** Mathematical models play a crucial role in understanding real-world phenomena and making predictions. Such models include parameters that are needed but are unknown or uncertain. These parameters can be estimated by solving an inverse problem, which uses the model together with measurements to estimate the unknown parameters. The quality of the estimated parameters depends on the availability of informative data. A key question is: how much data, and of what types, is needed for reliable parameter estimation? Hyper-differential sensitivity analysis (HDSA) provides a novel approach to this question by computing the sensitivity of the solution of an inverse problem to different measurements.

**Research objectives:** Exploration of HDSA in inverse problems governed by ODEs describing the COVID-19 epidemic, to better understand which measurements are essential for estimating key parameters such as the disease transmission rate, incubation period, and recovery rate.

**Outcomes**: Computational framework for understanding the sensitivity of inverse problems governed by ODEs describing COVID-19. Results will provide tools for quickly determining which measurements are essential for estimating the critical model parameters. Computer codes will be made publicly available to stimulate further research in this direction.
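A simplified sketch of the quantity HDSA examines, the derivative of an inverse-problem solution with respect to the data. It assumes an SIR model with a known initial condition, noise-free synthetic daily observations of the infected fraction, and hypothetical parameter values. After a least-squares fit of (beta, gamma), applying the implicit function theorem to the optimality condition at a zero-residual optimum gives d theta* / d data = (J^T J)^{-1} J^T, where J is the residual Jacobian at the solution. This illustrates only the core idea, not the HDSA framework itself.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

T_OBS = np.arange(1.0, 41.0)   # hypothetical daily observation times (days 1..40)
I0 = 1e-3                      # assumed known initial infected fraction

def infected(theta, t=T_OBS):
    """Infected fraction I(t) from an SIR model with theta = (beta, gamma)."""
    beta, gamma = theta
    def rhs(_, y):
        s, i = y
        return [-beta * s * i, beta * s * i - gamma * i]
    sol = solve_ivp(rhs, (0.0, t[-1]), [1.0 - I0, I0], t_eval=t,
                    method="DOP853", rtol=1e-10, atol=1e-12)
    return sol.y[1]

theta_true = np.array([0.5, 0.2])   # hypothetical transmission and recovery rates (1/day)
data = infected(theta_true)         # noise-free synthetic measurements

# Inverse problem: least-squares fit of (beta, gamma) to the measurements
fit = least_squares(lambda th: infected(th) - data, x0=[0.3, 0.1],
                    jac="3-point", bounds=([1e-6, 1e-6], [5.0, 5.0]))
J = fit.jac   # Jacobian of the residuals w.r.t. theta at the solution, shape (40, 2)

# Sensitivity of the estimate to each measurement: d theta* / d data = (J^T J)^{-1} J^T
# (exact for a zero-residual fit, a Gauss-Newton approximation otherwise)
S = np.linalg.solve(J.T @ J, J.T)   # shape (2, 40)
influence = np.linalg.norm(S, axis=0)
print("estimate:", fit.x)
print("most influential observation day:", T_OBS[np.argmax(influence)])
```

Columns of `S` with a large norm mark the measurements whose perturbation moves the estimated parameters the most, which is the kind of ranking the project aims to make available for more detailed COVID-19 models.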

# Optimal packing of cells in tissues

**Prerequisites:** Calculus, Physics I, programming experience

**Outline:** How cells pack in a tissue determines the nature of the tissue. Controlling how cells pack in a tissue provides a biological engine for changing the shape and function of the tissue. What is the correspondence between what we see under the microscope and the physics generating the observed patterns? Epithelial tissues consist of cells with negligible space between them and can be considered a partitioning of space. The partitioning is governed by physics (surface tension and adhesion) and by biology (e.g., the minimal size of the nucleus).

**Research objectives:** Development of a model for packing cells in tissues. This model will be analyzed to determine optimal packing under different or changing constraints. The analysis will include exact minimal solutions for specific cases and statistical analysis of variability within more general cases. Students will also statistically analyze the geometric features of cells in example tissues via 3D image analysis.

**Outcomes**: Comparison of model results with microscopy data will contribute to our understanding of the physics controlling cell packing in tissues.
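A sketch of the kind of geometric statistics the image analysis might compute, assuming a 2D Voronoi partition of random points as a hypothetical stand-in for segmented cell outlines (real data would be 3D and come from microscopy). For each cell lying inside the sampled window it records the number of sides and the dimensionless shape index P / sqrt(A), which is about 3.72 for a regular hexagon and exactly 4 for a square.

```python
import numpy as np
from scipy.spatial import Voronoi

def order_convex(vertices):
    """Sort the vertices of a convex polygon by angle around their centroid."""
    center = vertices.mean(axis=0)
    angles = np.arctan2(vertices[:, 1] - center[1], vertices[:, 0] - center[0])
    return vertices[np.argsort(angles)]

def shape_index(vertices):
    """Perimeter / sqrt(area) of a convex polygon (shoelace formula for the area)."""
    v = order_convex(vertices)
    x, y = v[:, 0], v[:, 1]
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    perimeter = np.linalg.norm(v - np.roll(v, -1, axis=0), axis=1).sum()
    return perimeter / np.sqrt(area)

# Hypothetical stand-in for segmented cell outlines: a Voronoi partition of random seed points
rng = np.random.default_rng(3)
vor = Voronoi(rng.uniform(0.0, 1.0, size=(500, 2)))

n_sides, indices = [], []
for region_idx in vor.point_region:
    region = vor.regions[region_idx]
    if len(region) == 0 or -1 in region:        # unbounded cell on the boundary
        continue
    verts = vor.vertices[region]
    if np.any((verts < 0.0) | (verts > 1.0)):   # cell extends outside the sampled window
        continue
    n_sides.append(len(region))
    indices.append(shape_index(verts))

print(f"{len(n_sides)} interior cells, mean number of sides {np.mean(n_sides):.2f}, "
      f"mean shape index {np.mean(indices):.2f}")
```

Comparing such distributions between simulated packings and segmented tissue images is one simple way to set model results against microscopy data.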

# Improving generalization for dropout in deep learning

**Prerequisites:** Probability, basic programming and algorithms.

**Outline:** A central concern in statistical machine learning is that algorithms generalize well to newly observed data by avoiding overfitting. One algorithm commonly used to achieve this is the random subspace method (22), a variant of algorithms broadly adopted in deep learning under the name dropout (23). Despite their widespread use, these methods’ theoretical properties are not fully understood, especially for non-parametric problems and deep learning. This project will investigate the theoretical and practical aspects of these algorithms.

**Research objectives:** To investigate the effects of the specific noise models used to implement dropout and the relationship between network depth and the efficacy of dropout, extending recently developed theoretical results that indicate a gap: in deep networks, Bernoulli dropout noise is not able to prevent overfitting. Students will investigate other dropout noise models, including depth-dependent noise.

**Outcomes**: New algorithms implementing the research ideas, as well as publicly available code demonstrating and comparing the effectiveness of these algorithms.
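A minimal NumPy sketch of the noise models in question, assuming fully connected ReLU layers: standard (inverted) Bernoulli dropout, multiplicative Gaussian noise with the same mean 1 and variance p/(1-p), and a hypothetical depth-dependent schedule in which the dropout rate grows with layer index. The schedule, network sizes, and function names are illustrative assumptions, not results or code from the project.

```python
import numpy as np

rng = np.random.default_rng(4)

def bernoulli_dropout(h, p):
    """Inverted Bernoulli dropout: zero each unit with probability p, scale survivors by 1/(1-p)."""
    keep = rng.random(h.shape) >= p
    return h * keep / (1.0 - p)

def gaussian_dropout(h, p):
    """Multiplicative Gaussian noise with mean 1 and variance p/(1-p), matching Bernoulli dropout."""
    return h * rng.normal(1.0, np.sqrt(p / (1.0 - p)), size=h.shape)

def depth_dependent_rates(n_layers, p_max=0.5):
    """Hypothetical schedule: dropout rate increases linearly with depth up to p_max."""
    return p_max * np.arange(1, n_layers + 1) / n_layers

def forward(x, weights, rates, noise=bernoulli_dropout, train=True):
    """Fully connected ReLU network with a (possibly different) noise level after each layer."""
    h = x
    for W, p in zip(weights, rates):
        h = np.maximum(h @ W, 0.0)
        if train and p > 0:
            h = noise(h, p)
    return h

weights = [rng.normal(0.0, np.sqrt(2.0 / 64), size=(64, 64)) for _ in range(4)]
x = rng.normal(size=(8, 64))
out_bernoulli = forward(x, weights, depth_dependent_rates(len(weights)))
out_gaussian = forward(x, weights, depth_dependent_rates(len(weights)), noise=gaussian_dropout)
print(out_bernoulli.shape, out_gaussian.shape)
```

Matching the first two moments of the multiplicative noise isolates the effect of the noise distribution's shape, which is one way to compare noise models on equal footing.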

# Inverse modeling for scanning transmission electron microscopy

Ralph Smith and Kimberly Weems

**Prerequisites:** Probability, statistical computing.

**Outline:** A fundamental task in materials science is to understand materials at the atomic level. With this understanding, materials can be engineered to have desirable properties such as hardness and heat resistance. Scanning transmission electron microscopy (STEM) has revolutionized this process by providing direct observations of the atoms that form a crystalline material. However, these image data are noisy, so statistical inverse modeling is needed to provide more precise estimates of material properties and to quantify their uncertainty.

**Research objectives**: To use Bayesian hierarchical modeling to extract information about material properties from STEM images. In particular, students will study the distributions of defects, such as atom-column vacancies and displacements, using data-fusion methods that combine STEM images with physics models of the defects’ effects.

**Outcomes**: A case study that uses the proposed methods to quantify age distribution. Python code will be made available to supplement the STEM imaging toolbox.
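A minimal sketch of the hierarchical idea, assuming per-column displacement estimates have already been extracted from an image with known Gaussian noise. It uses a two-level normal-normal model whose hyperparameters are estimated by the method of moments (empirical Bayes), a simplification of a full Bayesian hierarchical model; all values and units are hypothetical. Pooling information across columns shrinks each noisy estimate toward the population mean.

```python
import numpy as np

rng = np.random.default_rng(5)
n_columns, mu_true, tau_true, sigma = 400, 0.05, 0.02, 0.04   # hypothetical values [nm]

theta = rng.normal(mu_true, tau_true, n_columns)   # true displacement of each atom column
y = theta + rng.normal(0.0, sigma, n_columns)      # noisy per-column estimates from the image

def normal_normal_posterior(y, sigma):
    """Empirical-Bayes normal-normal model: theta_i ~ N(mu, tau^2), y_i | theta_i ~ N(theta_i, sigma^2)."""
    mu_hat = y.mean()
    tau2_hat = max(y.var(ddof=1) - sigma**2, 0.0)   # method-of-moments estimate of tau^2
    shrink = tau2_hat / (tau2_hat + sigma**2)        # weight on each column's own measurement
    post_mean = mu_hat + shrink * (y - mu_hat)
    post_var = shrink * sigma**2                     # = tau^2 sigma^2 / (tau^2 + sigma^2)
    return post_mean, post_var, mu_hat, np.sqrt(tau2_hat)

post_mean, post_var, mu_hat, tau_hat = normal_normal_posterior(y, sigma)
mse_raw = np.mean((y - theta) ** 2)
mse_post = np.mean((post_mean - theta) ** 2)
print(f"mu_hat={mu_hat:.4f}, tau_hat={tau_hat:.4f}, MSE raw={mse_raw:.2e}, MSE shrunk={mse_post:.2e}")
```

A full Bayesian treatment would place priors on mu and tau and sample the posterior (for example with MCMC), and could add a mixture component to represent vacancies.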