# 2020 REU Projects

# Topology and fluid dynamics in vascular networks - Summer 2020

Mette Olufsen (https://olufsen.wordpress.ncsu.edu/) and Radmila Sazdanovic (https://sazdanovic.wordpress.ncsu.edu/)

__Prerequisites:__ Linear algebra, differential equations, interest in graph theory and computer programming

Metric graphs are types of metric spaces used to model and represent ubiquitous, geometric relations in data such as biological networks, social networks, and road networks. We focus on analyzing vascular networks feeding the lungs by first by providing a qualitative description of the corresponding metric graphs using topological summaries. Graph properties such as connectivity, degree, and motifs will be coupled to biologically relevant information such as vessel radius, vessel function, principal vs. minor branches, etc. and encoded in a feature vector. Topological data analysis (TDA), a novel tool robust to noise and scale, was successfully used to characterize lungs of patients with chronic obstructive pulmonary disease. TDA and machine learning will be used for extracting the consensus networks, relevant features and detecting dependencies. TDA is robust to noise and scale, will be used for extracting the consensus networks and will be used together with machine learning for extracting relevant features and detecting dependencies. These techniques will be used to analyze network data extracted from computed tomography (CT) images from patients with pulmonary hypertension. Topological results will be coupled with a 1D fluid dynamics model and used to understand how the networks remodel in altered environments, e.g. rarefaction/pruning (due to the loss of small vessels) observed in animals under hypoxiainduced hypertension. The combined approach embedding TDA with fluid dynamics simulations allows us to generate a new tool that could be employed to distinguish between controls and patients with hypertension, as well as track changes along with disease progression. This project will lay the groundwork for understanding how vascular network remodels in patients with pulmonary hypertension.

# Geometric methods in computer vision - Summer 2020

Irina Kogan (https://iakogan.math.ncsu.edu/)

__Prerequisites:__ Linear or abstract algebra, differential equations, interest in computation.

Geometric methods find native applications in computer vision, including scene analysis, medical imaging, handwriting recognition, automatics assembly, and many others. Even with significant progress, the search for new accurate, robust, and computationally efficient methods continues. A search in the IEEE publication database lists more than a thousand publications on the topic of curve matching, one of the fundamental problems in computer vision, published in the last decade. The goal of this project is to develop new efficient algorithms for curve matching, based on integral and differential invariants and their numerical approximations. In this study, we will also look for new algorithms to solve a challenging object-image correspondence problem of determining if a given planar object is an image of a given object in three-dimensional space when the position and the parameters of the camera are unknown, as well as the problem of reconstructing an object from its multiple-view images.

# Phylogenetic supertree construction via tropical geometry

Seth Sullivant (https://sethsullivant.wordpress.ncsu.edu/)

__Prerequisites__: Discrete math or abstract algebra, interest in mathematical biology

The fundamental problem in phylogenetics is to recover the evolutionary tree relating a collection of taxa (taxon is short for taxanomic unit, which could be a species, a gene, an individual organism, etc.). On large sets of taxa, it might only be possible to infer the underlying tree structure of smaller subsets of the taxa. The supertree problem is to take trees on overlapping taxa sets and piece them together into a larger tree involving all the taxa. When the trees also have metric information (i.e. the lengths of the tree edges), we would like to construct a supertree that matches this metric information as closely as possible. The BHV-tree space is quite difficult to compute in. An alternate natural space to try to perform these computations are the tropical Grassmannian or the space of ultrametic trees. Lin, Sturmfels, Tang and Yoshida showed that for computations involving convex hulls in tree space, the ultrametric tree space with the tropical metric is much simpler to compute in than the BHV tree space. The goal of this project would be to synthesize and combine the geometric perspective on supertrees with the computational tractability of the tropical geometry of metric tree space to develop new algorithms for computing supertrees. Students would learn about these methods in discrete mathematics and computational geometry and develop code in Matlab for solving problems in evolutionary biology.

# Efficient statistical estimation with shape constraints

David Papp (https://dpapp.math.ncsu.edu/)

__Prerequisites:__ Probability and algorithms

A problem in statistics is the estimation of functions from data with additional constraints on the shape of the estimated function. Popular examples include estimation of log-concave density functions, isotonic regression, and convex regression. Due to the presence of noise, the commonly used least-squares, maximum likelihood, and kernel estimators may not yield estimators with the desired shape. Instead, the estimation problems can be treated as constrained optimization problems with the shape constraints. It can be shown that for several problems of this type, the best (non-parametric) estimator is a piecewise affine function; this includes multivariate convex regression. The algorithmic consequence of this is that the best estimators can be computed by solving (very large scale) linear programs. The goal of this project is to adapt large-scale linear programming techniques to convex regression. Students will learn about the basics of statistical estimation and linear optimization and will develop code for solving convex regression problems.

# Randomized algorithms for spectral clustering

Arvind Saibaba (https://saibab.math.ncsu.edu)

__Prerequisites:__ Linear algebra, numerical analysis, and some background in computer programming.

Clustering is a popular machine learning tool for exploratory data analysis. Clustering involves segregating object collections into groups, in which the constituents of a group are more similar to each other than to those belonging to other groups. Applications of clustering involve image processing (segmenting images to distinguish multiple regions), genomics (to understand relations between genes and diseases), and social networks (to analyze and understand communities). Spectral clustering is a graph-based algorithm that has emerged as a powerful tool for clustering. A dominant cost of the spectral clustering involves computing eigenvectors of the so-called graph Laplacian, where denotes the number of clusters that is sought or desired. To ameliorate this computational cost, we propose to use randomized matrix methods to compute the approximate eigenvectors. These algorithms use randomness as a key computational tool to identify and leverage computational benefits and have recently become popular because they are easy to implement, accurate, and computationally efficient. This project can be tailored to the interests of the UG students. It requires that they (i) develop and implement the randomized matrix algorithms within a spectral clustering framework, (ii) validate this computational framework on a variety of datasets and applications such as hyperspectral image segmentation, (iii) analyze the error in the clustering assignment due to the randomization. This combination of theory and application exposes the students to a variety of topics and tools in scientific computing and data science.

# Numerical methods for hyperbolic systems of conservation and balance laws

Alina Chertock (https://chertock.wordpress.ncsu.edu/)

__Prerequisites:__ Partial differential equations, numerical analysis

Many physical phenomena can be described by nonlinear hyperbolic systems of conservation and balance laws. The main source of difficulties in solving such problems, is lack of smoothness, as solutions of hyperbolic conservation/balance laws can develop complicated nonlinear wave structures including shocks, rarefaction waves and contact discontinuities. The level of complexity increases when solutions reveal a multiscale character and/or the system includes additional terms such as friction terms, geometrical terms, nonconservative products, etc., which must be accounted for to describe the physical phenomena. In such cases, it is important to design a numerical method that is consistent with the PDEs and preserves structural and asymptotic properties of the underlying problem at the discrete level. While a variety of numerical methods exist, there are still many open problems, for which the derivation of reliable high-resolution numerical methods still remains a challenging task. During the REU, students will learn about recent advances in the development of two classes of structure preserving numerical methods: (i) well-balanced and positivity preserving numerical schemes, which are capable of exactly preserving steady-state solutions as well as maintaining the positivity of the numerical quantities when it is required by the physical application, and (ii) asymptotic preserving schemes, which provide accurate and efficient numerical solutions in certain stiff and/or asymptotic regimes of physical interest. Students will develop codes in Matlab for solving problems with application in fluid dynamics, oceanography, and atmospheric sciences.

# Numerical solutions to ODE/PDEs with applications

Zhilin Li (https://zhilin.math.ncsu.edu/)

__Prerequisites:__ Calculus, linear algebra, differential equations, interest in numerical analysis

The purpose of this project is to solve differential equations either ordinary (ODEs) or partial (PDEs) from the modeling and application part of the REU program. Our focus will be two-phase or multi-phase flow problems, free boundary and moving interface problems. In this project, we will explore ways to compute the solution and its derivative to an ODE/PDE accurately at the same time, particularly using finite difference or finite element discretization’s. Dr. Li has developed the immersed interface method (IIM), the immersed finite method (IFEM), and augmented strategies, and some ODE/PDE solver packages. For this project, we will also explore traditional and new methods for evolving free boundaries and moving interface such as the front tracking method, the level set method, and arbitrary Lagrangian approach. We will also encourage REU students to explore parallel and super-computing and carry out a theoretical analysis of the numerical methods.

# Quantum error-correcting codes

Bojko Bakalov (https://sites.google.com/a/ncsu.edu/bakalov/home)

__Prerequisites:__ Abstract algebra, introduction to discrete mathematics

Quantum computers utilize physical principles, e.g. quantum superposition and quantum entanglement that cannot be implemented using classical computers. For certain problems, such computers are more powerful than conventional computers (14). NCSU is the first university-based IBM Q Hub in North America and has access to the IBM Q commercial quantum computing devices including the most advanced 20 qubit system to be followed by a next-generation 50 qubit prototype anticipated in 2019. Error-correcting codes are widely used in satellite and telecommunications, DVDs, barcodes, etc. Quantum computers will also need error correction because quantum states are very fragile; errors may arise from faulty gates, leakage, loss, environmental noise, and decoherence. Quantum error correction utilizes classical coding theory, as well as linear algebra, group theory, finite fields, and topology. Students participating in this project will learn about coding theory and quantum computing. The goal is to construct new quantum codes using existing methods and to develop new methods and generalizations.

# Comparing stochastic and random differential equations to study tumor growth variability

Kevin Flores (https://kbflores.wordpress.ncsu.edu/)

__Prerequisites:__ Differential equations, probability

Vector affine systems of stochastic differential equations (SDEs) are a class of mathematical models that have pointwise equivalence with a corresponding set of random differential equations (RDEs). This equivalence is important because there exists a functional analytic framework, called the Prohorov metric framework, that can be used to non-parametrically estimate parameters of RDEs by treating each parameter as a random variable. Thereby, estimation for the parameters of Fokker-Planck equations describing the time evolution of the probability density function for SDEs can be performed within an RDE framework as long as an equivalent RDE system exists. The goal of this project is to derive equivalent RDEs for several biologically relevant nonlinear SDEs. Students will investigate the accuracy in using stochastic versions of several commonly used ordinary differential equations, i.e. nonlinear SDEs, for modeling the variability observed from *in vitro*tumor cell line data. Students will derive Fokker-Planck equations from the SDEs and then, using Ito calculus, transform these into affine SDEs. Pointwise equivalent RDEs will then be derived, and parameter estimation will be performed within the Prohorov metric framework. This interdisciplinary project will train students in the areas of stochastic processes, functional analysis, statistics, partial differential equations, mathematical biology, and parameter estimation.

# Financial mathematics, optimization and applications

Tao Pang (https://math.sciences.ncsu.edu/people/tpang/) and Negash Medhin (https://math.sciences.ncsu.edu/people/ngmedhin/)

__Prerequisites:__ Probability, optimization, interest in computing.

In financial markets, portfolio allocation is one of the major problems faced by an investor. The investor’s goal is to find an optimal combination to distribute a given fund on a set of existing assets minimizing risk and maximizing gain. The risk can be measured by variance, or by other measures such as Value-at-Risk (VaR) and Conditional Value-at-Risk (CVaR). CVaR is a convex risk measure that better describes the investor’s risk of loss than variance. Financial regulators now require banks to use CVaR as the appropriate risk measure. For example, the Black-Litterman type of portfolio optimization problem over a single period with CVaR as risk measure is solved. One goal of this project is to extend these results to multiple period portfolio optimization problems. Moreover, to estimate the CVaR from the market data is also a very challenging problem. Another goal of this project is to develop some efficient CVaR estimation methods to deal with multiple assets with correlation. Further, the portfolio optimization problem becomes computationally expensive when there are more than a few hundreds of assets to select from, which is common in practice. This can be formulated as a stochastic programming problem. The third goal of this project is to develop efficient methods to handle large portfolios using tools from multi-objective and game-theoretic stochastic optimization, and Monte Carlo computations.