I am broadly interested in applied probability, with applications in mathematical biology, population genetics, and stochastic networks. Below I describe a few of my main research areas and projects within those areas:

- Rare events in Markov processes
- Stochastic processes in cancer biology
- Mathematical models of cancer stem cells

### Rare events in Markov processes

One main area of interest is rare event analysis and the design of optimal simulation techniques for estimating rare event probabilities. Rare event estimation problems arise in many applications, including stochastic queueing networks (arising in communication and manufacturing models), risk and reliability analysis, and problems in evolutionary biology and ecology. Large deviations theory characterizes both the asymptotic likelihood of a rare event and the asymptotically most likely way in which the event occurs. I have been interested in large deviations analysis of Markov processes and in designing importance sampling algorithms via game-theoretic approaches.
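As a standard illustration of both statements (recalled for reference, not a result of this work), Cramér's theorem describes sums $S_n = X_1+\dots+X_n$ of i.i.d. light-tailed random variables; regularity conditions (a moment generating function that is finite near the origin, and a set $A$ whose interior and closure give the same infimum) are stated only informally here:

$$
\lim_{n\to\infty}\frac{1}{n}\log P(S_n\in nA) = -\inf_{x\in A} I(x),
\qquad
I(x)=\sup_{\theta\in\mathbb{R}}\bigl\{\theta x-\log E\bigl[e^{\theta X_1}\bigr]\bigr\}.
$$

When the minimizer $x^*$ of $I$ over $A$ is unique, $S_n/n$ concentrates near $x^*$ given the event, which is the "most likely way" the event occurs. For an unbiased estimator $Z_n$ of $p_n = P(S_n\in nA)$, the two efficiency notions used throughout this page are

$$
\text{bounded relative error: } \sup_{n}\frac{E[Z_n^2]}{p_n^2}<\infty,
\qquad
\text{logarithmic efficiency: } \liminf_{n\to\infty}\frac{\log E[Z_n^2]}{2\log p_n}\ge 1.
$$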

**Splitting algorithms for stochastic networks.** Splitting methods generate multiple particles that follow the original dynamics of the system; as a particle moves toward a target set of interest, it crosses splitting thresholds at which it is split into several identical offspring. This sequential Monte Carlo method is useful for estimating the probability that a stochastic process hits a rare set. I have been interested in the efficiency of splitting methods for estimating overflow probabilities in Jackson networks, a broad class of stochastic queueing networks. In our work, we established precise upper bounds on the performance of splitting estimators; these bounds enable a comparison between splitting algorithms and alternatives such as importance sampling or solving the associated linear system. We proved that well-designed splitting algorithms are less computationally intensive than directly solving the associated linear system, and that for a tandem Jackson network splitting outperforms importance sampling in certain cases. This result provides theoretical support for the widely held belief that the performance of splitting algorithms is largely robust to the dimension of the problem. PDF (w/ J. Blanchet and Y. Shi)
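The splitting mechanism described above can be sketched for a $d$-station tandem Jackson network as follows. This is a minimal fixed-splitting sketch under stated assumptions, not the estimator analyzed in the paper: the thresholds (one per level of total population), the splitting factor `R`, the uniformized jump chain, and the name `split_tandem_overflow` are all illustrative choices. It estimates the probability that the total population reaches `n` before the system empties, starting from one customer at the first station.

```python
import random


def split_tandem_overflow(lam, mus, n, n_roots, R=2, seed=0):
    """Fixed-splitting estimate of P(total population hits n before 0) for a
    tandem Jackson network started with one customer at station 0.

    lam: external arrival rate to station 0; mus[i]: service rate at station i.
    A particle is split into R copies the first time it reaches its next
    population level (2, 3, ..., n-1); each copy carries 1/R of its parent's
    weight, which keeps the estimator unbiased.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    rng = random.Random(seed)
    d = len(mus)
    total_rate = lam + sum(mus)  # uniformization constant
    acc = 0.0  # accumulated weight of particles that reached level n
    for _ in range(n_roots):
        # Each particle: (queue lengths, next threshold level, weight).
        stack = [([1] + [0] * (d - 1), 2, 1.0)]
        while stack:
            q, level, w = stack.pop()
            total = sum(q)
            # Run the uniformized jump chain until the system empties or the
            # particle reaches its next threshold (total changes by at most 1).
            while 0 < total < level:
                u = rng.random() * total_rate
                if u < lam:
                    q[0] += 1  # external arrival
                    total += 1
                    continue
                u -= lam
                i = 0
                while i < d - 1 and u >= mus[i]:
                    u -= mus[i]
                    i += 1
                if q[i] > 0:  # service at an empty station is a self-loop
                    q[i] -= 1
                    if i + 1 < d:
                        q[i + 1] += 1  # customer moves to the next station
                    else:
                        total -= 1  # customer leaves the network
            if total == 0:
                continue  # particle killed
            if level == n:
                acc += w  # target level reached
            else:
                for _ in range(R):
                    stack.append((q.copy(), level + 1, w / R))
    return acc / n_roots
```

For a single queue (`lam=1.0`, `mus=[2.0]`) the gambler's ruin formula gives the exact value $1/(2^{n}-1)$, which is a convenient check; for a genuine tandem network one would call, e.g., `split_tandem_overflow(1.0, [2.0, 3.0], n=20, n_roots=5000)`. Choosing `R` close to the reciprocal of the per-level success probability keeps the number of particles roughly stable from level to level.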

**Rare events in random walks.** I have been interested in estimating large deviations probabilities of random walks, $P(S_n\in nA)$, where $S_n$ is the partial sum of $n$ i.i.d. random variables whose mean does not lie in the closure of the set $A$. In this work we developed an importance sampling change of measure for estimating this probability based on the solution to an Isaacs equation, a PDE obtained in the large deviations limit via a variational formula. Under this change of measure, we proved that the relative error of the estimator remains bounded as $n$ increases to infinity. This result applies to almost all light-tailed one-dimensional random walks, and also to a certain class of multidimensional random walks.
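For comparison, the classical state-independent alternative is exponential tilting: draw the increments from the twisted distribution whose mean is the dominating point of $A$ and reweight by the likelihood ratio. The sketch below does this for Bernoulli($p$) increments and $A=[a,1]$ with $a>p$, where the tilted law is simply Bernoulli($a$); the function names are illustrative. It is not the Isaacs-equation-based scheme described above, and its relative error is known to grow slowly with $n$ rather than remain bounded.

```python
import math
import random


def tilted_is_estimate(n, p, a, n_samples, seed=0):
    """Importance sampling estimate of P(S_n >= n a) for S_n ~ Binomial(n, p),
    using exponential tilting: increments are drawn from Bernoulli(a)."""
    rng = random.Random(seed)
    k_min = math.ceil(n * a)  # smallest number of successes with S_n >= n a
    log_up = math.log(p / a)  # log-likelihood ratio of a success
    log_down = math.log((1 - p) / (1 - a))  # log-likelihood ratio of a failure
    acc = 0.0
    for _ in range(n_samples):
        k = sum(rng.random() < a for _ in range(n))  # successes under tilting
        if k >= k_min:
            acc += math.exp(k * log_up + (n - k) * log_down)
    return acc / n_samples


def exact_tail(n, p, a):
    """Exact binomial tail P(S_n >= n a), for comparison."""
    k_min = math.ceil(n * a)
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))
```

For instance, `tilted_is_estimate(50, 0.2, 0.5, 20000)` can be compared directly with `exact_tail(50, 0.2, 0.5)`, a probability of order $10^{-6}$ that crude Monte Carlo with the same budget would almost always estimate as zero.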

"Strongly efficient algorithms for light-tailed random walks: An old folk song sung to a faster new tune..." (with J. Blanchet and P. Glynn) Monte Carlo and Quasi Monte Carlo Methods (2009).

**Importance sampling for sums of heavy-tailed random variables.** In this work we developed the first importance sampling estimator with bounded relative error for heavy-tailed problems. The change of measure introduced in this paper is now widely used for rare event problems involving heavy-tailed phenomena.
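Exponential tilting cannot be applied directly to heavy tails, because the moment generating function is infinite, and the state-dependent change of measure of the paper is not reproduced here. As a swapped-in illustration of bounded relative error in this setting, the sketch below implements the well-known Asmussen-Kroese conditional Monte Carlo estimator of $P(S_n>x)$, a different technique from the paper's, based on the identity $P(S_n>x)=n\,P(S_n>x,\ X_n=\max_i X_i)$ for i.i.d. continuous summands. The helper names are illustrative.

```python
import math
import random


def asmussen_kroese(n, x, sample, tail, n_samples, seed=0):
    """Conditional Monte Carlo estimate of P(X_1 + ... + X_n > x) for i.i.d.
    continuous summands, given a sampler `sample(rng)` and the tail function
    `tail(t) = P(X_1 > t)`.

    Uses P(S_n > x) = n * P(S_n > x, X_n is the largest summand) and integrates
    out X_n: each sample contributes n * tail(max(M, x - S)), where M and S are
    the maximum and the sum of X_1, ..., X_{n-1}.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_samples):
        xs = [sample(rng) for _ in range(n - 1)]
        acc += n * tail(max(max(xs), x - sum(xs)))
    return acc / n_samples


def pareto_tail(alpha):
    """Tail of a Pareto(alpha) variable with minimum 1: P(X > t) = t**(-alpha)."""
    return lambda t: 1.0 if t < 1.0 else t ** (-alpha)


def exponential_tail(rate):
    """Tail of an Exponential(rate) variable, handy for sanity checks."""
    return lambda t: 1.0 if t < 0.0 else math.exp(-rate * t)
```

With Pareto summands, e.g. `asmussen_kroese(5, 1000.0, lambda r: r.paretovariate(1.5), pareto_tail(1.5), 20000)`, the estimate should be close to the one-big-jump approximation $5\cdot 1000^{-1.5}$, and the sample-to-sample spread stays small relative to the mean as $x$ grows.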

"Importance sampling for sums of random variables with regularly varying tails." (with P. Dupuis and H. Wang) ACM Transactions on Modeling and Computer Simulation (2007).

**Serve-the-longest-queue.** The weighted-serve-the-longest-queue (WSLQ) policy is a common and intuitive service discipline for a single server shared by multiple queues: customers arrive at each queue according to a Poisson process, and the server works on the queue with the largest weighted queue length. Because the arrival or service completion of a single customer can change which queue is served, the system has multiple interfaces of discontinuity in the interior of the state space, which makes establishing a large deviations principle challenging. In our initial work we established a sample path large deviations principle for the WSLQ system, without any symmetry conditions or dimension restrictions. In a follow-up work we used this sample path result to characterize the explicit exponential decay rate of the buffer overflow probabilities in the WSLQ system, using subsolutions and supersolutions of a Hamilton-Jacobi equation associated with the rare event probability. We also constructed an asymptotically optimal importance sampling estimator. This estimator was derived by formulating the search for an importance sampling measure as a two-person differential game, and then building the sampling measure from subsolutions to an Isaacs equation associated with an asymptotic limit of the game. PDF1 (w/ P. Dupuis and H. Wang), PDF2 (w/ P. Dupuis and H. Wang)

**Server slowdown.** I have also studied buffer overflow events in a tandem network (i.e., two queues in series) with server slowdown: when the length of the second queue reaches a predetermined threshold, the service rate of the first server decreases. This is a commonly used strategy in manufacturing and Ethernet design. As in the WSLQ system, this problem has discontinuous statistics in the interior of the domain. In this work we proved a large deviations result for the probability that the second buffer overflows, and we derived an asymptotically efficient importance sampling estimator for this probability. As for the WSLQ system, the problem was solved using the PDE and game-theoretic approach to large deviations and importance sampling. PDF (w/ P. Dupuis and H. Wang)

**Stochastic recurrence equations.** Stochastic recurrence equations are powerful models that are widely used in finance, insurance, and biology; for example, they can model populations undergoing stochastic growth with immigration or emigration. I have been interested in developing rare event simulation algorithms for stochastic recurrence equations with heavy-tailed innovations. In this work, we designed importance sampling algorithms that estimate the associated rare event probabilities with bounded relative error.
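The underlying dynamics can be simulated directly. The minimal sketch below assumes the standard affine form $X_{k+1}=A_{k+1}X_k+B_{k+1}$ and uses plain crude Monte Carlo, not the importance sampling algorithm of the paper; the function names, and the lognormal multipliers and Pareto innovations in the usage note, are illustrative assumptions. Crude estimates of exceedance probabilities degrade as the level grows, which is exactly the regime where importance sampling is needed.

```python
import random


def sre_path(x0, steps, sample_a, sample_b, rng):
    """Simulate X_{k+1} = A_{k+1} X_k + B_{k+1} for `steps` steps from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(sample_a(rng) * xs[-1] + sample_b(rng))
    return xs


def crude_exceedance(x0, steps, level, sample_a, sample_b, n_paths, seed=0):
    """Crude Monte Carlo estimate of P(max_k X_k > level) over the horizon.
    Its relative error blows up as `level` grows."""
    rng = random.Random(seed)
    hits = sum(
        max(sre_path(x0, steps, sample_a, sample_b, rng)) > level
        for _ in range(n_paths)
    )
    return hits / n_paths
```

For example, `crude_exceedance(1.0, 200, 1e3, lambda r: r.lognormvariate(-0.5, 0.5), lambda r: r.paretovariate(1.5), 10000)` estimates the probability that a path exceeds $10^3$ within 200 steps. Here $E[\log A]=-0.5<0$, which together with a mild moment condition on $B$ is a standard sufficient condition for the recursion to be stable.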

"Importance sampling for stochastic recurrence equations with heavy tailed innovations" (w/ J. Blanchet and H. Hult), please email me for preprint.

**Lyapunov inequalities and subsolutions for efficient importance sampling.** I have also been interested in connections between Lyapunov methods and subsolutions of an associated Isaacs equation for the design of efficient importance sampling schemes. Subsolutions, which arise by taking an appropriate limit of an associated Lyapunov inequality, were proposed relatively recently and have been successfully applied to many challenging problems in rare event simulation. Lyapunov inequalities have been used to test the efficiency of state-dependent importance sampling schemes in heavy-tailed or discrete settings. While subsolutions provide a powerful and versatile vehicle for constructing efficient samplers through an analytic criterion, Lyapunov inequalities give more precise information on the behavior of the coefficient of variation of the associated importance sampling estimator. Lyapunov inequalities also provide insight into the mollification procedures that are often required to construct the associated subsolutions. In this work, we showed that for a class of first passage time problems for multidimensional Markov random walks, a suitable choice of the mollification parameter yields a strongly efficient estimator (in the sense that the coefficient of variation remains bounded as the probability of interest decreases). We also showed that, with an appropriate mollification scheme, the sampler proposed in (Dupuis et al 2007) for the overflow probability at level $n$ of the total population of a Markovian tandem network in a busy period has a coefficient of variation that grows at most at rate $d+2-(\#\text{ of bottlenecks})$ as $n\nearrow\infty$. PDF (w/ J. Blanchet and P. Glynn)
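The Lyapunov criterion can be written, in one standard and informally stated form for a discrete-time Markov chain, as follows. The notation is assumed here: $p(x,y)$ is the original transition kernel, $q(x,y)$ the sampling kernel (positive wherever $p$ is), $T$ the target set, $F$ the set where a run stops without reaching $T$, and $V(x)=E^q_x[Z^2]$ the second moment of the importance sampling estimator started from $x$. By first-step analysis, $V$ is the minimal nonnegative solution of

$$
V(x)=\sum_{y}\frac{p(x,y)^2}{q(x,y)}\,V(y)\quad (x\notin T\cup F),\qquad V=1 \text{ on } T,\qquad V=0 \text{ on } F,
$$

so any function $g\ge 0$ with $g\ge 1$ on $T$ that satisfies the Lyapunov inequality

$$
g(x)\ \ge\ \sum_{y}\frac{p(x,y)^2}{q(x,y)}\,g(y)\qquad (x\notin T\cup F)
$$

dominates the second moment, $V\le g$. If, along a family of problems with probabilities $u_n(x)\to 0$, one finds such $g_n$ with $\sup_n g_n(x)/u_n(x)^2<\infty$, the estimator is strongly efficient; the subsolution approach can be viewed as producing such a $g$ in a large deviations limit.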

### Stochastic processes in cancer biology

Many interesting mathematical problems arise from the study of evolutionary processes and population genetics. I have been particularly interested in applying mathematical models to study the evolutionary processes at the cellular level that occur during the initiation, progression, and treatment of cancer.

**Evolutionary dynamics of tumorigenesis with random mutational fitness advances.** During cancer progression, neoplasms (abnormal growths) arise through the accumulation of rare events, such as point mutations in the DNA sequence, epigenetic alterations, or chromosomal changes, among the somatic cells of the body. These cell populations evolve by natural selection, mediated by heritable variability within the population and fitness differences between cell types. To study the effects of random mutation accumulation in tumorigenesis, my collaborators and I studied a stochastic model of an expanding population undergoing random mutational fitness advances. Specifically, we studied a birth-death process with mutations in which each mutant offspring receives a new birth/death rate equal to its parent cell's birth/death rate plus a random variable $X$. We analyzed this process by dividing the population into subpopulations according to the number of mutations each cell carries; we call these subpopulations "waves". We studied the effect of the distribution of $X$ on the growth rate of the population and on the diversity properties of the system. We established that, as $t\to\infty$, the properly scaled size of wave $k$ ($k\geq 1$) converges weakly to a non-degenerate random variable, which we identify via its Laplace transform. In addition, we showed that the weak limit of wave 1 is the sum of the points of a nonhomogeneous Poisson process. Importantly, we observed that the limiting random variables are largely robust to the distribution of $X$, assuming $X$ has bounded support; in fact, the only property of $X$ with a significant impact is the upper limit of its support.
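The model above can be explored with a Gillespie simulation that tracks clones and groups them into waves. This is a minimal sketch, not the analysis of the paper; in particular it assumes that each increment $X$ shifts the birth rate while the death rate stays fixed, and the names `simulate_waves`, `wave_sizes`, and `simpson_index` are illustrative. The last helper computes Simpson's index (the probability that two cells drawn with replacement belong to the same clone), one of the diversity measures mentioned below.

```python
import random


def simulate_waves(n0, birth0, death, u, sample_x, t_max, cap, seed=0):
    """Gillespie simulation of a birth-death process with random fitness advances.

    Clones are stored as [wave, birth_rate, size]. At a birth, with probability u
    the offspring founds a new clone in the next wave whose birth rate is the
    parent's plus X = sample_x(rng) (assumption: X shifts the birth rate and the
    death rate stays fixed). Stops at time t_max, when the population reaches
    cap, or at extinction.
    """
    rng = random.Random(seed)
    clones = [[0, birth0, n0]]
    t, pop = 0.0, n0
    while 0 < pop < cap:
        rates = [size * (b + death) for _, b, size in clones]
        t += rng.expovariate(sum(rates))
        if t > t_max:
            break
        j = rng.choices(range(len(clones)), weights=rates)[0]
        wave, b, _ = clones[j]
        if rng.random() < b / (b + death):  # birth
            if rng.random() < u:
                clones.append([wave + 1, b + sample_x(rng), 1])  # new mutant clone
            else:
                clones[j][2] += 1
            pop += 1
        else:  # death
            clones[j][2] -= 1
            pop -= 1
            if clones[j][2] == 0:
                clones.pop(j)  # drop extinct clones
    return clones


def wave_sizes(clones):
    """Total number of cells in each wave (number of mutations carried)."""
    sizes = {}
    for wave, _, size in clones:
        sizes[wave] = sizes.get(wave, 0) + size
    return sizes


def simpson_index(sizes):
    """Probability that two cells drawn with replacement belong to the same clone."""
    total = sum(sizes)
    return sum((s / total) ** 2 for s in sizes)
```

For example, the wave-1 clone sizes `[s for w, _, s in simulate_waves(100, 1.0, 0.5, 0.01, lambda r: r.uniform(0.0, 0.1), 50.0, 5000) if w == 1]` can be passed to `simpson_index` (when the list is nonempty). The bounded uniform law for $X$ is only a placeholder, chosen to match the bounded-support assumption in the text.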

"Evolutionary dynamics of tumor progression with random fitness values" (w/ R. Durrett, J. Foo, J. Mayberry, and F. Michor) Theoretical Population Biology (2010)

**Mathematical models of tumor heterogeneity.** All cancers display significant variability among the cancer cells within a single tumor. This heterogeneity has direct clinical implications for disease classification and prognosis, as well as for treatment efficacy and drug target identification. For example, the degree of genetic clonal diversity in Barrett's esophagus has been correlated with clinical progression to esophageal cancer (Maley et al 2006, Nature); similar phenomena have been observed in several other tumor types. In this paper we used the previous result, which characterizes the limit of wave 1 as the sum of the points of a nonhomogeneous Poisson process, to study the diversity properties of the first wave of mutants. We characterized diversity measures such as Simpson's index and the ratio of the largest clone to the first $n$ clones in size. We also analyzed the contribution of each wave of mutants in the small mutation limit. This work contributes to a mathematical understanding of diversity in tumors and gives insight into the impact of various evolutionary parameters on tumor diversity. PDF (w/ R. Durrett, J. Foo, J. Mayberry, and F. Michor)

**Stochastic dynamics of cancer initiation.** Most human cancer types result from the accumulation of multiple genetic and epigenetic alterations in a single cell. Once the first change (or changes) have arisen, tumorigenesis is initiated, and the subsequent emergence of additional alterations drives progression to more aggressive and ultimately invasive phenotypes. Elucidating the dynamics of cancer initiation is important for understanding tumor evolution and cancer incidence data. In this paper, we developed a novel mathematical framework to study the process of cancer initiation. Cells at risk of accumulating oncogenic mutations are organized into small compartments and proliferate according to a stochastic process. During each cell division, an (epi)genetic alteration may arise that leads to a random fitness change drawn from a probability distribution. Cancer is initiated when a cell gains a fitness high enough to escape the homeostatic mechanisms of its compartment. To investigate cancer initiation during a human lifetime, we consider a "race" between this fitness (Moran) process and the aging process of the patient; the latter is modeled as a second Markov process in an aging dimension. This model allows us to investigate the dynamics of cancer initiation and its dependence on the mutational fitness distribution. Our framework also provides a methodology for assessing the effects of different life expectancy distributions on lifetime cancer incidence.
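A stripped-down version of this race can be sketched as follows. Everything here is an illustrative assumption rather than the paper's specification: a single compartment of `N` cells evolving by a Moran process, fitness increments drawn from `sample_x`, initiation once some cell's fitness reaches `threshold`, and the aging process replaced by a random lifespan, measured in Moran steps and drawn from `sample_lifespan`.

```python
import random


def moran_race(N, u, sample_x, threshold, sample_lifespan, seed=0):
    """Race between a Moran fitness process in one compartment and a lifespan.

    Each Moran step: a cell is chosen to divide with probability proportional
    to fitness, and its offspring replaces a uniformly chosen cell. With
    probability u the offspring's fitness is the parent's plus sample_x(rng).
    Returns the step at which some cell first reaches `threshold` (initiation),
    or None if the lifespan (in Moran steps) runs out first.
    """
    rng = random.Random(seed)
    fitness = [1.0] * N  # all cells start with fitness 1
    lifespan = sample_lifespan(rng)
    cells = range(N)
    for step in range(lifespan):
        if max(fitness) >= threshold:
            return step
        parent = rng.choices(cells, weights=fitness)[0]
        child = fitness[parent]
        if rng.random() < u:
            # mutation; fitness is kept strictly positive so weights stay valid
            child = max(child + sample_x(rng), 1e-9)
        fitness[rng.randrange(N)] = child
    return lifespan if max(fitness) >= threshold else None
```

Repeating the call over many seeds with, e.g., `sample_lifespan=lambda r: int(r.expovariate(1 / 5000))` gives a Monte Carlo estimate of the lifetime probability of initiation, and swapping the lifespan sampler mimics comparing different life expectancy distributions.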

"Stochastic dynamics of cancer initiation" (w/ J. Foo and F. Michor), Physical Biology (2010)

**Evolution of drug resistance.** The development of drug resistance in cells or pathogens often occurs due to random genetic or epigenetic events that arise during reproduction. I have been interested in using multitype branching process models to study the evolution of drug resistance in the context of anti-cancer therapies. Chronic myeloid leukemia (CML) is one of the rare success stories of cancer research: a targeted therapy (Gleevec) was developed in the 1990s and has been largely successful in inducing remission. However, many patients develop leukemic cells that are resistant to this drug; in fact, more than 90 different point mutations that confer resistance to Gleevec have been identified. Using experimental results quantifying the growth kinetics of wild-type CML cells and resistant mutants, we designed a mathematical model to predict the number of resistant cells present in patients at diagnosis and the number of distinct mutant types present. We model the wild-type CML cell population as a supercritical branching process and assume that 11 resistant mutant strains arrive according to a nonhomogeneous Poisson process whose rate at time $t$ is proportional to the wild-type population at time $t$. Each newly arrived mutant founds a binary Markov branching process. Using this model we determined that in most patients there will be at most a single resistant cell type at detection. In addition, we characterized the relative benefits of monotherapy with imatinib (Gleevec) versus combination therapy with imatinib and second-generation inhibitors, and how these benefits depend on early versus late diagnosis.
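The structure of this model can be sketched with a simulation of the embedded jump chain in which each wild-type division produces a strain-$i$ resistant mutant with probability $u_i$, so that strain-$i$ mutants arrive at rate $u_i b_0 W(t)$, proportional to the current wild-type population $W(t)$. The function name, the parameters, and the stopping rule (run until the wild type reaches a detection size `M`) are illustrative assumptions, not the fitted values or exact formulation of the paper.

```python
import random


def resistance_at_detection(M, b0, d0, strains, seed=0):
    """Embedded-chain simulation of wild-type CML cells and resistant mutants,
    run until the wild-type population reaches M cells.

    strains: list of (u_i, b_i, d_i), i.e. the probability that a wild-type
    division produces a strain-i mutant, and that strain's birth and death rates.
    Returns (total resistant cells, number of strains present) at detection,
    or None if the wild-type population goes extinct first.
    """
    rng = random.Random(seed)
    w = 1  # wild-type cells
    mutants = [0] * len(strains)
    u_weights = [u for u, _, _ in strains]
    u_total = sum(u_weights)
    events = range(2 + len(strains))
    while 0 < w < M:
        # event 0: wild-type division, event 1: wild-type death,
        # event 2 + i: birth or death in strain i (binary branching)
        rates = [w * b0, w * d0] + [n * (b + d) for n, (_, b, d) in zip(mutants, strains)]
        e = rng.choices(events, weights=rates)[0]
        if e == 0:
            if rng.random() < u_total:
                # one daughter is resistant; the wild-type count is unchanged
                i = rng.choices(range(len(strains)), weights=u_weights)[0]
                mutants[i] += 1
            else:
                w += 1
        elif e == 1:
            w -= 1
        else:
            i = e - 2
            _, b, d = strains[i]
            mutants[i] += 1 if rng.random() < b / (b + d) else -1
    if w == 0:
        return None
    return sum(mutants), sum(1 for n in mutants if n > 0)
```

Repeating the call over many seeds (discarding `None` outcomes) and tabulating the second returned value gives a Monte Carlo estimate of how often more than one resistant strain is present at detection.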

"Diversity in pre-existing resistance to BCR-ABL inhibitors in chronic myeloid leukemia," (w/ J. Foo, B. Skaggs, M.E. Gorre, C. Sawyers, and F. Michor), please email me for preprint.

### Mathematical models of cancer stem cells

**Plasticity of the cancer stem cell phenotype.** During normal development, differentiation from stem cells to their final products is unidirectional. Some data suggest that oncogenic mutations lead to a loss of the ability of cells to maintain their differentiated state. Dedifferentiation of tumor progenitor cells can repopulate the cancer stem cell pool and may therefore counteract the effects of treatment. In this work we developed a novel mathematical framework to investigate the consequences of tumor cell dedifferentiation for the response to treatment and the risk of resistance. We found that the ability to dedifferentiate substantially reduces the effectiveness of therapy directed at cancer stem cells by leading to higher rates of resistance, and we conclude that plasticity of the cancer stem cell phenotype is an important determinant of tumor prognosis.
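The repopulation mechanism can be illustrated with a deliberately simplified deterministic toy, a hypothetical two-compartment ODE that is not the framework of the paper and contains no resistance mutations: stem cells $S$ renew at net rate $r$, are killed by therapy at rate $k$, and differentiate at rate $a$; differentiated cells $D$ die at rate $d$ and dedifferentiate back into stem cells at rate $\delta$, so that $S' = (r-k-a)S + \delta D$ and $D' = aS - (d+\delta)D$. Because the $\delta D$ term is nonnegative, the stem cell pool with dedifferentiation never falls below the pool without it, $S(0)e^{(r-k-a)t}$. The sketch integrates the system with explicit Euler steps; the step size `h` is an arbitrary choice.

```python
def stem_cell_pool(S0, D0, r, k, a, d, delta, T, h=1e-3):
    """Explicit Euler integration of the toy model
        S' = (r - k - a) * S + delta * D,   D' = a * S - (d + delta) * D
    from (S0, D0) up to time T. Returns (S(T), D(T))."""
    S, D = S0, D0
    for _ in range(int(round(T / h))):
        dS = (r - k - a) * S + delta * D
        dD = a * S - (d + delta) * D
        S, D = S + h * dS, D + h * dD
    return S, D
```

Comparing `delta=0.0` with `delta > 0` at otherwise identical parameters shows the slower decline of the stem cell pool under therapy when differentiated cells can dedifferentiate.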

"Therapeutic implications of plasticity of the cancer stem cell phenotype," (w/E. Holland and F. Michor),