## February 19, 2012

### Talks Next Week

Attention conservation notice: Only of interest if you (1) like hearing people talk about statistics and machine learning, and (2) will be in Pittsburgh next week.

Mark Davenport, "To Adapt or Not To Adapt: The Power and Limits of Adaptivity for Sparse Estimation"
Abstract: In recent years, the fields of signal processing, statistical inference, and machine learning have come under mounting pressure to accommodate massive amounts of increasingly high-dimensional data. Despite extraordinary advances in computational power, the data produced in application areas such as imaging, remote surveillance, meteorology, genomics, and large scale network analysis continues to pose a number of challenges. Fortunately, in many cases these high-dimensional signals contain relatively little information compared to their ambient dimensionality. For example, signals can often be well-approximated as sparse in a known basis, as a matrix having low rank, or using a low-dimensional manifold or parametric model. Exploiting this structure is critical to any effort to extract information from such data.
In this talk I will overview some of my recent research on how to exploit such models to recover high-dimensional signals from as few observations as possible. Specifically, I will primarily focus on the problem of estimating a sparse vector from a small number of noisy measurements. To begin, I will consider the case where the measurements are acquired in a nonadaptive fashion. I will establish a lower bound on the minimax mean-squared error of the recovered vector which very nearly matches the performance of $\ell_1$-minimization techniques, and hence shows that these techniques are essentially optimal. I will then consider the case where the measurements are acquired sequentially in an adaptive manner. I will prove a lower bound that shows that, surprisingly, adaptivity does not allow for substantial improvement over standard nonadaptive techniques in terms of the minimax MSE. Nonetheless, I will also show that there are important regimes where the benefits of adaptivity are clear and overwhelming.
Time and place: 4--5 pm on Monday, 20 February 2012, in Scaife Hall 125

Ambuj Tewari, "From Probabilistic to Game Theoretic Foundations for Learning and Prediction"
Abstract: The probabilistic approach to prediction problems assumes that the data is generated from an underlying stochastic process. A reasonable goal then is to minimize the expected loss, or risk. The game theoretic approach, in contrast, views prediction as a repeated game between the learner and an adversary. The learner's goal then is to do well no matter what strategy is followed by the adversary. Minimizing regret is one of the well known ways to operationalize the notion of doing well. With a long history in varied disciplines such as Computer Science, Economics, Information Theory, and Statistics, the game theoretic approach has witnessed a vigorous development. Yet the suite of standard tools available for the probabilistic setting, such as Rademacher & Gaussian averages, covering numbers, and combinatorial dimensions, was missing in the game theoretic setting. In this talk, I will show how it is indeed possible to develop analogues of these tools for the game theoretic setting. Unlike the probabilistic setting, where empirical risk minimization is a canonical algorithm, we will not be able to exhibit a corresponding canonical algorithm for the game theoretic setting. However, under the additional assumption of convexity, I will show that Mirror Descent, a classic algorithm from optimization theory, is a canonical algorithm achieving minimax regret rates.
(Talk is based on papers written jointly with Alexander Rakhlin, Nathan Srebro, and Karthik Sridharan.)
Time and place: 10--11 am on Wednesday, 22 February 2012, in Gates Hall 6115

Forrest W. Crawford, "Birth, Death, Sex, Lies: Markov Counting Processes in Genetics and Beyond"
Abstract: A general birth-death process (BDP) is a continuous-time Markov chain that counts the number of particles in a system over time. At any moment in time, a particle may give birth or die, and the rate at which these events occur depends on the number of particles in the system at that time. While widely used in population biology, genetics, and evolution, statistical inference techniques for general BDPs remain elusive. In fact, the likelihood of a discrete observation from many of these processes cannot be written in closed form. In this talk, I outline several fundamental results that allow computation of transition probabilities and maximum likelihood estimates for general BDPs. I apply these novel methods to three important applied problems. First, I describe a technique for determining the effect of antibody treatment on the growth of lymphoma cells in vitro. Second, I investigate the evolution of DNA microsatellites in humans and chimpanzees using a log-linear model for the rates of repeat duplication and deletion. Finally, I use a BDP to infer true counts of sex acts from rounded self-reported counts in a longitudinal study of risky behaviors in young people living with HIV. These applications illustrate the mathematical, statistical, and computational challenges involved in learning from BDPs in biology, medicine, and public health.
Time and place: 4--5 pm on Wednesday, 22 February 2012, in Scaife Hall 125

Ron Bekkerman, "Scaling Up Machine Learning"
Abstract: In this talk, I'll provide an extensive introduction to parallel and distributed machine learning. I'll answer the questions "How actually big is the big data?", "How much training data is enough?", "What do we do if we don't have enough training data?", "What are platform choices for parallel learning?" etc. Over an example of k-means clustering, I'll discuss pros and cons of machine learning in Apache Pig, MPI, DryadLINQ, and CUDA. Time permitting, I'll take a dive into a super large scale text categorization task.
Time and place: 1:30--2:30 pm on Thursday, 23 February 2012, in Newell-Simon Hall 1305

As always, the talks are free and open to the public.

(You see why I have trouble keeping up with these.)

Posted at February 19, 2012 12:30