Lorentz Center - Online Algorithms and Learning from 17 Nov 2014 through 21 Nov 2014



Abstracts                                                                             last update 10 November 2014

Jake Abernethy


Yossi Azar
Serving in the Dark should be done Non-Uniformly!


We study the following balls-and-bins stochastic game between a player and an adversary: there are B bins and a sequence of ball arrival and extraction events. In an arrival event, a ball is stored in an empty bin chosen by the adversary, and discarded if no bin is empty. In an extraction event, an algorithm selects a bin, clears it, and gains its content. We are interested in analyzing the gain of an algorithm which serves in the dark without any feedback at all, i.e., does not see the sequence, the content of the bins, or even the content of the cleared bins (i.e., an oblivious algorithm). We compare that gain to the gain of an optimal, open-eyes strategy that gets the same online sequence. We name this gain ratio "the loss of serving in the dark".


The randomized algorithm analyzed in previous work chooses a bin independently and uniformly at random, which results in a competitive ratio of about 1.69. We show that although no information is ever provided to the algorithm, using a non-uniform probability distribution reduces the competitive ratio. Specifically, we design a $1.56$-competitive algorithm and establish a lower bound of $1.5$. We also prove a lower bound of 2 against any deterministic algorithm, matching the performance of the 2-competitive round-robin strategy. Finally, we present an application relating to a prompt mechanism for bounded-capacity auctions.
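As a toy illustration of the setting (a sketch, not the paper's analysis: the first-empty-bin placement rule and the event sequence below are illustrative choices), the game can be simulated directly, comparing the oblivious uniform policy with the round-robin strategy:

```python
import random

def simulate(extract_policy, events, B, seed=0):
    """Play the balls-and-bins game with an oblivious extraction policy.

    events is a list of "arrive"/"extract" events.  On an arrival the
    adversary (here: first empty bin) stores a ball, discarded if no
    bin is empty; on an extraction the policy names a bin blindly."""
    rng = random.Random(seed)
    bins = [0] * B
    gain = 0
    for ev in events:
        if ev == "arrive":
            for i in range(B):
                if bins[i] == 0:
                    bins[i] = 1
                    break
        else:
            i = extract_policy(rng, B)   # no feedback is ever returned
            gain += bins[i]
            bins[i] = 0
    return gain

def uniform(rng, B):                     # the previously analysed policy
    return rng.randrange(B)

def round_robin():                       # deterministic cyclic policy
    state = {"next": 0}
    def policy(rng, B):
        i, state["next"] = state["next"], (state["next"] + 1) % B
        return i
    return policy

events = ["arrive"] * 4 + ["extract"] * 4
gain = simulate(uniform, events, B=4)    # open-eyes optimum here is 4
```

Comparing either oblivious policy's gain with the open-eyes optimum over adversarial sequences is exactly the "loss of serving in the dark" ratio studied in the talk.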

Joint work with Ilan Cohen


Francis Bach

Beyond stochastic gradient descent for large-scale machine learning


Many machine learning and signal processing problems are traditionally cast as convex optimization problems. A common difficulty in solving these problems is the size of the data: there are many observations ("large n") and each of these is large ("large p"). In this setting, online algorithms such as stochastic gradient descent, which pass over the data only once, are usually preferred over batch algorithms, which require multiple passes over the data. Given n observations/iterations, the optimal convergence rates of these algorithms are O(1/\sqrt{n}) for general convex functions and O(1/n) for strongly-convex functions. In this talk, I will show how the smoothness of loss functions may be used to design novel algorithms with improved behavior, both in theory and practice: in the ideal infinite-data setting, an efficient novel Newton-based stochastic approximation algorithm leads to a convergence rate of O(1/n) without strong convexity assumptions, while in the practical finite-data setting, an appropriate combination of batch and online algorithms leads to unexpected behaviors, such as a linear convergence rate for strongly convex problems, with an iteration cost similar to stochastic gradient descent. (Joint work with Nicolas Le Roux, Eric Moulines and Mark Schmidt.)
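As a minimal illustration of the single-pass online setting (plain averaged SGD on a one-dimensional least-squares problem; this is not the Newton-based algorithm from the talk, and the step-size schedule is an illustrative choice):

```python
import random

def averaged_sgd(sample, n, step, seed=0):
    """One pass of SGD with Polyak-Ruppert averaging on the loss
    E[(theta - y)^2]/2, whose stochastic gradient is (theta - y)."""
    rng = random.Random(seed)
    theta, avg = 0.0, 0.0
    for t in range(1, n + 1):
        y = sample(rng)                   # draw one fresh observation
        theta -= step(t) * (theta - y)    # stochastic gradient step
        avg += (theta - avg) / t          # running average of iterates
    return avg

# Estimate a mean of 1.0 from noisy observations in a single pass.
est = averaged_sgd(lambda rng: 1.0 + rng.gauss(0.0, 0.1),
                   n=10000, step=lambda t: 1.0 / t ** 0.5)
```

Averaging the iterates, rather than returning the last one, is the standard device that lets O(1/\sqrt{t}) step sizes achieve fast rates on smooth problems.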


Maria (Nina) Balcan

Agnostic Active Learning of Linear Separators

Active learning is a modern learning paradigm where the algorithm itself can ask for labels of carefully chosen examples from a large pool of unannotated data, with the goal of minimizing human labeling effort. In this talk, I will present a computationally efficient, noise-tolerant, and label-efficient active learning algorithm for learning linear separators under log-concave and nearly log-concave distributions. I will also discuss unexpected implications of these results for the classic supervised learning paradigm.


Rui Castro
Adaptive Sensing for Estimation of Structured Sparse Sets


In many practical settings one can sequentially and adaptively guide the collection of future data, based on information extracted from data already collected, in what is known as sequential experimental design, active learning, or adaptive sensing/sampling (depending on the context).  The intricate relation between data analysis and acquisition in adaptive sensing paradigms is extremely powerful, and allows for reliable estimation in situations where non-adaptive sensing would fail dramatically.  In this talk I consider estimation and detection of high-dimensional structured sparse signals in noise, and present near-optimal adaptive sensing procedures that provably outperform the best possible inference methods based on non-adaptive sensing. The methods developed can also be used for adaptive compressive sensing and for detection of (sparse) correlations.

This talk is based on joint works with E. Tánczos, and with Pierre-André Savalle and Gábor Lugosi.

Nicolò Cesa-Bianchi
Real-time bidding and regret minimization

In real-time bidding (RTB), ad exchanges run second-price auctions in a few milliseconds, allowing publishers to sell ad spaces to advertisers on a per-impression basis. The advantages of RTB, which allows impressions to be accurately tailored to the features of each individual user, have fueled the demand for algorithmic platforms that serve the needs of either the seller or the buyer. In this talk, we focus on the problem, faced by the seller, of dynamically optimizing the reserve price in each auction with the goal of maximizing overall revenue. We cast this problem in a regret minimization setting, and describe a computationally efficient algorithm that achieves a regret of order T^{1/2} in a sequence of T auctions (ignoring logarithmic factors), under the assumption that all bids are independently drawn from the same unknown and arbitrary distribution.
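A stripped-down sketch of the objective (the second-price-with-reserve payoff, paired with a naive follow-the-leader baseline over a price grid; this is an illustration of the setting, not the algorithm from the talk):

```python
def revenue(reserve, bids):
    """Seller revenue of a second-price auction with a reserve price."""
    b = sorted(bids, reverse=True)
    return max(b[1], reserve) if b[0] >= reserve else 0.0

def follow_the_leader(auctions, grid):
    """In each auction, post the reserve that was empirically best on
    all previous auctions (a simple baseline under i.i.d. bids)."""
    cum = {r: 0.0 for r in grid}
    total, best_r = 0.0, grid[0]
    for bids in auctions:
        total += revenue(best_r, bids)
        for r in grid:
            cum[r] += revenue(r, bids)   # full information: bids are observed
        best_r = max(grid, key=cum.get)
    return total
```

The payoff function is discontinuous in the reserve price, which is one reason dynamic reserve-price optimization is harder than it looks.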

Joint work with: Claudio Gentile (Varese) and Yishay Mansour (Tel-Aviv).

Amit Daniely
From average case complexity to improper learning complexity

It is presently unknown how to show hardness of learning problems, and there are huge gaps between the known upper and lower bounds in the area. The main obstacle is that standard NP-reductions do not yield hardness of learning. All known lower bounds rely on (unproved) cryptographic assumptions.


We introduce a new technique to this area, using reductions from problems that are hard on average. We put forward a natural generalization of Feige's assumption about the complexity of refuting random K-SAT instances. Under this assumption we show:

1. Learning DNFs is hard.

2. Learning an intersection of super logarithmic number of halfspaces is hard.


In addition, the same assumption implies the hardness of virtually all learning problems that were previously shown hard under cryptographic assumptions.

Joint work with Nati Linial and Shai Shalev-Shwartz



Nikhil Devanur
Fast Algorithms for Online Stochastic Convex Programming

We introduce the online stochastic Convex Programming (CP) problem, a very general version of stochastic online problems which allows arbitrary concave objectives and convex feasibility constraints. Online stochastic packing and covering form special cases of online stochastic CP. We present fast algorithms for these problems, which achieve near-optimal regret guarantees for both the i.i.d. and the random permutation models of stochastic inputs. Our techniques make explicit the connection of the primal-dual paradigm and online learning to online stochastic CP.

Joint work with Shipra Agrawal, Microsoft Research.

Guy Even
Improved Deterministic Online Packet-Routing Algorithms on Grids

We consider the following online packet routing problem. An adversary injects packets arbitrarily at sources, each packet with an arbitrary route. Each node has some buffer space, and each link has some capacity. An execution progresses in synchronous rounds: In each round each packet is either sent to the next node on its route (subject to link capacity constraints), or is stored in its current node (subject to buffer capacity constraints), or dropped. The goal is to maximize the goodput, i.e., the number of packets delivered at their destination.


Our main result is an $O(\log^{3/2} n)$-competitive deterministic algorithm for the special case of a line of $n$ nodes, using buffers of size at least 5 and assuming link capacity at least 5. The best previous algorithm was $O(\log^5 n)$-competitive. We note that if the algorithm is allowed to accept packets within $O(\log n)$ steps (not immediately), then the competitive ratio drops to $O(\log n)$.

Joint work with Moti Medina and Boaz Patt-Shamir

Moritz Hardt
The Reusable Holdout: Preserving Validity in Adaptive Data Analysis

A great deal of effort has been devoted to reducing the risk of spurious scientific discoveries, from the use of holdout sets and sophisticated cross-validation techniques, to procedures for
controlling the false discovery rate in multiple hypothesis testing. However, there is a fundamental disconnect between the theoretical results and the practice of science: the theory assumes a fixed collection of hypotheses to be tested, or learning algorithms to be
applied, selected non-adaptively before the data are gathered, whereas science is by definition an adaptive process, in which data are shared and re-used, and hypotheses and new studies are generated on the basis of data exploration and previous outcomes.

Surprisingly, the challenges of adaptivity can be addressed using insights from differential privacy, a field of study supporting a definition of privacy tailored to private data analysis. As a corollary we show how to safely reuse a holdout set a great many times without undermining its validation power, even when hypotheses and computations are chosen adaptively. Armed with this technique, the analyst is free to explore the data ad libitum, generating and evaluating hypotheses, verifying results on the holdout, and backtracking as needed.
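A simplified sketch of such a mechanism (a Thresholdout-style rule; the parameter values and the use of Gaussian noise here are illustrative simplifications, and the paper's treatment differs in details): each query is answered from the training set unless it disagrees noticeably with the holdout, and the holdout value is only ever released with noise.

```python
import random

def reusable_holdout(train_val, holdout_val, threshold=0.1,
                     sigma=0.01, rng=random):
    """Answer one query: return the training-set estimate when it is
    close to the holdout estimate; otherwise return a noised holdout
    value, limiting what the analyst learns about the holdout set."""
    if abs(train_val - holdout_val) < threshold + rng.gauss(0.0, sigma):
        return train_val
    return holdout_val + rng.gauss(0.0, sigma)
```

Because the analyst never sees the raw holdout values, adaptively chosen follow-up queries cannot overfit to the holdout set nearly as quickly as with naive reuse.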

Joint work with Cynthia Dwork, Vitaly Feldman, Toni Pitassi, Omer Reingold and Aaron Roth

Sungjin Im
Competitive Scheduling from Competitive Equilibria

In this talk, we introduce a general scheduling problem that we term the Packing Scheduling problem (PSP). In this problem, jobs can have different arrival times and sizes, and the rates at which a scheduler can process jobs are subject to arbitrary packing constraints. The PSP framework captures a variety of scheduling problems, including the classical problems of heterogeneous machines scheduling, broadcast scheduling, and scheduling jobs of different parallelizability. It also captures scheduling constraints arising in diverse modern environments ranging from individual computer architectures to data centers. More concretely, PSP models multidimensional resource requirements and parallelizability, as well as network bandwidth requirements found in data center scheduling. We design non-clairvoyant online algorithms for PSP and its special cases, meaning that they make scheduling decisions without knowing the future jobs, or even outstanding jobs sizes until their completion. Our algorithms, which are inspired by equilibria in game theory, minimize the average completion time approximately while processing jobs fairly.

This is joint work with Janardhan Kulkarni and Kamesh Munagala.

Martin Jaggi
Communication-Efficient Distributed Dual Coordinate Ascent

Communication remains the most significant bottleneck in the performance of distributed optimization algorithms for large-scale machine learning. We propose a communication-efficient framework, COCOA, that uses local computation in a primal-dual setting to dramatically reduce the amount of necessary communication. We provide a strong convergence rate analysis for this class of distributed online algorithms, as well as experiments on real-world distributed datasets with implementations in Spark. In our experiments, we find that as compared to state-of-the-art mini-batch versions of SGD and SDCA algorithms, COCOA converges to the same .001-accurate solution quality on average 25× as quickly. http://arxiv.org/abs/1409.1458


Joint work with Virginia Smith, Martin Takáč, Jonathan Terhorst, Sanjay Krishnan, Thomas Hofmann, Michael I. Jordan

Satyen Kale
Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of K actions in response to the observed context, and observes the reward only for that chosen action. Our method assumes access to an oracle for solving fully supervised cost-sensitive classification problems and achieves the statistically optimal regret guarantee with only O*(sqrt{KT}) oracle calls across all T rounds. By doing so, we obtain the most practical contextual bandit learning algorithm amongst approaches that work for general policy classes.
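For contrast, the naive oracle-based baseline already shows the interface: the learner only ever touches its policy class through an argmax oracle. The sketch below (epsilon-greedy with a toy oracle over an explicitly enumerated policy class; every name is illustrative, and this is not the paper's algorithm) calls the oracle once per round, whereas the talk's algorithm achieves the optimal regret with only O*(sqrt{KT}) oracle calls in total.

```python
import random

def argmax_oracle(data, policies):
    """Toy cost-sensitive oracle over a finite policy class: return the
    policy maximising the inverse-propensity-weighted reward estimate
    on the logged (context, action, reward, propensity) tuples."""
    if not data:
        return policies[0]
    return max(policies,
               key=lambda pi: sum(r / p for x, a, r, p in data if pi(x) == a))

def epsilon_greedy(rounds, policies, K, eps=0.1, seed=0):
    """Explore uniformly with probability eps, otherwise follow the
    oracle's current best policy; log propensities for the oracle."""
    rng = random.Random(seed)
    data, total = [], 0.0
    for x, rewards in rounds:
        greedy = argmax_oracle(data, policies)(x)
        a = rng.randrange(K) if rng.random() < eps else greedy
        p = eps / K + ((1 - eps) if a == greedy else 0.0)
        total += rewards[a]                 # only the chosen reward is seen
        data.append((x, a, rewards[a], p))
    return total
```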

Wouter Koolen
Learning the Learning Rate for Prediction with Expert Advice

Most standard algorithms for prediction with expert advice depend on a parameter called the learning rate. This learning rate needs to be large enough to fit the data well, but small enough to prevent overfitting. For the exponential weights algorithm, a sequence of prior work has established theoretical guarantees for higher and higher data-dependent tunings of the learning rate, which allow for increasingly aggressive learning. But in practice such theoretical tunings often still perform worse (as measured by their regret) than ad hoc tuning with an even higher learning rate. To close the gap between theory and practice we introduce an approach to learn the learning rate. Up to a factor that is at most (poly)logarithmic in the number of experts and the inverse of the learning rate, our method performs as well as if we knew the empirically best learning rate from a large range that includes both conservative small values and values that are much higher than those for which formal guarantees were previously available. Our method employs a grid of learning rates, yet runs in linear time regardless of the size of the grid.
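The benchmark in question can be made concrete with a small experiment (illustrative loss sequence; the grid search below is the comparator the method competes with, not the paper's linear-time algorithm): on some sequences the empirically best learning rate from a grid is far more aggressive than any conservative tuning.

```python
import math

def hedge_regret(losses, eta):
    """Regret of exponential weights with fixed learning rate eta on a
    loss matrix losses[t][i] with entries in [0, 1]."""
    K = len(losses[0])
    w = [1.0] * K
    total = 0.0
    for row in losses:
        s = sum(w)
        total += sum(wi * li for wi, li in zip(w, row)) / s  # expected loss
        w = [wi * math.exp(-eta * li) for wi, li in zip(w, row)]
    best = min(sum(row[i] for row in losses) for i in range(K))
    return total - best

# On this sequence a very high learning rate beats conservative tuning.
losses = [[1.0, 0.0] if t % 3 == 0 else [0.0, 1.0] for t in range(30)]
grid = [2.0 ** k for k in range(-5, 6)]
best_eta = min(grid, key=lambda e: hedge_regret(losses, e))
```

Evaluating every grid point like this costs time linear in the grid size per round; the point of the paper is to track the best grid point at essentially no overhead.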


Janardhan Kulkarni

Non-clairvoyant Scheduling to Minimize Weighted Flow-time


In this talk, I will describe recent progress on the problem of minimizing weighted flow-time in the online non-clairvoyant scheduling model, arguably the most general model of machine scheduling.  In this problem, a set of jobs $J$ arrives over time to be scheduled on a set of $M$ machines. Each job $j$ has processing length $p_j$, weight $w_j$, and is processed at a rate of $\ell_{ij}$ when scheduled on machine $i$. The online scheduler knows the values of $w_j$ and $\ell_{ij}$ upon arrival of the job, but is not aware of the quantity $p_j$.  We present the first online algorithm that is scalable ($(1+\epsilon)$-speed $O(\frac{1}{\epsilon^2})$-competitive for any constant $\epsilon > 0$) for the total weighted flow-time objective. No non-trivial results were known for this setting, except for the most basic case of identical machines. This result resolves a major open problem in online scheduling theory.  Moreover, we also show that no job needs more than a logarithmic number of migrations. Next, I will sketch how the framework can be easily extended to the objective of minimizing total weighted flow-time plus energy cost for the case of unrelated machines.


The key algorithmic idea is to let jobs migrate selfishly until they converge to an equilibrium. Towards this end, we define a game where each job's utility is closely tied to the instantaneous increase in the objective that the job is responsible for, and each machine declares a policy that assigns priorities to jobs based on when they migrate to it and on the execution speeds. This is similar in spirit to coordination mechanisms, which attempt to achieve near-optimum welfare in the presence of selfish agents (jobs).


The work was done in collaboration with Kamesh Munagala, Kirk Pruhs and Sungjin Im and appeared in FOCS 2014.


Yishay Mansour
Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback

We present and study a partial-information model of online learning, where a decision maker repeatedly chooses from a finite set of actions, and observes some subset of the associated losses. This naturally models several situations where the losses of different actions are related, and knowing the loss of one action provides information on the loss of other actions. Moreover, it generalizes and interpolates between the well studied full-information setting (where all losses are revealed) and the bandit setting (where only the loss of the action chosen by the player is revealed). We provide several algorithms addressing different variants of our setting, and provide tight regret bounds depending on combinatorial properties of the information feedback structure.
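A minimal EXP3-style sketch of the feedback model (importance weighting by the probability that a loss is observed; the graphs and parameters below are illustrative, and this simplifies the algorithms in the talk):

```python
import math, random

def exp3_graph(losses, neighbors, eta, seed=0):
    """Exponential-weights play when choosing action i reveals the
    losses of all actions in neighbors[i] (which includes i itself).
    Observed losses are weighted by their observation probability."""
    rng = random.Random(seed)
    K = len(neighbors)
    w = [1.0] * K
    total = 0.0
    for row in losses:
        s = sum(w)
        p = [wi / s for wi in w]
        i = rng.choices(range(K), weights=p)[0]
        total += row[i]
        for j in neighbors[i]:
            # probability that j's loss is observed this round
            q = sum(p[a] for a in range(K) if j in neighbors[a])
            w[j] *= math.exp(-eta * row[j] / q)
    return total
```

With neighbors[i] equal to all actions this is the full-information setting; with neighbors[i] = [i] it degenerates to the bandit setting, and intermediate graphs interpolate between the two, mirroring the regret bounds of the talk.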

Joint work with Noga Alon, Nicolò Cesa-Bianchi, Claudio Gentile, Shie Mannor, Ohad Shamir

Vahab Mirrokni
Recent advances in online stochastic allocation problems

In this talk, I will discuss recent advances in online stochastic allocation from theoretical and practical standpoints.

Specifically, I will discuss recent results about simultaneous or robust adversarial and stochastic allocation and, time permitting, describe a recent result about online submodular welfare maximization under random order.


Marco Molinaro
How the Experts Algorithm Can Help Solve LPs Online

We consider the problem of solving packing/covering LPs online, when the columns of the constraint matrix are presented in random order. This problem has received much attention; the main open question is to determine how large the right-hand sides of the LPs have to be (compared to the entries on the left-hand side of the constraints) to admit (1 + eps)-approximations online. It is known that the RHS has to be Omega(eps^(-2) log m) times the left-hand sides, where m is the number of constraints.


We give a primal-dual algorithm to achieve this bound for all packing LPs, and also for a class of mixed packing/covering LPs. Our algorithms construct dual solutions using a regret-minimizing online learning algorithm in a black-box fashion, and use them to construct primal solutions. The adversarial guarantee that holds for the constructed duals helps us take care of most of the correlations that arise in the algorithm; the remaining correlations are handled via martingale concentration and maximal inequalities. These ideas lead to conceptually simple and modular algorithms, which we hope will be useful in other contexts.
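A schematic version of the primal-dual loop (a multiplicative-weights dual over the packing constraints driving accept/reject decisions; the price scaling and step size below are illustrative simplifications, not the paper's algorithm):

```python
import math

def online_packing(columns, budgets, eta=0.1):
    """Accept arriving columns (value, consumption vector) when their
    value exceeds the current dual price of the resources they use;
    dual prices are maintained by a multiplicative-weights update."""
    m = len(budgets)
    w = [1.0] * m                     # one weight per packing constraint
    used = [0.0] * m
    value = 0.0
    for v, a in columns:
        s = sum(w)
        price = sum(w[i] / s * a[i] for i in range(m))   # dual estimate
        fits = all(used[i] + a[i] <= budgets[i] for i in range(m))
        if fits and v > price:
            value += v
            for i in range(m):
                used[i] += a[i]
                w[i] *= math.exp(eta * a[i] / budgets[i])  # raise busy prices
    return value, used
```

The dual weights come from a regret-minimizing (exponential-weights) update, which is exactly the black-box role the online learning algorithm plays in the paper.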


This is joint work with Anupam Gupta.


Adi Rosen
Online Computation with Advice

In this talk we will give a short overview of the relatively recent framework of online computation with advice. We will review the motivation and a couple of variants of the model proposed in the literature, which are aimed to give a general framework to study the interplay between the amount of available information about the future, and the attainable competitive ratio. We will give a number of examples of results obtained in recent years, and discuss some open problems in this area.

Aaron Roth
Adaptively Learning Tolls to Induce Target Flows

We consider a natural problem, recently introduced by Bhaskar, Ligett, Schulman, and Swamy: a city has a fixed road network, and would like to set tolls on the roads to induce some target traffic flow at equilibrium. However, the city does not know the drivers' demands, nor the latency functions of the edges. How can the city adaptively set tolls, as a function of observed flows, to quickly induce the target equilibrium? Bhaskar et al. showed how to solve this problem when the unknown latency functions are known to be convex polynomials of fixed degree: they set up a convex program with variables for the unknown coefficients of the polynomial latency function, and show that the induced player flows can be used to build a separation oracle, to use together with the Ellipsoid algorithm to solve this convex program.

In this work, we show how to significantly generalize and simplify this result using techniques from online learning. We give a simple, decentralized tatonnement process (which can be used to adjust tolls independently at each edge, without the need for communication) that quickly causes the induced flow to converge to the target. Moreover, our method requires almost no assumptions on the form of the latency functions: they do not need to be polynomials, do not need to be convex (nor indeed have any concise representation), or even be differentiable (or Lipschitz!) -- we require only that they be increasing functions. Our solution frames the problem as solving for an equilibrium in a game that is strategically equivalent to a zero sum game, which allows us to leverage the powerful properties of no-regret algorithms to converge to the minmax solutions of such games, without the need to know any details about the payoff function.
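A toy instance of such a tatonnement (two parallel edges carrying one unit of demand, with hidden linear latencies x and 2x; the environment function below computes the Wardrop response, which the city only observes, never inspects; all constants are illustrative):

```python
def equilibrium(t1, t2):
    """Environment: Wardrop split of one unit of demand over two
    parallel edges with (hidden) latencies x and 2x; at equilibrium
    latency + toll is equalised across used edges."""
    x1 = (2.0 + t2 - t1) / 3.0
    return min(1.0, max(0.0, x1))

def tatonnement(target, steps=2000, lr=0.05):
    """Adjust each edge's toll using only that edge's observed flow
    error; no knowledge of the latency functions is used."""
    t1 = t2 = 0.0
    for _ in range(steps):
        x1 = equilibrium(t1, t2)
        t1 += lr * (x1 - target)            # flow too high: raise the toll
        t2 += lr * ((1.0 - x1) - (1.0 - target))
    return equilibrium(t1, t2)
```

Each toll moves only in the direction of its own edge's flow error, so the update is fully decentralized, and nothing beyond monotonicity of the latencies is exploited.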

Joint work with Jon Ullman and Steven Wu.

Yevgeny Seldin
Interpolation between various online learning settings

Online learning problems can be characterized by several key features, including the amount of feedback (full information, bandit feedback, partial monitoring, etc.); adversariality of the environment ("stochastic" regime, adversarial regime, adaptive adversaries, etc.); and structural richness (stateless problems, problems with side information, MDPs, etc.). There is a significant amount of research on isolated settings in this huge space, for example, "prediction with expert advice", "stochastic multiarmed bandits", "adversarial multiarmed bandits", and so on. In most situations algorithms developed for one problem, say, stochastic multiarmed bandits, are either unsuitable or suboptimal for any other problem.

We present a number of algorithms that achieve the optimal performance (up to logarithmic factors) in a number of different regimes, as well as the optimal performance in intermediate regimes. This includes one algorithm that achieves the optimal performance in both stochastic and adversarial multiarmed bandits. The same algorithm achieves the optimal performance in two intermediate regimes: the moderately contaminated stochastic regime and the adversarial regime with a gap. We also present two other algorithms that achieve the optimal performance on a continuous range of problems extending from prediction with expert advice to adversarial multiarmed bandits and further on into partial monitoring games.

The results naturally lead to a number of challenging open questions: (1) How far can we push the "general algorithms" line: is there a limit on the number of different settings that can be solved with one algorithm? (2) Is there a price to be paid for "generality"? (The optimal performance is achieved up to logarithmic factors, but are these factors inevitable and, if so, how can one derive lower bounds establishing that no single algorithm can achieve the optimal performance in several different regimes without paying a price?)

Based on joint work with Aleksandrs Slivkins, Peter Bartlett, Koby Crammer, and Yasin Abbasi-Yadkori.

Ohad Shamir
Fundamental Limits of Online (and Distributed) Algorithms for Statistical Learning

Many machine learning approaches are characterized by information constraints on how they interact with training data. A prominent example is online learning, where examples are processed one-by-one, often with a space-efficient algorithm (i.e. not all examples can be memorized), while other examples include online learning with partial information (e.g. multi-armed bandits), and communication constraints in distributed learning. The effects of space and communication constraints are well explored in the algorithms community, but there are surprisingly few results for statistical learning problems, where one assumes i.i.d. data. In this talk, I'll discuss these issues, and describe how a single set of results sheds some light on them, simultaneously for different types of information constraints and for different online learning problems.

Karthik Sridharan
Online Learning with Partial Information Through Relaxations

We provide an algorithm design principle and framework for designing online learning algorithms with partial information as feedback. The algorithms are based on relaxations for partial-information learning problems. The notion of relaxations for designing online learning algorithms in the full-information setting was explored in earlier joint work with Alexander Rakhlin and Ohad Shamir. In that work, it was shown that most existing online learning algorithms, and many new ones, for the full-information setting can be derived in a principled way by building recursive mappings called relaxations and solving on each round an optimization problem that resembles approximate dynamic programming. In this talk we extend this framework to deal with online learning with partial information/feedback. This includes problems like bandit problems, contextual bandit problems, and online sparse regression with sparse feedback.

This work is joint with Alexander Rakhlin

Ambuj Tewari

Some problems in online learning to rank


We consider online algorithms for learning to rank objects (such as documents in information retrieval or products in recommender systems). In a setting with covariates/features, we derive a perceptron-like algorithm that updates only on rounds where a non-zero loss is incurred. Under a margin condition, we show that the cumulative ranking loss of the perceptron-like algorithm is finite. Our result holds for ranking losses derived from popular ranking measures such as NDCG (Normalized Discounted Cumulative Gain) and MAP (Mean Average Precision). We also explore a partial monitoring setting, without covariates/features, where the learner is only given feedback on the top item in the rankings it produces. We give regret guarantees for unnormalized versions of NDCG, AUC, and MAP. Surprisingly, we can also show that the normalized versions do not admit no-regret algorithms under the top-1 feedback model.
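For reference, the NDCG measure underlying these ranking losses can be computed in a few lines (using the standard exponential-gain and logarithmic-discount choices; a sketch for orientation, not code from the talk):

```python
import math

def ndcg(relevances, ranking):
    """Normalized Discounted Cumulative Gain of a ranking (a permutation
    of item indices) given per-item relevance grades."""
    def dcg(order):
        return sum((2.0 ** relevances[i] - 1.0) / math.log2(pos + 2)
                   for pos, i in enumerate(order))
    ideal = dcg(sorted(range(len(relevances)), key=lambda i: -relevances[i]))
    return dcg(ranking) / ideal if ideal > 0 else 0.0
```

The normalization by the ideal ranking's DCG is exactly what separates the normalized and unnormalized variants discussed in the abstract.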

Tim van Erven
Tuning the learning rate: contradictory results (discussion topic)

Both in prediction with expert advice (PwEA) and in online convex optimization (OCO), there has been significant interest in designing algorithms that exploit easy problem instances, for which faster than O(sqrt(T)) rates for the regret can be obtained.


In PwEA, recent work has focused on second-order bounds, which depend on the variance of some quantity computed on the data. To exploit small variance and get fast rates, it has been found that *very large* learning rates are necessary.


In OCO, research has focused on the degree of strong convexity of the objective function. To exploit more convexity and get fast rates, it has been found that *very small* learning rates are necessary.


Thus, there is an apparent contradiction between the learning rates that we need for fast rates in PwEA and in OCO. But there cannot be a real contradiction, because PwEA is a special case of OCO (with a linear objective function)! So it must be possible to reconcile both worlds and obtain a unified understanding that takes both the variance and the convexity into account. I would like to raise the question of how to proceed towards such a unified understanding.