# Schedule for: 19w5091 - Multi-Stage Stochastic Optimization for Clean Energy Transition

Arriving in Oaxaca, Mexico on Sunday, September 22 and departing Friday, September 27, 2019

Sunday, September 22 | |
---|---|

14:00 - 23:59 | Check-in begins (Front desk at your assigned hotel) |

19:30 - 22:00 | Dinner (Restaurant Hotel Hacienda Los Laureles) |

20:30 - 21:30 | Informal gathering (Hotel Hacienda Los Laureles) |

Monday, September 23 | |
---|---|

07:30 - 08:45 | Breakfast (Restaurant at your assigned hotel) |

08:45 - 09:00 | Introduction and Welcome (Conference Room San Felipe) |

09:00 - 09:30 | Roundtable (Conference Room San Felipe) |

09:30 - 10:00 | Energy transition panorama and challenges (Conference Room San Felipe) |

10:00 - 10:30 |
Jyoti U. Devkota: Results from Sample Surveys of Renewable Energy Users of Nepal ↓ Nepal is an agriculture-based society. According to the 2011 Census, 64% of households use wood/firewood to meet their energy needs. With the aim of studying the energy consumption dynamics of rural Nepal in detail, three surveys were conducted: of 400 households of biogas consumers, of 300 households of national-grid electricity users, and of 51 households of micro hydro project users. This presentation is based on the results obtained from these surveys. Conducting such surveys is of great significance for a country like Nepal, which lacks a strong backbone of good-quality official data; remote geographical locations, lack of awareness, and lack of incentives are the main reasons behind this plight of official statistics. These surveys generated data on more than 350 variables. The base questionnaire, which was pretested, is the same in all three surveys: a structured questionnaire with multiple-choice answers that yield categorical data. This categorical data can be analyzed on an ordinal scale and, because of the large sample size, can be treated as continuous data by application of the central limit theorem. Multivariate statistics is used here to quantify and explain the relationships between these variables, and consumer profile databases were constructed. Data-based research is evidence-based research, which is objective and indisputable. This presentation highlights this evidence-based approach to handling an issue, for what gets measured also gets done. (Conference Room San Felipe) |

10:30 - 11:00 | Coffee Break (Conference Room San Felipe) |

11:00 - 11:45 |
Andy Philpott: Multistage Stochastic Capacity Planning Using JuDGE ↓ Julia Dynamic Generation Expansion (JuDGE) is a Julia package for solving stochastic capacity expansion problems formulated in a “coarse-grained” scenario tree that models long-term uncertainties. The user provides JuDGE with a coarse-grained tree and a JuMP formulation of a stage problem to be solved in each node of this tree. JuDGE then applies Dantzig-Wolfe decomposition to this framework based on the general model of Singh et al. (2009). The stage problems are themselves single-stage capacity expansion problems with integer capacity variables, but quite general constraints that can model, for example, operations in random environments, or even equilibrium constraints, as long as they can be solved exactly (e.g. via reformulation as mixed integer programs).
This presentation outlines the theoretical background for JuDGE, and shows the results of applying it to several problem instances:
i. a knapsack problem with expanding capacity;
ii. optimal capacity expansion in an electricity distribution network subject to reliability constraints;
iii. national capacity expansion to meet renewable energy targets;
iv. optimal transmission expansion for an electricity wholesale market with imperfectly competitive agents.
References
Singh, K., Philpott, A.B. and Wood, K., Dantzig-Wolfe decomposition for solving multi-stage stochastic capacity planning problems, Operations Research, 57, 1271-1286, 2009. (Conference Room San Felipe) |
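As a rough sketch of the kind of deterministic equivalent that such a decomposition targets (the notation below is assumed for illustration, following the general shape of Singh et al. (2009) rather than the package's exact formulation):

```latex
\min_{x,\,y}\ \sum_{n \in \mathcal{N}} p_n \left( c_n^{\top} x_n + q_n^{\top} y_n \right)
\quad \text{s.t.}\quad
y_n \in Y_n\!\left( \sum_{m \preceq n} x_m \right),\quad
x_n \in \mathbb{Z}_{+}^{d}, \qquad n \in \mathcal{N},
```

where $\mathcal{N}$ is the node set of the coarse-grained tree, $p_n$ the node probabilities, $x_n$ the integer capacity expansions, $y_n$ the operating decisions, and $Y_n(\cdot)$ the feasible set of the stage problem at node $n$ given the cumulative capacity installed along the path $m \preceq n$. Dantzig-Wolfe decomposition then generates columns from the nodal stage problems.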

11:45 - 12:30 |
Michel Gendreau: A Hybrid Dynamic Programming-Tabu Search Approach for the Long-Term Energy Planning Problem ↓ We consider the long-term energy planning of an extensive hydroelectric power system. The problem ultimately aims at evaluating the impact of additional firm load contracts on the energy reliability of the system and the future revenues for the next fifteen years, taking into account the uncertainty of future energy inflows. Energy inflows are obtained from water inflows using an energy aggregation process and therefore behave like hydrological models. Various forms of long-term persistence assumptions can be modeled in the energy inflow process using hydrological models. However, these forms can be challenging to include in the framework required by state-of-the-art methods. We propose a method combining stochastic dynamic programming and Tabu Search approaches to solve the long-term energy-planning problem without the need to assume a prior form for the long-term persistence of future energy inflows.
We compare the policies resulting from this hybrid method with policies obtained from two other versions of stochastic dynamic programming known to explicitly handle persistence of inflows: one with an additional state variable and the other coupled with a Hidden Markov Model. The results show the effectiveness of the hybrid method in long-term persistence cases. (Conference Room San Felipe) |

12:30 - 13:00 |
Rüdiger Schultz: Towards a Decomposition Method for Linear Multi-Stage Stochastic Integer Programs with Discrete Distributions ↓ Thanks to linear programming duality, the classical linear stochastic program with recourse, as introduced independently by Beale and Dantzig in 1955, obeys convexity and duality properties that considerably facilitate the development of theory and the design of algorithms. Practical needs in operations research and elsewhere quickly drove the modeling beyond the convex case: for instance, $0$-$1$ decisions and related integer variables to model switching and indivisibility, or nonlinearities of a physical nature, such as differences of squared expressions marking the drop of potential related to flow along pipes. In the talk, extensions of the classical model concerning all key ingredients, namely the measure, the integrand, the second-stage optimization problem, and the nonanticipative first-stage variable, will be put into perspective. Since convexity is lost instantly when extending the model, identifying proper mathematical alternatives becomes crucial. A promising source of such alternatives is computer algebra, which opens up a rich world of approaches in mathematical structures that had not been in the research focus until very recently; foremost, it is ideal theory that has been addressed in this respect. In the talk, some first successful approaches using algebraically motivated structures and methods will be addressed. (Conference Room San Felipe) |

13:20 - 13:30 | Group Photo (Hotel Hacienda Los Laureles) |

13:30 - 15:00 | Lunch (Restaurant Hotel Hacienda Los Laureles) |

16:00 - 16:30 | Coffee Break (Conference Room San Felipe) |

16:30 - 17:15 |
Michael C. Ferris: Computation in Markets with Risk ↓ We consider modeling of coupled systems using an equilibrium framework. The systems are coupled by market constraints and can be effectively modeled in the EMP framework. Standard and new mechanisms to model the resulting complementarity constraints will be outlined, and the agents' problems will be formulated to allow for risk preferences. Some computational results related to energy systems and other applications will be given. (Conference Room San Felipe) |

17:15 - 17:45 |
Napat Rujeerapaiboon: A Day-Ahead Decision Rule Method for Multi-Market Multi-Reservoir Management ↓ Peak/off-peak spreads in European electricity spot markets are eroding due to the nuclear phaseout and the recent growth in photovoltaic capacity. The reduced profitability of peak/off-peak arbitrage forces hydropower producers to participate in the reserve markets. We propose a two-layer stochastic programming framework for the optimal operation of a multi-reservoir hydropower plant which sells energy on both the spot and the reserve markets. The backbone of this approach is a combination of decomposition and decision rule techniques. Numerical experiments demonstrate its effectiveness. (Conference Room San Felipe) |

17:45 - 18:15 |
Julio Deride: A computational strategy for a two-stage stochastic equilibrium problem ↓ In this talk, we present a problem of strategic planning for capacity investment and production under uncertainty, in a competitive market. We model it as a general equilibrium problem, i.e., a collection of multi-agent optimization problems with an equilibrium constraint (supply meets demand), and we propose a solution method based on a scheme that considers: i) a stagewise-decomposition procedure using Progressive Hedging, ii) a representative agent representation, and iii) a decomposition-type method for its solution, such as the Alternating Direction Method of Multipliers (ADMM). We illustrate the numerical performance of the algorithm by solving a stochastic infrastructure planning problem for electric vehicle fast-charging stations over a small network. (Conference Room San Felipe) |

19:00 - 21:00 | Dinner (Restaurant Hotel Hacienda Los Laureles) |

Tuesday, September 24 | |
---|---|

07:30 - 09:00 | Breakfast (Restaurant at your assigned hotel) |

09:00 - 09:45 | Alejandro Jofre: Massive Entry of Nonconventional Renewable Energies, Strategic Behavior and Risk Analysis for Network Electricity Markets (Conference Room San Felipe) |

09:45 - 10:30 |
Darinka Dentcheva: Statistical Estimation of Composite Risk Functionals ↓ We analyze composite functionals representing distributional characteristics of random data. The functionals depend on the decision maker’s choice when used as objectives in optimization problems. Very frequently, models of risk are non-linear with respect to the underlying distributions; however, we can represent them as structured compositions. Composite functionals also arise in the context of machine learning problems.
We consider the use of smooth estimators, with particular attention paid to kernel estimators, for composite functionals and for the optimal value of optimization problems using those as objectives. Strong laws of large numbers for the estimators, the optimal values, and the optimal solutions are established under mild conditions on the functions involved. Central limit theorems for the estimated composite functionals and the optimal value of composite optimization problems are presented as well. We compare the performance of the estimators to the empirical estimators numerically. Several popular risk measures are discussed as illustrative examples.
While we show that many known coherent measures of risk can be cast in the presented structures, we emphasize that the results are of a more general nature with wider applicability. Applications of the results to hypothesis testing of stochastic orders and portfolio efficiency are outlined. (Conference Room San Felipe) |
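As a minimal illustration of such a composition, the Average Value-at-Risk (a popular coherent risk measure) composes a quantile with a tail expectation; a plug-in empirical estimator can be sketched as follows (the plug-in estimator and the Gaussian sample are illustrative assumptions, much simpler than the kernel estimators discussed in the talk):

```python
import random

def empirical_cvar(sample, alpha):
    """Plug-in estimator of AVaR_alpha (a.k.a. CVaR), written as the
    composite functional  min_t { t + E[(X - t)^+] / (1 - alpha) }:
    the inner part estimates the alpha-quantile, the outer part a
    tail expectation."""
    xs = sorted(sample)
    n = len(xs)
    t = xs[int(alpha * n)]                     # empirical alpha-quantile
    tail = sum(max(x - t, 0.0) for x in xs)    # empirical tail excess
    return t + tail / ((1 - alpha) * n)

random.seed(0)
losses = [random.gauss(0.0, 1.0) for _ in range(100_000)]
print(empirical_cvar(losses, 0.95))  # ≈ 2.06 for a standard normal
```

The asymptotic behavior of exactly such plug-in estimators (consistency, central limit theorems) is what the talk's general theory covers.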

10:30 - 11:00 | Coffee Break (Conference Room San Felipe) |

11:00 - 11:45 |
Andrzej Ruszczynski: Risk-Averse Optimization and Control of Partially Observable Systems ↓ We introduce the concept of a risk form, which is a real functional on the product of two spaces: the space of measurable functions and the space of measures on a Polish space. We present a dual representation of risk forms and generalize the classical Kusuoka representation to this setting. For a risk form acting on a product space, we define marginal and conditional forms and we prove a disintegration formula, which represents a risk form as a composition of its marginal and conditional forms. We apply the proposed approach to two-stage optimization problems with partial information and decision-dependent observation distribution. Next, we consider risk measurement in controlled partially observable Markov systems in discrete time. In such systems, part of the state vector is not observed, but affects the transition kernel and the costs. We introduce new concepts of risk filters and study their properties. We also introduce the concept of conditional stochastic time consistency. We derive the structure of risk filters enjoying this property and prove that they can be represented by a collection of law invariant risk measures on the space of functions of the observable part of the state. We also derive the corresponding dynamic programming equations. (Conference Room San Felipe) |

11:45 - 12:05 |
Thomas Martin: Towards a stochastic dynamic formulation of procurement problems ↓ Procurement is the way a company acquires the resources needed for its activity: components, primary resources and/or natural resources. With goods available on a large-scale market, companies often need to either buy what is available at the moment, or wait for later opportunities. In our work we try to leverage the stopping-time-like structure of such problems to scale them, and to add more complexity to the models. Naturally, what a company buys has a direct impact on what it sells. We first investigate the case in which selling prices are stochastic, before adding stochasticity and dynamics in the buying process. Finally, we discuss replacing the mathematical expectation by a risk measure to capture the possible risk aversion of a company. (Conference Room San Felipe) |

12:05 - 12:15 |
Cyrille Vessaire: Optimisation of reservoir development and design under uncertainty ↓ In this talk, I present the subject of my thesis. Consider given resources in a reservoir: how can we design a network to extract them while minimizing an economic indicator in a context of high uncertainty? (Conference Room San Felipe) |

12:15 - 13:00 |
Michael Ludkovski: Stochastic Control with Local Probabilistic Constraints for Microgrid Management ↓ We investigate microgrid management where the controller tries to optimally dispatch a diesel generator as backup to primary renewable sources while maintaining low probability of blackouts. Dispatch takes place at discrete epochs (15 min in our example), while balancing takes place continuously, so only probabilistic guarantees are possible. Moreover, the likelihood of a blackout during the next dispatch period is not available analytically and can only be estimated. We formulate the problem as stochastic control where the Bellman equation features local probabilistic constraints that lead to an implicit state-dependent admissible control set. To tackle this challenge we develop novel Monte Carlo based algorithms, in particular empirical simulation procedures for learning the admissible control set as a function of system state. We propose a variety of relevant statistical tools including logistic regression, Gaussian process regression, quantile regression and support vector machines, which we then incorporate into an overall Regression Monte Carlo (RMC) framework for approximate dynamic programming. Our results indicate that using logistic or Gaussian process regression to estimate the admissibility probability outperforms the other options. Our algorithms offer an efficient and reliable extension of RMC to probability-constrained control. We illustrate our findings with two case studies for the microgrid setup with time-stationary and daily-seasonal net load dynamics. (Conference Room San Felipe) |
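A minimal sketch of the core idea of an estimated, state-dependent admissible control set (the Gaussian net-load model, the dispatch grid, and the threshold below are invented for illustration; the talk embeds regression-based estimators of this probability within Regression Monte Carlo):

```python
import random

_rng = random.Random(1)

def blackout_prob(dispatch, net_load_mean, n_sims=2000):
    """Monte Carlo estimate of the blackout probability over one dispatch
    period: a blackout occurs if the realized net load exceeds the
    dispatched capacity (toy Gaussian net-load model)."""
    hits = sum(_rng.gauss(net_load_mean, 1.0) > dispatch for _ in range(n_sims))
    return hits / n_sims

def admissible_set(candidates, net_load_mean, eps=0.05):
    """Estimated state-dependent admissible control set: dispatch levels
    whose estimated blackout probability is at most eps."""
    return [d for d in candidates if blackout_prob(d, net_load_mean) <= eps]

print(admissible_set([4.0, 5.0, 6.0, 7.0], net_load_mean=4.0))
```

Because the probability is only estimated, the admissible set itself is learned from simulation, which is precisely the challenge the talk's statistical tools (logistic regression, Gaussian processes, etc.) address.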

13:30 - 15:00 | Lunch (Restaurant Hotel Hacienda Los Laureles) |

16:00 - 16:30 | Coffee Break (Conference Room San Felipe) |

16:30 - 17:15 |
Michel De Lara: An Overview of Decomposition-Coordination Methods in Multistage Stochastic Optimization ↓ Multistage stochastic optimization problems are, by essence, complex because
their solutions are indexed both by stages (time) and by uncertainties
(scenarios). Quite often, solutions are also indexed by decision units,
like nodes in a graph (space), or agents in a team.
Hence, their large scale nature makes decomposition methods appealing.
We present, in a unified framework, three main approaches and methods
to decompose multi-stage stochastic optimization problems for numerical
resolution:
time decomposition (and state-based resolution methods,
like Stochastic Dynamic Programming, in Stochastic Optimal Control);
scenario decomposition (like Progressive Hedging in Stochastic Programming);
spatial decomposition (price or resource decompositions).
We show how writing a dynamic programming equation on the increasing sets
of histories paves the way for state reduction at specified stages; this
makes it possible to develop what we call time block decomposition.
We also show how price or resource decompositions quite naturally provide
decomposed lower and upper bounds for minimization problems.
Finally, we point to some mathematical questions raised by the mixing
(blending) of different decomposition methods to tackle large-scale problems.
We hint at the potential of blending for the management of new energy systems
(smart grids), as they will be developed in the next two talks. (Conference Room San Felipe) |

17:15 - 17:45 |
Jean-Philippe Chancelier: Mixing Time Blocks and Price/Resource Decompositions Methods ↓ We provide a method to decompose multistage stochastic optimization problems by
time blocks. This method is based on reducing the so-called history space
using a compressed “state” variable. It leads to a reduced dynamic
programming equation. Then we apply the reduction method by
time blocks to two-time-scale stochastic optimization problems
arising from long term storage management of batteries.
We present a stochastic optimization model aiming at minimizing the investment
and maintenance costs of batteries for a house with solar panels. For any given
capacity of battery it is necessary to compute a charge/discharge strategy
as well as maintenance to maximize revenues provided by intraday energy
arbitrage while ensuring a long term aging of the storage devices. Long term
aging is a slow process while charge/discharge control of a storage handles
fast dynamics. For this purpose, we have designed algorithms that take into
account this two time scales aspect in the decision making process.
We show on instances with a huge number of time steps how one
of our algorithms can be used for the optimal sizing of a storage taking into
account charge/discharge strategy as well as aging.
Numerical results show that it is economically significant to control aging.
We also compare our algorithms to SDP and Stochastic Dual Dynamic Programming
on small instances and we observe that they are less computationally costly while displaying similar performances on the control of a storage. (Conference Room San Felipe) |

17:45 - 18:30 |
Pierre Carpentier: Mixing Dynamic Programming and Spatial Decomposition Methods ↓ We consider a stochastic optimization problem in which different units
are connected together via a network. Each unit is a (small) control
system, located at a node. Each unit state evolution is affected by
uncertainties and controls of the neighboring nodes transmitted through
edges. Static constraints couple all units at each time. We formulate
the associated global stochastic optimization problem. We propose two
decomposition methods, whether we decouple the constraints by prices
or by resources. We show that the optimal value of the global problem
can be bounded above by a sum of resource-decomposed nodal values,
and below by a sum of price-decomposed nodal values. We provide
conditions under which these nodal values can be computed by dynamic
programming. We illustrate these results with numerical studies that
tackle the optimization of urban micro-grids of large size. Finally,
we introduce two different information structures for these microgrids,
namely the centralized and the decentralized ones, and we analyse the
lower and upper bounds when considering these information structures. (Conference Room San Felipe) |
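The bounding structure described above can be summarized, for a coupling constraint $\sum_i \theta_i(x_i) = 0$, by the standard weak-duality sandwich (notation assumed for illustration):

```latex
\max_{\lambda}\ \sum_{i} \underline{V}_i(\lambda)
\;\le\; V^{\star} \;\le\;
\min_{r \,:\, \sum_i r_i \,=\, 0}\ \sum_{i} \overline{V}_i(r_i),
```

where $\underline{V}_i(\lambda)$ is the price-decomposed nodal value with the coupling constraint priced at $\lambda$, and $\overline{V}_i(r_i)$ is the resource-decomposed nodal value with the allocation $r_i$ fixed; each nodal value can then be computed by dynamic programming under the conditions given in the talk.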

19:00 - 21:00 | Dinner (Restaurant Hotel Hacienda Los Laureles) |

Wednesday, September 25 | |
---|---|

07:30 - 09:00 | Breakfast (Restaurant at your assigned hotel) |

09:00 - 09:45 |
R.Terry Rockafellar: Progressive Hedging in Nonconvex Stochastic Programming ↓ The progressive hedging algorithm minimizes an expected "cost" by
iteratively decomposing into separate subproblems for each scenario.
Up to now it has depended on convexity of the underlying "cost"
function with respect to the decision variables and the constraints on
them. However, a new advance makes it possible to obtain convergence
to a locally optimal solution when the procedure is executed close
enough to it and a kind of second-order local sufficiency condition is
satisfied.
Besides applications in which costs and associated constraints may
directly be nonconvex, there are applications to stochastic
programming problems in which those are convex but the probabilities
for the scenarios may be decision-dependent. For example, in a
two-stage problem the probabilities in the recourse stage could be
influenced by the first-stage decision. (Conference Room San Felipe) |
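As a toy illustration of the scenario decomposition that progressive hedging performs (the quadratic costs, penalty parameter, and closed-form subproblem solves below are illustrative assumptions, not the talk's setting):

```python
def progressive_hedging(xis, probs, r=1.0, iters=100):
    """Progressive hedging for min_x E[(x - xi)^2] over finitely many
    scenarios xi. Each scenario subproblem
        min_x (x - xi)^2 + w*x + (r/2)*(x - xbar)^2
    has the closed-form minimizer used below; the multipliers w enforce
    nonanticipativity (all scenario copies of x must agree)."""
    xbar = 0.0
    w = [0.0] * len(xis)
    for _ in range(iters):
        # scenario subproblems, solved separately (here in closed form)
        xs = [(2 * xi - wi + r * xbar) / (2 + r) for xi, wi in zip(xis, w)]
        # aggregation and multiplier update
        xbar = sum(p * x for p, x in zip(probs, xs))
        w = [wi + r * (x - xbar) for wi, x in zip(w, xs)]
    return xbar

print(round(progressive_hedging([1.0, 3.0], [0.5, 0.5]), 4))  # → 2.0
```

In this convex toy the iterates converge to the expected-value minimizer; the talk's advance concerns exactly what can still be guaranteed when the per-scenario costs are nonconvex.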

09:45 - 10:30 |
Claudia Sagastizabal: Algorithms for Two-Stage Stochastic Optimization Problems ↓ For nonconvex optimization problems with nonlinear, possibly nonsmooth, constraints, a convergent primal-dual solution algorithm is proposed.
The approach applies a proximal bundle method to a dual problem
that arises in the context of
generalized augmented Lagrangians and that yields zero duality gap.
The methodology is tailored so that
Lagrangian subproblems can be solved inexactly without hindering the
primal-dual convergence properties of the algorithm.
Primal convergence is ensured even when the dual solution set is empty. (Conference Room San Felipe) |

10:30 - 11:00 | Coffee Break (Conference Room San Felipe) |

11:00 - 11:45 |
Tito Homem-de-Mello: Effective Scenarios in Multistage Stochastic Programs ↓ Traditional stochastic programs optimize the expected value of some function that depends on the decision variables as well as on some random variables that represent the uncertainty in the problem. Such formulations assume that the probability distribution of those random variables is known. However, in practice the probability distribution oftentimes is not known or cannot be accurately approximated. One way to address such ambiguity is to work with distributionally robust stochastic programs (DRSPs), which minimize the worst-case expected value with respect to a set of probability distributions. In this presentation we discuss some recent advances in the research on DRSPs. In particular, we study the question of how to identify the critical scenarios obtained after solving a multistage DRSP. Recent research has studied the notion of effective scenarios for static distributionally robust stochastic programs. Roughly speaking, a scenario is deemed effective if its removal changes the optimal value of the problem. We discuss the extension of these ideas to the case of multistage stochastic programs. Such extension requires proper ways of defining the meaning of removing a scenario in a dynamic context. Computational and analytical results show that identifying such effective scenarios may provide useful insight on the underlying uncertainties of the problem. (Conference Room San Felipe) |

12:00 - 13:00 | Lunch (Restaurant Hotel Hacienda Los Laureles) |

13:00 - 19:00 | Free Afternoon (Oaxaca) |

19:00 - 21:00 | Dinner (Restaurant Hotel Hacienda Los Laureles) |

Thursday, September 26 | |
---|---|

07:30 - 09:00 | Breakfast (Restaurant at your assigned hotel) |

09:00 - 09:45 |
Alexander Shapiro: Computational Approaches to Solving Multistage Stochastic Programs ↓ In this talk we discuss computational approaches to solving convex stochastic programming problems. We start with a discussion of sample complexity of solving static problems and argue that this is essentially different from sample complexity of solving multistage programs. In some applications the considered multistage stochastic programs have a periodical behavior. We demonstrate that in such cases it is possible to drastically reduce the number of stages by introducing a periodical analog of the so-called Bellman equations, used in Markov
Decision Processes and Stochastic Optimal Control. Furthermore, we describe a variant of
the Stochastic Dual Dynamic Programming algorithm, applied to the constructed periodical
Bellman equations, and show numerical experiments for the Brazilian interconnected
power system problem. (Conference Room San Felipe) |
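One standard way to write such a periodical analog of the Bellman equations, for a problem with period $m$ and a discount factor $\alpha \in (0,1)$ (symbols assumed for illustration):

```latex
V_t(x) \;=\; \min_{u}\ \mathbb{E}\left[\, c_t(x, u, \xi_t) \;+\; \alpha\, V_{t+1}\!\left( f_t(x, u, \xi_t) \right) \right],
\qquad t = 1, \dots, m, \qquad V_{m+1} \equiv V_1,
```

so that only $m$ value functions need to be computed regardless of the length of the horizon.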

09:45 - 10:30 |
Johannes Royset: Diametrical Stochastic Optimization with Application to Statistical Learning ↓ It is well known that Sample Average Approximations (or Empirical Risk Minimization) can lead to arbitrarily poor solutions and slow learning when the objective function is poorly behaved. The presentation describes a surprisingly simple remedy that we coin Diametrical Stochastic Optimization (Diametrical Risk Minimization). In contrast to common robustification strategies based on perturbing the data set and probability distribution, our approach “diametrically” modifies any solution and thereby obtains theoretical stability guarantees even for poorly behaved functions. We show that in challenging machine learning problems the approach generalizes even if obtained after aggressive minimization of the diametrical risk. (Conference Room San Felipe) |

10:30 - 11:00 | Coffee Break (Conference Room San Felipe) |

11:00 - 11:45 |
Angelos Georghiou: Robust Dual Dynamic Programming ↓ Multi-stage robust optimization problems, where the decision maker can dynamically react to consecutively observed realizations of the uncertain problem parameters, pose formidable theoretical and computational challenges. As a result, the existing solution approaches for this problem class typically determine suboptimal solutions under restrictive assumptions. In this paper, we propose a robust dual dynamic programming (RDDP) scheme for multi-stage robust optimization problems. The RDDP scheme takes advantage of the decomposable nature of these problems by bounding the costs arising in the future stages through lower and upper cost-to-go functions. For problems with uncertain technology matrices and/or constraint right-hand sides, our RDDP scheme determines an optimal solution in finite time. If the objective function and/or the recourse matrices are also uncertain, our method converges asymptotically (but deterministically) to an optimal solution. Our RDDP scheme does not require relatively complete recourse, and it offers deterministic upper and lower bounds throughout the execution of the algorithm. We demonstrate the promising performance of our algorithm on a stylized inventory management problem. (Conference Room San Felipe) |

11:45 - 12:30 |
Vincent Leclere: Upper Bounds for Stochastic Dual Dynamic Programming ↓ The Stochastic Dual Dynamic Programming (SDDP) algorithm has been used successfully for the past 25 years in the energy industry, especially for mid- and long-term hydro management problems. In essence, SDDP is a dynamic programming algorithm that approximates the value function by outer polyhedral approximations. These outer polyhedral approximations yield exact lower bounds (in a minimization framework), but no upper bounds.
The classical way of obtaining upper bounds consists in simulating the policy over multiple scenarios and estimating the expected cost. This approach has multiple drawbacks. First, the bound is only estimated, hence using it in a stopping test can lead to false positives; the theoretical probability of a false positive becomes very high if the test is performed often.
A second way of obtaining upper bounds consists in leveraging the monotonicity of the Bellman operators and the convexity of the value function. More precisely, from an upper bound at time $t+1$ one can compute an upper bound at time $t$ on a finite set of points; convexity allows extending the upper bound to the convex hull of this set.
Finally, we present a third approach consisting in applying SDDP to the Fenchel transform of the value function. The exact lower bound on the Fenchel transform becomes a deterministic upper bound on the original value function. This approach leads to exact upper bounds converging toward the true value. Incidentally, it also defines a new policy with a guaranteed expected cost. (Conference Room San Felipe) |
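The sandwich between a cut-based outer (lower) approximation and a convexity-based inner (upper) approximation can be sketched in one dimension (the toy value function V(x) = x², the cut coefficients, and the evaluation points are illustrative assumptions):

```python
def cut_lower_bound(cuts, x):
    """Outer (lower) polyhedral approximation from a list of SDDP-style
    cuts (alpha_k, beta_k): V_lower(x) = max_k alpha_k + beta_k * x."""
    return max(a + b * x for a, b in cuts)

def inner_upper_bound(points, x):
    """Inner (upper) approximation by convexity: the chord between two
    evaluated points (x_i, V(x_i)) lies above a convex V."""
    pts = sorted(points)
    for (x0, v0), (x1, v1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return (1 - t) * v0 + t * v1
    raise ValueError("x outside the range of evaluated points")

# toy convex value function V(x) = x**2 on [0, 2]
cuts = [(0.0, 0.0), (-1.0, 2.0)]               # tangents at x = 0 and x = 1
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0)]  # exact evaluations of V
x = 0.5
print(cut_lower_bound(cuts, x), x ** 2, inner_upper_bound(points, x))  # → 0.0 0.25 0.5
```

Both approximations tighten as more cuts and more evaluation points are added, which is the mechanism behind the deterministic bound gaps discussed in the talk.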

12:30 - 13:00 |
Regan Baucke: SDDP-Like Algorithm for Infinite Horizon Multistage Stochastic Programs ↓ In this talk, I will discuss how recent advances in SDDP methods for finite horizon problems can be transferred to the infinite horizon setting. These types of problems arise when modelling energy systems over long time horizons. We propose a convergent algorithm with several attractive properties; chiefly, no Monte Carlo simulation is required to obtain an upper bound. (Conference Room San Felipe) |

13:30 - 15:00 | Lunch (Restaurant Hotel Hacienda Los Laureles) |

16:00 - 16:30 | Coffee Break (Conference Room San Felipe) |

16:30 - 17:15 |
Bernardo Freitas Paulo da Costa: Stochastic Lipschitz Dynamic Programming ↓ In this talk we present a new algorithm for solving multistage stochastic mixed integer linear programming (MILP) problems with complete continuous recourse. A typical example of such problems is energy planning with disjunctive operational constraints. Similar to cutting plane methods, we introduce nonlinear Lipschitz cuts that are building blocks for lower approximations of non-convex cost-to-go functions. Examples of such cuts are those derived from (exact) Augmented Lagrangian Duality for MILPs. If one chooses a family of Lipschitz cuts that is MILP representable, the introduction of these cuts does not change the class of the original stochastic optimization problem.
We illustrate the application of this algorithm, comparing our approach with the convex relaxation of the stagewise problem, for which we can apply SDDP, and with a discretized approximation, for which we apply SDDiP. (Conference Room San Felipe) |

17:15 - 18:00 |
Aditya Mahajan: Information State (and its Approximations) for Stochastic Control ↓ The standard approach for modeling partially
observed systems is to model them as partially observable
Markov decision processes (POMDPs) and obtain a dynamic
program in terms of a belief state. The belief state formulation
works well for planning but is not ideal for online reinforcement
learning because the belief state depends on the model and, as
such, is not observable when the model is unknown.
In this talk, we present an alternative notion of an information
state for obtaining a dynamic program in partially observed
models. In particular, an information state is a sufficient
statistic for the current reward which evolves in a controlled
Markov manner. We show that such an information state leads
to a dynamic programming decomposition. Then we present
a notion of an approximate information state and an approximate
dynamic program based on it. The approximate information state
is defined in terms of properties that can be estimated using
sampled trajectories; therefore, it provides a constructive method
for reinforcement learning in partially observed systems. (Conference Room San Felipe) |

18:00 - 18:30 | Wrap-up session (Conference Room San Felipe) |

19:00 - 21:00 | Dinner (Restaurant Hotel Hacienda Los Laureles) |

Friday, September 27 | |
---|---|

07:30 - 09:00 | Breakfast (Restaurant at your assigned hotel) |

09:00 - 10:30 | Open discussion (Conference Room San Felipe) |

10:30 - 11:00 | Coffee Break (Conference Room San Felipe) |

11:00 - 12:00 | Workshop wrap-up (Conference Room San Felipe) |

12:00 - 14:00 | Lunch (Restaurant Hotel Hacienda Los Laureles) |