Working Papers
“Dynamically Optimal Treatment Allocation” (with Karun Adusumilli and Friedrich Geiecke)
How should we assign candidates to job training, or, more generally, to any treatment? In practice, individuals often arrive sequentially, and the planner faces constraints such as a limited budget or capacity, borrowing constraints, or the need to place people in a queue. In such settings, which involve inter-temporal trade-offs, previous work on devising optimal policy rules in a static context is either not applicable or sub-optimal. We show how one can use offline observational data to estimate an optimal policy rule that maximizes expected welfare in this dynamic context. We allow the class of policy rules to be restricted for legal, ethical, or incentive-compatibility reasons. The problem is equivalent to one of optimal control under a constrained policy class, and we exploit recent developments in Reinforcement Learning to propose an algorithm for solving it. We also characterize the statistical regret and find that it decays at an n^{-1/2} rate in most examples; this is the same rate as in the static case.
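The setting can be made concrete with a toy simulation. The sketch below is not the paper's algorithm or data: it is a purely illustrative, hypothetical example of a budget-constrained sequential allocation problem, solved with a plain policy-gradient (REINFORCE) method over a restricted policy class (a logistic rule in the arrival's covariate and the remaining-budget fraction). The data-generating process, parameter values, and function names are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def episode(theta, n=50, budget=20):
    """One sequence of arrivals (hypothetical DGP). Returns realized
    welfare and the score-function term for the REINFORCE estimator."""
    welfare, remaining = 0.0, budget
    score = np.zeros_like(theta)
    for _ in range(n):
        x = rng.normal()                            # arrival's covariate
        s = np.array([1.0, x, remaining / budget])  # state features
        p = 1.0 / (1.0 + np.exp(-s @ theta))        # treatment probability
        if remaining > 0:
            a = int(rng.random() < p)
            score += (a - p) * s                    # grad of log-likelihood
        else:
            a = 0                                   # budget exhausted
        welfare += a * x                            # effect increasing in x
        remaining -= a
    return welfare, score

def train(theta, iters=150, lr=0.05, batch=20):
    """Plain REINFORCE with a mean-welfare baseline for variance reduction."""
    for _ in range(iters):
        results = [episode(theta) for _ in range(batch)]
        baseline = np.mean([w for w, _ in results])
        g = sum((w - baseline) * s for w, s in results) / batch
        theta = theta + lr * g                      # gradient ascent on welfare
    return theta

theta = train(np.zeros(3))
```

Because the (assumed) treatment effect rises in the covariate, the trained rule learns to spend its limited budget on high-covariate arrivals; a static rule that ignores the remaining-budget state cannot make that inter-temporal trade-off, which is the gap the paper addresses.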