Here are the abstract and slides for my talk at IBM PL Day.
Title: Mechanizing Optimization and Statistics
Abstract:
Scientific and engineering investigations are formalized most often in the language of numerical mathematics. The tools supporting this are numerous but disparate, leading to sub-optimal use of existing mathematical theory. We present a unifying framework by taking a programming languages based approach to this problem. Our richly typed language allows naturally declaring optimization and statistics problems, and a library of transformations allows users to interactively compile input problems to solvable forms. We implement our system as a domain specific language embedded in OCaml. Here, we focus on three features: disjunctive constraints, measure types and random variables, and indexing.
By disjunctive constraints, we mean disjunctions over propositions on reals, e.g. \(x \leq w \vee x \geq w + 4.0\). The usual solution strategy involves converting these into mixed-integer linear programming (MILP) constraints using the big-M, convex-hull, or other methods. Automation is clearly needed because these are algebraically tedious and manual application limits them to experts. We provide the first robust implementations and compare our results with that of ILOG CPLEX.
Statistics is increasingly important due to the increasing amount of data generated in the sciences. We introduce language features that enable declarative expression of statistical models and estimation problems. A type ‘prob T’ characterizes probability measures over type T, a special let binding introduces random variables, and some standard measures (e.g. Normal, Gaussian) can be used to construct more complex ones. We demonstrate with an example how our software facilitates exploring the large space of algorithms for solving statistical problems.
Finally, matrices are accepted canonical forms in mathematics, but practitioners employ a more flexible indexing notation: e.g. \(\forall i \in \{A,B,C\} \quad x_i \leq w_i\). Especially in optimization, this need is so critical that virtually every tool supports it. However, indexing has been treated as a mere syntactic convenience and is eliminated at parse time. We present a dependently typed theory that enables far richer index sets to be expressed. Importantly, our theory brings indexing into the formal realm, providing an O(n) to O(1) reduction in memory requirements and the potential for a corresponding computational improvement.
I’ve added my slides. Vijay Saraswat was especially interested in the indexing work, which has some features he felt could enhance X10. I enjoyed hearing all the other great talks and learned some new things about parallel programming.