Software

Biocaml is an OCaml library for bioinformatics. It is meant to be the OCaml analog to BioPerl, BioPython, etc. This library was originally used for the modENCODE Project at Yale, and has since been used on a variety of other projects at NYU and elsewhere. The API is well documented and many additional features are planned.

At NYU, I managed and co-authored (with Sebastien Mondet) several software projects. Two code bases are in production use: hitscore operates all computational aspects of the Genomics Core Facility, and th17 automates many of the analyses for the Th17 Project in the Bonneau Lab. All software is implemented in OCaml for the full stack: interfacing to databases, managing millions of files, launching jobs on HPC clusters, running analysis pipelines, and displaying results on websites. Our websites are built on various Ocsigen projects, such as Lwt, Js_of_ocaml, and Eliom. For the Gunsalus Lab, I wrote utrome, which organized and analyzed several modENCODE datasets. We also developed sequme, an internal library containing miscellaneous functions that didn’t fit anywhere else. None of this code is documented or particularly useful outside of NYU. Nonetheless, it is open source.

Our code on Mechanizing Mathematics has been distributed with our publications on this topic. I originally started this work in SML, and we then switched over to OCaml. Sooraj Bhat, the lead PhD student on this project, re-implemented parts of it in Coq.

TranxPipe is a pipeline, implemented in OCaml, for analyzing microarray data for the modENCODE Project. It handles data management (e.g. storing data in a controlled directory structure and tracking metadata in JSON files), conducts various analyses (e.g. segmenting transcription signals, computing gene expression levels, finding the targets of transcription factors, and computing ROC curves), and finally generates output reports in HTML and plain text formats.

AdaptTo (written in SML) and Adaxo (written in Haskell) are data analyzers for the output of PerTrac and Advent Axys, two software packages used in the finance industry. Data is exported from these tools, and my software conducts some of the further analysis that this company uses in a proprietary fund ranking strategy. This software was in production use without modification for six years.

I was briefly involved in the Nettle Project, which aims to provide high-level control and configuration of computer networks. This is realized by embedding a domain-specific language in Haskell. I designed a collection of types to represent messages of the OpenFlow protocol, wrote binary serializers for transmitting and reading the messages, and set up some of the infrastructure for running code in virtualized environments based on Netkit, Open vSwitch, and OpenFlow’s own reference implementation.