Ashish Agarwal

Abelson, Sussman, and Sussman: programming languages are for humans

Posted on July 30, 2010 by ashish

“programs must be written for people to read, and only incidentally for machines to execute”
— Abelson, Sussman, and Sussman (1996)

Posted in Quotes

Nettle Tech Reports

Posted on July 26, 2010 by ashish

Andi Voellmy and the Nettle Team have released two tech reports describing our work so far.

Don’t Configure the Network, Program It! Domain-Specific Programming Languages for Network Systems

Nettle: Functional Reactive Programming for OpenFlow Networks

Posted in Publications | Tagged Computer Networks, Haskell

Presenting at the CScADS Workshop on Autotuning for Petascale Applications

Posted on July 22, 2010 by ashish

Thanks to Rich Vuduc for inviting me to give a talk at CScADS. Autotuning is an approach for generating efficient code for high performance computing. I’ll try to summarize how my PL work can contribute to and benefit from this approach.

Slides

Posted in News, Presentations | Tagged Statistics, Types | 1 Comment

IBM PL Day 2010

Posted on July 14, 2010 by ashish

Here are the abstract and slides for my talk at IBM PL Day.

Title: Mechanizing Optimization and Statistics
Abstract:

Scientific and engineering investigations are formalized most often in the language of numerical mathematics. The tools supporting this are numerous but disparate, leading to sub-optimal use of existing mathematical theory. We present a unifying framework by taking a programming-languages-based approach to this problem. Our richly typed language allows naturally declaring optimization and statistics problems, and a library of transformations allows users to interactively compile input problems to solvable forms. We implement our system as a domain-specific language embedded in OCaml. Here, we focus on three features: disjunctive constraints, measure types and random variables, and indexing.

By disjunctive constraints, we mean disjunctions over propositions on reals, e.g. \(x \leq w \vee x \geq w + 4.0\). The usual solution strategy involves converting these into mixed-integer linear programming (MILP) constraints using the big-M, convex-hull, or other methods. Automation is clearly needed because these are algebraically tedious and manual application limits them to experts. We provide the first robust implementations and compare our results with those of ILOG CPLEX.
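As a worked illustration (not taken from the talk), the big-M method would rewrite the example disjunction using a binary variable \(y\) and a constant \(M\). Both symbols are introduced here as assumptions, and \(M\) is assumed to satisfy \(M \geq |x - w| + 4.0\) over the bounded region of interest:

\[
\begin{aligned}
x - w &\leq M\,y, \\
x - w &\geq 4.0 - M\,(1 - y), \\
y &\in \{0, 1\}.
\end{aligned}
\]

Setting \(y = 0\) enforces \(x \leq w\) and relaxes the second constraint; setting \(y = 1\) enforces \(x \geq w + 4.0\) and relaxes the first.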

Statistics is increasingly important given the growing amount of data generated in the sciences. We introduce language features that enable declarative expression of statistical models and estimation problems. A type ‘prob T’ characterizes probability measures over type T, a special let binding introduces random variables, and some standard measures (e.g. Normal, Gaussian) can be used to construct more complex ones. We demonstrate with an example how our software facilitates exploring the large space of algorithms for solving statistical problems.
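To give a rough feel for a type like ‘prob T’, here is a minimal sketch in Python rather than in the OCaml-embedded language the talk describes. It assumes a measure can be stood in for by a sampling function; the names `Prob`, `bind`, `normal`, and `sample_mean` are hypothetical and chosen only for this illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")
U = TypeVar("U")


@dataclass(frozen=True)
class Prob(Generic[T]):
    """Assumed stand-in for 'prob T': a measure over T given by a sampler."""
    sample: Callable[[random.Random], T]

    def bind(self, f: "Callable[[T], Prob[U]]") -> "Prob[U]":
        # Loosely mirrors a let binding for a random variable:
        # draw a value x from this measure, then sample from f(x).
        return Prob(lambda rng: f(self.sample(rng)).sample(rng))


def normal(mu: float, sigma: float) -> Prob[float]:
    """A standard measure used as a building block."""
    return Prob(lambda rng: rng.gauss(mu, sigma))


# A more complex measure built from standard ones:
# x ~ Normal(0, 1), then y ~ Normal(x, 0.5).
model = normal(0.0, 1.0).bind(lambda x: normal(x, 0.5))


def sample_mean(p: Prob[float], n: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the mean of p from n seeded draws."""
    rng = random.Random(seed)
    return sum(p.sample(rng) for _ in range(n)) / n
```

Calling `sample_mean(model, 20000)` should return a value close to zero, since both stages are centered. A sampler is only one possible representation; a density-based or symbolic one would support different estimation algorithms.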

Finally, matrices are accepted canonical forms in mathematics, but practitioners employ a more flexible indexing notation: e.g. \(\forall i \in \{A,B,C\} \quad x_i \leq w_i\). Especially in optimization, this need is so critical that virtually every tool supports it. However, indexing has been treated as a mere syntactic convenience and is eliminated at parse time. We present a dependently typed theory that enables far richer index sets to be expressed. Importantly, our theory brings indexing into the formal realm, providing an O(n) to O(1) reduction in memory requirements and the potential for a corresponding computational improvement.
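The difference between keeping an index and eliminating it at parse time can be sketched as follows. This is a Python illustration under assumptions, not the dependently typed theory from the talk; `IndexedLeq`, `expand`, and `holds` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Sequence


@dataclass(frozen=True)
class IndexedLeq:
    """One object standing for: for all i in index_set, lhs_i <= rhs_i."""
    index_set: Sequence[Hashable]
    lhs: str  # name of the indexed variable on the left-hand side
    rhs: str  # name of the indexed variable on the right-hand side

    def expand(self) -> List[str]:
        # What parse-time elimination would produce: one constraint per index.
        return [f"{self.lhs}_{i} <= {self.rhs}_{i}" for i in self.index_set]

    def holds(self, values: Dict[str, Dict[Hashable, float]]) -> bool:
        # Check the constraint directly from the indexed form,
        # without materializing the expanded list.
        return all(values[self.lhs][i] <= values[self.rhs][i]
                   for i in self.index_set)


constraint = IndexedLeq(("A", "B", "C"), "x", "w")
expanded = constraint.expand()
```

Here `expanded` holds three separate constraints, while `constraint` stores the inequality once. With a compact index set such as `range(n)`, the indexed form stays the same size as `n` grows, which is the intuition behind the memory reduction mentioned above.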

Download slides

Posted in Presentations | Tagged Optimization, Statistics, Types | 1 Comment

I’ll be presenting at IBM PL Day on July 29.

Posted on July 14, 2010 by ashish

Abstract and slides are here.

Posted in News | Tagged Optimization, Statistics, Types | Comments Off

Our paper comparing sequencing and array technologies is online.

Posted on June 18, 2010 by ashish

Click here.

Posted in News | Tagged Bioinformatics | Comments Off

Comparison and calibration of transcriptome data from RNA-Seq and tiling arrays

Posted on June 17, 2010 by ashish

Abstract

Background: Tiling arrays have been the tool of choice for probing an organism’s transcriptome without prior assumptions about the transcribed regions, but RNA-Seq is becoming a viable alternative as the costs of sequencing continue to decrease. Understanding the relative merits of these technologies will help researchers select the appropriate technology for their needs.

Results: Here, we compare these two platforms using a matched sample of poly(A)-enriched RNA isolated from the second larval stage of C. elegans. We find that the raw signals from these two technologies are reasonably well correlated but that RNA-Seq outperforms tiling arrays in several respects, notably in exon boundary detection and dynamic range of expression. By exploring the accuracy of sequencing as a function of depth of coverage, we found that about 4 million reads are required to match the sensitivity of two tiling array replicates. The effects of cross-hybridization were analyzed using a “nearest neighbor” classifier applied to array probes; we describe a method for determining potential “black list” regions whose signals are unreliable. Finally, we propose a strategy for using RNA-Seq data as a gold standard set to calibrate tiling array analysis. All tiling array and RNA-Seq data sets have been submitted to the modENCODE Data Coordinating Center.

Conclusions: Tiling arrays effectively detect transcript expression levels at a low cost for many species while RNA-Seq provides greater accuracy in several regards. Researchers will need to carefully select the technology appropriate to the biological investigations they are undertaking. It will also be important to reconsider a comparison such as ours as sequencing technologies continue to evolve.

Download free from publisher
I presented this material at the ENCODE/modENCODE Meeting 2009.

To my surprise, as of 26-Nov-2010:

Citation
Ashish Agarwal, David Koppstein, Joel Rozowsky, Andrea Sboner, Lukas Habegger, LaDeana W. Hillier, Rajkumar Sasidharan, Valerie Reinke, Robert H. Waterston, Mark Gerstein (2010). Comparison and calibration of transcriptome data from RNA-Seq and tiling arrays, BMC Genomics 11(383):1-16.

Posted in Publications | Tagged Bioinformatics | Comments Off

Strunk and White on colons, commas, semicolons, dashes, and parentheses

Posted on June 9, 2010 by ashish

“The colon has more effect than the comma, less power to separate than the semicolon, and more formality than the dash.”

“A dash is a mark of separation stronger than the comma, less formal than a colon, and more relaxed than parentheses.”
— Strunk and White (2000)

Notice how the two statements are consistent with respect to relative formality and power to separate.