A Type Theory for Probability Density Functions

Abstract

There has been great interest in creating probabilistic programming languages to simplify the coding of statistical tasks; however, there still does not exist a formal language that simultaneously provides (1) continuous probability distributions, (2) the ability to naturally express custom probabilistic models, and (3) probability density functions (PDFs). This collection of features is necessary for mechanizing fundamental statistical techniques. We formalize the first probabilistic language that exhibits these features, and it serves as a foundational framework for extending the ideas to more general languages. Particularly novel are our type system for absolutely continuous (AC) distributions (those which permit PDFs) and our PDF calculation procedure, which calculates PDFs for a large class of AC distributions. Our formalization paves the way toward the rigorous encoding of powerful statistical reformulations.

Citation
Sooraj Bhat, Ashish Agarwal, Richard Vuduc, Alexander Gray (2012). A Type Theory for Probability Density Functions, Proceedings of the 39th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2012. ACM SIGPLAN Notices 47(1):545-556.

Errata
In Figure 10, the P-PLUS rule should be:
$\frac{{\Upsilon;\Lambda} \vdash {\varepsilon_1} \perp {\varepsilon_2} \qquad\{{\Upsilon;\Lambda} \vdash {\varepsilon_i} \leadsto {\delta_i}\}_{i=1,2}} {{\Upsilon;\Lambda} \vdash {\varepsilon_1+\varepsilon_2} \leadsto {\lambda {x:\mathsf{R}}\centerdot \int\lambda {t:\mathsf{R}}\centerdot\ \delta_1\ t * \delta_2\ (x – t)}}$
The $$t$$ and $$x$$ were accidentally transposed. Many thanks to Chung-chieh “Ken” Shan for finding this.

Posted in Presentations, Publications | Tagged , | 1 Comment

ML/ICFP/CUFP 2011

I’ve been having a great time at ICFP in Tokyo. I spent the first day at the ML Workshop, and am now enjoying CUFP after presenting my own tutorial on creating collaborative scientific software in OCaml.

Rap Guide to Evolution

Baba Brinkman’s Rap Guide to Evolution is a hilarious show on a serious topic. It’s an alternative approach to educate people about the fact of evolution. If you have it, don’t miss the chance to see him live!

Dijkstra on elegance in programming

“programmers should not be puzzle-minded …. We would be much better served by clean, systematic minds, with a sense of elegance.”
— Dijkstra (In interview as reported in CACM)

The CRIT framework for identifying cross patterns in systems biology and application to chemogenomics

Abstract

Biological data is often tabular but finding statistically valid connections between entities in a sequence of tables can be problematic – for example, connecting particular entities in a drug property table to gene properties in a second table, using a third table associating genes with drugs. Here we present an approach (CRIT) to find connections such as these and show how it can be applied in a variety of genomic contexts including chemogenomics data.

Citation
Tara A Gianoulis, Ashish Agarwal, Michael Snyder, and Mark Gerstein (2011). The CRIT framework for identifying cross patterns in systems biology and application to chemogenomics, Genome Biology 12(R32):1-12.

Posted in Publications | Tagged | Comments Off

OCaml User Meeting 2011

I just registered for the OCaml User Meeting. Looking forward to seeing everyone in Paris. I’ll also attend the OCaml Hacking Days at IRILL and visit OCamlCore.

MACS 1.4 patch

A bug was introduced in MACS 1.4, which arises if many of the initial reads in an input SAM file are unaligned. You are having this error if MACS is printing something like the following to stderr:

Traceback (most recent call last): File "python/bin/macs14", line 354, in main() File "python/bin/macs14", line 59, in main (treat, control) = load_tag_files_options (options) File "python/bin/macs14", line 323, in load_tag_files_options ttsize = tp.tsize() File "lib/python2.6/site-packages/MACS14/IO/Parser.py", line 655, in tsize return int(s/n) ZeroDivisionError: integer division or modulo by zero

Here’s a patch to fix the problem. You can download the patch to any location. Apply the patch by cd’ing to the directory lib/IO within the MACS source code. Then do

patch < path/to/macs14_Parser.patch.txt

and proceed to install MACS as normal.

Many thanks to David Cittaro, who provided this fix on the MACS mailing list. However, the diff given there is the wrong way around---with the old and new files swapped---so that cannot be used as the patch.

Here’s a simple C program that demonstrates the use of the pthreads library. The following code can be downloaded here, so you can test it yourself. Simply compile with the command:

gcc -lpthread pthreads_test.c -o pthreads_test

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

Now, define a task that takes some non-negligible amount of time to complete. We pass in an id simply to identify each call to the task in output messages.

void *task(int id) {
int i;
double result = 0.0;
for (i = 0; i < 1000000; i++) {
result = result + sin(i) * tan(i);
}
printf("Task %d completed with result %e\n", id, result);
}

We can run the above task a desired number of times in serial by calling the following function:

void *serial(int num_tasks) {
int i;
for (i = 0; i < num_tasks; i++) {
}
}

Now, let’s define task in a manner suitable for being called in its own thread. All this requires is to use pthread_exit to let the parent process know this thread has completed. Actually this doesn’t affect the simple program we’re writing, but it’s the correct thing to do if you want to later join threads. Also, we have to make sure the function signature matches the requirements of pthread_create.

void *threaded_task(void *t) {
long id = (long) t;
}

void *parallel(int num_tasks)
{
int rc;
long t;
for (t = 0; t < num_threads; t++) {
if (rc) {
printf("ERROR: return code from pthread_create() is %d\n", rc);
exit(-1);
}
}
}

The main function runs a specified number of tasks either in serial or parallel.

void *print_usage(int argc, char *argv[]) {
exit(1);
}

int main(int argc, char *argv[]) {
if (argc != 3) {print_usage(argc, argv);}

if (!strcmp(argv[1], "serial")) {
} else if (!strcmp(argv[1], "parallel")) {
}
else {
print_usage(argc, argv);
}

printf("Main completed\n");
}

That’s it! Now, compile the program using the command given at the beginning of this post. Here are some results on a MacBook Air.
 $time ./pthreads_test serial 4 Task 0 started Task 0 completed with result -3.153838e+06 Task 1 started Task 1 completed with result -3.153838e+06 Task 2 started Task 2 completed with result -3.153838e+06 Task 3 started Task 3 completed with result -3.153838e+06 Main completed real 0m0.673s user 0m0.665s sys 0m0.003s $ time ./pthreads_test parallel 4 Creating thread 0 Creating thread 1 Creating thread 2 Creating thread 3 Main completed Thread 1 started Task 1 started Thread 2 started Task 2 started Thread 3 started Task 3 started Thread 0 started Task 0 started Task 3 completed with result -3.153838e+06 Thread 3 done Task 2 completed with result -3.153838e+06 Task 1 completed with result -3.153838e+06 Thread 2 done Thread 1 done Task 0 completed with result -3.153838e+06 Thread 0 done

real 0m0.378s
user 0m0.667s
sys 0m0.007s

Note that threads do not necessarily start or end in order. And the point of it all; there is a factor 2x speedup for the multi-threaded run! Here’s a plot of the real time against number of tasks for 2, 4, 8, 16, 32, and 64 tasks.

You can learn more by following the excellent tutorial provided here, from which the above example was derived.

Posted in Uncategorized | Tagged | 12 Comments

Downloading MACS requires you to either sign up for their mailing list or solve a puzzle. I signed up for their mailing list and then solved the puzzle for fun. Perhaps you will find the answer on my website.