Ashish Agarwal, LaDeana W. Hillier, Joel Rozowsky, David Koppstein, Andrea Sboner, Lukas Habegger, Jeanyoung Jo, Michael Snyder, Philip Green, Valerie Reinke, Robert H. Waterston, Mark Gerstein, “Transcriptome comparison between tiling arrays and next generation sequencing on matched worm samples, towards making optimal use of arrays”. Presentation at ENCODE/modENCODE Consortium Meeting 2009.
Title: Transcriptome comparison between tiling arrays and next generation sequencing on matched worm samples, towards making optimal use of arrays
Abstract:
Sequencing technologies are becoming a viable alternative to traditional microarrays as their cost continues to decrease, so a detailed comparison of the relative strengths of the two technologies is warranted. We investigate the transcriptome of a matched C. elegans sample that was both sequenced and hybridized on a tiling array. We describe a method for comparing the single-base-pair resolution data from sequencing with the probe-level data from an array. Using this method, we compute several correlations of the signal. We find that the raw signals are highly correlated, both across the whole genome and within transcribed regions only. A comparison of differential expression between two samples, measured with each technology, also shows significant agreement.
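One way to picture the comparison of base-pair-resolution sequencing data with probe-level array data is to average read coverage over each probe's genomic interval and then correlate the result with the probe intensities. The Python sketch below does this on made-up numbers. The function names, the half-open interval convention, the averaging rule, and the choice of Pearson correlation are all assumptions made for illustration; the abstract does not specify the method actually used.

```python
from math import sqrt


def probe_level_signal(coverage, probes):
    """Average per-base coverage over each probe interval [start, end).

    ASSUMPTION: simple averaging is an illustrative choice, not the
    presentation's stated method.
    """
    # Prefix sums let us compute each interval total in constant time.
    prefix = [0.0]
    for depth in coverage:
        prefix.append(prefix[-1] + depth)
    return [(prefix[end] - prefix[start]) / (end - start) for start, end in probes]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy data (hypothetical): per-base read depth and three probes of 4 bases each.
coverage = [0, 0, 2, 4, 4, 6, 8, 8, 2, 0, 0, 0]
probes = [(0, 4), (4, 8), (8, 12)]
array_intensity = [1.0, 5.0, 0.5]

seq_signal = probe_level_signal(coverage, probes)
r = pearson(seq_signal, array_intensity)
print(seq_signal, round(r, 3))
```

Restricting the correlation to transcribed regions would amount to passing only the probes that overlap annotated transcripts.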
Next, we compare how well each technology agrees with a gold standard, using ROC-style curves in which the varied parameter is the threshold above which the signal is considered indicative of transcriptional activity. The sequencing data simultaneously provide higher sensitivity and a lower false positive rate. The higher resolution of sequencing also leads to more accurate prediction of exon boundaries. We are able to use the sequencing data as a gold standard to optimally calibrate the parameters required to analyze the tiling array.
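The threshold sweep behind a ROC-style comparison can be sketched as follows. This is a minimal illustration on made-up values: `roc_points`, the `signal` list, and the `gold` labels are hypothetical, and the sketch assumes each region carries a single signal value and a binary gold-standard label.

```python
def roc_points(signal, is_transcribed):
    """Return (threshold, false_positive_rate, sensitivity) for each threshold.

    A region is called transcribed when its signal is >= the threshold.
    ASSUMPTION: thresholds are taken from the observed signal values.
    """
    positives = sum(is_transcribed)
    negatives = len(is_transcribed) - positives
    points = []
    for threshold in sorted(set(signal), reverse=True):
        tp = sum(1 for s, t in zip(signal, is_transcribed) if s >= threshold and t)
        fp = sum(1 for s, t in zip(signal, is_transcribed) if s >= threshold and not t)
        points.append((threshold, fp / negatives, tp / positives))
    return points


# Toy data (hypothetical): signal per region and gold-standard annotation.
signal = [9.0, 7.5, 6.0, 3.0, 2.5, 1.0]
gold = [True, True, False, True, False, False]

points = roc_points(signal, gold)
for threshold, fpr, sensitivity in points:
    print(f"threshold={threshold:4.1f}  FPR={fpr:.2f}  sensitivity={sensitivity:.2f}")
```

Running the same sweep on array signal and on sequencing signal against one gold standard gives two curves; the one lying closer to the top-left corner has higher sensitivity at a given false positive rate.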
Finally, we investigate the extent of cross-hybridization, the most likely artifact leading to false positives, in the tiling array. We quantify its contribution to the signal and show the effects of filtering out the unreliable probes. We also show that the sequencing data contain a much smaller but detectable degree of error from cross-mapping.
The material presented here was later published in BMC Genomics.
Presentation Slides