Random Forest QTL Mapping

Background: The analysis of expression quantitative trait loci (eQTL) is a potentially powerful way to detect transcriptional regulatory relationships at the genomic scale. However, eQTL data sets are often underexploited because legacy QTL methods are used to map the relationship between the expression trait and genotype. These methods are often inappropriate for complex traits such as gene expression, particularly when epistasis is present.
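The point about epistasis can be illustrated with a minimal Python sketch on toy data. It is not taken from the paper or from the RFQTL package: the two markers, the sample size, and the function names are all hypothetical. The trait is purely epistatic (high only when the alleles at the two markers differ), so a marker-by-marker scan sees no effect at either locus, while a model that considers both markers jointly explains all of the variance.

```python
from itertools import product
from statistics import mean

# Toy haploid cross (assumption): all allele combinations (0/1) at two
# hypothetical markers, 25 individuals per combination.
genotypes = [(a, b) for a, b in product((0, 1), repeat=2) for _ in range(25)]

# Purely epistatic expression trait: 1.0 when the alleles differ, else 0.0.
expression = [float(a != b) for a, b in genotypes]


def marginal_effect(marker):
    """Difference in mean expression between the two alleles at one marker."""
    with_allele = mean(y for g, y in zip(genotypes, expression) if g[marker] == 1)
    without_allele = mean(y for g, y in zip(genotypes, expression) if g[marker] == 0)
    return with_allele - without_allele


def joint_variance_explained():
    """Fraction of trait variance explained by the two-marker genotype classes."""
    grand_mean = mean(expression)
    total = sum((y - grand_mean) ** 2 for y in expression)
    residual = 0.0
    for genotype_class in set(genotypes):
        ys = [y for g, y in zip(genotypes, expression) if g == genotype_class]
        class_mean = mean(ys)
        residual += sum((y - class_mean) ** 2 for y in ys)
    return 1.0 - residual / total


single = [marginal_effect(0), marginal_effect(1)]  # one-marker-at-a-time view
joint = joint_variance_explained()                 # both markers considered together
print("marginal effects:", single)
print("variance explained jointly:", joint)
```

In this constructed case the marginal effect of each marker is zero, which is the situation in which methods that can model several markers at once have an advantage over single-locus scans.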

Results: Here we compare legacy QTL mapping methods with several modern multivariate methods and systematically evaluate their ability to produce eQTL that agree with independent external data. We found that the modern multivariate methods (Random Forests, sparse partial least squares, lasso, and elastic net) clearly outperformed the legacy QTL methods (Haley-Knott regression and composite interval mapping) in terms of biological relevance of the mapped eQTL. In particular, we found that our new approach, based on Random Forests, showed superior performance among the multivariate methods.

Conclusions: Benchmarks based on the recapitulation of experimental findings provide valuable insight when selecting the appropriate eQTL mapping method. Our battery of tests suggests that Random Forests map eQTL that are more likely to be validated by independent data, when compared to competing multivariate and legacy eQTL mapping methods.

Efficient Implementation of RF: Michael Kuhn has modified the original Random Forest R-package to be much more efficient. It uses less memory, runs faster, and can easily be parallelized. Otherwise it behaves like the standard package. The code is here.

Paper: Data-driven assessment of eQTL mapping methods.

Version 1:

R-code: functions.R
Tutorial: RF-QTL-tutorial.pdf

Version 2:

R-package: RFQTL
Tutorial: RF-QTL-tutorial.tar.gz

Related Publications

Ackermann M, Sikora-Wohlfeld W, Beyer A. (2013) Impact of natural genetic variation on gene expression dynamics. PLoS Genetics, 9(6):e1003514.

Picotti P, Clément-Ziza M, Lam H, Campbell DS, Schmidt A, Deutsch EW, Röst H, Sun Z, Rinner O, Reiter L, Shen Q, Michaelson JJ, Frei A, Alberti S, Kusebauch U, Wollscheid B, Moritz RL, Beyer A, Aebersold R. (2013) A complete mass spectrometric map of the yeast proteome applied to quantitative trait analysis. Nature, doi:10.1038/nature11835.

Ackermann M, Clément-Ziza M, Michaelson JJ, Beyer A. (2012) Teamwork: improved eQTL mapping using combinations of machine learning methods. PLoS One, 7(7):e40916.

Michaelson JJ, Alberts R, Schughart K, Beyer A. (2010) Data-driven assessment of eQTL mapping methods. BMC Genomics, 11:502.

Michaelson JJ, Loguercio S, Beyer A. (2009) Detection and interpretation of expression quantitative trait loci (eQTL). Methods, 48(3):265-76.