Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues.

Nat Biotechnol
Abstract

With hundreds of epigenomic maps, the opportunity arises to exploit the correlated nature of epigenetic signals, across both marks and samples, for large-scale prediction of additional datasets. Here, we undertake epigenome imputation by leveraging such correlations through an ensemble of regression trees. We impute 4,315 high-resolution signal maps, of which 26% are also experimentally observed. Imputed signal tracks show overall similarity to observed signals and surpass experimental datasets in consistency, recovery of gene annotations and enrichment for disease-associated variants. We use the imputed data to detect low-quality experimental datasets, to find genomic sites with unexpected epigenomic signals, to define high-priority marks for new experiments and to delineate chromatin states in 127 reference epigenomes spanning diverse tissues and cell types. Our imputed datasets provide the most comprehensive human regulatory region annotation to date, and our approach and the ChromImpute software constitute a useful complement to large-scale experimental mapping of epigenomic information.

Year of Publication
2015
Journal
Nat Biotechnol
Volume
33
Issue
4
Pages
364-376
Date Published
2015 Apr
ISSN
1546-1696
DOI
10.1038/nbt.3157
PubMed ID
25690853
PubMed Central ID
PMC4512306
Grant list
R01 GM113708 / GM / NIGMS NIH HHS / United States
R01 HG004037 / HG / NHGRI NIH HHS / United States
U01 HG007610 / HG / NHGRI NIH HHS / United States
U54 HG004570 / HG / NHGRI NIH HHS / United States
RC1 HG005334 / HG / NHGRI NIH HHS / United States
U41 HG007000 / HG / NHGRI NIH HHS / United States