Allele-specific bisulfite sequencing


When evaluating methylation status at various CpG sites after sequencing, how much consideration should one give to random single base pair insertions and deletions? Suppose there is a CA dinucleotide: can we assume that the CA is native to the sequence, or might it result from a G deletion, especially when a deletion is suspected after comparing with other sequences? Is there a set standard sequence to compare against?


I agree with Vance, we need a little more detail to better answer your question. From what I can tell, you are asking whether a single nucleotide polymorphism (SNP) that results in the addition of a cytosine base to your sequence is of concern when examining the methylation signature of that sequence.

I would first determine whether the polymorphism is common in the population using NCBI. The more common the polymorphism, the less likely it is to have a serious effect. However, that is not to say that it doesn't have any effect.

I would then examine how much the methylation of that one site varies compared to samples without the additional cytosine base.

Finally I would determine the importance of the location of the SNP. Is it located in a CpG island? Or is it located on the shores, shelves or 'open sea'? (see work by Sandoval et al.)

Without knowing all the above it is hard to make a call on how important/unimportant the SNP is in your methylation analysis.


Allele-Specific Transcriptome and Methylome Analysis Reveals Stable Inheritance and Cis-Regulation of DNA Methylation in Nasonia

Affiliations Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, United States of America, Cornell Center for Comparative and Population Genomics, Cornell University, Ithaca, New York, United States of America

Affiliation Department of Biology, University of Rochester, Rochester, New York, United States of America

Affiliations Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, United States of America, Cornell Center for Comparative and Population Genomics, Cornell University, Ithaca, New York, United States of America


Generating Epireads

The original epiread format (proposed by the methpipe team) only includes information related to CpGs. BISCUIT extends this format to also include SNP information. The columns in the BISCUIT epiread format indicate:

  1. Chromosome name
  2. Read name
  3. Read position in paired-end sequencing
  4. Bisulfite strand (bisulfite Watson (+) or bisulfite Crick (-))
  5. Position of the cytosine in the first CpG (0-based)
  6. Retention pattern (“C” for retention or “T” for conversion) for all CpGs covered
  7. Position of the first SNP, if a SNP location file is provided
  8. Base call of all SNPs covered

An example of the epiread format is:

To produce an epiread formatted file, you need to run the epiread subcommand with a BAM file, a FASTA file of the reference genome, and, optionally, a BED file for SNPs.

The SNP BED file can be obtained by running biscuit vcf2bed -t snp my_pileup.vcf.gz. If no SNP file is supplied, the output does not include the extra columns related to SNPs. To get back the original epiread format, run cut -f 1,5,6 on the output epiread file.

To test all SNP-CpG pairs, include the -P flag in your command. Note that if you are looking for allele-specific methylation, you will need to run with the -P flag and include a SNP BED file. For more help on available flags, run biscuit epiread in the terminal.
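The eight columns listed above can be read with a few lines of Python. This is a minimal sketch, not part of BISCUIT: the `Epiread` class, the `parse_epiread` and `to_original` helpers, and the example record are all hypothetical, and it assumes tab-separated fields (as the cut -f command implies) and that a missing position is written as "." (an assumption).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Epiread:
    chrom: str
    read_name: str
    read_number: str          # read position in paired-end sequencing
    bs_strand: str            # "+" (bisulfite Watson) or "-" (bisulfite Crick)
    cpg_start: Optional[int]  # 0-based position of the cytosine in the first CpG
    cpg_pattern: str          # "C" = retention, "T" = conversion, one per CpG
    snp_start: Optional[int] = None
    snp_calls: Optional[str] = None


def _pos(field: str) -> Optional[int]:
    # Assumption: "." marks a missing position.
    return None if field == "." else int(field)


def parse_epiread(line: str) -> Epiread:
    """Parse one tab-separated record with 6 (no SNP file) or 8 columns."""
    f = line.rstrip("\n").split("\t")
    if len(f) not in (6, 8):
        raise ValueError(f"expected 6 or 8 columns, got {len(f)}")
    rec = Epiread(f[0], f[1], f[2], f[3], _pos(f[4]), f[5])
    if len(f) == 8:
        rec.snp_start, rec.snp_calls = _pos(f[6]), f[7]
    return rec


def to_original(rec: Epiread) -> Tuple[str, Optional[int], str]:
    """Equivalent of `cut -f 1,5,6`: chromosome, first CpG position, pattern."""
    return rec.chrom, rec.cpg_start, rec.cpg_pattern


# Hypothetical record, made up for illustration only.
example = "chr1\tread_001\t1\t+\t10500\tCCTC\t10520\tA"
record = parse_epiread(example)
```

Keeping the SNP fields optional lets the same parser handle output produced with or without a SNP BED file.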


Semisimulated Allele-Specific Methylation Data

We conducted simulations to evaluate how the performance of our model relates to several critical parameters of the underlying dataset. To reflect performance characteristics on real datasets, we used a strategy called “semisimulated” data. The locations of mapped reads were taken from real data, as were the locations of CpGs within reads and the underlying reference genome. The methylation states inside those reads were determined according to randomly generated allele-specific or single-allele methylation profiles. Briefly, within a region designated as an AMR, we randomly generated two methylation profiles by sampling individual CpG methylation levels as beta variates skewed toward 0 or 1. Then we assigned each read with equal probability to one of the two alleles, and the methylation states of the CpGs within the read were sampled according to probabilities given by the methylation profile corresponding to that allele. A full description of this procedure is provided in the SI Text.
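The procedure described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the Beta parameters (0.5, 5.0), the choice of one profile skewed high and one skewed low, the toy read layout, and the function names `make_profile` and `simulate_reads` are all hypothetical. In the real procedure the read and CpG locations come from mapped data.

```python
import random


def make_profile(n_cpgs, high, rng, a=0.5, b=5.0):
    """Per-CpG methylation levels drawn from a skewed Beta distribution.

    Beta(a, b) with a < b is skewed toward 0; mirroring it gives a skew
    toward 1. The parameter values are illustrative assumptions.
    """
    levels = []
    for _ in range(n_cpgs):
        x = rng.betavariate(a, b)
        levels.append(1.0 - x if high else x)
    return levels


def simulate_reads(read_cpgs, profiles, rng):
    """Assign each read to an allele with probability 1/2, then sample
    each covered CpG as methylated ("C") or unmethylated ("T")."""
    simulated = []
    for cpgs in read_cpgs:
        allele = rng.randrange(2)
        states = "".join(
            "C" if rng.random() < profiles[allele][i] else "T" for i in cpgs
        )
        simulated.append((allele, states))
    return simulated


rng = random.Random(0)
n_cpgs = 6
# Two allele-specific profiles within a hypothetical AMR.
profiles = [make_profile(n_cpgs, True, rng), make_profile(n_cpgs, False, rng)]
# Toy stand-in for real read locations: CpG indices covered by each read.
read_cpgs = [[0, 1, 2], [1, 2, 3], [2, 3, 4, 5], [0, 1], [3, 4, 5]]
reads = simulate_reads(read_cpgs, profiles, rng)
```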

With current methylomes from BS-seq, we expected the variation in coverage along chromosomes to be a critical factor for the performance of our model. In addition, the variation in inter-CpG distance may prevent our method from capturing ASM in regions of low CpG density for a fixed read length. We examined how well our method could identify ASM in a given genomic interval by manipulating three independent variables:

Mean coverages were 5×, 10×, and 15×, corresponding to current methylomes from BS-seq.

Read lengths were 50, 100, and 150 bases, corresponding roughly to current short-read sequencing technologies.

CpG density distributions took three different settings: CpG islands (CGIs) defined as in ref. 23, non-CGI promoters defined as 1 kb upstream of transcription start site (TSS) in National Center for Biotechnology Information reference sequences but not CGIs, and randomly sampled genomic background with CpG density (observed/expected) between 0.2 and 0.4.

Details concerning the number of simulated datasets for each parameter combination can be found in the SI Text.

Specificity was generally very high (approximately 99%) for all simulation parameter combinations, reflecting our conservative model selection criterion (Eqs. 4 and 5). In contrast, sensitivity showed greater dependence on properties of the datasets. Sensitivity was higher for regions of higher CpG density, as expected because our model depends on the relationships between CpG states inside a read. As shown in Fig. 1, inside CGIs sensitivity reached above 95% for all read lengths when the mean coverage was above 10×. Sensitivity reached approximately 70% for intergenic regions but required both 10× coverage and read length 100, which compensates for the decrease in CpG density. As expected, greater coverage and read length improved accuracy, and the effect of read length is equivalent to that of CpG density. These results indicate that methylomes with read lengths around 100 bp and mean coverage above 10× appear sufficient for our model to accurately identify ASM. These criteria are met by most existing methylomes from BS-seq experiments.

Fig. 1. Sensitivity of AMR identification based on semisimulated data. Coverages of 5, 10, and 15×, and read lengths of 50, 100, and 150 bp were used. CpG densities were controlled by simulating within (A) CGIs, (B) non-CGI promoter regions, and (C) non-CGI intergenic regions.


Allele-specific methylation occurs at genetic variants associated with complex disease

We hypothesize that the phenomenon of allele-specific methylation (ASM) may underlie the phenotypic effects of multiple variants identified by Genome-Wide Association studies (GWAS). We evaluate ASM in a human population and document its genome-wide patterns in an initial screen at up to 380,678 sites within the genome, or up to 5% of the total genomic CpGs. We show that while substantial inter-individual variation exists, 5% of assessed sites show evidence of ASM in at least six samples; the majority of these events (81%) are under genetic influence. Many of these cis-regulated ASM variants are also eQTLs in peripheral blood mononuclear cells and monocytes and/or in high linkage-disequilibrium with variants linked to complex disease. Finally, focusing on autoimmune phenotypes, we extend this initial screen to confirm the association of cis-regulated ASM with multiple complex disease-associated variants in an independent population using next-generation bisulfite sequencing. These four variants are implicated in complex phenotypes such as ulcerative colitis and AIDS progression disease (rs10491434), Celiac disease (rs2762051), Crohn's disease, IgA nephropathy and early-onset inflammatory bowel disease (rs713875) and height (rs6569648). Our results suggest cis-regulated ASM may provide a mechanistic link between the non-coding genetic changes and phenotypic variation observed in these diseases and further suggest a route to integrating DNA methylation status with GWAS results.

Conflict of interest statement

Competing Interests: We have the following interests. This study was funded in part by Massachusetts Lions Eye Research Fund, Inc., Research to Prevent Blindness, Inc., and New England Eye Center. There are no patents, products in development or marketed products to declare. This does not alter our adherence to all the PLOS ONE policies on sharing data and materials, as detailed online in the guide for authors.

Figures

Figure 1. Microarray based detection of allele-specific methylation.

Figure 2. Types of allele-specific methylation candidates.

Figure 3. Cis-regulated allele-specific methylation confirmation in an independent population.

Figure 4. Genomic context of cis-regulated allele-specific methylation events.


NanoNOMe Combines Nanopore and NOMe-seq to Mine the Allele-specific Epigenome

Gnomes are small magical creatures that mine for treasures, so you might wonder if a nanoNOMe is an even tinier being. Well actually it’s a new molecular technique, but its applications are nothing short of magical. Several methods have been developed to probe DNA methylation, transcription factors, and chromatin state simultaneously. Many are based on the principle of NOMe-seq, which uses a GpC methyltransferase to label open chromatin, allowing bisulfite sequencing to identify both these sites and DNA methylation at CpG sites. The major advantage of these approaches is the single-molecule data output, meaning that DNA methylation and nucleosome occupancy data come from the same DNA strand. A challenge of these methods has been that most sequencing technologies have short reads, meaning this valuable single-strand information is very limited in scale.

The lab of Winston Timp at Johns Hopkins University wanted to expand the read length of these combinatory methods. To do this, they used nanopore sequencing. Nanopore is unique among sequencing technologies for its very long reads (>10 kb) of unamplified DNA. In previous work, Dr. Timp’s lab has shown that nanopore sequencing can accurately call DNA methylation using their software Nanopolish. This is a distinct challenge when compared to bisulfite sequencing, since in this case the DNA is unamplified and thus the nanopore sequencer directly detects the 5mC nucleotide itself. In this new study, they combined nanopore sequencing with GpC methylation in a method they call nanopore sequencing of nucleosome occupancy and methylome (nanoNOMe). They applied this approach in four human cell lines. Here’s what they found on their mining expedition:

  • Long reads allow interrogation of repetitive elements, which is difficult with other methods
    • Of all repetitive elements, only Alu elements show increased methylation, and chromatin accessibility is reduced at all repetitive elements, especially in LINE and LTR regions
    • This approach could also be used to explore epigenetic effects on mutated vs. wild type alleles

    At its core, nanoNOMe adds an exogenous layer of information to the DNA itself, which can be used to store information about nucleosome occupancy in the cell and generate a fully phased human epigenome. This is then read out along long single molecules using nanopore sequencing. We could imagine many uses for this approach across basic science and disease models. So, if you are mining the epigeNome looking for your own treasures, consider going on a nanoNOMe expedition.


    Results

    Patterns of AIs across epigenomic marks

    To explore the effects of genetic variation on the epigenome, the National Institutes of Health (NIH) Roadmap Epigenomics Project (17) has now completed whole-genome sequencing (WGS) on genomes of 13 donors and published NIH Roadmap reference epigenomes from 71 combined samples that collectively represent 27 distinct tissue types and nine cell types (fig. S1). For accurate identification of heterozygous genomic loci, we sequenced the donor genomes (18). Eight assays were included in most of the samples and used for AI detection: WGBS, RNA sequencing (RNA-seq), and chromatin immunoprecipitation sequencing (ChIP-seq) for six different histone marks (H3K4me3, H3K4me1, H3K36me3, H3K27me3, H3K9me3, and H3K27ac) (fig. S1). We performed allele-specific methylation (ASM) analysis at heterozygous single-nucleotide polymorphism (SNP) loci within the 49 WGBS methylomes using a threshold of absolute methylation difference of >30% between alleles and by estimating significance by means of Fisher’s exact test on the counts of methylated and unmethylated cytosines observed on the same sequencing read with each of the two SNP alleles (fig. S2A) (18). We performed the identification of AIs for histone marks and transcription using the AlleleSeq pipeline (fig. S2B) (18, 19).
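The ASM call described above (an absolute methylation difference of more than 30% between alleles, plus Fisher’s exact test on methylated and unmethylated counts) can be sketched with the standard library. This is an illustrative reimplementation, not the authors' pipeline: the function names, the example counts, and the significance cutoff `alpha=0.05` are assumptions, since the text does not state the cutoff used. Each allele must have at least one observed cytosine.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    total = sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


def call_asm(meth1, unmeth1, meth2, unmeth2, min_diff=0.30, alpha=0.05):
    """Return (methylation difference, p-value, ASM call) for one het SNP.

    min_diff follows the >30% threshold in the text; alpha is an assumption.
    """
    diff = abs(meth1 / (meth1 + unmeth1) - meth2 / (meth2 + unmeth2))
    p = fisher_exact_two_sided(meth1, unmeth1, meth2, unmeth2)
    return diff, p, (diff > min_diff and p < alpha)


# Hypothetical counts: allele 1 has 9 methylated and 1 unmethylated
# cytosines on its reads; allele 2 has 1 methylated and 9 unmethylated.
diff, p_value, is_asm = call_asm(9, 1, 1, 9)
```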

    Considering the AIs in all the marks, the imbalances in DNA methylation were by far the most abundant (table S1 and fig. S3), largely because of the genome-wide distribution of DNA methylation, in contrast to the uneven genomic distribution of other marks. Among the histone marks, H3K27ac had more imbalance calls than others (table S1), in part owing to deeper ChIP-seq coverage for H3K27ac (table S1 and fig. S3). At promoters, H3K27ac and H3K4me3 marks were more abundant on the allele with less DNA methylation (Fig. 1A). Conversely, H3K9me3 signal was more abundant on the allele with more methylation in promoters (Fig. 1A). At enhancers, H3K27ac tended to occur more often on the allele with less DNA methylation (Fig. 1A). We also detected, at high specificity, enrichment of AIs in methylation and coordinated changes in transcription and histone marks within a majority of those imprinted loci that included a heterozygous SNP (figs. S4 and S5) (18).

    (A) Number of AIs in histone marks and transcription, overlapping ASM loci, over classes of genomic elements. (B to E) Proportions of SD-ASM loci over total heterozygous loci in 200-bp bins near promoters, CpG islands, and enhancers.

    We next evaluated the extent of sequence-dependent ASM (SD-ASM). Consistent with genetic effects in cis (6, 20–22), co-occurrence of ASM at the same heterozygous locus across different samples was higher than expected by chance under a permutation-based null model (fig. S6A). The degree of co-occurrence of ASM tended to be higher for pairs of samples across tissues of the same individual than between pairs from the same tissue across different individuals, which was higher than for samples without matching tissue or individual (fig. S6B). Low concordance in ASM calls between individuals may be due to local haplotype context, epigenetic drift, or other nongenetic factors (3, 4, 6, 20, 22, 23). Gaussian mixture modeling (18) showed that allelic differences in methylation (above the 30% threshold) at heterozygous SNPs had a tendency to occur in the same direction (the same allele showing higher methylation than the other) across pairs of samples (fig. S6, C to E).

    In order to increase the power to detect SD-ASM at high sensitivity, we pooled the reads across all 49 methylomes and applied the same detection method as for individual samples (fig. S7A) (18). The deep coverage of the combined set (1691-fold total coverage in bisulfite sequencing reads in the combined set of 49 methylomes) increased our power to detect those sequence-associated AIs that were detectable across different tissues and donors (fig. S7, B to D), whereas our power to detect tissue-dependent and donor-dependent SD-ASM was reduced (fig. S8, A and B). The number of accessible heterozygous loci (those having at least six counts per allele) for SD-ASM determination rose after pooling to 4,913,361, increasing our SD-ASM mapping resolution, measured as an average distance between “index hets”, to 600 base pairs (bp). At the 30% methylation difference default threshold, AIs were detected at 5% of index hets; lowering the threshold to 20%, a total of

    Sensitivity of the methylome to genetic variation varies across classes of genomic elements

    We next explored whether SD-ASM had the tendency to occur within any particular type of genomic element. Using the reads pooled across the 49 methylomes, we observed depletion of SD-ASM within promoters containing CpG islands (Fig. 1B), as well as within CpG islands in general (Fig. 1C), which is consistent with observations that ASM is depleted in CpG islands (4, 21) and that mQTLs are depleted within promoters of genes within CpG islands (24, 25) and that expression quantitative trait methylation is enriched within CpG island shores and not in CpG islands themselves (26). By contrast, and mirroring previous mQTL patterns (25), promoters of genes not in CpG islands showed high levels of SD-ASM (Fig. 1D). We also observed enrichment of SD-ASM downstream from the promoter and into the gene body (Fig. 1, B and D) and positive association between allele-specific expression (ASE) and ASM over exons (Fig. 1A), which is consistent with higher methylation of actively transcribed regions, including those on the X chromosome (27) and with the enrichment of mQTLs in regions flanking the transcription start site (TSS) (23, 28). One factor contributing to the ASM, particularly near the transcription start sites (Fig. 1D), may be the presence of transcriptional regulatory signals (29).

    SD-ASM was also highly enriched within enhancers (Fig. 1E), which is consistent with previous reports (24, 28). The abundance of TF binding sites within enhancers suggests that SD-ASM may result from disruption of TF binding (23). Under that assumption, our data suggest that TF binding at CpG islands and CpG-rich promoters is buffered against genetic perturbations, whereas the TF binding to non-CpG promoters and enhancers is most sensitive. We also observed a somewhat puzzling mild depletion of SD-ASM in the flanking regions of enhancers (Fig. 1E), which also suggests buffering in those regions.

    SD-ASM is attributable to differences between allele-specific epiallele frequency spectra

    We next asked whether the lack of buffering at SD-ASM loci may result in excess stochasticity and metastability, which is defined by the presence of more than one stable state, each stable state corresponding to an epiallele (single-chromosome methylation pattern). To answer this question, we made use of the deep combined WGBS read coverage across 49 methylomes (table S2) and the fact that each read relates a single variant to a single epiallele. We assessed epialleles by scoring the methylation status of four homozygous CpG sites (2^4 = 16 possible epialleles) that were the closest to each index het in individual WGBS reads (13, 14) (Fig. 2A). [Our use of the term “epiallele” follows the most recent usage (12, 13) and does not comply with the original definition (30), which implies intergenerational inheritance. Our use of the term “metastability” is consistent with its use in dynamical systems theory and does not imply inheritance of an epiallele during cell division.]

    (A) Example of an epiallele frequency spectrum (bottom) derived from observed epialleles in WGBS reads (top). (B) Histograms of Shannon entropy, in bits, for the epiallele frequency spectra for the hets showing SD-ASM (red) and the nearest (control) hets without SD-ASM (black). (C) Most heterozygous loci with two frequent epialleles show SD-ASM and have entropy larger than 1.7 bits (red portion of the bar), the two epialleles being biphasic (fully methylated or fully unmethylated) 71.7% of the time. The callout on the right provides an example of a het in which the difference between epiallele frequency spectra of allele 1 (A, orange) and allele 2 (G, blue) explains SD-ASM. (D) Histogram of coefficients of constraint for SD-ASM loci with two frequent epialleles. The callouts illustrate an example het (T/C, top right callout) with a low coefficient of constraint, and another (G/C, bottom right callout) with a high coefficient of constraint. (E) Illustration of buffering in contrast to ergodic/periodic and mosaic metastability.

    To quantify the amount of stochasticity at index het loci, we used Shannon entropy (18). The entropy values ranged from 0 to 4: An even distribution of frequencies across the 16 possible epiallele patterns produces a maximum entropy score of 4 bits, whereas a complete absence of stochasticity because of maximal “buffering” implies just one epiallele with nonzero frequency and an entropy score of 0 bits. To assess quantitatively any differences in buffering (lack of sensitivity to genetic variation) between SD-ASM and control loci, we identified SD-ASM loci that had sufficient coverage and a close index het without ASM and compared entropies. A total of 6619 (2.7%) of 241,360 loci with SD-ASM met the two criteria (18). We observed a striking difference in entropy, providing a quantitative assessment of the higher stochasticity at the SD-ASM versus control loci (Fig. 2B).
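The two extremes quoted above (4 bits for an even spread over 16 epialleles, 0 bits for a single epiallele) follow directly from the Shannon entropy formula, H = Σ p·log2(1/p). A small sketch, with made-up read counts and a hypothetical function name:

```python
from itertools import product
from math import log2


def shannon_entropy(counts):
    """Shannon entropy in bits of an epiallele frequency spectrum.

    `counts` maps an epiallele pattern (e.g. "CCTT") to its read count.
    """
    total = sum(counts.values())
    return sum(
        (c / total) * log2(total / c) for c in counts.values() if c > 0
    )


# All 16 patterns over four CpGs ("C" methylated, "T" unmethylated),
# each observed equally often: maximal stochasticity.
uniform = {"".join(p): 10 for p in product("CT", repeat=4)}
# A fully buffered locus: one epiallele accounts for every read.
buffered = {"CCCC": 160}

h_uniform = shannon_entropy(uniform)    # 4 bits
h_buffered = shannon_entropy(buffered)  # 0 bits
```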

    We next examined enrichment for epigenetic polymorphisms at SD-ASM loci. We estimated the number of frequent epialleles for each locus by sorting the epialleles from the most to the least frequent and identified the minimal-size “top-list” of epialleles that accounted for at least 60% of all the reads with ascertained epialleles. In contrast to the control loci, which typically had only one high-frequency epiallele on the “top-list” and were therefore not epigenetically polymorphic, SD-ASM loci showed multiple frequent epialleles—in most cases, just two (Fig. 2C). By examining the top pairs of epialleles, we found that 71.7% of the pairs consisted of one that was completely methylated and another completely unmethylated (Fig. 2C). This is concordant with previous reports of biphasic (fully methylated and fully unmethylated) distributions of methylation in amplicons with high interindividual methylation variance and in polymerase chain reaction clones with bimodal methylation patterns (3, 31). AIs at SD-ASM loci could be traced to shifts in epiallele frequency spectra between alleles, typically shifts in relative frequencies of the fully methylated and fully unmethylated epialleles (Fig. 2C). We validated the observed excess of stochasticity and the enrichment for the biphasic pattern at SD-ASM loci using an independent WGBS dataset from the Encyclopedia of DNA Elements (ENCODE) (fig. S9, A to C) (18).
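The “top-list” construction above (sort epialleles by frequency, then keep the smallest set covering at least 60% of reads) and the biphasic check can be sketched as follows. The function names and read counts are hypothetical, and ties between equally frequent epialleles are broken arbitrarily here, which the text does not specify.

```python
def top_list(counts, fraction=0.6):
    """Smallest set of most frequent epialleles covering `fraction` of reads."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    chosen, covered = [], 0
    for pattern, count in ranked:
        chosen.append(pattern)
        covered += count
        if covered >= fraction * total:
            break
    return chosen


def is_biphasic(patterns):
    """True if the top-list is one fully methylated and one fully
    unmethylated epiallele over four CpGs."""
    return set(patterns) == {"CCCC", "TTTT"}


# Made-up spectra for illustration.
sd_asm_locus = {"CCCC": 45, "TTTT": 40, "CTCC": 10, "TTCT": 5}
control_locus = {"CCCC": 90, "CCCT": 10}

sd_top = top_list(sd_asm_locus)        # two frequent epialleles
control_top = top_list(control_locus)  # a single frequent epiallele
```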

    We next quantified the relationship between genetic variation and stochastic epialleles. At each locus, we estimated the probabilities of epialleles for each allele (higher probabilities are indicated by thicker arrows in Fig. 2, C and D). We then quantified the degree to which genetic alleles determine epiallele frequencies using a coefficient of constraint (18), an information-theoretic measure that is a generalization of the R² coefficient of determination that is commonly used in genetics and is more appropriate for quantifying genetic determination of stochastic phenotypes. A larger coefficient of constraint signifies that epigenetic variation is more constrained and determined by genetic variation in cis. Intuitively, a larger coefficient of constraint indicates a larger difference in the epiallelic frequency spectra corresponding to the two alleles, implying a higher degree of determination of epiallele frequency spectra by the genetic alleles (Fig. 2D).
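The exact formula is given in the paper's supplementary methods (18) and is not reproduced here. As a rough illustration, the sketch below assumes the common textbook definition of the coefficient of constraint (also called the uncertainty coefficient): the mutual information between genetic allele G and epiallele E, divided by the entropy of E, that is, (H(E) − H(E|G)) / H(E). The function names and counts are hypothetical.

```python
from math import log2


def _entropy(counts):
    """Shannon entropy in bits of a list of non-negative counts."""
    total = sum(counts)
    return sum((c / total) * log2(total / c) for c in counts if c > 0)


def coefficient_of_constraint(spectra):
    """(H(E) - H(E|G)) / H(E), assuming the uncertainty-coefficient definition.

    `spectra` maps each genetic allele to a dict of epiallele read counts.
    Returns 0.0 when only one epiallele is observed overall.
    """
    patterns = sorted({p for s in spectra.values() for p in s})
    n = sum(sum(s.values()) for s in spectra.values())
    marginal = [sum(s.get(p, 0) for s in spectra.values()) for p in patterns]
    h_e = _entropy(marginal)
    if h_e == 0:
        return 0.0
    h_e_given_g = sum(
        (sum(s.values()) / n) * _entropy(list(s.values()))
        for s in spectra.values()
        if sum(s.values()) > 0
    )
    return (h_e - h_e_given_g) / h_e


# Identical spectra on both alleles: genotype tells us nothing.
same = {"A": {"CCCC": 5, "TTTT": 5}, "G": {"CCCC": 5, "TTTT": 5}}
# Allele fully determines the epiallele.
determined = {"A": {"CCCC": 10}, "G": {"TTTT": 10}}

c_same = coefficient_of_constraint(same)
c_determined = coefficient_of_constraint(determined)
```

Under this definition the value is 0 for identical spectra, matching the behaviour the text describes, and 1 when the allele fully determines the epiallele.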

    There are two general mechanistic models that could explain the effect of sequence variation in cis on epiallele frequency spectra. The ergodic/periodic model stipulates ongoing switching between metastable states, the transitions being stochastic with a possible component of periodicity, such as circadian oscillations. If a sufficient number of stochastic transitions from one epiallele to another occur, that epiallele frequency spectrum depends largely on the sequence-dependent shape of the current energy landscape (state transition probabilities) and not on the epigenetic memory of past events (Fig. 2E). By contrast, the mosaic model stipulates that epialleles are stably transmitted over time and even during cell division, being “frozen” after a period of initial metastability into one of the stable states. Both models entail a period of metastability, whether past (mosaicism) or current (ergodic/periodic model).

    CTCF binding loci show sequence-dependent stochastic switching and looping

    Because of its association with DNA methylation at a large number of binding sites, we next examined the role of CCCTC-binding factor (CTCF) in creating the metastable states that correspond to epialleles. Metastability is known to be created by positive (including double-negative) feedback loops (32) that in our case also include interactions in cis, such as the protection against DNA methylation by CTCF binding and reciprocal preference of CTCF for unmethylated DNA (33). The first indication of the role of CTCF binding in metastability came from the observation that the heterozygote with the larger coefficient of constraint (G/C het) also showed larger differences in predicted CTCF binding affinity between the two alleles than the other (T/C het) (Fig. 2D). Considering that the coefficient of constraint is proportional to the differences in epiallele frequency spectra for the two alleles (identical epiallele frequency spectra resulting in coefficient of constraint value of 0), this observation suggested a positive correlation between the coefficient of constraint and the differences in CTCF binding affinity for the two genetic alleles, which was indeed observed (Fig. 3A). In terms of the epigenetic landscape distortion due to genetic variation, we see that sequence variants that show larger differences in CTCF binding affinity also show greater differences in their epigenetic (energy) landscapes, as reflected in the more prominent shifts between alleles in their occupancy of metastable states (as measured by higher values of coefficient of constraint) (Fig. 3A, top). Because CTCF binding and demethylation of its binding site are mutually reinforcing (forming a positive-feedback loop and a metastable state), the model also predicts that the variants associated with higher CTCF binding affinities will show lower methylation, which is indeed the case (Fig. 3A, bottom) as previously observed (23, 24). 
Taken together, these results suggest sequence-dependent stochastic epigenetic switching between metastable states that is mediated by CTCF binding.

    (A) (Top) Correlation between absolute CTCF binding affinity differences, based on position weight matrix scores (PWMs), and the coefficient of constraint for predicted CTCF binding sites with SD-ASM, two frequent epialleles, and a biphasic methylation pattern. (Bottom) Correlation between CTCF binding affinity and DNA methylation at predicted CTCF binding sites. (B) SD-ASM is more predictive of allelic looping (28 true positive of 44 predictions) than motif disruption scores (1 true positive of 44 predictions). To control for specificity, thresholds were selected so that both methods predicted the same number of hets (44) to show allelic looping. (C) SD-ASM at binding sites of 377 TFs defined with the SELEX method. The pie chart (left) and the table (right) indicate both enrichments and directionality trends using a shared color code. (D) (Top) Correlation between absolute ELK3 binding affinity differences and the coefficient of constraint for predicted binding sites with SD-ASM, two frequent epialleles, and a biphasic methylation pattern. (Bottom) Correlation between ELK3 binding affinity and DNA methylation at predicted ELK3 binding sites. (E) A mechanistic model of a sequence-dependent energy landscape with two metastable states: allele 1 (top row), corresponding to a landscape where the most frequently occupied metastable state corresponds to a completely unmethylated epiallele, and allele 2 (bottom row), corresponding to a landscape where the most frequently occupied metastable state corresponds to a completely methylated epiallele. Putative positive-feedback loops involving interactions between TF binding and binding site methylation are indicated for CTCF. An alternative model involving competitive binding of two TFs is indicated on the right. Significance of correlations was tested by using Student’s t test.

    Because the CTCF TF establishes chromatin loops (34), we asked whether the allelic state of methylation also coincided with allelic looping. Toward this goal, we used a study (35) that reports heterozygous SNP loci that associate both with allelic CTCF binding and allelic chromatin looping, as determined by means of chromatin interaction analysis by paired-end tag sequencing (ChIA-PET). Indeed, a total of 44 of those SNP loci were also present in our dataset. Comparing our signals for the methylation state of CTCF binding sites with the predicted CTCF motif disruption scores suggested that SD-ASM is a more accurate indicator of allelic CTCF binding and looping than the motif disruption score (Fig. 3B) (18).

    TF binding sites show sequence-dependent shifts in epiallele frequency spectra and AIs

    Analyses of ASM at regulatory elements and eQTLs revealed associations between ASM and allele-specific histone marks with downstream allele-specific transcription (fig. S10, A to F) (18). These results complemented previous studies (22, 23) and suggested involvement of allele-specific TF binding and cofactors in ASM. To examine the role of allele-specific TF binding, we focused on the set of 377 TFs assessed for binding affinity using the high-throughput systematic evolution of ligands by exponential enrichment (SELEX) method (36). As for CTCF, we identified the subset of binding motif loci in a heterozygous state with two frequent epialleles and examined the correlation between coefficient of constraint and difference in predicted allelic binding affinities across these loci for each TF (table S3). Because of the relatively small number of such loci per TF, only 13 showed significant individual P values (Student’s t test, P < 0.05), with only CTCF surviving Bonferroni correction (for testing 377 TFs) (table S3). However, a majority (11 of 13) of the TFs that showed individually significant correlation also showed positive correlation (P = 0.01, binomial test), which is consistent with the pattern observed for CTCF where larger differences in TF binding affinities correspond to larger distortions in the configuration of metastable states within the landscape (table S3).
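The reported binomial P value can be checked by hand. Assuming a one-sided test against a null probability of 0.5 for a positive correlation (the sidedness is an assumption, not stated in the text), the chance of 11 or more positives out of 13 is (78 + 13 + 1) / 8192, about 0.011, consistent with the reported P = 0.01. The function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 11 of the 13 individually significant TFs showed a positive correlation.
p_one_sided = binomial_upper_tail(11, 13)
```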

    Likewise, we next examined for all 377 TFs whether disruptions of their predicted binding sites associated with methylation imbalances. A majority (241) showed SD-ASM enrichment within their binding motifs compared with flanking loci (500 bp on each side) (Fig. 3C and table S4), suggesting that TF binding associates with allelic DNA methylation. The SD-ASM outside of the examined motifs may be attributable to sequence variation within undiscovered binding loci, within motifs of noncoding RNAs, or within loci in physical proximity or contact with regions of perturbed TF activity.

    We then examined the relation between allelic differences in motif strengths and methylation levels at SD-ASM loci (18). We observed that for more than half of the TFs tested (207), there was an association between motif strength and level of methylation (Fig. 3C). Most TFs (159) showed gain in methylation on the allele with the disrupted motif, which is consistent with the TF binding either protecting a region from passive methylation (37) or causing active demethylation (Fig. 3C) (38). By contrast, a smaller number of TFs (48), including members of TF families that recruit methyltransferases such as the ETS-domain TF family members (39, 40), showed loss of methylation on the allele with the disrupted motif (Fig. 3, C and D). About a quarter of TFs that show enrichment for SD-ASM show no bias in directionality (table S4), the lack of bias being explainable by contextual behavior at different binding loci, such as for nuclear factor of activated T cells 1 (41) or because of competing TFs at overlapping motifs. Our results support that TF motif sequences are predictive of proximal CpG methylation levels (23, 42, 43).

    We sought to validate the downstream functional consequences of SD-ASM variants, with predicted allelic differences in TF binding, using a luciferase assay. We prioritized cis-overlapping motifs (CisOMs), including those of c-MYC proto-oncogene (cMYC) and tumor suppressor p53 (TP53) that show competitive binding at many loci (44), because CisOMs provide one of the mechanisms of metastability (Fig. 3E), and also those that may have consequences for human disease (table S5). All four SNP validations showed allelic effects on luciferase expression, including two SNPs within CisOMs for cMYC and TP53 and some falling within disease-associated loci (fig. S11) (18), which suggests that SD-ASM helps identify those disease-associated variants that also have functional consequences.

    SD-ASM is enriched near disease-associated loci

    We observed that heterozygous variants with SD-ASM were enriched in the neighborhood of variants previously reported as significant in genome-wide association studies (GWASs) of common disease (Fig. 4A) (22, 23, 45). The enrichment was stronger around GWAS variants that have been replicated in multiple studies versus those that have not. To explore more specifically the role of enhancers, we performed a similar enrichment analysis focusing only on GWAS and SD-ASM variants overlapping enhancer elements. Enhancers that contain replicated GWAS variants were significantly (P < 0.0001, χ² test) more likely to also contain a variant with SD-ASM than enhancers that did not contain replicated GWAS variants (Fig. 4B). Taken together, these results indicate that AIs provide information about the role of specific loci in common diseases, pointing to the loci that are sensitive to the effects of genetic variation and have functional effects. The enrichment of both GWAS loci and AIs at enhancers, and sensitivity of TF binding to genetic variation discussed in previous sections, provide a mechanistic link between AIs and GWAS associations.

    (A and B) Enrichment of ASM in the proximity of GWAS loci. ASM hets within 1 kb of GWAS loci are compared with colocalized hets without ASM. (C to F) Evidence of purifying selection acting on rare variants with ASM. [(C) and (D)] Proportion of variants associated with ASM compared with those without ASM among the rare (DAF < 1%) variants across individual methylomes. [(E) and (F)] Proportion of loci with ASM over total heterozygous loci over windows of increasing DAF in the combined set of methylomes. (F) This bar chart summary of the data in (E) shows the excess of SD-ASM variants among those with DAF < 1%. χ² tests were used for significance of enrichments.

    Variants showing SD-ASM are under purifying selection

    Because the variants with large effects are under purifying selection, they tend to be rare, with frequencies below the detection threshold of association studies such as GWAS, mQTL, and eQTL. By contrast, AIs may provide evidence for functional effects even for rare variants that may be detected in only one individual. On the basis of previous studies that have used signatures of purifying selection such as shifts toward smaller derived allele frequency (DAF) to identify functional variants (46, 47), we would expect that ASM variants would also tend to have a lower DAF than those without ASM. Therefore, we obtained DAF estimates from the 1000 Genomes Project (48), ignoring variants that overlapped regions with low accessibility to variant calling. We observed that in nearly every sample in our dataset, heterozygous variants with ASM were significantly (P < 0.05, χ² test) more likely to have DAF smaller than 1% than were those without ASM (methylation difference between alleles < 5%) (Fig. 4C). Overall, this analysis found 130 (median) more rare (DAF < 1%) variants than expected among those with ASM per individual methylome, providing a lower bound on the number of those under purifying selection per individual. When we repeated the analysis for enhancer regions, a strong signal was again observed (Fig. 4D), suggesting a median excess of at least 26 enhancer variants under purifying selection per individual.
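
    The kind of enrichment test described here can be sketched as a 2×2 χ² test of ASM status against rarity. The counts below are hypothetical and chosen only for illustration, the function name is not from the source, and "excess" is assumed to mean observed minus expected under independence, which the text does not spell out.

```python
from math import erfc, sqrt


def chi2_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test (1 df, no continuity correction) for the table

                 rare (DAF < 1%)   not rare
        ASM            a              b
        no ASM         c              d

    Returns (chi2, p_value, odds_ratio, excess_rare_with_asm).
    """
    n = a + b + c + d
    row_asm, row_no_asm = a + b, c + d
    col_rare, col_common = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row_asm * row_no_asm * col_rare * col_common)
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    odds_ratio = (a * d) / (b * c)
    expected_a = row_asm * col_rare / n  # expected rare ASM count if independent
    return chi2, p_value, odds_ratio, a - expected_a


# Hypothetical counts for a single methylome (not data from the study).
chi2, p_value, odds_ratio, excess = chi2_2x2(a=600, b=2400, c=3000, d=15000)
print(f"chi2={chi2:.2f}  P={p_value:.2e}  OR={odds_ratio:.2f}  excess={excess:.1f}")
```

    Taking the median of such per-sample excess values across methylomes would give a summary of the same kind as the one reported above.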

    The lower bounds from individual samples may underestimate the extent of purifying selection because of underdetection of SD-ASM. We therefore investigated whether an enrichment for rare variants could also be seen for those variants associated with SD-ASM from the combined dataset, using neighboring variants as controls (18). We observed that the chance of a locus having SD-ASM decreased as the derived allele frequency increased (Fig. 4E; there were very few variants with DAF > 50%, causing high variance and large confidence intervals). We further tested whether there was a significant enrichment for variants with DAF < 1% among those with SD-ASM and found that such enrichment was indeed significant (odds ratio 1.18; P < 0.0001, χ² test) (Fig. 4F). That enrichment represents an excess of 2184 rare variants among those with SD-ASM compared with controls. Considering that this observed excess represents a set of 11 genomes (nine individuals and two cell lines), we estimate at least 200 variants with SD-ASM under purifying selection per individual donor.
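
    The per-donor estimate appears to follow from dividing the pooled excess by the number of genomes. A short check, assuming that is how the figure was derived:

```python
# Both numbers are quoted in the text.
excess_rare_sd_asm = 2184  # excess rare SD-ASM variants in the combined dataset
n_genomes = 11             # nine individuals plus two cell lines

per_genome = excess_rare_sd_asm / n_genomes
print(f"Excess per genome: {per_genome:.1f}")
```

    The quotient is about 198.5, which rounds to the 200 quoted in the text.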


    Allele specific cloning efficiency after bisulfite treatment - (Sep/20/2005 )


    I hope my question is appropriate for this bioforum. I performed a bisulfite treatment on genomic DNA (EZ methylation kit, Zymo) and then ran nested PCR on the treated DNA to amplify specific imprinted genes.

    I sent the PCR product for sequencing and could see in the Chroma file that some T and C peaks were superimposed, implying that I amplified both alleles in the PCR reaction (as was previously discussed here).

    I cloned the PCR product into pDrive and sent 10 clones for sequencing, and the results seem weird to me: far from the 50/50 I was expecting. I should mention that I had a lot of trouble getting 10 clones to sequence. The bacteria would not grow easily and the yield of my minipreps was very low (for some samples I had to do 20 minipreps to get 10 to sequence, but miniprep issues are not the point here).

    The point is: Is it possible that cloning efficiency depends on the allele inserted in the cloning vector? Can some "cloning bias" be introduced such that one bisulfite-treated allele is more easily cloned than the other?

    Does anyone have an idea about that, or have results that look like mine?

    I am curious: what differences are there between the two alleles? Different methylation, a polymorphism?

    When we deal with imprinted genes, we may expect different methylation patterns between the two alleles depending on the maternal or paternal origin of the allele. One allele may be hypermethylated and the other hypomethylated.

    The Chroma files showed that I'm amplifying both alleles in my PCR (see my first message).

    So is it possible that cloning efficiency is not the same for both alleles after bisulfite treatment?

    I don't know if there is a cloning bias for the methylated allele. It may be possible. But the under-representation of the unmethylated allele in your sequenced clones may be due to sampling error, because you barely got 10 colonies in total. I usually have no problem obtaining hundreds of colonies for sampling. You may want to try a Topo cloning kit. I have found that the results from clone sequencing are consistent with those from direct sequencing.
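
    To put a number on the sampling-error point: a minimal sketch, assuming each sequenced clone is an independent draw with a true 50/50 chance of carrying either allele. The function name is illustrative.

```python
from math import comb


def prob_split_at_least_this_skewed(n_clones: int, minority: int) -> float:
    """Two-sided probability, under a fair 50/50 draw, of seeing `minority`
    or fewer clones of either allele among `n_clones` sequenced clones."""
    if 2 * minority >= n_clones:
        return 1.0  # every possible split is at least this skewed
    one_tail = sum(comb(n_clones, i) for i in range(minority + 1)) / 2**n_clones
    return 2 * one_tail


for minority in (0, 1, 2, 3):
    p = prob_split_at_least_this_skewed(10, minority)
    print(f"{10 - minority}:{minority} split or more extreme: {p:.3f}")
```

    With only 10 clones, an 8:2 split or worse turns up about 11% of the time even when both alleles clone equally well, so a result like that cannot by itself show a cloning bias.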

    Yeah, I thought it might be a cloning issue, but I also thought it would not hurt to look into other possibilities. I'll look into Topo cloning. Thanks pcrman!



    Figure 1 (A) Dlk1-Gtl2 imprinting cluster, including transcriptional start sites (arrows) and transcription units (hatched boxes). (B) Portion of IG-DMR analyzed in this study. The 458 bp region analyzed by bisulfite mutagenesis and DNA sequencing corresponds to positions 110,766,298-110,766,755, NC_000078.5 (black box). Polymorphisms (*) between C57BL/6J and Mus musculus castaneus are as follows: (B6/CAST): 110,766,439 (A/G), 110,776,579 (G/A), 110,766,774 (G/A), 110,766,902-110,766,904 (TTT/TT), 110,767,052 (A/G). (C) Schematic of the Gtl2-DMR, including the Gtl2 transcriptional start site (arrow) and exon 1 (hatched box); +1 corresponds to position 110,779,206. Regions analyzed correspond to positions 110,778,378-110,778,966 and 110,779,331-110,780,052. Polymorphisms are as follows: 110,779,741 (G/A), 110,779,881 (A/G), 110,780,030-110,780,031 (AA/GC).


    Figure 2 Paternal allele-specific methylation of the IG-DMR is inherited from sperm. Bisulfite mutagenesis and sequencing of DNA from B6 × CAST and CAST × B6 F1 hybrid liver and B6 × CAST F1 hybrid spermatozoa. Each circle represents one of 32 potentially methylated CpG dinucleotides, the first one located at position 110,766,345 (NC_000078.5). Each row of circles represents an individual strand sequenced. Filled circles represent methylated cytosines, open circles represent unmethylated cytosines, absent circles represent positions at which methylation data was not obtained.
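
    The filled and open circles in such diagrams come from reading each sequenced strand at the CpG positions of the unconverted reference: a retained C means methylated, a T means unmethylated. A minimal sketch of that step follows. It assumes a top-strand read already aligned to the reference without gaps and complete bisulfite conversion; the function names and toy sequences are invented for illustration and are not from the study.

```python
def call_cpg_methylation(reference: str, bs_read: str) -> dict:
    """Call methylation at each reference CpG from one bisulfite-converted strand.

    Returns {0-based position of the C: call}, where the call is
    "M" (C retained, methylated), "U" (read as T, unmethylated),
    or "?" (any other base, so no methylation data).
    """
    if len(reference) != len(bs_read):
        raise ValueError("read must be gap-free and as long as the reference")
    reference, bs_read = reference.upper(), bs_read.upper()
    calls = {}
    for pos in range(len(reference) - 1):
        if reference[pos:pos + 2] == "CG":
            calls[pos] = {"C": "M", "T": "U"}.get(bs_read[pos], "?")
    return calls


def lollipop_row(calls: dict) -> str:
    """Render one strand as a row: X = methylated, o = unmethylated, - = no data."""
    symbols = {"M": "X", "U": "o", "?": "-"}
    return "".join(symbols[calls[pos]] for pos in sorted(calls))


# Toy sequences: CpGs at positions 1, 5, 9 and 13; the C at position 8 is non-CpG.
reference = "ACGTTCGACCGTACGA"
bs_read = "ACGTTTGATNGTATGA"

calls = call_cpg_methylation(reference, bs_read)
print(calls)
print(lollipop_row(calls))
```

    In an F1 hybrid the same read would also be checked at a strain polymorphism to assign it to the B6 or CAST parent. A/G differences on the sequenced strand are convenient for this because bisulfite conversion does not alter them, whereas a C/T polymorphism cannot be told apart from conversion.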


    Figure 3 Methylation of the paternal IG-DMR is maintained during pre- and post-implantation development. Bisulfite mutagenesis and sequencing of DNA from B6 × CAST F1 hybrid embryos. Details as described in Figure 2.


    Figure 4 Paternal allele-specific methylation of the Gtl2-DMR is not inherited from sperm. Each circle represents one of 29 potentially methylated CpG dinucleotides, the first one located at position 110,779,349 (NC_000078.5). Details as described in Figure 2.



    These authors contributed equally: Qiyang Li, Zhongju Wang, Lu Zong, Linyan Ye

    Affiliations

    Department of Medical Genetics, School of Basic Medical Sciences, and Guangdong Technology and Engineering Research Center for Molecular Diagnostics of Human Genetic Diseases, Southern Medical University, Guangzhou, Guangdong, China

    Qiyang Li, Zhongju Wang, Lu Zong, Linyan Ye, Junping Ye, Haiyan Ou, Bo Guo, Wenquan Liang, Jian Zhang, Yong Long, Yu Hou, Lin Zhou, Shufen Li & Cunyou Zhao

    Key Laboratory of Mental Health of the Ministry of Education, Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, and Guangdong Province Key Laboratory of Psychiatric Disorders, Southern Medical University, Guangzhou, Guangdong, China

    Qiyang Li, Zhongju Wang, Linyan Ye, Junping Ye, Bo Guo, Wenquan Liang, Shufen Li & Cunyou Zhao

    Reproductive and Genetic Hospital, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China

    The Third People’s Hospital of Zhongshan, Zhongshan, Guangdong, China

    Department of Psychiatry, the Affiliated Brain Hospital of Guangzhou Medical University (Guangzhou Huiai Hospital), Guangzhou, Guangdong, China

    Qiong Yang, Fengchun Wu & Xingbing Huang

    Guangdong General Hospital, Guangdong Academy of Medical Science and Guangdong Mental Health Center, Guangzhou, China


