In detail, what causes mutations in regulatory genes?
Nothing causes mutations in any specific gene. Mutations arise from random processes and can alter any given point in the genome. They may be found more often in some genes or regions of the genome because positive selection gives them more staying power in the gene pool over time.
It's better to think of it this way: mutations happen everywhere, but they stay in the gene pool more often if they do something useful. Genes are the active portions of the genome, so mutations in genes are more likely to stay around.
18.5: Mutation and Evolution
- Contributed by John W. Kimball
- Professor (retired) at Tufts University & Harvard
Mutations are the raw materials of evolution. Evolution absolutely depends on mutations because this is the only way that new alleles and new regulatory regions are created. However, this seems paradoxical because most mutations that we observe are harmful (e.g., many missense mutations) or, at best, neutral: for example, "silent" mutations encoding the same amino acid, and many of the mutations in the vast amounts of DNA that lie between genes. Moreover, most mutations in genes affect a single protein product (or a small set of related proteins produced by alternative splicing of a single gene transcript), while much evolutionary change involves myriad structural and functional changes in the phenotype.
So how can the small changes in genes caused by mutations, especially single-base substitutions ("point mutations"), lead to the large changes that distinguish one species from another? These questions have, as yet, only tentative answers.
Mutation Detection in the LDLR Gene
In this article we will discuss the analysis of LDLR mutations in patients with familial hypercholesterolaemia.
Introduction to Mutation Detection in the LDLR Gene:
In some families, the risk of early coronary heart disease is considerably raised by the inheritance of a specific mutation, in one of two genes (apolipoprotein B, ApoB [OMIM #107730], or low-density lipoprotein receptor, LDLR [OMIM #143890]), which gives rise to Familial Hypercholesterolaemia (FH).
FH is characterized by an autosomal dominant inheritance, severe hypercholesterolemia, premature onset of atherogenesis and cholesterol deposits in the skin, tendons (tendon xanthoma) or in and around the eyes (corneal arcus and xanthelasma). In the majority of FH patients, the disorder is caused by a mutation in the LDLR gene that destroys or significantly impairs its proper function.
FH is among the most common metabolic genetic disorders, the prevalence being estimated at one in 400 to 500 people for mutations in LDLR, and one in 1000 for ApoB mutations, more frequent than either cystic fibrosis or sickle cell anaemia. Recent reports from the UK and the Netherlands suggest that the vast majority of these individuals remain unidentified and untreated.
In both these countries there is extensive evidence that case finding combined with family screening and mutation detection is a very effective method for increasing the percentage of FH individuals getting effective treatment.
With the exception of a few populations where founder mutations have been reported (French Canadians, Finns, Ashkenazi Jews of Lithuanian descent, Christian Lebanese, Dutch and Afrikaners), a relatively large number of diverse mutations cause FH in most populations, with different mutations prevailing in different populations.
Over 700 LDLR gene mutations have been reported worldwide and new mutations are reported regularly (www.ucl.ac.uk/fh/), as opposed to only three mutations in the apoB gene that have been described as causing FH.
Since the most frequent known LDLR mutations are present in only a small proportion of patients, an initial strategy of testing for specific mutations (such as is routine for cystic fibrosis, where a single mutation accounts for more than 70% of mutations) is precluded as a first step in those countries with heterogeneous populations.
For some heterozygous FH individuals, a clear diagnosis can be made on the basis of grossly increased cholesterol and associated clinical features. However, one of the problems of using hypercholesterolemia as a primary criterion for diagnosis of FH is that plasma lipid levels in heterozygous FH patients overlap with those in the general population.
The use of DNA tests to confirm the diagnosis of FH in family members of identified LDLR mutation carriers suggests that 15-20% of adult relatives and 5-10% of children may be misdiagnosed by cholesterol testing alone.
The LDL receptor is encoded by a large gene comprising 18 exons spanning approximately 45 kilobases. Its size, along with the large number of mutations found throughout the coding and control regions, means that mutations are not easily detected by simple diagnostic assays, and sequencing each patient’s entire LDLR gene on a clinical scale is prohibitively expensive.
By far the majority of mutations described in this gene (approximately 90%) are single base substitutions or deletions and insertions of only a few base pairs, with the remainder being deletions of multiple introns and exons. In addition, 35 polymorphisms and a number of silent mutations have been described throughout the LDLR gene.
In particular, the LDLR polymorphisms complicate the screening process: although their allelic frequencies within various populations range from 0.5% to 96%, the majority occur at allelic frequencies of 30-70%.
Therefore, it is sensible to use a pre-screening method to identify specific small regions of variation for further analysis by sequencing, as has been the case in BRCA1 and BRCA2 gene mutation screening for familial breast cancer.
Methods, such as single strand conformational polymorphism (SSCP), previously used to screen for LDLR mutations, lack sensitivity and are labor intensive. The optimal length of DNA for SSCP analysis appears to be from 150 to 200 nucleotides with mutation detection sensitivity for fragments this size ranging from 70-90%.
The sensitivity of this technique decreases with increasing size of the fragment and it is often necessary to run samples under a number of different experimental conditions, including electrophoresis temperature, in order to detect variants reliably.
Denaturing high-performance liquid chromatography (DHPLC) has been reported to have numerous advantages over SSCP: analysis is rapid, inexpensive and semi-automated. Amplicons from 200 to 500 nucleotides are within the ideal range for heteroduplex detection, with sensitivity and specificity reported to range from 96-100%. In addition, in most cases elution profiles are distinct for any given sequence variant, allowing differentiation of polymorphisms from pathogenic mutations.
Previously we have reported a comparison of SSCP with DHPLC, assessing their effectiveness as prescreening methods for LDLR mutation detection in New Zealand patients with FH in a research setting. Compared with DHPLC, we were able to detect only 64% of mutations by SSCP.
As a result, we have implemented diagnostic testing through LDLR mutation screening by DHPLC. We have developed the first LDLR gene diagnostic assay that integrates DHPLC prescreening with automated PCR setup and DNA sequencing of variants.
LDLR Mutation Detection in a Diagnostic Setting:
A number of factors must be taken into account when implementing a new diagnostic assay to enable timely delivery of an effective service. These include: turnaround time for reporting; automation for greater sample numbers and reduced risk of handling errors; optimization to a single set of PCR parameters for reduced handling conditions and ease of automated set-up; and, perhaps of prime concern, specificity and sensitivity to avoid reporting of false positives and false negatives.
Analysis of the LDLR gene requires amplification of each patient sample in a number of fragments or amplicons to encompass the coding and control regions. Previously, the LDLR gene has been amplified under multiple PCR conditions in 21 fragments for analysis by SSCP, ranging in size from 127 to 355 base pairs.
Because the LDLR diagnostic assay had to meet the turnaround time and automation requirements detailed above, the screening process was designed to allow for analysis of patient samples in batch sizes consistent with sample numbers obtained. Therefore, seven patient samples plus controls for all 21 amplicons are screened simultaneously per batch in two microtiter plates (168 wells in total) in a largely automated process.
Samples were obtained from apparently unrelated individuals with clinically probable FH, attending lipid disorders clinics at Christchurch or Dunedin Hospitals. These patients had plasma cholesterol of >8.0 mmol/L and family histories of hypercholesterolaemia and/or classical clinical stigmata of FH. DNA was extracted from patients’ whole blood specimens according to an established method.
The known mutations in the apoB gene (R3500Q, R3531C and R3500W) were excluded by PCR and restriction analysis. After PCR set-up in two microtiter plates by a Tecan robotic workstation (Tecan AG, Switzerland), all exons were amplified under one set of conditions using Roche Taq Polymerase (Germany).
Prior to analysis, heteroduplices were formed by heating the PCR products at 95°C for five minutes and slowly cooling to room temperature over one hour. Samples were analyzed overnight by DHPLC using a WAVE Nucleic Acid Fragment Analysis System including a C18 reversed phase column based on non-porous poly(styrene-divinylbenzene) particles (DNASep Cartridge) from Transgenomic (Omaha, NE, USA). DNA was eluted from the column by an acetonitrile gradient in 0.1 mol/L triethylammonium acetate buffer (TEAA), pH 7, at a constant flow rate of 0.9 ml/min.
The melting profile for each DNA fragment, the respective elution profiles and column temperatures were determined using the WAVEMAKER™ software from Transgenomic, with further optimization performed against a positive control.
Every sequence alteration was confirmed in two independently amplified PCR products by direct cycle sequencing of double-stranded DNA with 33P-labeled terminators and ThermoSequenase according to the manufacturer’s instructions (Amersham Pharmacia Biotech, NZ).
Whilst use of a single set of conditions has greatly improved speed of sample preparation for analysis by DHPLC, there have been some disadvantages. Acceptable products are produced but these are not necessarily optimal in terms of yield or quality of product for DHPLC. Some non-specific pre-peaks are observed during DHPLC for certain amplicons, though these are consistent between samples and runs, allowing the operator to correct for their presence.
Although PCR yield varies between amplicons, variation is also seen to a similar degree between patient samples (at a factor of between one and three) and can be adjusted for by small changes in scaling of traces.
Optimization of DHPLC Screening:
The LDLR gene is similar to many other genes in terms of approach to mutation screening by DHPLC. Factors for consideration include: primer selection, size of amplicons, choice of positive controls, presence of multiple melting domains within fragments and optimum temperature for analysis.
However, the LDLR gene is larger than most genes screened routinely in a diagnostic environment and has a larger number of mutations, with more than 700 described throughout and more being regularly reported.
These mutations are also spread throughout all of the exons and intron/exon boundaries, which are both relatively polymorphic. Therefore, there is a distinct chance of DHPLC profiles being similar for different mutations. The influence of these factors on the development of diagnostic LDLR screening is discussed below.
Selection of Primers:
A number of primer sets have been described for LDLR mutation analysis by both SSCP and denaturing gradient gel electrophoresis (DGGE), which is considered to be more sensitive but also more technically challenging and labor intensive than SSCP (www.ucl.ac.uk/fh/primers.html).
As we were interested in comparing SSCP and DHPLC, a primer set was selected for both applications based on SSCP design requirements. Whilst DHPLC melting profiles for some of the amplicons obtained by use of these primers were not ideal, they provided an effective starting point for the screening process.
In some cases primers annealed to sequences within or immediately adjacent to intron/exon boundaries and, as approximately 46 splice site mutations have been described for the LDLR, it was necessary to relocate some of these primers to encompass these regions. For example, the Exon 2 product, amplified using the primers designed by Hobbs and colleagues, encompasses a stretch of six nucleotides containing 17 reported splice mutations clustered in the 5′ donor site.
However, the upstream primer anneals to these nucleotides, rendering detection of any of these mutations unlikely. Therefore a new amplicon was designed to allow analysis of this region, with a new primer annealing approximately 50 nucleotides from the intron/exon boundary.
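The primer problem described above comes down to an interval overlap: a variant that lies under a primer binding site is overwritten by the primer sequence during PCR and so cannot be detected. The sketch below illustrates the check. All coordinates are hypothetical distances in nucleotides from the intron/exon boundary, chosen only to mirror the six-nucleotide cluster and the roughly 50-nucleotide offset mentioned in the text. They are not the published primer positions.

```python
def intervals_overlap(a, b):
    """True if two closed intervals (start, end) share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def primer_masks_region(primer_site, region):
    """A primer hides variants in any region its binding site overlaps."""
    return intervals_overlap(primer_site, region)


# Hypothetical coordinates, counted in nucleotides from the boundary.
splice_cluster = (1, 6)      # six-nucleotide stretch of splice mutations
original_primer = (1, 20)    # assumed: binding site lies across the cluster
relocated_primer = (50, 70)  # assumed: binding site ~50 nt from the boundary

print(primer_masks_region(original_primer, splice_cluster))   # masked
print(primer_masks_region(relocated_primer, splice_cluster))  # not masked
```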
Generation of Positive Controls:
Although various software packages can predict the ‘optimal’ temperature for screening by DHPLC, the use of positive controls, containing a single base substitution or small deletion, is essential to verify and refine the temperature used for each amplicon. These controls also provide some security that the discriminating conditions are maintained between DHPLC runs.
Failure to identify an abnormal trace for the positive control would indicate variation in buffer conditions, column condition or oven calibration, so running these controls is essential in controlling for false negatives during diagnostic screening. Calibration of ovens may also vary between instruments, resulting in variation when screening is performed (at slightly different temperatures) on multiple instruments.
This also means that temperatures predicted by the WAVEMAKER software, or as reported in the literature, provide a good guide but may not be directly translatable from instrument to instrument. Therefore, temperature optimization and verification by way of positive controls is important and will resolve any issues arising from these differences.
However, obtaining positive controls for each amplicon is a difficult step. Here they were obtained from Professor Steve Humphries (University College London, UK) and Dr. Ros Thiart (University of Stellenbosch), except in the cases of the promoter region and exon 18 (Table 6-1). Positive controls were synthesized for the latter two fragments as described below.
A number of methods have been assessed within our laboratory previously for generating positive controls, including: screening of amplicons for polymorphic restriction enzyme sites, site directed mutagenesis (SDM) using the megaprimer protocol and Ligation During Amplification (LDA) SDM. As no polymorphic restriction sites were identified in either the promoter or exon 18 amplicons, it was necessary to generate positive controls for these amplicons.
Introduction of a specific mutation through SDM by megaprimer protocol involves a multistep PCR and restriction digest protocol, whereas LDA SDM involves a single extension and ligation PCR, allowing more rapid synthesis. Due to rapidity and ease of synthesis, the LDA SDM method was selected as the most suitable method for generation of positive controls and was used to eliminate a unique restriction site within each amplicon.
In brief, a thermostable ligase and a phosphorylated mutagenic primer, containing a single base mismatch, were included in a standard PCR reaction mix and cycling conditions modified to include a seven-minute extension at 65°C. The mutagenic primer was designed to eliminate a unique restriction site, allowing digestion of wild type product post-amplification.
Digested reactions were then diluted and re-amplified to obtain 100% mutant product. The mutant product was sequenced to confirm the mutation and was mixed in an equal ratio with wild type PCR product in order to obtain heteroduplices for DHPLC analysis. The synthesized mutant was analyzed by DHPLC against a wild type (normal) control (Figure 6-1). This method allows the generation of multiple positive controls within one to two days.
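The reason for mixing mutant and wild type product in equal amounts can be illustrated with a simple model. The sketch below assumes that, after denaturing and slow cooling, strands re-pair at random with no preference for a perfectly matched partner. This is an idealization that the text does not state. Under that assumption, a mutant fraction f gives a heteroduplex fraction of 2f(1-f), which peaks at an equal mix. The function name `duplex_fractions` is hypothetical.

```python
def duplex_fractions(mutant_fraction):
    """Expected duplex species after random reannealing of a two-allele mix.

    Assumes strands pair at random (an idealization), so each duplex draws
    its two strands independently from the mutant / wild type pool.
    """
    if not 0.0 <= mutant_fraction <= 1.0:
        raise ValueError("mutant_fraction must lie between 0 and 1")
    f = mutant_fraction
    return {
        "wild_type_homoduplex": (1 - f) ** 2,
        "mutant_homoduplex": f ** 2,
        "heteroduplex": 2 * f * (1 - f),
    }


for f in (0.0, 0.25, 0.5, 1.0):
    print(f, duplex_fractions(f)["heteroduplex"])
```

In this model a pure mutant product (f = 1.0) forms no heteroduplex at all, which is why the synthesized control has to be mixed with wild type product before analysis.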
Multiple Melting Domains:
The LDLR gene is relatively GC rich, with several exons having long repetitive stretches of GCs (Table 6-1). Due to sequence constraints, it may not always be possible to design primers to amplify regions that contain similar GC content and therefore have similar melting profiles across the amplicon for DHPLC. In these cases, with reference to the WAVEMAKER software predictions for melting domains, it is desirable to screen samples at two temperatures to resolve sequence variation in different domains within an amplicon.
Where possible, multiple positive controls were used in order to optimize partial denaturing temperatures for DHPLC. Two Exon 1 positive controls, localized to different regions of the amplicon, were used during DHPLC optimization (Table 6-1) and further patient screening was performed at both 65°C and 66°C.
This example highlights the importance of positive control selection: if only the C>T variant had been used, the optimal screening temperature selected would have been 66°C. Variants localized in the lower melting domain could then potentially be missed if those regions are denatured at 65°C (Figure 6-2).
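As a rough illustration of why uneven GC content produces more than one melting domain, the sketch below scans an amplicon with a sliding window and reports how much the GC fraction varies along it. This is only a crude proxy under stated assumptions: it is not the algorithm used by the WAVEMAKER software, the window and step sizes are arbitrary, and the sequences are invented toy examples.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq, window=20, step=10):
    """GC fraction for each sliding window as (start, fraction) pairs."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def gc_spread(seq, window=20, step=10):
    """Difference between the most and least GC-rich windows.

    A large spread hints that parts of the amplicon melt at different
    temperatures; the threshold for 'large' is left to the reader.
    """
    values = [gc for _, gc in gc_windows(seq, window, step)]
    return max(values) - min(values)


uniform = "ATGC" * 20               # toy amplicon with even GC content
two_domain = "AT" * 20 + "GC" * 20  # toy amplicon: AT-rich half, GC-rich half

print(gc_spread(uniform))
print(gc_spread(two_domain))
```

An amplicon with a large spread is a candidate for screening at two temperatures, as was done for Exon 1.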
Similarity of DHPLC Profiles:
As discussed previously, a number of factors complicate screening of the LDLR gene for genetic variation that might cause FH. Of considerable importance are the large number of mutations and polymorphisms previously described in the gene and the increasing number of novel mutations being reported, meaning that any LDLR screening process must be able to detect any mutation present rather than solely target known variants. As DHPLC has the capacity to detect all these small variations at the DNA level (substitutions, insertions and deletions) without discrimination, it is ideally suited to LDLR mutation analysis.
However, unlike other genes that might have fewer mutations more sparsely spaced through coding and control regions, it is unlikely that elution profiles will be distinct for each LDLR sequence variant making the use of these profiles less effective in discriminating deleterious mutations from benign polymorphisms.
This lack of ability to discriminate variation on the basis of DHPLC profiles alone is not so important in a clinical context as all mutations must be confirmed by a second method, whether the prescreening method used is DHPLC, SSCP or dideoxy fingerprinting (DDF).
However, the use of DHPLC profiles would still remain effective in cases where a mutation has been previously identified in a relative and screening is focused on confirming the presence or absence of that solitary mutation.
Alternatively, in highly polymorphic regions (for example, Exons 12, 15 and 18), it would be better to sequence the amplicon directly rather than prescreen for variation initially. However, where sequencing capacity is restricted (i.e. in low throughput sequencers or when sequencing is performed manually), this may not be an effective use of resources and DHPLC would remain the best option.
Confirmation by sequencing of variants identified in Exon 10a fragments of New Zealand FH patients has led to the identification of two different mutations with virtually identical profiles within the same amplicon (Figure 6-3). The most common LDLR mutation identified in the New Zealand population is “FH Northern Irish”, or D461N, and this variant is indistinguishable from a splice site mutation, 1359-1G>A, which has been identified in multiple patients in Intron 9.
On a worldwide basis it has been estimated that more than 10 million people have FH, of whom as many as 200,000 die of premature coronary heart disease each year. In contrast to most genetic disorders, an effective treatment is available for FH. By using lipid-lowering drugs, a reduction in LDL cholesterol of 50-60% can be achieved, which, if treatment is started early, may be expected to give a normal life expectancy.
Recent reports from the UK and the Netherlands suggest that the vast majority of people with FH remain unidentified and untreated. In both of these countries there is extensive evidence that case finding combined with family screening and mutation detection is a very effective method for increasing the percentage of FH individuals getting effective treatment.
Therefore an effective means of screening for LDLR mutations, one that recognizes the great diversity of LDLR mutations and targets both known and novel mutations, is needed, especially in heterogeneous populations.
We have shown that DHPLC is an effective tool for the detection of mutations within the LDLR. Twenty-nine different LDLR mutations have been identified in 55 New Zealand FH patients by DHPLC. These mutations are localized throughout the gene, occurring in the receptor’s various functional domains (Table 6-2).
The majority of mutations were identified within the EGF precursor domain (67.4%), followed by the ligand binding domain (23.6%); the balance are localized in the signal peptide (1.8%), the sugar-modified domain (3.6%), the membrane-spanning domain (1.8%) and the cytoplasmic domain (1.8%).
In the past, screening programs for LDLR mutation analysis have focused on “mutation hot spots” such as the ligand-binding domain, specifically Exons 3 and 4. However, more recently it has been recognized that this targeting of “hot spots” has resulted in sampling bias rather than being a true reflection of the localization of mutations. In fact, more novel mutations are being identified within the EGF precursor domain than in any other region, now that more comprehensive screening of the LDLR gene is being performed.
This is not unexpected, as mutations in this functional domain would affect lipoprotein dissociation from the receptor in endosomes during receptor recycling and receptor positioning at the cell surface. This effect of sampling bias highlights the necessity for a comprehensive screening approach that encompasses all coding and control regions, as well as reported and novel mutations.
Mutations were detected in 20.9% of the patients analyzed. The majority of these patients were selected according to total cholesterol levels >8.0 mmol/L (300 mg/dl) and a suggestive family history of heart disease. Less than one third also displayed clinical stigmata. These patients could be categorized into “probable FH” and “definite FH” according to the absence or presence of stigmata, respectively.
Mutations have been identified in 45% of patients with “definite FH” compared to 8% of those with “probable FH.” These data are consistent with those reported for other populations, which range from 18-27%, reinforcing that hypercholesterolaemia is not sufficient as the primary criterion for making a clinical diagnosis of FH.
The LDLR gene is an ideal candidate for mutation analysis by DHPLC due to its relatively large size, the range of variation identified throughout the gene and also the nature of FH expression. FH is an autosomal dominantly inherited condition with variable phenotypic presentation. Whilst FH heterozygotes are prone to premature coronary artery disease, homozygous individuals are more severely affected and may die before reaching maturity.
However, homozygous affected individuals are rare, having a prevalence of one in one million, compared with one in 400 to 500 for heterozygotes. Therefore, FH patients attending lipid clinics will generally be heterozygotes. Thus, unlike those disorders where some samples may be homozygous and must be combined with wild type DNA, mixing of LDLR amplicons is not required for FH.
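The two prevalence figures quoted above are roughly what a simple population-genetics model predicts. The sketch below assumes Hardy-Weinberg proportions with a single pooled disease allele of frequency q, so that heterozygotes occur at 2q(1-q) and homozygotes at q squared. That model is an assumption made here for illustration; the text does not say how its figures were derived. The function name `homozygote_prevalence` is hypothetical.

```python
import math


def homozygote_prevalence(heterozygote_prevalence):
    """Return (q, q**2) given a heterozygote prevalence.

    Assumes Hardy-Weinberg proportions and takes the smaller root of
    2q(1 - q) = heterozygote_prevalence, since the disease allele is rare.
    """
    if not 0.0 <= heterozygote_prevalence <= 0.5:
        raise ValueError("heterozygote prevalence must lie between 0 and 0.5")
    q = (1 - math.sqrt(1 - 2 * heterozygote_prevalence)) / 2
    return q, q * q


q, homozygotes = homozygote_prevalence(1 / 500)
print(f"allele frequency about 1 in {1 / q:.0f}")
print(f"homozygotes about 1 in {1 / homozygotes:.0f}")
```

With a heterozygote prevalence of one in 500, the model gives an allele frequency near one in 1000 and a homozygote prevalence close to one in a million, in line with the figure in the text.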
DHPLC allows a more rapid turnaround time for LDLR mutation screening than SSCP, as amplification of gene fragments and pre-screening by DHPLC is largely automated, less laborious, cheaper and 30% more sensitive than SSCP. Taken together with our finding that it provides sensitive mutation detection of LDLR variants, DHPLC is ideally suited to LDLR mutation analysis in both clinical and research settings.
Deciphering the Impact of Non-coding Mutations in the Human Genome
The U.S. National Institutes of Health (NIH) and other funding agencies around the world have invested vast resources to ultimately sequence the complete genomes of millions of individuals with various common and rare diseases in order to identify underlying genetic causes. However, this work thus far has identified the disease-causing genetic changes in only a small number of patients. Because protein-coding genes comprise less than 3% of the human genome, geneticists increasingly suspect that mutations to non-coding DNA may be the culprit in a lot of these cases.
The effects of mutations in protein-coding genes are predictable because we understand the genetic code, but it’s more difficult to assess the functional consequences of mutations in non-coding sequence. Scientists know that non-coding sequence contains regulatory switches, called enhancers, that control when and where genes are turned on or off in an organism. A mutation within one of these enhancer sequences may cause the activity of the gene it controls to be too high, too low, or to be misdirected to a cell type or tissue where it may have detrimental effects on the organism.
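To make the first point concrete: because the genetic code is known, the effect of a single-base substitution inside a codon can be looked up directly. The sketch below builds the standard genetic code table and classifies a substitution as silent, missense or nonsense. The function name `classify_substitution` and the example codons are illustrative choices, not taken from any study described here. No comparable lookup exists for enhancer sequence, which is the difficulty the paragraph describes.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"   # new stop codon truncates the protein
    if ref_aa == "*":
        return "stop-loss"  # original stop codon is read through
    return "missense"       # one amino acid replaced by another


# Illustrative codons only.
print(classify_substitution("GAC", "GAT"))  # Asp -> Asp
print(classify_substitution("GAC", "AAC"))  # Asp -> Asn
print(classify_substitution("TGG", "TGA"))  # Trp -> stop
```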
While it is understood, in principle, that a sequence change in an enhancer can cause disease, identifying such a mutation remains a major hurdle. Sequencing the genomes of many individuals has revealed that each person’s genome contains dozens to hundreds of new mutations compared to their parents’ genomes. Yet, the majority of these changes don’t disrupt normal gene regulation. The challenge is to identify the small subset of mutations that change the sequence of specific enhancers in a deleterious way.
New work from the Mammalian Functional Genomics Laboratory in Biosciences’ Environmental Genomics and Systems Biology (EGSB) Division addresses this critical challenge. The group has developed a higher-throughput transgenic mouse assay to evaluate the disease-causing potential of human variants in enhancers that turn on gene expression during development. The new approach leverages the CRISPR-Cas9 genome editing technology to create transgenic mice that carry an enhancer-reporter construct at a specific “safe harbor” location in the mouse genome. A color-generating chemical reaction (the “reporter”) creates a blue stain in all cells in which the enhancer is active.
From left: Diane Dickel, Evgeny Kvon, Len Pennacchio, and Axel Visel of the Mammalian Functional Genomics Lab.
“Before, people would inject that enhancer-reporter into the mouse zygotes and it would randomly integrate in the genome,” said Evgeny Kvon, a project scientist in the EGSB Division and first author on the paper published in Cell. “And because of that random integration, the results were less reproducible because, depending on the integration site, there were so-called position effects that would affect the activity of that reporter.
“The major conceptual advance in our method is that because the transgenes are integrated in the same location in the genome there are no position effects, so we need fewer mice to get reproducible results,” he continued. What’s more, the researchers were fortunate to find a location in the mouse genome where the integration frequency is four times higher than with the old method, Kvon said. “So not only do we get more reproducibility but it’s also high efficiency. And that reduces the cost of performing this experiment several fold.”
To demonstrate proof of principle, the researchers used the new method—which they dubbed enSERT (enhancer inSERTion)—to examine nearly a thousand variants of one of the most well-characterized human enhancers that is associated with polydactyly (extra fingers or toes). The enhancer, called ZRS (Zone of polarizing activity Regulatory Sequence), regulates the expression of the sonic hedgehog gene (Shh), which produces a powerful signaling molecule required for the correct patterning of many body elements, including limbs and digits.
The ZRS enhancer is widely conserved across vertebrate species, including humans, mice, fish and even snakes. Mutations in this enhancer are known to cause abnormal patterning of the limbs in humans, as well as other vertebrates. The most common type of malformation caused by these mutations is called “preaxial polydactyly,” i.e. the formation of extra digits near the thumb or big toe. For example, mutations in the ZRS enhancer have been observed in polydactyl Hemingway cats, so-named for the American author and ailurophile Ernest Hemingway, who kept a large colony of the mitten-pawed felines at his Key West, Florida, home.
Polydactyl cat at Hemingway House in Key West, Florida.
Using the new assay, the researchers examined all human ZRS enhancer mutations that had previously been reported by other groups as potential causes of polydactyly, whether the proposed malformation-causing mechanism could be confirmed experimentally or not. They also evaluated additional sequence changes that clinician-scientists collaborating on this study had newly identified in patients with polydactyly. Overall, in about 70 percent of cases the researchers were able to confirm that the enhancer activity was changed by the mutations. Perhaps surprisingly, they found no evidence for changes in activity for the remaining 30 percent of cases, suggesting that a subset of mutations that were proposed to cause polydactyly in previous studies are not the true cause of the condition in these patients.
“These results suggest that extreme care should be taken when interpreting human mutations without experimental testing, because it is possible that one could be misled to think that they are causative when they could potentially just be one of the many rare benign mutations in the human population,” cautioned Diane Dickel who, along with Mammalian Functional Genomics Lab co-PIs Len Pennacchio and Axel Visel, was a corresponding author on the Cell paper.
The work supports the Berkeley Lab Biosciences Area’s health strategy, which aims to increase understanding of human genome function. “We expect that this method will be very powerful for systematically testing enhancer mutations that are being found by the large number of patient sequencing studies,” Dickel said. The group has some pilot projects in the lab that show the utility of the approach to experimentally validate non-coding mutations associated with a variety of conditions that significantly impact human health, such as developmental delay, autism, or heart disease.
Institutional collaborations on the study included the National Center for Biotechnology Information (NCBI) of the National Library of Medicine branch of the National Institutes of Health (NIH), Washington University School of Medicine in St. Louis, and several teaching hospitals (Centre hospitalier universitaire CHU) in France: CHU Lille, CHU Poitiers, and CHU Dijon Bourgogne.
Identifying a genetic mutation behind sporadic Parkinson’s disease
This proposed model shows the correlation between mutation-dependent transcription factor binding, alpha-synuclein expression, and the risk of Parkinson’s disease. In the top instance, the enhancer efficiently binds the transcription factor, thereby reducing the expression of the alpha-synuclein gene (SNCA). Below, changing one nucleotide from an A to a G prevents the transcription factor from binding well, which increases SNCA expression and the risk of developing Parkinson’s disease.
CAMBRIDGE, Mass. – Using a novel method, Whitehead Institute researchers have determined how a non-coding mutation identified in genome-wide association studies (GWAS) can contribute to sporadic Parkinson’s disease (PD). The approach could be used to analyze GWAS results for other sporadic diseases with genetic causes, such as multiple sclerosis, diabetes, and cancer.
“This is really the first time we’ve gone from risk variants highlighted by GWAS to a mechanistic and molecular understanding—right down to the nucleotide—of how a mutation can contribute to the risk of developing disease,” says Whitehead Founding Member Rudolf Jaenisch, who is also a professor of biology at MIT.
About 90% of PD cases are sporadic; that is, caused by complex interactions between environmental and common genetic risk factors. Because scientists have had difficulty analyzing these interactions, most research has focused on rare familial forms of the disease. GWAS, which identify common mutations that increase the risk of developing a particular condition, have been used to study sporadic PD and other complex conditions, with limited success.
GWAS are akin to genomic treasure maps bearing hundreds or thousands of X’s marking the general locations of mutations that could be risk factors for a given condition. However, GWAS do not reveal the specific locations of potentially pathogenic mutations, nor do they indicate how an X on a genomic map contributes, if at all, to a disease. For example, in sporadic PD, multiple GWAS point to the alpha-synuclein gene (SNCA) as one of the strongest risk loci in patients’ genomes, yet GWAS contain little information regarding the mechanism of how this gene is dysregulated in sporadic PD patients.
To see if distant gene regulatory elements on the same chromosome carrying SNCA could affect cellular levels of alpha-synuclein, a team of researchers led by Frank Soldner, a senior researcher in the Jaenisch lab, investigated two GWAS-flagged risk variants located in a putative SNCA enhancer. Their results are described online this week in the journal Nature.
The team used clustered regularly-interspaced short palindromic repeats (CRISPR)/Cas9 to edit the mutations into isogenic human pluripotent stem cells. By altering the genetic variant on only one chromosome, the researchers leave the other chromosome unchanged to act as an internal control. This method allows the scientists to measure very subtle effects with very high confidence, while eliminating the effect of any genetic or epigenetic modifications and cell culture related variations that could occur during the experiment.
“Our method addresses an essential shortcoming of GWAS—using the correlations produced by GWAS, you cannot distinguish the effect between two variants that are very close together in the genome,” says Soldner, who is the lead author of the Nature paper. “Such physical proximity means that they will always co-segregate during inheritance, which is why we had to do what we did—modify and analyze each variant independently while keeping the rest of the genome completely constant.”
After differentiating the cells into neurons, the scientists noted the changes in SNCA expression. Although one of the mutations has no effect, the other, which switches one nucleotide from an A to a G, slightly but significantly boosts alpha-synuclein production. When compared to the enhanced alpha-synuclein production in the familial form of the disease, the modest effect created by the A to G mutation would be sufficient over a lifetime to increase the risk of PD, according to Soldner.
To see how the mutation affects alpha-synuclein production, the researchers identified two transcription factors that bind to the enhancer that carries this mutation. When the enhancer is not mutated, the transcription factors bind to it, which suppresses SNCA. If the enhancer has the G mutation, the transcription factors are unable to bind to the enhancer, and SNCA is activated.
Most genetic conditions are sporadic and caused by a combination of mutations. Jaenisch says that the method that identified the single point mutation in SNCA’s enhancer could be used to pinpoint additional pathogenic genes for sporadic PD and sift through the GWAS hits for other diseases, including Alzheimer’s disease, cancer, diabetes, and multiple sclerosis.
This work was supported by the National Institutes of Health (NIH grants 1R01NS088538-01 and 2R01MH104610-15) and by Qatar National Research Fund (grant NPRP 5-531-1-094).
Rudolf Jaenisch's primary affiliation is with Whitehead Institute for Biomedical Research, where his laboratory is located and all his research is conducted. He is also a professor of biology at Massachusetts Institute of Technology.
Soldner, F., Stelzer, Y., Shivalila, C. S., Abraham, B. J., Latourelle, J. C., Barrasa, M. I., ... & Jaenisch, R. (2016). Parkinson-associated risk variant in distal enhancer of α-synuclein modulates target gene expression. Nature, 533(7601), 95-99.
Any alteration capable of being replicated in the genetic material of an organism. When the alteration is in the nucleotide sequence of a single gene, it is referred to as gene mutation; when it involves the structures or number of the chromosomes, it is referred to as chromosome mutation, or rearrangement. Mutations may be recognizable by their effects on the phenotype of the organism (mutant).
Two classes of gene mutations are recognized: point mutations and intragenic deletions. Two different types of point mutation have been described. In the first of these, one nucleic acid base is substituted for another. The second type of change results from the insertion of a base into, or its deletion from, the polynucleotide sequence. Mutations of this second type are called sign mutations or frame-shift mutations because of their effect on the translation of the information of the gene. See Nucleic acid
More extensive deletions can occur within the gene which are sometimes difficult to distinguish from mutants which involve only one or two bases. In the most extreme case, all the informational material of the gene is lost.
A single-base alteration, whether a transition or a transversion, affects only the codon or triplet in which it occurs. Because of code redundancy, the altered triplet may still insert the same amino acid as before into the polypeptide chain, which in many cases is the product specified by the gene. Such DNA changes pass undetected. However, many base substitutions do lead to the insertion of a different amino acid, and the effect of this on the function of the gene product depends upon the amino acid and its importance in controlling the folding and shape of the enzyme molecule. Some substitutions have little or no effect, while others destroy the function of the molecule completely.
Single-base substitutions may sometimes lead not to a triplet which codes for a different amino acid but to the creation of a chain termination signal. Premature termination of translation at this point will lead to an incomplete and generally inactive polypeptide.
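The three outcomes of a single-base substitution described above (same amino acid, different amino acid, chain termination) can be illustrated with a short Python sketch. It assumes the standard genetic code written for the DNA coding strand; the function name `classify_substitution` and the labels it returns are hypothetical and chosen only for this illustration.

```python
from itertools import product

# Standard genetic code for the DNA coding strand, bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_substitution(codon, position, new_base):
    """Classify a single-base substitution within one sense codon.

    codon: three-letter DNA codon, e.g. "GAA"
    position: 0, 1 or 2 (index of the base that changes)
    new_base: the substituted base
    Illustrative labels: "silent", "missense", "nonsense".
    """
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if after == before:
        return "silent"      # code redundancy: same amino acid is inserted
    if after == "*":
        return "nonsense"    # a chain termination signal is created
    return "missense"        # a different amino acid is inserted


print(classify_substitution("GAA", 2, "G"))  # GAA -> GAG, both glutamate
print(classify_substitution("GAA", 1, "T"))  # GAA -> GTA, glutamate to valine
print(classify_substitution("TGG", 2, "A"))  # TGG -> TGA, tryptophan to stop
```

The sketch only reports the change at the level of the code; whether a missense change matters still depends on the role of the affected amino acid in the folded protein, as the text notes.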
Sign mutations (adding or subtracting one or two bases to the nucleic acid base sequence of the gene) have a uniformly drastic effect on gene function. Because the bases of each triplet encode the information for each amino acid in the polypeptide product, and because they are read in sequence from one end of the gene to the other without any punctuation between triplets, insertion of an extra base or two bases will lead to translation out of register of the whole sequence distal to the insertion or deletion point. The polypeptide formed is at best drastically modified and usually fails to function at all. This sometimes is hard to distinguish from the effects of intragenic deletions. However, whereas extensive intragenic deletions cannot revert, the deletion of a single base can be compensated for by the insertion of another base at, or near, the site of the original change. See Gene, Genetic code
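The out-of-register reading described above, and its compensation by a second change nearby, can be sketched in Python. This is a minimal illustration assuming the standard genetic code; the 21-base sequence and the helper `translate` are made up for the example and do not come from any real gene.

```python
from itertools import product

# Standard genetic code for the DNA coding strand, bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Read successive triplets from the first base; stop at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


original = "ATGGCTGAACGTTACCTGTAA"            # hypothetical coding sequence
deleted = original[:3] + original[4:]          # delete one base after ATG
restored = deleted[:6] + "A" + deleted[6:]     # insert one base nearby

print(translate(original))   # MAERYL*
print(translate(deleted))    # MLNVTC  (every codon after the deletion changes)
print(translate(restored))   # MLKRYL* (frame restored after the insertion)
```

In the restored sequence only the codons between the two changes differ from the original, which is why such a double mutant can regain function when that stretch tolerates the altered amino acids.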
Some chromosomal changes involve alterations in the quantity of genetic material in the cell nuclei, while others simply lead to the rearrangement of chromosomal material without altering its total amount. See Chromosome
Origins of mutations
Mutations can be induced by various physical and chemical agents or can occur spontaneously without any artificial treatment with known mutagenic agents.
Until the discovery of x-rays as mutagens, all the mutants studied were spontaneous in origin; that is, they were obtained without the deliberate application of any mutagen. Spontaneous mutations occur unpredictably, and among the possible factors responsible for them are tautomeric changes occurring in the DNA bases which alter their pairing characteristics, ionizing radiation from various natural sources, naturally occurring chemical mutagens, and errors in the action of the DNA-polymerizing and correcting enzymes.
Spontaneous chromosomal aberrations are also found infrequently. One way in which deficiencies and duplications may be generated is by way of the breakage-fusion-bridge cycle. During a cell division one divided chromosome suffers a break near its tip, and the sticky ends of the daughter chromatids fuse. When the centromere divides and the halves begin to move to opposite poles, a chromosome bridge is formed, and breakage may occur again along this strand. Since new broken ends are produced, this sequence of events can be repeated. Unequal crossing over is sometimes cited as a source of duplications and deficiencies, but it is probably less important than often suggested.
In the absence of mutagenic treatment, mutations are very rare. In 1927 H. J. Muller discovered that x-rays significantly increased the frequency of mutation in Drosophila. Subsequently, other forms of ionizing radiation, for example, gamma rays, beta particles, fast and thermal neutrons, and alpha particles, were also found to be effective. Ultraviolet light is also an effective mutagen. The wavelength most employed experimentally is 253.7 nm, which corresponds to the peak of absorption of nucleic acids.
Some of the chemicals which have been found to be effective as mutagens are the alkylating agents which attack guanine principally although not exclusively. The N7 portion appears to be a major target in the guanine molecule, although the O6 alkylation product is probably more important mutagenically. Base analogs are incorporated into DNA in place of normal bases and produce mutations probably because there is a higher chance that they will mispair at replication. Nitrous acid, on the other hand, alters DNA bases in place. Adenine becomes hypoxanthine and cytosine becomes uracil. In both cases the deaminated base pairs differently from the parent base. A third deamination product, xanthine, produced by the deamination of guanine, appears to be lethal in its effect and not mutagenic. Chemicals which react with DNA to generate mutations produce a range of chemical reaction products not all of which have significance for mutagenesis.
Significance of mutations
Mutations are the source of genetic variability, upon which natural selection has worked to produce organisms adapted to their present environments. It is likely, therefore, that most new mutations will now be disadvantageous, reducing the degree of adaptation. Harmful mutations will be eliminated after being made homozygous or because the heterozygous effects reduce the fitness of carriers. This may take some generations, depending on the severity of their effects. Chromosome alterations may also have great significance in evolutionary advance. Duplications are, for example, believed to permit the accumulation of new mutational changes, some of which may prove useful at a later stage in an altered environment.
Rarely, mutations may occur which are beneficial: drug yields may be enhanced in microorganisms, and the characteristics of cereals can be improved. However, for the few mutations which are beneficial, many deleterious mutations must be discarded. Evidence suggests that the metabolic conditions in the treated cell and the specific activities of repair enzymes may sometimes promote the expression of some types of mutation rather than others. See Deoxyribonucleic acid (DNA)
New study explains why genetic mutations cause disease in some people but not in others
The hypothesis of the study is illustrated with an example in which an individual is heterozygous for both a regulatory variant and a pathogenic coding variant. The two possible haplotype configurations would result in either decreased penetrance of the coding variant, if it was on the lower-expressed haplotype, or increased penetrance of the coding variant, if it was on the higher-expressed haplotype. Credit: New York Genome Center
Researchers at the New York Genome Center (NYGC) and Columbia University have uncovered a molecular mechanism behind one of biology's long-standing mysteries: why individuals carrying identical gene mutations for a disease end up having varying severity or symptoms of the disease. In this widely acknowledged but not well understood phenomenon, called variable penetrance, the severity of the effect of disease-causing variants differs among individuals who carry them.
Reporting in the August 20 issue of Nature Genetics, the researchers provide evidence for modified penetrance, in which genetic variants that regulate gene activity modify the disease risk caused by protein-coding gene variants. The study links modified penetrance to specific diseases at the genome-wide level, which has exciting implications for future prediction of the severity of serious diseases such as cancer and autism spectrum disorder.
NYGC Core Faculty Member and Columbia University Department of Systems Biology Assistant Professor Dr. Tuuli Lappalainen led the study alongside post-doctoral research fellow Dr. Stephane Castel.
"Our findings suggest that a person's disease risk is potentially determined by a combination of their regulatory and coding variants, and not just one or the other," Dr. Lappalainen said. "Most previous studies have focused on either looking for coding variants or regulatory variants that affect disease in these individuals or potentially looking at common variants that could affect disease. We have merged these two fields into one clear hypothesis that uses data from both of them, which was fairly unheard of before."
Variable penetrance has long posed a challenge for predicting the severity of a disease, even for diseases with a strong genetic association. Dr. Lappalainen and colleagues developed the modified penetrance hypothesis from their interest in the idea that gene variants that regulate the activation of genes could also play a role in modifying the penetrance of coding variants for the same gene.
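The logic of the hypothesis can be illustrated with a toy calculation in Python. It assumes, purely for illustration, that the share of a gene's transcripts carrying the pathogenic coding allele is a rough proxy for how strongly that allele is expressed; the function name and the expression values are hypothetical and are not taken from the study.

```python
def pathogenic_transcript_fraction(expr_with_variant, expr_other):
    """Share of transcripts that come from the haplotype carrying the
    pathogenic coding variant (toy model, arbitrary expression units)."""
    total = expr_with_variant + expr_other
    if total <= 0:
        raise ValueError("total expression must be positive")
    return expr_with_variant / total


# Hypothetical regulatory variant: one haplotype is expressed 1.5x the other.
high, low = 1.5, 1.0

# Coding variant on the higher-expressed haplotype: larger share.
print(pathogenic_transcript_fraction(high, low))   # 0.6

# Coding variant on the lower-expressed haplotype: smaller share.
print(pathogenic_transcript_fraction(low, high))   # 0.4
```

Under this simplification the same pair of variants gives a larger or smaller pathogenic share depending only on which haplotype carries the coding variant, which is the configuration difference the hypothesis relies on.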
As a first test of the modified penetrance hypothesis, the researchers conducted an analysis of data from the Genotype-Tissue Expression (GTEx) project, a large catalog of genetic variants that affect gene expression in humans, to evaluate the interactions of regulatory and coding variants in a human population without severe genetic diseases. They found an enrichment of combinations of regulatory and coding variants, called haplotypes, that act as protective against disease by decreasing the penetrance of coding variants associated with disease development. This finding was expected in the general population, Dr. Castel explained, as a result of natural selection removing damaging gene variants from the genome over time.
To test their hypothesis in a disease-specific population of patients, the researchers analyzed data from the National Institutes of Health's The Cancer Genome Atlas (TCGA) and the Simons Simplex Collection, a permanent repository of genetic samples from 2,600 families, each of which has one child affected with an autism spectrum disorder, and unaffected parents and siblings. In the cancer patients and individuals with autism, they found an enrichment of haplotypes predicted to increase the penetrance of coding variants associated with cancer and autism spectrum disorder, respectively.
Finally, they designed an experiment using CRISPR/Cas9 genome editing technology to test the modified penetrance hypothesis with a coding variant that is known to be associated with a disease. They chose a coding variant associated with Birt-Hogg-Dubé Syndrome, a rare hereditary disease that increases the risk of certain types of tumors. They edited the SNP into a cell line on different haplotypes with a regulatory variant. The researchers were able to show that the regulatory variant indeed modified the effect of the coding disease-causing variant, consistent with expectations based on the large-scale data collections. This finding provides an important framework for scientists moving forward to experimentally test specific disease SNPs to determine if they could be affected by modified penetrance.
"Now that we have demonstrated a mechanism for modified penetrance, the long-term goal of the research is better prediction of whether an individual is going to have a disease using their genetic data by integrating the regulatory and coding variants," Dr. Lappalainen said.
"In future, studies of the genetic causes of severe diseases should take into account this idea that regulatory variants need to be considered alongside coding variants," Dr. Castel said. "This should eventually lead to a more fine-grained understanding of the risk of coding variants associated with disease."
This section provides recent general overviews that are important for understanding evolutionary genetics as a field. Stern 2010 stresses that the study of evolution of development must integrate a variety of disciplines, and that the field is now tending to reunify population genetics (the study of allele frequencies across space and time) with developmental biology (the study of processes that integrate genotypic and environmental factors into phenotypes). Orr 2005 summarizes the discordance between theoretical predictions and emerging empirical data on the genes and mutations responsible for phenotypic evolution. Carroll, et al. 2004 and Nei 2007 represent some of the first syntheses of such empirical data, while Stern and Orgogozo 2008, Alonso-Blanco, et al. 2009, and O’Bleness, et al. 2012 convey more recent overviews of the field. Rockman 2011 is a necessary paper that condemns rapid generalizations from empirical data due to inescapable experimental biases.
A comprehensive review of the loci of natural and agricultural variation that have been identified in plants.
Carroll, Sean B., Jennifer K. Grenier, and Scott D. Weatherbee. 2004. From DNA to diversity: Molecular genetics and the evolution of animal design. 2d ed. Malden, MA: Blackwell Science.
This edition, along with its first edition (2001), represents a very accessible and beautifully illustrated synthesis of the field of evolutionary developmental biology. Of particular interest is the testable hypothesis that morphological evolution occurs primarily via the regulation of expression of a restricted set of architect genes, the so-called developmental toolkit.
Nei, Masatoshi. 2007. The new mutation theory of phenotypic evolution. Proceedings of the National Academy of Sciences 104.30: 12235–12242.
Against the view held by most evolutionary biologists, Masatoshi Nei argues that the primary driving force of phenotypic evolution is mutation, rather than natural selection.
O’Bleness, M., V. B. Searles, A. Varki, P. Gagneux, and J. M. Sikela. 2012. Evolution of genetic and genomic features unique to the human lineage. Nature Reviews Genetics 13.12: 853–866.
An overview of the emerging genetic data on the changes that have shaped some of the peculiarities of our own species.
Orr, H. Allen. 2005. The genetic theory of adaptation: A brief history. Nature Reviews Genetics 6.2: 119–127.
An informal description of the literature on theories of adaptation, recommended to understand what past and modern models predict about the number and the effect size of evolutionarily relevant mutations.
Rockman, Matthew V. 2011. The QTN program and the alleles that matter for evolution: All that’s gold does not glitter. Evolution 66.1: 1–17.
Matthew Rockman reminds us with eloquence that only mutations of large-phenotypic effect are accessible to geneticists, and that the lack of information on mutations with infinitesimal effects prevents us from using empirical data to derive quantitative statements about the genetic basis of evolution in general.
Stern, David L. 2010. Evolution, development, and the predictable genome. Greenwood Village, CO: Roberts.
A concise and thoughtful introduction to the genetics of phenotypic evolution and its main concepts, explaining how fields such as evolutionary developmental biology and population genomics illuminate each other.
Stern, David L., and Virginie Orgogozo. 2008. The loci of evolution: How predictable is genetic evolution? Evolution 62.9: 2155–2177.
A meta-analysis of the mutations responsible for phenotypic evolution, which uses the literature for testing various predictions about the relative importance of coding and regulatory changes. Of particular interest is the intriguing possibility that intraspecific and interspecific evolution might involve different kinds of causing mutations.
Cell Signaling Events
Cell Biology, Diagnosis, and Treatment of HOCM
HOCM, a prevalent form of Hypertrophic Cardiomyopathy (HCM), is a significant cause of sudden death in younger people, including athletes. It occurs at a rate of 1:500 in the general adult population affecting men and women equally across all races. In the United States, HOCM is responsible for fewer than 100 deaths per year primarily due to a rate of 1:220,000 among athletes. HOCM most often is inherited as an autosomal dominant condition and is the most common genetically transmitted cardiomyopathy. HOCM most commonly results from a mutation in genes coding for sarcomere proteins, which cause structural abnormalities in myofibrils and myocytes that lead to abnormal cardiac force generation and conduction abnormalities.
The cardiac sarcomere is the contractile unit of cardiomyocytes. The proteins involved in sarcomere formation are organized into thick and thin filaments, composed primarily of actin filaments, myosin proteins, and titin. Contraction occurs by sliding and interdigitating of the thick and thin filaments of sarcomeres, which require a highly regulated and precise set of interactions between these two sets of filaments.
Over 800 mutations in nine primary genes encoding sarcomere proteins have been identified as causing Hypertrophic Cardiomyopathy (HCM) in large family cohorts. Two of these causal genes, MYH7 (beta myosin heavy chain) and MYBPC3 (myosin-binding protein C), are the most common, together being responsible for half of the HCM cases. The other seven major gene defects are in TNNT2 (cardiac troponin T), TNNI3 (cardiac troponin I), TPM1 (alpha tropomyosin), ACTC1 (cardiac alpha actin), MYL2 (Regulatory myosin light chain), MYL3 (Essential myosin light chain), and CSRP3 (cysteine and glycine rich protein 3). Another 17 causal genes have been observed in smaller families and sporadic cases. While these are the primary defects, they result in proximal defects in response to the changes in sarcomere protein structure and function. An example of a proximal effect is altered calcium sensitivity. In animal models of HCM, abnormal intracellular calcium has been demonstrated, with decreased calcium in the sarcoplasmic reticulum and increased calcium in the cytosol. This then leads to important changes in cell signaling events. The increased intracellular calcium associates with calmodulin, which in turn increases calcineurin phosphatase activity. Calcineurin dephosphorylates the NFAT transcription factor, which turns on genes that increase cardiac hypertrophy. This has been demonstrated to be true in HCM animal models and a human clinical trial. Diltiazem, an L-type calcium channel inhibitor, was first demonstrated to decrease intracellular calcium and inhibit cardiac hypertrophy in a mouse HCM model. More recently it was demonstrated to do the same in a small randomized human trial involving HCM subjects with a mutation in MYBPC3.
Loss of mechanical force due to molecular impairment of sarcomere function to support myofibril contraction is a major cause of muscle thickening. These phenotypic changes result in a mode of compensation by the heart muscle to be able to generate enough mechanical force to maintain adequate tissue perfusion throughout the body.
Clinical presentations associated with these mutations range from mild to severe, depending on the location and type of mutations present, which dictate the effect on the overall function of the sarcomere unit. However, it is difficult to correlate a particular mutation to a clinical outcome due to multifactorial effects on disease onset and clinical presentation. In many cases there are no symptoms and the disease manifests itself only upon heart function tests or upon strenuous physical activity, as seen in this case. Over time the condition leads to cardiac hypertrophy, most often presenting as LVH. Enlarged heart muscle might obstruct heart valve function, resulting in heart murmurs and/or abnormal electrical activity, which can cause arrhythmias and subsequent complications such as atrial fibrillation and even heart failure.
Treatment options include beta blockers and selective calcium channel blockers along with avoidance of strenuous exercise and heavy lifting. ACE inhibitors and nitrates should be avoided as they decrease afterload and may worsen the left ventricular outflow tract obstruction. Septal myomectomy is reserved for persons whose symptoms are not well managed with medications and lifestyle changes.
ADH Activity and Expression Differ Between and Within Species.
The total level of ADH activity, as measured in crude extracts of adult flies, differs among several pairs of Drosophila species as well as within Drosophila melanogaster (Fig. 1) (detailed methods are presented in SI Appendix). Specifically, flies of D. melanogaster strain Florida-9 (fast allele) showed 173% (2.7-fold) higher ADH activity than flies of strain Canton-S (slow allele). Drosophila yakuba flies showed 74% higher activity than those from its sister species Drosophila santomea, Drosophila virilis flies had 510% higher activity than those from its sister species Drosophila americana, and Drosophila erecta had 293% higher activity than those from its sister species Drosophila orena.
ADH activity and protein level differ among pairs of Drosophila alleles and species. (A) Cladogram shows relationships between species. Data from ref. 23. ADH activity is in units of ∆Abs340 per minute per milligram soluble protein (mean ± SD). Percent difference = ([activity(high)/activity(low) − 1] × 100%), mean ± SD. Different assay conditions were used for the D. erecta/D. orena pair and for the D. virilis/D. americana pair (SI Appendix, Methods), so the means of, for example, D. santomea and D. orena, should not be compared. These data are also presented in different form in Fig. 2 A and B and SI Appendix, Fig. S1. (B) Differences in ADH protein levels between pairs of strains and species are also apparent in Western blot. Differences in band intensity among pairs (but not within pairs) are potentially affected by sequence divergence in the region to which the antibody was raised (SI Appendix, Fig. S2). The additional band in D. virilis in the anti-tubulin blot is likely the result of cross-reactivity with the polyclonal antibody.
These large differences in enzyme activity prompted us to determine their underlying mechanistic causes. In principle, higher activity could be the consequence of greater enzyme specific activity, the production of more enzyme, or both. To determine whether the differences observed might be due to the production of different amounts of Adh protein, we used Western blots with an anti-Adh antibody to examine the relative amounts of Adh protein in whole-fly extracts. In three cases, the species or strain with higher ADH activity produced more Adh protein, indicating that differences in protein expression level are at least partly responsible [Fig. 1B; Adh protein was not detected in the D. erecta/D. orena pair, likely due to amino acid divergence in the epitopes against which the antibody was raised (SI Appendix, Fig. S2)].
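The percent-difference definition in the legend above relates directly to the fold changes quoted in the text. A minimal Python sketch of that arithmetic follows; the function names are hypothetical, and only the percentages quoted in the text are used as inputs.

```python
def percent_difference(activity_high, activity_low):
    """Percent difference as defined in the legend:
    (activity(high) / activity(low) - 1) * 100."""
    return (activity_high / activity_low - 1.0) * 100.0


def fold_from_percent(percent):
    """Convert a 'percent higher' value into a fold change."""
    return 1.0 + percent / 100.0


# Percentages quoted in the text for each pair.
for pair, pct in [("Florida-9 vs Canton-S", 173),
                  ("D. yakuba vs D. santomea", 74),
                  ("D. virilis vs D. americana", 510),
                  ("D. erecta vs D. orena", 293)]:
    print(f"{pair}: {pct}% higher = {fold_from_percent(pct):.2f}-fold")
```

For the D. melanogaster strains, 173% higher corresponds to 2.73-fold, consistent with the rounded 2.7-fold given in the text.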
ADH Activity Evolution Originates Primarily from the Adh Gene.
Differences in ADH activity could be due to substitutions at the Adh locus and/or to trans-acting factors outside of the locus. To determine if ADH activity differences originated from substitutions within the Adh gene, we cloned the Adh alleles from each species or strain and then transformed them back into a specific D. melanogaster attP-PhiC31 genomic landing site in a uniform Adh null genetic background (24). Cloned loci were ∼8 kb with identical boundaries in each pair, containing all known sequences required for adult expression (SI Appendix, Methods). In each case, the transgenic Adh alleles largely recapitulated the between- and within-species differences in Adh activity (Fig. 2 A and B). A similar pattern of relative differences in protein level was seen in Western blots (Fig. 2C). We could therefore use Adh transgenes to determine the contribution of protein coding versus noncoding substitutions to evolutionary differences in ADH activity.
ADH enzyme activity and protein level differ between and within species. (A) ADH enzyme activity of Drosophila strains and species (white box plots) is largely recapitulated when cloned Adh loci are transformed into D. melanogaster (gray box plots). (B) Transformants of D. virilis and D. americana Adh loci also largely recapitulate the species difference and show a major contribution from tandem duplication. Data from ref. 25. (C) Relative differences in levels of ADH protein between pairs of transformants are also apparent in Western blot. Low signal intensity in ere and ore is plausibly due to sequence divergence in the region to which the antibody was raised (SI Appendix, Fig. S2).
Amino Acid Replacements Account for only a Minor Fraction of Activity Evolution.
To directly determine the relative contribution of protein coding sequences to overall activity differences, we made a set of constructs that substituted the amino acid sequence from one species or strain into the allele from the other species or strain, leaving all noncoding substitutions unchanged. In these experiments, it was critical to be able to reliably detect small increments of differences between transgenic flies. To do so, we scaled up the sensitive ADH activity assay, measuring multiple batches of flies from multiple transgenic lines. We estimate that we could detect activity differences of around 4 to 8% after correcting for multiple testing (SI Appendix, Methods).
The number of amino acid replacements between strains or species was small. Just one amino acid difference separates the slow and fast D. melanogaster alleles (a lysine-to-threonine substitution at position 192; K192T), while three and four amino acid differences distinguish the santomea/yakuba and orena/erecta alleles, respectively (SI Appendix, Fig. S2). To measure any potential difference between the D. virilis and D. americana coding regions, we first had to consider the tandem duplication of the entire Adh gene and flanking region that occurs in D. virilis. The tandem copies are identical except for three substitutions in the 3′ noncoding region that have been shown to not affect activity (25). This allowed us to delete one duplicate from the construct, resulting in a single copy that had orthologous synteny with D. americana. We could then substitute the one amino acid change (virilis: L51, americana: I51) into this single-copy virilis Adh locus and determine if it contributed significantly to the species difference.
We found that the swapping of amino acid residues had the effect of changing ADH activity by at most 22% (the D. melanogaster K192T substitution) (Fig. 3 and Table 1, percent difference). In the case of D. virilis–D. americana, the single amino acid substitution had no significant effect [P = 0.09 (Table 1); after correction for multiple pairwise comparisons, P = 0.26 (Fig. 3D)]. Thus, amino acid replacements within the ADH protein contributed 0 to 25% of the overall difference in ADH activity between the loci we compared (Table 1, percent of total). It follows that 75 to 100% of ADH activity differences are the result of noncoding substitutions.
ADH activity differences are mostly cis-regulatory. (A–D) Activity is shown for transformant lines carrying either unmodified Adh alleles or swap alleles, which have the amino acid sequence of one allele and the noncoding sequence of the other. Noncoding sequence predicts ADH activity substantially better than protein sequence does. Activity data shown in Figs. 3 and 4 were collected simultaneously and are presented in full in SI Appendix, Fig. S3. P values shown here were adjusted for all pairwise comparisons with DF 18–22. Box plots show the distribution of data, while error bars show 95% confidence intervals of the mean.