Application of Hypersensitivity Assays to Discovery of Therapeutics

Chemical hypersensitivity is a common method to probe gene dysfunction, and it can be deployed in drug screens to find new therapeutics. In this blog post, we focus on models of Inborn Errors of Metabolism (IEM) and describe how these genetic conditions can lead to hypersensitivity to a metabolite. By using humanization techniques, we take advantage of the ancient biology shared between humans and other organisms to create stand-ins – patient avatars – for drug screening studies. These genetically engineered model systems allow fast and affordable phenotypic screens in a whole-organism format, so researchers can find molecules that alleviate the metabolic stress arising from an IEM deficiency. Ultimately, this report showcases how chemical hypersensitivity is a generalizable drug discovery tool applicable to many genetic disorders.

Clinical variants that disrupt a metabolic gene can lead to a buildup of toxic metabolites (Figure 1). This phenomenon can be used to create a functional assay in which the model system is hypersensitive to metabolites upstream of the gene’s function in a metabolic pathway.

Figure 1. Hypothetical metabolic pathway. When the ENZ 2 gene is defective, metabolite 2 can build up to a toxic level that leads to paralysis and death.

Chemical Hypersensitivity Due to Metabolic Block

In the above example, a defect in the second enzyme of the metabolic pathway, “ENZ 2,” causes a genetic disease by blocking the enzymatic conversion of metabolite 2 into metabolite 3. For patients with this condition, exposure to metabolite 1 leads to a toxic buildup of metabolite 2 and generation of reactive oxygen species, ultimately leading to paralysis and death. Because of this metabolic pathway blockade, patients can be hypersensitive to metabolite 1.

Model Systems in the Fluidics Paradigm

A variety of model systems can be used to test for metabolite hypersensitivity. Starting with the simple model organisms, a human gene associated with disease can be installed as a gene replacement. In the case of metabolic disorders, the high degree of sequence conservation in these ancient genes often enables the human gene to rescue the function of the removed ortholog (the animal’s version of the disease gene). When we add in iPSC and then install clinical variants in the human gene locus, we get three types of model systems for modeling clinical variants: C. elegans nematode, zebrafish and iPSC (Figure 2). These models are advantageous for drug discovery because they fit the fluidics paradigm: the zebrafish and nematode can live their entire lifecycle in liquid, and differentiated iPSC can be hooked up in biocircuits with microfluidics. Because the models are in fluid, oral bioavailability can be assessed simply by adding drugs to the liquid growth media.

The first step is to create the wt-Avatars. In the nematode, the evolutionary distance often means that a gene-to-gene comparison shows low homology (sequence identity under 75%). As a result, only a portion of the pathogenic alleles can be modeled as amino acid substitutions in the nematode’s native locus. As a workaround we use Whole Gene Humanization: CRISPR gene editing is used to remove the coding sequence at the native locus and replace it with the human gene coding sequence. When the human sequence restores normal function, we know we are looking at a high degree of conservation of biology.

In zebrafish, the orthologous gene often has a sequence identity of 75% or greater with the human sequence. This often makes the zebrafish’s own gene sufficient for modeling the patient condition as a single amino acid substitution, yet occasionally whole-gene humanization will need to be deployed. For iPSC, we source a line from a healthy donor (reference line – “wt-Avatar”) and make the single amino acid change to model the patient variant condition (var-Avatar). The bottom line: modeling a patient condition requires the creation of a wild-type humanized animal or the use of an unmodified line (wt-Avatar), followed by the insertion of the missense variant that models the patient’s genetic variant (var-Avatar). With regard to IEM deficiencies, this system enables detection of phenotypic abnormalities that are often accentuated by metabolite exposure.

Figure 2. The three types of model systems: zebrafish and C. elegans nematode animal models (most relevant when used in whole-gene humanized formats) and iPSCs. Humanizing mutations are made to recreate the variation seen in the patient (var-Avatar), and the resulting lines are compared to the wild-type control (wt-Avatar).
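The choice between editing the animal’s native gene and whole-gene humanization can be written down as a simple rule of thumb. The sketch below is only an illustration: the 75% threshold comes from the text, while the `Ortholog` class, the `residue_conserved` flag and the example identity values are hypothetical. In practice the decision also depends on whether the human gene actually rescues function.

```python
from dataclasses import dataclass

# Threshold taken from the text: orthologs at or above 75% identity can often
# be modeled by a single amino acid substitution in the native gene.
IDENTITY_THRESHOLD_PCT = 75.0


@dataclass
class Ortholog:
    organism: str
    identity_pct: float       # sequence identity to the human protein (hypothetical input)
    residue_conserved: bool   # is the patient's variant residue conserved? (hypothetical input)


def choose_strategy(ortholog: Ortholog) -> str:
    """Suggest a modeling strategy. Illustrative rule of thumb, not a validated pipeline."""
    if ortholog.identity_pct >= IDENTITY_THRESHOLD_PCT and ortholog.residue_conserved:
        return "native-locus substitution"
    return "whole-gene humanization"


if __name__ == "__main__":
    # Example identity values are made up for illustration.
    for o in (Ortholog("zebrafish", 82.0, True), Ortholog("C. elegans", 60.0, True)):
        print(o.organism, "->", choose_strategy(o))
```

Exceptions exist in both directions: a well-conserved residue in a low-identity ortholog can sometimes still be edited directly, so the output should be read as a starting point for discussion.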

Detection of Hypersensitivity Phenotypes

To provide a phenotypic screen that is linked to a mechanism of action, the var-Avatar lines can be examined for their hypersensitivity to specific environmental stressors. Applied to a metabolic gene, the environmental stressor is exposure to an upstream metabolite. Using the ENZ 2 example of Figure 1, the effect of increasing doses of metabolite 1 on paralysis can be monitored in the var-Avatar model system. To measure activity in the nematode, the 96-well format of the wMicrotracker apparatus is used to track loss of locomotion. As the animal becomes more paralyzed, the rate of light beam disruption decreases. By exposing a nematode var-Avatar to different concentrations of metabolite, an LD50 curve can be generated (Figure 3).

Figure 3. wMicrotracker is used to generate LD50 curves for var-Avatar and wt-Avatar upon exposure to different concentrations of metabolite. As metabolite concentration increases, survival of animals decreases. The var-Avatar has an earlier drop in survival when compared to wt-Avatar. An intermediate concentration of metabolite (40 mM) can be used to discriminate hypersensitivity in the var-Avatar animals.

In the var-Avatar, the loss of enzyme function caused by ENZ 2 pathogenic variants creates a pathway blockage and allows metabolite 2 to build up. When metabolite 2 reaches toxic levels, the cell becomes oxidatively stressed, which leads to apoptosis and eventually to animal paralysis and death. In this example, 40 mM of metabolite 1 is enough to cause paralysis and lethality in the ENZ 2-deficient var-Avatar animals, but it is not enough to cause paralysis in wt-Avatar control animals. As a result, a drug screen can be developed that uses this 40 mM concentration as a cut-off for finding drugs that alleviate the metabolic block and enable survival of the ENZ 2 var-Avatar (Figure 4).

Figure 4. Rescue-of-hypersensitivity screen applied to a drug repurposing library. A “hit” is obtained when the drug rescues survival of the var-Avatar in 40 mM metabolite but does not influence survival of the wt-Avatar at its measured LD50 metabolite concentration.
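The hit definition in Figure 4 has two conditions, and it can help to see them side by side. Below is a minimal sketch, assuming survival is scored as a fraction between 0 and 1. The function name and both thresholds (`min_rescue`, `max_wt_shift`) are hypothetical placeholders, not values from our assay.

```python
def is_hit(var_drug, var_vehicle, wt_drug, wt_vehicle,
           min_rescue=0.30, max_wt_shift=0.10):
    """Return True when a drug behaves like a hit as described in Figure 4.

    var_*: var-Avatar survival fraction at the discriminating metabolite dose
    wt_*:  wt-Avatar survival fraction at its own LD50 metabolite dose
    Thresholds are illustrative assumptions only.
    """
    rescue = var_drug - var_vehicle        # gain in var-Avatar survival with drug
    wt_shift = abs(wt_drug - wt_vehicle)   # any change in wt-Avatar survival
    return rescue >= min_rescue and wt_shift <= max_wt_shift


if __name__ == "__main__":
    # Made-up numbers: strong rescue in var-Avatar, wt-Avatar essentially unchanged.
    print(is_hit(var_drug=0.80, var_vehicle=0.10, wt_drug=0.52, wt_vehicle=0.50))
    # Made-up numbers: the drug also shifts the wt-Avatar, so it is not a clean hit.
    print(is_hit(var_drug=0.80, var_vehicle=0.10, wt_drug=0.90, wt_vehicle=0.50))
```

Measuring the wt-Avatar at its own LD50 means a drug that changes metabolite sensitivity in either direction shows up as a shift away from 50% survival.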

Drug Screen

A drug screen can be performed on repurposing libraries to find hits for use in clinical trials. In the above example, 40 mM of metabolite 1 is used to detect hits: a hit is a drug from the repurposing library that allows the var-Avatar animal to survive the exposure. Additionally, the hit is considered valid only when the drug has minimal to no effect on the wt-Avatar control. For metabolic disorder screening, the first step (Phase 1) is a Stress Test to determine whether upstream metabolites can lead to hypersensitivity. In the next step (Phase 2), prior to the library screening, testing for Metabolic Modulation is performed by observing whether changes in hypersensitivity can be achieved with downstream metabolites, cofactors, oxidants and antioxidants (Figure 5). In the final step (Phase 3), a Library Screen is performed to find modulatory effects from FDA-approved drugs.

Figure 5. The wt-Avatar and var-Avatar animals are compared for their response to stress-test compounds (upstream metabolites) to achieve optimal separation of wt-Avatar vs var-Avatar (dotted lines). Then a set of metabolic modulators (downstream metabolites, oxidants and antioxidants) is tested for its capacity to restore normal sensitivity in the var-Avatar while causing only minor change in wt-Avatar sensitivity (dashed lines).

Phase 1 – Stress Testing to Detect Compound Hypersensitivity

In the first phase of the three-phase process, the model system is assessed for sensitivity to metabolites and ROS mediators. In this pilot screen, a small selection of chemicals is used to determine whether metabolite sensitivity can be detected. First, the model system is exposed to upstream metabolites. These will often build up to toxic levels, which are reached sooner in the var-Avatar line than in the wt-Avatar line.

Scope Steps: Hypersensitivity Screen (upstream metabolites)

  1. Obtain wt-Avatar and var-Avatar: Generate humanized animals for modeling a metabolic gene deficiency (created as a prerequisite project).
  2. Obtain Stress-Test Compounds: Select upstream metabolites (1 to 10 molecules).
  3. Solubilize Stress-Test Compounds: Resuspend the metabolites in 100% DMSO at 100 mM concentration to mimic the sample conditions of most drug screening libraries.
  4. Measure Activity: Determine suppression as retention of locomotion using wMicrotracker plate reader instrumentation.
  5. Generate LD50 curves: Expose animals at concentrations of 0.01, 0.1, 1, 10, 100, and 1000 µM (1% DMSO).
  6. Select Optimal Separation: Examine LD50 curves for conditions creating a high degree of separation between wt-Avatar and var-Avatar.
  7. Repeat With Combinations (optional): If necessary, repeat with combinations of chemicals to attempt to create at least 4x separation in chemical sensitivity between wt-Avatar and var-Avatar.
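Steps 5 to 7 above come down to estimating an LD50 for each line and comparing the two. The sketch below shows one simple way to do that, assuming survival fractions have already been derived from the locomotion readout. It interpolates on a log-concentration scale rather than fitting a full dose-response model, and the survival values are invented for illustration.

```python
import math


def estimate_ld50(concentrations, survival):
    """Estimate LD50 by log-linear interpolation between the two doses that
    bracket 50% survival. Concentrations must be positive. Returns None when
    survival never crosses 50% in the tested range."""
    points = sorted(zip(concentrations, survival))
    for (c_lo, s_lo), (c_hi, s_hi) in zip(points, points[1:]):
        if s_lo >= 0.5 >= s_hi and s_lo != s_hi:
            frac = (s_lo - 0.5) / (s_lo - s_hi)
            log_ld50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ld50
    return None


def fold_separation(ld50_wt, ld50_var):
    """How many times more sensitive the var-Avatar is than the wt-Avatar."""
    return ld50_wt / ld50_var


# Dose series from step 5 (µM); survival fractions are hypothetical.
DOSES_UM = [0.01, 0.1, 1, 10, 100, 1000]
WT_SURVIVAL = [1.0, 1.0, 0.95, 0.9, 0.7, 0.3]
VAR_SURVIVAL = [1.0, 0.95, 0.8, 0.4, 0.1, 0.0]

if __name__ == "__main__":
    wt = estimate_ld50(DOSES_UM, WT_SURVIVAL)
    var = estimate_ld50(DOSES_UM, VAR_SURVIVAL)
    print(f"wt-Avatar LD50 ~ {wt:.1f} uM, var-Avatar LD50 ~ {var:.2f} uM")
    print("meets 4x separation target:", fold_separation(wt, var) >= 4)
```

With ten-fold dose spacing the interpolated LD50 is coarse. A follow-up series with tighter spacing around the crossing point, or a fitted sigmoid, would give a better estimate.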

Phase 2 – Metabolic Modulation Effects

Once metabolic hypersensitivity is established, a second set of molecules is examined for the ability to alleviate the metabolic defect and restore normal activity. Downstream metabolites and cofactors can be added to the system to determine whether they restore normal metabolite sensitivity. Additionally, in many metabolic disorders the buildup of metabolite intermediates disrupts normal levels of Reactive Oxygen Species (ROS) (doi: 10.1155/2018/1246069). As a result, two other groups of molecules can be screened for their effect on metabolite hypersensitivity. Various antioxidants (resveratrol, glutathione, metformin) can be tested for their ability to decrease sensitivity to metabolite buildup. On the other hand, some metabolic intermediates may attenuate ROS, which at low levels are necessary for homeostatic signaling. In that case, various oxidants (paraquat, juglone and AAPH (2,2′-azobis-2-methyl-propanimidamide)) may restore normal low levels of ROS and alleviate metabolic hypersensitivity.

Scope Steps: Metabolic Modulation (downstream metabolites, cofactors, oxidants, antioxidants)

  1. Obtain Metabolic-Modulator Compounds: Select downstream metabolites, cofactors, oxidants and antioxidants (5 to 10 chemicals).
  2. Solubilize Metabolic-Modulator Compounds: Resuspend the compounds in 100% DMSO at 100 mM concentration to mimic the sample conditions of most drug screening libraries.
  3. Measure Activity: Determine suppression as retention of locomotion using wMicrotracker plate reader instrumentation.
  4. Generate EC50 curves: Using an upstream metabolite at a concentration that leads to a strong deficiency (i.e., death), add the metabolic modulator at concentrations of 0.01, 0.1, 1, 10, 100, and 1000 µM (1% DMSO).
  5. Identify Modulator Candidates: Examine the EC50 curves for the modulators’ effect on wt-Avatar and var-Avatar.
  6. Repeat With Combinations (optional): If necessary, repeat with combinations of chemicals to attempt to decrease the separation between wt-Avatar and var-Avatar.
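The dose series used in both phases follows from simple dilution arithmetic: a 100 mM stock in 100% DMSO, delivered at 1% final DMSO, gives a top dose of 1000 µM. The sketch below works through that calculation. It assumes the lower doses are made by ten-fold serial dilution of the stock in DMSO so that every well ends at the same 1% DMSO, and the function names are ours, chosen for illustration.

```python
def final_concentration_um(stock_mm, dmso_fraction):
    """Final compound concentration (µM) when a stock in 100% DMSO is diluted
    so that DMSO makes up `dmso_fraction` of the well volume."""
    return stock_mm * 1000.0 * dmso_fraction   # mM -> µM, then dilute


def tenfold_series(top_um, n_points):
    """Ten-fold dilution series starting from the top dose (µM)."""
    return [top_um / (10 ** i) for i in range(n_points)]


if __name__ == "__main__":
    top = final_concentration_um(stock_mm=100, dmso_fraction=0.01)
    print("top dose (uM):", top)
    print("series (uM):", tenfold_series(top, 6))
```

Holding DMSO constant across the series matters because DMSO itself can affect locomotion, so a vehicle-only well at 1% DMSO is the natural baseline.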

Phase 3: Library Screening Approach

For speed to clinical trials, a good choice is to use repurposing libraries. These contain FDA-approved drugs that are well vetted for ADMET issues, which makes them a safe choice for therapeutic consideration. We apply the assay developed in Phase 1 at scale across a commercially sourced drug screening library. A variety of sources for compound libraries of FDA-approved drugs are available (TargetMol, ApexBio, ChemBridge, MicroSource, Prestwick, Selleckchem, etc., typically about 2000-3000 molecules). As an example, the metabolite hypersensitivity assay is applied to the var-Avatar line (patient variant animal) at 10 µM concentrations of compounds from the ApexBio FDA-approved library of 2726 compounds. The top hits (up to 200) are counter-screened on wt-Avatar animals and the results are scored for minimal alteration of wild-type metabolite sensitivity. A rank is developed that rewards a strong response in the var-Avatar line and penalizes a strong response in the wt-Avatar line. The top 20 hits are examined at a range of doses to find their EC50 values for suppressing metabolite hypersensitivity. A data package is prepared and delivered to the client.

Scope Steps: Examine Rescue of metabolite Sensitivity in var-Avatar line with 2000+ Chemical Library

  1. Obtain library: Multiple repurposing libraries are available – for example, ApexBio library of 2726 FDA-approved chemicals.
  2. Drug Exposure: Using Stress-Test chemical concentrations optimized in Phase 1, test the var-Avatar line against the library of 2726 molecules at 10 µM concentration for their ability to suppress metabolite hypersensitivity.
  3. Measure Activity: Determine suppression as retention of locomotion using wMicrotracker plate reader instrumentation.
  4. Perform Primary Screen: Score compounds as positive or negative for ability to suppress metabolite hypersensitivity by retaining locomotion.
  5. Identify Preliminary Hits: Select up to 200 compounds (≤200 molecules) from the positive category for use in a repeat screen on wt-Avatar animals.
  6. Perform Counter Screen: Using the metabolite concentration optimized in Phase 1, test the ≤200 molecules at 10 µM concentration for their ability to NOT suppress metabolite sensitivity in wt-Avatar animals.
  7. Rank Hits: Select up to 20 compounds (≤20 molecules) that are positive in the var-Avatar for suppression of metabolite hypersensitivity and negative for suppression of metabolite sensitivity in the wt-Avatar.
  8. Characterize Hits: Perform EC50 assays on the var-Avatar line for the up to 20 ranked compounds.
  9. Report Out: Send a report to the client.
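The screening funnel above (primary screen, counter screen, ranking) can be sketched in a few lines. This is a hedged illustration, not our production scoring: the rescue scores, the thresholds and the ranking formula (var-Avatar rescue minus wt-Avatar shift) are all assumptions made for the example. Only the caps of 200 and 20 compounds come from the steps.

```python
from typing import Dict, List


def select_preliminary_hits(var_rescue: Dict[str, float],
                            threshold: float = 0.5,
                            max_hits: int = 200) -> List[str]:
    """Steps 4-5: keep compounds scored positive in the var-Avatar primary
    screen, strongest first, capped at max_hits."""
    positives = [(score, name) for name, score in var_rescue.items() if score >= threshold]
    positives.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in positives[:max_hits]]


def rank_hits(var_rescue: Dict[str, float],
              wt_shift: Dict[str, float],
              candidates: List[str],
              max_wt_shift: float = 0.1,
              max_hits: int = 20) -> List[str]:
    """Steps 6-7: drop compounds that alter wt-Avatar sensitivity, then rank
    the rest by (var-Avatar rescue - wt-Avatar shift)."""
    kept = [c for c in candidates if abs(wt_shift[c]) <= max_wt_shift]
    kept.sort(key=lambda c: (-(var_rescue[c] - abs(wt_shift[c])), c))
    return kept[:max_hits]


if __name__ == "__main__":
    # Made-up scores for four hypothetical compounds.
    var_rescue = {"drug_A": 0.9, "drug_B": 0.7, "drug_C": 0.2, "drug_D": 0.8}
    prelim = select_preliminary_hits(var_rescue)
    wt_shift = {"drug_A": 0.05, "drug_B": 0.02, "drug_D": 0.4}  # counter-screen results
    print("preliminary hits:", prelim)
    print("ranked hits:", rank_hits(var_rescue, wt_shift, prelim))
```

Treating the wt-Avatar shift as a hard filter first and a ranking penalty second keeps compounds with broad, non-specific effects out of the final list even when their var-Avatar rescue is strong.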

RWE – Real World Evidence Applications

A real-world example occurs in the propionate metabolic pathway. Defects in the ECHS1 gene alter propionate metabolism by shunting propionate away from Acetyl-CoA production, so that only Succinyl-CoA production remains (Figure 6).

Figure 6. Defects in ECHS1 block propionate metabolism via the Acetyl-CoA pathway. Only the Succinyl-CoA pathway remains.

Succinyl-CoA production can itself be hampered because the last enzyme in the Succinyl-CoA production pathway depends on vitamin B12. When B12 is limiting and Acetyl-CoA production is also blocked by an ECHS1 defect, hypersensitivity to propionate occurs. Since the propionate pathway is highly conserved between the nematode and humans, creating a deficiency in the nematode equivalent of ECHS1 (the ech-6 gene) can create propionate hypersensitivity (Figure 7).

Figure 7. Differential effects of propionate exposure in C. elegans nematode models. In the wild-type (N2) control, propionate exposure gives an LD50 near 90 mM. When RNAi-mediated knockdown of the ech-6 gene is performed, propionate hypersensitivity ensues and the LD50 drops to near 10 mM propionate. The result is a nearly 10x difference in propionate sensitivity when the ech-6 locus is blocked from expressing the nematode’s version of ECHS1. Addition of vitamin B12 partially rescues the propionate sensitivity. (Data adapted from Watson et al. Elife. 2016 Jul 6;5:e17670)

Loss of function of ech-6 creates propionate hypersensitivity

RNAi was used to knock down expression from the ech-6 gene (Watson et al. Elife. 2016 Jul 6;5:e17670). The result was hypersensitivity to propionate: a roughly 9-fold decrease in LD50 (from near 90 mM to near 10 mM). When B12 was added to the system, partial rescue of hypersensitivity occurred. In these data, at a screening concentration of 30 mM propionate, survival was 4-fold higher with B12 exposure. A range of 40-50 mM propionate appears to be ideal for detecting chemicals capable of rescuing propionate hypersensitivity in ech-6 null animals (KO). Inserting human ECHS1 as a gene replacement of ech-6 and observing restoration of normal propionate sensitivity will establish that the animal model (wt-Avatar) can be used as a background system for modeling human disease. Next, installing variants in the humanized strain will create a model system for specifically exploring a patient’s genetic condition (var-Avatar) (Figure 8). When good separation between var-Avatar and wt-Avatar is observed, a screen for rescue of propionate hypersensitivity can be performed, with B12 used as a positive control.
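As a quick arithmetic check on these numbers, the sketch below computes the fold shift in LD50 and confirms that the proposed 40-50 mM screening window sits between the two LD50 values. The LD50 inputs are the approximate values quoted for Figure 7 (near 90 mM and near 10 mM), and the helper names are ours.

```python
def fold_shift(ld50_control_mm, ld50_test_mm):
    """Fold change in sensitivity: control LD50 divided by test LD50."""
    return ld50_control_mm / ld50_test_mm


def discriminates(concentration_mm, ld50_test_mm, ld50_control_mm):
    """True when a screening dose lies above the hypersensitive line's LD50
    but below the control's LD50."""
    return ld50_test_mm < concentration_mm < ld50_control_mm


# Approximate values from the Figure 7 description.
LD50_N2_MM = 90.0
LD50_ECH6_RNAI_MM = 10.0

if __name__ == "__main__":
    print("fold shift:", fold_shift(LD50_N2_MM, LD50_ECH6_RNAI_MM))
    for dose in (30, 40, 50):
        print(dose, "mM discriminates:",
              discriminates(dose, LD50_ECH6_RNAI_MM, LD50_N2_MM))
```

Any dose between the two LD50 values separates the lines in principle. The 40-50 mM preference reflects where knockdown animals are reliably dead while most wild-type animals still survive.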

Figure 8. Propionate sensitivity for four types of animal models. The KO null does not express the ech-6 gene and is expected to be highly sensitive to propionate. The var-Avatar is a whole-gene humanized wt-Avatar containing a patient coding-sequence variant. The wt-Avatar is a humanized animal created as a whole-gene replacement of the ech-6 coding sequence. The wild-type N2 is the unmodified animal commonly used as a control for comparison to gene-modified animals.

B3GAT3 Drug Target for Modulating Propionate Hypersensitivity

Reassuringly, other genes involved in propionate metabolism are known to cause hypersensitivity when they are made defective. For instance, the PCCA gene product, in combination with PCCB, forms a heteromeric enzyme responsible for converting propionyl-CoA to D-methylmalonyl-CoA, an intermediate step in the metabolism of propionate to succinyl-CoA. When loss of function is created in the nematode ortholog pcca-1, propionate hypersensitivity occurs and these animals do not survive above 50 mM propionate (Watson et al. Elife. 2016 Jul 6;5:e17670). This same group used the propionate sensitivity assay to determine that loss-of-function mutations in glct-3 (an ortholog of B3GAT3) create a propionate-resistant animal (Na et al., PLoS Genet. 2020 Aug 28;16(8):e1008984). As a result, it appears that inhibition of B3GAT3 may be useful in offsetting the hypersensitivity caused by defects within the propionate pathway. In silico approaches can be used to dock molecules to the B3GAT3 structure (1FGG, 1KWS, 3CU0) (Figure 9). Hits can then be validated by testing in a humanized animal model.

Figure 9. Molecular dynamics screening of large libraries finds top hits for validation in an animal model.

Zebrafish Model for Study of RPE65 Defects

Switching to zebrafish models, vitamin A metabolite toxicity can be used to probe defects in the RPE65 gene. First, morpholinos or crispants can be used to dramatically reduce expression of this gene in the fish and produce loss-of-function phenotypes. Next, the addition of an mRNA to the injection mix can be used to rescue the loss of function and restore normal metabolic activity. With regard to the loss of function, morpholinos in zebrafish work by shutting down translation of a gene transcript. In the alternative approach, crispants work by disrupting the gene’s coding sequence in a large proportion of cells (>90%). Often the effect with either approach is a 10x or greater decrease in gene expression, which effectively models a loss-of-function variant. Loss of function of the zebrafish isoforms of RPE65 is expected to result in metabolite hypersensitivity.

In humans, RPE65 is the retinoid isomerohydrolase, an enzyme of the visual cycle involved in catalytic recycling of retinyl palmitate to 11-cis-retinol. Loss of this enzyme is expected to render RPE65-deficient animals hypersensitive to retinyl palmitate and its precursors. All-trans-retinal (atRAL) is a precursor with established toxicity in the visual system (Chen et al. J Biol Chem. 2012 Feb 10; 287(7): 5059–5069; Gao et al., J Biol Chem. 2018 Sep 14; 293(37): 14507–14519). The RPE65 enzyme is critical in the last step of the visual cycle, creating the 11-cis-retinol needed by opsins to convert light into a biochemical signal. Deficiencies in this enzyme are likely to render the animals prone to light-induced blindness. In zebrafish, there are three genes (rpe65a, rpe65b, and rpe65c) that together cover the role RPE65 plays in humans (Ward et al., Front Cell Dev Biol. 2018; 6: 37). A crispant CRISPR-based somatic knockout strategy can be employed in which three sgRNAs, one targeting each of the three isoforms, are used to remove retinoid isomerohydrolase from nearly all tissues (~95%) in a zebrafish embryo. Exposing the resulting larvae to bright light and atRAL is likely to create a hypersensitivity that manifests as a high rate of blindness.

To rescue the blindness, an mRNA encoding human retinoid isomerohydrolase (hRPE65) is added to the crispant mix. This allows the visual cycle to remain active and toxic levels of atRAL are avoided. To determine if a variant in hRPE65 is pathogenic, any of the 177 missense variants identified in the clinical population database of ClinVar can be made as an mRNA and then be examined for their pathogenic potential (lack of rescue). The mRNA effectively becomes a var-Avatar model for testing a variant’s hypersensitivity to atRAL.  For the variants that exhibit accelerated blindness, small molecules can be explored for their ability to restore normal sensitivity to atRAL exposure.

An iPSC Model for Study of NOTCH3 Deficiencies

As an evolving model system, induced pluripotent stem cells (iPSCs) can be differentiated into tissues that are populated into microphysiological systems (MPS). Some MPS use iPSC-derived tissue that models the vascular system. Cerebrovascular disorders are an important class of disease to model in vascular formats. CADASIL is one of the vascular diseases with an established genetic cause: variations in the extracellular domain of the NOTCH3 gene result in small blood vessel defects in the brain. Although NOTCH3 variations causative for CADASIL are rare, the disease is considered an ideal model for cerebrovascular disease, which affects 25% of all stroke patients and 45% of all dementias. Further, we know there is a therapeutic antibody binding site on the extracellular side of the NOTCH3 protein that modulates the disease by promoting extracellular cleavage and turnover of the NOTCH3 protein. Loss or gain of cysteines are the most common pathogenic variants in NOTCH3. A reference iPSC line can be modified by CRISPR to contain the patient variant (var-Avatar) and be compared to the unmodified cells (wt-Avatar). In CADASIL patients, oxidative stress occurs (Neves et al., JCI Insight. 2019 Dec 5; 4(23): e131344). Specifically, CADASIL patients exhibit Nox5-induced oxidative stress as measured by a lucigenin-enhanced chemiluminescence assay of NADPH-dependent ROS production. So, although NOTCH3 is not considered a metabolic gene, disruptions in its function behave like many metabolic deficiencies and lead to ROS imbalance. For a hypersensitivity screen, it is likely that iPSC-derived MPS models of CADASIL patients will be hypersensitive to oxidative stressors such as paraquat.


Stressor hypersensitivity can be used to detect favorable drug effects. Applied to Inborn Errors of Metabolism, an upstream metabolite can serve as the stressor to which the model is hypersensitive. Patients with defects in an enzyme of a metabolic pathway experience unhealthy buildup of metabolite intermediates. We can take advantage of the ancient biology shared between humans and other organisms for the genes involved in metabolism by creating the patient’s genetic defect in the model system, either nematode or zebrafish (or iPSC). This humanized animal (or modified iPSC) then becomes the patient avatar for use in drug screening studies. For metabolic deficiencies, the animal can be used for phenotypic screens to find molecules that alleviate the metabolic stress arising from the deficiency. Ultimately, the approach is likely to be generalizable to many genetic disorders.

The Path to Affordable Therapeutics in Rare Disease – Tackling Congenital Disorder of Glycosylation in the PMM2 gene.

Authors: Hannah Huston, Alexandra Narin, and Chris Hopkins

By definition, there are not a lot of patients for any given Rare Disease. In the United States, a disease is only considered rare (or “orphan”) if it affects fewer than 200,000 people (NIH). Often, a rare genetic condition caused by a malfunction of a gene is even more infrequent. Maybe only a small handful of people on the planet will have genetic lesions in a given gene. Despite this, rare disease as a group is actually quite common. There are over 7,000 genes in everyone’s genome that can harbor a gene variation which leads to a disease. One way to conceptualize the variants in a rare disease is to imagine an inverted image of a galaxy, where black dot clusters of rare variants have different levels of phenotypic severity (Figure 1). The variant clusters in the outer fringes are near normal activity, but in the center is the black null of lethality.

CREDIT: Adapted from the European Space Agency

Although a pair of individuals with similar variations in a given gene is not often found, the likelihood that any two individuals will have a defective variation in any one of the 7,000 genes is much higher. The result is that about 1 in 15 persons are afflicted with a rare disease condition. So, in aggregate, it turns out that rare diseases are a highly common health care issue that suppresses quality of life for a large proportion of the planet’s population.

The challenge of today’s personalized medicine is to develop therapeutic approaches that can treat these rare disease populations in a cost-effective way. In the traditional therapeutic development path, bringing a chemical entity through the challenges of toxicity and efficacy and onto the market has cost billions of dollars. This sizable sum of money can be challenging to justify when only a dozen or so individuals stand to benefit. As a society, we must get inventive and find more affordable approaches to bring rare disease therapeutics to market. Answering this call to action are small biotech companies, such as Perlara.

Perlara, located in the San Francisco Bay Area, was founded in February 2014 by Ethan Perlstein. The company is “on a mission to accelerate the discovery of cures for rare genetic diseases and uncover underlying mechanisms that enable the development of treatments that work across a range of diseases and individuals” (Perlara Website). Through their PerlQuests™, Perlara partners with families, researchers, and patients to find treatments for these otherwise forgotten, yet very real, rare diseases. One such PerlQuest, focusing on PMM2 (Phosphomannomutase 2 deficiency), holds a special place in the heart of Dr. Sangeetha Iyer, the Director of Preclinical Development at Perlara.

Dr. Iyer (currently a senior PM at Pfizer) has a background in neurodegenerative disorders and rare genetic diseases. She received her Ph.D. in molecular pharmacology from the University of Pittsburgh and then went on to do her postdoc research at the University of Texas, Austin. Sangeetha has worked with a plethora of models, starting her career with mouse models and then working with Xenopus oocytes, Drosophila, and C. elegans. Not only does she have a wide understanding of model organisms, but Dr. Iyer also has over 10 years of experience in model and assay development and drug screening for human genetic disorders.

Dr. Iyer developed nematode models of rare diseases and conducted several successful screening campaigns for rare diseases, one of which was for the aforementioned PMM2, the Phosphomannomutase 2 deficiency. These efforts led to a clinical trial with one patient, Maggie (n=1). The trial outcome was a success! Maggie is doing quite well and the trial is currently being expanded to include other patients. Recently, these findings were published in a paper titled: Repurposing the aldose reductase inhibitor and diabetic neuropathy drug epalrestat for the congenital disorder of glycosylation PMM2-CDG.

What is PMM2?

PMM2-CDG, formerly known as congenital disorder of glycosylation type 1a, is a rare multisystem disorder that involves a normal, but complex, chemical process known as glycosylation (PMM2-CDG 2015).

PMM2-CDG is caused by mutations of the phosphomannomutase-2 (PMM2) gene and is inherited as an autosomal recessive condition. The variation reduces the function of the PMM2 enzyme and leads to improper levels of glycosylation. The disease can affect any part of the body, though most cases have an important neurological component. PMM2-CDG is associated with a broad and highly variable range of symptoms and can vary in severity from mild cases to severe, disabling or life-threatening cases. Most cases appear in infancy or early childhood, as in Maggie’s case, so this patient population became the focus of Perlara’s PerlQuest.

Maggie’s Quest

The PMM2 PerlQuest came about because, prior to meeting Maggie, Dr. Iyer had started working on lysosomal storage disorders. The research team had just completed some work on a glycosylation disorder (NGLY-1), whose loss of function leads to developmental delay and seizures. In presenting this work, they were introduced to Maggie’s parents, as PMM2 deficiency is one of the most common causes of glycosylation disorders. Sangeetha and the team at Perlara felt this was a good candidate for using humanizing mutations to create an animal model of the disease in a simple model organism.

“Some model systems work, but not a whole lot and we felt that it fit very well with our platform of model organisms. Perlara was working with yeast model systems, drosophila as well as C. elegans along with the basic model organism pipeline and on the other hand, we had patient fibroblast, which were also available for PMM2. So PMM2 basically came onto our radar because of Maggie, the girl who has PMM2 disorder. And after meeting with her parents and having some conversations about the utility of our platform, we decided to go ahead and model some of our mutations and see if we could conduct a drug screening campaign. That’s how that program came to be.” (Iyer, Podcast: 17 minutes of Science, 2020)

To pursue PMM2 treatment options, Sangeetha and her team at Perlara decided to go down the path of drug repurposing. This involves testing already known drugs and compounds in alternative ways to see if they are viable treatment options for other diseases outside of their initial purpose. One of the main benefits of drug repurposing is that it speeds up the traditional drug discovery timeline as the compounds are already known, and often have vast amounts of information already available for researchers to use. As a result, drug development can be achieved while cutting costs immensely: instead of the typical hundreds of millions it takes to reach clinical trials, drug repurposing trials can be conducted with millions or even sub-millions.

The process:

To begin their repurposing campaign, researchers at Perlara initially started testing with yeast models of PMM2. They then wanted to move into C. elegans, but hit a roadblock. C. elegans already had one PMM2 model, but it was lethal, which meant it would not be a viable tool for their campaign. At this point, Perlara approached InVivo Biosystems for help building a new C. elegans model to use. “With InVivo Biosystems’ help, we were able to model another, a different patient mutation, one that does not have this severe lack of enzyme activity” (Iyer, Podcast: 17 minutes of Science, 2020). The new worm model that InVivo Biosystems built for Perlara uses the worm’s ortholog of PMM2, whose protein is 54% identical to the human protein. This may not sound like a lot, but it is very significant. Additionally, and possibly most importantly, the mutation sites were conserved between humans and C. elegans. Because of this conservation, Perlara was able to use C. elegans models engineered with a specific point mutation that modeled the exact mutation seen in the patient.

“One of the reasons we believed in the power of model organisms specifically for rare monogenic diseases was because when you have a single gene ortholog and one that has high similarity to what one might encounter in humans, you can model the same mutation as you see in humans, in those model organisms.” (Iyer, Podcast: 17 minutes of Science, 2020). 

Dr. Iyer performed a drug screen using the 2560-compound Microsource Spectrum library, which consists of FDA-approved drugs, bioactive tool compounds and natural products. The top 20 hits were either antidiabetic or antioxidant molecules. Remarkably, most of the hits were antioxidant flavonoids with known utility as aldose reductase inhibitors (ARIs). Next, the team checked the activity of the leads in yeast and patient-derived fibroblasts to see whether cross-species conservation of activity could be observed. The result was the identification of α-cyano-4-hydroxycinnamic acid (CHCA) as the most cross-species bioactive compound for ARI activity. The structure-activity relationship of the CHCA molecule was used to develop a pharmacophore profile, which allowed identification of a set of commercially available ARIs (tolrestat, ranirestat, imirestat, zopolrestat, sorbinil, ponalrestat, alrestatin, fidarestat and epalrestat). Testing these new molecules in worms and patient fibroblasts revealed epalrestat as the most active lead.

Aldose reductase is an enzyme that shunts glucose down the polyol pathway by converting glucose into sorbitol. Inhibiting this enzyme has two favorable effects. It leads to an increase in glucose-1,6-bisphosphate production, which activates PMM2. And it prevents activation of the polyol pathway, which generates Reactive Oxygen Species (ROS) and leads to Advanced Glycation End-products (“AGE”). The AGE are especially nasty because they create abnormal protein glycosylation and cause a normal protein to be recognized by the immune system as a “foreign” protein. A cascade of auto-inflammatory responses is then initiated.

It is likely that PMM2 deficiency sets in motion quite a few cellular stress responses. Not only does it reduce normal levels of N-linked glycosylation, it also increases the shunting of glucose through the polyol pathway. This leads to high levels of sorbitol, which readily alkylates the amines in the body’s proteins, rendering the proteins “foreign” in appearance to the immune system, and inflammation results. This ripple effect of cellular stress leads to the clinical presentations of the disease.

“Until the time that we did this work, nobody had discovered that you could boost PMM2 enzyme activity through some other artificial shunt pathway, which is essentially what our model organism screens were telling us, that there was another way to increase PMM2 activity.” (Iyer, Podcast: 17 minutes of Science, 2020).

While they had discovered another way to increase PMM2 activity in both their yeast and worm models, they were not certain how this would translate to the human enzyme. At this point, Dr. Iyer and her team at Perlara brought in human fibroblasts that were known to have the defective enzyme activity. The team used generic fibroblasts from Coriell, a non-profit repository for patient fibroblasts, in addition to fibroblasts from Maggie. In both cases, the fibroblasts showed that PMM2 enzyme activity was in fact increased when exposed to epalrestat. According to Dr. Iyer, “that’s where the final piece of the puzzle came together.”

Maggie’s Cure

One of the benefits of drug repurposing over developing new compounds is that these drugs are already on the market and have had a wealth of safety data collected. While not always the case, this typically makes it easier to get approval to use the drugs. Although epalrestat had never been approved for use in the US, it had been on the market for over 20 years in other countries and was no longer under patent protection.

At this point, Perlara started talking with Maggie’s family and Dr. Eva Morava from the Mayo Clinic about treatment opportunities for Maggie using epalrestat. Dr. Morava, along with Maggie’s parents and Ethan Perlstein, put together an n=1 IND application, which they submitted to the FDA to gain approval for the use of epalrestat. Thanks to the trove of safety data available for epalrestat and the body of data generated by Perlara substantiating that epalrestat increased PMM2 enzyme activity in a variety of model systems, they were able to gain approval for the trial.

Maggie has now been on the drug for over a year (the interview was conducted when she had been on it for about 10 months). She has gained weight and can have conversations, something she was unable to do pre-treatment. Her motor skills and coordination have skyrocketed; she can even ride a bike now. Maggie is continuing to take epalrestat, and her team is now working to expand the trial to a larger group in the hopes of helping others. Dr. Iyer credits the success of this drug repurposing study to the model organisms, which generated the data needed for FDA approval quickly, efficiently, and affordably.

“To see a drug have an impact, a positive impact, on the child, in a child was just incredibly powerful” (Iyer, Podcast: 17 minutes of Science, 2020)


  1. Sangeetha Interview:
  2. PMM2-CDG. (2015, August 06). Retrieved April 28, 2021, from,chemical%20process%20known%20as%20glycosylation.
  3. Iyer, S., Sam, F. S., DiPrimio, N., Preston, G., Verhejein, J., Murthy, K., . . . Perlstein, E. (2019, November 11). Repurposing the aldose reductase inhibitor and diabetic neuropathy drug epalrestat for the congenital disorder of glycosylation PMM2-CDG. Retrieved from:
  4. Aldose reductase inhibitors for the treatment of Diabetic polyneuropathy. (n.d.). Retrieved April 28, 2021, from,or%20reverse%20progression%20of%20neuropathy

Arg…What up with that?!! Arginine is Enriched in Pathogenic Variants

You know how it goes: a hunch gets reinforced over and over again, and your mind starts treating it as a fact.

!Danger! Will Robinson… it’s time for a serious fact check.

My hunch was that the amino acid arginine (aka “Arg” or “R”) seems to show a frequent association with pathogenicity. It started with the observation that many of the established pathogenic variants in the coding sequence of STXBP1 seem to involve arginine. Extracting from ClinVar the missense variants that are pathogenic or likely pathogenic gives the following table:

Indeed arginine (R) is disproportionately represented. Assuming all amino acids are equal, there should be 4.3 for each amino acid. The disproportionately low ones make sense. Methionine (M) has only one codon (ATG) that instructs insertion of this amino acid in a sequence. Similarly, tryptophan (W) has only one codon (TGG). These two amino acids should be represented below the average. A little oddly, we see similarly low levels for lysine (K), phenylalanine (F) and glutamine (Q), which each have two codons. If codon dosage were key to variant proportioning, then these should have been seen at least 2x more than M and W, so perhaps something more than codon dosage mediates amino acid choice in creating pathogenic variations.

Arginine has 6 codons, which could still drive its outsized proportion in the graph. Yet serine (S) and leucine (L) also have 6 codons, and they are at 7 and 3, respectively, for being involved in pathogenicity. Only mighty arginine accounts for 13 of the 43 pathogenic variants in STXBP1 (30%). Tempering my enthusiasm is the observation that for 3 amino acid positions, R292, R406 and R451, we have multiple changes being called pathogenic. No other amino acid in the STXBP1 pathogenics has this changeling capacity. So why is arginine at such a high proportion in the assigned pathogenics? Perhaps it is just a consequence of biased investigator focus specific to STXBP1, with investigators fixing their gaze on the repeating de novo clinical variants at positions 292, 406 and 451.
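The codon-count argument can be checked directly against the standard genetic code. The sketch below is only an illustration: it builds the standard codon table, counts the codons for each amino acid, and sets arginine's codon share beside the 13-of-43 figure quoted above. The variable and function names are my own, and the "codon share" assumes every sense codon is equally likely, which is a simplification.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons per amino acid, ignoring the three stop codons ("*").
codon_counts = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
sense_codons = sum(codon_counts.values())


def codon_share(aa):
    """Fraction of sense codons that encode the given amino acid."""
    return codon_counts[aa] / sense_codons


# Figure quoted in the text: 13 of 43 pathogenic STXBP1 variants involve R.
observed_arg_share = 13 / 43

for aa in "RSLKFQMW":
    print(aa, codon_counts[aa], f"{codon_share(aa):.1%}")
print(f"Arg observed: {observed_arg_share:.1%} vs codon share: {codon_share('R'):.1%}")
```

Serine and leucine get the same codon share as arginine here, which is the point of the comparison: codon count alone does not separate them.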

Is arginine involved in fragility elsewhere in the genome?

To normalize for possible investigator bias and find a method that can be applied to other portions of the genome, I took advantage of the Ensembl database to list and rank a gene’s codon sequence variants by bioinformatics analysis. Ranking on CADD was used to list protein coding sequence variations by their severity.

Ensembl allows us to identify which variations are theoretically likely to be disruptive of protein function. The choice to rank by CADD (which stands for Combined Annotation-Dependent Depletion) allows us to use a sophisticated algorithm that avoids investigator bias because it intentionally avoids using “known” pathogenicity databases when it creates its ranking. A key test is to see if CADD can independently observe the pathogenicity known to exist in STXBP1. To construct the test, we compare the top scoring CADD variants with the lowest scoring CADD variants.

With CADD, we get an independent call for possible pathogenicity that still picks up what you might expect. Nearly half the calls in the Top-30 CADD pull up known pathogenicity and no benign calls are found. In the Bottom-30 CADD we get one known benign call and no pathogenics.

Healthy population data is also consistent. STXBP1 is autosomal dominant. That means only one of your two chromosomal copies needs to be defective for disease to occur. Selection pressure has been very tight on autosomal dominant genes. Variants in healthy populations cannot occur at higher than the known frequency of the disease in the population. The published frequency of STXBP1-caused early-infantile epileptic encephalopathy is 1/90,000. The largest healthy population database is gnomAD. With 141,456 individuals, and given that STXBP1 disease needs to distribute across at least 43 pathogenic alleles, the likelihood of even one pathogenic variant being in healthy populations is pretty close to zero. Some of our Top-30 CADD variants are seen 1x or more in healthy populations. Most of these are unassigned. For these unassigned variants that are seen 1x or more, the disease frequency argument strongly implies that they are benign variants.
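The disease frequency argument is simple arithmetic, and a few lines make the "pretty close to zero" claim concrete. This is a rough sketch using only the figures quoted above. It assumes, purely for illustration, that the cohort behaves like a random population sample and that disease burden is split evenly across the 43 known pathogenic alleles; neither assumption is exactly true.

```python
# Figures quoted in the text.
disease_frequency = 1 / 90_000   # STXBP1 early-infantile epileptic encephalopathy
cohort_size = 141_456            # individuals in gnomAD
n_pathogenic_alleles = 43        # known pathogenic STXBP1 variants

# Assumption: the cohort is a random sample of the population, so this is an
# upper bound for a cohort that was screened to leave out severe disease.
expected_affected = cohort_size * disease_frequency

# Assumption: disease burden is spread evenly across the pathogenic alleles.
expected_per_allele = expected_affected / n_pathogenic_alleles

print(f"Expected affected individuals in cohort: {expected_affected:.2f}")
print(f"Expected carriers of any one pathogenic allele: {expected_per_allele:.3f}")
```

Under these assumptions, any single pathogenic allele is expected far less than once in the cohort, so an unassigned variant that does show up there is hard to square with dominant disease.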

So CADD is not perfect: the top scoring hits are a mix of known pathogenic and probably benign. But the bottom scoring CADD seems to be more efficient at pulling out benign variants. In the Bottom-30 CADD, only one variant, I271V, is labeled Likely Benign by ClinVar, yet nearly every one of these alleles (27 of 30) is seen in healthy populations, so they too are probably benign.

At this point in the analysis, we can pinpoint an anomaly. Y264C is labeled in ClinVar as Likely Pathogenic. But from the population frequency argument, this assignment is highly unlikely. Y264C has been observed to occur in healthy populations. So at a bare minimum, it should be downgraded to a VUS, but it should probably be called Likely Benign for causing early-infantile epileptic encephalopathy.

Finding Arginine-associated Fragility Throughout the Genome

This top-30 / bottom-30 approach was applied to a large set of genes. As a form of internal control, I added isoleucine (I) to the screen. With less conviction, I have felt this amino acid was associating with benign variants. If true, it should show an enrichment in the Bottom 30 CADD scores. So in my gene set experiment, I measured 4 bins: 2 bins for how many arginine and isoleucine variants are in the Top 30, and 2 bins for how many are in the Bottom 30.
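The four-bin tally is easy to sketch in code. What follows is a minimal illustration, not the actual analysis pipeline: the function names are my own, the variants are assumed to be written in single-letter form such as R292H, and a variant is counted as "involving" an amino acid if that residue is either lost or introduced. The scores in the example are hypothetical, not real CADD values.

```python
import re

# Single-letter missense notation: reference residue, position, new residue.
VARIANT_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def involves(variant, amino_acid):
    """True if the reference or substituted residue is the given amino acid."""
    match = VARIANT_PATTERN.match(variant)
    if match is None:
        raise ValueError(f"unrecognized variant: {variant}")
    ref, _position, alt = match.groups()
    return amino_acid in (ref, alt)


def top_bottom_bins(scored_variants, n=30, amino_acids=("R", "I")):
    """Count variants involving each amino acid in the top-n and bottom-n by score.

    If fewer than 2*n variants are supplied, the two groups overlap.
    """
    ranked = sorted(scored_variants, key=lambda item: item[1], reverse=True)
    top, bottom = ranked[:n], ranked[-n:]
    bins = {}
    for aa in amino_acids:
        bins[(aa, "top")] = sum(involves(v, aa) for v, _ in top)
        bins[(aa, "bottom")] = sum(involves(v, aa) for v, _ in bottom)
    return bins


# Hypothetical variants and scores, for illustration only.
example = [("R292H", 32.0), ("R406C", 31.0), ("Y264C", 29.5),
           ("Y75C", 27.0), ("I271V", 8.1), ("V84I", 6.4)]
bins = top_bottom_bins(example, n=2)
print(bins)
```

With real data, `n` stays at 30 and the four counts are divided by 30 to give the fractions discussed in this post.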

30% of top 30 CADD scoring variants contain arginine???!!!

An assumption of even distribution of amino acids, combined with an even more absurd assumption of an average 3.05 codons per amino acid, gives us 4.3% as the average amino acid fraction per 30 (dashed line). Arginine is 7.2x more than this average number. Yet we need to account for the fact that arginine uses about 2x more codons than average. As a result, the arginine bias in the Top 30 is about 3.5x more than expected. For isoleucine, the enrichment in the Bottom 30 appears to be about 2x more than expected.

Test dataset – 30% arginine in Top-30 CADD prevails

The noisiest data in the Top-30 CADD appears to be the arginine data. A cumulative trending plot was used to see how many genes were needed before the trend to 30% becomes apparent. After assessing 7 genes the trend starts to stabilize. A new set of 7 genes was then chosen, this time from the Undiagnosed Disease Network (UDN). The UDN recently listed 54 genes as in desperate need of animal modeling to provide gene function studies. A sub-selection of these were identified as having good sequence similarity to genes in the animal models which we hold dear to our heart and expertise (zebrafish and C. elegans). The Top-30 / Bottom-30 CADD selection was applied to these genes and plotted for Arg and Ile enrichment. 30% prevails for arginine: it occurs at least 3.5x more than expected among the top CADD variants, marking it as hypersensitive to substitution.

This all assumes that the representation of amino acids is uniform across all proteins. But that is a reach. Louis Gross at the University of Tennessee, Knoxville, has observed that the amino acid distribution in vertebrates has some anomalies.

The most notable anomaly is arginine. Arginine uses 6 codons, but its observed frequency is low at 4.2%. To illustrate how low, the authors calculated the expected frequency for each amino acid, biasing only for the GC richness of vertebrate genomes.

The expected frequency for arginine is quite high at about 10.5% due to its GC richness in its codons. Yet the actual observed frequency is quite low at about 4%. Based on this observed frequency, we bounce back – we now assess that we are observing arginine in the top 30 at 8x more than expected. No explanation for the anomaly and it just became more pronounced!
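How big the arginine enrichment looks depends entirely on which baseline is used for "expected". The sketch below is a back-of-envelope comparison, not a statistical test. It divides the 30% observed share by three candidate baselines taken from this post: arginine's share of sense codons, the GC-adjusted expectation, and the observed vertebrate frequency. The results land in the same range as the rounded figures quoted here rather than reproducing them exactly.

```python
# Arginine share among Top-30 CADD variants, per the text.
observed_share = 0.30

# Candidate baselines for how often arginine "should" appear.
baselines = {
    "codon share (6 of 61 sense codons)": 6 / 61,
    "GC-adjusted expectation (~10.5%)": 0.105,
    "observed vertebrate frequency (~4.2%)": 0.042,
}

# Fold enrichment = observed share divided by the baseline share.
fold_enrichment = {
    name: observed_share / base for name, base in baselines.items()
}

for name, fold in fold_enrichment.items():
    print(f"{name}: {fold:.1f}x")
```

The lower the baseline, the larger the apparent enrichment, which is why switching to the observed frequency makes the anomaly more pronounced.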

Taking a different approach, we can ask what percentage of ALL known pathogenic and likely pathogenic variants in a gene involve arginine substitution. 7 genes analyzed and we get the same 30% for arginine. Yet the calculations are that it should be below 4%. 8x more than expected prevails.

Are your arginines special too?

This analysis has uncovered a unique phenomenon. It appears everyone’s arginines are special. Exactly why arginine has this special status is not entirely clear. It is highly likely that random incorporation of arginine has been strongly selected against during evolution. As a result of this strong negative selection (much more than what is happening for all other amino acids), arginine’s frequency in all proteins is much lower than predicted. The observed pathogenic sensitivity may be a readout of this hyperselectivity of evolution. Basically, arginine’s use in any given protein is very particular. A possible driver for this is arginine’s amazing capacity to bring high order to neighboring side chains in most protein structures. When it is gone, chaos reigns. When it is introduced where it should not be, chaos still reigns.

Arginine is special. I suggest we need to ditch Douglas Adams’s “42”.

Instead, we make like a pirate and just say:


VUS at 44% in ClinVar Assessments and Growing

How prevalent are Variants of Uncertain Significance?

ClinVar database for variant interpretation was analyzed for its levels of ACMG-AMP assessments. With help from the data dumps from ClinVar Miner, the yearly distribution of assessments was plotted. Since 2016 and shortly after the ACMG-AMP guidelines came out in 2015, the number of assessments assigned to the VUS category has grown rapidly. These are the variants that clinical genetics researchers have examined, but cannot decide if they are pathogenic or not.

How big will the VUS problem get?

To estimate how large the VUS problem will become, we must first understand how big the human genome is. Controversy abounds, but current estimates are that there are 21,306 protein coding genes and 21,856 non-coding genes. To be conservative, and for simplicity’s sake, let us use 20,000 genes as the number. The next question is how many of these are disease associated. When we look to ClinVar for the number of “genes with variants specific to one protein-coding gene” we get 7221 genes. More conservatively, we can look to ClinVar’s “gene_condition_source_id”, which lists 4242 genes as being associated with a diagnostic condition. This lower number is reinforced by OMIM, in which the “Total number of genes with phenotype-causing mutation” is 4162 genes. These lists have been growing rather steadily at 5% per year, so in a few years the number of gene-disease associations will probably approach 5000 genes, or roughly 1/4 of the human genome.

VUS problem may eventually approach 7 Million variants

A recent attempt to preload the human genome with pathogenicity assessment potential has been made. InterVar database applied ACMG-AMP guidelines to ~80,000,000 amino acid positions in the genome to provide a database for easier variant interpretation. Since at least 20% of these positions are likely to be in genes with known disease association, there are roughly 16,000,000 variants that will eventually occur in patient-derived genome sequencing. If the current trend of 44% VUS translates across that number, then there will be close to 7,000,000 variants in need of functional studies to resolve their pathogenicity.
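The projection above is a chain of two multiplications, spelled out here so the inputs are easy to swap. This is a rough sketch using only the figures quoted in this post; the variable names are my own, and the result moves in direct proportion to either percentage.

```python
# Figures quoted in the text.
amino_acid_positions = 80_000_000  # positions covered by InterVar
disease_gene_fraction = 0.20       # "at least 20%" fall in disease-associated genes
vus_fraction = 0.44                # current VUS share of ClinVar assessments

# Variants expected to turn up in genes with known disease association.
variants_in_disease_genes = amino_acid_positions * disease_gene_fraction

# Of those, the portion expected to be classified as VUS if the trend holds.
projected_vus = variants_in_disease_genes * vus_fraction

print(f"Variants in disease genes: {variants_in_disease_genes:,.0f}")
print(f"Projected VUS: {projected_vus:,.0f}")
```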

A novel animal model system for rapid variant interpretation

The team at Nemametrix just produced a wonderful set of preliminary data that we showed at the recent American Society of Human Genetics meeting. It shows that a training set of known benign and pathogenic alleles in a gene can be used to “teach” a machine learning (ML) algorithm to determine whether pathogenicity is present in a VUS. When applied to the STXBP1 gene, a set of 5 benign and 5 pathogenic variants was sufficient to train for segregation in a linear discriminant analysis (LDA) plot, and Y75C was assessed as pathogenic.

Once this type of system is trained with a set of known pathogenic and benign variants, the assessment of pathogenicity can be achieved in as little as 10 days from the start of a VUS transgenesis project.

Total Domination – Uncovering the Phenomenon of 1:2 Dominant vs Recessive ratio for Variation in the Genome

In a prior blog post, the presence of dominant alleles in my genome gave me pause when trying to interpret the data from sequencing my DNA. Dominant alleles can cause disease when a pathogenic variation occurs in only one gene copy of the chromosome pair. Contrast this with a recessive allele, where you must get a defect in both chromosome copies of the gene to cause disease. In the recessive condition, if you only have one defective copy, you can expect to remain healthy, but you are a carrier of a disease allele. With the lack of immediate consequence to carrier status, many more individuals should be walking around with variations that are recessive towards disease. In fact, the CFTR gene variation (p.Arg117His) for Cystic Fibrosis that was highlighted for me in my Veritas Genomic sequencing report is quite common. It occurs globally at 1 per 2,500 persons, and that increases to close to 1 per 1,000 for northern Europeans, who make up a dominant portion of my ancestral genomic composition. In contrast, the CACNA1S variant (p.Arg419His) that most concerns me in my genome has a prevalence of 1 in 25,000. That is low enough to count as a Rare Disease in Europe, but still probably way too high for disease manifestation rates.

Rare domination in CACNA1S needs to be rare enough to cause Hypokalemic Periodic Paralysis.

Dominant disease causality with the Arg419His variation in CACNA1S is unlikely because it is too frequent for the 1 per 100,000 population frequency of Hypokalemic Periodic Paralysis. Yet there are two variations known to be causative in CACNA1S, Arg528His and Arg1239His. Arg528His occurs at close to 1/100,000, while Arg1239His has yet to be detected in healthy populations. Clearly Arg1239His has a low enough population frequency to be causative for Hypokalemic Periodic Paralysis. Yet for my Arg419His, the frequency is too high for it to be causative. An Autosomal Dominant (AD) variant effect is extremely unlikely for my lone Arg419His allele.
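The reasoning for the three CACNA1S variants boils down to one comparison: a dominant variant cannot be more common than the disease it is supposed to cause. Here is a minimal sketch of that check using the frequencies quoted above. The function name is my own, and the rule assumes full penetrance, which is a simplification.

```python
def too_common_for_dominant(variant_frequency, disease_prevalence):
    """Flag a variant seen more often than the disease itself.

    Assumes a fully penetrant dominant allele, so every carrier would be affected.
    """
    return variant_frequency > disease_prevalence


# Hypokalemic Periodic Paralysis frequency, per the text.
disease_prevalence = 1 / 100_000

# Variant frequencies as quoted in the text.
variants = {
    "Arg419His": 1 / 25_000,
    "Arg528His": 1 / 100_000,   # "close to 1/100,000"
    "Arg1239His": 0.0,          # not yet detected in healthy populations
}

verdicts = {
    name: too_common_for_dominant(freq, disease_prevalence)
    for name, freq in variants.items()
}
for name, flagged in verdicts.items():
    print(name, "too common" if flagged else "plausible")
```

Only Arg419His is flagged, at four times the disease frequency, which matches the conclusion drawn above.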

If dominant alleles need to be rare in the population, how frequent is dominant status for variants of a disease?

The frequency of Autosomal Dominance (AD) for any given disease gene appears to be quite high. It is estimated that there are about 7000 Rare Diseases. If we assume the Online Mendelian Inheritance in Man (OMIM) database already represents most of these genes, then rare disease variants will map to the 4346 gene entries in OMIM with published allelic variations. Next, I listed these variations in blocks of 100 to reveal the number of genes for which they are known to be exclusively Autosomal Dominant (AD), exclusively Autosomal Recessive (AR), or some kind of hybrid.

When one runs down the inheritance pattern and tabulates them per gene, the first 100 variants have about twice as many genes in the AR category when compared to the AD category.

Running through another 400 variants in 100-variant blocks shows the trend continues: dominance of a genetic condition occurs for about 1/3 of the disease genome.

Axiom for the individual : “I am not very dominating, but there are lots out there who are.”

So on an individual basis, it appears the AD status of pathogenic or likely pathogenic variants in your genome is very rare. Yet, at a population level, a large proportion of Rare Disease is caused by Autosomal Dominant variation. Rare disease is calculated to occur in about 1 per 15 persons. So, for about 1 in 50 people (150 million persons), their disease-causing variation is likely to be Autosomal Dominant.
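The 1-in-50 figure comes from multiplying the rare disease rate by the dominant share found above. The sketch below just spells that out. The world population of 7.5 billion is my assumption, chosen because it is the value implied by pairing 1 in 50 with 150 million; the exact product is closer to 1 in 45, which the text rounds to 1 in 50.

```python
# Figures quoted in the text.
rare_disease_rate = 1 / 15     # share of people living with a rare disease
dominant_fraction = 1 / 3      # share of disease genes that are dominant

# Assumption (not stated in the text): world population of 7.5 billion.
world_population = 7_500_000_000

# Share of all people whose rare disease is likely autosomal dominant.
dominant_rate = rare_disease_rate * dominant_fraction
one_in = 1 / dominant_rate

people_exact = world_population * dominant_rate
people_rounded = world_population / 50  # using the rounded 1-in-50 figure

print(f"About 1 in {one_in:.0f} people")
print(f"Exact product: {people_exact / 1e6:.0f} million")
print(f"Rounded to 1 in 50: {people_rounded / 1e6:.0f} million")
```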

What is hot – or not??? – Looking at Hotspots and Coldspots of Pathogenicity in the STXBP1 Gene

In today’s genomic medicine era, it remains challenging to understand the functional consequence of a gene variant’s contribution towards disease. Guilt by association is one of the criteria upon which a new variant is judged. We can look at healthy population data and compare it to established Pathogenic and Likely Pathogenic variants. This helps us understand if a new variant may have a propensity to cause disease. The thought is that if a new variant occurs in a region previously established as causing pathogenicity, then the new variant may be pathogenic too (ACMG guideline: PM1 “moderate” assessment criteria).

Is my variant guilty of pathogenicity because of its proximity to a pathogenicity hotspot?

In the image above, we see that there are hotspots (red) and coldspots (blue) for pathogenicity in STXBP1. The hotspot values were generated from the known Pathogenic and Likely Pathogenic variants listed in ClinVar. The coldspot values (highMAF) come from variants seen in healthy populations. In yellow we have Variants of Uncertain Significance (VUS). Intensity of the peak is a measure of both how many different variations are seen at an amino acid position and whether their nearest neighbors have the same assignment. This plot suggests there are spots in STXBP1 that can tolerate sequence diversity (blue bars) and spots where a hit leads to pathogenic behavior (red bars). Further, the VUS are landing in both red bar and blue bar regions. Perhaps we can consider VUS to be either pathogenic or benign by this association? Yet there is a critical assumption that leads to a question: how legitimate is it that every variant in healthy populations (“highMAF”) is ASSUMED to be benign?
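One way to picture the peak intensity described above is a score that adds a position's own variant count to a down-weighted count from its neighbors in the same class. The sketch below is a hypothetical scoring scheme written for illustration; it is not the formula used to draw the plot. The window size, the neighbor weight, and the example positions are all my own assumptions.

```python
from collections import Counter


def hotspot_scores(positions, window=2, neighbor_weight=0.5):
    """Score each position by its own count plus down-weighted neighbor counts.

    `positions` holds one entry per reported variant of a single class
    (for example, all pathogenic variants), so repeats raise the score.
    """
    counts = Counter(positions)
    scores = {}
    for pos, count in counts.items():
        neighbors = sum(
            counts.get(pos + offset, 0)
            for offset in range(-window, window + 1)
            if offset != 0
        )
        scores[pos] = count + neighbor_weight * neighbors
    return scores


# Hypothetical residue positions of pathogenic variants; duplicates mean that
# several different substitutions were reported at the same residue.
pathogenic_positions = [292, 292, 292, 293, 406, 406, 451, 451, 100]
scores = hotspot_scores(pathogenic_positions)

for pos in sorted(scores, key=scores.get, reverse=True):
    print(pos, scores[pos])
```

Running the same function separately on pathogenic, healthy-population, and VUS positions would give the three colored tracks, and a VUS could then be compared against the red and blue scores at its position.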

2,504 healthy population genomes – Calculating the rare variants in each person

To dig into the validity (or invalidity) of this assumption, we can look to a large population study and ask how many times we see variation and of what types. The 1000 Genomes Project Consortium shows an average person has about 4,500,000 variations. Of these, about 100,000 are somewhat rare because they are seen in fewer than 1/200 persons (<0.005 MAF). The even rarer “singletons” of the study occur at a frequency of 1 per 2504 persons. This restriction gives us about 10,000 of these rarer variations to think about per person. Yet, to get even rarer and be able to ask how many variants per person meet the 1 per 200,000 USA definition for Rare Disease frequency, the study size would need to be 100x bigger. Nevertheless, the 1000 Genomes study reports interesting data on healthy population variants that are also listed as pathogenic in the Human Gene Mutation Database (HGMD) and ClinVar datasets. Expressing the pathogenic variants observed in the healthy population as a frequency per individual, every person can expect to harbor 20-25 variants of established pathogenicity.

A larger study by Karczewski et al. 2019 is approaching the scale needed for assessing Rare Disease. A dataset of 141,456 human genomes (125,748 exomes and 15,708 genomes) was harvested from the wildtype controls used in various disease studies. The exomes observe variation mostly in the coding sequence of a gene, while the genomes record variant information across the gene (coding + upstream/downstream/introns). The result is a deeper measure of the frequency of missense variation that approaches the 1 in 200,000 genomes needed for Rare Disease designation. Currently the National Organization for Rare Disease (NORD) lists 1258 diseases in its database. STXBP1 cross-references to two of these (Dravet and West Syndromes). Each of these syndromes has a support group, and these are two of the 283 family foundation groups in the NORD member list.

Yet the situation for Rare Disease is larger. In the NIH’s Genetics and Rare Disease (GARD) database, there are 6264 unique genetic diseases listed. This suggests there are thousands of genes for which we can expect to have gene variant issues leading to disease. ClinVar currently lists 7046 as the number of “Genes with variants specific to one protein-coding gene.” Basically, it appears that a third of your 20,000 protein coding genes could take a hit that increases your risk or likelihood of coming down with genetic disease symptoms.

The GARD lists an intriguing statistic that 20-25 million Americans are living with Rare Disease. The USA’s current population is 327.2 million, so roughly 1 in 15 individuals worldwide are probably living with rare disease. Assuming monogenic cause, then at least 51 million pathogenic variants might be residing in the human population. Add polygenic burden and the number may be a multiple (100, 150, 200, 250….??) for variants associated with disease being experienced today. Guilt by association to hotspots and coldspots might provide some answers, but functional studies are the more definitive proof, and +50 million is a lot of animal models to build!!

Rare and Not-So-Rare – Finding 100 Impactful Targets for Modeling Disease-Gene Associations in Alternative Animals

What genes are good candidates for alternative animal modeling?

I set out to determine which important disease genes are good candidates for creating animal models in C. elegans. The first step was to turn to a database that has a comprehensive listing of human genes and their disease association. The DisGeNET database has nearly every human gene annotated for its level of disease association (17,549 genes as of June 2019). They provide a curated list that has 8400 genes with a Gene-Disease Association (GDA) score of 0.1 or higher. For the top 1000 genes the GDA scores are 0.69 or higher, which indicates a significant disease association. These top 1000 were selected for examination of their ortholog status in C. elegans using the DIOPT database. 749 orthologies were detected, of which 411 had a clear reciprocal nature (back-blast gives the starting gene as the best hit for the ortholog). The top 100 of these genes for high homology and detectable loss-of-function consequence were selected.
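The selection steps above can be sketched as a small filter: rank genes by GDA score, keep the top slice, then keep only those whose worm ortholog points back to the starting human gene. This is a toy illustration under stated assumptions, not the workflow actually run against DisGeNET and DIOPT. The function names, the dictionary inputs, the placeholder genes (GENE_A, GENE_B, GENE_C, worm-x), and all scores are hypothetical.

```python
def reciprocal_best_hit(human_gene, best_worm_hit, best_human_hit):
    """True when the worm gene's best human hit is the starting human gene."""
    worm_gene = best_worm_hit.get(human_gene)
    return worm_gene is not None and best_human_hit.get(worm_gene) == human_gene


def select_candidates(gda_scores, best_worm_hit, best_human_hit, top_n):
    """Take the top_n genes by GDA score, then keep reciprocal orthologs only."""
    ranked = sorted(gda_scores, key=gda_scores.get, reverse=True)[:top_n]
    return [
        gene for gene in ranked
        if reciprocal_best_hit(gene, best_worm_hit, best_human_hit)
    ]


# Hypothetical toy inputs; real values would come from database exports.
gda_scores = {"MSH2": 0.9, "GENE_A": 0.8, "FH": 0.75, "GENE_B": 0.2}
best_worm_hit = {"MSH2": "msh-2", "GENE_A": "worm-x", "FH": "fum-1"}
best_human_hit = {"msh-2": "MSH2", "worm-x": "GENE_C", "fum-1": "FH"}

candidates = select_candidates(gda_scores, best_worm_hit, best_human_hit, top_n=3)
print(candidates)
```

GENE_A drops out because its worm hit maps back to a different human gene, and GENE_B never makes the score cut. The final ranking by homology and loss-of-function phenotype would be a further sort on top of this list.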

Tabulation of disease-associated genes with properties favorable for C. elegans humanization

The top 100 are tabulated in gene-alphabetical format below. These 100 genes have 8360 variants known to be problematic (Path, Likely Path, or VUS).

Use a search tool to quickly find out if your favorite gene occurs below.

(Note: gene knock out for 58% of these genes results in lethality.)

Human geneDisease associationNematode gene (LOF)Problem variants
ABCB1Colchicine resistance; Inflammatory bowel diseasepgp-9 (development)3
ABCB6Dyschromatosis universalis hereditaria; Microphthalmia; Pseudohyperkalemiahmt-1 (development)11
ABCC1Peripheral Neuropathymrp-1 (development)38
ACTG1Baraitser-Winter syndrome 2; Deafness, autosomal dominant 20/26act-4 (lethal)57
ADAAdenosine deaminase deficiency; Severe combined immunodeficiency C06G3.5 (development)81
ADAM10Reticulate acropigmentation of Kitamura; Alzheimer diseasesup-17 (lethal)5
AGO2Alcoholic Intoxication, Chronicalg-1 (lethal)1
AHRRetinitis pigmentosa; Malignant neoplasmahr-1 (development)2
ALDH2Alcoholic Intoxication, Chronicalh-1 (lethal)3
APEX1Malignant neoplasmexo-3 (development)2
ATG5Spinocerebellar ataxiaatg-5 (morphology)2
BMS1Aplasia cutis congenitaY61A9LA.10 (lethal)1
CALRschizoaffective disordercrt-1 (lethal)1
CATtype 2 diabetes mellitusctl-1 (development)2
CDC42Takenouchi-Kosaki syndromecdc-42 (lethal)10
CHEK1Malignant neoplasmchk-1 (lethal)3
CIB2Deafness, autosomal recessive; Usher syndromecalm-1 (morphology)11
CTSBKeratolytic winter erythemaF57F5.1 (lethal)0
CTSDCeroid lipofuscinosis, neuronal, 10asp-4 (morphology)103
DECR1SchizophreniaF53C11.3 (morphology)1
EIF4EAutismife-3 (lethal)0
ENO1Enolase deficiencyenol-1 (lethal)0
EPHX1Hypercholanemia; Malignant neoplasm; W01A11.1 (development)3
ERCC1Cerebrooculofacioskeletal syndromeercc-1 (lethal)17
ERCC2Cerebrooculofacioskeletal syndrome 2; Trichothiodystrophy 1, photosensitive; Xeroderma pigmentosum, group Dxpd-1 (lethal)72
ERCC3Trichothiodystrophy 2, photosensitive; Xeroderma pigmentosum, group Bxpb-1 (lethal)33
FASNObesity diseasefasn-1 (lethal)83
FHFumarase deficiency; Leiomyomatosis and renal cell cancerfum-1 (lethal)1830
G6PDHemolytic anemia, G6PD deficient (favism); Resistance to malaria due to G6PD deficiencygspd-1 (lethal)91
GAD1Cerebral palsy, spastic quadriplegic, 1 unc-25 (movement)35
GAPDHhepatocellular carcinomagpd-2 (lethal)0
GCH1Dystonia, DOPA-responsive, with or without hyperphenylalaninemia; Hyperphenylalaninemia, BH4-deficient, Bcat-4 (movement)76
GGT1Glutathioninuria; chronic hepatitis BH14N18.4 (development)1
GNA12ulcerative colitisgpa-12 (lethal)2
GPIHemolytic anemia, nonspherocytic, due to glucose phosphate isomerase deficiency gpi-1 (development)11
GPTnon-alcoholic fatty liver diseaseC32F10.8 (lethal)46
GSK3Bbipolar disordergsk-3 (lethal)0
GSRHemolytic anemia due to glutathione reductase deficiency gsr-1 (lethal)0
HCCSMicrophthalmoscchl-1 (lethal)8
HDAC2chronic obstructive Airway Diseasehda-1 (lethal)1
HPRT1 HPRT-related gout; Lesch-Nyhan syndromehprt-1 (morphology)55
HRASBladder cancer, somatic; Congenital myopathy with excess of muscle spindles; Costello syndrome; Nevus sebaceous or woolly hair nevus, somatic; Schimmelpenning-Feuerstein-Mims syndrome, somatic mosaic; Spitz nevus or nevus spilus, somatic; Thyroid carcinoma, follicular, somaticlet-60 (lethal)81
HSP90AA1Breast Cancerhsp-90 (lethal)0
HSPA4bipolar disorderhsp-110 (lethal)0
HSPA5hepatocellular carcinomahsp-3 (lethal)0
HSPA9Anemia, sideroblastic, 4; Even-plus syndromehsp-6 (lethal)6
HSPD1Leukodystrophy, hypomyelinating, 4; Spastic paraplegia 13, autosomal dominanthsp-60 (lethal)35
IDH1Glioma, susceptibility to, somaticidh-1 (development)3
ILKcardiomyopathypat-4 (lethal)33
ISYNA1Malignant neoplasminos-1 (development)0
ITPR1Gillespie syndrome; Spinocerebellar ataxiaitr-1 (lethal)139
MAP2K1Cardiofaciocutaneous syndrome 3mek-2 (lethal)93
MAP2K7Malignant neoplasmmek-1 (development)2
MAPK1gastric carcinogenesismpk-1 (lethal)2
MAPK14schizophreniapmk-1 (lethal)1
MFN2Charcot-Marie-Tooth disease; Hereditary motor and sensory neuropathyfzo-1 (development)202
MRE11Ataxia-telangiectasia-like disordermre-11 (development)515
MSH2Colorectal cancer; Mismatch repair cancer syndrome; Muir-Torre syndromemsh-2 (development)1905
MTHFRHomocystinuria; Neural tube defects; Schizophrenia; Thromboembolism; Vascular diseasemthf-1 (development)212
MTORFocal cortical dysplasia; Smith-Kingsmore syndromelet-363 (lethal)65
MTRHomocystinuria-megaloblastic anemia, cblG complementation type; Neural tube defects, folate-sensitive, susceptibility tometr-1 (development)200
NME1Neuroblastomandk-1 (lethal)144
NT5C2Spastic paraplegiaY71H10B.1 (lethal)14
ODC1Colonic adenoma recurrenceodc-1 (fecundity)5
P4HBCole-Carpenter syndrome 1 pdi-2 (lethal)2
PCPyruvate carboxylase deficiencypyc-1 (development)329
PCNAAtaxia-telangiectasia-like disorder 2 pcn-1 (lethal)1
PCYT1ASpondylometaphyseal dysplasiapcyt-1 (development)15
PEPDProlidase deficiencyK12C11.1 (lethal)60
PGK1Phosphoglycerate kinase 1 deficiencypgk-1 (development)28
PHGDHNeu-Laxova syndrome; Phosphoglycerate dehydrogenase deficiencyC31C9.2 (development)42
PLK1Neoplasmsplk-1 (lethal)0
PNPImmunodeficiencyK02D7.1 (movement)46
PNPLA2Neutral lipid storage disease with myopathyatgl-1 (lethal)66
PPP3CAArthrogryposis, cleft palate, craniosynostosis, and impaired intellectual development; Epileptic encephalopathy, infantile or early childhood, 1tax-6 (movement)8
PSEN1Acne inversa; Alzheimer disease; Cardiomyopathy, dilated; Dementia, frontotemporal; Pick diseasesel-12 (development)134
PTDSS1Lenz-Majewski hyperostotic dwarfismpssy-1 (morphology)6
RAD51Fanconi anemia, complementation group R; Mirror movements 2; Breast cancer, susceptibility torad-51 (lethal)11
RAP1AKabuki syndromerap-1 (lethal)0
RPS19Diamond-Blackfan anemia 1rps-19 (lethal)37
SDHBGastrointestinal stromal tumor; Paraganglioma and gastric stromal sarcoma; Paragangliomas 4; Pheochromocytomasdhb-1 (lethal)276
SDHCGastrointestinal stromal tumor; Paragangliomasmev-1 (lethal)134
SDHDMitochondrial complex II deficiency; Paragangliomas; Pheochromocytomasdhd-1 (development)146
SFRP1Narcolepsysfrp-1 (morphology)1
SLC6A2Orthostatic intolerancedat-1 (movement)50
SMARCA1brain malformationisw-1 (lethal)1
SMARCA2Nicolaides-Baraitser syndromeswsn-4 (lethal)90
SMARCB1Coffin-Siris syndrome; Rhabdoid tumors; Schwannomatosissnfc-5 (lethal)93
SMC1ACornelia de Lange syndrom3him-1 (lethal)110
SMC3Cornelia de Lange syndromesmc-3 (lethal)75
SOD1Amyotrophic lateral sclerosis 1 sod-1 (development)43
SOD2Cardiomyopathy, Dilatedsod-3 (development)2
TATTyrosinemiatatn-1 (lethal)59
TBPSpinocerebellar ataxia; Parkinson diseasetbp-1 (lethal)3
TIMM8AMohr-Tranebjaerg syndromeddp-1 (development)19
TYMSColorectal Carcinomatyms-1 (lethal)2
USO1Malignant neoplasmuso-1 (morphology)0
VCPAmyotrophic lateral sclerosis 14, with or without frontotemporal dementia; Charcot-Marie-Tooth disease, type 2Y; Inclusion body myopathy with early-onset Paget disease and frontotemporal dementia 1cdc-48.2 (lethal)77
WLSBone densitymig-14 (lethal)31
XPR1Basal ganglia calcificationY39A1A.22 (development)5

Genomic baggage – What are the skeletons rattling in your genetic closet?

When you get the genomic report, you have a moment of trepidation. What will it say? Will it reveal something that calls for immediate countermeasures? Will it say something that you can do nothing about? The latter occurred for me. There were findings that had a strong impact on my psyche.

Two things were called out heavily. The first was a cancer risk for melanoma. Good thing my family, first my momma and then my spouse, have been diligent and liberal in applying sunscreen to the family. Once I searched Google and PubMed for the MC1R (R160W) variant, I found the evidence was less than compelling for a dramatic change of lifestyle. Just keep the sunscreen coming and I will likely be fine.

The carrier result was a little more of a shocker. A good personal friend has a daughter who is homozygous in this gene. It was discovered in utero and they have been vigilant ever since. Their daughter is now in her teens, doing exceptionally well and acting like any normal kid – currently enthralled with dance class and outdoor activities. Preventative medicine done right. So getting tagged with a pathogenic variant in this gene is giving me mixed feelings: a mix of some worry and yet, almost pride. Even though my good friends don’t share my specific genetic lesion, it still feels very personal and connecting. Furthermore, this is one of the genes where modern genomic medicine is making great progress in understanding and treatment.

Will you too be a carrier of a pathogenic variation?

Carrier status is something all of us should expect. Veritas recently disclosed at the Precision Medicine World Conference that 90% of the customer reports in their database return carrier status for at least one pathogenic variant. Recent discussions with Robert Green at Harvard confirm this – he showed me a large dataset in which 92% of healthy individuals were carriers of known pathogenic variants. You might think that there are a lucky few (roughly 10%) who are not carriers, but think again. The average person will have close to 3 million differences from the reference genome, and this may be an underestimate. Distribute those evenly across the genome and the coding regions, which make up roughly 1% of the sequence, carry close to 30 thousand variations. Since you have close to 20 thousand genes, that means every gene has approximately 1.5 variations in it. That is a lot of approximating, and it does not factor in selection against bad variations. Yet the main message of that quick calculation is that every gene is likely to have a variation and some genes will have multiple variations. So the original question of how many of these are pathogenic becomes difficult to approximate. Publications suggest we may have up to about 1300 suspect variations hiding in our genome. Yet the number of definitive variants with “known” pathogenicity is likely to be much lower in your genome.

Complicating this issue is variable penetrance – a pathogenic variant may behave in a monogenic fashion in one family, while in another family that same variation may act in a more polygenic fashion, needing other gene mutations to cause disease in the patient. There it is behaving more like a “risk factor” for disease.
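The quick calculation above can be written out as a few lines of Python. This is only a sketch of the arithmetic: the 3 million differences and 20 thousand genes come from the text, while the 1% coding fraction is an assumption chosen to match the 30 thousand figure.

```python
# Back-of-the-envelope estimate of coding variations per gene.
GENOME_DIFFERENCES = 3_000_000  # differences from the reference genome (from the post)
CODING_FRACTION = 0.01          # ASSUMPTION: roughly 1% of the genome is coding sequence
GENE_COUNT = 20_000             # approximate number of human genes (from the post)


def coding_variations(differences: int, coding_fraction: float) -> float:
    """Variations expected in coding regions if differences are spread evenly."""
    return differences * coding_fraction


def variations_per_gene(differences: int, coding_fraction: float, genes: int) -> float:
    """Average number of coding variations per gene, ignoring selection."""
    return coding_variations(differences, coding_fraction) / genes


if __name__ == "__main__":
    total = coding_variations(GENOME_DIFFERENCES, CODING_FRACTION)
    per_gene = variations_per_gene(GENOME_DIFFERENCES, CODING_FRACTION, GENE_COUNT)
    print(f"coding variations: {total:,.0f}")
    print(f"variations per gene: {per_gene:.1f}")
```

Changing `CODING_FRACTION` shows how sensitive the per-gene estimate is to that one assumption.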

Pathogenic variant frequency in Chris Hopkins’ genome

The vagueness of my carrier status “kills” me, so I wanted to know more. I contacted a good friend at the Rady Children’s Hospital. Dr. Matthew Bainbridge is a researcher who was a key contributor to the Rady’s renowned speed at using whole genome sequencing for rapid genetic diagnosis. Matthew introduced me to some software tools he has been developing. His company, Codified Genomics, has developed variant analysis software that allows exploration of one’s genomic variants. All you need is your BAM or VCF files.

What’s that? …You don’t know what a BAM file is, …or a VCF?!!!

Don’t worry, let’s decode the jargon. In the clinphen journey to understand my clinical predilections, predispositions, and pathos, I found myself getting immersed in the intricacy of the end-to-end solution for genomic data acquisition and interpretation. What happens when you spit in a tube and put it in the mail? A lot of stuff! I came across an amazing guide to understanding the industry space behind genomic sequencing, the Enlightenbio Report. This helped me get a tightly focused view of the process of understanding one’s DNA.

That first box is what happens after you spit in the tube. The chemicals in the tube react with the cellular material in the spit to help stabilize it and prevent its degradation. This allows one to send the sample at room temperature to the lab. On the receiving end, the lab initiates a protocol to isolate the DNA that comes from the epithelial cells of the mouth that slough off into your spit. The DNA is manipulated in such a way that it can go onto a microchip slide, and a set of DNA sequencing chemistry reactions is used to read out the DNA in small segments of sequence. The millions of sequence segment reads are recorded in a FASTQ file. The FASTQ reads are compared and aligned to a reference genome to make a BAM file. The BAM file alignments are processed to detect where sequence variation occurs, which is recorded as a VCF file. VCF files are analyzed by comparison to databases, and assessments are made of each variant’s potential for pathogenicity. The assessment data is generally provided as a report to the clinician (or the intrepid genome wanderer such as myself). This report takes the raw data and massages it into a format that makes it easier to understand the baggage of one’s genome.
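To make the last step of that chain less abstract, here is a minimal sketch of reading a VCF in plain Python. A VCF is a tab-separated text file: lines starting with `##` are metadata, the single line starting with `#CHROM` names the columns, and every other line is one variant. The two records in `EXAMPLE_VCF` are invented for illustration and are not from my genome; real pipelines would typically use a dedicated library rather than hand parsing.

```python
from typing import Dict, List

# ASSUMPTION: made-up records, used only to show the file layout.
EXAMPLE_VCF = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t12345\t.\tG\tA\t50\tPASS\tAF=0.0014
chr7\t67890\t.\tC\tT\t99\tPASS\tAF=0.25
"""


def parse_vcf(text: str) -> List[Dict[str, str]]:
    """Return one dict per variant line, keyed by the column names in the header."""
    columns: List[str] = []
    records: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line or line.startswith("##"):
            continue  # skip blank lines and metadata
        if line.startswith("#"):
            columns = line[1:].split("\t")  # "#CHROM" becomes "CHROM"
            continue
        records.append(dict(zip(columns, line.split("\t"))))
    return records


def parse_info(info: str) -> Dict[str, str]:
    """Split the INFO column ("KEY=value;KEY=value") into a dict."""
    result: Dict[str, str] = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        result[key] = value
    return result


if __name__ == "__main__":
    for record in parse_vcf(EXAMPLE_VCF):
        allele_frequency = parse_info(record["INFO"]).get("AF")
        print(record["CHROM"], record["POS"], record["REF"], ">", record["ALT"], "AF =", allele_frequency)
```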

1604 suspect variations in my genome

Matthew helped me upload my VCF files into the Codified program. Next, he showed me how to wander around sifting the data by various aspects such as allele frequency, dominant and recessive status, known pathogenic genes, etc. The upload to Codified indicates I have exactly 1604 suspect variations occurring at an appreciable fraction of the reads and at positions inside, or in close proximity to, the coding sequence of my genes. These variants are suspect because they may alter protein function or levels of expression for the identified genes. If we limit the dataset to changes that alter amino acid composition (non-synonymous), we get 875 gene variations. Add back potential splicing issues, indels, and aberrant start and stop codon issues, and we are back up to 1440 variants that are highly suspect for altering gene expression and function.

316 MIM variant hits in my genome!

What happens if we limit the entire 1604 to only those genes with recognized involvement in disease? We get 316 variants occurring in genes recognized by the Mendelian Inheritance in Man (MIM) database as disease-associated genes. When we restrict this set to coding issues only, we get 281 suspect variants.

I get a clean bill of health when I get a physical exam, so can I disregard these 281 suspect variants?

One easy step is to filter for carrier-only status. 111 variants are clearly identifiable as only autosomal recessive (AR). I would need a hit on each of the two paired chromosome copies for these to be of concern. Since no paired hits were detected, we can dismiss these genes as not needing my immediate concern. As a result, we are now only concerned about hits in genes with known autosomal dominant (AD) issues. These are the genes where only one bad hit is needed to render them pathogenic. Bottom line, 170 gene variants in my genome are worthy of further contemplation.
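The carrier-only filter described above can be sketched as a small triage function. This is an illustration of the logic, not how Codified implements it: the `Variant` record, its field names, and the example genes are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    gene: str          # hypothetical gene label
    inheritance: str   # "AR" (recessive only) or "AD" (dominant, or mixed)
    copies_hit: int    # 1 = one chromosome copy affected, 2 = both copies affected


def needs_attention(variant: Variant) -> bool:
    """AR-only variants matter only when both copies are hit; AD variants always do."""
    if variant.inheritance == "AR":
        return variant.copies_hit >= 2
    return True


def triage(variants: List[Variant]) -> List[Variant]:
    """Keep the variants worth further contemplation."""
    return [v for v in variants if needs_attention(v)]


if __name__ == "__main__":
    example = [
        Variant("GENE_A", "AR", 1),  # carrier only, dismissed
        Variant("GENE_B", "AR", 2),  # both copies hit, kept
        Variant("GENE_C", "AD", 1),  # a single hit is enough, kept
    ]
    print([v.gene for v in triage(example)])
    # The counts in the post follow the same subtraction: 281 - 111 = 170.
    print(281 - 111)
```

In a real report the `copies_hit` value would have to come from phasing or genotype calls, which is the hard part that this sketch skips.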

How frequent is frequent in my 170?

There is good rationale to be concerned about a hit in a gene with AD propensity only if it is rare in the population. The thinking is that if a variation is deleterious by itself (AD), it cannot be tolerated at a high level in the human population. Contrast this to the recessive (AR) variants (also called “alleles” when talking about frequency). My known AR pathogenic variant in the CFTR gene is in the human population at 0.0014 minor allele frequency (MAF). This high allelic frequency is tolerated in the human population because you need a hit in each of the two gene copies in order to have a syndromic issue. Autosomal dominant alleles must have much lower frequency. If we cull the 170 for variations that occur at 0.00001 MAF or lower, we get 53 gene-codon-altering variations to be concerned about. Examining the list manually gave me 17 genes for which I hold varying degrees of concern, of which I list the top 10:

None are in the ACMG59

In a prior blog post, I described the list of genes that can be included in a clinical report as secondary findings. These are allowed in a report because these 59 genes have known actions that can be taken to mitigate their negative health effect. None of my genes of concern are in this group, so immediate actionability is absent for my findings about the baggage in my genome. In fact, although I list these as genes of concern, they do not actually bother me that much. I am still alive and in good health. If the variations in these genes were pathogenic, the negative health consequences should have manifested many years ago. Nevertheless, the three for which I hold highest concern are CACNA1S, LGI1, and RTN2.

The variation in CACNA1S (p.R419H) may sound benign, and it is a conservative change in amino acid composition, but it occurs in a highly conserved region. The position is an Arginine (“R”) in humans, mice, fish, flies and worms. This invariant use of R implies protein function will be compromised when the position is substituted with a histidine. The LGI1 (p.A253T) variant is also a conservative amino acid change, but it is in a less conserved region. This lack of complete conservation indicates this position might tolerate an Alanine to Threonine change. The RTN2 variant is complex. It makes two alarming changes. It makes a dramatic Leucine to Arginine change in the fourth exon from the end of the protein. It also occurs immediately adjacent to a splice junction acceptor site. This alteration of the splicing region suggests it could lead to improper splicing in a highly conserved region of the protein and thus create a defective protein.

It is likely that all three of these genes yield a protein of messed up function. But what is not clear is the type of mess-up. Are they leading to loss-of-function (LOF) activity? Or do they lead to dominant gain-of-function (GOF)? These variations are most likely in the LOF category. Otherwise, I would almost certainly be dealing with the disease symptoms that the GOF variants manifest. Yet this is just supposition – a hypothesis. We don’t yet have solid evidence for what is going on.

How could we get a final answer on whether these variations are pathogenic or not?

To get precision answers, we could model all of these variants in C. elegans. For CACNA1S and RTN2, their high conservation from human to worm would allow direct modeling at the homologous position of the worm’s native gene (“Native Locus”).

Our prior work with full gene humanization indicates more congruent results occur if we first swap in a human gene at the native gene locus (“Humanized Locus”) and then install the variant. The use of a humanized locus allows modeling of any variant, whether or not it is highly conserved across many species. So far, in our studies all known pathogenic variants show deviant behavior, but only when put into humanized systems. Contrast this to insertion in the native locus – some known pathogenic alleles did not create detectable deviance of behavior!

For the 3 genes about which I am concerned, all are of a favorable size such that the human sequence can be easily optimized and installed for expression from the worm’s native locus (“Humanized” animal). If we observe that the human gene can rescue loss of function, we will know we are off to the races and can study variant biology in a gene-humanized system. The humanized animals will be precision proxies serving as clinical avatars of the patient condition.

CACNA1S is a druggable target. The creation of a humanized system expressing CACNA1S as a gene replacement of the egl-19 gene would generate a platform for drug discovery. The patient variants might be responsive to calcium channel blockers, such as benzothiazepines, phenylalkylamines and 1,4-dihydropyridines. The end result would be a highly personalized medicine approach that finds drug treatments specific to the patient’s genetic pre-conditions.

The Unnecessary Procedure – A Problem with False Positives in Genomic Testing

There is significant pressure to increase diagnostic yield, and it has its consequences. BRCA testing is probably the most developed ecosystem for genetic tests, but controversy remains about what medical procedures are best recommended for the patient. High profile cases like Angelina Jolie’s decision to undergo a bilateral mastectomy and the implications of a “Positive” Turner Syndrome test have helped bring the controversies to more widespread attention.

adapted from Anna Parini (New Yorker article)

The heart of the controversy is how often a correct diagnosis leads to a form of unnecessary care that crowds out necessary care, or worse. Physician and surgeon Atul Gawande wrote a New Yorker piece titled:

“Overkill – An avalanche of unnecessary medical care is harming patients physically and financially. What can we do about it?”

This article nicely explores the problem of unproductive or unnecessary procedures. With regard to genetic testing, we need to be mindful of all the downstream repercussions of a positive (or negative) test result.

Forms of Risk in Breast Cancer Testing

The decision to have a mastectomy is a challenging one. The involvement of BRCA1 and BRCA2 in breast cancer is clear, yet what to do about it is still controversial (Domchek 2018). A retrospective study covering 2006 to 2014 identified 780 women at 11 cancer centers who underwent BRCA testing after breast cancer was detected (Rosenberg 2016). 86% of those testing positive elected to have the bilateral mastectomy procedure. But perhaps even more striking, 51% of those who tested negative also went on to have a bilateral mastectomy. A question arises:

Does the election to have full mastectomy by a large fraction of women testing either positive or negative for BRCA1 and BRCA2 pathogenic variants indicate this form of genetic testing has low value to treatment care?

In the general population, the risk of death from a surgical procedure is small but real at about 0.01%. So it is prudent to keep that in mind before going under the knife. Are there more minimally invasive procedures available? A rather old study (Kurian 2014) suggests it has been known for a while that double mastectomy is no better than the less invasive breast-conserving surgery with radiation in its impact on patient mortality. These authors went on to describe:

“In a time of increasing concern about overtreatment, the risk-benefit ratio of bilateral mastectomy warrants careful consideration and raises the larger question of how physicians and society should respond to a patient’s preference for a morbid, costly intervention of dubious effectiveness”

The Need for Peace of Mind

In light of the evidence, what are the psychological factors that drive the choice to have a bilateral mastectomy? For those testing positive as carriers of pathogenic BRCA mutations, the choice is backed up by evidence that recurrence risk drops significantly, but for the noncarriers, it appears the impact of having a breast cancer diagnosis is a sufficient driver (Hamilton 2017). Within the physician-patient relationship there is a need to better communicate how to avoid unnecessary procedures and yet find ways to meet the psycho-social needs of the patient.

Although ClinVar is a useful resource for seeing data distributions and trends,  groups need to be cautious with the details. Julie Eggington from the Center for Genomic Interpretation states “I would warn that rates derived from what is being reported in classification databases are likely very different than what is really going on in testing labs and academic labs. People rarely report boring stuff – I think calculated pathogenic rates derived from classification databases are too high in almost every context.” Julie further postulates that the issue of false positives is larger than people realize. The implication is that about 30% of the variants in ClinVar designated as pathogenic may in fact not be pathogenic. Within a gene, some variants are being more over-interpreted than others.  Groups may be relaying data that is fraught with the inaccuracy of a high false positive rate.

Also unsettling is that Variants of Uncertain Significance (the “VUS” problem) are frequently not reported to the physician at the time of genetic testing. Recent studies in hereditary cancer have found that 8.7% of VUS have been reclassified to Likely Pathogenic status while only 0.7% of pathogenic variants have been changed to non-pathogenic status (Mersch 2018). This reclassification leaves us with 21% pathogenic, 21% benign and 58% VUS in hereditary cancer. This closely resembles the overall distribution outlined in an earlier blog post that relies on ClinVar data (34P:26B:40V). Keep in mind from the prior paragraph that the level of pathogenic variants may actually be much lower than what is reported in the database sources. This has been leading to a follow-on problem: as variants get reclassified, there is frequently a big disconnect in getting that information back out to the patient.

Consumer Reports suggestions:

What are some of the things we can do as consumers of genetic testing? A good Consumer Reports article makes 5 suggestions to consider when getting a genetic test done, and when contemplating what the procedure will be (surgery, drugs, or no-therapeutic-approach-is-known) if a pathogenic finding results.

1) Do I really need this test or procedure?

2) What are the risks and side effects?

3) Are there simpler, safer options?

4) What happens if I don’t do anything?

5) How much does it cost, and will my insurance pay for it?

Uncertainty in Uncertain Times

We are embarking down the new frontier of precision medicine. Our genomes will hold a big key to a better understanding of our health and lifespan. But because the one-gene / one-disease hypothesis is the exception and not the rule, we have a long way to go in becoming predictive and actionable as we obtain more knowledge of the molecular pathogenicity of the variation in our genomes. The journey to link genotype to phenotype will be long and arduous, and possibly quite epic in its implications for the health management approach we take as a species.

Domchek SM. Risk-Reducing Mastectomy in BRCA1 and BRCA2 Mutation Carriers: A Complex Discussion. JAMA. 2018 Dec 6. doi: 10.1001/jama.2018.18942.

Rosenberg SM, Ruddy KJ, Tamimi RM, Gelber S, Schapira L, Come S, Borges VF, Larsen B, Garber JE, Partridge AH. BRCA1 and BRCA2 Mutation Testing in Young Women With Breast Cancer. JAMA Oncol. 2016 Jun 1;2(6):730-6. doi: 10.1001/jamaoncol.2015.5941.

Kurian AW, Lichtensztajn DY, Keegan TH, Nelson DO, Clarke CA, Gomez SL. Use of and mortality after bilateral mastectomy compared with other surgical treatments for breast cancer in California, 1998-2011. JAMA. 2014 Sep 3;312(9):902-14. doi: 10.1001/jama.2014.10707.

Hamilton JG, Genoff MC, Salerno M, Amoroso K, Boyar SR, Sheehan M, Fleischut MH, Siegel B, Arnold AG, Salo-Mullen EE, Hay JL, Offit K, Robson ME. Psychosocial factors associated with the uptake of contralateral prophylactic mastectomy among BRCA1/2 mutation noncarriers with newly diagnosed breast cancer. Breast Cancer Res Treat. 2017 Apr;162(2):297-306. doi: 10.1007/s10549-017-4123-x. Epub 2017 Feb 1.

Mersch J, Brown N, Pirzadeh-Miller S, Mundt E, Cox HC, Brown K, Aston M, Esterling L, Manley S, Ross T. Prevalence of Variant Reclassification Following Hereditary Cancer Genetic Testing. JAMA. 2018 Sep 25;320(12):1266-1274. doi: 10.1001/jama.2018.13152.

What is in your genome? – MyGenome’s WGS data reveals some interesting surprises but no immediate action needed.

Got my report from Veritas for the MyGenome analysis. What is it that is hiding between the words that come out of my mouth and get written down on this blog? Saliva was delivered into a tube 3 months ago, and finally the data is starting to arrive.

What lies beneath the surface may not stay beneath the surface.

If you are like me, you may think you are “healthy,” but we know what is highly likely – you will be a carrier for a disease, and it is also likely that risk factors for other diseases will be identified in your genome. Note that 9 of 10 persons are carriers for rare disease, as addressed in a prior post. You will even have a low chance (~20%) of immediately actionable conditions that you can start to explore now and find mitigating options for.

The Ticking Time Bomb

That last one is perhaps the most compelling reason to get your genome done – can you catch an impending time bomb of genetic disease before it has gone off? For pathogenic variants in the ACMG59 “secondary findings” genes, you stand a good chance of being able to defuse the bomb before it is too late.

For my report, immediately actionable findings were not discovered. I am highly skeptical that we can say I am healthy and “free” of a genetic precondition. It is clear that researchers are only just now scratching the surface of this potential. The rare monogenic drivers of disease are somewhat understood, but the understanding of polygenic drivers is much more in its infancy.

What lies beneath might be two variations that by themselves are not pathogenic, but together can cause, or greatly exacerbate, a disease.

Think about the size of the problem from a theoretical aspect. There are roughly 7000 genes thought to be involved in rare disease. Some of the variants in these genes are monogenic and powerful enough by themselves to cause disease. But it is likely there are many more variants in these genes that are not pathogenic by themselves and need another variation somewhere else in the genome to enable manifestation of disease. Taking just the 7000 genes, the digenic possibilities are 49 million (7000 × 7000). In fact, the remainder of the genome can be part of the digenic pair, so the space may actually be near 400 million. Then what about 3-gene simpaticos – 8 trillion!! That’s 1000x more than the number of people on the planet! The only hope we have for predictive systems here is Big Data and AI options to help us gain sufficient understanding.
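The size of that search space is easy to check. The sketch below reproduces the figures in the paragraph above by simple powers (n × n and n × n × n) and, for comparison, counts unordered combinations of distinct genes with `math.comb`. The 20,000 total gene count is an assumption carried over from the rough gene number used elsewhere on this blog; it is what makes the 400 million and 8 trillion figures come out.

```python
from math import comb

RARE_DISEASE_GENES = 7_000  # from the post
ALL_GENES = 20_000          # ASSUMPTION: approximate human gene count


def naive_pairs(n: int) -> int:
    """Pair count as quoted in the post: every gene against every gene."""
    return n * n


def naive_triples(n: int) -> int:
    """Three-gene count computed the same way."""
    return n ** 3


def distinct_pairs(n: int) -> int:
    """Unordered pairs of two different genes (A+B is the same as B+A)."""
    return comb(n, 2)


def distinct_triples(n: int) -> int:
    """Unordered sets of three different genes."""
    return comb(n, 3)


if __name__ == "__main__":
    print(f"digenic, rare disease genes: {naive_pairs(RARE_DISEASE_GENES):,}")
    print(f"digenic, whole genome:       {naive_pairs(ALL_GENES):,}")
    print(f"trigenic, whole genome:      {naive_triples(ALL_GENES):,}")
    print(f"distinct pairs (7000):       {distinct_pairs(RARE_DISEASE_GENES):,}")
    print(f"distinct triples (20000):    {distinct_triples(ALL_GENES):,}")
```

Counting only distinct, unordered combinations shrinks the numbers by roughly 2x for pairs and 6x for triples, which does not change the conclusion: the space is far too large to explore one combination at a time.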

Heterogeneity and Homogeneity – the Advantage and Bane of Each.

To truly move to a greater understanding of our genetic liabilities, we must move from qualitative (yes or no?) assessment to quantitative (how much?) assessment. Knowing that a gene variant is 50% pathogenic in its potential can help us start to deconvolute the polygenic problem. When two 50% pathogenic variants in the same disease pathway are seen in the same individual, we will have reached a threshold and the disease condition can manifest. With the amazing amount of heterogeneity in the human genome, analyzing patient-derived tissue will be an extremely difficult approach for quantifying the pathogenic potential of a variant. Instead, it becomes highly desirable to use systems of high homogeneity. A uniform genetic background greatly simplifies the quantitation of the disease contribution of a variant. Knowing the genetic background is the same, we can easily say that gene variant A is XX% stronger than gene variant B in regard to pathogenic propensity, after deploying a range of functional tests of deviant behavior for each of the variants.

Proxies of Disease Biology

C. elegans has unique attributes that make it an ideal system for quantifying variant behavior. There is enough similarity of gene function between humans and the worm that, so far, 4 of 4 human gene insertions with observable sequence homology have been capable of rescuing function as a gene replacement of the ortholog gene in the worm. Among its many favorable features (speed to transgenics, microscopic size, high-throughput amenability, a wide range of easily measured phenotypes, etc.), the worm is a self-fertilizing hermaphrodite. What this means is that when growth conditions are good, the animal clones copies of itself and can go from 1 animal to nearly 30 million near-identical animals in just under 10 days. Only when conditions get stressful does the accident of spontaneous nondisjunction of sex chromosomes become more prevalent so that males can form. Under these stress conditions, males go from being extremely rare to about 1 per 100 animals. So the worm has evolved to be highly tolerant of homogeneity and only needs to sample heterogeneity a small fraction of the time to maintain the health of the species.
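The "1 animal to nearly 30 million in under 10 days" figure can be sanity-checked with a simple growth sketch. The brood size of about 300 progeny per hermaphrodite and the generation time of about 3 days are assumptions introduced here for illustration; they are typical textbook ballpark values, not numbers from this post, and the model ignores overlapping generations.

```python
BROOD_SIZE = 300        # ASSUMPTION: self-progeny per hermaphrodite
GENERATION_DAYS = 3.0   # ASSUMPTION: days from egg to egg-laying adult


def population_after(days: float,
                     brood_size: int = BROOD_SIZE,
                     generation_days: float = GENERATION_DAYS) -> int:
    """Population descended from one founder, counting only completed generations."""
    completed_generations = int(days // generation_days)
    return brood_size ** completed_generations


if __name__ == "__main__":
    for day in (3, 6, 9):
        print(f"day {day}: {population_after(day):,} animals")
```

Under these assumptions three completed generations give 300 × 300 × 300 animals, which is in the same range as the figure quoted above.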

Classical LOH – the Bane of Self-fertilization

The clonal nature is quite useful for getting large populations of nearly identical animals, but there is a flip side that creates problems. There is a phenomenon in genetics called Loss of Heterozygosity. Commonly applied to explain the evolution of cancer cell populations, the principle as applied to population genetics is that inbreeding drives heterozygous conditions towards rarity. What this means for a self-fertilizing hermaphrodite is that if an individual starts to self-propagate and has one of its genes in a heterozygous condition (A/B: variants A and B for a given gene), then half the progeny will be homozygous (either A/A or B/B) and the other half will be heterozygous (A/B). In the next generation, the prior homozygotes remain homozygous (either A/A or B/B), but the hets generate another 50/50 split of homo and het. After 10 generations the het is nearly nonexistent in the population (<1%). The population has bifurcated into A/A and B/B strains. If B/B is deleterious to life, then at 10 generations, most of the animals are A/A.

DNA replication is not perfect. As a clonal population expands, random mutations happen that essentially create heterozygous conditions at random genes (A/B scenarios). For the researcher maintaining strains, one of the biggest mistakes they can make is to serially propagate the next generation plate by isolating only 1 individual for the next population expansion. Since each progeny will carry at least 4 de novo mutations relative to its parent, 10 generations of this extreme bottleneck leave the population with quite a few random and possibly pathogenic hits in quite a few genes, and the animals of the serially-propagated strain will have drifted significantly in their genetics from the starting strain. Critical here for C. elegans is occasional access to sexual reproduction to avoid Muller’s Ratchet.
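The halving of heterozygotes under selfing described above follows directly from Mendelian ratios: an A/B parent produces 1/4 A/A, 1/2 A/B and 1/4 B/B progeny, while homozygotes breed true. A minimal sketch that tracks the genotype fractions generation by generation, assuming no selection and equal fecundity for all genotypes:

```python
from typing import Tuple


def selfing(generations: int,
            hom_a: float = 0.0,
            het: float = 1.0,
            hom_b: float = 0.0) -> Tuple[float, float, float]:
    """Return (A/A, A/B, B/B) fractions after the given number of selfing generations."""
    for _ in range(generations):
        hom_a += het / 4  # a quarter of het progeny are A/A
        hom_b += het / 4  # a quarter of het progeny are B/B
        het /= 2          # half of het progeny stay A/B
    return hom_a, het, hom_b


if __name__ == "__main__":
    for generation in (1, 2, 5, 10):
        aa, ab, bb = selfing(generation)
        print(f"generation {generation:2d}: A/A={aa:.4f}  A/B={ab:.4f}  B/B={bb:.4f}")
```

Starting from a single A/B founder, the heterozygous fraction after n generations is (1/2)^n, so it is already below 1% by generation 7 and is about 0.1% at generation 10.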

Genetic drift is Unavoidable

To mitigate this, but not eliminate it, good practice is to transfer 10 to 20 animals to found the next generation of a population being maintained. Even with this technique, fecundity-compromised strains can quickly evolve new mutations that eliminate the starting phenotype and grow faster. So, in addition to a variety of other transgenerational silencing mechanisms, the clonal propagation of a strain can lead to auto-selection of suppressors that effectively “silence” an engineered gene phenotype. Thankfully worms can be flash frozen shortly after making a transgenic line, so one can essentially have an endless supply of starting material. Genetic drift driving selection of gene silencing backgrounds can be avoided by going to a fresh thaw. As a result, highly homogeneous backgrounds can be obtained for comparing the properties of two variants.

Anti-simpatico Creates More Complexity

Let’s take the dialog back to the quantitation of pathogenicity in variants of human disease genes. There are almost certainly some variants in the genome that act to suppress a “monogenic” pathogenic variant. We can envision a negative pathogenicity value for these variants. Adding more complexity is the fact that a variant can be pathogenic in one condition and protective in another. The classic example is sickle-cell anemia and malaria. A person who is a carrier for the recessive pathogenic variation is protected from malaria infections. Yet persons who are homozygous for the E6V change in hemoglobin will have a pathogenic condition that leads to quality of life issues and a reduced lifespan. So, as Julie Eggington says, pathogenicity assessment must be made in a disease-specific context. As a result, calculating all of any one individual’s genetic liabilities is an exceedingly complex problem.