What is in your genome? – MyGenome’s WGS data reveals some interesting surprises but no immediate action needed.

I got my report from Veritas for the MyGenome analysis. What is hiding between the words that come out of my mouth and get written down on this blog? I delivered saliva into a tube 3 months ago, and the data is finally starting to arrive.

What lies beneath the surface may not stay beneath the surface.

If you are like me, you may think you are “healthy,” but we know what is highly likely – you will be a carrier for a disease, and it is also likely that risk factors for other diseases will be identified in your genome. Note that 9 of 10 persons are carriers for a rare disease, as addressed in a prior post. You will even have a low chance (~20%) of carrying an immediately actionable condition that you can start to explore now and find mitigating options for.

The Ticking Time Bomb

That last one is perhaps the most compelling reason to get your genome done – can you catch an impending time bomb of genetic disease before it has gone off? For pathogenic variants in the ACMG59 “secondary findings” genes, you stand a good chance of being able to defuse the bomb before it is too late.

For my report, no immediately actionable findings were discovered. Still, I am highly skeptical that we can say I am healthy and “free” of a genetic precondition. It is clear that researchers are only just now scratching the surface of this potential. The rare monogenic drivers of disease are somewhat understood, but our understanding of the polygenic drivers is still in its infancy.

What lies beneath might be two variations that by themselves are not pathogenic, but together can cause, or greatly exacerbate, a disease.

Think about the size of the problem from a theoretical aspect. There are roughly 7000 genes thought to be involved in rare disease. Some of the variants in these genes are monogenic and powerful enough by themselves to cause disease. But it is likely there are many more variants in these genes that are not pathogenic by themselves and need another variation somewhere else in the genome to enable manifestation of disease. Taking just the 7000 genes, the digenic possibilities are 49 million. In fact, the remainder of the genome can be part of the digenic pairing, so the space may actually be near 400 million. Then what about 3-gene simpaticos – 8 trillion!! That's 1000x more than the number of people on the planet! The only hope we have for predictive systems here is Big Data and AI options to help us gain sufficient understanding.
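The arithmetic behind those numbers can be checked with a short sketch. The 7000-gene figure comes from the text; the 20,000-gene genome size is my assumption, chosen because it reproduces the 400 million and 8 trillion figures. The text counts combinations as n to the power k, so the sketch also shows the smaller count you get when the order of genes is ignored.

```python
from math import comb

RARE_DISEASE_GENES = 7_000  # figure quoted in the text
GENOME_GENES = 20_000       # assumption: approximate number of human protein-coding genes


def ordered_combinations(n_genes: int, k: int) -> int:
    """Count k-gene combinations the way the text does: n ** k."""
    return n_genes ** k


def unordered_combinations(n_genes: int, k: int) -> int:
    """Count distinct sets of k different genes, ignoring order."""
    return comb(n_genes, k)


for label, n, k in [
    ("digenic, rare disease genes", RARE_DISEASE_GENES, 2),
    ("digenic, whole genome", GENOME_GENES, 2),
    ("trigenic, whole genome", GENOME_GENES, 3),
]:
    print(f"{label}: {ordered_combinations(n, k):,} ordered, "
          f"{unordered_combinations(n, k):,} unordered")
```

Either way of counting leaves a search space far too large to test one combination at a time.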

Heterogeneity and Homogeneity – the Advantage and Bane of Each.

To truly move to greater understanding of our genetic liabilities, we must move from qualitative (yes or no?) assessment to quantitative (how much?) assessment. Knowing that a gene variant is 50% pathogenic in its potential can help us start to deconvolute the polygenic problem. When two 50% pathogenic variants in the same disease pathway are seen in the same individual, we will have reached a threshold and the disease condition can manifest. With the amazing amount of heterogeneity in the human genome, analyzing patient-derived tissue will be an extremely difficult approach for quantifying the pathogenic potential of a variant. Instead, it becomes highly desirable to use systems of high homogeneity. A uniform genetic background greatly simplifies the quantitation of the disease contribution of a variant. Knowing the genetic background is the same, we can easily say that gene variant A is XX% stronger than gene variant B with regard to pathogenic propensity, after deploying a range of functional tests of deviant behavior for each of the variants.
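As a toy illustration of the “how much?” idea, here is a minimal sketch that sums fractional pathogenicity scores per pathway and flags any pathway that reaches a threshold. The additive model, the threshold of 1.0 and the pathway names are all assumptions made for illustration; real variant interactions need not be additive.

```python
from collections import defaultdict

THRESHOLD = 1.0  # assumption: disease manifests once pathway burden reaches 100%


def pathway_burden(variants):
    """Sum fractional pathogenicity scores per pathway.

    `variants` is an iterable of (pathway_name, score) pairs, score in [0, 1].
    """
    burden = defaultdict(float)
    for pathway, score in variants:
        burden[pathway] += score
    return dict(burden)


def pathways_over_threshold(variants, threshold=THRESHOLD):
    """Return the sorted names of pathways whose summed score meets the threshold."""
    totals = pathway_burden(variants)
    return sorted(name for name, total in totals.items() if total >= threshold)


# Hypothetical individual: two 50% variants in one pathway, one in another.
person = [("pathway_A", 0.5), ("pathway_A", 0.5), ("pathway_B", 0.5)]
print(pathways_over_threshold(person))
```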

Proxies of Disease Biology

C. elegans has unique attributes that make it an ideal system for quantifying variant behavior. There is enough similarity of gene function between humans and the worm that, so far, 4 of 4 human gene insertions with observable sequence homology have been capable of rescuing function as a gene replacement of the ortholog gene in the worm. Beyond its many favorable features (speed to transgenics, microscopic size, high-throughput amenable, wide range of easily measured phenotypes, etc.), the worm is a self-fertilizing hermaphrodite. What this means is that when growth conditions are good, the animal clones copies of itself and can go from 1 animal to nearly 30 million near-identical animals in just under 10 days. Only when conditions get stressful does the accident of spontaneous nondisjunction of sex chromosomes become more prevalent, and males can form. Under these stress conditions, males go from being extremely rare to about 1 per 100 animals. So the worm has evolved to be highly tolerant of homogeneity and only needs to sample heterogeneity a small fraction of the time to maintain the health of the species.
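The “1 animal to nearly 30 million” figure can be reproduced with a back-of-the-envelope sketch. The brood size of about 300 self-progeny and the generation time of about 3 days are my assumptions (typical textbook values, not stated in this post), and the model ignores mortality and overlapping generations.

```python
BROOD_SIZE = 300     # assumption: approximate self-progeny per hermaphrodite
GENERATION_DAYS = 3  # assumption: approximate generation time in good conditions


def clonal_population(days, brood_size=BROOD_SIZE,
                      generation_days=GENERATION_DAYS, founders=1):
    """Animals in the newest generation after the whole generations that fit in `days`."""
    generations = days // generation_days
    return founders * brood_size ** generations


for day in (3, 6, 9):
    print(f"day {day}: {clonal_population(day):,} animals")
```

Under these assumptions, three generations fit into 9 days and a single founder yields 27 million descendants, in line with the figure above.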

Classical LOH – the Bane of Self-fertilization

The clonal nature is quite useful for getting large populations of nearly identical animals, but there is a flip side that creates problems. There is a phenomenon in genetics called Loss of Heterozygosity. Commonly applied to explain the evolution of cancer cell populations, the principle as applied to the population genetics of a species is that inbreeding, such as backcrossing or selfing, will drive heterozygous conditions towards rarity. What this means for a self-fertilizing hermaphrodite is that if an individual that starts to self-propagate has one of its genes in a heterozygous condition (A/B: variants A and B for a given gene), then half the progeny will be homozygous (either A/A or B/B) and the other half will be heterozygous (A/B). In the next generation, the prior homozygotes remain homozygous (either A/A or B/B), but the hets generate another 50/50 split of homo and het. After 10 generations the het is nearly nonexistent in the population (<1%). The population has bifurcated into A/A and B/B strains. If B/B is deleterious to life, then at 10 generations, most of the animals are A/A.
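The halving of heterozygotes each generation is easy to verify. The sketch below assumes a single A/B founder, strict self-fertilization, and no selection for or against any genotype.

```python
from fractions import Fraction


def selfing_genotype_fractions(generations):
    """Expected genotype fractions after `generations` rounds of selfing.

    Starts from one A/B founder; each round halves the heterozygous fraction
    and splits the other half evenly between A/A and B/B.
    """
    het = Fraction(1, 2) ** generations
    hom = (1 - het) / 2
    return {"A/A": hom, "A/B": het, "B/B": hom}


for gen in (1, 2, 5, 10):
    fractions = selfing_genotype_fractions(gen)
    print(f"generation {gen}: A/B = {float(fractions['A/B']):.4%}")
```

At generation 10 the heterozygous fraction is 1/1024, or just under 0.1%, comfortably below the <1% mark.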

DNA replication is not perfect. As a clonal population expands, random mutations happen that essentially create heterozygous conditions at random genes (A/B scenarios). For the researcher maintaining strains, one of the biggest mistakes they can make is to serially propagate the next-generation plate by isolating only 1 individual for the next population expansion. Since each clonal progeny will carry at least 4 de novo mutations relative to its parent, after 10 generations of this extreme selectivity the population will have random and possibly pathogenic hits in quite a few genes, and the animals of the serially propagated strain will have drifted significantly in their genetics from the starting strain. Critical here for C. elegans is to occasionally access sexual reproduction to avoid Muller’s Ratchet.

Genetic drift is Unavoidable

To mitigate this, but not eliminate it, good practice is to transfer 10 to 20 animals for the next generation of animals being maintained as a population. Even with this technique, fecundity-compromised strains can quickly evolve new mutations that eliminate the starting phenotype and grow faster. So, in addition to a variety of other transgenerational silencing mechanisms, the clonal propagation of a strain can lead to auto-selection of suppressors that effectively “silence” an engineered gene phenotype. Thankfully, worms can be flash frozen shortly after making a transgenic line, so one can essentially have an endless supply of starting material. Genetic drift driving selection of gene-silencing backgrounds can be avoided by going to a fresh thaw. As a result, highly homogeneous backgrounds can be obtained for comparing the properties of two variants.

Anti-simpatico Creates More Complexity

Let's take the dialog back to the quantitation of pathogenicity in variants of human disease genes. There are almost certainly some variants in the genome that act to suppress a “monogenic” pathogenic variant. We can envision a negative pathogenicity value for these variants. Adding more complexity is the fact that a variant can be pathogenic in one condition and protective in another. The classic example is sickle-cell anemia and malaria. A person who is a carrier for the recessive pathogenic variation is protected from malaria infections. Yet persons who are homozygous for the E6V (glutamate to valine) change in hemoglobin will have a pathogenic condition that leads to quality of life issues and a reduced lifespan. So, as Julie Eggington says, pathogenicity assessment must be made in a disease-specific context. As a result, calculating all of any one individual’s genetic liabilities is an exceedingly complex problem.

Why Doctors Impede Genomic Testing Adoption Rate – Influence of Big Data and Presentations at the PMWC Clinical Genomics Industry Gathering.

I had the good fortune to attend PMWC19 – the Precision Medicine World Congress, held in sunny Silicon Valley, California, at the beginning of 2019. GenomeWeb offered a good snippet – we are in a….

“struggle to figure out best practices for implementing genomics in the clinic.”

One of the factors that impacts adoption of genetic screening is the rate of successful diagnosis. Depending on how you pre-filter your patient population before applying your success criteria, the genomic diagnostic rates in publications range from 15 to 35%. Boasts were made at the meeting that some had achieved diagnostic rates as high as 60%, yet most evidence suggests it is near 20% when measured against a broad spectrum of disease containing both monogenic and polygenic drivers. What does this mean to the clinician contemplating ordering a gene panel? Ordering a genetic panel screening will result in a diagnosis for only every 5th person for whom a test is ordered. So it is easy to see why many primary care physicians exploring genomic sequencing respond with pessimism.

A common physician response – “Genetic tests are not very useful.”

It might be that, as whole exome sequencing (WES) or whole genome sequencing (WGS) become more commonplace, diagnostic yields will increase by a small percentage. Yet one of the challenges to overcome is a strong conservative desire by the clinical genetics community to keep diagnostic tests restricted to the genes that are designated as clinically actionable, commonly referred to as the “ACMG59”. These are the genes for which change-of-function alleles have clear-cut therapeutic options.

What about the other 7000 genes involved in disease, don’t we want to know data there too?

Veritas gave a talk that had very intriguing data. In their data set so far, there were two findings reported. One was population frequency on the ACMG59 and the other nugget of information was population frequency for all other genes associated with disease. Not revealed were the core ratios used for the variant calls, but keep in mind that for the top 20 genes the ratio is 40% VUS, 34% path and 26% benign (see prior post for more detail). Veritas reported close to 20% of “healthy” individuals were harboring at least one pathogenic variant in the ACMG59 and 90% of individuals were harboring at least one pathogenic variant in the 7000 other rare disease genes. Basically, 9 out of 10 people are a carrier of something not so good and some of them may have a ticking time bomb of an autosomal dominant driver hiding in their genome. This ratio may go higher if many of the VUS convert to pathogenic.

What will happen once we convert the remaining VUS into pathogenic assignment?

Estimates on the conversion of VUS to pathogenic vary widely. Julie Eggington from the Center of Genomic Interpretation suggests only 1% of VUS will convert to path. Others have estimated it may be as high as 50%. Two lines of evidence lead me to think it is somewhere in between. The first line of data comes from a prior blog post. In that post, I describe how the data in ClinVar has 23% of all variants as pathogenic (P + LP). When we zero in on a subset of the data by looking at the most highly studied genes (the “top 20” with the most data from a diverse set of data submitters), the path jumps to 34%. So, if the highly studied genes are an indicator, a 10-fold jump in variants analyzed per gene would likely see the lesser-studied genes yield about 30% of newly uncovered variants as pathogenic.

The next line of evidence comes from a functional study using saturation mutagenesis. A recent study by the Brotman Baty Institute for Precision Medicine and the Department of Genome Sciences at the University of Washington illustrates how Jay Shendure’s group used a sensitized monoclonal LIG4 knockout HAP1 line to assess variants in BRCA1 for influence on cell survival. Their CRISPR-based approach examined 96.7% of all variants in BRCA1 for a total of 3,893 SNVs examined. The authors note that 24.9% of these SNVs are in ClinVar as either benign or pathogenic without any conflicting interpretations. Yet, if we open it up a bit and ask for the ratio regardless of conflicts, we get 40% path, 27% benign and 33% VUS, which means 67% are leaning towards definitive assignment. Still, even for BRCA1, at least 1/3 of the identified variants are left in the ambiguity land of a VUS assessment. To provide more clarity to VUS-assigned alleles, the authors’ functional studies were able to convert 25% of VUS to a pathogenic category (the “non-functional” in their study). Intriguingly, they were able to convert 49.2% of SNVs with conflicting interpretations into clearly pathogenic “non-functional” assessments.

As a result, application of functional studies will have the potential to boost diagnostic yields by close to 30%.

Higher yields from functional studies, combined with AI-enhanced patient history, will likely boost diagnostic rates further and, hopefully, a genetic causality and/or contribution will be identified in a patient more than 50% of the time. If this can be achieved, then doctors might start to say about genetic testing: “yes, I find it quite useful.”

Systems for Improving Diagnostic Yield of Genomic Sequence Analysis

How to better understand Variants of Uncertain Significance in epilepsy and help find new therapeutic approaches

There is an amazing statistic out there on epilepsy:

1 in 26 persons will experience epilepsy at some point in their lifetime.

It is likely many of you have experienced epilepsy or know someone who has. My experience was with my son. When my boy Alemeyahu was just about to turn 2 years old, I had him outside on a beautiful sunny day. He was bouncing on my knee, giggling away, when suddenly:

His head tilted back, eyes rolled up in his head. He went limp and stopped breathing.

Panicking, I laid him down on the ground and yelled for help. Then…. I lifted his chin and started mouth to mouth.

Diagnostic Rate in Epilepsy

With the high rate of epilepsy in the population there is a bit of a puzzle out there. How is it that the diagnostic rate of genome sequencing is so low? A recent retrospective study looked at 8565 epilepsy patients who underwent a genetic sequencing diagnostic test. The ability to diagnose a genetic cause for their epilepsy was only 15%. What of the remaining 85%? Much of that remains dark matter of uncertainty.

Dark Matter Problem – what is the cause?

What is the major driver of the low diagnostic rate with a DNA sequencing test? When we look at variants occurring in the genome, there are tens of thousands of variants in the coding sequences of important genes in each person’s genome. Many of these genetic differences are benign, but some variations may be pathogenic drivers of disease.

There are over 100 genes that are involved in causing epilepsy. So the clinician/geneticist has their work cut out for them in trying to decide which variants they can dismiss and which one should be of concern. Many of the variants are challenging to dismiss, which can leave us with a bit of concern.

80,000,000 Variant Possibilities

There are roughly 80 million places to pick up an amino acid change (average gene size: 600 bp x number of disease genes: 7000 x amino acids: 20). Granted, maybe only 10% are actually in places of concern, but that is still a big number. And we have only barely scratched the surface of this number. The ClinVar database can be mined for the number of variants known to be pathogenic or benign. As of the end of December 2018, the number of pathogenic variants (P + LP) is over 115,000. The number of benign (B + LB) is over 168,000. Of the 80 million, only 0.35% are known to be either pathogenic or benign; the remaining 99+% is a form of genomic dark matter! A portion of this dark matter has been seen in patients, but scientists have looked at these variants and just can't decide. The ones we cannot decide on are called Variants of Uncertain Significance, or VUS. The ratio of VUS to path to benign is…

44% VUS : 23% path : 34% benign
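For readers who want to check the dark matter arithmetic, here is a small sketch using only the figures quoted above. Treating the “600” as the number of variable positions per gene is my reading of the calculation; the product comes to 84 million, which the text rounds to 80 million.

```python
POSITIONS_PER_GENE = 600  # the text's "average gene size" figure (assumed to mean positions)
DISEASE_GENES = 7_000     # from the text
AMINO_ACIDS = 20          # from the text

PATHOGENIC = 115_000      # ClinVar P + LP, end of December 2018 ("over", per the text)
BENIGN = 168_000          # ClinVar B + LB, end of December 2018 ("over", per the text)


def theoretical_variant_space(positions=POSITIONS_PER_GENE,
                              genes=DISEASE_GENES, residues=AMINO_ACIDS):
    """Positions per gene x number of genes x possible amino acids."""
    return positions * genes * residues


def classified_fraction(classified, space):
    """Fraction of the theoretical space with a pathogenic or benign call."""
    return classified / space


space = theoretical_variant_space()
print(f"theoretical space: {space:,}")
print(f"classified vs 80 million: {classified_fraction(PATHOGENIC + BENIGN, 80_000_000):.2%}")
print(f"classified vs exact product: {classified_fraction(PATHOGENIC + BENIGN, space):.2%}")
```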

Is low pathogenicity of genome to be expected?

When we look at variant calls for highly studied genes, the answer is no. ClinVar data was examined for the top 20 genes with the greatest number of submitters supplying variant data. On average, 51 groups per gene submitted sets of variants. Total variant assessments submitted were 72,471, at an average of nearly 3000 calls per gene. Taking every amino acid position in this top 20, we have a theoretical variant space of 1,411,833. As a result, 5% of the theoretical space has been covered. Yet the amount of pathogenicity is rather steady (in fact it has gone up a bit!)

In the 5% coverage, 34% were pathogenic + likely-pathogenic. A nearly equal number were benign + likely-benign. And in what may seem to be a bit of a surprise, the variants of uncertain significance are still high at 40% of the entries. So pathogenicity is trending at about a third of examined rare variants and VUS is trending at close to 40%.

What are the data trends over time?

Couple the large problem of Variants of Uncertain Significance with data trends and we have an alarming phenomenon looming upslope of us. As previously posted, ClinVar data was extracted for the rate of new submissions since 2013. We see the number of new submissions is starting to accelerate in just the last few years.

An avalanche of uncertainty is coming and we appear to be ill prepared to deal with it.

It is just a febrile seizure episode….

Alemeyahu was still not breathing. I had just put about 5 breaths of air into his lungs. I checked his pulse – thankfully his heart is still beating, ….but he is still not breathing! I put more air into his lungs. Then finally, a weak breath. Then another and another – he was finally breathing again.

I called the paramedics and by the time they got there, Alemeyahu was back to being a normal 2 year old – trying to crawl over the downstairs barricade, trying to put thumb size rocks in his mouth, you know, the usual 2 year old stuff!!!

We got him to the hospital and they ran all kinds of tests. They can't find anything wrong. They diagnose it as a Febrile Seizure Episode and say “Bring him back if it repeats.” Well, ever since, there has been no repeat episode, but other people with chronic epilepsy sometimes have to deal with repeats on a daily basis. Sometimes even hour to hour, and many of the current epilepsy drugs don’t work well for 1/3rd of all epilepsy cases. So, we must find a way to help people with epilepsy achieve better control of their seizure symptoms.

Humanized animal models may be one key system to help us understand variant biology and find new therapeutic options in epilepsy.

Our team successfully inserted the human coding sequence of STXBP1 into the native locus of the C. elegans ortholog gene. We call the technique a “Gene-Swap” because the worm's version of the gene is replaced with a human coding sequence. The Gene-Swapped STXBP1 sequence functioned and gave a high level of rescue activity (the “Humanized” strain). A knock-out was made by precisely deleting the unc-18 locus (ortholog of STXBP1). In the process, all coding sequence of unc-18 is removed and a full loss-of-function deletion allele is generated. For the gene variant, we inserted the p.Arg406His change into the humanized locus.

The gene-swap humanized strain expressing hSTXBP1 shows a significant level of activity (see video above). In contrast, the knock-out shows very little activity. For the R406H genomic variant, the activity is somewhere in between the “humanized” and “knock-out” strains.

Behavior quantified using three assays

R406H’s deviant behavior was quantified in a microfluidics system that measures an EEG-like electrical activity of the animal. We added two other assays on this variant – a microtracker test for thrashing in liquid and a chemotaxis test for speed to navigate to a food source. We also added two other reference alleles that are known to be pathogenic variants of STXBP1 (R292H and R388X). The three variants in red can be compared to the wild type “gene-swap” for the humanized line in blue. In the ephys assays, we can see that two variants, R292H and R406H, have an increased rate of neurotransmission. In contrast, the R388X allele has a lower rate of transmission. Adding in the thrashing assays for movement in liquid, we see that R292H has lost its deviant behavior, while R406H and R388X have retained a detectable level of altered function. A final assay was performed that uses a sensitive chemotaxis behavioral readout. This assay measures the ability of the animal to reach a food source in 1 hr. All variants show altered function and only the R292H allele shows a residual level of activity.

So we can see that a series of assays provides a measure of pathogenic variant biology and gives a glimpse into the mechanism of disease.
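One simple way to put assay readouts from different strains on a common scale is to normalize each variant between the knock-out (0) and the humanized strain (1). The sketch below is my own illustration, not the analysis used for the data above, and the readout numbers in it are hypothetical placeholders.

```python
def normalized_activity(variant, knockout, humanized):
    """Rescale a readout so knock-out maps to 0.0 and humanized maps to 1.0.

    Values above 1.0 mean more activity than the humanized reference;
    values below 0.0 mean less activity than the knock-out.
    """
    span = humanized - knockout
    if span == 0:
        raise ValueError("humanized and knock-out readouts must differ")
    return (variant - knockout) / span


# Hypothetical thrashing readouts (arbitrary units), for illustration only.
humanized_readout = 100.0
knockout_readout = 10.0
variant_readout = 55.0

print(normalized_activity(variant_readout, knockout_readout, humanized_readout))
```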

Speed is an important aspect of measuring variant biology. Our team held an internal contest to see how fast we could install a variant and measure its activity. We did it and provided an assessment in just under 10 days.

The variant profiling system has three key values:

  • First, because the variant profiling system uses a gene-swapped animal expressing a human gene, any variant can be installed and profiled for functional defects.
  • Second, a series of assays can be deployed to detect the nuanced differences in variant pathogenicity.
  • Third, the humanized animal models can be used in high-throughput screens as discovery systems for finding personalized therapeutics.

Induced Pluripotent Stem Cells (iPSC) for Variant Biology Discovery – Dramatic Increases in Efficiency Are Starting to Occur!

guest co-author: Trisha Brock

The use of iPSCs is quickly gaining momentum as a tool of personalized medicine

Whether you call them stem-cell-like cells with unique expression signatures (Chin 2009), or you stick with the seminal publication and call them induced Pluripotent Stem Cells (Takahashi 2007), iPSC technology is becoming an impressive system for helping understand genome-encoded disease biology. We can expect iPSCs to have a major impact in regenerative medicine, disease modeling, and drug development.

iPSC technology is a relatively new technique for variant profiling.  

Since inception with demonstration in mice (Takahashi 2006), progress has occurred rapidly. Original systems were developed using retroviral integration tools, which have a high risk of chromosomal instability and tumorigenesis. Then, in the last 10 years, significant progress has been made to improve the process by making the technique non-integrative and more efficient (Omole and Fakoya 2018). Genomic integration methods using viruses have traditionally yielded the highest efficiency, but non-integrative methods are quickly catching up. The original methods used a standard set of gene expression modifiers: Oct3/4, Sox2, Klf4, and c-Myc – typically referred to as the “OSKM cocktail.” These are encoded in viruses and plasmids that are transfected into the cell and integrate into the genome. This one-time procedure contrasts with non-integrating methods, which require repeat transfections to boost cells towards pluripotency (Warren 2010). Once induced into the pluripotent state, the cell can be modified (CRISPR-based gene editing or similar methods) to establish control lines with and without the variant in question. Finally, cells are differentiated into the desired tissue by exposure to appropriate tissue-specification factors. With this capacity to analyze patient-derived tissue, you can feel fairly certain the biological differences seen between control and variant will translate well to the patient’s specific biology.

Adapted from El Hokayem 2016

Speed to variant biology data requires 2 to 5 months

Relative to rodent animal models, the system is fast. Most human cell cultures can be enticed to achieve the proper pluripotent expression profile after about 4 weeks of growth, and then there is another 2 months or more to induce the desired tissue type. For instance, in a popular method, DeRosa et al. used blood-derived, Sendai-virus-transformed iPSCs to derive cortical neurons and test them for biological consequence with various functional assays (DeRosa 2018). Biomarkers for transformation into iPSCs were confirmed by cell staining with antibodies for NANOG, Oct 3/4, and SOX2. Next, cells were differentiated to the desired cortical neuron tissue type by exposure to relevant co-factors and monitored for sequential biomarker production (Nestin to DCX to CAMK2A to TBR1) and finally, at 90 days, the last set of biomarkers was used (MAP2 and SYNAPSIN1). As a result, the creation of iPSCs and their differentiation into appropriate tissue types takes about 3 months or more, depending on the desired cell type (McKinney 2017 and DeRosa 2018).

Low efficiency problem of easily sourced iPSCs is showing signs of dramatic improvement

Low efficiency in the creation of iPSCs has been the main drawback preventing routine use as a clinical diagnostic. Efficiency is measured as the number of iPSCs obtained divided by the number of input cells prior to starting transformation. Efficiencies vary widely, with multiple reports ranging from 0.1% to 0.001% (Malik and Rao 2013). The measured efficiency is heavily influenced by the choice of starting material. Fibroblasts show some of the highest efficiencies but typically require a biopsy plug from the skin to enable isolation of sufficient amounts of starting cells. Yet researchers are hard at work on conditions that improve efficiency and on finding easier-to-source material. Two years back, a report showed that conversion of fibroblasts had improved to an efficiency near 3% (Pomeroy 2016). Sourcing iPSCs from blood (PBMCs) is more convenient for the patient, but efficiency remains low at 0.15–0.32% (Zhou 2015). Generation of iPSCs from PBMCs is problematic because they are non-adherent. For human cells, the lack of adherence is especially problematic because it leads to high activation of cell death through apoptosis. Researchers are finding that the use of adhesion-promoting matrices (Geltrex and rhLaminin-521) and ROCK inhibitors of apoptosis can greatly improve the efficiency of making iPSCs from blood sources (Ye 2018). Additional recent developments are highly encouraging. With nearly nonexistent invasiveness to the patient, researchers have demonstrated that iPSCs can be sourced from patient urine (Gaignerie 2018). Further, in a very impressive efficiency feat, a meticulous study of fibroblast cell conversion was performed by sampling a wide variety of optimizing conditions (Kogut 2018). Using mRNA-encoded transformation factors with a select set of microRNA inhibitors of transcription, the Kogut team demonstrated an amazing 80% efficiency in fibroblast conversion to iPSC.
Additionally, they found they could isolate single cells and get conversion to iPSC in 90% of the isolates. Intriguingly, in an almost counter-intuitive finding, efficiency of fibroblast conversion started to drop dramatically when more than 1000 source cells were present at the starting conditions. Finally, their speed to conversion of source to iPSCs was only 15 days. In conclusion, it appears iPSC methodologies are breaking through the efficiency barrier and soon the use of patient-derived iPSCs will become part of routine clinical diagnostic procedures.
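To get a feel for what these efficiencies mean at the bench, here is a minimal sketch of the efficiency definition given above (iPSCs obtained divided by input cells). The efficiency values are the ones quoted in this section; the 100,000-cell input is a hypothetical number chosen for illustration.

```python
def efficiency_percent(ipsc_count, input_cells):
    """Reprogramming efficiency as a percentage of input cells."""
    if input_cells <= 0:
        raise ValueError("input_cells must be positive")
    return 100.0 * ipsc_count / input_cells


def expected_ipsc_count(input_cells, efficiency_pct):
    """Expected number of iPSCs from a given input at a given efficiency (%)."""
    return input_cells * efficiency_pct / 100.0


INPUT_CELLS = 100_000  # hypothetical starting cell number, for illustration only

# Efficiencies (%) quoted in the text for various sources and methods.
for label, pct in [
    ("early reports, low end", 0.001),
    ("early reports, high end", 0.1),
    ("PBMCs, high end", 0.32),
    ("fibroblasts, 2016 report", 3.0),
    ("fibroblasts, mRNA plus microRNA method", 80.0),
]:
    print(f"{label}: about {expected_ipsc_count(INPUT_CELLS, pct):,.0f} iPSCs")
```

Note that the 80% figure was reported for small starting populations, so scaling it linearly to a large input is an oversimplification.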

Transplantation of iPSCs as therapeutics is challenged by tumorigenicity issues.

The three main applications of iPSCs are regenerative medicine, disease modeling, and drug development, yet transplantation for regenerative medicine has issues of tumorigenesis to overcome (Focosi 2018). The original work by Takahashi was plagued by significant tumorigenicity, because the method uses genomic-integrating viral vectors with transformation factors known for their tumorigenic potential – mouse iPSCs derived by the method resulted in tumors 20% of the time when the cells were reintroduced into mice (Omole and Fakoya 2018). Switching to a transformation cocktail that avoids the use of cMyc (OSKM to OSNL) reduced tumor formation rates. Yet the alternative transformation factors do not eliminate tumorigenesis, possibly due to genomic integration causing insertional mutagenesis. To reduce tumorigenicity even further, researchers have been switching to non-integrative methods. Yet even then, a low level of tumorigenic capacity remains – apparently inherent to the pluripotent (“totally potent”) status of iPSCs, where they can become many different types of tissue, including cancerous ones. As a result, when differentiating iPSCs into tissue for reintroduction into the patient as treatment, the FDA remains concerned about the tumorigenic potential of any cells that remain in the iPSC state. Current thoughts are that a 10-fold passage of a differentiated cell population may effectively eliminate the tumorigenic risk, yet this need for a high passage number has tempered the enthusiasm for use of iPSCs in regenerative medicine applications.

Polygenic Consequence – iPSCs from patient tissues have the unique advantage that the multiple “Risk Factor” variations of the patient background are retained.

One big advantage of using patient-derived tissue is that the genetic background of the test system is exactly as it occurs in the patient. For instance, DeRosa and team studied Autism-related variants using patient-derived iPSCs differentiated into neuronal tissues (DeRosa 2018). They looked at 6 patients with variants in target genes suspected of involvement in Autism Spectrum Disorder. Genomic data was provided on 5 of the patient conditions (see Table below). All suspect variants were observed as heterozygotes. As a result, clear pathogenicity of any variant is lacking when referenced against existing database sources. Nevertheless, they saw clear phenotypic consequences in all 6 cell lines examined in their studies (RNA-seq, multi-electrode array recordings, spontaneous calcium transients and scratch recovery assays).

In what might be a highly recommended next step, the DeRosa authors could use CRISPR on their iPSC lines to make isogenic controls. For instance, on cell line 377110, the VPS13B variation is a prime candidate for using CRISPR-Cas9 gene editing techniques to revert the rs28940272 locus back to Asparagine. If this isogenic control behaved as wildtype in all the phenotyping assays deployed, then the authors would have generated a definitive demonstration that the Asn2993Ser variant in VPS13B is pathogenic.

Modeling functional defects in autism variants could be done in C. elegans (see Table below). 5 of the 8 genes in the DeRosa work have homology that exceeds 42%. The PRICKLE1 gene has sufficient similarity. Either its Val57Phe or Glu185Ter variation could be installed into a PRICKLE1-humanized animal model. The Glu185Ter would very likely model a loss-of-function allele effect, but Val57Phe is a missense variant that could go either way. The change of valine to phenylalanine may disrupt by gain-of-function, or it may cause a loss-of-function, leading to quite different phenotypes in functional assays. Only testing directly in a model system will give the needed definition of mechanism of action.
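The distinction between the two PRICKLE1 variants can be read straight from the notation. Below is a minimal sketch that parses three-letter protein-level variant names and labels them as missense or nonsense. It is a simplified parser of my own that handles only simple substitutions, not a full HGVS implementation, and the label says nothing about gain versus loss of function, which still requires a functional assay.

```python
import re

AMINO_ACIDS = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
}

# Matches simple substitutions such as "Val57Phe", "Glu185Ter" or "p.Arg406His".
_SUBSTITUTION = re.compile(r"^(?:p\.)?([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def classify_protein_variant(notation):
    """Return (reference, position, alternate, consequence) for a substitution."""
    match = _SUBSTITUTION.match(notation)
    if match is None:
        raise ValueError(f"unsupported notation: {notation!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if ref not in AMINO_ACIDS or (alt not in AMINO_ACIDS and alt != "Ter"):
        raise ValueError(f"unknown residue code in {notation!r}")
    if alt == "Ter":
        consequence = "nonsense"    # premature stop codon
    elif alt == ref:
        consequence = "synonymous"  # no amino acid change
    else:
        consequence = "missense"    # one amino acid swapped for another
    return ref, pos, alt, consequence


for name in ("Val57Phe", "Glu185Ter", "Asn2993Ser"):
    print(name, "->", classify_protein_variant(name)[3])
```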

Modeling epilepsy with iPSCs requires a 135-day preparation protocol

iPSC technology will likely become a widespread tool of personalized medicine and it holds significant promise for neuronal regeneration (Wu 2018). Although it suffers from challenges of asynchronicity, tumorigenicity, timeliness, and low efficiency (Vitrac 2018, Omole and Fakoya 2018), much progress has been made, as attested by this article’s focus on the DeRosa paper.

To list aspects of concern and advantage with iPSC tech, we have:

  • Tumorigenicity and Immunogenicity.  This is an especially important concern with regard to regenerative medicine uses of iPSCs.  Perhaps the most promising approaches involve the use of various RNA molecules to provide the requisite reprogramming factors with minimal potential for tumorigenicity and immunogenicity.
  • Clinical Diagnostic.  Sendai virus tech seems to be one of the most promising methods.  Stability of the reprogramming factors is high, so repetitive redosing is not as problematic.  Also, sourcing from easy-to-acquire primary culture (blood, urine, and saliva) is promising for minimizing the procedure’s invasiveness to the patient.
  • Isogenic Control.  CRISPR-Cas9 tech can be used to return a suspect variant back to wild type.  Functional studies on the variant strain and its isogenic control will determine if the variant is pathogenic or benign.

Chin MH et al. Induced pluripotent stem cells and embryonic stem cells are distinguished by gene expression signatures. Cell Stem Cell. 2009 Jul 2;5(1):111-23. doi: 10.1016/j.stem.2009.06.008.    

DeRosa BA et al. Convergent Pathways in Idiopathic Autism Revealed by Time Course Transcriptomic Analysis of Patient-Derived Neurons. Sci Rep. 2018 May 30;8(1):8423. doi: 10.1038/s41598-018-26495-1.

El Hokayem J et al. Blood Derived Induced Pluripotent Stem Cells (iPSCs): Benefits, Challenges and the Road Ahead. J Alzheimers Dis Parkinsonism. 2016 Oct;6(5). pii: 275. doi: 10.4172/2161-0460.1000275. Epub 2016 Oct 25.

Focosi D and Amabile G. Induced Pluripotent Stem Cell-Derived Red Blood Cells and Platelet Concentrates: From Bench to Bedside. Cells. 2017 Dec 27;7(1). pii: E2. doi: 10.3390/cells7010002.

Gaignerie A, Lefort N, Rousselle M, Forest-Choquet V, Flippe L, Francois-Campion V, Girardeau A, Caillaud A, Chariau C, Francheteau Q, Derevier A, Chaubron F, Knöbel S, Gaborit N, Si-Tayeb K, David L. Urine-derived cells provide a readily accessible cell type for feeder-free mRNA reprogramming. Sci Rep. 2018 Sep 25;8(1):14363. doi: 10.1038/s41598-018-32645-2.

Pomeroy JE, Hough SR, Davidson KC, Quaas AM, Rees JA, Pera MF. Stem Cell Surface Marker Expression Defines Late Stages of Reprogramming to Pluripotency in Human Fibroblasts. Stem Cells Transl Med. 2016 Jul;5(7):870-82. doi: 10.5966/sctm.2015-0250. Epub 2016 May 9.

Kogut I et al. High-efficiency RNA-based reprogramming of human primary fibroblasts. Nat Commun. 2018 Feb 21;9(1):745. doi: 10.1038/s41467-018-03190-3.

Malik N and Rao MS. A review of the methods for human iPSC derivation. Methods Mol Biol. 2013;997:23-33. doi: 10.1007/978-1-62703-348-0_3.

McKinney CE. Using induced pluripotent stem cells derived neurons to model brain diseases. Neural Regen Res. 2017 Jul;12(7):1062-1067. doi: 10.4103/1673-5374.211180.

Omole AE and Fakoya AOJ. Ten years of progress and promise of induced pluripotent stem cells: historical origins, characteristics, mechanisms, limitations, and potential applications. PeerJ. 2018 May 11;6:e4370. doi: 10.7717/peerj.4370. eCollection 2018.

Takahashi K and Yamanaka S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell. 2006 Aug 25;126(4):663-76. Epub 2006 Aug 10.

Vitrac A and Cloëz-Tayarani I. Induced pluripotent stem cells as a tool to study brain circuits in autism-related disorders. Stem Cell Res Ther. 2018 Aug 23;9(1):226. doi: 10.1186/s13287-018-0966-2.

Warren L et al. Highly efficient reprogramming to pluripotency and directed differentiation of human cells with synthetic modified mRNA. Cell Stem Cell. 2010 Nov 5;7(5):618-30. doi: 10.1016/j.stem.2010.08.012. Epub 2010 Sep 30.

Wu S et al. On the Viability and Potential Value of Stem Cells for Repair and Treatment of Central Neurotrauma: Overview and Speculations. Front Neurol. 2018 Aug 13;9:602. doi: 10.3389/fneur.2018.00602. eCollection 2018.

Zhou H, Martinez H, Sun B, Li A, Zimmer M, Katsanis N, Davis EE, Kurtzberg J, Lipnick S, Noggle S, Rao M, Chang S. Rapid and Efficient Generation of Transgene-Free iPSC from a Small Volume of Cryopreserved Blood. Stem Cell Rev. 2015 Aug;11(4):652-65. doi: 10.1007/s12015-015-9586-8.

Child Neurology Meeting: Bernard Sachs Award to William B. Dobyns and Precision Medicine in Epilepsy Lecture

The Child Neurology Society honored William (Bill) Dobyns for his highly impactful efforts in characterizing child neurology. Looking back on a prolific and highly influential career, Bill talked about how the key influencers in his life shaped the directions he pursued.  Reinforcing the title of his talk, “The Names of Things,” Bill was the first to discover and name the LIS1 gene of lissencephaly (Dobyns 1993).

What’s in a name?

A spicy little comment was made by Bill. He let it be known he did not appreciate the HUGO Nomenclature Committee’s decision to rename LIS1 to PAFAH1B1. It is definitely a mouthful, and the change does not seem necessary, since LIS1 appears to cause no confusion in PubMed literature searches.

To worm it or not – Can comparative biology help?

It appears gene humanization in the nematode may be of use in characterizing lissencephaly genes. A quasi-random sampling of 9 genes mentioned in Bill’s presentation was obtained (all the ones I wrote down in my notes!). Next, a table of properties was made:

We first ask: What is the homology to C. elegans? 8 of 9 genes are identified as homologs. All but AUTS2 are found to have a worm gene equivalent, as scored for similarity percentage using the DIOPT website. Next, we ask: Which of these homologs have sufficient similarity for application of a gene-swap technique? Using inspiration from Douglas Adams for a rather arbitrary cutoff of 42, we get 6 of 8 homologs with enough similarity that gene substitution with human cDNA has a chance of retaining function and rescuing a null phenotype. We are currently 3 for 3 in genes tried for gene-swap in epilepsy – STXBP1, KCNQ2, CACNB4 – all can function and rescue their respective C. elegans null. Finally, we ask: Is there something to rescue in the Dobyns gene list? 4 of the 6 good homologs are found to be lethal (LIS1, MTOR, ZIC1 and PIK3R2) when removed by either genetic deletion or RNAi knock-down.  For the remaining two genes (ATK2 and PC2CA), although not appearing to be lethal, they have detectable phenotypic consequences when their homolog is absent in the nematode.  Bottom line, 6 of 9 have good prospects for using comparative biology to answer variant pathogenicity questions.
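The three questions above amount to a simple filtering funnel. Here is a minimal Python sketch of that triage. The gene names and similarity scores in it are made-up placeholders (they are not the DIOPT values behind the table), and the `Candidate` record and `triage` function are hypothetical names introduced only for illustration. Whether the cutoff of 42 is inclusive is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

SIMILARITY_CUTOFF = 42  # the arbitrary Douglas Adams cutoff from the text


@dataclass
class Candidate:
    gene: str
    similarity: Optional[float]    # percent similarity; None = no worm homolog
    null_phenotype: Optional[str]  # e.g. "lethal", "behavioral"; None = nothing to rescue


def triage(candidates: List[Candidate], cutoff: float = SIMILARITY_CUTOFF
           ) -> Tuple[List[Candidate], List[Candidate], List[Candidate]]:
    """Apply the three questions in order and return each stage of the funnel."""
    homologs = [c for c in candidates if c.similarity is not None]
    similar = [c for c in homologs if c.similarity >= cutoff]  # assumed inclusive
    testable = [c for c in similar if c.null_phenotype]
    return homologs, similar, testable


# Placeholder data only; these are NOT real genes or real scores.
candidates = [
    Candidate("GENE_A", 71.0, "lethal"),
    Candidate("GENE_B", 55.0, "behavioral"),
    Candidate("GENE_C", 38.0, "lethal"),
    Candidate("GENE_D", None, None),
]

homologs, similar, testable = triage(candidates)
print(len(candidates), "->", len(homologs), "->", len(similar), "->", len(testable))
```

Run on the real table, the same funnel would go from 9 genes to 8 homologs to 6 above the cutoff, all 6 of which have something to rescue.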

Symposium IV: Precision Medicine – Epilepsy, The Next Frontier

This session was of high interest to the geneticist wanting to design high-throughput animal models for use in rare disease discovery. Jeffrey Loeb opened with a “Big Data” discussion on linking electrode implantation in the brain with various properties of tissue biopsies (genomic, transcriptomic, metabolomic, and histological data). It was a fascinating and bewildering look at the 1700 genes that are associated with bursting/spiking phenomena in the brain. Next was Madhuri Hegde, who took us on a tour of the latest and greatest in whole genome analysis – basically what is happening in the 5342 OMIM-identified disease genes. Much of the answer is that we do not know yet, because there is an ever-expanding universe of Variants of Uncertain Significance (VUS), fueled by the fact that each person averages 3,000,000 differences in sequence when compared to the reference human genome (Hegde 2017).  With 1% of the genome as coding, this suggests 30,000 variations occur in exonic coding sequence.  The amount of this coding sequence variation that is known to be pathogenic is much less.  Filtering out high-frequency alleles and checking against databases where the variant call is either pathogenic or likely-pathogenic, we get an average of 2.1 pathogenic/likely-pathogenic variants per person (Rego 2018). The same data reveals each person can expect another 17.6 variant alleles as either VUS or as unassigned in status.
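For readers who like to check the arithmetic, here is a small Python sketch of the numbers quoted above. All inputs come from the paragraph; the 1% coding fraction is the rounded figure used in the talk, and adding the two Rego 2018 averages into one "flagged" total is my own simplification, not something the paper reports.

```python
# Back-of-envelope numbers from the session, as quoted in the text.
variants_per_genome = 3_000_000  # differences vs. the reference genome
coding_fraction = 0.01           # rounded share of the genome that is coding

coding_variants = round(variants_per_genome * coding_fraction)

# Per-person averages after filtering (Rego 2018, as quoted above).
pathogenic_or_likely = 2.1
vus_or_unassigned = 17.6

# Simplifying assumption: treat the two averages as one pool of flagged variants.
flagged = pathogenic_or_likely + vus_or_unassigned
unresolved_share = vus_or_unassigned / flagged

print(f"coding variants per person: {coding_variants:,}")
print(f"flagged variants per person: {flagged:.1f}")
print(f"share of flagged variants without a clear call: {unresolved_share:.0%}")
```

The point of the exercise is the last line: even after aggressive filtering, most of what remains has no clear pathogenicity call.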

Demonstration of rapid pathogenicity assessment of a VUS allele.

The next presentation was a great tag-team duo of Alfred George and John Millichap speaking on the rapid pathogenicity assessment of a VUS in the KCNQ2 gene. Its title, “Emerging Paradigms for Precision Medicine in Early Life Epilepsy: Bed-to-Bench Workflow,” exemplified where genomic biology is now starting to hold focus. They showed data for a three-week turnaround in providing functional data on a patient variant.  Their data demonstrated that a newly discovered variant in KCNQ2 behaves as a loss-of-function allele.  In a very short time, from genomic data to clinically actionable results, the patient had a genetic mechanism revealed for their illness and drug treatment options became more clear. This same variant, by the way, was assessed as a VUS 3 months later, when the genomics lab finished applying the traditional ACMG-AMP guideline assessment criteria.  Obviously, functional studies deployed quickly after genomic data acquisition can be quite impactful in helping find treatment choices for the clinician and their patient.

Time to Get Organized

Finally, wrapping up what proved to be a lively lecture series and a very long Q&A afterward, we had Anne Berg give us a plea for coming together as a community and doing what has been done in cancer, and with Spinraza on SMA – find ways to work together, get DNA diagnostics done as quickly as possible and then screen variations for functional consequence.  There is a big need for an organized effort in epilepsy.  Early intervention in neurological diseases has the capacity to create a dramatic improvement in outcome. Anne provided us some data to help understand how big the problem is:

The size and coordination of effort in cancer has recently been giving us miraculous gains. For some types of cancer, what was nearly a death sentence 10-20 years ago is now, in many of today’s oncology practices, a responsive and well-controlled disease. Anne feels it is time for a profound shift in the way epilepsy is approached and treated. We need to apply the same rigor and interactivity that drove the success of cancer biology understanding. When it comes to pediatric populations, cancer is less frequent than epilepsy.  There are 29,000 new pediatric epilepsy cases each year, while pediatric cancer cases occur at 13,000 per year – a rate more than 2x lower.  The complexity of disease types between epilepsy and cancer is of similar size, so systematically fostering the level of researcher-clinician interaction seen in cancer biology should be highly productive for epilepsy understanding and discovery.  Ultimately the epilepsy ecosystem is ripe for new approaches that can adequately and rapidly address the complexity of epilepsy biology and uncover new therapeutic modalities.

Dobyns WB et al. Lissencephaly. A human brain malformation associated with deletion of the LIS1 gene located at chromosome 17p13. JAMA. 1993 Dec 15;270(23):2838-42.

Hegde M, Santani A, Mao R, Ferreira-Gonzalez A, Weck KE and Voelkerding KV. Development and Validation of Clinical Whole-Exome and Whole-Genome Sequencing for Detection of Germline Variants in Inherited Disease. Arch Pathol Lab Med. 2017 Jun;141(6):798-805. doi: 10.5858/arpa.2016-0622-RA. Epub 2017 Mar 31.

Shannon Rego, Orit Dagan-Rosenfeld, Wenyu Zhou, M. Reza Sailani, Patricia Limcaoco, Elizabeth Colbert, Monika Avina, Jessica Wheeler, Colleen Craig, Denis Salins, Hannes L. Rost, Jessilyn Dunn, Tracey McLaughlin, Lars M. Steinmetz, Jonathan A. Bernstein, Michael P. Snyder.
High Frequency Actionable Pathogenic Exome Mutations in an Average-Risk Cohort. bioRxiv 151225; doi:

Worming into Relevance – Human disease models in the C. elegans nematode

The philosopher Friedrich Nietzsche once said:

“You have evolved from worm to man, but much within you is still worm.”

Genetic diversity within individuals and between species is responsible for the bewildering variability and biological niche adaptation of life, yet many of the essential genes involved in disease presentation are highly conserved from yeast to humans. For instance, direct comparison of the C. elegans genome to H. sapiens reveals only 44% of the genes are similar (Shaye and Greenwald 2011).  Yet when one restricts the comparison to the 6460 genes known to be associated with genetic disease (1/3rd the human genome), clear similarity (orthology) to C. elegans occurs for 79% of the human disease genes (ClinVar database). This high degree of interspecies conservation between worm and human has recently become more recognized and appreciated for use in disease biology understanding (Golden 2017, Wang 2017, Wangler 2017, Apfeld and Alpers 2018).

Undiagnosed Disease

The Undiagnosed Disease Network (UDN) has been making significant strides in rare disease research and holds an emphasis on developing animal models for use as tools of variant biology discovery (Splinter 2018). Tim Schedl at Washington University in St Louis is a recent addition to the UDN.  Tim will run the C. elegans nematode Model Organisms Screening Center (MOSC). The worm is proving to be a useful tool for uncovering disease-related biology (Bend 2016, Wangler 2017, Luo 2017, Chao 2017, Oláhová 2018, Liu 2018, Guiberson 2018). Perhaps the biggest place where the nematode holds promise is its proven capacity for high-throughput screening (Leung 2013, Rangaraju 2015, O’Reilly 2016, Lucanic 2018, Partridge 2018).  The Leung publication achieved an impressive 340,000 compounds tested in 1536-well format in 5 weeks.  If we can find the systems that accurately model human disease in the nematode animal model, we will have developed a system ripe for high-throughput discovery of new pharmaceuticals.
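To get a feel for what the Leung throughput means in practice, here is a quick Python calculation using only the three numbers quoted above. The plate count is a lower bound under the assumption that every well holds a test compound; real screens reserve wells for controls, so the actual number of plates would have been higher.

```python
import math

compounds = 340_000     # compounds tested (Leung 2013, as quoted above)
wells_per_plate = 1536  # plate format
weeks = 5               # duration of the screen

# Assumption: one compound per well and no control wells (lower bound on plates).
min_plates = math.ceil(compounds / wells_per_plate)
per_week = compounds / weeks
per_day = compounds / (weeks * 7)  # assumes screening every calendar day

print(f"at least {min_plates} plates")
print(f"{per_week:,.0f} compounds per week")
print(f"about {per_day:,.0f} compounds per day")
```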

Can we model clinical variant biology in the worm?

Figure 1. A) A clinical variant is installed into native homolog of the nematode. B) Position of STXBP1 pathogenic variants in the nematode unc-18 gene.

Using CRISPR and related techniques, it is now quite easy to insert DNA changes into the genome of animal models. For example, our company just hit a milestone of 2000 transgenics delivered. But the puzzle now becomes: are we doing it right? What is the best transgenic configuration to address disease biology?  We can put a clinical variant in the native gene of the nematode (Figure 1A). Although there is a large amount of similarity when making gene-to-gene connections in disease genes, a question arises: is there enough similarity at the amino acid level?  When one maps known pathogenic variants to their locations in the amino acid sequence, one occasionally finds that established or likely-pathogenic variants do not occur at a conserved position.  For instance, in a sequence from the middle of STXBP1 aligned to its C. elegans homolog, three variants suspected of pathogenicity are shown (Figure 1B).  The patient variants R406H and L426P are at conserved positions in the worm unc-18 homolog. Unfortunately M443R does not have the identical residue in C. elegans.  If we insert a Methionine (“M”) in for the worm’s Alanine will we get function?  …Probably.  If we put in an Arginine (“R”) instead will the gene work? …Probably not.  Yet, we are left with a bit of ambiguity because the residue at that location is not identical between species. A more robust way to model human biology is to put the entire human gene into the nematode genome.  To explore this concept, we have developed a technique that we call “gene-swap humanization” (Figure 2). We remove the endogenous gene of the worm and replace it with a cDNA copy of the human gene.  This creates a gene-humanized animal.
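The conservation check described above is easy to sketch in Python once you have a pairwise alignment in hand. The two aligned strings below are short toy sequences invented for illustration (they are not STXBP1 or unc-18), and `residue_at` and `percent_identity` are hypothetical helper names, not functions from any alignment library.

```python
from typing import Tuple


def residue_at(aln_query: str, aln_target: str, query_pos: int) -> Tuple[str, str]:
    """Return (query residue, aligned target residue) for a 1-based,
    ungapped position in the query sequence."""
    count = 0
    for q, t in zip(aln_query, aln_target):
        if q != "-":
            count += 1
            if count == query_pos:
                return q, t
    raise IndexError(f"query has no position {query_pos}")


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent of aligned columns (gap-free in both sequences) that are identical."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


# Toy alignment, invented for illustration only.
human = "MRKLA-VMT"
worm = "MRQLAGVAT"

for pos in (2, 7):
    h, w = residue_at(human, worm, pos)
    status = "conserved" if h == w else "NOT conserved"
    print(f"human {h}{pos} aligns to worm {w}: {status}")

print(f"identity: {percent_identity(human, worm):.1f}%")
```

In the toy data, position 2 is an arginine in both sequences (the easy case, like R406H), while position 7 is a methionine aligned to an alanine (the ambiguous case, like M443R).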

Figure 2. The human gene is swapped in for the native sequence, then observed to see if it retains function.

We have only just started, but we have uncovered an interesting finding:  Gene-swap humanization appears to be better than the native locus as a platform for modeling disease variant biology.  We decided to compare variant installs in the native locus to installs in the humanized locus. The main driver to try this out was sequence similarity, or more precisely, the lack of it.  When one compares human to mouse for the STXBP1 gene, one sees 100% identical amino acid usage between the two species.  Contrast this with the human sequence compared to the worm’s unc-18 gene – one sees only 59% identical amino acid usage.  With nearly half the amino acids not being fully conserved, can we model in the native locus effectively, or….

What will happen if we swap-in the human gene as a gene replacement of the worm gene?

The new gene editing CRISPR technology is allowing us to address this question. Genes are now quite easy to order up as fully synthetic.  The DNA sequence comes in a plasmid format that we can inject, with the right CRISPR components, directly into the gonad of the worm. The progeny of the animal will have a high propensity to have picked up the genome edit. We use PCR technology to find the edited animal and then grow up the population for testing in various activity assays (Figure 3).  From concept to variant-specific animal data in less than one month.

Figure 3.  Comparison of native vs humanized locus.  Electrophysiology of A) native vs B) humanized locus. Chemotaxis assay of C) native vs D) humanized locus.

What we found fascinating were two findings:

1. Human gene retains function in worm.

2. Variants in human gene render the worm sensitized for pathogenicity assessment.

That second observation deserves more explanation.  In Figure 3, you will see two types of phenotyping assays performed on three known pathogenic variants of STXBP1 (R292H, R406H and R388X). When we put these patient variants into the native locus at their sequence-conserved positions, we see only the R388X gives a big deviation of function.  Alternatively, when we put these variants in the gene-swapped hSTXBP1 locus, we see that all variants exhibit a statistically significant deviation of function in both assays. As a result, it appears the gene-swap humanized locus is a better choice for modeling clinical variants.

“What about that first observation?” Let’s not overlook that one! This shows that for this particular protein, its physiological role is highly conserved between worm and man. She/he (worms are self-fertilizing hermaphrodites – more on that later…) is giving data that suggest modeling variants in humanized worms can be useful as a platform for variant biology discovery!

Figure 4. Assay differences and similarities within and between genes. A) Electrophysiology assays on STXBP1 exhibit gain-of-function and loss-of-function phenotypes. B) Locomotion assay detects some loss-of-function activity. C) Chemotaxis assay sees all variants as loss-of-function alleles. D) KCNQ2 electrophysiology sees gain of function when the kqt-1 homolog is removed.

We have one more interesting finding from our humanized strain studies.  The types of phenotyping assays used on a clinical variant will depend on the unique biology of that variant.  In Figure 4, we have three different activity assays.  What should jump out at you is that each assay has a unique response. The R292H and R406H variants have a gain-of-function response in the electrophysiology, while the R388X has lost function.  In the next assay, the animal locomotion measurement, we see wild-type activity in R292H, while R406H and R388X exhibit movement deficiencies.  In the last, highly sensitive chemotaxis assay, only R292H retains an ability to sniff out and go to food – the remaining two, R406H and R388X, are not capable of getting to food in the 1 hr test period of the assay.  One last observation for this figure –  there is high similarity in electrophysiology response for the KCNQ2 knock-out and the R292H and R406H variants in STXBP1. All three (KCNQ2 knock-out and the two STXBP1 variants) show a gain-of-function response.  This is expected if all these variants lead to M-current modulation of potassium channels. What we speculate, based on prior findings (Devaux 2017), is that R292H and R406H are defective in chaperone function of syntaxin.  They no longer prevent syntaxin from being an inhibitor of potassium channels, and effectively these STXBP1 variants have a physiological mode of action leading to potassium channel inhibition. Prediction: ezogabine will be useful on patients with hSTXBP1(p.R292H) and hSTXBP1(p.R406H) as a treatment for reducing the frequency and intensity of their epilepsy seizures, but will probably be contraindicated for hSTXBP1(p.R388X) clinical variants.


Figure 5. Genetically-encoded biosensors of transcriptional activity induced by patient variant.

The last component of this blog post is to discuss formats for using the animals in high-throughput drug screening. Loss of locomotion can be a powerful screen, but what might be more sensitive is the use of genetically-encoded biosensors. The biosensor formats we favor most are the transcriptional-activation reporter systems (Figure 5). Typically these are GFP/RFP combos where one reporter is driven by a promoter that is either up regulated or down regulated in a particular genetic background. Loss-of-function alleles can have a myriad of gene transcriptional effects, and biosensor reporters have previously been used to dissect physiological pathway interactions in C. elegans (Urano 2002, Rea 2005, Kahn 2008).

Figure 6. Two biosensors 

We have used biosensors to probe drug response in nuclear hormone receptor signaling (Figure 6). We compared biosensor responses to assays of behavior.  Perhaps what is most striking about the data is the sensitivity.  In the dafadine assay we see nearly 10x higher sensitivity with the biosensor-based assay.  From a drug discovery standpoint, this means 10x less compound is needed in order to see an effect.  For the dafachronic acid assay, the result is even higher – 30x higher than the biological activity output.  For those who do drug discovery, the application of biosensors to clinical-variant profiling will be worth the investment in biosensor construction – cost savings will be obtained from reduced consumption of the expensive compounds used in a drug library.

Apfeld J et al. What Can We Learn About Human Disease from the Nematode C. elegans? Methods Mol Biol. 2018;1706:53-75. doi: 10.1007/978-1-4939-7471-9_4.

Bend E et al. NALCN channelopathies: Distinguishing gain-of-function and loss-of-function mutations. Neurology. 2016 Sep 13;87(11):1131-9. doi: 10.1212/WNL.0000000000003095.

Devaux, J. et al. A possible link between KCNQ2- and STXBP1-related encephalopathies: STXBP1 reduces the inhibitory impact of syntaxin-1A on M current. Epilepsia. 2017 Dec;58(12):2073-2084. doi: 10.1111/epi.13927.

Golden A. From phenologs to silent suppressors: Identifying potential therapeutic targets for human disease. Mol Reprod Dev. 2017 Nov;84(11):1118-1132. doi: 10.1002/mrd.22880. Epub 2017 Oct 3.

Kahn NW et al. Proteasomal dysfunction activates the transcription factor SKN-1 and produces a selective oxidative-stress response in Caenorhabditis elegans. Biochem J. 2008 Jan 1;409(1):205-13.

Leung CK et al. An ultra high-throughput, whole-animal screen for small molecule modulators of a specific genetic pathway in Caenorhabditis elegans. PLoS One. 2013 Apr 29;8(4):e62166. doi: 10.1371/journal.pone.0062166. Print 2013.

Liu, N. et al. Functional variants in TBX2 are associated with a syndromic cardiovascular and skeletal developmental disorder. Hum Mol Genet. 2018 Jul 15;27(14):2454-2465. doi: 10.1093/hmg/ddy146.

Lucanic M et al. A Simple Method for High Throughput Chemical Screening in Caenorhabditis Elegans. J Vis Exp. 2018 Mar 20;(133). doi: 10.3791/56892.

Luo X et al. Clinically severe CACNA1A alleles affect synaptic function and neurodegeneration differentially. PLoS Genet. 2017 Jul 24;13(7):e1006905. doi: 10.1371/journal.pgen.1006905. eCollection 2017 Jul.

Oláhová, M. et al. Biallelic Mutations in ATP5F1D, which Encodes a Subunit of ATP Synthase, Cause a Metabolic Disorder. Am J Hum Genet. 2018 Mar 1;102(3):494-504. doi: 10.1016/j.ajhg.2018.01.020.

O’Reilly L et al. High-Throughput, Liquid-Based Genome-Wide RNAi Screening in C. elegans. Methods Mol Biol. 2016;1470:151-62. doi: 10.1007/978-1-4939-6337-9_12.

Partridge F et al. An automated high-throughput system for phenotypic screening of chemical libraries on C. elegans and parasitic nematodes. Int J Parasitol Drugs Drug Resist. 2018 Apr;8(1):8-21. doi: 10.1016/j.ijpddr.2017.11.004.

Rangaraju S et al. High-throughput small-molecule screening in Caenorhabditis elegans. Methods Mol Biol. 2015;1263:139-55. doi: 10.1007/978-1-4939-2269-7_11.

Rea SL et al. A stress-sensitive reporter predicts longevity in isogenic populations of Caenorhabditis elegans. Nat Genet. 2005 Aug;37(8):894-8. Epub 2005 Jul 24.

Shaye DD and Greenwald I. OrthoList: a compendium of C. elegans genes with human orthologs. PLoS One. 2011;6(5):e20085. doi: 10.1371/journal.pone.0020085. Epub 2011 May 25.

Splinter K et al. Effect of Genetic Diagnosis on Patients with Previously Undiagnosed Disease. N Engl J Med. 2018 Oct 10. doi: 10.1056/NEJMoa1714458.

Urano F et al. A survival pathway for Caenorhabditis elegans with a blocked unfolded protein response. J Cell Biol. 2002 Aug 19;158(4):639-46. Epub 2002 Aug 19.

Wang J et al. MARRVEL: Integration of Human and Model Organism Genetic Resources to Facilitate Functional Annotation of the Human Genome. Am J Hum Genet. 2017 Jun 1;100(6):843-853. doi: 10.1016/j.ajhg.2017.04.010. Epub 2017 May 11.

Wangler, M. F. et al. Model Organisms Facilitate Rare Disease Diagnosis and Therapeutic Research. Genetics. 2017 Sep;207(1):9-27. doi: 10.1534/genetics.117.203067.

Zhou, P. et al. Novel mutations and phenotypes of epilepsy-associated genes in epileptic encephalopathies. Genes Brain Behav. 2018 Jan 4. doi: 10.1111/gbb.12456.

KCNQ2 Cure Summit – Patient Family Passion and Involvement

The passion to find answers is inspiring.

I had the good fortune to attend the KCNQ2 Cure Summit 2018. Meeting the patient families and seeing their passion to find answers, brings meaning and urgency to the work I have been doing to develop high-throughput drug screening platforms.

image Credit: Chris Hopkins with permission of KCNQ2 patient family gathering

What can we do to help these families?

The first step I have been taking is to get more understanding.  KCNQ2 Cure is a patient organization that strives to raise “research funds for KCNQ2 epileptic encephalopathy, a rare and catastrophic form of epilepsy beginning in the first days of life.” As the name suggests, the foundation focuses on how DNA sequence variation in the KCNQ2 gene does or does not cause epilepsy.  For those variations that cause disease (pathogenic), the foundation supports research to find new treatments and therapies. As I previously blogged, KCNQ2 has 21% of the 855 variants in the NCBI database as known or suspected to be pathogenic. About another 10% are known to be benign or likely benign. That leaves 69% of variants for which we are scratching our heads and feeling uncertain, or for which there is nothing known.  The challenge to the expecting parent is “What is the likelihood I will be passing on a disease to my child?”

For the expecting parent, what becomes important is to know the status of the KCNQ2 gene being passed on to the child. There are two ways to pass the genetic disease to our children. Some of the KCNQ2 variants are recessive – both copies of the gene will need to have these recessive pathogenic alleles in order to manifest disease.  It may be the father provides a chromosome where the KCNQ2 gene is defective with a pathogenic variation.  From the mother, her gene copy for KCNQ2 may contain a variant of uncertain significance (a “VUS” allele).  The chance that epilepsy will manifest in the child remains uncertain due to the uncertainty of the mother’s VUS allele.
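To put those percentages into variant counts, here is a short Python sketch using only the figures quoted above. The percentages in the text are rounded, so the counts are approximate and need not add up to exactly 855.

```python
total_variants = 855  # KCNQ2 variants in the database, as quoted above

shares = {
    "pathogenic or likely pathogenic": 0.21,
    "benign or likely benign": 0.10,
    "uncertain or nothing known": 0.69,
}

# Approximate counts; rounding each class separately can shift the total by one.
counts = {label: round(total_variants * share) for label, share in shares.items()}

for label, n in counts.items():
    print(f"{label}: about {n} variants")
```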

Yet a majority of variants in KCNQ2 are autosomal dominant. Unlike a recessive allele, only one bad copy is needed for the disease to manifest. In these cases, the mutation is almost never detected at a high level in the parent.  Instead of being passed down by the parent, the pathogenic variant is thought to have occurred by random chance and becomes manifest at conception. These are considered to be de novo variations because they are not detected in the chromosomes of either parent – both the mom and dad appear to be homozygous wild-type. In many of these cases, a random mutation occurred when either the dad’s sperm or the mother’s egg was formed. This happens because replication of DNA is not perfect. There is a very low frequency of DNA replication errors – close to 1 in every 1,000,000,000 base pairs (Pray 2008). That means that each gamete (sperm or egg) has at least 3 errors somewhere in the genome. Thankfully 98.8% of the genome is non-coding, so a change in coding sequence is quite rare. Add in that the genome holds 20,000 genes, and it becomes very rare that the KCNQ2 gene picks up a mutation by chance.
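Here is that chain of reasoning as a rough Python calculation. The error rate, non-coding fraction and gene count come from the paragraph above. The haploid genome size of about 3.1 billion base pairs is a figure I am adding, and treating all genes as equal-sized targets hit in a single round of replication is a crude simplification, so read the result as an order-of-magnitude estimate only.

```python
error_rate = 1e-9            # replication errors per base pair (Pray 2008, as quoted)
haploid_genome_bp = 3.1e9    # assumption: approximate haploid human genome size
coding_fraction = 1 - 0.988  # 98.8% of the genome is non-coding, per the text
n_genes = 20_000             # gene count, per the text

# Expected new errors per gamete for one round of replication.
new_variants = error_rate * haploid_genome_bp

# Expected errors landing in coding sequence.
coding_hits = new_variants * coding_fraction

# Crude assumption: every gene is an equally sized coding target.
per_gene = coding_hits / n_genes

print(f"new errors per gamete:      {new_variants:.1f}")
print(f"coding errors per gamete:   {coding_hits:.3f}")
print(f"chance a given gene is hit: about 1 in {1 / per_gene:,.0f}")
```

Under these assumptions the chance of a coding hit in any one named gene comes out on the order of one in several hundred thousand gametes.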

What are the chances of that!

Image credit: rosendahl at Flickr

Mosaic in the parent

Yet some de novo assigned conditions have a much higher frequency than they should. A very interesting phenomenon that was illustrated multiple times by the speakers at the KCNQ2 Cure Summit was the issue of mosaicism. If getting a de novo variation is an ultra-rare “roll-of-the-dice,” how is it that a surprising number of the KCNQ2 families with “de novo” origin have more than 1 affected sibling?  That should just not happen by chance if DNA replication is as good as we have measured it to be in laboratory tests.  A big component of the answer to that puzzle turns out to be a phenomenon called mosaicism (Weckhuysen 2012, Milh 2015, Mulkey 2017).  Let’s now imagine a situation where the dad may have picked up a random pathogenic mutation shortly after he was conceived by his parents.  If the error occurred during the 3rd cell division event in utero, then 1/8th of his cells would have the pathogenic variant.  Now as a grown man and expecting father, 1 in 16 of his sperm harbor the pathogenic variant.  There is significant (non-rare) risk he can pass the variant on to more than one of his children.  Yet, because more than 80% of his cells do not have the variant, he remains dosage compensated and unaffected.  His child’s variation appears “de novo,” but it is actually being inherited at a less than Mendelian frequency.  Dr. Ingrid Scheffer, who is helping write the definitions of what is epilepsy (Scheffer 2017), gave a presentation suggesting that in nearly 8% of de novo assignments, the proband’s mom and/or dad shows evidence of mosaicism. She suggests high-depth sequencing of exomes should become more important to the clinician in helping them get an accurate diagnosis.
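The mosaic-dad scenario can be worked through with exact fractions. This Python sketch assumes the mutation arises in a single cell at the given division, that the germline carries the same mosaic fraction as the rest of the body, and that each mutant cell is heterozygous and so passes the variant to half of its gametes. The three-child family is a made-up example, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def mosaic_cell_fraction(division: int) -> Fraction:
    """Fraction of cells carrying a mutation that arose in one cell
    at the given cell division (1 of 2**division cells)."""
    return Fraction(1, 2 ** division)


def gamete_fraction(cell_fraction: Fraction) -> Fraction:
    """Heterozygous mutant cells pass the variant to half of their gametes."""
    return cell_fraction / 2


def p_at_least(children: int, carriers: int, p: Fraction) -> Fraction:
    """Binomial probability that at least `carriers` of `children` inherit the variant."""
    return sum(
        (comb(children, i) * p ** i * (1 - p) ** (children - i)
         for i in range(carriers, children + 1)),
        Fraction(0),
    )


cells = mosaic_cell_fraction(3)  # error at the 3rd division
sperm = gamete_fraction(cells)

print(f"mutant cells: {cells}, mutant sperm: {sperm}")
print(f"P(at least 1 of 3 children inherits): {float(p_at_least(3, 1, sperm)):.3f}")
print(f"P(at least 2 of 3 children inherit):  {float(p_at_least(3, 2, sperm)):.4f}")
```

Compare the second probability with the tiny per-gene chance of a true de novo hit: two affected siblings are far easier to explain by parental mosaicism than by two independent rolls of the dice.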

Enhancer/suppressor effect

Another phenomenon that could alter the presentation of a gene’s phenotype is a compensating variation somewhere else in the genome.  This is a form of the classic suppressor effect taught in genetics courses. Another gene, usually upstream in the functional pathway, can have a mutation that suppresses the effect of the variant in question – a condition referred to as epistasis.  For instance, a KCNQ2 variant in one person may lead to a pathogenicity that is quite severe, while another person having the same KCNQ2 variant presents with a less severe phenotype due to a compensating mutation somewhere else in the KCNQ2 signaling pathway.  Conceptually applied, if you had a Gain-of-Function (GOF) variant in KCNQ2 and a Loss-of-Function (LOF) variant in either SCN1A or STXBP1, this could counter the bad effect and restore activity towards wild-type behavior.  This effect in ion channels has been recently documented and described (Noebels 2017). Dr. Phillip Pearl, the Director of Epilepsy and Clinical Neurophysiology at Boston Children’s Hospital and William G. Lennox Chair and Professor of Neurology at Harvard Medical School, gave a lecture on variant types and how they fall into classes of mild (BFNE) and severe (EE) variant groups.  So, severity will be mediated not only by the molecular nature of the variation in the KCNQ2 gene, but also by the profile of background variants that enhance or suppress that given variant’s expressivity and penetrance.

If you would like to learn more, see some of the KCNQ2 Cure speaker presentations here

Pray, L. DNA Replication and Causes of Mutation. Nature Education 2008 1(1):214

Scheffer IE et al. ILAE classification of the epilepsies: Position paper of the ILAE Commission for Classification and Terminology. Epilepsia. 2017 Apr;58(4):512-521. doi: 10.1111/epi.13709. Epub 2017 Mar 8.

Weckhuysen S et al. KCNQ2 encephalopathy: emerging phenotype of a neonatal epileptic encephalopathy. Ann Neurol. 2012 Jan;71(1):15-25. doi: 10.1002/ana.22644.

Milh M et al. Variable clinical expression in patients with mosaicism for KCNQ2 mutations. Am J Med Genet A. 2015 Oct;167A(10):2314-8. doi: 10.1002/ajmg.a.37152. Epub 2015 May 10.

Mulkey SB et al. Neonatal nonepileptic myoclonus is a prominent clinical feature of KCNQ2 gain-of-function variants R201C and R201H. Epilepsia. 2017 Mar;58(3):436-445. doi: 10.1111/epi.13676. Epub 2017 Jan 31.

Noebels J. Precision physiology and rescue of brain ion channel disorders. J Gen Physiol. 2017 May 1;149(5):533-546. doi: 10.1085/jgp.201711759. Epub 2017 Apr 20.

Genome Data Suppliers

When will you decide it is time to give a spit?

It will not take too much effort. You order a kit, contribute a batch of spittle into a receptacle cup, then send it away for DNA analysis.

The most common utility format for this type of DNA testing is ancestry analysis:

imagecredit: Molly K. McLaughlin for PC Reviews Magazine

Ancestry determinations have been the mainstay for early adoption of DNA sequence analysis technology. Health testing has lagged, primarily due to regulatory concerns, but with the viewpoint starting to support the individual’s right to know, more people are getting DNA analysis for health impact profiling. The conservative path to obtaining health-related DNA information is to consult a doctor, and then perhaps guide them to the service you seek. Many providers may not be up to speed, and you will be helping them navigate the path to using preventative genomics in their patient populations. Be pleased when they respond that this is interesting, but that they would like you to consult with a genetic counselor before and after the data comes in.

Direct to Consumer

For the highly adventurous among us who want more than ancestry, you can follow the path taken by Tom Petch at Medgadget. In a detailed story of his endeavor to uncover his preventative genomics potential, Tom used whole-genome analysis kits supplied by Dante. He obtained close to 1 Gig of raw data files and a detailed report covering a range of dispositions. He found some intriguing idiosyncrasies explaining his predilection for coffee. This was followed by the more puzzling finding that he carries variants that both increase and decrease his risk of Alzheimer’s.

Having the raw data files is highly intriguing to me. There is a treasure trove of information in there that will only be released as time ticks away and our understanding of disease biology increases. Some providers, such as Helix, offer a plan where you can park your DNA files with them and they will periodically “auto update” you as variant biology upgrades become available. Veritas is another company offering whole genome analysis, but it is not clear from their website whether I will be handed my VCF files on a flash drive.

For me, I want to keep the adventure firmly under my control. I want access to the raw data. I want to identify and catalog the full depth of my variant profile. The variations that don’t fit the norm are likely to number in the 1000’s, and perhaps 10,000’s if we include non-coding. To get some of my answers on variant pathogenicity, I will use NIH’s Variation Viewer. This will allow me to understand the significance of many of my variants. The ClinVar Miner and SNPedia databases might also be referenced if a novel variant is in question. Another profoundly interesting tool when you are searching for information on a specific variant is Mastermind by Genomenon. And for even greater detail, I may reach out to my Human Genetics contacts, who will have unique insights and database access to get me even greater resolution. Where I find some variants binning to the Uncertain Significance category, I might be motivated enough to make an animal model with the variant in question installed. With the resulting variant avatar created as an animal model, I will then start a set of functional assays to see if my variant of uncertain significance exhibits a certain significant deviation of function.

Dealing with Uncertain Significance of the Genome

“Risky Business”

Most of us are wired to be risk averse. Yet, I have been giving serious contemplation to the “risky business” of having my whole genome sequenced as a preventative medicine approach to my healthcare.

What will happen if I go there?

I find myself staring into the murky abyss from the edge of the data cliff. A creepy feeling surges in my belly from the depths of my ski-bum days… Should I….


photocredit: Bradly J. Boner for Jackson Hole Magazine

Upon landing, I will need to “spoon my tracks” to get the data to be as interpretable and informative as possible.

photocredit: James Fagedes at Foothill freak

Mental Health

There is plenty of reason to be cautious. A recent publication in the Hastings Center Report urges a high level of caution (Johnston 2018). Although we can be somewhat dismissive of their dismissiveness (the cost of whole genome sequencing is dropping dramatically and phenotyping technology is rapidly improving), one observation is likely to hold true for a while:

“Given the psychosocial costs of predicting one’s own or one’s child’s future life plans based on uncertain [Genomic] testing results, we think the hope and optimism deserve to be tempered.”

So it is clear there will be quite a bit of uncertainty when one opens the Pandora’s box of the genome, but hope and optimism will remain. Whether there are clearly actionable results or frustrating uncertainty, the knowledge gained means there are things to be done, platforms to build, and cures to be discovered.

Not If, but When

Caution keeps us safe. But safe for how long? By not “going there” we might just be deluding ourselves about the inevitable. At some point, we will have a deep understanding of the consequences of genome variation. The first to fall into line will be variants delivering functional consequences on the monogenic side of the spectrum. These will be the easiest to model and to uncover biological consequences for, because the variant will have a clear and sometimes deterministic effect on life quality and healthspan. More challenging will be the variants whose effects predominate in polygenic contexts. These are the more subtle “risk factor” effects, where the other variants in one’s genome are the influencers that either enhance or suppress the capacity of the risk factor variant to manifest. Adding to the challenge of understanding a risk factor is the influence of external factors, such as diet and exercise, and of more internal factors, such as genomic imprinting and gene methylation status.

Yet it is clear where we are heading. Much of the uncertainty will be resolved and we will soon be living in the genome-actionable era where medicine becomes highly personalized to the individual’s variant profile.  For a glimpse of what the future holds, and if you can make the time for an amazing Rob Reid interview of Dr. Robert Green from Harvard, put the headphones on for the following podcast:


1. Johnston J et al. Sequencing Newborns: A Call for Nuanced Use of Genomic Technologies. Hastings Cent Rep. 2018 Jul-Aug;48(2):S2-6.

Properties of Top 20 Epilepsy Genes

Epilepsy Genetics

1 out of 100 persons is living with active epilepsy (Zack and Kobau 2017, WHO 2018). For the subset that can be pinned down to a genetic cause, there are about 70 or so genes involved in causing the illness (Lindy 2018). Until recently, the frequency of each gene’s association with epilepsy remained unclear. The GeneDx study by the Lindy team is helping bring clarity by analyzing the genetic underpinnings in 8565 patients with active epilepsy.

Positive Cases

From the GeneDx study, we can plot the rank of the top 20 genes in epilepsy (green graph at top of the figure). The SCN1A gene is by far the major source gene for genetically associated epilepsies. SCN1A accounts for 27% of all the pathogenic variants in the top 20 epilepsy genes and 24% of those in all 70 known epilepsy genes. For instance, 322 positive cases occur in SCN1A out of a total of 1181 positive cases in the top 20 genes. Applying the data set to the larger population, the 1181 genetically caused cases out of 8565 (13.8%) suggest that close to 1 in 1000 persons are living with gene-induced epilepsy. For SCN1A, the population estimate is 78,000 individuals in the USA, or 1.8 million worldwide, living with pathogenic lesions in their SCN1A gene.
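For anyone who wants to check the proportions, here is a minimal sketch of the arithmetic using only the counts quoted above and the 1-in-100 prevalence figure. The variable names are mine, and the calculation assumes the study cohort is representative of everyone living with active epilepsy, which is a big simplification.

```python
# Counts quoted from the study summary above
positive_scn1a = 322      # positive cases in SCN1A
positive_top20 = 1181     # positive cases across the top 20 genes
patients_tested = 8565    # patients in the cohort

# Prevalence of active epilepsy (1 out of 100 persons)
epilepsy_prevalence = 1 / 100

# Share of top-20 positives that fall in SCN1A
scn1a_share = positive_scn1a / positive_top20

# Share of tested patients with a positive finding in a top-20 gene
genetic_share = positive_top20 / patients_tested

# Rough population prevalence of gene-induced epilepsy
# (assumption: the cohort mirrors the general epilepsy population)
genetic_prevalence = epilepsy_prevalence * genetic_share

print(f"SCN1A share of top-20 positives: {scn1a_share:.1%}")
print(f"Positive cases in cohort: {genetic_share:.1%}")
print(f"Gene-induced epilepsy per 1000 persons: {genetic_prevalence * 1000:.1f}")
```

The result lands a little above 1 per 1000, in line with the "close to 1 in 1000" estimate in the text.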

Total Variants

Another way to look at the rank is to ask how many variants occur in each gene (blue graph). Data gathered from NCBI’s Variation Viewer show a large difference in the number of observed variants per gene. For example, in two genes of similar protein size (SCN1A and TSC2), there is a 2.5-fold difference in their relative numbers of variants.

Pathogenic Variants

In another aspect derived from Variation Viewer, we can look at the number of variants known to be pathogenic. For the top 20 epilepsy genes, SCN1A comes out on top again, but the next gene with high levels of pathogenic variants is MECP2 (purple graph).

Pathogenic vs total

Finally, when we look at the ratio of pathogenic to total variants, we see some interesting findings. MECP2 shows major sensitivity, with a high pathogenic variant load. Similarly, UBE3A also harbors a high pathogenic variant ratio. Another gene jumping up for sensitivity is, as expected, SCN1A. But now we also get KCNQ2, CDKL5, STXBP1, SLC2A1, FOXG1, and ARX as variation-sensitive genes.

Favorite genes and an anomaly

A gene for which I hold high passion is STXBP1. This gene is ranked #6 as a likely genetic cause of epilepsy. In STXBP1, 47 of 349 variants are pathogenic (13%). Another gene on our list for humanization development is KCNQ2. This gene is at the #2 position for being a frequent cause of epilepsy. KCNQ2 has 855 variants, of which 181 are pathogenic, so it has a pathogenicity load of 21%. The TSC2 gene has a strange result. There are 145 pathogenic variants out of 3445 total, which gives a 4% pathogenicity load. Why are there so many non-pathogenic variants for this gene? Perhaps this gene is highly flexible and can tolerate a high level of variant load. There is a slightly higher proportion of synonymous variants, at 27%. Population frequency in the measured variant pool exceeds 0.0001 for only 12% of the variants. So, most of the TSC2 alleles are rare. My other suspicion is that TSC2 is like the BRCA1 gene. A very large number of researchers study each of these genes. This leads to more patients being examined for variants in TSC2 and BRCA1. As a result, these two genes have attained a higher sampling of the variant diversity occurring in the general population.
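The pathogenicity load is just pathogenic variants divided by total variants. A small sketch, using only the three sets of counts quoted above (the dictionary layout and function name are my own, for illustration):

```python
# (pathogenic variants, total variants) as quoted in the text above
variant_counts = {
    "STXBP1": (47, 349),
    "KCNQ2": (181, 855),
    "TSC2": (145, 3445),
}


def pathogenic_load(pathogenic: int, total: int) -> float:
    """Fraction of a gene's observed variants classified as pathogenic."""
    return pathogenic / total


loads = {gene: pathogenic_load(*counts) for gene, counts in variant_counts.items()}

# Rank genes from most to least variation-sensitive
ranked = sorted(loads, key=loads.get, reverse=True)

for gene in ranked:
    print(f"{gene}: {loads[gene]:.0%}")
```

Keep in mind the caveat from the paragraph above: a heavily studied gene such as TSC2 accumulates many benign variants in the databases, which pushes its ratio down regardless of how tolerant the gene really is.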

1. Zack MM and Kobau R. National and State Estimates of the Numbers of Adults and Children with Active Epilepsy – United States, 2015. CDC: MMWR Morb Mortal Wkly Rep. 2017 Aug 11;66(31):821-825. doi: 10.15585/mmwr.mm6631a1.

2. (World Health Organization) Epilepsy (accessed 8/1/2018)

3. Lindy AS et al. Diagnostic outcomes for genetic testing of 70 genes in 8565 patients with epilepsy and neurodevelopmental disorders. Epilepsia. 2018 May;59(5):1062-1071. doi: 10.1111/epi.14074. Epub 2018 Apr 14.