In today’s era of genomic medicine, it remains challenging to understand how a gene variant contributes functionally to disease. Guilt by association is one of the criteria by which a new variant is judged. We can look at data from healthy populations and compare it to established Pathogenic and Likely Pathogenic variants. This helps us understand whether a new variant may have a propensity to cause disease. The thought is that if a new variant occurs in a region previously established as causing pathogenicity, then the new variant may be pathogenic too (ACMG guideline: PM1 “moderate” assessment criterion).
Is my variant guilty of pathogenicity because of its proximity to a pathogenicity hotspot?
In the image above, we see that there are hotspots (red) and coldspots (blue) for pathogenicity in STXBP1. The hotspot values were generated from the known Pathogenic and Likely Pathogenic variants listed in ClinVar. The coldspot values (highMAF) come from variants seen in healthy populations. In yellow we have Variants of Uncertain Significance (VUS). The intensity of a peak is a measure of both how many different variations are seen at an amino acid position and whether their nearest neighbors have the same assignment. This plot suggests there are spots in STXBP1 that can tolerate sequence diversity (blue bars) and spots where a hit leads to pathogenic behavior (red bars). Further, the VUS are landing in both red bar and blue bar regions. Perhaps we can consider VUS to be either pathogenic or benign by this association? Yet there is a critical assumption that leads to a question: how legitimate is it that every variant in healthy populations (“highMAF”) is ASSUMED to be benign?
2,504 healthy population genomes – Calculating the rare variants in each person
To dig into the validity (or invalidity) of this assumption, we can look to a large population study and ask how many times we see variation and of what types. The 1000 Genomes Project Consortium shows an average person has about 4,500,000 variations. Of these, about 100,000 are somewhat rare because they are seen in fewer than 1 in 200 persons (<0.005 MAF). The even rarer “singletons” of the study occur at a frequency of 1 per 2504 persons. This restriction gives us about 10,000 rarer variations to think about per person. Yet to get even rarer, and be able to ask how many variants per person meet the 1 per 200,000 USA definition for Rare Disease frequency, the study size would need to be roughly 100x bigger. Nevertheless, the 1000 Genomes study reports interesting data on healthy population variants that are also listed as pathogenic in the Human Gene Mutation Database (HGMD) and ClinVar datasets. Expressing the pathogenic variants observed in the healthy population as a frequency per individual, every person can expect to harbor 20-25 variants of established pathogenicity.
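The study-size argument above can be checked with a few lines of arithmetic. This is only a back-of-envelope sketch using the two numbers quoted in the text (2,504 individuals and the 1 per 200,000 threshold). It follows the text in counting frequency per person instead of per chromosome, and the function names are my own.

```python
# Back-of-envelope check of the cohort-size argument.
# Both constants come from the text; frequency is counted per person,
# as the text does, not per chromosome.
COHORT_SIZE = 2504             # individuals in the 1000 Genomes cohort
RARE_DISEASE_ONE_IN = 200_000  # the "1 per 200,000" threshold


def smallest_resolvable_frequency(n_individuals: int) -> float:
    """Frequency of a variant seen in exactly one person of the cohort."""
    return 1 / n_individuals


def fold_increase_needed(current_n: int, target_one_in: int) -> float:
    """How many times larger the cohort must be to resolve 1-in-target."""
    return target_one_in / current_n


singleton_freq = smallest_resolvable_frequency(COHORT_SIZE)
fold = fold_increase_needed(COHORT_SIZE, RARE_DISEASE_ONE_IN)

print(f"singleton frequency: {singleton_freq:.6f}")
print(f"cohort must grow about {fold:.0f}x")
```

The ratio comes out near 80, the same order of magnitude as the rough 100x figure quoted above.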
A larger study by Karczewski et al. 2019 is approaching the scale needed for assessing Rare Disease. A dataset of 141,456 human genomes (125,748 exomes and 15,708 genomes) was harvested from the wildtype controls used in various disease studies. The exomes capture variation mostly in the coding sequence of a gene, while the genomes record variant information across the whole gene (coding + upstream/downstream/introns). The result is a deeper measure of the frequency of missense variation that approaches the 1 in 200,000 genomes needed for Rare Disease designation. Currently the National Organization for Rare Disorders (NORD) lists 1258 diseases in its database. STXBP1 cross-references to two of these (Dravet and West Syndromes). Each of these syndromes has a support group, and these are two of the 283 family foundation groups listed in the NORD member list.
Yet the situation for Rare Disease is larger. In the NIH’s Genetic and Rare Diseases Information Center (GARD), there are 6264 unique genetic diseases listed. This suggests there are thousands of genes for which we can expect gene variant issues leading to disease. ClinVar currently lists 7046 as the number of “Genes with variants specific to one protein-coding gene.” Basically, it appears that a third of your 20,000 protein-coding genes could take a hit that increases your risk or likelihood of coming down with genetic disease symptoms.
GARD lists an intriguing statistic: 20-25 million Americans are living with Rare Disease. The USA’s current population is 327.2 million, so roughly 1 in 15 individuals worldwide are probably living with rare disease. Assuming monogenic cause, at least 51 million pathogenic variants might be residing in the human population. Add polygenic burden and the number may be a multiple (100, 150, 200, 250….??) for variants associated with disease being experienced today. Guilt by association to hotspots and coldspots might provide some answers, but functional studies are the more definitive proof, and 50+ million is a lot of animal models to build!!
What genes are good candidates for alternative animal modeling?
I set out to determine which important disease genes are good candidates for creating animal models in C. elegans. The first step was to turn to a database that has a comprehensive listing of human genes and their disease associations. The DisGeNET database has nearly every human gene annotated for its level of disease association (17,549 genes as of June 2019). It provides a curated list that has 8400 genes with a Gene-Disease Association (GDA) score of 0.1 or higher. For the top 1000 genes the GDA scores are 0.69 or higher, which indicates a significant disease association. These top 1000 were selected for examination of their ortholog status in C. elegans using the DIOPT database. 749 orthologs were detected, of which 411 were clearly reciprocal (a back-BLAST of the ortholog returns the starting gene as the best hit). From these, the top 100 genes for high homology and detectable loss-of-function consequence were selected.
Tabulation of disease-associated genes with properties favorable for C. elegans humanization
The top 100 are tabulated alphabetically by gene below. These 100 genes have 8360 variants known to be problematic (Path, Likely Path, or VUS).
Use a search tool to quickly find out if your favorite gene occurs below.
(Note: gene knock out for 58% of these genes results in lethality.)
When you get the genomic report, you have a moment of trepidation. What will it say? ….Will it have a reveal that says you should take countermeasures immediately? ….Will it say something that you can do nothing about? The latter condition occurred for me. There were findings that had a strong impact on my psyche.
Two things were called out heavily. One was a cancer risk of melanoma. Good thing my family, first my momma and then my spouse, have been diligent and liberal in applying sunscreen to the family. Once I googled and PubMed-searched the MC1R (R160W) locus, I found the evidence was less than compelling for a dramatic change of lifestyle. Just keep the sunscreen coming and I will likely be fine.
The carrier result was a little more of a shocker. A good personal friend has a daughter homozygous in this gene. It was discovered in utero and they have been vigilant ever since. Their daughter is now in her teens. Doing exceptionally well and acting like any normal kid – currently enthralled with dance class and other outdoor activities. Preventative medicine done right. So getting tagged with a pathogenic in this gene is giving me mixed feelings. A mix of some worry and yet, almost pride. Even though my good friends don’t share my specific genetic lesion, it still feels very personal and connecting. Furthermore, this is one of the genes where modern genomic medicine is making great progress in understanding and treatment.
Will you too be a carrier of a pathogenic variation?
Carrier status is something all of us should expect. Veritas recently disclosed at the Precision Medicine World Conference that 90% of the customer reports in their database return carrier status for at least one pathogenic variant. Recent discussions with Robert Green at Harvard confirm this. He showed me a large dataset that put the number at 92% of healthy populations being carriers for known pathogenic variants. You might think that there are a lucky few (10%) who are not carriers, but think again. The average person will have close to 3 million differences from the reference genome, and this may be an underestimate. Distribute that without bias across the genome and we have coding regions with close to 30 thousand variations. Since you have close to 20 thousand genes, that means every gene has approximately 1.5 variations in it. Now, this is a lot of approximating, and it does not factor in selection against bad variations. Yet the main message of that quick calculation is that every gene is likely to have a variation, and some genes will have multiple variations. So the original question of how many of these are pathogenic becomes difficult to approximate. Publications suggest we may have up to about 1300 suspect variations hiding in our genome. Yet the number of definitive variants with “known” pathogenicity is likely to be much lower in your genome.
Complicating this issue is variable penetrance: a pathogenic variant may behave monogenically in one family, while in another family that same variation may act more polygenically, needing other gene mutations to produce pathology in the patient. There it behaves more like a “risk factor” for disease.
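The quick per-gene calculation above can be written out explicitly. This is a minimal sketch. The 1% coding fraction is not stated in the text; it is the value implied by going from 3 million genome-wide differences to 30 thousand coding ones, so treat it as an assumption.

```python
# Back-of-envelope variants-per-gene estimate.
VARIANTS_PER_GENOME = 3_000_000  # differences from the reference (from the text)
CODING_FRACTION = 0.01           # assumption implied by 3 million -> 30 thousand
GENE_COUNT = 20_000              # protein-coding genes (from the text)

# Spread variants evenly across the genome, ignoring selection.
coding_variants = VARIANTS_PER_GENOME * CODING_FRACTION
variants_per_gene = coding_variants / GENE_COUNT

print(f"coding variants:   {coding_variants:,.0f}")
print(f"variants per gene: {variants_per_gene:.1f}")
```

Because selection against damaging variants is ignored, the per-gene figure is an average, not a prediction for any one gene.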
Pathogenic variant frequency in Chris Hopkins’ genome
The vagueness of my carrier status “kills” me, so I wanted to know more. I contacted a good friend at Rady Children’s Hospital. Dr. Matthew Bainbridge is a researcher who was a key contributor to Rady’s renowned speed at using whole genome sequencing for rapid genetic diagnosis. Matthew introduced me to some software tools he has been developing. His company, Codified Genomics, has developed variant analysis software that allows exploration of one’s genomic variants. All you need is your BAM or VCF files.
What’s that? …You don’t know what is a BAM file, …or a VCF?!!!
Don’t worry, let’s decode the jargon. In the clinphen journey to understand my clinical predilections, predispositions, and pathos, I found myself getting immersed in the intricacy of the end-to-end solution for genomic data acquisition and interpretation. What happens when you spit in a tube and put it in the mail? A lot of stuff! I came across an amazing guide to understanding the industry space behind genomic sequencing, the Enlightenbio Report. This helped me get a tightly focused view of the process of understanding one’s DNA.
That first box is what happens after you spit in the tube. The chemicals in the tube react with the cellular material in the spit to help stabilize it and prevent its degradation. This allows one to send the sample to the lab at room temperature. On receipt, the lab initiates a protocol to isolate the DNA that comes from the mouth epithelial cells that slough off into your spit. The DNA is manipulated in such a way that it can go onto a microchip slide, and a set of DNA sequencing chemistry reactions is used to read out the DNA in small segments of sequence. The millions of sequence segment reads are recorded in a fastq file. The fastq read segments are compared and aligned to a reference genome to make a BAM file. The BAM file alignments are processed to detect where sequence variation occurs, which is recorded as a VCF file. VCF files are analyzed by comparison to databases, and an assessment is made of each variant’s potential for pathogenicity. The assessment data is generally provided as a report to the clinician (or to the intrepid genome wanderer such as myself). This report takes the raw data and massages it into a format that makes it easier to understand the baggage in one’s genome.
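To make the last file type less mysterious, here is a minimal sketch of reading VCF records with plain Python. It relies only on the standard VCF layout (tab-separated CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO columns, with header lines starting with "#"). The example record is invented for illustration and is not from my genome, and real pipelines would normally use a dedicated VCF library instead of hand parsing.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List


@dataclass
class VcfRecord:
    chrom: str
    pos: int
    ref: str
    alts: List[str]
    info: Dict[str, str]


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the first eight fixed columns of one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
    info_dict: Dict[str, str] = {}
    for entry in info.split(";"):
        if entry in ("", "."):
            continue
        key, _, value = entry.partition("=")  # flag entries get an empty value
        info_dict[key] = value
    return VcfRecord(chrom, int(pos), ref, alt.split(","), info_dict)


def read_vcf(lines: Iterable[str]) -> Iterator[VcfRecord]:
    """Yield records, skipping header lines (which start with '#')."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_vcf_line(line)


# Invented example content, for illustration only.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t12345\t.\tG\tA\t50\tPASS\tAF=0.0014;DP=30\n",
]

records = list(read_vcf(EXAMPLE_VCF))
print(records[0])
```

With a real file, the same `read_vcf` function could be fed an open file handle in place of the example list.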
1604 suspect variations in my genome
Matthew helped me upload my VCF files into the Codified program. Next, he showed me how to wander around sifting the data by various aspects such as allele frequency, dominant and recessive status, known pathogenic genes, etc. The upload to Codified indicates I have exactly 1604 suspect variations occurring at an appreciable fraction of the reads and at positions inside, or in close proximity to, the coding sequence of my genes. These variants are suspect because they may alter protein function or levels of expression for the identified genes. If we limit the dataset to changes that alter amino acid composition (non-synonymous), we get 875 gene variations. Add back potential splicing issues, indels, and aberrant start and stop codon issues, and we are back up to 1440 variants that are highly suspect for altering gene expression and function.
316 MIM variant hits in my genome!
What happens if we limit the entire 1604 to only those genes with recognized involvement in disease? We get 316 variants occurring in genes recognized by the Mendelian Inheritance in Man (MIM) database as disease-associated. When we restrict this set to coding issues only, we get 281 suspect variants.
I get a clean bill of health when I get a physical exam, so can I disregard these 281 suspect variants?
One easy step is to filter out carrier-only status. 111 variants are clearly identifiable as autosomal recessive (AR) only. I would require a hit on each of the paired chromosome copies for these to be of concern. Since no paired hits were detected, we can dismiss these genes as not needing my immediate concern. As a result, we are now only concerned about hits in genes with known autosomal dominant (AD) issues. These are the genes where only one bad hit is needed to render them pathogenic. Bottom line, 170 gene variants in my genome are worthy of further contemplation.
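The filtering funnel described above (disease gene, then coding change, then dropping carrier-only recessive hits) can be sketched as follows. This is my own illustration and not how the Codified software works. The `Variant` fields and gene names are hypothetical, and the two-hit rule is a simplification that does not check whether two hits in a gene sit on different chromosome copies.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Variant:
    gene: str
    coding: bool              # alters coding sequence or splicing
    in_mim: bool              # gene is listed as disease-associated
    inheritance: str          # "AR", "AD", or "AR/AD"
    homozygous: bool = False  # same variant on both chromosome copies


def needs_attention(variants: List[Variant]) -> List[Variant]:
    """Keep disease-gene coding variants that are not carrier-only."""
    candidates = [v for v in variants if v.in_mim and v.coding]
    hits_per_gene = Counter(v.gene for v in candidates)
    kept = []
    for v in candidates:
        recessive_only = v.inheritance == "AR"
        # Simplification: two hits in one gene are treated as "paired"
        # without checking that they are on different chromosome copies.
        two_hits = v.homozygous or hits_per_gene[v.gene] >= 2
        if recessive_only and not two_hits:
            continue  # single recessive hit: carrier status only
        kept.append(v)
    return kept


# Hypothetical toy data, for illustration only.
toy = [
    Variant("GENE_A", True, True, "AR"),                   # carrier only
    Variant("GENE_B", True, True, "AD"),                   # kept
    Variant("GENE_C", False, True, "AD"),                  # not coding
    Variant("GENE_D", True, False, "AD"),                  # not a disease gene
    Variant("GENE_E", True, True, "AR", homozygous=True),  # kept
]

kept = needs_attention(toy)
print([v.gene for v in kept])
```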
How frequent is frequent in my 170?
There is good rationale for being concerned about a hit in a gene with AD propensity only if the hit is rare in the population. The thinking is that if a variation is deleterious by itself (AD), it cannot be tolerated at a high level in the human population. Contrast this with the recessive (AR) variants (also called “alleles” when talking about frequency). My known AR pathogenic variant in the CFTR gene is in the human population at 0.0014 minor allele frequency (MAF). This high allelic frequency is tolerated in the human population because you need a hit in each gene copy in order to have a syndromic issue. Autosomal dominant alleles must have a much lower frequency. If we cull the 170 for variations that occur at 0.00001 MAF or lower, we get 53 gene-codon-altering variations to be concerned about. Examining the list manually gave me 17 genes for which I hold varying degrees of concern, of which I list the top 10:
None are in the ACMG59
In a prior blog post, I described the list of genes that can be included in a clinical report as secondary findings. These are allowed in a report because these 59 genes have known actions that can be taken to mitigate their negative health effects. None of my genes of concern are in this group, so immediate actionability is absent for my findings about the baggage in my genome. In fact, although I am listing these as genes of concern, they do not actually bother me that much. I am still alive and in good health. If I had pathogenic variations in these genes, the negative health consequences should have manifested many years ago. Nevertheless, the three for which I hold the highest concern are CACNA1S, LGI1, and RTN2.
The variation in CACNA1S (p.R419H) may sound benign, since it is a conservative change in amino acid composition, but it occurs in a highly conserved region. The position is an Arginine (“R”) in humans, mice, fish, flies and worms. This invariant use of R implies protein function will be compromised when the position is substituted with a Histidine. The LGI1 (p.A253T) variant is also a conservative amino acid change, but it is in a less conserved region. This lack of complete conservation indicates the position might tolerate an Alanine to Threonine change. The RTN2 variant is complex. It makes two alarming changes. It makes a dramatic Leucine to Arginine change in the 4th exon from the end of the protein. It also occurs immediately adjacent to a splice junction acceptor site. This alteration of a splicing region suggests it could lead to improper splicing in a highly conserved region of the protein and thus create a defective protein.
It is likely that all three of these genes yield a protein with messed-up function. But what is not clear is the type of mess-up. Are they leading to loss-of-function (LOF) activity? Or do they lead to dominant gain-of-function (GOF)? These variations are most likely in the LOF category. Otherwise, I would almost certainly be dealing with the disease symptoms that GOF variants manifest. Yet this is just supposition, a hypothesis. We don’t yet have solid evidence for what is going on.
How could we get a final answer on whether these variations are pathogenic or not?
To get precise answers, we could model all of these variants in C. elegans. For CACNA1S and RTN2, their high conservation from human to worm would allow direct modeling at the homologous position of the worm’s native gene (“Native Locus”).
Our prior work with full gene humanization indicates more congruent results occur if we first swap in a human gene at the native gene locus (“Humanized Locus”) and then install the variant. The use of a humanized locus allows modeling of any variant, whether or not it is highly conserved across many species. So far in our studies, all known pathogenic variants produce deviant behavior, but only when put into humanized systems. Contrast this with insertion in the native locus, where some known pathogenic alleles did not create detectable deviance of behavior!
For the 3 genes about which I am concerned, all are of a favorable size, so the human sequence can be easily optimized and installed for expression from the worm’s native locus (“Humanized” animal). If we can observe that the human gene rescues loss of function, we will know we are off to the races and can study variant biology in a gene-humanized system. The humanized animals will be precision proxies serving as clinical avatars of the patient condition.
CACNA1S is a druggable target. The creation of a humanized system expressing CACNA1S as a gene replacement of the egl-19 gene would generate a platform for drug discovery. The patient variants might be responsive to calcium channel blockers, such as benzothiazepines, phenylalkylamines and 1,4-dihydropyridines. The end result would be a highly personalized medicine approach that finds drug treatments specific to the patient’s genetic pre-conditions.
There is significant pressure to increase diagnostic yield, and it has its consequences. BRCA testing is probably the most developed ecosystem for genetic tests, but controversy remains about which medical procedures are best recommended for the patient. High-profile cases like the decision of Angelina Jolie to undergo a bilateral mastectomy and the implication of a “Positive” Turner Syndrome test have helped bring the controversies to more widespread attention.
The heart of the controversy is how often a correct diagnosis leads to a form of unnecessary care that crowds out necessary care, or worse. Physician and surgeon Atul Gawande wrote a New Yorker piece titled:
“Overkill – An avalanche of unnecessary medical care is harming patients physically and financially. What can we do about it?”
This article nicely explores the problem of unproductive or unnecessary procedures. In regards to genetic testing, we need to be mindful of all the downstream repercussions of a positive (or negative) test result.
Forms of Risk in Breast Cancer Testing
The decision to have a mastectomy is a challenging one. The involvement of BRCA1 and BRCA2 in breast cancer is clear, yet what to do about it is still controversial (Domchek 2018). A retrospective study covering 2006 to 2014 identified 780 women at 11 cancer centers who underwent BRCA testing after breast cancer was detected (Rosenberg 2016). 86% of those testing positive elected to have the bilateral mastectomy procedure. But perhaps even more striking, 51% of those who tested negative also went on to have a bilateral mastectomy. A question arises:
Does the election to have full mastectomy by a large fraction of women testing either positive or negative for BRCA1 and BRCA2 pathogenic variants indicate this form of genetic testing has low value to treatment care?
In the general population, the risk of death from a surgical procedure is small but real, at about 0.01%. So it is prudent to keep that in mind before going under the knife. Are there more minimally invasive procedures available? A rather old study (Kurian 2014) suggests it has been known for a while that double mastectomy is no better than the less invasive breast-conserving surgery with radiation in its impact on patient mortality. These authors went on to write:
“In a time of increasing concern about overtreatment, the risk-benefit ratio of bilateral mastectomy warrants careful consideration and raises the larger question of how physicians and society should respond to a patient’s preference for a morbid, costly intervention of dubious effectiveness”
The Need for Peace of Mind
In light of the evidence, what are the psychological factors that drive the choice to have a bilateral mastectomy? For those testing positive as carriers of pathogenic BRCA mutations, the choice is backed up by evidence that recurrence risk drops significantly, but for the noncarriers, it appears the impact of having a breast cancer diagnosis is a sufficient driver (Hamilton 2017). Within the physician-patient relationship there is a need to better communicate how to avoid unnecessary procedures and yet find ways to meet the psycho-social needs of the patient.
Although ClinVar is a useful resource for seeing data distributions and trends, groups need to be cautious with the details. Julie Eggington from the Center for Genomic Interpretation states “I would warn that rates derived from what is being reported in classification databases are likely very different than what is really going on in testing labs and academic labs. People rarely report boring stuff – I think calculated pathogenic rates derived from classification databases are too high in almost every context.” Julie further postulates that the issue of false positives is larger than people realize. The implication is that about 30% of the variants in ClinVar designated as pathogenic may in fact not be pathogenic. Within a gene, some variants are being more over-interpreted than others. Groups may be relaying data that is fraught with the inaccuracy of a high false positive rate.
Also unsettling is that Variants of Uncertain Significance (the “VUS” problem) are frequently not reported to the physician at the time of genetic testing. Recent studies in hereditary cancer have found that 8.7% of VUS have been reclassified to Likely Pathogenic status, while only 0.7% of pathogenic variants have been changed to non-pathogenic status (Mersch 2018). This reclassification leaves us with 21% Path, 21% benign and 58% VUS in hereditary cancer. This closely resembles the overall distribution outlined in an earlier blog post that relies on ClinVar data (34P:26B:40V). Keep in mind from the prior paragraph that the level of pathogenic variants may actually be much lower than what is reported in the database sources. This has been leading to a follow-on problem: as variants get reclassified, there is frequently a big disconnect in getting that information back out to the patient.
Consumer Reports suggestions:
What are some of the things we can do as consumers of genetic testing? A good Consumer Reports article offers 5 questions to consider when getting a genetic test done, and when contemplating what the procedure will be (surgery, drugs, or no-therapeutic-approach-is-known) if the result is a pathogenic finding.
1) Do I really need this test or procedure?
2) What are the risks and side effects?
3) Are there simpler, safer options?
4) What happens if I don’t do anything?
5) How much does it cost, and will my insurance pay for it?
Uncertainty in Uncertain Times
We are embarking on the new frontier of precision medicine. Our genomes will hold a big key to better understanding of our health and lifespan. But because the one-gene / one-disease hypothesis is the exception and not the rule, we have a long way to go in becoming predictive and actionable as we obtain more knowledge of the molecular pathogenicity of the variation in our genomes. The journey to link genotype to phenotype will be long and arduous, and possibly quite epic in its implications for the health management approach we take as a species.
Domchek SM. Risk-Reducing Mastectomy in BRCA1 and BRCA2 Mutation Carriers: A Complex Discussion. JAMA. 2018 Dec 6. doi: 10.1001/jama.2018.18942.
Rosenberg SM, Ruddy KJ, Tamimi RM, Gelber S, Schapira L, Come S, Borges VF, Larsen B, Garber JE, Partridge AH. BRCA1 and BRCA2 Mutation Testing in Young Women With Breast Cancer. JAMA Oncol. 2016 Jun 1;2(6):730-6. doi: 10.1001/jamaoncol.2015.5941.
Kurian AW, Lichtensztajn DY, Keegan TH, Nelson DO, Clarke CA, Gomez SL. Use of and mortality after bilateral mastectomy compared with other surgical treatments for breast cancer in California, 1998-2011. JAMA. 2014 Sep 3;312(9):902-14. doi: 10.1001/jama.2014.10707.
Hamilton JG, Genoff MC, Salerno M, Amoroso K, Boyar SR, Sheehan M, Fleischut MH, Siegel B, Arnold AG, Salo-Mullen EE, Hay JL, Offit K, Robson ME. Psychosocial factors associated with the uptake of contralateral prophylactic mastectomy among BRCA1/2 mutation noncarriers with newly diagnosed breast cancer. Breast Cancer Res Treat. 2017 Apr;162(2):297-306. doi: 10.1007/s10549-017-4123-x. Epub 2017 Feb 1.
Mersch J, Brown N, Pirzadeh-Miller S, Mundt E, Cox HC, Brown K, Aston M, Esterling L, Manley S, Ross T. Prevalence of Variant Reclassification Following Hereditary Cancer Genetic Testing. JAMA. 2018 Sep 25;320(12):1266-1274. doi: 10.1001/jama.2018.13152.
Got my report from Veritas for the MyGenome analysis. What is it that is hiding between the words that come out of my mouth and get written down on this blog? Saliva was delivered into a tube 3 months ago, and finally the data is starting to arrive.
What lies beneath the surface may not stay beneath the surface.
If you are like me, you may think you are “healthy,” but we know what is highly likely – you will be a carrier for a disease and it’s also likely risk factors for other diseases will be identified in your genome. Note, 9 of 10 persons are carriers for rare disease, as previously addressed in a prior post. You will even have a low chance (~20%) for immediately actionable conditions that you can start to explore now and find mitigating options.
The Ticking Time Bomb
That last one is perhaps the most compelling reason to get your genome done: can you catch an impending time bomb of genetic disease before it has gone off? For pathogenic variants in the ACMG59 “secondary findings” genes, you stand a good chance of being able to defuse the bomb before it is too late.
For my report, immediately actionable findings were not discovered. I am highly skeptical that we can say I am healthy and “free” of a genetic precondition. It is clear that researchers are only just now scratching the surface of this potential. The rare monogenic drivers of disease are somewhat understood, but the polygenic drivers are way more in their infancy.
What lies beneath might be two variations that by themselves are not pathogenic, but together can cause, or greatly exacerbate, a disease.
Think about the size of the problem from a theoretical standpoint. There are roughly 7000 genes thought to be involved in rare disease. Some of the variants in these genes are monogenic and powerful enough by themselves to cause disease. But it is likely there are many more variants in these genes that are not pathogenic by themselves and need another variation somewhere else in the genome to enable manifestation of disease. Taking just the 7000 genes, the digenic possibilities are 49 million. In fact, the remainder of the genome can be part of the digenic pair, so the space may actually be near 400 million. Then what about 3-gene simpaticos: 8 trillion!! That’s about 1000x more than the number of people on the planet! The only hope we have for predictive systems here is Big Data and AI options to help us gain sufficient understanding.
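The combination counts above follow from simple multiplication, sketched here for anyone who wants to check them. The 49 million, 400 million and 8 trillion figures correspond to counting ordered gene tuples (n squared and n cubed with n = 7,000 or 20,000). Counting unordered combinations of distinct genes gives smaller numbers, which the sketch also prints. The world population value is my assumption, not a number from the text.

```python
from math import comb

RARE_DISEASE_GENES = 7_000
ALL_GENES = 20_000
WORLD_POPULATION = 7.7e9  # assumption: rough 2019 estimate

# Ordered counting, which reproduces the figures quoted in the text.
ordered_pairs_rare = RARE_DISEASE_GENES ** 2
ordered_pairs_all = ALL_GENES ** 2
ordered_triples_all = ALL_GENES ** 3

# Unordered combinations of distinct genes, a stricter count.
unordered_pairs_rare = comb(RARE_DISEASE_GENES, 2)
unordered_pairs_all = comb(ALL_GENES, 2)
unordered_triples_all = comb(ALL_GENES, 3)

print(f"gene pairs, rare-disease genes: {ordered_pairs_rare:,} ordered, "
      f"{unordered_pairs_rare:,} unordered")
print(f"gene pairs, all genes:          {ordered_pairs_all:,} ordered, "
      f"{unordered_pairs_all:,} unordered")
print(f"gene triples, all genes:        {ordered_triples_all:,} ordered, "
      f"{unordered_triples_all:,} unordered")
print(f"ordered triples per person:     "
      f"{ordered_triples_all / WORLD_POPULATION:,.0f}")
```

Either way of counting leaves the search space far too large to test experimentally, which is the point of the paragraph.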
Heterogeneity and Homogeneity – the Advantage and Bane of Each.
To truly move to a greater understanding of our genetic liabilities, we must move from qualitative (yes or no?) assessment to quantitative (how much?) assessment. Knowing that a gene variant is 50% pathogenic in its potential can help us start to deconvolute the polygenic problem. When two 50% pathogenic variants in the same disease pathway are seen in the same individual, we will have reached a threshold and the disease condition can manifest. With the amazing amount of heterogeneity in the human genome, analyzing patient-derived tissue will be an extremely difficult approach for quantifying the pathogenic potential of a variant. Instead, it becomes highly desirable to use systems of high homogeneity. A uniform genetic background greatly simplifies the quantitation of a variant’s contribution to disease. Knowing the genetic background is the same, we can easily say that gene variant A is XX% stronger than gene variant B in pathogenic propensity, after deploying a range of functional tests of deviant behavior for each of the variants.
Proxies of Disease Biology
C. elegans has unique attributes that make it an ideal system for quantifying variant behavior. There is enough similarity of gene function between humans and the worm that, so far, 4 of 4 human gene insertions with observable sequence homology have been capable of rescuing function as a gene replacement of the orthologous gene in the worm. On top of its many favorable features (speed to transgenics, microscopic size, high-throughput amenability, wide range of easily measured phenotypes, etc.), the worm is a self-fertilizing hermaphrodite. What this means is that when growth conditions are good, the animal clones copies of itself and can go from 1 animal to nearly 30 million near-identical animals in just under 10 days. Only when conditions get stressful does the accident of spontaneous nondisjunction of sex chromosomes become more prevalent, and males can form. Under these stress conditions, males go from being extremely rare to about 1 per 100 animals. So the worm has evolved to be highly tolerant of homogeneity and only needs to sample heterogeneity a small fraction of the time to maintain the health of the species (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1462001).
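The jump from 1 animal to nearly 30 million can be reproduced with a simple geometric growth sketch. The brood size of about 300 self-progeny per hermaphrodite and the assumption that roughly three generations fit into ten days are mine, not figures from the text. The model also ignores overlapping generations and any limits on food.

```python
# Geometric growth sketch for a selfing hermaphrodite population.
BROOD_SIZE = 300  # assumption: approximate self-progeny per animal
GENERATIONS = 3   # assumption: about three generations in ten days


def population_after(generations: int,
                     brood_size: int = BROOD_SIZE,
                     founders: int = 1) -> int:
    """Population size after discrete, non-overlapping generations."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return founders * brood_size ** generations


for g in range(GENERATIONS + 1):
    print(f"generation {g}: {population_after(g):,} animals")
```

Under these assumptions the third generation reaches 27 million animals, in line with the "nearly 30 million" figure.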
Classical LOH – the Bane of Self-fertilization
The clonal nature is quite useful for getting large populations of nearly identical animals, but there is a flip side that creates problems. There is a phenomenon in genetics called Loss of Heterozygosity. Commonly applied to explain the evolution of cancer cell populations, the principle applied to population genetics in species is backcrossing will drive heterozygous conditions towards rarity. What this means for a self fertilizing hermaphrodite is that if the individual starts to self-propagate, and has one of their gene’s in a heterozygous conditions (A/B: variants A and B for a given gene),then half the progeny will be homozygous (either A/A or B/B) and the other half will be heterozygous (A/B). In the next generation, the prior homozygous remain homozygous (either A/A or B/B), but the hets generate another 50/50 split of homo and het. After 10 generations the het is nearly nonexistent in the population (<1%) . The population has bifringed to to A/A and B/B strains. If B/B is deleterious to life, then at 10 generations, most of the animals are A/A.
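The halving of heterozygotes with each round of selfing can be sketched in a few lines. This assumes a population founded by a single A/B animal, strict self-fertilization, and equal fitness of all three genotypes, so it does not model the case where B/B is deleterious.

```python
from fractions import Fraction


def genotype_fractions(generations: int) -> dict:
    """Expected genotype fractions after n generations of selfing from A/B."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    het = Fraction(1, 2) ** generations  # hets halve every generation
    hom_each = (1 - het) / 2             # the rest splits evenly
    return {"A/A": hom_each, "A/B": het, "B/B": hom_each}


for g in (1, 2, 5, 10):
    fractions = genotype_fractions(g)
    print(f"generation {g:2d}: A/B = {float(fractions['A/B']):.4%}")
```

The heterozygote fraction falls below 1% by the seventh generation and is about 0.1% by the tenth.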
DNA replication is not perfect. As a clonal population expands, random mutations happen that essentially create heterozygous conditions at random genes (A/B scenarios). For the researcher maintaining strains, one of the biggest mistakes is to serially propagate each new plate from only 1 isolated individual. Since each clonal progeny will carry at least 4 de novo mutations relative to its parent, after 10 generations of this extreme selectivity the population will have random and possibly pathogenic hits in quite a few genes, and the serially propagated strain will have drifted significantly in its genetics from the starting strain. Critical here for C. elegans is to occasionally access sexual reproduction to avoid Muller’s Ratchet.
Genetic drift is Unavoidable
To mitigate this, but not eliminate it, good practice is to transfer 10 to 20 animals to found the next generation when maintaining a population. Even with this technique, fecundity-compromised strains can quickly evolve new mutations that eliminate the starting phenotype and grow faster. So, in addition to a variety of other transgenerational silencing mechanisms, the clonal propagation of a strain can lead to auto-selection of suppressors that effectively “silence” an engineered gene phenotype. Thankfully, worms can be flash frozen shortly after making a transgenic line, so one can essentially have an endless supply of starting material. Genetic drift driving selection of gene-silencing backgrounds can be avoided by going to a fresh thaw. As a result, highly homogeneous backgrounds can be obtained for comparing the properties of two variants.
Anti-simpatico Creates More Complexity
Let’s take the dialog back to the quantitation of pathogenicity in variants of human disease genes. There are almost certainly some variants in the genome that act to suppress a “monogenic” pathogenic variant. We can envision a negative pathogenicity value for these variants. Adding more complexity is the fact that a variant can be pathogenic in one condition and protective in another. The classic example is sickle-cell anemia and malaria. A person who is a carrier of the recessive pathogenic variation is protected from malaria infections. Yet persons who are homozygous for the E6V change in hemoglobin will have a pathogenic condition that leads to quality-of-life issues and a reduced lifespan (https://www.cdc.gov/malaria/about/biology/#tabs-1-4). So, as Julie Eggington says, pathogenicity assessment must be made in a disease-specific context. As a result, calculating all of any one individual’s genetic liabilities is an exceedingly complex problem.
I had the fortunate opportunity to attend PMWC19, the Precision Medicine World Congress, held in sunny Silicon Valley, California, at the beginning of 2019. GenomeWeb offered a good snippet – we are in a….
“struggle to figure out best practices for implementing genomics in the clinic.”
One of the factors that impacts adoption of genetic screening is the rate of successful diagnosis. Depending on how you pre-filter your patient population before applying your success criteria, the genomic diagnostic rates in publications range from 15 to 35%. Boasts were made at the meeting that some had achieved diagnostic rates as high as 60%, yet most evidence suggests the rate is near 20% when measured against a broad spectrum of disease containing both monogenic and polygenic drivers. What does this mean to the clinician contemplating ordering a gene panel? It will result in a diagnosis for only about 1 in every 5 persons for whom a test is ordered. So it is easy to see why many primary care physicians exploring genomic sequencing respond with pessimism.
A common physician response – “Genetic tests are not very useful.”
It might be that, as whole exome sequencing (WES) or whole genome sequencing (WGS) becomes more commonplace, diagnostic yields will increase by a small percentage. Yet one of the challenges to overcome is a strong conservative desire by the clinical genetics community to keep diagnostic tests restricted to the genes that are designated as clinically actionable, commonly referred to as the “ACMG59”. These are the genes for which change-of-function alleles have clear-cut therapeutic options.
What about the other 7000 genes involved in disease, don’t we want to know data there too?
Veritas gave a talk with very intriguing data. Two findings from their data set so far were reported. One was population frequency on the ACMG59, and the other nugget of information was population frequency for all other genes associated with disease. Not revealed were the core ratios used for the variant calls, but keep in mind that for the top 20 genes the ratio is 40% VUS, 34% path and 26% benign (see prior post for more detail). Veritas reported that close to 20% of “healthy” individuals were harboring at least one pathogenic variant in the ACMG59 and 90% of individuals were harboring at least one pathogenic variant in the 7000 other rare disease genes. Basically, 9 out of 10 people are carriers of something not so good, and some of them may have a ticking time bomb of an autosomal dominant driver hiding in their genome. This ratio may go higher if many of the VUS convert to pathogenic.
What will happen once we convert the remaining VUS into pathogenic assignment?
Estimates on the conversion of VUS to pathogenic vary widely. Julie Eggington from the Center for Genomic Interpretation suggests only 1% of VUS will convert to path. Others have estimated it may be as high as 50%. Two lines of evidence lead me to think it is somewhere in between. The first line of evidence comes from a prior blog post. In that post, I describe how the data in ClinVar has 23% of all variants as pathogenic (P + LP). When we zero in on a subset of the data by looking at the most highly studied genes (the “top 20” with the most data from a diverse set of submitters), the path fraction jumps to 34%. So if the highly studied genes are an indicator, a 10-fold jump in variants analyzed per gene is expected to reveal about 30% of newly uncovered variants in the lesser-studied genes as pathogenic.
The next line of evidence comes from a functional study using saturation mutagenesis. A recent study from the Brotman Baty Institute for Precision Medicine and the Department of Genome Sciences at the University of Washington illustrates how Jay Shendure’s group used a sensitized monoclonal LIG4-knockout HAP1 line to assess variants in BRCA1 for influence on cell survival. Their CRISPR-based approach examined 96.7% of all variants in BRCA1 for a total of 3,893 SNVs examined. The authors note that 24.9% of these SNVs are in ClinVar as either benign or pathogenic without any conflicting interpretations. Yet, if we open it up a bit and ask for the ratio regardless of conflicts, we get 40% path, 27% benign and 33% VUS, which means 67% are leaning towards definitive assignment. Still, even for BRCA1, at least 1/3 of the identified variants are left in the ambiguity land of a VUS assessment. To provide more clarity to VUS-assigned alleles, the authors’ functional studies were able to convert 25% of VUS to a pathogenic category (the “non-functional” class in their study). Intriguingly, they were able to convert 49.2% of SNVs with conflicting interpretations into clearly pathogenic “non-functional” assessments.
As a result, application of functional studies will have the potential to boost diagnostic yields by close to 30%.
Higher yields from functional studies, combined with AI-enhanced patient history, will likely give us a further boost to diagnostic yield and, hopefully, more than 50% of the time, genetic causality and/or contribution will be identified in a patient. If this can be achieved, then doctors might start to say about genetic testing: “yes, I find it quite useful.”
How to better understand Variants of Uncertain Significance in epilepsy and help find new therapeutic approaches
There is an amazing statistic out there on epilepsy:
1 in 26 persons will experience epilepsy at some point in their lifetime.
It is likely many of you have experienced epilepsy or know someone who has. My experience was with my son. When my boy Alemeyahu was just about to turn 2 years old, I had him outside on a beautiful sunny day. He was bouncing on my knee, giggling away, when suddenly:
His head tilted back, eyes rolled up in his head. He went limp and stopped breathing.
Panicking I laid him down on the ground and yelled for help. Then…. I lifted his chin and started mouth to mouth.
Diagnostic Rate in Epilepsy
With the high rate of epilepsy in the population, there is a bit of a puzzle out there. How is it that the diagnostic rate of genome sequencing is so low? A recent retrospective study looked at 8565 epilepsy patients who underwent a genetic sequencing diagnostic test. The ability to diagnose a genetic cause of their epilepsy was only 15%. What of the remaining 85%? Much of that remains dark matter of uncertainty.
Dark Matter Problem – what is the cause?
What is the major driver of the low diagnostic rate with a DNA sequencing test? When we look at variants occurring in the genome, there are tens of thousands of variants in the coding sequences of important genes in each person’s genome. Many of these genetic differences are benign, but some variations may be pathogenic drivers of disease.
There are over 100 genes that are involved in causing epilepsy. So the clinician/geneticist has their work cut out for them in trying to decide which variants they can dismiss and which ones should be of concern. Many of the variants are challenging to dismiss, which can leave us with a bit of concern.
80,000,000 Variant Possibilities
There are roughly 80 million places to pick up an amino acid change (average gene size: 600 bp x number of disease genes: 7000 x amino acids: 20). Granted, maybe only 10% may actually be in places of concern, but that is still a big number. And we have only barely scratched the surface of this number. The ClinVar database (https://clinvarminer.genetics.utah.edu) can be mined for the number of variants known to be pathogenic or benign. As of the end of December 2018, the number of pathogenic variants (P + LP) is over 115,000. The number of benign (B + LB) is over 168,000. Out of the 80 million, that is only 0.35% that are known to be either pathogenic or benign; the remaining 99+% is a form of genomic dark matter! A portion of this dark matter has been seen in patients, but scientists have looked at these variants and just can’t decide. The ones we cannot decide on are called Variants of Uncertain Significance, or VUS. The ratio of VUS to path to benign is…
44% vus : 23% path : 34% benign
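The back-of-the-envelope numbers above can be reproduced in a few lines. This is only a sketch that takes the post's three factors and ClinVar counts at face value; the variable names are made up for illustration, and the ClinVar counts are the "over 115,000" and "over 168,000" lower bounds quoted in the text.

```python
# Factors exactly as quoted in the post.
AVG_GENE_SIZE = 600      # the post's "average gene size" factor
DISEASE_GENES = 7_000    # number of disease genes
AMINO_ACIDS = 20         # possible amino acids at each position

variant_space = AVG_GENE_SIZE * DISEASE_GENES * AMINO_ACIDS
ROUNDED_SPACE = 80_000_000   # the post rounds the product down to "roughly 80 million"

# ClinVar counts quoted for end of December 2018 (lower bounds).
PATHOGENIC = 115_000     # P + LP
BENIGN = 168_000         # B + LB


def classified_fraction(space: int) -> float:
    """Share of the theoretical variant space with a pathogenic or benign call."""
    return (PATHOGENIC + BENIGN) / space


print(f"theoretical variant space: {variant_space:,}")
print(f"classified, using the rounded 80 million: {classified_fraction(ROUNDED_SPACE):.2%}")
print(f"classified, using the full product: {classified_fraction(variant_space):.2%}")
```

Either way, well under 1% of the theoretical space has a definitive call, which is the point of the "dark matter" comparison.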
Is low pathogenicity of genome to be expected?
When we look at variant calls for highly studied genes, the answer is no. ClinVar data was examined for the top 20 genes with the greatest number of submitters supplying variant data. On average, 51 groups per gene submitted sets of variants. Total variant assessments submitted were 72,471, at an average of about 3,600 calls per gene. Taking every amino acid position in this top 20, we have a theoretical variant space of 1,411,833. As a result, 5% of the theoretical space has been covered. Yet the amount of pathogenicity is rather steady (in fact it has gone up a bit!)
In the 5% coverage, 34% were pathogenic + likely-pathogenic. A somewhat smaller share were benign + likely-benign. And in what may seem to be a bit of a surprise, the variants of uncertain significance are still high at 40% of the entries. So pathogenicity is trending at about a third of examined rare variants and VUS is trending at close to 40%.
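For readers who want to check the coverage figure, here is a minimal sketch using only the totals quoted above (72,471 assessments, 20 genes, a theoretical space of 1,411,833). The variable names are illustrative.

```python
TOTAL_CALLS = 72_471            # variant assessments across the top 20 genes
GENES = 20
THEORETICAL_SPACE = 1_411_833   # every amino acid position x possible changes, per the post

calls_per_gene = TOTAL_CALLS / GENES
coverage = TOTAL_CALLS / THEORETICAL_SPACE

print(f"average calls per gene: {calls_per_gene:,.0f}")
print(f"share of theoretical space covered: {coverage:.1%}")
```

Even for the most heavily studied genes, only about one in twenty theoretical variants has been assessed at all.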
What are the data trends over time?
Couple the large problem of Variants of Uncertain Significance with data trends and we have an alarming phenomenon looming upslope of us. As previously posted, ClinVar data was extracted for the rate of new submissions since 2013. We see the number of new submissions starting to accelerate in just the last few years.
An avalanche of uncertainty is coming and we appear to be ill prepared to deal with it.
It is just a febrile seizure episode….
Alemeyahu was still not breathing. I had just put about 5 breaths of air into his lungs. I checked his pulse – thankfully his heart is still beating, ….but he is still not breathing! I put more air into his lungs. Then finally, a weak breath. Then another and another – he was finally breathing again.
I called the paramedics and by the time they got there, Alemeyahu was back to being a normal 2 year old – trying to crawl over the downstairs barricade, trying to put thumb size rocks in his mouth, you know, the usual 2 year old stuff!!!
We got him to the hospital and they ran all kinds of tests. They couldn’t find anything wrong. They diagnosed it as a Febrile Seizure Episode and said, “Bring him back if it repeats.” Well, ever since, there has been no repeat episode, but other people with chronic epilepsy sometimes have to deal with repeats on a daily basis. Sometimes even hour to hour, and many of the current epilepsy drugs don’t work well for 1/3rd of all epilepsy cases. So, we must find a way to help people with epilepsy achieve better control of their seizure symptoms.
Humanized animal models may be one key system to help us understand variant biology and find new therapeutic options in epilepsy.
Our team successfully inserted the human coding sequence of STXBP1 into the native locus of the C. elegans ortholog gene. We call the technique a “Gene-Swap” because the worm’s version of the gene is replaced with a human coding sequence. The Gene-Swapped STXBP1 sequence functioned and gave a high level of rescue activity (the “Humanized” strain). A knock-out was made by precisely deleting the unc-18 locus (ortholog of STXBP1). In the process, all coding sequence of unc-18 is removed and a full loss-of-function deletion allele is generated. For the gene variant, we inserted p.Arg406His into the humanized locus.
The gene-swap humanized strain expressing hSTXBP1 shows a significant level of activity (see video above). In contrast, the knock-out shows very little activity. The R406H variant’s activity is somewhere in between the “humanized” and “knock-out” strains.
Behavior quantified using three assays
R406H’s deviant behavior was quantified in a microfluidics system that measures an EEG-like electrical activity of the animal. We added two other assays on this variant: a microtracker test for thrashing in liquid and a chemotaxis test for speed of navigation to a food source. We also added two other reference alleles that are known pathogenic variants of STXBP1 (R292H and R388X). The three variants in red can be compared to the wild-type “gene-swap” humanized line in blue. In the ephys assays, we can see that two variants, R292H and R406H, have an increased rate of neurotransmission. In contrast, the R388X allele has a lower rate of transmission. Adding in the thrashing assay for movement in liquid, we see that R292H has lost its deviant behavior, while R406H and R388X have retained a detectable level of altered function. A final assay was performed that uses a sensitive chemotaxis behavioral readout. This assay measures the ability of the animal to reach a food source in 1 hr. All variants show altered function, and only the R292H allele shows a residual level of activity.
So we can see that a series of assays provides a measure of pathogenic variant biology and gives a glimpse into the mechanism of disease.
Speed is an important aspect to measuring variant biology. Our team did an internal contest. We wanted to see how fast we could install a variant and measure activity. Our team did it and provided an assessment in just under 10 days.
The variant profiling system has three key values:
First, because the variant profiling system uses a gene-swapped animal expressing a human gene, any variant can be installed and profiled for functional defects.
Second, a series of assays can be deployed to detect the nuanced differences in variant pathogenicity.
Third, the humanized animal models can be used in high-throughput screens as discovery systems for finding personalized therapeutics.
The use of iPSC cells is quickly gaining momentum as a tool of personalized medicine
Whether you call them stem-cell-like cells with unique expression signatures (Chin 2009), or stick with the seminal publication and call them induced Pluripotent Stem Cells (Takahashi 2007), iPSC technology is becoming an impressive system for helping understand genome-encoded disease biology. We can expect iPSCs to have a major impact in regenerative medicine, disease modeling, and drug development.
iPSC technology is a relatively new technique for variant profiling.
Since its inception with a demonstration in mice (Takahashi 2006), progress has occurred rapidly. Original systems were developed using retroviral integration tools, which carry a high risk of chromosomal instability and tumorigenesis. Then, in the last 10 years, significant progress has been made to improve the process by making the technique non-integrative and more efficient (Omole and Fakoya 2018). Genomic integration methods using viruses have traditionally yielded the highest efficiency, but non-integrative methods are quickly catching up. The original methods used a standard set of gene expression modifiers: Oct3/4, Sox2, Klf4, and c-Myc, typically referred to as the “OSKM cocktail.” These are encoded in viruses and plasmids that are delivered into the cell and integrate into the genome. This one-time procedure contrasts with non-integrating methods, which require repeat transfections to boost cells towards pluripotency (Warren 2010). Once induced into the pluripotent state, the cell can be modified (CRISPR-based gene editing or similar methods) to establish control lines with and without the variant in question. Finally, cells are differentiated into the desired tissue by exposure to appropriate tissue-specification factors. With this capacity to analyze patient-derived tissue, you can feel fairly certain the biological differences seen between control and variant will translate well to the patient’s specific biology.
Speed to variant biology data requires 2 to 5 months
Relative to rodent animal models, the system is fast. Most human cell cultures can be enticed to achieve the proper pluripotent expression profile after about 4 weeks of growth, and then another 2 months or more is needed to induce the desired tissue type. For instance, in a popular method, DeRosa et al. used blood-derived, Sendai-virus-transformed iPSCs to derive cortical neurons and test them for biological consequence with various functional assays (DeRosa 2018). Biomarkers for transformation into iPSCs were confirmed by cell staining with antibodies for NANOG, Oct3/4, and SOX2. Next, cells were differentiated to the desired cortical neuron tissue type by exposure to relevant co-factors and monitored for sequential biomarker production (Nestin to DCX to CAMK2A to TBR1) and finally, at 90 days, the last set of biomarkers was used (MAP2 and SYNAPSIN1). As a result, the creation of iPSCs and their differentiation into appropriate tissue types takes about 3 months or more, depending on the desired cell type (McKinney 2017 and DeRosa 2018).
Low efficiency problem of easily sourced iPSCs is showing signs of dramatic improvement
Low efficiency in the creation of iPSCs has been the main drawback preventing routine use as a clinical diagnostic. Efficiency is measured as the number of iPSCs obtained divided by the number of input cells prior to starting transformation. Efficiencies vary widely, with multiple reports ranging from 0.001% to 0.1% (Malik and Rao 2013). The measured efficiency is heavily influenced by choice of starting material. Fibroblasts show some of the highest efficiencies but typically require a biopsy plug from the skin to enable isolation of sufficient amounts of starting cells. Yet researchers are hard at work on conditions that improve efficiency and on finding easier-to-source material. Two years back, a report showed conversion of fibroblasts had improved to an efficiency near 3% (Pomeroy 2016). Sourcing iPSCs from blood (PBMCs) is more convenient for the patient, but efficiency remains low at 0.15–0.32% (Zhou 2015). Generation of iPSCs from PBMCs is problematic because they are non-adherent. For human cells, the lack of adherence is especially problematic because it leads to high activation of cell death through apoptosis. Researchers are finding that the use of adhesion-promoting matrices (Geltrex and rhLaminin-521) and ROCK inhibitors, which suppress apoptosis, can greatly improve the efficiency of making iPSCs from blood sources (Ye 2018). Additional recent developments are highly encouraging. With nearly nonexistent invasiveness to the patient, researchers have demonstrated that iPSCs can be sourced from patient urine (Gaignerie 2018). Further, in a very impressive efficiency feat, a meticulous study of fibroblast cell conversion was performed by sampling a wide variety of optimizing conditions (Kogut 2018). Using mRNA-encoded transformation factors with a select set of microRNA inhibitors of transcription, the Kogut team demonstrated an amazing 80% efficiency in fibroblast conversion to iPSC.
Additionally, they found they could isolate single cells and get conversion to iPSC in 90% of the isolates. Intriguingly, in an almost counter-intuitive finding, efficiency of fibroblast conversion started to drop dramatically when more than 1000 source cells were present at the starting conditions. Finally, their speed of conversion from source cells to iPSCs was only 15 days. In conclusion, it appears iPSC methodologies are breaking through the efficiency barrier, and soon the use of patient-derived iPSCs will become part of routine clinical diagnostic procedures.
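To make the efficiency definition concrete, here is a minimal sketch. The function name and the 100,000-cell starting population are hypothetical choices for illustration; the efficiencies are the figures quoted in the text above, taking the upper end wherever a range was given.

```python
def reprogramming_efficiency(ipsc_colonies: int, input_cells: int) -> float:
    """Fraction of input cells that gave rise to an iPSC colony."""
    if input_cells <= 0:
        raise ValueError("input_cells must be positive")
    return ipsc_colonies / input_cells


# Efficiencies quoted in the text, expressed as fractions.
reported = {
    "typical reports, upper end (Malik and Rao 2013)": 0.001,
    "PBMC, upper end (Zhou 2015)": 0.0032,
    "fibroblast (Pomeroy 2016)": 0.03,
    "fibroblast, RNA-based (Kogut 2018)": 0.80,
}

INPUT_CELLS = 100_000  # hypothetical starting population, for comparison only

for source, efficiency in reported.items():
    expected = efficiency * INPUT_CELLS
    print(f"{source}: {efficiency:.2%} -> ~{expected:,.0f} colonies per {INPUT_CELLS:,} cells")
```

Seen this way, the jump from a fraction of a percent to 80% is the difference between needing a skin biopsy's worth of cells and being able to start from very small samples.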
Transplantation of iPSCs as therapeutics is challenged by tumorigenicity issues.
The three main applications of iPSCs are regenerative medicine, disease modeling, and drug development, yet transplantation for regenerative medicine has issues of tumorigenesis to overcome (Focosi 2018). The original work by Takahashi was plagued by significant tumorigenicity because the method uses genome-integrating viral vectors with transformation factors known for their tumorigenic potential: mouse iPSCs derived with the method resulted in tumors about 20% of the time when the cells were reintroduced into mice (Omole and Fakoya 2018). Switching to a transformation cocktail that avoids the use of c-Myc (OSKM to OSNL) reduced tumor formation rates. Yet the alternative transformation factors do not eliminate tumorigenesis, possibly due to genomic integration causing insertional mutagenesis. To reduce tumorigenicity even further, researchers have been switching to non-integrative methods. Yet even then, a low level of tumorigenic capacity remains, apparently inherent to the pluripotent status of iPSCs, which can become many different types of tissue, including cancerous ones. As a result, when differentiating iPSCs into tissue for reintroduction into the patient as treatment, the FDA remains concerned about the tumorigenic potential of any cells that remain in the iPSC state. Current thoughts are that a 10-fold passage of a differentiated cell population may effectively eliminate the tumorigenic risk, yet this need for a high passage number has tempered the enthusiasm for use of iPSCs in regenerative medicine applications.
Polygenic Consequence – iPSCs from patient tissues have the unique advantage that the multiple “Risk Factor” variations of the patient background are retained.
One big advantage to using patient-derived tissue is that the genetic background of the test system is exactly as occurs in the patient. For instance, DeRosa and team studied autism-related variants using patient-derived iPSCs differentiated into neuronal tissues (DeRosa 2018). They looked at 6 patients with variants in target genes suspected of involvement in Autism Spectrum Disorder. Genomic data was provided on 5 of the patient conditions (see Table below). All suspect variants were observed as heterozygotes. As a result, clear pathogenicity of any variant is lacking when referenced against existing database sources. Nevertheless, they saw clear phenotypic consequence in all 6 cell lines examined in their studies (RNA-seq, multi-electrode array recordings, spontaneous calcium transients and scratch recovery assays).
In what might be a highly recommended next step, the DeRosa authors could use CRISPR on their iPSC lines to make isogenic controls. For instance, in cell line 377110, the VPS13B variation is a prime candidate for using CRISPR-Cas9 gene editing techniques to revert the rs28940272 locus back to Asparagine. If this isogenic control behaved as wild type in all the phenotyping assays deployed, then the authors would have generated a definitive demonstration that the Asn2993Ser variant in VPS13B is pathogenic.
Modeling functional defects in autism variants could be done in C. elegans (see Table below). 5 of the 8 genes in the DeRosa work have homology that exceeds 42%. The PRICKLE1 gene has sufficient similarity. Either its Val57Phe or Glu185Ter variation could be installed into a PRICKLE1-humanized animal model. The Glu185Ter would very likely model a loss-of-function allele effect, but Val57Phe is a missense variant that could go either way. The change of valine to phenylalanine may disrupt by gain-of-function, or it may cause a loss-of-function, leading to quite different phenotypes in functional assays. Only testing directly in a model system will give the needed definition of mechanism of action.
Modeling epilepsy with iPSCs requires 135 day preparation protocol
iPSC cell technology will likely become a widespread tool of personalized medicine and it holds significant promise for neuronal regeneration (Wu 2018). Although it suffers from challenges of asynchronicity, tumorigenicity, timeliness, and low efficiency (Vitrac 2018, Omole and Fakoya 2018), much progress has been made, as attested with this article’s focus on the DeRosa paper.
To list aspects of concern and advantage with iPSC tech, we have:
Tumorigenicity and Immunogenicity. This is an especially important concern in regard to regenerative medicine uses of iPSCs. Perhaps the most promising approaches involve the use of various RNA molecules to provide the requisite reprogramming factors with minimal potential for tumorigenicity and immunogenicity.
Clinical Diagnostic. Sendai virus tech seems to be one of the most promising methods. Stability of the reprogramming factors is high, so repetitive redosing is not as problematic. Also, sourcing from easy-to-acquire primary culture (blood, urine, and saliva) is promising for minimizing the procedure’s invasiveness to the patient.
Isogenic Control. CRISPR-Cas9 tech can be used to return a suspect variant back to wild type. Functional studies on the variant strain and its isogenic control will determine if the variant is pathogenic or benign.
Chin MH et al. Induced pluripotent stem cells and embryonic stem cells are distinguished by gene expression signatures. Cell Stem Cell. 2009 Jul 2;5(1):111-23. doi: 10.1016/j.stem.2009.06.008. https://www.ncbi.nlm.nih.gov/pubmed/19570518
DeRosa BA et al. Convergent Pathways in Idiopathic Autism Revealed by Time Course Transcriptomic Analysis of Patient-Derived Neurons. Sci Rep. 2018 May 30;8(1):8423. doi: 10.1038/s41598-018-26495-1. https://www.ncbi.nlm.nih.gov/pubmed/29849033
El Hokayem J et al. Blood Derived Induced Pluripotent Stem Cells (iPSCs): Benefits, Challenges and the Road Ahead. J Alzheimers Dis Parkinsonism. 2016 Oct;6(5). pii: 275. doi: 10.4172/2161-0460.1000275. Epub 2016 Oct 25. https://www.ncbi.nlm.nih.gov/pubmed/27882265
Focosi D and Amabile G. Induced Pluripotent Stem Cell-Derived Red Blood Cells and Platelet Concentrates: From Bench to Bedside. Cells. 2017 Dec 27;7(1). pii: E2. doi: 10.3390/cells7010002. https://www.ncbi.nlm.nih.gov/pubmed/29280988
Gaignerie A, Lefort N, Rousselle M, Forest-Choquet V, Flippe L, Francois-Campion V, Girardeau A, Caillaud A, Chariau C, Francheteau Q, Derevier A, Chaubron F, Knöbel S, Gaborit N, Si-Tayeb K, David L. Urine-derived cells provide a readily accessible cell type for feeder-free mRNA reprogramming. Sci Rep. 2018 Sep 25;8(1):14363. doi: 10.1038/s41598-018-32645-2. https://www.ncbi.nlm.nih.gov/pubmed/30254308
Pomeroy JE, Hough SR, Davidson KC, Quaas AM, Rees JA, Pera MF. Stem Cell Surface Marker Expression Defines Late Stages of Reprogramming to Pluripotency in Human Fibroblasts. Stem Cells Transl Med. 2016 Jul;5(7):870-82. doi: 10.5966/sctm.2015-0250. Epub 2016 May 9. https://www.ncbi.nlm.nih.gov/pubmed/27160704
Kogut I et al. High-efficiency RNA-based reprogramming of human primary fibroblasts. Nat Commun. 2018 Feb 21;9(1):745. doi: 10.1038/s41467-018-03190-3. https://www.ncbi.nlm.nih.gov/pubmed/29467427
Omole AE and Fakoya AOJ. Ten years of progress and promise of induced pluripotent stem cells: historical origins, characteristics, mechanisms, limitations, and potential applications. PeerJ. 2018 May 11;6:e4370. doi: 10.7717/peerj.4370. eCollection 2018. https://www.ncbi.nlm.nih.gov/pubmed/29770269
Takahashi K and Yamanaka S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell. 2006 Aug 25;126(4):663-76. Epub 2006 Aug 10. https://www.ncbi.nlm.nih.gov/pubmed/16904174
Vitrac A and Cloëz-Tayarani I. Induced pluripotent stem cells as a tool to study brain circuits in autism-related disorders. Stem Cell Res Ther. 2018 Aug 23;9(1):226. doi: 10.1186/s13287-018-0966-2. https://www.ncbi.nlm.nih.gov/pubmed/30139379
Warren L et al. Highly efficient reprogramming to pluripotency and directed differentiation of human cells with synthetic modified mRNA. Cell Stem Cell. 2010 Nov 5;7(5):618-30. doi: 10.1016/j.stem.2010.08.012. Epub 2010 Sep 30. https://www.ncbi.nlm.nih.gov/pubmed/20888316
Wu S et al. On the Viability and Potential Value of Stem Cells for Repair and Treatment of Central Neurotrauma: Overview and Speculations. Front Neurol. 2018 Aug 13;9:602. doi: 10.3389/fneur.2018.00602. eCollection 2018. https://www.ncbi.nlm.nih.gov/pubmed/30150968
Zhou H, Martinez H, Sun B, Li A, Zimmer M, Katsanis N, Davis EE, Kurtzberg J, Lipnick S, Noggle S, Rao M, Chang S. Rapid and Efficient Generation of Transgene-Free iPSC from a Small Volume of Cryopreserved Blood. Stem Cell Rev. 2015 Aug;11(4):652-65. doi: 10.1007/s12015-015-9586-8. https://www.ncbi.nlm.nih.gov/pubmed/25951995
The Child Neurology Society honored William (Bill) Dobyns for his highly impactful efforts in characterizing child neurology. Looking back on a prolific and highly influential career, Bill talked about how the key influencers in his life shaped the directions he pursued. Reinforcing the title of his talk, “The Names of Things,” Bill was the first to discover and name the LIS1 gene of lissencephaly (Dobyns 1993).
What’s in a name?
A spicy little comment was made by Bill. He let it be known he did not appreciate the HUGO Nomenclature Committee’s decision to rename LIS1 to PAFAH1B1. It is definitely a mouthful, and the change does not seem necessary, since LIS1 appears to cause no confusion in PubMed literature searches.
To worm it or not – Can comparative biology help?
It appears gene humanization in the nematode may be of use in characterizing lissencephaly genes. A quasi-random sampling of 9 genes mentioned in Bill’s presentation was obtained (all the ones I wrote down in my notes!). Next, a table of properties was made:
We first ask: What is the homology to C. elegans? 8 of 9 genes are identified as homologs. All but AUTS2 are found to have a worm gene equivalent, as scored for similarity percentage using the DIOPT website. Next, we ask: Which of these homologs have sufficient similarity for application of a gene-swap technique? Using inspiration from Douglas Adams for a rather arbitrary cutoff of 42, we get 6 of 8 homologs with enough similarity that gene substitution with human cDNA has a chance of retaining function and rescuing a null phenotype. We are currently 3 for 3 in genes tried for gene-swap in epilepsy (STXBP1, KCNQ2, CACNB4): all can function and rescue their respective C. elegans null. Finally, we ask: Is there something to rescue in the Dobyns gene list? 4 of the 6 good homologs are found to be lethal (LIS1, MTOR, ZIC1 and PIK3R2) when removed by either genetic deletion or RNAi knock-down. The remaining two genes (ATK2 and PC2CA), although not appearing to be lethal, have detectable phenotypic consequence when their homolog is absent in the nematode. Bottom line, 6 of 9 have good prospects for using comparative biology to answer variant pathogenicity questions.
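The three questions above amount to a simple filter. The sketch below shows only the triage logic; the gene names and similarity scores are placeholders invented to exercise the filter and are not the values from the table or from DIOPT. Treating the cutoff of 42 as "at or above" is also an assumption.

```python
from typing import NamedTuple, Optional

SIMILARITY_CUTOFF = 42.0  # percent; the post's admittedly arbitrary threshold


class Candidate(NamedTuple):
    human_gene: str
    worm_similarity: Optional[float]  # None means no worm homolog was found
    null_phenotype: Optional[str]     # None means nothing detectable to rescue


def gene_swap_candidates(genes):
    """Keep genes with a homolog, enough similarity, and a null phenotype to rescue."""
    return [
        g for g in genes
        if g.worm_similarity is not None
        and g.worm_similarity >= SIMILARITY_CUTOFF
        and g.null_phenotype is not None
    ]


# Placeholder genes and scores, invented purely for illustration.
example = [
    Candidate("GENE_A", 65.0, "lethal"),
    Candidate("GENE_B", 30.0, "lethal"),           # homolog exists, but too divergent
    Candidate("GENE_C", None, None),               # no worm homolog
    Candidate("GENE_D", 48.0, "movement defect"),  # non-lethal but still scoreable
]

for gene in gene_swap_candidates(example):
    print(gene.human_gene, gene.worm_similarity, gene.null_phenotype)
```

A lethal null is the easiest phenotype to rescue against, but as noted above, any detectable phenotype in the null gives the humanized strain something to be measured against.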
Symposium IV: Precision Medicine – Epilepsy, The Next Frontier
This session was of high interest to the geneticist wanting to design high-throughput animal models for use in rare disease discovery. Jeffrey Loeb opened with a “Big Data” discussion on linking electrode implantation in the brain with various properties of tissue biopsies (genomic, transcriptomic, metabolomic, and histological data). It was a fascinating and bewildering look at the 1700 genes that are associated with bursting/spiking phenomena in the brain. Next was Madhuri Hegde, who took us on a tour of the latest and greatest in whole genome analysis: basically, what is happening in the 5342 OMIM-identified disease genes. Much of the answer is that we do not know yet, because there is an ever-expanding universe of Variants of Uncertain Significance (VUS), fueled by the fact that each person averages 3,000,000 differences in sequence when compared to the reference human genome (Hegde 2017). With 1% of the genome as coding, this suggests about 30,000 variations occur in exonic coding sequence. The amount of this coding sequence variation that is known to be pathogenic is much less. Filtering out high-frequency alleles and checking against databases where the variant call is either pathogenic or likely-pathogenic, we get an average of 2.1 pathogenic/likely-pathogenic variants per person (Rego 2018). The same data reveal that each person can expect another 17.6 variant alleles that are either VUS or unassigned in status.
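The back-of-envelope numbers above can be written out as a quick calculation. This is only a sketch using the figures quoted in this post; the function and variable names are my own, and it assumes variants are spread evenly across the genome, which is a simplification.

```python
# Figures quoted in the post; the names below are illustrative, not from any library.
TOTAL_VARIANTS_PER_PERSON = 3_000_000  # differences vs. the reference genome (Hegde 2017)
CODING_FRACTION = 0.01                 # roughly 1% of the genome is coding
PLP_PER_PERSON = 2.1                   # pathogenic / likely-pathogenic (Rego 2018)
VUS_OR_UNASSIGNED_PER_PERSON = 17.6    # VUS or unassigned (Rego 2018)


def expected_coding_variants(total_variants: int, coding_fraction: float) -> float:
    """Expected coding variants, ASSUMING an even spread of variants over the genome."""
    return total_variants * coding_fraction


coding_variants = expected_coding_variants(TOTAL_VARIANTS_PER_PERSON, CODING_FRACTION)
flagged = PLP_PER_PERSON + VUS_OR_UNASSIGNED_PER_PERSON

print(f"Expected coding variants per person: {coding_variants:,.0f}")
print(f"P/LP plus VUS/unassigned per person: {flagged:.1f}")
```

The gap between the two printed numbers is the point: only a handful of the coding variants in any one person carry a pathogenic, likely-pathogenic or VUS label.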
Demonstration of rapid pathogenicity assessment of a VUS allele.
The next presentation was a great tag-team duo of Alfred George and John Millichap speaking on the rapid pathogenicity assessment of a VUS in the KCNQ2 gene. Its title, “Emerging Paradigms for Precision Medicine in Early Life Epilepsy: Bed-to-Bench Workflow,” exemplified where genomic biology is now starting to focus. They showed data for a three-week turnaround in providing functional data on a patient variant. Their data demonstrated that a newly discovered variant in KCNQ2 behaves as a loss-of-function allele. In a very short time, going from genomic data to clinically actionable results, the patient had a genetic mechanism revealed for their illness and drug treatment options became clearer. This same variant, by the way, was assessed as a VUS three months later, when the genomics lab finished applying the traditional ACMG-AMP guideline assessment criteria. Clearly, functional studies deployed quickly after genomic data acquisition can be quite impactful in helping the clinician and their patient find treatment choices.
Time to Get Organized
Finally, wrapping up what proved to be a lively lecture series and a very long Q&A afterward, Anne Berg gave us a plea for coming together as a community and doing what has been done in cancer, and with Spinraza for SMA: find ways to work together, get DNA diagnostics done as quickly as possible, and then screen variations for functional consequence. There is a big need for an organized effort in epilepsy. Early intervention in neurological diseases has the capacity to create a dramatic improvement in outcome. Anne provided us some data to help understand how big the problem is:
The size and coordination of effort in cancer has recently been giving us miraculous gains. For some types of cancer, what was nearly a death sentence 10-20 years ago is now, in many of today’s oncology practices, a responsive and well-controlled disease. Anne feels it is time for a profound shift in the way epilepsy is approached and treated. We need to apply the same rigor and interactivity that drove the success of cancer biology. When it comes to pediatric populations, cancer is less common than epilepsy. There are 29,000 new pediatric epilepsy cases each year, while pediatric cancer cases occur at 13,000 per year, less than half that rate. The complexity of disease types in epilepsy and cancer is of similar size, so systematically fostering the level of researcher-clinician interaction seen in cancer biology should be highly productive for epilepsy understanding and discovery. Ultimately, the epilepsy ecosystem is ripe for new approaches that can adequately and rapidly address the complexity of epilepsy biology and uncover new therapeutic modalities.
Dobyns WB et al. Lissencephaly. A human brain malformation associated with deletion of the LIS1 gene located at chromosome 17p13. JAMA. 1993 Dec 15;270(23):2838-42. https://www.ncbi.nlm.nih.gov/pubmed/7907669
Hegde M, Santani A, Mao R, Ferreira-Gonzalez A, Weck KE and Voelkerding KV. Development and Validation of Clinical Whole-Exome and Whole-Genome Sequencing for Detection of Germline Variants in Inherited Disease. Arch Pathol Lab Med. 2017 Jun;141(6):798-805. doi: 10.5858/arpa.2016-0622-RA. Epub 2017 Mar 31. https://www.ncbi.nlm.nih.gov/pubmed/28362156
Rego S, Dagan-Rosenfeld O, Zhou W, Sailani MR, Limcaoco P, Colbert E, Avina M, Wheeler J, Craig C, Salins D, Rost HL, Dunn J, McLaughlin T, Steinmetz LM, Bernstein JA and Snyder MP. High Frequency Actionable Pathogenic Exome Mutations in an Average-Risk Cohort. bioRxiv 151225; doi: https://doi.org/10.1101/151225
“You have evolved from worm to man, but much within you is still worm.”
Genetic diversity in individuals and between species is responsible for the bewildering variability and biological niche adaptation of life, yet many of the essential genes involved in disease presentation are highly conserved from yeast to humans. For instance, a direct comparison of the C. elegans genome to that of H. sapiens reveals only 44% of the genes are similar (Shaye and Greenwald 2011). Yet when one restricts the comparison to the 6460 genes known to be associated with genetic disease (about one third of the genes in the human genome), clear similarity (orthology) to C. elegans occurs for 79% of the human disease genes (ClinVar database). This high degree of interspecies conservation between worm and human has recently become more recognized and appreciated for use in understanding disease biology (Golden 2017, Wang 2017, Wangler 2017, Apfeld and Alpers 2018).
The Undiagnosed Diseases Network (UDN) has been making significant strides in rare disease research and places an emphasis on developing animal models as tools for variant biology discovery (Splinter 2018). Tim Schedl at Washington University in St. Louis is a recent addition to the UDN. Tim will run the C. elegans nematode Model Organisms Screening Center (MOSC). The worm is proving to be a useful tool for uncovering disease-related biology (Bend 2016, Wangler 2017, Luo 2017, Chao 2017, Oláhová 2018, Liu 2018, Guiberson 2018). Perhaps the biggest place where the nematode holds promise is its proven capacity for high-throughput screening (Leung 2013, Rangaraju 2015, O’Reilly 2016, Lucanic 2018, Partridge 2018). The Leung publication achieved an impressive 340,000 compounds tested in 1536-well format in 5 weeks. If we can find the systems that accurately model human disease in the nematode, we will have developed a platform ripe for high-throughput discovery of new pharmaceuticals.
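To get a feel for the scale of the Leung screen, here is a small calculation of plate count and weekly throughput. It is a rough sketch: it assumes one compound per well and no wells set aside for controls, which the real screen almost certainly did not do, and the function names are just for illustration.

```python
import math


def plates_needed(n_compounds: int, wells_per_plate: int = 1536) -> int:
    """Minimum plate count, ASSUMING one compound per well and no control wells."""
    return math.ceil(n_compounds / wells_per_plate)


def compounds_per_week(n_compounds: int, weeks: float) -> float:
    """Average screening rate over the whole campaign."""
    return n_compounds / weeks


n_compounds = 340_000  # figure quoted from Leung 2013
weeks = 5

plates = plates_needed(n_compounds)
rate = compounds_per_week(n_compounds, weeks)

print(f"At least {plates} plates of 1536 wells")
print(f"About {rate:,.0f} compounds per week")
```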
Can we model clinical variant biology in the worm?
Using CRISPR and related techniques, it is now quite easy to insert DNA changes into the genome of animal models. For example, our company just hit a milestone of 2000 transgenics delivered. But the puzzle now becomes: are we doing it right? What is the best transgenic configuration to address disease biology? We can put a clinical variant in the native gene of the nematode (Figure 1A). Although there is a large amount of similarity when making gene-to-gene connections in disease genes, a question arises: is there enough similarity at the amino acid level? When one maps known pathogenic variants to their locations in the amino acid sequence, one occasionally finds that established or likely-pathogenic variants do not occur at a conserved position. For instance, in a sequence from the middle of the C. elegans homolog of STXBP1, three variants suspected of pathogenicity are shown (Figure 1B). The patient variants R406H and L426P are at conserved positions in the worm unc-18 homolog. Unfortunately, M443R does not have the identical amino acid in C. elegans. If we insert a Methionine (“M”) in place of the worm’s Alanine, will we get function? …Probably. If we put in an Arginine (“R”) instead, will the gene work? …Probably not. Yet we are left with a bit of ambiguity because the residue at that location is not identical between species. A more robust way to model human biology is to put the entire human gene into the nematode genome. To explore this concept, we have developed a technique that we call “gene-swap humanization” (Figure 2). We remove the endogenous gene of the worm and replace it with a cDNA copy of the human gene. This creates a gene-humanized animal.
We have only just started, but we have uncovered an interesting finding: gene-swap humanization appears to be better than the native locus as a platform for modeling disease variant biology. We decided to compare variant installs in the native locus to installs in the humanized locus. The main driver to try this out was sequence similarity, or more precisely, the lack of it. When one compares human to mouse for the STXBP1 gene, one sees 100% identical amino acid usage between the two species. Contrast this with a comparison of the human sequence to the worm’s unc-18 gene, where one sees only 59% identical amino acid usage. With nearly half the amino acids not being fully conserved, can we model in the native locus effectively, or….
“What will happen if we swap-in the human gene as a gene replacement of the worm gene?”
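The “is this position conserved?” question can be sketched in a few lines. The alignment below is a made-up toy, not the real STXBP1/unc-18 alignment, and the helper names are hypothetical. The toy is built so that one variant sits on a conserved residue and another lands where the worm has an Alanine, mirroring the M443R situation.

```python
import re

# TOY pairwise alignment (hypothetical sequences, '-' marks a gap).
HUMAN_ALN = "MKR-LGMT"
WORM_ALN = "MKRQLGAT"


def parse_variant(variant: str) -> tuple[str, int, str]:
    """Split a simple protein variant such as 'R406H' into (ref, position, alt)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", variant)
    if match is None:
        raise ValueError(f"Unrecognized variant format: {variant}")
    return match.group(1), int(match.group(2)), match.group(3)


def aligned_partner(human_aln: str, worm_aln: str, human_pos: int) -> tuple[str, str]:
    """Return (human residue, worm residue) at a 1-based, gap-free human position."""
    if len(human_aln) != len(worm_aln):
        raise ValueError("Aligned sequences must have the same length")
    count = 0
    for human_res, worm_res in zip(human_aln, worm_aln):
        if human_res != "-":
            count += 1
            if count == human_pos:
                return human_res, worm_res
    raise IndexError(f"Position {human_pos} is beyond the human sequence")


def is_conserved(variant: str, human_aln: str = HUMAN_ALN, worm_aln: str = WORM_ALN) -> bool:
    """True when the worm carries the same residue as the human reference at that site."""
    ref, pos, _alt = parse_variant(variant)
    human_res, worm_res = aligned_partner(human_aln, worm_aln, pos)
    if human_res != ref:
        raise ValueError(f"Reference mismatch: expected {ref}, found {human_res}")
    return human_res == worm_res


for toy_variant in ("R3H", "M6R"):
    print(toy_variant, "conserved in worm:", is_conserved(toy_variant))
```

A variant that fails this check is a candidate for the gene-swap approach rather than a native-locus install.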
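For readers wondering how a number like “59% identical” is arrived at, here is a minimal percent-identity calculation over a pairwise alignment. The sequences are toy examples, not STXBP1 or unc-18, and the denominator used here (columns with no gap in either sequence) is only one of several conventions; alignment tools differ on this, so treat the exact definition as an assumption.

```python
def percent_identity(seq_a_aln: str, seq_b_aln: str) -> float:
    """Percent of gap-free alignment columns where both sequences share a residue."""
    if len(seq_a_aln) != len(seq_b_aln):
        raise ValueError("Aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for res_a, res_b in zip(seq_a_aln, seq_b_aln):
        if res_a == "-" or res_b == "-":
            continue  # skip columns where either sequence has a gap
        aligned += 1
        if res_a == res_b:
            identical += 1
    if aligned == 0:
        raise ValueError("No gap-free columns to compare")
    return 100.0 * identical / aligned


# TOY aligned sequences (hypothetical), '-' marks a gap.
toy_human = "MSLKA-VGE"
toy_worm = "MALKTQVG-"

print(f"Toy identity: {percent_identity(toy_human, toy_worm):.1f}%")
```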
The new gene editing CRISPR technology is allowing us to address this question. Genes are now quite easy to order up as fully synthetic. The DNA sequence comes in a plasmid format that we can inject, with the right CRISPR components, directly into the gonad of the worm. The progeny of the animal will have a high propensity to have picked up the genome edit. We use PCR technology to find the edited animal and then grow up the population for testing in various activity assays (Figure 3). From concept to variant-specific animal data in less than one month.
We found two things fascinating:
1. Human gene retains function in worm.
2. Variants in human gene render the worm sensitized for pathogenicity assessment.
That second observation deserves more explanation. In Figure 3, you will see two types of phenotyping assays performed on three known pathogenic variants of STXBP1 (R292H, R406H and R388X). When we put these patient variants into the native locus at their sequence-conserved positions, we see that only R388X gives a big deviation of function. In contrast, when we put these variants in the gene-swapped hSTXBP1 locus, we see that all variants exhibit a statistically significant deviation of function in both assays. As a result, it appears the gene-swap humanized locus is a better choice for modeling clinical variants.
“What about that first observation?” Let’s not overlook that one! This shows that, for this particular protein, the physiological role is highly conserved between worm and man. She/he (worms are self-fertilizing hermaphrodites; more on that later…) is giving data that suggest modeling variants in humanized worms can be useful as a platform for variant biology discovery!
We have one more interesting finding from our humanized strain studies. The types of phenotyping assays to use on a clinical variant will depend on the unique biology of that variant. In Figure 4, we have three different activity assays. What should jump out at you is that each assay has a unique response. The R292H and R406H variants have a gain-of-function response in the electrophysiology, while R388X has lost function. In the next assay, the animal locomotion measurement, we see wild-type activity in R292H, while R406H and R388X exhibit movement deficiencies. In the last, highly sensitive chemotaxis assay, only R292H retains an ability to sniff out and go to food; the remaining two, R406H and R388X, are not capable of getting to food in the 1 hr test period of the assay. One last observation for this figure: there is high similarity in electrophysiology response for the KCNQ2 knock-out and the R292H and R406H variants in STXBP1. All three (the KCNQ2 knock-out and the two STXBP1 variants) show a gain-of-function response. This is expected if all these variants lead to M-current modulation of potassium channels. What we speculate, based on prior findings (Devaux 2017), is that R292H and R406H are defective in the chaperone function for syntaxin. They no longer prevent syntaxin from being an inhibitor of potassium channels, and effectively these STXBP1 variants have a physiological mode of action leading to potassium channel inhibition. Prediction: ezogabine will be useful in patients with hSTXBP1(p.R292H) and hSTXBP1(p.R406H) as a treatment for reducing the frequency and intensity of their epileptic seizures, but will probably be contraindicated for hSTXBP1(p.R388X) clinical variants.
The last component of this blog post is to discuss formats for using the animals for drug screening in high-throughput formats. Loss of locomotion can be a powerful screen, but what might be more sensitive is the use of genetically-encoded biosensors. The biosensor formats we favor most are the transcriptional-activation reporter systems (Figure 3). Typically these are GFP/RFP combos where one reporter is driven by a promoter that is either up-regulated or down-regulated in a particular genetic background. Loss-of-function alleles can have a myriad of gene transcriptional effects, and biosensor reporters have previously been used to dissect physiological pathway interactions in C. elegans (Urano 2002, Rea 2005, Kahn 2008).
We have used biosensors to probe drug response in nuclear hormone receptor signaling (Figure 6). We compared biosensor responses to assays of behavior. Perhaps what is most striking about the data is the sensitivity. In the dafadine assay, we see nearly 10x higher sensitivity with the biosensor-based assay. From a drug discovery standpoint, this means 10x less compound is needed in order to see an effect. For the dafachronic acid assay, the gain is even higher: 30x more sensitive than the behavioral assay. For those who do drug discovery, the application of biosensors to clinical-variant profiling will be worth the investment in biosensor construction, because cost savings will come from reduced consumption of the expensive compounds used in a drug library.
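One simple way to put a number on “10x higher sensitivity” is a fold shift in the concentration that gives a half-maximal response. The sketch below assumes that is the comparison being made, which this post does not actually state. The EC50 values are placeholders chosen only to reproduce the quoted 10x and 30x folds; they are not measured data.

```python
def fold_sensitivity(ec50_reference: float, ec50_test: float) -> float:
    """How many times lower the test assay's EC50 is compared with the reference assay."""
    if ec50_reference <= 0 or ec50_test <= 0:
        raise ValueError("EC50 values must be positive")
    return ec50_reference / ec50_test


def compound_needed(reference_amount: float, fold: float) -> float:
    """Compound required in the more sensitive assay, scaled down by the fold gain."""
    if fold <= 0:
        raise ValueError("Fold gain must be positive")
    return reference_amount / fold


# PLACEHOLDER EC50 values in arbitrary concentration units (hypothetical, not from the post).
behavior_ec50 = 10.0
biosensor_ec50 = 1.0

fold = fold_sensitivity(behavior_ec50, biosensor_ec50)
print(f"Biosensor is {fold:.0f}x more sensitive")
print(f"Compound needed per test: {compound_needed(100.0, fold):.1f} (vs. 100.0 in the behavioral assay)")
```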
Apfeld J et al. What Can We Learn About Human Disease from the Nematode C. elegans? Methods Mol Biol. 2018;1706:53-75. doi: 10.1007/978-1-4939-7471-9_4. https://www.ncbi.nlm.nih.gov/pubmed/29423793
Bend E et al. NALCN channelopathies: Distinguishing gain-of-function and loss-of-function mutations. Neurology. 2016 Sep 13;87(11):1131-9. doi: 10.1212/WNL.0000000000003095. https://www.ncbi.nlm.nih.gov/pubmed/27558372
Devaux, J. et al. A possible link between KCNQ2- and STXBP1-related encephalopathies: STXBP1 reduces the inhibitory impact of syntaxin-1A on M current. Epilepsia. 2017 Dec;58(12):2073-2084. doi: 10.1111/epi.13927. https://www.ncbi.nlm.nih.gov/pubmed/29067685
Golden A. From phenologs to silent suppressors: Identifying potential therapeutic targets for human disease. Mol Reprod Dev. 2017 Nov;84(11):1118-1132. doi: 10.1002/mrd.22880. Epub 2017 Oct 3. https://www.ncbi.nlm.nih.gov/pubmed/28834577
Kahn NW et al. Proteasomal dysfunction activates the transcription factor SKN-1 and produces a selective oxidative-stress response in Caenorhabditis elegans. Biochem J. 2008 Jan 1;409(1):205-13. https://www.ncbi.nlm.nih.gov/pubmed/17714076
Leung CK et al. An ultra high-throughput, whole-animal screen for small molecule modulators of a specific genetic pathway in Caenorhabditis elegans. PLoS One. 2013 Apr 29;8(4):e62166. doi: 10.1371/journal.pone.0062166. Print 2013. https://www.ncbi.nlm.nih.gov/pubmed/23637990
Liu, N. et al. Functional variants in TBX2 are associated with a syndromic cardiovascular and skeletal developmental disorder. Hum Mol Genet. 2018 Jul 15;27(14):2454-2465. doi: 10.1093/hmg/ddy146. https://www.ncbi.nlm.nih.gov/pubmed/29726930
Lucanic M et al. A Simple Method for High Throughput Chemical Screening in Caenorhabditis Elegans. J Vis Exp. 2018 Mar 20;(133). doi: 10.3791/56892. https://www.ncbi.nlm.nih.gov/pubmed/29630057
Luo X et al. Clinically severe CACNA1A alleles affect synaptic function and neurodegeneration differentially. PLoS Genet. 2017 Jul 24;13(7):e1006905. doi: 10.1371/journal.pgen.1006905. eCollection 2017 Jul. https://www.ncbi.nlm.nih.gov/pubmed/28742085
Oláhová, M. et al. Biallelic Mutations in ATP5F1D, which Encodes a Subunit of ATP Synthase, Cause a Metabolic Disorder. Am J Hum Genet. 2018 Mar 1;102(3):494-504. doi: 10.1016/j.ajhg.2018.01.020. https://www.ncbi.nlm.nih.gov/pubmed/29478781
O’Reilly L et al. High-Throughput, Liquid-Based Genome-Wide RNAi Screening in C. elegans. Methods Mol Biol. 2016;1470:151-62. doi: 10.1007/978-1-4939-6337-9_12. https://www.ncbi.nlm.nih.gov/pubmed/27581291
Partridge F et al. An automated high-throughput system for phenotypic screening of chemical libraries on C. elegans and parasitic nematodes. Int J Parasitol Drugs Drug Resist. 2018 Apr;8(1):8-21. doi: 10.1016/j.ijpddr.2017.11.004. https://www.ncbi.nlm.nih.gov/pubmed/29223747
Rangaraju S et al. High-throughput small-molecule screening in Caenorhabditis elegans. Methods Mol Biol. 2015;1263:139-55. doi: 10.1007/978-1-4939-2269-7_11. https://www.ncbi.nlm.nih.gov/pubmed/25618342
Rea SL et al. A stress-sensitive reporter predicts longevity in isogenic populations of Caenorhabditis elegans. Nat Genet. 2005 Aug;37(8):894-8. Epub 2005 Jul 24. https://www.ncbi.nlm.nih.gov/pubmed/16041374
Shaye DD and Greenwald I. OrthoList: a compendium of C. elegans genes with human orthologs. PLoS One. 2011;6(5):e20085. doi: 10.1371/journal.pone.0020085. Epub 2011 May 25. https://www.ncbi.nlm.nih.gov/pubmed/21647448
Splinter K et al. Effect of Genetic Diagnosis on Patients with Previously Undiagnosed Disease. N Engl J Med. 2018 Oct 10. doi: 10.1056/NEJMoa1714458. https://www.ncbi.nlm.nih.gov/pubmed/30304647
Urano F et al. A survival pathway for Caenorhabditis elegans with a blocked unfolded protein response. J Cell Biol. 2002 Aug 19;158(4):639-46. Epub 2002 Aug 19. https://www.ncbi.nlm.nih.gov/pubmed/12186849
Wang J et al. MARRVEL: Integration of Human and Model Organism Genetic Resources to Facilitate Functional Annotation of the Human Genome. Am J Hum Genet. 2017 Jun 1;100(6):843-853. doi: 10.1016/j.ajhg.2017.04.010. Epub 2017 May 11. https://www.ncbi.nlm.nih.gov/pubmed/28502612
Wangler, M. F. et al. Model Organisms Facilitate Rare Disease Diagnosis and Therapeutic Research. Genetics. 2017 Sep;207(1):9-27. doi: 10.1534/genetics.117.203067. https://www.ncbi.nlm.nih.gov/pubmed/28874452
Zhou, P. et al. Novel mutations and phenotypes of epilepsy-associated genes in epileptic encephalopathies. Genes Brain Behav. 2018 Jan 4. doi: 10.1111/gbb.12456. https://www.ncbi.nlm.nih.gov/pubmed/29314583