When you get the genomic report, you have a movement of trepidation. What will it say? ….Will it have a reveal that says you should do countermeasures immediately? ….Will it say something that you can do nothing about? The latter condition occurred for me. There were findings that had strong impact on my psyche.
Two things were called out heavy. A cancer risk of melanoma. Good thing my family, first my momma, and then my spouse, have been diligent in their liberal in the application sunscreen to the family. Once I googled and pubmed searched the MC1R(R160W) locus, I found the evidence was less than compelling for a dramatic change of lifestyle. Just keep the sunscreen coming and I will likely be fine.
The carrier result was a little more of a shocker. A good personal friend has a daughter homozygous in this gene. It was discovered in utero and they have been vigilant ever since. Their daughter is now in her teens. Doing exceptionally well and acting like any normal kid – currently enthralled with dance class and other outdoor activities. Preventative medicine done right. So getting tagged with a pathogenic in this gene is giving me mixed feelings. A mix of some worry and yet, almost pride. Even though my good friends don’t share my specific genetic lesion, it still feels very personal and connecting. Furthermore, this is one of the genes where modern genomic medicine is making great progress in understanding and treatment.
Will you too be a carrier of a pathogenic variation?
Carrier status is something all of us should expect. Veritas recently publicly disclosed at the Precision Medicine World Congress that their database has 90% of customer reports as returning with carrier status for at least one pathogenic variant. Recent discussions with Robert Green at Harvard confirm this – he showed me a large dataset that gave the number as 92% of healthy populations as being carriers for known pathogenic variants. You might think that there are a lucky few (10%) who are not carriers, but think again. The average person will have close to 3 million differences from the reference genome and this may be an underestimate. Distribute that unbiased across the genome and we have coding regions with close to 30 thousand variations. Since you have close to 20 thousand genes that means every gene has approximately 1.5 variations in it. Now lots of approximating, and does not factor in selection against bad variations. Yet in that quick calculation, the main message is every gene is likely to have a variation and some genes will have multiple variations. So the original question of how many of these are pathogenic, becomes difficult to approximate. Publications suggest we may have up about 1300 suspect variations hiding in our genome. Yet definitive variants with “known” pathogenicity is likely to be much lower in your genome.
Complicating this is issue is variable penetrance – a pathogenic variant in one family may behave with monogenic behavior in that family. While in another family, that same variation may be acting more polygenic – it needs other gene mutations to have pathos in the patient. It is behaving more like a “risk factor” for disease.
Pathogenic variant frequency in Chris Hopkins’ genome
The vagueness of my carrier status “kills” me, so I wanted to know in more. I contacted a good friend at the Rady Children’s Hospital. Dr. Matthew Bainbridge is a researcher who was a key contributor to the Rady’s renowned speed at using whole genome sequencing for rapid genetic diagnosis. Matthew introduced me to some software tools he has been developing. His company Codified Genomics has developed a variant analysis software that allows exploration of one’s genomic variants. All you need is your BAM or VCF files.
What’s that? …You don’t know what is a BAM file, …or a VCF?!!!
Dont worry, lets decode the jargon. In the clinphen journey to understand my clinical predilections, predispositions, and pathos, I found myself getting immersed into the intricacy of the end-to-end solution in genomic data acquisition and interpretation. What happens when you spit in a tube and put it in the mail? A lot of stuff! I came across an amazing guide to understanding the industry space behind genomic sequencing, the Enlightenbio Report. This help me get a tightly-focused view on the process of understanding one’s DNA.
That first box is what happens after you spit in the tube. The chemicals in the tube react with the cellular material in the spit to help stabilize it and prevent its degradation. This allows one to send the sample at room temp to the lab. On the receiving, the lab initiates a protocol to isolate the DNA that comes from the mouth epidermal cells that slough off into your spit. DNA is manipulated in such a way that it can go onto a microchip slide and set of DNA sequencing chemistry reactions are used to read out the DNA in small segments of sequence. Each of the millions of sequence segment reads is recorded as a fastq file. The fastq read segments are compared and aligned to a reference genome to make a BAM file. The BAM file alignments are processed to detect where sequence variation occurs, which is recorded as a VCF file. VCF files are analyzed by comparison to databases and assessments are made of each variant’s potential for pathogenicity. The assessment data is generally provided as a report to the clinician (or the intrepid genome wanderer such as myself). This report takes the raw data and massages it into a format for easier understanding of what is the baggage of one’s genome.
1604 suspect variations in my genome
Matthew helped me upload my VCF files into the Codified program. Next, he showed me how to wander around sifting the data by various aspect such as allele frequency, dominant and recessive status. known pathogenic genes, etc. The upload to Codified indicates I have exactly 1604 suspect variations occurring at an appreciable fraction of the reads and at positions inside, or in close proximity to, the coding sequence of my genes. These variants are suspect because they may alter protein function or levels of expression for the identified genes. If we just limit the dataset to changes that alter amino acid composition (non-synomous), we get 875 gene variations. Add back potential spicing issues, indels, and aberrant start and stop codon issues, we are back up to 1440 variants as genetic differences that are highly suspect for altering gene expression and function.
316 MIM variant hits in my genome!
What happens if we limit the entire 1604 to only those genes with recognized involvement in disease. We get 316 variants occurring in genes as recognized by the Mendelian-inheritance-in-Man (MIM) database for being disease-associated genes. When we restrict this set to coding issues only, we get 281 suspect variants.
I get clean bill of health when I get a physical exam, so can I disregard these 281 suspect variants?
One easy step is to filter for carrier only status. 111 variants are clearly identifiable as only autosomal recessive (AR). I would require two hits in each of the paired chromosome copies to have these be of concern. Since, no paired hits were detected, we can dismiss these genes as in need of my immediate concern. As a result, we are now only concerned about hits in genes with known autosomal dominant (AD) issues. These are the genes where only one bad hit is needed to render them pathogenic. Bottomline, 170 gene variants in my genome are worthy of further contemplation.
How frequent is frequent in my 170?
There is good rational to only be concerned about a hit in a gene with AD propensity, if it is rare in the population. The thinking is that if a variation is deleterious by itself (AD), it cannot be tolerated at a high level in the human population. Contrast this to the recessive (AR) variants (also called “alleles” when talking about frequency). My known AR pathogenic variant in the CFTR gene is in the human population at 0.0014 minor allele frequency (MAF). This high allelic frequency is tolerated in the human population because you need two hits in each gene copy in order to have a syndromic issue. Autosomal dominant alleles must have much lower frequency. If we cull the 170 for variations that occur at 0.00001 MAF or lower, we get 53 gene-codon-altering variations to be concerned about. Examining the list manually gave me 17 genes for which I hold varying degrees of concern, of which I list the top 10:
None are in the ACMG59
In a prior blog post, I described the list of genes for which can be included in a clinical report as a secondary findings. These are allowed in a report because these 59 genes have known actions that can be taken to mitigate their negative health effect. None of my genes of concern are in this group, so the immediate actionability is absent for my findings about the baggage in my genome. In fact, the genes I am listing as genes I am concerned about, but they actually do not significantly bother me that much. I am still alive and in good health. If I had pathogenic variations in these genes the negative health consequence, they should have manifest many years ago. Nevertheless, the three for which I hold highest concern are CACNA1S, LGI1, and RTN2.
The variation in CACNA1S (p.R419H) may sound like a benign, and it is a conservative change in amino acid composition, but it occurs in a highly-conserved region. It is present as an Arginine (“R”) in humans, mice, fish, flies and worms. This invariant use of R implies protein function will be compromised when the position is substituted with a histidine. The LGI1 (p.A253T) variant is also a conserved amino acid change, but it is in a less conserved region. This lack of complete conservation indicates this position might tolerate an Arginine to Threonine change. The RTN2 is complex variant. It does two significantly alarming changes. It makes a dramatic Leucine to Arginine change in the 4th exon up from the end of the protein. It also occurs immediately adjacent to splice junction acceptor site. This alteration of splicing region suggest it could lead to improper splicing in a highly conserved region of the protein and thus create a defective protein.
It is likely that all three of these genes yield a protein of messed up function. But what is not clear is the type of mess-up. Are they leading to loss-of-function (LOF) activity? Or do they lead to dominant gain-of-function (GOF)? These variations are most likely in the LOF category. Otherwise, I would almost certainly be dealing with the disease symptoms that the GOF variant’s manifest. Yet this is just supposition – a hypothesis. We don’t yet have solid evidence for what is going on.
How could we get final answer for if these variations are these pathogenic or not?
To get precision answers, we could model all of these variants in C elegans, For the CACNA1S and the RTN2, their high conservation from human to worm would allow direct modeling in the worm’s homologous position of the worm’s native gene (“Native locus”).
Our prior work with full gene humanization indicates more congruent results occur if we first swap in a human gene for the native gene locus (“Humanized Locus”) and then install variant. The use of a humanized locus allows modeling of any variant, whether it is highly conserved or not across many species. So far, in our studies all known pathogenic variants behave with deviant behavior, but only when put into humanized systems. Contrast this to insertion in native locus – some known pathogenic alleles did not create detectable deviance of behavior!
For the 3 genes to which I am concerned, all are of favorable size that the human sequence can be easily optimized and installed for expression from the worm’s native locus (“Humanized” animal). If we can observe that the human gene can rescues loss of function, we will know we are off-to-the-races and can study variant biology in a gene-humanized system. The humanized animals will be precision proxies serving as clinical avatars of the patient condition.
CACNA1S is a drugable target. The creation of a humanized system expressing CACNA1S as gene replacement of egl-19 gene would generate a platform for drug discovery. The patient variants might be responsive to calcium channel blockers, such as benzothiazepines, phenylalkylamines and 1,4-dihydropyridines. The end result, a highly-personalized medicine approaches would be achieved that finds drug treatments specific to the patient’s genetic pre-conditions.