I frequently get asked:
How do you model splice variations in your animal model systems?
Splice variation is an important consideration in the genomic analysis of patient variants, and it is often overlooked (PMID: 29680930). It is estimated that 15%–60% of human disease mutations are due to splicing defects (PMID: 29304370). Taking the midpoint of that range, close to 40% of disease-causing variation is likely attributable to splicing defects, so it is an important kind of variation to be able to model in functional studies that determine whether a variant is pathogenic.
But let’s first look at the process of splicing and what is known.
Splicing is a complex process under complex regulation. Certain cell types favor one form of splicing, while other tissues select other forms. This natural variation in splice isoforms gives us more than just the number of genes in the genome to control biological output. In fact, it is part of the explanation for why a C. elegans nematode or a zebrafish, with roughly the same number of genes as a human, has such a different level of output complexity. By current estimates, the number of functional isoforms in humans may be an order of magnitude greater than in the nematode. Furthermore, the ways splice variation can take place get bewildering quickly.
How is splicing observed in the patient?
Layer on top of this the aberrant splice variations that can cause disease, and we have a tough interpretation problem. Thankfully, RNAseq is providing a huge amount of diagnostic discovery for splice variation. We can compare the splicing patterns in healthy populations with those of a patient suspected of having a genetic disease and visualize where the splicing is going wrong (PMID: 28424332).
Modeling Splice Variation in Animal Models
To reduce the complexity of biology and yet bring more comparative biology relevance, we can often take a human cDNA sequence and use it to rescue the function of the animal’s version of the gene. To do this, we use CRISPR to remove the animal’s version of the gene (a gene “knock out”). Next, we take a human cDNA sequence optimized for expression in the animal and either replace the deleted locus or express the sequence in trans (at a safe harbor site, using either the promoter endogenous to the removed gene or a promoter well established for appropriate tissue expression). In C. elegans, we have been pleasantly surprised that, for orthologs of at least 30% identity, we get significant rescue of the knock out’s loss of function more than half the time. In zebrafish, we have started applying the same gene replacement techniques. The result is a set of gene-humanized animals in which conservation of biology means we are looking at functional outputs that are highly similar.
Missense variations are conceptually easy to model. An amino acid change that is pathogenic (e.g., R235Q in STXBP1) is installed with CRISPR using a simple donor homology that instructs the cell’s HDR to alter the DNA coding for R (arginine) into a code for Q (glutamine) in our “wildtype” humanized locus.
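As a minimal sketch of what such a donor homology encodes, the Python below swaps a single arginine codon for a glutamine codon in a toy coding sequence. The sequence, the codon position, and the helper names (`translate`, `install_missense`) are hypothetical and chosen only for illustration; none of them come from the real STXBP1 sequence or from any particular design tool.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence whose length is a multiple of three."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def install_missense(cds, aa_position, ref_aa, alt_codon):
    """Replace the codon at a 1-based amino acid position with alt_codon.

    Raises ValueError if the existing codon does not encode ref_aa, which
    guards against editing the wrong position.
    """
    start = (aa_position - 1) * 3
    ref_codon = cds[start:start + 3]
    if CODON_TABLE[ref_codon] != ref_aa:
        raise ValueError(f"position {aa_position} encodes {CODON_TABLE[ref_codon]}, not {ref_aa}")
    return cds[:start] + alt_codon + cds[start + 3:]


# Toy "humanized wildtype" sequence (hypothetical): M A R E stop
wildtype = "ATGGCTCGTGAATAA"
# Toy R3Q edit, standing in for a change such as R235Q
variant = install_missense(wildtype, 3, "R", "CAG")

print(translate(wildtype))  # MARE*
print(translate(variant))   # MAQE*
```

The reference-residue check mirrors a habit worth keeping in real designs: confirm that the position being edited actually encodes the expected wildtype amino acid before building the donor.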
But how do we mimic a splice variation?
It is actually quite simple. We create a donor homology that makes any splice form of interest. We are not interested in the mechanism to answer “if” it occurs – RNAseq already answers that. We are after functional consequence. We want to answer “does a particular splice form in question have a measurable defect compared to the normal splicing.”
Let’s look at one of the patient examples in detail.
In red we have four patients with a collagen gene splice defect suspected of involvement in their diagnosis of Ullrich Congenital Muscular Dystrophy. Since every person has two copies of the COL6A1 gene, we can see that one copy is splicing normally while the other copy is defective, and its splicing brings in a pseudoexon. “The resulting inclusion of 24 amino acids occurs within the N-terminal triple-helical collagenous G-X-Y repeat region of the COL6A1 gene, the disruption of which has been well established to cause dominant-negative pathogenicity in a variety of collagen disorders” (PMID: 28424332).
Creating Knock-in for Animal Model of Disease
For disease modeling of splice variations, we use a cDNA rescue approach. The variation seen in the patient is made as a plasmid coding for expression of a modified cDNA. This cDNA contains the human coding sequence corresponding to the suspected aberrant splice form. Using CRISPR techniques, the segment coding for the human sequence is inserted into the genome, typically at the orthologous locus of the animal.
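To make the idea of a modified cDNA concrete, here is a small Python sketch that splices an in-frame pseudoexon between two exons and compares the resulting proteins. It mirrors the COL6A1 case only in its arithmetic (24 extra amino acids means 72 extra nucleotides). The exon and pseudoexon sequences are invented toy G-X-Y repeats, and `with_pseudoexon` is a hypothetical helper, not part of any published pipeline.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence whose length is a multiple of three."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def with_pseudoexon(exons, after_index, pseudoexon):
    """Return a new exon list with the pseudoexon placed after exons[after_index]."""
    if len(pseudoexon) % 3 != 0:
        raise ValueError("pseudoexon length is not a multiple of 3; it would shift the frame")
    return exons[:after_index + 1] + [pseudoexon] + exons[after_index + 1:]


# Toy exons (hypothetical): Met followed by Gly-Pro-Pro (G-X-Y) repeats, then a stop.
exon_a = "ATG" + "GGTCCTCCA" * 2
exon_b = "GGTCCTCCA" * 2 + "TAA"
# Toy pseudoexon (hypothetical): 72 nt = 24 codons of Ala-Glu-Leu, which breaks the G-X-Y pattern.
pseudoexon = "GCTGAACTG" * 8

normal_cdna = "".join([exon_a, exon_b])
variant_cdna = "".join(with_pseudoexon([exon_a, exon_b], 0, pseudoexon))

normal_protein = translate(normal_cdna)
variant_protein = translate(variant_cdna)

print(normal_protein)
print(variant_protein)
print(len(variant_protein) - len(normal_protein), "extra amino acids")
```

Because the construct is built directly as the spliced product, it does not depend on the animal's splicing machinery recognizing human splice signals, which is the point of working at the cDNA level.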
Modeling in the C. elegans nematode.
To model the COL6A1 variant, we would first seek to understand the phenotype from loss of function of the animal’s ortholog of the human gene. For COL6A1, this is the C16E9.1 gene in C. elegans. This gene is not well studied in the nematode, but it does show high expression in dauer, an alternative life stage.
The first step is to make a gene knock-out to remove the C16E9.1 gene from the worm genome. Next, a series of functional assays is run to determine whether a functional defect can be detected for the C16E9.1 knock-out as a loss-of-function allele. For essential genes, the ultimate manifestation of loss of function is lethality as a homozygote. For other genes, a functional defect will often become apparent only after a battery of functional screens is performed. Once a defect in activity is observed, human cDNA can be introduced to see if rescue of function can be obtained. When rescue is obtained with human cDNA, we know we are looking at conserved biology for gene function between the animal and humans.
Once we have rescue of function, the fun begins. We can use CRISPR to put in the exact content that RNAseq indicates is occurring in the human gene. The pseudoexon seen in one copy of the patient’s chromosome pair can be made in the animal. Often the patient variant is problematic from a loss-of-function perspective, where haploinsufficiency drives disease. When such a defect is made as a homozygote in the animal, the effect is usually a severe phenotype (often lethal), similar to what is seen in the gene knockout. Yet in the specific case above, the pseudoexon in COL6A1, we are dealing with a dominant negative effect, so the defective splice form not only disrupts its own protein, it also causes the good copy to fail to function properly. Animals homozygous for the pseudoexon defect may actually have a milder phenotype than heterozygous animals. The patient’s heterozygous condition is created by crossing the splice-variant-containing humanized animal model to the wildtype humanized animal model and examining the cross progeny for defects in activity.
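The genotype bookkeeping for that cross can be sketched in a few lines of Python. This assumes simple Mendelian segregation at a single autosomal locus, and the allele labels `hWT` (humanized wildtype) and `hPE` (humanized pseudoexon variant) are hypothetical names used only here.

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2):
    """Return expected offspring genotype fractions for a one-locus cross.

    Each parent is a pair of alleles; each offspring receives one allele
    from each parent with equal probability.
    """
    counts = Counter("/".join(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}


# Homozygous humanized wildtype crossed to homozygous humanized variant:
# every cross progeny carries one copy of each allele.
f1 = cross(("hWT", "hWT"), ("hPE", "hPE"))
print(f1)  # {'hPE/hWT': 1.0}

# Crossing two heterozygotes gives the familiar 1:2:1 ratio.
f2 = cross(("hWT", "hPE"), ("hWT", "hPE"))
print(f2)
```

Starting from two homozygous humanized lines is convenient because all cross progeny share the patient-like heterozygous genotype, so no genotyping is needed to pick out the heterozygotes in that generation.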
Modeling in the Zebrafish.
We can do similar modeling in zebrafish using the Tol2 system. In zebrafish there is one ortholog of COL6A1. The zebrafish col6a1 gene has 55% sequence identity and 70% sequence similarity to the human gene. As with the work in the C. elegans nematode, we can remove the native gene and look for functional consequences. CRISPR techniques are used to create a knockout by inserting a stop codon early in the gene. If designed right, this results in loss of all expression of col6a1. Next, we measure the functional consequence of the gene knock-out by first seeing whether the animal can be made homozygous. If the knock-out is not lethal, the animal can be screened by a battery of assays to determine if a functional defect exists. Finding either lethality as a homozygote or a functional defect allows us to test the capacity of human COL6A1 cDNA to rescue function. A gene insertion approach using Tol2 is used to bring in the cDNA with an appropriate tissue-specific promoter. Rescue of function in specific tissues, for instance with the 195 bp unc45b promoter for skeletal muscle expression (PMID: 27295336), will help elucidate the important roles of COL6A1 in dystrophy diseases.
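The effect of an early stop codon on the encoded protein can be illustrated with a short Python sketch. The coding sequence is an invented toy, not zebrafish col6a1, and `insert_early_stop` is a hypothetical helper. The sketch shows only the truncated reading frame; it does not model transcript-level effects such as nonsense-mediated decay.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate_to_stop(cds):
    """Translate from the first codon until the first in-frame stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def insert_early_stop(cds, after_codon, stop="TAA"):
    """Insert a stop codon in frame after the given number of codons."""
    cut = after_codon * 3
    return cds[:cut] + stop + cds[cut:]


# Toy native coding sequence (hypothetical): M G P P G P P stop
native = "ATGGGTCCTCCAGGTCCTCCATAA"
knockout = insert_early_stop(native, after_codon=2)

print(translate_to_stop(native))    # MGPPGPP
print(translate_to_stop(knockout))  # MG
```

Placing the stop as early as possible leaves the shortest possible peptide, which reduces the chance that a truncated product retains partial activity.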
The pseudoexon insertion defect seen in COL6A1 is a dominant negative variation. So, when a single copy of this gene is brought into the animal, it has the capacity to suppress the activity of the unmodified copy of the gene. By inserting the cDNA carrying the patient variant into a safe harbor site, we create a pseudo heterozygote. The dosage of the cDNA comes from two chromosomal positions, while the wildtype locus provides expression of two copies of the normal gene. If the cDNA is dominant negative in its effect on the zebrafish gene, then a defect of gene function will manifest.
Recap of Splice Defect Modeling in Animal Models
In summary, splice variants are modeled at the cDNA level. A cDNA rescue experiment for the human gene of interest is designed with three arms:
Positive Control (blue): The humanized wildtype cDNA provides a reference of the normal gene seen in healthy individuals.
Negative Control (red): A knockout deletion of the animal’s gene provides reference for full loss of function of the gene.
Test (yellow): A variant is tested for its functional activity. A range of activities is expected, depending on the pathogenic variant’s mechanistic role in disease pathology. It may be a dominant negative that creates a pathology worse than the loss-of-function allele because it binds to and causes bad behavior from the product of the remaining good copy of the gene. Alternatively, the variant may cause loss of function. This will be either recessive and manifest as a homozygote, or dominant and manifest by haploinsufficiency as a heterozygote. Finally, the variant of interest may cause a gain of function, which typically manifests in the heterozygote.
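One way to think about how the three arms combine is as a simple decision rule over assay readouts. The Python below is only a rough sketch of that reasoning, not an established classification scheme. It assumes activity is normalized so the humanized wildtype scores 1.0 and the knockout scores 0.0, that losing one copy gives roughly half activity, and that a gain of function shows up as activity above wildtype. The function name `interpret_variant`, the 0.1 tolerance, and the example scores are all hypothetical.

```python
def interpret_variant(het_activity, hom_activity, tolerance=0.1):
    """Suggest a mechanism from normalized activity (wildtype = 1.0, knockout = 0.0).

    Assumption: a simple loss of one copy leaves about half activity, so a
    heterozygote well below 0.5 suggests interference with the good copy.
    """
    if het_activity > 1.0 + tolerance:
        return "gain of function (manifest in the heterozygote)"
    if het_activity < 0.5 - tolerance:
        return "dominant negative (heterozygote worse than loss of one copy)"
    if het_activity < 1.0 - tolerance:
        return "loss of function, dominant (haploinsufficiency)"
    if hom_activity < 1.0 - tolerance:
        return "loss of function, recessive (manifest as a homozygote)"
    return "no defect detected in these assays"


# Hypothetical example readouts, not real data.
print(interpret_variant(het_activity=0.2, hom_activity=0.3))
print(interpret_variant(het_activity=0.5, hom_activity=0.0))
print(interpret_variant(het_activity=1.0, hom_activity=0.1))
```

Real assays rarely reduce to a single number, so any thresholds like these would need to be set from the spread observed in the positive and negative controls.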