How prevalent are Variants of Uncertain Significance?
The ClinVar database for variant interpretation was analyzed for its distribution of ACMG-AMP assessments. Using the data dumps from ClinVar Miner, the yearly distribution of assessments was plotted. Since 2016, shortly after the ACMG-AMP guidelines were published in 2015, the number of assessments assigned to the VUS category has grown rapidly. These are the variants that clinical genetics researchers have examined but cannot determine to be pathogenic or not.
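The yearly tally behind such a plot can be sketched as follows. This is a minimal illustration, assuming the export has been reduced to (year, category) pairs; the record layout and the toy records are hypothetical stand-ins, not real ClinVar or ClinVar Miner data. "Uncertain significance" is the term ClinVar uses for the VUS category.

```python
from collections import Counter, defaultdict


def yearly_distribution(records):
    """Count assessments per year and per ACMG-AMP category.

    `records` is an iterable of (year, category) pairs; this layout is a
    hypothetical simplification of a ClinVar Miner export.
    """
    counts = defaultdict(Counter)
    for year, category in records:
        counts[year][category] += 1
    return counts


def vus_fraction(counts):
    """Fraction of each year's assessments labeled 'Uncertain significance'."""
    return {
        year: c["Uncertain significance"] / sum(c.values())
        for year, c in sorted(counts.items())
    }


# Toy records for illustration only; these are NOT real ClinVar counts.
toy = [
    (2015, "Pathogenic"), (2015, "Benign"), (2015, "Uncertain significance"),
    (2016, "Uncertain significance"), (2016, "Uncertain significance"),
    (2016, "Likely benign"), (2016, "Pathogenic"),
]

print(vus_fraction(yearly_distribution(toy)))
```

Plotting the per-year counters (for example as stacked bars) would then show how the VUS share changes over time.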
How big will the VUS problem get?
To estimate how large the VUS problem will become, we must first understand how big the human genome is. Controversy abounds, but current estimates are 21,306 protein-coding genes and 21,856 non-coding genes. To be conservative, and for simplicity's sake, let us use 20,000 genes. The next question is how many of these are associated with disease. ClinVar lists 7,221 genes under “genes with variants specific to one protein-coding gene.” More conservatively, ClinVar’s “gene_condition_source_id” lists 4,242 genes as associated with a diagnostic condition. This lower number is reinforced by OMIM, where the “Total number of genes with phenotype-causing mutation” is 4,162 genes. These lists have been growing fairly steadily at about 5% per year, so within a few years the number of gene-disease associations will likely approach 5,000 genes, or roughly a quarter of the 20,000 genes assumed above.
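The "few years" claim can be checked with compound-growth arithmetic. This sketch assumes the 5% annual growth stays constant, which is a simplification; `years_to_reach` is a hypothetical helper name.

```python
import math


def years_to_reach(start, target, annual_growth):
    """Years of constant compound growth needed for `start` to reach `target`."""
    return math.log(target / start) / math.log(1 + annual_growth)


# Gene counts quoted above; 5% per year is assumed to hold steadily.
for label, start in [("ClinVar gene_condition_source_id", 4242), ("OMIM", 4162)]:
    print(f"{label}: {years_to_reach(start, 5000, 0.05):.1f} years to reach 5,000 genes")

print(f"Share of a 20,000-gene genome: {5000 / 20000:.0%}")
```

Under that assumption, both starting counts reach 5,000 genes in roughly three and a half to four years, consistent with "a few years."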
The VUS problem may eventually approach 7 million variants
A recent effort has been made to precompute pathogenicity assessments across the human genome. The InterVar database applied the ACMG-AMP guidelines to ~80,000,000 amino acid positions in the genome to provide a resource for easier variant interpretation. Since at least 20% of these positions are likely to fall in genes with a known disease association, roughly 16,000,000 variants will eventually turn up in patient-derived genome sequencing. If the current trend of 44% VUS holds across that number, close to 7,000,000 variants will need functional studies to resolve their pathogenicity.
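The estimate chains two proportions. The sketch below reproduces that arithmetic with the figures quoted above, using integer percentages to avoid floating-point rounding; the variable names are illustrative only.

```python
intervar_positions = 80_000_000  # approximate InterVar coverage quoted above
disease_gene_pct = 20            # conservative share in disease-associated genes
vus_pct = 44                     # current VUS share of assessments

# Variants expected in disease-associated genes.
disease_variants = intervar_positions * disease_gene_pct // 100
# Of those, the share expected to land in the VUS category.
expected_vus = disease_variants * vus_pct // 100

print(f"{disease_variants:,} variants in disease-associated genes")
print(f"{expected_vus:,} expected VUS")
```

The result, 7,040,000, is the "close to 7,000,000" figure in the text. Using the 25% share from the previous section instead of 20% would raise it proportionally.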
A novel animal model system for rapid variant interpretation
The team at Nemametrix just produced a wonderful set of preliminary data, which we presented at the recent American Society of Human Genetics meeting. It shows that a training set of known benign and pathogenic alleles in a gene can be used to “teach” a machine learning (ML) algorithm to determine whether a VUS is pathogenic. When applied to the STXBP1 gene, a set of 5 benign and 5 pathogenic variants was sufficient to train the algorithm to separate the two classes in an LDA plot, and the Y75C variant was assessed as pathogenic.
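To illustrate the idea of training on known alleles and then scoring a VUS, here is a minimal two-class Fisher discriminant in pure Python. It assumes "LDA" refers to linear discriminant analysis and that each variant is summarized by two phenotypic readouts. The features, the numbers, and the query point are all hypothetical toy values, not Nemametrix data, and this is not the team's actual pipeline.

```python
from statistics import mean


def fit_lda(benign, pathogenic):
    """Two-class Fisher LDA on 2-D feature vectors (hypothetical readouts).

    Returns a projection vector w and a threshold halfway between the
    projected class means; scores above the threshold lean pathogenic.
    """
    def class_mean(rows):
        return [mean(r[0] for r in rows), mean(r[1] for r in rows)]

    m_b, m_p = class_mean(benign), class_mean(pathogenic)

    # Pooled within-class scatter matrix S_w (2x2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in ((benign, m_b), (pathogenic, m_p)):
        for x in rows:
            d = [x[0] - m[0], x[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]

    # w = S_w^-1 (m_p - m_b), using the closed-form 2x2 inverse.
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = [m_p[0] - m_b[0], m_p[1] - m_b[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]

    def project(x):
        return w[0] * x[0] + w[1] * x[1]

    threshold = (project(m_b) + project(m_p)) / 2
    return w, threshold


def classify(w, threshold, x):
    """Label a variant by which side of the discriminant threshold it falls on."""
    score = w[0] * x[0] + w[1] * x[1]
    return "pathogenic" if score > threshold else "benign"


# Hypothetical readouts normalized to wild type (toy values, 5 per class).
benign = [(1.00, 0.95), (0.92, 1.05), (1.05, 1.02), (0.97, 0.90), (1.03, 1.08)]
pathogenic = [(0.45, 0.50), (0.38, 0.60), (0.52, 0.42), (0.41, 0.55), (0.47, 0.47)]

w, threshold = fit_lda(benign, pathogenic)
query_vus = (0.44, 0.52)  # hypothetical VUS readout
print(classify(w, threshold, query_vus))
```

A hand-rolled two-feature version keeps the sketch dependency-free; with more readouts, a library implementation such as scikit-learn's `LinearDiscriminantAnalysis` would be the more practical choice.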
Once this type of system has been trained with a set of known pathogenic and benign variants, pathogenicity can be assessed in as little as 10 days from the start of a VUS transgenesis project.