1 | INTRODUCTION
The past decade has been a time of rapid disease gene discovery, driven by the rise in popularity of next-generation sequencing (NGS) technologies and the increasing use of web-based collaborative data-sharing initiatives, such as the Matchmaker Exchange (Chong, 2015; Sobreira, 2015; Sobreira, 2017; Boycott, 2019; Bamshad, 2019). Matchmaker Exchange enhances data-sharing and characterization of novel gene-disease associations by connecting multiple genomic and phenotypic databases through a common application programming interface (API) (Sobreira, 2017). One of the components of Matchmaker Exchange is GeneMatcher (http://www.genematcher.org), which launched in 2013 to connect scientists and clinicians so that they can share standardized data on candidate genes of interest and the associated phenotypes of individuals with presumed but unidentified Mendelian disorders. By sharing candidate gene information through GeneMatcher, researchers can assemble a critical mass of probands to support the characterization of new gene-disease associations (Sobreira, 2015). As more genes are associated with Mendelian disorders, the overall diagnostic rate of genomic technologies and the potential to identify new therapeutic targets inherently increase (Myers, 2018; Bamshad, 2019). More directly, disease gene discovery impacts patients by ending the notorious ‘diagnostic odyssey,’ providing more tailored clinical care, and informing reproductive risks.
Due to the high volume of testing, diagnostic laboratories that offer diagnostic exome sequencing (DES) are valuable partners for disease gene discovery (Bamshad, 2019). However, most of the data generated by DES are not adequately available for data-sharing and matchmaking (Boycott, 2019). Some diagnostic laboratories evaluate and report rare variants in uncharacterized genes as part of their DES protocol (Retterer, 2016; Farwell Hagman, 2017). At our laboratory, we have a standardized and validated scoring metric for evaluating gene-disease validity (GDV) (Smith, 2017). Genes with no clinical evidence or limited evidence are considered uncharacterized, and those with a GDV score of moderate or higher are considered characterized. Both characterized and uncharacterized genes may be reported if they meet our DES reporting criteria and have strong evidence for an association with the proband’s phenotype (Farwell Hagman, 2017). Reporting criteria for uncharacterized candidate genes can vary widely between diagnostic laboratories, with published reports of 5.8%-24.2% of DES cases having a reported candidate gene (Farwell Hagman, 2017; Retterer, 2016).
Because GDV scores are based on a gene-disease relationship, access to comprehensive phenotypic data, ideally in the form of clinical notes that summarize the salient points of the medical history, is crucial for accurately assessing which genes may be relevant for a proband (Seaby, 2020). Uncharacterized genes that meet our reporting criteria are entered into GeneMatcher on a rolling basis. This process allows us to enter high-confidence, potentially disease-causing variants representing the strongest candidates and is consistent with the “gene-to-patient” model proposed by Seaby et al. (2021) to reduce the burden of sifting through large volumes of unvetted variants (“analytical noise”). A thoughtful approach to identifying which variants are most likely to be disease-causing in a proband is needed before submitting to GeneMatcher, to maximize the likelihood that matches lead to positive outcomes. This in turn leads to newly published data, which ultimately can lead to gene characterization (Figure 1).
Rates of disease gene discovery have steadily increased over time, with a spike occurring as the adoption of NGS technologies became more prominent. However, the rates of publications reporting these discoveries are not keeping up (Bamshad, 2019). That some gene-disease relationships remain undescribed may be due to several factors, including complex inheritance or the difficulty in ascertaining probands with extremely rare disorders. Publications may also be delayed until a large enough cohort has been collected, with robust clinical data curation and paired functional studies. Such delays may be hindering the characterization of gene-disease associations in extremely rare cases with highly specific clinical findings that are less conducive to cohort studies. Moving forward, participation by commercial laboratories in these data-sharing initiatives is increasingly imperative to help identify the elusive, ultrarare diagnoses.
Here, we report our laboratory’s experience with GeneMatcher, how it has impacted characterization of gene-disease associations, and collaborations for additional research investigations into the clinical validity of the reported gene-disease associations.