RESULTS

Candidate yield using CaSc

Between January 2016 and December 2020, we evaluated 2,977 cases of individuals with NDD at the Leipzig Center for Rare Diseases and clarified 1,055 of these (35.4%). In 1,922 (64.6%) individuals, no clinically relevant variant was identified. Of these, 1,192 cases (62.0%) were reevaluated in a research setting, mostly as trios (1,038; 87.1%) and using CaSc for variant prioritization.
Overall, in 932 families we identified 1,561 candidate variants in 1,309 genes. Based on these, we contributed to 43 publications describing novel NDD entities and are currently working on the clinical and molecular description of 91 candidate genes. A complete and regularly updated list of our in-house candidates (Abou Jamra, Rami & Platzer, Konrad, 2022) can be found online. In 569 families (61.1%), we identified one candidate gene, in 218 (23.4%) we identified two candidate genes, in 137 (14.7%) we identified three to five candidate genes, and in 9 families we identified six or more candidate genes (most of these were homozygous in consanguineous families).

Novel NDD genes are highly ranked in synthetic trios

We applied vcfAutoCaSc to 158 synthetic trios (79 CEU-based and 79 ASH-based) containing novel NDD-causing variants from recent publications (Figure 2a). The number of variants remaining after prefiltering varied between 5 and 26, depending on the base trio (CEU or ASH), the sex of the index individual and the parents’ affected status.
The CaSc of the inserted variants varied between 4.8 and 10.8 with a median of 8.0, whereas the median of all other rare variants that passed the prefiltering in the trios was 4.3. Of all inserted variants, 75% had a CaSc of 6.7 or higher. Comparing true positive and false positive rates, a CaSc cutoff of 6.0 seems optimal (Figure 2b). In the CEU trio, inserted variants were ranked as the top variant in median (mean rank 1.5; top rank in 47/79 (59.5%); second rank in 26/79 (32.9%); third rank in 4/79 (5.1%)). In the ASH trio, inserted variants were ranked as the second highest variant in median (mean rank 2.3; top rank in 0/79 (0%); second rank in 63/79 (79.7%); third rank in 7/79 (8.9%)). The variant that consistently ranked first in the ASH trio was a de novo variant in DNMT3A present in the HG002 sample. DNMT3A is a known NDD gene and the variant type would be typical for the associated disorders (MIM #618724 or MIM #615879, compare Figure S2 for detailed discussion). In 147/158 (93.0%) of simulations, the inserted variant (or the compound-heterozygous candidate finding) was among the three highest-scoring candidates. Comparing the filtered ranks of the inserted vs. all other trio-specific variants passing the prefiltering showed that the inserted variants were scored significantly higher (ASH Trio: p = 7.5e-17; CEU Trio: p = 1.4e-05; Wilcoxon rank-sum test). Compare File S1 for complete results of the synthetic trio scoring.
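The comparison of true positive and false positive rates behind the cutoff choice can be illustrated with a minimal sketch. The text does not state which criterion was used, so the sketch assumes Youden's J (true positive rate minus false positive rate) as one common option. The function name and the toy scores are hypothetical and are not the study data, so the cutoff it selects need not match the value reported above.

```python
def youden_cutoff(positive_scores, negative_scores, candidate_cutoffs):
    """Return (cutoff, tpr, fpr) for the cutoff that maximises TPR - FPR.

    A variant counts as 'called' when its score is >= cutoff.
    Ties are resolved in favour of the first (lowest) cutoff tested.
    """
    best = None
    for cutoff in candidate_cutoffs:
        tpr = sum(s >= cutoff for s in positive_scores) / len(positive_scores)
        fpr = sum(s >= cutoff for s in negative_scores) / len(negative_scores)
        j = tpr - fpr  # Youden's J statistic (assumed criterion)
        if best is None or j > best[0]:
            best = (j, cutoff, tpr, fpr)
    return best[1], best[2], best[3]


# Hypothetical toy scores for illustration only, NOT the study data.
inserted = [4.8, 6.7, 7.5, 8.0, 8.4, 9.1, 10.8]    # stand-ins for inserted variants
background = [2.1, 3.0, 3.9, 4.3, 4.6, 5.2, 6.4]   # stand-ins for other rare variants
cutoffs = [x / 2 for x in range(6, 23)]            # 3.0, 3.5, ..., 11.0

cutoff, tpr, fpr = youden_cutoff(inserted, background, cutoffs)
print(f"cutoff={cutoff}, TPR={tpr:.2f}, FPR={fpr:.2f}")
```

Other criteria, such as fixing a maximum tolerated false positive rate, would be equally plausible and could select a different cutoff on the same data.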

Identification of novel candidate variants beyond confined manual evaluation

We applied vcfAutoCaSc to 93 trio exomes from our in-house cohort. The male to female ratio in this sub-cohort was 57:36 (1.58). Eight (8.6%) families self-reported consanguinity.
Automated filtering and scoring identified 309 unique candidate variants (median 2.0, average 3.3 per case) in 289 genes. The maximum number of candidates per case was 22 variants (1/93 cases; 1.1%) and the minimum was zero variants (15/93 cases; 16.1%). Most (79/81 variants, 97.5%) manually scored candidates were also scored by vcfAutoCaSc, including all 16 SNVs and indels that had been reported to treating physicians (Figure 2c). In addition, 230 further candidates, which had not been manually scored before, passed all pre-filtering steps and were automatically scored (Figure 2c). In 15/93 cases (16.1%), no variant passed the filters. In 42/93 (45.2%) cases, the variant scored highest by AutoCaSc was also considered in the manual evaluation. In 35/93 cases (37.6%), it was not evaluated manually. In most (24/35; 68.6%) of these cases, no manually evaluated variant had been reported at all, but at least one variant was scored by AutoCaSc. The overall highest candidate score in these 24 cases was 10.0, while the median of the highest scores per case was 5.5. The number of candidates in these cases varied between one and 10 (median 2). A table with all evaluated variants can be found in File S2.
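The comparison between manually and automatically scored candidates amounts to a set overlap. The following is a minimal sketch under the assumption that each variant is represented by a unique identifier. The function name and the identifiers are hypothetical and do not come from vcfAutoCaSc or the cohort.

```python
def compare_candidate_sets(manual, automated):
    """Split variant identifiers into shared, manual-only and automated-only."""
    manual, automated = set(manual), set(automated)
    shared = manual & automated
    return {
        "shared": shared,                        # scored by both approaches
        "manual_only": manual - automated,       # missed by automated scoring
        "automated_only": automated - manual,    # additional automated candidates
        # Fraction of manual candidates recovered; None if there are none.
        "recovered_fraction": len(shared) / len(manual) if manual else None,
    }


# Hypothetical identifiers for illustration only, not variants from the cohort.
manual_ids = {"var_A", "var_B", "var_C"}
automated_ids = {"var_B", "var_C", "var_D", "var_E"}

overlap = compare_candidate_sets(manual_ids, automated_ids)
print(sorted(overlap["automated_only"]))
```

Applied to the counts reported above, the shared set would hold 79 variants, the manual-only set 2, and the automated-only set 230.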

Exemplary caveats in automatic filtering

One of the 79 variants (ENST00000159111: c.288C>T, p.(Gly96=), KDM4B, CaSc 10.0) in the simulation experiment was filtered out by slivar during prefiltering because it was not predicted to change the amino acid sequence of the protein (silent) and its impact on protein function was predicted to be low. This variant has been reported as pathogenic by Duncan and colleagues (Duncan et al., 2020) because RNA analyses showed a splice donor loss combined with the gain of a new donor site (r.287_317del), which was predicted to cause a frameshift at the protein level (p.Glu97Thrfs*66).
Two of the 81 previously manually scored candidates from the in-house cohort were not retained. First, one variant (ENST00000513312: c.487C>T, p.(Arg163Trp), MCIDAS, CaSc 6.2) was filtered out because the corresponding gene MCIDAS already has an associated non-NDD phenotype (ciliary dyskinesia, MIM #618695). Genes for which an associated phenotype is noted in OMIM but which are not listed as NDD (candidate) genes in SysID are scored by vcfAutoCaSc but not included in the ranking (compare filtering steps in the Supplementary Methods). Second, a pair of variants in compound heterozygous state ((1) ENST00000250937: c.446C>G, p.(Pro149Arg) and (2) ENST00000250937: c.224T>G, p.(Val75Gly), DOHH, CaSc 4.4) did not pass the slivar VCF quality filters because the read depth (DP = 17) of variant (1) was below our defined cutoff of 20.
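The three filtering outcomes described in this section can be summarized in a minimal sketch. It is not the slivar expression syntax and not the vcfAutoCaSc implementation. The class, field names, impact labels and the order of the checks are assumptions made for illustration. Only the read-depth cutoff of 20, the DP value of 17 and the OMIM/SysID rule are taken from the text.

```python
from dataclasses import dataclass

MIN_DEPTH = 20  # read-depth cutoff stated in the text


@dataclass
class Variant:
    gene: str
    depth: int                 # read depth (DP)
    impact: str                # hypothetical labels: "HIGH", "MODERATE", "LOW"
    has_omim_phenotype: bool   # gene has a phenotype noted in OMIM
    in_sysid: bool             # gene listed as NDD (candidate) gene in SysID


def filter_status(v: Variant) -> str:
    """Classify a variant; the order of checks is an assumption."""
    if v.depth < MIN_DEPTH:
        return "dropped: low read depth"
    if v.impact == "LOW":
        return "dropped: low predicted impact"
    if v.has_omim_phenotype and not v.in_sysid:
        return "scored, excluded from ranking"
    return "scored and ranked"


# Depth values of 40 and all flags not stated in the text are placeholders.
silent = Variant("KDM4B", depth=40, impact="LOW",
                 has_omim_phenotype=False, in_sysid=False)
non_ndd = Variant("MCIDAS", depth=40, impact="MODERATE",
                  has_omim_phenotype=True, in_sysid=False)
low_dp = Variant("DOHH", depth=17, impact="MODERATE",
                 has_omim_phenotype=False, in_sysid=False)

for v in (silent, non_ndd, low_dp):
    print(v.gene, "->", filter_status(v))
```

The KDM4B example shows why an impact-based filter can discard a synonymous variant whose effect only becomes apparent at the RNA level.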

Example of candidate variants identified through automated scoring

Variants in several known NDD genes, such as SCN1A, PIK3CA, CLTC, FOXP1 or SOX5, were among the strongest candidates (Table 1).
The highest scoring variant in a gene with a currently unclear association to NDD (according to SysID and PanelApp as of 2021-11-27) was a homozygous LoF variant in CNTN2 (ENST00000331830: c.940C>T, p.(Arg314*), CaSc 11.4). The predicted impact of the variant was very high (CADD 34.0) and it did not occur in gnomAD. CNTN2 is highly expressed in the brain (GTEx) and directly interacts (STRING) with known NDD-associated genes such as CNTNAP2 (Strauss et al., 2006) and L1CAM (Rosenthal et al., 1992). CNTN2 encodes contactin-2, which, together with the CNTNAP2 gene product, is responsible for organizing voltage-gated potassium channels at juxtaparanodal regions (Stogmann et al., 2013). Seizures are described in knockout mice (MGI). Stogmann and colleagues (Stogmann et al., 2013) have linked a homozygous frameshifting variant in a consanguineous family to familial adult myoclonic epilepsy; of the five affected siblings in this published family, at least two had borderline intelligence, and two individuals had average neuropsychological test scores. In our case, the male individual had epileptic encephalopathy and developmental delay, and his brother, who also carried the LoF variant, was similarly affected, while a healthy sister did not carry the variant in homozygous state. Taken together, an impairment of neuronal functions through biallelic loss of CNTN2 seems plausible, justifying the high score and supporting this gene as a good NDD candidate. Interestingly, however, there is one homozygous CNTN2 LoF variant (c.1169C>A, p.(Ser390*)) listed in gnomAD. Together with the high degree of consanguinity in the case described here, this may point to additional aggravating variants (“dual diagnosis”) in this family (e.g. homozygous variants in SELL (CaSc 8.09), FRAS1 (a SysID primary gene) and ERMARD (a SysID candidate gene)).
Other high scoring variants with previously undescribed or unclear associations to NDD at the time of data retrieval were found in DLGAP1, HDAC4, H3F3A, ANKRD17, SMURF1, NRXN3, PRICKLE1 and CASC5 (compare Table 2 for detailed sub-scores). Members of our institute have contributed to a recent publication associating heterozygous LoF variants in ANKRD17 with an NDD entity based on a cohort of 34 individuals from 32 families (Chopra et al., 2021, p. 17), and pathogenic variants in the histone 3 family (H3F3A and H3F3B) have since been reported to cause an NDD entity with neurodegeneration (Bryant et al., 2020). For a complete list of all candidates found in the real trios, see File S2.