RESULTS
Candidate yield using CaSc
Between January 2016 and December 2020, we evaluated 2,977 cases of
individuals with NDD at the Leipzig Center for Rare Diseases and
clarified 1,055 of these (35.4%). In 1,922 individuals (64.6%), no
clinically relevant variant was identified. Of these, 1,192 cases
(62.0%) were reevaluated in a research setting, mostly as trios (1,038;
87.1%), using CaSc for variant prioritization.
Overall, we identified 1,561 candidate variants in 1,309 genes in 932
families. Based on these, we have contributed to 43 publications
describing novel NDD entities and are currently working on the clinical
and molecular description of 91 candidate genes. A complete and
regularly updated list of our in-house candidates (Abou Jamra & Platzer,
2022) is available online. We identified one candidate gene in 569
families (61.1%), two candidate genes in 218 (23.4%), three to five
candidate genes in 137 (14.7%), and six or more candidate genes in nine
families (mostly homozygous variants in consanguineous families).
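The per-family breakdown above is a simple binning of the number of candidate genes per family, with percentages taken relative to all families with at least one candidate (932 in our cohort). A minimal sketch, assuming a hypothetical mapping from family ID to its set of candidate genes (the IDs and gene names below are placeholders, not cohort data):

```python
from __future__ import annotations

from collections import Counter


def bin_candidate_count(n_genes: int) -> str:
    """Map the number of candidate genes in one family to a reporting bin."""
    if n_genes < 1:
        raise ValueError("families without candidates are not binned")
    if n_genes == 1:
        return "1"
    if n_genes == 2:
        return "2"
    if n_genes <= 5:
        return "3-5"
    return ">=6"


def summarize_families(candidates_per_family: dict[str, set[str]]) -> dict[str, tuple[int, float]]:
    """Return {bin: (number of families, percentage of all families)}."""
    bins = Counter(bin_candidate_count(len(genes)) for genes in candidates_per_family.values())
    total = sum(bins.values())
    return {b: (n, round(100 * n / total, 1)) for b, n in bins.items()}


# Hypothetical toy input: family ID -> set of candidate genes
families = {
    "F1": {"GENE_A"},
    "F2": {"GENE_B", "GENE_C"},
    "F3": {"GENE_D"},
    "F4": {"GENE_E", "GENE_F", "GENE_G"},
}
print(summarize_families(families))  # {'1': (2, 50.0), '2': (1, 25.0), '3-5': (1, 25.0)}
```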
Novel NDD genes are highly ranked in synthetic trios
We applied vcfAutoCaSc to 158 synthetic trios (79 CEU-based and 79
ASH-based) containing novel NDD-causing variants from recent
publications (Figure 2a). The number of variants remaining after
prefiltering ranged from 5 to 26, depending on the base trio (CEU or
ASH), the sex of the index individual, and the parents’ affected status.
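Why the sex of the index and the parents' affected status change the number of prefiltered variants can be illustrated with a simplified inheritance-mode filter. The rules below are assumptions for illustration only (genotypes encoded as alternate-allele counts; no compound-heterozygous, quality, or frequency logic); they are not the slivar expressions used by vcfAutoCaSc, and `TrioVariant` and `passing_modes` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TrioVariant:
    chrom: str
    kid: int  # alternate-allele count in the index individual (0, 1 or 2)
    mom: int  # alternate-allele count in the mother
    dad: int  # alternate-allele count in the father


def passing_modes(v: TrioVariant, kid_is_male: bool, mom_affected: bool, dad_affected: bool) -> list:
    """Return the inheritance modes under which a variant would be kept (hypothetical rules)."""
    modes = []
    on_x = v.chrom in ("X", "chrX")
    # Heterozygous in the index, absent in both parents
    if v.kid == 1 and v.mom == 0 and v.dad == 0:
        modes.append("de_novo")
    # Homozygous in the index, both parents heterozygous carriers (autosomes only)
    if v.kid == 2 and v.mom == 1 and v.dad == 1 and not on_x:
        modes.append("autosomal_recessive")
    # Only a male index can be hemizygous for an X-chromosomal variant from a carrier mother
    if on_x and kid_is_male and v.kid >= 1 and v.mom == 1 and v.dad == 0:
        modes.append("x_linked_hemizygous")
    # Sharing a heterozygous variant with a parent only fits if that parent is affected
    if v.kid == 1 and ((mom_affected and v.mom >= 1) or (dad_affected and v.dad >= 1)):
        modes.append("inherited_dominant")
    return modes


v1 = TrioVariant("chr2", kid=1, mom=0, dad=0)
v2 = TrioVariant("chrX", kid=1, mom=1, dad=0)
print(passing_modes(v1, kid_is_male=True, mom_affected=False, dad_affected=False))  # ['de_novo']
print(passing_modes(v2, kid_is_male=True, mom_affected=False, dad_affected=False))  # ['x_linked_hemizygous']
print(passing_modes(v2, kid_is_male=False, mom_affected=True, dad_affected=False))  # ['inherited_dominant']
```

The same X-chromosomal genotype is kept for a male index but dropped for a female index with unaffected parents, which is one way the sex and affected status alter the prefiltered set.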
The CaSc of the inserted variants ranged from 4.8 to 10.8 (median 8.0),
whereas the median CaSc of all other rare variants that passed
prefiltering in the trios was 4.3. Of all inserted variants, 75% had a
CaSc of 6.7 or higher. Based on a comparison of true positive and false
positive rates, a CaSc cutoff of 6.0 appears optimal (Figure 2b). In the
CEU trio, inserted variants had a median rank of 1 (mean rank 1.5; top
rank in 47/79 (59.5%); second rank in 26/79 (32.9%); third rank in 4/79
(5.1%)). In the ASH trio, inserted variants had a median rank of 2 (mean
rank 2.3; top rank in 0/79 (0%); second rank in 63/79 (79.7%); third
rank in 7/79 (8.9%)). The variant that consistently ranked first in the
ASH trio was a de novo variant in DNMT3A present in the HG002 sample.
DNMT3A is a known NDD gene, and the variant type would be typical for
the associated disorders (MIM #618724 or MIM #615879; see Figure S2 for
a detailed discussion). In 147/158 (93.0%) simulations, the inserted
variant (or the compound-heterozygous candidate finding) was among the
three highest-scoring candidates. A comparison of the filtered ranks of
the inserted variants with those of all other trio-specific variants
passing prefiltering showed that the inserted variants scored
significantly higher (ASH trio: p = 7.5e-17; CEU trio: p = 1.4e-05;
Wilcoxon rank-sum test). See File S1 for the complete results of the
synthetic trio scoring.
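The cutoff choice rests on trading the true positive rate (inserted variants at or above the cutoff) against the false positive rate (other prefiltered variants at or above it). The text does not state which criterion was used; the sketch below assumes Youden's J (TPR minus FPR) as one common choice, and the scores are toy values, not data from File S1.

```python
def tpr_fpr(pos: list, neg: list, cutoff: float) -> tuple:
    """Fraction of positives and of negatives whose score is at or above the cutoff."""
    tpr = sum(s >= cutoff for s in pos) / len(pos)
    fpr = sum(s >= cutoff for s in neg) / len(neg)
    return tpr, fpr


def youden_cutoff(pos: list, neg: list, cutoffs: list) -> float:
    """Return the cutoff maximizing Youden's J = TPR - FPR (first one wins on ties)."""
    def j(c: float) -> float:
        tpr, fpr = tpr_fpr(pos, neg, c)
        return tpr - fpr
    return max(cutoffs, key=j)


# Toy scores: inserted (positive) vs. other prefiltered (negative) variants
pos = [4.8, 6.2, 6.7, 8.0, 8.5, 10.8]
neg = [2.1, 3.0, 4.3, 4.5, 5.2, 6.1]
cutoffs = [4.0, 5.0, 6.0, 7.0]
for c in cutoffs:
    print(c, tpr_fpr(pos, neg, c))
print(youden_cutoff(pos, neg, cutoffs))  # 6.0
```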
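The rank statistics and the significance test can be sketched as follows: each inserted variant is ranked by descending CaSc among the prefiltered variants of its trio, ranks are summarized across simulations, and the two score groups are compared with a two-sided Wilcoxon rank-sum test. The helper names are hypothetical; the test uses the normal approximation with average ranks for ties and, to our understanding, corresponds to the statistic computed by `scipy.stats.ranksums`. Whether the published analysis used exactly this variant of the test is an assumption.

```python
import math
from statistics import median


def rank_of(target_id: str, scores: dict) -> int:
    """1-based rank of target_id when variants are sorted by descending CaSc."""
    ordered = sorted(scores, key=lambda vid: scores[vid], reverse=True)
    return ordered.index(target_id) + 1


def rank_summary(ranks: list) -> dict:
    """Median rank, mean rank and fraction of simulations with the target in the top three."""
    n = len(ranks)
    return {
        "median_rank": median(ranks),
        "mean_rank": round(sum(ranks) / n, 1),
        "top3_fraction": round(sum(r <= 3 for r in ranks) / n, 3),
    }


def ranksum_test(x: list, y: list) -> tuple:
    """Two-sided Wilcoxon rank-sum test (normal approximation, average ranks for ties)."""
    values = list(x) + list(y)
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(ranks[:n1])  # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


trio = {"inserted": 8.0, "v1": 9.1, "v2": 4.3, "v3": 3.9}  # toy CaSc values
print(rank_of("inserted", trio))  # 2
print(rank_summary([1, 1, 2, 3, 5]))
print(ranksum_test([8.0, 7.1, 9.3], [4.3, 3.9, 5.0, 2.2]))
```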
Identification of novel candidate variants beyond confined manual evaluation
We applied vcfAutoCaSc to 93 trio exomes from our in-house cohort. The
male-to-female ratio in this sub-cohort was 57:36 (1.58). Eight families
(8.6%) self-reported consanguinity.
Automated filtering and scoring identified 309 unique candidate variants
(median 2.0, mean 3.3 per case) in 289 genes. The maximum number of
candidates per case was 22 (1/93 cases; 1.1%), and the minimum was zero
(15/93 cases; 16.1%). Most manually scored candidates (79/81 variants;
97.5%) were also scored by vcfAutoCaSc, including all 16 SNVs and indels
that had been reported to the treating physicians (Figure 2c). In
addition, 230 further candidates that had not been manually scored
before passed all prefiltering steps and were automatically scored
(Figure 2c). In 15/93 cases (16.1%), no variant passed the filters. In
42/93 cases (45.2%), the variant ranked highest by AutoCaSc had also
been considered in the manual evaluation. In 35/93 cases (37.6%), it had
not been evaluated manually. In more than two thirds of these cases
(24/35; 68.6%), no manually evaluated variant had been reported, but at
least one variant was scored by AutoCaSc. In these 24 cases, the highest
candidate score overall was 10.0, and the median of the per-case highest
scores was 5.5. The number of candidates in these cases ranged from one
to ten (median 2). A table with all evaluated variants can be found in
File S2.
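The comparison between manual and automated evaluation reduces to set operations on variant identifiers plus a per-case classification of the top-ranked automated variant. A minimal sketch with hypothetical variant IDs; in our data, the "recovered" and "new" counts correspond to the 79 and 230 variants reported above. The category labels are our own illustrative names; "top_not_manual" and "only_automated" together form the group of cases whose top variant had not been evaluated manually.

```python
def concordance(manual: set, automated: set) -> dict:
    """Overlap between manually and automatically scored variant IDs."""
    return {
        "recovered": len(manual & automated),  # manual candidates also scored automatically
        "missed": len(manual - automated),     # manual candidates removed by automated filters
        "new": len(automated - manual),        # automated candidates never scored manually
    }


def classify_case(auto_scores: dict, manual: set) -> str:
    """Classify one case by whether its top automated variant was evaluated manually."""
    if not auto_scores:
        return "no_variant_passed"
    top = max(auto_scores, key=auto_scores.get)
    if top in manual:
        return "top_manual"
    if manual:
        return "top_not_manual"
    # No variant was evaluated manually, but at least one was scored automatically
    return "only_automated"


manual = {"var1", "var2", "var3"}
automated = {"var1", "var2", "var4", "var5"}
print(concordance(manual, automated))  # {'recovered': 2, 'missed': 1, 'new': 2}
print(classify_case({"var4": 7.1, "var1": 5.0}, {"var1"}))  # top_not_manual
```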
Examples of caveats in automatic filtering
One of the 79 inserted variants in the simulation experiment
(ENST00000159111:c.288C>T, p.(Gly96=), KDM4B, CaSc 10.0) was filtered
out by slivar during prefiltering because it was not predicted to change
the amino acid sequence of the protein (silent) and its impact on
protein function was predicted to be low. Duncan and colleagues (Duncan
et al., 2020) have implicated this variant as pathogenic because RNA
analyses showed a loss of the splice donor combined with the gain of a
new donor site (r.287_317del), which was predicted to cause a frameshift
at the protein level (p.Glu97Thrfs*66).
One of the 81 previously manually scored variants from the in-house
cohort (ENST00000513312:c.487C>T, p.(Arg163Trp), MCIDAS, CaSc 6.2) was
filtered out because the corresponding gene, MCIDAS, already has an
associated non-NDD phenotype (ciliary dyskinesia, MIM #618695). Genes
with an associated phenotype in OMIM that are not listed as NDD
(candidate) genes in SysID are scored by vcfAutoCaSc but not included in
the ranking (see the filtering steps in the Supplementary Methods).
Second, a pair of compound heterozygous variants ((1) ENST00000250937:
c.446C>G, p.(Pro149Arg) and (2) ENST00000250937:c.224T>G, p.(Val75Gly),
DOHH, CaSc 4.4) did not pass the slivar VCF quality filters because the
read depth of variant (1) (DP = 17) was below our defined cutoff of 20.
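The three caveats above correspond to three kinds of rules: a read-depth cutoff, a predicted-impact filter, and a ranking exclusion for genes with a known non-NDD OMIM phenotype. The sketch below re-expresses them as plain Python under stated assumptions: the DP cutoff of 20 and the OMIM/SysID rule come from the text, while the impact labels follow the usual VEP IMPACT classes, and `Candidate`, `prefilter_reason`, and `include_in_ranking` are hypothetical names, not slivar expressions or vcfAutoCaSc functions. The KDM4B read depth is a placeholder value.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

MIN_DP = 20  # read-depth cutoff stated in the text


@dataclass
class Candidate:
    gene: str
    hgvs_p: str
    consequence: str  # e.g. "synonymous_variant", "missense_variant"
    impact: str       # e.g. "LOW", "MODERATE", "HIGH"
    dp: int           # read depth at the variant site


def prefilter_reason(c: Candidate) -> Optional[str]:
    """Return why a variant is dropped before scoring, or None if it is kept (hypothetical rules)."""
    if c.dp < MIN_DP:
        return f"DP {c.dp} < {MIN_DP}"
    # Silent variants with low predicted impact are dropped, even if RNA data later show a splice effect
    if c.impact == "LOW":
        return "low predicted impact (e.g. synonymous)"
    return None


def include_in_ranking(gene: str, omim_genes: set, sysid_ndd_genes: set) -> bool:
    """Genes with an OMIM phenotype but without SysID NDD (candidate) status are scored but not ranked."""
    return not (gene in omim_genes and gene not in sysid_ndd_genes)


kdm4b = Candidate("KDM4B", "p.(Gly96=)", "synonymous_variant", "LOW", dp=45)  # dp is a placeholder
dohh = Candidate("DOHH", "p.(Pro149Arg)", "missense_variant", "MODERATE", dp=17)
print(prefilter_reason(kdm4b))  # low predicted impact (e.g. synonymous)
print(prefilter_reason(dohh))   # DP 17 < 20
print(include_in_ranking("MCIDAS", {"MCIDAS"}, set()))  # False
```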
Examples of candidate variants identified through automated scoring
Variants in several known NDD genes, such as SCN1A, PIK3CA, CLTC, FOXP1,
and SOX5, were among the strongest candidates (Table 1).
The highest-scoring variant in a gene with a currently unclear
association to NDD (according to SysID and PanelApp as of 2021-11-27)
was a homozygous LoF variant in CNTN2 (ENST00000331830:c.940C>T,
p.(Arg314*), CaSc 11.4). The predicted impact of the variant was very
high (CADD 34.0), and it was absent from gnomAD. CNTN2 is highly
expressed in the brain (GTEx) and directly interacts (STRING) with known
NDD-associated genes such as CNTNAP2 (Strauss et al., 2006) and L1CAM
(Rosenthal et al., 1992). CNTN2 encodes contactin-2, which, together
with the CNTNAP2 gene product, is responsible for organizing
voltage-gated potassium channels at juxtaparanodal regions (Stogmann et
al., 2013). Seizures have been described in knockout mice (MGI).
Stogmann and colleagues (Stogmann et al., 2013) linked a homozygous
frameshift variant in a consanguineous family to familial adult
myoclonic epilepsy; of the five affected siblings in this family, at
least two had borderline intelligence, and two had average
neuropsychological test scores. In our family, the male index individual
had epileptic encephalopathy and developmental delay; his brother, who
also carried the homozygous LoF variant, was similarly affected, whereas
a healthy sister did not carry the variant in the homozygous state.
Taken together, impaired neuronal function due to biallelic loss of
CNTN2 seems plausible, which justifies the high score and supports this
gene as a good NDD candidate. Interestingly, however, one homozygous
CNTN2 LoF variant (c.1169C>A, p.(Ser390*)) is listed in gnomAD. Together
with the high degree of consanguinity in this family, this may point to
additional aggravating variants (“dual diagnosis”), e.g. homozygous
variants in SELL (CaSc 8.09), FRAS1 (a SysID primary gene), and ERMARD
(a SysID candidate gene).
Other high-scoring variants in genes with previously undescribed or
unclear associations to NDD at the time of data retrieval were found in
DLGAP1, HDAC4, H3F3A, ANKRD17, SMURF1, NRXN3, PRICKLE1, and CASC5 (see
Table 2 for detailed sub-scores). Members of our institute have
contributed to a recent publication associating heterozygous LoF
variants in ANKRD17 with an NDD entity, based on a cohort of 34
individuals from 32 families (Chopra et al., 2021, p. 17), and
pathogenic variants in the histone H3.3-encoding genes H3F3A and H3F3B
have since been reported to cause an NDD entity with neurodegeneration
(Bryant et al., 2020). For a complete list of all candidates found in
the real trios, see File S2.