Discussion
Based on studies primarily with children, it has been hypothesized that autistic individuals show attenuated MSI, particularly for speech stimuli. Our results provide compelling evidence that the presumption that this difference extends into adulthood is unfounded. After controlling for age, we found no significant difference between autistic and non-autistic individuals in susceptibility to the McGurk/MacDonald illusion. Because this ran against the grain of the findings of the largest meta-analysis on the topic (Zhang et al., 2019), we confirmed that group was not a significant factor in our results using both age-matched and Bayesian follow-up analyses (see Supplementary Materials). While Zhang et al. (2019) concluded that the difference between groups actually increases in magnitude with age, their meta-analysis included only one study with adults (Saalasti et al., 2012), which the original authors did not take as evidence for a difference in the strength of the McGurk/MacDonald effect. Moreover, some findings suggest that differences between autistic and non-autistic individuals in MSI may be resolved during adolescence (Foxe et al., 2015; Taylor, Isaac, & Milne, 2010). Our findings with adults are consistent with the trajectory of improvement these results imply.
Instead of a difference according to group, we found evidence that the degree of MSI increases with age (with the average rate of the illusion nearly tripling from the youngest to oldest participants) for both autistic and non-autistic individuals. While an increase in the rate of the McGurk/MacDonald effect between younger and older adults has been detected in non-autistic participants (Setti et al., 2013), this is the first study to compare this age effect between autistic and non-autistic samples. The near-perfect overlap between groups in the correlation between age and MSI serves as compelling evidence that while autistic children may not experience the development of visual influence on speech perception as early as their non-autistic peers, autistic adults do show comparable visual influence into their older years. These findings of similar age effects across adulthood resonate with recent longitudinal research suggesting similar cognitive aging profiles between autistic and non-autistic individuals (Torenvliet et al., 2023).
The reason for such a strong effect of age on the rate of the illusion could be reduced reliability of the auditory signal resulting from the progressive hearing loss common in aging, which often goes uncorrected (Walling & Dickson, 2012). The comparative reliability of auditory and visual inputs has been shown to affect the rate at which the McGurk/MacDonald effect occurs, and their respective influence shifts during development (Hirst et al., 2018). MSI may also serve a compensatory role in speech perception as hearing declines. Both notions are supported by research showing an increase in MSI and visual dominance later in life (Diaconescu et al., 2013), as well as enhanced susceptibility to the McGurk/MacDonald effect associated with age-related hearing loss (Rosemann & Thiel, 2018; Stropahl & Debener, 2017). Cortical reorganization leading to increased functional connectivity between auditory and visual regions may facilitate these effects in those with age-related hearing loss (Puschmann & Thiel, 2017). It is encouraging that MSI appears to serve this compensatory role as effectively in autistic adults as in non-autistic ones.
Another potential factor in differences between our findings and others is the threat of an attentional confound. Autistic children have been shown to demonstrate an atypical preference for non-social stimuli, viewing faces less frequently than their non-autistic peers (Gale et al., 2019; Vacas et al., 2021). Additionally, two McGurk/MacDonald studies using eye-tracking found that autistic children attended less to the pertinent areas of the face than non-autistic ones (Feng et al., 2021; J. R. Irwin et al., 2011), partially explaining differences in susceptibility to the illusion. Accordingly, studies that do not control for visual attention may overstate differences in MSI. A merit of our design is that while we do not directly measure eye movements, our simultaneity judgment task requires participants to attend to the mouth during trials. The performance of participants on this task, resembling a typical Gaussian distribution peaking near simultaneity, suggests that they were indeed attending to the faces. While the addition of eye-tracking would help to confirm this, in online experiments such as ours, where it is not possible for privacy reasons, the addition of a simultaneity judgment task provides an excellent means of reducing the risk of attentional differences being conflated with differences in MSI.
Beyond our findings with regard to the McGurk/MacDonald illusion, our results have illuminated much about the nuances of temporal processing and how they compare between autistic and non-autistic individuals. In many ways, our results remained consistent with standard findings in fundamental temporal processing research. Synchrony distributions followed a typical Gaussian shape, peaking with a slight visual lead, as is consistently found with audiovisual stimuli (Dixon & Spitz, 1980; Slutsky & Recanzone, 2001; Zampini et al., 2005). Incongruent stimuli were perceived as synchronous significantly less frequently than congruent ones, as was shown in other studies measuring simultaneity judgments for McGurk/MacDonald stimuli (Jertberg et al., 2023; van Wassenhove et al., 2007). Rapid temporal recalibration was detected, with the PSS shifting according to the modality order of the previous trial (Van der Burg et al., 2013, 2015, 2018). However, our results also captured novel differences between groups.
Firstly, with regard to synchrony distributions, we found differences in the magnitude of the effect of congruency according to group. Both groups were less likely to perceive incongruent stimuli as synchronized, but this effect was particularly pronounced in the non-autistic group. This was true even at 0 ms, where the rate at which participants recognized the physical simultaneity of the stimuli fell from 91.8% to 46.2% in the non-autistic group (a drop of 45.6 percentage points) and from 89.7% to 50.8% in the autistic group (a drop of 38.9 percentage points). This suggests a profound interference of phonetic incongruence on basic temporal processing, one that van Wassenhove et al. (2007) attributed to a weaker correlation between the facial kinematics (what is seen) and acoustic dynamic envelope (what is heard). But why does the magnitude of this difference vary between autistic and non-autistic individuals, when the disparity between these factors remains the same?
One interpretation might be that the autistic participants simply have a lower temporal resolution than the non-autistic ones, and therefore less room for interference in temporal processing. However, we did not replicate findings that the WPS, the common measure of temporal acuity, differs between groups, so this interpretation is not supported by our results. Alternatively, these differences could be due to impoverished lip-reading ability, which has been found to account for some or all of the disparity in susceptibility to the McGurk/MacDonald effect in autistic children (Iarocci et al., 2010; Smith & Bennetto, 2007). Impoverished lip-reading ability may be viewed as a weaker association between a viseme and its associated phoneme. This may translate into a diminished incongruence effect, as the autistic participants would be less sensitive to the difference driving it. That being said, were this the case, one might also expect an attenuated visual influence of the visemes, and hence a lower rate of the McGurk/MacDonald effect, in the autistic participants, which we did not observe. As such, further research into the lip-reading abilities of autistic adults and their potential influence on temporal processing of audiovisual speech stimuli is necessary.
Delving deeper into the temporal dynamics at play, we did not detect the previously reported differences between groups in the WPS or rapid temporal recalibration. With regard to the WPS, the largest meta-analysis to date examining potential differences between autistic and non-autistic participants found a consistent enlargement of its width among those with autism (Zhou et al., 2018), suggesting blunted temporal acuity. However, there was again a limited number of studies germane to the topic (with only four studies investigating the audiovisual WPS), most had small samples (ranging from 32 to 64 participants), and all of them focused on children. More recent research involving adults paints a different picture. Two studies (Weiland et al., 2022; Zhou et al., 2022) with larger samples of adults found no difference between autistic and non-autistic participants in the width of the WPS, suggesting that autistic individuals may also catch up in the honing of temporal processing by the time they reach adulthood. A very similar pattern emerges with rapid temporal recalibration, where smaller studies with younger participants found differences between autistic and non-autistic individuals (J. Noel et al., 2017; Turi et al., 2016), but the largest adult study did not (Weiland et al., 2022). However, the research here is more limited, and Weiland et al. (2022) also recruited from the NAR, so their sample may partially overlap with ours. Accordingly, further examination of potential differences between autistic and non-autistic individuals in the WPS and rapid temporal recalibration (and the possibility of their resolution) is warranted.
We did, however, detect a difference between groups in the overall mean PSS value. Autistic participants showed a greater mean PSS, irrespective of stimulus type, suggesting a heightened sensory preference for visual lead. This finding may also explain the difference in the magnitude of the congruence effect between groups, at least in part, given that it was largest with a slight visual lead. The two most obvious potential explanations for the PSS difference would be either faster processing of auditory information or slower processing of visual information in autism. Unfortunately, there is little research into this topic, at least with regard to speech stimuli. Furthermore, as is often the case, the studies that do exist tend to focus on children. We could not find studies investigating differences in auditory processing speed between autistic and non-autistic individuals. However, those that have looked into visual processing speed show faster visual processing in autism, if anything (Foss-Feig et al., 2013; Samson et al., 2011). Granted, this was in more basic sensory processing, such as recognition of motion (Foss-Feig et al., 2013), but it certainly does not provide evidence for a unisensory processing speed explanation. More research should be done into processing speed for the different sensory modalities in autism, both with simple and complex stimuli.
An alternative explanation falls more in line with our discussion of differences in representation of visual speech stimuli. If autistic individuals have differently developed representations of verbal lip movements (as suggested by their weaker lip-reading abilities) and weaker associations between them and the sounds of language, as suggested by van Wassenhove et al. (2007), it stands to reason that it might take them more time to interpret lip movements and integrate them with their corresponding vocal sounds. This might translate into a greater sensory preference for visual lead when processing speech stimuli. However, given the dearth of evidence provided by the literature on the alternative sensory processing speed hypotheses, this interpretation is highly speculative, and further research should explore the factors contributing to differences in PSS between autistic and non-autistic individuals. An excellent starting point would be to see whether this difference is unique to speech stimuli (supporting the notion that it is driven by differences in representation of verbal mouth movements) or whether it applies more broadly to simple audiovisual stimuli (suggesting a basic sensory processing speed explanation).
While the large size of our sample and sound experimental design are strengths of our study, it is, of course, not without its limitations. Firstly, this experiment was part of a larger online experimental battery, which placed constraints on the number of trials participants could complete. A larger number of trials and range of SOAs would have allowed more sophisticated analyses of temporal processing and higher resolution representation of participants’ WPS and recalibration effects. This also would have allowed us to investigate potential effects of congruence on recalibration and, conversely, of recalibration on the likelihood that participants perceive the illusion. A related shortcoming of this study is that the time limitation meant we were unable to include unisensory trial types. These allow a researcher to quantify participants’ ability to identify visemes and phonemes on their own, which is important as autistic children have shown differences in their lip-reading abilities when compared to non-autistic ones (Foxe et al., 2015; Iarocci et al., 2010; J. R. Irwin et al., 2011; Smith & Bennetto, 2007; Taylor et al., 2010). While we did not find a group difference in audiovisual speech processing, we are unable to speak to the influence of unisensory factors in our findings. Future research should assess the degree to which autistic and non-autistic adults may differ in their perception of visemes and phonemes in isolation as well as in combination to better isolate any potential differences in MSI.
Finally, it must be noted that well-educated adults with comparatively high IQs are overrepresented in the NAR sample (Scheeren et al., 2022). It could be argued that our sample is therefore less likely to capture the segments of the autistic population that may suffer from the most severe deficits in areas like MSI. In particular, those with intellectual disabilities are underrepresented. That being said, the parity in IQ (as estimated by the ICAR) between groups does mean that our results can speak directly to differences resulting from the sensory factors related to autism that are not confounded by cognitive ones related to intellectual impairment. If differences between groups in MSI were found to be driven by the individuals who are underrepresented here, it would be unclear whether they were due to autism or intellectual disability.
In conclusion, our study has confirmed several findings with regard to basic temporal and multisensory processing, as well as challenged the degree to which reported differences between autistic and non-autistic children in these areas extend to adulthood. Our finding that MSI, temporal processing acuity, and rapid temporal recalibration all seem to be intact among autistic adults is highly encouraging given the essential role MSI has in speech perception and compensation for the unisensory deterioration that is inevitable with aging. Additionally, our novel findings with regard to differences in the degree of interference in temporal processing posed by incongruent stimuli and in the mean PSS values between groups are intriguing, and demand further research to disentangle alternative explanations. Understanding these phenomena is of paramount importance given the relevance of temporal and multisensory processing to higher-order social factors and the proven efficacy of multisensory training. Pinpointing the age at which related interventions may be of use is crucial to their proper timing, which our findings suggest is prior to adulthood. Finally, our results underline the importance of expanding sample sizes and age ranges in autism research. Restricting our focus to children does a disservice to the vast majority of individuals with autism and leads to a limited understanding of the broader trajectory of this developmental condition, which can only be broadened by giving autistic adults the attention they deserve.