Jingjing Huang - Authorea

Large language models (LLMs) continue to push the boundaries of what machines can understand and generate. Evaluating their intellectual and linguistic capabilities with standardized intelligence tests such as the Wechsler Adult Intelligence Scale (WAIS) offers a novel way to characterize their cognitive strengths and limitations. This study evaluates Baidu Ernie and OpenAI ChatGPT, comparing their performance on IQ tests and Chinese linguistic tasks. On the IQ assessments, OpenAI ChatGPT achieved a marginally higher composite IQ score, excelling particularly in verbal comprehension and working memory. Baidu Ernie performed better on cultural appropriateness and linguistic accuracy, reflecting its strong alignment with the Chinese language and cultural context. The methodology involved translating the WAIS into Chinese, integrating multimodal inputs, and applying statistical analyses to check that the results were robust and reliable. The findings highlight the distinct strengths of each model: OpenAI ChatGPT showed versatility in handling diverse textual data, while Baidu Ernie excelled at culturally relevant and grammatically precise responses. For future LLM development, the results point to the importance of contextually relevant training data and the integration of multimodal capabilities for improving cognitive and linguistic performance. The evaluation framework can guide future research and development towards more intelligent, adaptable, and culturally aware language models.