Introduction
Synthetic polymers are indispensable in our daily life.1, 2 Polyesters, in particular biodegradable polyesters, are widely used in automotive parts, medical apparatus, packaging products, electronic devices, and other fields owing to their good thermomechanical properties and biocompatibility.3-5 Polyesters generally consist of ester-containing repeating units produced by esterification reactions between diacids and diols.6, 7 Thus, combining different functional diacids with various diols yields an enormous space of possible polyester materials. As a result, designing and synthesizing polyesters with targeted properties is a non-trivial task.
The glass transition temperature (Tg) of a polymer governs the dynamic state of its chains and thereby affects its performance and application domains. For example, high-Tg polyesters with rigid ring structures show improved thermal stability and are being developed as bio-based polymers for the consumer plastics market.8-10 In addition, aliphatic polyesters with low Tg have been studied as environmentally friendly pressure-sensitive adhesives because of their low cost and potential biodegradability.11, 12 Tg is therefore an essential indicator for determining the properties of polymers.
Given the large polymer design space, screening polymers with targeted properties (e.g., a specific Tg) through experimental procedures is difficult, time-consuming, and inefficient.13-15 To enable rapid molecular design and high-throughput screening of promising candidates prior to laboratory synthesis and analysis, data-driven alternatives,16-23 such as quantitative structure-property relationship (QSPR) modeling24-28 and machine learning (ML) approaches,29-33 have been successfully used to predict the properties of diverse polymers.
In this regard, Wang et al. trained ML models to predict the gas permeability of more than 11,000 homopolymers and synthesized two promising polymer membranes that exceeded the upper bound for CO2/CH4 separation.34 Recently, Tao et al. first studied the performance of 79 different models obtained by combining polymer representations, feature engineering, and ML algorithms.35, 36 They then designed millions of hypothetical polyimides by polycondensation of existing dianhydrides with diamines or diisocyanates, built an ML model to predict a variety of their properties, and verified the predictive ability of the model through molecular dynamics simulations. Finally, a new polyimide with excellent thermal stability was successfully synthesized. Using trained ML models, Wang and Jiang screened nearly 30,000 hypothetical polymers with fractional free volume (FFV) > 0.2, enabling the design of high-FFV polymers.37 Wang et al. presented a method for designing high-temperature polymer dielectrics by combining tailored structural units, and justified the design method by analyzing ML predictions and experimental results.38 Chen et al. developed an ML model that accurately predicts frequency-dependent dielectric constants (ϵ), and subsequently used the model to design ten polymers with the desired ϵ and Tg for capacitor and microelectronics applications.39 Meanwhile, Lin et al. proposed a material genome approach for the design and screening of new heat-resistant resin materials by defining genes and extracting the key features that govern properties.40-42 This strategy was subsequently used in the design of various high-performance polymers. These advances highlight the innovative potential of data-driven approaches in polymer design. However, the rational design of polyester materials remains less explored.
Therefore, it is promising to use data-driven methods to predict the target properties of polyesters and to design new polyesters to complement the existing library.
Herein, we report a data-driven strategy to evaluate the relationship between molecular structure and Tg of polyesters and to guide the design of novel polyesters with specific Tg values. The workflow is illustrated in Figure 1. First, a multiple linear regression (MLR)-based QSPR model is developed by employing the ring repeating unit (RRU)43, 44 to uniquely represent polyesters and norm descriptors for feature engineering. The predictability, robustness, and chance correlation of the model are evaluated by internal validation, external validation, and Y-randomization analysis, respectively. We then construct a virtual library of over 95,000 hypothetical polyesters designed by in-silico retrosynthesis, and predict their Tg using the trained QSPR model. Finally, several polyesters with specific Tg values are synthesized and characterized to validate the data-driven polymer design strategy. This work takes the QSPR modeling approach a step further by expanding its application scope from property prediction to the model-based design of polymers.
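To make the validation steps named above concrete, the following is a minimal sketch of an MLR model checked by internal validation, external validation, and Y-randomization. It is an illustration only and not the authors' implementation: the descriptor matrix is random synthetic data standing in for norm descriptors, leave-one-out cross-validation is assumed as the internal validation scheme, and the 160/40 split, the coefficients, and all function names are hypothetical.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    return coef[0] + X @ coef[1:]

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def loo_q_squared(X, y):
    """Internal validation: leave-one-out cross-validated Q^2 (assumed scheme)."""
    preds = np.empty_like(y, dtype=float)
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        coef = fit_mlr(X[mask], y[mask])
        preds[i] = predict_mlr(coef, X[i:i + 1])[0]
    return r_squared(y, preds)

def y_randomization(X, y, n_rounds, rng):
    """Chance-correlation check: refit on shuffled targets, collect each R^2."""
    scores = []
    for _ in range(n_rounds):
        y_perm = rng.permutation(y)
        coef = fit_mlr(X, y_perm)
        scores.append(r_squared(y_perm, predict_mlr(coef, X)))
    return np.array(scores)

# Synthetic stand-in data (hypothetical): 200 "polymers", 5 "descriptors",
# and a Tg-like target in kelvin generated from a known linear rule plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_coef = np.array([30.0, -20.0, 10.0, 5.0, -15.0])
y = 320.0 + X @ true_coef + rng.normal(scale=5.0, size=200)

# Random split into a training set and an external test set.
idx = rng.permutation(200)
train, test = idx[:160], idx[160:]

coef = fit_mlr(X[train], y[train])
r2_train = r_squared(y[train], predict_mlr(coef, X[train]))
q2_loo = loo_q_squared(X[train], y[train])                   # internal validation
r2_test = r_squared(y[test], predict_mlr(coef, X[test]))     # external validation
r2_random = y_randomization(X[train], y[train], n_rounds=50, rng=rng)

print(f"R2(train)={r2_train:.3f}  Q2(LOO)={q2_loo:.3f}  R2(test)={r2_test:.3f}")
print(f"max R2 over Y-randomized models={r2_random.max():.3f}")
```

In such a check, a model free of chance correlation would be expected to show R2 values for the Y-randomized fits far below that of the model fitted to the real targets, while Q2 and the external R2 stay close to the training R2.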