Introduction
Synthetic polymers are indispensable in our daily
life.1, 2 Polyesters, in particular biodegradable
polyesters, are widely used in automotive parts, medical apparatus,
packaging products, electronic devices, and other fields owing to their
good thermomechanical properties and
biocompatibility.3-5 Polyesters generally
consist of ester-containing repeating units produced by
esterification between diacids and diols.6, 7 Thus, combining
different functional diacids with various diols can
yield an enormous space of polyester
materials. As a result, designing and synthesizing polyesters with
targeted properties is a non-trivial task.
The glass transition temperature (Tg) of a polymer
governs the dynamic state of its chains and thereby affects its
performance and application domains. For example,
high-Tg polyesters with rigid ring structures
offer improved thermal stability and have been pursued as
bio-based polymers for the plastic consumer
market.8-10 In addition, aliphatic polyesters with low Tg have been studied as environmentally friendly
pressure-sensitive adhesives because of their low cost and potential
biodegradability.11, 12 Tg is
therefore an essential indicator for determining the properties of
polymers.
Given the large polymer design space, screening polymers with targeted
properties (e.g., a specific Tg) through experimental procedures is
difficult, time-consuming, and inefficient.13-15 To enable rapid polymer molecular
design and high-throughput screening of ideal products prior to
laboratory synthesis and analysis, data-driven
alternatives,16-23 such as quantitative
structure-property relationship (QSPR) modeling24-28 and machine learning (ML) approaches,29-33 have been
successfully used to predict the properties of diverse polymers.
In this regard, Wang et al. trained ML models
to predict the gas permeability of more than 11,000 homopolymers and
synthesized two promising polymeric membranes that exceeded the upper
bound for CO2/CH4 separation.34 Recently, Tao et al. first compared the
performance of 79 different models built by combining polymer representations,
feature engineering, and ML algorithms.35, 36 They then
designed millions of hypothetical polyimides by polycondensation of
existing dianhydrides with diamines or diisocyanates, built an ML
model to predict a variety of their properties, and verified the
predictive ability of the model through molecular dynamics simulations.
Finally, a new polyimide with excellent thermal stability was
successfully synthesized. Using trained ML
models, Wang and Jiang screened nearly 30,000 hypothetical polymers
with fractional free volume (FFV) > 0.2, enabling the
design of high-FFV polymers.37 Wang et al. presented a
method for designing high-temperature polymer dielectrics by combining
tailored structural units, and justified the design method by
analyzing ML predictions and experimental results.38 Chen et al. developed an ML model that accurately predicts
frequency-dependent dielectric constants (ϵ), and subsequently used
the model to design ten polymers with the desired ϵ and Tg for applications in capacitors and
microelectronics.39 Meanwhile, Lin et al.
proposed a materials genome approach for designing and screening new
heat-resistant resin materials by defining genes and extracting key
features of properties.40-42 This strategy was
subsequently used in the design of various high-performance polymers.
These advances highlight the innovative potential of data-driven
approaches in the design of polymers. However, the rational design of
polyester materials remains comparatively unexplored. Therefore, it is promising
to use data-driven methods to predict the target properties of
polyesters and to design new polyesters to complement the existing
library.
Herein, we report a data-driven strategy to evaluate the
relationship between molecular structure and the Tg of polyesters and to guide the design of novel polyesters with
specific Tg values. The workflow is illustrated in Figure 1. First, a multiple linear regression (MLR)-based QSPR
model is developed by employing the ring repeating unit
(RRU)43, 44 to uniquely represent polyesters and norm
descriptors for feature engineering. The predictability, robustness, and
chance correlation of the model are evaluated by internal validation,
external validation, and Y-randomization analysis,
respectively. We then construct a virtual library of over
95,000 hypothetical polyesters by in-silico retrosynthesis. Next, their Tg values are predicted using the
trained QSPR model. Finally, several polyesters with specific Tg are synthesized and characterized to validate
the data-driven polymer design strategy. This work takes the QSPR
modeling approach a step further by expanding its application
scope from property prediction to model-based design of polymers.
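
As a rough illustration of the three validation steps named above, the following Python sketch fits an MLR model to synthetic data and computes a training R2, a leave-one-out Q2 (internal validation), an R2 on a held-out set (external validation), and the R2 values obtained after shuffling the responses (Y-randomization). Everything in it is an assumption made for illustration: the descriptor matrix is random rather than built from RRU-based norm descriptors, the coefficients, noise level, and 90/30 split are arbitrary, and the function names are hypothetical. It does not reproduce the model or the statistics of this work.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Apply the fitted intercept and slopes to a descriptor matrix."""
    return coef[0] + X @ coef[1:]


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def loo_q_squared(X, y):
    """Internal validation: leave-one-out cross-validated Q^2."""
    preds = np.empty_like(y)
    idx = np.arange(len(y))
    for i in idx:
        mask = idx != i  # refit without sample i, then predict it
        coef = fit_mlr(X[mask], y[mask])
        preds[i] = predict_mlr(coef, X[i:i + 1])[0]
    return r_squared(y, preds)


def y_randomization(X, y, n_trials=50, seed=0):
    """Chance-correlation check: refit on shuffled responses, collect R^2."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_trials):
        y_perm = rng.permutation(y)
        coef = fit_mlr(X, y_perm)
        scores.append(r_squared(y_perm, predict_mlr(coef, X)))
    return np.array(scores)


# Synthetic stand-in data (hypothetical): 120 "polymers", 4 descriptors,
# responses on a Tg-like scale in kelvin with Gaussian noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 4))
y = 320.0 + X @ np.array([30.0, -15.0, 8.0, 0.0]) + rng.normal(scale=5.0, size=120)

# Arbitrary split into a training set and an external test set.
X_train, y_train = X[:90], y[:90]
X_test, y_test = X[90:], y[90:]

coef = fit_mlr(X_train, y_train)
r2_train = r_squared(y_train, predict_mlr(coef, X_train))
q2_loo = loo_q_squared(X_train, y_train)
r2_ext = r_squared(y_test, predict_mlr(coef, X_test))
scrambled = y_randomization(X_train, y_train)

print(f"R2 (training)        = {r2_train:.3f}")
print(f"Q2 (leave-one-out)   = {q2_loo:.3f}")
print(f"R2 (external set)    = {r2_ext:.3f}")
print(f"max R2 (Y-scrambled) = {scrambled.max():.3f}")
```

In a check of this kind, a model is usually considered free of chance correlation when the R2 values from the scrambled responses fall well below the R2 of the original fit; the acceptance thresholds themselves vary between studies and are not specified here.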