Figure 2. The pipeline of QSPR model construction.
(i) The first step is a unique representation of the molecular structure,
which is vital for reliable property prediction.43,44 Here, the structures of polyesters are approximated by RRU.
(ii) Subsequently, norm descriptors are adopted to characterize the
RRU-based polymer structures. These descriptors encode both the properties
and the topological connections of each atom. The topological
connections are represented by a step matrix (MS), which records
the positional relationship between the atoms in a molecule. In this work, 10
basic step matrices (i.e., MS_F, MS_A, MS_B, MS_C, MS_AB, MS_ABC,
MS_bon, MS_ABC_aro, MS_ABC_cyc, and MS_bon_cyc) are derived according to the
definitions given by Eqs. (S1)-(S10) in the Supporting Information.
The property information of an atom refers to basic atomic properties
(e.g., ionization energy and number of outermost electrons),
which are expressed in the form of a property matrix (P), as
shown in Table S1. The atomic distribution matrix (M) is then
generated by combining MS and P according to Eq. (1).
Finally, the norm descriptors are obtained by applying seven norm indices,
whose formulas are given by Eqs. (S11)-(S17) in the Supporting
Information.
(iii) The dimensionality of the norm descriptors is then reduced by
bidirectional stepwise regression, which retains 29 norm descriptors.
Lastly, the QSPR model relating chemical structures to the
targeted properties is developed through MLR.
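As an illustration of step (ii), the following minimal Python sketch builds one step matrix for a toy three-atom graph, combines it with an atomic property, and evaluates a few matrix norms. It is not the definition used in this work: the step-matrix rule (entry 1 when two atoms are exactly `step` bonds apart), the column-wise weighting used in place of Eq. (1), and the three norms chosen are all assumptions standing in for Eqs. (S1)-(S17), and the function names are hypothetical.

```python
from collections import deque

import numpy as np


def topological_distances(n_atoms, bonds):
    """All-pairs shortest path lengths (counted in bonds) via BFS from each atom."""
    adj = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    dist = np.full((n_atoms, n_atoms), -1, dtype=int)  # -1 marks "not reached"
    for start in range(n_atoms):
        dist[start, start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[start, v] < 0:
                    dist[start, v] = dist[start, u] + 1
                    queue.append(v)
    return dist


def step_matrix(dist, step):
    """ASSUMED rule: entry is 1 when atoms i and j are exactly `step` bonds apart."""
    return (dist == step).astype(float)


def distribution_matrix(ms, prop):
    """ASSUMED stand-in for Eq. (1): weight column j of MS by the property of atom j."""
    return ms * np.asarray(prop, dtype=float)[np.newaxis, :]


def norm_descriptors(m):
    """Three standard matrix norms, used here in place of the seven norm indices."""
    return {
        "frobenius": np.linalg.norm(m, "fro"),
        "one": np.linalg.norm(m, 1),        # maximum absolute column sum
        "spectral": np.linalg.norm(m, 2),   # largest singular value
    }


# Toy heavy-atom chain C-C-O (atom indices 0, 1, 2); not a polyester RRU.
bonds = [(0, 1), (1, 2)]
outer_electrons = [4, 4, 6]  # number of outermost electrons for C, C, O

dist = topological_distances(3, bonds)
ms1 = step_matrix(dist, 1)
m = distribution_matrix(ms1, outer_electrons)
desc = norm_descriptors(m)
print(desc)
```

Repeating the last four lines for each step matrix and each atomic property would yield one descriptor per (step matrix, property, norm) combination, which is the pool that step (iii) then reduces.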
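Step (iii) can be sketched in the same spirit. The example below runs a bidirectional stepwise selection on synthetic data and then fits a multiple linear regression on the retained columns. The entry and removal criterion is not stated here, so the sketch assumes the Akaike information criterion (AIC); the data, the function names, and the choice of criterion are hypothetical and are not taken from this work.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, RSS)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return coef, float(resid @ resid)


def aic(X, y, cols):
    """AIC of the MLR model built on the given columns (ASSUMED criterion)."""
    n = len(y)
    _, rss = fit_mlr(X[:, cols], y)
    k = len(cols) + 1  # slopes plus intercept
    return n * np.log(max(rss, 1e-12) / n) + 2 * k


def bidirectional_stepwise(X, y):
    """Greedy search: at each round, try adding every excluded column and
    removing every included one, and accept the single move that lowers AIC most."""
    selected = []
    best = aic(X, y, selected)
    while True:
        moves = []
        for j in range(X.shape[1]):
            if j in selected:
                trial = [c for c in selected if c != j]  # backward move
            else:
                trial = selected + [j]                   # forward move
            moves.append((aic(X, y, trial), trial))
        score, trial = min(moves, key=lambda t: t[0])
        if score >= best - 1e-9:  # no move improves the criterion
            return sorted(selected)
        best, selected = score, trial


# Synthetic data: 6 candidate descriptors, only columns 1 and 4 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 6))
y = 0.5 + 2.0 * X[:, 1] - 3.0 * X[:, 4] + 0.1 * rng.normal(size=60)

selected = bidirectional_stepwise(X, y)
coef, rss = fit_mlr(X[:, selected], y)
print("retained columns:", selected)
print("intercept and slopes:", coef)
```

On this synthetic data the retained set should include columns 1 and 4, with fitted slopes close to the generating values. A p-value threshold or adjusted R² could be substituted for AIC by changing only the `aic` function.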