Figure 2. The pipeline of QSPR model construction.
(i) The first step is to represent the molecular structure uniquely, which is vital for reliable property prediction.43,44 Here, the structure of each polyester is approximated by its RRU.
(ii) Norm descriptors are then adopted to characterize the RRU-based polymer structures; they encode both the properties of each atom and the topological connections between atoms. The topological connections are represented by a step matrix ($M_S$), which describes the relative positions of the atoms in a molecule. In this work, 10 basic step matrices ($M_S^{\mathrm{F}}$, $M_S^{\mathrm{A}}$, $M_S^{\mathrm{B}}$, $M_S^{\mathrm{C}}$, $M_S^{\mathrm{AB}}$, $M_S^{\mathrm{ABC}}$, $M_S^{\mathrm{bon}}$, $M_S^{\mathrm{ABC\_aro}}$, $M_S^{\mathrm{ABC\_cyc}}$, and $M_S^{\mathrm{bon\_cyc}}$) are derived according to the definitions given by Eqs. (S1)-(S10) in the Supporting Information.
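The exact definitions of the ten step matrices are in Eqs. (S1)-(S10) and are not reproduced in this section. As a minimal sketch, assuming a generic step matrix whose entry (i, j) counts the bonds on the shortest path between atoms i and j of a hydrogen-suppressed, connected graph, the matrix can be built by breadth-first search. The function name `step_matrix` and the ester fragment used below are hypothetical illustrations, not the paper's implementation.

```python
from collections import deque

import numpy as np


def step_matrix(n_atoms, bonds):
    """Hypothetical generic step matrix for a connected molecular graph.

    Entry (i, j) is the number of bonds on the shortest path between atoms
    i and j (0 on the diagonal). The ten specialized variants defined by
    Eqs. (S1)-(S10) are NOT reproduced here; this is an assumption.
    """
    # Adjacency list built from (i, j) bond pairs of a hydrogen-suppressed graph.
    neighbors = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    ms = np.zeros((n_atoms, n_atoms), dtype=int)
    for source in range(n_atoms):
        # Breadth-first search yields shortest path lengths in an unweighted graph.
        dist = [-1] * n_atoms
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        ms[source] = dist
    return ms


# Ester fragment C-C(=O)-O-C: atoms 0 C, 1 C, 2 O (carbonyl), 3 O (ester), 4 C.
print(step_matrix(5, [(0, 1), (1, 2), (1, 3), (3, 4)]))
```

For this fragment, row 0 of the result is `[0 1 2 2 3]`: the terminal methyl carbon is three bonds away from the other terminal carbon. Specialized variants (e.g. ones restricted to aromatic or cyclic atoms) would change which atoms and paths are counted, which this sketch does not attempt.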
The property information of an atom consists of basic atomic properties (e.g., ionization energy and the number of outermost electrons), which are expressed as a property matrix ($P$), as listed in Table S1. The atomic distribution matrix ($M$) is then generated by combining $M_S$ and $P$ according to Eq. (1). Finally, the norm descriptors are obtained by applying 7 norm indexes, whose formulas are given by Eqs. (S11)-(S17) in the Supporting Information.
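Neither Eq. (1) nor the seven norm indexes are written out in this section, so the sketch below uses clearly hypothetical placeholders to show the shape of the computation only. It assumes that Eq. (1) weights each step-matrix entry by the properties of the two atoms involved, $M_{ij} = p_i \, M_{S,ij} \, p_j$, using a single property per atom (the number of outermost electrons), and it substitutes seven matrix norms that `numpy.linalg.norm` supports for the norm indexes of Eqs. (S11)-(S17).

```python
import numpy as np

# Placeholder norm indexes: seven matrix norms available in np.linalg.norm.
# These are assumptions, NOT the formulas of Eqs. (S11)-(S17).
NORM_ORDERS = {
    "max_col_sum": 1,
    "min_col_sum": -1,
    "spectral": 2,
    "max_row_sum": np.inf,
    "min_row_sum": -np.inf,
    "frobenius": "fro",
    "nuclear": "nuc",
}


def distribution_matrix(ms, p):
    """Hypothetical stand-in for Eq. (1): M_ij = p_i * MS_ij * p_j."""
    p = np.asarray(p, dtype=float)
    return np.outer(p, p) * np.asarray(ms, dtype=float)


def norm_descriptors(ms, p):
    """Map one (step matrix, atomic property vector) pair to seven descriptors."""
    m = distribution_matrix(ms, p)
    return {name: float(np.linalg.norm(m, ord=order)) for name, order in NORM_ORDERS.items()}


# Step matrix of the ester fragment C-C(=O)-O-C and its outermost-electron counts.
MS = np.array([
    [0, 1, 2, 2, 3],
    [1, 0, 1, 1, 2],
    [2, 1, 0, 2, 3],
    [2, 1, 2, 0, 1],
    [3, 2, 3, 1, 0],
])
P = [4, 4, 6, 6, 4]
print(norm_descriptors(MS, P))
```

Under these assumptions, every (step matrix, property) pair yields seven numbers, so the full descriptor pool grows multiplicatively with the number of step matrices and atomic properties; that size is what motivates the descriptor screening in step (iii).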
(iii) Bidirectional stepwise regression is then used to reduce the dimensionality of the norm descriptors, and 29 norm descriptors are ultimately retained. Lastly, the QSPR model relating chemical structures to the target properties is developed by multiple linear regression (MLR).