3.6 Permanent domains structurally prefer similar folds
Different types of domain interactions within a protein chain may influence the anatomy of the protein structure landscape. We therefore explored several structural aspects that might distinguish permanent from transient domain interactions within a protein. First, we examined their preference for repeats. Repeats are of two types, sequence repeats and structural repeats, and structural repeats are further divided into classes. Using several databases (see Methods) to identify the proteins in our dataset that contain repeats, we found only a few proteins with either repeat type; among these, proteins with permanent domain interactions showed a slightly higher preference for both sequence and structural repeats. However, this observation is not reliable because so few proteins were involved. Among the structural repeats, the repeating units (domains) of bead-on-string repeats (class IV) are thought to interact loosely or not at all, and could therefore have been interesting examples of transient domains in multi-domain proteins. However, none of the proteins with at least one structural-repeat-containing domain belonged to this class. This may reflect the limited information in the database or the limited number of domains representing multi-domain proteins in our study. Second, to overcome this limitation, we defined homodomains as domain pairs in which both domains share the same class, fold, and superfamily according to SCOPe. Such domains have similar architecture and are evolutionarily related, and are presumed to have originated by duplication. Using this definition, we observed that a comparatively higher proportion of proteins containing permanent domains have homodomains: 37.3%, compared with 28.6% homodomains in the dataset.
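The homodomain criterion above can be illustrated with a minimal Python sketch. It assumes that each domain is annotated with a SCOPe sccs string of the form class.fold.superfamily.family (for example, c.1.8.1), so that two domains form a homodomain pair when the first three fields match. The function names and the toy pairs are hypothetical and are not taken from the study's pipeline or dataset.

```python
from typing import Iterable, Tuple


def superfamily_key(sccs: str) -> Tuple[str, ...]:
    """Return the (class, fold, superfamily) fields of an sccs string."""
    parts = sccs.strip().split(".")
    if len(parts) < 3:
        raise ValueError(f"sccs string too short: {sccs!r}")
    return tuple(parts[:3])


def is_homodomain(sccs_a: str, sccs_b: str) -> bool:
    """True when both domains share class, fold, and superfamily."""
    return superfamily_key(sccs_a) == superfamily_key(sccs_b)


def homodomain_fraction(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of domain pairs that are homodomains (0.0 for no pairs)."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(is_homodomain(a, b) for a, b in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    # Toy pairs for illustration only (assumed, not from the dataset).
    toy_pairs = [
        ("c.1.8.1", "c.1.8.3"),   # same superfamily, different family
        ("c.1.8.1", "b.40.4.5"),  # different class
    ]
    print(homodomain_fraction(toy_pairs))
```

Comparing only the first three sccs fields means that domains from different families of the same superfamily still count as homodomains, which matches the definition given in the text.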
Although these homodomains may not be true tandem repeats, they can provide functional and structural advantages to proteins with permanent domains, owing to evolutionary pressure and topological constraints, respectively. Third, to investigate their structural constraints, we examined the fold distribution in homodomains. Proteins with permanent homodomains have comparatively fewer unique folds than those with transient homodomains, which could indicate a capacity to re-use folds. This suggests that when domains interact permanently in a protein, there is a greater chance of finding another interacting domain of common ancestry and similar structural topology. This observation parallels findings for PPI, where obligate PPI tend to have more homo-DDIs. When we considered the number of unique folds across the whole dataset, permanent and transient domain pairs showed similar counts. Qualitatively, however, we observed that some folds are biased toward permanent or transient domain interactions (Table 1). Superfolds such as the TIM beta/alpha-barrel, OB fold, and beta-grasp showed an inclination toward transient domains. In contrast, the 7-bladed beta-propeller, the Ribonuclease H-like motif fold, and a few others showed an inclination toward permanent domains. Superfolds such as the Immunoglobulin-like beta-sandwich and the DNA/RNA-binding 3-helical bundle showed preferences for both permanent and transient domains. Other sparsely occurring folds (frequency less than 5) showed little or no bias (Supplementary Tables S3 and S4). These observations show the structural preferences of the different domain interaction types and help explain how a limited number of folds are re-used to sample various protein structural landscapes in DDI, following a power law.
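The fold comparison described above amounts to counting distinct folds, and per-fold frequencies, separately for permanent and transient domain pairs. The sketch below shows one way this could be tabulated, assuming each domain pair is labelled "permanent" or "transient" and each domain carries a SCOPe sccs string whose first two fields (class.fold) identify the fold. All names and the example records are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def fold_id(sccs: str) -> str:
    """Return the 'class.fold' prefix of an sccs string, e.g. 'c.1.8.1' -> 'c.1'."""
    parts = sccs.strip().split(".")
    if len(parts) < 2:
        raise ValueError(f"sccs string too short: {sccs!r}")
    return ".".join(parts[:2])


def fold_counts_by_type(
    pairs: Iterable[Tuple[str, str, str]]
) -> Dict[str, Counter]:
    """Count fold occurrences per interaction type.

    Each record is (interaction_type, sccs_a, sccs_b); both domains
    of a pair contribute one count to their respective folds.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for interaction_type, sccs_a, sccs_b in pairs:
        for sccs in (sccs_a, sccs_b):
            counts[interaction_type][fold_id(sccs)] += 1
    return counts


def unique_fold_count(counts: Dict[str, Counter]) -> Dict[str, int]:
    """Number of distinct folds observed for each interaction type."""
    return {itype: len(folds) for itype, folds in counts.items()}


def permanent_share(counts: Dict[str, Counter], fold: str) -> Optional[float]:
    """Share of a fold's occurrences that fall in permanent pairs.

    Returns None when the fold is absent from both interaction types.
    """
    perm = counts.get("permanent", Counter())[fold]
    trans = counts.get("transient", Counter())[fold]
    total = perm + trans
    return None if total == 0 else perm / total


if __name__ == "__main__":
    # Made-up records for illustration; not drawn from Table 1.
    toy = [
        ("permanent", "b.69.4.1", "b.69.4.1"),
        ("permanent", "c.55.3.1", "c.55.3.5"),
        ("transient", "c.1.8.1", "b.40.4.5"),
        ("transient", "d.15.1.1", "c.1.2.4"),
    ]
    counts = fold_counts_by_type(toy)
    print(unique_fold_count(counts))
    print(permanent_share(counts, "c.1"))
```

A share near 1 would indicate a fold seen mostly in permanent pairs and a share near 0 a fold seen mostly in transient pairs; for sparsely occurring folds such a ratio is not informative, in line with the frequency cut-off mentioned in the text.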
These findings should shed light on the basic principles of predicting domain interaction type, given knowledge of the interacting domains in a protein, their topology, and their evolutionary information.