We used the SCOPe database (dir.cla.scope 2.07) for structural domain definitions. Proteins with a single domain, or with only a single domain available in the database, were removed. Entries from unsuitable SCOPe classes (low-resolution protein structures, peptides, designed proteins, and artifacts, belonging to classes I, J, K, and L, respectively) were also removed. To conduct the analyses on a non-redundant protein set, proteins were clustered with CD-HIT at a 40% sequence identity threshold. The resulting entries were filtered in RCSB for monomeric proteins solved by X-ray crystallography at a resolution of 3 Å or better, using the asymmetric unit, biological unit, experimental method, and resolution as filter parameters. Next, only proteins with exactly two domains according to the SCOPe definitions were retained, and only continuous domains were considered. Finally, for each protein, the RCSB entry with the best resolution was taken as the representative structure. The filtering steps used to create the dataset are summarized in Figure 1.
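The SCOPe-based parts of this filtering can be illustrated with a minimal Python sketch. It is not the pipeline used in this work, and it rests on several assumptions: that each non-comment line of the classification file is tab-separated with the domain identifier, PDB identifier, region, and sccs string in the first four columns; that class letters appear in lowercase as the first component of the sccs string; that a discontinuous domain is one whose region lists several comma-separated segments; and that an entry is dropped whole if any of its domains falls in an excluded class. The function names and the `resolution` mapping are hypothetical, and the CD-HIT clustering and RCSB filtering steps are not reproduced here.

```python
from collections import defaultdict

# Assumption: SCOPe class letters are lowercase in the sccs field (e.g. "a.1.1.1").
EXCLUDED_CLASSES = {"i", "j", "k", "l"}


def parse_cla(lines):
    """Yield (sid, pdb_id, region, sccs) from classification-file lines.

    Assumes tab-separated columns and '#' comment lines.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sid, pdb_id, region, sccs = line.split("\t")[:4]
        yield sid, pdb_id, region, sccs


def is_continuous(region):
    """Treat a region with a single segment (no comma) as continuous."""
    return "," not in region


def two_domain_entries(lines):
    """Return PDB entries with exactly two continuous, non-excluded domains."""
    by_pdb = defaultdict(list)
    for sid, pdb_id, region, sccs in parse_cla(lines):
        by_pdb[pdb_id].append((sid, region, sccs))

    kept = {}
    for pdb_id, domains in by_pdb.items():
        if len(domains) != 2:
            continue  # drops single-domain and 3+ domain entries
        if any(sccs.split(".")[0] in EXCLUDED_CLASSES for _, _, sccs in domains):
            continue  # assumption: drop the whole entry
        if not all(is_continuous(region) for _, region, _ in domains):
            continue
        kept[pdb_id] = domains
    return kept


def pick_representative(pdb_ids, resolution):
    """Pick the entry with the lowest (best) resolution value in angstroms.

    `resolution` is a hypothetical mapping from PDB identifier to resolution.
    """
    return min(pdb_ids, key=lambda pdb_id: resolution[pdb_id])
```

Grouping by PDB identifier alone is only reasonable here because the dataset is restricted to monomeric proteins; for multi-chain entries the grouping key would need to include the chain.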