2.7 Repeats analysis
UniProt annotations were used to identify proteins in the dataset that contain sequence repeats. RepeatsDB was used to identify structural repeats that occupy at least one domain in proteins of the dataset. We used the SCOPe “sccs” identifier, truncated at the superfamily level, to define homodomain-containing proteins, and the fold-level classification to assign a fold to each domain.
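As an illustration of the sccs-based step, the following is a minimal sketch and not the pipeline actually used. It relies on the fact that an SCOPe sccs string has the form class.fold.superfamily.family (for example, b.1.1.1). It assumes that a protein counts as homodomain-containing when at least two of its domains share the same superfamily-level prefix. The function names and the min_copies parameter are hypothetical.

```python
from collections import Counter

# Number of dot-separated fields kept at each SCOPe level.
SCCS_DEPTH = {"class": 1, "fold": 2, "superfamily": 3, "family": 4}


def sccs_prefix(sccs: str, level: str) -> str:
    """Truncate an sccs string (e.g. 'b.1.1.1') at the given level."""
    depth = SCCS_DEPTH[level]
    parts = sccs.strip().split(".")
    if len(parts) < depth:
        raise ValueError(f"sccs '{sccs}' has no {level} level")
    return ".".join(parts[:depth])


def is_homodomain(domain_sccs: list[str], min_copies: int = 2) -> bool:
    """Assumed definition: True if at least `min_copies` domains of one
    protein share the same superfamily-level sccs prefix."""
    counts = Counter(sccs_prefix(s, "superfamily") for s in domain_sccs)
    return any(n >= min_copies for n in counts.values())


def domain_folds(domain_sccs: list[str]) -> list[str]:
    """Sorted, de-duplicated fold-level prefixes of a protein's domains."""
    return sorted({sccs_prefix(s, "fold") for s in domain_sccs})
```

For example, a protein whose domains are annotated b.1.1.1 and b.1.1.4 would be flagged as homodomain-containing under this assumption, because both domains truncate to the superfamily b.1.1, while a protein with domains b.1.1.1 and c.2.1.3 would not.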