SuperLooper is an application for filling gaps in protein structures or replacing parts of protein structures with others,
e.g. in the framework of comparative/homology modelling. The application is based on a comprehensive compilation
of protein segments from the PDB. In addition, an analogous collection of membrane protein loops is integrated.
When a gap in a protein structure is specified, a fast database search is performed. The output is a list of protein
segments that fit into the gap with a tolerance of 0.75 Å. On top of this list, original loops from membrane proteins are suggested.
SuperLooper provides several features that allow the user to
select the appropriate segment and finally to create a new PDB file containing the structure with the newly built-in protein
segment.
The membrane planes, calculated with the help of the TMDET algorithm kindly provided by [2], indicate the transition from the lipophilic to the polar milieu.
Loops can be built in below, within or above this plane. We recommend that users of
SuperLooper select those loop conformations in which the majority of the hydrophobic
or polar loop residues are placed below or above the membrane plane, respectively.
To find a suitable loop conformation, a C- or N-terminal enlargement of the loop can be helpful.
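The recommendation above can be checked with a small helper. The following is only a sketch under stated assumptions: the residue grouping, the function name `side_fractions` and the sign convention (negative distance = below the plane, i.e. on the lipophilic side; positive = above, on the polar side) are illustrative choices and are not taken from SuperLooper itself.

```python
# Assumed residue classes (one-letter codes). This grouping is an illustrative
# assumption, not SuperLooper's own definition; glycine and proline are left
# unclassified here.
HYDROPHOBIC = set("AVLIMFWC")
POLAR = set("STNQYHKRDE")


def side_fractions(residues):
    """Return (hydrophobic_below, polar_above) for one loop conformation.

    residues: list of (one_letter_code, signed_distance) pairs, where the
    signed distance to the membrane plane is assumed to be negative below
    the plane (lipophilic side) and positive above it (polar side).
    A fraction is None if the loop has no residue of that class.
    """
    hydrophobic = [d for code, d in residues if code in HYDROPHOBIC]
    polar = [d for code, d in residues if code in POLAR]
    below = sum(1 for d in hydrophobic if d < 0)
    above = sum(1 for d in polar if d > 0)
    hydrophobic_below = below / len(hydrophobic) if hydrophobic else None
    polar_above = above / len(polar) if polar else None
    return hydrophobic_below, polar_above


# Made-up example loop: three hydrophobic and two polar residues.
example = [("L", -2.0), ("V", -1.0), ("A", 0.5), ("S", 3.0), ("K", 1.2)]
hb, pa = side_fractions(example)
print(hb, pa)
```

A conformation in which both fractions exceed 0.5 would match the recommendation; how to weigh the two values against each other is left to the user.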
Fig. 3: Loop residues capping transmembrane helices are regularly found in certain structural motifs that
indicate the exact positioning of the helix cap relative to the lipid bilayer [1].
The most common N-terminal motif is the Gly motif, in which the residues in the N-2 and N-3 positions are exposed
to the solvent. If polar residues are found in these positions (g), the corresponding loop will be exposed
to the polar milieu; otherwise (G), to the lipophilic milieu.
Detailed results for SuperLooper benchmarks with the test sets of
Rossi et al. (2007) [7] and Fiser and Sali (2003) [8] are provided here.
Fig. 4: The diagrams show the average global RMSD of the top-ranked loops for each loop length (overall results)
and the top-rank global RMSD for every single loop of the Rossi test set,
as well as the overall results for the Sali test set.
The benchmarks were generated using version 2 (red line) or version 3 (blue line) of LIP (see below).
The two tables show the results of the benchmark test sets for loops of length 11 and 12 with the additional
criterion of less than 90 percent homology.
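The per-length averages plotted in Fig. 4 amount to a simple grouping of the top-ranked global RMSD values by loop length. The sketch below illustrates that aggregation only; the function name `average_rmsd_by_length` and the input layout (pairs of loop length and global RMSD) are assumptions, and the numbers in the example are invented, not benchmark results.

```python
from collections import defaultdict


def average_rmsd_by_length(top_ranked):
    """Average the global RMSD of top-ranked loops for each loop length.

    top_ranked: iterable of (loop_length, global_rmsd) pairs, one pair per
    tested loop (hypothetical layout, for illustration only).
    Returns a dict mapping loop length to the mean global RMSD.
    """
    totals = defaultdict(lambda: [0.0, 0])  # length -> [sum, count]
    for length, rmsd in top_ranked:
        totals[length][0] += rmsd
        totals[length][1] += 1
    return {length: s / n for length, (s, n) in sorted(totals.items())}


# Invented values, only to show the call.
print(average_rmsd_by_length([(11, 1.0), (11, 3.0), (12, 2.5)]))
```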
Datasets and Benchmarks
Here, the results of SuperLooper benchmarks are provided in numbers. Loop test sets from two different publications were used:
Rossi et al. (2007) [7] and Fiser and Sali (2003) [8].
File names refer to the respective test sets;
the numbers (1_...-3_...) in the file names denote different versions of the benchmarks as follows:
version 1 - whole list of loop candidates retrieved from the LIP database and ranked according to the score,
version 2 - top-ranked loops using the LIP database, excluding the original PDB file,
version 3 - top-ranked loops using the LIP database, excluding the original PDB file and homologous (*) proteins.
(*) Homologous proteins are defined here as proteins with similar amino acid sequences according to the definition in the LIP paper [3].
Information about the contents of the files:
The first 4 values are on a separate line in the file with the complete data (file name starting with 1_) and form the first 4 columns in the summarizing files (file names starting with 2_ or 3_):
Oriprot - 4-letter-code of the pdb-file containing the original tested loop
LoopStart - first amino acid number of the tested loop
LoopEnd - last amino acid number of the tested loop
OriSeq - original sequence of the tested loop
Nr - serial number (rank) of the proposed loop, based only on the score, without taking the rest of the protein into account (e.g. clashes)
prot - 4-letter code of the PDB file containing the proposed loop
hom - one character describing the sequence similarity between the chains of the proteins containing the tested and the proposed loop: I - both sequences are identical, or one contains the other; M - only slight differences between the sequences (mutation); D - larger differences between the sequences (different); U - the proposed loop belongs to a protein that has since been removed from the PDB
score - score of the proposed loop, without taking the rest of the protein into account (e.g. clashes)
rms_s - root mean square deviation of the stem atoms (original against proposed loop)
LoopSeq - sequence of the proposed loop
mindist - minimal distance between the backbone of the proposed loop and the backbone of the rest of the original protein; this distance should not be below 2.4 Å
maxdist - maximal distance between the most distant amino acid (backbone) of the proposed loop and the backbone of the rest of the original protein; this distance should not be larger than 4+4.5*ln(loop length)
rms_g - (global) root mean square deviation between the backbones of the proposed and the original loop when the proposed loop is built into the original protein
rms_go - (global) root mean square deviation between the backbones, without the oxygen, of the proposed and the original loop when the proposed loop is built into the original protein (this was a criterion used by other authors when testing loops)
rms_l - (local) root mean square deviation between the backbones of the proposed and the original loop when both loops are superposed
rms_lo - (local) root mean square deviation between the backbones, without the oxygen, of the proposed and the original loop when both loops are superposed (this was a criterion used by other authors when testing loops)
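The two distance criteria stated for mindist and maxdist can be written as a short check. This is a minimal sketch: the thresholds (2.4 Å and 4+4.5*ln(loop length)) come from the descriptions above, while the function names and the assumption that all distances are given in Å are illustrative.

```python
import math

MIN_BACKBONE_DISTANCE = 2.4  # lower bound for mindist, in Å (from the text)


def max_allowed_distance(loop_length):
    """Upper bound for maxdist: 4 + 4.5 * ln(loop length)."""
    return 4.0 + 4.5 * math.log(loop_length)


def passes_distance_criteria(mindist, maxdist, loop_length):
    """Return True if a proposed loop satisfies both distance criteria.

    mindist and maxdist are the values from the benchmark files; they are
    assumed here to be given in Å.
    """
    return (mindist >= MIN_BACKBONE_DISTANCE
            and maxdist <= max_allowed_distance(loop_length))


# For a loop of length 11 the upper bound is roughly 14.8.
print(round(max_allowed_distance(11), 1))
print(passes_distance_criteria(3.0, 10.0, 11))
```

A loop that fails either test is either too close to the rest of the protein (possible clash) or extends too far away from it for its length.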