How is an interface defined?

A residue belongs to an interface as a whole if at least one of its atoms lies within 4.5 Ångström of any atom of the interacting domain or chain. For protein chains, one part of an interface must consist of at least 5 C-alpha atoms. For nucleic acids, the relevant atoms are determined on the basis of the phosphorus atoms of the RNA/DNA backbone.
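The distance criterion above can be illustrated with a minimal Python sketch. It is not the JAIL implementation: the `Atom` record, the function names and the toy coordinates are hypothetical, and the size check assumes that "at least 5 C-alpha atoms" means at least 5 amino acid residues on one side.

```python
import math
from collections import namedtuple

# Hypothetical minimal atom record; the field names are illustrative only.
Atom = namedtuple("Atom", ["chain", "res_id", "name", "xyz"])

CUTOFF = 4.5  # distance threshold in Angstrom, taken from the definition above


def interface_residues(atoms_a, atoms_b, cutoff=CUTOFF):
    """Return the (chain, res_id) keys of partner A residues that have at
    least one atom within `cutoff` of any atom of partner B."""
    hits = set()
    for a in atoms_a:
        key = (a.chain, a.res_id)
        if key in hits:
            continue  # the whole residue is already part of the interface
        if any(math.dist(a.xyz, b.xyz) <= cutoff for b in atoms_b):
            hits.add(key)
    return hits


def side_is_large_enough(residues, min_ca=5):
    """Assumed reading of the size rule: one C-alpha per amino acid residue,
    so a side needs at least `min_ca` residues."""
    return len(residues) >= min_ca


# Toy coordinates (not taken from any real structure).
chain_a = [
    Atom("A", 1, "CA", (0.0, 0.0, 0.0)),
    Atom("A", 1, "CB", (1.5, 0.0, 0.0)),
    Atom("A", 2, "CA", (10.0, 0.0, 0.0)),
]
chain_b = [Atom("B", 7, "CA", (5.0, 0.0, 0.0))]

print(interface_residues(chain_a, chain_b))  # {('A', 1)}
```

Residue 1 of chain A is included because its CB atom lies 3.5 Ångström from chain B, even though its CA atom is 5.0 Ångström away. The all-against-all loop is quadratic in the number of atoms; a spatial grid or k-d tree would be preferable for complete structures.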

What are biounits?

The primary coordinate file deposited in the PDB generally contains one asymmetric unit. The asymmetric unit is the smallest portion of a crystal structure to which crystallographic symmetry operations can be applied to generate the unit cell. The biological unit (biounit) is the assembly believed to be the functional form of the protein. These units can frequently be inferred or calculated when additional information is available. The biological units of many proteins are deposited in a separate section of the PDB and can be used for interface calculations. More information about biounits

In which way are redundant interfaces excluded in the download section?

Redundancy is removed in two different ways: by structure and by sequence. The sequence-based clustering uses the CD-HIT program. The structural clustering follows the protein families and superfamilies of the SCOP classification, a database that classifies proteins by domain architecture.
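As a rough illustration of what removing redundancy by cluster means, the following Python sketch keeps one representative per cluster label. It assumes the cluster labels have already been computed (for example by CD-HIT or taken from SCOP). The record fields, labels and the function `pick_representatives` are hypothetical and do not reflect how JAIL actually chooses its representatives.

```python
def pick_representatives(interfaces, cluster_key):
    """Keep the first interface encountered for each cluster label."""
    seen = {}
    for item in interfaces:
        # setdefault stores the item only if the label has not been seen yet
        seen.setdefault(cluster_key(item), item)
    return list(seen.values())


# Toy records with made-up identifiers and cluster labels.
interfaces = [
    {"id": "if1", "seq_cluster": 0, "scop_family": "fam_x"},
    {"id": "if2", "seq_cluster": 0, "scop_family": "fam_y"},
    {"id": "if3", "seq_cluster": 1, "scop_family": "fam_x"},
]

by_sequence = pick_representatives(interfaces, lambda i: i["seq_cluster"])
by_structure = pick_representatives(interfaces, lambda i: i["scop_family"])

print([i["id"] for i in by_sequence])   # ['if1', 'if3']
print([i["id"] for i in by_structure])  # ['if1', 'if2']
```

The two criteria can select different subsets from the same input, which is why the choice of clustering matters for the resulting dataset.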

Which settings in the download section are best for my own research?

The choice of dataset depends on the type of interaction (protein-protein or protein-nucleic acid) and the desired level of diversity. A maximum sequence identity of 50% yields higher diversity than a setting of 95%. The default settings include interfaces of domain-domain interactions as well as interfaces between interacting chains. All chain interfaces that are already covered by the SCOP domain interfaces are excluded by default. This procedure yields a large number of interfaces that are still diverse enough for statistical analysis.

What is meant by "show conservation" in Jmol?

The conservation of a protein sequence is defined by the mutation rate at each amino acid position. For JAIL, this information was retrieved from ConSurf, a derived database that merges structural and sequence information.

Database scheme