The Erebus server.

Introduction

Erebus, which takes its name from the omnipresent Greek god of darkness and shadow, is a protein substructure search server. The Erebus server searches the entire PDB database for matches to a query structure defined by a small group of atoms (the current limit for a query structure is 40 atoms). The distinctive features of Erebus are:
  • Speed.
    Scanning the entire PDB dataset takes about 1 hour, and the scan time varies only weakly with query structure size.
  • Versatility.
    A query structure may include any atoms in any sequence, and the atoms do not have to make up complete residues.

Usage

Preparing query structure

Erebus accepts as input a list of atomic coordinates in the standard Protein Data Bank (PDB) format. Erebus reads the following fields from the supplied PDB input:
  • Record type.
    Erebus reads ATOM and HETATM records for atom coordinates. Additionally (and only in the target PDB entries), Erebus recognizes MODEL and ENDMDL records, which indicate the presence of multiple models in a given PDB entry. Every model is matched independently, and the model number is shown in the list of matches.
  • Atom index.
    Erebus reads the atom index and stores it to preserve the order of atoms in the output. The atom index is not used during the search.
  • Atom name.
    Erebus uses atom names to search for a matching substructure in a target protein. Erebus requires that atoms in a target substructure have names exactly matching the atom names in the query structure. Note that the comparison is case-sensitive and that atom names in PDB entries are given in upper case. There are two exceptions to the exact matching rule:
    1. Atoms having names that begin with "H" (hydrogen atoms) are ignored during the search.
    2. Atoms that are chemically equivalent, but have different names according to standard nomenclature (symmetric atoms) can be treated by Erebus as equivalent atoms, if requested.
  • Residue name.
    Erebus reads and matches residue names in the same way as atom names. Additionally, Erebus allows a special residue name, "ANY", which matches any other residue. This residue wild-card must be used with care, as it makes the search less specific and may increase the number of matches beyond the current limit of 1000.
  • Residue index.
    Erebus uses the residue index to distinguish atoms belonging to different residues, even when these residues are chemically identical amino acids. Erebus does not impose any other residue-index-based constraints during the search.
  • Atomic coordinates.
    Erebus uses atomic coordinates to compute pairwise distances between the atoms. Only these pairwise distances are used in the subsequent search. The angular component of the atom positions is thus discarded, and Erebus will find a substructure matching the query even if the substructure is rotated relative to the query in absolute Cartesian coordinates.
  • Occupancy.
    Erebus uses the occupancy column to assign relative weights to atoms. These weights are used to adjust the weight of a match when atoms are missing from the matched structure. If the relative weight of an atom is 0, the atom is effectively excluded from consideration.
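
The field list above can be illustrated with a minimal Python sketch that reads a query using the standard fixed-column PDB layout. This is not the Erebus parser: the names `Atom` and `parse_query` are hypothetical, the default weight of 1.0 for a blank occupancy column is an assumption, and the input is assumed to be well formed.

```python
from typing import List, NamedTuple, Tuple


class Atom(NamedTuple):
    index: int                       # atom serial number, kept only to preserve order
    name: str                        # atom name, compared case-sensitively
    res_name: str                    # residue name, or the wild-card "ANY"
    res_index: int                   # residue sequence number
    xyz: Tuple[float, float, float]  # coordinates in angstroms
    weight: float                    # relative weight from the occupancy column


def parse_query(pdb_text: str) -> List[Atom]:
    """Collect ATOM/HETATM records from PDB-formatted text."""
    atoms = []
    for line in pdb_text.splitlines():
        # Columns 1-6 hold the record type; everything else is skipped.
        if line[0:6].strip() not in ("ATOM", "HETATM"):
            continue
        name = line[12:16].strip()
        if name.startswith("H"):
            continue  # names beginning with "H" are ignored during the search
        occupancy = line[54:60].strip()
        atoms.append(Atom(
            index=int(line[6:11]),
            name=name,
            res_name=line[17:20].strip(),
            res_index=int(line[22:26]),
            xyz=(float(line[30:38]), float(line[38:46]), float(line[46:54])),
            # Assumption: a blank occupancy column means full weight.
            weight=float(occupancy) if occupancy else 1.0,
        ))
    return atoms
```

Calling `parse_query` on the pasted query text returns the atoms in file order, which makes it easy to check residue names and weights before submitting a task.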

Submitting search tasks

To begin a search Erebus needs the following information:
  1. Task name.
    Task name can be any text that may help you identify this particular search. It will be used to display your task in the queue or results tables.
    The task name cannot be left empty. By default, it is composed from the current date and time.
  2. Query structure.
    You may upload a file or paste text in PDB format, as described in the previous section. After preliminary parsing, the atoms will be listed in the table below.
  3. Adjust parameters.
    The atoms from your query structure will be shown here, giving you an opportunity to adjust residue names and/or atom weights.
    Additional parameters include:
    • Matching precision σ.
      This parameter determines how closely a protein substructure must match the query structure. When Erebus finds that two atoms in a protein are separated by a distance that differs from the corresponding distance in the query, it assigns a weight W_i < 1.0 to that atom pair. The weight is computed as

      W_i = exp[-(Δr / σ)^2]

      where Δr is the difference between the separations of the matching atom pair in the PDB structure and in the query structure.
      Thus, the smaller σ is, the higher the matching precision.
    • Weight threshold W.
      This is the minimum weight a matching protein substructure must have to be accepted by Erebus.
      The weight of a substructure consisting of N atom pairs is defined as

      W = Σ_{i=1}^{N} W_i

      where W_i is the weight of a matching atom pair. Matching substructures whose weight is below the threshold W are not reported in the results table.
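
The two parameters above can be made concrete with a short Python sketch of the pair weight and the substructure weight. It is an illustration under stated assumptions, not the Erebus implementation: the function names are hypothetical, every atom pair is assumed to contribute to the sum, the two coordinate lists are assumed to hold corresponding atoms in the same order, and the per-atom occupancy weights are ignored.

```python
import math
from itertools import combinations


def pair_distances(coords):
    """Distances for every atom pair (i < j), always in the same order."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def match_weight(query_coords, target_coords, sigma):
    """Sum of exp(-(delta_r / sigma)^2) over all atom pairs."""
    total = 0.0
    for d_query, d_target in zip(pair_distances(query_coords),
                                 pair_distances(target_coords)):
        delta_r = d_target - d_query  # difference in pair separation
        total += math.exp(-(delta_r / sigma) ** 2)
    return total


def is_reported(query_coords, target_coords, sigma, threshold):
    """True when the substructure weight is not below the threshold."""
    return match_weight(query_coords, target_coords, sigma) >= threshold
```

Because only pairwise distances enter the sum, a rotated and translated copy of the query scores the maximum weight, which in this sketch equals the number of atom pairs. Reducing `sigma` lowers the weight of any imperfect match, so fewer substructures pass a fixed threshold.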

Results

For every matching substructure found, Erebus reports the list of atoms along with their corresponding residues. Matching accuracy is expressed both as a match weight W and as a root-mean-square distance between the substructure atoms and the corresponding query atoms. Currently, Erebus can export results in the following formats:
  • PyMOL script.
    This script can be used as input to the PyMOL visualization software. The generated commands load the corresponding target protein and the query structure, visibly mark the matched atoms and residues, and zoom in on the found substructure.
  • ASCII text.
    This plain text file lists the indexes of the atoms and residues of the matching substructure, along with the coordinates of the query structure atoms spatially fitted to the atoms of the matched substructure.
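
The root-mean-square distance mentioned above can be recomputed from an exported text file with a few lines of Python. The sketch below is a hypothetical helper, not part of Erebus. It assumes the query coordinates have already been fitted to the match, as in the ASCII export, and that both lists hold corresponding atoms in the same order.

```python
import math


def rmsd(fitted_query, matched_atoms):
    """Root-mean-square distance between paired (x, y, z) coordinates."""
    if len(fitted_query) != len(matched_atoms) or not fitted_query:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum(math.dist(q, m) ** 2
                  for q, m in zip(fitted_query, matched_atoms))
    return math.sqrt(squared / len(fitted_query))
```

A value of 0 means the fitted query atoms coincide with the matched atoms. Larger values indicate a looser match.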

Erebus algorithm