/** \page multifit_3sfd Modeling of 3sfd with multifit
\tableofcontents
\section intro Introduction
In this example, MultiFit is used to build a model of porcine mitochondrial
respiratory complex II (PDB id 3sfd), using crystal structures of its
4 constituent proteins, a cryo-electron microscopy (EM) map of the entire
complex, and information from proteomics.
(See also \ref emagefit_3sfd for building models using 2D EM class averages
instead of 3D maps.)
All steps in the procedure use a command line tool called
multifit.py. For full help on this tool, run from a command line:
\code{.sh}
multifit.py help
\endcode
\section setup Setup
First, obtain the input files used in this example and put them in the
current directory, by typing:
\code{.sh}
cp /multifit/3sfd/* .
\endcode
(On a Windows machine, use 'copy' rather than 'cp'.) The path above must be
prefixed with the directory containing the IMP example files. The full path to
the files can be determined by running in a Python interpreter
'import IMP.multifit; print(IMP.multifit.get_example_path('3sfd'))'.
The first step is to create an input file listing the subunits involved in the
complex. This is a plain text file containing one line per component, with the
following information: the name that MultiFit will use for the component, the
name of the file containing the atomic coordinates, and a flag indicating
whether placements of the subunit should be sampled locally (0) or globally
(1). The default for the fitting flag is 1 (global search).
If the user has prior knowledge or a good hypothesis as to the subunit
position, they can provide the proposed subunit placement in the atomic
coordinates file and ask for a local search.
In this case, no prior knowledge is assumed, and so the subunits file looks
like:
\verbatim
3sfdA 3sfd.A.pdb 1
3sfdB 3sfd.B.pdb 1
3sfdC 3sfd.C.pdb 1
3sfdD 3sfd.D.pdb 1
\endverbatim
This file is already provided, as 3sfd.subunits.txt.
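The format described above is simple enough to read with a few lines of Python.
The following is a minimal sketch, not part of MultiFit: the helper name
parse_subunits is hypothetical, and treating a missing flag as 1 (the
documented default) is an assumption made for illustration.

```python
# Illustrative reader for the subunits file format described above.
# parse_subunits is a hypothetical helper, not a MultiFit function.

SUBUNITS_TEXT = """
3sfdA 3sfd.A.pdb 1
3sfdB 3sfd.B.pdb 1
3sfdC 3sfd.C.pdb 1
3sfdD 3sfd.D.pdb 1
"""


def parse_subunits(text):
    """Return a list of (name, pdb_file, global_search) tuples."""
    subunits = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name, pdb_file = fields[0], fields[1]
        # Assumption: a missing flag means the documented default, 1 (global).
        flag = int(fields[2]) if len(fields) > 2 else 1
        if flag not in (0, 1):
            raise ValueError("fitting flag must be 0 or 1: %r" % line)
        subunits.append((name, pdb_file, flag == 1))
    return subunits


subunits = parse_subunits(SUBUNITS_TEXT)
for name, pdb_file, global_search in subunits:
    print(name, pdb_file, "global" if global_search else "local")
```

In practice the text would be read from 3sfd.subunits.txt rather than from an
embedded string.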
The next step is to create two other input files that guide the MultiFit
protocol. This is done by using MultiFit's 'param' command, by running on a
command line:
\code{.sh}
multifit.py param -i 3sfd.asmb.input -- 3sfd.asmb 3sfd.subunits.txt 30 3sfd_15.mrc 15 3. 335 27.0 -6.0 21.0
\endcode
After the assembly name (3sfd.asmb) and the subunits file, the 'param' command
takes as arguments the coarseness level in residues (30), the density map file
(3sfd_15.mrc), the resolution of the map in angstroms (15), the map spacing in
angstroms (3), the density threshold (335), and the origin of the map in
angstroms (27.0, -6.0, 21.0). The spacing (or pixel size) and origin are often
stored in the map header. To view the map header, run:
\code{.sh}
view_density_header.py 3sfd_15.mrc
\endcode
The resolution is typically not stored in the map header; it is usually
provided in the corresponding publication and can also be found in the
corresponding \external{http://www.ebi.ac.uk/pdbe/emdb/,EMDB} entry.
A threshold is often provided by the author in the EMDB entry as
"Recommended contour level" under the "Map Information" section.
Alternatively, IMP provides a utility to calculate an approximate contour
level based on the molecular mass of the complex, which can be run as:
\code{.sh}
estimate_threshold_from_molecular_mass.py 3sfd_15.mrc 1092
\endcode
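When many maps are processed, it can help to assemble the 'param' command line
from named values rather than typing the positional arguments by hand. The
sketch below only rebuilds the command shown earlier in this section; the
variable names are illustrative and are not MultiFit identifiers. The '--'
separator is copied from that command; that it is there to keep the negative
origin value from being read as an option is an assumption based on how
common option parsers behave.

```python
import shlex

# Values taken from the 'param' command shown earlier in this section.
# The variable names are illustrative only.
assembly_name = "3sfd.asmb"
subunits_file = "3sfd.subunits.txt"
coarseness = 30                 # coarseness level, in residues
density_map = "3sfd_15.mrc"
resolution = 15                 # angstroms
spacing = 3.0                   # angstroms
threshold = 335                 # density threshold
origin = (27.0, -6.0, 21.0)     # angstroms

args = ["multifit.py", "param", "-i", "3sfd.asmb.input",
        "--",  # copied from the original command (see the note above)
        assembly_name, subunits_file, str(coarseness), density_map,
        str(resolution), str(spacing), str(threshold)]
args += [str(x) for x in origin]

# shlex.join (Python 3.8 or later) quotes arguments only where needed.
command = shlex.join(args)
print(command)
```

The resulting list can also be passed directly to subprocess.run, which avoids
shell quoting altogether.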
The first file generated by MultiFit, 3sfd.asmb.input, provides
information on each of the subunits and their assembly density map, such as
names of the files from which the input structures and map will be read,
and those to which outputs from later steps will be written. The second file,
3sfd.asmb.alignment.param, specifies scoring and optimization
parameters for each step of the MultiFit protocol. These parameters can be
adjusted if necessary to handle difficult modeling cases.
(Note that two other files are also created, with a .refined extension.
These can be used for model refinement, which is discussed later.)
\section anchors Create the assembly anchor graph
Next, a reduced representation of the assembly density map is generated
using a Gaussian mixture model, by running:
\code{.sh}
multifit.py anchors 3sfd.asmb.input 3sfd.asmb.anchors
\endcode
This command computes a reduced representation of the EM map that best
reproduces the configuration of all voxels with density above the density
threshold (provided in the 3sfd.asmb.input file) as a set of
3D Gaussian functions. (The default number of Gaussians is the number of
components. However, if the sizes of the subunits differ, it is recommended
to use the -s option to set the number of residues encapsulated in
each Gaussian. For example, with 50 residues per Gaussian, a 170-residue
protein should use 3 Gaussians and a 260-residue protein should use
5 Gaussians.) The reduced representation is written out as a PDB file
containing fake CA atoms, where each CA corresponds to a single anchor point,
and also as a \external{http://www.cgl.ucsf.edu/chimera/,Chimera} cmm file.
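The arithmetic behind the residues-per-Gaussian example can be written out as
follows. This is a sketch only: the text gives two data points (170 residues
to 3 Gaussians, 260 residues to 5), and rounding to the nearest integer is an
assumed rule that reproduces both. It is not taken from the MultiFit source,
and the function name is hypothetical.

```python
def gaussians_for(n_residues, residues_per_gaussian=50):
    """Number of Gaussians to allot to a subunit of n_residues residues."""
    # Assumption: round to the nearest integer, with at least one Gaussian.
    return max(1, round(n_residues / residues_per_gaussian))


for n in (170, 260):
    print(n, "residues ->", gaussians_for(n), "Gaussians")
```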
\section fit_fft Fit each protein to the map
First, fit each protein to the map using an FFT search, either globally or
locally:
\code{.sh}
multifit.py fit_fft -a 30 -n 1000 -v 60 -c 6 3sfd.asmb.input
\endcode
The output is a set of candidate fits for each subunit. In each fit, a single
subunit is rigidly rotated and translated to fit into the density map. Each fit
is written out as the transformation (rotation and translation) required to
place the original subunit in the density map. The fitting of a subunit
into the density map is performed by globally searching for subunit
transformations yielding high cross-correlation between the subunit and
the map via a fast Fourier transform.
Next, a list of valid fit indexes is created. In the command below, this list
is simply the top 3 hits from fit_fft, but the fits could be filtered by other
criteria (e.g., proximity to anchor points) if desired. Note that the more fit
indexes are used, the longer it takes to combine the fits into a global
solution in later steps of the protocol (but the more likely it is that the
optimum solution is found). For a quick demonstration, the top 3 fits are
sufficient, but 10 or more fits are recommended in most cases. Do this by
running:
\code{.sh}
multifit.py indexes 3sfd 3sfd.asmb.input 3 3sfd.indexes.mapping.input
\endcode
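The trade-off between the number of fit indexes and run time follows from the
size of the discrete search space. As a rough illustration, assuming every
subunit keeps the same number of fits, the number of possible combinations
grows as that number raised to the power of the number of subunits. The sketch
below only counts combinations; it says nothing about how MultiFit actually
searches them.

```python
from itertools import product

subunits = ["3sfdA", "3sfdB", "3sfdC", "3sfdD"]


def count_combinations(n_fits, n_subunits=len(subunits)):
    """Size of the search space with n_fits candidate fits per subunit."""
    return n_fits ** n_subunits


# Enumerate the space explicitly for the 3-fit demonstration case.
all_combinations = list(product(range(3), repeat=len(subunits)))
print(len(all_combinations))       # 81
print(count_combinations(10))      # 10000
```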
\section proteomics Create a proteomics restraint file
Here, the restraint file used in the next assembly step is created.
This file instructs MultiFit how to combine the individual subunit fits
created above into a global solution of all subunits simultaneously fitted
into the map. First, MultiFit can generate a basic proteomics file, indicating
between which pairs of proteins a complementarity restraint (i.e., that the
surfaces of the proteins should fit and complement each other) should
be calculated:
\code{.sh}
multifit.py proteomics 3sfd.asmb.input 3sfd.asmb.anchors.txt 3sfd.asmb.proteomics
\endcode
The user can then add additional information from proteomics experiments to
this file. Here, 7 simulated residue-residue cross-link restraints are added.
The excluded volume (EV) pairs are also updated to calculate complementarity
restraints between pairs of proteins as indicated by the cross-link
restraints. After these additions, the final 3sfd.asmb.proteomics
file looks like:
\verbatim
|proteins|
|3sfdA|1|613|nn|nn|
|3sfdB|1|239|nn|nn|
|3sfdC|1|138|nn|nn|
|3sfdD|1|102|nn|nn|
|interactions|
|residue-xlink|
|1|3sfdB|23|3sfdA|456|30|
|1|3sfdB|241|3sfdC|112|30|
|1|3sfdB|205|3sfdD|37|30|
|1|3sfdB|177|3sfdD|99|30|
|1|3sfdC|95|3sfdD|132|30|
|1|3sfdC|9|3sfdD|37|30|
|1|3sfdC|78|3sfdD|128|30|
|ev-pairs|
|3sfdB|3sfdA|
|3sfdB|3sfdC|
|3sfdC|3sfdD|
\endverbatim
This modified file is already present, as 3sfd.xlinks.proteomics.
Replace the basic file with it by running:
\code{.sh}
cp 3sfd.xlinks.proteomics 3sfd.asmb.proteomics
\endcode
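Before running the assembly step, it can be useful to check the edited file
programmatically. The following sketch splits the file shown above into its
sections and lists which protein pairs are covered by cross-links and by
ev-pairs. It is not a MultiFit function. The field layout is inferred from the
listing: the first cross-link value is taken to be the merge tree flag,
followed by protein, residue, protein, residue, and the final value is left
uninterpreted.

```python
# Illustrative reader for the proteomics file shown above (not MultiFit code).
PROTEOMICS_TEXT = """
|proteins|
|3sfdA|1|613|nn|nn|
|3sfdB|1|239|nn|nn|
|3sfdC|1|138|nn|nn|
|3sfdD|1|102|nn|nn|
|interactions|
|residue-xlink|
|1|3sfdB|23|3sfdA|456|30|
|1|3sfdB|241|3sfdC|112|30|
|1|3sfdB|205|3sfdD|37|30|
|1|3sfdB|177|3sfdD|99|30|
|1|3sfdC|95|3sfdD|132|30|
|1|3sfdC|9|3sfdD|37|30|
|1|3sfdC|78|3sfdD|128|30|
|ev-pairs|
|3sfdB|3sfdA|
|3sfdB|3sfdC|
|3sfdC|3sfdD|
"""

HEADERS = ("proteins", "interactions", "residue-xlink", "ev-pairs")


def parse_proteomics(text):
    """Return a dict mapping each section header to its list of rows."""
    sections = {header: [] for header in HEADERS}
    current = None
    for line in text.splitlines():
        fields = line.strip().strip("|").split("|")
        if fields == [""]:
            continue  # blank line
        if len(fields) == 1 and fields[0] in sections:
            current = fields[0]  # start of a new section
        else:
            sections[current].append(fields)
    return sections


sections = parse_proteomics(PROTEOMICS_TEXT)
# Assumed layout of a cross-link row: flag, protein, residue, protein, residue.
xlink_pairs = {frozenset((row[1], row[3])) for row in sections["residue-xlink"]}
ev_pairs = {frozenset(row) for row in sections["ev-pairs"]}

print(len(sections["residue-xlink"]), "cross-links")
print("cross-linked pairs:", sorted(sorted(p) for p in xlink_pairs))
print("ev-pairs:", sorted(sorted(p) for p in ev_pairs))
```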
Note that these restraints will be used to create DOMINO’s junction tree.
DOMINO works most efficiently if the size of the intermediate subsets is small.
Use the multifit.py merge_tree command to view the tree defined
by the restraints. To reduce the size of the subsets, the user can determine
which restraints are used to define the merge tree by setting the first
value in the xlink definition. Setting the value to 0 instead of the default 1
specifies that the restraint is evaluated only at the root of the tree
and not in an intermediate merging step.
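Changing the flag is a one-field edit of a cross-link line. A minimal sketch,
assuming the pipe-delimited layout shown in the proteomics file above (the
helper name is hypothetical):

```python
def set_merge_tree_flag(xlink_line, flag):
    """Return the cross-link line with its first value set to flag."""
    if flag not in (0, 1):
        raise ValueError("flag must be 0 or 1")
    fields = xlink_line.strip().strip("|").split("|")
    fields[0] = str(flag)
    return "|" + "|".join(fields) + "|"


# Evaluate this cross-link only at the root of the merge tree.
print(set_merge_tree_flag("|1|3sfdB|205|3sfdD|37|30|", 0))
```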
\section assemble Assemble subunits
The fits are combined into a set of the best-scoring global configurations
by running:
\code{.sh}
multifit.py align 3sfd.asmb.input 3sfd.asmb.proteomics 3sfd.indexes.mapping.input 3sfd.asmb.alignment.param 3sfd.asmb.combinations 3sfd.asmb.combinations.fit.scores
\endcode
The output is the file 3sfd.asmb.combinations, which contains a ranked
list of combinations (best-scoring first). Each combination is simply a list of
4 numbers, where the first number is the index of the fit for the first subunit,
the second number the index of the fit for the second subunit, and so on.
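To make a combination easier to read, its numbers can be paired with the
subunit names. The sketch below assumes that the subunit order matches the
order in the subunits file, and it uses a made-up combination rather than real
MultiFit output. How the numbers are delimited in the combinations file is not
covered here, so the example starts from an already parsed list.

```python
# Assumption: subunit order follows the subunits file.
SUBUNITS = ["3sfdA", "3sfdB", "3sfdC", "3sfdD"]


def label_combination(combination, names=SUBUNITS):
    """Map each subunit name to the index of its chosen fit."""
    if len(combination) != len(names):
        raise ValueError("expected %d fit indexes" % len(names))
    return dict(zip(names, combination))


example = [2, 0, 1, 0]  # hypothetical combination, not real output
print(label_combination(example))
```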
The scoring function used to assess each combination includes the
quality-of-fit of each subunit in the map, the protrusion of each subunit out
of the map envelope, the shape complementarity between the subunit pairs listed
in the proteomics file, and the distance restraints defined by the proteomics
data in the same file. The optimization avoids exhaustive enumeration
of all possible mappings of subunits to anchor points by means of a
branch-and-bound algorithm combined with the DOMINO divide-and-conquer
message-passing optimizer using a discrete sampling space.
\section visualization Visualization
Finally, models can be generated as PDB files from the fits and best
combinations by running:
\code{.sh}
multifit.py models -m 5 3sfd.asmb.input 3sfd.asmb.proteomics 3sfd.indexes.mapping.input 3sfd.asmb.combinations model
\endcode
This generates PDB files for the 5 best-scoring solutions, calling them
model.0.pdb, model.1.pdb, and so on.
The best-scoring solution, fitted in the density map of the complex,
is shown below:
\section analysis Analysis
If a reference structure for each subunit is available, the 'reference'
command can be used to compare the models to the reference. To use this command,
modify 3sfd.asmb.input and add the filename of each reference subunit
structure in the rightmost column. In fact, the input subunit structures are
already in their native conformation, so these structures can be used. The
file 3sfd.asmb.input.ref already has this modification. Then run:
\code{.sh}
multifit.py reference -m 5 3sfd.asmb.input.ref 3sfd.asmb.proteomics 3sfd.indexes.mapping.input 3sfd.asmb.combinations
\endcode
This will report, for the top 5 combinations, the all-atom RMSD between the
model and the reference structure of the complex, and a list of placement
scores, one for each subunit. Each placement score is the distance that the
subunit must be moved to reach the reference structure and the angle through
which it must be rotated.
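The two parts of a placement score can be illustrated with a small calculation.
This is a sketch under stated assumptions, not the MultiFit implementation: the
distance is taken to be the distance between the centroids of the model and
reference subunit, and the angle is taken to be the rotation angle of the rigid
transformation relating the two placements, recovered from the trace of its
rotation matrix. All function names are hypothetical.

```python
import math


def centroid(coords):
    """Mean position of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def centroid_distance(model_coords, ref_coords):
    """Distance between the centroids of two coordinate sets."""
    return math.dist(centroid(model_coords), centroid(ref_coords))


def rotation_angle_deg(rot):
    """Rotation angle in degrees of a 3x3 rotation matrix, from its trace."""
    trace = rot[0][0] + rot[1][1] + rot[2][2]
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


# Toy data: the model is shifted by (3, 4, 0) relative to the reference.
ref = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
model = [(3.0, 4.0, 0.0), (5.0, 4.0, 0.0)]
# Toy rotation: a quarter turn about the z axis.
quarter_turn_z = [[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]]

print("distance:", centroid_distance(model, ref))
print("angle:", rotation_angle_deg(quarter_turn_z))
```

A perfectly placed subunit would give a distance of zero and an angle of zero
under these definitions.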
*/