/** \page emagefit_3sfd Modeling of 3sfd with EMageFit

\tableofcontents

\section intro Introduction

In this example, EMageFit is used to build a model of porcine mitochondrial
respiratory complex II (PDB id 3sfd), using crystal structures of its
4 constituent proteins and a number of electron microscopy (EM) class averages
of the complex.
For more information on the protocol employed during this example, see
the \ref emagefit_protocol page.

(See also \ref multifit_3sfd for building models using 3D EM maps
instead of 2D class averages.)

All steps in the procedure use a command line tool called
<tt>emagefit.py</tt>. For full help on this tool, run from a command line:

\code{.sh}
emagefit.py -h
\endcode

\section setup Setup

First, obtain the input files used in this example and put them in the
current directory, by typing:

\code{.sh}
cp -r <imp_example_path>/em2d/3sfd/* .
\endcode

(On a Windows machine, use 'xcopy /E' rather than 'cp -r'.) Here,
\<imp_example_path\> 
is the directory containing the IMP example files. The full path to the files
can be determined by running in a Python interpreter '<tt>import IMP.em2d;
print IMP.em2d.get_example_path('3sfd')</tt>'.

The input files are for modeling are the three simulated class averages of
the complex located in the <tt>em_images</tt> directory, and the subunits to
assemble: <tt>3sdfA.pdb</tt>, <tt>3sfdB.pdb</tt>, <tt>3sfC.pdb</tt>, and
<tt>3sfdD.pdb</tt>. A number of Python files are also present. These control
the modeling procedure and are described below. For an extensive description
of all the parameters see <tt>config_example.py</tt>.

Building a model requires 4 steps: pairwise docking between interacting
subunits, SA-MC optimization, SA-MC model gathering, and DOMINO sampling.

\section dock Pairwise docking

To do the dockings, run from the command line:

\code{.sh}
emagefit.py --exp config_step_1.py --dock --log dock.log
\endcode

Open the file dock.log and search for the line:

<tt>INFO:buildxlinks:The suggested order for the docking pairs is [('3sfdB', '3sfdA'), ('3sfdB', '3sfdC'), ('3sfdB', '3sfdD'), ('3sfdD', '3sfdC')]</tt>

EMageFit has determined the dockings required, and the order recommended.
The pairs are (Receptor,Ligand), so 3sfdA should be docked into 3sfdB, and so
on. If HEXDOCK is installed, it is used to perform these dockings, and generates
several sets of new files. These include
PDB files of the initial docking solutions as estimated from the cross-linking
restraints (ending in <tt>initial_docking.pdb</tt>); the PDB files with the
best solutions from HEXDOCK (ending in <tt>hexdock.pdb</tt>); a set of text
files starting with <tt>hex_solutions</tt>, containing all the solutions
from HEXDOCK; and 4 text files starting with <tt>relative_positions</tt>,
which contain the relative transformations between the subunits participating
in each pairwise docking. This last set of files is used by the SA-MC
optimization.

If HEXDOCK is not present, the relative positions files need to be generated
using some docking algorithm. They have a simple format; each line is simply
the transformation needed to dock the ligand to the receptor, given as the
components of the rotation quaternion followed by the vector vector.

Next, the anchor component and the docking transforms need to be placed
into the configuration file for the next step, by setting <tt>self.anchor</tt>
and <tt>self.dock_transforms</tt> respectively. The anchor component is the
component with most neighbors, and is the first receptor in the list of docking
pairs printed in the log file (the second component, 3sfdB, in this case).
(Anchored components are not moved during the SA-MC optimization.)
The file <tt>config_step_2.py</tt> includes this information.

\section optimize SA-MC optimization

The next step is running a Monte Carlo optimization. There are several
parameters to control this optimization, such as the profile of temperatures,
number of iterations, cycles, maximum displacement and angle tolerated for the
random moves, and also the parameter <tt>self.non_relative_move_prob</tt>.
This last parameter indicates the probability for a component of undergoing
a random move instead of a docking-derived relative move. For example, a
value of 0.4 means that the component acting as ligand (e.g. 3sfdA) will
prefer to do a random move with respect to its receptor (3sfdB) 40% of the time.
A value of 1 ignores all the relative positions, and a random move is always
chosen (i.e. all docking solutions are ignored). To run the optimization:

\code{.sh}
emagefit.py --exp config_step_2.py --monte_carlo -1 --log monte_carlo.log --o mc_solution1.db
\endcode

Note that <tt>config_step_2.py</tt> sets Monte Carlo parameters to get a very
short simulation, for demonstration purposes. For serious application, a longer
Monte Carlo simulation is recommended; suitable parameters can be found in
<tt>config_step_1.py</tt>.

The output is the file <tt>mc_solution1.db</tt>, an SQLite database with the
solution. To generate multiple candidate solutions, simply run the command
multiple times with different output file names.

\section gather Model gathering

After generating a set of solutions, they must be gathered together into a
single database, using a command such as:

\code{.sh}
emagefit.py --o monte_carlo_solutions.db --gather mc_solution*.db
\endcode

For this example, a pre-built database is included in the <tt>outputs</tt>
directory, containing the results of 500 Monte Carlo runs. To use this for
subsequent steps in the example, run:

\code{.sh}
cp outputs/monte_carlo_solutions_500.db monte_carlo_solutions.db
\endcode

\section domino DOMINO sampling

The last part of the modeling is running DOMINO employing the configuration
file <tt>config_step_3.py</tt>. This new configuration file has changes with
respect to <tt>config_step_2.py</tt> in the classes
<tt>DominoSamplingPositions</tt> and <tt>DominoParams</tt>. The parameters are:

 - <tt>self.read</tt> is the file with the Monte Carlo solutions
   obtained before.
 - <tt>self.max_number</tt> is the maximum number of solutions to combine.
   In this example, with 500 solutions and 4 components, strict enumeration
   would require 500<sup>4</sup> combinations to be explored, which is
   unfeasible. This number allows you to reduce that. In this example, it is
   set to 5, and therefore only 5<sup>4</sup> combinations are explored.
 - <tt>self.orderby</tt> is the name of the restraint used to sort the
   Monte Carlo solutions. Here the value is "em2d", so the best 5 solutions
   according to the em2d score will be combined with DOMINO. Another option
   is to use "total_score".
 - <tt>self.heap_solutions</tt>. This is a rather technical parameter. It is
   the number of solutions that DOMINO keeps at each merging step. The larger
   the number, the better the space of 5<sup>4</sup> combinations is explored,
   at the cost of a larger running time. Here, a value of 200 is used.

The command to run is:

\code{.sh}
emagefit.py --exp config_step_3.py --log domino.log --o domino.db
\endcode

The output <tt>domino.db</tt> is a database of solutions. For the actual
benchmark used in the EMageFit publication, parameters of
<tt>self.max_number=50</tt> and <tt>self.heap_solutions=2000</tt> were used,
resulting in the file <tt>outputs/domino_benchmark.db</tt>.

\section analysis Analysis

The solutions can be written out as PDB files. To write out the 10 best models
according to the value of the em2d restraint, run:

\code{.sh}
emagefit.py --exp config_step_3.py --w 10 --o domino.db --orderby em2d
\endcode

The best solution, shown fitted in the density map of the complex, is shown
below:

<img src="emagefit_3sfd.png" alt="3sfd model fit into the density">

*/