/**\page devguide Developer Guide
This page presents instructions on how to develop code using
\imp. Developers who wish to contribute code back to \imp or
distribute their code should also read the \ref contributing
"Contributing code to IMP" page.
-# \ref gettingaround "Getting around"
-# \ref usage "Writing new functions and classes"
-# \ref testing "Debugging and testing your code"
-# \ref codingconventions "Coding conventions"
-# \ref docs "Documenting your code"
-# \ref scripts "Useful scripts"
-# \ref contributing "Contributing code back to the repository"
-# \ref cpp "Good programming practices"
-# \ref next "Where to go next"
\section gettingaround Getting around
The input files in the \imp directory are structured as follows:
- \c tools contains various command line utilities for use by developers. They are \ref scripts "documented below".
- \c doc contains inputs for general \imp overview documentation (such as this page), as well as configuration scripts for \c doxygen.
- \c applications contains various applications implemented using a variety of \imp modules.
- \c kernel and the subdirectories of \c modules each define a module and share the same structure. The directory for module \c name has the following structure:
  - \c include contains the C++ header files
  - \c src contains the C++ source files
  - \c bin contains C++ source files, each of which is built into an executable
  - \c pyext contains files defining the Python interface to the module as well as Python source files (in \c pyext/src)
  - \c test contains test files. When \c scons \c test or \c scons \c name-test is run, each file in this directory whose name starts with \c test_ is executed (after being built if it is a \c .cpp file)
  - \c doc contains the overview documentation for the module (in the \c SConscript file) as well as any other documentation that is provided via \c .dox files
  - \c examples contains examples in Python, as well as any data needed by the examples
  - \c data contains any data files needed by the module
When \imp is built, the \c build directory is created and filled with
the results of the build. \c build contains a number of
subdirectories. They are
- \c include which includes all the headers. The headers for the \c kernel are placed in \c include/IMP and those for module \c name are placed in \c include/IMP/name
- \c lib where the C++ and python libraries are placed. Module \c name is built into a C++ library \c lib/libimp_name.so (or \c .dylib on a mac) and a python library
with python files located in \c lib/IMP/name and the binary part in \c lib/_IMP_name.so.
- \c doc where the html documentation is placed in \c doc/html and the examples in \c doc/examples with a subdirectory for each module
- \c data where each module gets a subdirectory for its data.
Unfortunately, various intermediate files from the build are scattered throughout the \c modules
and \c kernel hierarchies. This messiness is part of the reason we strongly recommend doing an
\ref devbuild "out of source build".
When \imp is installed, the structure from the \c build directory is moved over more or less intact except that the C++ and python libraries are put in the (different) appropriate locations.
\section usage Writing new functions and classes
The easiest way to start writing new functions and classes is to
create a new module using the \ref make_module "make-module script".
This creates a new module in the \c modules directory, complete with
%example code.
We highly recommend using a revision control system such as
\svn or \external{git-scm.com/,
GIT} to keep track of changes to your module.
If, instead, you choose to add code to an existing module, you need to
consult with the person or people who control that module. Their names
can be found on the module's main page.
When designing the interface for your new code, you should
- search \imp for similar functionality and, if there is any, adapt
the existing interface for your purposes. For %example, the existing
IMP::atom::read_pdb() and IMP::atom::write_pdb() functions provide
templates that should be used for the design of any functions that
create particles from a file or write particles to a file. Since
IMP::atom::BondDecorator, IMP::algebra::Segment3D and
IMP::display::Geometry all use methods like
IMP::algebra::Segment3D::get_point() to access the
endpoints of a segment, any new object which defines similar
point-based geometry should do likewise.
- think about how other people are likely to use the code. For
%example, not all molecular hierarchies have atoms as their leaves,
so make sure your code searches for arbitrary
IMP::core::XYZDecorator particles rather than atoms if you only care
about the geometry.
- look for easy ways of splitting the functionality into pieces. It
generally makes sense, for %example, to split selection of the
particles from the action taken on them, either by accepting an
IMP::ParticleRefiner, an IMP::SingletonContainer or just an arbitrary
IMP::Particles object.
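
As a rough illustration of the last point, the sketch below keeps selection
and action in separate functions. The \c SketchParticle type and both function
names are hypothetical stand-ins invented for this %example; they are not part
of \imp.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a particle; not an IMP class.
struct SketchParticle {
  double mass;
};
typedef std::vector<SketchParticle> SketchParticles;

// Selection: decide which particles are of interest.
SketchParticles get_particles_heavier_than(const SketchParticles &all,
                                           double minimum_mass) {
  SketchParticles selected;
  for (std::size_t i = 0; i < all.size(); ++i) {
    if (all[i].mass > minimum_mass) selected.push_back(all[i]);
  }
  return selected;
}

// Action: works on any collection, however it was selected.
double get_total_mass(const SketchParticles &particles) {
  double total = 0.0;
  for (std::size_t i = 0; i < particles.size(); ++i) {
    total += particles[i].mass;
  }
  return total;
}
```

Because the action accepts any collection, callers can pair it with a
different selection without changing either function.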
You may want to read \ref design_example "the design example" for
some suggestions on how to go about implementing your functionality
in \imp.
\section testing Debugging and testing your code
Ensuring that your code is correct can be very difficult, so \imp
provides a number of tools to help you out.
The first set are assert-style macros:
- IMP_USAGE_CHECK() which should be used to check that arguments to
functions and methods satisfy the preconditions.
- IMP_INTERNAL_CHECK() which should be used to verify internal state
and return values to make sure they satisfy pre and post-conditions.
See the \ref assert "Error reporting/checking" page for more details. As a
general guideline, any improper usage should produce at least a warning, and
all return values should be checked by such code.
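
The distinction between the two kinds of check can be sketched as follows.
The \c sketch_usage_check() and \c sketch_internal_check() helpers are
hypothetical stand-ins written for this %example; the real macros, their
arguments and the exceptions they raise are described on the error reporting
page.

```cpp
#include <cmath>
#include <stdexcept>

// Hypothetical stand-in for IMP_USAGE_CHECK(): the caller got it wrong.
inline void sketch_usage_check(bool ok, const char *message) {
  if (!ok) throw std::invalid_argument(message);
}

// Hypothetical stand-in for IMP_INTERNAL_CHECK(): our own code got it wrong.
inline void sketch_internal_check(bool ok, const char *message) {
  if (!ok) throw std::logic_error(message);
}

// Return the side length of a cube with the given volume.
double get_cube_side_length(double volume) {
  // Precondition on the argument supplied by the caller.
  sketch_usage_check(volume > 0.0, "volume must be positive");
  const double side = std::cbrt(volume);
  // Postcondition on the value we are about to return.
  sketch_internal_check(
      std::abs(side * side * side - volume) <= 1e-9 * volume,
      "cube root is inconsistent with the volume");
  return side;
}
```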
The second set is logging macros, such as:
- IMP_LOG() which allows controlled display of messages about what the
code is doing. See \ref log "logging" for more information.
Finally, each module has a set of unit tests. These are python scripts
which test the behavior of a particular piece of the module's API. The
scripts are located in the \c modules/modulename/test directory.
These tests should try, as much as possible, to provide independent
verification of the correctness of the C++ code. The commands
\command{scons test} and \command{scons modulename-test} run the unit
tests for all modules or only those for the module \c modulename, respectively. Any
file in that directory or a subdirectory whose name matches \c
test_*.py is considered a test. The python files are scanned for
classes which inherit from \c IMP.test.TestCase. For each such class
found, any method whose name starts with \c test_ is run.
Some tests will require input files or temporary files. Input files
should be placed in a directory called \c input in the \c test
directory. The test script should then call
\command{self.get_input_file_name(file_name)} to get the true path to
the file. Likewise, appropriate names for temporary files should be
found by calling
\command{self.get_tmp_file_name(file_name)}. Temporary files will be
located in \c build/tmp. The test should remove temporary files after
using them.
\section codingconventions Coding conventions
Make sure you read the \ref conventions "API conventions" page
first.
To ensure code consistency and readability, certain conventions
must be adhered to when writing code for \imp. Some of these
conventions are automatically checked by source control before
a new commit is allowed; you can also check new
code yourself by running \command{scons standards}.
\subsection indent Indentation
All C++ headers and code should be indented in 'Linux' style, with
2-space indents. Do not use tabs. This is roughly the output of
Artistic Style run like
\command{astyle --convert-tabs --style=linux --indent=spaces=2 --unpad=paren --pad=oper}.
Split lines if necessary to ensure that no line is longer than 80
characters.
\b Rationale: Different users have different-sized windows or
terminals, and different tab settings, but everybody can read 80
column output without tabs.
All Python code should conform to the
\external{www.python.org/dev/peps/pep-0008/, Python style guide}.
In essence this
translates to 4-space indents, no tabs, and similar class, method
and variable naming to the C++ code. You can ensure that your
Python code is correctly indented by using the
\command{tools/reindent.py} script, available as part of the \imp
distribution.
\subsection names Names
See the names section of the \ref conventions "IMP conventions" page.
In addition, developers should be aware that
- all preprocessor symbols (things created by \#define) must begin with \c IMP
and no \imp code should depend on preprocessor symbols which do not
start with IMP.
- files that implement a single class should be named for that
class; for %example the SpecialVector class could be implemented in
\c SpecialVector.h and \c SpecialVector.cpp
- files that provide free functions or macros should be given names
\c separated_by_underscores, for %example \c container_macros.h
- functions which take a parameter which has units should have the
unit as part of the function name, for %example
IMP::atom::SimulationParameters::set_maximum_time_step_in_femtoseconds().
Remember the Mars orbiter. The exception to this is distance
and force numbers, which should always be in angstroms and
kcal/(mol angstrom) respectively unless otherwise stated.
.
\b Rationale: This makes it easier to tell class names and
function names apart where this is ambiguous (particularly an issue with
the Python interface). The Python style guide also calls for CamelCase
class names, so this avoids any need to rename classes
between C++ and Python to ensure clean Python code. Good naming is
especially important with preprocessor symbols since these are not
scoped and so can change the meaning of other people's code.
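
The unit-naming rule in the list above might look like this in practice. The
class below is a hypothetical stand-in written for this %example and makes no
claim about how the real simulation parameters class is implemented.

```cpp
// Hypothetical class; the names only illustrate the convention.
class SketchSimulationParameters {
 public:
  SketchSimulationParameters() : maximum_time_step_in_femtoseconds_(1.0) {}

  // The unit is part of the name, so call sites document themselves.
  void set_maximum_time_step_in_femtoseconds(double step) {
    maximum_time_step_in_femtoseconds_ = step;
  }
  double get_maximum_time_step_in_femtoseconds() const {
    return maximum_time_step_in_femtoseconds_;
  }
  // A conversion is explicit in the name rather than left to the caller.
  double get_maximum_time_step_in_picoseconds() const {
    return maximum_time_step_in_femtoseconds_ / 1000.0;
  }

 private:
  double maximum_time_step_in_femtoseconds_;
};
```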
\subsection datastorage Passing and storing data
- When a class or function takes a set of particles which are expected to
be those of a particular type of decorator, it should take a list of
decorators instead, e.g. IMP::core::transform() takes an IMP::core::XYZ.
This makes it clearer what attributes the particle is required to have
as well as allows functions to be overloaded (so there can be an
IMP::core::transform() which takes IMP::core::RigidBody particles instead).
- IMP::Restraint and IMP::ScoreState classes should generally use an
IMP::SingletonContainer (or other type of Container) to store the set of
IMP::Particle objects that they act on.
- Store collections of IMP::Object-derived or IMP::Decorator-derived
objects of type \c Name using a \c Names. Declare functions that
accept them to take a \c NamesTemp (\c Names is a \c NamesTemp). \c
Names are reference counted (see IMP::RefCounted for details), \c
NamesTemp are not.
\subsection display Display
All classes must have a \c show method which takes an optional
\c std::ostream and prints information about the object (see IMP::Object::show() for an example). The helper
macros, such as IMP_RESTRAINT(), define such a method. In addition, all classes
must have \c operator<< defined. This can be easily done using the
IMP_OUTPUT_OPERATOR() macro once the show method is defined. Note that
\c operator<< writes human readable information. Add a \c write method
if you want to provide output that can be read back in.
\subsection errors Errors
Classes and methods should use \imp exceptions to report errors. See
IMP::Exception for a list of existing exceptions. See \ref assert for
a list of functions to aid in error reporting and detection.
\subsection internal_ns Namespaces
Use the provided \c IMPMODULE_BEGIN_NAMESPACE,
\c IMPMODULE_END_NAMESPACE, \c IMPMODULE_BEGIN_INTERNAL_NAMESPACE
and \c IMPMODULE_END_INTERNAL_NAMESPACE macros to put declarations
in a namespace appropriate for module \c MODULE.
Each module has an internal namespace, \c module_name::internal and an internal
include directory \c modulename/internal. Any function which is
- not intended to be part of the API,
- not documented,
- liable to change without notice,
- or not tested
should be declared in an internal header and placed in the internal namespace.
The functionality in such internal headers is
- not exported to python
- and not part of the documented API
As a result, such functions do not need to obey all the coding conventions
(but we recommend that they do).
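
The layout might be sketched as follows, with plain \c namespace blocks
standing in for the begin/end macros. The module name \c sketchmodule and both
functions are hypothetical; in a real module the helper would live in a header
under the internal include directory.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

namespace sketchmodule {
namespace internal {
// Helper: undocumented, not exported to Python, liable to change.
inline double get_sum(const std::vector<double> &values) {
  double sum = 0.0;
  for (std::size_t i = 0; i < values.size(); ++i) sum += values[i];
  return sum;
}
}  // namespace internal

// Public API: documented, tested and stable.
inline double get_mean(const std::vector<double> &values) {
  if (values.empty()) throw std::invalid_argument("values must not be empty");
  return internal::get_sum(values) / values.size();
}
}  // namespace sketchmodule
```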
\section docs Documenting your code
\imp is documented using \doxygen. See
\external{www.doxygen.nl/docblocks.html, documenting source code with doxygen}
to get started. We use \c //! and \c /** ... * / blocks for documentation.
Python code should provide Python doc strings.
All headers not in internal directories are parsed through \doxygen. Hide any function that
you do not want documented (for %example, because it is not well tested) by surrounding it
with
\code
#ifndef IMP_DOXYGEN
void messy_poorly_thought_out_function();
#endif
\endcode
We provide a number of extra Doxygen commands to aid in producing nice
\imp documentation. The commands are used by writing \c \\commandname{args}
or \c \\commandname if there are no arguments.
- When you want to specify some command-line command do
\verbatim
\command{the command text}\endverbatim
which produces
\command{the command text}
- To produce a link to a page on the Sali lab web site do
\verbatim
\salilab{imp, the IMP project}\endverbatim
which produces
\salilab{imp, the IMP project}
- To produce a link to the outside world do
\verbatim
\external{boost.org, Boost}\endverbatim
produces \external{boost.org, Boost}
- When writing the name \imp do
\verbatim
\imp\endverbatim
so that no link is produced (\imp as opposed to IMP).
- Sections of documentation that are only for people developing \imp code should
be marked with
\verbatim
\advanceddoc You can tweak this class in various ways in order to optimize its
performance. \endverbatim
Similarly, advanced methods should be marked with
\verbatim
\advancedmethod\endverbatim
which produces \advancedmethod.
- General warning messages can be produced using
\verbatim
\warning Be afraid, be very afraid.\endverbatim
which produces
\warning Be afraid, be very afraid.
- To mark that some part of the API has not yet been well planned and may change,
use \c \\unstable{Classname}. The documentation will include a disclaimer
and the class or function will be added to a list of unstable classes. It is
better to simply hide such things from \doxygen.
- To mark a method as not having been well tested yet, use \c \\untested{Classname}.
- To mark a method as not having been implemented, use \c \\untested{Classname}.
- To note that a class supports comparisons (eg <, >, ==, != etc) use \c \\comparable
and then hide the comparison functions from \doxygen (there are a lot of them and they
aren't very interesting).
\section scripts Useful scripts
\imp provides a variety of scripts to make developers' lives easier.
\subsection make_scons Generate SConscripts
The \c SConscripts in a number of the modules list all of the
header and \c cpp files which are part of the module (the other
modules build this list automatically at compile time). These lists can
be generated using the \c make-sconscripts script. To rebuild
the SConscripts for the module \c modulename, run
\command{./tools/make-sconscripts modulename}
\subsection make_module Making a module
Creating a new module is the easiest way to get started
developing code for \imp. First, choose a name for the module.
The name should only contain letters, numbers and underscores as it
needs to be a valid file name as well as an identifier in Python and C++.
To create the module do
\command{./tools/make-module my_module}
Then, if you run \c scons with \c localmodules=True, your new module will be
built. The new module includes a number of examples and comments to help
you add code to the module.
You can use your new module in a variety of ways:
- add C++ code to your module by putting \string{.h} files
in \string{modules/my_module/include} and \string{.cpp} files in
\string{modules/my_module/src}. In order to use your new functions and
classes in python, you must add a line
\string{%include "IMP/my_module/myheader.h"} near the end of the file
\string{modules/my_module/pyext/my_module.i}.
- write C++ programs using \imp by creating \string{.cpp} files in
\string{modules/my_module/bin}. Each \string{.cpp} file placed there
is built into a separate executable.
- add python code to your library by putting a \string{.py} file in
\string{modules/my_module/pyext/my_module/}
- add python code to your library by adding
\string{%pythoncode} blocks to \string{modules/my_module/pyext/my_module.i}.
- add test code to your library by putting \string{.py} files in
\string{modules/my_module/test} or a subdirectory.
If you feel your module is of interest to other \imp users and
developers, see the \ref anchorcontributing "contributing code to IMP" section.
If you document your code, running \command{scons doc} will build
documentation of all of the modules including yours. To access the
documentation, open \string{doc/html/index.html}.
\section contributing Contributing code back to the repository
\anchor anchorcontributing
In order to be shared with others as part of the \imp distribution,
code needs to be of higher quality and more thoroughly vetted than
typical research code. As a result, it may make sense to keep the
code as part of a private module until you better understand what
capabilities can be cleanly offered to others.
The first questions to answer are
- What exactly is the functionality I would like to contribute? Is
it a single function, a single Restraint, a set of related classes
and functions?
- Is there similar functionality already in \imp? If so, it might make
more sense to modify the existing code in cooperation with its
author. At the very least, the new code needs to respect the
conventions established by the prior code in order to maintain
consistency.
- Where should the new functionality go? It can either be added to an
existing module or as part of a new module. If adding to an existing
module, you must communicate with the authors of that module to get
permission and coordinate changes.
- Should the functionality be written in C++ or Python? In general, we
suggest C++ if you are comfortable programming in that language as
that makes the functionality available to more people. See
\ref cpppythondifferences "Python/C++ differences" for more considerations.
You are encouraged to post to the
\impdev to find help
answering these questions as it can be hard to grasp all the various
pieces of functionality already in the repository.
All code contributed to \imp
- must follow the \ref codingconventions "IMP coding conventions"
- should follow general good \ref cpp "C++ programming practices"
- must have unit tests
- must pass all unit tests
- must have documentation
- must build on all supported compilers (roughly, recent versions of gcc and Visual C++) without warnings
- should have examples
- must not have warnings when \doxygen is run (\c scons \c doc)
The following subsections provide more details about the above choices and how to implement them.
\subsection submitting Submitting to a module
Small pieces of functionality or extensions to existing functionality
probably should be submitted to an existing module. Please contact the
authors of the appropriate module and discuss the submission and how
the code will be maintained.
A list of all current modules in the \impsvn can be found in
the modules list or from the modules tab
at the top of this page.
As always, if in doubt, post to \impdev.
Patches to modules for which you have write access can be submitted
directly by doing:
\command{svn commit -m "message describing the patch" files or directories to submit}
\subsection submitting_module Submitting a module
If you have a large group of related functionality to submit, it may make sense to create a new module in svn. Please post to \impdev to discuss your plans.
\subsection submitted Once you have submitted code
Once you have submitted code, you should monitor the
\salilab{imp/nightly/tests.html,Nightly build status} to make sure that
your code builds on all platforms and passes the unit tests. Please
fix all build problems as fast as possible.
The following sorts of changes must be announced on the \impdev
mailing list before being made
- changes to existing kernel or core APIs
- significant additions to kernel or core
We recommend that changes be posted to the list a day or two before
they are made so as to give everyone adequate time to comment.
In addition to monitoring the \impdev list, developers who have a module or
are committing patches to svn may want to subscribe to the \impcommits email
list which receives notices of all changes made to the \imp SVN repository.
\section cpp Good programming practices
The contents of this section are aimed at C++ programmers, but most also apply
to Python.
\subsection coding General resources
Two excellent sources for general C++ coding guidelines are
- \external{www.amazon.com/Coding-Standards-Rules-Guidelines-Practices/dp/0321113586, C++ Coding Standards} by Sutter and Alexandrescu
- \external{www.amazon.com/Effective-Specific-Addison-Wesley-Professional-Computing/dp/0201924889, Effective C++} by Meyers
\imp endeavors to follow all of the guidelines published in those
books. The Sali lab owns copies of both of these books that you
are free to borrow.
\subsection impcoding IMP gotchas
Below are suggestions prompted by bugs found in code submitted to \imp.
- Never use '\c using \c namespace' outside of a function; instead
explicitly provide the namespace. (This avoids namespace pollution, and
removes any ambiguity.)
- Never use the preprocessor to define constants. Use \c const
variables instead. Preprocessor symbols don't have scope or type
and so can have unexpected effects.
- Pass other objects by value or by \c const & (if the object is
large) and store copies of them.
- Never expose member variables in an object which has
methods. All such member variables should be private.
- Don't derive a class from another class simply to reuse some
code that the base class provides - only do so if your derived
class could make sense when cast to the base class. Otherwise,
reuse existing code by pulling it into a function.
- Clearly mark any file that is created by a script so that other
people know to edit the original file.
- Always return a \c const value or \c const ref if you are not
providing write access. Returning a \c const copy means the
compiler will report an error if the caller tries to modify the
return value without creating a copy of it.
- Include files from the local module first, then files from the
other \imp modules and kernel and finally outside includes. This
makes any dependencies in your code obvious, and by including
standard headers \e after \imp headers, any missing includes in the
headers themselves show up early (rather than being masked by
other headers you include).
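\code
// Header names are illustrative placeholders; the originals were lost.
#include <IMP/my_module/MyRestraint.h> // local module first
#include <IMP/core/XYZ.h>              // then other IMP modules
#include <IMP/Restraint.h>             // then the kernel
#include <vector>                      // outside headers last
\endcode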
- Use \c double variables for all computational intermediates.
- Avoid using nested classes in the API as SWIG can't wrap them
properly. If you must use nested classes, you will have to
do more work to provide a Python interface to your code.
- Delay initialization of keys until they are actually needed
(since all initialized keys take up memory within each particle,
more or less). The best way to do this is to have them be static
variables in a static function:
\code
FloatKey get_my_float_key() {
static FloatKey k("hello");
return k;
}
\endcode
- One is almost always the right number:
- Information should be stored in exactly one
place. Duplicated information easily gets out of sync.
- A given piece of code should only appear once. Do not copy,
paste and modify to create new functionality. Instead,
figure out a way to reuse the existing code by pulling it
into an internal function and adding extra parameters. If
you don't, when you find bugs, you won't remember to fix
them in all the copies of the code.
- There should be exactly one way to represent any particular
state. If there is more than one way, anyone who writes
library code which uses that type of state has to handle all
ways. For %example, there is only one scheme for
representing proteins, namely the
IMP::atom::MolecularHierarchyDecorator.
- Each class/method should do exactly one thing. The presence
of arguments which dramatically change the behavior of the
class/method is a sign that it should be split. Splitting
it can make the code simpler, expose the common code for
others to use and make it harder to make mistakes by
getting the mode flag wrong.
- Methods should take at most one argument of each type (and
ideally only one argument). If there are several arguments
of the same types (eg two different \c double parameters) it is
easy for a user to mix up the order of arguments and the compiler will
not complain. \c int and \c double count as
equivalent types for this rule since the compiler will
transparently convert an \c int into a \c double.
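
A few of the suggestions above can be combined in one short sketch: a typed
constant instead of a preprocessor symbol, a key that is only created on first
use, and wrapper types so that the compiler catches swapped arguments. All
names are hypothetical, and the wrapper types are just one possible way to
follow the one-argument-per-type rule, not something the list prescribes.

```cpp
#include <string>

namespace sketch {
// A typed, scoped constant instead of a preprocessor symbol.
const double pi = 3.14159265358979323846;

// Hypothetical stand-in for a key type such as FloatKey.
class SketchKey {
 public:
  explicit SketchKey(const std::string &name) : name_(name) {}
  // Read-only access, so return a const reference.
  const std::string &get_name() const { return name_; }

 private:
  std::string name_;
};

// The key is constructed the first time the function is called.
inline const SketchKey &get_radius_key() {
  static const SketchKey key("radius");
  return key;
}

// Distinct wrapper types make it a compile error to swap the arguments.
class Radius {
 public:
  explicit Radius(double value) : value_(value) {}
  double get_value() const { return value_; }

 private:
  double value_;
};

class Height {
 public:
  explicit Height(double value) : value_(value) {}
  double get_value() const { return value_; }

 private:
  double value_;
};

inline double get_cylinder_volume(Radius radius, Height height) {
  return pi * radius.get_value() * radius.get_value() * height.get_value();
}
}  // namespace sketch
```

With these types, passing a \c Height where a \c Radius is expected fails to
compile instead of silently computing the wrong volume.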
\section next Where to go next
*/