/**\page rmf RMF Files \tableofcontents The %RMF file format (short for Rich Molecular Format) stores hierarchical data about a molecular structure in a binary file. This data can include - molecular structures stored hierarchically. These structures need not be atomic resolution. - feature information about parts of the structures, such as how well it fits a particular measurement. - geometric markup such as segments, surfaces, balls, colors which can be used to improve visualization For example, a protein can be stored as a hierarchy where the root is the whole molecule. The root has one node per chain, each chain has one node per residue and each residue one node per atom. Each node in the hierarchy has the appropriate data stored along with it: a chain node has the chain identifier, and a residue node has the type of the residue stored and atom nodes have coordinates, atom type and elements. Bonds between atoms or coarser elements are stored explicitly as dealing with external databases to generate bonds is the source of much of the difficulty of dealing with other formats such as PDB. The file might also include a node for storing the r-value for a FRET measurement between two residues (with links to the residues) as well as extra markers to highlight key parts of the molecule. Multiple conformations on the hierarchy are stored as frames. Each frame has the same hierarchical structure, but some aspects of the data (eg coordinates) can have different values for each frame (or no value for a particle frame if they happen not be be applicable then). A hierarchical storage format was chose since - most biological molecules have a natural hierarchical structure - it reduces redundancy (eg the residue type is only stored once, as is the residue index) - most software uses a hierarchy of some sort to represent structures at runtime, so less translation is needed - low resolution and multiresolution structures are then more natural as they are just truncations of a full, atomic hierarchy. See \internal{simple.xml, simple.rmf} for an XML dump of the %RMF generated from \internal{simple.pdb.txt, simple.pdb}. For a larger example, see \internal{3U7W.xml, 3U7W.rmf} from \internal{3U7W.pdb.txt, 3U7W.pdb}. Note, that viewing XML files works much better with \external{http://www.mozilla.org/firefox, Firefox} \external{http://chrome.google.com/, Google Chrome} than with Safari. For more information about the library see RMF. And for the standard data storage schemes see \ref categories "standard categories". \section structure The RMF Hierarchy More technically, each node in the %RMF hierarchy has - a type (RMF::NodeType) - a human readable name (RMF::NodeHandle::get_name() - an ID that is unique within the file (RMF::NodeHandle::get_id()) - and associated attributes. One accesses nodes in the hierarchy using handles, RMF::NodeHandle and RMF::NodeConstHandle. The root handle can be fetched from the RMF::FileHandle using RMF::FileHandle::get_root_node(). \section attributes Attributes Each attribute is identified by a key (RMF::Key) and is defined by a unique combination of - a \ref rmf_types "type". - a \ref categories "category" such as \ref catphysics "physics", identified by an RMF::Category. - a name string On a per %RMF basis, the data associated with a given key can either have one value for each node which has that attribute, or one value per frame per node with the attribute. The methods in RMF::NodeHandle to get and set the attributes take an optional frame number. The library provides decorators to group together and provide easier manipulation of standard attributes. Examples include RMF::Particle, RMF::Colored, RMF::Ball etc. See \ref rmfdecorators "Decorators" for more information. \section inheritance Inheritance of properties Certain nodes modify how their children should behave. This modification can be either through inheritance (eg all descendants are assumed to have the property unless they explicitly override it) or composition (the descendant's property is the ancestors composed with theirs). Note that since a given node can be reached through multiple path in the hierarchy, a given view of the file might have to have multiple objects (eg graphics) for a single node. Current examples are - RMF::Colored is inherited. That is, a node that is not an RMF::Colored node, has the color of its closest RMF::Colored ancestor. - The RMF::Particle and RMF::RigidParticle coordinates are transforms of any RMF::ReferenceFrame(s) that occur on the particle itself or its ancestors. That is, a node that is a RMF::Particle or RMF::Ball with an ancestor that is an RMF::ReferenceFrame has global coordinates at the RMF::ReferenceFrame's transformation applied to its coordinates. See the example rigid_bodies.py. \section frames Frames Each RMF file stores one or more frames (conformations). The attributes of a node in a given conformation are the union of conformation-specific attributes as well as static attributes (values that hold for all frames). As with nodes, frames have a hierarchical relationship to one another. This hierarchy supports natural representation of clustering results (eg you have a frame for the cluster center with a child frame for each conformation that is in the cluster). By convention, sequential frames in a simulation should be stored as with the successor frame as a child of the predecessor. Frames also have arbitrary attributes associated with them, like nodes. \section adding Adding custom data to an RMF When adding data to an %RMF file that is just to be used for internal consumption, one should create a new category. For example, \external{http://www.integrativemodeling.org, IMP} defines an ''imp'' category when arbitrary particle data is stored. If, instead, the data is likely to be a general interest, it probably makes sense to add it to the documentation of this library so that the names used can be standardized. \section on_disk On disk formats The RMF library supports several on-disk formats. They are differentiated by the file suffix. The API for accessing all of them are the same with any exceptions noted below. - \c .rmf2: Data is stored as several different Avro Object Container files in a directory. An \c .rmf2 file can either be open for writing, in which case new frames can be generated and saved, but old frames cannot be modified or accessed or read-only, in which case no data can be changed. It can be reliably opened for reading by one process while being written by another. In addition, it should be relatively impervious to corruption and easy to repair if not closed properly. Do note, it is a directory full of files, rather than a single file, so you have to treat it accordingly. - \c .rmfa: Stores all the data as a single Avro Object. It must be rewritten each time a frame is added. However, it can be used to produce a byte array representation of the whole rmf data structure. - \c .rmf: Data is stored in a single HDF5 file. This storage is relatively fast and compact, being compressed binary, however, it suffers from several disadvantages. First, it is brittle in that it is easily corrupted by not being properly closed and once the files are corrupted, no data can be recovered. Second, it is hard to monitor from another process while one process is writing it. Third, it seems to get slower as the more frames are written to the file. \subsection rmf_and_avro RMF2 RMF2 stores the structure as a directory of files. These files are - \c file which stores the number of written frames, the version and the RMF::FileConstHandle::get_description() and RMF::FileConstHandle::get_producer() data. - \c nodes which stores the structure of the RMF::NodeConstHandle hierarchy as well as RMF::NodeConstHandle::get_type() and RMF::NodeConstHandle::get_name() data. - \c frames which stores the descriptions and hierarchy information for the frames in the file. - \c category_name.static that stores the state data associated with nodes for the RMF::Category \c category_name - \c category_name.frames that stores the data for each frame associated with nodes for the RMF::Category \c category_name. They are stored one after another in an Avro Object Container. This allows them to be read successively, and, once the Avro C++ library is updated searched for (somewhat less efficiently). See \ref json "JSON Schemas" for more information about the actual way data is stored on disk. \subsection rmf_hdf5 RMF and HDF5 The %RMF data is stored in a single \external{http://www.hdfgroup.org/HDF5/doc/UG/UG_frame09Groups.html,HDF5 group} in the file on disk. As a result, one could easily put multiple %RMF "files" in a single HDF5 archive, as well as store other data (such as electron density maps). However, adding extra data sets within the %RMF HDF5 group is not supported. HDF5 was chosen over the other candidates as it - supports binary i/o, avoiding issues parsing and delimiters that occur with text files - supports random access, allowing loading of individual conformations without reading the who file - can use internal compression to reduce the size of files - has well developed support library facilitating easy use from most programming languages and has a variety of command line tools to aid debugging \note The following information about how the data is stored on disk is not complete. Implementers should instead use the API provided in the \ref rmflib "RMF library". The %RMF data is spread over various data sets, divided up into classes based on the RMF::Category, data type and whether the particular attribute has one value per frame or just one for the whole file and whether the data is for one one or a sets of nodes. Each node has space allocated where it can store information about whether it has attributes in a given class, and if so, where in the corresponding data set the attributes are stored. Space is allocated in the appropriate table if an attribute in a particular class is used in a node. A special marker value is used to signify when a particular attribute in a class is not found for a particular node (e.g. a -1 is used to signify that a node does not have an index attribute). To get any idea of the data layout in a file, see the dump (produced by \c h5dump) of a tiny %RMF, \internal{simple.hdf5.txt, simple.rmf}. For a larger example, see \internal{3U7W.hdf5.txt, 3U7W.rmf}. */