International Tables for Crystallography
Volume G: Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon

International Tables for Crystallography (2006). Vol. G. ch. 3.6, pp. 190-194

Section 3.6.8. Publication

P. M. D. Fitzgerald,^a* J. D. Westbrook,^b P. E. Bourne,^c B. McMahon,^d K. D. Watenpaugh^e and H. M. Berman^f

^a Merck Research Laboratories, Rahway, New Jersey, USA
^b Protein Data Bank, Research Collaboratory for Structural Bioinformatics, Rutgers, The State University of New Jersey, Department of Chemistry and Chemical Biology, 610 Taylor Road, Piscataway, New Jersey, USA
^c Research Collaboratory for Structural Bioinformatics, San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0537, USA
^d International Union of Crystallography, 5 Abbey Square, Chester CH1 2HU, England
^e Retired; formerly Structural, Analytical and Medicinal Chemistry, Pharmacia Corporation, Kalamazoo, Michigan, USA
^f Protein Data Bank, Research Collaboratory for Structural Bioinformatics, Rutgers, The State University of New Jersey, Department of Chemistry and Chemical Biology, 610 Taylor Road, Piscataway, New Jersey, USA
Correspondence e-mail: paula_fitzgerald@merck.com

3.6.8. Publication

The results of the determination of the crystal structure of a biological macromolecule might be published in an academic journal and/or deposited in a structural database. The data items in the core CIF dictionary cover most of the requirements for constructing an article for publication from an mmCIF and the many well defined data fields in mmCIF allow an extensively annotated record of the structure to be deposited in a database. However, the formalism of two of the core CIF categories for publication did not fit the relational database model of mmCIF, so new categories were required. The core CIF category COMPUTING, which is used to list the programs used to determine the structure, is replaced by the mmCIF category SOFTWARE, and the core CIF category DATABASE, which is used to identify the records associated with the structure in various databases, is replaced by the mmCIF category DATABASE_2.

The category groups discussed here are: the CITATION group, which is used to give citations to the literature (Section 3.6.8.1); the COMPUTING group, which is used to cite software (Section 3.6.8.2); the DATABASE group for citing related database entries (Section 3.6.8.3), which includes a group of categories used to ensure compatibility with specific database records in the Protein Data Bank (Section 3.6.8.3.2); journal administration categories that might be used by a publisher (Section 3.6.8.4.1); and the PUBL family of categories used to store the text of an article for publication (Section 3.6.8.4.2).

3.6.8.1. Literature citations

The categories describing literature citations are as follows:

CITATION group
CITATION
CITATION_AUTHOR
CITATION_EDITOR

Data items in these categories are as follows:

(a) CITATION [Scheme scheme185]

(b) CITATION_AUTHOR [Scheme scheme186]

(c) CITATION_EDITOR [Scheme scheme187]

The bullet (•) indicates a category key. The arrow (→) is a reference to a parent data item. Items in italics have aliases in the core CIF dictionary formed by changing the full stop (.) to an underscore (_).

The original core CIF dictionary contained the data item _publ_section_references for citations of journal articles, book chapters and monographs. The authors of the mmCIF dictionary felt that a more detailed and structured approach to literature citations was required. This is provided by the mmCIF categories CITATION, CITATION_AUTHOR and CITATION_EDITOR. These categories were subsequently included in the core CIF dictionary and are used in the same way in both dictionaries. Section 3.2.5.1 may be consulted for details. Although _publ.section_references remains a valid mmCIF data item, it is expected that the CITATION, CITATION_AUTHOR and CITATION_EDITOR categories will be used for literature citations in mmCIFs.
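The naming relationship between the two dictionaries (for example _publ.section_references in mmCIF and _publ_section_references in core CIF) can be illustrated with a minimal Python sketch. The function name core_alias is hypothetical, and the sketch assumes the rule stated above applies to the item in question; not every mmCIF item has a core alias.

```python
def core_alias(mmcif_name: str) -> str:
    """Form the core CIF alias of an mmCIF data name.

    The full stop separating category and item is changed to an
    underscore. Assumption: the caller has already established that
    the item has a core CIF alias at all.
    """
    if not mmcif_name.startswith("_") or "." not in mmcif_name:
        raise ValueError(f"not a category.item style data name: {mmcif_name!r}")
    # Only the first full stop is the category/item separator.
    return mmcif_name.replace(".", "_", 1)


print(core_alias("_publ.section_references"))
```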

3.6.8.2. Citation of software packages

The categories describing software citations are as follows:

COMPUTING group
COMPUTING
SOFTWARE

It is expected that citations of software packages in an mmCIF will be made using data items in the SOFTWARE category. However, in some cases, a particular publisher or database may require that this information is given using data items in the COMPUTING category instead (see Section 3.2.5.2 for details).

Data items in these categories are as follows:

(a) COMPUTING [Scheme scheme188]

(b) SOFTWARE [Scheme scheme189]

The bullet (•) indicates a category key. Where multiple items within a category are marked with a bullet, they must be taken together to form a compound key. The arrow (→) is a reference to a parent data item. Items in italics have aliases in the core CIF dictionary formed by changing the full stop (.) to an underscore (_).

The data item _computing.entry_id has been added to the COMPUTING category to provide the formal category key required by the DDL2 data model.

The data items in the SOFTWARE category are used to cite the software packages used in the structure analysis. The software can be described in great detail if necessary. However, for most applications a small subset of these data items, for example just _software.name and _software.version, could be used (see Example 3.6.8.1).

Example 3.6.8.1. The refinement program Prolsq described with data items in the SOFTWARE category.

[Scheme scheme190]

Most data items in the SOFTWARE category are self-explanatory, but a few require further comment. The data item _software.citation_id provides a way to link the details of a program to the citation of an article in the literature that describes the program; this data item must match a value of _citation.id in the CITATION category. The name and e-mail address of the author of the software can also be given using _software.contact_author and _software.contact_author_email, respectively. (This may be the original author or someone who subsequently modifies or maintains the software; these data items would generally refer to the person most closely associated with the maintenance of the code at the time it was used.) The release date of the software may be recorded in _software.date. As far as possible, the date should be that of the version recorded in _software.version. The data item _software.location may be used to supply a URL from which the software may be downloaded or where it is described in detail.
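The requirement that _software.citation_id match a value of _citation.id can be checked mechanically. The following Python sketch assumes the two categories have already been read into lists of dictionaries keyed by item name; the function name, the row layout, the version string and the citation identifiers are all hypothetical and serve only as illustration.

```python
def unmatched_citation_ids(software_rows, citation_rows):
    """Return the _software.citation_id values with no matching _citation.id.

    Rows are plain dicts keyed by item name (an assumed in-memory layout).
    CIF placeholders '?' (unknown) and '.' (inapplicable) are skipped.
    """
    known_ids = {row["id"] for row in citation_rows}
    missing = []
    for row in software_rows:
        citation_id = row.get("citation_id", "?")
        if citation_id in ("?", "."):
            continue
        if citation_id not in known_ids:
            missing.append(citation_id)
    return missing


# Hypothetical values: only the program name Prolsq comes from the text.
citation = [{"id": "primary"}, {"id": "prolsq_ref"}]
software = [
    {"name": "Prolsq", "version": "1.0", "citation_id": "prolsq_ref"},
    {"name": "OtherProgram", "version": "2.0", "citation_id": "no_such_ref"},
]
print(unmatched_citation_ids(software, citation))
```

A non-empty result flags SOFTWARE rows whose literature link cannot be resolved within the same data block.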

3.6.8.3. Citation of related database entries

Categories describing related database entries are as follows:

DATABASE group
Related database entries (§3.6.8.3.1)
DATABASE
DATABASE_2
Compatibility with PDB format files (§3.6.8.3.2)
DATABASE_PDB_CAVEAT
DATABASE_PDB_MATRIX
DATABASE_PDB_REMARK
DATABASE_PDB_REV
DATABASE_PDB_REV_RECORD
DATABASE_PDB_TVECT

The purpose of entries in the DATABASE category group is to provide pointers that link the mmCIF to all database entries that result from the deposition of the file. For mmCIF, the relevant category is DATABASE_2, which replaces the DATABASE category of the core dictionary.

Note the distinction between the database pointers provided here and those in the STRUCT_REF family of categories. The latter are intended to provide links to external database entries for any aspect of any subset of the structure that the author may wish to record, including previous determinations of the same structure, other structures containing the same ligand or references to the sequence(s) of the macromolecule(s) in sequence databases. In contrast, the links provided in DATABASE_2 refer to the entire contents of the mmCIF and are designed to cover situations in which the entire file is deposited in more than one database (for example, in the PDB and in a database for protein kinases).

3.6.8.3.1. Related database entries

Data items in these categories are as follows:

(a) DATABASE [Scheme scheme191]

(b) DATABASE_2 [Scheme scheme192]

The bullet (•) indicates a category key. Where multiple items within a category are marked with a bullet, they must be taken together to form a compound key. The arrow (→) is a reference to a parent data item. Items in italics have aliases in the core CIF dictionary formed by changing the full stop (.) to an underscore (_).

The DATABASE category is retained in the mmCIF dictionary, but only for consistency with the core dictionary.

The role of the data items in the DATABASE_2 category is to store identifiers assigned by one or more databases to the structure described in the mmCIF. In the data model used in the core CIF dictionary, each database has an individual data item. The data model in mmCIF is more general. It comprises the data items _database_2.database_id, which identifies the database, and _database_2.database_code, which is the code assigned by the database to the entry. Thus a new database can be referred to without needing to add an additional data item to the dictionary. If a structure has been deposited in more than one database, the values of _database_2.database_id and _database_2.database_code can be looped.

The institutions and individual databases recognized in the DATABASE_2 category in the current version of the mmCIF dictionary are CAS (Chemical Abstracts Service), CSD (Cambridge Structural Database), ICSD (Inorganic Crystal Structure Database), MDF (Metals Data File), NDB (Nucleic Acid Database), NBS (the Crystal Data database of the National Institute of Standards and Technology, formerly the National Bureau of Standards), PDB (Protein Data Bank), PDF (Powder Diffraction File), RCSB (Research Collaboratory for Structural Bioinformatics) and EBI (European Bioinformatics Institute). It is intended that new databases will be added to this list on an ongoing basis; the purpose of specifying a list of possible databases in the dictionary is to ensure that each database is referenced consistently.
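The looped form of DATABASE_2 and the role of the fixed list of database identifiers can be sketched in Python as follows. The set of identifiers is taken from the paragraph above; the function name and the entry codes in the example are hypothetical placeholders, and the sketch assumes that codes contain no white space (otherwise CIF quoting would be needed).

```python
# Identifiers listed in the text as recognized by DATABASE_2.
RECOGNIZED_DATABASES = {
    "CAS", "CSD", "ICSD", "MDF", "NDB", "NBS", "PDB", "PDF", "RCSB", "EBI",
}


def database_2_loop(entries):
    """Format (database_id, database_code) pairs as a CIF loop.

    Raises ValueError for an identifier outside the recognized list,
    which is how a fixed list keeps references to a database consistent.
    """
    lines = ["loop_", "_database_2.database_id", "_database_2.database_code"]
    for database_id, database_code in entries:
        if database_id not in RECOGNIZED_DATABASES:
            raise ValueError(f"unrecognized database_id: {database_id!r}")
        lines.append(f"{database_id}  {database_code}")
    return "\n".join(lines)


# Hypothetical codes for a structure deposited in two databases.
print(database_2_loop([("PDB", "XXXX"), ("NDB", "YYYY000")]))
```

Adding a further deposition only adds a row to the loop; no new data item is needed.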

3.6.8.3.2. Compatibility with PDB format files

Data items in these categories are as follows:

(a) DATABASE_PDB_REV [Scheme scheme193]

(b) DATABASE_PDB_REV_RECORD [Scheme scheme194]

(c) DATABASE_PDB_MATRIX [Scheme scheme195]

(d) DATABASE_PDB_TVECT [Scheme scheme196]

(e) DATABASE_PDB_CAVEAT [Scheme scheme197]

(f) DATABASE_PDB_REMARK [Scheme scheme198]

The bullet (•) indicates a category key. Where multiple items within a category are marked with a bullet, they must be taken together to form a compound key. The arrow (→) is a reference to a parent data item.

A major goal of the design of the mmCIF data model was that a file could be transformed from Protein Data Bank (PDB) format to mmCIF format and back again without loss of information. This required the creation of mmCIF data items whose sole purpose is to capture PDB-specific records that do not map onto mmCIF data items. These records would never be created for a de novo mmCIF. This family of categories also belongs to the PDB category group (see Section 3.6.9.3).

The items in the categories DATABASE_PDB_MATRIX and DATABASE_PDB_TVECT are derived from the elements of transformation matrices and vectors used by the Protein Data Bank. The items in the categories DATABASE_PDB_REV and DATABASE_PDB_REV_RECORD record details about the revision history of the data block as archived by the Protein Data Bank.

The items in the DATABASE_PDB_CAVEAT category record comments about the data block flagged as 'CAVEATS' by the Protein Data Bank at the time the original PDB archive file was created. A PDB CAVEAT record indicates that the entry contains severe errors. In PDB format, extended comments were stored as a sequence of fixed-length (80-character) format records, columns 9 and 10 being reserved for continuation sequence numbering. The mmCIF representation retains each record as a separate data value and does not attempt to merge continuation records to provide more readable running text. Hence the PDB CAVEAT entry [Scheme scheme199] would be represented in mmCIF as [Scheme scheme200]
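The record-by-record treatment of CAVEAT entries described above can be sketched in Python. Only the use of columns 9 and 10 for continuation numbering comes from the text; the column at which the comment text starts (column 20 here), the function name, the row layout and the sample records are assumptions made for illustration.

```python
def caveat_records(pdb_lines, text_start_column=20):
    """Turn each PDB CAVEAT record into one separate value.

    Continuation records are deliberately NOT merged, mirroring the
    mmCIF representation. text_start_column is an assumption about
    the fixed-column layout, not something stated in the text.
    """
    rows = []
    for line in pdb_lines:
        if not line.startswith("CAVEAT"):
            continue
        record = line.rstrip("\n").ljust(80)
        rows.append({
            "id": len(rows) + 1,                      # sequential identifier
            "continuation": record[8:10].strip(),     # columns 9-10
            "text": record[text_start_column - 1:].rstrip(),
        })
    return rows


# Hypothetical records: 'XXXX' stands in for an entry code.
sample = [
    "CAVEAT     XXXX    FIRST PART OF AN ILLUSTRATIVE COMMENT",
    "CAVEAT   2 XXXX    SECOND PART OF THE SAME COMMENT",
]
for row in caveat_records(sample):
    print(row["id"], repr(row["continuation"]), row["text"])
```

Each returned row corresponds to one data value, so a comment spread over two PDB records stays as two values.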

The PDB format used 'REMARK' records to store information relating to several aspects of the structure in free or loosely structured text. In some cases, the conventions used for individual types of REMARK record allow structured data to be extracted automatically and translated to specific mmCIF data items. Where this is not possible, the DATABASE_PDB_REMARK category may be used to retain the information that appeared in these parts of PDB format files. Unlike the CAVEAT records, it is possible to collect together several REMARK records sharing a common numbering into a single free-text field. For example, PDB practice has been to repeat the contents of CAVEAT records (see above) as records of type 'REMARK 5'. While each separate CAVEAT record is converted to a separate mmCIF data value, the complete text of a REMARK 5 record may be gathered into a single mmCIF data value. Hence the CAVEAT example above would also appear in a PDB file as part of a 'REMARK 5' as [Scheme scheme201] and would appear in an mmCIF as [Scheme scheme202]

Note that by convention the value of _database_PDB_remark.id matches the class of the REMARK record in the PDB file.
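In contrast to CAVEAT records, REMARK records sharing a number may be gathered into one value whose identifier is the REMARK class. A minimal Python sketch of that grouping follows; the column positions (class number in columns 8 to 10, text from column 12), the function name and the sample records are assumptions for illustration rather than details given in the text.

```python
def gather_remarks(pdb_lines):
    """Collect REMARK records by class into single free-text values.

    Returns a dict mapping the REMARK class (as a string, usable as
    the value of _database_PDB_remark.id) to the joined text.
    Column positions are assumed, not taken from the text.
    """
    grouped = {}
    for line in pdb_lines:
        if not line.startswith("REMARK"):
            continue
        record = line.rstrip("\n")
        remark_class = record[7:10].strip()   # assumed columns 8-10
        if not remark_class:
            continue
        text = record[11:].rstrip()           # assumed column 12 onward
        grouped.setdefault(remark_class, []).append(text)
    # One free-text value per class, original line order preserved.
    return {cls: "\n".join(parts) for cls, parts in grouped.items()}


# Hypothetical records for illustration.
sample_remarks = [
    "REMARK   5 FIRST PART OF AN ILLUSTRATIVE COMMENT",
    "REMARK   5 SECOND PART OF THE SAME COMMENT",
    "REMARK   6 A DIFFERENT CLASS OF REMARK",
]
for remark_id, text in gather_remarks(sample_remarks).items():
    print(remark_id, "->", text.replace("\n", " / "))
```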

3.6.8.4. Article publication

Categories used during the publication of an article are as follows:

IUCR group
Journal housekeeping and reference entries (§3.6.8.4.1)
JOURNAL
JOURNAL_INDEX
Contents of a publication (§3.6.8.4.2)
PUBL
PUBL_AUTHOR
PUBL_BODY
PUBL_MANUSCRIPT_INCL

These categories cover both the metadata for the article (information about the article) and the text of the article itself.

3.6.8.4.1. Journal housekeeping and citation entries

Data items in these categories are as follows:

(a) JOURNAL [Scheme scheme203]

(b) JOURNAL_INDEX [Scheme scheme204]

The bullet (•) indicates a category key. The arrow (→) is a reference to a parent data item. Items in italics have aliases in the core CIF dictionary formed by changing the full stop (.) to an underscore (_).

In mmCIF, the families of categories used to contain the text of an article for publication and to record information about the handling and processing of the article by a publisher are assigned to the IUCR category group. The name arose from the fact that CIF is sponsored by the International Union of Crystallography and several of the journals of the IUCr can handle articles submitted for publication in CIF format. However, these data items may be freely used by other publishers who wish to handle articles submitted in CIF format. The JOURNAL and JOURNAL_INDEX categories are used in the same way in the core CIF and mmCIF dictionaries, and Section 3.2.5.4 can be consulted for details.

3.6.8.4.2. Contents of a publication

Data items in these categories are as follows:

(a) PUBL [Scheme scheme205]

(b) PUBL_AUTHOR [Scheme scheme206]

(c) PUBL_BODY [Scheme scheme207]

(d) PUBL_MANUSCRIPT_INCL [Scheme scheme208]

The bullet (•) indicates a category key. The arrow (→) is a reference to a parent data item. Items in italics have aliases in the core CIF dictionary formed by changing the full stop (.) to an underscore (_).

The categories PUBL, PUBL_AUTHOR, PUBL_BODY and PUBL_MANUSCRIPT_INCL are also members of the IUCR group in the mmCIF dictionary. They are used in the same way in the core CIF and mmCIF dictionaries, and Section 3.2.5.5 can be consulted for details.
