Pose

A bare-metal Python library for building and manipulating the molecular structures of proteins, nucleic acids, and small organic molecules

Video Tutorial

Watch the full walkthrough: Video Tutorial on YouTube

What is Pose?

Pose constructs a data structure for proteins, nucleic acids, or small organic molecules that contains all relevant information defining a polymer or an organic molecule. Primary information includes the XYZ cartesian coordinates of each atom, the identity and charge of each atom, and the bond graph of the entire molecule. Secondary information includes the FASTA sequence, radius of gyration, potential energy, and the secondary structure assignment for each protein residue.

Using this data structure, Pose can build and manipulate polypeptides, nucleic acids, or small organic molecules: construct any polypeptide or nucleic acid from sequence, move dihedral and rotamer angles, mutate residues and base pairs, and measure bond lengths and angles. It is designed as a substrate for higher-level protocols such as simulated annealing, molecular dynamics, and machine learning-based molecular design.

Key features:

Designed to be extremely stable bare-metal python: NumPy is the only dependency for the core Pose and Molecule classes
26 amino acids supported by default (20 canonical + 6 non-canonical: ALY, MSE, TPO, SEC, TRF, PTR), and can be extended to 100+ amino acids
Support for both L-amino acids and D-amino acids (mixed sequences fully supported)
5 DNA and RNA canonical nucleotides
Full bond graph with atom partial charges
Measure and rotate protein dihedral and rotamer angles (φ/ψ/ω/χ)
Measure and rotate nucleic acids dihedral angles (α/β/γ/δ/ε/ζ/χ)
Measure and adjust the distance and angle between any atoms
PDB and mmCIF file import and export
Pythonic zero-based indexing throughout (unlike PDB's one-based convention)
Bundled bare-metal force field with analytic gradients, periodic-boundary support, and a hash-cached topology
Bundled hybrid physics + statistical score function that is chirality-aware end to end across L-AA, D-AA, mixed L/D, and non-canonical residues
Three production protocols: minimisation, simulated annealing, and molecular dynamics simulation

Installation

Dependencies: Python >= 3, NumPy

For virtualenv:

pip install git+https://github.com/sarisabban/Pose

For anaconda:

conda create -n env python=3
conda activate env
pip3 install git+https://github.com/sarisabban/Pose

Quick Start

from pose import *

# === Build a peptide (no external files needed) ===
p = Pose()
p.Build('MSLESNRGI', chain='A', fmt='Protein')   # Uppercase=L-AA, lowercase=D-AA
p.GetInfo()                                      # Print structured summary

# === Inspect ===
print('Sequence:', p.data['FASTA'])
print('Mass:',     p.data['Mass'], 'Da')
print('Rg:',       p.data['Rg'], 'Å')

# === Manipulate backbone (zero-based indexing) ===
p.RotateDihedral(1, -60, 'PHI')
p.RotateDihedral(1, -45, 'PSI')
p.Mutate(2, 'V')                                 # Mutate residue index 2 (3 in PDB index) Leu → Val
p.Export('peptide.pdb')

# === Same APIs work for D-amino acids and mixed L/D sequences ===
p_d = Pose()
p_d.Build('MsLeSnRgI', chain='A', fmt='Protein') # Mixed L/D chirality-aware

# === Energy and protocols ===
ff = ForceField('OpenFF')                        # Small-molecule force field
# ff = ForceField()                              # 'Default' smoke-test FF (verification only)

E           = ff(p)                              # Potential energy (kJ/mol)
E, F        = ff(p, grad=True)                   # Also return per-atom forces

E_min, log  = Minimise(p, ff)                    # FIRE2 relaxation
E_md,  log  = MolecularDynamics(p, ff, n_steps=1000, dt_fs=2.0, T=300.0, thermostat='langevin')

Uppercase sequence letters build L-amino acids (natural form), lowercase builds D-amino acids (mirror images), and mixed sequences (e.g. 'MsLeSnRgI' above) are fully supported.

Importing a PDB file:

p = Pose()
p.Import('1TQG.pdb', chain='A')   # Or '1BNA.pdb' for a DNA/RNA structure
p.ReBuild()                       # Adds missing hydrogens

Note: Importing a PDB structure with added Hydrogens will disrupt the Bond Graph in p.data['Bonds'], the bond graph in this situation won't include the hydrogens. It is recomended to run p.ReBuild() to re-establish the correct bond graph to include hydrogens (but this will change the positions of the Hydrogens). A trick to overcome this is to import the pose p = Pose(); p.Import('1YN3.pdb') and then seperatly import and rebuild the pose p2 = Pose(); p2.Import('1YN3.pdb'); p2.ReBuild() then substiture the bond graph from p2 to p: p.data['Bonds'] = p2.data['Bonds']. This would solve it

You can run p.ReBuild() after Import() to add hydrogens to the structure. But understand that a new synthetic structure will be built, therefore you will lose the original occupancy and temperature-factor for each atom (replaced with 1.0 and 0.0).

Building DNA/RNA:

p = Pose()
p.Build('ATGCGTACGTTCCGGCAGACGT', chain='A', fmt='DNA')
p.GetInfo()

Importing a small organic molecule:

m = Molecule()
m.Import('caffiene.sdf')
m.GetInfo()

OpenMM plugin:

import io
import numpy as np
from sys import stdout
from pose import Pose
from openmm import *
from openmm.app import *
from openmm.unit import *

p = Pose()
p.Import('1YN3.pdb', chain='A')

pdb = PDBFile(io.StringIO(p.Export(fmt='PDB')))        # <--- The plugin
forcefield = ForceField('amber99sb.xml', 'tip3p.xml')
modeller = Modeller(pdb.topology, pdb.positions)
modeller.addHydrogens(forcefield)
modeller.addSolvent(forcefield, padding=1*nanometer)
pos = np.array(modeller.positions.value_in_unit(nanometer))
box = np.array(modeller.topology.getPeriodicBoxVectors().value_in_unit(nanometer))
shift = 0.5*(box[0]+box[1]+box[2]) - 0.5*(pos.min(axis=0)+pos.max(axis=0))
modeller.positions = [Vec3(*(p+shift)) for p in pos]*nanometer
system = forcefield.createSystem(modeller.topology, nonbondedMethod=PME, nonbondedCutoff=1*nanometer, constraints=HBonds)
integrator = LangevinIntegrator(300*kelvin, 1/picosecond, 0.002*picoseconds)
simulation = Simulation(modeller.topology, system, integrator)
simulation.context.setPositions(modeller.positions)
simulation.minimizeEnergy()
simulation.reporters.append(PDBReporter('output.pdb', 1000))
simulation.reporters.append(StateDataReporter(stdout, 1000, step=True, potentialEnergy=True, temperature=True))
simulation.step(10000)

Note: from openmm.app import * rebinds ForceField to OpenMM's class. If you mix this snippet with the other Quick Start examples (which start with from pose import *), make sure the OpenMM star imports come after from pose import *, the last star import wins, and this example needs OpenMM's ForceField('amber99sb.xml', ...). If you want both classes accessible in the same session, use explicit imports instead of * (e.g. from openmm.app import ForceField as OpenMMForceField).

RDKit plugin:

from rdkit import Chem

CFF = 'CN1C=NC2=C1C(=O)N(C(=O)N2C)C'
m = Chem.MolFromSmiles(CFF)
# Manipulate a molecule using RDKit here
molstr = Chem.MolToMolBlock(m)

m = Molecule()
m.Import(molstr)
m.GetInfo()

This RDKit plugin gives you the power and flexibility to manipulate molecules using RDKit and then import them to the Molecule() class when they are ready.

API Reference

Call Class

Class	Description
`p = Pose()`	Calls the `Pose()` class for proteins, DNA, and RNA
`m = Molecule()`	Calls the `Molecule()` class for small organic molecules

Each class have similar methods and data structure, but with slight differences in the way they are used.

Building & I/O

Method	Description
`p.Import(filename='1YN3.pdb', chain=['A', 'B'], model=1, strict=True)`	Imports a structure from a PDB or mmCIF file and constructs the `p.data` object. Can import a protein, DNA, or RNA structure. `chain` accepts a single chain ID (`'A'`), a list of chains (`['A', 'B']`), or `None` to import all chains. `model` selects which model to import from multi-model files (e.g. NMR ensembles); defaults to `1`. For atoms with multiple conformers, the highest-occupancy conformer is kept. Cannot import a structure that is a mixture of proteins and nucleic acids in separate chains, import each macromolecule type as a separate pose, `strict=False` will allow importing broken chains
`m.Import(filename='caffiene.sdf')`	Imports a structure from a PDB, SDF, mmCIF, MOL, or MOL2 files, or an RDKit block string and constructs the `m.data` object
`p.Export('out.pdb', fmt=None)`	Write the full structure, and all chains, to a PDB or mmCIF file. `fmt='PDB'` or `fmt='CIF'` will export the structure as a string and not a file (ideal to plug the structure to other libraries such as OpenMM)
`m.Export('out.sdf', fmt=None)`	Write the full structure to a PDB, SDF, mmCIF, MOL, or MOL2 file. `fmt='PDB'` or `fmt='CIF'` or `'SDF'`/`'MOL'`/`'MOL2'` will export the structure as a string and not a file (ideal to plug the structure to other libraries such as OpenMM and RDKit)
`p.Build('MSLESNRGI', chain='A', fmt='protein')`	Build a macromolecule from a one-letter sequence. For a polypeptide add the sequence and choose the format `fmt='Protein'`, uppercase = L-amino acids, lowercase = D-amino acids. For a nucleic acid add the sequence and choose the format `fmt='DNA'` or `fmt='RNA'`. You can add more chains by repeating the command with different chain `chain='A'` values. A structure can either be a protein, or a nucleic acid (DNA/RNA), it cannot be a mixture of the two
`p.ReBuild(sequence=None, mirror=False, _mutated=None)`	Rebuild the polypeptide or nucleic acid. Use `sequence='AGLMTSWVLVA'` to rebuild the structure with multiple bulk mutations on chain A. Use `sequence={'A':'MSLKLSTVVA', 'B':'ASLKSWFWVA'}` to perform mutations at multiple chains at the same time. Use `mirror=True` to rebuild a protein and convert L-amino acids → D-amino acids and D-amino acids → L-amino acids. Running `p.ReBuild(sequence=None)` will add missing Hydrogens. For DNA and RNA, the `sequence=''` length must match exactly the original sequence length, otherwise an error will be raised
`p.Mutate(1, 'V', fast=True)`	Mutate a single monomer. For proteins: `p.Mutate(1, 'V')` = residue 1 → L-Valine, `p.Mutate(1, 'v')` = residue 1 → D-Valine. For DNA: `p.Mutate(0, 'T')` = nucleotide 0 → Thymine. For RNA: `p.Mutate(0, 'U')` = nucleotide 0 → Uracil. For double-stranded nucleic acids, the complementary base is also updated automatically. The `fast=True` argument means the mutation is performed by vector addition without ensuring the stability of the backbone (also the `CalcDSSP(), CalcSASA, and CalcRg()` etc.. are not re-computed) so these needs to be called after the mutation, in return the mutation is very fast, ideal for large mutation simulations. If `fast=False` the mutated residue is added to the structure and the entire structure rebuilt using ReBuild(), this is more accurate but very slow for large simulations

Note: Import() will not import broken structures such as broken chains or missing backbone atoms. The structure needs to be complete. If it is not complete, then re-generate it to fill in the missing sections.

Measurements

Method	Description
`p.GetDistance(0, 'N', 5, 'CA')`	Get the distance in Å between any two atoms. Example: residue 0 nitrogen atom to residue 5 CA atom
`m.GetDistance(0, 5)`	Get the distance in Å between any two atoms. Example: atom 0 to atom 5
`p.GetDihedral(2, 'PHI')`	Calculate the amino acid φ/ψ/ω/χ and nucleotide α/β/γ/δ/ε/ζ/χ dihedral angles. In this example we are measuring the PHI angle of the 3rd protein residue (index 2). For protein χ dihedral use `p.GetDihedral(4, 'chi', 1)` 5th residue (index 4), CHI 1 angle
`m.GetDihedral(0, 1, 2, 3)`	Calculate a dihedral angle between 4 atoms. In this example the dihedral angle is made up of the atoms at indeces 0, 1, 2, and 3
`p.GetAngle(0, 'N', 5, 'CA', 17, 'C')`	Get the angle between any three atoms in the whole structure. Example: N of residue 1, CA of residue 5, and C angle of residue 17, with the CA atom in the middle being the pivot
`m.GetAngle(0, 5, 17)`	Get the angle between any three atoms in the whole structure. Example: atom at index 1, atom at index 5, and atom at index 17, with atom at index 5 being the pivot
`p.GetAtomBonds(0, 1)`	Confirm and get the PDB name and element name `[atom 1 element name, atom 1 PDB name, atom 2 PDB name, atom 2 element name]` for two atoms (if they are bonded together). Use the atom indeces. If the two atoms are not bonded an error will be raised
`m.GetAtomBonds(1)`	Get all atom names bonded to this atom index ['atom name 1', 'atom name 2', 'atom name 3']
`p.GetAtomCoord(3, 'N')`	Get the XYZ coordinates of an atom of a residue or a nucleotide (monomers). Example: `N` nitrogen of monomer index `3`
`m.GetAtomCoord(3)`	Get the XYZ coordinates of an atom given its index. Example: atom at index `3`
`p.GetAtomList(PDB=False)`	Get a list of all atom element names for the entire structure. Use `PDB=True` for PDB-formatted names
`m.GetAtomList()`	Get a list of all atom element names for the entire structure
`p.GetAtomHybridisation()`	Get a list of all atom hybridisations for the entire structure
`m.GetAtomHybridisation()`	Get a list of all atom hybridisations for the entire structure
`p.GetAtomIdx(3, 'N')`	Get the atom index in `p.data['Coordinates']` from its name within a monomer. This is the opposite of `p.GetAtomCoord(3, 'N')`
`p.GetIdentity(0, 'Atom')`	Identify the PDB name of an atom, or an amino acid, or a nucleotide by its index. Example `p.GetIdentity(5, 'Atom')` or `p.GetIdentity(5, 'amino acid')` or `p.GetIdentity(5, 'nucleotide')`. Also, specifically just for atoms, you are return its partial charge using `p.GetIdentity(3, 'Atom', charge=True)`
`p.GetInfo()`	Print a formatted summary of the structure's information
`m.GetInfo()`	Print a formatted summary of the structure's information and a graphical representation of the molecule
`m.CalcSMILES()`	Calculate the SMILES representation of a molecule and add it to `m.data['SMILES']`
`m.CalcSMARTS()`	Calculate the SMARTS representation of a molecule and add it to `m.data['SMARTS']`
`m.CalcSMIRKS()`	Calculate an atom-mapped SMIRKS-style string with hybridisation, connectivity, H-count, and formal-charge tags on each heavy atom. Atoms carry a 1-based atom-map index ':N' where N is the atom's index+1. Hybridization symbol follows SMARTS convention (^1, ^2, ^3 for sp/sp2/sp3; omitted for 's')
`p.CalcMass()`	Calculates the entire molecular mass of a molecule (all chains) in Da (Daltons), updates the value of p.data['Mass']
`m.CalcMass()`	Calculates the entire molecular mass of a molecule
`p.CalcSize()`	Calculates the length of each chain in a structure, updates the value of p.data['Size']. You can get the length of each chain using `p.data['Size'][CHAIN]`
`p.CalcFASTA()`	Compiles the FASTA sequence of each chain, updates the value of p.data['FASTA']. You can get the FASTA sequence of each chain using `p.data['FASTA'][CHAIN]`
`p.CalcRg()`	Calculates the entire Radius of Gyration of a molecule (all chains) in Å (angstrom), updates the value of p.data['Rg']
`m.CalcRg()`	Calculates the entire Radius of Gyration of a molecule
`p.CalcCharge(iterations=6)`	Calculate the Gasteiger-Marsili partial charges to all atoms using iterative equalization (default 6 iterations), updates the value of `p.data['Atoms'][index][2]`
`m.CalcCharge(iterations=6)`	Calculate the Gasteiger-Marsili partial charges to all atoms using iterative equalization (default 6 iterations), updates the value of `m.data['Atoms'][index][2]`
`p.CalcDSSP()`	Calculates each amino acid's secondary structure assignments, only for proteins, and stores them in `p.data['Amino Acids'][i][4]` and updates `p.data['SS'][CHAIN]`, therefore this is where you can get the SS sequence of each chain. Codes: H=α-helix, G=3₁₀-helix, I=π-helix, E=β-sheet, B=β-bridge, T=turn, S=bend, L=loop, P=PPII-helix
`p.CalcSASA(n_points=960, probe_radius=1.4)`	Calculates the Solvent Accessible Surface Area (SASA) for each amino acid, only for proteins, using golden sphere sampling. `n_points` controls sampling density, `probe_radius` is the solvent probe radius in Å (default 1.4 for water). Adds the value to `p.data['Amino Acids'][i][6]`

Manipulation

Method	Description
`p.AdjustDistance(0, 'N', 4, 'C', 17)`	Set the distance between any two atoms in (Å). Example: set the distance between N in residue 0 and C in residue 4 to 17 Å. Order matters: the second atom (and all atoms downstream of it on the same chain) moves, while the first atom stays fixed. `(0, 'N', 0, 'CA', d)` ≠ `(0, 'CA', 0, 'N', d)`
`m.AdjustDistance(0, 4, 17)`	Set the distance between any two atoms in (Å). Example: set the distance between atom at index 0 and atom at index 4 to 17 Å. Order matters: the second atom (and all atoms downstream of it) moves, while the first atom stays fixed. `(0, 1, d)` ≠ `(1, 0, d)`
`p.AdjustAngle(1, 'N', 1, 'CA', 1, 'C', -2)`	Add/subtract degrees from a three-atom angle, with the middle atom being the pivot point. Example: subtract 2° from N–CA–C angle of residue 1, with the CA atom being the pivot
`m.AdjustAngle(0, 1, 2, -2)`	Add/subtract degrees from a three-atom angle, with the middle atom being the pivot point. Example: subtract 2° from the angle represented by atom 0, atom 1, and atom 2, with atom 1 being the pivot
`p.RotateDihedral(1, -60, 'PHI')`	Rotate the amino acid φ/ψ/ω/χ and nucleotide α/β/γ/δ/ε/ζ/χ dihedral angles. Example: residue 1 PHI dihedral to -60°
`m.RotateDihedral(0, 1, 2, 3, -60)`	Rotate any dihedral angle represented by four atoms. Example: rotate a dihedral angle represented by atom index 0, atom index 1, atom index 2, and atoms index 3 to become -60°
`p.MovePose(theta=5, u=[18, 10, 5], l=6, ori=[0, 0, 0])`	Rotate and/or translate the whole structure. `theta` = rotation angle in degrees, `u` = rotation axis vector (will be normalised), `l` = translation distance in Å, `ori` = target point to translate towards. All parameters are optional (default `None`); you can rotate only, translate only, or both
`m.MovePose(theta=5, u=[18, 10, 5], l=6, ori=[0, 0, 0])`	Rotate and/or translate the whole structure. `theta` = rotation angle in degrees, `u` = rotation axis vector (will be normalised), `l` = translation distance in Å, `ori` = target point to translate towards. All parameters are optional (default `None`); you can rotate only, translate only, or both

Force Field & Energy Score

The ForceField() class evaluates the total potential energy and analytical per-atom forces of a Pose or Molecule. It is configured by name at construction, the name keys into database.json['Energy Parameters'][name], which carries both the SMIRKS-keyed parameter sections and the explicit list of potential methods to evaluate (under the Terms sub-key). The currently shipped names are:

Name	Purpose	Terms
`OpenFF`	Small-molecule force field. Bonded + vdW parameters from OpenFF Sage 2.3.0 (CC-BY-4.0); per-atom AM1BCC charges from a NumPy reimplementation of NAGL	6: Bond, Angle, ProperTorsion, ImproperTorsion (Fourier), VDW (12-6), Electrostatic
`Default`	Deterministic smoke-test / regression FF. Single wildcard SMIRKS per section with dummy parameters tuned so that `ForceField()(Pose().Build('AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz')) = 100.00 kJ/mol` exactly. Exercises every potential method, dispatch branch and cache path; verifies the math, not the physics. Not for production simulation	9: Bond, Angle, UB, ProperTorsion, ImproperTorsion (harmonic), VDW (12-6), Electrostatic, Polarisation, CMAP

A hash of the bond graph + atom records + amino-acid assignments is cached, so repeated calls during minimisation, MD, or annealing only recompute coordinate-dependent quantities.

Method	Description
`ff = ForceField(name='Default', strict=False)`	Build a force field. `name` is the parameter-set name in `database.json['Energy Parameters']`. Defaults to `'Default'`. `strict=True` raises `RuntimeError` on any SMIRKS coverage gap; `strict=False` warns once per pose and falls through with `K=0` for unmatched fragments. The list of potential methods to evaluate is taken from `database.json['Energy Parameters'][name]['Terms']`
`E = ff(pose, grad=False, box=None, v=False)`	Evaluate total potential energy in kJ/mol. `box=None` disables PBC; pass a `(3,)` array for an orthorhombic box or a `(3, 3)` array for a triclinic box, in Å. `v=True` will print errors for missing SMIRKS that result in k=0
`E, F = ff(pose, grad=True, box=None)`	Evaluate total potential energy plus per-atom forces. Returns a tuple `(float, ndarray)` where forces are shape `(N, 3)` in kJ/mol/Å
`ff.BondPotential(pose, cache, alg='harmonic', grad=True, box=None)`	Bond-stretching term. `alg='harmonic'` uses `Σ K_b·(r − r₀)²`, `alg='morse'` uses `Σ D_e·(1 − e^(−a(r − r₀)))²`
`ff.AnglePotential(pose, cache, grad=True, box=None)`	Harmonic three-atom angle term, `Σ K_θ·(θ − θ₀)²` over every bonded triplet
`ff.UBPotential(pose, cache, grad=True, box=None)`	Urey-Bradley 1-3 stretching term, `Σ K_UB·(s − s₀)²` between the outer atoms of every bonded triplet
`ff.ProperTorsionPotential(pose, cache, grad=True, box=None)`	Proper dihedral (torsion) term, multi-component Fourier `Σ K_φ·(1 + cos(n·φ − φ₀))` over every i-j-k-l quartet
`ff.ImproperTorsionPotential(pose, cache, alg='harmonic', grad=True, box=None)`	Improper dihedral term over degree-3 atoms. `alg='harmonic'` uses `Σ K_φ·(ψ − ψ₀)²`, `alg='fourier'` uses `Σ K_φ·(1 + cos(n·ψ − ψ₀))`
`ff.VDWPotential(pose, cache, alg='12-6', grad=True, box=None)`	Van der Waals (Lennard-Jones) non-bonded term, with 1-4 scaling masks. `alg='12-6'` is the standard form, `alg='9-6'` is a softer variant
`ff.ElectrostaticPotential(pose, cache, alg='constant', grad=True, box=None)`	Electrostatic non-bonded term. `alg='constant'` uses uniform εᵣ; `alg='ddd'` uses a distance-dependent dielectric `ε(r) = εᵣ·r`
`ff.PolarisationPotential(pose, cache, alg='constant', grad=True, box=None)`	Induced-dipole polarisation term, `−½·Σ α_i·\|E_i\|²` with the per-atom field built from neighbour charges. Evaluates to 0 when the active FF assigns `α = 0` to all atoms
`ff.CMAPPotential(pose, cache, grad=True, box=None)`	CMAP backbone (φ, ψ) cross-term correction over every interior protein residue, evaluated by bicubic Catmull-Rom interpolation on the per-residue 24×24 energy grids stored under `[name]['cmap']`. Skips residues whose 1-letter code has no grid

Charge model: under OpenFF, partial charges are computed by ForceField.NAGLCharges(pose), a NumPy reimplementation of the openff-gnn-am1bcc-1.0.0 graph neural network released by the Open Force Field Initiative. Output is bit-equivalent to upstream NAGL float32 inference, with the total constrained to the molecule's formal charge via electronegativity equalisation. NAGL weights live under [name]['AM1BCC']; force fields without that sub-key (e.g. Default) skip NAGL and fall back to library charges then atom-record charges. SMIRKS pattern assignment for bonded and vdW parameters is done in pose.energy.SMIRKSMatch(pose, params), a pure-NumPy SMIRKS engine. All numerical values in database.json are in kJ/mol (lengths in Å, angles in degrees).

The Score() class is a hybrid physics + statistical scoring function for protein design and small-molecule docking. It is configured by name at construction, the name keys into database.json['Score Parameters'][name], which carries the per-term {weight, …} blocks, the atom-type / residue-type lookup tables, and the explicit list of energy-term methods to evaluate (under the Terms sub-key).

A hash of the bond graph + atom records + amino-acid assignments is cached, so repeated calls during minimisation, MD, or annealing only recompute coordinate-dependent quantities. The cache is built by pose.energy.ScoreMatch(pose, params, ligand=None), the single support function that handles atom typing, pair-list construction, spline grids, and HBond geometry; Score.__call__ dispatches each term method against that cache.

Method	Description
`sc = Score(name='Default', strict=False)`	Build a scorer. `name` is the parameter-set name in `database.json['Score Parameters']`. Defaults to `'Default'`. `strict=True` raises on any term dispatch gap; `strict=False` skips missing terms silently
`total = sc(pose)`	Evaluate the weighted total score over a protein `Pose`. Returns a `float`
`total = sc(pose, ligand=mol)`	Evaluate the weighted total score for a receptor `Pose` + small-molecule `Molecule` complex. The small-molecule pair terms (`Gauss1`, `Gauss2`, `Repulsion`, `Hydrophobic`, `HBond`) compute receptor-ligand atom pairs; the affinity normalisation accounts for ligand rotatable bonds
`total, terms = sc(pose, decompose=True)`	Same as the protein call but also return a per-term dict keyed by method name. Useful for term-weight fitting and for diagnosing why a design scores poorly
`sc.FaAtrPotential(pose, cache)`	Full-atom Lennard-Jones attractive term over inter-residue pairs, long-range smoothing between `fa_atr_short` and `fa_atr_long`
`sc.FaRepPotential(pose, cache)`	Full-atom Lennard-Jones repulsive term over inter-residue pairs
`sc.FaSolPotential(pose, cache)`	Pairwise Lazaridis-Karplus implicit-solvent term over inter-residue pairs, using `LK_DGFREE`/`LK_LAMBDA`/`LK_VOLUME` from `Atom_types`
`sc.FaIntraRepPotential(pose, cache)`	Within-residue Lennard-Jones repulsive term, with 1-N count-pair weighting
`sc.FaIntraSolXover4Potential(pose, cache)`	Within-residue solvation term with crossover handling at 1-4 pairs
`sc.FaElecPotential(pose, cache)`	Coulomb electrostatics with sigmoidal distance-dependent dielectric (`coulomb_C0`, `sigmoidal_D`, `sigmoidal_D0`, `sigmoidal_S`)
`sc.LkBallWtdPotential(pose, cache)`	Lazaridis-Karplus solvation with weighted directional water sites placed geometrically per donor/acceptor
`sc.HBondSrBbPotential(pose, cache)`	Short-range (helix-range) backbone-backbone hydrogen bond, polynomial in distance × directional functions in three angles, fade-out across the distance shell
`sc.HBondLrBbPotential(pose, cache)`	Long-range (β-sheet-range) backbone-backbone hydrogen bond, same shape as `HBondSrBb` over wider separations
`sc.HBondBbScPotential(pose, cache)`	Backbone-sidechain hydrogen bond
`sc.HBondScPotential(pose, cache)`	Sidechain-sidechain hydrogen bond
`sc.FaDunPotential(pose, cache)`	Backbone-dependent rotamer score, `-log P(χ \| φ, ψ)` from bilinear interpolation over rotamer-well μ/σ grids plus a cyclic cubic spline in the terminal χ for semi-rotameric residues
`sc.RamaPreProTermPotential(pose, cache)`	Ramachandran φ/ψ probability with pre-proline special-casing, Catmull-Rom bicubic interpolation, entropy correction
`sc.PAaPpPotential(pose, cache)`	Shapovalov amino-acid probability at given (φ, ψ), Catmull-Rom bicubic interpolation
`sc.OmegaPotential(pose, cache)`	Peptide-bond ω-torsion harmonic penalty around 180° (and 0° for cis), with phi/psi-dependent Gaussian shape
`sc.ProClosePotential(pose, cache)`	Proline ring-closure penalty (N-virtual and Cα-virtual distances plus the χ4 dihedral)
`sc.DslfFa13Potential(pose, cache)`	Disulfide-bond geometry constraint (Sγ-Sγ′ distance, Cα-Cβ-Sγ angles, Cβ-Sγ-Sγ′-Cβ′ dihedrals)
`sc.YhhPlanarityPotential(pose, cache)`	Tyrosine hydroxyl rotamer-planarity penalty
`sc.RefPotential(pose, cache)`	Per-amino-acid unfolded-state reference energy, using `METHOD_WEIGHTS_ref` for the residue baseline
`sc.Gauss1Potential(pose, cache, ligand=mol)`	Gaussian attractive pair term centred at d=0 Å over receptor-ligand pairs typed via `XS_atom_types`
`sc.Gauss2Potential(pose, cache, ligand=mol)`	Gaussian attractive pair term centred at d=3 Å
`sc.RepulsionPotential(pose, cache, ligand=mol)`	Overlap-only penalty for receptor-ligand pairs: positive when atoms penetrate, zero otherwise
`sc.HydrophobicPotential(pose, cache, ligand=mol)`	Slope-step bonus over hydrophobic-hydrophobic receptor-ligand pairs
`sc.HBondPotential(pose, cache, ligand=mol)`	Slope-step bonus over donor-acceptor receptor-ligand pairs
`sc.DefaultOffsetPotential(pose, cache)`	Calibration offset: contributes a constant per residue, used to drive `Default` to its regression-sentinel value

Score is chirality-aware: L-amino acids, D-amino acids, mixed L/D sequences, and non-canonical residues all score correctly with no extra arguments or special-cased call sites.

Tools

These are standalone tools (not Pose() class methods) and thus are called on their own:

Function	Description
`Parameterise('PTR.cif', 'ptr_rot.json', 'PTR', 'B', backup=True)`	Add a non-canonical amino acid to the unified `database.json`. Takes the RCSB CCD `.cif` file, a Dunbrack BBDEP2010-format rotamer-library JSON (e.g. produced by `nnca_pipeline/scripts/build_*_rotamer.py`), the three-letter tricode, and a single-letter unicode for the `Amino Acids` slot. Inserts the residue into both `Amino Acids[unicode]` (atoms / bonds / hybridisation inferred from the CIF) and `Rotamer Library["residues"][tricode]` (chi means, sigmas, populations from the JSON) in one atomic write. `backup=True` (default) timestamps a `database.json.bak.<YYYYMMDD-HHMMSS>` before modifying. Chi axes are taken from `rot_entry["method"]["chi_axes"]` (the user-confirmed source of truth); the CIF walker's chi tracing is no longer used. Calls `DBLoad.cache_clear()` on success so subsequently constructed Pose / ForceField / Score / Rotamers instances see the new residue immediately. The rotamer JSON file can be generated using this repo
`RMSD(pose1, pose2, alg='align', export='aligned.pdb')`	Computes the Root Mean Squared Deviation between two protein or nucleic acids `Pose` structures using Cα (alpha-carbon) atoms for proteins, or C1 atoms for nulceic acids. Returns the RMSD in (Å). Supported algorithms: `'align'` (sequence alignment + iterative Kabsch), `'kabsch'` (SVD-based optimal rotation), `'quaternion'` (eigenvalue-based optimal rotation), or `'simple'` (translation only, no rotation). Can export the aligned structures to `aligned_1.pdb, aligned_2.pdb`
`BLAST(sequence1, sequence2)`	Perform pairwise protein or nucleic acid sequence alignment using the Smith-Waterman local alignment algorithm with BLOSUM62 substitution scores, matching the statistical model used by NCBI BLASTP. Returns: `(alignment_string, percent_identity, e_value)`
`MSA([sequence1, sequence2, sequence3....])`	Aligns three or more protein or nucleic acid sequences using a ClustalW-like progressive alignment strategy, pairwise distances are computed with `BLAST()`. Returns: `(alignment_string, aligned_list, conservation_list, entropy_list, pssm_array, dca_array)` where `conservation_list` is a per-column score in [0, 1] (1 = fully conserved), `entropy_list` is per-column Shannon entropy in bits, `pssm_array` is a `(L, 20)` log-odds matrix in BLOSUM62 column order (`ARNDCQEGHILKMFPSTWYV`), and `dca_array` is an `(L, L)` APC-corrected mean-field DCA direct-information matrix
`Isoelectric(sequence)`	Calculates the protein's isoelectric point (pI) using the EMBOSS pKa scale and bisection on `[0, 14]`. Takes a protein sequence and returns a float, the pH at which the protein has zero net charge
`Hydrophobicity(sequence, window=9, scale='eisenberg')`	Calculates the hydrophobicity profile from a protein sequence using a sliding window. Supported scales: `'eisenberg'` (default, normalized consensus), `'kyte-doolittle'`, `'hopp-woods'`, `'engelman'`. Returns a tuple of two lists `(positions, scores)` where `positions` are zero-based indices of the window centers, these lists are used to plot the graph
`Aliphatic(sequence)`	Calculates the Aliphatic index of a protein from its sequence (Ikai 1980: `AI = X(A) + 2.9·X(V) + 3.9·(X(I) + X(L))`), returns a float value
`ExtinctCoeff(sequence, reduced=True)`	Calculates the molar extinction coefficient at 280 nm in water (Pace 1995: `ε = nW·5500 + nY·1490 + (nC/2)·125`). With `reduced=True` (default) cysteines are treated as reduced and contribute 0; with `reduced=False` cysteines are treated as cystines and contribute `(nC // 2) · 125`. Returns an int value in M⁻¹ cm⁻¹
`Instability(sequence)`	Calculates the Instability index of a protein (Guruprasad et al. 1990) using the DIWV dipeptide weight table. Returns a float; values below 40 generally indicate a stable protein
`GRAVY(sequence)`	Calculates the Grand Average of Hydropathy using the Kyte-Doolittle hydropathy scale, returns a float value
`Split(pose, chain=None, start=None, end=None)`	Slice a Pose into a new Pose object. Takes the original `pose`, the `chain` if you want to split out an entire chain, or `start, end` if you want to split out a range of monomer residues (zero-based, inclusive). Works for proteins, DNA, and RNA. Atom and residue indices, the bond graph, and coordinates are all renumbered densely from zero in the returned pose
`Concatenate(pose1, pose2, fuse=False)`	Combine two poses of the same Type. With `fuse=False` (default) `pose2` is appended to `pose1` as additional chains, preserving the original coordinates of both poses; chain IDs in `pose2` that collide with `pose1` are renamed to the next free letter. With `fuse=True` the concatenated FASTA is rebuilt as a single continuous polymer with idealised geometry, the original input coordinates are discarded
`PCR(sequence)`	Generates forward and reverse PCR primers for a DNA template (DNA only, accepts only A/C/G/T, template must be ≥ 36 bp). Uses a 5-tier relaxation strategy so that any chemically valid template always returns a primer pair. Ideal tier requires length 18–25, GC 40–60%, nearest-neighbor SantaLucia 1998 Tm in `[55, 65]` °C, a 3' GC clamp, no run of 4 identical bases, no internal palindrome (hairpin), no 3' self-dimer, and \|ΔTm\| ≤ 2 °C. If no pair satisfies it the search falls through progressively relaxed Good / Fair / Poor / Last resort tiers, each widening the length / GC / Tm / ΔTm bounds and dropping the GC clamp / hairpin / dimer gates. When the result comes from any tier below Ideal, a warning is printed to stdout naming the tier and which gates were relaxed (e.g. `Warning: PCR primers are suboptimal (Poor tier), GC% outside 40-60; Tm outside 55-65 °C; GC clamp missing`). Returns a tuple `(forward_string, reverse_string, warning_message_for_suboptimal_primers)`
`Translate(sequence, fmt='protein', organism='ecoli')`	Translates between protein, DNA, and RNA. The input alphabet is auto-detected. Takes a sequence and translates it to the requested `fmt` format. Nucleotide → protein translation uses the standard genetic code and returns `*` for stop codons. Protein → DNA/RNA back-translation is codon-optimised by selecting the highest-frequency codon (deterministic) for the chosen `organism`, which takes `'ecoli'` (default) or `'human'`. Returns the translated sequence as an uppercase string
`PROSITE(sequence, pattern)`	Search a protein sequence for a PROSITE-style pattern. Pattern grammar: `[ABC]` = any of A/B/C, `{ABC}` = any except A/B/C, `x` = any residue, `x(n)` / `x(n,m)` = quantifiers, `A(n)` / `A(n,m)` = repeat literal residues, `<` / `>` = anchor at sequence start/end, `-` = token separator (stripped). Returns a list of tuples `[(start, end, match), ...]` with 1-based, inclusive positions
`HydrogenBondMap(pose)`	Generates a backbone hydrogen-bond donor/acceptor map for a protein pose (proteins only). Uses the same DSSP electrostatic criterion as `p.CalcDSSP()` (Kabsch & Sander 1983: `E < -2.092` kJ/mol). Returns an array of shape `(N_atoms, N_atoms)` where 0 = no bond, 1 = this atom is a donor (backbone N), 2 = this atom is an acceptor (backbone O)
`ContactMap(pose)`	Generates a monomer-monomer distance map in angstroms. The molecule type is auto-detected from `pose.data['Type']`: distances between protein residues are calculated from the Cα atoms, while distances between DNA and RNA bases are calculated from their C1' atoms. Returns an array of shape `(N_residues, N_residues)` with zero on the diagonal
`Rotamers(10, pose)`	Single-amino-acid rotamer packer: snap the residue's backbone (φ, ψ) to the nearest 10° cell of `database.json['Rotamer Library']`, pick the rotamer k* with the largest `P_k` in that cell, and apply its mean χ values to every χ of the residue via `pose.RotateDihedral`. No-op (silent) for residues with no χ atoms (Gly, Ala), residues at chain ends with undefined backbone, and non-canonical residues missing from the library. Handles D-amino acids automatically via lookup at (−φ, −ψ) and μ negation. Derived from the Dunbrack BBDEP2010 rotamer library (CC-BY-4.0)
`Minimise(pose, ff=None, max_steps=500, ftol=1.0, dt_fs=0.1, dt_max_fs=2.0, step_max=0.2, etol=1e-6, stall_k=10, box=None)`	Relax pose coordinates using the FIRE2 algorithm (Guénolé et al. 2020) with a trust-region step limiter that bounds per-atom displacement to `step_max` Å. Mutates `pose.data['Coordinates']` in place. `ftol` is the convergence threshold on max\|force\| in kJ/mol/Å; `dt_fs` is the initial integration step in fs and `dt_max_fs` the adaptive ceiling; `etol` and `stall_k` trigger early stop after K consecutive stalled energy steps. Returns `(final_E, log)` where `log` carries `'energies'`, `'fmax'`, `'max_step'`, `'converged'`, `'n_steps'`
`Anneal(pose, ff=None, n_steps=10000, T_start=2000.0, T_end=10.0, sigma_small=5.0, sigma_large=30.0, p_large=0.2, p_shear=0.5, target_acc=0.30, adapt_window=100, seed=None, box=None)`	Simulated annealing over backbone φ/ψ with two Metropolis move types, single-angle (random φ or ψ) and shear (compensating ψᵢ +Δ / φᵢ₊₁ −Δ that leaves residues 0..i−1 unmoved). Each step picks a small (adaptive `sigma_small`) or large (fixed `sigma_large`) Gaussian perturbation; `sigma_small` is updated by Robbins-Monro every `adapt_window` small moves to track `target_acc` ~ 0.30. Geometric cooling from `T_start` to `T_end`. Returns `(E_best, log)` with `'energies'`, `'temperatures'`, `'accepted'`, `'move_types'` (0=single, 1=shear, 2=invalid), `'sigma_history'`, `'best_step'`. The pose is left at the lowest-energy frame
`Pack(pose, score=None, ff=None, n_steps=2000, T_start=10.0, T_end=0.1, patience=400, seed=None)`	Sidechain repacking via simulated annealing over the full Rotamer Library ensemble at each residue's current backbone (φ, ψ). At construction the candidate set per repackable residue is built once from `database.json['Rotamer Library']` (the full list of (μ_χ tuple, P_k) entries at that residue's grid cell, this can be 3 rotamers for Val, up to ~80 for Lys/Arg). The SA loop picks a random repackable residue, samples one of its rotamers k weighted by `P_k` (so dominant rotamers are explored more often but rare ones remain reachable), applies the trial χ tuple, rescores, and accepts via Metropolis: `dE ≤ 0` or `random() < exp(−dE/T)`. Geometric cooling from `T_start` to `T_end`. Tracks the best-scoring configuration seen and restores it before returning. Early-exit if no acceptance occurs in `patience` consecutive steps. `score` is a reusable `Score` instance; if `None`, one is built from `ff` (or a fresh `ForceField` if `ff` is also `None`). Using `Score` rather than the bare force field matters because the statistical terms (rotamer prior, KBP, reference state) discriminate native-like rotamer choices in a way pure-physics forces cannot. D-amino acids handled automatically. Returns `(E_final, log)` where `log` carries `'energies'`, `'temperatures'`, `'accepts'` (bool array of accept/reject per step), `'best_E'`, `'steps_run'`, `'converged'` (True if early-exited via stagnation), `'n_residues'` (count of repackable residues)
`MolecularDynamics(pose, ff=None, n_steps=1000, dt_fs=2.0, T=300.0, thermostat='nve', friction_ps=1.0, constraints='hbonds', shake_tol=1e-8, shake_max=100, seed=None, trajectory_every=0, box=None)`	Velocity-Verlet NVE or BAOAB Langevin NVT integration. Initial velocities are sampled from Maxwell-Boltzmann at `T` with the centre-of-mass momentum zeroed and projected onto the constraint manifold. `thermostat='nve'` runs energy-conserving dynamics; `thermostat='langevin'` runs the BAOAB stochastic splitting at temperature `T` with friction `friction_ps` ps⁻¹. `constraints='hbonds'` enables vectorised SHAKE/RATTLE on every X–H bond (target lengths read from `database.json['Energy Parameters']`), making `dt_fs=2.0` stable; `constraints='none'` disables them. `trajectory_every=k` saves a coordinate snapshot every k steps. Returns `(final_E, log)` with `'energies'`, `'kinetic'`, `'temperatures'`, `'frames'`, `'n_constraints'`, `'dof'`
`Port('openff')`	Ports the OpenFF Sage 2.3.0, or AMBER ff19SB, or CHARMM36 parameters into database.json ['Energy Parameters'] so you can use these force fields. Also ports REF15 and Autodock Vina to database.json ['Score Parameters']. Arguments are 'openff' or 'ff19sb' or 'charmm36' or 'ref15' or 'vina', and they are the same strings that will be used in `ForceField(name='')` or `Score(name='')`
`Cyclise(mode='head-to-tail', res1=0, atom1='N', res2=5, atom2='C', precoil=True)`	Form an intramolecular bond to make a cyclic peptide. Default `mode='head-to-tail'` amide-bonds the N-terminus to the C-terminus: drops the extra N-terminal hydrogens and the C-terminal OXT, adds the closing C–N bond, re-assigns charges, and records the closure in `data['Cyclic']`. With `precoil=True` (default) it coils the backbone and runs cyclic coordinate descent so the closing bond forms at ~1.33 Å. IMPORTANT: Relax the ring afterwards with `tools.Minimise(p, ff=ForceField())`. Note: `RotateDihedral`/`AdjustDistance` are undefined on a closed ring and must not be used after cyclisation

BLAST handles sequences beyond the 20 canonical L-amino acids automatically: D-amino acids: stored as lowercase letters in pose.data['FASTA']. BLAST uppercases both sequences before alignment, treating each D-amino acid as its L-counterpart for scoring purposes. This correctly reflects the chemical reality that D- and L-forms of the same residue have identical side-chain chemistry. Non-canonical amino acids: any letter not in the 20-letter BLOSUM62 alphabet falls back to: +4 for a self-match (equal to the minimum BLOSUM62 diagonal), −1 for a mismatch. This keeps non-canonical residues visible to the aligner without inflating scores.

MSA handles sequences beyond the 20 canonical L-amino acids, identical to BLAST()

For Parameterise() this is the workflow:

Download the CIF file for the amino acid from the RCSB Chemical Component Dictionary (e.g. https://files.rcsb.org/ligands/download/PTR.cif).
Produce a backbone-dependent rotamer library JSON in Dunbrack BBDEP2010 schema. The nnca_pipeline/scripts/build_*_rotamer.py scripts at the project root generate one from PDB-mining + adaptive KDE + BGMM (THGLab ptm_sc methodology, MIT-licensed). The output JSON must carry method.chi_axes (atom-name 4-tuples), Parameterise reads this as the source of truth for the Chi Angle Atoms field.
Call Parameterise(cif_file, rotamer_json_file, tricode, unicode). A timestamped backup of database.json is created automatically; pass backup=False to opt out.

Key Concepts

Zero-based indexing

All residue and atom indices start at 0, not 1. Residue 0 is the N-terminal amino acid. This is the opposite of PDB convention.

p.Build('MSLESNRGI', chain='A', fmt='protein') # Construct a polypeptide
p.GetDihedral(0, 'PHI')                        # PHI of first residue (index 0)
p.GetDihedral(2, 'chi', 1)                     # CHI 1 of third residue (index 2)
p.GetDistance(0, 'N', 1, 'CA')                 # N of residue 0 to CA of residue 1
p.Build('MSLESNRGI', chain='B', fmt='protein') # Add a second chain

Accessing the data structure directly

p.data['FASTA']              # Sequence string
p.data['Size']               # Number of residues (int)
p.data['Amino Acids'][0]     # [letter, chain, bb_indices, sc_indices, secondary structure, tricode, SASA]
p.data['Atoms'][0]           # [pdb_name, element, charge, occupancy, temp_factor, hybridisation]
p.data['Coordinates']        # Numpy array, shape (N, 3)
p.data['Bonds']              # Adjacency list: {atom_index: [bonded_atom_indices]}

Iterating over residues and atoms:

for idx, aa in p.data['Amino Acids'].items():
    symbol, chain, bb, sc, ss, tricode, sasa = aa
    print(f'Residue {idx}: {tricode} ({symbol}), SS={ss}')

for idx, atom in p.data['Atoms'].items():
    name, element, charge, occupancy, temp, hybrid = atom
    xyz = p.data['Coordinates'][idx]
    print(f'Atom {idx}: {name} ({element}) at {xyz}')

Supported Amino Acids

Uppercase = L-form, lowercase = D-form. All 26 are supported in mixed L/D sequences. Additional amino acids can be added to the database.json file.

The N-terminus is protonated, as expected at physiological pH (~7.4), and therefore exists as a positively charged ammonium group (–NH₃⁺)


A - ALA	B - ALY	C - CYS	D - ASP	E - GLU
F - PHE	G - GLY	H - HIS	I - ILE	J - MSE
K - LYS	L - LEU	M - MET	N - ASN	O - TPO
P - PRO	Q - GLN	R - ARG	S - SER	T - THR
U - SEC	V - VAL	W - TRP	X - TRF	Y - TYR
Z - PTR

Supported Nucleotides

DNA


A - DA	T - DT	C - DC	G - DG

RNA


A - A	U - U	C - C	G - G

Data Structure Reference

Get the content of the structure's JSON object using print(p.data[KEY])

This p.data structure from the Pose() class represents proteins, DNA, and RNA:

Key	Value Type	Description
`Type`	String	Identifies the structure as a protein, DNA, or RNA
`Energy`	Float	Potential energy of the molecule
`Rg`	Float	Radius of gyration
`Mass`	Float	Mass in Daltons
`Size`	Dict	Length of each chain, ie: the number of monomers for each chain
`FASTA`	Dict	One-letter sequence for each chain
`SS`	Dict	One-letter amino acid secondary structure asignments for each chain
`Nucleotides`	Dict	`{index: [symbol, chain, bb_atom_indices, sc_atom_indices, tricode]}`, zero-based indexing
`Amino Acids`	Dict	`{index: [symbol, chain, bb_atom_indices, sc_atom_indices, secondary_struct, tricode, SASA]}`, zero-based indexing
`Atoms`	Dict	`{atom_index: [pdb_name, element, partial charge, occupancy, temp_factor, hybridisation]}`, zero-based indexing
`Bonds`	Dict	Bond graph as adjacency list: `{atom_index: [bonded_atom_indices]}`
`BondOrders`	Dict	Bond order as an adjacency list, 1 = single bonds, 1.5 = aromatic resonance partial-double bond, 2 = double bonds, 3 = triple bonds
`Coordinates`	NumPy array	Shape `(N, 3)`, Cartesian XYZ for each atom

This m.data structure from the Molecule() class represents small organic molecules:

Key	Value Type	Description
`Type`	String	Identifies the structure as a molecule
`Energy`	Float	Potential energy of the molecule
`Rg`	Float	Radius of gyration
`Mass`	Float	Molecule's molecular mass
`SMILES`	Str	The SMILES representation of the molecule as a string
`SMARTS`	Str	The SMARTS representation of the molecile as a string
`SMIRKS`	Str	An atom-mapped SMIRKS-style string that contain hybridisation, formal-charge, and other info about the molecule
'Formula'	Str	The molecular formula of the molecule
`Atoms`	Dict	`{atom_index: [pdb_name, element, partial charge, hybridisation]}`, zero-based indexing
`Bonds`	Dict	Bond graph as adjacency list: `{atom_index: [bonded_atom_indices]}`
`BondOrders`	Dict	Bond order as an adjacency list, 1 = single bonds, 1.5 = aromatic resonance partial-double bond, 2 = double bonds, 3 = triple bonds
`Coordinates`	NumPy array	Shape `(N, 3)`, Cartesian XYZ for each atom

The hybridisation field is one of 's' (hydrogens), 'sp', 'sp2', or 'sp3'.

Each atom's hybridisation is stored as the last element of its atom record: p.data['Atoms'][i][5] for Pose(), m.data['Atoms'][i][3] for Molecule().

database.json overview

Pose ships a single database.json file (~70 MB) under pose/ with four top-level keys:

Top-level key	Purpose
`Amino Acids`	Per-residue topology templates: backbone & sidechain atoms, vectors, bonds, χ angle atoms. Used by `Pose.Build`, `Pose.Mutate`, all dihedral routines
`Nucleotides`	Per-nucleotide topology templates for DNA and RNA
`Rotamer Library`	Backbone-dependent rotamer mixture data (Dunbrack BBDEP2010 derived), multimodal `{P_k, μ_k_χ_c, σ_k_χ_c}` per residue type per 10° (φ, ψ) cell. Consumed by `Score._rotamer_prior`, `tools.Rotamers`, and `tools.Pack`
`Energy Parameters`	Named force-field parameter sets, keyed by force-field name. Two ship today `openFF` (Sage 2.3.0 small-molecule FF, CC-BY-4.0, plus NAGL AM1-BCC neural-network weights for charges) and `Default` (deterministic smoke-test that exercises every potential method on a calibrated 100.00 kJ/mol anchor). Each named block carries SMIRKS-keyed `Bonds`, `Angles`, `UB`, `ProperTorsions`, `ImproperTorsions`, `vdW`, `Electrostatic`, `Constraints`, `cmap`, plus a global `Constants` dict and a `Terms` list that selects which potential methods to evaluate. The `openFF` block additionally carries an `AM1BCC` sub-key with base64-encoded float32 NAGL weights. All numerical values in kJ/mol (Å, degrees). See "Description of energy parameters in database.json" below
`Score Parameters`	Named score parameter sets, keyed by score function name. `Default` (deterministic smoke-test that exercises every term method on a calibrated 100.00 kJ/mol anchor)

The whole file is loaded once per Python process via the cached module-level loader pose.DBLoad()

Description of amino acids in database.json:

This information resides in database['Amino Acids'][AMINO_ACID_UNICODE or BACKBONE]

Dictionary Key	Value Type	Description of Values
`Vectors`	List of lists	The position of each atom relative to the N of the backbone. If the N coorinate is X, Y, Z = 0, 0, 0 you will get these vectors. To find the correct vectors position the N at coordinate X, Y, Z = 0, 0, 0, and use the corresponding coordinates of each atom
`Tricode`	String	The three letter code for each amino acid, `[0]` for the L-AA tricode `[1]` for the D-AA tricode
`Fused`	Boolian	True = the sidechain is fused to the backbone
`Backbone Atoms` or `Sidechain Atoms`	List of lists	The atom identity of each coordinate point, for example: first coordinate point is the nitrogen with symbol N and PDB entry N, next atom is the hydrogen that is bonded to the nitrogen with symbol H and PDB entry 1H etc... Unlike the PDB where all hydrogens are collected after the amino acid, here each atom's hydrogens come right after it. This makes for easier matrix operations. Order is index [0] = PDB atom's name, index [1] = element, index [2] = partial charge, index [3] = occupancy, index [4] = temperature factor, index [5] = hybridisation
`Chi Angle Atoms`	List of lists	The atoms in the sidechain that are contributing to a chi angle
`Bonds`	Dictionary	The bond graph as an adjacency list
`BondOrders`	Dictionary	The bond order graph as an adjacency list, 1 = single bonds, 1.5 = aromatic resonance partial-double bond, 2 = double bonds, 3 = triple bonds

Backbone-dependent rotamer data (formerly carried as a per-amino-acid BBDEP field on this object) now lives in the separate Rotamer Library top-level block, see "Description of the Rotamer Library in database.json" below.

Description of nucleotides in database.json:

This information resides in database['Nucleotides'][NUCEOTIDE_TRICODE]

Dictionary Key	Value Type	Description of Values
`Vectors`	List of lists	The position of each atom relative to the N of the backbone. If the N coorinate is X, Y, Z = 0, 0, 0 you will get these vectors. To find the correct vectors position the N at coordinate X, Y, Z = 0, 0, 0, and use the corresponding coordinates of each atom
`Tricode`	String	The three letter code for each nucleotide
`Type`	String	Identify as `DNA` or `RNA`
`Backbone Atoms`	List of lists	The atom identity of each backbone coordinate point, first coordinate point is the phosphorus with symbol P and PDB entry P, next atom is the oxygen atom that is bonded to the phosphorus with symbol O and PDB entry OP1 etc... Order is index [0] = PDB atom's name, index [1] = element, index [2] = partial charge, index [3] = occupancy, index [4] = temperature factor, index [5] = hybridisation
`Base Atoms`	List of lists	The atom identity of each nitrogen base coordinate point. Order is index [0] = PDB atom's name, index [1] = element, index [2] = partial charge, index [3] = occupancy, index [4] = temperature factor, index [5] = hybridisation
`Chi Angle Atoms`	List of lists	The atoms in the sidechain that are contributing to a chi angle
`Bonds`	Dictionary	The bond graph as an adjacency list
`BondOrders`	Dictionary	The bond order graph as an adjacency list, 1 = single bonds, 1.5 = aromatic resonance partial-double bond, 2 = double bonds, 3 = triple bonds

Description of the Rotamer Library in database.json:

This information resides in database['Rotamer Library'] and is consumed by Score(), tools.Rotamers(), and tools.Pack(). Derived from the Dunbrack BBDEP2010 rotamer library (Shapovalov & Dunbrack 2011, CC-BY-4.0). Currently covers 24 chi-bearing residue types: ALY, ARG, ASN, ASP, CYS, GLN, GLU, HIS, ILE, LEU, LYS, MET, MSE, PHE, PRO, PTR, SEC, SER, THR, TPO, TRF, TRP, TYR, VAL. Glycine and Alanine carry no χ angles and need no entry.

Top-level shape:

Key	Value Type	Description
`format`	String	Format identifier (currently `"rot_v1"`)
`version`	Int	Schema version
`phi_start`	Float	First (φ, ψ) bin lower edge in degrees, default `-180`
`phi_step`	Float	(φ, ψ) bin width in degrees, default `10`
`phi_n`	Int	Number of φ bins, default `36`
`psi_start`	Float	(analogous)
`psi_step`	Float	(analogous)
`psi_n`	Int	Number of ψ bins, default `36`
`density_grids`	List	Per-residue total-density grids (auxiliary, not consumed by current code paths)
`residues`	Dict	Per-residue rotamer mixture data, keyed by 3-letter code

Per-residue entry, database['Rotamer Library']['residues'][TRICODE]:

Key	Value Type	Description
`n_chi`	Int	Number of χ angles for this residue type (1 for VAL/SER/THR/CYS/SEC, 2 for LEU/ILE/PHE/TYR/TRP/HIS/ASN/ASP/PRO/MSE/TRF, 3 for MET/GLN/GLU, 4 for ARG/LYS)
`rotamers`	Dict	The CSR-packed rotamer mixture (see below)
`densities`	List/None	Optional per-cell density auxiliary; usually `null`

Per-residue rotamers block, CSR-packed for compactness:

Key	Value Type	Description
`columns`	List of strings	Column schema for each row of `table`, e.g. for VAL: `['count', 'prob', 'chi1', 'sig1']`; for LEU (n_chi=2): `['count', 'prob', 'chi1', 'chi2', 'sig1', 'sig2']`; in general `[count, prob, chi1..chiN, sig1..sigN]`
`table`	List of rows	Flat list of all rotamer rows across every (φ, ψ) cell. Each row matches `columns`: the empirical observation `count`, the per-cell normalised probability `prob` (sums to 1 within a cell), the per-rotamer mean χ values in degrees, and the per-rotamer χ standard deviations in degrees
`bin_offsets`	List of ints	CSR indexing, length `phi_n × psi_n + 1 = 1297` for the default 36×36 grid. Cell `(i_phi, i_psi)` is at `bin_idx = i_phi × psi_n + i_psi`, and its rotamer rows are `table[bin_offsets[bin_idx] : bin_offsets[bin_idx+1]]`
`top_chi`	List	Optional precomputed (most-probable rotamer mean χ values per cell) lookup, currently unused by `Score._rotamer_prior` / `Rotamers` / `Pack` (which slice `table` directly)

Lookup pattern (used internally by Rotamers, Pack, and Score._rotamer_prior):

D-amino acid handling: the library is keyed on the L-form 3-letter code only. Consumers fetch the L cell at (−φ, −ψ) and negate the recovered μ values when applying them, exploiting the chi/Ramachandran mirror symmetry between enantiomers.

Description of energy parameters in database.json:

database['Energy Parameters'][NAME] is a dict of named force-field parameter sets. Two ship today: openFF (production small-molecule FF, derived from OpenFF Sage 2.3.0, CC-BY-4.0) and Default (deterministic smoke-test FF). ForceField(name='openFF') and ForceField(name='Default') (or any case variant) select between them. ForceField() with no argument defaults to 'Default'. Each named block has the same nested schema:

Sub-key	Value Type	Description
`Constants`	Dict	Global constants. `epsilon_r` is the relative dielectric (default 1.0); `f_lj` and `f_elec` are the 1-4 non-bonded scaling factors (LJ = 0.5, electrostatics = 5/6)
`Constraints`	Dict	SMIRKS-keyed bond constraints. Sage 2.3.0 declares every X–H bond as constrained (`[#1:1]-[*:2]`), making them rigid under SHAKE/RATTLE. Constrained bonds get `K_b = 0` at compile (zero bond-stretch energy); equilibrium `r_0` is preserved for MD constraint solving. `Default` ships an empty `Constraints` dict
`Bonds`	Dict	SMIRKS-keyed harmonic bond parameters. Each value is `{id?, r_0, K_b}`, equilibrium length in Å, force constant in kJ/mol/Å². Stored using the upstream `E = ½K(r − r₀)²` convention (the ½ factor is absorbed into `K_b` at compile time, so the potential evaluates `Σ K_b·(r − r₀)²` with the absorbed `K_b`)
`Angles`	Dict	SMIRKS-keyed harmonic angle parameters. Each value is `{id?, theta_0, K_theta}`, equilibrium angle in degrees, force constant in kJ/mol/rad². Same ½-factor convention as bonds
`UB`	Dict	SMIRKS-keyed Urey-Bradley 1-3 stretching parameters. Each value is `{id?, s_0, K_ub}`, equilibrium 1-3 distance in Å, force constant in kJ/mol/Å². Optional in any FF; the potential evaluates to 0 if the section is empty or returns no matches
`ProperTorsions`	Dict	SMIRKS-keyed proper-torsion Fourier components. Each value is `{id?, components: [{n, phi_0, K_phi, idivf}, ...]}`, multiplicity (int ≥ 1), phase in degrees, barrier height in kJ/mol, divisor factor (typically 1.0). The potential evaluates `Σ K_phi·(1 + cos(n·φ − φ₀)) / idivf` per component
`ImproperTorsions`	Dict	SMIRKS-keyed improper torsions, trefoil-expanded at compile time into the three cyclic permutations of the outer atoms (each contributing `K_phi / 3`). Same component shape as `ProperTorsions`. Evaluated as harmonic or Fourier depending on the `alg` field of the method's entry in `Terms`
`vdW`	Dict	SMIRKS-keyed Lennard-Jones parameters. Each value is `{id?, epsilon, r, alpha?, sigma?}`, well depth in kJ/mol, half-min-distance `r` in Å (or `sigma` directly), optional atomic polarisability `alpha` in Å³ co-keyed with the same SMIRKS. Sigma is derived as `r * 2 / 2^(1/6)` when not explicitly given. `alpha` defaults to 0; when 0 for every atom, `PolarisationPotential` evaluates to 0
`Electrostatic`	Dict	SMIRKS-keyed library charges (water, ions, Xe). Each value is `{id?, q: [c₀, c₁, …]}`, literal partial charges in elementary-charge units, one per tagged atom in the SMIRKS pattern. Library charges take priority over NAGL inference on matched atoms
`CMAP`	Dict	SMIRKS-keyed φ/ψ backbone correction grids, keyed by one-letter amino-acid code. Each value is a 24×24 list-of-lists. The potential evaluates by bicubic Catmull-Rom interpolation; residues whose code is missing from the dict contribute 0. `cmap` is unused on `Molecule` poses (the cache stays empty)
`Terms`	List	Ordered list of potential methods to evaluate, each entry is `[method_name, kwargs_dict]`. Example: `["BondPotential", {"alg": "harmonic"}]`. `ForceField.__call__` iterates this list and dispatches to each named method on `ForceField`
`AM1BCC`	Dict	(only under `openFF`) NAGL graph-neural-network weights for AM1-BCC partial-charge prediction. `gcn_layers[0..5]` (each with `fc_neigh_w`, `fc_self_w`, `fc_self_b`) and `readout` (`linear_0_w/b`, `linear_1_w/b`). Each weight tensor is `{shape, data}` where `data` is base64-encoded float32 bytes, bit-exact NAGL float32 inference, ~13 MB total. FFs without this sub-key (e.g. `Default`) skip NAGL inference

Field name conventions, keys follow physics-textbook naming so the schema reads as formulas: r_0 (equilibrium bond length), K_b (bond force constant), theta_0/K_theta (angles), s_0/K_ub (Urey-Bradley), n/phi_0/K_phi (torsion multiplicity / phase / barrier height), r (vdW half-min-distance), q (literal partial charges). The optional id field on openFF entries carries the upstream Sage identifier (e.g. 'b1', 'a1', 't1'); Default entries omit it.

Default (smoke-test / regression FF, 9 terms), one broad-wildcard SMIRKS per section ([*:1]~[*:2] for bonds, [*:1]~[*:2]~[*:3] for angles + UB, etc.). All linear coefficients (K_b, K_theta, K_ub, K_phi, vdW.epsilon, CMAP grid values) were uniformly calibrated so that ForceField()(Pose().Build('AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz')) returns 100.00 kJ/mol exactly (within float64 ulps). Electrostatic.q = [0.0] and vdW.alpha = 0 by design, so ElectrostaticPotential and PolarisationPotential both evaluate to 0, this isolates the calibration from quadratic charge-dependent terms. CMAP carries a 24×24 constant grid for every A-Z one-letter code. Default is not for production; its only purpose is to drive every potential method, every cache path, and every dispatch branch in ForceField through a deterministic check.

Description of score parameters in database.json:

database['Score Parameters'][NAME] is a dict of named score-function parameter sets. The canonical default set is Default with the following parameters:

Sub-key	Value Type	Description
`Constants`	Dict	Global constants used by multiple terms. Distance cutoffs (`fa_max_dis`, `fa_elec_max_dis`, `fa_elec_min_dis`, `fa_atr_short`, `fa_atr_long`, `cutoff`), dielectric model (`eps_core`, `eps_solvent`, `sigmoidal_D`, `sigmoidal_D0`, `sigmoidal_S`), Coulomb constant (`coulomb_C0`), 1-N connectivity scaling (`connectivity_weight = {'3': 0.0, '4': 0.2, '5+': 1.0}`), and small-molecule affinity scaling (`scale`, `per_residue`, `nrot_w`, `glue_w`). Distances in Å, Coulomb constant in kcal·Å/e²
`Atom_types`	Dict	154 full-atom types keyed by code (`CNH2`, `COO`, `CAbb`, `Nbb`, `Hpol`, …). Each value carries `element`, `LJ_RADIUS` (Å), `LJ_WDEPTH` (kcal/mol), `LK_DGFREE` (kcal/mol), `LK_LAMBDA` (Å), `LK_VOLUME` (Å³), plus boolean `acceptor`/`donor` flags and HBond chemical-type tags. Drives per-atom typing for the pair, solvation, electrostatic, and HBond terms
`XS_atom_types`	Dict	31 small-molecule atom types keyed by code (`C_H`, `C_P`, `N_D`, `N_A`, `N_DA`, `O_A`, `O_DA`, `S_P`, halogens, metals). Each value carries `radius` (Å) and boolean `hydrophobic`/`acceptor`/`donor`. Drives ligand atom typing for the Gauss/Repulsion/Hydrophobic/HBond ligand-pair terms
`Residue_types`	Dict	21 canonical amino-acid templates keyed by 3-letter code. Each carries a per-atom mapping `{N: {type, mm_type, charge}, CA: {…}, …}`. Resolves protein-pose atom names to full-atom types and partial charges
`METHOD_WEIGHTS_ref`	List	20-float array of per-amino-acid reference free energies, indexed by 1-letter code in alphabetical order (A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y). Consumed by `RefPotential` after multiplying by `Ref.weight`
`FaAtr`	Dict	`{weight}`. Pairwise Lennard-Jones attractive component with long-range smoothing between `fa_atr_short` and `fa_atr_long`
`FaRep`	Dict	`{weight}`. Pairwise Lennard-Jones repulsive component, dominant under the LJ minimum
`FaSol`	Dict	`{weight}`. Pairwise Lazaridis-Karplus implicit-solvent term using `LK_DGFREE`, `LK_LAMBDA`, `LK_VOLUME` from `Atom_types`
`FaIntraRep`	Dict	`{weight}`. Within-residue Lennard-Jones repulsive term over 1-4 and longer pairs
`FaIntraAtr`	Dict	`{weight}`. Within-residue LJ attractive (declared; not dispatched by `Default`'s `Terms`, reserved for restricted named subsets)
`FaIntraSolXover4`	Dict	`{weight}`. Within-residue solvation with crossover handling at 1-4 pairs
`FaElec`	Dict	`{weight}`. Pairwise Coulomb electrostatics with sigmoidal distance-dependent dielectric (`coulomb_C0`, `sigmoidal_D`, `sigmoidal_D0`, `sigmoidal_S`)
`LkBallIso`	Dict	`{weight}`. Isotropic Lazaridis-Karplus solvation (declared; folded into `LkBallWtd` at runtime)
`LkBallWtd`	Dict	`{weight}`. Lazaridis-Karplus solvation with weighted directional water sites placed geometrically per donor/acceptor
`LkBallBridge`	Dict	`{weight}`. Bridging-water adjustment (declared; folded into `LkBallWtd` at runtime)
`FaDun`	Dict	`{weight}`. Backbone-dependent rotamer score, `-log P(χ \| φ, ψ)` from bilinear interpolation over rotamer-well μ/σ grids plus a cyclic cubic spline in the terminal χ for semi-rotameric residues
`RamaPreProTerm`	Dict	`{weight}`. Ramachandran φ/ψ probability with pre-proline special-casing, Catmull-Rom bicubic interpolation, entropy correction
`PAaPp`	Dict	`{weight}`. Shapovalov amino-acid probability at given (φ, ψ), Catmull-Rom bicubic interpolation
`Omega`	Dict	`{weight}`. Peptide-bond ω-torsion harmonic penalty around 180° (and 0° for cis)
`ProClose`	Dict	`{weight}`. Proline-ring closure penalty
`DslfFa13`	Dict	`{weight}`. Disulfide-bond geometry constraint (Cβ–Sγ–Sγ′–Cβ′ torsion plus Cα–Cβ–Sγ angles)
`YhhPlanarity`	Dict	`{weight}`. Tyrosine hydroxyl rotamer-planarity penalty
`Ref`	Dict	`{weight}`. Per-AA reference-energy multiplier; the residue baseline comes from `METHOD_WEIGHTS_ref`
`HBondSrBb`	Dict	`{weight}`. Short-range (helix) backbone-backbone hydrogen bond
`HBondLrBb`	Dict	`{weight}`. Long-range (β-sheet) backbone-backbone hydrogen bond
`HBondBbSc`	Dict	`{weight}`. Backbone-sidechain hydrogen bond
`HBondSc`	Dict	`{weight}`. Sidechain-sidechain hydrogen bond
`Gauss1`	Dict	`{offset, width, cutoff, weight}`. Gaussian attractive pair term centred at `offset = 0 Å` with half-width `width = 0.5 Å`, evaluated up to `cutoff = 8 Å`; pair selection by `XS_atom_types`
`Gauss2`	Dict	`{offset, width, cutoff, weight}`. Second Gaussian, `offset = 3 Å`, `width = 2 Å`
`Repulsion`	Dict	`{offset, cutoff, weight}`. Overlap-only penalty: positive when atoms penetrate inside `offset`, zero otherwise; pair selection by `XS_atom_types`
`Hydrophobic`	Dict	`{good, bad, cutoff, weight}`. Slope-step bonus over hydrophobic-hydrophobic pairs (XS types marked `hydrophobic`); linear ramp from `good` (full bonus) to `bad` (zero)
`HBond`	Dict	`{good, bad, cutoff, weight}`. Slope-step bonus over donor-acceptor pairs (XS types marked `donor`/`acceptor`/`_DA`); same ramp shape as `Hydrophobic`
`CartBonded`	Dict	`{weight}`. Cartesian bond/angle/torsion deviation penalty (declared; not dispatched by `Default`'s `Terms`)
`DefaultOffset`	Dict	`{weight}`. Calibration anchor: contributes a constant per residue, scaled by `scale / per_residue` from `Constants`. Used as the regression sentinel for `Default` (see calibration paragraph)
`Terms`	List	Ordered list of `[method_name, kwargs_dict]` pairs to evaluate. `Score.__call__` iterates this list and dispatches to each named method on `Score`. The `Default` set carries 25 entries — every potential method in the framework

Default (smoke-test / regression set, 25 dispatched terms), contains dummy parameter values, it is used as a smoke-test to ensure that all Score() methods are correctly working. Score()(Pose().Build('AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz')) returns 100.00 exactly.

Community & Contributions

Contributions are welcome! Open an issue or pull request on GitHub, or just email me.

Chat with users and contributors in real time: IRC: #pose channel on the irc.libera.chat network, Or use the Libera web chat, no install needed.

Come ask questions, share what you've built with Pose, or discuss contributions.

How to Cite

If Pose is useful in your research, please cite it. The repository ships a CITATION.cff file at the project root with the canonical citation metadata; GitHub's "Cite this repository" button and most reference managers (Zotero, Mendeley) can import it directly. The current entry is:

Sabban, S. Pose: A bare metal Python library for building and manipulating protein molecular structures. 2023. https://github.com/sarisabban/Pose (ORCID: 0000-0002-9621-2395)

License

Pose is released under the Apache License, Version 2.0. The full licence text lives in the LICENSE file at the project root, and per Apache-2.0 convention a NOTICE file records the copyright and attribution.

SPDX-License-Identifier: Apache-2.0

Name		Name	Last commit message	Last commit date
Latest commit History 408 Commits
pose		pose
.gitignore		.gitignore
CITATION.cff		CITATION.cff
LICENSE		LICENSE
README.md		README.md
setup.py		setup.py

Folders and files

Latest commit

History

Repository files navigation

Pose

Video Tutorial

What is Pose?

Installation

Quick Start

API Reference

Call Class

Building & I/O

Measurements

Manipulation

Force Field & Energy Score

Tools

Key Concepts

Zero-based indexing

Accessing the data structure directly

Supported Amino Acids

Supported Nucleotides

DNA

RNA

Data Structure Reference

database.json overview

Description of amino acids in database.json:

Description of nucleotides in database.json:

Description of the Rotamer Library in database.json:

Description of energy parameters in database.json:

Description of score parameters in database.json:

Community & Contributions

How to Cite

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages