add gromacs to frames node#463
Conversation
for more information, see https://pre-commit.ci
📝 WalkthroughWalkthroughAdds a GROMACS→ASE converter function ChangesGROMACS-to-ASE Conversion & ZnTrack Node
Sequence DiagramsequenceDiagram
actor User
participant AddDataGMX as AddDataGMX\n(ZnTrack Node)
participant gmx_to_ase as gmx_to_ase\n(Function)
participant MDAnalysis as MDAnalysis\n(Universe / Trajectory)
participant EDR as EDR Parser
participant ASE as ASE\n(Atoms + SinglePointCalculator)
participant HDF5 as HDF5\n(frames.h5)
User->>AddDataGMX: run()
AddDataGMX->>gmx_to_ase: gmx_to_ase(topology, trajectory, edr, start, stop, step)
gmx_to_ase->>MDAnalysis: load topology & trajectory
MDAnalysis-->>gmx_to_ase: Universe & timesteps
loop per selected timestep
gmx_to_ase->>MDAnalysis: read frame (positions, box, velocities)
alt edr provided
gmx_to_ase->>EDR: load EDR & find nearest sample by time
EDR-->>gmx_to_ase: energy, pressure terms, all EDR terms
gmx_to_ase->>gmx_to_ase: build 6-component stress vector
end
gmx_to_ase->>ASE: create Atoms, set positions/cell/pbc/velocities
gmx_to_ase->>ASE: attach SinglePointCalculator (energy, forces, stress, results)
ASE-->>gmx_to_ase: Atoms frame
end
gmx_to_ase-->>AddDataGMX: list[Atoms]
AddDataGMX->>HDF5: persist frames via znh5md.IO.extend()
HDF5-->>AddDataGMX: confirmation
User->>AddDataGMX: .frames
AddDataGMX->>HDF5: read frames
HDF5-->>User: list[Atoms]
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (1)
ipsuite/data_loading/add_data_gromacs.py (1)
134-134: Consider warning when EDR and trajectory times diverge significantly.The nearest-time matching via
argminsilently succeeds even when the time difference is large (e.g., different output frequencies). Consider logging a warning if the matched time differs by more than a small tolerance.Example tolerance check
idx = np.argmin(np.abs(edr_times - ts.time)) time_diff = abs(edr_times[idx] - ts.time) if time_diff > 0.1: # ps, adjust as appropriate warnings.warn(f"EDR time {edr_times[idx]:.3f} differs from trajectory time {ts.time:.3f} by {time_diff:.3f} ps")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ipsuite/data_loading/add_data_gromacs.py` at line 134, The nearest-time matching using idx = np.argmin(np.abs(edr_times - ts.time)) can silently match large time gaps; in the function or loop where that line appears (e.g., the trajectory frame handling block in add_data_gromacs.py), compute time_diff = abs(edr_times[idx] - ts.time) after selecting idx and if time_diff exceeds a small tolerance (e.g., 0.1 ps, configurable) emit a warning (warnings.warn or your logger.warn) indicating the EDR time, trajectory time, and the difference so users know the outputs are misaligned; make the tolerance configurable or documented.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Line 211: The run method in the class defined in add_data_gromacs.py is
missing a return type annotation; update the def run(self): signature to include
the appropriate return type (e.g., -> None or the actual return type used) so it
conforms with the project's full type annotation guideline, and update any
related docstring or callers if necessary; locate the method by the symbol run
in that file and add the correct type hint.
- Around line 202-207: The annotated types for node attributes are too narrow:
change trajectory and edr from Path to Optional[Path] and start, stop, step from
int to Optional[int] to reflect that zntrack.deps_path(None) /
zntrack.params(None) can return None; update the import to include
typing.Optional (e.g., from typing import Optional) and modify the annotations
for trajectory, edr, start, stop, and step accordingly while leaving topology as
Path.
- Around line 239-240: The example prints may raise an IndexError and shows the
wrong unit: guard access to frames[1] by checking len(frames) >= 2 before
referencing frames[1].calc.results (or else reference frame 0 consistently), and
fix the unit label by using the actual energy units returned by
frames[0].get_potential_energy() (or an energy_unit variable if present) instead
of hardcoding "kJ/mol" (use "eV" if energies are converted to eV); update the
two print lines to conditionally print the second-frame EDR terms only when a
second frame exists and to display the correct units from the energy
value/source.
- Line 135: The energy read from the EDR file (assigned to variable energy in
the line energy = float(edr_data["Potential"][idx])) is in kJ/mol and must be
converted to eV before passing to ASE's SinglePointCalculator; change the code
to convert from kJ/mol to eV (using ase.units, e.g. multiply/divide by units.kJ,
units.mol and units.eV) and supply the converted value to the
SinglePointCalculator so downstream ASE methods return correct energies.
---
Nitpick comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Line 134: The nearest-time matching using idx = np.argmin(np.abs(edr_times -
ts.time)) can silently match large time gaps; in the function or loop where that
line appears (e.g., the trajectory frame handling block in add_data_gromacs.py),
compute time_diff = abs(edr_times[idx] - ts.time) after selecting idx and if
time_diff exceeds a small tolerance (e.g., 0.1 ps, configurable) emit a warning
(warnings.warn or your logger.warn) indicating the EDR time, trajectory time,
and the difference so users know the outputs are misaligned; make the tolerance
configurable or documented.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9d200379-0275-4ef7-a180-661cd8621105
📒 Files selected for processing (1)
ipsuite/data_loading/add_data_gromacs.py
| # GROMACS pressure in bar -> store as-is (not ASE native eV/ų) | ||
| stress = np.array([pxx, pyy, pzz, pyz, pxz, pxy]) |
There was a problem hiding this comment.
Stress units mismatch: GROMACS pressure is in bar, ASE expects eV/ų.
The comment acknowledges the unit discrepancy, but storing stress in bar breaks ASE's stress conventions. Any code calling atoms.get_stress() will receive values in the wrong units, leading to incorrect virial/pressure calculations downstream.
Convert bar to eV/ų: 1 bar = 1e-4 GPa = 6.2415e-7 eV/ų (use ase.units for precision).
Proposed fix
+from ase import units
+
+# bar -> eV/ų
+BAR_TO_EV_ANG3 = 1.0 / (units.bar / (units.eV / units.Ang**3))
+
...
- # GROMACS pressure in bar -> store as-is (not ASE native eV/ų)
- stress = np.array([pxx, pyy, pzz, pyz, pxz, pxy])
+ # Convert GROMACS pressure (bar) to ASE stress (eV/ų)
+ stress = np.array([pxx, pyy, pzz, pyz, pxz, pxy]) * BAR_TO_EV_ANG3|
Note Docstrings generation - SUCCESS |
Docstrings generation was requested by @MrJulEnergy. * #463 (comment) The following files were modified: * `ipsuite/data_loading/add_data_gromacs.py`
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
ipsuite/data_loading/add_data_gromacs.py (1)
156-165:⚠️ Potential issue | 🟠 MajorConvert the pressure tensor before assigning
stress.ASE works in eV-based units, and its pressure/stress quantities are represented in
eV/ų; passing the rawPres-*values here keepsatoms.get_stress()in bar instead. If you want to preserve the native tensor too, keep the bar numbers only under thePres-*keys and convert the array you pass asstress. (ase-lib.org)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ipsuite/data_loading/add_data_gromacs.py` around lines 156 - 165, The code currently assigns raw GROMACS Pres-* values (in bar) directly to stress, which leaves ASE seeing stress in bar; instead, convert those pressure components to ASE internal units (eV/ų) before assigning to the stress array. Locate the block reading edr_data["Pres-XX"/"Pres-YY"/..."] and replace the direct assignment to stress with a conversion step (keep the raw bar values in edr_data if you need them) — e.g., compute a conversion factor or use ASE unit helpers (ase.units / ase.units.convert) to convert from bar to eV/ų and then set stress = np.array([pxx_converted, pyy_converted, pzz_converted, pyz_converted, pxz_converted, pxy_converted]) so atoms.get_stress() returns values in ASE units.
🧹 Nitpick comments (1)
ipsuite/data_loading/add_data_gromacs.py (1)
210-219: Make the node example runnable end-to-end.Right now the example stops after construction, so it isn't testable in the same way as the other data-loading nodes. Add
project.repro()and one observable result/assertion.📝 Suggested docstring tweak
>>> with project: ... md = ips.Gmx2Frames( ... topology="gromacs/system.gro", ... trajectory="gromacs/production.xtc", ... edr="gromacs/production.edr", ... start=1, ... ) + >>> project.repro() + >>> len(md.frames) > 0 + TrueAs per coding guidelines: "Each Node’s docstring should include a testable example using
projectandips".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ipsuite/data_loading/add_data_gromacs.py` around lines 210 - 219, The docstring example for Gmx2Frames stops after construction and isn't runnable; update it to call project.repro() and assert one observable outcome so it executes end-to-end. Specifically, after constructing md = ips.Gmx2Frames(...), invoke project.repro() and then check a result such as verifying md.frames (or another public attribute/method on Gmx2Frames) has expected length/type (e.g., assert len(md.frames) > 0 or isinstance(md.frames, list/ndarray)) so the example both runs and asserts behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Around line 60-73: _match_edr_frame currently scans the whole edr_times array
per frame which is O(n×m) and silently returns an out-of-tolerance index;
replace the np.argmin approach with np.searchsorted on the monotonic edr_times
(or use a moving pointer maintained by the caller) to find the nearest candidate
on the left/right, compute the nearest index and time_diff, and if time_diff >
tolerance reject the match (raise a ValueError or return a sentinel like
None/-1) instead of returning a bad index; update callers to handle the
rejection and keep the function signature (_match_edr_frame(edr_times,
traj_time, tolerance)) and logging behavior.
- Around line 147-179: ts.forces are left in MDAnalysis units (kJ/(mol·Å)) but
ASE expects eV/Å; multiply the copied forces by the same conversion factor used
for energy (the (kJ / mol) factor) before passing into SinglePointCalculator.
Update the code that sets forces (the ts.forces.copy() branch) to convert units
(e.g., forces = ts.forces.copy() * (kJ / mol) when ts.has_forces) so the forces
variable passed to SinglePointCalculator(atoms, energy=..., forces=forces,
stress=...) is in eV/Å.
- Around line 44-57: The code currently derives element symbols from
u.atoms.types which misclassifies CHARMM/AMBER types; change the logic to use
atom names and MDAnalysis's element guessing instead: use u.atoms.names (or call
u.guess_TopologyAttrs(to_guess=['elements']) and read u.atoms.elements) and
build the symbols list from those guessed elements (or rely on
MDAnalysis.DefaultGuesser behavior) rather than transforming types; update the
block that references types, t, and symbols to use names/elements (keeping the
same return of symbols) and remove the manual type-based heuristics.
---
Duplicate comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Around line 156-165: The code currently assigns raw GROMACS Pres-* values (in
bar) directly to stress, which leaves ASE seeing stress in bar; instead, convert
those pressure components to ASE internal units (eV/ų) before assigning to the
stress array. Locate the block reading edr_data["Pres-XX"/"Pres-YY"/..."] and
replace the direct assignment to stress with a conversion step (keep the raw bar
values in edr_data if you need them) — e.g., compute a conversion factor or use
ASE unit helpers (ase.units / ase.units.convert) to convert from bar to eV/ų
and then set stress = np.array([pxx_converted, pyy_converted, pzz_converted,
pyz_converted, pxz_converted, pxy_converted]) so atoms.get_stress() returns
values in ASE units.
---
Nitpick comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Around line 210-219: The docstring example for Gmx2Frames stops after
construction and isn't runnable; update it to call project.repro() and assert
one observable outcome so it executes end-to-end. Specifically, after
constructing md = ips.Gmx2Frames(...), invoke project.repro() and then check a
result such as verifying md.frames (or another public attribute/method on
Gmx2Frames) has expected length/type (e.g., assert len(md.frames) > 0 or
isinstance(md.frames, list/ndarray)) so the example both runs and asserts
behavior.
🪄 Autofix (Beta)
✅ Autofix completed
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3211bbb4-e6f9-44fd-bc99-ce5c595a7702
📒 Files selected for processing (1)
ipsuite/data_loading/add_data_gromacs.py
* 📝 Add docstrings to `addGmx2FramesNode` Docstrings generation was requested by @MrJulEnergy. * #463 (comment) The following files were modified: * `ipsuite/data_loading/add_data_gromacs.py` * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Julian Peters <108301472+MrJulEnergy@users.noreply.github.com>
|
Note Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it. Fixes Applied SuccessfullyFixed 1 file(s) based on 2 unresolved review comments. A stacked PR containing fixes has been created.
Time taken: |
Fixed 1 file(s) based on 2 unresolved review comments. Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (1)
ipsuite/data_loading/add_data_gromacs.py (1)
178-189:⚠️ Potential issue | 🟠 Major | ⚡ Quick winStress is still stored in bar, not ASE's
eV/ų.The added comment acknowledges the mismatch, but anything calling
atoms.get_stress()will silently get wrong-unit values. Either convert (bar → eV/ųviaase.units) or do not attachstress=and keep the raw pressure terms only insideextra_results.Proposed fix
-from ase.units import kJ, mol +from ase.units import Ang, bar, eV, kJ, mol @@ - # GROMACS pressure in bar -> store as-is (not ASE native eV/ų) - stress = np.array([pxx, pyy, pzz, pyz, pxz, pxy]) + # GROMACS pressure in bar -> ASE stress in eV/ų + bar_to_eV_per_A3 = bar / (eV / Ang**3) + stress = ( + np.array([pxx, pyy, pzz, pyz, pxz, pxy]) * bar_to_eV_per_A3 + )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ipsuite/data_loading/add_data_gromacs.py` around lines 178 - 189, The code is attaching GROMACS pressures (in bar) as an ASE-style stress array (variable stress built from edr_data[...] and idx) which will make atoms.get_stress() return wrong units; either convert the values to ASE's eV/ų before attaching or stop attaching them as stress and instead store the raw pressures in extra_results. Fix by one of two options: (A) import ase.units and apply the bar→eV/ų conversion to the stress array (convert the built stress variable) before assigning it to the Atoms object so atoms.get_stress() is correct, or (B) remove setting atoms.stress/stress= and instead place the raw pxx/pyy/... array into the molecule/atoms.extra_results (e.g., "gromacs_pressure_bar") so callers can convert explicitly; update references to atoms.get_stress() usage accordingly.
🧹 Nitpick comments (2)
ipsuite/data_loading/add_data_gromacs.py (2)
252-283: 💤 Low value
runandframesdocstrings use Google style, not NumPy.The two methods use Google-style sections (
Returns:/ inlineParameters:). Per the project guideline, docstrings must be in NumPy style with----underlines.Proposed fix
def run(self) -> None: - """ - Convert the configured GROMACS inputs into ASE Atoms frames and persist them - to the node's HDF5 output at self.frames_path. - - The node's topology, optional trajectory, optional EDR file, and slicing - parameters (start, stop, step) are used to produce the frames which are - written to the frames_path via znh5md. - """ + """Convert configured GROMACS inputs into ASE frames and persist them. + + The frames are written to ``self.frames_path`` via ``znh5md``. + """ @@ `@property` def frames(self) -> typing.List[Atoms]: - """ - Return all ASE `Atoms` frames stored in the node's HDF5 frames file. - - Returns: - typing.List[Atoms]: A list of ASE `Atoms` objects read - from the HDF5 file at `self.frames_path`. - """ + """Return all ASE ``Atoms`` frames stored in the HDF5 frames file. + + Returns + ------- + list[Atoms] + Frames read from ``self.frames_path``. + """As per coding guidelines: "Write docstrings in NumPy style".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ipsuite/data_loading/add_data_gromacs.py` around lines 252 - 283, Update the docstrings for the methods run and frames to NumPy style: replace the Google-style "Returns:" and inline parameter descriptions with NumPy sections using section headers followed by underlines (e.g., "Returns\n-------") and format parameter descriptions under a "Parameters\n----------" block where applicable; keep the existing prose but reformat the return type line in frames to "Returns\n-------\ntyping.List[Atoms]\n description..." and do the same for run's description so both docstrings follow the project's NumPy docstring convention.
36-67: 💤 Low valueApprove with one defensive nit on the type fallback.
The
u.atoms.elementsfirst-pass plus type fallback addresses the prior CHARMM-GUI concern. Two small things on the fallback path:
- Line 66:
t[0].upper()willIndexErroron an empty type string. Guard witht[0:1].upper() or "X".- The docstring uses Google-style (
Parameters:/Returns:); convert to NumPy style for consistency.As per coding guidelines: "Write docstrings in NumPy style".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ipsuite/data_loading/add_data_gromacs.py` around lines 36 - 67, The fallback in _get_symbols can IndexError when an atom type string is empty and the docstring style is inconsistent: update the fallback branch that currently uses t[0].upper() to safely handle empty types (use a slice like t[0:1].upper() or a default "X") and ensure the rest of the function still respects _TYPE_TO_ELEMENT and types lookups; also convert the function docstring from Google-style Parameters/Returns to the project's NumPy-style docstring format so it matches coding guidelines.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Around line 191-193: The code currently writes raw GROMACS EDR values into
extra_results (via the loop over edr_terms/edr_data), which mixes units with ASE
keys; change the storage so values go under a namespaced dict (e.g.,
extra_results.setdefault('gmx_edr', {}) and then set
extra_results['gmx_edr'][term] = float(edr_data[term][idx])) to avoid colliding
with ASE standard keys, and update the function docstring to state that gmx_edr
contains raw GROMACS units (kJ/mol, bar, K, …). Ensure you check/initialize the
'gmx_edr' dict before the loop and do not modify existing ASE keys like
energy/forces/stress.
- Around line 154-160: Atoms are being constructed with pbc=True regardless of
whether a valid cell exists (ts.dimensions), causing inconsistent state; change
the flow so you only enable periodic boundary conditions when a valid box is
detected (use ts.dimensions -> box and the same all(box[:3] > 0) check) —
construct Atoms with pbc=False (or omit pbc) initially and then if box is valid
call atoms.set_cell(box, scale_atoms=False) and set atoms.pbc = True (or
construct with pbc=True at that point) so pbc reflects whether the cell was
actually set.
- Around line 232-241: The docstring example incorrectly instantiates
ips.Gmx2Frames which doesn't exist; change the example to use the real public
class AddDataGMX and ensure it is shown being used with project and ips (e.g.
ips.AddDataGMX(...)) so it is testable; update the example arguments to match
AddDataGMX's signature (topology, trajectory, edr, start) and keep the
surrounding "with project:" context so doctests run correctly, referencing the
AddDataGMX symbol in the docstring.
- Around line 162-165: The velocity conversion is wrong: MDAnalysis gives
velocities in Å/ps and you must convert to ASE's internal time units by
dividing, not multiplying. In the block using ts.has_velocities and
atoms.set_velocities(ts.velocities / 1000.0), change the conversion to divide by
both 1000.0 and ASE's fs constant (ase.units.fs) so velocities = ts.velocities /
(1000.0 * ase.units.fs); ensure ase.units (or from ase import units as
ase_units) is imported and used, and remove any inversion that multiplies by fs.
---
Duplicate comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Around line 178-189: The code is attaching GROMACS pressures (in bar) as an
ASE-style stress array (variable stress built from edr_data[...] and idx) which
will make atoms.get_stress() return wrong units; either convert the values to
ASE's eV/ų before attaching or stop attaching them as stress and instead store
the raw pressures in extra_results. Fix by one of two options: (A) import
ase.units and apply the bar→eV/ų conversion to the stress array (convert the
built stress variable) before assigning it to the Atoms object so
atoms.get_stress() is correct, or (B) remove setting atoms.stress/stress= and
instead place the raw pxx/pyy/... array into the molecule/atoms.extra_results
(e.g., "gromacs_pressure_bar") so callers can convert explicitly; update
references to atoms.get_stress() usage accordingly.
---
Nitpick comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Around line 252-283: Update the docstrings for the methods run and frames to
NumPy style: replace the Google-style "Returns:" and inline parameter
descriptions with NumPy sections using section headers followed by underlines
(e.g., "Returns\n-------") and format parameter descriptions under a
"Parameters\n----------" block where applicable; keep the existing prose but
reformat the return type line in frames to
"Returns\n-------\ntyping.List[Atoms]\n description..." and do the same for
run's description so both docstrings follow the project's NumPy docstring
convention.
- Around line 36-67: The fallback in _get_symbols can IndexError when an atom
type string is empty and the docstring style is inconsistent: update the
fallback branch that currently uses t[0].upper() to safely handle empty types
(use a slice like t[0:1].upper() or a default "X") and ensure the rest of the
function still respects _TYPE_TO_ELEMENT and types lookups; also convert the
function docstring from Google-style Parameters/Returns to the project's
NumPy-style docstring format so it matches coding guidelines.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: eb8dff76-fac6-4793-b6a4-c93bc9321871
📒 Files selected for processing (3)
ipsuite/__init__.pyiipsuite/data_loading/__init__.pyipsuite/data_loading/add_data_gromacs.py
| positions = ts.positions.copy() | ||
| box = ts.dimensions | ||
|
|
||
| atoms = Atoms(symbols=symbols, positions=positions, pbc=True) | ||
|
|
||
| if box is not None and all(box[:3] > 0): | ||
| atoms.set_cell(box, scale_atoms=False) |
There was a problem hiding this comment.
pbc=True is hardcoded even when there is no valid cell.
Atoms are created with pbc=True before checking whether box is valid. For non-periodic systems (vacuum, or dimensions returning zeros), this produces inconsistent state: PBC enabled with no cell. Tie pbc to whether the cell was actually set.
Proposed fix
- atoms = Atoms(symbols=symbols, positions=positions, pbc=True)
-
- if box is not None and all(box[:3] > 0):
- atoms.set_cell(box, scale_atoms=False)
+ has_cell = box is not None and all(box[:3] > 0)
+ atoms = Atoms(symbols=symbols, positions=positions, pbc=has_cell)
+ if has_cell:
+ atoms.set_cell(box, scale_atoms=False)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ipsuite/data_loading/add_data_gromacs.py` around lines 154 - 160, Atoms are
being constructed with pbc=True regardless of whether a valid cell exists
(ts.dimensions), causing inconsistent state; change the flow so you only enable
periodic boundary conditions when a valid box is detected (use ts.dimensions ->
box and the same all(box[:3] > 0) check) — construct Atoms with pbc=False (or
omit pbc) initially and then if box is valid call atoms.set_cell(box,
scale_atoms=False) and set atoms.pbc = True (or construct with pbc=True at that
point) so pbc reflects whether the cell was actually set.
| # Velocities (e.g. from .gro or .trr) | ||
| if ts.has_velocities: | ||
| # MDAnalysis: Å/ps, ASE: Å/fs -> divide by 1000 | ||
| atoms.set_velocities(ts.velocities / 1000.0) |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
ASE Atoms set_velocities expected units internal time unit ase.units.fs
💡 Result:
The ASE Atoms.set_velocities method expects velocities in ASE's internal units: length per time, where length is Ångström (Å = 1.0) and time is Å * sqrt(u / eV) ≈ 10.18 fs (since 1 fs ≈ 0.0982 in internal time units).[1][2] Velocities are dimensionless in the base system but interpreted as v_internal = Δposition / Δtime, with positions in Å and time derived from ase.units (eV=1, Å=1, u=1).[3][4][5] Kinetic energy is computed as sum(0.5 * m * v^2) in eV, confirming v has units sqrt(2*E/m) = sqrt(eV / u) / Å = 1 / time_unit.[6] To set velocities corresponding to a physical speed in fs^-1 (e.g., from VASP's Å/fs), scale by the time unit conversion: v_ase = v_physical * (1 fs / ase.units.fs) ≈ v_physical * 10.18, since ASE time unit ≈ 10.18 fs.[7][1] For example, 1 Å/fs in VASP requires ≈10.18 in ASE units to match kinetic energy.[7]
Citations:
- 1: https://cmrdb-test.fysik.dtu.dk/ase/ase/units.html
- 2: https://wiki.fysik.dtu.dk/ase/_modules/ase/units.html
- 3: https://ase-lib.org/_modules/ase/atoms.html
- 4: https://ase-lib.org/ase/atoms.html
- 5: https://wiki.fysik.dtu.dk/ase/_modules/ase/atoms.html
- 6: https://docs.matlantis.com/atomistic-simulation-tutorial/en/1_5_ase_calculator.html
- 7: https://listserv.fysik.dtu.dk/pipermail/ase-users/2013-April/001771.html
🏁 Script executed:
# Find project configuration files to check ASE version
find . -maxdepth 2 -type f \( -name "setup.py" -o -name "setup.cfg" -o -name "pyproject.toml" -o -name "requirements.txt" -o -name "requirements*.txt" \) | head -20Repository: zincware/IPSuite
Length of output: 77
🏁 Script executed:
# Read the actual file to see current imports and code
head -50 ipsuite/data_loading/add_data_gromacs.pyRepository: zincware/IPSuite
Length of output: 1189
🏁 Script executed:
# Check the exact lines mentioned in the review
sed -n '155,175p' ipsuite/data_loading/add_data_gromacs.pyRepository: zincware/IPSuite
Length of output: 820
🏁 Script executed:
# Read pyproject.toml to find ASE version pinning
cat pyproject.toml | grep -A 20 -i "ase\|dependencies"Repository: zincware/IPSuite
Length of output: 1917
🏁 Script executed:
# Search for other uses of ase.units to see patterns
rg "from ase.units import|ase\.units\." --type py -B 2 -A 2Repository: zincware/IPSuite
Length of output: 942
🏁 Script executed:
# Look for other velocity or momentum related code
rg "velocit|momentum|set_velocit" --type py -B 2 -A 2 | head -100Repository: zincware/IPSuite
Length of output: 5529
🏁 Script executed:
# Search for tests related to gromacs loading
find . -name "*test*gromacs*" -o -name "*gromacs*test*" | head -10Repository: zincware/IPSuite
Length of output: 42
🏁 Script executed:
# Check if there are any test files for data_loading
find . -path "*/test*" -name "*add_data_gromacs*" -o -path "*/test*" -name "*gromacs*"Repository: zincware/IPSuite
Length of output: 42
🏁 Script executed:
# Look for any issue references or TODOs related to velocity
rg "TODO|FIXME|BUG|velocity.*unit|unit.*velocity" ipsuite/data_loading/add_data_gromacs.pyRepository: zincware/IPSuite
Length of output: 42
🏁 Script executed:
# Check what ase.units.fs actually equals by looking at ASE documentation or imports
python3 << 'EOF'
try:
from ase import units
print(f"ase.units.fs = {units.fs}")
print(f"1 / ase.units.fs = {1 / units.fs}")
print(f"ASE time unit in fs: {1/units.fs:.4f} fs")
except Exception as e:
print(f"Error: {e}")
EOFRepository: zincware/IPSuite
Length of output: 89
🏁 Script executed:
# Search for tests or examples using add_data_gromacs or trajectory loading
find . -type f -name "*.py" -exec grep -l "add_data_gromacs\|gromacs\|MDAnalysis.*trajectory" {} \; | head -20Repository: zincware/IPSuite
Length of output: 180
🏁 Script executed:
# Look for any set_velocities calls to see if there are other patterns
rg "set_velocities|get_velocities" --type py -B 3 -A 3Repository: zincware/IPSuite
Length of output: 1320
🏁 Script executed:
# Check if there are notebooks or examples demonstrating this
find . -name "*.ipynb" -o -name "*example*" -type f | grep -E "(gromacs|traj|velocity)" | head -10Repository: zincware/IPSuite
Length of output: 42
🏁 Script executed:
# Look at the comment in the current code more carefully - what does it claim ASE expects?
sed -n '160,170p' ipsuite/data_loading/add_data_gromacs.pyRepository: zincware/IPSuite
Length of output: 533
Velocity unit conversion is incorrect — values are ~10× too small.
MDAnalysis stores velocities in Å/ps. The current conversion only changes this to Å/fs by dividing by 1000. However, ASE's internal velocity unit is Å / (ASE time unit) where the ASE time unit ≈ 10.18 fs (equivalently, ase.units.fs ≈ 0.0982). The current code is missing the conversion from Å/fs to ASE's internal units. This causes velocities to be ~10× too small, leading to kinetic energies that are ~100× too small (via 0.5·m·v²).
The proposed fix has the operation inverted—it multiplies by fs (a small number < 1), which would make the problem worse. The correct conversion should divide by fs:
Corrected fix
-from ase.units import kJ, mol
+from ase.units import fs, kJ, mol
@@
if ts.has_velocities:
- # MDAnalysis: Å/ps, ASE: Å/fs -> divide by 1000
- atoms.set_velocities(ts.velocities / 1000.0)
+ # MDAnalysis: Å/ps -> ASE internal velocity (Å / ASE time unit)
+ atoms.set_velocities(ts.velocities / (1000.0 * fs))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ipsuite/data_loading/add_data_gromacs.py` around lines 162 - 165, The
velocity conversion is wrong: MDAnalysis gives velocities in Å/ps and you must
convert to ASE's internal time units by dividing, not multiplying. In the block
using ts.has_velocities and atoms.set_velocities(ts.velocities / 1000.0), change
the conversion to divide by both 1000.0 and ASE's fs constant (ase.units.fs) so
velocities = ts.velocities / (1000.0 * ase.units.fs); ensure ase.units (or from
ase import units as ase_units) is imported and used, and remove any inversion
that multiplies by fs.
| # Store all EDR terms for this frame | ||
| for term in edr_terms: | ||
| extra_results[term] = float(edr_data[term][idx]) |
There was a problem hiding this comment.
extra_results keeps EDR values in raw GROMACS units.
All EDR terms (kJ/mol energies, bar pressures, K temperatures, …) are written verbatim into calc.results alongside ASE-unit energy/forces/stress. Mixing unit systems on the same calculator is a footgun for downstream consumers. At minimum, document this in the function docstring; preferably namespace these (e.g., a gmx_edr dict) so they cannot collide with ASE's standard keys (energy, forces, stress, dipole, …).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ipsuite/data_loading/add_data_gromacs.py` around lines 191 - 193, The code
currently writes raw GROMACS EDR values into extra_results (via the loop over
edr_terms/edr_data), which mixes units with ASE keys; change the storage so
values go under a namespaced dict (e.g., extra_results.setdefault('gmx_edr', {})
and then set extra_results['gmx_edr'][term] = float(edr_data[term][idx])) to
avoid colliding with ASE standard keys, and update the function docstring to
state that gmx_edr contains raw GROMACS units (kJ/mol, bar, K, …). Ensure you
check/initialize the 'gmx_edr' dict before the loop and do not modify existing
ASE keys like energy/forces/stress.
| Examples | ||
| -------- | ||
| >>> with project: | ||
| ... md = ips.Gmx2Frames( | ||
| ... topology="gromacs/system.gro", | ||
| ... trajectory="gromacs/production.xtc", | ||
| ... edr="gromacs/production.edr", | ||
| ... start=1, | ||
| ... ) | ||
| """ |
There was a problem hiding this comment.
Docstring example uses a non-existent class name and is not testable.
The example references ips.Gmx2Frames(...), but the public symbol is AddDataGMX (also in __all__). The example would raise AttributeError. Per coding guidelines, the Node docstring must include a testable example using project and ips.
Proposed fix
Examples
--------
- >>> with project:
- ... md = ips.Gmx2Frames(
- ... topology="gromacs/system.gro",
- ... trajectory="gromacs/production.xtc",
- ... edr="gromacs/production.edr",
- ... start=1,
- ... )
+ >>> project = ips.Project()
+ >>> with project:
+ ... md = ips.AddDataGMX(
+ ... topology="gromacs/system.gro",
+ ... trajectory="gromacs/production.xtc",
+ ... edr="gromacs/production.edr",
+ ... start=1,
+ ... )
+ >>> project.run()As per coding guidelines: "Each Node's docstring should include a testable example using project and ips".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ipsuite/data_loading/add_data_gromacs.py` around lines 232 - 241, The
docstring example incorrectly instantiates ips.Gmx2Frames which doesn't exist;
change the example to use the real public class AddDataGMX and ensure it is
shown being used with project and ips (e.g. ips.AddDataGMX(...)) so it is
testable; update the example arguments to match AddDataGMX's signature
(topology, trajectory, edr, start) and keep the surrounding "with project:"
context so doctests run correctly, referencing the AddDataGMX symbol in the
docstring.
Summary by CodeRabbit