add gromacs to frames node by MrJulEnergy · Pull Request #463 · zincware/IPSuite

MrJulEnergy · 2026-04-03T10:00:10Z

Summary by CodeRabbit

New Features
- Added GROMACS → ASE trajectory conversion producing per-frame atoms with positions, cell, velocities, energies, stresses and other EDR terms when available.
- New workflow node to run conversion, slice frames, persist results to a reusable frames file, and expose frames for downstream use.
- Package exports updated to surface the new data-loading capability and demo entry for quick inspection.

for more information, see https://pre-commit.ci

coderabbitai · 2026-04-03T10:02:23Z

📝 Walkthrough

Adds a GROMACS→ASE converter function gmx_to_ase() and a zntrack node AddDataGMX that build ASE Atoms frames (positions, cell/pbc, velocities), optionally attach per-frame energies/stress/terms from .edr samples matched by time, and persist frames to frames.h5.

Changes

GROMACS-to-ASE Conversion & ZnTrack Node

Layer / File(s)	Summary
Data Shape / Utilities `ipsuite/data_loading/add_data_gromacs.py`	Adds type→element mapping and `_get_symbols()` to derive per-atom chemical symbols. Introduces EDR time/term matching helpers and logging scaffolding.
Core Conversion `ipsuite/data_loading/add_data_gromacs.py`	Adds `gmx_to_ase(topology, trajectory=None, edr=None, start=None, stop=None, step=None)` that loads an MDAnalysis Universe, iterates selected timesteps, and constructs `ase.Atoms` with positions, optionally cell/pbc and velocities.
EDR Integration `ipsuite/data_loading/add_data_gromacs.py`	When an `.edr` is provided: loads EDR samples, finds nearest sample by timestamp (with tolerance warnings), reads `Potential` for energy, assembles a 6-component stress from `Pres-*` terms when available, and collects all per-frame EDR terms.
Calculator Attachment `ipsuite/data_loading/add_data_gromacs.py`	Attaches `ase.calculators.singlepoint.SinglePointCalculator` to frames when energy/forces/stress are present; populates `energy`, `forces`, `stress`, and stores extra EDR terms in `calc.results`.
Persistence / ZnTrack Node `ipsuite/data_loading/add_data_gromacs.py`	Adds `AddDataGMX(zntrack.Node)` with parameters (`topology`, `trajectory`, `edr`, `start`, `stop`, `step`, `frames_path`), `run()` calling `gmx_to_ase`, and persists frames using `znh5md.IO.extend()`. Adds a `frames` property to read back stored ASE frames.
Public API Export `ipsuite/__init__.pyi`, `ipsuite/data_loading/__init__.py`	Exports `AddDataGMX` from `ipsuite.data_loading` and updates `__all__` to include the new node.
Example / CLI `ipsuite/data_loading/add_data_gromacs.py`	Includes a `__main__` demo showing example usage and printing frame/EDR keys.

Sequence Diagram

sequenceDiagram
    actor User
    participant AddDataGMX as AddDataGMX\n(ZnTrack Node)
    participant gmx_to_ase as gmx_to_ase\n(Function)
    participant MDAnalysis as MDAnalysis\n(Universe / Trajectory)
    participant EDR as EDR Parser
    participant ASE as ASE\n(Atoms + SinglePointCalculator)
    participant HDF5 as HDF5\n(frames.h5)

    User->>AddDataGMX: run()
    AddDataGMX->>gmx_to_ase: gmx_to_ase(topology, trajectory, edr, start, stop, step)
    gmx_to_ase->>MDAnalysis: load topology & trajectory
    MDAnalysis-->>gmx_to_ase: Universe & timesteps

    loop per selected timestep
        gmx_to_ase->>MDAnalysis: read frame (positions, box, velocities)
        alt edr provided
            gmx_to_ase->>EDR: load EDR & find nearest sample by time
            EDR-->>gmx_to_ase: energy, pressure terms, all EDR terms
            gmx_to_ase->>gmx_to_ase: build 6-component stress vector
        end
        gmx_to_ase->>ASE: create Atoms, set positions/cell/pbc/velocities
        gmx_to_ase->>ASE: attach SinglePointCalculator (energy, forces, stress, results)
        ASE-->>gmx_to_ase: Atoms frame
    end

    gmx_to_ase-->>AddDataGMX: list[Atoms]
    AddDataGMX->>HDF5: persist frames via znh5md.IO.extend()
    HDF5-->>AddDataGMX: confirmation

    User->>AddDataGMX: .frames
    AddDataGMX->>HDF5: read frames
    HDF5-->>User: list[Atoms]

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through GROMACS trails by moonlight,

found atoms, boxes, velocities in flight,
matched EDR beats by nearest chime,
tucked energies and stresses in time,
frames snug in HDF5 — what a delight!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'add gromacs to frames node' accurately describes the main change: introducing a new AddDataGMX node class that converts GROMACS data to ASE frames.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (1)

ipsuite/data_loading/add_data_gromacs.py (1)

134-134: Consider warning when EDR and trajectory times diverge significantly.

The nearest-time matching via argmin silently succeeds even when the time difference is large (e.g., different output frequencies). Consider logging a warning if the matched time differs by more than a small tolerance.

Example tolerance check

idx = np.argmin(np.abs(edr_times - ts.time))
time_diff = abs(edr_times[idx] - ts.time)
if time_diff > 0.1:  # ps, adjust as appropriate
    warnings.warn(f"EDR time {edr_times[idx]:.3f} differs from trajectory time {ts.time:.3f} by {time_diff:.3f} ps")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ipsuite/data_loading/add_data_gromacs.py` at line 134, The nearest-time
matching using idx = np.argmin(np.abs(edr_times - ts.time)) can silently match
large time gaps; in the function or loop where that line appears (e.g., the
trajectory frame handling block in add_data_gromacs.py), compute time_diff =
abs(edr_times[idx] - ts.time) after selecting idx and if time_diff exceeds a
small tolerance (e.g., 0.1 ps, configurable) emit a warning (warnings.warn or
your logger.warn) indicating the EDR time, trajectory time, and the difference
so users know the outputs are misaligned; make the tolerance configurable or
documented.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Line 211: The run method in the class defined in add_data_gromacs.py is
missing a return type annotation; update the def run(self): signature to include
the appropriate return type (e.g., -> None or the actual return type used) so it
conforms with the project's full type annotation guideline, and update any
related docstring or callers if necessary; locate the method by the symbol run
in that file and add the correct type hint.
- Around line 202-207: The annotated types for node attributes are too narrow:
change trajectory and edr from Path to Optional[Path] and start, stop, step from
int to Optional[int] to reflect that zntrack.deps_path(None) /
zntrack.params(None) can return None; update the import to include
typing.Optional (e.g., from typing import Optional) and modify the annotations
for trajectory, edr, start, stop, and step accordingly while leaving topology as
Path.
- Around line 239-240: The example prints may raise an IndexError and shows the
wrong unit: guard access to frames[1] by checking len(frames) >= 2 before
referencing frames[1].calc.results (or else reference frame 0 consistently), and
fix the unit label by using the actual energy units returned by
frames[0].get_potential_energy() (or an energy_unit variable if present) instead
of hardcoding "kJ/mol" (use "eV" if energies are converted to eV); update the
two print lines to conditionally print the second-frame EDR terms only when a
second frame exists and to display the correct units from the energy
value/source.
- Line 135: The energy read from the EDR file (assigned to variable energy in
the line energy = float(edr_data["Potential"][idx])) is in kJ/mol and must be
converted to eV before passing to ASE's SinglePointCalculator; change the code
to convert from kJ/mol to eV (using ase.units, e.g. multiply/divide by units.kJ,
units.mol and units.eV) and supply the converted value to the
SinglePointCalculator so downstream ASE methods return correct energies.
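
The energy conversion requested in the last item can be sketched without any dependencies. The constants below are the exact CODATA 2018 defining values and the helper name `kj_per_mol_to_ev` is hypothetical; in the module itself the same factor would presumably come from `ase.units` (`kJ / mol`), whose bundled CODATA set may differ in the last digits.

```python
# Exact SI defining constants (CODATA 2018). Assumption: ase.units may ship
# a slightly different CODATA set, so the last digits can differ.
ELEMENTARY_CHARGE_C = 1.602176634e-19  # joules per eV
AVOGADRO_PER_MOL = 6.02214076e23

# One kJ/mol expressed in eV per particle.
KJ_PER_MOL_IN_EV = 1000.0 / (ELEMENTARY_CHARGE_C * AVOGADRO_PER_MOL)


def kj_per_mol_to_ev(energy_kj_per_mol: float) -> float:
    """Convert a GROMACS energy in kJ/mol to eV (hypothetical helper)."""
    return energy_kj_per_mol * KJ_PER_MOL_IN_EV


if __name__ == "__main__":
    print(f"1 kJ/mol = {kj_per_mol_to_ev(1.0):.6f} eV")
```

Converting once, at the point where the value is handed to `SinglePointCalculator`, keeps every downstream ASE call in eV while the raw kJ/mol number can still be stored separately.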



ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9d200379-0275-4ef7-a180-661cd8621105

📥 Commits

Reviewing files that changed from the base of the PR and between 236f4ac and 4584b12.

📒 Files selected for processing (1)

ipsuite/data_loading/add_data_gromacs.py

coderabbitai · 2026-04-03T10:07:26Z

+                # GROMACS pressure in bar -> store as-is (not ASE native eV/Å³)
+                stress = np.array([pxx, pyy, pzz, pyz, pxz, pxy])


⚠️ Potential issue | 🟠 Major

Stress units mismatch: GROMACS pressure is in bar, ASE expects eV/Å³.

The comment acknowledges the unit discrepancy, but storing stress in bar breaks ASE's stress conventions. Any code calling atoms.get_stress() will receive values in the wrong units, leading to incorrect virial/pressure calculations downstream.

Convert bar to eV/Å³: 1 bar = 1e-4 GPa = 6.2415e-7 eV/Å³ (use ase.units for precision).

Proposed fix

+from ase import units
+
+# bar -> eV/Å³ (ase.units.bar is already expressed in ASE's eV/Å³)
+BAR_TO_EV_ANG3 = units.bar / (units.eV / units.Ang**3)
+
 ...
-                # GROMACS pressure in bar -> store as-is (not ASE native eV/Å³)
-                stress = np.array([pxx, pyy, pzz, pyz, pxz, pxy])
+                # Convert GROMACS pressure (bar) to ASE stress (eV/Å³)
+                stress = np.array([pxx, pyy, pzz, pyz, pxz, pxy]) * BAR_TO_EV_ANG3
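
A dependency-free sketch for checking the bar to eV/Å³ factor quoted in this comment. The names `BAR_IN_EV_PER_A3` and `pressure_bar_to_voigt` are hypothetical; in the module itself `ase.units.bar` would supply the factor. ASE's stress is commonly taken as the negative of the pressure, but this thread does not settle whether the sign should be flipped here, so the sketch exposes that as an explicit option that is off by default.

```python
from typing import List, Sequence

ELEMENTARY_CHARGE_C = 1.602176634e-19  # joules per eV (exact, CODATA 2018)

# 1 bar = 1e5 J/m^3; 1 J = 1/e eV; 1 m^3 = 1e30 Å^3.
BAR_IN_EV_PER_A3 = 1e5 / ELEMENTARY_CHARGE_C / 1e30


def pressure_bar_to_voigt(
    pres: Sequence[Sequence[float]], flip_sign: bool = False
) -> List[float]:
    """Convert a 3x3 pressure tensor in bar to 6 Voigt components in eV/Å^3.

    Component order is xx, yy, zz, yz, xz, xy, as in the diff under review.
    flip_sign is an assumption-driven option, not something the PR confirms.
    """
    factor = -BAR_IN_EV_PER_A3 if flip_sign else BAR_IN_EV_PER_A3
    voigt_indices = [(0, 0), (1, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
    return [pres[i][j] * factor for i, j in voigt_indices]


if __name__ == "__main__":
    print(f"1 bar = {BAR_IN_EV_PER_A3:.4e} eV/Å^3")
```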

coderabbitai · 2026-04-03T10:12:00Z

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #464

@MrJulEnergy

Docstrings generation was requested by @MrJulEnergy.

* #463 (comment)

The following files were modified:

* `ipsuite/data_loading/add_data_gromacs.py`

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (1)

ipsuite/data_loading/add_data_gromacs.py (1)
156-165: ⚠️ Potential issue | 🟠 Major

Convert the pressure tensor before assigning stress.

ASE works in eV-based units, and its pressure/stress quantities are represented in eV/Å³; passing the raw Pres-* values here keeps atoms.get_stress() in bar instead. If you want to preserve the native tensor too, keep the bar numbers only under the Pres-* keys and convert the array you pass as stress. (ase-lib.org)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ipsuite/data_loading/add_data_gromacs.py` around lines 156 - 165, The code
currently assigns raw GROMACS Pres-* values (in bar) directly to stress, which
leaves ASE seeing stress in bar; instead, convert those pressure components to
ASE internal units (eV/Å³) before assigning to the stress array. Locate the
block reading edr_data["Pres-XX"/"Pres-YY"/..."] and replace the direct
assignment to stress with a conversion step (keep the raw bar values in edr_data
if you need them) — e.g., compute a conversion factor or use ASE unit helpers
(ase.units / ase.units.convert) to convert from bar to eV/Å³ and then set stress
= np.array([pxx_converted, pyy_converted, pzz_converted, pyz_converted,
pxz_converted, pxy_converted]) so atoms.get_stress() returns values in ASE
units.

🧹 Nitpick comments (1)

ipsuite/data_loading/add_data_gromacs.py (1)

210-219: Make the node example runnable end-to-end.

Right now the example stops after construction, so it isn't testable in the same way as the other data-loading nodes. Add project.repro() and one observable result/assertion.

📝 Suggested docstring tweak

     >>> with project:
     ...     md = ips.Gmx2Frames(
     ...         topology="gromacs/system.gro",
     ...         trajectory="gromacs/production.xtc",
     ...         edr="gromacs/production.edr",
     ...         start=1,
     ...     )
+    >>> project.repro()
+    >>> len(md.frames) > 0
+    True

As per coding guidelines: "Each Node’s docstring should include a testable example using project and ips".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ipsuite/data_loading/add_data_gromacs.py` around lines 210 - 219, The
docstring example for Gmx2Frames stops after construction and isn't runnable;
update it to call project.repro() and assert one observable outcome so it
executes end-to-end. Specifically, after constructing md = ips.Gmx2Frames(...),
invoke project.repro() and then check a result such as verifying md.frames (or
another public attribute/method on Gmx2Frames) has expected length/type (e.g.,
assert len(md.frames) > 0 or isinstance(md.frames, list/ndarray)) so the example
both runs and asserts behavior.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Around line 60-73: _match_edr_frame currently scans the whole edr_times array
per frame which is O(n×m) and silently returns an out-of-tolerance index;
replace the np.argmin approach with np.searchsorted on the monotonic edr_times
(or use a moving pointer maintained by the caller) to find the nearest candidate
on the left/right, compute the nearest index and time_diff, and if time_diff >
tolerance reject the match (raise a ValueError or return a sentinel like
None/-1) instead of returning a bad index; update callers to handle the
rejection and keep the function signature (_match_edr_frame(edr_times,
traj_time, tolerance)) and logging behavior.
- Around line 147-179: ts.forces are left in MDAnalysis units (kJ/(mol·Å)) but
ASE expects eV/Å; multiply the copied forces by the same conversion factor used
for energy (the (kJ / mol) factor) before passing into SinglePointCalculator.
Update the code that sets forces (the ts.forces.copy() branch) to convert units
(e.g., forces = ts.forces.copy() * (kJ / mol) when ts.has_forces) so the forces
variable passed to SinglePointCalculator(atoms, energy=..., forces=forces,
stress=...) is in eV/Å.
- Around line 44-57: The code currently derives element symbols from
u.atoms.types which misclassifies CHARMM/AMBER types; change the logic to use
atom names and MDAnalysis's element guessing instead: use u.atoms.names (or call
u.guess_TopologyAttrs(to_guess=['elements']) and read u.atoms.elements) and
build the symbols list from those guessed elements (or rely on
MDAnalysis.DefaultGuesser behavior) rather than transforming types; update the
block that references types, t, and symbols to use names/elements (keeping the
same return of symbols) and remove the manual type-based heuristics.
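
The first item above asks for a sorted-search match that rejects out-of-tolerance samples. A minimal sketch follows, using the standard library's `bisect` in place of `np.searchsorted` so that it runs without NumPy, and returning `None` as the rejection sentinel (one of the two options the comment names). The function name is hypothetical, `edr_times` is assumed to be sorted ascending, and the 0.1 ps default mirrors the earlier tolerance suggestion rather than anything confirmed in the code.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def match_edr_frame(
    edr_times: Sequence[float], traj_time: float, tolerance: float = 0.1
) -> Optional[int]:
    """Return the index of the EDR sample nearest to traj_time, or None."""
    if not edr_times:
        return None
    # Insertion point: first index whose time is >= traj_time.
    right = bisect_left(edr_times, traj_time)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = [i for i in (right - 1, right) if 0 <= i < len(edr_times)]
    best = min(candidates, key=lambda i: abs(edr_times[i] - traj_time))
    if abs(edr_times[best] - traj_time) > tolerance:
        return None  # reject instead of silently returning a bad index
    return best


if __name__ == "__main__":
    times = [0.0, 1.0, 2.0]
    print(match_edr_frame(times, 1.04))  # nearest sample is within tolerance
    print(match_edr_frame(times, 1.5))   # no sample within 0.1 ps
```

Each lookup costs O(log m) instead of a full scan, and the caller has to decide explicitly what a frame without a matching EDR sample should carry.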


🪄 Autofix (Beta)

✅ Autofix completed

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3211bbb4-e6f9-44fd-bc99-ce5c595a7702

📥 Commits

Reviewing files that changed from the base of the PR and between 4584b12 and c4a2475.

📒 Files selected for processing (1)

ipsuite/data_loading/add_data_gromacs.py

@MrJulEnergy

* 📝 Add docstrings to `addGmx2FramesNode`

Docstrings generation was requested by @MrJulEnergy.

* #463 (comment)

The following files were modified:

* `ipsuite/data_loading/add_data_gromacs.py`

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Julian Peters <108301472+MrJulEnergy@users.noreply.github.com>

coderabbitai · 2026-05-06T18:06:57Z

Note

Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

Fixes Applied Successfully

Fixed 1 file(s) based on 2 unresolved review comments.

A stacked PR containing fixes has been created.

Stacked PR: #468
Files modified:

ipsuite/data_loading/add_data_gromacs.py

Time taken: 2m 34s

Fixed 1 file(s) based on 2 unresolved review comments.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: CodeRabbit <noreply@coderabbit.ai>

for more information, see https://pre-commit.ci

coderabbitai

Actionable comments posted: 4

♻️ Duplicate comments (1)

ipsuite/data_loading/add_data_gromacs.py (1)

178-189: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Stress is still stored in bar, not ASE's eV/Å³.

The added comment acknowledges the mismatch, but anything calling atoms.get_stress() will silently get wrong-unit values. Either convert (bar → eV/Å³ via ase.units) or do not attach stress= and keep the raw pressure terms only inside extra_results.

Proposed fix

-from ase.units import kJ, mol
+from ase.units import Ang, bar, eV, kJ, mol
@@
-                # GROMACS pressure in bar -> store as-is (not ASE native eV/Å³)
-                stress = np.array([pxx, pyy, pzz, pyz, pxz, pxy])
+                # GROMACS pressure in bar -> ASE stress in eV/Å³
+                bar_to_eV_per_A3 = bar / (eV / Ang**3)
+                stress = (
+                    np.array([pxx, pyy, pzz, pyz, pxz, pxy]) * bar_to_eV_per_A3
+                )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ipsuite/data_loading/add_data_gromacs.py` around lines 178 - 189, The code is
attaching GROMACS pressures (in bar) as an ASE-style stress array (variable
stress built from edr_data[...] and idx) which will make atoms.get_stress()
return wrong units; either convert the values to ASE's eV/Å³ before attaching or
stop attaching them as stress and instead store the raw pressures in
extra_results. Fix by one of two options: (A) import ase.units and apply the
bar→eV/Å³ conversion to the stress array (convert the built stress variable)
before assigning it to the Atoms object so atoms.get_stress() is correct, or (B)
remove setting atoms.stress/stress= and instead place the raw pxx/pyy/... array
into the molecule/atoms.extra_results (e.g., "gromacs_pressure_bar") so callers
can convert explicitly; update references to atoms.get_stress() usage
accordingly.

🧹 Nitpick comments (2)

ipsuite/data_loading/add_data_gromacs.py (2)

252-283: 💤 Low value

run and frames docstrings use Google style, not NumPy.

The two methods use Google-style sections (Returns: / inline Parameters:). Per the project guideline, docstrings must be in NumPy style with ---- underlines.

Proposed fix

     def run(self) -> None:
-        """
-        Convert the configured GROMACS inputs into ASE Atoms frames and persist them
-        to the node's HDF5 output at self.frames_path.
-
-        The node's topology, optional trajectory, optional EDR file, and slicing
-        parameters (start, stop, step) are used to produce the frames which are
-        written to the frames_path via znh5md.
-        """
+        """Convert configured GROMACS inputs into ASE frames and persist them.
+
+        The frames are written to ``self.frames_path`` via ``znh5md``.
+        """
@@
     `@property`
     def frames(self) -> typing.List[Atoms]:
-        """
-        Return all ASE `Atoms` frames stored in the node's HDF5 frames file.
-
-        Returns:
-            typing.List[Atoms]: A list of ASE `Atoms` objects read
-            from the HDF5 file at `self.frames_path`.
-        """
+        """Return all ASE ``Atoms`` frames stored in the HDF5 frames file.
+
+        Returns
+        -------
+        list[Atoms]
+            Frames read from ``self.frames_path``.
+        """

As per coding guidelines: "Write docstrings in NumPy style".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ipsuite/data_loading/add_data_gromacs.py` around lines 252 - 283, Update the
docstrings for the methods run and frames to NumPy style: replace the
Google-style "Returns:" and inline parameter descriptions with NumPy sections
using section headers followed by underlines (e.g., "Returns\n-------") and
format parameter descriptions under a "Parameters\n----------" block where
applicable; keep the existing prose but reformat the return type line in frames
to "Returns\n-------\ntyping.List[Atoms]\n    description..." and do the same
for run's description so both docstrings follow the project's NumPy docstring
convention.

36-67: 💤 Low value

Approve with one defensive nit on the type fallback.

The u.atoms.elements first-pass plus type fallback addresses the prior CHARMM-GUI concern. Two small things on the fallback path:

Line 66: t[0].upper() will IndexError on an empty type string. Guard with t[0:1].upper() or "X".
The docstring uses Google-style (Parameters: / Returns:); convert to NumPy style for consistency.

As per coding guidelines: "Write docstrings in NumPy style".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ipsuite/data_loading/add_data_gromacs.py` around lines 36 - 67, The fallback
in _get_symbols can IndexError when an atom type string is empty and the
docstring style is inconsistent: update the fallback branch that currently uses
t[0].upper() to safely handle empty types (use a slice like t[0:1].upper() or a
default "X") and ensure the rest of the function still respects _TYPE_TO_ELEMENT
and types lookups; also convert the function docstring from Google-style
Parameters/Returns to the project's NumPy-style docstring format so it matches
coding guidelines.
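
The guarded fallback described above could look roughly like this. The mapping contents and the function name `symbols_from_types` are hypothetical, since the real `_TYPE_TO_ELEMENT` table and the `u.atoms.elements` first pass are not shown in this thread; only the `t[0:1].upper() or "X"` guard is taken from the review.

```python
from typing import Dict, Iterable, List

# Hypothetical mapping; the module's actual _TYPE_TO_ELEMENT is not shown here.
_TYPE_TO_ELEMENT: Dict[str, str] = {"OW": "O", "HW": "H", "CT": "C"}


def symbols_from_types(types: Iterable[str]) -> List[str]:
    """Map atom type strings to element symbols with a safe fallback.

    Parameters
    ----------
    types : Iterable[str]
        Atom type strings, possibly empty.

    Returns
    -------
    list[str]
        One symbol per type; "X" where nothing can be derived.
    """
    symbols = []
    for t in types:
        if t in _TYPE_TO_ELEMENT:
            symbols.append(_TYPE_TO_ELEMENT[t])
        else:
            # t[0:1] is "" for an empty string, so this never raises IndexError.
            symbols.append(t[0:1].upper() or "X")
    return symbols


if __name__ == "__main__":
    print(symbols_from_types(["OW", "", "n3"]))
```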

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Around line 191-193: The code currently writes raw GROMACS EDR values into
extra_results (via the loop over edr_terms/edr_data), which mixes units with ASE
keys; change the storage so values go under a namespaced dict (e.g.,
extra_results.setdefault('gmx_edr', {}) and then set
extra_results['gmx_edr'][term] = float(edr_data[term][idx])) to avoid colliding
with ASE standard keys, and update the function docstring to state that gmx_edr
contains raw GROMACS units (kJ/mol, bar, K, …). Ensure you check/initialize the
'gmx_edr' dict before the loop and do not modify existing ASE keys like
energy/forces/stress.
- Around line 154-160: Atoms are being constructed with pbc=True regardless of
whether a valid cell exists (ts.dimensions), causing inconsistent state; change
the flow so you only enable periodic boundary conditions when a valid box is
detected (use ts.dimensions -> box and the same all(box[:3] > 0) check) —
construct Atoms with pbc=False (or omit pbc) initially and then if box is valid
call atoms.set_cell(box, scale_atoms=False) and set atoms.pbc = True (or
construct with pbc=True at that point) so pbc reflects whether the cell was
actually set.
- Around line 232-241: The docstring example incorrectly instantiates
ips.Gmx2Frames which doesn't exist; change the example to use the real public
class AddDataGMX and ensure it is shown being used with project and ips (e.g.
ips.AddDataGMX(...)) so it is testable; update the example arguments to match
AddDataGMX's signature (topology, trajectory, edr, start) and keep the
surrounding "with project:" context so doctests run correctly, referencing the
AddDataGMX symbol in the docstring.
- Around line 162-165: The velocity conversion is wrong: MDAnalysis gives
velocities in Å/ps and you must convert to ASE's internal time units by
dividing, not multiplying. In the block using ts.has_velocities and
atoms.set_velocities(ts.velocities / 1000.0), change the conversion to divide by
both 1000.0 and ASE's fs constant (ase.units.fs) so velocities = ts.velocities /
(1000.0 * ase.units.fs); ensure ase.units (or from ase import units as
ase_units) is imported and used, and remove any inversion that multiplies by fs.
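
The velocity conversion in the last item can be checked numerically without ASE. This sketch rebuilds the value of `ase.units.fs` from CODATA 2018 constants (ASE's own constants may differ in the last digits); `ASE_FS` and `angstrom_per_ps_to_ase` are hypothetical names used only for illustration.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19  # exact, CODATA 2018
ATOMIC_MASS_KG = 1.66053906660e-27     # CODATA 2018

# ASE's time unit is Å * sqrt(u / eV), roughly 10.18 fs.
# One second expressed in that unit:
ASE_SECOND = 1e10 * math.sqrt(ELEMENTARY_CHARGE_C / ATOMIC_MASS_KG)
ASE_FS = 1e-15 * ASE_SECOND  # stands in for ase.units.fs


def angstrom_per_ps_to_ase(velocity: float) -> float:
    """Convert a velocity component from Å/ps to ASE internal units."""
    # 1 ps = 1000 fs = 1000 * ASE_FS internal time units, so divide.
    return velocity / (1000.0 * ASE_FS)


if __name__ == "__main__":
    print(f"ase.units.fs is approximately {ASE_FS:.6f}")
    print(f"1 Å/ps = {angstrom_per_ps_to_ase(1.0):.6f} in ASE units")
```

Dividing by 1000 alone, as the code under review does, yields Å/fs, which is about ten times smaller than the value ASE expects.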



ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: eb8dff76-fac6-4793-b6a4-c93bc9321871

📥 Commits

Reviewing files that changed from the base of the PR and between c4a2475 and 67b2ee2.

📒 Files selected for processing (3)

ipsuite/__init__.pyi
ipsuite/data_loading/__init__.py
ipsuite/data_loading/add_data_gromacs.py

coderabbitai · 2026-05-06T19:01:39Z

+        positions = ts.positions.copy()
+        box = ts.dimensions
+
+        atoms = Atoms(symbols=symbols, positions=positions, pbc=True)
+
+        if box is not None and all(box[:3] > 0):
+            atoms.set_cell(box, scale_atoms=False)


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

pbc=True is hardcoded even when there is no valid cell.

Atoms are created with pbc=True before checking whether box is valid. For non-periodic systems (vacuum, or dimensions returning zeros), this produces inconsistent state: PBC enabled with no cell. Tie pbc to whether the cell was actually set.

Proposed fix

-        atoms = Atoms(symbols=symbols, positions=positions, pbc=True)
-
-        if box is not None and all(box[:3] > 0):
-            atoms.set_cell(box, scale_atoms=False)
+        has_cell = box is not None and all(box[:3] > 0)
+        atoms = Atoms(symbols=symbols, positions=positions, pbc=has_cell)
+        if has_cell:
+            atoms.set_cell(box, scale_atoms=False)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate.

In `@ipsuite/data_loading/add_data_gromacs.py` around lines 154 - 160, Atoms are being constructed with pbc=True regardless of whether a valid cell exists (ts.dimensions), causing inconsistent state; change the flow so you only enable periodic boundary conditions when a valid box is detected (use ts.dimensions -> box and the same all(box[:3] > 0) check): construct Atoms with pbc=False (or omit pbc) initially and then if box is valid call atoms.set_cell(box, scale_atoms=False) and set atoms.pbc = True (or construct with pbc=True at that point) so pbc reflects whether the cell was actually set.

coderabbitai · 2026-05-06T19:01:39Z

+        # Velocities (e.g. from .gro or .trr)
+        if ts.has_velocities:
+            # MDAnalysis: Å/ps, ASE: Å/fs -> divide by 1000
+            atoms.set_velocities(ts.velocities / 1000.0)


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

ASE Atoms set_velocities expected units internal time unit ase.units.fs

💡 Result:

The ASE Atoms.set_velocities method expects velocities in ASE's internal units: length per time, where length is Ångström (Å = 1.0) and the time unit is Å * sqrt(u / eV) ≈ 10.18 fs (so 1 fs ≈ 0.0982 internal time units).[1][2] Velocities are therefore interpreted as v_internal = Δposition / Δtime, with positions in Å and time derived from ase.units (eV = 1, Å = 1, u = 1).[3][4][5] Kinetic energy is computed as sum(0.5 * m * v^2) in eV, which confirms that v has units of sqrt(eV / u) = Å / time_unit.[6] To set velocities given in Å/fs (e.g., from VASP), scale by the time unit conversion: v_ase = v_physical / ase.units.fs ≈ v_physical * 10.18, since the ASE time unit is ≈ 10.18 fs.[7][1] For example, 1 Å/fs requires ≈ 10.18 in ASE units to give the correct kinetic energy.[7]

Citations:

1: https://cmrdb-test.fysik.dtu.dk/ase/ase/units.html

2: https://wiki.fysik.dtu.dk/ase/_modules/ase/units.html

3: https://ase-lib.org/_modules/ase/atoms.html

4: https://ase-lib.org/ase/atoms.html

5: https://wiki.fysik.dtu.dk/ase/_modules/ase/atoms.html

6: https://docs.matlantis.com/atomistic-simulation-tutorial/en/1_5_ase_calculator.html

7: https://listserv.fysik.dtu.dk/pipermail/ase-users/2013-April/001771.html

🏁 Script executed:

# Find project configuration files to check ASE version
find . -maxdepth 2 -type f \( -name "setup.py" -o -name "setup.cfg" -o -name "pyproject.toml" -o -name "requirements.txt" -o -name "requirements*.txt" \) | head -20

Repository: zincware/IPSuite

Length of output: 77

🏁 Script executed:

# Read the actual file to see current imports and code
head -50 ipsuite/data_loading/add_data_gromacs.py

Repository: zincware/IPSuite

Length of output: 1189

🏁 Script executed:

# Check the exact lines mentioned in the review
sed -n '155,175p' ipsuite/data_loading/add_data_gromacs.py

Repository: zincware/IPSuite

Length of output: 820

🏁 Script executed:

# Read pyproject.toml to find ASE version pinning
cat pyproject.toml | grep -A 20 -i "ase\|dependencies"

Repository: zincware/IPSuite

Length of output: 1917

🏁 Script executed:

# Search for other uses of ase.units to see patterns
rg "from ase.units import|ase\.units\." --type py -B 2 -A 2

Repository: zincware/IPSuite

Length of output: 942

🏁 Script executed:

# Look for other velocity or momentum related code
rg "velocit|momentum|set_velocit" --type py -B 2 -A 2 | head -100

Repository: zincware/IPSuite

Length of output: 5529

🏁 Script executed:

# Search for tests related to gromacs loading
find . -name "*test*gromacs*" -o -name "*gromacs*test*" | head -10

Repository: zincware/IPSuite

Length of output: 42

🏁 Script executed:

# Check if there are any test files for data_loading
find . -path "*/test*" -name "*add_data_gromacs*" -o -path "*/test*" -name "*gromacs*"

Repository: zincware/IPSuite

Length of output: 42

🏁 Script executed:

# Look for any issue references or TODOs related to velocity
rg "TODO|FIXME|BUG|velocity.*unit|unit.*velocity" ipsuite/data_loading/add_data_gromacs.py

Repository: zincware/IPSuite

Length of output: 42

🏁 Script executed:

# Check what ase.units.fs actually equals by looking at ASE documentation or imports
python3 << 'EOF'
try:
    from ase import units
    print(f"ase.units.fs = {units.fs}")
    print(f"1 / ase.units.fs = {1 / units.fs}")
    print(f"ASE time unit in fs: {1/units.fs:.4f} fs")
except Exception as e:
    print(f"Error: {e}")
EOF

Repository: zincware/IPSuite

Length of output: 89

🏁 Script executed:

# Search for tests or examples using add_data_gromacs or trajectory loading
find . -type f -name "*.py" -exec grep -l "add_data_gromacs\|gromacs\|MDAnalysis.*trajectory" {} \; | head -20

Repository: zincware/IPSuite

Length of output: 180

🏁 Script executed:

# Look for any set_velocities calls to see if there are other patterns
rg "set_velocities|get_velocities" --type py -B 3 -A 3

Repository: zincware/IPSuite

Length of output: 1320

🏁 Script executed:

# Check if there are notebooks or examples demonstrating this
find . -name "*.ipynb" -o -name "*example*" -type f | grep -E "(gromacs|traj|velocity)" | head -10

Repository: zincware/IPSuite

Length of output: 42

🏁 Script executed:

# Look at the comment in the current code more carefully - what does it claim ASE expects?
sed -n '160,170p' ipsuite/data_loading/add_data_gromacs.py

Repository: zincware/IPSuite

Length of output: 533

Velocity unit conversion is incorrect — values are ~10× too small.

MDAnalysis stores velocities in Å/ps. The current conversion only changes this to Å/fs by dividing by 1000. However, ASE's internal velocity unit is Å / (ASE time unit) where the ASE time unit ≈ 10.18 fs (equivalently, ase.units.fs ≈ 0.0982). The current code is missing the conversion from Å/fs to ASE's internal units. This causes velocities to be ~10× too small, leading to kinetic energies that are ~100× too small (via 0.5·m·v²).

The proposed fix has the operation inverted—it multiplies by fs (a small number < 1), which would make the problem worse. The correct conversion should divide by fs:

Corrected fix

-from ase.units import kJ, mol
+from ase.units import fs, kJ, mol
@@
         if ts.has_velocities:
-            # MDAnalysis: Å/ps, ASE: Å/fs -> divide by 1000
-            atoms.set_velocities(ts.velocities / 1000.0)
+            # MDAnalysis: Å/ps -> ASE internal velocity (Å / ASE time unit)
+            atoms.set_velocities(ts.velocities / (1000.0 * fs))

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ipsuite/data_loading/add_data_gromacs.py` around lines 162 - 165, The velocity conversion is wrong: MDAnalysis gives velocities in Å/ps and you must convert to ASE's internal time units by dividing, not multiplying. In the block using ts.has_velocities and atoms.set_velocities(ts.velocities / 1000.0), change the conversion to divide by both 1000.0 and ASE's fs constant (ase.units.fs) so velocities = ts.velocities / (1000.0 * ase.units.fs); ensure ase.units (or from ase import units as ase_units) is imported and used, and remove any inversion that multiplies by fs.
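The size of the error can be checked with a few lines of plain Python. The sketch below rebuilds the femtosecond factor from physical constants instead of importing it, so it runs without ASE. It assumes the CODATA 2014 values that ASE uses by default; in the actual module one would import `fs` from `ase.units`. The names `FS` and `angstrom_per_ps_to_ase` are hypothetical.

```python
import math

# CODATA 2014 constants (assumption: these match ASE's default unit set).
ELEMENTARY_CHARGE = 1.6021766208e-19  # C
ATOMIC_MASS_UNIT = 1.660539040e-27    # kg

# One femtosecond expressed in ASE's internal time unit, Å * sqrt(u / eV).
# This should agree with ase.units.fs (about 0.0982).
FS = 1e-15 * 1e10 * math.sqrt(ELEMENTARY_CHARGE / ATOMIC_MASS_UNIT)


def angstrom_per_ps_to_ase(velocity: float) -> float:
    """Convert a velocity from Å/ps (MDAnalysis) to ASE internal units."""
    return velocity / (1000.0 * FS)


v_md = 5.0                                # Å/ps, an arbitrary example value
v_uncorrected = v_md / 1000.0             # what the reviewed code stored (Å/fs)
v_corrected = angstrom_per_ps_to_ase(v_md)

# Ratio of corrected to uncorrected velocity, and of the kinetic energies.
velocity_ratio = v_corrected / v_uncorrected
kinetic_energy_ratio = velocity_ratio ** 2
```

Under these assumptions the velocity ratio is `1 / FS`, roughly 10.18, and the kinetic energy ratio is its square, roughly 104, which matches the "~10×" and "~100×" figures in the comment above.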

coderabbitai · 2026-05-06T19:01:39Z

+            # Store all EDR terms for this frame
+            for term in edr_terms:
+                extra_results[term] = float(edr_data[term][idx])


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

extra_results keeps EDR values in raw GROMACS units.

All EDR terms (kJ/mol energies, bar pressures, K temperatures, …) are written verbatim into calc.results alongside ASE-unit energy/forces/stress. Mixing unit systems on the same calculator is a footgun for downstream consumers. At minimum, document this in the function docstring; preferably namespace these (e.g., a gmx_edr dict) so they cannot collide with ASE's standard keys (energy, forces, stress, dipole, …).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ipsuite/data_loading/add_data_gromacs.py` around lines 191 - 193, The code currently writes raw GROMACS EDR values into extra_results (via the loop over edr_terms/edr_data), which mixes units with ASE keys; change the storage so values go under a namespaced dict (e.g., extra_results.setdefault('gmx_edr', {}) and then set extra_results['gmx_edr'][term] = float(edr_data[term][idx])) to avoid colliding with ASE standard keys, and update the function docstring to state that gmx_edr contains raw GROMACS units (kJ/mol, bar, K, …). Ensure you check/initialize the 'gmx_edr' dict before the loop and do not modify existing ASE keys like energy/forces/stress.
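The namespacing idea can be illustrated with plain dictionaries. This is only a sketch: `namespace_edr_terms` is a hypothetical helper, the `gmx_edr` key follows the suggestion above, and the EDR data is assumed to be a mapping from term name to a per-sample sequence.

```python
from typing import Dict, Mapping, Sequence


def namespace_edr_terms(
    results: Mapping[str, object],
    edr_data: Mapping[str, Sequence[float]],
    idx: int,
    key: str = "gmx_edr",
) -> Dict[str, object]:
    """Return a copy of `results` with raw EDR terms stored under `key`.

    Values under `key` stay in raw GROMACS units (kJ/mol, bar, K, ...),
    so they cannot overwrite ASE-unit entries such as `energy`.
    """
    merged = dict(results)
    merged[key] = {term: float(values[idx]) for term, values in edr_data.items()}
    return merged


# Illustrative numbers only, not taken from a real simulation.
ase_results = {"energy": -12.5}  # eV
edr_samples = {
    "Potential": [-1200.0, -1206.0],  # kJ/mol
    "Temperature": [299.8, 300.4],    # K
}

frame_results = namespace_edr_terms(ase_results, edr_samples, idx=1)
```

Keeping the raw terms in a nested dictionary means a term such as `Potential` can never shadow a standard ASE key, and the unit system of each entry is clear from where it lives.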

coderabbitai · 2026-05-06T19:01:39Z

+    Examples
+    --------
+    >>> with project:
+    ...     md = ips.Gmx2Frames(
+    ...         topology="gromacs/system.gro",
+    ...         trajectory="gromacs/production.xtc",
+    ...         edr="gromacs/production.edr",
+    ...         start=1,
+    ...     )
+    """


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Docstring example uses a non-existent class name and is not testable.

The example references ips.Gmx2Frames(...), but the public symbol is AddDataGMX (also in __all__). The example would raise AttributeError. Per coding guidelines, the Node docstring must include a testable example using project and ips.

Proposed fix

     Examples
     --------
-    >>> with project:
-    ...     md = ips.Gmx2Frames(
-    ...         topology="gromacs/system.gro",
-    ...         trajectory="gromacs/production.xtc",
-    ...         edr="gromacs/production.edr",
-    ...         start=1,
-    ...     )
+    >>> project = ips.Project()
+    >>> with project:
+    ...     md = ips.AddDataGMX(
+    ...         topology="gromacs/system.gro",
+    ...         trajectory="gromacs/production.xtc",
+    ...         edr="gromacs/production.edr",
+    ...         start=1,
+    ...     )
+    >>> project.run()

As per coding guidelines: "Each Node's docstring should include a testable example using project and ips".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ipsuite/data_loading/add_data_gromacs.py` around lines 232 - 241, The docstring example incorrectly instantiates ips.Gmx2Frames which doesn't exist; change the example to use the real public class AddDataGMX and ensure it is shown being used with project and ips (e.g. ips.AddDataGMX(...)) so it is testable; update the example arguments to match AddDataGMX's signature (topology, trajectory, edr, start) and keep the surrounding "with project:" context so doctests run correctly, referencing the AddDataGMX symbol in the docstring.

MrJulEnergy and others added 2 commits April 3, 2026 11:59

add gromacs to frames node

2f21136

[pre-commit.ci] auto fixes from pre-commit.com hooks

4584b12

for more information, see https://pre-commit.ci

coderabbitai Bot reviewed Apr 3, 2026

coderabbitai Bot added a commit that referenced this pull request Apr 3, 2026

📝 Add docstrings to addGmx2FramesNode

9f805c0

Docstrings generation was requested by @MrJulEnergy. * #463 (comment) The following files were modified: * `ipsuite/data_loading/add_data_gromacs.py`

coderabbitai Bot mentioned this pull request Apr 3, 2026

📝 Add docstrings to addGmx2FramesNode #464

Merged

fix code rabbit remarks

c4a2475

coderabbitai Bot reviewed Apr 3, 2026

Comment thread ipsuite/data_loading/add_data_gromacs.py

Comment thread ipsuite/data_loading/add_data_gromacs.py

Comment thread ipsuite/data_loading/add_data_gromacs.py Outdated

coderabbitai Bot mentioned this pull request May 6, 2026

fix: CodeRabbit auto-fixes for PR #463 #468

Merged

coderabbitai Bot and others added 5 commits May 6, 2026 20:19

fix: apply CodeRabbit auto-fixes (#468)

d8207ed

Fixed 1 file(s) based on 2 unresolved review comments. Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: CodeRabbit <noreply@coderabbit.ai>

rename to AddDataGMX

c259ba8

pre commit cleanup

c695783

add new node to init

88fd54a

[pre-commit.ci] auto fixes from pre-commit.com hooks

67b2ee2

for more information, see https://pre-commit.ci

coderabbitai Bot reviewed May 6, 2026

		# GROMACS pressure in bar -> store as-is (not ASE native eV/Å³)
		stress = np.array([pxx, pyy, pzz, pyz, pxz, pxy])
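The snippet above keeps the pressure tensor in bar. If ASE-native stress were wanted instead, the conversion could look like the sketch below. It is not the PR's behavior: it assumes the CODATA 2014 elementary charge, the Voigt order (xx, yy, zz, yz, xz, xy) used above, and ASE's sign convention in which stress is the negative of the pressure tensor. The names `BAR_IN_EV_PER_A3` and `pressure_bar_to_ase_stress` are hypothetical.

```python
from typing import List, Sequence

# CODATA 2014 elementary charge (assumption: matches ASE's default units).
ELEMENTARY_CHARGE = 1.6021766208e-19  # C, i.e. joules per eV

# 1 bar = 1e5 J/m^3; divide by J/eV and multiply by m^3 per Å^3 (1e-30).
# This should agree with ase.units.bar (about 6.24e-7 eV/Å^3).
BAR_IN_EV_PER_A3 = 1e5 / ELEMENTARY_CHARGE * 1e-30


def pressure_bar_to_ase_stress(pressure_voigt_bar: Sequence[float]) -> List[float]:
    """Convert a Voigt pressure tensor in bar to an ASE-style stress in eV/Å^3.

    The sign flips because positive pressure corresponds to negative stress
    in ASE's convention.
    """
    return [-p * BAR_IN_EV_PER_A3 for p in pressure_voigt_bar]


# Illustrative values only: roughly 1 bar isotropic, no shear.
pressure = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
stress = pressure_bar_to_ase_stress(pressure)
```

Whichever unit is chosen, stating it in the function docstring avoids downstream code silently treating bar values as eV/Å³.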
