Skip to content

add gromacs to frames node#463

Open
MrJulEnergy wants to merge 9 commits into
mainfrom
addGmx2FramesNode
Open

add gromacs to frames node#463
MrJulEnergy wants to merge 9 commits into
mainfrom
addGmx2FramesNode

Conversation

@MrJulEnergy

@MrJulEnergy MrJulEnergy commented Apr 3, 2026

Copy link
Copy Markdown
Contributor

Summary by CodeRabbit

  • New Features
    • Added GROMACS → ASE trajectory conversion producing per-frame atoms with positions, cell, velocities, energies, stresses and other EDR terms when available.
    • New workflow node to run conversion, slice frames, persist results to a reusable frames file, and expose frames for downstream use.
    • Package exports updated to surface the new data-loading capability and demo entry for quick inspection.

@coderabbitai

coderabbitai Bot commented Apr 3, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

Adds a GROMACS→ASE converter function gmx_to_ase() and a zntrack node AddDataGMX that build ASE Atoms frames (positions, cell/pbc, velocities), optionally attach per-frame energies/stress/terms from .edr samples matched by time, and persist frames to frames.h5.

Changes

GROMACS-to-ASE Conversion & ZnTrack Node

Layer / File(s) Summary
Data Shape / Utilities
ipsuite/data_loading/add_data_gromacs.py
Adds type→element mapping and _get_symbols() to derive per-atom chemical symbols. Introduces EDR time/term matching helpers and logging scaffolding.
Core Conversion
ipsuite/data_loading/add_data_gromacs.py
Adds gmx_to_ase(topology, trajectory=None, edr=None, start=None, stop=None, step=None) that loads an MDAnalysis Universe, iterates selected timesteps, and constructs ase.Atoms with positions, optionally cell/pbc and velocities.
EDR Integration
ipsuite/data_loading/add_data_gromacs.py
When an .edr is provided: loads EDR samples, finds nearest sample by timestamp (with tolerance warnings), reads Potential for energy, assembles a 6-component stress from Pres-* terms when available, and collects all per-frame EDR terms.
Calculator Attachment
ipsuite/data_loading/add_data_gromacs.py
Attaches ase.calculators.singlepoint.SinglePointCalculator to frames when energy/forces/stress are present; populates energy, forces, stress, and stores extra EDR terms in calc.results.
Persistence / ZnTrack Node
ipsuite/data_loading/add_data_gromacs.py
Adds AddDataGMX(zntrack.Node) with parameters (topology, trajectory, edr, start, stop, step, frames_path), run() calling gmx_to_ase, and persists frames using znh5md.IO.extend(). Adds a frames property to read back stored ASE frames.
Public API Export
ipsuite/__init__.pyi, ipsuite/data_loading/__init__.py
Exports AddDataGMX from ipsuite.data_loading and updates __all__ to include the new node.
Example / CLI
ipsuite/data_loading/add_data_gromacs.py
Includes a __main__ demo showing example usage and printing frame/EDR keys.

Sequence Diagram

sequenceDiagram
    actor User
    participant AddDataGMX as AddDataGMX\n(ZnTrack Node)
    participant gmx_to_ase as gmx_to_ase\n(Function)
    participant MDAnalysis as MDAnalysis\n(Universe / Trajectory)
    participant EDR as EDR Parser
    participant ASE as ASE\n(Atoms + SinglePointCalculator)
    participant HDF5 as HDF5\n(frames.h5)

    User->>AddDataGMX: run()
    AddDataGMX->>gmx_to_ase: gmx_to_ase(topology, trajectory, edr, start, stop, step)
    gmx_to_ase->>MDAnalysis: load topology & trajectory
    MDAnalysis-->>gmx_to_ase: Universe & timesteps

    loop per selected timestep
        gmx_to_ase->>MDAnalysis: read frame (positions, box, velocities)
        alt edr provided
            gmx_to_ase->>EDR: load EDR & find nearest sample by time
            EDR-->>gmx_to_ase: energy, pressure terms, all EDR terms
            gmx_to_ase->>gmx_to_ase: build 6-component stress vector
        end
        gmx_to_ase->>ASE: create Atoms, set positions/cell/pbc/velocities
        gmx_to_ase->>ASE: attach SinglePointCalculator (energy, forces, stress, results)
        ASE-->>gmx_to_ase: Atoms frame
    end

    gmx_to_ase-->>AddDataGMX: list[Atoms]
    AddDataGMX->>HDF5: persist frames via znh5md.IO.extend()
    HDF5-->>AddDataGMX: confirmation

    User->>AddDataGMX: .frames
    AddDataGMX->>HDF5: read frames
    HDF5-->>User: list[Atoms]
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through GROMACS trails by moonlight,

found atoms, boxes, velocities in flight,
matched EDR beats by nearest chime,
tucked energies and stresses in time,
frames snug in HDF5 — what a delight!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'add gromacs to frames node' accurately describes the main change: introducing a new AddDataGMX node class that converts GROMACS data to ASE frames.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch addGmx2FramesNode

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (1)
ipsuite/data_loading/add_data_gromacs.py (1)

134-134: Consider warning when EDR and trajectory times diverge significantly.

The nearest-time matching via argmin silently succeeds even when the time difference is large (e.g., different output frequencies). Consider logging a warning if the matched time differs by more than a small tolerance.

Example tolerance check
idx = np.argmin(np.abs(edr_times - ts.time))
time_diff = abs(edr_times[idx] - ts.time)
if time_diff > 0.1:  # ps, adjust as appropriate
    warnings.warn(f"EDR time {edr_times[idx]:.3f} differs from trajectory time {ts.time:.3f} by {time_diff:.3f} ps")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ipsuite/data_loading/add_data_gromacs.py` at line 134, The nearest-time
matching using idx = np.argmin(np.abs(edr_times - ts.time)) can silently match
large time gaps; in the function or loop where that line appears (e.g., the
trajectory frame handling block in add_data_gromacs.py), compute time_diff =
abs(edr_times[idx] - ts.time) after selecting idx and if time_diff exceeds a
small tolerance (e.g., 0.1 ps, configurable) emit a warning (warnings.warn or
your logger.warn) indicating the EDR time, trajectory time, and the difference
so users know the outputs are misaligned; make the tolerance configurable or
documented.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Line 211: The run method in the class defined in add_data_gromacs.py is
missing a return type annotation; update the def run(self): signature to include
the appropriate return type (e.g., -> None or the actual return type used) so it
conforms with the project's full type annotation guideline, and update any
related docstring or callers if necessary; locate the method by the symbol run
in that file and add the correct type hint.
- Around line 202-207: The annotated types for node attributes are too narrow:
change trajectory and edr from Path to Optional[Path] and start, stop, step from
int to Optional[int] to reflect that zntrack.deps_path(None) /
zntrack.params(None) can return None; update the import to include
typing.Optional (e.g., from typing import Optional) and modify the annotations
for trajectory, edr, start, stop, and step accordingly while leaving topology as
Path.
- Around line 239-240: The example prints may raise an IndexError and shows the
wrong unit: guard access to frames[1] by checking len(frames) >= 2 before
referencing frames[1].calc.results (or else reference frame 0 consistently), and
fix the unit label by using the actual energy units returned by
frames[0].get_potential_energy() (or an energy_unit variable if present) instead
of hardcoding "kJ/mol" (use "eV" if energies are converted to eV); update the
two print lines to conditionally print the second-frame EDR terms only when a
second frame exists and to display the correct units from the energy
value/source.
- Line 135: The energy read from the EDR file (assigned to variable energy in
the line energy = float(edr_data["Potential"][idx])) is in kJ/mol and must be
converted to eV before passing to ASE's SinglePointCalculator; change the code
to convert from kJ/mol to eV (using ase.units, e.g. multiply/divide by units.kJ,
units.mol and units.eV) and supply the converted value to the
SinglePointCalculator so downstream ASE methods return correct energies.

---

Nitpick comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Line 134: The nearest-time matching using idx = np.argmin(np.abs(edr_times -
ts.time)) can silently match large time gaps; in the function or loop where that
line appears (e.g., the trajectory frame handling block in add_data_gromacs.py),
compute time_diff = abs(edr_times[idx] - ts.time) after selecting idx and if
time_diff exceeds a small tolerance (e.g., 0.1 ps, configurable) emit a warning
(warnings.warn or your logger.warn) indicating the EDR time, trajectory time,
and the difference so users know the outputs are misaligned; make the tolerance
configurable or documented.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9d200379-0275-4ef7-a180-661cd8621105

📥 Commits

Reviewing files that changed from the base of the PR and between 236f4ac and 4584b12.

📒 Files selected for processing (1)
  • ipsuite/data_loading/add_data_gromacs.py

Comment thread ipsuite/data_loading/add_data_gromacs.py Outdated
Comment on lines +145 to +146
# GROMACS pressure in bar -> store as-is (not ASE native eV/ų)
stress = np.array([pxx, pyy, pzz, pyz, pxz, pxy])

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Stress units mismatch: GROMACS pressure is in bar, ASE expects eV/ų.

The comment acknowledges the unit discrepancy, but storing stress in bar breaks ASE's stress conventions. Any code calling atoms.get_stress() will receive values in the wrong units, leading to incorrect virial/pressure calculations downstream.

Convert bar to eV/ų: 1 bar = 1e-4 GPa = 6.2415e-7 eV/ų (use ase.units for precision).

Proposed fix
+from ase import units
+
+# bar -> eV/ų
+BAR_TO_EV_ANG3 = 1.0 / (units.bar / (units.eV / units.Ang**3))
+
 ...
-                # GROMACS pressure in bar -> store as-is (not ASE native eV/ų)
-                stress = np.array([pxx, pyy, pzz, pyz, pxz, pxy])
+                # Convert GROMACS pressure (bar) to ASE stress (eV/ų)
+                stress = np.array([pxx, pyy, pzz, pyz, pxz, pxy]) * BAR_TO_EV_ANG3

Comment thread ipsuite/data_loading/add_data_gromacs.py Outdated
Comment thread ipsuite/data_loading/add_data_gromacs.py Outdated
Comment thread ipsuite/data_loading/add_data_gromacs.py Outdated
@coderabbitai

coderabbitai Bot commented Apr 3, 2026

Copy link
Copy Markdown

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #464

coderabbitai Bot added a commit that referenced this pull request Apr 3, 2026
Docstrings generation was requested by @MrJulEnergy.

* #463 (comment)

The following files were modified:

* `ipsuite/data_loading/add_data_gromacs.py`

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
ipsuite/data_loading/add_data_gromacs.py (1)

156-165: ⚠️ Potential issue | 🟠 Major

Convert the pressure tensor before assigning stress.

ASE works in eV-based units, and its pressure/stress quantities are represented in eV/ų; passing the raw Pres-* values here keeps atoms.get_stress() in bar instead. If you want to preserve the native tensor too, keep the bar numbers only under the Pres-* keys and convert the array you pass as stress. (ase-lib.org)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ipsuite/data_loading/add_data_gromacs.py` around lines 156 - 165, The code
currently assigns raw GROMACS Pres-* values (in bar) directly to stress, which
leaves ASE seeing stress in bar; instead, convert those pressure components to
ASE internal units (eV/ų) before assigning to the stress array. Locate the
block reading edr_data["Pres-XX"/"Pres-YY"/..."] and replace the direct
assignment to stress with a conversion step (keep the raw bar values in edr_data
if you need them) — e.g., compute a conversion factor or use ASE unit helpers
(ase.units / ase.units.convert) to convert from bar to eV/ų and then set stress
= np.array([pxx_converted, pyy_converted, pzz_converted, pyz_converted,
pxz_converted, pxy_converted]) so atoms.get_stress() returns values in ASE
units.
🧹 Nitpick comments (1)
ipsuite/data_loading/add_data_gromacs.py (1)

210-219: Make the node example runnable end-to-end.

Right now the example stops after construction, so it isn't testable in the same way as the other data-loading nodes. Add project.repro() and one observable result/assertion.

📝 Suggested docstring tweak
     >>> with project:
     ...     md = ips.Gmx2Frames(
     ...         topology="gromacs/system.gro",
     ...         trajectory="gromacs/production.xtc",
     ...         edr="gromacs/production.edr",
     ...         start=1,
     ...     )
+    >>> project.repro()
+    >>> len(md.frames) > 0
+    True

As per coding guidelines: "Each Node’s docstring should include a testable example using project and ips".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ipsuite/data_loading/add_data_gromacs.py` around lines 210 - 219, The
docstring example for Gmx2Frames stops after construction and isn't runnable;
update it to call project.repro() and assert one observable outcome so it
executes end-to-end. Specifically, after constructing md = ips.Gmx2Frames(...),
invoke project.repro() and then check a result such as verifying md.frames (or
another public attribute/method on Gmx2Frames) has expected length/type (e.g.,
assert len(md.frames) > 0 or isinstance(md.frames, list/ndarray)) so the example
both runs and asserts behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Around line 60-73: _match_edr_frame currently scans the whole edr_times array
per frame which is O(n×m) and silently returns an out-of-tolerance index;
replace the np.argmin approach with np.searchsorted on the monotonic edr_times
(or use a moving pointer maintained by the caller) to find the nearest candidate
on the left/right, compute the nearest index and time_diff, and if time_diff >
tolerance reject the match (raise a ValueError or return a sentinel like
None/-1) instead of returning a bad index; update callers to handle the
rejection and keep the function signature (_match_edr_frame(edr_times,
traj_time, tolerance)) and logging behavior.
- Around line 147-179: ts.forces are left in MDAnalysis units (kJ/(mol·Å)) but
ASE expects eV/Å; multiply the copied forces by the same conversion factor used
for energy (the (kJ / mol) factor) before passing into SinglePointCalculator.
Update the code that sets forces (the ts.forces.copy() branch) to convert units
(e.g., forces = ts.forces.copy() * (kJ / mol) when ts.has_forces) so the forces
variable passed to SinglePointCalculator(atoms, energy=..., forces=forces,
stress=...) is in eV/Å.
- Around line 44-57: The code currently derives element symbols from
u.atoms.types which misclassifies CHARMM/AMBER types; change the logic to use
atom names and MDAnalysis's element guessing instead: use u.atoms.names (or call
u.guess_TopologyAttrs(to_guess=['elements']) and read u.atoms.elements) and
build the symbols list from those guessed elements (or rely on
MDAnalysis.DefaultGuesser behavior) rather than transforming types; update the
block that references types, t, and symbols to use names/elements (keeping the
same return of symbols) and remove the manual type-based heuristics.

---

Duplicate comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Around line 156-165: The code currently assigns raw GROMACS Pres-* values (in
bar) directly to stress, which leaves ASE seeing stress in bar; instead, convert
those pressure components to ASE internal units (eV/ų) before assigning to the
stress array. Locate the block reading edr_data["Pres-XX"/"Pres-YY"/..."] and
replace the direct assignment to stress with a conversion step (keep the raw bar
values in edr_data if you need them) — e.g., compute a conversion factor or use
ASE unit helpers (ase.units / ase.units.convert) to convert from bar to eV/ų
and then set stress = np.array([pxx_converted, pyy_converted, pzz_converted,
pyz_converted, pxz_converted, pxy_converted]) so atoms.get_stress() returns
values in ASE units.

---

Nitpick comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Around line 210-219: The docstring example for Gmx2Frames stops after
construction and isn't runnable; update it to call project.repro() and assert
one observable outcome so it executes end-to-end. Specifically, after
constructing md = ips.Gmx2Frames(...), invoke project.repro() and then check a
result such as verifying md.frames (or another public attribute/method on
Gmx2Frames) has expected length/type (e.g., assert len(md.frames) > 0 or
isinstance(md.frames, list/ndarray)) so the example both runs and asserts
behavior.
🪄 Autofix (Beta)

✅ Autofix completed


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3211bbb4-e6f9-44fd-bc99-ce5c595a7702

📥 Commits

Reviewing files that changed from the base of the PR and between 4584b12 and c4a2475.

📒 Files selected for processing (1)
  • ipsuite/data_loading/add_data_gromacs.py

Comment thread ipsuite/data_loading/add_data_gromacs.py
Comment thread ipsuite/data_loading/add_data_gromacs.py
Comment thread ipsuite/data_loading/add_data_gromacs.py Outdated
* 📝 Add docstrings to `addGmx2FramesNode`

Docstrings generation was requested by @MrJulEnergy.

* #463 (comment)

The following files were modified:

* `ipsuite/data_loading/add_data_gromacs.py`

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Julian Peters <108301472+MrJulEnergy@users.noreply.github.com>
@coderabbitai

coderabbitai Bot commented May 6, 2026

Copy link
Copy Markdown

Note

Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

Fixes Applied Successfully

Fixed 1 file(s) based on 2 unresolved review comments.

A stacked PR containing fixes has been created.

  • Stacked PR: #468
  • Files modified:
  • ipsuite/data_loading/add_data_gromacs.py

Time taken: 2m 34s

coderabbitai Bot and others added 5 commits May 6, 2026 20:19
Fixed 1 file(s) based on 2 unresolved review comments.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: CodeRabbit <noreply@coderabbit.ai>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
ipsuite/data_loading/add_data_gromacs.py (1)

178-189: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Stress is still stored in bar, not ASE's eV/ų.

The added comment acknowledges the mismatch, but anything calling atoms.get_stress() will silently get wrong-unit values. Either convert (bar → eV/ų via ase.units) or do not attach stress= and keep the raw pressure terms only inside extra_results.

Proposed fix
-from ase.units import kJ, mol
+from ase.units import Ang, bar, eV, kJ, mol
@@
-                # GROMACS pressure in bar -> store as-is (not ASE native eV/ų)
-                stress = np.array([pxx, pyy, pzz, pyz, pxz, pxy])
+                # GROMACS pressure in bar -> ASE stress in eV/ų
+                bar_to_eV_per_A3 = bar / (eV / Ang**3)
+                stress = (
+                    np.array([pxx, pyy, pzz, pyz, pxz, pxy]) * bar_to_eV_per_A3
+                )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ipsuite/data_loading/add_data_gromacs.py` around lines 178 - 189, The code is
attaching GROMACS pressures (in bar) as an ASE-style stress array (variable
stress built from edr_data[...] and idx) which will make atoms.get_stress()
return wrong units; either convert the values to ASE's eV/ų before attaching or
stop attaching them as stress and instead store the raw pressures in
extra_results. Fix by one of two options: (A) import ase.units and apply the
bar→eV/ų conversion to the stress array (convert the built stress variable)
before assigning it to the Atoms object so atoms.get_stress() is correct, or (B)
remove setting atoms.stress/stress= and instead place the raw pxx/pyy/... array
into the molecule/atoms.extra_results (e.g., "gromacs_pressure_bar") so callers
can convert explicitly; update references to atoms.get_stress() usage
accordingly.
🧹 Nitpick comments (2)
ipsuite/data_loading/add_data_gromacs.py (2)

252-283: 💤 Low value

run and frames docstrings use Google style, not NumPy.

The two methods use Google-style sections (Returns: / inline Parameters:). Per the project guideline, docstrings must be in NumPy style with ---- underlines.

Proposed fix
     def run(self) -> None:
-        """
-        Convert the configured GROMACS inputs into ASE Atoms frames and persist them
-        to the node's HDF5 output at self.frames_path.
-
-        The node's topology, optional trajectory, optional EDR file, and slicing
-        parameters (start, stop, step) are used to produce the frames which are
-        written to the frames_path via znh5md.
-        """
+        """Convert configured GROMACS inputs into ASE frames and persist them.
+
+        The frames are written to ``self.frames_path`` via ``znh5md``.
+        """
@@
     `@property`
     def frames(self) -> typing.List[Atoms]:
-        """
-        Return all ASE `Atoms` frames stored in the node's HDF5 frames file.
-
-        Returns:
-            typing.List[Atoms]: A list of ASE `Atoms` objects read
-            from the HDF5 file at `self.frames_path`.
-        """
+        """Return all ASE ``Atoms`` frames stored in the HDF5 frames file.
+
+        Returns
+        -------
+        list[Atoms]
+            Frames read from ``self.frames_path``.
+        """

As per coding guidelines: "Write docstrings in NumPy style".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ipsuite/data_loading/add_data_gromacs.py` around lines 252 - 283, Update the
docstrings for the methods run and frames to NumPy style: replace the
Google-style "Returns:" and inline parameter descriptions with NumPy sections
using section headers followed by underlines (e.g., "Returns\n-------") and
format parameter descriptions under a "Parameters\n----------" block where
applicable; keep the existing prose but reformat the return type line in frames
to "Returns\n-------\ntyping.List[Atoms]\n    description..." and do the same
for run's description so both docstrings follow the project's NumPy docstring
convention.

36-67: 💤 Low value

Approve with one defensive nit on the type fallback.

The u.atoms.elements first-pass plus type fallback addresses the prior CHARMM-GUI concern. Two small things on the fallback path:

  • Line 66: t[0].upper() will IndexError on an empty type string. Guard with t[0:1].upper() or "X".
  • The docstring uses Google-style (Parameters: / Returns:); convert to NumPy style for consistency.

As per coding guidelines: "Write docstrings in NumPy style".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ipsuite/data_loading/add_data_gromacs.py` around lines 36 - 67, The fallback
in _get_symbols can IndexError when an atom type string is empty and the
docstring style is inconsistent: update the fallback branch that currently uses
t[0].upper() to safely handle empty types (use a slice like t[0:1].upper() or a
default "X") and ensure the rest of the function still respects _TYPE_TO_ELEMENT
and types lookups; also convert the function docstring from Google-style
Parameters/Returns to the project's NumPy-style docstring format so it matches
coding guidelines.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Around line 191-193: The code currently writes raw GROMACS EDR values into
extra_results (via the loop over edr_terms/edr_data), which mixes units with ASE
keys; change the storage so values go under a namespaced dict (e.g.,
extra_results.setdefault('gmx_edr', {}) and then set
extra_results['gmx_edr'][term] = float(edr_data[term][idx])) to avoid colliding
with ASE standard keys, and update the function docstring to state that gmx_edr
contains raw GROMACS units (kJ/mol, bar, K, …). Ensure you check/initialize the
'gmx_edr' dict before the loop and do not modify existing ASE keys like
energy/forces/stress.
- Around line 154-160: Atoms are being constructed with pbc=True regardless of
whether a valid cell exists (ts.dimensions), causing inconsistent state; change
the flow so you only enable periodic boundary conditions when a valid box is
detected (use ts.dimensions -> box and the same all(box[:3] > 0) check) —
construct Atoms with pbc=False (or omit pbc) initially and then if box is valid
call atoms.set_cell(box, scale_atoms=False) and set atoms.pbc = True (or
construct with pbc=True at that point) so pbc reflects whether the cell was
actually set.
- Around line 232-241: The docstring example incorrectly instantiates
ips.Gmx2Frames which doesn't exist; change the example to use the real public
class AddDataGMX and ensure it is shown being used with project and ips (e.g.
ips.AddDataGMX(...)) so it is testable; update the example arguments to match
AddDataGMX's signature (topology, trajectory, edr, start) and keep the
surrounding "with project:" context so doctests run correctly, referencing the
AddDataGMX symbol in the docstring.
- Around line 162-165: The velocity conversion is wrong: MDAnalysis gives
velocities in Å/ps and you must convert to ASE's internal time units by
dividing, not multiplying. In the block using ts.has_velocities and
atoms.set_velocities(ts.velocities / 1000.0), change the conversion to divide by
both 1000.0 and ASE's fs constant (ase.units.fs) so velocities = ts.velocities /
(1000.0 * ase.units.fs); ensure ase.units (or from ase import units as
ase_units) is imported and used, and remove any inversion that multiplies by fs.

---

Duplicate comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Around line 178-189: The code is attaching GROMACS pressures (in bar) as an
ASE-style stress array (variable stress built from edr_data[...] and idx) which
will make atoms.get_stress() return wrong units; either convert the values to
ASE's eV/ų before attaching or stop attaching them as stress and instead store
the raw pressures in extra_results. Fix by one of two options: (A) import
ase.units and apply the bar→eV/ų conversion to the stress array (convert the
built stress variable) before assigning it to the Atoms object so
atoms.get_stress() is correct, or (B) remove setting atoms.stress/stress= and
instead place the raw pxx/pyy/... array into the molecule/atoms.extra_results
(e.g., "gromacs_pressure_bar") so callers can convert explicitly; update
references to atoms.get_stress() usage accordingly.

---

Nitpick comments:
In `@ipsuite/data_loading/add_data_gromacs.py`:
- Around line 252-283: Update the docstrings for the methods run and frames to
NumPy style: replace the Google-style "Returns:" and inline parameter
descriptions with NumPy sections using section headers followed by underlines
(e.g., "Returns\n-------") and format parameter descriptions under a
"Parameters\n----------" block where applicable; keep the existing prose but
reformat the return type line in frames to
"Returns\n-------\ntyping.List[Atoms]\n    description..." and do the same for
run's description so both docstrings follow the project's NumPy docstring
convention.
- Around line 36-67: The fallback in _get_symbols can IndexError when an atom
type string is empty and the docstring style is inconsistent: update the
fallback branch that currently uses t[0].upper() to safely handle empty types
(use a slice like t[0:1].upper() or a default "X") and ensure the rest of the
function still respects _TYPE_TO_ELEMENT and types lookups; also convert the
function docstring from Google-style Parameters/Returns to the project's
NumPy-style docstring format so it matches coding guidelines.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: eb8dff76-fac6-4793-b6a4-c93bc9321871

📥 Commits

Reviewing files that changed from the base of the PR and between c4a2475 and 67b2ee2.

📒 Files selected for processing (3)
  • ipsuite/__init__.pyi
  • ipsuite/data_loading/__init__.py
  • ipsuite/data_loading/add_data_gromacs.py

Comment on lines +154 to +160
positions = ts.positions.copy()
box = ts.dimensions

atoms = Atoms(symbols=symbols, positions=positions, pbc=True)

if box is not None and all(box[:3] > 0):
atoms.set_cell(box, scale_atoms=False)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

pbc=True is hardcoded even when there is no valid cell.

Atoms are created with pbc=True before checking whether box is valid. For non-periodic systems (vacuum, or dimensions returning zeros), this produces inconsistent state: PBC enabled with no cell. Tie pbc to whether the cell was actually set.

Proposed fix
-        atoms = Atoms(symbols=symbols, positions=positions, pbc=True)
-
-        if box is not None and all(box[:3] > 0):
-            atoms.set_cell(box, scale_atoms=False)
+        has_cell = box is not None and all(box[:3] > 0)
+        atoms = Atoms(symbols=symbols, positions=positions, pbc=has_cell)
+        if has_cell:
+            atoms.set_cell(box, scale_atoms=False)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ipsuite/data_loading/add_data_gromacs.py` around lines 154 - 160, Atoms are
being constructed with pbc=True regardless of whether a valid cell exists
(ts.dimensions), causing inconsistent state; change the flow so you only enable
periodic boundary conditions when a valid box is detected (use ts.dimensions ->
box and the same all(box[:3] > 0) check) — construct Atoms with pbc=False (or
omit pbc) initially and then if box is valid call atoms.set_cell(box,
scale_atoms=False) and set atoms.pbc = True (or construct with pbc=True at that
point) so pbc reflects whether the cell was actually set.

Comment on lines +162 to +165
# Velocities (e.g. from .gro or .trr)
if ts.has_velocities:
# MDAnalysis: Å/ps, ASE: Å/fs -> divide by 1000
atoms.set_velocities(ts.velocities / 1000.0)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

ASE Atoms set_velocities expected units internal time unit ase.units.fs

💡 Result:

The ASE Atoms.set_velocities method expects velocities in ASE's internal units: length per time, where length is Ångström (Å = 1.0) and time is Å * sqrt(u / eV) ≈ 10.18 fs (since 1 fs ≈ 0.0982 in internal time units).[1][2] Velocities are dimensionless in the base system but interpreted as v_internal = Δposition / Δtime, with positions in Å and time derived from ase.units (eV=1, Å=1, u=1).[3][4][5] Kinetic energy is computed as sum(0.5 * m * v^2) in eV, confirming v has units sqrt(2*E/m) = sqrt(eV / u) / Å = 1 / time_unit.[6] To set velocities corresponding to a physical speed in fs^-1 (e.g., from VASP's Å/fs), scale by the time unit conversion: v_ase = v_physical * (1 fs / ase.units.fs) ≈ v_physical * 10.18, since ASE time unit ≈ 10.18 fs.[7][1] For example, 1 Å/fs in VASP requires ≈10.18 in ASE units to match kinetic energy.[7]

Citations:


🏁 Script executed:

# Find project configuration files to check ASE version
find . -maxdepth 2 -type f \( -name "setup.py" -o -name "setup.cfg" -o -name "pyproject.toml" -o -name "requirements.txt" -o -name "requirements*.txt" \) | head -20

Repository: zincware/IPSuite

Length of output: 77


🏁 Script executed:

# Read the actual file to see current imports and code
head -50 ipsuite/data_loading/add_data_gromacs.py

Repository: zincware/IPSuite

Length of output: 1189


🏁 Script executed:

# Check the exact lines mentioned in the review
sed -n '155,175p' ipsuite/data_loading/add_data_gromacs.py

Repository: zincware/IPSuite

Length of output: 820


🏁 Script executed:

# Read pyproject.toml to find ASE version pinning
cat pyproject.toml | grep -A 20 -i "ase\|dependencies"

Repository: zincware/IPSuite

Length of output: 1917


🏁 Script executed:

# Search for other uses of ase.units to see patterns
rg "from ase.units import|ase\.units\." --type py -B 2 -A 2

Repository: zincware/IPSuite

Length of output: 942


🏁 Script executed:

# Look for other velocity or momentum related code
rg "velocit|momentum|set_velocit" --type py -B 2 -A 2 | head -100

Repository: zincware/IPSuite

Length of output: 5529


🏁 Script executed:

# Search for tests related to gromacs loading
find . -name "*test*gromacs*" -o -name "*gromacs*test*" | head -10

Repository: zincware/IPSuite

Length of output: 42


🏁 Script executed:

# Check if there are any test files for data_loading
find . -path "*/test*" -name "*add_data_gromacs*" -o -path "*/test*" -name "*gromacs*"

Repository: zincware/IPSuite

Length of output: 42


🏁 Script executed:

# Look for any issue references or TODOs related to velocity
rg "TODO|FIXME|BUG|velocity.*unit|unit.*velocity" ipsuite/data_loading/add_data_gromacs.py

Repository: zincware/IPSuite

Length of output: 42


🏁 Script executed:

# Check what ase.units.fs actually equals by looking at ASE documentation or imports
python3 << 'EOF'
try:
    from ase import units
    print(f"ase.units.fs = {units.fs}")
    print(f"1 / ase.units.fs = {1 / units.fs}")
    print(f"ASE time unit in fs: {1/units.fs:.4f} fs")
except Exception as e:
    print(f"Error: {e}")
EOF

Repository: zincware/IPSuite

Length of output: 89


🏁 Script executed:

# Search for tests or examples using add_data_gromacs or trajectory loading
find . -type f -name "*.py" -exec grep -l "add_data_gromacs\|gromacs\|MDAnalysis.*trajectory" {} \; | head -20

Repository: zincware/IPSuite

Length of output: 180


🏁 Script executed:

# Look for any set_velocities calls to see if there are other patterns
rg "set_velocities|get_velocities" --type py -B 3 -A 3

Repository: zincware/IPSuite

Length of output: 1320


🏁 Script executed:

# Check if there are notebooks or examples demonstrating this
find . -name "*.ipynb" -o -name "*example*" -type f | grep -E "(gromacs|traj|velocity)" | head -10

Repository: zincware/IPSuite

Length of output: 42


🏁 Script executed:

# Look at the comment in the current code more carefully - what does it claim ASE expects?
sed -n '160,170p' ipsuite/data_loading/add_data_gromacs.py

Repository: zincware/IPSuite

Length of output: 533


Velocity unit conversion is incorrect — values are ~10× too small.

MDAnalysis stores velocities in Å/ps. The current conversion only changes this to Å/fs by dividing by 1000. However, ASE's internal velocity unit is Å / (ASE time unit) where the ASE time unit ≈ 10.18 fs (equivalently, ase.units.fs ≈ 0.0982). The current code is missing the conversion from Å/fs to ASE's internal units. This causes velocities to be ~10× too small, leading to kinetic energies that are ~100× too small (via 0.5·m·v²).

The proposed fix has the operation inverted—it multiplies by fs (a small number < 1), which would make the problem worse. The correct conversion should divide by fs:

Corrected fix
-from ase.units import kJ, mol
+from ase.units import fs, kJ, mol
@@
         if ts.has_velocities:
-            # MDAnalysis: Å/ps, ASE: Å/fs -> divide by 1000
-            atoms.set_velocities(ts.velocities / 1000.0)
+            # MDAnalysis: Å/ps -> ASE internal velocity (Å / ASE time unit)
+            atoms.set_velocities(ts.velocities / (1000.0 * fs))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ipsuite/data_loading/add_data_gromacs.py` around lines 162 - 165, The
velocity conversion is wrong: MDAnalysis gives velocities in Å/ps and you must
convert to ASE's internal time units by dividing, not multiplying. In the block
using ts.has_velocities and atoms.set_velocities(ts.velocities / 1000.0), change
the conversion to divide by both 1000.0 and ASE's fs constant (ase.units.fs) so
velocities = ts.velocities / (1000.0 * ase.units.fs); ensure ase.units (or from
ase import units as ase_units) is imported and used, and remove any inversion
that multiplies by fs.

Comment on lines +191 to +193
# Store all EDR terms for this frame
for term in edr_terms:
extra_results[term] = float(edr_data[term][idx])

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

extra_results keeps EDR values in raw GROMACS units.

All EDR terms (kJ/mol energies, bar pressures, K temperatures, …) are written verbatim into calc.results alongside ASE-unit energy/forces/stress. Mixing unit systems on the same calculator is a footgun for downstream consumers. At minimum, document this in the function docstring; preferably namespace these (e.g., a gmx_edr dict) so they cannot collide with ASE's standard keys (energy, forces, stress, dipole, …).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ipsuite/data_loading/add_data_gromacs.py` around lines 191 - 193, The code
currently writes raw GROMACS EDR values into extra_results (via the loop over
edr_terms/edr_data), which mixes units with ASE keys; change the storage so
values go under a namespaced dict (e.g., extra_results.setdefault('gmx_edr', {})
and then set extra_results['gmx_edr'][term] = float(edr_data[term][idx])) to
avoid colliding with ASE standard keys, and update the function docstring to
state that gmx_edr contains raw GROMACS units (kJ/mol, bar, K, …). Ensure you
check/initialize the 'gmx_edr' dict before the loop and do not modify existing
ASE keys like energy/forces/stress.

Comment on lines +232 to +241
Examples
--------
>>> with project:
... md = ips.Gmx2Frames(
... topology="gromacs/system.gro",
... trajectory="gromacs/production.xtc",
... edr="gromacs/production.edr",
... start=1,
... )
"""

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Docstring example uses a non-existent class name and is not testable.

The example references ips.Gmx2Frames(...), but the public symbol is AddDataGMX (also in __all__). The example would raise AttributeError. Per coding guidelines, the Node docstring must include a testable example using project and ips.

Proposed fix
     Examples
     --------
-    >>> with project:
-    ...     md = ips.Gmx2Frames(
-    ...         topology="gromacs/system.gro",
-    ...         trajectory="gromacs/production.xtc",
-    ...         edr="gromacs/production.edr",
-    ...         start=1,
-    ...     )
+    >>> project = ips.Project()
+    >>> with project:
+    ...     md = ips.AddDataGMX(
+    ...         topology="gromacs/system.gro",
+    ...         trajectory="gromacs/production.xtc",
+    ...         edr="gromacs/production.edr",
+    ...         start=1,
+    ...     )
+    >>> project.run()

As per coding guidelines: "Each Node's docstring should include a testable example using project and ips".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ipsuite/data_loading/add_data_gromacs.py` around lines 232 - 241, The
docstring example incorrectly instantiates ips.Gmx2Frames which doesn't exist;
change the example to use the real public class AddDataGMX and ensure it is
shown being used with project and ips (e.g. ips.AddDataGMX(...)) so it is
testable; update the example arguments to match AddDataGMX's signature
(topology, trajectory, edr, start) and keep the surrounding "with project:"
context so doctests run correctly, referencing the AddDataGMX symbol in the
docstring.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant