387 changes: 387 additions & 0 deletions AGENTS.md
@@ -0,0 +1,387 @@
# Umpire Agent Guide

This file provides guidance to coding agents working in this repository.

## Overview

Umpire is a resource management library for discovering, provisioning, and managing memory on machines with multiple memory devices like NUMA nodes and GPUs. It provides a unified interface to allocate and free data across different memory spaces (host, device, unified, pinned, etc.) and supports various memory allocation strategies (pools, advisors, prefetchers, etc.).

## Repo Skills

Use the narrowest matching repo-local skill under `skills/` before making non-trivial changes:

Use more than one only when a task genuinely spans multiple areas, such as a backend change that also needs new tests.

## ⚠️ WARNING: Auto-Generated Code

**NEVER directly edit auto-generated files.** Umpire uses code generation tools, and your changes will be overwritten.

### Fortran Interface (Shroud)

The Fortran interface is generated using [Shroud](https://shroud.readthedocs.io/en/latest/).

**Auto-generated files** (DO NOT EDIT):
- `src/umpire/interface/c_fortran/*.f` - All Fortran files
- `src/umpire/interface/c_fortran/wrap*.cpp` - C wrapper files
- `src/umpire/interface/c_fortran/wrap*.h` - C wrapper headers
- `src/umpire/interface/c_fortran/types*.h` - Type definition files
- `src/umpire/interface/c_fortran/genc*.inc` - Generated include files

**To modify Fortran interface:**
1. Edit `src/umpire/interface/umpire_shroud.yaml`
2. Run Shroud to regenerate files (via the build system or GitHub workflow)
3. Never manually edit the generated `.f`, `.cpp`, or `.h` files

**Identifying auto-generated files:**
- Look for headers like `! Generated by genumpiresplicer.py` or `! wrapf*.f`
- Check for Shroud copyright/generation comments at the top
- Files in `c_fortran/` directory starting with `wrap*` or `gen*` are generated

## Build System

Umpire uses CMake and BLT (Build, Link, and Test) as its build system. **BLT is included as a submodule** - always ensure submodules are initialized before building.

### Initial Setup

```bash
git submodule init && git submodule update
mkdir build && cd build
cmake ..
make
```

### Running Tests

```bash
# From build directory
make test
# Or use ctest directly
ctest
ctest -R <test_name_pattern> # Run specific tests
```

### Running a Single Test

```bash
# From build directory
./bin/<test_executable>
# Example:
./bin/allocator_tests
```

### Common CMake Options

- `UMPIRE_ENABLE_*` options are Umpire-owned. Some build examples also use BLT-facing `ENABLE_*` options. Note that `UMPIRE_ENABLE_OPENMP_TARGET` is distinct from `UMPIRE_ENABLE_OPENMP`.
- `BLT_CXX_STD`: C++ standard (default: c++20, minimum: c++20)
- `UMPIRE_ENABLE_CUDA`: Build with CUDA support (default: depends on ENABLE_CUDA)
- `UMPIRE_ENABLE_HIP`: Build with HIP support (default: depends on ENABLE_HIP)
- `UMPIRE_ENABLE_OPENMP`: Build with OpenMP support
- `UMPIRE_ENABLE_TESTS`: Build tests (default: On)
- `UMPIRE_ENABLE_EXAMPLES`: Build examples (default: On)
- `UMPIRE_ENABLE_BENCHMARKS`: Build benchmarks (requires UMPIRE_ENABLE_DEVELOPER_BENCHMARKS)
- `UMPIRE_ENABLE_LOGGING`: Enable logging (default: On)
- `CMAKE_BUILD_TYPE`: Build type (Release, Debug, RelWithDebInfo)

## Architecture

### Core Components

1. **ResourceManager** (`src/umpire/ResourceManager.{hpp,cpp}`): Singleton that manages all allocators and provides the primary interface for getting allocators and introspecting allocations.

2. **Allocator** (`src/umpire/Allocator.{hpp,cpp}`): User-facing interface for memory allocation/deallocation. Wraps an AllocationStrategy and provides a unified API.

3. **AllocationStrategy** (`src/umpire/strategy/AllocationStrategy.hpp`): Abstract base class for all allocation strategies. Strategies are composable and can be chained.

4. **MemoryResource** (`src/umpire/resource/MemoryResource.hpp`): Represents actual memory resources (e.g., host, CUDA device, HIP device).

5. **MemoryOperation** (`src/umpire/op/MemoryOperation.hpp`): Platform-specific operations for memory manipulation (copy, memset, prefetch, advise). Operations are registered in the MemoryOperationRegistry and selected based on source/destination resource types.

### Key Directories

- `src/umpire/`: Core library implementation
  - `alloc/`: Low-level allocator implementations (MallocAllocator, CudaMallocAllocator, etc.)
  - `resource/`: Memory resource implementations and factories
  - `strategy/`: Allocation strategies (pools, advisors, prefetchers, limiters, etc.)
  - `op/`: Memory operations (copy, memset, advise, prefetch)
  - `util/`: Utility classes (AllocationMap, Platform, MemoryResourceTraits)
  - `interface/`: C and Fortran interfaces
  - `event/`: Event recording and replay functionality

- `tests/`: Test suite
  - `unit/`: Unit tests for individual components
  - `integration/`: Integration tests for end-to-end functionality
  - `applications/`: Application-level tests

- `examples/`: Example code and tutorials
  - `tutorial/`: Tutorial examples (C and Fortran)
  - `cookbook/`: Recipe-style examples

### Memory Resources

Available memory resources (platform-dependent):
- `HOST`: Standard host memory
- `DEVICE`: GPU device memory (CUDA/HIP/SYCL)
- `UM`: Unified memory (CUDA/HIP managed memory)
- `PINNED`: Pinned/page-locked host memory
- `DEVICE_CONST`: Constant memory on device
- `FILE`: File-backed memory (memory-mapped files)
- `SHARED`: Shared memory between processes
  - `SHARED::POSIX`: IPC shared memory (POSIX implementation)
  - `SHARED::MPI3`: MPI-3 shared memory
  - Note: Use full names (`SHARED::POSIX` or `SHARED::MPI3`) when both are enabled
- `NO_OP`: No-op memory resource (for testing/debugging)

### Allocation Strategies

Strategies can be composed to create complex allocation patterns:
- **Pools**: `DynamicPoolList`, `DynamicSizePool`, `QuickPool`, `FixedPool`, `MixedPool`
- **Advisors**: `AllocationAdvisor` (for memory access hints)
- **Prefetchers**: `AllocationPrefetcher` (for explicit prefetching)
- **Limiters**: `SizeLimiter` (enforce allocation size limits)
- **Alignment**: `AlignedAllocator` (enforce memory alignment)
- **NUMA**: `NumaPolicy` (NUMA node binding)
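
To make the idea of composition concrete, here is a minimal, self-contained sketch of one strategy wrapping another. Every name in it (`ToyStrategy`, `ToyHostStrategy`, `ToySizeLimiter`) is hypothetical and exists only for illustration; none of this is Umpire's API, and returning `nullptr` on an over-limit request is a simplification chosen for the sketch, not a statement about how Umpire reports errors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <utility>

// Hypothetical stand-ins for illustration only; these are NOT Umpire classes.
struct ToyStrategy {
  virtual ~ToyStrategy() = default;
  virtual void* allocate(std::size_t bytes) = 0;
  virtual void deallocate(void* ptr, std::size_t bytes) = 0;
};

// Innermost layer: forwards requests straight to malloc/free.
struct ToyHostStrategy final : ToyStrategy {
  void* allocate(std::size_t bytes) override { return std::malloc(bytes); }
  void deallocate(void* ptr, std::size_t) override { std::free(ptr); }
};

// Wrapper layer: rejects requests that would push usage past a byte limit,
// and otherwise delegates to whatever strategy it wraps.
class ToySizeLimiter final : public ToyStrategy {
 public:
  ToySizeLimiter(std::unique_ptr<ToyStrategy> inner, std::size_t limit)
      : m_inner{std::move(inner)}, m_limit{limit}
  {
    assert(m_inner != nullptr);
  }

  void* allocate(std::size_t bytes) override
  {
    if (bytes > m_limit - m_used) {
      return nullptr;  // request would exceed the limit
    }
    void* ptr = m_inner->allocate(bytes);
    if (ptr != nullptr) {
      m_used += bytes;
    }
    return ptr;
  }

  void deallocate(void* ptr, std::size_t bytes) override
  {
    m_inner->deallocate(ptr, bytes);
    m_used -= bytes;
  }

  std::size_t used() const { return m_used; }

 private:
  std::unique_ptr<ToyStrategy> m_inner;
  std::size_t m_limit;
  std::size_t m_used{0};
};
```

Because the wrapper only depends on the `ToyStrategy` interface, the same limiter could sit on top of a pool, an aligned allocator, or another limiter. That is the property the strategy list above relies on.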

## Development Workflow

### Branch Strategy

- `develop`: Main branch for all development (PRs target this branch)
- Feature branches: `feature/<name>`
- Bugfix branches: `bugfix/<name>`
- Task branches: `task/<name>`
- Note: The `main` branch is deprecated and should not be used

### Code Style

- C++20 standard is required
- Use Doxygen comments for public APIs
- Follow existing code formatting patterns

### Adding New Features

1. Create feature branch from `develop`
2. Implement feature with tests
3. Add Doxygen documentation for new public APIs
4. Add a minimalistic example demonstrating basic functionality (in `examples/`, `examples/cookbook/`, or `examples/tutorial/`)
5. Ensure all tests pass
6. Create PR targeting `develop`

**Examples should be:**
- Simple and focused on demonstrating the core feature
- Self-contained and easy to compile
- Well-commented to explain what the feature does
- Useful for both automated testing and human understanding

### Testing Requirements

- Add unit tests for new classes/functions in `tests/unit/`
- Add integration tests for end-to-end features in `tests/integration/`
- Ensure tests pass on various configurations (host-only, CUDA, HIP)

## Common Patterns

### Getting an Allocator

```cpp
auto& rm = umpire::ResourceManager::getInstance();
umpire::Allocator alloc = rm.getAllocator("HOST");
```

### Creating a Pool

```cpp
auto& rm = umpire::ResourceManager::getInstance();
auto alloc = rm.getAllocator("HOST");
auto pool = rm.makeAllocator<umpire::strategy::QuickPool>(
    "my_pool", alloc);
```

### Introspecting Allocations

```cpp
auto& rm = umpire::ResourceManager::getInstance();
// findAllocationRecord returns a pointer to the tracked record
const auto* record = rm.findAllocationRecord(ptr);
std::size_t size = record->size;
std::string name = record->name;
```

### Memory Operations

```cpp
auto& rm = umpire::ResourceManager::getInstance();
rm.copy(dest_ptr, src_ptr); // Automatically determines copy operation
rm.memset(ptr, 0); // Set memory to value
```

## Key Architectural Principles

1. **Strategy Pattern**: AllocationStrategy implementations allow flexible composition of memory management behaviors
2. **Factory Pattern**: MemoryResourceFactory creates appropriate resources based on platform capabilities
3. **Singleton Pattern**: ResourceManager is the central registry for all allocators
4. **Introspection**: AllocationMap tracks all allocations for debugging and introspection
5. **Platform Abstraction**: Platform-specific operations are abstracted through MemoryOperation registry

## Critical Constraints (HPC Performance Library)

Umpire is a **performance-critical HPC library**. All changes must preserve performance, portability, and API stability.
If you believe this is not possible, inform the user before proceeding.

### Architectural Invariants

**Core concepts and their relationships:**
- ResourceManager (singleton, thread-safe)
- Allocator (lightweight handle, must remain O(1) operations)
  - If this will not be the case, inform the user
- MemoryResource (backend abstraction)
- AllocationStrategy (composable, may have different complexity)
  - For example, you can apply a SizeLimiter strategy to a QuickPool to impose a strict upper bound on how much the QuickPool can grow.
- MemoryOperation (platform-specific operations)

**Must maintain:**
- Allocators remain lightweight handles (no heavy state)
- Allocation/deallocation O(1) unless strategy explicitly requires otherwise
- Backend support remains conditionally compiled (no forced dependencies)
- No backend-specific code in generic layers (strict separation)
- No cross-layer violations

### Performance Requirements

**In hot paths (allocate/deallocate), avoid:**
- Virtual calls (unless already required by design)
- `dynamic_cast` in performance-sensitive code
- `std::function` in allocator operations
- Exceptions in fast allocation paths
- Unnecessary heap allocations
- Hidden device synchronization (`cudaDeviceSynchronize()`, etc.)

**All new logic in allocation paths must justify its performance cost.**
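
As a point of reference for what a hot path that follows these rules can look like, the sketch below shows a toy fixed-block pool whose `allocate` and `deallocate` are O(1), `noexcept`, free of virtual calls, and free of heap allocation. `ToyBlockPool` is a hypothetical class written for this guide; it is not Umpire's `FixedPool`, and it is deliberately not thread-safe.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical toy pool for illustration; NOT Umpire's FixedPool.
// All blocks live in one buffer owned by the pool object itself.
template <std::size_t BlockSize, std::size_t BlockCount>
class ToyBlockPool {
  static_assert(BlockSize >= sizeof(void*), "block must hold a free-list link");
  static_assert(BlockSize % alignof(void*) == 0, "blocks must stay pointer-aligned");

 public:
  ToyBlockPool() noexcept
  {
    // One-time setup: thread every block onto an intrusive free list.
    for (std::size_t i = 0; i < BlockCount; ++i) {
      push(m_storage.data() + i * BlockSize);
    }
  }

  // O(1): pop the head of the free list, or return nullptr when exhausted.
  void* allocate() noexcept
  {
    if (m_head == nullptr) {
      return nullptr;
    }
    Node* node = m_head;
    m_head = node->next;
    ++m_in_use;
    return node;
  }

  // O(1): push the block back onto the free list.
  void deallocate(void* ptr) noexcept
  {
    assert(ptr != nullptr);
    push(ptr);
    --m_in_use;
  }

  std::size_t in_use() const noexcept { return m_in_use; }

 private:
  struct Node {
    Node* next;
  };

  void push(void* block) noexcept
  {
    m_head = ::new (block) Node{m_head};
  }

  alignas(std::max_align_t) std::array<unsigned char, BlockSize * BlockCount> m_storage;
  Node* m_head{nullptr};
  std::size_t m_in_use{0};
};
```

The free list is stored inside the unused blocks, so the fast path touches only a couple of pointers. Any extra bookkeeping added to such a path (logging, locking, tracking) is the kind of cost the rule above asks you to justify.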

### Thread Safety Rules

- ResourceManager is thread-safe (already implemented)
- Allocators must not introduce race conditions
- Strategies must document thread-safety guarantees
- No static non-const globals outside ResourceManager
- No global mutable state

### GPU Backend Rules

When working with CUDA, HIP, SYCL, or device allocators:
- Ensure proper conditional compilation (`#ifdef UMPIRE_ENABLE_CUDA`, etc.)
- No device-host synchronization unless explicitly required
- No implicit stream synchronization
- No hidden `cudaDeviceSynchronize()` or equivalent
- Document any synchronization points
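
The conditional-compilation rule can be sketched as follows. The function name `compiled_backends` is hypothetical, and the sketch assumes the `UMPIRE_ENABLE_*` macros are supplied by the build configuration; here it is written so that it compiles with none of them defined, which mirrors a host-only build.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper for illustration; not part of Umpire.
// Backend-specific lines are compiled only when the matching macro is
// defined, so a host-only build never references CUDA, HIP, or SYCL.
inline std::vector<std::string> compiled_backends()
{
  std::vector<std::string> names{"HOST"};
#if defined(UMPIRE_ENABLE_CUDA)
  names.emplace_back("CUDA");
#endif
#if defined(UMPIRE_ENABLE_HIP)
  names.emplace_back("HIP");
#endif
#if defined(UMPIRE_ENABLE_SYCL)
  names.emplace_back("SYCL");
#endif
  return names;
}
```

Keeping each guard tight around the backend-specific lines, rather than wrapping whole generic functions, makes it easier to verify that the host-only configuration still builds.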

### What You Can Do

- Add tests (unit, integration)
- Improve documentation
- Refactor internal implementation (without changing public API) - get approval from user first!
- Add new allocators or strategies (if explicitly requested)
- Fix bugs with tests demonstrating the issue

### What You Must NOT Do (Without Explicit Approval)

- **Edit auto-generated code** (see warning section above - edit source YAML instead)
- Break public API compatibility
- Modify allocator semantics silently
- Change default allocator behavior
- Modify ResourceManager initialization logic
- Alter memory tracking logic
- Change device memory semantics
- Remove or break backend support (CUDA, HIP, SYCL, etc.)
- Introduce runtime overhead to allocation fast paths
- Add global mutable state
- Embed backend-specific logic in generic code

### Testing Requirements

**All changes must:**
- Build with host-only configuration
- Add or update unit tests if behavior changes
- Avoid introducing nondeterminism in tests

**Testing considerations:**
- Ask the user if the test should build with CUDA enabled.
- Ask the user if the test should build with HIP enabled.
- Ask the user if the test should build with any other options enabled (e.g. sanitizer support, fortran enabled, etc.).
- If you are trying to test IPC shared memory or MPI3 shared memory and you are working in an Apple or Windows environment:
  - Tell the user that the test can't be run here; they will have to use LC resources to build and run it

**Tests should:**
- Avoid large allocations unless done by design as part of the test
- Avoid device synchronization unless validating behavior
- Clean up all allocations
- Run quickly (this is a CI constraint)

### Code Style Expectations

- Follow existing formatting conventions (RAJA/LLNL C++ style)
- Prefer clarity over cleverness
- No unnecessary template metaprogramming
- Avoid deep inheritance chains
- Avoid overengineering or unnecessary abstraction layers
- Use Doxygen comments for all public APIs

### Documentation Requirements

When adding allocators, strategies, or resources:
- Update relevant documentation
- Add usage examples
- Document performance implications if possible
- Document backend constraints and requirements if possible
- Document thread-safety guarantees if possible

### When Uncertain

**If architectural impact is unclear:**
- Ask for clarification before implementing
- Do not guess about memory semantics
- Do not assume thread-safety without verification
- Do not make changes that could affect hot paths without discussion

**Umpire correctness and performance take precedence over feature velocity.**

## Important Conventions

- Memory sizes are in bytes
- Allocator names must be unique strings
- AllocationStrategy objects form a hierarchy (strategies can wrap other strategies)
- C++20 is the minimum required standard (enforced by CMake configuration)
- The ResourceManager must be initialized before use (happens automatically on first getInstance())

## Documentation

- Full documentation: https://umpire.readthedocs.io/
- Tutorial: https://umpire.readthedocs.io/en/develop/sphinx/tutorial.html
- API documentation is generated from Doxygen comments in headers

## Other Notes

- When you need to create an example or show the user sample Umpire code that involves a memory pool, always use QuickPool unless:
  - Every allocation is known to be the same size in bytes; then you can use FixedPool
  - The allocations need some sort of synchronization guarantee to avoid data races; then you can use ResourceAwarePool
  - The allocations will be deallocated in the reverse of the order in which they were allocated; then you can use DynamicPoolList
  - The user specifically asks for a different pool

## Maintaining This File

**When to update this file:**

This file should be updated when making changes that affect how future developers (human or AI) work with the codebase:

- Adding new core components or architectural patterns
- Introducing new memory resources or allocation strategies
- Adding new build options or dependencies
- Changing development workflows or testing requirements
- Adding new code generation tools (like Shroud)
- Modifying branch strategies or contribution guidelines
- Adding new categories of files that should/shouldn't be edited
- Introducing new performance constraints or safety requirements

**At minimum:** When completing a major feature or architectural change, ask the user whether `AGENTS.md` and `CLAUDE.md` should be updated to reflect the changes. Consider:
- Would a future agent benefit from knowing about this?
- Are there new constraints or patterns that should be documented?

Keeping this file up-to-date ensures that future coding agents and human developers have accurate, helpful guidance.
1 change: 1 addition & 0 deletions CLAUDE.md