diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 000000000..f5c74c1f0 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,387 @@ +# Umpire Agent Guide + +This file provides guidance to coding agents working in this repository. + +## Overview + +Umpire is a resource management library for discovering, provisioning, and managing memory on machines with multiple memory devices like NUMA nodes and GPUs. It provides a unified interface to allocate and free data across different memory spaces (host, device, unified, pinned, etc.) and supports various memory allocation strategies (pools, advisors, prefetchers, etc.). + +## Repo Skills + +Use the narrowest matching repo-local skill under `skills/` before making non-trivial changes: + +Use more than one only when a task genuinely spans multiple areas, such as a backend change that also needs new tests. + +## ⚠️ WARNING: Auto-Generated Code + +**NEVER directly edit auto-generated files.** Umpire uses code generation tools, and your changes will be overwritten. + +### Fortran Interface (Shroud) + +The Fortran interface is generated using [Shroud](https://shroud.readthedocs.io/en/latest/). + +**Auto-generated files** (DO NOT EDIT): +- `src/umpire/interface/c_fortran/*.f` - All Fortran files +- `src/umpire/interface/c_fortran/wrap*.cpp` - C wrapper files +- `src/umpire/interface/c_fortran/wrap*.h` - C wrapper headers +- `src/umpire/interface/c_fortran/types*.h` - Type definition files +- `src/umpire/interface/c_fortran/genc*.inc` - Generated include files + +**To modify Fortran interface:** +1. Edit `src/umpire/interface/umpire_shroud.yaml` +2. Run Shroud to regenerate files (via the build system or GitHub workflow) +3. Never manually edit the generated `.f`, `.cpp`, or `.h` files + +**Identifying auto-generated files:** +- Look for headers like `! Generated by genumpiresplicer.py` or `! 
wrapf*.f` +- Check for Shroud copyright/generation comments at the top +- Files in `c_fortran/` directory starting with `wrap*` or `gen*` are generated + +## Build System + +Umpire uses CMake and BLT (Build, Link, and Test) as its build system. **BLT is included as a submodule** - always ensure submodules are initialized before building. + +### Initial Setup + +```bash +git submodule init && git submodule update +mkdir build && cd build +cmake .. +make +``` + +### Running Tests + +```bash +# From build directory +make test +# Or use ctest directly +ctest +ctest -R # Run specific tests +``` + +### Running Single Test + +```bash +# From build directory +./bin/ +# Example: +./bin/allocator_tests +``` + +### Common CMake Options + +- `UMPIRE_ENABLE_*` options are Umpire-owned; some build examples also use BLT-facing `ENABLE_*` options, and `UMPIRE_ENABLE_OPENMP_TARGET` is distinct from `UMPIRE_ENABLE_OPENMP` +- `BLT_CXX_STD`: C++ standard (default: c++20, minimum: c++20) +- `UMPIRE_ENABLE_CUDA`: Build with CUDA support (default: depends on ENABLE_CUDA) +- `UMPIRE_ENABLE_HIP`: Build with HIP support (default: depends on ENABLE_HIP) +- `UMPIRE_ENABLE_OPENMP`: Build with OpenMP support +- `UMPIRE_ENABLE_TESTS`: Build tests (default: On) +- `UMPIRE_ENABLE_EXAMPLES`: Build examples (default: On) +- `UMPIRE_ENABLE_BENCHMARKS`: Build benchmarks (requires UMPIRE_ENABLE_DEVELOPER_BENCHMARKS) +- `UMPIRE_ENABLE_LOGGING`: Enable logging (default: On) +- `CMAKE_BUILD_TYPE`: Build type (Release, Debug, RelWithDebInfo) + +## Architecture + +### Core Components + +1. **ResourceManager** (`src/umpire/ResourceManager.{hpp,cpp}`): Singleton that manages all allocators and provides the primary interface for getting allocators and introspecting allocations. + +2. **Allocator** (`src/umpire/Allocator.{hpp,cpp}`): User-facing interface for memory allocation/deallocation. Wraps an AllocationStrategy and provides a unified API. + +3. 
**AllocationStrategy** (`src/umpire/strategy/AllocationStrategy.hpp`): Abstract base class for all allocation strategies. Strategies are composable and can be chained. + +4. **MemoryResource** (`src/umpire/resource/MemoryResource.hpp`): Represents actual memory resources (e.g., host, CUDA device, HIP device). + +5. **MemoryOperation** (`src/umpire/op/MemoryOperation.hpp`): Platform-specific operations for memory manipulation (copy, memset, prefetch, advise). Operations are registered in the MemoryOperationRegistry and selected based on source/destination resource types. + +### Key Directories + +- `src/umpire/`: Core library implementation + - `alloc/`: Low-level allocator implementations (MallocAllocator, CudaMallocAllocator, etc.) + - `resource/`: Memory resource implementations and factories + - `strategy/`: Allocation strategies (pools, advisors, prefetchers, limiters, etc.) + - `op/`: Memory operations (copy, memset, advise, prefetch) + - `util/`: Utility classes (AllocationMap, Platform, MemoryResourceTraits) + - `interface/`: C and Fortran interfaces + - `event/`: Event recording and replay functionality + +- `tests/`: Test suite + - `unit/`: Unit tests for individual components + - `integration/`: Integration tests for end-to-end functionality + - `applications/`: Application-level tests + +- `examples/`: Example code and tutorials + - `tutorial/`: Tutorial examples (C and Fortran) + - `cookbook/`: Recipe-style examples + +### Memory Resources + +Available memory resources (platform-dependent): +- `HOST`: Standard host memory +- `DEVICE`: GPU device memory (CUDA/HIP/SYCL) +- `UM`: Unified memory (CUDA/HIP managed memory) +- `PINNED`: Pinned/page-locked host memory +- `DEVICE_CONST`: Constant memory on device +- `FILE`: File-backed memory (memory-mapped files) +- `SHARED`: Shared memory between processes + - `SHARED::POSIX`: IPC shared memory (POSIX implementation) + - `SHARED::MPI3`: MPI-3 shared memory + - Note: Use full names (`SHARED::POSIX` or 
`SHARED::MPI3`) when both are enabled +- `NO_OP`: No-op memory resource (for testing/debugging) + +### Allocation Strategies + +Strategies can be composed to create complex allocation patterns: +- **Pools**: `DynamicPoolList`, `DynamicSizePool`, `QuickPool`, `FixedPool`, `MixedPool` +- **Advisors**: `AllocationAdvisor` (for memory access hints) +- **Prefetchers**: `AllocationPrefetcher` (for explicit prefetching) +- **Limiters**: `SizeLimiter` (enforce allocation size limits) +- **Alignment**: `AlignedAllocator` (enforce memory alignment) +- **NUMA**: `NumaPolicy` (NUMA node binding) + +## Development Workflow + +### Branch Strategy + +- `develop`: Main branch for all development (PRs target this branch) +- Feature branches: `feature/` +- Bugfix branches: `bugfix/` +- Task branches: `task/` +- Note: The `main` branch is deprecated and should not be used + +### Code Style + +- C++20 standard is required +- Use Doxygen comments for public APIs +- Follow existing code formatting patterns + +### Adding New Features + +1. Create feature branch from `develop` +2. Implement feature with tests +3. Add Doxygen documentation for new public APIs +4. Add a minimalistic example demonstrating basic functionality (in `examples/`, `examples/cookbook/`, or `examples/tutorial/`) +5. Ensure all tests pass +6. 
Create PR targeting `develop` + +**Examples should be:** +- Simple and focused on demonstrating the core feature +- Self-contained and easy to compile +- Well-commented to explain what the feature does +- Useful for both automated testing and human understanding + +### Testing Requirements + +- Add unit tests for new classes/functions in `tests/unit/` +- Add integration tests for end-to-end features in `tests/integration/` +- Ensure tests pass on various configurations (host-only, CUDA, HIP) + +## Common Patterns + +### Getting an Allocator + +```cpp +auto& rm = umpire::ResourceManager::getInstance(); +umpire::Allocator alloc = rm.getAllocator("HOST"); +``` + +### Creating a Pool + +```cpp +auto& rm = umpire::ResourceManager::getInstance(); +auto alloc = rm.getAllocator("HOST"); +// The strategy type is a template argument; QuickPool is used here as the default choice +auto pool = rm.makeAllocator<umpire::strategy::QuickPool>( + "my_pool", alloc); +``` + +### Introspecting Allocations + +```cpp +auto& rm = umpire::ResourceManager::getInstance(); +auto record = rm.findAllocationRecord(ptr); // Pointer to the record tracking ptr +size_t size = record->size; +std::string name = record->name; +``` + +### Memory Operations + +```cpp +auto& rm = umpire::ResourceManager::getInstance(); +rm.copy(dest_ptr, src_ptr); // Automatically determines copy operation +rm.memset(ptr, 0); // Set memory to value +``` + +## Key Architectural Principles + +1. **Strategy Pattern**: AllocationStrategy implementations allow flexible composition of memory management behaviors +2. **Factory Pattern**: MemoryResourceFactory creates appropriate resources based on platform capabilities +3. **Singleton Pattern**: ResourceManager is the central registry for all allocators +4. **Introspection**: AllocationMap tracks all allocations for debugging and introspection +5. **Platform Abstraction**: Platform-specific operations are abstracted through the MemoryOperationRegistry + +## Critical Constraints (HPC Performance Library) + +Umpire is a **performance-critical HPC library**. All changes must preserve performance, portability, and API stability. 
+If you believe this is not possible, inform the user before proceeding. + +### Architectural Invariants + +**Core concepts and their relationships:** +- ResourceManager (singleton, thread-safe) +- Allocator (lightweight handle, must remain O(1) operations) + - If this will not be the case, inform the user +- MemoryResource (backend abstraction) +- AllocationStrategy (composable, may have different complexity) + - For example, you can apply a SizeLimiter strategy to a QuickPool to impose a strict upper bound on how much the QuickPool can grow. +- MemoryOperation (platform-specific operations) + +**Must maintain:** +- Allocators remain lightweight handles (no heavy state) +- Allocation/deallocation O(1) unless strategy explicitly requires otherwise +- Backend support remains conditionally compiled (no forced dependencies) +- No backend-specific code in generic layers (strict separation) +- No cross-layer violations + +### Performance Requirements + +**In hot paths (allocate/deallocate), avoid:** +- Virtual calls (unless already required by design) +- `dynamic_cast` in performance-sensitive code +- `std::function` in allocator operations +- Exceptions in fast allocation paths +- Unnecessary heap allocations +- Hidden device synchronization (`cudaDeviceSynchronize()`, etc.) + +**All new logic in allocation paths must justify its performance cost.** + +### Thread Safety Rules + +- ResourceManager is thread-safe (already implemented) +- Allocators must not introduce race conditions +- Strategies must document thread-safety guarantees +- No static non-const globals outside ResourceManager +- No global mutable state + +### GPU Backend Rules + +When working with CUDA, HIP, SYCL, or device allocators: +- Ensure proper conditional compilation (`#ifdef UMPIRE_ENABLE_CUDA`, etc.) 
+- No device-host synchronization unless explicitly required +- No implicit stream synchronization +- No hidden `cudaDeviceSynchronize()` or equivalent +- Document any synchronization points + +### What You Can Do + +- Add tests (unit, integration) +- Improve documentation +- Refactor internal implementation (without changing public API) - get approval from user first! +- Add new allocators or strategies (if explicitly requested) +- Fix bugs with tests demonstrating the issue + +### What You Must NOT Do (Without Explicit Approval) + +- **Edit auto-generated code** (see warning section above - edit source YAML instead) +- Break public API compatibility +- Modify allocator semantics silently +- Change default allocator behavior +- Modify ResourceManager initialization logic +- Alter memory tracking logic +- Change device memory semantics +- Remove or break backend support (CUDA, HIP, SYCL, etc.) +- Introduce runtime overhead to allocation fast paths +- Add global mutable state +- Embed backend-specific logic in generic code + +### Testing Requirements + +**All changes must:** +- Build with host-only configuration +- Add or update unit tests if behavior changes +- Avoid introducing nondeterminism in tests + +**Testing considerations:** +- Ask the user if the test should build with CUDA enabled. +- Ask the user if the test should build with HIP enabled. +- Ask the user if the test should build with any other options enabled (e.g., sanitizer support, Fortran enabled, etc.). 
+- If you are trying to test IPC shared memory or MPI3 shared memory and you are working in an Apple or Windows environment: + - Tell the user that the test cannot be run here - they will have to use LC resources to successfully build and run that test + +**Tests should:** +- Avoid large allocations unless done by design as part of the test +- Avoid device synchronization unless validating behavior +- Clean up all allocations +- Run quickly (this is a CI constraint) + +### Code Style Expectations + +- Follow existing formatting conventions (RAJA/LLNL C++ style) +- Prefer clarity over cleverness +- No unnecessary template metaprogramming +- Avoid deep inheritance chains +- Avoid overengineering or unnecessary abstraction layers +- Use Doxygen comments for all public APIs + +### Documentation Requirements + +When adding allocators, strategies, or resources: +- Update relevant documentation +- Add usage examples +- Document performance implications if possible +- Document backend constraints and requirements if possible +- Document thread-safety guarantees if possible + +### When Uncertain + +**If architectural impact is unclear:** +- Ask for clarification before implementing +- Do not guess about memory semantics +- Do not assume thread-safety without verification +- Do not make changes that could affect hot paths without discussion + +**Umpire correctness and performance take precedence over feature velocity.** + +## Important Conventions + +- Memory sizes are in bytes +- Allocator names must be unique strings +- AllocationStrategy objects form a hierarchy (strategies can wrap other strategies) +- C++20 is the minimum required standard (enforced by CMake configuration) +- The ResourceManager must be initialized before use (happens automatically on first getInstance()) + +## Documentation + +- Full documentation: https://umpire.readthedocs.io/ +- Tutorial: https://umpire.readthedocs.io/en/develop/sphinx/tutorial.html +- API documentation is generated from Doxygen 
comments in headers + +## Other Notes + +- When you need to create an example or show the user some sample Umpire code that involves a memory pool, always use QuickPool unless: + - Every allocation is known to be the same size in bytes - then, you can use FixedPool + - The allocations need some sort of synchronization guarantee to avoid data races - then, you can use ResourceAwarePool + - The allocations will be deallocated in the reverse of the order in which they were allocated - then, you can use DynamicPoolList + - The user specifically asks for you to use a different pool + +## Maintaining This File + +**When to update this file:** + +This file should be updated when making changes that affect how future developers (human or AI) work with the codebase: + +- Adding new core components or architectural patterns +- Introducing new memory resources or allocation strategies +- Adding new build options or dependencies +- Changing development workflows or testing requirements +- Adding new code generation tools (like Shroud) +- Modifying branch strategies or contribution guidelines +- Adding new categories of files that should/shouldn't be edited +- Introducing new performance constraints or safety requirements + +**At minimum:** When completing a major feature or architectural change, ask the user whether `AGENTS.md` and `CLAUDE.md` should be updated to reflect the changes. Consider: +- Would a future agent benefit from knowing about this? +- Are there new constraints or patterns that should be documented? + +Keeping this file up-to-date ensures that future coding agents and human developers have accurate, helpful guidance. 
diff --git a/CLAUDE.md b/CLAUDE.md new file mode 120000 index 000000000..47dc3e3d8 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1 @@ +AGENTS.md \ No newline at end of file diff --git a/skills/building-the-code/SKILL.md b/skills/building-the-code/SKILL.md new file mode 100644 index 000000000..5d8201883 --- /dev/null +++ b/skills/building-the-code/SKILL.md @@ -0,0 +1,33 @@ +--- +name: building +description: Instructions for building Umpire +--- + +# Building + +Umpire is configured with CMake and built with `make`. Umpire has a few submodules which should be up-to-date (run `git submodule update --init --recursive` to make sure). Building should always happen in a separate `build` directory unless otherwise noted. + +The most common build recipe, once CMake has configured the `build` directory, is to run `make -j`. If there is an error in the build, you can rerun with `make VERBOSE=1` to get more information. A summary of this verbose output should be given to the user. + +To clean a build, run `make clean`. + +Try to build with more recent versions of compilers if possible. For example, building with `gcc` version 10.3.1 is better than building with 8.3.1. If you see you are using a very old version of a compiler, notify the user. For example, if you are using `gcc` version 4.9.3, notify the user right away! + +## Common Build Configurations + +Although Umpire should build from the `build` directory with just `cmake ../`, there are a few common build configurations that should always work. From within the `build` directory, common `cmake` commands include: + +- `cmake -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ ../` + - This will build Umpire with defaults using the `gcc` compiler. Other common compilers to use include `clang`. +- `cmake -DCMAKE_CXX_FLAGS="-fsanitize=address -g" -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang -DCMAKE_BUILD_TYPE=Debug -DENABLE_DEVELOPER_DEFAULTS=On ../` + - This will build Umpire with `clang` and AddressSanitizer (`-fsanitize=address`) enabled to detect memory errors, including leaks. 
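The compiler-age guidance earlier in this skill can be turned into a quick check. The following is only a sketch: the helper name `check_gcc_version` and the major-version thresholds (below 8 treated as very old, 8 and 9 as older than preferred) are assumptions chosen to match the `gcc` examples in this skill, not values defined by Umpire.

```shell
# Hypothetical helper: classify a gcc version string such as "10.3.1".
# The thresholds below are assumptions based on the examples in this skill.
check_gcc_version() {
  version="$1"
  major="${version%%.*}"   # keep the text before the first dot
  case "$major" in
    ''|*[!0-9]*)
      echo "unknown"       # not a numeric major version
      return 1
      ;;
  esac
  if [ "$major" -lt 8 ]; then
    echo "very-old"        # e.g. 4.9.3: notify the user right away
  elif [ "$major" -lt 10 ]; then
    echo "older"           # e.g. 8.3.1: usable, but a newer compiler is preferred
  else
    echo "ok"              # e.g. 10.3.1
  fi
}
```

One way to use it is `check_gcc_version "$(gcc -dumpversion)"`, reporting anything other than `ok` to the user before starting the build.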
+ +If you need more information about CMake configuration options, be sure to check out the documentation at `https://umpire.readthedocs.io/en/develop/sphinx/advanced_configuration.html`. + +## Builds should not take a long time + +If the build is taking a long time, notify the user of your `cmake` command and build configuration. Umpire is relatively simple and typically does not involve a long build time. + +## Configuration Errors + +If you are building Umpire and see a CMake configuration error, try running `make clean` and reattempt the build. You can also try deleting the `CMakeCache.txt` file so that CMake reconfigures from scratch. diff --git a/skills/running-the-code/SKILL.md b/skills/running-the-code/SKILL.md new file mode 100644 index 000000000..ac84e888a --- /dev/null +++ b/skills/running-the-code/SKILL.md @@ -0,0 +1,30 @@ +--- +name: running-the-code +description: Information on running the Umpire code and where to find examples. +--- + +# Running Umpire Examples + +After a build completes, the executable files will be placed in the `bin` folder within the `build` directory. You can run Umpire executables from the `build` directory, and any output or output files will be placed in the `build` directory. + +## Examples + +Examples can be found in `Umpire/examples`. You will see that `tutorial` and `cookbook` subdirectories exist within the `examples` directory. The `cookbook` subdirectory contains a lot of valuable how-to examples. It is called a `cookbook` because the examples within provide a "recipe" for how to do something. For example, the "recipe_no_introspection" example shows how to turn off introspection when creating a QuickPool allocator. + +## Sample Run Commands For Examples + +In this example, we are running the `allocator.cxx` example to see which allocator we "got" after running `rm.getAllocator` and to view all available allocators by name: +`./bin/allocator` + +This will generate output. 
If any Umpire example or test generates output, make sure the user can see it. + +## Tests + +Tests can be found in `Umpire/tests`. From the `tests` directory, you can see that we have several different kinds of tests including `unit` and `integration` (and others). We try to keep each test in the folder that matches its kind. + +## Sample Run Commands for Tests + +In this example, we are running the `strategy_tests` test: +`ctest -T test -R strategy_tests --output-on-failure` + +Be sure to show the user any output, regardless of whether it is due to failure or success. diff --git a/skills/shroud/SKILL.md b/skills/shroud/SKILL.md new file mode 100644 index 000000000..52b2e9fe8 --- /dev/null +++ b/skills/shroud/SKILL.md @@ -0,0 +1,26 @@ +--- +name: shroud +description: Workflow guidance for working with Umpire's Fortran interface with Shroud, including safe editing steps and common Shroud pitfalls. +--- + +# Shroud Workflow + +Use this skill when you need to edit the Fortran interface in Umpire. Keep the generated sources consistent. Umpire uses Shroud to generate the Fortran interface. You can learn more about Shroud at `https://shroud.readthedocs.io/en/latest/`. + +## Mental Model (important) + +- `src/umpire/interface` is the location of Umpire's Fortran code. +- `src/umpire/interface/umpire_shroud.yaml` is the main file that describes how Umpire uses Shroud. + - The code generated from `umpire_shroud.yaml` will overwrite any `src/umpire/interface/**.f` or `src/umpire/interface/**.cpp` files. + +After the `umpire_shroud.yaml` file is processed, the resulting `.f` and `.cpp` files are the ones that get compiled. + +## Safe Edit Procedure + +1. Make necessary edits to `src/umpire/interface/umpire_shroud.yaml` +2. Rebuild the code, making sure to enable Fortran in the CMake configuration with `ENABLE_FORTRAN=On` +3. 
Verify the results by building the code and running `make test` + +## Common pitfalls + +- **Editing the wrong file**: if you edit `*.f` or `*.cpp` under `src/umpire/interface/`, you're editing generated output; your changes will be overwritten by a GitHub Action that regenerates the Fortran code whenever `umpire_shroud.yaml` is edited.
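
To make this pitfall harder to hit, a small guard can flag generated paths before editing. This is a sketch only: the function name `is_shroud_generated` is hypothetical, and the patterns are the ones `AGENTS.md` lists under its auto-generated code warning (all under `src/umpire/interface/c_fortran/`), which is narrower than the `src/umpire/interface/` wording used in this skill.

```shell
# Hypothetical guard: report whether a repo-relative path matches the
# generated-file patterns listed in AGENTS.md. Generated files should be
# changed by editing umpire_shroud.yaml, never by hand.
is_shroud_generated() {
  dir="src/umpire/interface/c_fortran"
  case "$1" in
    "$dir"/*.f | "$dir"/wrap*.cpp | "$dir"/wrap*.h | "$dir"/types*.h | "$dir"/genc*.inc)
      echo "generated"
      ;;
    *)
      echo "hand-written"
      ;;
  esac
}
```

For example, `is_shroud_generated src/umpire/interface/umpire_shroud.yaml` reports `hand-written`, so the YAML file remains the right place to make interface changes.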