Mpi evaluationmanager dev #761
Open
nychiang wants to merge 9 commits into
bug fixed

Fix multi-node MPI worker detection in EvaluationManager

Fixed critical bug in _get_num_workers() that prevented correct worker counting for true multi-node MPI execution (mpiexec -n N where N > 1).

Key changes:
- evaluation_manager.py: Query MPI.COMM_WORLD.Get_size() - 1 instead of relying solely on MPI4PY_FUTURES_MAX_WORKERS environment variable
- util.py: Enhanced MPIEvaluator documentation with multi-node examples
- EvaluationManagerCI.py: Added -e flag to test thread/process/mpi executors with proper rank handling for multi-node MPI

Before: Multi-node profiling incorrectly reported 1 worker
After: Correctly reports N-1 workers for mpiexec -n N

Backward compatible - legacy single-node MPI mode still supported.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
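The worker-counting rule described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in evaluation_manager.py: the function names `count_mpi_workers` and `detect_world_size`, the `env` parameter, and the fallback value of 1 are assumptions made for the example. Only `MPI.COMM_WORLD.Get_size()` and the `MPI4PY_FUTURES_MAX_WORKERS` variable come from the commit message.

```python
import os


def count_mpi_workers(world_size, env=None):
    """Return the number of worker ranks (hypothetical helper, not the PR's code)."""
    env = os.environ if env is None else env
    if world_size > 1:
        # Launched as `mpiexec -n N`: one rank acts as manager, N - 1 as workers.
        return world_size - 1
    # Legacy single-process launch: fall back to the environment variable.
    max_workers = env.get("MPI4PY_FUTURES_MAX_WORKERS")
    if max_workers:
        return int(max_workers)
    # Assumed default when neither source of information is available.
    return 1


def detect_world_size():
    """Query MPI.COMM_WORLD if mpi4py is importable, else assume one process."""
    try:
        from mpi4py import MPI
    except ImportError:
        return 1
    return MPI.COMM_WORLD.Get_size()
```

Under these assumptions, `count_mpi_workers(detect_world_size())` would give N-1 when the script is started with `mpiexec -n N`, and would fall back to the environment variable otherwise.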
EvaluationManagerCI.py now handles all executor types (thread/process/mpi), making the separate MPI-only example redundant.
update fix CI
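A single script that covers thread, process, and MPI executors could select the backend along these lines. This is a sketch under assumptions: the helper names `make_executor` and `parse_args`, the long option `--executor`, and the default of `thread` are hypothetical; only the `-e` flag and the three executor kinds are taken from the commit messages. The MPI branch imports `mpi4py.futures` lazily so the thread and process paths work without MPI installed.

```python
import argparse
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def make_executor(kind, max_workers=None):
    """Build an executor for the requested backend (hypothetical helper)."""
    if kind == "thread":
        return ThreadPoolExecutor(max_workers=max_workers)
    if kind == "process":
        return ProcessPoolExecutor(max_workers=max_workers)
    if kind == "mpi":
        # Imported only when requested, so mpi4py stays an optional dependency.
        from mpi4py.futures import MPIPoolExecutor
        return MPIPoolExecutor(max_workers=max_workers)
    raise ValueError(f"unknown executor type: {kind!r}")


def parse_args(argv=None):
    """Parse the executor choice; the long option name is an assumption."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-e", "--executor",
                        choices=("thread", "process", "mpi"),
                        default="thread")
    return parser.parse_args(argv)


def square(x):
    # Module-level function so it can be pickled for process and MPI pools.
    return x * x


if __name__ == "__main__":
    args = parse_args()
    with make_executor(args.executor) as executor:
        print(list(executor.map(square, range(4))))
```

All three executor classes share the `concurrent.futures` interface (`submit`, `map`, context manager), which is what makes one test script for all backends practical.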
Major fixes:
- Fix MPI hanging by switching to OpenMPI and python -m mpi4py.futures
- Fix missing profiling output by storing timing data as instance variables

New features:
- Comprehensive scaling test infrastructure (submit_bo_scaling.sh)
- BODriverEX_mpi.py: Simple 2D test problem for MPI scaling
- Thread/Process/MPI executor test scripts
- Complete documentation suite

Details in COMMIT_MESSAGE.txt and CHANGELOG.md

Files changed:
- hiopbbpy/utils/evaluation_manager.py (profiling fix)
- hiopbbpy/utils/util.py (debug output)
- test_multinode_mpi.sh (OpenMPI + proper launcher)
- Created: BODriverEX_mpi.py, scaling test scripts, documentation

Tested on LLNL LC (Dane) with 1-64 workers, single and multi-node.
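The profiling fix mentioned above (keeping timing data on the instance so it is still available when the report is printed) might look roughly like this. The class `ProfiledEvaluator` and its attribute and method names are hypothetical and are not taken from evaluation_manager.py; only the `do_profiling` flag name appears in the PR's own code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class ProfiledEvaluator:
    """Hypothetical wrapper that keeps timing data as instance attributes."""

    def __init__(self, executor, do_profiling=True):
        self.executor = executor
        self.do_profiling = do_profiling
        self.eval_times = []  # wall-clock seconds per evaluated batch
        self.num_evals = 0    # total number of points evaluated

    def evaluate(self, fn, points):
        start = time.perf_counter()
        results = list(self.executor.map(fn, points))
        if self.do_profiling:
            # Stored on self, so the data outlives this call and can be reported later.
            self.eval_times.append(time.perf_counter() - start)
            self.num_evals += len(results)
        return results

    def report(self):
        total = sum(self.eval_times)
        return (f"{self.num_evals} evaluations in "
                f"{len(self.eval_times)} batches, {total:.3f} s total")


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        evaluator = ProfiledEvaluator(pool)
        evaluator.evaluate(abs, [-1, 2, -3])
        print(evaluator.report())
```

Timing values kept only in local variables are lost when the evaluating method returns, which is one plausible way profiling output can go missing.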
Force-pushed from 9c8e5f9 to 80c54df
thartland
reviewed
May 20, 2026
    if __name__ == "__main__":
        do_profiling = True

        for prob_type in prob_type_l:
Collaborator
If prob_type_l contains only one element, do we want to keep this loop over the elements of prob_type_l?
Collaborator
Author
umm... let's keep it, since we can easily switch prob_type_l to a list with more than one element
thartland
reviewed
May 28, 2026
    **BODriverEX_mpi.py**: Simple 2D LpNorm optimization
    - 64 initial samples, 20 BO iterations
    - No xfoil dependency, fast evaluations
Collaborator
Remove "No xfoil dependency"
nychiang added a commit that referenced this pull request on May 28, 2026
Remove all legacy xfoil references from shell scripts and documentation, and clean up unused variables in response to reviewer feedback.

Changes:
- Update job names in test scripts from xfoil_* to eval_mgr_*
- Remove unused HIOP_XFOIL_JOB_ROOT and TOTAL_TASKS variables
- Remove xfoil references from README.md and TESTING_GUIDE.md
- Remove references to non-existent files (README_EvaluationManager.md, README_XFOIL.md)
- Remove ds4mems package dependency mention

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Remove remaining xfoil sections that were missed in the previous cleanup:
- Production xfoil BO section
- xfoil Problems subsection with file references
- xfoil_bo/submit_bo_xfoil.sbatch reference
Force-pushed from b261406 to 42c3a01
Collaborator
Author
All the comments have been addressed in the new commits.
The --no-cache flag was preventing Spack from using cached dependency builds, causing 'compiler-wrapper not installed' errors. Dependencies installed in the previous step should be available from cache for the package-only build.
Redesign EvaluationManager to work with MPI on multiple nodes, and add comprehensive testing