Do you need to file an issue?
Is your feature request related to a problem? Please describe.
GraphRAG currently constructs its community hierarchy using Leiden clustering (graphrag/index/operations/cluster_graph.py). Leiden is stochastic (seed-dependent), and on large entity graphs the hierarchical community detection step can be a significant portion of indexing time.
In our research ("Core-based Hierarchies for Efficient GraphRAG"), we found that k-core decomposition can build the community hierarchy deterministically and more efficiently, while producing communities of comparable or better quality on the standard GraphRAG global-search benchmarks. There is currently no way to swap the community-detection strategy in GraphRAG without modifying core code.
Describe the solution you'd like
Add k-core–based hierarchical community construction as an optional, pluggable alternative to Leiden, selectable via config and CLI. Leiden remains the default, so behavior is unchanged unless explicitly opted into.
Concretely:
- A new operation kcore_cluster_graph() that peels the graph by k-core number, splits each level into size-bounded communities, and produces the same Communities structure that Leiden returns (so all downstream workflows are untouched).
- Three heuristic variants from the paper: RkH (residual-aware k-core hierarchy), M2hC, and MRC.
- A new community_algo field on ClusterGraphConfig (default "leiden") and a --community flag on graphrag index.
- A branch in create_communities that dispatches to Leiden or k-core based on that config.
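
As a rough illustration of the first bullet, the sketch below peels a toy graph by core number and emits one hierarchy level per distinct core value. It is not the paper's method: the RkH, M2hC, and MRC heuristics are not reproduced, and the size bound is a naive chunking. The tuple shape `(level, community_id, parent_id, nodes)` is an assumption about what the Leiden path returns, and every function name here is hypothetical. It uses only the standard library so it runs without GraphRAG installed.

```python
from collections import defaultdict

Adjacency = dict[str, set[str]]
# Assumed shape: (level, community_id, parent_id, node_ids); -1 means "no parent".
Communities = list[tuple[int, int, int, list[str]]]


def core_numbers(adj: Adjacency) -> dict[str, int]:
    """Core number per node, found by repeatedly peeling a minimum-degree node."""
    degree = {node: len(neighbors) for node, neighbors in adj.items()}
    remaining = set(adj)
    core: dict[str, int] = {}
    k = 0
    while remaining:
        # Ties are broken by node name, so the result never depends on a seed.
        node = min(remaining, key=lambda n: (degree[n], n))
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neighbor in adj[node]:
            if neighbor in remaining:
                degree[neighbor] -= 1
    return core


def components(nodes: set[str], adj: Adjacency) -> list[list[str]]:
    """Connected components of the subgraph induced by `nodes`, each sorted."""
    seen: set[str] = set()
    result: list[list[str]] = []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            node = stack.pop()
            comp.append(node)
            for neighbor in adj[node]:
                if neighbor in nodes and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        result.append(sorted(comp))
    return result


def kcore_hierarchy(adj: Adjacency, max_size: int = 10) -> Communities:
    """One level per distinct core value; level 0 is the coarsest."""
    core = core_numbers(adj)
    communities: Communities = []
    parent_of: dict[str, int] = {}  # node -> community id at the previous level
    next_id = 0
    for level, k in enumerate(sorted(set(core.values()))):
        members = {n for n, c in core.items() if c >= k}
        current: dict[str, int] = {}
        for comp in components(members, adj):
            # Group by parent first so every community has exactly one parent.
            by_parent: dict[int, list[str]] = defaultdict(list)
            for node in comp:
                by_parent[parent_of.get(node, -1)].append(node)
            for parent, group in sorted(by_parent.items()):
                # Naive size bound: cut the sorted group into fixed-size chunks.
                for i in range(0, len(group), max_size):
                    chunk = group[i : i + max_size]
                    communities.append((level, next_id, parent, chunk))
                    for node in chunk:
                        current[node] = next_id
                    next_id += 1
        parent_of = current
    return communities


# Toy graph: triangle a-b-c, pendant d attached to a, and a separate edge e-f.
toy: Adjacency = {
    "a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"},
    "d": {"a"}, "e": {"f"}, "f": {"e"},
}
hierarchy = kcore_hierarchy(toy)
```

Because the peeling order and chunking are fully determined by the input, repeated runs on the same graph give identical output, which is the property the Leiden path lacks without a fixed seed.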
I intend to implement this myself. The work is already prototyped as a fork of GraphRAG v2.7.0, available here: https://github.com/erdemUB/KDD26. I will port it onto the latest main and open a PR with tests and a semversioner change doc. Before I submit, I'd like maintainer input on the preferred extension point: a config-string switch as described above, or a more formal pluggable "clustering strategy" interface.
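
To make the two options concrete, here is a minimal sketch of how they could fit together, with a small registry keyed by the config string. Everything in it is hypothetical and for discussion only: the `"kcore"` value, the registry, the stub functions, and the `ClusterGraphConfigSketch` class are stand-ins, not GraphRAG's actual API.

```python
from dataclasses import dataclass
from typing import Callable

Adjacency = dict[str, set[str]]
Communities = list[tuple[int, int, int, list[str]]]  # assumed shape
ClusterFn = Callable[[Adjacency, int], Communities]

# Hypothetical registry: maps a config string to a clustering function.
_STRATEGIES: dict[str, ClusterFn] = {}


def register_strategy(name: str) -> Callable[[ClusterFn], ClusterFn]:
    """Decorator that records a clustering function under `name`."""
    def decorator(fn: ClusterFn) -> ClusterFn:
        _STRATEGIES[name] = fn
        return fn
    return decorator


@register_strategy("leiden")
def leiden_stub(adj: Adjacency, max_cluster_size: int) -> Communities:
    # Placeholder: the real path would call the existing Leiden operation.
    return [(0, 0, -1, sorted(adj))]


@register_strategy("kcore")
def kcore_stub(adj: Adjacency, max_cluster_size: int) -> Communities:
    # Placeholder: the real path would call the proposed kcore_cluster_graph().
    return [(0, 0, -1, sorted(adj))]


@dataclass
class ClusterGraphConfigSketch:
    """Stand-in for ClusterGraphConfig with the proposed field added."""
    max_cluster_size: int = 10
    community_algo: str = "leiden"  # default keeps current behavior


def resolve_strategy(config: ClusterGraphConfigSketch) -> ClusterFn:
    """Look up the clustering function, failing loudly on unknown names."""
    try:
        return _STRATEGIES[config.community_algo]
    except KeyError:
        known = ", ".join(sorted(_STRATEGIES))
        raise ValueError(
            f"Unknown community_algo {config.community_algo!r}; expected one of: {known}"
        ) from None


def create_communities_sketch(
    adj: Adjacency, config: ClusterGraphConfigSketch
) -> Communities:
    """Dispatch to whichever strategy the config selects."""
    return resolve_strategy(config)(adj, config.max_cluster_size)
```

With this shape, the plain config-string switch is just the registry with two hard-coded entries, so starting with the switch would not rule out a fuller strategy interface later.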
Additional context
- This work has been accepted at KDD'26. Paper: Core-based Hierarchies for Efficient GraphRAG — https://arxiv.org/pdf/2603.05207
- The change is small and localized: one new operation file plus a config field, a CLI flag, and a dispatch branch. The new path reuses the existing create_graph, stable_largest_connected_component, and the entire downstream community-report/summarization pipeline unchanged.