K-core based hierarchical community construction as an alternative to Leiden #2407

@jakir-sust

Do you need to file an issue?

  • I have searched the existing issues and this feature is not already filed.
  • My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
  • I believe this is a legitimate feature request, not just a question. If this is a question, please use the Discussions area.

Is your feature request related to a problem? Please describe.

GraphRAG currently constructs its community hierarchy using Leiden clustering (graphrag/index/operations/cluster_graph.py). Leiden is stochastic (seed-dependent), and on large entity graphs the hierarchical community detection step can be a significant portion of indexing time.

In our research ("Core-based Hierarchies for Efficient GraphRAG"), we found that k-core decomposition can build the community hierarchy deterministically and more efficiently, while producing communities of comparable or better quality on the standard GraphRAG global-search benchmarks. There is currently no way to swap the community-detection strategy in GraphRAG without modifying core code.

Describe the solution you'd like

Add k-core–based hierarchical community construction as an optional, pluggable alternative to Leiden, selectable via config and CLI. Leiden remains the default, so behavior is unchanged unless explicitly opted into.

Concretely:
- A new operation, kcore_cluster_graph(), that peels the graph by k-core number, splits each level into size-bounded communities, and returns a Communities structure compatible with the one Leiden produces (so all downstream workflows are untouched).
- Three heuristic variants from the paper: RkH (residual-aware k-core hierarchy), M2hC, and MRC.
- A new community_algo field on ClusterGraphConfig (default "leiden") and a --community flag on graphrag index.
- A branch in create_communities that dispatches to Leiden or k-core based on that config.
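
To make the peeling idea concrete, here is a minimal pure-Python sketch of a k-core hierarchy. It is an illustration only, not the RkH, M2hC, or MRC heuristics from the paper and not the code in the fork: the names core_numbers, connected_components, and kcore_hierarchy are hypothetical, the (level, community_id, nodes) tuples are an assumed simplification of the Communities structure (parent links are omitted), and the size bound is enforced by naively chunking sorted node names.

```python
# Hypothetical sketch: k-core peeling into a level-by-level community list.
# The graph is a plain adjacency dict; nothing here is GraphRAG's real API.
Graph = dict[str, set[str]]


def core_numbers(adj: Graph) -> dict[str, int]:
    """Core number of every node, by repeatedly removing a minimum-degree node."""
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core: dict[str, int] = {}
    k = 0
    while remaining:
        # Ties are broken by node name so the result is fully deterministic.
        node = min(remaining, key=lambda n: (degree[n], n))
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


def connected_components(adj: Graph, nodes: list[str]) -> list[list[str]]:
    """Connected components of the subgraph induced by `nodes`, each sorted."""
    allowed = set(nodes)
    seen: set[str] = set()
    components = []
    for start in sorted(allowed):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for nbr in adj[node]:
                if nbr in allowed and nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        components.append(sorted(component))
    return components


def kcore_hierarchy(adj: Graph, max_cluster_size: int = 10):
    """Return (level, community_id, nodes) tuples; level k holds the k-core."""
    core = core_numbers(adj)
    communities = []
    next_id = 0
    for k in range(1, max(core.values(), default=0) + 1):
        members = [n for n, c in core.items() if c >= k]
        for component in connected_components(adj, members):
            # Placeholder size bound: cut the sorted component into chunks.
            for i in range(0, len(component), max_cluster_size):
                communities.append((k, next_id, component[i:i + max_cluster_size]))
                next_id += 1
    return communities


# Triangle a-b-c with a pendant node d, plus a separate edge e-f.
example = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c"}, "e": {"f"}, "f": {"e"},
}
hierarchy = kcore_hierarchy(example)
```

On the example graph, level 1 contains the two connected components and level 2 contains only the triangle, and repeated runs return the identical list because no random seed is involved.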

I intend to implement this myself. The work is already prototyped as a fork of GraphRAG v2.7.0, available here: https://github.com/erdemUB/KDD26. I will port it onto the latest main and open a PR with tests and a semversioner change doc. Before I submit, I'd like maintainer input on the preferred extension point: a config-string switch as above, or a more formal pluggable "clustering strategy" interface.

Additional context

  • This work has been accepted at KDD'26. Paper: Core-based Hierarchies for Efficient GraphRAG — https://arxiv.org/pdf/2603.05207
  • The change is small and localized: one new operation file plus a config field, a CLI flag, and a dispatch branch. The new path reuses the existing create_graph, stable_largest_connected_component, and the entire downstream community-report/summarization pipeline unchanged.
