Skip to content

Retain raw performance using internal C Function#245

Open
Vizonex wants to merge 26 commits into
aio-libs:masterfrom
Vizonex:_helpers_h
Open

Retain raw performance using internal C Function#245
Vizonex wants to merge 26 commits into
aio-libs:masterfrom
Vizonex:_helpers_h

Conversation

@Vizonex
Copy link
Copy Markdown
Member

@Vizonex Vizonex commented May 19, 2026

What do these changes do?

These changes attempt to retain the raw performance of propcache's under_cached_property class by using a C function instead of a Cython one.

  • This change is inspired by something I ended doing with reductable-params
    which is a low level packer/sender tool that filters keyword arguments out of a function before sending allowing for higher level callback customization. Though that project seeks to move entirely to C, it stores all of it's critical functions in the header file to help retain it's raw performance, this concept is what I'm seeking to try and do with propcache to give it the same boost.
  • If we ever decide to migrate fully to C Like with multidict, this function that has been made can be used as a blueprint incase needed.

Feel free to let me know if you want me to give the other class cached_property the same treatmet.

Are there changes in behavior for the user?

Related issue number

Edit: This does not close 244 as 244 is a workflow concurrency issue. Yes I'm noticing it on my libraries also.

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes

@psf-chronographer psf-chronographer Bot added the bot:chronographer:provided There is a change note present in this PR label May 19, 2026
@Vizonex Vizonex changed the title Retain raw performance using header file Retain raw performance using internal C Function May 19, 2026
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 19, 2026

Merging this PR will not alter performance

✅ 4 untouched benchmarks


Comparing Vizonex:_helpers_h (d005d50) with master (bf54e5b)

Open in CodSpeed

@Vizonex
Copy link
Copy Markdown
Member Author

Vizonex commented May 19, 2026

Seems tests/test_benchmarks.py in this line of code wants to play games with the code coverage, otherwise this PR should work as expected. .-.

except ImportError:
    pytestmark = pytest.mark.skip("pytest_codspeed needs to be installed")

@bdraco
Copy link
Copy Markdown
Member

bdraco commented May 20, 2026

@aiolibsbot review

@bdraco
Copy link
Copy Markdown
Member

bdraco commented May 20, 2026

Too tired to give it a pass tonight but asked the bot to take a look in case it can same some human cycles

@aiolibsbot
Copy link
Copy Markdown
Contributor

aiolibsbot commented May 20, 2026

PR Review — Retain raw performance using internal C Function

The reference-counting bug flagged in the earlier review is fixed in this revision: on a cache miss the function returns the PyObject_CallOneArg reference directly (with PyDict_SetItem adding its own dict reference) and only Py_INCREFs the borrowed reference on the hit path, with Py_CLEAR cleanup on the PyDict_SetItem failure path — refcounts match the original Cython. Cython's automatic NULL->exception propagation for the object-returning extern is also correct, so there is no blocking correctness bug in the current diff. What's missing is justification and test coverage: CodSpeed reports no performance change and the author notes #244 is a workflow issue, so the 'retain raw performance' premise is undemonstrated and the in-code #244 attribution is misleading; no new tests were added, and the existing relative-delta refcount tests cannot detect a constant-offset leak — the exact failure class this C rewrite risks. Recommended before merge: attach a benchmark (or reframe as a maintainability/blueprint change), add a backend-parity refcount test plus coverage for the new non-dict branch, fix the changelog wording/classification, and consider a real .h file instead of C-in-a-docstring. Not blocking-buggy, but not ready to merge as-is.


🟡 Important

1. Validate the performance premise; comment misattributes #244 (`src/propcache/_helpers_c.pyx`, L7-8)

First, the good news: this revision's refcounting now looks correct. The miss path returns the PyObject_CallOneArg reference directly (and PyDict_SetItem adds its own dict reference), and only the hit path Py_INCREFs the borrowed PyDict_GetItem result — with Py_CLEAR cleanup on the PyDict_SetItem failure path. That matches the original Cython refcounts, so the leak flagged in the earlier review is resolved.

The remaining question is whether the change earns its keep. CodSpeed reports 'will not alter performance' with 4 untouched benchmarks, so the stated goal — retain raw performance — is not demonstrated by any measurement. The inline comment here (/* Fixes performance regression when generating cython code. */ / /* SEE: .../issues/244 */) also conflicts with your own note that #244 is a workflow-concurrency issue, not a perf regression. Please either (a) attach a benchmark showing the regression this restores, or (b) drop the #244 attribution and reframe the PR as a maintainability/blueprint change. As written, the diff trades a few lines of readable Cython for hand-maintained C with no measured upside.

/* Fixes performance regression when generating cython code. */
/* SEE: https://github.com/aio-libs/propcache/issues/244 */
2. No tests added; existing refcount tests can't catch a constant-offset leak (`src/propcache/_helpers_c.pyx`, L44-50)

The checklist says 'Unit tests for the changes exist,' but the diff adds none. Two concrete gaps:

  1. The new if (!PyDict_Check(cache)) branch (raising TypeError("Expected dict, ...")) is never exercised.

  2. More importantly, test_under_cached_property_no_refcount_leak / test_cached_property_no_refcount_leak anchor every assertion to initial_refcount, captured after the first (miss-path) access. A constant +1 over-count on the miss path would inflate that baseline uniformly, so all the relative-delta assertions still pass — i.e. these tests would NOT have caught the leak the earlier revision had. Since this PR rewrites exactly that refcount-critical path in C, add a test that pins absolute semantics on a miss: e.g. after first access assert sys.getrefcount(result) matches the pure-Python backend's value, or assert the cache holds exactly one reference and that clearing the cache + dropping the only external ref collects the object. Parametrize across the C and Py backends so they must agree.

object under_cached_property_get(
    object wrapped, object name, object cache, object inst
)

🟢 Suggestions

1. Prefer a real .h file over C embedded in a Cython verbatim docstring (`src/propcache/_helpers_c.pyx`, L5-6)

Embedding the function body inside the cdef extern from "Python.h": """...""" verbatim block works, but it's hard to maintain: no C syntax highlighting, no separate compilation unit, and the C lives inside what reads like a docstring. If the C-function approach is kept, move the body to a real header (e.g. src/propcache/_helpers.h) and cdef extern from "_helpers.h". That gives you a normal C file to format/lint, keeps the .pyx to declarations only, and matches how other aio-libs C accelerators are structured.

cdef extern from "Python.h":
    """
2. Hit path has a borrowed-ref race under the free-threaded build (`src/propcache/_helpers_c.pyx`, L41-42)

The module declares freethreading_compatible=True. On the hit path, PyDict_GetItem returns a borrowed reference and you Py_INCREF it afterwards; under the free-threaded (no-GIL) build another thread could evict and free that entry between the lookup and the INCREF. This is pre-existing (the prior Cython code did the same), so it isn't introduced here and is arguably out of scope — but since you're rewriting this path in C and re-asserting free-threading support, consider PyDict_GetItemRef (CPython 3.13+, with a pythoncapi-compat fallback for 3.10–3.12), which atomically returns a strong reference and removes the race.

Py_INCREF(val);  /* borrowed from PyDict_GetItem */
return val;
3. Changelog: 'it's' -> 'its', and 'bugfix' classification is questionable (`CHANGES/245.bugfix.rst`, L1-3)

Two nits: (1) it's critical logic should be its critical logic (possessive, no apostrophe). (2) The fragment is named 245.bugfix.rst, but there's no demonstrated bug or regression being fixed (CodSpeed shows no perf delta, and #244 is a workflow issue per your own comment). Recent fragments in CHANGES/ use .contrib.rst; contrib or misc would describe this change more accurately than bugfix.

Retain raw performance of propcache by keeping it's critical logic

Checklist

  • Reference counting correct on hit and miss paths
  • Owned references released on error paths (Py_CLEAR on SetItem failure)
  • Cython NULL->exception propagation for object-returning extern
  • New code branches covered by tests — warning #2
  • Performance benefit demonstrated by benchmark — warning #1
  • Changelog entry accurate — suggestion #4
  • No hardcoded secrets or unsafe operations

Summary

The reference-counting bug flagged in the earlier review is fixed in this revision: on a cache miss the function returns the PyObject_CallOneArg reference directly (with PyDict_SetItem adding its own dict reference) and only Py_INCREFs the borrowed reference on the hit path, with Py_CLEAR cleanup on the PyDict_SetItem failure path — refcounts match the original Cython. Cython's automatic NULL->exception propagation for the object-returning extern is also correct, so there is no blocking correctness bug in the current diff. What's missing is justification and test coverage: CodSpeed reports no performance change and the author notes #244 is a workflow issue, so the 'retain raw performance' premise is undemonstrated and the in-code #244 attribution is misleading; no new tests were added, and the existing relative-delta refcount tests cannot detect a constant-offset leak — the exact failure class this C rewrite risks. Recommended before merge: attach a benchmark (or reframe as a maintainability/blueprint change), add a backend-parity refcount test plus coverage for the new non-dict branch, fix the changelog wording/classification, and consider a real .h file instead of C-in-a-docstring. Not blocking-buggy, but not ready to merge as-is.


To rebase specific severity levels, mention me: @aiolibsbot rebase critical (fixes 🔴 only), @aiolibsbot rebase important (fixes 🔴 + 🟡), or just @aiolibsbot rebase for all.


Automated review by Kōanebca276
fde50f5
109b3a7
e936ad5
a95f2f2
067c6c6
90930ea
6163eb9
5723881
951f14c
a4a11fa
99b85f6
b1fb8cc
b5fe811
431da53
e0a2d7b
5104841
8e66e62
f38ad33
8800269
64dfa66
7d054a1

@aiolibsbot
Copy link
Copy Markdown
Contributor

@Vizonex: Seems tests/test_benchmarks.py in this line of code wants to play games with the code coverage, ot...

The coverage drop you're seeing is just the environmental except ImportError: pytestmark = pytest.mark.skip(...) branch in tests/test_benchmarks.py — that line only executes when pytest_codspeed isn't installed, so it shows as uncovered in runs where it is installed (or vice-versa). You can silence it with # pragma: no cover on the except ImportError: branch, or add pytest_codspeed to the test extra so the import always succeeds and the line is covered. It's unrelated to the PR's correctness.

The more important testing point: the existing test_under_cached_property_no_refcount_leak / test_cached_property_no_refcount_leak tests will pass even though this PR leaks a reference on every cache miss. All their assertions are relative to initial_refcount (so a constant +1 offset cancels), and count_sentinels() only counts distinct live instances that the local variables keep alive anyway. To actually catch the leak, add a case that creates the value via a miss, then drops BOTH the local and the cache entry (del a._cache["prop"]), and asserts count_sentinels() == initial_sentinel_count after a gc.collect(). See my inline comment on _helpers_c.pyx for the refcount trace and the fix.

@aiolibsbot
Copy link
Copy Markdown
Contributor

@bdraco: Too tired to give it a pass tonight but asked the bot to take a look in case it can same some human ...

Bottom line so you can triage quickly: there's one blocking issue. The new C under_cached_property_get leaks a reference on every cache miss — it holds onto the strong ref from PyObject_CallOneArg, lets PyDict_SetItem take its own ref, and then Py_XINCREFs again before returning (net +3 instead of the original +2). The original Cython did Py_DECREF(val) to drop the call's ref; that step is missing here. It hits both under_cached_property and cached_property. The author's // TODO: Validate if that is true... comment marks the exact spot.

CI is green because the no_refcount_leak tests only assert relative deltas, so the constant offset is invisible — don't take the passing suite as confirmation.

Fix is one line of logic: return val; directly on the miss path (you already own that ref), and Py_INCREF(val); return val; only on the hit path. Details + suggested regression test are in the inline comments.

@Vizonex
Copy link
Copy Markdown
Member Author

Vizonex commented May 21, 2026

code coverage in the pypy section seems the be the culprit with this PR's workflows not passing. Someone is going to need to look into fixing that a bit later.

@aiolibsbot
Copy link
Copy Markdown
Contributor

@Vizonex: Seems tests/test_benchmarks.py in this line of code wants to play games with the code coverage, ot...

The mechanism behind the coverage drop: when pytest_codspeed can't be imported, pytestmark = pytest.mark.skip(...) skips the entire tests/test_benchmarks.py module, so none of its lines are recorded as executed and total coverage falls under the gate. Two clean fixes: (1) install pytest-codspeed in the coverage CI job so the benchmarks actually run (they're tiny and fast), or (2) exclude the benchmark module from coverage measurement, e.g. [tool.coverage.run] omit = ["tests/test_benchmarks.py"]. Option (1) is preferable since it keeps the benchmarks exercised on every run rather than silently dropping them; option (2) is fine if you'd rather not add the plugin to the coverage matrix.

@aiolibsbot
Copy link
Copy Markdown
Contributor

@Vizonex: code coverage in the pypy section seems the be the culprit with this PR's workflows not passing. Som...

One thing worth noting for triage: in src/propcache/_helpers.py, NO_EXTENSIONS is forced True whenever sys.implementation.name != "cpython", so PyPy imports the pure-Python _helpers_py.py and never loads the compiled _helpers_c extension. That means none of the C-function changes in this PR are exercised under PyPy at all — the coverage delta there isn't caused by this diff's C code. It's the same test_benchmarks.py module-skip issue: under PyPy the codspeed plugin path differs, the benchmark module gets skipped, and its lines drop out of the total. Fixing it at the coverage-config level (omit the benchmark module, or ensure the plugin is present in the coverage job) should clear it without touching this PR's logic.

@Vizonex
Copy link
Copy Markdown
Member Author

Vizonex commented May 22, 2026

@Vizonex: Seems tests/test_benchmarks.py in this line of code wants to play games with the code coverage, ot...

The mechanism behind the coverage drop: when pytest_codspeed can't be imported, pytestmark = pytest.mark.skip(...) skips the entire tests/test_benchmarks.py module, so none of its lines are recorded as executed and total coverage falls under the gate. Two clean fixes: (1) install pytest-codspeed in the coverage CI job so the benchmarks actually run (they're tiny and fast), or (2) exclude the benchmark module from coverage measurement, e.g. [tool.coverage.run] omit = ["tests/test_benchmarks.py"]. Option (1) is preferable since it keeps the benchmarks exercised on every run rather than silently dropping them; option (2) is fine if you'd rather not add the plugin to the coverage matrix.

@bdraco Any thoughts on omitting tests/test_benchmarks.py?

Comment thread CHANGES/245.bugfix.rst
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No imperative mood and no period before the em dash plz.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are my changes much improved now?

Retain raw performance of propcache by moving critical sections to C
-- by :user:`Vizonex`.

@Vizonex Vizonex requested a review from webknjaz May 23, 2026 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bot:chronographer:provided There is a change note present in this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants