refactor: add internal-schema-cache-lock-id by mkleczek · Pull Request #4962 · PostgREST/postgrest

mkleczek · 2026-05-28T05:14:51Z

This change adds possibility of step-by-step execution of SQL statements in querySchemaCache function.
If a new internal-schema-cache-lock-id config is set (to an integer value), querySchemaCache will execute SELECT pg_advisory_xact_lock(<internal-schema-cache-lock-id>, <next_lock_number>) before each query (next_lock_number starts from 0 and is incremented after each statement).

That way it is possible to control the execution of queries in querySchemaCache, eg:

In a database session:

SELECT pg_advisory_lock(<internal-schema-cache-lock-id>, lock_number) FROM generate_series(0, 10) AS lock_number

Trigger schema cache loading...

In the database session:

-- Unblock running first query in querySchemaCache
SELECT pg_advisory_unlock(<internal-schema-cache-lock-id>, 0)

Run some actions before querySchemaCache executes second query.

-- Unblock running the second query in querySchemaCache
SELECT pg_advisory_unlock(<internal-schema-cache-lock-id>, 1)

... etc.

steve-chavez · 2026-05-28T15:21:35Z

    "Slow schema cache load environment PostgREST."
    return {
        **defaultenv,
+        "PGRST_INTERNAL_SCHEMA_CACHE_LOCK_ID": "1111",


Q: Do we really need to specify an id externally? Seems like it could just be a boolean and we generate the id internally?

-- Unblock running first query in querySchemaCache
SELECT pg_advisory_unlock(, 0)

Ah, I guess we couldn't do the above if we generate the id internally. The idea is to run the unlocks during testing right?

@mkleczek I think we can make this clearer and just have a boolean config by adding a wrapper SQL function.

So our schema cache queries have a sort of name id:

28/May/2026:13:05:53 -0500: tables: 50.432 ms, keydeps: 29.778 ms, rels: 8.315 ms, funcs: 48.383 ms, comprels: 5.557 ms, dreps: 2.049 ms, mhandlers: 5.745 ms, tzones: 39.676 ms

So we could have:

SELECT pgrst_scache_advisory_unlock('tables') SELECT pgrst_scache_advisory_unlock('keydeps') SELECT pgrst_scache_advisory_unlock('rels') -- ...

The wrapper func would have to internally translate those to hardcoded lock ids, which I think it's safe to do for our tests.

WDYT?

Also, do we really need to unlock them individually or would a pg_advisory_unlock_all be enough? If so maybe we don't need to know the ids on the tests.

steve-chavez · 2026-05-28T15:27:18Z

This looks like a great idea 💯. Left some comments.

taimoorzaeem · 2026-05-29T06:18:46Z

-    , dbQueryTimings = qsTime
-    }
+  (`evalStateT` 0) $ do
+    tabs    <- sqlWithLock $ sqlTimedStmt gucTbls  conf   allTables


Maybe it would be good to add a comment here on why have added sqlWithLock here? It would help whoever touches the code here later.

We should also be able to remove the evalStateT if we use hardcoded ids plus a wrapper as I mentioned above.

wolfgangwalther

I admit, I have not followed the whole discussion about that internal schema sleep thing and the related tests, but do we really need to be able to step through the schema cache queries? Or would it be enough to somehow block any schema cache from loading for a while?

I just tested a LOCK TABLE pg_class IN ACCESS EXCLUSIVE MODE;, which works as expected - you can't select from pg_class anymore while a different transaction holds this lock. This should at least allow to queue up some schema cache reloads and see whether all of them execute.

wolfgangwalther · 2026-05-30T11:14:22Z

+  -- if configInternalSCLockId is set run queries step-by-step waiting for lock release before each
+  SQL.runSteppedTransaction @SchemaCacheLabel configInternalSCLockId $


So we are changing the way PostgREST operates depending on whether we are in a test-case or not?

Hard no from me. We really want to test the actual code we're running in production and not something that is just running that way in tests.

So we are changing the way PostgREST operates depending on whether we are in a test-case or not?

We already have it: internal-schema-cache-query-sleep.
This PR goal is to replace it with much more capable internal-schema-cache-lock-id

Hard no from me. We really want to test the actual code we're running in production and not something that is just running that way in tests.

That's a noble goal but some tests - especially concurrency related - cannot be really written without certain features available (eg. suspending computation at specific moments in time etc.). If you are able to provide such facilities without cooperation from the program, I am all for using them. So far nobody came up with anything like that.

mkleczek · 2026-05-30T11:26:08Z

I admit, I have not followed the whole discussion about that internal schema sleep thing and the related tests, but do we really need to be able to step through the schema cache queries? Or would it be enough to somehow block any schema cache from loading for a while?

See this discussion: https://github.com/PostgREST/postgrest/pull/4880/changes#r3228963615

This is the reason why we even discuss this PR :)

I just tested a LOCK TABLE pg_class IN ACCESS EXCLUSIVE MODE;, which works as expected - you can't select from pg_class anymore while a different transaction holds this lock. This should at least allow to queue up some schema cache reloads and see whether all of them execute.

This is too coarse grained, I think. The issue is that we want to be able to trigger a schema change in a specific moments during already running schema reloading process.

This triggered some thinking and going into the rabbit hole...

I would say it would be useful if we had a debugging facility allowing us to perform step-by-step execution of sql statements, as long as it does not require very intrusive changes in the code.

wolfgangwalther · 2026-05-30T11:37:36Z

This is too coarse grained, I think. The issue is that we want to be able to trigger a schema change in a specific moments during already running schema reloading process.

After looking at the thread, it looks like we only need a way to block towards the end of the schema cache reload, right?

I assume that PostgREST loads the schema cache in a single transaction in order, so timezones - if enabled - always come last? In that case an exclusive access lock on pg_timezone_names should ensure that the first schema cache reload loads all the actual table/function data and then waits. Then the second one does the same with the new schema. Then you release both and test that the data is up2date for the second reload. Is that enough to rewrite the existing test while throwing that internal sleep thing away?

mkleczek · 2026-05-30T11:46:44Z

This is too coarse grained, I think. The issue is that we want to be able to trigger a schema change in a specific moments during already running schema reloading process.

After looking at the thread, it looks like we only need a way to block towards the end of the schema cache reload, right?

I assume that PostgREST loads the schema cache in a single transaction in order, so timezones - if enabled - always come last? In that case an exclusive access lock on pg_timezone_names should ensure that the first schema cache reload loads all the actual table/function data and then waits. Then the second one does the same with the new schema. Then you release both and test that the data is up2date for the second reload. Is that enough to rewrite the existing test while throwing that internal sleep thing away?

I did consider doing it that way but I don't really like it because:

it makes the test bound to internals of querySchemaCache (in particular order of queries) which in turn makes the test brittle
it only covers this particular test, which in turn is not really a good test - see this comment: https://github.com/PostgREST/postgrest/pull/4880/changes#r3312439266

I'd say that using access locks on catalog tables is in reality a hack providing a "poor man" step-by-step query execution.

This PR is an attempt to provide this kind of infrastructure. This infrastructure (in the new SqlTransaction module) is independent from the way how it is used. In this PR it is guarded by config property (replacing existing config).

wolfgangwalther · 2026-05-30T12:17:36Z

it makes the test bound to internals of querySchemaCache (in particular order of queries) which in turn makes the test brittle

That's true, but I consider a test that is bound to internals better than a test that requires changing internals.

it only covers this particular test, which in turn is not really a good test - see this comment: https://github.com/PostgREST/postgrest/pull/4880/changes#r3312439266

I understood that this deficiency would be covered by (additionally) checking the explicit number of schema cache reloads that happened, right?

So the thing we're really trying to do is to get rid of that internal sleep thing. Which the locking would do.

I'd say that using access locks on catalog tables is in reality a hack providing a "poor man" step-by-step query execution.

Sure, I agree on that. Tests are full of hacks, though - and it just doesn't help to avoid hacks, but to then not run the actual code that runs in production.

Now if we had tons of tests that make use of this infrastructure, it could make sense to think about how to implement it in a mostly transparent way - but we're talking about a single test here and I have not heard about any other ideas or requirements for other tests in this area. I might have missed that part, though.

steve-chavez · 2026-06-01T17:54:01Z

@@ -0,0 +1,201 @@
+{-# LANGUAGE AllowAmbiguousTypes        #-}


Just looked at the code in this file and it's quite a lot. Is this really needed for this new test machinery?

It is needed to have type level labeling of statements, like:

tabs <- SQL.statement @Tables conf allTables keyDeps <- SQL.statement @KeyDependencies conf allViewsKeyDependencies m2oRels <- SQL.statement @Relationships mempty allM2OandO2ORels

Label in the signature of SQL.statement is not present in any of the parameter types:
statement :: forall {k} (label :: k) m a b. (SqlTransaction k m, LabelConstraint k label) => a -> SQL.Statement a b -> m b - so the type of the label cannot be inferred from the arguments alone - hence AllowAmbiguousTypes.

I want to the statements to be labeled like this so that the labels are stable even if you change the underlying mechanism to execute them.
For example: you can add individual statement timing and step-by-step execution to any transaction transparently - to validate the concept I've implemented two PoCs:

step-by-step execution of queries during request handling

individual query timings added to Server-Timing header

Bear with me - I am still working on it and trying to come up with the best (ie. most ergonomic and stable) API. So it might change.

mkleczek marked this pull request as draft May 28, 2026 05:15

mkleczek mentioned this pull request May 28, 2026

fix: Start listening after schema cache load #4880

Open

mkleczek requested a review from steve-chavez May 28, 2026 05:28

mkleczek force-pushed the push-moumrvslwqvy branch from 81d9e4d to 0c9d5a7 Compare May 28, 2026 05:36

steve-chavez reviewed May 28, 2026

View reviewed changes

Comment thread src/PostgREST/SchemaCache.hs Outdated

taimoorzaeem reviewed May 29, 2026

View reviewed changes

mkleczek force-pushed the push-moumrvslwqvy branch 2 times, most recently from 32a2995 to 0e7cf88 Compare May 29, 2026 14:06

steve-chavez requested a review from wolfgangwalther May 29, 2026 16:57

mkleczek force-pushed the push-moumrvslwqvy branch 5 times, most recently from ec5270c to 5bc5cdb Compare May 30, 2026 11:07

wolfgangwalther reviewed May 30, 2026

View reviewed changes

mkleczek force-pushed the push-moumrvslwqvy branch 8 times, most recently from 0bbf723 to 57ddd27 Compare May 31, 2026 21:32

steve-chavez reviewed Jun 1, 2026

View reviewed changes

mkleczek force-pushed the push-moumrvslwqvy branch from 57ddd27 to 38137e9 Compare June 2, 2026 07:13

steve-chavez mentioned this pull request Jun 3, 2026

fix: request failures when work_mem is set on a role #4977

Merged

mkleczek added 2 commits June 5, 2026 07:56

refactor: add SqlTransaction module

89bee3d

refactor: add internal-schema-cache-lock-id

2766b62

mkleczek force-pushed the push-moumrvslwqvy branch from 38137e9 to 2766b62 Compare June 5, 2026 11:14

		-- if configInternalSCLockId is set run queries step-by-step waiting for lock release before each
		SQL.runSteppedTransaction @SchemaCacheLabel configInternalSCLockId $

Uh oh!

Conversation

mkleczek commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

steve-chavez commented May 28, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wolfgangwalther left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mkleczek commented May 30, 2026

Uh oh!

wolfgangwalther commented May 30, 2026

Uh oh!

mkleczek commented May 30, 2026

Uh oh!

wolfgangwalther commented May 30, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mkleczek Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

4 participants

mkleczek commented May 28, 2026 •

edited

Loading

mkleczek Jun 2, 2026 •

edited

Loading