Skip to content

refactor: add internal-schema-cache-lock-id#4962

Draft
mkleczek wants to merge 2 commits into
PostgREST:mainfrom
mkleczek:push-moumrvslwqvy
Draft

refactor: add internal-schema-cache-lock-id#4962
mkleczek wants to merge 2 commits into
PostgREST:mainfrom
mkleczek:push-moumrvslwqvy

Conversation

@mkleczek
Copy link
Copy Markdown
Collaborator

@mkleczek mkleczek commented May 28, 2026

This change adds possibility of step-by-step execution of SQL statements in querySchemaCache function.
If a new internal-schema-cache-lock-id config is set (to an integer value), querySchemaCache will execute SELECT pg_advisory_xact_lock(<internal-schema-cache-lock-id>, <next_lock_number>) before each query (next_lock_number starts from 0 and is incremented after each statement).

That way it is possible to control the execution of queries in querySchemaCache, eg:

In a database session:

SELECT pg_advisory_lock(<internal-schema-cache-lock-id>, lock_number) FROM generate_series(0, 10) AS lock_number

Trigger schema cache loading...

In the database session:

-- Unblock running first query in querySchemaCache
SELECT pg_advisory_unlock(<internal-schema-cache-lock-id>, 0)

Run some actions before querySchemaCache executes second query.

-- Unblock running the second query in querySchemaCache
SELECT pg_advisory_unlock(<internal-schema-cache-lock-id>, 1)

... etc.

@mkleczek mkleczek marked this pull request as draft May 28, 2026 05:15
@mkleczek mkleczek requested a review from steve-chavez May 28, 2026 05:28
@mkleczek mkleczek force-pushed the push-moumrvslwqvy branch from 81d9e4d to 0c9d5a7 Compare May 28, 2026 05:36
Comment thread test/io/conftest.py Outdated
"Slow schema cache load environment PostgREST."
return {
**defaultenv,
"PGRST_INTERNAL_SCHEMA_CACHE_LOCK_ID": "1111",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Do we really need to specify an id externally? Seems like it could just be a boolean and we generate the id internally?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-- Unblock running first query in querySchemaCache
SELECT pg_advisory_unlock(, 0)

Ah, I guess we couldn't do the above if we generate the id internally. The idea is to run the unlocks during testing right?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkleczek I think we can make this clearer and just have a boolean config by adding a wrapper SQL function.

So our schema cache queries have a sort of name id:

28/May/2026:13:05:53 -0500: tables: 50.432 ms, keydeps: 29.778 ms, rels: 8.315 ms, funcs: 48.383 ms, comprels: 5.557 ms, dreps: 2.049 ms, mhandlers: 5.745 ms, tzones: 39.676 ms

So we could have:

SELECT pgrst_scache_advisory_unlock('tables')
SELECT pgrst_scache_advisory_unlock('keydeps')
SELECT pgrst_scache_advisory_unlock('rels')
-- ...

The wrapper func would have to internally translate those to hardcoded lock ids, which I think it's safe to do for our tests.

WDYT?


Also, do we really need to unlock them individually or would a pg_advisory_unlock_all be enough? If so maybe we don't need to know the ids on the tests.

Comment thread src/PostgREST/SchemaCache.hs Outdated
@steve-chavez
Copy link
Copy Markdown
Member

This looks like a great idea 💯. Left some comments.

Comment thread src/PostgREST/SchemaCache.hs Outdated
, dbQueryTimings = qsTime
}
(`evalStateT` 0) $ do
tabs <- sqlWithLock $ sqlTimedStmt gucTbls conf allTables
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be good to add a comment here on why have added sqlWithLock here? It would help whoever touches the code here later.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also be able to remove the evalStateT if we use hardcoded ids plus a wrapper as I mentioned above.

@mkleczek mkleczek force-pushed the push-moumrvslwqvy branch 2 times, most recently from 32a2995 to 0e7cf88 Compare May 29, 2026 14:06
@mkleczek mkleczek force-pushed the push-moumrvslwqvy branch 5 times, most recently from ec5270c to 5bc5cdb Compare May 30, 2026 11:07
Copy link
Copy Markdown
Member

@wolfgangwalther wolfgangwalther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I admit, I have not followed the whole discussion about that internal schema sleep thing and the related tests, but do we really need to be able to step through the schema cache queries? Or would it be enough to somehow block any schema cache from loading for a while?

I just tested a LOCK TABLE pg_class IN ACCESS EXCLUSIVE MODE;, which works as expected - you can't select from pg_class anymore while a different transaction holds this lock. This should at least allow to queue up some schema cache reloads and see whether all of them execute.

Comment on lines +169 to +170
-- if configInternalSCLockId is set run queries step-by-step waiting for lock release before each
SQL.runSteppedTransaction @SchemaCacheLabel configInternalSCLockId $
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we are changing the way PostgREST operates depending on whether we are in a test-case or not?

Hard no from me. We really want to test the actual code we're running in production and not something that is just running that way in tests.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we are changing the way PostgREST operates depending on whether we are in a test-case or not?

We already have it: internal-schema-cache-query-sleep.
This PR goal is to replace it with much more capable internal-schema-cache-lock-id

Hard no from me. We really want to test the actual code we're running in production and not something that is just running that way in tests.

That's a noble goal but some tests - especially concurrency related - cannot be really written without certain features available (eg. suspending computation at specific moments in time etc.). If you are able to provide such facilities without cooperation from the program, I am all for using them. So far nobody came up with anything like that.

@mkleczek
Copy link
Copy Markdown
Collaborator Author

I admit, I have not followed the whole discussion about that internal schema sleep thing and the related tests, but do we really need to be able to step through the schema cache queries? Or would it be enough to somehow block any schema cache from loading for a while?

See this discussion: https://github.com/PostgREST/postgrest/pull/4880/changes#r3228963615

This is the reason why we even discuss this PR :)

I just tested a LOCK TABLE pg_class IN ACCESS EXCLUSIVE MODE;, which works as expected - you can't select from pg_class anymore while a different transaction holds this lock. This should at least allow to queue up some schema cache reloads and see whether all of them execute.

This is too coarse grained, I think. The issue is that we want to be able to trigger a schema change in a specific moments during already running schema reloading process.

This triggered some thinking and going into the rabbit hole...

I would say it would be useful if we had a debugging facility allowing us to perform step-by-step execution of sql statements, as long as it does not require very intrusive changes in the code.

@wolfgangwalther
Copy link
Copy Markdown
Member

This is too coarse grained, I think. The issue is that we want to be able to trigger a schema change in a specific moments during already running schema reloading process.

After looking at the thread, it looks like we only need a way to block towards the end of the schema cache reload, right?

I assume that PostgREST loads the schema cache in a single transaction in order, so timezones - if enabled - always come last? In that case an exclusive access lock on pg_timezone_names should ensure that the first schema cache reload loads all the actual table/function data and then waits. Then the second one does the same with the new schema. Then you release both and test that the data is up2date for the second reload. Is that enough to rewrite the existing test while throwing that internal sleep thing away?

@mkleczek
Copy link
Copy Markdown
Collaborator Author

This is too coarse grained, I think. The issue is that we want to be able to trigger a schema change in a specific moments during already running schema reloading process.

After looking at the thread, it looks like we only need a way to block towards the end of the schema cache reload, right?

I assume that PostgREST loads the schema cache in a single transaction in order, so timezones - if enabled - always come last? In that case an exclusive access lock on pg_timezone_names should ensure that the first schema cache reload loads all the actual table/function data and then waits. Then the second one does the same with the new schema. Then you release both and test that the data is up2date for the second reload. Is that enough to rewrite the existing test while throwing that internal sleep thing away?

I did consider doing it that way but I don't really like it because:

I'd say that using access locks on catalog tables is in reality a hack providing a "poor man" step-by-step query execution.

This PR is an attempt to provide this kind of infrastructure. This infrastructure (in the new SqlTransaction module) is independent from the way how it is used. In this PR it is guarded by config property (replacing existing config).

@wolfgangwalther
Copy link
Copy Markdown
Member

it makes the test bound to internals of querySchemaCache (in particular order of queries) which in turn makes the test brittle

That's true, but I consider a test that is bound to internals better than a test that requires changing internals.

it only covers this particular test, which in turn is not really a good test - see this comment: https://github.com/PostgREST/postgrest/pull/4880/changes#r3312439266

I understood that this deficiency would be covered by (additionally) checking the explicit number of schema cache reloads that happened, right?

So the thing we're really trying to do is to get rid of that internal sleep thing. Which the locking would do.

I'd say that using access locks on catalog tables is in reality a hack providing a "poor man" step-by-step query execution.

Sure, I agree on that. Tests are full of hacks, though - and it just doesn't help to avoid hacks, but to then not run the actual code that runs in production.

Now if we had tons of tests that make use of this infrastructure, it could make sense to think about how to implement it in a mostly transparent way - but we're talking about a single test here and I have not heard about any other ideas or requirements for other tests in this area. I might have missed that part, though.

@mkleczek mkleczek force-pushed the push-moumrvslwqvy branch 8 times, most recently from 0bbf723 to 57ddd27 Compare May 31, 2026 21:32
@@ -0,0 +1,201 @@
{-# LANGUAGE AllowAmbiguousTypes #-}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just looked at the code in this file and it's quite a lot. Is this really needed for this new test machinery?

Copy link
Copy Markdown
Collaborator Author

@mkleczek mkleczek Jun 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed to have type level labeling of statements, like:

tabs <- SQL.statement @Tables conf allTables
keyDeps <- SQL.statement @KeyDependencies conf allViewsKeyDependencies
m2oRels <- SQL.statement @Relationships mempty allM2OandO2ORels

Label in the signature of SQL.statement is not present in any of the parameter types:
statement :: forall {k} (label :: k) m a b. (SqlTransaction k m, LabelConstraint k label) => a -> SQL.Statement a b -> m b - so the type of the label cannot be inferred from the arguments alone - hence AllowAmbiguousTypes.

I want to the statements to be labeled like this so that the labels are stable even if you change the underlying mechanism to execute them.
For example: you can add individual statement timing and step-by-step execution to any transaction transparently - to validate the concept I've implemented two PoCs:

  • step-by-step execution of queries during request handling
  • individual query timings added to Server-Timing header

Bear with me - I am still working on it and trying to come up with the best (ie. most ergonomic and stable) API. So it might change.

@mkleczek mkleczek force-pushed the push-moumrvslwqvy branch from 38137e9 to 2766b62 Compare June 5, 2026 11:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

4 participants