Skip to content

test: negative pgrst_db_pool_available in metrics#4928

Merged
steve-chavez merged 1 commit into
PostgREST:mainfrom
steve-chavez:negative-connection
May 17, 2026
Merged

test: negative pgrst_db_pool_available in metrics#4928
steve-chavez merged 1 commit into
PostgREST:mainfrom
steve-chavez:negative-connection

Conversation

@steve-chavez

@steve-chavez steve-chavez commented May 15, 2026

Copy link
Copy Markdown
Member

Proves the failure on #4622.

This doesn't require additional test infra, only nginx. Taking advantage of the stream {} context which is also compatible with unix socket besides TCP.

Same idea as in #4622 (comment) but using unix socket.

@steve-chavez steve-chavez force-pushed the negative-connection branch 4 times, most recently from 04727cc to 7639520 Compare May 16, 2026 01:36
@steve-chavez

Copy link
Copy Markdown
Member Author

@wolfgangwalther Compared to #4855, this test is to the point (shorter/passes coverage) and if #4868 will make it irrelevant it will be easy to delete later IMO.

@steve-chavez steve-chavez force-pushed the negative-connection branch from 7639520 to deccad7 Compare May 16, 2026 02:02
Comment thread test/io/test_io.py Outdated
Comment thread test/io/nginx/nginx.conf Outdated
@wolfgangwalther

Copy link
Copy Markdown
Member

Compared to #4855, this test is to the point (shorter/passes coverage) and if #4868 will make it irrelevant it will be easy to delete later IMO.

Let's recap the problems I see with #4855 (including the two preliminary PRs introducing toxiproxy):

  • I think the observability test suite is wrong in general. Not a fault of that PR, though.
  • While I think toxiproxy is the right tool for the job, the specific way of introducing it had limitations that would soon be visible when writing more tests.
  • I believe the executable should be tested in its entirety for that test-case and not just some parts of code via haskell.

This PR tests the full executable and doesn't extend on the observability test-suite. Cool.

I will have to say, though, that using nginx in this way - while an interesting idea - is a huge step backwards from the toxiproxy approach. This is just not the right tool for the job. Nobody will use nginx as a proxy between PostgREST and PostgreSQL ever. This is entirely made up.

Now, you're essentially saying: In light of #4868 both of these approaches would be throw-away code and this one has less impact on testing infrastructure over all. I see that argument, yes, so I won't block on the addition. But I'll also have to say I can't happily approve such a hack either :)

If this helps unblocking #4672, please go ahead.

Proves the failure on PostgREST#4622.

This doesn't require additional test infra, only nginx. Taking advantage
of the `stream {}` context which is also compatible with unix socket
besides TCP.
@steve-chavez steve-chavez force-pushed the negative-connection branch from deccad7 to 3690c85 Compare May 17, 2026 16:29
@steve-chavez

steve-chavez commented May 17, 2026

Copy link
Copy Markdown
Member Author

This is just not the right tool for the job. Nobody will use nginx as a proxy between PostgREST and PostgreSQL ever. This is entirely made up.

I'd argue Nginx is way underused as TCP/unix socket proxy, all the lua directives of openresty (e.g. set_by_lua_block) don't work in the stream {} context, which indicates this low usage since openresty is a popular framework for Nginx plugin development. Since this is purely a test setup, I don't think it matters whether it's used in prod or not.

But sure, this can be temporary/throwaway so #4672 is not merged without complying to our own rule here:

All fixes or features must have a test proving the improvement.
https://github.com/PostgREST/postgrest/blob/main/CONTRIBUTING.md#code

I see that argument, yes, so I won't block on the addition. But I'll also have to say I can't happily approve such a hack either :)

Can't approve my own PR (I remember I used to be able to do this 😮) so I'll have to request other reviews.

@wolfgangwalther

Copy link
Copy Markdown
Member

Can't approve my own PR (I remember I used to be able to do this 😮) so I'll have to request other reviews.

You can just check the red box "Merge without waiting for requirements to be met".

@steve-chavez steve-chavez merged commit 70327cf into PostgREST:main May 17, 2026
53 of 54 checks passed
@wolfgangwalther

Copy link
Copy Markdown
Member

In https://github.com/PostgREST/postgrest/actions/runs/26001265044/job/76424775709?pr=4902, I get:

FAILED test/io/test_io.py::test_positive_pool_metric - [XPASS(strict)] pgrst_db_pool_available should not go negative on pg network failures

Does that mean the test does not reliably reproduce the issue and accidentally passed in this case?

@steve-chavez

Copy link
Copy Markdown
Member Author

Fixed on 8868d13. It just needed more sleep time, originally I put 4 seconds but I didn't want the test to take too long. We can also get rid of the sleep by using a wait_until_status_code for Nginx to start and then another for the pool metric to go negative, but I didn't want to add more python helpers.

Does that mean the test does not reliably reproduce the issue and accidentally passed in this case?

Well, that seems too harsh, it already passed multiple times and just failed once. It might be flaky but I wouldn't frame it as "accidentally".

@wolfgangwalther

Copy link
Copy Markdown
Member

Does that mean the test does not reliably reproduce the issue and accidentally passed in this case?

Well, that seems too harsh, it already passed multiple times and just failed once. It might be flaky but I wouldn't frame it as "accidentally".

No, I mean it passed (instead of the expected fail) accidentally in my PR.

@wolfgangwalther

wolfgangwalther commented May 18, 2026

Copy link
Copy Markdown
Member

Fixed on 8868d13.

Seems like it's not fixed, yet. I just got another "failure" (unexpected pass) looking the same in https://github.com/PostgREST/postgrest/actions/runs/26036288673/job/76535229229?pr=4935, which ran the most recent code.

@wolfgangwalther

Copy link
Copy Markdown
Member

We can also get rid of the sleep by using a wait_until_status_code for Nginx to start and then another for the pool metric to go negative, but I didn't want to add more python helpers.

Depending on timing instead of waiting for some event to happen is bad, see #3424. So I'm much in favor of doing this properly.

@mkleczek

mkleczek commented May 18, 2026

Copy link
Copy Markdown
Collaborator

We can also get rid of the sleep by using a wait_until_status_code for Nginx to start and then another for the pool metric to go negative, but I didn't want to add more python helpers.

Depending on timing instead of waiting for some event to happen is bad, see #3424. So I'm much in favor of doing this properly.

I don't wanna be that guy but... withToxiProxy would make it soo easy to use toxiproxy-cli to manage this kind of proxies (see https://github.com/shopify/toxiproxy#cli-example). Oh, well ;-)

@steve-chavez

Copy link
Copy Markdown
Member Author

Depending on timing instead of waiting for some event to happen is bad, see #3424. So I'm much in favor of doing this properly.

Solution was waiting for the liveness endpoint instead of readiness, did that on 1eba985 and 6220ab3, the sleep is gone. In hindsight I should have pushed those before merging #4672 so we can double check on CI, but I confirmed locally the test did failed fast.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants