fix: implement connection tracking in metrics #4672
The fix should have an entry in the CHANGELOG.
@mkleczek Since this is a |
@steve-chavez @wolfgangwalther - what do you think is required to push this forward? #4622 is now being reported by our support and it is becoming urgent to fix it.
This does not introduce any new test infrastructure that I'd be opposed to, so I won't block on the long term vision of how our tests should be structured. It uses the existing infrastructure. I might not like the way the test is written, but I don't see a need to block on that either.

To be clear, the question in #4672 (comment) was asked to get a feeling of how things could be done differently, if we had better test infrastructure elsewhere - and not to block this PR's progress.

Imho the only previously blocking comment is #4672 (comment). Now, since I wrote that comment, I started a major discussion on how we should test in general, which is blocking the other, test-infra related, issues/PRs. I don't think we should hold this PR hostage to that either.

TLDR: No blockers for me.
It's gonna be really confusing when we look back in history and we say we fixed #4622 and there isn't a precise test proving it (the current test does not). Let's not set a precedent here that can later hurt us with tech debt, so we should first clear the above thread.
The thread you linked is about the currently existing test, so that doesn't quite match the first sentence about a missing test. If you're concerned about a missing test, then we need to clear #4672 (comment).

I'd still say the situation now is different compared to when #1766 happened - we are actively working on improving the test situation and we have an open PR to track the addition of the herein-missing test. Thus, I'd say the risk of this getting forgotten is much smaller than earlier. I'd say we should go ahead with this.
@wolfgangwalther Let's not merge yet: I have a much simpler test almost ready for a PR. Let's merge this after that.
No worries, I don't intend to merge. I just wanted to make my implicit approval explicit. I am well aware that you still have a thread open (this is now actually blocking the merge as well) - and I'm not just going to override you and resolve that thread. That's for you to decide :)
Right now the metrics observation handler does not track database connections but updates a single Gauge based on HasqlPoolObs events. This is problematic because the Hasql pool reports various connection events in multiple phases. The connection state machine is not simple, and to report the number of connections in each state precisely, it is necessary to track their lifecycles. This change adds a ConnTrack data structure and logic to track database connection lifecycles. At the moment it supports the "connected" and "inUse" connection counts precisely. The "pgrst_db_pool_available" metric is implemented on top of ConnTrack instead of a simple Gauge.
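To illustrate the idea of tracking lifecycles per connection, here is a minimal Haskell sketch. It is not the PR's implementation: every name in it (`ConnState`, `ConnEvent`, `observe`, `replay`, `available`) is hypothetical, the event type only loosely stands in for whatever HasqlPoolObs carries, and defining "available" as connected minus in-use is an assumption made for the example.

```haskell
import           Data.List       (foldl')
import qualified Data.Map.Strict as M

-- Hypothetical per-connection states (illustrative names, not PostgREST's).
data ConnState = Connecting | Idle | InUse
  deriving (Eq, Show)

-- Hypothetical lifecycle events, standing in for pool observations.
data ConnEvent = EvConnecting | EvReady | EvInUse | EvTerminated
  deriving (Eq, Show)

-- Assumption: each connection can be identified by some key.
type ConnId = Int

-- The tracker remembers the last known state of every live connection.
newtype ConnTrack = ConnTrack (M.Map ConnId ConnState)

emptyTrack :: ConnTrack
emptyTrack = ConnTrack M.empty

-- Apply one event. The state is overwritten, not incremented, so a
-- repeated or unexpected event cannot skew the derived counts.
observe :: ConnId -> ConnEvent -> ConnTrack -> ConnTrack
observe cid ev (ConnTrack m) = ConnTrack $ case ev of
  EvConnecting -> M.insert cid Connecting m
  EvReady      -> M.insert cid Idle m
  EvInUse      -> M.insert cid InUse m
  EvTerminated -> M.delete cid m

-- Replay a list of (connection, event) pairs from an empty tracker.
replay :: [(ConnId, ConnEvent)] -> ConnTrack
replay = foldl' (\t (cid, ev) -> observe cid ev t) emptyTrack

-- Connections that finished connecting (idle or in use).
connected :: ConnTrack -> Int
connected (ConnTrack m) = M.size (M.filter (/= Connecting) m)

-- Connections currently handed out to a session.
inUse :: ConnTrack -> Int
inUse (ConnTrack m) = M.size (M.filter (== InUse) m)

-- Assumed definition for this sketch: established connections not in use.
available :: ConnTrack -> Int
available t = connected t - inUse t
```

Because the counts are derived from the per-connection map, a connection that terminates before it ever becomes ready, or that reports termination twice, cannot push `available` below zero. A single Gauge that is incremented and decremented per event has no such guarantee.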
### Fixed

- Fix unnecessary connection pool flushes during schema cache reloading by @mkleczek in #4645
- Fix race condition in pool_available metric causing negative values during network instability by @mkleczek in #4622
@mkleczek This was added in an old version Fixed section https://github.com/PostgREST/postgrest/blob/main/CHANGELOG.md#fixed-2 😕
Ehh... rebasing changelog is inherently tricky. My bad.
Raised #4942
Yup, sounds good.
DISCLAIMER:
This commit was authored entirely by a human without the assistance of LLMs.

Right now the metrics observation handler does not track database connections but updates a single Gauge based on HasqlPoolObs events.
This is problematic because the Hasql pool reports various connection events in various states, which makes it impossible to predict the state change from the received event alone. The connection state machine is not simple, and to report the number of connections in each state precisely, it is necessary to track their lifecycles.
Fixes #4622