2 changes: 2 additions & 0 deletions CHANGELOG.md
Expand Up @@ -15,12 +15,14 @@ All notable changes to this project will be documented in this file. From versio
- Add config `db-timezone-enabled` for optional querying of timezones by @taimoorzaeem in #4751
- Log schema cache queries timings on `log-level=debug` by @steve-chavez in #4805
- Add GHC runtime metrics to the metrics endpoint by @mkleczek in #4862
- Enable starting multiple PostgREST instances using the same ports on platforms supporting it by @mkleczek in #4703 #4694

### Fixed

- Shutdown should wait for in-flight requests by @mkleczek in #4702
- Remove automatic transaction retries on `40001 (serialization_failure)` errors to prevent replication lag by @laurenceisla in #3673
- Fix unexpected results when embedding and filtering the same table more than once by @laurenceisla in #4075
- Stop reporting 503 errors unnecessarily while the schema cache is loading at startup by @mkleczek in #4880

### Changed

164 changes: 164 additions & 0 deletions docs/how-tos/zero-downtime-upgrades.rst
@@ -0,0 +1,164 @@
.. _zero_downtime_upgrades:

Zero-Downtime Upgrades
======================

Review comment: We've been doing this for almost all how-tos.

Suggested change:

:author: `mkleczek <https://github.com/mkleczek>`_

When :ref:`server-reuseport` is enabled on an operating system that supports
``SO_REUSEPORT``, PostgREST can start more than one process on the same
:ref:`server-host` and :ref:`server-port`. This allows a new PostgREST process
to start and become ready before the old process is stopped.

While both processes are running, the operating system distributes new
connections between them. After the old process exits, the new process receives
all new connections.

This is useful for upgrades and restarts:

1. Keep the old PostgREST process serving requests.
2. Start the new PostgREST process on the same host and port.
3. Wait for the new process to report ``/ready``.
4. Stop the old process.

Configuration
-------------

Both processes should use the same public host and port:

.. code-block:: ini

  # /etc/postgrest/postgrest.conf
  server-host = "127.0.0.1"
  server-port = 3000
  server-reuseport = true

  admin-server-host = "127.0.0.1"
  admin-server-port = 3001

The second process can use the same configuration file and override only the
admin server port:

.. code-block:: bash

  PGRST_ADMIN_SERVER_PORT=3002 postgrest /etc/postgrest/postgrest.conf

.. important::

  Use a different :ref:`admin-server-port` for each PostgREST process during
  the handover. Admin ports are not shared between processes. This keeps
  readiness checks unambiguous: ``/ready`` on the new admin port can only be
  answered by the new process.

Before using this in production, keep these details in mind:

- This works for host and port based servers. It does not apply when
  :ref:`server-unix-socket` is used.
- If :ref:`server-reuseport` is disabled, the new process will fail to start
  with an address-in-use error and the old process will keep serving requests.
- If :ref:`server-reuseport` is enabled on an operating system that does not
  support ``SO_REUSEPORT``, PostgREST will fail to start because the
  configuration is not supported on that platform.
- If the new process uses the same :ref:`admin-server-port` as the old process,
  it will fail to start because that admin port is already in use.
- Each PostgREST process has its own :ref:`db-pool`. During the handover, the
  total possible database connections can temporarily double.
- The old and new processes may both serve requests for a short time. Database
  migrations should be compatible with both versions while they overlap.
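
The connection-pool point in the list above can be checked with simple
arithmetic before a handover. The following is a minimal sketch, not part of
PostgREST: the helper names ``peak_db_connections`` and
``handover_fits_budget`` and the example values are assumptions chosen for
illustration. It multiplies the configured pool size by the number of
overlapping processes and compares the result with a connection budget that
you supply.

```shell
#!/usr/bin/env bash

# Hypothetical helper: worst-case number of database connections while
# several PostgREST processes (two by default) run side by side.
peak_db_connections() {
  local pool_size=$1 processes=${2:-2}
  echo $(( pool_size * processes ))
}

# Hypothetical helper: succeeds only if the peak during a two-process
# handover stays within the given connection budget.
handover_fits_budget() {
  local pool_size=$1 budget=$2
  (( $(peak_db_connections "$pool_size" 2) <= budget ))
}

# Example values only; use your own db-pool setting and database limit.
peak_db_connections 10
if handover_fits_budget 10 15; then
  echo "handover fits the budget"
else
  echo "handover would exceed the budget"
fi
```

With a pool size of 10 and a budget of 15 connections, the sketch reports that
the handover would exceed the budget, because both processes together may open
up to 20 connections.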

Manual Handover
---------------

Assuming the old process is already serving on ``127.0.0.1:3000`` and its PID
is stored in ``OLD_PID``:

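.. code-block:: bash

  PGRST_ADMIN_SERVER_PORT=3002 postgrest /etc/postgrest/postgrest.conf &
  NEW_PID=$!

  # Poll the new admin port; stop the old process only once /ready succeeds.
  curl --fail --retry 30 --retry-delay 1 --retry-connrefused \
    http://127.0.0.1:3002/ready && kill -TERM "$OLD_PID"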

The ``curl`` request checks the new process through its own admin server port.
If the new process cannot load its configuration, connect to the database, or
load the schema cache, ``/ready`` will not return a successful response and the
old process can keep serving traffic.
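
When a handover fails, it can help to look at the status code that ``/ready``
returned instead of only checking for success. The sketch below maps status
codes to a short label, based on the ``ready`` branch of the ``Admin.hs``
change in this pull request (``503`` while the schema cache is pending, ``200``
once it is loaded, ``500`` otherwise). The helper name ``ready_state`` and the
labels are assumptions made for illustration, and the ``000`` case relies on
``curl --write-out '%{http_code}'`` printing ``000`` when no HTTP response was
received.

```shell
#!/usr/bin/env bash

# Hypothetical helper: translate a /ready HTTP status code into a label.
ready_state() {
  case "$1" in
    200) echo ready ;;        # schema cache loaded and main app reachable
    503) echo pending ;;      # schema cache still loading
    500) echo failing ;;      # main app unreachable or schema cache not loaded
    000) echo unreachable ;;  # curl got no HTTP response at all
    *)   echo unknown ;;      # anything the sketch does not know about
  esac
}

# Print the label for each status code handled above.
for code in 200 503 500 000; do
  printf '%s -> %s\n' "$code" "$(ready_state "$code")"
done
```

The status code itself can be obtained with
``curl --silent --output /dev/null --write-out '%{http_code}' http://127.0.0.1:3002/ready``,
using the admin port from the example above.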

Example Script
--------------

The following script shows the full sequence for a setup that stores the old
process PID in a PID file. Adapt the start and stop commands to your process
manager.

.. code-block:: bash

  #!/usr/bin/env bash
  set -euo pipefail

  POSTGREST=${POSTGREST:-postgrest}
  CONFIG=${CONFIG:-/etc/postgrest/postgrest.conf}
  PID_FILE=${PID_FILE:-/run/postgrest.pid}

  ADMIN_HOST=${ADMIN_HOST:-127.0.0.1}
  NEW_ADMIN_PORT=${NEW_ADMIN_PORT:-3002}
  READY_TIMEOUT=${READY_TIMEOUT:-30}
  STOP_TIMEOUT=${STOP_TIMEOUT:-30}

  if [[ ! -s "$PID_FILE" ]]; then
    echo "PID file not found or empty: $PID_FILE" >&2
    exit 1
  fi

  OLD_PID=$(<"$PID_FILE")

  if ! kill -0 "$OLD_PID" 2>/dev/null; then
    echo "Old PostgREST process is not running: $OLD_PID" >&2
    exit 1
  fi

  PGRST_ADMIN_SERVER_HOST="$ADMIN_HOST" \
  PGRST_ADMIN_SERVER_PORT="$NEW_ADMIN_PORT" \
  "$POSTGREST" "$CONFIG" &
  NEW_PID=$!

  # Until the handover completes, any exit of this script stops the new process.
  cleanup_new_process() {
    kill "$NEW_PID" 2>/dev/null || true
  }
  trap cleanup_new_process EXIT INT TERM

  READY_URL="http://$ADMIN_HOST:$NEW_ADMIN_PORT/ready"
  READY_DEADLINE=$((SECONDS + READY_TIMEOUT))

  until curl --fail --silent --show-error --output /dev/null "$READY_URL"; do
    if ! kill -0 "$NEW_PID" 2>/dev/null; then
      echo "New PostgREST process exited before it became ready" >&2
      exit 1
    fi

    if (( SECONDS >= READY_DEADLINE )); then
      echo "New PostgREST process did not become ready at $READY_URL" >&2
      exit 1
    fi

    sleep 1
  done

  printf '%s\n' "$NEW_PID" > "$PID_FILE"

  kill -TERM "$OLD_PID" 2>/dev/null || true

  STOP_DEADLINE=$((SECONDS + STOP_TIMEOUT))

  while kill -0 "$OLD_PID" 2>/dev/null; do
    if (( SECONDS >= STOP_DEADLINE )); then
      echo "Old PostgREST process did not stop after SIGTERM; sending SIGKILL" >&2
      # Ignore a failure here: the old process may have exited in the meantime,
      # and a non-zero status would otherwise trigger the EXIT trap.
      kill -KILL "$OLD_PID" 2>/dev/null || true
      break
    fi

    sleep 1
  done

  trap - EXIT INT TERM
  echo "PostgREST handover complete: $OLD_PID -> $NEW_PID"
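
The script above starts the new process on ``NEW_ADMIN_PORT`` (``3002`` by
default), so a second handover with the same value would collide with the admin
port of the process started by the first one. One possible way around this,
sketched here under the assumption that two admin ports are reserved for
PostgREST, is to alternate between them. The helper name ``next_admin_port``
and the default port pair are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: given the admin port of the running process, print the
# other port of a reserved pair (3001 and 3002 unless overridden).
next_admin_port() {
  local current=$1 port_a=${2:-3001} port_b=${3:-3002}
  if [[ "$current" == "$port_a" ]]; then
    echo "$port_b"
  else
    echo "$port_a"
  fi
}

# Example: the running process uses 3002, so the replacement gets 3001.
next_admin_port 3002
```

The result can be passed to the script through its ``NEW_ADMIN_PORT`` variable,
provided the admin port of the running process is recorded somewhere, for
example next to the PID file.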
4 changes: 4 additions & 0 deletions docs/postgrest.dict
Expand Up @@ -34,6 +34,7 @@ DSL
DevOps
Dramatiq
dockerize
downtime
enum
Enums
Entra
Expand All @@ -59,6 +60,7 @@ HMAC
htmx
Htmx
Homebrew
handover
hstore
HTTP
HTTPS
Expand Down Expand Up @@ -113,6 +115,7 @@ ov
parametrized
passphrase
PBKDF
PID
PgBouncer
pgcrypto
pgjwt
Expand Down Expand Up @@ -144,6 +147,7 @@ Redux
refactor
reloadable
Reloadable
reuseport
requester's
RESTful
RLS
12 changes: 9 additions & 3 deletions docs/references/admin_server.rst
Expand Up @@ -16,9 +16,15 @@ Two endpoints ``live`` and ``ready`` will then be available. Both these endpoint

.. important::

-  If you have a machine with multiple network interfaces and multiple PostgREST instances in the same port, you need to specify a unique :ref:`hostname <server-host>`
-  in the configuration of each PostgREST instance for the health check to work correctly. Don't use the special values(``!4``, ``*``, etc) in this case because the health check
-  could report a false positive.
+  Multiple PostgREST instances can share the same public API host and port when
+  :ref:`server-reuseport` is enabled on operating systems that support
+  ``SO_REUSEPORT``. Admin ports are not shared: give each instance a different
+  :ref:`admin-server-port`, otherwise the new instance will fail to start.
+
+  If the machine has multiple network interfaces, configure concrete
+  :ref:`server-host` and :ref:`admin-server-host` values when you need health
+  checks to target a specific process. Avoid special values (``!4``, ``*``, etc.)
+  in this case because the health check could report a false positive.

Live
----
49 changes: 49 additions & 0 deletions docs/references/configuration.rst
Expand Up @@ -176,6 +176,11 @@ admin-server-port

Specifies the port for the :ref:`admin_server`. Cannot be equal to :ref:`server-port`.

When running multiple PostgREST instances on the same :ref:`server-port`, use
a different ``admin-server-port`` for each instance. Admin ports are not shared
between instances, so readiness checks always target one specific PostgREST
instance. See :ref:`zero_downtime_upgrades`.

.. _app.settings.*:

app.settings.*
Expand Down Expand Up @@ -899,6 +904,50 @@ server-port

The TCP port to bind the web server. Use ``0`` to automatically assign a port.

When :ref:`server-reuseport` is enabled on an operating system that supports
``SO_REUSEPORT``, you can start multiple PostgREST instances on the same
:ref:`server-host` and ``server-port``. For example, two PostgREST processes
can use the same configuration:

.. code:: ini

  server-host = "127.0.0.1"
  server-port = 3000
  server-reuseport = true

New connections are then distributed by the operating system between the
running PostgREST processes. This can be used to start a replacement process
before stopping the old one, or to run several PostgREST processes behind one
port.

If ``server-reuseport`` is disabled, starting another PostgREST process on
the same host and port will fail with the usual address-in-use error.

For a step-by-step example, see :ref:`zero_downtime_upgrades`.

.. _server-reuseport:

server-reuseport
----------------

=============== =================================
**Type**        Bool
**Default**     false
**Reloadable**  N
**Environment** PGRST_SERVER_REUSEPORT
**In-Database** `n/a`
=============== =================================

Enables ``SO_REUSEPORT`` on the TCP server socket. This allows multiple
PostgREST processes to bind to the same :ref:`server-host` and
:ref:`server-port` when the operating system supports it.

Enabling this setting on an operating system that does not support
``SO_REUSEPORT`` is a configuration error. PostgREST will fail to start
instead of falling back to a normal TCP socket.

This setting does not apply when :ref:`server-unix-socket` is used.

.. _server-trace-header:

server-trace-header
16 changes: 8 additions & 8 deletions src/PostgREST/Admin.hs
Expand Up @@ -22,20 +22,20 @@ import qualified PostgREST.AppState as AppState
 import qualified Network.Socket    as NS
 import           Protolude

-runAdmin :: AppState -> Maybe NS.Socket -> NS.Socket -> Warp.Settings -> IO ()
-runAdmin appState maybeAdminSocket socketREST settings = do
+runAdmin :: AppState -> Maybe NS.Socket -> IO (Maybe NS.Socket) -> Warp.Settings -> IO ()
+runAdmin appState maybeAdminSocket getSocketREST settings = do
   whenJust maybeAdminSocket $ \adminSocket -> do
     address <- resolveSocketToAddress adminSocket
     observer $ AdminStartObs address
     void . forkIO $ Warp.runSettingsSocket settings adminSocket adminApp
   where
-    adminApp = admin appState socketREST
+    adminApp = admin appState getSocketREST
     observer = AppState.getObserver appState

 -- | PostgREST admin application
-admin :: AppState.AppState -> NS.Socket -> Wai.Application
-admin appState socketREST req respond = do
-  isMainAppReachable <- isRight <$> reachMainApp socketREST
+admin :: AppState.AppState -> IO (Maybe NS.Socket) -> Wai.Application
+admin appState getSocketREST req respond = do
+  isMainAppReachable <- getSocketREST >>= maybe (pure False) (fmap isRight . reachMainApp)
   isLoaded <- AppState.isLoaded appState
   isPending <- AppState.isPending appState

Expand All @@ -44,8 +44,8 @@ admin appState socketREST req respond = do
       respond $ Wai.responseLBS (if isMainAppReachable then HTTP.status200 else HTTP.status500) [] mempty
     ["ready"] ->
       let
-        status | not isMainAppReachable = HTTP.status500
-               | isPending = HTTP.status503
+        status | isPending = HTTP.status503
+               | not isMainAppReachable = HTTP.status500
                | isLoaded = HTTP.status200
                | otherwise = HTTP.status500
       in