
Add nb_cfg_timestamp to SB_Global for propagation latency. #306

Open
tehhobbit wants to merge 2 commits into ovn-org:main from tehhobbit:nb-cfg-timestamp-upstream

@tehhobbit tehhobbit commented May 20, 2026

Large-scale OVN deployments commonly disable per-chassis nb_cfg
write-back by setting options:enable_chassis_nb_cfg_update to
false. With thousands of hypervisors each writing completion
back to Chassis_Private on every generation, the resulting write
amplification can overload the southbound OVSDB cluster.
Disabling write-back eliminates this pressure but also removes
any signal for measuring how long a northbound change takes to
reach each hypervisor.

OVN_Northbound already records nb_cfg_timestamp in NB_Global
when ovn-northd advances nb_cfg, but hypervisors connect to the
southbound database only. This patch adds the equivalent
timestamp to SB_Global, written atomically with each nb_cfg
update. ovn-controller reads this value and stores it in the
local OVS bridge external_ids as ovn-nb-cfg-sb-ts alongside the
existing ovn-nb-cfg-ts (local completion time). An external
collector can read both values from the bridge and compute
per-chassis propagation latency histograms without any writes to
the southbound database, keeping measurement overhead independent
of fleet size.
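
As a rough illustration of the collector side described above, the following C sketch computes a per-chassis latency from the two external_ids values. It assumes both values are decimal milliseconds since the epoch and that the caller has already read the bare, unquoted strings from the bridge. The names `parse_ts_ms` and `propagation_latency_ms` are hypothetical and are not part of OVN or of this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a non-negative decimal timestamp string.  Returns false on
 * empty, malformed, negative, or out-of-range input. */
static bool
parse_ts_ms(const char *s, int64_t *out)
{
    assert(out != NULL);
    if (!s || !*s) {
        return false;
    }

    char *end = NULL;
    errno = 0;
    long long v = strtoll(s, &end, 10);
    if (errno || end == s || *end != '\0' || v < 0) {
        return false;
    }
    *out = v;
    return true;
}

/* Hypothetical helper.  'sb_ts' is the ovn-nb-cfg-sb-ts value and
 * 'local_ts' is the ovn-nb-cfg-ts value, both assumed to be in
 * milliseconds.  Returns false if either value is missing, malformed,
 * or zero (treated here as "not set yet"). */
static bool
propagation_latency_ms(const char *sb_ts, const char *local_ts,
                       int64_t *latency_ms)
{
    int64_t sb, local;

    if (!parse_ts_ms(sb_ts, &sb) || !parse_ts_ms(local_ts, &local)) {
        return false;
    }
    if (sb == 0 || local == 0) {
        return false;
    }
    /* May be negative if the two hosts' clocks disagree. */
    *latency_ms = local - sb;
    return true;
}
```

Because the two timestamps are taken on different hosts, the result is only as accurate as the clock synchronization between them; a collector would probably want to count negative results separately rather than feed them into a histogram.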

Placing the timestamp in SB_Global rather than requiring
collectors to reach the northbound database means it travels
transparently through any relay or VPN between the southbound
cluster and the hypervisor, naturally including that transit in
the measurement.

Tested in the OVN sandbox and a two-container central/HV setup.
Confirmed nb_cfg_timestamp is written to SB_Global on each
nb_cfg advance, propagated to br-int external_ids as
ovn-nb-cfg-sb-ts, and continues updating correctly when
enable_chassis_nb_cfg_update is false.

Assisted-by: Claude Sonnet 4.5, Claude Code

@tehhobbit tehhobbit force-pushed the nb-cfg-timestamp-upstream branch 2 times, most recently from 8a05ac1 to 0099551 Compare May 21, 2026 17:10
Comment thread br-controller/ovn-br-controller.c Outdated
get_ovnbr_cfg(ovnbrrec_br_global_table_get(ovnbr_idl_loop.idl),
ovnbr_cond_seqno, ovnbr_expected_cond_seqno));
ovnbr_cond_seqno, ovnbr_expected_cond_seqno),
0);

Collaborator

Small nit: the alignment is not correct here.
Pass "0" at L302 or align L303 properly.

Comment thread controller/ovn-controller.c Outdated
* server to send updates that happened before SB_Global.nb_cfg.
*/
if (cond_seqno != expected_cond_seqno) {
if (ts_out) {

Collaborator

This comment is technically not related to your code, but since we are updating this function anyway, let's do something like the following:


static uint64_t
get_nb_cfg(.....) {

    static uint64_t nb_cfg = 0;
    static int64_t nb_cfg_ts = 0;

    if (cond_seqno == expected_cond_seqno) {
        const struct sbrec_sb_global *sb
            = sbrec_sb_global_table_first(sb_global_table);
        nb_cfg = sb ? sb->nb_cfg : 0;
        nb_cfg_ts = sb ? sb->nb_cfg_timestamp : 0;
    }

    if (ts_out) {
        *ts_out = nb_cfg_ts;
    }

    return nb_cfg;
}


I think the code looks cleaner this way.
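
For readers who want to try the caching behavior of the suggestion above outside the OVN tree, here is a self-contained stand-in. It is only a sketch: `struct mock_sb_global` replaces the IDL-generated row type, the function takes the row directly instead of a table, and the parameter list is an assumption, since the suggestion elides it. All names ending in `_sketch` or `_example` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the IDL-generated SB_Global row; only the two columns
 * used here.  Hypothetical, for illustration only. */
struct mock_sb_global {
    int64_t nb_cfg;
    int64_t nb_cfg_timestamp;
};

/* Returns the last nb_cfg seen while the monitor conditions were in
 * sync, and optionally the timestamp captured together with it. */
static uint64_t
get_nb_cfg_sketch(const struct mock_sb_global *sb,
                  unsigned int cond_seqno,
                  unsigned int expected_cond_seqno,
                  int64_t *ts_out)
{
    static uint64_t nb_cfg = 0;
    static int64_t nb_cfg_ts = 0;

    /* Refresh both cached values together, and only when no monitor
     * condition change is in flight. */
    if (cond_seqno == expected_cond_seqno) {
        nb_cfg = sb ? sb->nb_cfg : 0;
        nb_cfg_ts = sb ? sb->nb_cfg_timestamp : 0;
    }

    if (ts_out) {
        *ts_out = nb_cfg_ts;
    }

    return nb_cfg;
}

/* Usage: a mismatched seqno returns the values last seen in sync. */
static void
get_nb_cfg_sketch_example(void)
{
    struct mock_sb_global row = { .nb_cfg = 7, .nb_cfg_timestamp = 1000 };
    int64_t ts = 0;

    assert(get_nb_cfg_sketch(&row, 1, 1, &ts) == 7 && ts == 1000);

    row.nb_cfg = 8;
    row.nb_cfg_timestamp = 2000;
    /* Condition change in flight: cached values are returned. */
    assert(get_nb_cfg_sketch(&row, 1, 2, &ts) == 7 && ts == 1000);
    /* Back in sync: the new pair becomes visible. */
    assert(get_nb_cfg_sketch(&row, 2, 2, &ts) == 8 && ts == 2000);
}
```

The point of caching the pair in one place is that nb_cfg and its timestamp can never be returned from different generations, which is what the single `if` in the suggestion guarantees.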

Comment thread controller/ovn-controller.c Outdated
ovnsb_idl_loop.idl),
ovnsb_cond_seqno,
ovnsb_expected_cond_seqno));
{

Collaborator

The coding style is a little odd here.

You don't need to enclose this block in {}.

Comment thread controller/ovn-controller.c Outdated
* timestamp that corresponded to this exact nb_cfg
* generation -- not whatever SB_Global value has
* moved on to by the time the barrier acks. */
int64_t sb_nb_cfg_ts = 0;

Collaborator

Maybe you can declare 'sb_nb_cfg_ts' as uint64_t and let the function get_nb_cfg() handle the int64_t to uint64_t conversion implicitly, just like how sb_nb_cfg is done.
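
To make the implicit conversion mentioned here concrete, the small C sketch below shows what happens when an int64_t column value is returned through a uint64_t. The function names are hypothetical. In C, a non-negative value is preserved exactly, while a negative value wraps modulo 2^64, so the checked variant is included only to show one way a caller could guard against that.

```c
#include <assert.h>
#include <stdint.h>

/* Implicit int64_t to uint64_t conversion on return.  Non-negative
 * inputs keep their value; negative inputs wrap modulo 2^64. */
static uint64_t
ts_as_unsigned(int64_t ts)
{
    return ts;
}

/* Checked variant: rejects negative input instead of letting it wrap. */
static uint64_t
ts_as_unsigned_checked(int64_t ts)
{
    assert(ts >= 0);
    return (uint64_t) ts;
}
```

For a timestamp in milliseconds since the epoch the value should never be negative in practice, so the unchecked form behaves the same as the checked one.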

@tehhobbit tehhobbit force-pushed the nb-cfg-timestamp-upstream branch 2 times, most recently from 22ab3b0 to 075b2d1 Compare June 8, 2026 14:06
dceara pushed a commit to dceara/ovn that referenced this pull request Jun 11, 2026
Large scale OVN deployments commonly disable the per-chassis nb_cfg
write-back mechanism by setting options:enable_chassis_nb_cfg_update
to false.  With thousands of hypervisors each writing their nb_cfg
completion back to Chassis_Private on every generation, the resulting
write amplification can overload the southbound OVSDB cluster.
Disabling write-back eliminates this pressure but also removes the
only existing signal for measuring how long a northbound change takes
to reach each hypervisor.

OVN_Northbound already records nb_cfg_timestamp in NB_Global when
ovn-northd advances nb_cfg, but hypervisors connect to the southbound
database only.  This patch adds the same timestamp to SB_Global,
written atomically with each nb_cfg update.  ovn-controller reads
this value and stores it in the local OVS bridge external_ids as
ovn-nb-cfg-sb-ts alongside the existing ovn-nb-cfg-ts (local
completion time).  An external collector such as ovs_exporter can
read both values from the bridge and compute per-chassis propagation
latency histograms without any writes to the southbound database,
keeping measurement overhead independent of fleet size.

Placing the timestamp in SB_Global rather than requiring collectors
to reach the northbound database means it travels transparently
through any relay or VPN between the southbound cluster and the
hypervisor, naturally including that transit in the measurement.

Testing: confirmed in OVN sandbox and a two-container central/HV
setup that nb_cfg_timestamp is written to SB_Global on each nb_cfg
advance, propagated to br-int external_ids as ovn-nb-cfg-sb-ts, and
continues to update correctly when enable_chassis_nb_cfg_update is
set to false.

Signed-off-by: Loke Berne <loke@tehhobbit.net>
Assisted-by: Claude Sonnet 4.6
Submitted-at: ovn-org#306
Signed-off-by: Numan Siddique <numans@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
@tehhobbit tehhobbit force-pushed the nb-cfg-timestamp-upstream branch 2 times, most recently from 1ab7c09 to 91f6ec6 Compare June 12, 2026 09:24
@tehhobbit tehhobbit force-pushed the nb-cfg-timestamp-upstream branch from 91f6ec6 to 952c978 Compare June 12, 2026 10:45