
[doc] High frequency telemetry support of MIXED tam tel type mode HLD #2379

Open
DavidZagury wants to merge 6 commits into sonic-net:master from DavidZagury:master_hft_mixed_mode

@DavidZagury


No description provided.

@mssonicbld

Collaborator

/azp run

Signed-off-by: david.zagury <davidza@nvidia.com>
What I did
Add a new §7.6.1 "CounterSyncd label resolution in MIXED" that
describes the aggregating per-record lookup CounterSyncd performs
across the per-group sessions that share a template_id in MIXED.
Tighten the surrounding "no CounterSyncd changes" claims in §2, §5,
§6, and §7.6 so they distinguish the unchanged public interface
(STATE_DB schema, IPFIX wire format, OpenTelemetry export) from the
contained internal extension. Add a sentence to the §12
m_next_label limitation bullet making explicit that the monotonic
per-profile allocation is the design contract that the §7.6.1
aggregation relies on. Update the table of contents to include
§7.6.1.

Why I did it
The original HLD claimed "no CounterSyncd changes" in three places
on the assumption that per-profile-unique labels alone would let
the existing single-session lookup resolve every field. In practice
CounterSyncd routes data records to sessions by template_id, and in
MIXED multiple per-group sessions share a template_id, so the
single-session lookup picks one session by last-writer-wins and
labels owned by sibling sessions fall back to unknown_<N>. The
implementation adds session_template_ids and an aggregating lookup
that unions every contributing session's object_id_name_map; the
HLD now documents this and is consistent with the shipped behavior.
The Enterprise=0 padding-field defensive filter in CounterSyncd
remains out of scope here - it is vendor-quirk handling and not
part of the MIXED design.

Signed-off-by: david.zagury <davidza@nvidia.com>
@DavidZagury force-pushed the master_hft_mixed_mode branch from a673312 to 00a7d61 (June 10, 2026 15:40)
@mssonicbld

Collaborator

/azp run

@azure-pipelines

No pipelines are associated with this pull request.

Comment thread doc/high-frequency-telemetry/high-frequency-telemetry-mixed-mode-hld.md Outdated
| `MIXED_TYPE` only | MIXED_TYPE | yes |
| neither | - | no (logged) |

SINGLE_TYPE is preferred when both are advertised so that the behavior of all existing platforms is unchanged. This is consistent with the SAI specification, which declares `SAI_TAM_TEL_TYPE_MODE_SINGLE_TYPE` as the default value of `SAI_TAM_TEL_TYPE_ATTR_MODE`.
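The selection rule quoted above can be sketched as follows. This is a minimal illustration under stated assumptions, not orchagent code: `TelTypeMode`, `select_mode`, and the logger name are hypothetical, and the set of advertised modes is assumed to come from a platform capability query. The `SINGLE_TYPE`-only case is assumed to select `SINGLE_TYPE`, since the table row for it is not visible in this excerpt.

```python
import logging
from enum import Enum
from typing import Optional, Set


class TelTypeMode(Enum):
    """Hypothetical stand-in for the SAI tel-type modes named in the HLD."""
    SINGLE_TYPE = "SINGLE_TYPE"
    MIXED_TYPE = "MIXED_TYPE"


def select_mode(advertised: Set[TelTypeMode]) -> Optional[TelTypeMode]:
    """Pick the mode to use from the modes the platform advertises.

    SINGLE_TYPE wins when both are advertised, so existing platforms
    keep their current behavior. None means the feature stays disabled.
    """
    if TelTypeMode.SINGLE_TYPE in advertised:
        return TelTypeMode.SINGLE_TYPE
    if TelTypeMode.MIXED_TYPE in advertised:
        return TelTypeMode.MIXED_TYPE
    # Neither mode is advertised: log it and leave the feature off.
    logging.getLogger("hft").warning("no supported tel type mode advertised")
    return None
```

Switching the default to `MIXED_TYPE`, as proposed in the review comments that follow, would amount to swapping the order of the two membership checks.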

Contributor

IMHO, MIXED_TYPE can be the default behavior in the Orchagent, since mixed_type mode is more efficient and uses fewer states and SAI objects.

Author

I have no objection to making MIXED_TYPE the default. I just was not sure that changing the behavior would be safe for other vendors. If you think it should not cause any issues, I agree.

Contributor

I think it's not an issue. I can discuss it with other vendors.

Contributor

Could you please update the default mode to mixed_type in this document?

Comment thread doc/high-frequency-telemetry/high-frequency-telemetry-mixed-mode-hld.md Outdated

CounterSyncd resolves IPFIX data-record fields to SAI counter identities by looking up each field's IPFIX element ID (the per-object label assigned by `HFTelProfile`) against the `object_names` list of the STATE_DB session that owns the template. In SINGLE mode each `sai_tam_tel_type` carries its own template_id, so there is exactly one session per template_id and the lookup is unambiguous.

In MIXED mode the orchagent replicates the combined IPFIX template into every per-group `HIGH_FREQUENCY_TELEMETRY_SESSION_TABLE` entry (§7.6), so all per-group sessions of a profile share a template_id. A resolution that consults only one session per template_id would correctly resolve labels owned by that session, but labels owned by sibling sessions (for example a PORT label seen when the QUEUE session won CounterSyncd's internal `template_id → session` race) would fall back to `unknown_<label>`.
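The failure mode and the aggregating fix described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the session keys, the dictionary layout, the label values, and both function names are hypothetical, and the real CounterSyncd implementation is not shown in this thread.

```python
from typing import Dict

# Hypothetical STATE_DB view of one MIXED profile: two per-group sessions
# share template_id 256, and labels are unique across the profile.
SESSIONS: Dict[str, dict] = {
    "profile1|PORT": {
        "template_id": 256,
        "object_id_name_map": {1: "Ethernet0", 2: "Ethernet4"},
    },
    "profile1|QUEUE": {
        "template_id": 256,
        "object_id_name_map": {3: "Ethernet0|0", 4: "Ethernet0|1"},
    },
}


def single_session_lookup(sessions: Dict[str, dict], template_id: int, label: int) -> str:
    """Resolve a label against the one session that last claimed the template."""
    owner: Dict[int, str] = {}
    for key, session in sessions.items():
        owner[session["template_id"]] = key  # last writer wins
    names = sessions[owner[template_id]]["object_id_name_map"]
    return names.get(label, f"unknown_{label}")


def aggregated_lookup(sessions: Dict[str, dict], template_id: int, label: int) -> str:
    """Resolve a label against the union of every session sharing the template."""
    merged: Dict[int, str] = {}
    for session in sessions.values():
        if session["template_id"] == template_id:
            merged.update(session["object_id_name_map"])
    return merged.get(label, f"unknown_{label}")
```

In this sketch the QUEUE session is written last, so `single_session_lookup(SESSIONS, 256, 1)` falls back to `unknown_1`, while `aggregated_lookup(SESSIONS, 256, 1)` resolves the PORT label. The union is only safe because labels are unique per profile; if two sessions reused a label, the merge would silently overwrite one name.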

Contributor

Suggested change

Original:
In MIXED mode the orchagent replicates the combined IPFIX template into every per-group `HIGH_FREQUENCY_TELEMETRY_SESSION_TABLE` entry (§7.6), so all per-group sessions of a profile share a template_id. A resolution that consults only one session per template_id would correctly resolve labels owned by that session, but labels owned by sibling sessions (for example a PORT label seen when the QUEUE session won CounterSyncd's internal `template_id → session` race) would fall back to `unknown_<label>`.

Proposed:
In MIXED mode the orchagent replicates the combined IPFIX template into every per-group `HIGH_FREQUENCY_TELEMETRY_SESSION_TABLE` entry (§7.6), so all per-group sessions of a profile share a set of template_id. A resolution that consults only one session per template_id would correctly resolve labels owned by that session, but labels owned by sibling sessions (for example a PORT label seen when the QUEUE session won CounterSyncd's internal `template_id → session` race) would fall back to `unknown_<label>`.

A session may correspond to multiple IPFIX templates.

Also, we need to support dynamic updates, so a session may temporarily have two sets of templates during the transition period. This is similar to a two-step commit: the old template ID set will be deleted only after the data plane, meaning the telemetry data, starts using the new template ID set.
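The two-step transition described above could look roughly like the sketch below. It is only an illustration of the idea raised in this comment, not existing CounterSyncd behavior: the class `SessionTemplates` and its methods are hypothetical, and the sketch assumes that the new template ID set contains at least one ID that is absent from the old set.

```python
from typing import Optional, Set


class SessionTemplates:
    """Tracks the active template ID set and an optional staged replacement."""

    def __init__(self, active: Set[int]) -> None:
        self.active: Set[int] = set(active)
        self.pending: Optional[Set[int]] = None

    def stage(self, new_ids: Set[int]) -> None:
        """Step 1: announce the new set while the old one stays valid."""
        self.pending = set(new_ids)

    def on_data_record(self, template_id: int) -> bool:
        """Return True if the record belongs to this session.

        Step 2: the first record that uses an ID found only in the staged
        set shows that the data plane has switched, so the old set is dropped.
        """
        if self.pending is not None and template_id in self.pending - self.active:
            self.active, self.pending = self.pending, None
            return True
        return template_id in self.active
```

During the transition, records with old IDs are still accepted; once a record with a new ID arrives, later records that carry only old IDs are rejected.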

Author

Dynamic updates are a general CounterSyncd improvement: they are not supported today and are not specific to MIXED.
The current replace behavior is unchanged by this design, so I am not sure it should be part of this design.

Contributor

My comment is about two things:

  1. We need to use multiple IPFIX templates instead of a single template_id, so this description must be corrected.

  2. I agree that dynamic updates do not need to be included in this PR. However, our design should not prevent them in the future. I’m wondering whether we should duplicate templates for each type entry, because in mixed mode, dynamic updates may require an atomic update to the template set of a session.

Contributor

Could we emit only one combined session into State DB in mixed mode, such as HIGH_FREQUENCY_TELEMETRY_SESSION_TABLE|profile? Do you have any concerns or suggestions? I think this would make everything simpler.

Comment thread doc/high-frequency-telemetry/high-frequency-telemetry-mixed-mode-hld.md Outdated
Signed-off-by: david.zagury <davidza@nvidia.com>
Signed-off-by: david.zagury <davidza@nvidia.com>
Signed-off-by: david.zagury <davidza@nvidia.com>
Signed-off-by: david.zagury <davidza@nvidia.com>
@Pterosaur

Contributor

Hi @DavidZagury, it looks like the local commit hasn’t been pushed yet.

@mssonicbld

Collaborator

/azp run

@azure-pipelines

No pipelines are associated with this pull request.

@DavidZagury

Author

> Hi @DavidZagury, it looks like the local commit hasn’t been pushed yet.

@Pterosaur pushed


## 6. Architecture Design

The architecture diagram from the [base HLD §6](high-frequency-telemetry-hld.md#6-architecture-design) is unchanged. The bulk of the change is internal to `Orchagent → High frequency telemetry Orch`; the SAI/syncd boundary, the CounterSyncd public interface, the OpenTelemetry container, and the Redis databases are unaffected. CounterSyncd's label-resolution path is extended for the MIXED case where one template is shared across per-group sessions (see §7.6.1).

Contributor

Suggested change

Original:
The architecture diagram from the [base HLD §6](high-frequency-telemetry-hld.md#6-architecture-design) is unchanged. The bulk of the change is internal to `Orchagent → High frequency telemetry Orch`; the SAI/syncd boundary, the CounterSyncd public interface, the OpenTelemetry container, and the Redis databases are unaffected. CounterSyncd's label-resolution path is extended for the MIXED case where one template is shared across per-group sessions (see §7.6.1).

Proposed:
The architecture diagram from the [base HLD §6](high-frequency-telemetry-hld.md#6-architecture-design) is unchanged. The bulk of the change is internal to `Orchagent → High frequency telemetry Orch`; the SAI/syncd boundary, the CounterSyncd public interface, the OpenTelemetry container, and the Redis databases are unaffected. CounterSyncd's label-resolution path is extended for the MIXED case where one set of templates is shared across per-group sessions (see §7.6.1).

Same question: how about combining all per-group sessions into one profile session?
