[doc] High frequency telemetry support of MIXED tam tel type mode HLD #2379
DavidZagury wants to merge 6 commits
/azp run
Signed-off-by: david.zagury <davidza@nvidia.com>
What I did

- Add a new §7.6.1 "CounterSyncd label resolution in MIXED" that describes the aggregating per-record lookup CounterSyncd performs across the per-group sessions that share a template_id in MIXED.
- Tighten the surrounding "no CounterSyncd changes" claims in §2, §5, §6, and §7.6 so they distinguish the unchanged public interface (STATE_DB schema, IPFIX wire format, OpenTelemetry export) from the contained internal extension.
- Add a sentence to the §12 m_next_label limitation bullet making explicit that the monotonic per-profile allocation is the design contract that the §7.6.1 aggregation relies on.
- Update the table of contents to include §7.6.1.

Why I did it

The original HLD claimed "no CounterSyncd changes" in three places, on the assumption that per-profile-unique labels alone would let the existing single-session lookup resolve every field. In practice CounterSyncd routes data records to sessions by template_id, and in MIXED multiple per-group sessions share a template_id, so the single-session lookup picks one session by last-writer-wins and labels owned by sibling sessions fall back to unknown_<N>.

The implementation adds session_template_ids and an aggregating lookup that unions every contributing session's object_id_name_map; the HLD now documents this and is consistent with the shipped behavior.

The Enterprise=0 padding-field defensive filter in CounterSyncd remains out of scope here: it is vendor-quirk handling and not part of the MIXED design.

Signed-off-by: david.zagury <davidza@nvidia.com>
Force-pushed from a673312 to 00a7d61.
/azp run
No pipelines are associated with this pull request.
| `MIXED_TYPE` only | MIXED_TYPE | yes |
| neither | - | no (logged) |

SINGLE_TYPE is preferred when both are advertised so that the behavior of all existing platforms is unchanged. This is consistent with the SAI specification, which declares `SAI_TAM_TEL_TYPE_MODE_SINGLE_TYPE` as the default value of `SAI_TAM_TEL_TYPE_ATTR_MODE`.
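The selection rule in the quoted table can be sketched as a preference-ordered lookup. This is a minimal Python illustration, not the orchagent code: the function name `select_tel_type_mode` and the `preference` parameter are hypothetical, and the "SINGLE_TYPE only" case is assumed, since that row is not visible in the quoted excerpt.

```python
from typing import Iterable, Optional, Sequence

SINGLE_TYPE = "SINGLE_TYPE"
MIXED_TYPE = "MIXED_TYPE"


def select_tel_type_mode(
    advertised: Iterable[str],
    preference: Sequence[str] = (SINGLE_TYPE, MIXED_TYPE),
) -> Optional[str]:
    """Return the first preferred mode that the platform advertises.

    Hypothetical helper: returns None for the "neither" row, which the
    table marks as "no (logged)"; the caller is expected to log that case.
    """
    available = set(advertised)
    for mode in preference:
        if mode in available:
            return mode
    return None


# Default order follows the quoted HLD text: SINGLE_TYPE wins when both exist.
chosen = select_tel_type_mode([SINGLE_TYPE, MIXED_TYPE])
```

Keeping the order in a parameter means a change of default only touches the `preference` tuple, for example `select_tel_type_mode(modes, preference=(MIXED_TYPE, SINGLE_TYPE))`.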
IMHO, MIXED_TYPE can be the default behavior in the Orchagent, since mixed_type mode is more efficient and uses fewer states and SAI objects.
I have no objection to using MIXED_TYPE as the default. I just wasn't sure that changing the behavior would not cause issues for other vendors. If you think it shouldn't be an issue, I agree.
I think it's not an issue. I can discuss it with other vendors.
Could you please update the default mode to mixed_type in this document?
CounterSyncd resolves IPFIX data-record fields to SAI counter identities by looking up each field's IPFIX element ID (the per-object label assigned by `HFTelProfile`) against the `object_names` list of the STATE_DB session that owns the template. In SINGLE mode each `sai_tam_tel_type` carries its own template_id, so there is exactly one session per template_id and the lookup is unambiguous.

In MIXED mode the orchagent replicates the combined IPFIX template into every per-group `HIGH_FREQUENCY_TELEMETRY_SESSION_TABLE` entry (§7.6), so all per-group sessions of a profile share a template_id. A resolution that consults only one session per template_id would correctly resolve labels owned by that session, but labels owned by sibling sessions (for example a PORT label seen when the QUEUE session won CounterSyncd's internal `template_id → session` race) would fall back to `unknown_<label>`.
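The difference between the single-session lookup and the aggregating lookup can be shown with a small model. This is a rough Python sketch under stated assumptions, not CounterSyncd's implementation: the names `session_template_ids` and `object_id_name_map` come from the PR description, but their shapes here (session name to template-ID set, session name to label map), the session keys, the labels, and template ID 256 are all made up for illustration.

```python
from typing import Dict, Set

# Assumed shape: session name -> {label: object name}.
object_id_name_map: Dict[str, Dict[int, str]] = {
    "profile|PORT": {1: "Ethernet0", 2: "Ethernet4"},
    "profile|QUEUE": {3: "Ethernet0|0", 4: "Ethernet0|1"},
}

# Assumed shape: session name -> template IDs that the session contributes to.
# In MIXED both per-group sessions carry the same combined template.
session_template_ids: Dict[str, Set[int]] = {
    "profile|PORT": {256},
    "profile|QUEUE": {256},
}


def build_last_writer_index(ids: Dict[str, Set[int]]) -> Dict[int, str]:
    """template_id -> one session; later sessions overwrite earlier ones."""
    index: Dict[int, str] = {}
    for session, template_ids in ids.items():
        for template_id in template_ids:
            index[template_id] = session
    return index


def resolve_single(index: Dict[int, str], template_id: int, label: int) -> str:
    """Consult only the session that won the template_id -> session race."""
    owner = index[template_id]
    return object_id_name_map[owner].get(label, f"unknown_{label}")


def resolve_aggregated(template_id: int, label: int) -> str:
    """Consult every session that shares the template_id.

    Labels are unique within a profile, so the first hit is the only hit.
    """
    for session, template_ids in session_template_ids.items():
        if template_id in template_ids:
            name = object_id_name_map[session].get(label)
            if name is not None:
                return name
    return f"unknown_{label}"


index = build_last_writer_index(session_template_ids)
single_result = resolve_single(index, 256, 1)
aggregated_result = resolve_aggregated(256, 1)
```

In this model the QUEUE session is written last, so `single_result` is `unknown_1` for the PORT label 1, while `aggregated_result` is `Ethernet0`. The aggregation is only safe because labels do not collide across the sessions of one profile.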
Suggested change: in the MIXED-mode paragraph, replace "so all per-group sessions of a profile share a template_id" with "so all per-group sessions of a profile share a set of template_id". The rest of the paragraph is unchanged.
A session may correspond to multiple IPFIX templates.
Also, we need to support dynamic updates, so a session may temporarily have two sets of templates during the transition period. This is similar to a two-step commit: the old template ID set will be deleted only after the data plane, meaning the telemetry data, starts using the new template ID set.
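The two-step transition described in this comment could look roughly like the following. It is only a sketch of the reviewer's idea in Python: the class `SessionTemplates` and its methods are hypothetical, and it assumes the old and new template ID sets are disjoint.

```python
from typing import FrozenSet, Iterable, Optional


class SessionTemplates:
    """Hypothetical holder for one session's template ID sets."""

    def __init__(self, active: Iterable[int]) -> None:
        self.active: FrozenSet[int] = frozenset(active)
        self.pending: Optional[FrozenSet[int]] = None

    def stage(self, new_ids: Iterable[int]) -> None:
        """Step 1: register the new set while still accepting the old one."""
        self.pending = frozenset(new_ids)

    def on_data_record(self, template_id: int) -> bool:
        """Return True if the record belongs to this session.

        Step 2: the first record that uses a pending template ID commits
        the new set and drops the old one (assumes disjoint ID sets).
        """
        if self.pending is not None and template_id in self.pending:
            self.active, self.pending = self.pending, None
            return True
        return template_id in self.active


templates = SessionTemplates({256})
templates.stage({257})
old_still_accepted = templates.on_data_record(256)  # old set remains valid
new_accepted = templates.on_data_record(257)        # commits the new set
old_after_commit = templates.on_data_record(256)    # old set is gone
```

During the transition both sets are accepted; once the data plane emits a record with a new template ID, the old set is discarded in a single assignment, which is what makes the update appear atomic to the lookup path.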
Dynamic updates are a general CounterSyncd improvement, not something that is supported today, and are not specific to MIXED.
The current replace behavior is unchanged by this design, so I am not sure it should be part of it.
My comment is about two things:
- We need to use multiple IPFIX templates instead of a single template_id, so this description must be corrected.
- I agree that dynamic updates do not need to be included in this PR. However, our design should not prevent them in the future. I'm wondering whether we should duplicate templates for each type entry, because in mixed mode, dynamic updates may require an atomic update to the template set of a session.
Could we emit only one combined session into State DB in mixed mode, such as HIGH_FREQUENCY_TELEMETRY_SESSION_TABLE|profile? Do you have any concerns or suggestions? I think this would make everything simpler.
Hi @DavidZagury, it looks like the local commit hasn't been pushed yet.
/azp run
No pipelines are associated with this pull request.
@Pterosaur pushed
## 6. Architecture Design

The architecture diagram from the [base HLD §6](high-frequency-telemetry-hld.md#6-architecture-design) is unchanged. The bulk of the change is internal to `Orchagent → High frequency telemetry Orch`; the SAI/syncd boundary, the CounterSyncd public interface, the OpenTelemetry container, and the Redis databases are unaffected. CounterSyncd's label-resolution path is extended for the MIXED case where one template is shared across per-group sessions (see §7.6.1).
Suggested change: in the last sentence of the §6 paragraph, replace "where one template is shared across per-group sessions" with "where one set of templates is shared across per-group sessions". The rest of the paragraph is unchanged.
Same question: how about combining all per-group sessions into one profile session?