THREESCALE-6077: job sync system zync#4307

Open
jlledom wants to merge 6 commits into master from THREESCALE-6077-job-sync-system-zync

@jlledom jlledom commented May 22, 2026

Contributor

What this PR does / why we need it:

Just a rake task to dump all existing data from porta to zync, in case we need to do a full resync after, say, a DB reset.

It also adds some tests for the new task.

Which issue(s) this PR fixes

https://redhat.atlassian.net/browse/THREESCALE-6077

Verification steps

bundle exec rails zync:resync:full

or

PROVIDER_ID=2 bundle exec rails zync:resync:full
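How the PROVIDER_ID scoping might behave can be sketched in plain Ruby. This is only an illustration: the `Provider` struct, the `'approved'` state name and the `providers_to_resync` helper are hypothetical, and it assumes that PROVIDER_ID narrows the set of active providers rather than bypassing the suspended / scheduled_for_deletion exclusion. The real task works on ActiveRecord scopes.

```ruby
# Hypothetical in-memory stand-in for provider accounts.
Provider = Struct.new(:id, :state)

# States excluded from a resync, per the PR description.
INACTIVE_STATES = %w[suspended scheduled_for_deletion].freeze

# Returns the providers a resync would visit.
# Assumption: PROVIDER_ID narrows the active set; it does not override the state filter.
def providers_to_resync(providers, env = ENV)
  active = providers.reject { |provider| INACTIVE_STATES.include?(provider.state) }
  provider_id = env.fetch('PROVIDER_ID', nil)
  return active if provider_id.nil? || provider_id.empty?

  active.select { |provider| provider.id == Integer(provider_id) }
end

providers = [
  Provider.new(1, 'approved'),
  Provider.new(2, 'approved'),
  Provider.new(3, 'suspended'),
  Provider.new(4, 'scheduled_for_deletion')
]

p providers_to_resync(providers, {}).map(&:id)
p providers_to_resync(providers, { 'PROVIDER_ID' => '2' }).map(&:id)
```

Passing the environment as an argument keeps the sketch testable without touching the real `ENV`.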

jlledom added 3 commits May 21, 2026 17:44
Replaced the hardcoded batch size value (100) with a module-level
BATCH_SIZE constant to improve maintainability and eliminate magic
numbers. This makes it easier to adjust batch processing behavior
across all zync resync tasks from a single location.

Introduces a new zync:resync:full task that comprehensively resyncs
all provider accounts, services, proxies, and applications with the
Zync service. The task publishes domain change events, OIDC
configuration updates, and application creation events to ensure
complete synchronization.

Supports selective resync via PROVIDER_ID environment variable for
troubleshooting individual providers, otherwise processes all active
(non-suspended, non-deleted) providers in the system.

Adds test suite for the zync:resync:full rake task with coverage for:
- Base full resync across all providers, services, and applications
- PROVIDER_ID environment variable filtering to scope resync to a
  single provider
- Exclusion of suspended providers from resync
- Exclusion of scheduled_for_deletion providers from resync

Tests use helper methods (expect_full_resync_events and
expect_no_resync_events) to reduce duplication and improve readability.
Organized into nested test classes (DomainsSyncTest and FullSyncTest)
for better test organization.

Assisted-by: Claude Code
@jlledom jlledom self-assigned this May 22, 2026
@qltysh

qltysh Bot commented May 22, 2026


❌ 9 blocking issues (9 total)

| Tool | Category | Rule | Count |
| --- | --- | --- | --- |
| reek | Lint | Tasks::ZyncTest::FullSyncTest assumes too much for instance variable '@all_accounts' | 4 |
| rubocop | Lint | Block has too many lines. [55/25] | 2 |
| reek | Lint | Tasks::ZyncTest::FullSyncTest has at least 5 instance variables | 1 |
| rubocop | Style | Use ENV.fetch('PROVIDER_ID', nil) instead of ENV['PROVIDER_ID']. | 1 |
| rubocop | Style | Use the return of the conditional for variable assignment and comparison. | 1 |

Comment thread lib/tasks/zync.rake Outdated
task domains: [:provider_domains, :proxy_domains]

desc 'Full resync'
task full: :environment do

Contributor

Can we define this task as empty body but just add all resync types as dependencies?

Contributor Author

We could, but that would be suboptimal. We would need to go through the complete list of accounts once per type. It's the same thing that happens now with the :domains task: it depends on :provider_domains and :proxy_domains, and each one goes through the whole list of accounts.

Instead, it's better to load each account just once and launch all related events at once.

Contributor

Not sure. proxy_domains iterates over Service and provider_domains iterates over Account, so they iterate over different tables. Now you query services for each account but rarely an account has a full batch of services I assume.

So at the end we probably perform more queries although there is a single loop.

-- not verified by AI but that's how I read the code late at night

Contributor Author

Not sure. proxy_domains iterates over Service and provider_domains iterates over Account

You're right.

So at the end we probably perform more queries although there is a single loop.

I of course agree on implementing whichever approach performs fewer DB queries. My approach included N+1 queries. I considered that when I first wrote it, but I thought it wouldn't matter much since it's not a task you run every day anyway. But AI says there's a big difference in the number of queries, so I refactored to split the tasks: 914140d

Comment thread lib/tasks/zync.rake Outdated
Comment on lines +60 to +61
Domains::ProxyDomainsChangedEvent.create_and_publish!(service.proxy)
OIDC::ProxyChangedEvent.create_and_publish!(service.proxy)

Contributor

I am wondering if we really need to emit both events.

It seems that the event is caught by PublishZyncEventSubscriber

subscribe_event(PublishZyncEventSubscriber.new,
                Applications::ApplicationCreatedEvent,
                Applications::ApplicationUpdatedEvent,
                Applications::ApplicationDeletedEvent,
                Applications::ApplicationEnabledChangedEvent,
                OIDC::ProxyChangedEvent,
                OIDC::ServiceChangedEvent,
                Domains::ProviderDomainsChangedEvent,
                Domains::ProxyDomainsChangedEvent
               )

And it handles both in the same way:

when OIDC::ProxyChangedEvent, Domains::ProxyDomainsChangedEvent
  ZyncEvent.create(event, event.proxy)

So, I think it might be redundant.

@jlledom jlledom Jun 8, 2026

Contributor Author

I would say you are right. We create either OIDC::ProxyChangedEvent or Domains::ProxyDomainsChangedEvent and declare different data on each:

def self.create(proxy)
  new(
    proxy: proxy,
    metadata: {
      provider_id: proxy.provider.id,
      zync: {
        oidc_endpoint: proxy.oidc_issuer_endpoint,
        service_id: proxy.service_id,
      }
    }
  )
end

def self.create(proxy, parent_event = nil)
  new(
    parent_event_id: parent_event&.event_id,
    parent_event_type: parent_event&.class&.name,
    proxy: MissingModel::MissingProxy.new(id: proxy.id),
    staging_domains: [ proxy.staging_domain ],
    production_domains: [ proxy.production_domain ],
    metadata: {
      provider_id: (provider_id = proxy.provider&.id),
      zync: {
        tenant_id: proxy.tenant_id || proxy.provider&.tenant_id || provider_id,
        service_id: proxy.service_id,
      }
    }
  )
end

But that doesn't matter, because the event will be replaced by a generic ZyncEvent that only includes this:

attributes = {
  type: type_for(model),
  id: model.id,
  parent_event_id: event.event_id,
  parent_event_type: event.class.name,
  tenant_id: provider_id,
}.merge(metadata.fetch(:zync, {}))

Everything not in event[:metadata][:zync] is simply ignored, not sent to zync at all.

But then zync, when receiving that data, also ignores everything except the attributes explicitly declared in the corresponding model, in Zync:

https://github.com/3scale/zync/blob/d2fb71558080dc7f2c976d28ac20c3881e4167c4/app/models/notification.rb#L28-L30

For Proxy, that is service_id, and tenant_id.

All notifications always include a tenant_id, so only service_id would be needed in the event, and both events include it, so yes, it looks redundant.
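The redundancy argument can be checked with plain hashes. The sketch below mirrors the attribute-building snippet quoted above, using literal values instead of models; the `zync_attributes` helper, the sample IDs and the exact list of keys zync keeps (`KEPT_BY_ZYNC`) are assumptions for illustration, and it assumes the proxy's tenant_id equals the provider ID.

```ruby
# Mirrors the attribute hash quoted above, with plain values instead of models.
def zync_attributes(type:, id:, event_id:, event_class:, provider_id:, metadata:)
  {
    type: type,
    id: id,
    parent_event_id: event_id,
    parent_event_type: event_class,
    tenant_id: provider_id
  }.merge(metadata.fetch(:zync, {}))
end

# Assumption: for a Proxy notification zync keeps only these keys.
KEPT_BY_ZYNC = %i[type id tenant_id service_id].freeze

oidc = zync_attributes(
  type: 'Proxy', id: 7, event_id: 'e-1', event_class: 'OIDC::ProxyChangedEvent',
  provider_id: 2,
  metadata: { provider_id: 2, zync: { oidc_endpoint: 'https://idp.example.com', service_id: 5 } }
)

domains = zync_attributes(
  type: 'Proxy', id: 7, event_id: 'e-2', event_class: 'Domains::ProxyDomainsChangedEvent',
  provider_id: 2,
  metadata: { provider_id: 2, zync: { tenant_id: 2, service_id: 5 } }
)

# The payloads differ in what they carry, but not in what zync would keep.
puts oidc.slice(*KEPT_BY_ZYNC) == domains.slice(*KEPT_BY_ZYNC)
```

Under these assumptions the two events differ only in fields that are dropped on the way, which is the point made in the comment.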

That's why I say you're right. The only reason I have to think otherwise is... WTF? Why did the ancients perform such an amount of overengineering? We are moving a lot of data around just to ignore it, apparently. I have the feeling I'm missing something.

Contributor

I honestly don't know 🤣
I always found the zync part quite confusing... And the event store too 😬

@akostadinov akostadinov Jun 8, 2026

Contributor

I have the feeling I'm missing something.

you might be right. I think they had big plans for zync to offload many operations. But then I think we would have created another monolith. Not sure microservices and other concepts existed at the time so maybe it was their take on it -- pure speculation on my side

Contributor Author

In my last commit, which refactors the whole thing, I'm only creating Domains::ProxyDomainsChangedEvent. No references to OIDC::ProxyChangedEvent anymore.

@mayorova

mayorova commented Jun 8, 2026

Contributor

I think it would be nice to add some output, because currently nothing is printed in this task, and for me at least it causes some anxiety about whether the job actually did anything 😬

@jlledom

jlledom commented Jun 8, 2026

Contributor Author

I think it would be nice to add some output, because currently nothing is printed in this task, and for me at least it causes some anxiety about whether the job actually did anything 😬

I'm following the same pattern we use in the other two tasks in the file: call each_with_progress which is supposed to print a percentage to show the progress. The thing is it doesn't print anything unless you have a lot of data. I have created about 160 providers locally and it only shows the progress once, about 60%, the rest of the execution is silent.

It only shows the progress when the number of accounts processed so far is a multiple of 100, so for 160 accounts, that's just once.

It will be more useful in SaaS with real data, but for testing it is basically silent.

What do you suggest? Showing progress more frequently? Printing logs?

@akostadinov

Contributor

What do you suggest? Showing progress more frequently? Printing logs?

I personally would care to know that it started to work and ideally get a progress update every 10 seconds or so. But I'm totally fine with current code, only my request to call into the existing tasks instead of new custom code would be IMO sweetest.

Replace nested-loop :full task with independent tasks per resource type
to eliminate N+1 queries. The nested approach (account.services,
service.cinstances) issues one query per parent record. Flat batched
queries reduce total DB queries significantly (e.g., ~2005 → ~320 for
500 accounts × 3 services × 20 cinstances).

Changes:
- Extract active_providers helper for PROVIDER_ID filtering and
  suspended/deleted exclusion
- Rename :provider_domains → :providers, :proxy_domains → :proxies
  (keep aliases for backward compatibility)
- Add new :services and :applications tasks with flat queries using
  joins(:account).merge(active_providers)
- Remove redundant OIDC::ProxyChangedEvent (zync only uses
  event[:metadata][:zync] which is identical in
  Domains::ProxyDomainsChangedEvent)
- Make :full dependency-only: [:providers, :services, :proxies,
  :applications]
- Add progress labels ("== Resyncing providers ==") to output
- Improve progress granularity to ~10% increments regardless of scope
  size
- Query Proxy directly instead of Service.includes(:proxy) for cleaner
  iteration

Tests updated:
- Add @all_proxies instance variable to avoid N+1 in test expectations
- Remove OIDC::ProxyChangedEvent from expectations
- Update DomainsSyncTest to match new filtered behavior
- Update helper methods to accept proxies parameter

Assisted-by: Claude Code
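The query counts quoted in this commit message can be reproduced with a little arithmetic. The sketch assumes one query per batch of BATCH_SIZE (100) records and one extra query per parent record in the nested version; `batched_queries` is a hypothetical helper, and real batch iteration may issue a trailing query that is ignored here.

```ruby
BATCH_SIZE = 100

# Queries needed to scan `rows` records in batches (assumes one query per batch).
def batched_queries(rows, batch_size = BATCH_SIZE)
  (rows.to_f / batch_size).ceil
end

accounts = 500
services = accounts * 3
cinstances = services * 20

# Nested loops: batched accounts, then one services query per account
# and one cinstances query per service.
nested = batched_queries(accounts) + accounts + services

# Flat scans: one batched scan each for accounts, services and cinstances.
flat = batched_queries(accounts) + batched_queries(services) + batched_queries(cinstances)

puts "nested: #{nested}, flat: #{flat}"
```

Under the same assumptions, a separate batched scan over proxies would add another 15 queries to the flat total.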
@jlledom

jlledom commented Jun 9, 2026

Contributor Author

@mayorova

I think it would be nice to add some output, because currently nothing is printed in this task, and for me at least it causes some anxiety about whether the job actually did anything 😬

@akostadinov

I personally would care to know that it started to work and ideally get a progress update every 10 seconds or so.

I made some changes to make it more responsive. Now it updates after processing every 10% of the data, so up to ten times per type. I also added some headers to make it more understandable. Now it looks like this:

$ bundle exec rails zync:resync:full
== Resyncing providers ==
9.76% completed
19.51% completed
29.27% completed
39.02% completed
48.78% completed
58.54% completed
68.29% completed
78.05% completed
87.8% completed
97.56% completed
== Resyncing services ==
9.58% completed
19.16% completed
28.74% completed
38.32% completed
47.9% completed
57.49% completed
67.07% completed
76.65% completed
86.23% completed
95.81% completed
== Resyncing proxies ==
9.58% completed
19.16% completed
28.74% completed
38.32% completed
47.9% completed
57.49% completed
67.07% completed
76.65% completed
86.23% completed
95.81% completed
== Resyncing applications ==
9.91% completed
19.82% completed
29.73% completed
39.64% completed
49.55% completed
59.46% completed
69.37% completed
79.28% completed
89.19% completed
99.1% completed
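The percentages above are consistent with a reporter that prints once every `total / 10` records (integer division). The following is a minimal sketch of that idea, not the actual `each_with_progress` implementation; the helper name and its `out:` parameter are hypothetical. With 41 records it reproduces the figures in the providers block.

```ruby
require 'stringio'

# Hypothetical progress helper: yields every item and prints roughly every 10%.
def each_with_progress_sketch(items, out: $stdout)
  total = items.size
  step = [total / 10, 1].max # never 0, so small collections still report

  items.each_with_index do |item, index|
    yield item
    processed = index + 1
    next unless (processed % step).zero?

    out.puts "#{(processed * 100.0 / total).round(2)}% completed"
  end
end

# 41 records give a step of 4, so progress is printed at 4, 8, ..., 40.
each_with_progress_sketch((1..41).to_a) { |_record| }
```

Because the step is rounded down, this sketch can print slightly more than ten lines for some collection sizes (for example 25 records give a step of 2).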

@jlledom jlledom requested review from akostadinov and mayorova June 10, 2026 10:53
Comment thread lib/tasks/zync.rake

desc 'Resync services with zync'
task services: :environment do
  services = Service.joins(:account).merge(active_providers)

Contributor

hmm, so in the past we didn't filter out inactive providers for this task? Wasteful in some environments.

Previously we had .includes(:proxy), does it help avoid individual queries? Idk if you checked how many queries actually run. Also I'm not sure whether .includes(:account) will also help because likely we will query the account of the service anyway. But maybe joins already includes the data for the account. I haven't checked.

Suggested change
- services = Service.joins(:account).merge(active_providers)
+ services = Service.includes(:account).joins(:account).merge(active_providers)

btw if we loop over services, because that query already has the provider accounts, that may really reduce the SQL loop if we have a custom action like before, but I don't ask you to make a custom action 👼 just a FYI, don't think the marginal benefit is worth the additional code.

Contributor

wait, this is a new task. Bob said:

The zync:resync:services task is redundant. The old proxy_domains resync
already synced services through the dependency mechanism in ZyncWorker.

So we can have it if one needs to run independently but maybe redundant to run with the others? Idk, I was just asking 😬

Contributor Author

I wasn't aware of this dependency mechanism. I think we don't need to touch anything because it's not really redundant, it depends on the service existing or not. The mechanism is only triggered when zync returns 422, that wouldn't happen when running our :full task because it runs :services before :proxies.

If we run :proxies directly, then yes, it will sync services, but only after each failed attempt to sync a proxy, so that implies a couple of extra round trips which is suboptimal. I don't think we should remove the :services task, since it's the way to avoid the suboptimal path.

It also works for applications, that is, if you try to sync an application, it will sync the service and proxy first if they don't exist. But again, suboptimal.

def dependencies
  return non_persisted_dependencies unless record.persisted?
  case record
  when Cinstance
    [ service = record.service, service.proxy ]
  when Proxy
    [ record.service ]
  when Service
    NONE
  else
    NONE
  end
end
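A toy model makes the cost of the fallback path visible. In the sketch below, `FakeZync` and `sync_proxy` are hypothetical stand-ins: the fake rejects a proxy with 422 when its service is unknown, and the caller then syncs the service and retries. The real ZyncWorker flow may differ in detail; the only behaviour taken from the discussion is that dependencies are synced after a 422.

```ruby
require 'set'

# Toy stand-in for zync: rejects a proxy whose service it has not seen yet.
class FakeZync
  attr_reader :requests

  def initialize
    @known_services = Set.new
    @requests = 0
  end

  # Returns an HTTP-like status code and counts every round trip.
  def sync(type, id, service_id: nil)
    @requests += 1
    case type
    when :service
      @known_services << id
      200
    when :proxy
      @known_services.include?(service_id) ? 200 : 422
    else
      raise ArgumentError, "unknown type #{type}"
    end
  end
end

# On 422, sync the dependency (the service) and retry the proxy.
def sync_proxy(zync, proxy_id, service_id)
  status = zync.sync(:proxy, proxy_id, service_id: service_id)
  return status unless status == 422

  zync.sync(:service, service_id)
  zync.sync(:proxy, proxy_id, service_id: service_id)
end

proxies = { 10 => 1, 11 => 2, 12 => 3 } # proxy_id => service_id

# Order used by :full: services first, then proxies.
services_first = FakeZync.new
proxies.each_value { |service_id| services_first.sync(:service, service_id) }
proxies.each { |proxy_id, service_id| sync_proxy(services_first, proxy_id, service_id) }

# Running :proxies alone against an empty zync.
proxies_only = FakeZync.new
proxies.each { |proxy_id, service_id| sync_proxy(proxies_only, proxy_id, service_id) }

puts "services first: #{services_first.requests} requests"
puts "proxies only:   #{proxies_only.requests} requests"
```

In this model the services-first order costs two requests per proxy, while the fallback path costs three, which matches the "suboptimal" characterization above.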

Comment thread lib/tasks/zync.rake

desc 'Resync services with zync'
task services: :environment do
  services = Service.joins(:account).merge(active_providers)

Contributor

ah, but you split out proxies I see now 🤔 ok, we go to the other extreme :) alright.

@akostadinov

Contributor

Current in :applications task

Service.joins(:account).merge(active_providers).find_each

Issue: This will trigger individual queries to fetch the associated account
when creating events.

Suggestion: Add .includes(:account) to eager load:
Service.joins(:account).merge(active_providers).includes(:account).find_each

  1. Task Redundancy Question (Needs Clarification)

Reviewer akostadinov raised a valid question about whether the new :services
task is redundant, since ZyncWorker may already sync services when proxies are
synced through dependency mechanisms.

Recommendation:

  • Verify if Domains::ProxyDomainsChangedEvent already triggers service sync
  • If so, consider removing or documenting why explicit service sync is needed
  • If not, document the independence clearly

jlledom added 2 commits June 11, 2026 12:40
Introduces a lightweight event class that carries only provider_id
and optional service_id in metadata. This event bypasses the full
event publishing chain used by domain-specific events, allowing rake
tasks to publish ZyncEvents directly without triggering unnecessary
queries through intermediate event classes.

The ResyncEvent is designed specifically for manual resync operations
where the model is already loaded and we only need to propagate the
sync request to zync, not re-walk association chains.

Assisted-by: Claude Code

Changes all zync resync rake tasks to publish ZyncEvents directly
via ResyncEvent instead of going through intermediate domain events
(Domains::ProviderDomainsChangedEvent, OIDC::ServiceChangedEvent,
Applications::ApplicationUpdatedEvent). This eliminates N+1 queries
that occurred when intermediate events walked association chains to
extract provider_id and service_id.

For proxies and applications tasks, switches from joins to eager_load
to provide preloaded associations needed for provider_id extraction.

Updates tests to expect ZyncEvent.create_and_publish! calls with
ResyncEvent instances instead of intermediate event expectations.
Introduces load_collections helper that's called explicitly in each
test after any state mutations, making the test data setup clearer.

Assisted-by: Claude Code
@jlledom

jlledom commented Jun 11, 2026

Contributor Author

Current in :applications task

Service.joins(:account).merge(active_providers).find_each

Issue: This will trigger individual queries to fetch the associated account when creating events.

What Bob says doesn't really make sense, the :applications task doesn't load that relation at all.

However, it's true that the events I was creating performed extra queries to provide some data to the event subscriber... data which was completely ignored, but caused N+1 problems.

I refactored this again. We were publishing events like Domains::ProxyDomainsChangedEvent, but the only thing they do is serve as a base to create ZyncEvent events with some particular data, which was also ignored.

So why not skip a step and create the ZyncEvent events directly? The only information that really reaches Zync and is taken into account is type, tenant_id and service_id (optional). So we can send that info in a ZyncEvent directly and skip the events that cause N+1s. Here: 0cda2e6

ZyncEvent still requires an event param. You can mock it with an OpenStruct and it works fine, but I think it is slightly cleaner to create a new ResyncEvent that acts as the parent event for ZyncEvent. It doesn't carry any superfluous data, only what is strictly required to craft the proper ZyncEvent. Here: acf4a86

The only reason to create ResyncEvent is to avoid attaching a generic OpenStruct to the zync event, but this is optional because, guess what, the parent event is ignored.
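The shape of such a lightweight parent event can be sketched in plain Ruby. This is not the class added in the commit: `ResyncEventSketch` and `zync_attributes_for` are hypothetical names, and the attribute hash simply follows the ZyncEvent snippet quoted earlier in the thread, assuming tenant_id is taken from the provider ID.

```ruby
require 'securerandom'

# Hypothetical minimal parent event: carries only what a ZyncEvent needs.
class ResyncEventSketch
  attr_reader :event_id, :metadata

  def initialize(provider_id:, service_id: nil)
    @event_id = SecureRandom.uuid
    zync = service_id ? { service_id: service_id } : {}
    @metadata = { provider_id: provider_id, zync: zync }
  end
end

# Builds the payload following the attribute hash quoted earlier in the thread.
def zync_attributes_for(event, type:, id:)
  {
    type: type,
    id: id,
    parent_event_id: event.event_id,
    parent_event_type: event.class.name,
    tenant_id: event.metadata.fetch(:provider_id)
  }.merge(event.metadata.fetch(:zync, {}))
end

proxy_event = ResyncEventSketch.new(provider_id: 2, service_id: 5)
provider_event = ResyncEventSketch.new(provider_id: 2)

p zync_attributes_for(proxy_event, type: 'Proxy', id: 7).slice(:type, :id, :tenant_id, :service_id)
p zync_attributes_for(provider_event, type: 'Provider', id: 2).slice(:type, :id, :tenant_id)
```

Because the IDs are passed in directly, building the payload needs no association lookups, which is the property the refactor is after.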

@jlledom jlledom requested a review from akostadinov June 11, 2026 11:21