THREESCALE-6077: job sync system zync#4307
Conversation
Replaced the hardcoded batch size value (100) with a module-level BATCH_SIZE constant to improve maintainability and eliminate magic numbers. This makes it easier to adjust batch processing behavior across all zync resync tasks from a single location.
Introduces a new zync:resync:full task that comprehensively resyncs all provider accounts, services, proxies, and applications with the Zync service. The task publishes domain change events, OIDC configuration updates, and application creation events to ensure complete synchronization. Supports selective resync via PROVIDER_ID environment variable for troubleshooting individual providers, otherwise processes all active (non-suspended, non-deleted) providers in the system.
Adds test suite for the zync:resync:full rake task with coverage for: - Base full resync across all providers, services, and applications - PROVIDER_ID environment variable filtering to scope resync to a single provider - Exclusion of suspended providers from resync - Exclusion of scheduled_for_deletion providers from resync Tests use helper methods (expect_full_resync_events and expect_no_resync_events) to reduce duplication and improve readability. Organized into nested test classes (DomainsSyncTest and FullSyncTest) for better test organization. Assisted-by: Claude Code
❌ 9 blocking issues (9 total)
|
| task domains: [:provider_domains, :proxy_domains] | ||
|
|
||
| desc 'Full resync' | ||
| task full: :environment do |
There was a problem hiding this comment.
Can we define this task as empty body but just add all resync types as dependencies?
There was a problem hiding this comment.
We could, but that would be suboptimal. We would need to go trough the complete list of accounts once per each type. It's the same thing that happens now with the :domains task: it depends on :provider_domains and :proxy_domains, and each one goes through the whole list of accounts.
Instead, it's better to load each account just once and launch all related events at once.
There was a problem hiding this comment.
Not sure. proxy_domains iterates over Service and provider_domains iterates over Account, so they iterate over different tables. Now you query services for each account but rarely an account has a full batch of services I assume.
So at the end we probably perform more queries although there is a single loop.
-- not verified by AI but that's how I read the code late at night
There was a problem hiding this comment.
Not sure. proxy_domains iterates over Service and provider_domains iterates over Account
You're right.
So at the end we probably perform more queries although there is a single loop.
I of course agree on implementing whatever approach that performs less queries to DB. My approach included N+1 queries. I already considered that when first wrote it but I thought it wouldn't be so important since it's a task you don't run everyday anyway. But AI says there's a big difference in number of queries, so I refactored to split the tasks: 914140d
| Domains::ProxyDomainsChangedEvent.create_and_publish!(service.proxy) | ||
| OIDC::ProxyChangedEvent.create_and_publish!(service.proxy) |
There was a problem hiding this comment.
I am wondering if we really need to emit both events.
It seems that the event is caught by PublishZyncEventSubscriber
porta/app/lib/event_store/repository.rb
Lines 154 to 163 in 0a34a66
And it handles both in the same way:
porta/app/subscribers/publish_zync_event_subscriber.rb
Lines 24 to 25 in 0a34a66
So, I think it might be redundant.
There was a problem hiding this comment.
I would say you are right. We create either OIDC::ProxyChangedEvent or Domains::ProxyDomainsChangedEvent and declare different data on each:
porta/app/events/oidc/proxy_changed_event.rb
Lines 7 to 18 in 9c74b30
porta/app/events/domains/proxy_domains_changed_event.rb
Lines 4 to 21 in 6171463
But that doesn't matter, because the event will be replaced by a generic ZyncEvent that only includes this:
porta/app/events/zync_event.rb
Lines 11 to 17 in f377917
Everything not in event[:metadata][:zync] is simply ignored, not sent to zync at all.
But then zync, when receiving that data, also ignores everything except the attributes explicitly declared in the corresponding model, in Zync:
For Proxy, that is service_id, and tenant_id.
All notifications always iclude a tenant_id, so only service_id would ne needed in the event, and both events include it, so yes, it looks redundant.
That's why I say you're right, the only reason I have to think otherwise is... WTF? Why did the ancients perform such amount of overengineering? We are moving a lot of data around just to ignore it apparently, I have the feeling to be missing something.
There was a problem hiding this comment.
I honestly don't know 🤣
I always found the zync part quite confusing... And the event store too 😬
There was a problem hiding this comment.
I have the feeling to be missing something.
you might be right. I think they had big plans for zync to offload many operations. But then I think we would have created another monolith. Not sure microservices and other concepts existed at the time so maybe it was their take on it -- pure speculation on my side
There was a problem hiding this comment.
In my last commit, which refactors the whole thing, I'm only creating Domains::ProxyDomainsChangedEvent. No references to OIDC::ProxyChangedEvent anymore.
|
I think it would be nice to add some output, because currently nothing is print in this task, and for me at least it causese some anxiety about - did the job actually do anything? 😬 |
I'm following the same pattern we use in the other two tasks in the file: call It only shows the progress when the amount of accounts processed so far is multiple of 100, so for 160 accounts, that's just once. It will be more useful in SaaS with real data, but for testing is basically silent. What do your suggest? showing progress more frequently? printing logs? |
I personally would care to know that it started to work and ideally get a progress update every 10 seconds or so. But I'm totally fine with current code, only my request to call into the existing tasks instead of new custom code would be IMO sweetest. |
Replace nested-loop :full task with independent tasks per resource type
to eliminate N+1 queries. The nested approach (account.services,
service.cinstances) issues one query per parent record. Flat batched
queries reduce total DB queries significantly (e.g., ~2005 → ~320 for
500 accounts × 3 services × 20 cinstances).
Changes:
- Extract active_providers helper for PROVIDER_ID filtering and
suspended/deleted exclusion
- Rename :provider_domains → :providers, :proxy_domains → :proxies
(keep aliases for backward compatibility)
- Add new :services and :applications tasks with flat queries using
joins(:account).merge(active_providers)
- Remove redundant OIDC::ProxyChangedEvent (zync only uses
event[:metadata][:zync] which is identical in
Domains::ProxyDomainsChangedEvent)
- Make :full dependency-only: [:providers, :services, :proxies,
:applications]
- Add progress labels ("== Resyncing providers ==") to output
- Improve progress granularity to ~10% increments regardless of scope
size
- Query Proxy directly instead of Service.includes(:proxy) for cleaner
iteration
Tests updated:
- Add @all_proxies instance variable to avoid N+1 in test expectations
- Remove OIDC::ProxyChangedEvent from expectations
- Update DomainsSyncTest to match new filtered behavior
- Update helper methods to accept proxies parameter
Assisted-by: Claude Code
I made some changes to make it more responsible. Now it updates after processing every 10% of data, so up to ten times per type. I also added some headers to make it more understandable. Now it looks like this: |
|
|
||
| desc 'Resync services with zync' | ||
| task services: :environment do | ||
| services = Service.joins(:account).merge(active_providers) |
There was a problem hiding this comment.
hmm, so in the past we didn't filter out inactive providers for this task? Wasteful in some environments.
Previously we had .includes(:proxy), does it help avoid individual queries? Idk if you checked how many queries actually run. Also I'm not sure whether .includes(:account) will also help because likely we will query the account of the service anyway. But maybe joins already includes the data for the account. I haven't checked.
| services = Service.joins(:account).merge(active_providers) | |
| services = Service.includes(:account).joins(:account).merge(active_providers) |
btw if we loop over services, because that query already has the provider accounts, that may really reduce the SQL loop if we have a custom action like before, but I don't ask you to make a custom action 👼 just a FYI, don't think the marginal benefit is worth the additional code.
There was a problem hiding this comment.
wait, this is a new task. Bob said:
The zync:resync:services task is redundant. The old proxy_domains resync │
│ already synced services through the dependency mechanism in ZyncWorker.
So we can have it if one needs to run independently but maybe redundant to run with the others? Idk, I was just asking 😬
There was a problem hiding this comment.
I wasn't aware of this dependency mechanism. I think we don't need to touch anything because it's not really redundant, it depends on the service existing or not. The mechanism is only triggered when zync returns 422, that wouldn't happen when running our :full task because it runs :services before :proxies.
If we run :proxies directly, then yes, it will sync services, but only after each failed attempt to sync a proxy, so that implies a couple of extra round trips which is suboptimal. I don't think we should remove the :services task, since it's the way to avoid the suboptimal path.
It also works for applications, that is, if you try to sync an application, it will sync the service and proxy first if they don't exist. But again, suboptimal.
porta/app/events/zync_event.rb
Lines 42 to 54 in f377917
|
|
||
| desc 'Resync services with zync' | ||
| task services: :environment do | ||
| services = Service.joins(:account).merge(active_providers) |
There was a problem hiding this comment.
ah, but you split out proxies I see now 🤔 ok, we go to the other extreme :) alright.
Current in :applications task 21:10 [92/1869]Service.joins(:account).merge(active_providers).find_each Issue: This will trigger individual queries to fetch the associated account Suggestion: Add .includes(:account) to eager load:
Reviewer akostadinov raised a valid question about whether the new :services Recommendation:
|
Introduces a lightweight event class that carries only provider_id and optional service_id in metadata. This event bypasses the full event publishing chain used by domain-specific events, allowing rake tasks to publish ZyncEvents directly without triggering unnecessary queries through intermediate event classes. The ResyncEvent is designed specifically for manual resync operations where the model is already loaded and we only need to propagate the sync request to zync, not re-walk association chains. Assisted-by: Claude Code
Changes all zync resync rake tasks to publish ZyncEvents directly via ResyncEvent instead of going through intermediate domain events (Domains::ProviderDomainsChangedEvent, OIDC::ServiceChangedEvent, Applications::ApplicationUpdatedEvent). This eliminates N+1 queries that occurred when intermediate events walked association chains to extract provider_id and service_id. For proxies and applications tasks, switches from joins to eager_load to provide preloaded associations needed for provider_id extraction. Updates tests to expect ZyncEvent.create_and_publish! calls with ResyncEvent instances instead of intermediate event expectations. Introduces load_collections helper that's called explicitly in each test after any state mutations, making the test data setup clearer. Assisted-by: Claude Code
What Bob says doesn't really make sense, the However, it's true that the events I was creating performed extra queries to provide some data to the event subscriber... data which was completely ignored, but caused N+1 problems. I refactored this again. We were publishing events like So why not jumping a step and create the
The only reason to create |
What this PR does / why we need it:
Just a rake task to dump all existing data from porta to zync, in case we need to do a full resync after say a db reset.
It also adds some tests for the new task.
Which issue(s) this PR fixes
https://redhat.atlassian.net/browse/THREESCALE-6077
Verification steps
or