Skip to content

feat: N=1 cooperative fiber scheduler replacing per-thread host threads#137

Draft
smmathews wants to merge 5 commits into
ran-j:mainfrom
smmathews:feat/ultramodern-scheduler
Draft

feat: N=1 cooperative fiber scheduler replacing per-thread host threads#137
smmathews wants to merge 5 commits into
ran-j:mainfrom
smmathews:feat/ultramodern-scheduler

Conversation

@smmathews

Copy link
Copy Markdown
Contributor

What was wrong

PS2 thread scheduling is cooperative and strictly priority-ordered. When two threads are runnable, the highest-priority one runs uninterrupted until it explicitly yields — there is no time-slicing, no preemption, no "two threads run at once." The previous implementation modeled each PS2 thread as a host std::thread, handing scheduling decisions to the OS. That is structurally incompatible with EE semantics.

The visible symptoms in DQ8:

  • SleepThread + ResumeThread spurious wake. On a real PS2, ResumeThread un-suspends a sleeping thread but does not count as a WakeupThread permit — the thread goes back to sleep. The old model woke the host thread unconditionally, so DQ8's sleep loops would proceed when they shouldn't and corrupt the state machine they were protecting.
  • Priority inversion. Guest code that spawns a high-priority helper thread and spins waiting for it to signal a semaphore could deadlock: the host OS might not schedule the helper before the waiter's time slice expired, but the waiter had no yield point so it would hold the host CPU and starve the helper.
  • Shutdown hangs. There was no mechanism to interrupt a running host thread that was deep in a recompiled block and never hit a yield. scheduler_shutdown could block indefinitely.
  • IRQ/DMAC alarm handlers. Interrupt handlers run as "borrowed" host workers — they enter guest execution briefly, call a syscall, and return. The previous model had no concept of this: a borrowed worker calling SuspendThread(0) would try to look up its own thread ID, get -1, corrupt the thread table, and eventually crash.

These problems are not DQ8-specific. Any game that relies on EE-accurate cooperative scheduling, explicit priority ordering, or the expected behavior of SleepThread/WakeupThread can hit them. DQ8 is where they became unavoidable.

What changed

New: N=1 fiber scheduler (ps2_scheduler.{h,cpp,_internal.h}, ps2_fiber.{h,cpp})

All PS2 threads now run as fibers (POSIX ucontext_t with mmap'd stacks and guard pages) on a single dedicated executor thread. Only one fiber runs at any moment. The scheduler maintains a priority-sorted run queue; when the currently-running fiber blocks or yields, the next-highest-priority fiber resumes immediately. This is structurally the same as what the EE BIOS does.

The correctness-critical protocol is the arm_park / block_current wakeup window. The outline: a fiber publishes itself to an object's wait-list under the object's mutex, releases that mutex, then calls arm_park() (sets state=Blocked) followed by block_current() (actually suspends). A waker that fires after the publish but before block_current() sets wake_pending rather than touching the fiber's run-queue links — block_current() sees this and returns immediately without ever parking. This closes the lost-wakeup window that exists in any lock-then-block pattern.

SleepThread now uses a Mesa re-check loop. Each iteration: arm_park → block_current → re-acquire lock → check terminate / forceRelease / wakeupCount. A ResumeThread that finds no permit re-parks the fiber immediately. A WakeupThread that increments wakeupCount to 1 releases it. These are now unambiguous.

Generation tokens on semaphore and event flag wait-lists. SemaInfo::waitList and EventFlagInfo::waitList now store {tid, token} pairs where the token encodes {generation, tid}. All cross-thread wakeups (SignalSema, DeleteSema, SetEventFlag, DeleteEventFlag, vsync wait-list, notifyRuntimeStop) use enqueue_external_wakeup_validated(tid, token) which drops the wakeup if the tid has been recycled.

Borrowed-worker guards. Seven self-targeting syscalls (SuspendThread, ResumeThread, TerminateThread, ReferThreadStatus, CancelWakeupThread, ChangeThreadPriority, RotateThreadReadyQueue) now check g_currentThreadId == -1 before resolving tid == 0 → self and return KE_ILLEGAL_THID. Blocking syscalls gate ThreadInfo creation on onFiber, so a borrowed worker can never alias g_threads[-1].

Shutdown. scheduler_set_stop_callback lets the runtime register a callback that fires inside scheduler_shutdown, kicking the executor's dispatch loop. yield_point() has an unsampled fast-path: if g_stop && terminateRequested, it throws ThreadExitException immediately rather than waiting for the 128-backedge gate. Both are necessary: the callback wakes a fiber blocked in g_sched_cv.wait, and the fast-path exits a fiber that is running but rarely yields.

Finished-branch teardown fix. The executor's Finished branch now erases the g_fiber_map entry and moves the FiberContext out to a local before dropping g_sched_mutex for the munmap. Previously the map entry was erased after re-acquiring the lock, and a borrowed worker racing in async_guest_beginStartThread on the same tid during that window could have its new FiberContext immediately destroyed by the old teardown's post-relock erase.

How it applies to DQ8

DQ8 has a thread hierarchy with strict priority ordering: a main game loop thread at priority 32, several subsystem threads at higher priorities (lower numbers), and timer/alarm callbacks that fire as borrowed workers. The old model's incorrect SleepThread behavior was the most immediate symptom, but the priority and suspend/resume semantics were also wrong in ways that matter for the save/load system and the enemy AI scheduler. The new model makes all of these correct by construction.

How it applies to other games

Any recompiled PS2 game that uses the EE threading syscalls gets these fixes. The correctness invariants (single executor, priority queue, publish-before-arm wakeup protocol, Mesa sleeps, generation-validated wakeups, borrowed-worker safety) are structural — they apply to the scheduler itself, not to DQ8-specific code paths.

Tests

46 new scheduler test cases in ps2_runtime_expansion_tests.cpp covering:

  • Publish/arm wakeup window: SignalSema hammered by a concurrent thread with no coordination — the waiter must never miss a signal (X1).
  • notify_all coverage under async_guest_begin/end contention (X2).
  • SleepThread Mesa loop: ResumeThread without a WakeupThread permit must not wake the sleeper; the sleeper re-parks and eventually wakes on a real WakeupThread (Y1).
  • Shutdown with blocked fibers, running fibers, and fibers that would re-block through block_current's shutdown guard (Y2, W3).
  • Borrowed-worker self-target guard across all 7 guarded syscalls, with g_threads[-1] non-creation asserted after each call (Y3).
  • Sema/event delete drains all N waiters via generation-validated wakeups; stale-token rejection confirmed with a fabricated generation mismatch (Y4a/Y4b).
  • Finished-branch tid-reuse regression: a borrowed worker cycles through 30 create/start/exit/delete rounds asserting every fiber ran to completion (Z1).
  • Priority ordering, FIFO within equal priority, priority inheritance on join, suspend/resume nesting, vsync waiter termination, stress tests under racing signalers.

Replaces the thread-per-PS2-thread model with a single executor thread
(g_guest_thread) running all guest fibers via POSIX ucontext. Implements
correct publish-before-arm wakeup ordering, SleepThread Mesa loop,
generation-token-validated cross-thread wakeups, borrowed-worker guards,
and a clean shutdown path.

New files:
- ps2_scheduler.{h,cpp,_internal.h}: scheduler core, arm_park/block_current
  protocol, N=1 executor, async_guest_begin/end, yield_point fast-path exit
- ps2_fiber.{h,cpp}: ucontext fiber backend with guard pages

Key correctness properties:
- Publish-before-arm: waiters push to object wait-list under object mutex,
  release it, then arm_park(). Wakers seeing g_running_fiber==fc set
  wake_pending instead of touching fc->next; block_current() consumes it.
  make_ready/enqueue_external_wakeup have no state==Blocked gate so
  publish-window wakes are never dropped.
- SleepThread Mesa loop: re-arms every iteration; distinguishes terminate /
  forceRelease / genuine wakeupCount / spurious; re-asserts THS_WAIT on
  spurious so ResumeThread without a permit does not count as a wakeup.
- Generation tokens: SemaInfo/EventFlagInfo waitLists store {tid, token}
  pairs; all cross-thread wakeups use enqueue_external_wakeup_validated so
  a recycled tid with a new generation drops stale wakeups.
- Borrowed-worker guards: all 7 self-targeting syscalls (tid==0 → self)
  return KE_ILLEGAL_THID when g_currentThreadId==-1 (IRQ/alarm context).
- Finished-branch teardown: g_fiber_map entry erased under g_sched_mutex
  before dropping the lock for munmap, closing a UAF window where a
  borrowed worker recycling the tid could have its new FiberContext
  immediately clobbered by the old teardown's post-relock erase.
- Shutdown: stop callback + yield_point fast-path throw ensure executor
  exits even for fibers that rarely hit the 128-sample yield gate.

Tests: 46 new scheduler test cases (Y1/Y2/Y3/Y4/Z1 suites) added to
ps2_runtime_expansion_tests.cpp covering all the above properties plus
regression tests for every correctness class fixed during development.
@smmathews smmathews force-pushed the feat/ultramodern-scheduler branch from dcd18be to b01382c Compare June 20, 2026 09:56
@smmathews

Copy link
Copy Markdown
Contributor Author

leaving this in draft while I give it a more thorough review, but overall I think it should be a more robust solution than #134

~446 lines removed, all behavior-neutral:

- Remove rate-limited RUNTIME_LOG/debug-trace blocks from Thread.cpp,
  Sync.cpp, and Interrupt.cpp that buried the Mesa-monitor control flow
  under counter statics and stream expressions.
- Remove dead log-feeder locals (beforeCount, afterCount, newBits,
  statusAfter, newWakeupCount) that only existed to feed the deleted logs.
- Unify dispatchIntcHandlersForCause and dispatchDmacHandlersForCause into
  one dispatchHandlersForCause helper — they were identical in structure,
  differing only in map, mask, and tag.
- Collapse triple-line // ---- banner comments above functions to nothing
  or a single line; keep the explanatory blocks above wake_locked,
  arm_park, block_current, and guest_executor_main.
- Compress over-documented field comments in ps2_scheduler_internal.h to
  two-liners with pointers to the authoritative explanation in the .cpp.
- Drop the unused runtime parameter from waitWhileSuspended and the
  dead s_rpcInvokeStackBase thread-local.
- Remove one unreachable line in ps2_fiber.cpp (the author's own comment
  already called it unreachable).
- Trim the pc==0 diagnostic dump in ps2_runtime.cpp (normal thread return,
  not an error).
@smmathews

Copy link
Copy Markdown
Contributor Author

still leaving in draft, I don't like how vita is broken and I can't test it. working on it.

guest_executor_main unlocked before munmap but never re-locked, so the
next g_sched_cv.wait() ran with an unowned unique_lock. On the break
path lk.unlock() then threw EPERM, propagating out of the thread
function and hitting std::terminate.
…n hardening

- ps2_fiber.cpp: add SceFiber backend for PLATFORM_VITA; replace the old
  #error stub with a full implementation using sceFiberRun/sceFiberReturnToThread
  with guard-paged mmap stacks. sceFiberFinalize is always called (not just when
  started) since sceFiberInitialize always runs in ps2fiber_alloc.
  pthread backend gains guard-paged mmap stacks via pthread_attr_setstack.

- ps2_fiber.h: widen ps2fiber_set_tid guard to cover PS2X_FIBER_PTHREAD too.

- ps2_scheduler.cpp: call ps2fiber_set_tid in create_fiber for Vita/pthread;
  join_fiber handles non-fiber callers (host thread TerminateThread) via
  g_sched_cv.wait instead of a fiber yield; block_current throws
  ThreadExitException on shutdown instead of returning WokenInWindow so Mesa
  wait-loops unwind cleanly without an extra re-block cycle.

- Sync.cpp: WaitEventFlag uses a Mesa-style re-block loop for fibers;
  non-fiber path gets one bounded yield only.

- Lifecycle.cpp: reset g_activeThreads in notifyRuntimeStop so test cycles
  that restart the scheduler begin with a clean count.

- RPC.cpp: waitList stores pair<int,uint64_t>; use .first when extracting tid.

- CMakeLists.txt: add PS2X_FIBER_PTHREAD option (Linux TSan/ASan testing)
  and PS2X_SANITIZE cache variable.

- Test fixes: T4 no longer spins (avoids priority deadlock); T5 waits for
  counter stability before sampling suspend; T10/T11 use WakeupThread instead
  of enqueue_external_wakeup; U2 priority/sequence corrected.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant