[compiler] eliminate hasStreamParam branching by reusing produceIterator #15372
Draft
patrick-schultz wants to merge 3 commits into
ca467fa to 37bb024
Lift `produceIterator` from a nested function inside `EmitStream.produce` to a `private[ir]` method on `object EmitStream`, and rewrite the stream compilation path in `Compile` to use it instead of the bespoke `compileStepper` machinery. This removes the `hasStreamParam` branching that selected between `TMPStepFunction` and `TableStageToRVDStepFunction`, making stream compilation fully generic.

Key changes:

- `EmitStream.produceIterator` is now a top-level method that takes an `EmitContext` and derives `elementPType` internally. Stream production is emitted in the constructor with `mb=next`, so the producer's labels are bound to the `next()` method while setup runs once in the ctor.
- To make this safe, `emitBlock` gains an overload that accepts an explicit `streamMb: EmitMethodBuilder[_]`, and `EmitStream.produce` threads the outer `mb` through both the `Block` case and the `emit` helper. This ensures stream-typed `Let` bindings get their producers bound to the correct method (`next`), not `cb.emb` (which is `ctor`), preventing `SStreamControlFlow` method-mismatch assertions.
- `Compile.Impl` handles `TStream` by calling `produceIterator` and wrapping the result in a `NoBoxLongIteratorAdapter`.
- `CompileIterator` object deleted entirely (~280 lines), including `compileStepper`, `compileStream`, `LongIteratorWrapper`, and the deprecated `forTableStageToRVD`/`forTableMapPartitions` methods.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
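The constructor/`next()` split described above can be modeled in plain Scala. This is a hypothetical, hand-written analogue, not the code Hail actually generates: `LongProducer`, `IotaProducer`, and `LongIteratorAdapter` are names invented for illustration, and the adapter only loosely mirrors the role this PR gives `NoBoxLongIteratorAdapter` (it makes no attempt to avoid boxing or to match that class's real interface). Setup runs once in the constructor, while the per-element control flow lives in `advance()`, standing in for the generated `next()`.

```scala
// Hypothetical sketch, NOT Hail's generated code: a hand-written model of a
// stream compiled into "setup in the constructor, elements in next()".
abstract class LongProducer {
  // Advances to the next element; returns false once the stream is exhausted.
  def advance(): Boolean
  // The element produced by the most recent successful advance().
  def current: Long
}

// Hypothetical analogue of an iota-like stream: `count` values starting at
// `start`, spaced by `step`.
final class IotaProducer(start: Int, step: Int, count: Int) extends LongProducer {
  // Setup: evaluated exactly once, in the constructor (the "ctor" part).
  private val stepL: Long = step.toLong
  private var emitted: Int = 0
  private var cur: Long = start.toLong - stepL

  // Per-element control flow: in the PR's terms, this is the code whose
  // labels must belong to next(), the method that consumes the stream.
  def advance(): Boolean =
    if (emitted >= count) false
    else { cur += stepL; emitted += 1; true }

  def current: Long = cur
}

// Hypothetical adapter exposing a producer as a standard (boxed) Iterator[Long].
final class LongIteratorAdapter(p: LongProducer) extends Iterator[Long] {
  private var ready = false // has advance() already been called for the pending element?
  private var more = false  // result of that advance() call

  def hasNext: Boolean = {
    if (!ready) { more = p.advance(); ready = true }
    more
  }

  def next(): Long = {
    if (!hasNext) throw new NoSuchElementException("exhausted")
    ready = false
    p.current
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val it = new LongIteratorAdapter(new IotaProducer(start = 10, step = 5, count = 4))
    println(it.toList) // List(10, 15, 20, 25)
  }
}
```

In this model it would make no sense to put the `if`/increment logic in the constructor, since the constructor runs once while elements are requested repeatedly. That is roughly the plain-Scala counterpart of the method mismatch the PR guards against: the stream's control flow has to live in the method that consumes elements.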
37bb024 to 22dc08e
If #15395 merges, the test batch for this PR would only have run the following steps:
Change Description
Lift `produceIterator` from a nested function inside `EmitStream.produce` to a `private[ir]` method on `object EmitStream`, and rewrite the stream compilation path in `Compile` to use it instead of the bespoke `compileStepper` machinery. This removes the `hasStreamParam` branching that selected between `TMPStepFunction` and `TableStageToRVDStepFunction`, making stream compilation fully generic.

Key changes:

- `EmitStream.produceIterator` is now a top-level method that takes an `EmitContext` and derives `elementPType` internally. Stream production is emitted in the constructor with `mb=next`, so the producer's labels are bound to the `next()` method while setup runs once in the ctor.
- `Compile.Impl` now always runs one last `ForwardLets` pass. When I'm done reworking our `LoweringPass` infrastructure, this should be handled there instead.
- `Compile.Impl` handles `TStream` by calling `produceIterator` and wrapping the result in a `NoBoxLongIteratorAdapter`.
- `CompileIterator` object deleted entirely.

Let-bound stream issue
This change exposed a pre-existing bug involving streams bound in a `Block`. For background, `EmitStream.produce` takes both a `CodeBuilder` `cb` and a `MethodBuilder` `mb`. `cb` is where setup code is emitted (e.g. the computation of `i` in `StreamIota(i)`), while `mb` is the method which will contain the stream, i.e. where the stream's labels will be defined, and therefore where the stream must be consumed.

The problem is that when encountering a bound stream in `emitBlock`, we assumed `mb = cb.emb`, i.e. that the stream will be consumed in the current method. This isn't a safe assumption; in fact, it's hard to know which method the stream will be consumed in, especially since `emitBlock` also does method splitting.

Rather than introducing significant complexity to `emitBlock`, I think the right fix is to enforce the invariant that when we call `Emit`, there can be no let-bound streams. This is easily enforced by `ForwardLets`, because a bound stream is always forwardable.

Security Assessment