ADR 0003: Use Generator Pattern for Response Streaming

Status

Accepted

Date

2026-01-05

Context

The replay system needs to deliver response chunks to consumers. Responses may consist of multiple chunks that should be delivered in sequence. The library must support both small responses (single chunk) and potentially large responses (many chunks) without excessive memory usage.

Decision

Use Python generators (Iterator[ResponseChunk]) for streaming response chunks. The Broker.replay() method returns an iterator that yields chunks lazily.
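A minimal sketch of this decision follows. Only `Broker.replay()`, `ResponseChunk`, and `interaction.response_chunks` are named in this ADR; the `data` field, the `Interaction` record, and the dictionary lookup keyed by a request string are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class ResponseChunk:
    # Hypothetical field: the ADR does not specify what a chunk contains.
    data: bytes


@dataclass(frozen=True)
class Interaction:
    # Hypothetical record pairing a request key with its recorded chunks.
    request: str
    response_chunks: tuple[ResponseChunk, ...]


class Broker:
    def __init__(self, interactions: list[Interaction]) -> None:
        # Assumed lookup structure: request key -> recorded interaction.
        self._by_request = {i.request: i for i in interactions}

    def replay(self, request: str) -> Iterator[ResponseChunk]:
        # Generator function: nothing below runs until the first next().
        interaction = self._by_request[request]
        yield from interaction.response_chunks


broker = Broker([Interaction("GET /items", (ResponseChunk(b"a"), ResponseChunk(b"b")))])
chunks = broker.replay("GET /items")
first = next(chunks)  # pulls exactly one chunk
rest = list(chunks)   # drains the remaining chunks
```

The caller decides how much to pull: one chunk with `next()`, or everything with `list()`.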

Rationale

Memory efficiency: Chunks are yielded one at a time, so the replay path never has to build a full copy of the response in memory
Lazy evaluation: Consumers can process chunks as they arrive, enabling streaming workflows
Pythonic: Generator pattern is idiomatic Python for sequential data
Composable: Generators can be chained, filtered, or transformed using the standard itertools module
Backpressure: The consumer controls the iteration pace, which provides natural flow control
Testable: Easy to collect into a list for testing (list(broker.replay(request)))
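The laziness, composability, and backpressure points above can be illustrated with a short sketch. The `replay` function here is a hypothetical stand-in for `Broker.replay()`, and the `produced` list exists only to show how many chunks the producer actually generated.

```python
import itertools
from typing import Iterator

produced: list[int] = []  # records which chunks the producer actually yielded


def replay(total: int) -> Iterator[bytes]:
    # Hypothetical stand-in for Broker.replay(): yields `total` byte chunks.
    for i in range(total):
        produced.append(i)
        yield b"chunk-%d" % i


# Chain standard tools: drop empty chunks, then take only the first three.
non_empty = filter(None, replay(1_000_000))
first_three = list(itertools.islice(non_empty, 3))
```

Although the producer could yield a million chunks, only three are ever generated, because the consumer stops pulling after the third.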

Implications

Positive Implications

Scales to large responses without the replay path building an intermediate collection of chunks
Natural fit for streaming protocols (HTTP chunked encoding, gRPC streaming, etc.)
Consumers can process chunks incrementally (progress indicators, real-time processing)
Simple implementation using yield from interaction.response_chunks
Clear iteration boundary (StopIteration signals completion)
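As an illustration of incremental processing, the sketch below hashes a response and tracks progress chunk by chunk without ever joining the chunks. The `replay` function and its payload are hypothetical stand-ins for `Broker.replay()`.

```python
import hashlib
from typing import Iterator


def replay() -> Iterator[bytes]:
    # Hypothetical stand-in for Broker.replay().
    yield from (b"hello ", b"streaming ", b"world")


digest = hashlib.sha256()
bytes_seen = 0
for chunk in replay():        # the loop ends when the generator raises StopIteration
    digest.update(chunk)      # process each chunk as it arrives
    bytes_seen += len(chunk)  # could feed a progress indicator
```

The `for` statement handles `StopIteration` internally, so consumers rarely need to catch it themselves.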

Concerns

Consumers must iterate sequentially; there is no random access to middle chunks (mitigation: acceptable tradeoff for memory efficiency)
Generator state cannot be reset for re-iteration (mitigation: call replay() again to get a new generator)
Error handling requires try/except around the iteration, not just the initial call (mitigation: standard Python pattern for iterators)
Some consumers may prefer a materialized list (mitigation: use a list() wrapper when needed)
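The re-iteration and error-handling concerns can be seen in a short sketch. It assumes replay() is written as a generator function, so a failed lookup surfaces on the first pull and not at call time; an implementation that validates eagerly and then returns an iterator would behave differently. The `CASSETTE` dictionary and request strings are hypothetical.

```python
from typing import Iterator

CASSETTE = {"GET /ok": (b"a", b"b")}  # hypothetical in-memory cassette


def replay(request: str) -> Iterator[bytes]:
    # Hypothetical stand-in for Broker.replay(), written as a generator function.
    yield from CASSETTE[request]  # KeyError is raised here, during iteration


chunks = replay("GET /ok")
first_pass = list(chunks)
second_pass = list(chunks)            # exhausted: yields nothing
fresh_pass = list(replay("GET /ok"))  # calling replay() again starts over

pending = replay("GET /missing")      # no error yet: the body has not run
try:
    for chunk in pending:
        pass
    failed_during_iteration = False
except KeyError:
    failed_during_iteration = True
```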

Alternatives

Return Tuple of Chunks

Returning all chunks as a tuple immediately.

Pros: Simpler for consumers that need all chunks upfront, immutable collection
Cons: The entire response must be loaded into memory before returning, so large responses gain no memory savings
Reason for rejection: Defeats the purpose of streaming; memory inefficient for large responses

Return List of Chunks

Returning all chunks as a list immediately.

Pros: Familiar collection type, random access support
Cons: Same memory concerns as the tuple; a mutable return type also contradicts the immutability principle
Reason for rejection: Memory inefficiency plus mutability concerns

Async Generator (`AsyncIterator[ResponseChunk]`)

Using async generators for streaming chunks.

Pros: Natural fit for I/O-bound operations, enables concurrent processing
Cons: Adds complexity without clear benefit for in-memory replay, forces async/await on all consumers
Reason for rejection: Unnecessary complexity for in-memory cassettes; consider for future I/O-based backends
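For comparison, a minimal sketch of what the rejected async shape might look like. The names `replay_async` and `consume` are hypothetical; the point is that every caller has to become a coroutine and run inside an event loop, even though in-memory data gives the generator nothing to await.

```python
import asyncio
from typing import AsyncIterator


async def replay_async() -> AsyncIterator[bytes]:
    # Hypothetical async variant; with in-memory data there is nothing to await.
    for chunk in (b"a", b"b"):
        yield chunk


async def consume() -> list[bytes]:
    # Every consumer must itself be a coroutine to use `async for`.
    return [chunk async for chunk in replay_async()]


collected = asyncio.run(consume())  # synchronous callers need an event loop
```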

Callback-based API

Using callback functions to deliver chunks (e.g., replay(request, on_chunk=callback)).

Pros: Familiar pattern from JavaScript, enables push-based flow
Cons: Less Pythonic, harder to compose, inverted control flow is harder to test, no natural backpressure mechanism
Reason for rejection: Generators provide better ergonomics and testability in Python
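One way to see the ergonomics argument: a push-style wrapper can be layered on top of a generator in a few lines, whereas turning a callback API into a pull-based iterator generally needs buffering or a second thread. The sketch below uses hypothetical names (`replay`, `replay_with_callback`) and omits the request parameter for brevity.

```python
from typing import Callable, Iterator


def replay() -> Iterator[bytes]:
    # Hypothetical stand-in for Broker.replay().
    yield from (b"a", b"b", b"c")


def replay_with_callback(on_chunk: Callable[[bytes], None]) -> None:
    # Push-based wrapper built on the pull-based generator.
    for chunk in replay():
        on_chunk(chunk)


received: list[bytes] = []
replay_with_callback(received.append)  # consumers wanting push delivery can opt in
```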

Future Direction

This decision should be revisited if:

Async I/O support is added (consider AsyncIterator[ResponseChunk])
Profiling shows that generator overhead is a significant cost (considered unlikely)
Random access to chunks is frequently needed (consider hybrid approach with optional materialization)
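If random access does become a common need, one possible shape for the hybrid approach is an opt-in helper that materializes the stream. This is only a sketch: `replay_materialized` is a hypothetical name, and returning a tuple is an assumption chosen to stay consistent with the immutability principle mentioned above.

```python
from typing import Iterator, Sequence


def replay() -> Iterator[bytes]:
    # Hypothetical stand-in for Broker.replay().
    yield from (b"a", b"b", b"c")


def replay_materialized() -> Sequence[bytes]:
    # Hypothetical opt-in helper: trades memory for random access.
    return tuple(replay())


chunks = replay_materialized()
middle = chunks[1]  # indexing works because the chunks are now materialized
```

Keeping the lazy iterator as the default means only consumers that ask for materialization pay its memory cost.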