ADR 0003: Use Generator Pattern for Response Streaming
Status
Accepted
Date
2026-01-05
Context
The replay system needs to deliver response chunks to consumers. Responses may consist of multiple chunks that should be delivered in sequence. The library must support both small responses (single chunk) and potentially large responses (many chunks) without excessive memory usage.
Decision
Use Python generators (Iterator[ResponseChunk]) for streaming response chunks. The Broker.replay() method returns an iterator that yields chunks lazily.
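A minimal sketch of what this decision could look like in code. Only Broker.replay(), ResponseChunk, and interaction.response_chunks come from this ADR; the field names, the Interaction class, and the exact-match lookup are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class ResponseChunk:
    # Hypothetical shape: the ADR names the type but not its fields.
    data: bytes


@dataclass(frozen=True)
class Interaction:
    # Hypothetical recorded interaction holding its chunks in order.
    request: str
    response_chunks: tuple[ResponseChunk, ...]


class Broker:
    def __init__(self, interactions: list[Interaction]) -> None:
        self._interactions = list(interactions)

    def replay(self, request: str) -> Iterator[ResponseChunk]:
        """Yield the recorded chunks for a request, one at a time."""
        # Assumed lookup: exact match on the request value.
        for interaction in self._interactions:
            if interaction.request == request:
                yield from interaction.response_chunks
                return
        # Because replay() is a generator function, this is raised on the
        # first next() call, not when replay() itself is called.
        raise LookupError(f"no recorded interaction for {request!r}")


broker = Broker([Interaction("GET /a", (ResponseChunk(b"he"), ResponseChunk(b"llo")))])
chunks = broker.replay("GET /a")  # no generator code has run yet
first = next(chunks)              # runs the body up to the first yield
```

Calling replay() only creates the generator object; the lookup and the first chunk are produced when the consumer first asks for a value.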
Rationale
- Memory efficiency: Chunks are yielded one at a time, so consumers need not hold the entire response in memory
- Lazy evaluation: Consumers can process chunks as they arrive, enabling streaming workflows
- Pythonic: Generator pattern is idiomatic Python for sequential data
- Composable: Generators can be chained, filtered, or transformed using standard itertools
- Backpressure: Consumer controls iteration pace, natural flow control
- Testable: Easy to collect into a list for testing (list(broker.replay(request)))
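The laziness, composability, and backpressure points above can be illustrated with a short sketch. Here replay_chunks is a hypothetical stand-in for Broker.replay() that yields plain bytes, and the produced list exists only to make visible how many chunks the producer actually emitted.

```python
import itertools
from typing import Iterator

produced: list[int] = []  # records which chunks the producer has emitted


def replay_chunks(count: int) -> Iterator[bytes]:
    # Hypothetical stand-in for Broker.replay(); chunks are plain bytes here.
    for index in range(count):
        produced.append(index)
        yield f"chunk-{index}".encode()


# Compose standard tools: drop empty chunks, then take only the first two.
non_empty = filter(None, replay_chunks(1_000_000))
first_two = list(itertools.islice(non_empty, 2))

# The consumer stopped after two chunks, so the producer never ran further.
emitted_count = len(produced)
```

Although the stream is nominally a million chunks long, only the two chunks the consumer asked for are ever produced.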
Implications
Positive Implications
- Scales to large responses: the consumer holds only the current chunk, so consumer-side memory use does not grow with response size
- Natural fit for streaming protocols (HTTP chunked encoding, gRPC streaming, etc.)
- Consumers can process chunks incrementally (progress indicators, real-time processing)
- Simple implementation using yield from interaction.response_chunks
- Clear iteration boundary (StopIteration signals completion)
Concerns
- No random access to middle chunks: reaching a chunk means iterating through the ones before it (mitigation: acceptable tradeoff for memory efficiency)
- Generator state cannot be reset for re-iteration (mitigation: call replay() again to get a new generator)
- Error handling requires try/except around iteration, not just the initial call (mitigation: standard Python pattern for iterators)
- Some consumers may prefer a materialized list (mitigation: use a list() wrapper when needed)
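The re-iteration and error-handling concerns can be seen in a small sketch. The replay function below is a hypothetical stand-in for Broker.replay(), and its fail_at parameter is an invented knob that simulates a failure partway through the stream.

```python
from typing import Iterator, Optional


def replay(fail_at: Optional[int] = None) -> Iterator[bytes]:
    # Hypothetical stand-in for Broker.replay(); fail_at simulates a
    # mid-stream failure and is not part of any real API.
    for index in range(3):
        if index == fail_at:
            raise RuntimeError(f"chunk {index} unavailable")
        yield bytes([index])


# Re-iteration: an exhausted generator yields nothing more.
stream = replay()
first_pass = list(stream)
second_pass = list(stream)   # empty, the generator is already exhausted
fresh_pass = list(replay())  # a new call returns the full sequence again

# Error handling: the call does not raise; the exception surfaces while iterating.
received: list[bytes] = []
error = None
failing = replay(fail_at=1)  # no exception here
try:
    for chunk in failing:
        received.append(chunk)
except RuntimeError as exc:
    error = str(exc)
```

Chunks delivered before the failure have already been processed by the time the exception arrives, so consumers that need all-or-nothing behavior should buffer before acting.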
Alternatives
Return Tuple of Chunks
Returning all chunks as a tuple immediately.
- Pros: Simpler for consumers that need all chunks upfront, immutable collection
- Cons: Entire response must be loaded into memory before returning, no memory savings for large responses
- Reason for rejection: Defeats purpose of streaming; memory inefficient for large responses
Return List of Chunks
Returning all chunks as a list immediately.
- Pros: Familiar collection type, random access support
- Cons: Same memory concerns as tuple, mutable return type contradicts immutability principle
- Reason for rejection: Memory inefficiency plus mutability concerns
Async Generator (AsyncIterator[ResponseChunk])
Using async generators for streaming chunks.
- Pros: Natural fit for I/O-bound operations, enables concurrent processing
- Cons: Adds complexity without clear benefit for in-memory replay, forces async/await on all consumers
- Reason for rejection: Unnecessary complexity for in-memory cassettes; consider for future I/O-based backends
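To make the cost to consumers concrete, here is a rough sketch of what the async variant might look like. The names replay and replay_async are hypothetical, and the in-memory source means the await adds no real concurrency.

```python
import asyncio
from typing import AsyncIterator, Iterator


def replay() -> Iterator[bytes]:
    # Hypothetical synchronous replay over in-memory chunks.
    yield from (b"a", b"b")


async def replay_async() -> AsyncIterator[bytes]:
    # Hypothetical async variant: same chunks, but every consumer must now
    # run inside an event loop and use "async for".
    for chunk in replay():
        yield chunk
        await asyncio.sleep(0)  # hand control back to the event loop


async def consume() -> list[bytes]:
    return [chunk async for chunk in replay_async()]


# Even a trivial caller needs an event loop to read two in-memory chunks.
result = asyncio.run(consume())
```

The synchronous equivalent is simply list(replay()), which is the ergonomic gap this alternative's rejection refers to.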
Callback-based API
Using callback functions to deliver chunks (e.g., replay(request, on_chunk=callback)).
- Pros: Familiar pattern from JavaScript, enables push-based flow
- Cons: Less Pythonic, harder to compose, inverted control flow is harder to test, no natural backpressure mechanism
- Reason for rejection: Generators provide better ergonomics and testability in Python
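Consumers who do want push-style delivery can still get it on top of the generator API. The sketch below assumes a hypothetical replay function and a hypothetical replay_with_callback helper; neither is part of the library as described here.

```python
from typing import Callable, Iterator


def replay() -> Iterator[bytes]:
    # Hypothetical stand-in for Broker.replay().
    yield from (b"a", b"b", b"c")


def replay_with_callback(on_chunk: Callable[[bytes], None]) -> None:
    # Hypothetical push-style wrapper: a plain loop over the generator.
    for chunk in replay():
        on_chunk(chunk)


seen: list[bytes] = []
replay_with_callback(seen.append)
```

Going the other way, from a callback API to a pull-based iterator, generally needs a queue or a thread, which is one reason the generator form is the more flexible primitive.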
Future Direction
This decision should be revisited if:
- Async I/O support is added (consider AsyncIterator[ResponseChunk])
- Profiling shows generator overhead is measurable (unlikely)
- Random access to chunks is frequently needed (consider hybrid approach with optional materialization)
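One possible shape for the hybrid approach is sketched below. ChunkStream is a hypothetical wrapper, not an existing class: it stays lazy during plain iteration but caches chunks as they are pulled, so indexed access and repeated iteration become possible at the cost of retaining what has been read.

```python
from typing import Iterable, Iterator


class ChunkStream:
    """Hypothetical wrapper: lazy by default, cached once chunks are read."""

    def __init__(self, source: Iterable[bytes]) -> None:
        self._source = iter(source)
        self._cache: list[bytes] = []

    def _fill_to(self, index: int) -> bool:
        # Pull from the source until position index is cached.
        # Returns False if the source ends first.
        while len(self._cache) <= index:
            try:
                self._cache.append(next(self._source))
            except StopIteration:
                return False
        return True

    def __iter__(self) -> Iterator[bytes]:
        index = 0
        while self._fill_to(index):
            yield self._cache[index]
            index += 1

    def __getitem__(self, index: int) -> bytes:
        if index < 0:
            raise IndexError("negative indices are not supported in this sketch")
        if not self._fill_to(index):
            raise IndexError(index)
        return self._cache[index]


stream = ChunkStream(iter([b"a", b"b", b"c"]))
third = stream[2]           # pulls and caches chunks 0 through 2
all_chunks = list(stream)   # served from the cache
again = list(stream)        # re-iteration works, unlike a bare generator
```

The cache reintroduces the memory cost that the generator design avoids, so a wrapper like this would be an opt-in for consumers that need random access rather than the default return type.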