ADR 0005: Build Fingerprint Index for O(1) Lookup¶
Status¶
Accepted
Date¶
2026-01-05
Context¶
The cassette stores interactions as a tuple and needs to find interactions matching a given request fingerprint. Without an index, finding a match requires linear search (O(n)) through all interactions. For large cassettes with hundreds or thousands of interactions, this becomes a performance bottleneck.
Decision¶
Build a dict[RequestFingerprint, int] index at cassette construction that maps fingerprints to their first occurrence position in the interactions tuple. The index is built using a model validator and stored as a private attribute.
Rationale¶
- O(1) lookup: Dictionary lookup is constant time, vastly superior to O(n) linear search
- Initialization cost: Index is built once during cassette creation, amortized over many lookups
- Memory efficiency: Index stores only integers (positions), not duplicate interaction data
- First-match alignment: Index naturally stores first occurrence, implementing ADR 0004
- Immutable after creation: Index is built during construction and never modified
- Transparent: Implementation detail hidden from users (private attribute)
Implications¶
Positive Implications¶
- Replay performance is independent of cassette size (O(1) vs O(n))
- Enables large cassettes (thousands of interactions) without performance degradation
- Simple implementation using standard library dict
- Index construction is deterministic and testable
- Memory overhead is minimal (one int per unique fingerprint)
Concerns¶
- Upfront cost to build index during cassette creation (mitigation: acceptable one-time cost amortized over many lookups)
- Uses
object.__setattr__to mutate frozen model (mitigation: required for immutability, Pydantic pattern for private attributes) - Memory overhead for the index dict (mitigation: minimal, ~8 bytes per unique fingerprint)
- Index is not exposed in serialization or repr (mitigation: intentional design, private implementation detail)
Alternatives¶
Linear Search (No Index)¶
Finding interactions by iterating through the tuple without an index.
- Pros: No memory overhead, simpler implementation
- Cons: O(n) lookup time becomes bottleneck for large cassettes, performance degrades proportionally with cassette size
- Reason for rejection: Performance is unacceptable for production use with large cassettes (hundreds or thousands of interactions)
Build Index on First Lookup (Lazy Initialization)¶
Deferring index construction until the first lookup call.
- Pros: Defers cost if cassette is never used
- Cons: Requires mutable state to track "index built" flag, violates frozen model immutability principle
- Reason for rejection: Breaks immutability guarantees; construction-time indexing is simpler and aligns with frozen model design
External Index (Separate Data Structure)¶
Maintaining the index as a separate object outside the cassette.
- Pros: Clear separation of concerns
- Cons: Requires managing two objects (cassette + index) in sync, more complex API (users must pass both), risk of desynchronization if index is mutated
- Reason for rejection: API complexity and synchronization risk outweigh benefits; internal index is simpler and safer
Use List Instead of Tuple for Interactions¶
Storing interactions in a mutable list to enable direct mutation.
- Pros: No workaround needed for frozen models
- Cons: Mutable collection contradicts immutability principle, no advantage for index-based access (tuple supports indexing)
- Reason for rejection: Immutability is a core design principle; tuple is the correct choice for frozen data
Store Interactions in Dict (No Tuple)¶
Using a dict mapping fingerprints to interactions directly, eliminating the tuple.
- Pros: Built-in O(1) lookup without separate index
- Cons: Loses insertion order for duplicates (violates ADR 0004), cannot represent multiple interactions with same fingerprint, fundamentally different data model
- Reason for rejection: Incompatible with first-match strategy; duplicates are legitimate and must be preserved in order
Future Direction¶
This decision should be revisited if:
- Profiling shows index build time is a bottleneck (unlikely, dict insert is fast)
- Memory pressure from large indexes becomes an issue (consider specialized data structures)
- Alternative frozen dataclass patterns emerge (Python 3.13+ features)
- Cassette persistence requires serializing/deserializing the index (versioning strategy needed)
- Matching strategy becomes pluggable (index should be built from the configured fingerprinting strategy)