Skip to content

ADR 0005: Build Fingerprint Index for O(1) Lookup

Status

Accepted

Date

2026-01-05

Context

The cassette stores interactions as a tuple and needs to find interactions matching a given request fingerprint. Without an index, finding a match requires linear search (O(n)) through all interactions. For large cassettes with hundreds or thousands of interactions, this becomes a performance bottleneck.

Decision

Build a dict[RequestFingerprint, int] index at cassette construction that maps fingerprints to their first occurrence position in the interactions tuple. The index is built using a model validator and stored as a private attribute.

Rationale

  • O(1) lookup: Dictionary lookup is constant time, vastly superior to O(n) linear search
  • Initialization cost: Index is built once during cassette creation, amortized over many lookups
  • Memory efficiency: Index stores only integers (positions), not duplicate interaction data
  • First-match alignment: Index naturally stores first occurrence, implementing ADR 0004
  • Immutable after creation: Index is built during construction and never modified
  • Transparent: Implementation detail hidden from users (private attribute)

Implications

Positive Implications

  • Replay performance is independent of cassette size (O(1) vs O(n))
  • Enables large cassettes (thousands of interactions) without performance degradation
  • Simple implementation using standard library dict
  • Index construction is deterministic and testable
  • Memory overhead is minimal (one int per unique fingerprint)

Concerns

  • Upfront cost to build index during cassette creation (mitigation: acceptable one-time cost amortized over many lookups)
  • Uses object.__setattr__ to mutate frozen model (mitigation: required for immutability, Pydantic pattern for private attributes)
  • Memory overhead for the index dict (mitigation: minimal, ~8 bytes per unique fingerprint)
  • Index is not exposed in serialization or repr (mitigation: intentional design, private implementation detail)

Alternatives

Linear Search (No Index)

Finding interactions by iterating through the tuple without an index.

  • Pros: No memory overhead, simpler implementation
  • Cons: O(n) lookup time becomes bottleneck for large cassettes, performance degrades proportionally with cassette size
  • Reason for rejection: Performance is unacceptable for production use with large cassettes (hundreds or thousands of interactions)

Build Index on First Lookup (Lazy Initialization)

Deferring index construction until the first lookup call.

  • Pros: Defers cost if cassette is never used
  • Cons: Requires mutable state to track "index built" flag, violates frozen model immutability principle
  • Reason for rejection: Breaks immutability guarantees; construction-time indexing is simpler and aligns with frozen model design

External Index (Separate Data Structure)

Maintaining the index as a separate object outside the cassette.

  • Pros: Clear separation of concerns
  • Cons: Requires managing two objects (cassette + index) in sync, more complex API (users must pass both), risk of desynchronization if index is mutated
  • Reason for rejection: API complexity and synchronization risk outweigh benefits; internal index is simpler and safer

Use List Instead of Tuple for Interactions

Storing interactions in a mutable list to enable direct mutation.

  • Pros: No workaround needed for frozen models
  • Cons: Mutable collection contradicts immutability principle, no advantage for index-based access (tuple supports indexing)
  • Reason for rejection: Immutability is a core design principle; tuple is the correct choice for frozen data

Store Interactions in Dict (No Tuple)

Using a dict mapping fingerprints to interactions directly, eliminating the tuple.

  • Pros: Built-in O(1) lookup without separate index
  • Cons: Loses insertion order for duplicates (violates ADR 0004), cannot represent multiple interactions with same fingerprint, fundamentally different data model
  • Reason for rejection: Incompatible with first-match strategy; duplicates are legitimate and must be preserved in order

Future Direction

This decision should be revisited if:

  • Profiling shows index build time is a bottleneck (unlikely, dict insert is fast)
  • Memory pressure from large indexes becomes an issue (consider specialized data structures)
  • Alternative frozen dataclass patterns emerge (Python 3.13+ features)
  • Cassette persistence requires serializing/deserializing the index (versioning strategy needed)
  • Matching strategy becomes pluggable (index should be built from the configured fingerprinting strategy)

References