ADR 0002: Use SHA-256 for Request Fingerprinting
Status
Accepted
Date
2026-01-05
Context
The replay system needs to match incoming requests against recorded interactions. This requires a stable, deterministic fingerprint that identifies a request by its protocol, action, target, headers, and body content. The fingerprint must be consistent across processes and Python sessions, which rules out Python's built-in hash(), whose results for strings and bytes are randomized per process by default.
Decision
Use SHA-256 over a canonical JSON serialization of the request. The canonical form is the list of request fields [protocol, action, target, headers, body_hex], serialized with json.dumps using separators=(",", ":") and sort_keys=True. The fixed separators and key ordering make the output deterministic, and JSON quoting keeps field boundaries unambiguous, which prevents delimiter collisions. Header ordering is preserved (no normalization).
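A minimal sketch of the scheme described above, assuming headers arrive as an ordered sequence of (name, value) pairs and the body as bytes. The function name fingerprint and its signature are illustrative, not taken from the codebase. Since sort_keys only reorders the keys of JSON objects, representing headers as a list of pairs is what keeps their order intact in this sketch.

```python
import hashlib
import json
from typing import Sequence, Tuple


def fingerprint(
    protocol: str,
    action: str,
    target: str,
    headers: Sequence[Tuple[str, str]],  # assumption: ordered (name, value) pairs
    body: bytes,
) -> str:
    """Return a 64-character SHA-256 hex digest for a request (illustrative)."""
    fields = [
        protocol,
        action,
        target,
        [[name, value] for name, value in headers],  # order kept as given
        body.hex(),  # hex-encode so arbitrary bytes are JSON-safe
    ]
    # Fixed separators remove whitespace differences; sort_keys only matters
    # if a field ever contains a JSON object (dict).
    canonical = json.dumps(fields, separators=(",", ":"), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


fp = fingerprint("http", "GET", "/users", [("accept", "*/*")], b"\x00\xff")
```

Calling the function twice with the same arguments yields the same digest in any process, while swapping the order of two headers yields a different one.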
Rationale
- Stability: Hash-based fingerprints are deterministic and consistent across processes
- Collision resistance: SHA-256 combined with structure-preserving JSON serialization prevents accidental collisions (e.g., when fields contain the delimiter)
- Standard library: Available in Python's hashlib and json without external dependencies
- Canonical ordering: Fixed JSON separators ensure identical requests produce identical fingerprints regardless of environment without normalizing request data
- Binary-safe: Hex-encoded body in JSON handles arbitrary binary data safely
- Compact: Fixed-length 64-character hex digest is memory-efficient
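The delimiter-collision point can be illustrated with a small comparison. This is a sketch only: the "|" delimiter and the helper names naive_join and canonical_json are hypothetical and chosen for the example.

```python
import json
from typing import List


def naive_join(fields: List[str]) -> str:
    # Hypothetical delimiter-based encoding: field boundaries are ambiguous
    # whenever a field itself contains "|".
    return "|".join(fields)


def canonical_json(fields: List[str]) -> str:
    # JSON quotes each string, so the structure survives serialization.
    return json.dumps(fields, separators=(",", ":"), sort_keys=True)


first = ["GET", "/a|b"]
second = ["GET|/a", "b"]

naive_collides = naive_join(first) == naive_join(second)
json_collides = canonical_json(first) == canonical_json(second)
```

Here the two distinct field lists join to the same delimited string, while their JSON forms differ, so hashing the JSON form keeps them apart.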
Implications
Positive Implications
- Fingerprints are stable across application restarts and different machines
- O(1) average-case lookup using fingerprints as dictionary keys
- No false positives from hash collisions in realistic scenarios
- Header order affects matching (no normalization)
- Works with any protocol (protocol-agnostic design)
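As an illustration of the lookup and header-order points, the following sketch indexes recorded responses by fingerprint. The ReplayIndex class, its methods, and the field layout are assumptions made for the example, not the project's actual API.

```python
import hashlib
import json
from typing import Dict, List, Optional


def fingerprint(fields: List[object]) -> str:
    canonical = json.dumps(fields, separators=(",", ":"), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ReplayIndex:
    """Hypothetical in-memory index from fingerprint to recorded response."""

    def __init__(self) -> None:
        self._by_fingerprint: Dict[str, bytes] = {}

    def record(self, fields: List[object], response: bytes) -> None:
        self._by_fingerprint[fingerprint(fields)] = response

    def lookup(self, fields: List[object]) -> Optional[bytes]:
        # A dictionary lookup, independent of how many interactions are stored.
        return self._by_fingerprint.get(fingerprint(fields))


index = ReplayIndex()
index.record(["http", "GET", "/users", [["accept", "*/*"], ["x-id", "1"]], ""], b"200 OK")

# Same fields in the same order: found.
hit = index.lookup(["http", "GET", "/users", [["accept", "*/*"], ["x-id", "1"]], ""])
# Same headers in a different order: not found, since order is not normalized.
miss = index.lookup(["http", "GET", "/users", [["x-id", "1"], ["accept", "*/*"]], ""])
```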
Concerns
- Hash computation cost grows linearly, O(n), with request size (mitigation: acceptable for typical request sizes)
- Changing canonical format breaks compatibility with existing cassettes (mitigation: version 0 is in-memory only)
- Cryptographic hashing may be overkill for this use case (mitigation: no measurable performance impact)
Alternatives
Direct Request Comparison (__eq__)
Using direct object equality comparison to find matching requests.
- Pros: Simple implementation, no hashing overhead
- Cons: O(n) lookup time for finding interactions in cassette, cannot use requests as dictionary keys without stable hash
- Reason for rejection: Poor performance for large cassettes; O(n) lookup vs O(1) with hash-based index
MD5 Hashing
Using MD5 hash algorithm for fingerprinting.
- Pros: Faster than SHA-256, sufficient for non-cryptographic use
- Cons: Considered cryptographically broken, community perception of MD5 weakness could undermine trust
- Reason for rejection: Reputational risk outweighs marginal performance gains
Non-cryptographic Hashes (xxHash, MurmurHash)
Using fast non-cryptographic hash algorithms like xxHash or MurmurHash.
- Pros: Faster than SHA-256, designed for hash table use
- Cons: Requires external dependency, not in standard library, adds maintenance burden
- Reason for rejection: Performance difference is negligible for this use case; standard library preference
Tuple-based Keys
Using tuples of request fields directly as dictionary keys.
- Pros: No hashing overhead, direct comparison
- Cons: Large memory footprint for storing full request data as keys
- Reason for rejection: Memory inefficient; stable hashing provides compact keys
Future Direction
This decision should be revisited if:
- Performance profiling shows hashing is a bottleneck (consider faster non-cryptographic hashes)
- Large request bodies cause excessive memory pressure (consider streaming hash calculation without full in-memory loading)
- Cassette persistence is added and backward compatibility becomes critical (use versioning scheme)
- Protocol-specific fingerprinting is needed (extend with pluggable fingerprint strategies)
- Matching rules need to be customized (make fingerprinting + matching strategy injectable, including hash choice)
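One way the last three points could fit together is a small factory that makes the hash algorithm injectable and tags each fingerprint with a format version. This is a speculative sketch, not a planned design: make_fingerprinter, the Fingerprinter alias, and the "v<version>:" prefix are all hypothetical.

```python
import hashlib
import json
from typing import Callable, List

# Hypothetical strategy type: anything that maps request fields to a key.
Fingerprinter = Callable[[List[object]], str]


def make_fingerprinter(hash_name: str = "sha256", version: int = 0) -> Fingerprinter:
    """Build a fingerprint function for a given hash algorithm and format version."""

    def fingerprint(fields: List[object]) -> str:
        canonical = json.dumps(fields, separators=(",", ":"), sort_keys=True)
        digest = hashlib.new(hash_name, canonical.encode("utf-8")).hexdigest()
        # The version prefix lets a persisted cassette detect format changes.
        return f"v{version}:{digest}"

    return fingerprint


default_fp = make_fingerprinter()
alt_fp = make_fingerprinter(hash_name="blake2b", version=1)

request_fields = ["http", "GET", "/users", [["accept", "*/*"]], ""]
default_key = default_fp(request_fields)
alt_key = alt_fp(request_fields)
```

A caller would pass the chosen function into whatever component performs matching, so the default SHA-256 behaviour stays unchanged unless a different strategy is supplied.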