ADR 0002: Use SHA-256 for Request Fingerprinting

Status

Accepted

Date

2026-01-05

Context

The replay system needs to match incoming requests against recorded interactions. This requires a stable, deterministic fingerprint that identifies a request by its protocol, action, target, headers, and body content. The fingerprint must be identical across processes and Python sessions, which rules out Python's built-in hash(): it is salted per process for strings and bytes.

Decision

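Use SHA-256 hashing with canonical JSON serialization for request fingerprinting. The canonical representation is generated by serializing the list of request fields [protocol, action, target, headers, body_hex] using json.dumps with separators=(",", ":") and sort_keys=True. The fixed separators make the output deterministic, and JSON quoting and escaping prevent delimiter collisions between fields. Header ordering is preserved (no normalization). Note that sort_keys=True reorders only the keys of JSON objects (Python dicts), so headers keep their order only when they are serialized as an ordered sequence rather than as a dict.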
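The decision above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the function name fingerprint, its signature, and the representation of headers as a sequence of (name, value) pairs are all assumptions made for the example.

```python
import hashlib
import json


def fingerprint(protocol, action, target, headers, body):
    """Return a SHA-256 hex digest over a canonical JSON list of request fields.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    # Assumption: headers arrive as an ordered sequence of (name, value) pairs.
    # Serializing them as a list of lists keeps their order; sort_keys only
    # reorders dict keys, so it does not touch this structure.
    fields = [
        protocol,
        action,
        target,
        [[name, value] for name, value in headers],
        body.hex(),  # hex-encode so arbitrary bytes are JSON-safe
    ]
    canonical = json.dumps(fields, separators=(",", ":"), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


fp = fingerprint(
    "http",
    "GET",
    "https://example.test/items",
    [("Accept", "application/json")],
    b"",
)
```

Under these assumptions, swapping the order of two headers yields a different digest, which is consistent with the "no normalization" choice.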

Rationale

  • Stability: Hash-based fingerprints are deterministic and consistent across processes
  • Collision resistance: SHA-256 makes accidental digest collisions negligible, and structure-preserving JSON serialization keeps field boundaries unambiguous (e.g., when a field contains a character that a naive join would use as the delimiter)
  • Standard library: Available in Python's hashlib and json without external dependencies
  • Canonical form: Fixed JSON separators and a fixed field order ensure identical requests serialize to identical bytes, and therefore identical fingerprints, in any environment, without normalizing the request data itself
  • Binary-safe: Hex-encoded body in JSON handles arbitrary binary data safely
  • Compact: Fixed-length 64-character hex digest is memory-efficient
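To illustrate the delimiter-collision point, the sketch below compares a naive delimiter join with JSON serialization. The "|" delimiter and the field values are made up for the example and do not come from the project.

```python
import json

# Two different requests whose fields happen to contain the "|" character.
req_a = ["http", "GET|/a", "b"]
req_b = ["http", "GET", "/a|b"]

# A naive join loses the field boundaries: both become "http|GET|/a|b".
naive_a = "|".join(req_a)
naive_b = "|".join(req_b)

# JSON quotes each field, so the two requests stay distinguishable.
json_a = json.dumps(req_a, separators=(",", ":"))
json_b = json.dumps(req_b, separators=(",", ":"))

print(naive_a == naive_b)  # True: the naive encodings collide
print(json_a == json_b)    # False: the JSON encodings differ
```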

Implications

Positive Implications

  • Fingerprints are stable across application restarts and different machines
  • O(1) lookup using fingerprints as dictionary keys
  • No false positives from hash collisions in realistic scenarios
  • Matching is exact: header order is part of the fingerprint (no normalization), so requests that differ only in header order do not match
  • Works with any protocol (protocol-agnostic design)
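The O(1) lookup mentioned above could look something like this sketch. The recorded interactions, the response placeholders, and the compact fingerprint helper are hypothetical and stand in for whatever the replay system actually stores.

```python
import hashlib
import json


def fingerprint(fields):
    """Hypothetical helper: hash an already-assembled list of request fields."""
    canonical = json.dumps(fields, separators=(",", ":"), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Recorded interactions as (request fields, response) pairs; values are made up.
recorded = [
    (["http", "GET", "/users", [], ""], "users-response"),
    (
        ["http", "POST", "/users", [["Content-Type", "text/plain"]], b"hi".hex()],
        "created-response",
    ),
]

# Build the index once; each later lookup is a single dict access.
index = {fingerprint(fields): response for fields, response in recorded}

incoming = ["http", "POST", "/users", [["Content-Type", "text/plain"]], "6869"]
match = index.get(fingerprint(incoming))  # None when nothing was recorded
```

The dictionary access is constant time on average; computing the fingerprint itself still costs time proportional to the size of the request.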

Concerns

  • Hash computation cost grows linearly with request size, since the whole body is serialized and hashed (mitigation: acceptable for typical request sizes)
  • Changing canonical format breaks compatibility with existing cassettes (mitigation: version 0 is in-memory only)
  • Cryptographic hashing may be overkill for this use case (mitigation: no measurable performance impact)

Alternatives

Direct Request Comparison (__eq__)

Using direct object equality comparison to find matching requests.

  • Pros: Simple implementation, no hashing overhead
  • Cons: Lookup requires a linear scan, O(n) in the number of recorded interactions in the cassette; requests cannot serve as dictionary keys without a stable hash
  • Reason for rejection: Poor performance for large cassettes; O(n) scan per lookup vs O(1) with a hash-based index

MD5 Hashing

Using MD5 hash algorithm for fingerprinting.

  • Pros: Faster than SHA-256, sufficient for non-cryptographic use
  • Cons: Considered cryptographically broken, community perception of MD5 weakness could undermine trust
  • Reason for rejection: Reputational risk outweighs marginal performance gains

Non-cryptographic Hashes (xxHash, MurmurHash)

Using fast non-cryptographic hash algorithms like xxHash or MurmurHash.

  • Pros: Faster than SHA-256, designed for hash table use
  • Cons: Requires external dependency, not in standard library, adds maintenance burden
  • Reason for rejection: Performance difference is negligible for this use case; standard library preference

Tuple-based Keys

Using tuples of request fields directly as dictionary keys.

  • Pros: No serialization or digest computation step, direct comparison of field values
  • Cons: Large memory footprint for storing full request data as keys
  • Reason for rejection: Memory inefficient; stable hashing provides compact keys

Future Direction

This decision should be revisited if:

  • Performance profiling shows hashing is a bottleneck (consider faster non-cryptographic hashes)
  • Large request bodies cause excessive memory pressure (consider streaming hash calculation without full in-memory loading)
  • Cassette persistence is added and backward compatibility becomes critical (use versioning scheme)
  • Protocol-specific fingerprinting is needed (extend with pluggable fingerprint strategies)
  • Matching rules need to be customized (make fingerprinting + matching strategy injectable, including hash choice)
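One way the injectable strategy from the last bullet might be shaped is sketched below. Everything here is hypothetical: the Replayer class, make_strategy, and the record/replay method names are invented for illustration and are not part of the current design.

```python
import hashlib
import json


def make_strategy(hash_name="sha256"):
    """Build a fingerprint function for any algorithm hashlib supports."""

    def strategy(fields):
        canonical = json.dumps(list(fields), separators=(",", ":"), sort_keys=True)
        return hashlib.new(hash_name, canonical.encode("utf-8")).hexdigest()

    return strategy


class Replayer:
    """Hypothetical replay store that takes its fingerprint strategy as a parameter."""

    def __init__(self, strategy=None):
        # Default to SHA-256, matching the current decision.
        self._fingerprint = strategy or make_strategy()
        self._index = {}

    def record(self, fields, response):
        self._index[self._fingerprint(fields)] = response

    def replay(self, fields):
        return self._index.get(self._fingerprint(fields))


default_replayer = Replayer()
default_replayer.record(["http", "GET", "/ping", [], ""], "pong")

# Swapping the hash only requires passing a different strategy.
blake_replayer = Replayer(make_strategy("blake2b"))
blake_replayer.record(["http", "GET", "/ping", [], ""], "pong")
```

A design like this would keep the canonical format and the hash choice in one place, so a later change to either stays local to the strategy.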

References