Architecture Decision Record (ADR)

Title

Track and Report Full Reference Chain in Circular Reference Errors

Status

Accepted

Date

2025-10-13

Context

Following ADR 0003 (Structured Exception Design), CircularReferenceError was designed to include structured data about the error. The initial implementation included only the variable name that caused the circular reference:

class CircularReferenceError(EnvResolveError):
    def __init__(self, variable_name: str):
        self.variable_name = variable_name
        super().__init__(f"Circular reference detected: {variable_name}")

However, when debugging circular references in complex configurations, users need to see the full reference chain that led to the cycle, not just the variable where the cycle was detected.

Example scenario:

A=${B}
B=${C}
C=${D}
D=${A}  # Cycle here

Current error: "Circular reference detected: A"

  • User doesn't know which variables are involved in the cycle
  • Hard to trace back through the reference chain
  • Requires manual inspection of all variables to find the loop

Desired error: "Circular reference detected: A -> B -> C -> D -> A"

  • Clear visualization of the complete cycle
  • Easy to identify all variables involved
  • Immediate understanding of the problem

Decision

Extend CircularReferenceError to track and report the full reference chain that forms the cycle:

class CircularReferenceError(EnvResolveError):
    def __init__(self, variable_name: str, chain: list[str] | None = None):
        self.variable_name = variable_name
        self.chain = chain or []
        chain_str = " -> ".join(self.chain) if self.chain else variable_name
        msg = f"Circular reference detected: {chain_str}"
        super().__init__(msg)
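
As a quick illustration of the resulting message format, the sketch below constructs the exception with and without a chain. `EnvResolveError` is stubbed here as a bare `Exception` subclass so the snippet runs on its own; that stub is an assumption, and the project's real base class may carry more.

```python
from __future__ import annotations


class EnvResolveError(Exception):
    """Stand-in for the project's base exception (assumption for this sketch)."""


class CircularReferenceError(EnvResolveError):
    def __init__(self, variable_name: str, chain: list[str] | None = None):
        self.variable_name = variable_name
        self.chain = chain or []
        # Fall back to the bare variable name when no chain is supplied.
        chain_str = " -> ".join(self.chain) if self.chain else variable_name
        super().__init__(f"Circular reference detected: {chain_str}")


with_chain = CircularReferenceError("A", ["A", "B", "C", "D", "A"])
without_chain = CircularReferenceError("A")

print(with_chain)     # Circular reference detected: A -> B -> C -> D -> A
print(without_chain)  # Circular reference detected: A
```

Because `chain` is optional, existing call sites that pass only `variable_name` keep working and produce the original message.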

Implementation approach:

  • Maintain a stack: list[str] during recursive expansion
  • When a variable already in the stack is encountered, extract the cycle portion
  • Pass the cycle chain to CircularReferenceError constructor
  • Format chain as "A -> B -> C -> A" in error message

Algorithm:

def _resolve(var_name: str, env: dict[str, str], stack: list[str]) -> str:
    if var_name in stack:
        # Found cycle - extract the cycle portion
        cycle_start = stack.index(var_name)
        cycle = [*stack[cycle_start:], var_name]
        raise CircularReferenceError(var_name, cycle)

    stack.append(var_name)
    try:
        return _expand_text(env[var_name], env, stack)
    finally:
        stack.pop()

Rationale

Why track full chain?

  • Debugging efficiency: Users immediately see the problem without manual tracing
  • Error clarity: Complex cycles (A -> B -> C -> D -> A) are instantly visible
  • Actionable information: Users know exactly which variables to fix

Why format as "A -> B -> A"?

  • Visual clarity: Arrow notation is intuitive and commonly used
  • Cycle visibility: Showing start and end makes the loop obvious
  • Familiarity: Matches stack trace and dependency chain conventions

Why list of strings over single string?

  • Programmatic access: Callers can analyze the chain (len(exc.chain) - 1 gives the number of variables in the cycle, since the chain repeats the starting variable at the end)
  • Testing: Can assert specific cycles in tests
  • Future flexibility: Can format chain differently (JSON, graph, etc.)
  • Consistency: Follows ADR 0003's principle of structured data over formatted strings
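
To make the programmatic-access point concrete, here is a small sketch of a caller inspecting the chain and rendering it as JSON. The `describe_cycle` helper and the simplified exception class are illustrative assumptions, not part of the library.

```python
from __future__ import annotations

import json


class CircularReferenceError(Exception):
    """Simplified stand-in for the exception defined in the Decision section."""

    def __init__(self, variable_name: str, chain: list[str] | None = None):
        self.variable_name = variable_name
        self.chain = chain or []
        chain_str = " -> ".join(self.chain) if self.chain else variable_name
        super().__init__(f"Circular reference detected: {chain_str}")


def describe_cycle(exc: CircularReferenceError) -> dict:
    # Hypothetical helper: the chain repeats its first variable at the end,
    # so drop the last element to get the distinct members of the cycle.
    members = exc.chain[:-1] if len(exc.chain) > 1 else list(exc.chain)
    return {"variables": members, "length": len(members)}


exc = CircularReferenceError("A", ["A", "B", "C", "D", "A"])
info = describe_cycle(exc)
print(json.dumps(info))  # {"variables": ["A", "B", "C", "D"], "length": 4}
```

A test can assert on `exc.chain` in the same way, which stays stable even if the message wording changes later.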

Implications

Positive Implications

  • Better user experience: Errors are immediately actionable
  • Reduced debugging time: No need to manually trace through variable definitions
  • Professional error messages: Clear, informative, helpful
  • Testing improvement: Can verify exact cycle detection logic
  • Programmatic error handling: Tools can analyze circular dependencies automatically

Concerns

  • Memory overhead: Storing chain list for each error
      • Mitigation: Chains are typically 2-10 variables; minimal memory impact
      • Errors are exceptional path, not hot path
  • Stack management complexity: Need to pass and maintain stack through recursion
      • Mitigation: Stack is implementation detail, not exposed in public API
      • Clear with try/finally pattern
  • Chain extraction logic: Must correctly identify cycle portion
      • Mitigation: Simple slice operation stack[cycle_start:]
      • Well-tested in unit tests

Alternatives

Variable Name Only (Original Design)

Keep only variable_name without chain:

class CircularReferenceError(EnvResolveError):
    def __init__(self, variable_name: str):
        self.variable_name = variable_name
        super().__init__(f"Circular reference detected: {variable_name}")

  • Pros:
      • Simplest implementation
      • Minimal memory usage
      • No stack tracking needed
  • Cons:
      • Poor debugging experience
      • User must manually trace references
      • Hard to identify long cycles
  • Rejection reason: Sacrifices usability for minimal complexity reduction

Full Stack Trace in Error Message

Include full Python stack trace showing function calls:

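import traceback
# format_stack() returns a list of strings, so join it before interpolating
stack_text = "".join(traceback.format_stack())
msg = f"Circular reference: {variable_name}\n{stack_text}"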

  • Pros:
      • Shows complete execution context
      • Includes line numbers and file names
  • Cons:
      • Cluttered with implementation details (internal function names)
      • Confuses users with irrelevant information
      • Chain is buried in noise
  • Rejection reason: Too much information, not user-focused

Lazy Chain Computation

Don't track chain during expansion; recompute if error occurs:

def find_cycle(var_name: str, env: dict[str, str]) -> list[str]:
    # Re-traverse to find cycle
    visited = []
    current = var_name
    while current not in visited:
        visited.append(current)
        current = extract_next_var(env[current])
    return visited[visited.index(current):]

  • Pros:
      • No overhead during normal execution
      • Chain only computed when error occurs
  • Cons:
      • Complex re-traversal logic
      • May not find exact same cycle (if nested expansion)
      • Requires parsing variable references again
  • Rejection reason: Complexity outweighs benefits; expansion already maintains stack

Set-Based Cycle Detection Only

Use a set for fast lookup, don't track order:

visited = set()
if var_name in visited:
    raise CircularReferenceError(var_name, list(visited))

  • Pros:
      • Fast O(1) lookup
      • Simple implementation
  • Cons:
      • Set is unordered; can't show reference chain in correct order
      • Cycle path is lost (which variables led to which)
      • Error message is confusing: "Circular reference in {C, A, B, D}" (no order)
  • Rejection reason: Order is critical for understanding the problem

Future Direction

  • Cycle visualization: For complex cycles, consider:
      • ASCII art diagram showing the cycle
      • Graphviz DOT format for automated visualization
      • Suggestion of which variable to change

  • Cycle length limits: If cycles exceed N variables, truncate display:
    "Circular reference: A -> B -> ... -> Y -> Z -> A (50 variables in cycle)"

  • Interactive debugging: If running in interactive environment:
      • Highlight cycle variables in configuration file
      • Suggest breaking the cycle with environment override

  • Multiple cycle detection: Currently stops at first cycle found:
      • Consider detecting all cycles in a configuration
      • Report all cycles together for comprehensive fix

  • Performance monitoring: Track cycle detection overhead:
      • If stack management becomes bottleneck, optimize
      • Consider specialized data structure for large configurations
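
The cycle length limit mentioned above could be sketched roughly as follows. The `format_chain` function, its `max_shown` parameter, and the choice to keep two leading and three trailing entries are assumptions made to reproduce the example message; none of this exists in the current implementation.

```python
from __future__ import annotations


def format_chain(chain: list[str], max_shown: int = 6) -> str:
    """Join a cycle chain with arrows, eliding the middle of long chains.

    Assumes the chain repeats its first variable at the end, and that
    max_shown is at least 5 so the elided form is never longer than the
    full one.
    """
    if len(chain) <= max_shown:
        return " -> ".join(chain)

    head = " -> ".join(chain[:2])    # first two variables
    tail = " -> ".join(chain[-3:])   # last two variables plus the repeated start
    cycle_size = len(chain) - 1      # the start is counted once, not twice
    return f"{head} -> ... -> {tail} ({cycle_size} variables in cycle)"


long_chain = [f"V{i}" for i in range(50)] + ["V0"]
print(format_chain(["A", "B", "A"]))  # A -> B -> A
print(format_chain(long_chain))
# V0 -> V1 -> ... -> V48 -> V49 -> V0 (50 variables in cycle)
```

Keeping the formatter separate from the exception leaves `exc.chain` untruncated, so tools that analyze the cycle still see every variable.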

References

  • ADR 0003: Structured Exception Design (establishes pattern of structured data in exceptions)
  • Implementation: src/envresolve/exceptions.py::CircularReferenceError
  • Implementation: src/envresolve/services/expansion.py::_resolve
  • Test cases: tests/unit/test_expansion.py::test_circular_reference_raises_error
  • Graph cycle detection algorithms: https://en.wikipedia.org/wiki/Cycle_detection
  • Error message best practices: https://developers.google.com/tech-writing/error-messages