ExpressionProxy
Overview
ExpressionProxy is a Catalyst expression that acts as a caching proxy around another expression. It intercepts evaluation requests and delegates them to a SubExprEvaluationRuntime, which returns cached results so that common subexpressions are not recomputed during interpreted expression evaluation.
Syntax
This is an internal Catalyst expression used by the Spark SQL engine. It is not directly accessible through SQL syntax or the DataFrame API. Spark creates it automatically when it applies subexpression elimination to interpreted expression evaluation.
Arguments
| Argument | Type | Description |
|---|---|---|
| child | Expression | The wrapped expression that this proxy represents |
| id | Int | Unique identifier for this proxy instance used for caching and equality |
| runtime | SubExprEvaluationRuntime | Runtime cache manager that handles evaluation and caching logic |
Return Type
Returns the same data type as the wrapped child expression (`child.dataType`).
Supported Data Types
Supports all data types that the wrapped child expression supports, as it acts as a transparent proxy without type restrictions.
Algorithm
- When `eval()` is called, delegates to `runtime.getEval(this)` instead of directly evaluating the child
- The runtime checks its cache first before falling back to actual expression evaluation
- If a cache miss occurs, calls `proxyEval()`, which directly evaluates the child expression
- Uses the unique `id` for cache key management and proxy instance equality
- Maintains the same nullability and data type characteristics as the wrapped expression
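The delegation steps above can be sketched with a toy model. This is a minimal illustration, not Spark's implementation: `Expr`, `CountingAdd`, `ToyRuntime`, `ToyProxy`, and `ProxyDemo` are hypothetical stand-ins, rows are modeled as plain maps, and the assumption that the cache is cleared whenever a new input row is set is part of the sketch.

```scala
import scala.collection.mutable

// Simplified stand-in for a Catalyst expression (hypothetical, not a Spark class).
trait Expr { def eval(row: Map[String, Int]): Int }

// A child expression that counts its evaluations so cache hits are observable.
final class CountingAdd(a: String, b: String) extends Expr {
  var evaluations = 0
  def eval(row: Map[String, Int]): Int = { evaluations += 1; row(a) + row(b) }
}

// Toy runtime: holds the current row and at most one cached result per proxy id.
final class ToyRuntime {
  private val cache = mutable.Map.empty[Int, Int]
  private var currentRow: Map[String, Int] = Map.empty

  // Assumption of this sketch: switching to a new row discards all cached results.
  def setInput(row: Map[String, Int]): Unit = { currentRow = row; cache.clear() }

  // Look in the cache first; on a miss, evaluate the child via proxyEval and store the result.
  def getEval(proxy: ToyProxy): Int =
    cache.getOrElseUpdate(proxy.id, proxy.proxyEval(currentRow))
}

// Toy proxy: eval ignores its own argument and asks the runtime instead.
final class ToyProxy(val child: Expr, val id: Int, runtime: ToyRuntime) extends Expr {
  def proxyEval(row: Map[String, Int]): Int = child.eval(row)
  def eval(row: Map[String, Int]): Int = runtime.getEval(this)
}

object ProxyDemo {
  // Evaluates the proxy twice on one row, then once on a second row.
  // Returns the three results and the number of real child evaluations.
  def run(): (Int, Int, Int, Int) = {
    val runtime = new ToyRuntime
    val add = new CountingAdd("x", "y")
    val proxy = new ToyProxy(add, 0, runtime)

    runtime.setInput(Map("x" -> 1, "y" -> 2))
    val first = proxy.eval(Map.empty)  // cache miss: child is evaluated
    val second = proxy.eval(Map.empty) // cache hit: child is not evaluated again

    runtime.setInput(Map("x" -> 10, "y" -> 20))
    val third = proxy.eval(Map.empty)  // new row: child is evaluated again

    (first, second, third, add.evaluations)
  }
}
```

In this sketch the child runs twice for three `eval` calls: the second call on the first row is served from the cache.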
Partitioning Behavior
Does not affect partitioning behavior as it operates at the expression evaluation level within individual partitions:
- Preserves existing partitioning schemes
- No shuffle operations required
- Operates independently on each partition's data
Edge Cases
- Null handling behavior matches the wrapped child expression exactly
- Throws `QueryExecutionErrors.cannotGenerateCodeForExpressionError` if code generation is attempted
- Equality comparison only considers the `id` field, ignoring child expression differences
- Hash code computation uses only the `id` to ensure consistent caching behavior
- Falls back to direct child evaluation through `proxyEval()` on a cache miss
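The id-only equality and hashing described above can be illustrated with a small stand-in class. `IdProxy` and `IdEqualityDemo` are hypothetical names used only for this sketch, and the child is modeled as a plain string rather than an expression tree.

```scala
// Hypothetical stand-in for a proxy: only `id` participates in equals and hashCode.
final class IdProxy(val child: String, val id: Int) {
  override def equals(obj: Any): Boolean = obj match {
    case other: IdProxy => this.id == other.id
    case _ => false
  }
  override def hashCode(): Int = id.hashCode()
}

object IdEqualityDemo {
  val a = new IdProxy("a + b", 1)
  val b = new IdProxy("c * d", 1) // different child, same id
  val c = new IdProxy("a + b", 2) // same child, different id

  val sameIdEqual: Boolean = a == b            // equal: ids match
  val sameChildDifferentId: Boolean = a == c   // not equal: ids differ
  val distinctKeys: Int = Set(a, b, c).size    // a and b collapse into one key
}
```

Because hashing and equality agree on `id`, a proxy used as a cache key costs the same to look up regardless of how large its wrapped expression tree is.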
Code Generation
Does not support Tungsten code generation: any attempt to generate code for this expression throws `QueryExecutionErrors.cannotGenerateCodeForExpressionError`. The expression is designed exclusively for interpreted expression evaluation.
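As a rough illustration of an interpreted-only expression, the sketch below uses hypothetical types (`ToyExpr`, `Lit`, `InterpretedOnlyProxy`) and a plain `UnsupportedOperationException` in place of Spark's error helper. It shows the shape of the behavior only: evaluation works, while the code generation hook always fails.

```scala
import scala.util.Try

// Hypothetical expression interface with an interpreted path and a codegen path.
trait ToyExpr {
  def eval(): Int
  def genCode(): String
}

// A literal supports both paths.
final class Lit(value: Int) extends ToyExpr {
  def eval(): Int = value
  def genCode(): String = value.toString
}

// The proxy supports interpreted evaluation only; its codegen hook always throws.
final class InterpretedOnlyProxy(child: ToyExpr) extends ToyExpr {
  def eval(): Int = child.eval()
  def genCode(): String =
    throw new UnsupportedOperationException("cannot generate code for a proxy expression")
}

object CodegenDemo {
  val proxy = new InterpretedOnlyProxy(new Lit(7))
  val interpreted: Int = proxy.eval()                   // delegates to the child
  val codegenFails: Boolean = Try(proxy.genCode()).isFailure
}
```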
Examples
-- Not directly accessible through SQL
-- Automatically used by Spark internally for subexpression elimination
// Internal usage only - not accessible through DataFrame API
// Created automatically by Spark when subexpression elimination is applied to interpreted evaluation
See Also
- SubExprEvaluationRuntime - The runtime cache manager used by this proxy
- UnaryExpression - The base class that ExpressionProxy extends
- Expression evaluation and subexpression elimination in Catalyst optimizer