OuterReference
Overview
OuterReference is a placeholder expression that holds a reference to a field resolved outside of the current execution plan. It is specifically designed for correlated subqueries where inner queries need to reference columns from outer queries during query execution.
Syntax
OuterReference is an internal Catalyst expression and is not directly exposed in SQL syntax. It is generated automatically by the Catalyst analyzer when it resolves correlated subqueries.
Arguments
| Argument | Type | Description |
|---|---|---|
| e | NamedExpression | The named expression that references a field outside the current plan |
Return Type
Returns the same data type as the wrapped NamedExpression (e.dataType).
Supported Data Types
Supports all data types since it acts as a transparent wrapper around any NamedExpression, inheriting the data type of the wrapped expression.
Algorithm
- Acts as a transparent proxy to the wrapped NamedExpression
- Delegates all data type and nullability properties to the underlying expression
- Maintains the same expression ID and qualifier as the original expression
- Provides special SQL representation with "outer" prefix for debugging
- Marked as Unevaluable, meaning it cannot be directly evaluated and must be rewritten away before execution, typically when the correlated subquery is decorrelated
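The delegation described in this list can be sketched by wrapping an attribute by hand. This is a minimal illustration, not how Catalyst itself builds the expression: it assumes the Spark Catalyst classes are on the classpath (for example in spark-shell), and the column name `id`, the qualifier `t1`, and the integer type are hypothetical values chosen for the example.

```scala
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, OuterReference}
import org.apache.spark.sql.types.IntegerType

// Hypothetical outer column t1.id (name, type, and qualifier are illustrative only)
val outerAttr = AttributeReference("id", IntegerType, nullable = false)(qualifier = Seq("t1"))

// Wrap it the way a reference to an outer query column is represented
val ref = OuterReference(outerAttr)

// Each property is forwarded from the wrapped expression
println(ref.dataType == outerAttr.dataType)   // data type is inherited
println(ref.nullable == outerAttr.nullable)   // nullability is inherited
println(ref.exprId == outerAttr.exprId)       // same expression ID
println(ref.qualifier == outerAttr.qualifier) // same qualifier
println(ref.sql)                              // SQL text carries the "outer" prefix
```

Constructing the wrapper directly is only useful for inspection or tests; in normal use it appears as a side effect of resolving a correlated subquery.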
Partitioning Behavior
Since OuterReference is marked as Unevaluable:
- Does not directly affect partitioning, because it cannot be evaluated
- Must be rewritten away (replaced by the attribute it wraps) before any partitioning operation takes place
- The partitioning behavior of the rewritten expression determines the final impact
Edge Cases
- Null handling: Inherits nullability from the wrapped expression (e.nullable)
- Cannot be directly evaluated due to the Unevaluable trait
- SQL representation can be overridden using the SINGLE_PASS_SQL_STRING_OVERRIDE tag
- Is introduced during the analysis phase and must be rewritten away before execution
Code Generation
Does not support code generation since it extends Unevaluable. OuterReference expressions must be replaced by the attributes they wrap, typically when the correlated subquery is rewritten into a join, before the code generation phase.
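The effect of Unevaluable can be seen by trying to evaluate the expression directly. The sketch below assumes the Spark Catalyst classes are on the classpath; the column name `amount` is hypothetical. The attempt is wrapped in `Try` because the exact exception class differs between Spark versions.

```scala
import scala.util.Try
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, OuterReference}
import org.apache.spark.sql.types.LongType

// Hypothetical outer column wrapped in an OuterReference
val pending = OuterReference(AttributeReference("amount", LongType)())

// Interpreted evaluation is rejected because the expression is Unevaluable
val attempt = Try(pending.eval(InternalRow.empty))
println(attempt.isFailure)
```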
Examples
-- OuterReference is generated internally for correlated subqueries like:
SELECT * FROM table1 t1
WHERE EXISTS (SELECT 1 FROM table2 t2 WHERE t2.id = t1.id)
-- The reference to t1.id inside the subquery becomes an OuterReference
// OuterReference is created internally by Catalyst and is not constructed through the DataFrame API.
// Pseudocode only (exists and outer are not real functions here); conceptually the query above is:
//   df1.filter(exists(df2.filter(outer(df1("id")) === df2("id"))))
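One way to observe the generated expression is to run the correlated query through `spark.sql` and walk the analyzed plan. The following is a sketch under stated assumptions: a local SparkSession, two throwaway temp views named after the tables in the SQL example, and made-up integer ids. The plan traversal is just one possible way to find the nodes, not an official inspection API.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.{OuterReference, SubqueryExpression}

val spark = SparkSession.builder().master("local[1]").appName("outer-ref-demo").getOrCreate()
import spark.implicits._

// Illustrative data registered under the table names used in the SQL example
Seq(1, 2, 3).toDF("id").createOrReplaceTempView("table1")
Seq(2, 3, 4).toDF("id").createOrReplaceTempView("table2")

val df = spark.sql(
  """SELECT * FROM table1 t1
    |WHERE EXISTS (SELECT 1 FROM table2 t2 WHERE t2.id = t1.id)""".stripMargin)

// Find every subquery plan referenced from the analyzed outer plan
val subqueryPlans = df.queryExecution.analyzed.flatMap { node =>
  node.expressions.flatMap(_.collect { case s: SubqueryExpression => s.plan })
}

// Collect the OuterReference nodes inside those subquery plans
val outerRefs = subqueryPlans.flatMap { plan =>
  plan.flatMap(_.expressions.flatMap(_.collect { case o: OuterReference => o }))
}
outerRefs.foreach(o => println(o.sql))

// After optimization the subquery is expected to be rewritten into a join,
// so the "outer(...)" marker should no longer appear in the plan text
val stillPresent = df.queryExecution.optimizedPlan.toString.contains("outer(")
println(stillPresent)
```

Comparing `analyzed` with `optimizedPlan` in this way shows where in the pipeline the placeholder exists and where it has already been replaced.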
See Also
- NamedExpression - the base trait for expressions with names
- LeafExpression - expressions with no child expressions
- Unevaluable - trait for expressions that cannot be directly evaluated
- Attribute - abstract class for named expressions that refer to a column (AttributeReference is the common concrete implementation)