BoundReference¶

Overview¶

BoundReference is a leaf expression that represents a bound reference to a specific slot (column position) in an input tuple. It provides efficient access to column values by directly referencing their ordinal position after all column transformations like pruning have been completed.

Syntax¶

BoundReference is an internal Catalyst expression not directly accessible through SQL syntax. It's created during query planning and optimization phases.

BoundReference(ordinal: Int, dataType: DataType, nullable: Boolean)

Arguments¶

Argument	Type	Description
ordinal	Int	The zero-based column position in the input tuple
dataType	DataType	The expected data type of the referenced column
nullable	Boolean	Whether the referenced column can contain null values

Return Type¶

Returns the data type specified in the dataType parameter. The actual return type matches the column's data type at the specified ordinal position.

Supported Data Types¶

Supports all Spark SQL data types including:

Primitive types (Boolean, Byte, Short, Int, Long, Float, Double)
String and binary types
Date and timestamp types
Complex types (Array, Map, Struct)
Decimal types
User-defined types

Algorithm¶

Uses InternalRow.getAccessor() to create a type-specific accessor function for efficient value retrieval
Direct ordinal-based lookup in the input row without name resolution
Leverages specialized getters for primitive types when using UnsafeRow format
Assumes the input tuple layout is finalized (no further column pruning will occur)
Caches the accessor function to avoid repeated creation overhead

Partitioning Behavior¶

BoundReference is partition-preserving:

Does not require shuffle operations
Maintains existing data partitioning
Operates locally on each partition independently

Edge Cases¶

Null handling: Respects the nullable flag and properly propagates null values using the type-specific accessor
Invalid ordinal: May throw IndexOutOfBoundsException if ordinal exceeds tuple width
Type mismatch: Assumes the actual column type matches the specified dataType parameter
Code generation fallback: Falls back to interpreted evaluation if code generation context is not available

Code Generation¶

Fully supports Tungsten code generation with two execution paths:

With currentVars: Reuses existing generated code variables when available
Without currentVars: Generates direct row access code using CodeGenerator.getValue()
Generates optimized null-checking code only when the column is nullable
Uses Java primitive types for better performance when possible

Examples¶

-- BoundReference is not directly accessible in SQL
-- It's created internally during query planning
SELECT column1, column2 FROM table1;

// Internal usage during Catalyst expression binding
val boundRef = BoundReference(0, IntegerType, nullable = true)
// Evaluates to the value at position 0 in the input row
val result = boundRef.eval(inputRow)