LambdaFunction

Overview

LambdaFunction represents a lambda function expression together with its arguments in Spark's Catalyst expression tree. It encapsulates a function body and the named variables bound within it, giving higher-order functions local variable scoping when they apply the lambda. A lambda can also be marked as hidden, a flag higher-order functions use for internal bookkeeping when processing expressions that are independent of the lambda's arguments.

Syntax

-- Used internally within higher-order functions like transform, filter, etc.
transform(array_col, x -> x + 1)
filter(array_col, x -> x > 0)
// Catalyst constructor (internal; not instantiated directly by users)
LambdaFunction(function, arguments, hidden)

Arguments

Argument    Type                   Description
function    Expression             The expression that represents the lambda function body
arguments   Seq[NamedExpression]   The lambda function's parameters (lambda variables)
hidden      Boolean                Whether the lambda function is hidden for internal bookkeeping (default: false)

Return Type

The return type matches the data type of the underlying function expression: LambdaFunction is a thin wrapper, so its dataType (and nullability) are delegated to the wrapped body.
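
A minimal illustration of this delegation, assuming Spark 3.x Catalyst internals (constructor signatures may differ slightly between versions):

import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.types._

// dataType is pure delegation: the wrapper reports its body's type
val x = NamedLambdaVariable("x", LongType, nullable = false)
val body = Multiply(x, Literal(2L))
LambdaFunction(body, Seq(x)).dataType  // LongType, i.e. body.dataType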

Supported Data Types

LambdaFunction itself places no restrictions on data types; since it is only a wrapper, the supported types depend entirely on what the wrapped function expression can handle.

Algorithm

  • Wraps a function expression along with its parameter definitions to create scoped variable binding
  • Evaluates by directly delegating to the underlying function expression's eval method
  • Manages variable scoping by filtering out lambda argument references from the function's reference set
  • Tracks binding state by checking if all lambda arguments are resolved
  • Uses CodegenFallback, meaning it falls back to interpreted evaluation mode
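
A minimal sketch of the evaluation path described above, assuming Spark 3.x Catalyst internals: NamedLambdaVariable is the resolved, typed form of a lambda parameter, and its value slot is what the enclosing higher-order function fills in per element.

import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.types._

val x = NamedLambdaVariable("x", IntegerType, nullable = false)
val lambda = LambdaFunction(Add(x, Literal(1)), Seq(x))

lambda.bound  // true: every lambda argument is resolved

// Evaluation is pure delegation: the higher-order function sets each
// variable's value slot, then calls eval on the lambda
x.value.set(41)
lambda.eval()  // 42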

Partitioning Behavior

LambdaFunction itself does not directly affect partitioning behavior:

  • Partitioning impact depends on the higher-order function that contains the lambda
  • The lambda function wrapper preserves whatever partitioning behavior the underlying function expression has
  • No shuffle is introduced by the LambdaFunction wrapper itself

Edge Cases

  • Null handling behavior is delegated entirely to the underlying function expression (see the sketch after this list)
  • When not yet resolved, reference calculation falls back to the default behavior (lambda arguments are not filtered out)
  • Variable scope isolation: once resolved, lambda arguments are excluded from the expression's external references
  • Hidden lambdas are used for internal processing without exposing lambda semantics to users
  • Binding validation ensures all arguments are resolved before the lambda is considered bound
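
A small illustration of the null-delegation point, again assuming Spark 3.x internals:

import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.types._

val x = NamedLambdaVariable("x", IntegerType, nullable = true)
val lambda = LambdaFunction(Add(x, Literal(1)), Seq(x))

// LambdaFunction adds no null checks of its own: Add yields null for a
// null operand, and the wrapper simply passes that result through
x.value.set(null)
lambda.eval()  // null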

Code Generation

This expression uses CodegenFallback, which means it does not support Tungsten code generation and always falls back to interpreted evaluation mode for safety and simplicity.
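
One observable consequence, assuming Spark 3.x internals: the node can be identified as interpreted-only via the CodegenFallback marker trait, whose generated "code" is just a call back into the expression's own eval.

import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback
import org.apache.spark.sql.types._

val x = NamedLambdaVariable("x", IntegerType, nullable = false)
val lambda = LambdaFunction(Add(x, Literal(1)), Seq(x))
lambda.isInstanceOf[CodegenFallback]  // true: no specialized Java code is emitted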

Examples

-- Lambda functions are used internally in higher-order functions
SELECT transform(array(1, 2, 3), x -> x * 2) as doubled;
SELECT filter(array(1, 2, 3, 4), x -> x % 2 = 0) as evens;
SELECT aggregate(array(1, 2, 3), 0, (acc, x) -> acc + x) as sum;
// Internal representation (not directly used by end users)
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.types._

// Example internal structure for transform(array, x -> x + 1)
val lambdaVar = UnresolvedNamedLambdaVariable(Seq("x"))
val addExpr = Add(lambdaVar, Literal(1))
val lambda = LambdaFunction(addExpr, Seq(lambdaVar), hidden = false)
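
For completeness, a sketch of the user-facing path (Spark 3.x DataFrame API) that builds these nodes under the hood; the session setup and column names here are illustrative:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[1]").appName("lambda-demo").getOrCreate()
import spark.implicits._

// functions.transform wraps the Scala closure x => x + 1 in a LambdaFunction
// inside an ArrayTransform node; users never construct the node directly
val df = Seq(Seq(1, 2, 3)).toDF("arr")
df.select(transform($"arr", x => x + 1).alias("plus_one")).show()
// +---------+
// | plus_one|
// +---------+
// |[2, 3, 4]|
// +---------+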

See Also

  • HigherOrderFunction - Base class for functions that use lambda expressions
  • ArrayTransform - Uses lambda functions to transform array elements
  • ArrayFilter - Uses lambda functions to filter array elements
  • MapFilter - Uses lambda functions to filter map entries
  • NamedExpression - Interface implemented by lambda function arguments