
ApplyFunctionExpression

Overview

ApplyFunctionExpression is a Spark Catalyst expression that wraps a user-defined scalar function implementing the ScalarFunction interface (org.apache.spark.sql.connector.catalog.functions.ScalarFunction). It bridges Spark's internal expression evaluation framework and external scalar function implementations, so that custom functions can be used in SQL queries and DataFrame operations.

Syntax

-- SQL usage (the function name depends on the registered scalar function)
SELECT custom_function(column1, column2) FROM my_table;

// Scala usage (internal Catalyst API)
import org.apache.spark.sql.catalyst.expressions.ApplyFunctionExpression
// Typically constructed internally by Spark when a scalar function is applied

Arguments

Argument | Type | Description
function | ScalarFunction[_] | The scalar function implementation to be executed
children | Seq[Expression] | The input expressions whose evaluated values are passed as arguments to the scalar function

Return Type

The return type is determined by the data type that the ScalarFunction declares through its resultType() method. This can be any valid Spark SQL data type, including primitive types, complex types (arrays, maps, structs), or user-defined types.

Supported Data Types

Supports all Spark SQL data types as input arguments, including:

  • Primitive types (IntegerType, StringType, DoubleType, BooleanType, etc.)
  • Temporal types (TimestampType, DateType)
  • Complex types (ArrayType, MapType, StructType)
  • Binary data (BinaryType)
  • Decimal types (DecimalType)

The specific supported input types depend on the implementation of the wrapped ScalarFunction.
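To make the type declarations concrete, the following is a minimal sketch of a scalar function and the properties the wrapper derives from it. It assumes Spark 3.2 or later on the classpath; `StrLen` and the name `strlen` are hypothetical and exist only for this illustration.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{ApplyFunctionExpression, Literal}
import org.apache.spark.sql.connector.catalog.functions.ScalarFunction
import org.apache.spark.sql.types.{DataType, IntegerType, StringType}

// Hypothetical scalar function: returns the length of a string argument.
object StrLen extends ScalarFunction[Int] {
  // Declared input types; arguments are implicitly cast to these where possible.
  override def inputTypes(): Array[DataType] = Array(StringType)
  // Declared result type; the wrapping expression reports this as its dataType.
  override def resultType(): DataType = IntegerType
  override def name(): String = "strlen"
  // Receives the evaluated arguments packed into a single row.
  override def produceResult(input: InternalRow): Int = input.getString(0).length
}

val strLenExpr = ApplyFunctionExpression(StrLen, Seq(Literal("spark")))
val declaredType: DataType = strLenExpr.dataType // taken from StrLen.resultType()
val length: Any = strLenExpr.eval()              // evaluates the literal, then calls produceResult
```

A function that accepts other input types would simply declare them in `inputTypes()`; the wrapper itself places no restriction beyond what the function declares.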

Algorithm

  • Evaluates each child expression against the input row to obtain the argument values
  • Packs the argument values into an InternalRow; the values stay in Spark's internal representation (for example, UTF8String for StringType) rather than being converted to external Java or Scala types
  • Invokes the ScalarFunction.produceResult() method with that row
  • Returns the result as-is; the function is expected to produce a value that is already in Spark's internal representation for the declared result type
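The evaluation steps above can be sketched roughly as follows. This is a simplified stand-in and not Spark's actual source (the real expression may, for instance, reuse a mutable row between calls). It assumes Spark 3.2 or later on the classpath; `AddInts` and `evalSketch` are hypothetical names introduced for illustration.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{Expression, GenericInternalRow, Literal}
import org.apache.spark.sql.connector.catalog.functions.ScalarFunction
import org.apache.spark.sql.types.{DataType, IntegerType}

// Hypothetical scalar function: adds two integer arguments.
object AddInts extends ScalarFunction[Int] {
  override def inputTypes(): Array[DataType] = Array(IntegerType, IntegerType)
  override def resultType(): DataType = IntegerType
  override def name(): String = "add_ints"
  override def produceResult(input: InternalRow): Int = input.getInt(0) + input.getInt(1)
}

// Simplified stand-in for the expression's evaluation logic (not Spark's actual code).
def evalSketch(function: ScalarFunction[_], children: Seq[Expression], input: InternalRow): Any = {
  // 1. Evaluate every child expression against the current input row.
  val argValues: Array[Any] = children.map(_.eval(input)).toArray
  // 2. Pack the values into a row, exactly as the child expressions produced them.
  val args = new GenericInternalRow(argValues)
  // 3. Hand the row to the function and return whatever it produces.
  function.produceResult(args)
}

val sum = evalSketch(AddInts, Seq(Literal(1), Literal(2)), InternalRow.empty)
```

Because the work is one virtual call per row, the cost of the expression is dominated by whatever `produceResult` does.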

Partitioning Behavior

  • Preserves partitioning: This expression operates on individual rows and does not require data movement between partitions
  • No shuffle required: Evaluation is performed locally on each partition
  • Partition-agnostic: The same scalar function logic is applied consistently across all partitions

Edge Cases

  • Null handling: The expression does not short-circuit on null inputs, so null propagation depends entirely on the ScalarFunction implementation; whether the result is treated as nullable is declared by the function's isResultNullable() method
  • Null arguments: When a child expression evaluates to null, the scalar function receives null in the corresponding argument position
  • Type coercion: Input arguments are implicitly cast to the input types that the ScalarFunction declares via inputTypes()
  • Exception handling: Runtime exceptions thrown by the scalar function will propagate and cause task failure
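As an illustration of explicit null handling, here is a minimal sketch of a function that checks for a null argument before reading it. It assumes Spark 3.2 or later on the classpath; `SafeUpper` and the name `safe_upper` are hypothetical, and `isResultNullable()` is left at its default.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{ApplyFunctionExpression, Literal}
import org.apache.spark.sql.connector.catalog.functions.ScalarFunction
import org.apache.spark.sql.types.{DataType, StringType}
import org.apache.spark.unsafe.types.UTF8String

// Hypothetical scalar function: upper-cases a string and maps null to null.
object SafeUpper extends ScalarFunction[UTF8String] {
  override def inputTypes(): Array[DataType] = Array(StringType)
  override def resultType(): DataType = StringType
  override def name(): String = "safe_upper"
  override def produceResult(input: InternalRow): UTF8String = {
    // Check for null explicitly before reading the argument.
    if (input.isNullAt(0)) null else input.getUTF8String(0).toUpperCase()
  }
}

// A null literal reaches produceResult as a null slot in the argument row.
val nullResult = ApplyFunctionExpression(SafeUpper, Seq(Literal.create(null, StringType))).eval()
// A non-null literal is upper-cased and returned as a UTF8String.
val upperResult = ApplyFunctionExpression(SafeUpper, Seq(Literal("abc"))).eval()
```

Without the `isNullAt` guard, reading a null slot would typically surface as a runtime exception inside the function, which in a running job would fail the task.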

Code Generation

This expression extends CodegenFallback, which means it does not support Tungsten code generation and will always fall back to interpreted evaluation mode. Each row evaluation will involve method calls to the ScalarFunction implementation rather than generated bytecode.

Examples

-- Example with a hypothetical registered scalar function
SELECT my_custom_hash(name, id) FROM users;

-- Using with complex types
SELECT my_array_processor(ARRAY(1, 2, 3)) FROM my_table;

// Example Scala usage (internal Catalyst API)
import org.apache.spark.sql.catalyst.expressions.ApplyFunctionExpression
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.connector.catalog.functions.ScalarFunction

// This would typically be created internally by Spark's analyzer.
// MyCustomScalarFunction is a hypothetical class implementing ScalarFunction.
val scalarFunc: ScalarFunction[_] = new MyCustomScalarFunction()
val inputs = Seq(Literal("input1"), Literal(42))
val applyExpr = ApplyFunctionExpression(scalarFunc, inputs)

See Also

  • ScalarFunction - The interface that defines scalar function implementations
  • UnresolvedFunction - Expression used during parsing before function resolution
  • CallMethodViaReflection - Alternative approach for calling external methods
  • StaticInvoke - Expression for calling static methods with code generation support