Coalesce¶

Overview¶

The Coalesce expression returns the first non-null value from a sequence of expressions, evaluating arguments from left to right. It implements short-circuit evaluation, stopping as soon as a non-null value is found. If all expressions evaluate to null, the result is null.

Syntax¶

COALESCE(expr1, expr2, ..., exprN)

// DataFrame API
coalesce(col("expr1"), col("expr2"), ..., col("exprN"))

Arguments¶

Argument	Type	Description
children	Seq[Expression]	A sequence of expressions to evaluate, requires at least one expression

Return Type¶

Returns the same data type as the input expressions. All input expressions must have the same data type.

Supported Data Types¶

All Spark SQL data types are supported, including:

Numeric types (IntegerType, LongType, DoubleType, DecimalType, etc.)
String types (StringType)
Date and timestamp types (DateType, TimestampType)
Complex types (ArrayType, MapType, StructType)
Binary types (BinaryType)

All input expressions must be of the same type.

Algorithm¶

Expressions are evaluated sequentially from left to right
Evaluation stops immediately when the first non-null value is encountered (short-circuit evaluation)
Only the first expression is always evaluated; subsequent expressions are conditionally evaluated
Returns the first non-null value found, or null if all expressions are null
Uses ComplexTypeMergingExpression to handle type resolution across all children

Partitioning Behavior¶

Coalesce does not affect data partitioning:

Preserves existing partitioning scheme
Does not require data shuffle
Operates as a row-level transformation within each partition

Edge Cases¶

Null handling: Returns null only if all input expressions evaluate to null
Single argument: If only one expression is provided, returns that expression's value
Empty input: Throws QueryCompilationErrors.wrongNumArgsError if no arguments provided
Type mismatch: Throws type check error if input expressions have different data types
Nullable result: Result is nullable only if all input expressions are nullable

Code Generation¶

This expression supports Tungsten code generation with optimized implementations:

Generates efficient Java code using do-while loops with continue statements
Implements short-circuit evaluation in generated code
Uses splitExpressionsWithCurrentInputs for handling large numbers of expressions
Falls back to interpreted eval() method when code generation is not available

Examples¶

-- Basic usage
SELECT COALESCE(null, 'hello', 'world') AS result;
-- Returns: 'hello'

-- With column references
SELECT COALESCE(col1, col2, 'default') FROM table;

-- Numeric example
SELECT COALESCE(null, null, 42, 100) AS result;
-- Returns: 42

// DataFrame API usage
import org.apache.spark.sql.functions.coalesce

df.select(coalesce($"col1", $"col2", lit("default")))

// Multiple columns with fallback
df.withColumn("result", coalesce($"primary", $"secondary", lit(0)))