Greatest¶
Overview¶
The Greatest expression returns the largest value among all provided arguments. For each input row, it compares the values of its argument expressions using the data type's natural ordering and returns the maximum.
Syntax¶
// DataFrame API
import org.apache.spark.sql.functions.{col, greatest}
df.select(greatest(col("col1"), col("col2"), col("col3")))
Arguments¶
| Argument | Type | Description |
|---|---|---|
| children | Seq[Expression] | Variable number of expressions (minimum 2) to compare for finding the maximum value |
Return Type¶
Returns the same data type as the input expressions after type coercion. All input expressions must be coercible to a common type.
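For example, comparing an integer column with a double literal produces a double result. A minimal sketch, assuming a local SparkSession and toy data:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{greatest, lit}
val spark = SparkSession.builder().master("local[*]").appName("greatest-coercion").getOrCreate()
import spark.implicits._
// An IntegerType column compared against a DoubleType literal: the common type is DoubleType
val scores = Seq(1, 7, 3).toDF("score")
val coerced = scores.select(greatest($"score", lit(2.5)).as("g"))
coerced.printSchema()   // g: double
coerced.show()          // 2.5, 7.0, 3.0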
Supported Data Types¶
All data types that support ordering comparison:
- Numeric types (Byte, Short, Integer, Long, Float, Double, Decimal)
- String types
- Date and Timestamp types
- Boolean type
- Binary type (lexicographic ordering)
Algorithm¶
- Performs left-fold operation across all child expressions
- Evaluates each expression against the current input row
- Compares non-null values using the data type's interpreted ordering
- Updates the result when a larger value is found
- Preserves null values according to null handling rules (see the sketch below)
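The steps above can be pictured with a small, self-contained sketch (illustration only; the helper name greatestSketch is invented here and is not Spark's internal code):
// Left fold that skips nulls and keeps the larger value under the supplied ordering
def greatestSketch(values: Seq[Any], ordering: Ordering[Any]): Any =
  values.foldLeft(null: Any) { (best, value) =>
    if (value == null) best                   // null inputs never overwrite the running result
    else if (best == null) value              // first non-null value becomes the result
    else if (ordering.gt(value, best)) value  // larger value found: update the result
    else best
  }
// e.g. greatestSketch(Seq(1, null, 3, null, 2), Ordering.Int.asInstanceOf[Ordering[Any]]) returns 3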
Partitioning Behavior¶
The Greatest expression does not affect partitioning:
- Preserves existing partitioning as it operates row-by-row
- Does not require data shuffle since it's a per-row computation
- Is evaluated within each partition independently, with no cross-partition coordination (see the check below)
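A quick way to see this (toy data and column names assumed, local SparkSession): the projection adds no shuffle and keeps the input's partition count.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, greatest}
val spark = SparkSession.builder().master("local[*]").appName("greatest-partitioning").getOrCreate()
// A 4-partition input with two numeric columns
val df = spark.range(0, 100, 1, 4).selectExpr("id AS col1", "(100 - id) AS col2")
val projected = df.select(greatest(col("col1"), col("col2")).as("g"))
projected.explain()   // plan adds only a Project on top of Range; no Exchange (shuffle) appears
assert(projected.rdd.getNumPartitions == df.rdd.getNumPartitions)   // partition count unchanged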
Edge Cases¶
- Null handling: If all arguments are null, the result is null. If only some arguments are null, the nulls are ignored and the greatest non-null value is returned (illustrated below)
- Single argument: Throws `QueryCompilationErrors.wrongNumArgsError`; the function requires more than one argument
- Type mismatch: Throws a `DataTypeMismatch` error if the input types cannot be coerced to a common type
- Non-orderable types: Validation fails for complex types that don't support ordering (for example maps; arrays and structs are orderable only when their element and field types are)
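These rules can be illustrated with literal arguments (assumed local SparkSession; the single-argument call is shown commented out because it fails analysis):
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{greatest, lit}
val spark = SparkSession.builder().master("local[*]").appName("greatest-nulls").getOrCreate()
// Some arguments null: nulls are ignored and the greatest non-null value wins
spark.range(1).select(greatest(lit(1), lit(null).cast("int"), lit(3)).as("g")).show()   // 3
// All arguments null: the result is null
spark.range(1).select(greatest(lit(null).cast("int"), lit(null).cast("int")).as("g")).show()   // null
// Fewer than two arguments: analysis fails with a wrong-number-of-arguments error
// spark.range(1).select(greatest(lit(1)))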
Code Generation¶
This expression supports Tungsten code generation with optimizations:
- Generates efficient Java code for comparison operations
- Uses `ctx.reassignIfGreater` for optimized comparison logic
- Implements expression splitting to handle large numbers of arguments
- Falls back to interpreted mode only if code generation context limits are exceeded (the generated code can be inspected as shown below)
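To look at the code generated for a concrete plan, Spark's debug helpers can be used (a sketch with toy data; the exact output varies across Spark versions):
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.debug._   // developer/debug helpers
import org.apache.spark.sql.functions.{col, greatest}
val spark = SparkSession.builder().master("local[*]").appName("greatest-codegen").getOrCreate()
val df = spark.range(0, 10).selectExpr("id AS col1", "(10 - id) AS col2")
df.select(greatest(col("col1"), col("col2")).as("g")).debugCodegen()   // prints the generated Java source for the stage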
Examples¶
-- Basic numeric comparison
SELECT greatest(10, 9, 2, 4, 3);
-- Returns: 10
-- String comparison
SELECT greatest('apple', 'banana', 'cherry');
-- Returns: 'cherry'
-- With null values
SELECT greatest(1, NULL, 3, NULL, 2);
-- Returns: 3
-- Date comparison
SELECT greatest(DATE'2023-01-01', DATE'2023-12-31', DATE'2023-06-15');
-- Returns: 2023-12-31
// DataFrame API usage
import org.apache.spark.sql.functions.{greatest, lit}
import spark.implicits._   // enables the $"colName" syntax; requires a SparkSession named spark in scope
// Numeric columns
df.select(greatest($"price1", $"price2", $"price3"))
// Mixed with literals
df.select(greatest($"score", lit(100)))
// String comparison
df.select(greatest($"name1", $"name2", $"name3"))
See Also¶
- Least - Returns the smallest value among arguments
- Max - Aggregate function for finding the maximum within a group
- Coalesce - Returns the first non-null value
- Case/When - Conditional expressions for complex comparisons