Decode¶

Overview¶

The Decode expression implements a SQL CASE-like functionality that compares an input expression against multiple search values and returns the corresponding result value when a match is found. It serves as a runtime replaceable expression that gets transformed into a more optimized internal representation during query planning.

Syntax¶

DECODE(expr, search1, result1 [, search2, result2] ... [, default])

// DataFrame API usage would be through SQL expression or functions
df.selectExpr("DECODE(column_name, 'value1', 'result1', 'value2', 'result2', 'default')")

Arguments¶

Argument	Type	Description
expr	Expression	The input expression to be compared against search values
search	Expression	Search value(s) to match against the input expression
result	Expression	Result value(s) to return when corresponding search value matches
default	Expression (Optional)	Default value to return when no search values match

Return Type¶

The return type is determined by the common type of all result expressions and the optional default value. The expression performs type coercion to find a compatible return type among all possible result values.

Supported Data Types¶

The Decode expression supports all Spark SQL data types for input and comparison:

Primitive types (numeric, string, boolean, binary)
Complex types (array, map, struct)
Temporal types (date, timestamp)
Null types

Algorithm¶

The expression takes a variable number of parameters in groups of (search, result) pairs
During analysis, it gets transformed by Decode.createExpr() into an optimized internal representation
The evaluation compares the input expression against each search value in sequence
Returns the corresponding result value for the first matching search value
If no search values match and a default is provided, returns the default value
If no matches and no default, returns null

Partitioning Behavior¶

The Decode expression preserves partitioning characteristics:

Does not require data shuffling as it operates row-by-row
Maintains existing partitioning scheme since it's a deterministic transformation
Can be pushed down to individual partitions for parallel execution

Edge Cases¶

Null input expression matches only null search values using null-safe equality
Empty parameter list results in compilation error
Odd number of parameters (excluding first expression) uses the last parameter as default
Type mismatches between result expressions trigger type coercion to a common type
If type coercion fails, the expression may throw analysis exceptions

Code Generation¶

As a RuntimeReplaceable expression, Decode itself does not generate code directly. Instead, it gets replaced during analysis phase by Decode.createExpr() with an optimized expression tree (likely a CaseWhen expression) that supports Tungsten code generation for efficient evaluation.

Examples¶

-- Basic decode with default
SELECT DECODE(status, 'A', 'Active', 'I', 'Inactive', 'Unknown') FROM users;

-- Decode with numeric values
SELECT DECODE(grade, 1, 'Poor', 2, 'Fair', 3, 'Good', 4, 'Excellent') FROM reviews;

-- Decode without default (returns null for non-matches)
SELECT DECODE(category, 'TECH', 'Technology', 'BIZ', 'Business') FROM articles;

// Example DataFrame API usage
import org.apache.spark.sql.functions.expr

df.select(expr("DECODE(status_code, 200, 'OK', 404, 'Not Found', 500, 'Error', 'Unknown')"))

// Using with column references
df.selectExpr("DECODE(department, 'ENG', 'Engineering', 'MKT', 'Marketing', 'Other')")