Decode¶
Overview¶
The Decode expression implements a SQL CASE-like functionality that compares an input expression against multiple search values and returns the corresponding result value when a match is found. It serves as a runtime replaceable expression that gets transformed into a more optimized internal representation during query planning.
Syntax¶
// DataFrame API usage would be through SQL expression or functions
df.selectExpr("DECODE(column_name, 'value1', 'result1', 'value2', 'result2', 'default')")
Arguments¶
| Argument | Type | Description |
|---|---|---|
| expr | Expression | The input expression to be compared against search values |
| search | Expression | Search value(s) to match against the input expression |
| result | Expression | Result value(s) to return when corresponding search value matches |
| default | Expression (Optional) | Default value to return when no search values match |
Return Type¶
The return type is determined by the common type of all result expressions and the optional default value. The expression performs type coercion to find a compatible return type among all possible result values.
Supported Data Types¶
The Decode expression supports all Spark SQL data types for input and comparison:
- Primitive types (numeric, string, boolean, binary)
- Complex types (array, map, struct)
- Temporal types (date, timestamp)
- Null types
Algorithm¶
- The expression takes a variable number of parameters in groups of (search, result) pairs
- During analysis, it gets transformed by
Decode.createExpr()into an optimized internal representation - The evaluation compares the input expression against each search value in sequence
- Returns the corresponding result value for the first matching search value
- If no search values match and a default is provided, returns the default value
- If no matches and no default, returns null
Partitioning Behavior¶
The Decode expression preserves partitioning characteristics:
- Does not require data shuffling as it operates row-by-row
- Maintains existing partitioning scheme since it's a deterministic transformation
- Can be pushed down to individual partitions for parallel execution
Edge Cases¶
- Null input expression matches only null search values using null-safe equality
- Empty parameter list results in compilation error
- Odd number of parameters (excluding first expression) uses the last parameter as default
- Type mismatches between result expressions trigger type coercion to a common type
- If type coercion fails, the expression may throw analysis exceptions
Code Generation¶
As a RuntimeReplaceable expression, Decode itself does not generate code directly. Instead, it gets replaced during analysis phase by Decode.createExpr() with an optimized expression tree (likely a CaseWhen expression) that supports Tungsten code generation for efficient evaluation.
Examples¶
-- Basic decode with default
SELECT DECODE(status, 'A', 'Active', 'I', 'Inactive', 'Unknown') FROM users;
-- Decode with numeric values
SELECT DECODE(grade, 1, 'Poor', 2, 'Fair', 3, 'Good', 4, 'Excellent') FROM reviews;
-- Decode without default (returns null for non-matches)
SELECT DECODE(category, 'TECH', 'Technology', 'BIZ', 'Business') FROM articles;
// Example DataFrame API usage
import org.apache.spark.sql.functions.expr
df.select(expr("DECODE(status_code, 200, 'OK', 404, 'Not Found', 500, 'Error', 'Unknown')"))
// Using with column references
df.selectExpr("DECODE(department, 'ENG', 'Engineering', 'MKT', 'Marketing', 'Other')")
See Also¶
CaseWhen- The underlying expression thatDecodetypically gets transformed intoWhen- For building conditional expressions in DataFrame APICoalesce- For handling null values with fallback logic