If
Overview
The If expression evaluates a boolean predicate and returns one of two values based on the result. It implements standard ternary conditional logic: a true predicate returns the first value, and a false or null predicate returns the second value.
Syntax
IF(predicate, trueValue, falseValue)
Arguments
| Argument | Type | Description |
|---|---|---|
| predicate | Boolean | The condition to evaluate |
| trueValue | Any | The value to return if the predicate is true |
| falseValue | Any | The value to return if the predicate is false |
Return Type
Returns the unified data type of the trueValue and falseValue expressions after type merging. The return type is nullable if either the trueValue or the falseValue expression is nullable.
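As an illustration of the type merging described above, the following sketch mixes an INT branch with a BIGINT branch and inspects the resulting schema. It assumes Spark 3.x on the classpath and a local SparkSession; the object name `IfReturnTypeSketch`, the helper `mergedType`, and the alias `v` are made up for this example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DataType, LongType}

object IfReturnTypeSketch {
  // Hypothetical helper: returns the data type Spark assigns to the IF result.
  def mergedType(spark: SparkSession): DataType =
    spark
      .sql("SELECT IF(id % 2 = 0, CAST(1 AS INT), CAST(2 AS BIGINT)) AS v FROM range(4)")
      .schema("v")
      .dataType

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session is acceptable for the demo.
    val spark = SparkSession.builder().master("local[1]").appName("if-return-type").getOrCreate()
    try {
      // INT and BIGINT are widened to one common type during analysis.
      val t = mergedType(spark)
      println(s"merged type: $t, is BIGINT: ${t == LongType}")
    } finally spark.stop()
  }
}
```

Inspecting the schema rather than the data keeps the check independent of which branch is selected for any given row.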
Supported Data Types
- Predicate must be of BooleanType
- True and false value expressions must have compatible types that can be merged through type coercion
- All primitive and complex data types are supported for the value expressions
Algorithm
- The predicate expression is always evaluated first
- If the predicate evaluates to `true`, the trueValue expression is evaluated and returned
- If the predicate evaluates to `false` or `null`, the falseValue expression is evaluated and returned
- Only one of the value expressions is evaluated per row (lazy evaluation)
- Type coercion is applied to ensure both value expressions have the same data type
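The evaluation steps above can be modeled in plain Scala. This is only a sketch of the semantics, not Spark's implementation: `evalIf` is a hypothetical function, `Option[Boolean]` stands in for a nullable predicate, and by-name parameters stand in for lazily evaluated branch expressions.

```scala
object IfEvalSketch {
  // Hypothetical model: None represents a SQL NULL predicate.
  // Branches are by-name parameters, so only the selected one is evaluated.
  def evalIf[A](predicate: Option[Boolean], trueValue: => A, falseValue: => A): A =
    predicate match {
      case Some(true) => trueValue  // predicate is true
      case _          => falseValue // predicate is false or NULL
    }

  def main(args: Array[String]): Unit = {
    // Record which branch bodies actually run.
    var evaluated = List.empty[String]
    val result = evalIf(
      Some(true),
      { evaluated ::= "true branch"; "Adult" },
      { evaluated ::= "false branch"; "Minor" }
    )
    println(result)    // Adult
    println(evaluated) // List(true branch)

    // A NULL predicate selects the false branch.
    println(evalIf(None, "yes", "no")) // no
  }
}
```

The side-effecting branches make the lazy evaluation visible: the false branch never runs when the predicate is true.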
Partitioning Behavior
The If expression preserves partitioning characteristics:
- Does not require shuffle operations
- Maintains existing partitioning schemes
- Does not affect data distribution across partitions
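One rough way to observe this is to compare partition counts before and after adding an IF column. The sketch below assumes Spark 3.x on the classpath and a local SparkSession; the object name, the helper `partitionCounts`, and the column name `parity` are hypothetical. Equal counts are consistent with no extra shuffle being introduced, though they do not prove it on their own.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr}

object IfPartitioningSketch {
  // Hypothetical helper: partition counts before and after adding an IF column.
  def partitionCounts(spark: SparkSession): (Int, Int) = {
    // Explicitly repartition so there is a known layout to preserve.
    val base = spark.range(100).repartition(4, col("id"))
    // IF is a row-level projection applied on top of the existing layout.
    val withIf = base.withColumn("parity", expr("IF(id % 2 = 0, 'even', 'odd')"))
    (base.rdd.getNumPartitions, withIf.rdd.getNumPartitions)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("if-partitioning").getOrCreate()
    try {
      val (before, after) = partitionCounts(spark)
      println(s"partitions before: $before, after: $after")
    } finally spark.stop()
  }
}
```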
Edge Cases
- If the predicate is `null`, the expression returns the false value
- The result is `null` only if the selected branch (true or false value) evaluates to `null`
- Type validation ensures both value expressions can be coerced to a common type
- The predicate must be exactly BooleanType - no implicit conversion from other types
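The null-related cases above can be checked with two small queries. This sketch assumes Spark 3.x on the classpath and a local SparkSession; the object and helper names are invented for the example.

```scala
import org.apache.spark.sql.SparkSession

object IfNullPredicateSketch {
  // Hypothetical helper: evaluates IF with a NULL boolean predicate.
  def nullPredicateResult(spark: SparkSession): String =
    spark.sql("SELECT IF(CAST(NULL AS BOOLEAN), 'yes', 'no') AS r").head().getString(0)

  // Hypothetical helper: the result is NULL when the selected branch is NULL.
  def selectedBranchIsNull(spark: SparkSession): Boolean =
    spark.sql("SELECT IF(true, CAST(NULL AS STRING), 'fallback') AS r").head().isNullAt(0)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("if-null-cases").getOrCreate()
    try {
      println(nullPredicateResult(spark))  // no
      println(selectedBranchIsNull(spark)) // true
    } finally spark.stop()
  }
}
```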
Code Generation
This expression supports full Tungsten code generation through the doGenCode method. It generates efficient Java code that:
- Evaluates the predicate condition first
- Uses conditional branching to evaluate only the selected value expression
- Properly handles null values and type conversions
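To look at the Java source generated for a query that uses IF, one option is the `codegen` explain mode. The sketch below assumes Spark 3.0 or later (where `Dataset.explain(mode: String)` is available) and a local SparkSession; the object name and the column alias `parity` are hypothetical, and the exact generated code varies by Spark version.

```scala
import org.apache.spark.sql.SparkSession

object IfCodegenSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("if-codegen").getOrCreate()
    try {
      val df = spark.range(10).selectExpr("IF(id % 2 = 0, 'even', 'odd') AS parity")
      // Prints the physical plan followed by the generated Java source, if any.
      df.explain("codegen")
    } finally spark.stop()
  }
}
```

Searching the printed source for the branch literals (`even`, `odd`) is a quick way to locate the conditional that corresponds to the IF expression.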
Examples
-- Basic conditional logic
SELECT IF(age >= 18, 'Adult', 'Minor') AS category FROM users;
-- Null handling
SELECT IF(score IS NOT NULL, score, 0) AS final_score FROM tests;
-- Nested conditions
SELECT IF(status = 'ACTIVE', IF(premium, 'Premium User', 'Regular User'), 'Inactive') FROM accounts;
// DataFrame API usage
import org.apache.spark.sql.functions._
df.select(when(col("age") >= 18, "Adult").otherwise("Minor").as("category"))
// Using expr for IF function
df.select(expr("IF(score IS NOT NULL, score, 0)").as("final_score"))
See Also
- CaseWhen - For multiple conditional branches
- Coalesce - For null value handling
- When/Otherwise - DataFrame API equivalent for conditional logic