Skip to content

IsNull

Overview

The IsNull expression is a predicate function that tests whether a given expression evaluates to null. It returns true if the input expression is null, and false otherwise. This is a fundamental null-checking operation in Spark SQL.

Syntax

expression IS NULL
// DataFrame API
col("column_name").isNull

Arguments

Argument Type Description
child Expression The expression to test for null value

Return Type

Boolean - returns true if the input expression is null, false otherwise.

Supported Data Types

All data types are supported since null checking is a universal operation that can be applied to any expression regardless of its data type:

  • Numeric types (IntegerType, LongType, DoubleType, DecimalType, etc.)
  • String types (StringType, BinaryType)
  • Date and time types (DateType, TimestampType)
  • Complex types (ArrayType, MapType, StructType)
  • Boolean type

Algorithm

The evaluation follows these steps:

  • Evaluate the child expression against the input row
  • Compare the result directly with null using == operator
  • Return true if the evaluation result equals null, false otherwise
  • The result itself is never null (nullable = false)
  • Uses context-independent foldability from the child expression for optimization

Partitioning Behavior

This expression preserves partitioning characteristics:

  • Does not require shuffle operations
  • Can be pushed down to individual partitions
  • Maintains existing data distribution since it's a row-level predicate
  • Does not affect partition boundaries or require data movement

Edge Cases

  • The expression itself never returns null (nullable = false)
  • If the child expression cannot be evaluated due to errors, the error propagates up
  • Works correctly with complex nested data types where null checking applies to the outer structure
  • Handles all Spark internal data types consistently through the InternalRow interface

Code Generation

This expression supports Tungsten code generation for optimal performance:

  • Implements doGenCode method for wholestage code generation
  • Generated code directly uses the child expression's isNull flag as the result value
  • Avoids interpreted evaluation overhead in tight loops
  • Sets isNull = FalseLiteral since the result is never null

Examples

-- Basic null checking
SELECT name IS NULL FROM users;

-- In WHERE clause
SELECT * FROM products WHERE price IS NULL;

-- Example from documentation
SELECT IsNull(1);
-- Returns: false
// DataFrame API usage
import org.apache.spark.sql.functions._

df.filter(col("name").isNull)
df.select(col("price").isNull.as("price_is_null"))

See Also

  • IsNotNull - Tests whether an expression is not null
  • Coalesce - Returns the first non-null expression
  • NullIf - Returns null if two expressions are equal
  • IfNull - Returns alternative value if expression is null