Months¶

Overview¶

The Months expression is a v2 partition transform that extracts the month component from temporal data for partitioning purposes. This transform is used in Spark's DataSource v2 API to create month-based partitions, allowing for efficient time-based data organization and querying.

Syntax¶

months(column_name)

// DataFrame API usage in partition transforms
months(col("date_column"))

Arguments¶

Argument	Type	Description
child	Expression	The input expression, typically a date or timestamp column

Return Type¶

IntegerType - Returns an integer representing the month component.

Supported Data Types¶

DateType (date columns)
TimestampType (timestamp columns)
TimestampNTZType (timestamp without timezone columns)

Algorithm¶

Extracts the month component from the input temporal expression
Converts the month value to an integer representation
Returns the month as an integer value (typically 1-12 for standard calendar months)
Inherits the partition transform evaluation logic from PartitionTransformExpression
Supports expression tree transformation through withNewChildInternal

Partitioning Behavior¶

How this expression affects partitioning:

Creates month-based partitions for efficient temporal data organization
Does not preserve existing partitioning schemes - establishes new month-based partitioning
Does not require shuffle when used as a partition transform during table creation
Enables partition pruning for queries with month-based predicates

Edge Cases¶

Null input values are handled according to Spark's standard null propagation rules
Invalid date/timestamp formats in the child expression will propagate errors
Month extraction behavior depends on the configured timezone for timestamp operations
Leap year considerations are handled by the underlying temporal extraction logic

Code Generation¶

This expression extends PartitionTransformExpression which typically supports Tungsten code generation for efficient evaluation, though specific codegen implementation depends on the parent class implementation.

Examples¶

-- Example SQL usage in table creation
CREATE TABLE events_table (
  id INT,
  event_date DATE,
  data STRING
) 
PARTITIONED BY (months(event_date))

// Example DataFrame API usage
import org.apache.spark.sql.connector.expressions.Expressions._
val partitionTransform = months(col("event_date"))

// Used in DataSource v2 partition specifications
val partitioning = Seq(months(FieldReference("event_date")))