Months¶
Overview¶
The Months expression is a v2 partition transform that maps a date or timestamp value to its calendar month for partitioning purposes. It is used with Spark's DataSource v2 API to declare month-based partitions, which lets a data source organize data by time and skip irrelevant partitions in time-range queries.
Syntax¶
Arguments¶
| Argument | Type | Description |
|---|---|---|
| child | Expression | The input expression, typically a date or timestamp column |
Return Type¶
IntegerType - The expression declares an integer result that identifies the month.
Supported Data Types¶
- DateType (date columns)
- TimestampType (timestamp columns)
- TimestampNTZType (timestamp without timezone columns)
Algorithm¶
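- Derives the month from the input temporal expression
- Represents that month as an integer
- The exact integer encoding is defined by the data source that implements the transform. A month-of-year value (1-12) would place the same month of different years in one partition, so sources commonly use a value that is unique per calendar month (Apache Iceberg, for example, uses months since 1970-01)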
- Inherits the partition transform evaluation logic from PartitionTransformExpression
- Supports expression tree transformation through withNewChildInternal
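The difference between a month-of-year value and a per-calendar-month value can be sketched in plain Scala with `java.time`. This is an illustration only, not Spark's implementation: the object and method names are hypothetical, and the months-since-1970 encoding is an assumption borrowed from the Iceberg-style convention.

```scala
import java.time.{LocalDate, YearMonth}
import java.time.temporal.ChronoUnit

// Hypothetical helper object; not part of Spark.
object MonthsTransformSketch {
  private val Epoch: YearMonth = YearMonth.of(1970, 1)

  // Month-of-year (1-12): the same month of different years collapses together.
  def monthOfYear(d: LocalDate): Int = d.getMonthValue

  // Months since 1970-01 (assumed encoding): one distinct integer per calendar month.
  def monthsSinceEpoch(d: LocalDate): Int =
    ChronoUnit.MONTHS.between(Epoch, YearMonth.from(d)).toInt
}
```

With this sketch, 2023-03-15 and 2024-03-15 share the month-of-year value 3 but get different months-since-epoch values, which is what keeps their partitions separate.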
Partitioning Behavior¶
How this expression affects partitioning:
- Creates month-based partitions for efficient temporal data organization
- Does not preserve existing partitioning schemes - establishes new month-based partitioning
- Does not require shuffle when used as a partition transform during table creation
- Enables partition pruning for queries with month-based predicates
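Partition pruning can be pictured as turning a date-range predicate into the set of month partitions that might contain matching rows. The sketch below is a simplified model in plain Scala; `monthsToScan` is a hypothetical helper, and a real data source decides for itself how predicates map to partitions.

```scala
import java.time.{LocalDate, YearMonth}

// Hypothetical model of month-level pruning; not a Spark API.
object MonthPruningSketch {
  // Every calendar month that overlaps the inclusive range [start, end].
  // Returns an empty list when start falls in a later month than end.
  def monthsToScan(start: LocalDate, end: LocalDate): List[YearMonth] = {
    val first = YearMonth.from(start)
    val last = YearMonth.from(end)
    Iterator
      .iterate(first)(_.plusMonths(1))
      .takeWhile(ym => !ym.isAfter(last))
      .toList
  }
}
```

A query restricted to 2023-11-20 through 2024-02-03 would touch only four month partitions under this model, regardless of how many other months the table holds.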
Edge Cases¶
- Null input values are handled according to Spark's standard null propagation rules
- Errors raised while evaluating the child expression (for example, a failed cast to a date or timestamp) propagate unchanged
- Month extraction behavior depends on the configured timezone for timestamp operations
- Leap year considerations are handled by the underlying temporal extraction logic
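The timezone dependence is easiest to see with an instant close to a month boundary. The following is a minimal `java.time` sketch, not Spark code; `monthIn` is a hypothetical helper, and the zone IDs are chosen only for illustration.

```scala
import java.time.{Instant, YearMonth, ZoneId}

// Hypothetical helper object; not part of Spark.
object MonthBoundarySketch {
  // Calendar month that an instant falls into when viewed in the given zone.
  def monthIn(instant: Instant, zone: ZoneId): YearMonth =
    YearMonth.from(instant.atZone(zone))

  def main(args: Array[String]): Unit = {
    val instant = Instant.parse("2024-02-01T02:30:00Z")
    println(monthIn(instant, ZoneId.of("UTC")))                 // 2024-02
    println(monthIn(instant, ZoneId.of("America/Los_Angeles"))) // 2024-01
  }
}
```

The same instant lands in February under UTC and in January under a UTC-8 zone, so rows written under different timezone settings can end up in different month partitions.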
Code Generation¶
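This expression extends PartitionTransformExpression. In recent Spark versions that base class is marked Unevaluable, so Months acts as a placeholder in the plan that is passed to the data source instead of being evaluated or code-generated by Spark itself. Check the parent class in your Spark version to confirm this behavior.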
Examples¶
-- Example SQL usage in table creation
CREATE TABLE events_table (
id INT,
event_date DATE,
data STRING
)
PARTITIONED BY (months(event_date))
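// Example DataFrame API usage (DataFrameWriterV2); `df` and the table name are placeholders
import org.apache.spark.sql.functions.{col, months}
df.writeTo("catalog.db.events_table").partitionedBy(months(col("event_date"))).create()
// Used in DataSource v2 partition specifications; the connector API takes a column name
import org.apache.spark.sql.connector.expressions.{Expressions, Transform}
val partitioning: Array[Transform] = Array(Expressions.months("event_date"))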
See Also¶
- Years - Year-based partition transform
- Days - Day-based partition transform
- Hours - Hour-based partition transform
- PartitionTransformExpression - Base class for partition transforms
- DataSource v2 partitioning documentation