Days¶
Overview¶
The Days expression is a v2 partition transform that converts timestamp or date values to the number of days since the Unix epoch (1970-01-01). It is used to partition data into daily buckets, enabling efficient querying of time-series data at day granularity.
Syntax¶
```scala
// DataFrame API usage
import org.apache.spark.sql.catalyst.expressions.Days

Days(child = timestampColumn)
```
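In application code the expression is usually constructed indirectly, through the days helper and DataFrameWriterV2.partitionedBy, rather than instantiated by hand. A minimal sketch, assuming a DataFrame named df, a hypothetical catalog table name, and the days function from org.apache.spark.sql.functions (available in Spark 3.x):

```scala
import org.apache.spark.sql.functions.{col, days}

// days(...) wraps the column in a Days expression; it is only valid inside
// partitionedBy, where it becomes a v2 "days" transform on the new table.
df.writeTo("catalog.db.events")
  .partitionedBy(days(col("event_time")))
  .create()
```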
Arguments¶
| Argument | Type | Description |
|---|---|---|
| child | Expression | The input expression that should evaluate to a timestamp or date value |
Return Type¶
IntegerType - Returns an integer representing the number of days since the Unix epoch (1970-01-01).
Supported Data Types¶
- TimestampType
- DateType
- Any expression that can be implicitly cast to timestamp or date (see the sketch below)
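Both timestamp and date columns can be used directly; other inputs must resolve to one of those types, for example through an explicit conversion. A short sketch with hypothetical column names:

```scala
import org.apache.spark.sql.functions.{col, days, to_timestamp}

days(col("event_time"))                        // TimestampType column
days(col("event_date"))                        // DateType column
days(to_timestamp(col("event_time_string")))   // string parsed to a timestamp first
```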
Algorithm¶
- Extracts the timestamp or date value from the child expression
- Converts the timestamp to the number of days since the Unix epoch (1970-01-01), as sketched after this list
- Truncates any time component, keeping only the date portion
- Returns the day number as an integer for partitioning purposes
- Handles timezone considerations based on the session's timezone settings
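The day-number calculation itself is carried out by the data source implementation rather than by this expression; a conceptual sketch of the equivalent computation in plain Scala, with the session time zone standing in for spark.sql.session.timeZone, might look like this:

```scala
import java.time.{Instant, ZoneId}

// Conceptual sketch only: resolve the timestamp to a local date in the session
// time zone (dropping the time-of-day component), then count whole days since
// 1970-01-01. Negative values indicate pre-epoch dates.
def daysSinceEpoch(ts: Instant, sessionZone: ZoneId): Int =
  ts.atZone(sessionZone).toLocalDate.toEpochDay.toInt

daysSinceEpoch(Instant.parse("2024-03-15T10:30:00Z"), ZoneId.of("UTC"))   // 19797
```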
Partitioning Behavior¶
How this expression affects partitioning:
- Preserves co-location of data within the same day
- Declaring the transform at table creation is a metadata-only operation and does not by itself trigger a shuffle
- Creates partition buckets based on day boundaries
- Enables partition pruning for queries with date/timestamp predicates (see the example after this list)
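For illustration, assuming an active SparkSession named spark and a hypothetical table created with PARTITIONED BY (days(event_time)), a day-level predicate lets the source skip partitions outside the requested range:

```scala
import org.apache.spark.sql.functions.col

// Only the daily partitions for 2024-03-01 through 2024-03-07 need to be read.
val march = spark.table("catalog.db.events")
  .where(col("event_time") >= "2024-03-01" && col("event_time") < "2024-03-08")
march.explain()   // the scan node should report the pushed-down filters
```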
Edge Cases¶
- Null input values result in null output (standard null propagation)
- Timestamps before the Unix epoch (1970-01-01) result in negative day numbers (see the example after this list)
- Leap years and daylight saving time transitions are handled according to the configured timezone
- Date boundary calculations respect the session timezone configuration
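For example, using java.time in line with the conceptual sketch above, pre-epoch dates map to negative day numbers, while null inputs simply propagate as null:

```scala
// Day numbers for dates before the epoch are negative.
java.time.LocalDate.of(1969, 12, 31).toEpochDay   // -1
java.time.LocalDate.of(1969, 1, 1).toEpochDay     // -365
```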
Code Generation¶
This expression extends PartitionTransformExpression. Because the concrete transform implementation is supplied by the data source, partition transform expressions are not evaluated or code-generated by Catalyst itself; the expression only describes the requested transform, and the underlying source performs the actual day computation at write time.
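A minimal sketch of what the expression carries, assuming spark-catalyst is on the classpath:

```scala
import java.sql.Timestamp
import org.apache.spark.sql.catalyst.expressions.{Days, Literal}
import org.apache.spark.sql.types.IntegerType

// The expression records only the child and the result type; it is not
// evaluable on its own because the concrete implementation lives in the source.
val transform = Days(Literal(Timestamp.valueOf("2024-03-15 10:30:00")))
assert(transform.dataType == IntegerType)
```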
Examples¶
```sql
-- Creating a table partitioned by days
CREATE TABLE events (
  id BIGINT,
  event_time TIMESTAMP,
  data STRING
) PARTITIONED BY (days(event_time))
```
```scala
// DataFrame API usage in partition transforms
import org.apache.spark.sql.catalyst.expressions.Days
import org.apache.spark.sql.functions.col

// Used internally when defining partition transforms
val dayTransform = Days(col("event_timestamp").expr)
```
See Also¶
- Hours - Partition transform for hourly buckets
- Months - Partition transform for monthly buckets
- Years - Partition transform for yearly buckets
- Bucket - Hash-based partition transform
- PartitionTransformExpression - Base class for partition transforms