SubtractTimestamps¶
Overview¶
The SubtractTimestamps expression computes the difference between two timestamp values. Depending on configuration, it returns either a legacy CalendarInterval (with microseconds field only) or a DayTimeInterval representing the time difference in microseconds.
Syntax¶
Arguments¶
| Argument | Type | Description |
|---|---|---|
| left | Expression | The end timestamp (minuend) |
| right | Expression | The start timestamp (subtrahend) |
| legacyInterval | Boolean | Whether to return CalendarInterval (true) or DayTimeInterval (false) |
| timeZoneId | Option[String] | Optional timezone identifier for timestamp interpretation |
Return Type¶
CalendarIntervalwhenlegacyIntervalis true (controlled byspark.sql.legacy.interval.enabled)DayTimeIntervalTypewhenlegacyIntervalis false
Supported Data Types¶
- Input types: Any timestamp type (
TimestampType,TimestampNTZType) - Both operands must be timestamp types
Algorithm¶
-
Converts both timestamp expressions to microsecond values (Long)
-
In legacy mode: Creates a
CalendarIntervalwith months=0, days=0, and microseconds set to the difference -
In non-legacy mode: Calls
subtractTimestampsutility method with timezone awareness to computeDayTimeInterval -
Uses timezone information from the left operand's data type for evaluation
-
Performs null-safe evaluation, returning null if either operand is null
Partitioning Behavior¶
- Preserves partitioning as it operates on individual rows without requiring data movement
- Does not require shuffle operations
- Can be pushed down to individual partitions for parallel execution
Edge Cases¶
-
Null handling: Returns null if either timestamp operand is null (null-intolerant behavior)
-
Timezone handling: Uses timezone from left operand's data type; respects explicit timeZoneId parameter
-
Overflow: Large timestamp differences may cause microsecond overflow in the resulting interval
-
Negative intervals: When left timestamp is earlier than right timestamp, produces negative interval values
Code Generation¶
Supports Tungsten code generation for optimized execution:
- Legacy mode: Generates inline
CalendarIntervalconstructor code - Non-legacy mode: Generates call to
DateTimeUtils.subtractTimestampswith timezone parameter - Falls back to interpreted mode only if code generation context limitations are reached
Examples¶
-- Basic timestamp subtraction
SELECT end_time - start_time AS duration
FROM events;
-- With explicit timestamps
SELECT TIMESTAMP '2023-12-25 10:30:00' - TIMESTAMP '2023-12-25 09:15:30' AS time_diff;
// DataFrame API usage
import org.apache.spark.sql.functions._
df.select(col("end_timestamp") - col("start_timestamp") as "duration")
// With literal timestamps
df.select(
(lit("2023-12-25 10:30:00").cast("timestamp") -
lit("2023-12-25 09:15:30").cast("timestamp")) as "duration"
)
See Also¶
DateAdd- Add intervals to datesTimestampAdd- Add intervals to timestampsDateDiff- Calculate date differencesCalendarInterval- Legacy interval representationDayTimeIntervalType- Modern interval type