ParseToTimestamp
Overview
ParseToTimestamp is a Spark Catalyst expression that converts string, date, timestamp, or numeric values to a timestamp data type. It accepts an optional format for parsing string inputs, converts with respect to a time zone, and handles conversion errors according to the failOnError flag.
Syntax
// DataFrame API usage
df.select(to_timestamp($"timestamp_column"))
df.select(to_timestamp($"timestamp_column", "yyyy-MM-dd HH:mm:ss"))
Arguments
| Argument | Type | Description |
|---|---|---|
| left | Expression | The input expression to convert to timestamp |
| format | Option[Expression] | Optional format string for parsing input |
| dataType | DataType | Target timestamp data type |
| timeZoneId | Option[String] | Optional timezone identifier for conversion |
| failOnError | Boolean | Whether to fail on conversion errors (defaults to ANSI mode setting) |
Return Type
Returns a value of the timestamp type given by the dataType parameter, typically TimestampType or TimestampNTZType.
Supported Data Types
- StringType with collation support (including trim collation)
- DateType
- TimestampType
- TimestampNTZType
- NumericType (only when target dataType is TimestampType)
Algorithm
- When format is specified, delegates to the GetTimestamp expression with the provided format string
- When no format is provided, performs a Cast operation to the target timestamp type
- Applies timezone conversion using the specified timeZoneId if provided
- Handles error scenarios based on the failOnError flag (ANSI compliance)
- Implements the RuntimeReplaceable pattern: Spark substitutes the appropriate underlying expression early in query optimization, so ParseToTimestamp itself is never evaluated
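The choice between the two replacement expressions can be sketched in plain Scala. This is a simplified model, not Spark's source: Expr, Column, and the string-valued dataType are hypothetical stand-ins, and the field lists of Cast and GetTimestamp are assumptions chosen to mirror the arguments table above.

```scala
object ParseToTimestampSketch {
  // Hypothetical stand-ins for Catalyst classes; not Spark's real definitions.
  sealed trait Expr
  case class Column(name: String) extends Expr
  case class Literal(value: String) extends Expr
  case class Cast(
      child: Expr,
      dataType: String,
      timeZoneId: Option[String],
      ansiEnabled: Boolean) extends Expr
  case class GetTimestamp(
      left: Expr,
      format: Expr,
      dataType: String,
      timeZoneId: Option[String],
      failOnError: Boolean) extends Expr

  case class ParseToTimestamp(
      left: Expr,
      format: Option[Expr],
      dataType: String,
      timeZoneId: Option[String] = None,
      failOnError: Boolean = false) extends Expr {

    // With a format: formatted parsing. Without one: a plain cast to the target type.
    lazy val replacement: Expr = format match {
      case Some(f) => GetTimestamp(left, f, dataType, timeZoneId, failOnError)
      case None    => Cast(left, dataType, timeZoneId, ansiEnabled = failOnError)
    }
  }
}
```

In this model the timeZoneId and failOnError values are passed straight through to whichever expression is chosen, which matches the delegation described in the list above.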
Partitioning Behavior
This expression preserves partitioning behavior:
- Does not require data shuffle as it operates on individual rows
- Maintains existing partition boundaries
- Can be applied within partition operations without affecting distribution
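One way to check the points above is to compare partition counts before and after applying to_timestamp. The sketch below assumes a local SparkSession; the object name, column names, and sample strings are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{concat, lit, to_timestamp}

object PartitioningSketch {
  // Returns (partitions before, partitions after) applying to_timestamp.
  def partitionCounts(spark: SparkSession): (Int, Int) = {
    import spark.implicits._

    // Hypothetical input: ids 0..7 in 4 partitions, turned into timestamp strings.
    val input = spark.range(0L, 8L, 1L, 4)
      .select(concat(lit("2016-12-31 00:00:0"), $"id".cast("string")).as("ts_str"))

    // Row-wise conversion; no repartitioning is requested anywhere.
    val parsed = input.select(to_timestamp($"ts_str", "yyyy-MM-dd HH:mm:ss").as("ts"))

    (input.rdd.getNumPartitions, parsed.rdd.getNumPartitions)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("partitioning-sketch")
      .getOrCreate()
    try println(partitionCounts(spark)) finally spark.stop()
  }
}
```

If the expression truly operates row by row, the two counts should be equal.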
Edge Cases
- Null inputs return null
- Invalid format strings will cause runtime errors when failOnError is true
- Unparseable timestamp strings raise an error when ANSI mode is enabled (failOnError is true) and return null otherwise
- Numeric inputs are interpreted as seconds since epoch when converting to TimestampType
- Timezone conversion edge cases (DST transitions) are handled according to Java timezone rules
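The last two cases can be illustrated with java.time alone. This is a sketch of the underlying Java rules, not of Spark's internal code path; the epoch value, the date, and the zone are arbitrary examples.

```scala
import java.time.{Instant, LocalDateTime, ZoneId, ZonedDateTime}

object TimestampEdgeCases {
  // A numeric input read as whole seconds since 1970-01-01T00:00:00Z.
  val fromEpochSeconds: Instant = Instant.ofEpochSecond(1483142400L)

  // 02:30 on 2021-03-14 does not exist in Los Angeles (clocks jump from 02:00 to 03:00).
  // java.time resolves a local time inside a gap by moving it later by the gap length.
  val inDstGap: ZonedDateTime =
    LocalDateTime.of(2021, 3, 14, 2, 30).atZone(ZoneId.of("America/Los_Angeles"))

  def main(args: Array[String]): Unit = {
    println(fromEpochSeconds) // 2016-12-31T00:00:00Z
    println(inDstGap)         // 2021-03-14T03:30-07:00[America/Los_Angeles]
  }
}
```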
Code Generation
This expression supports Tungsten code generation through its replacement expressions (GetTimestamp and Cast). The actual code generation is delegated to the underlying replacement expression rather than being implemented directly.
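To see the delegation in practice, the query plans and generated code can be printed with Dataset.explain. The following is a minimal sketch assuming a local SparkSession; the object name and sample data are hypothetical, and the exact plan text varies across Spark versions.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{concat, lit, to_timestamp}

object InspectPlanSketch {
  def parsed(spark: SparkSession): DataFrame = {
    import spark.implicits._
    // Derive the strings from a column so the projection is not folded into a constant.
    spark.range(3)
      .select(concat(lit("2016-12-31 00:00:0"), $"id".cast("string")).as("ts_str"))
      .select(to_timestamp($"ts_str", "yyyy-MM-dd HH:mm:ss").as("ts"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("inspect-plan-sketch")
      .getOrCreate()
    val df = parsed(spark)
    df.explain(true)      // logical plans before and after optimization, plus the physical plan
    df.explain("codegen") // generated Java source for the whole-stage codegen subtrees
    spark.stop()
  }
}
```

Comparing the analyzed and optimized plans in the first output is a quick way to confirm which replacement expression was chosen for a given call.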
Examples
-- Basic timestamp parsing
SELECT to_timestamp('2016-12-31 00:00:00');
-- With custom format
SELECT to_timestamp('12/31/2016 00:00:00', 'MM/dd/yyyy HH:mm:ss');
-- Converting date to timestamp
SELECT to_timestamp(current_date());
// DataFrame API usage
import org.apache.spark.sql.functions._
// Basic conversion
df.select(to_timestamp($"timestamp_str"))
// With format specification
df.select(to_timestamp($"date_str", "yyyy-MM-dd"))
// Converting numeric epoch seconds
df.select(to_timestamp($"epoch_seconds"))
See Also
- GetTimestamp - Underlying expression for formatted parsing
- Cast - Underlying expression for unformatted conversion
- ParseToDate - Similar expression for date parsing
- UnixTimestamp - Converting to Unix timestamp format