StructsToJson¶

Overview¶

The StructsToJson expression converts Spark SQL struct data types into their JSON string representation. This expression enables serialization of complex nested data structures into JSON format for storage, transmission, or interoperability with external systems.

Syntax¶

to_json(struct_column [, options_map])

// DataFrame API
import org.apache.spark.sql.functions.to_json
df.select(to_json($"struct_column"))

Arguments¶

Argument	Type	Description
child	StructType or ArrayType	The struct or array column to convert to JSON
options	Map[String, String] (optional)	JSON serialization options like dateFormat, timestampFormat, etc.

Return Type¶

StringType - Returns a JSON string representation of the input struct or array.

Supported Data Types¶

StructType (primary use case)
ArrayType containing structs
Nested combinations of structs and arrays
All primitive types within structs (StringType, IntegerType, DoubleType, BooleanType, etc.)
TimestampType and DateType (with configurable formatting)

Algorithm¶

Traverses the input struct or array recursively to build JSON representation
Applies configured formatting options for dates and timestamps during serialization
Handles nested structures by recursively converting child elements
Uses Jackson library internally for efficient JSON generation
Preserves field names and maintains structural hierarchy in output

Partitioning Behavior¶

How this expression affects partitioning:

Preserves existing partitioning as it operates row-wise without requiring data movement
Does not require shuffle operations
Can be executed locally on each partition independently

Edge Cases¶

Null handling: Returns null when input struct is null
Empty structs: Converts to empty JSON object "{}"
Empty arrays: Converts to empty JSON array "[]"
Special float values: NaN and Infinity are serialized as JSON strings
Nested nulls: Null fields within structs are included in JSON with null values
Malformed options: Invalid date/timestamp format options may cause runtime errors

Code Generation¶

This expression supports Whole-Stage Code Generation (Tungsten) for optimized performance. Falls back to interpreted mode only when code generation limits are exceeded or for extremely complex nested structures.

Examples¶

-- Basic struct to JSON conversion
SELECT to_json(struct(name, age, city)) as json_data
FROM users;

-- Array of structs to JSON
SELECT to_json(array(struct(1 as a), struct(2 as a))) as json_array;

-- With formatting options
SELECT to_json(struct(event_time, data), map('timestampFormat', 'yyyy-MM-dd HH:mm:ss')) 
FROM events;

// DataFrame API examples
import org.apache.spark.sql.functions._

// Basic conversion
df.select(to_json(struct($"name", $"age", $"city")).as("json_data"))

// With options
val options = Map("timestampFormat" -> "yyyy-MM-dd HH:mm:ss")
df.select(to_json(struct($"event_time", $"data"), options))

// Converting array column
df.select(to_json($"array_of_structs"))