CreateNamedStruct¶

Overview¶

CreateNamedStruct is a Catalyst expression that creates a struct (row) value from pairs of field names and values. It takes an even number of arguments where odd-positioned arguments are field names (must be string literals) and even-positioned arguments are the corresponding field values.

Syntax¶

named_struct(name1, value1, name2, value2, ...)
struct(value1, value2, ...)  -- alias form

Arguments¶

Argument	Type	Description
name1, name2, ...	String literal	Field names for the struct (must be foldable string expressions)
value1, value2, ...	Any	Field values corresponding to each name

Return Type¶

Returns a StructType with fields defined by the provided name-value pairs. Each field's data type matches the corresponding value expression's data type.

Supported Data Types¶

Field names: Must be string literals or foldable string expressions
Field values: Any data type supported by Spark SQL
Preserves nullability of individual field expressions
Preserves metadata from NamedExpression or GetStructField inputs

Algorithm¶

Validates that the number of arguments is even and greater than 0
Groups arguments into name-value pairs
Evaluates name expressions at planning time (must be foldable)
Creates StructType with fields based on name-value pairs
At runtime, evaluates value expressions and constructs InternalRow

Partitioning Behavior¶

This expression does not affect partitioning behavior:

Preserves existing partitioning as it's a row-level transformation
Does not require shuffle operations
Can be applied within existing partitions

Edge Cases¶

Null field names: Throws DataTypeMismatch error with UNEXPECTED_NULL subclass
Non-string field names: Throws DataTypeMismatch error with CREATE_NAMED_STRUCT_WITHOUT_FOLDABLE_STRING subclass
Odd number of arguments: Throws QueryCompilationError expecting "2n (n > 0)" arguments
Null field values: Preserves null values in the resulting struct
The struct itself is never null (nullable = false)

Code Generation¶

This expression supports Tungsten code generation. It generates optimized code that:

Creates an Object array for field values
Evaluates each value expression with null checking
Constructs a GenericInternalRow from the values array
Uses code splitting for expressions to avoid Java method size limits

Examples¶

-- Create a struct with named fields
SELECT named_struct('id', 1, 'name', 'John', 'age', 25) as person;

-- Using struct alias (field names inferred)
SELECT struct(1, 'John', 25) as person;

-- With complex expressions
SELECT named_struct('total', col1 + col2, 'ratio', col1 / col2) FROM table;

// DataFrame API usage
import org.apache.spark.sql.functions._

df.select(struct(col("id"), col("name")).alias("person"))

// With named fields
df.select(expr("named_struct('id', id, 'name', name)").alias("person"))