GetArrayStructFields

Overview

GetArrayStructFields extracts one named field from every struct element of an array and returns a new array containing only those field values. Catalyst creates this expression internally when a query accesses a field on an array of structs (for example persons.name), which makes it an efficient way to project a single column out of nested data.

Syntax

array_column.field_name

Arguments

Argument     | Type        | Description
child        | Expression  | The input expression that evaluates to an array of structs
field        | StructField | The struct field definition containing name and data type information
ordinal      | Int         | The zero-based index position of the field within the struct
numFields    | Int         | The total number of fields in the struct type
containsNull | Boolean     | Whether the resulting array can contain null values

Return Type

Returns an ArrayType where the element type matches the data type of the extracted field. The array's containsNull property is determined by the containsNull parameter.
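
As a rough illustration of this rule, the plain-Python sketch below models how the result type is assembled. The StructField and ArrayType classes are simplified stand-ins rather than Spark's own classes, and result_type is a hypothetical helper name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructField:
    """Simplified stand-in for a struct field definition."""
    name: str
    data_type: str
    nullable: bool = True


@dataclass(frozen=True)
class ArrayType:
    """Simplified stand-in for an array type."""
    element_type: str
    contains_null: bool


def result_type(field: StructField, contains_null: bool) -> ArrayType:
    # The element type comes from the extracted field; whether elements
    # may be null comes from the separate containsNull parameter.
    return ArrayType(field.data_type, contains_null)


name_field = StructField("name", "string", nullable=False)
print(result_type(name_field, contains_null=True))
```

In this sketch the array-level flag is passed independently of the field's own nullable flag, mirroring the separate containsNull argument listed above.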

Supported Data Types

This expression requires the input to be an ArrayType containing StructType elements. The extracted field can be of any supported Spark SQL data type including primitives, complex types, and nested structures.

Algorithm

  • Expects the child expression to produce an array of structs; Catalyst establishes this when it resolves the field access during analysis
  • For each element in the input array, checks if the array element itself is null
  • If the array element is not null, extracts the struct and checks if the target field is null
  • Retrieves the field value using the specified ordinal position and field data type
  • Constructs a new GenericArrayData containing all extracted field values
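
The steps above can be sketched in plain Python. This is a minimal model of the evaluation loop, not Spark's implementation: structs are represented as tuples, SQL null as None, and the function name get_array_struct_fields is hypothetical. Returning None for a null input array is an assumption based on the usual null-safe evaluation of single-child expressions.

```python
from typing import Any, List, Optional, Sequence


def get_array_struct_fields(
    array: Optional[Sequence[Optional[tuple]]], ordinal: int
) -> Optional[List[Any]]:
    # Assumption: a null input array yields a null result.
    if array is None:
        return None
    result: List[Any] = [None] * len(array)
    for i, struct in enumerate(array):
        if struct is None:
            # Null array element: keep a null in the same position.
            result[i] = None
        elif struct[ordinal] is None:
            # Null field inside a non-null struct.
            result[i] = None
        else:
            # Read the field by its zero-based ordinal.
            result[i] = struct[ordinal]
    return result


people = [("alice", 30), None, ("bob", None)]
names = get_array_struct_fields(people, 0)
ages = get_array_struct_fields(people, 1)
print(names, ages)
```

The result always has the same length as the input array, so element positions stay aligned across fields extracted from the same array.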

Partitioning Behavior

This expression preserves partitioning as it performs element-wise transformations without requiring data redistribution:

  • Does not require shuffle operations
  • Maintains the same number of rows and partitioning scheme
  • Can be executed independently on each partition
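
To make the row-wise nature concrete, here is a small plain-Python sketch in which each partition is a list of rows and each row holds one array of structs (tuples). The helper names extract_field and map_partition are hypothetical, and the data is invented for illustration.

```python
def extract_field(array, ordinal):
    # Element-wise projection of one field; None models SQL null.
    if array is None:
        return None
    return [None if s is None else s[ordinal] for s in array]


def map_partition(partition, ordinal):
    # Each row is transformed on its own; no other partition is consulted.
    return [extract_field(row, ordinal) for row in partition]


partitions = [
    [[("alice", 30), ("bob", 25)]],   # partition 0: one row
    [[("carol", 41)], []],            # partition 1: two rows
]
projected = [map_partition(p, 0) for p in partitions]
print(projected)
```

Every partition keeps its row count and position, which is why no shuffle is involved.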

Edge Cases

  • Null input array: When the array itself is null, the result is null
  • Null array elements: When an array element is null, the corresponding position in the result array is set to null
  • Null field values: When the extracted field within a struct is null, null is placed in the result array
  • Empty arrays: Returns an empty array of the appropriate type
  • Field nullability: Respects the nullable property of the field definition for both evaluation and code generation

Code Generation

This expression supports Tungsten code generation through the doGenCode method. It generates optimized Java code that avoids object creation overhead and provides better performance compared to interpreted evaluation. The generated code includes null-safe evaluation logic that conditionally checks field nullability.
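
The idea of conditionally emitting the null check can be illustrated with a toy generator. This is only a sketch of the technique, not the code Spark emits: the function gen_element_code, the Java variable names values and row, and the getter argument are all hypothetical.

```python
def gen_element_code(ordinal: int, field_nullable: bool, java_getter: str) -> str:
    """Return a Java-like snippet that copies one field into values[i]."""
    read = f"values[i] = row.{java_getter}({ordinal});"
    if field_nullable:
        # Only nullable fields pay for the per-element null check.
        return (
            f"if (row.isNullAt({ordinal})) {{\n"
            f"  values[i] = null;\n"
            f"}} else {{\n"
            f"  {read}\n"
            "}"
        )
    # Non-nullable fields are read directly.
    return read


print(gen_element_code(1, field_nullable=True, java_getter="getInt"))
print(gen_element_code(1, field_nullable=False, java_getter="getInt"))
```

Deciding at generation time whether the check is needed keeps the branch out of the per-element loop when the field is declared non-nullable.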

Examples

-- Extract 'name' field from array of person structs
SELECT persons.name FROM table_with_person_array;

-- Access nested field in array of structs
SELECT addresses.street FROM customers;

// DataFrame API usage. GetArrayStructFields itself is internal and is not
// constructed directly; Catalyst generates it when resolving operations like:
df.select($"array_col.field_name")

See Also

  • ExtractValue - Parent trait for value extraction expressions
  • GetStructField - For extracting fields from individual structs
  • GetArrayItem - For extracting elements from arrays by index
  • UnaryExpression - Base class for single-child expressions