GetArrayStructFields¶
Overview¶
GetArrayStructFields extracts one field from every struct element of an array and returns a new array containing only those field values. Catalyst creates this expression internally when a query accesses a nested field through an array of structs (for example, people.name where people is an array of structs), so a single field can be projected out of nested data without the caller building the expression by hand.
Syntax¶
Arguments¶
| Argument | Type | Description |
|---|---|---|
| child | Expression | The input expression that evaluates to an array of structs |
| field | StructField | The struct field definition containing name and data type information |
| ordinal | Int | The zero-based index position of the field within the struct |
| numFields | Int | The total number of fields in the struct type |
| containsNull | Boolean | Whether the resulting array can contain null values |
Return Type¶
Returns an ArrayType where the element type matches the data type of the extracted field. The array's containsNull property is determined by the containsNull parameter.
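As a small illustration of the rule above, the result type can be assembled from the field's data type and the containsNull flag. This is a sketch that assumes the Spark SQL types package is on the classpath; the field definition and flag value are hypothetical.

```scala
import org.apache.spark.sql.types._

// Hypothetical field definition, used only for illustration.
val field = StructField("name", StringType, nullable = true)
val containsNull = true

// Element type comes from the extracted field; element nullability
// comes from the containsNull argument.
val resultType: DataType = ArrayType(field.dataType, containsNull)

println(resultType.simpleString)
```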
Supported Data Types¶
This expression requires the input to be an ArrayType containing StructType elements. The extracted field can be of any supported Spark SQL data type including primitives, complex types, and nested structures.
Algorithm¶
- Validates that the child expression produces an array of structs during type checking
- For each element in the input array, checks if the array element itself is null
- If the array element is not null, extracts the struct and checks if the target field is null
- Retrieves the field value using the specified ordinal position and field data type
- Constructs a new GenericArrayData containing all extracted field values
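The steps above can be sketched in plain Scala without any Spark dependency. This is a simplified model, not Spark's implementation: each struct is represented by a hypothetical IndexedSeq[Any] addressed by ordinal, a null reference stands for a null array element, and the object and method names are invented for illustration.

```scala
object GetArrayStructFieldsSketch {
  // input:   the array of structs; a null entry models a null array element
  // ordinal: zero-based position of the field to extract
  def extractField(input: IndexedSeq[IndexedSeq[Any]], ordinal: Int): IndexedSeq[Any] = {
    val result = new Array[Any](input.length)
    var i = 0
    while (i < input.length) {
      val struct = input(i)
      if (struct == null) {
        // Null array element: the result slot is null.
        result(i) = null
      } else {
        // Non-null struct: copy the field value, which may itself be null.
        result(i) = struct(ordinal)
      }
      i += 1
    }
    result.toIndexedSeq
  }
}
```

For example, calling GetArrayStructFieldsSketch.extractField(Vector(Vector("Ann", 30), null, Vector(null, 25)), 0) yields a three-element sequence holding "Ann" followed by two nulls.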
Partitioning Behavior¶
This expression preserves partitioning as it performs element-wise transformations without requiring data redistribution:
- Does not require shuffle operations
- Maintains the same number of rows and partitioning scheme
- Can be executed independently on each partition
Edge Cases¶
- Null array elements: When an array element is null, the corresponding position in the result array is set to null
- Null field values: When the extracted field within a struct is null, null is placed in the result array
- Empty arrays: Returns an empty array of the appropriate type
- Field nullability: Respects the nullable property of the field definition for both evaluation and code generation
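The edge cases listed above can be observed from the public API. The sketch below assumes a local Spark session is available; the column name people, the field name name, and the sample rows are hypothetical.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local[1]").appName("edge-cases").getOrCreate()

// Hypothetical schema: one column "people" of type array<struct<name:string>>.
val personType = StructType(Seq(StructField("name", StringType, nullable = true)))
val schema = StructType(Seq(StructField("people", ArrayType(personType, containsNull = true))))

val rows = java.util.Arrays.asList(
  Row(Seq(Row("Ann"), Row(null), null)), // non-null field, null field, null element
  Row(Seq.empty[Row])                    // empty array
)

// Nested field access on an array of structs.
val names = spark.createDataFrame(rows, schema).select(col("people.name"))
val collected = names.collect()

collected.foreach(println)
```

In the first row, the null field and the null array element both surface as null entries in the extracted array; the second row produces an empty array.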
Code Generation¶
This expression supports Tungsten code generation through the doGenCode method. The generated Java code loops over the array elements directly, which avoids the overhead of interpreted evaluation. The null check on the extracted field is emitted only when the field definition is nullable.
Examples¶
```sql
-- Extract 'name' field from array of person structs
SELECT persons.name FROM table_with_person_array;

-- Access nested field in array of structs
SELECT addresses.street FROM customers;
```
```scala
// GetArrayStructFields is internal and not directly exposed in the DataFrame API.
// Catalyst typically generates it when resolving nested field access such as:
// df.select($"array_col.field_name")
```
See Also¶
- ExtractValue - Parent trait for value extraction expressions
- GetStructField - For extracting fields from individual structs
- GetArrayItem - For extracting elements from arrays by index
- UnaryExpression - Base class for single-child expressions