UpdateFields

Overview

UpdateFields is a Spark Catalyst expression that modifies fields within a struct by applying a sequence of field operations. It enables adding new fields, dropping existing fields, or modifying field values and metadata in struct-typed data without recreating the entire structure.

Syntax

-- Used internally by Spark SQL for struct field operations
-- Typically invoked through higher-level struct manipulation functions

// Direct Catalyst construction; the DataFrame API reaches this expression
// through Column.withField and Column.dropFields
import org.apache.spark.sql.catalyst.expressions._
UpdateFields(structExpr, fieldOps)
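
Since the expression is normally reached through higher-level functions rather than constructed by hand, a DataFrame-level sketch may be more useful in practice. The example below assumes Spark 3.1 or later, where Column.withField and Column.dropFields are available; the object name, column names, and values are illustrative only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, struct}

// Illustrative sketch: object, column, and field names are hypothetical.
object UpdateFieldsViaColumnApi {
  // Adds field `d` to the struct column `s` and drops field `b` from it.
  // Both calls are expressed as struct field operations under the hood.
  def addAndDrop(df: DataFrame): DataFrame =
    df.select(col("s").withField("d", lit("new_value")).dropFields("b").as("s"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("update-fields-sketch")
      .getOrCreate()
    import spark.implicits._

    // Build a single struct column `s` with fields a, b, c.
    val df = Seq((1, "x", 2.0)).toDF("a", "b", "c")
      .select(struct(col("a"), col("b"), col("c")).as("s"))

    // The struct `s` in the result has fields a, c, d.
    addAndDrop(df).printSchema()
    spark.stop()
  }
}
```

The other fields of the struct are carried over unchanged, which is the point of the expression: only the named fields are touched.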

Arguments

Argument      Type                          Description
structExpr    Expression                    The input struct expression to be modified
fieldOps      Seq[StructFieldsOperation]    Sequence of operations to apply to struct fields (add, drop, modify)

Return Type

Returns a struct value whose data type is a StructType reflecting the applied field operations.

Supported Data Types

  • Input: StructType only
  • Output: StructType with potentially different field composition

Algorithm

  • Validates that the input expression is of StructType and that not all fields are being dropped
  • Extracts existing field expressions from the input struct, handling both CreateNamedStruct and regular struct expressions
  • Applies each field operation sequentially using foldLeft to build the new field structure
  • Creates a CreateNamedStruct expression with the resulting fields and expressions
  • Wraps the result in null-handling logic if the input struct is nullable
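
The steps above can be sketched with a small, Spark-free model. This is a simplified illustration, not Spark's implementation: the types below are hypothetical stand-ins for Catalyst classes, fields hold plain values instead of expressions, and validation runs on the values at call time, whereas Spark validates against the struct's type during analysis.

```scala
// Simplified stand-ins for Catalyst types; every name here is hypothetical.
object UpdateFieldsModel {
  // A struct is modeled as an ordered list of (fieldName, value) pairs.
  type Fields = Vector[(String, Any)]

  sealed trait FieldOp {
    def apply(fields: Fields): Fields
  }

  // Replaces every field with a matching name, or appends if none matches.
  final case class WithField(name: String, value: Any) extends FieldOp {
    def apply(fields: Fields): Fields =
      if (fields.exists(_._1 == name))
        fields.map { case (n, v) => if (n == name) (n, value) else (n, v) }
      else
        fields :+ (name -> value)
  }

  // Removes every field with a matching name.
  final case class DropField(name: String) extends FieldOp {
    def apply(fields: Fields): Fields = fields.filterNot(_._1 == name)
  }

  // Applies the operations left to right, then rejects an empty result.
  def updateFields(struct: Fields, ops: Seq[FieldOp]): Either[String, Fields] = {
    val result = ops.foldLeft(struct)((fields, op) => op(fields))
    if (result.isEmpty) Left("cannot drop all fields in struct") else Right(result)
  }

  // Models the null wrapper: a null (None) input struct stays null.
  def updateNullable(struct: Option[Fields], ops: Seq[FieldOp]): Either[String, Option[Fields]] =
    struct match {
      case None         => Right(None)
      case Some(fields) => updateFields(fields, ops).map(Some(_))
    }
}
```

Because the operations are folded in order, a later operation sees the result of the earlier ones; for example, adding a field and then dropping it leaves the struct unchanged.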

Partitioning Behavior

How this expression affects partitioning:

  • Preserves partitioning as it operates on individual rows without changing data distribution
  • Does not require shuffle operations
  • Maintains the same number of rows and their relative positions

Edge Cases

  • Null handling: If the input struct is nullable, the expression wraps the result in an If statement that returns null when the input struct is null
  • Dropping all fields: Validation rejects any set of operations that would remove every field, since the result would be an invalid empty struct
  • Type validation: Strictly enforces that the input expression must be of StructType
  • Nested field access: Handles both CreateNamedStruct expressions and regular struct field access patterns
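
The validation and null-handling cases above can be probed directly against the Catalyst classes. The sketch below assumes Spark 3.1 or later on the classpath and relies on internal APIs (WithField, DropField, checkInputDataTypes, evalExpr) that may change between versions; the object name and the field names are illustrative.

```scala
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Illustrative probes; the object and field names are hypothetical.
object UpdateFieldsEdgeCases {
  // A non-nullable struct literal with a single field `a`.
  private val single = CreateNamedStruct(Seq(Literal("a"), Literal(1)))

  // Dropping the only field should fail type checking.
  def dropAllFails: Boolean =
    UpdateFields(single, Seq(DropField("a"))).checkInputDataTypes().isFailure

  // A non-struct input should fail type checking.
  def nonStructFails: Boolean =
    UpdateFields(Literal(1), Seq(DropField("a"))).checkInputDataTypes().isFailure

  // A nullable struct input should yield an If-wrapped evaluable expression.
  def nullableWrapsInIf: Boolean = {
    val structType = StructType(Seq(StructField("a", IntegerType)))
    val nullableStruct = AttributeReference("s", structType, nullable = true)()
    UpdateFields(nullableStruct, Seq(WithField("b", Literal(2)))).evalExpr.isInstanceOf[If]
  }
}
```

Only isFailure is checked here rather than a specific error message, since the exact failure type and wording differ across Spark versions.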

Code Generation

This expression extends Unevaluable, so it is never evaluated or code-generated directly. Instead, it is rewritten into its evalExpr (typically a CreateNamedStruct wrapped in conditional logic), which does support Tungsten code generation.

Examples

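-- UpdateFields is used internally by struct manipulation functions
-- Example of an equivalent high-level operation: rebuild the struct with field2 set to a new value
SELECT named_struct(
  'field1', struct_col.field1,
  'field2', 'new_value',
  'field3', struct_col.field3
) AS struct_col
FROM table_name
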
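// Internal Catalyst usage (internal APIs; may change between Spark versions)
import org.apache.spark.sql.catalyst.expressions._

// Illustrative input: a struct literal with fields field1 and field3
val structExpr = CreateNamedStruct(Seq(Literal("field1"), Literal(1), Literal("field3"), Literal(3)))
// WithField adds field2, or replaces it if a field with that name already exists
val addFieldOp = WithField("field2", Literal("new_value"))
val updateFieldsExpr = UpdateFields(structExpr, Seq(addFieldOp))

// The expression transforms to an evaluable form:
val evaluableExpr = updateFieldsExpr.evalExpr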

See Also

  • CreateNamedStruct - Used internally to construct the final struct
  • GetStructField - Used to extract existing field values
  • StructFieldsOperation - Operations applied to modify struct fields
  • If/IsNull - Used for null-safe struct field updates