TransformValues¶
Overview¶
The TransformValues expression applies a lambda function to each value in a map while preserving the original keys, producing a new map with the same keys and transformed values.
Syntax¶
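In SQL, the call shape is `transform_values(map_expr, lambda_function)`, with the lambda written as `(k, v) -> expression`, matching the examples later on this page.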
Arguments¶
| Argument | Type | Description |
|---|---|---|
| map_expr | MapType | The input map whose values will be transformed |
| lambda_function | LambdaFunction | A lambda function that takes (key, value) parameters and returns the transformed value |
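A short Scala sketch of the lambda contract: the function always receives the key and the value, in that order, and may ignore either parameter. The DataFrame `df` and its `MAP<STRING, INT>` column `scores` are hypothetical.

```scala
import org.apache.spark.sql.functions._

// "df" and its MAP<STRING, INT> column "scores" are hypothetical.
// The lambda receives (key, value) and may ignore either parameter.
val doubled = df.select(transform_values(col("scores"), (k, v) => v * 2))
val tagged = df.select(
  transform_values(col("scores"), (k, v) => concat(k, lit("="), v.cast("string")))
)
```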
Return Type¶
Returns a MapType with the same key type as the input map, but potentially different value type based on the lambda function's return type.
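For instance, a lambda that formats integer values as strings turns a `MAP<INT, INT>` column into a `MAP<INT, STRING>` result. A minimal sketch, assuming a hypothetical `counts` column:

```scala
import org.apache.spark.sql.functions._

// Hypothetical MAP<INT, INT> column "counts": the lambda returns a
// string, so the result column has type MAP<INT, STRING>.
val labeled = df.select(
  transform_values(col("counts"), (k, v) => concat(lit("n="), v.cast("string")))
)
```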
Supported Data Types¶
- Input map can have any valid MapType with supported key and value types
- Lambda function can return any supported Spark SQL data type (see the sketch after this list)
- Keys must be non-null and of a type that is valid as a map key
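As one illustration of non-primitive value types, a lambda can reduce array-typed values to their lengths. The DataFrame `df` and its `MAP<STRING, ARRAY<INT>>` column `m` are hypothetical.

```scala
import org.apache.spark.sql.functions._

// Hypothetical MAP<STRING, ARRAY<INT>> column "m": each array value is
// reduced to its length, yielding a MAP<STRING, INT> result.
val lengths = df.select(transform_values(col("m"), (k, v) => size(v)))
```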
Algorithm¶
- Iterates through each key-value pair in the input map
- Applies the lambda function to each (key, value) pair
- Preserves the original keys in their exact form
- Collects transformed values and constructs a new map
- Maintains the original map's structure and ordering properties (a minimal sketch of these steps follows this list)
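A minimal Scala sketch of these steps over a plain Scala `Map`. This is illustrative only, not the engine's actual implementation:

```scala
// Illustrative only: the same steps expressed over a plain Scala Map.
def transformValues[K, V, W](m: Map[K, V])(f: (K, V) => W): Map[K, W] =
  m.map { case (k, v) => k -> f(k, v) } // keys preserved, values rewritten

transformValues(Map(1 -> 1, 2 -> 2, 3 -> 3))((k, v) => k + v)
// Map(1 -> 2, 2 -> 4, 3 -> 6)
```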
Partitioning Behavior¶
How this expression affects partitioning:
- Preserves partitioning as it operates on individual rows
- Does not require shuffle operations
- Can be executed locally on each partition (see the sketch after this list)
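Because the expression is evaluated row by row, a projection that uses it should add no Exchange node to the physical plan. A quick way to check, assuming a hypothetical DataFrame `df` with a map column `m`:

```scala
import org.apache.spark.sql.functions._

// A pure projection over a per-row expression: the printed physical
// plan should contain no Exchange (shuffle) operator for this step.
df.select(transform_values(col("m"), (k, v) => v + lit(1))).explain()
```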
Edge Cases¶
- Null input map returns null
- Empty map returns empty map with same type signature
- If lambda function returns null for any value, that key-value pair will have null as the value
- Lambda function exceptions will propagate and cause query failure
- Duplicate keys, if present in the input map, are carried through unchanged, since the expression never modifies keys (the null and empty cases above are probed in the sketch after this list)
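The null-map, empty-map, and null-returning-lambda cases can be probed directly in SQL. A sketch assuming an active SparkSession named `spark`, with expected results shown as comments:

```scala
// Assumes an active SparkSession named "spark".
spark.sql("SELECT transform_values(CAST(NULL AS MAP<INT, INT>), (k, v) -> v)").show()
// NULL (null input map)
spark.sql("SELECT transform_values(map(), (k, v) -> v)").show()
// {} (empty map in, empty map out)
spark.sql("SELECT transform_values(map(1, 1), (k, v) -> CAST(NULL AS INT))").show()
// {1 -> null} (entry kept, value is null)
```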
Code Generation¶
This expression supports Spark's Tungsten code generation for optimized execution when the lambda function and map operations can be compiled to efficient Java bytecode.
Examples¶
```sql
-- Transform values by adding key to value
SELECT transform_values(map_from_arrays(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v);
-- Result: {1:2, 2:4, 3:6}

-- Transform string values to uppercase
SELECT transform_values(map('a', 'hello', 'b', 'world'), (k, v) -> upper(v));
-- Result: {'a':'HELLO', 'b':'WORLD'}
```

```scala
// DataFrame API usage
import org.apache.spark.sql.functions._
df.select(transform_values(col("map_column"), (k, v) => k + v))

// Using SQL expression
df.selectExpr("transform_values(map_column, (k, v) -> k + v)")
```
See Also¶
- transform_keys - transforms map keys instead of values
- map_from_arrays - creates maps from key and value arrays
- map_filter - filters map entries based on conditions
- aggregate - general aggregation with lambda functions