MapZipWith¶

Overview¶

MapZipWith is a higher-order function that combines two maps by applying a lambda function to corresponding key-value pairs. It creates a new map containing the union of all keys from both input maps, where the lambda function receives the key and values from both maps (or null if a key doesn't exist in one map) to compute the resulting value.

Syntax¶

map_zip_with(map1, map2, lambda_function)

// DataFrame API usage would be through expr() or selectExpr()
df.selectExpr("map_zip_with(map_col1, map_col2, (k, v1, v2) -> v1 + v2)")

Arguments¶

Argument	Type	Description
left	Map	The first input map
right	Map	The second input map
function	Lambda	A three-parameter lambda function (key, value1, value2) -> result

Return Type¶

Returns a MapType with the same key type as the input maps and value type determined by the lambda function's return type.

Supported Data Types¶

Input maps must have the same key type
Input maps can have different value types
Lambda function can return any supported Spark data type
Keys must be of a type that supports equality comparison

Algorithm¶

Collects all unique keys from both input maps
For each key, retrieves the corresponding values from both maps (null if key doesn't exist)
Applies the lambda function with parameters (key, value_from_left_map, value_from_right_map)
Constructs a new map with the key and the lambda function's result
Returns the combined map containing all processed key-value pairs

Partitioning Behavior¶

Preserves partitioning as it operates on individual rows
Does not require shuffle operations
Can be executed locally on each partition independently

Edge Cases¶

If a key exists in only one map, the missing value is passed as null to the lambda function
If either input map is null, the result is null
Empty maps are handled gracefully - the result contains only keys from the non-empty map
Lambda function must handle null values appropriately using functions like coalesce()
Duplicate processing is avoided by taking the union of keys rather than iterating both maps separately

Code Generation¶

This expression likely supports Tungsten code generation for the map iteration and key processing, but falls back to interpreted mode for lambda function evaluation, as higher-order functions typically require dynamic code evaluation.

Examples¶

-- Combine two maps by adding values, treating missing keys as 0
SELECT map_zip_with(
  map('a', 1, 'b', 2), 
  map('b', 3, 'c', 4), 
  (k, v1, v2) -> coalesce(v1, 0) + coalesce(v2, 0)
);
-- Result: {"a":1,"b":5,"c":4}

-- Combine maps with string concatenation
SELECT map_zip_with(
  map('x', 'hello', 'y', 'world'), 
  map('y', '!', 'z', 'new'), 
  (k, v1, v2) -> concat(coalesce(v1, ''), coalesce(v2, ''))
);
-- Result: {"x":"hello","y":"world!","z":"new"}

// DataFrame API usage
import org.apache.spark.sql.functions._

df.selectExpr("""
  map_zip_with(
    map_col1, 
    map_col2, 
    (k, v1, v2) -> coalesce(v1, 0) + coalesce(v2, 0)
  ) as combined_map
""")