MapConcat¶
Overview¶
MapConcat is a Catalyst expression that concatenates multiple maps into a single map. It takes a sequence of map expressions as input and merges them together, with later maps overwriting values for duplicate keys from earlier maps.
Syntax¶
Arguments¶
| Argument | Type | Description |
|---|---|---|
| children | Seq[Expression] | Variable number of map expressions to be concatenated |
Return Type¶
Returns a MapType with the same key and value types as the input maps.
Supported Data Types¶
Supports MapType expressions where:
- All input maps must have compatible key types
- All input maps must have compatible value types
- Keys can be any comparable data type
- Values can be any Spark SQL data type
Algorithm¶
The expression evaluates by:
- Iterating through each input map expression in order
- Creating a new result map starting with the first map's contents
- For each subsequent map, adding all key-value pairs to the result
- When duplicate keys are encountered, the value from the later map overwrites the earlier value
- Returns the final merged map containing all unique keys with their most recent values
Partitioning Behavior¶
This expression has neutral partitioning behavior:
- Preserves existing partitioning as it operates on individual rows
- Does not require shuffle operations
- Can be executed locally on each partition
Edge Cases¶
- Null handling: If any input map is null, the result will be null
- Empty maps: Empty maps are ignored in the concatenation process
- Duplicate keys: Values from maps appearing later in the argument list take precedence
- Type compatibility: All input maps must have compatible key and value types, otherwise compilation fails
- Single argument: If only one map is provided, returns that map unchanged
Code Generation¶
This expression supports Tungsten code generation for improved performance when processing large datasets with map concatenation operations.
Examples¶
-- Basic map concatenation
SELECT map_concat(map(1, 'a', 2, 'b'), map(3, 'c'));
-- Result: {1:"a", 2:"b", 3:"c"}
-- Handling duplicate keys (later values win)
SELECT map_concat(map(1, 'old'), map(1, 'new', 2, 'b'));
-- Result: {1:"new", 2:"b"}
-- Concatenating multiple maps
SELECT map_concat(map(1, 'a'), map(2, 'b'), map(3, 'c'));
-- Result: {1:"a", 2:"b", 3:"c"}
// DataFrame API usage
import org.apache.spark.sql.functions._
df.select(map_concat(
map(lit(1), lit("a"), lit(2), lit("b")),
map(lit(3), lit("c"))
))
// Using column references
df.select(map_concat(col("map1"), col("map2")))
See Also¶
map()- Creates a map from key-value pairsmap_keys()- Extracts keys from a mapmap_values()- Extracts values from a mapmap_entries()- Converts a map to an array of key-value structs