MapFilter¶
Overview¶
MapFilter is a higher-order function that filters map entries based on a lambda predicate function. It applies the provided lambda function to each key-value pair in the input map and returns a new map containing only the entries where the predicate evaluates to true.
Syntax¶
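In SQL, the expression is invoked as the `map_filter` function, taking the two arguments described below:

```sql
map_filter(map_expression, lambda_function)
```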
Arguments¶
| Argument | Type | Description |
|---|---|---|
| map_expression | MapType | The input map to be filtered |
| lambda_function | Lambda | A function that takes two parameters (key, value) and returns a boolean |
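A minimal invocation showing both arguments, with a map literal as `map_expression` and a `(key, value)` predicate as `lambda_function`:

```sql
SELECT map_filter(map('a', 1, 'b', 2), (k, v) -> v % 2 = 0);
-- Returns: {'b':2}
```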
Return Type¶
Returns a MapType with the same key and value types as the input map, containing only the entries that satisfy the predicate condition.
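Type preservation can be checked with Spark's `typeof` function; filtering a `MAP<INT, STRING>` yields the same map type:

```sql
SELECT typeof(map_filter(map(1, 'a', 2, 'b'), (k, v) -> k > 1));
-- Returns: map<int,string>
```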
Supported Data Types¶
- Input map can have keys and values of any supported Spark SQL data type
- The lambda function must return a boolean expression
- Key and value types are preserved in the output map
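For instance, values may themselves be complex types such as arrays, as long as the predicate still yields a boolean:

```sql
SELECT map_filter(
  map('short', array(1), 'long', array(1, 2, 3)),
  (k, v) -> size(v) > 1
);
-- Returns: {'long':[1,2,3]}
```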
Algorithm¶
- Iterates through each key-value pair in the input map
- Applies the lambda function to each pair, passing key as first argument and value as second argument
- Evaluates the lambda function result as a boolean predicate
- Includes the key-value pair in the result map only if the predicate returns true
- Constructs and returns a new map with the filtered entries
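These steps can be pictured as an equivalent rewrite using other Spark SQL builtins: convert the map to its entry array with `map_entries`, filter that array, and rebuild the map with `map_from_entries`:

```sql
SELECT map_from_entries(
  filter(map_entries(map(1, 0, 2, 2, 3, -1)), e -> e.key > e.value)
);
-- Returns: {1:0, 3:-1}, matching map_filter on the same input
```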
Partitioning Behavior¶
How this expression affects partitioning:
- Preserves existing partitioning since it operates row-by-row
- Does not require shuffle operations
- Can be applied as a projection without repartitioning
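One way to confirm this is to inspect the physical plan. Assuming a hypothetical table `events` with a MAP column `attrs`, the plan should show a simple Project over the scan, with no Exchange (shuffle) node:

```sql
EXPLAIN SELECT map_filter(attrs, (k, v) -> v IS NOT NULL) FROM events;
```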
Edge Cases¶
- If the input map is null, the result is null
- If the lambda function returns null for a key-value pair, that pair is excluded (null is treated as false)
- An empty input map returns an empty map of the same type
- An exception thrown by the lambda function fails the entire expression
- Maintains original key and value data types in filtered result
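The null-handling cases above can be reproduced directly:

```sql
-- Null input map yields NULL
SELECT map_filter(CAST(NULL AS MAP<INT, INT>), (k, v) -> true);
-- Returns: NULL

-- A NULL predicate result drops the entry (treated as false)
SELECT map_filter(map(1, CAST(NULL AS INT), 2, 5), (k, v) -> v > 0);
-- Returns: {2:5}
```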
Code Generation¶
This expression supports Catalyst code generation (Tungsten) for optimized runtime performance when the lambda function can be code-generated.
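To see how the expression is actually evaluated within a given plan, the generated code can be dumped with `EXPLAIN CODEGEN`:

```sql
EXPLAIN CODEGEN SELECT map_filter(map(1, 0, 2, 2), (k, v) -> k > v);
```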
Examples¶
```sql
-- Filter map entries where the key is greater than the value
SELECT map_filter(map(1, 0, 2, 2, 3, -1), (k, v) -> k > v);
-- Returns: {1:0, 3:-1}

-- Filter a string map by key length
SELECT map_filter(map('a', 'apple', 'bb', 'banana'), (k, v) -> length(k) > 1);
-- Returns: {'bb':'banana'}
```
```scala
// DataFrame API usage (Scala); `df` is an existing DataFrame
import org.apache.spark.sql.functions._

// Keep entries whose key is greater than their value
df.select(map_filter(col("data_map"), (k, v) => k > v))

// Filter a map column on both keys and values
df.withColumn(
  "filtered_map",
  map_filter(col("original_map"), (key, value) => key.isNotNull && value > 0))
```
See Also¶
- map_zip_with - applies a function to corresponding pairs from two maps
- transform_keys - transforms map keys using a lambda function
- transform_values - transforms map values using a lambda function
- filter - filters array elements using a predicate function