ArraysOverlap¶
Overview¶
The ArraysOverlap expression determines whether two arrays share at least one non-null element. It returns true if such an element exists, null if no common element is found but both arrays are non-empty and at least one of them contains a null element, and false otherwise.
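The three possible outcomes can be pinned down with a small reference model. The sketch below is plain Scala, not Spark code: `OverlapModel` and `arraysOverlap` are hypothetical names, `None` stands in for a SQL null (both as an element and as the result), and it assumes an empty array never overlaps anything, as described under Edge Cases.

```scala
// Hypothetical reference model of the result semantics (not Spark's implementation).
object OverlapModel {
  // Some(x) models a non-null element, None models a null element.
  // The result is Some(true), Some(false), or None for SQL NULL.
  def arraysOverlap[A](left: Seq[Option[A]], right: Seq[Option[A]]): Option[Boolean] = {
    val leftValues: Set[A] = left.flatten.toSet
    val hasCommon = right.flatten.exists(v => leftValues.contains(v))
    val hasNull = left.contains(None) || right.contains(None)

    if (hasCommon) Some(true)                            // a shared non-null element exists
    else if (left.isEmpty || right.isEmpty) Some(false)  // an empty side cannot overlap
    else if (hasNull) None                               // unknown: a null might have matched
    else Some(false)
  }
}
```

Under this model, `Seq(Some(1), None, Some(3))` compared with `Seq(Some(4), Some(5))` evaluates to `None`, which mirrors the null result shown in the SQL examples.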
Syntax¶
`arrays_overlap(left, right)`
Arguments¶
| Argument | Type | Description |
|---|---|---|
| left | ArrayType | The first array to compare |
| right | ArrayType | The second array to compare |
Return Type¶
BooleanType - Returns true, false, or null
Supported Data Types¶
Supports arrays whose element type can be ordered, including:
- Numeric types (IntegerType, LongType, DoubleType, etc.)
- String types
- Date and timestamp types
- Complex types built from orderable types (StructType, ArrayType)
- MapType elements do not qualify, because maps have no defined ordering
Algorithm¶
- Optimizes performance by identifying the smaller and larger arrays
- For data types with proper `equals()` implementation, uses a fast HashSet-based approach
- For complex data types, falls back to a brute-force nested loop with ordering comparison
- Short-circuits on first match found, returning `true` immediately
- Tracks null presence throughout evaluation for proper three-valued logic
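The hash-based path described above can be sketched in standalone Scala. This is an illustration under stated assumptions, not the Spark source: `FastOverlap` and `overlap` are hypothetical names, arrays are modelled as `Seq[Option[A]]` with `None` for a null element, and `None` as a result stands for SQL NULL.

```scala
import scala.collection.mutable

// Hypothetical sketch of the HashSet-based strategy (not Spark's implementation).
object FastOverlap {
  def overlap[A](arr1: Seq[Option[A]], arr2: Seq[Option[A]]): Option[Boolean] = {
    // Build the set from the smaller array and probe it with the larger one.
    val (smaller, bigger) =
      if (arr1.length <= arr2.length) (arr1, arr2) else (arr2, arr1)

    var hasNull = false
    if (smaller.nonEmpty) {
      val seen = mutable.HashSet.empty[A]
      smaller.foreach {
        case Some(v) => seen += v
        case None    => hasNull = true
      }
      val it = bigger.iterator
      while (it.hasNext) {
        it.next() match {
          case Some(v) if seen.contains(v) => return Some(true) // short-circuit on first match
          case Some(_)                     => ()                // no match, keep scanning
          case None                        => hasNull = true    // remember the null
        }
      }
    }
    // No match: the answer is unknown if a null was seen, false otherwise.
    if (hasNull) None else Some(false)
  }
}
```

Hashing the smaller array keeps the set small, and probing with the larger array lets the scan stop at the first hit. Note that nulls are only recorded when both arrays are non-empty, so an empty input yields `Some(false)` in this sketch.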
Partitioning Behavior¶
- Preserves input partitioning, since each row is evaluated independently using the two arrays in that row
- Does not require shuffle operations
- Can be pushed down in query optimization
- Maintains data locality for efficient distributed processing
Edge Cases¶
Null Handling:
- Returns `null` if any element in either array is `null`, both arrays are non-empty, and no overlap is found
- Follows SQL three-valued logic for null propagation
- Expression is null-intolerant, meaning a null array argument produces a null result
Empty Arrays:
- Returns `false` when comparing empty arrays
- Returns `false` when one array is empty regardless of the other array's contents
Special Cases:
- Arrays with different element types are cast to a common type if possible
- Requires elements to have ordering capability for comparison
- Uses `TypeUtils.typeWithProperEquals()` to determine evaluation strategy
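Why the evaluation strategy depends on `equals()` can be seen with JVM byte arrays, whose `equals()` compares references rather than contents. The sketch below is standalone Scala and only illustrates the idea: `ContentCompare`, `hashLookup`, `bruteForceOverlap`, and `byteArrayOrdering` are hypothetical names, the ordering uses signed bytes for simplicity, and `None` models a null element or a SQL NULL result.

```scala
// Hypothetical illustration of hash lookup vs ordering-based comparison.
object ContentCompare {
  // Lexicographic ordering over byte arrays (signed bytes, for illustration only).
  val byteArrayOrdering: Ordering[Array[Byte]] = new Ordering[Array[Byte]] {
    def compare(x: Array[Byte], y: Array[Byte]): Int = {
      val n = math.min(x.length, y.length)
      var i = 0
      while (i < n) {
        val c = java.lang.Byte.compare(x(i), y(i))
        if (c != 0) return c
        i += 1
      }
      Integer.compare(x.length, y.length)
    }
  }

  // Set membership relies on equals(), which is reference equality for JVM arrays.
  def hashLookup(left: Array[Byte], right: Array[Byte]): Boolean =
    Set(left).contains(right)

  // Brute-force fallback: compare every non-null pair through the ordering.
  def bruteForceOverlap[A](
      left: Seq[Option[A]],
      right: Seq[Option[A]],
      ord: Ordering[A]): Option[Boolean] = {
    val found = left.exists {
      case Some(l) => right.exists {
        case Some(r) => ord.equiv(l, r)
        case None    => false
      }
      case None => false
    }
    val hasNull = left.contains(None) || right.contains(None)
    if (found) Some(true)
    else if (left.nonEmpty && right.nonEmpty && hasNull) None
    else Some(false)
  }
}
```

Two distinct byte arrays with the same contents are missed by `hashLookup` but matched by `bruteForceOverlap`, which is the kind of case the ordering-based path exists to handle, at the cost of comparing every pair of elements.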
Code Generation¶
Supports Tungsten code generation for optimal performance:
- Generates specialized code paths for fast (HashSet) vs brute-force evaluation
- Produces efficient nested loops with early termination conditions
- Implements null-safe element access with runtime checks
- Falls back to interpreted mode for unsupported data types
Examples¶
```sql
-- Basic usage
SELECT arrays_overlap(array(1, 2, 3), array(3, 4, 5)); -- true
SELECT arrays_overlap(array(1, 2), array(3, 4)); -- false

-- With string arrays
SELECT arrays_overlap(array('a', 'b'), array('b', 'c')); -- true

-- Null handling
SELECT arrays_overlap(array(1, null, 3), array(4, 5)); -- null
SELECT arrays_overlap(array(1, null, 3), array(1, 4)); -- true
```
```scala
// DataFrame API usage
import org.apache.spark.sql.functions._

df.select(arrays_overlap(col("tags1"), col("tags2")))

// With literal arrays
df.select(arrays_overlap(array(lit(1), lit(2)), col("numbers")))
```
See Also¶
- `array_intersect` - Returns the intersection of two arrays
- `array_union` - Returns the union of two arrays
- `array_except` - Returns elements in first array but not in second
- `array_contains` - Checks if array contains a specific element