ArrayContains¶
Overview¶
The ArrayContains expression checks whether an array contains a specific value. It performs an element-wise comparison using ordering semantics and returns a boolean result indicating presence of the value in the array.
Syntax¶
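// DataFrame API (org.apache.spark.sql.functions)
array_contains(col("array_column"), value)
// or with an explicit literal
array_contains(col("array_column"), lit(value))
// Note: Column.contains performs string matching, not array membership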
Arguments¶
| Argument | Type | Description |
|---|---|---|
| left | ArrayType | The array expression to search within |
| right | Any comparable type | The value to search for in the array |
Return Type¶
Returns BooleanType - true if the value is found, false if not found, null if the result is indeterminate due to null values.
Supported Data Types¶
The expression supports arrays of any data type that has ordering semantics, including:
- Numeric types (IntegerType, LongType, DoubleType, etc.)
- StringType
- DateType and TimestampType
- BooleanType
- Complex types with defined ordering
The array element type and search value type must be compatible through type coercion rules.
Algorithm¶
- Iterates through each element in the input array using `ArrayData.foreach()`
- Compares each non-null element with the search value using `Ordering.equiv()`
- Returns `true` immediately when a matching element is found
- Tracks presence of null elements during iteration
- Returns `null` if no match is found but null elements exist (indeterminate result)
- Returns `false` if no match is found and no null elements exist
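The steps above can be sketched in plain Scala with no Spark dependency. This is a simplified model, not Spark's implementation: `Option` stands in for SQL NULL, and the names `ArrayContainsSketch` and `arrayContains` are made up for illustration.

```scala
// Simplified model of the three-valued containment check.
// Some(true) / Some(false) are definite answers; None models SQL NULL.
object ArrayContainsSketch {
  def arrayContains[T](arr: Option[Seq[Option[T]]], value: Option[T])
                      (implicit ord: Ordering[T]): Option[Boolean] =
    (arr, value) match {
      case (Some(elems), Some(v)) =>
        var found = false
        var hasNull = false
        val it = elems.iterator
        // Single pass that stops as soon as a match is found.
        while (!found && it.hasNext) {
          it.next() match {
            case None    => hasNull = true                    // remember the null, never compare it
            case Some(e) => if (ord.equiv(e, v)) found = true // ordering-based equality
          }
        }
        if (found) Some(true)
        else if (hasNull) None // no match, but a null element might have matched
        else Some(false)
      case _ => None           // null array or null search value
    }

  def main(args: Array[String]): Unit = {
    println(arrayContains(Some(Seq(Some(1), Some(2), Some(3))), Some(2))) // Some(true)
    println(arrayContains(Some(Seq(Some(1), None, Some(3))), Some(2)))    // None
    println(arrayContains(Some(Seq(Some(1), Some(2), Some(3))), Some(5))) // Some(false)
    println(arrayContains(Some(Seq.empty[Option[Int]]), Some(1)))         // Some(false)
  }
}
```

Note that a match wins over a null: an array containing both a null and the search value yields `Some(true)` in this model, which mirrors the "returns `true` immediately" step.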
Partitioning Behavior¶
This expression does not affect partitioning behavior:
- Preserves existing partitioning as it operates on individual rows
- Does not require shuffle operations
- Is evaluated row by row within each partition, so partitions are processed in parallel
Edge Cases¶
Null Handling:
- Returns null if either the array or search value is null
- Returns null if array contains null elements and no match is found
- Null elements in array are tracked but not compared for equality
Empty Array:
- Returns false for empty arrays (no elements to match)
Type Compatibility:
- Fails analysis with a DataTypeMismatch error if array element type and search value type are incompatible
- Fails analysis with a DataTypeMismatch error if either input is NullType
- Uses type coercion to find wider compatible types when possible
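Because of the null-handling rules above, a caller who needs a strict two-valued answer may want to map the indeterminate `null` to `false`. The following is a minimal sketch assuming a local Spark session; the object name `ArrayContainsNullHandling`, the helper `withContainsColumns`, and the column names `numbers`, `raw`, and `null_as_false` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array_contains, coalesce, col, lit}

object ArrayContainsNullHandling {
  // Adds the raw three-valued result and a variant where NULL becomes false.
  // Assumes the input has an array<int> column named "numbers" (hypothetical).
  def withContainsColumns(df: DataFrame, value: Int): DataFrame =
    df.withColumn("raw", array_contains(col("numbers"), value))
      .withColumn("null_as_false",
        coalesce(array_contains(col("numbers"), value), lit(false)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("array-contains-nulls")
      .getOrCreate()

    val df = spark.sql(
      """SELECT 1 AS id, array(1, 2, 3) AS numbers
        |UNION ALL
        |SELECT 2 AS id, array(1, NULL, 3) AS numbers""".stripMargin)

    // id 1: raw is true; id 2: raw is NULL and null_as_false is false
    withContainsColumns(df, 2).show()

    spark.stop()
  }
}
```

This matters mostly under negation: in a `filter`, a `null` predicate already drops the row, but `!array_contains(...)` also evaluates to `null` for such rows, so they are dropped from both the positive and the negated filter unless the result is coalesced first.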
Code Generation¶
This expression supports Tungsten code generation through the doGenCode method:
- Generates optimized Java code for the containment check
- Uses `nullSafeCodeGen` to handle null safety efficiently
- Iterates over the array elements with a generated `for` loop that exits on the first match
- Optimizes null checking logic based on array nullability
Examples¶
-- Basic usage
SELECT array_contains(array(1, 2, 3), 2);
-- Returns: true
-- With null elements
SELECT array_contains(array(1, null, 3), 2);
-- Returns: null (indeterminate)
-- String arrays
SELECT array_contains(array('a', 'b', 'c'), 'b');
-- Returns: true
-- Not found
SELECT array_contains(array(1, 2, 3), 5);
-- Returns: false
// DataFrame API usage
import org.apache.spark.sql.functions._
df.select(array_contains(col("numbers"), lit(42)))
// Using a SQL expression string
df.select(expr("array_contains(array_col, 'search_value')"))
// With complex expressions
df.filter(array_contains(col("tags"), col("search_tag")))
See Also¶
- `ArrayPosition` - finds the position of an element in an array
- `ArrayExists` - checks if any element satisfies a predicate
- `ArraysOverlap` - checks if two arrays have common elements
- `In` - checks membership in a list of values