ArrayBinarySearch¶
Overview¶
ArrayBinarySearch performs a binary search operation on a sorted array to find the index of a specified value. It returns the index of the element if found, or a negative value indicating the insertion point if not found, following Java's Arrays.binarySearch semantics.
Syntax¶
Arguments¶
| Argument | Type | Description |
|---|---|---|
| array | ArrayType | The sorted array to search in |
| value | Compatible with array element type | The value to search for in the array |
Return Type¶
IntegerType - Returns the index of the found element or negative insertion point.
Supported Data Types¶
The array element type and search value must be compatible types that support ordering operations:
- Numeric types (IntegerType, LongType, FloatType, DoubleType, etc.)
- StringType
- DateType
- TimestampType
- Any type that has a defined ordering
The expression uses type coercion to find the tightest common type between the array element type and search value type.
Algorithm¶
The expression is implemented as a RuntimeReplaceable that delegates to Java's binary search:
- For primitive types: Uses Arrays.binarySearch directly on the converted Java array
- For complex types: Uses Arrays.binarySearch with a custom Comparator based on Spark's PhysicalDataType.ordering
- Converts the Spark array to a Java array using ToJavaArray
- Invokes ArrayExpressionUtils.binarySearch via StaticInvoke
- Returns standard binary search semantics: exact index if found, or -(insertion_point + 1) if not found
Partitioning Behavior¶
This expression does not affect partitioning:
- Preserves existing partitioning as it operates on individual array values within each partition
- Does not require shuffle operations
- Can be executed independently on each partition
Edge Cases¶
- Null array: Throws DataTypeMismatch error with NULL_TYPE subclass
- Null search value: Throws DataTypeMismatch error with NULL_TYPE subclass
- Null elements in array: Handled by custom comparator where nulls are ordered before non-null values
- Empty array: Returns -1 (standard binary search behavior for empty collections)
- Unsorted array: Undefined behavior as binary search requires sorted input
- Incompatible types: Throws DataTypeMismatch error with ARRAY_FUNCTION_DIFF_TYPES subclass
Code Generation¶
This expression does not use Tungsten code generation. It is implemented as a RuntimeReplaceable that generates a StaticInvoke expression calling into ArrayExpressionUtils.binarySearch, which executes in interpreted mode.
Examples¶
-- Search in sorted integer array
SELECT array_binary_search(array(1, 3, 5, 7, 9), 5);
-- Returns: 2
-- Search for non-existent value
SELECT array_binary_search(array(1.0F, 2.0F, 3.0F), 1.1F);
-- Returns: -2
-- Search in string array
SELECT array_binary_search(array('apple', 'banana', 'cherry'), 'banana');
-- Returns: 1
// DataFrame API usage
import org.apache.spark.sql.functions._
df.select(expr("array_binary_search(sorted_array_col, search_value)"))
// With literal array
df.select(expr("array_binary_search(array(1, 3, 5, 7), 3)"))
See Also¶
array_contains- Check if array contains a value without requiring sorted ordersort_array- Sort an array before performing binary searcharray_position- Find first occurrence of value in unsorted array