SortArray¶
Overview¶
The SortArray expression sorts the elements of an array in either ascending or descending order. It returns a new array with the same elements sorted according to the specified ordering; null elements are placed first in ascending order and last in descending order.
Syntax¶
`sort_array(base[, ascendingOrder])`
Arguments¶
| Argument | Type | Description |
|---|---|---|
| base | ArrayType | The input array to be sorted |
| ascendingOrder | BooleanType | Optional. If true (default), sorts in ascending order; if false, sorts in descending order. Must be a foldable expression (constant). |
Return Type¶
Returns an ArrayType with the same element type and nullability as the input array.
Supported Data Types¶
Supports arrays containing any orderable data types:
- Numeric types (IntegerType, LongType, DoubleType, FloatType, etc.)
- String types (StringType)
- Date and timestamp types
- Binary types
- Does NOT support arrays whose elements are non-orderable, such as MapType or nested types that contain a MapType
Algorithm¶
- Creates specialized comparators (`lt` for ascending, `gt` for descending) that handle null values consistently
- Converts the input ArrayData to a Java array for efficient sorting
- Uses `java.util.Arrays.sort()` with the appropriate comparator
- Null values are sorted to the beginning in ascending order and to the end in descending order
- For primitive non-nullable arrays, uses optimized primitive array sorting when possible
- Returns a new GenericArrayData or UnsafeArrayData with sorted elements
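The comparator-based steps above can be modelled outside Spark in a few lines. The following is a minimal sketch, not Spark's actual implementation: `SortArraySketch` and `sortArray` are hypothetical names, the element type is fixed to boxed `Integer` instead of `ArrayData`, and the sort is delegated to `java.util.Arrays.sort` with one of the two comparators.

```scala
import java.util.{Arrays, Comparator}

// Hypothetical model of the null-aware comparators; not Spark source code.
object SortArraySketch {
  // Ascending comparator: null compares as smaller than any non-null value.
  val lt: Comparator[Integer] = new Comparator[Integer] {
    override def compare(a: Integer, b: Integer): Int =
      if (a == null && b == null) 0
      else if (a == null) -1
      else if (b == null) 1
      else a.compareTo(b)
  }

  // Descending comparator: the ascending comparison with its arguments
  // swapped, which also moves nulls to the end.
  val gt: Comparator[Integer] = new Comparator[Integer] {
    override def compare(a: Integer, b: Integer): Int = lt.compare(b, a)
  }

  // Copies the input into a Java array, sorts the copy, and returns it,
  // leaving the input untouched.
  def sortArray(input: Seq[Integer], ascending: Boolean = true): List[Integer] = {
    val data: Array[Integer] = input.toArray
    Arrays.sort(data, if (ascending) lt else gt)
    data.toList
  }
}
```

With this model, `SortArraySketch.sortArray(Seq[Integer](3, null, 1, null, 2))` yields `List(null, null, 1, 2, 3)`, and passing `ascending = false` yields `List(3, 2, 1, null, null)`, matching the null placement described above.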
Partitioning Behavior¶
This expression preserves partitioning:
- Does not require shuffle operations
- Operates on individual arrays within each partition
- Maintains the same number of rows and partitioning scheme
Edge Cases¶
- Null arrays: Returns null if the input array is null
- Empty arrays: Returns an empty array of the same type
- Arrays with all nulls: Returns an array with all nulls in the same positions (nulls are equal in comparison)
- Mixed null and non-null elements: Nulls are consistently placed at the beginning (ascending) or end (descending)
- Non-foldable ascendingOrder: Throws DataTypeMismatch error - the ordering parameter must be a constant
- Non-orderable element types: Throws DataTypeMismatch error during type checking
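Several of these cases can be checked directly against a local session. The sketch below assumes Spark SQL is on the classpath and a classic (non-Connect) session in which `spark.sql` analyses the query eagerly. `SortArrayEdgeCases` and `failsAnalysis` are hypothetical names, and the failure is matched only as a general `AnalysisException` rather than a specific error class.

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}
import scala.util.Try

// Hypothetical helper object for probing sort_array edge cases.
object SortArrayEdgeCases {
  // True when analysing the query fails with an AnalysisException.
  // Assumes spark.sql analyses eagerly (classic, non-Connect session).
  def failsAnalysis(spark: SparkSession, query: String): Boolean =
    Try(spark.sql(query)).failed.toOption.exists(_.isInstanceOf[AnalysisException])

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("sort-array-edge-cases")
      .getOrCreate()
    try {
      val row = spark.sql(
        """SELECT
          |  sort_array(CAST(NULL AS ARRAY<INT>))        AS null_input,
          |  sort_array(CAST(array() AS ARRAY<INT>))     AS empty_input,
          |  sort_array(array(CAST(NULL AS INT), NULL))  AS all_nulls,
          |  sort_array(array(3, NULL, 1), false)        AS mixed_desc
          |""".stripMargin).head()

      println(s"null input:  ${row.get(0)}")
      println(s"empty input: ${row.get(1)}")
      println(s"all nulls:   ${row.get(2)}")
      println(s"mixed desc:  ${row.get(3)}")

      // The ordering flag depends on a column, so it is not foldable.
      val nonFoldable = "SELECT sort_array(array(2, 1), id % 2 = 0) FROM range(1)"
      // Map elements are not orderable.
      val nonOrderable = "SELECT sort_array(array(map(1, 2)))"
      println(s"non-foldable flag rejected:    ${failsAnalysis(spark, nonFoldable)}")
      println(s"non-orderable element rejected: ${failsAnalysis(spark, nonOrderable)}")
    } finally {
      spark.stop()
    }
  }
}
```

The exact text printed for each array depends on the Spark and Scala versions, so the sketch prints the values rather than asserting a particular string form.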
Code Generation¶
Supports full code generation (Tungsten):
- Generates optimized code paths for primitive types without nulls
- Falls back to object-based sorting for complex types or nullable elements
- Uses `UnsafeArrayData.fromPrimitiveArray()` for primitive array optimizations
- Implements custom Comparator generation for null-safe comparisons
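To see which of these paths is taken for a given query, the generated Java source can be printed with `Dataset.explain("codegen")`. This is a minimal sketch, assuming Spark 3.0 or later on the classpath; `SortArrayCodegen` and `build` are hypothetical names, and the printed source varies by Spark version.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object for inspecting generated code.
object SortArrayCodegen {
  // The array is built from the `id` column rather than from literals,
  // so the optimizer cannot constant-fold the whole expression away.
  def build(spark: SparkSession): DataFrame =
    spark.range(3).selectExpr("sort_array(array(id, id * 2, 1L), false) AS sorted")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("sort-array-codegen")
      .getOrCreate()
    try {
      // Prints the physical plan together with the generated Java source.
      build(spark).explain("codegen")
    } finally {
      spark.stop()
    }
  }
}
```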
Examples¶
```sql
-- Basic ascending sort (default)
SELECT sort_array(array(3, 1, 4, 1, 5)) AS sorted;
-- Result: [1, 1, 3, 4, 5]

-- Descending sort
SELECT sort_array(array('d', 'c', 'b', 'a', null), false) AS sorted_desc;
-- Result: ['d', 'c', 'b', 'a', null]

-- With null values (ascending)
SELECT sort_array(array(3, null, 1, null, 2)) AS sorted_with_nulls;
-- Result: [null, null, 1, 2, 3]
```
```scala
// DataFrame API usage
import org.apache.spark.sql.functions._

df.select(sort_array(col("numbers"))).show()
// The Scala overload takes a plain Boolean for the sort direction, not a Column
df.select(sort_array(col("strings"), asc = false)).show()

// With column reference for array
df.withColumn("sorted_values", sort_array(col("value_array"))).show()
```
See Also¶
- `array_sort` - Related function that sorts in ascending order with null elements placed last, and optionally accepts a custom comparator
- `array_max`, `array_min` - For finding extremes without full sorting
- `shuffle` - For randomizing array element order
- `reverse` - For reversing array element order