SortOrder
Overview
SortOrder is a Catalyst expression that represents the sort ordering specification for a single column or expression in Apache Spark SQL. It encapsulates the child expression to sort by, the sort direction (ascending or descending), the null ordering, and any other expressions known to produce the same sort order.
Syntax
-- SQL syntax (used in ORDER BY clauses)
ORDER BY column_expression [ASC|DESC] [NULLS FIRST|NULLS LAST]
// DataFrame API usage
df.sort(col("column_name").asc_nulls_first)
df.orderBy(col("column_name").desc_nulls_last)
Arguments
| Argument | Type | Description |
|---|---|---|
| child | Expression | The expression to be sorted |
| direction | SortDirection | Sort direction (Ascending or Descending) |
| nullOrdering | NullOrdering | How null values should be ordered (NullsFirst or NullsLast) |
| sameOrderExpressions | Seq[Expression] | Expressions known to have the same sort order as child, derived from operator equivalence relations; may be empty |
Return Type
SortOrder itself is Unevaluable and does not return a value. It inherits the data type from its child expression for type checking purposes only.
Supported Data Types
Supports any data type that implements ordering semantics. The checkInputDataTypes() method validates that the child expression's data type supports ordering using TypeUtils.checkForOrderingExpr().
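To illustrate what "supports ordering" can mean for nested types, the following plain-Scala sketch models the check recursively. The DType hierarchy and the isOrderable function are hypothetical stand-ins, not Spark classes. The rule that map types are not orderable is an assumption of this sketch, based on Spark's usual behavior, and is not something stated above.

```scala
object OrderableModel {
  // Hypothetical, simplified type hierarchy (not Spark's DataType classes).
  sealed trait DType
  case object IntT extends DType
  case object StringT extends DType
  final case class ArrayT(element: DType) extends DType
  final case class StructT(fields: Seq[DType]) extends DType
  final case class MapT(key: DType, value: DType) extends DType

  // A type is orderable if values of that type can be compared.
  // Assumption of this sketch: atomic types are orderable, arrays and
  // structs are orderable when their parts are, and maps are not.
  def isOrderable(t: DType): Boolean = t match {
    case IntT | StringT  => true
    case ArrayT(element) => isOrderable(element)
    case StructT(fields) => fields.forall(isOrderable)
    case MapT(_, _)      => false
  }
}
```

Under this model, a struct containing a map field is rejected because one of its parts cannot be compared, which is the kind of failure the type validation step is meant to surface before any sorting takes place.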
Algorithm
- SortOrder is a metadata expression that specifies sorting behavior rather than computing values
- The child expression provides the actual values to be compared
- Direction and null ordering control the comparison semantics
- Same order expressions enable optimization by recognizing equivalent sort keys
- Used primarily by sort-based operators like Sort, SortMergeJoin, and window functions
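The way direction and null ordering combine into comparison semantics can be sketched in plain Scala. This is a minimal model, not Spark's implementation: the Direction and Nulls types and the ordering helper are hypothetical names introduced for illustration, and SQL NULL is modelled as None.

```scala
object SortOrderModel {
  // Hypothetical stand-ins for the direction and null-ordering settings.
  sealed trait Direction
  case object Asc extends Direction
  case object Desc extends Direction

  sealed trait Nulls
  case object NullsFirst extends Nulls
  case object NullsLast extends Nulls

  // Build an Ordering over nullable values from a direction and a null placement.
  def ordering[T](direction: Direction, nulls: Nulls)(implicit base: Ordering[T]): Ordering[Option[T]] = {
    // The direction only affects how two non-null values compare.
    val valueOrd: Ordering[T] = direction match {
      case Asc  => base
      case Desc => base.reverse
    }
    new Ordering[Option[T]] {
      def compare(a: Option[T], b: Option[T]): Int = (a, b) match {
        case (None, None)       => 0
        // Null placement is decided independently of the direction.
        case (None, Some(_))    => if (nulls == NullsFirst) -1 else 1
        case (Some(_), None)    => if (nulls == NullsFirst) 1 else -1
        case (Some(x), Some(y)) => valueOrd.compare(x, y)
      }
    }
  }
}

// Usage: descending values with nulls placed last.
// Seq[Option[Int]](Some(3), None, Some(1))
//   .sorted(SortOrderModel.ordering[Int](SortOrderModel.Desc, SortOrderModel.NullsLast))
// yields Some(3), Some(1), None
```

Keeping null placement separate from the value comparison is what allows combinations such as descending order with nulls first, which simply reversing an ascending comparator would not produce.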
Partitioning Behavior
SortOrder itself does not directly affect partitioning, but it influences operators that use it:
- Sort operations: May require a shuffle if global ordering is needed
- SortMergeJoin: Can preserve partitioning when join keys align with partition keys
- Window functions: May require repartitioning based on partition keys vs sort keys
Edge Cases
- Null handling: Behavior is determined by the nullOrdering parameter (NULLS FIRST or NULLS LAST). When it is not specified, ascending order defaults to NULLS FIRST and descending order to NULLS LAST
- Expression equivalence: The satisfies() method checks semantic equivalence using sameOrderExpressions
- Type validation: Fails if the child expression's data type doesn't support ordering
- Empty sameOrderExpressions: Valid case when no equivalent expressions exist
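The equivalence check can be pictured with a small plain-Scala model. It is a sketch under stated assumptions rather than Spark's code: expressions are represented as strings, "semantic equality" is reduced to a case-insensitive comparison, and the Order class and its fields are hypothetical names. The model assumes that an existing order satisfies a required one when the required child matches either the child or one of the same-order expressions, and direction and null ordering also match.

```scala
object SatisfiesModel {
  // Stand-in for semantic equality between expressions (assumption: the
  // real check compares expression trees, not strings).
  private def semanticEquals(a: String, b: String): Boolean =
    a.trim.equalsIgnoreCase(b.trim)

  // Hypothetical, simplified sort order: expressions are plain strings.
  final case class Order(
      child: String,
      ascending: Boolean,
      nullsFirst: Boolean,
      sameOrder: Seq[String] = Seq.empty) {

    // True when this order can stand in for the required one.
    def satisfies(required: Order): Boolean = {
      val candidates = child +: sameOrder
      candidates.exists(e => semanticEquals(e, required.child)) &&
        ascending == required.ascending &&
        nullsFirst == required.nullsFirst
    }
  }
}
```

With an empty sameOrder list the check falls back to comparing the child alone, which is why an empty sameOrderExpressions is a valid and common case.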
Code Generation
SortOrder is marked as Unevaluable, so it does not participate in code generation directly. The actual sorting logic is handled by physical operators that consume SortOrder specifications and generate optimized comparison code.
Examples
-- Basic ascending sort
SELECT * FROM table ORDER BY col1 ASC NULLS FIRST
-- Descending with nulls last
SELECT * FROM table ORDER BY col2 DESC NULLS LAST
-- Multiple sort orders
SELECT * FROM table ORDER BY col1 ASC, col2 DESC NULLS FIRST
// DataFrame API usage
import org.apache.spark.sql.functions._
// Ascending with nulls first
df.orderBy(col("col1").asc_nulls_first)
// Descending with nulls last
df.orderBy(col("col2").desc_nulls_last)
// Multiple sort orders
df.orderBy(col("col1").asc, col("col2").desc_nulls_first)
// Using sort instead of orderBy
df.sort(col("col1").desc)
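For the multi-column case, it can help to see how several sort orders compose: later keys only break ties left by earlier ones. The following plain-Scala sketch models `ORDER BY col1 ASC, col2 DESC NULLS FIRST` without using Spark. The Rec type is hypothetical, None stands in for NULL, and the sketch assumes Scala 2.13 or later for Ordering.orElse.

```scala
object MultiKeyModel {
  // Hypothetical record type; None stands in for SQL NULL.
  final case class Rec(col1: Option[Int], col2: Option[String])

  // col1 ASC with nulls first: Scala's Option ordering already puts None first.
  val byCol1: Ordering[Rec] =
    Ordering.by[Rec, Option[Int]](_.col1)(Ordering.Option[Int])

  // col2 DESC NULLS FIRST: reverse only the value ordering so None stays first.
  val byCol2: Ordering[Rec] =
    Ordering.by[Rec, Option[String]](_.col2)(Ordering.Option(Ordering[String].reverse))

  // Compare by col1, and fall back to col2 only when col1 ties.
  val combined: Ordering[Rec] = byCol1.orElse(byCol2)
}
```

Sorting a collection with `MultiKeyModel.combined` places records with a null col1 first, and within equal col1 values lists a null col2 before the non-null values in descending order.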
See Also
- SortDirection: Enumeration for Ascending/Descending
- NullOrdering: Enumeration for NullsFirst/NullsLast
- Sort: Logical operator for sorting, executed physically by SortExec
- SortMergeJoin: Join operator that uses sort ordering
- WindowExec: Window function operator that requires sorted input