Size¶
Overview¶
The Size expression returns the number of elements in an array or map. It supports both legacy and modern null handling behavior, where legacy mode returns -1 for null inputs while modern mode returns null for null inputs.
Syntax¶
Arguments¶
| Argument | Type | Description |
|---|---|---|
| child | ArrayType or MapType | The array or map expression whose size is to be calculated |
| legacySizeOfNull | Boolean | Internal flag controlling null handling behavior (defaults to SQLConf.legacySizeOfNull) |
Return Type¶
IntegerType - Returns an integer representing the number of elements.
Supported Data Types¶
ArrayType- Any array type regardless of element typeMapType- Any map type regardless of key/value types
Algorithm¶
- Evaluates the child expression to get the array or map value
- If the value is null, returns -1 (legacy mode) or null (modern mode)
- For ArrayType, calls
numElements()on the ArrayData instance - For MapType, calls
numElements()on the MapData instance - Throws an error for unsupported operand types
Partitioning Behavior¶
This expression preserves partitioning as it operates on individual rows without requiring data movement:
- Does not require shuffle operations
- Maintains existing partitioning scheme
- Can be executed independently on each partition
Edge Cases¶
- Null handling: Returns -1 in legacy mode (
legacySizeOfNull = true) or null in modern mode for null inputs - Empty collections: Returns 0 for empty arrays or maps
- Unsupported types: Throws
QueryExecutionErrors.unsupportedOperandTypeForSizeFunctionErrorfor non-array/map types - Nullability: Expression is non-nullable in legacy mode, nullable in modern mode
Code Generation¶
This expression supports Tungsten code generation with optimized paths:
- Legacy mode: Generates specialized code that sets
isNull = falseand returns -1 for null inputs - Modern mode: Uses
defineCodeGenhelper for standard null-safe code generation - Both modes generate efficient
numElements()calls without interpretation overhead
Examples¶
-- Array size
SELECT SIZE(array(1, 2, 3, 4));
-- Returns: 4
-- Map size
SELECT SIZE(map('a', 1, 'b', 2));
-- Returns: 2
-- Empty array
SELECT SIZE(array());
-- Returns: 0
-- Null handling (legacy mode)
SELECT SIZE(null);
-- Returns: -1
// DataFrame API usage
import org.apache.spark.sql.functions.size
df.select(size($"array_column"))
df.select(size($"map_column"))
See Also¶
array- Creates arraysmap- Creates mapscardinality- Alternative function for array/map size in some SQL dialects