StringSpace¶
Overview¶
The StringSpace expression creates a string consisting of a specified number of space characters. It takes an integer argument representing the desired length and returns a UTF-8 string filled with that many spaces.
Syntax¶
Arguments¶
| Argument | Type | Description |
|---|---|---|
| length | Integer | The number of space characters to generate |
Return Type¶
UTF8String - A string containing the specified number of space characters.
Supported Data Types¶
- Input:
IntegerTypeonly - Implicit casting is supported for compatible numeric types that can be cast to integer
Algorithm¶
- Validates that the input is an integer type through implicit cast input types
- Evaluates the length parameter as an integer value
- If the length is negative, it defaults to 0 (no spaces)
- Uses
UTF8String.blankString()to efficiently generate a string of spaces - Returns the resulting UTF-8 string with the specified number of spaces
Partitioning Behavior¶
This expression preserves partitioning behavior:
- Does not require shuffle operations
- Can be evaluated independently on each partition
- Does not affect data distribution or partitioning scheme
Edge Cases¶
- Null handling: Expression is null-intolerant, meaning null inputs result in null output
- Negative values: Negative length values are treated as 0, resulting in an empty string
- Zero length: Returns an empty string when length is 0
- Large values: Limited by available memory and UTF8String implementation constraints
Code Generation¶
This expression supports Tungsten code generation through the doGenCode method:
- Implements
nullSafeCodeGenfor efficient compiled code execution - Generates optimized bytecode that avoids object creation overhead
- Falls back to interpreted mode only when code generation is disabled
Examples¶
-- Basic usage
SELECT SPACE(5);
-- Result: ' ' (5 spaces)
-- With negative number
SELECT SPACE(-3);
-- Result: '' (empty string)
-- Combined with other functions
SELECT CONCAT(SPACE(2), '1');
-- Result: ' 1'
-- Using with column values
SELECT name, SPACE(padding_length) AS spaces FROM table;
// DataFrame API examples
import org.apache.spark.sql.functions._
// Basic usage
df.select(space(lit(5)))
// With column reference
df.select(space(col("length_column")))
// Combined with other functions
df.select(concat(space(lit(3)), col("text")))
See Also¶
CONCAT- For combining the space string with other stringsREPEAT- For repeating other characters besides spacesLPAD/RPAD- For padding strings with spaces or other characters