Concat¶
Overview¶
The Concat expression concatenates multiple input expressions into a single result. It supports concatenating strings with collation support, binary data, and arrays, returning a combined result of the same data type as the inputs.
Syntax¶
Arguments¶
| Argument | Type | Description |
|---|---|---|
| expr1, expr2, ..., exprN | StringType, BinaryType, or ArrayType | Variable number of expressions to concatenate. All expressions must be of the same compatible type |
Return Type¶
Returns the same data type as the input expressions:
StringTypefor string inputs (orStringTypeif no arguments provided)BinaryTypefor binary inputsArrayType(elementType, containsNull)for array inputs
Supported Data Types¶
StringTypewith collation support (including trim collation)BinaryTypeArrayTypeof any element type
Algorithm¶
- For empty input, returns empty string of
StringType - For string inputs, uses
UTF8String.concat()to merge all input strings - For binary inputs, uses
ByteArray.concat()to merge all byte arrays - For array inputs, flattens all arrays into a single array, checking total element count against
MAX_ROUNDED_ARRAY_LENGTH - Validates all inputs are of compatible types during type checking phase
Partitioning Behavior¶
This expression does not affect partitioning:
- Preserves existing partitioning as it operates row-by-row
- Does not require shuffle operations
- Can be pushed down to individual partitions
Edge Cases¶
- Null handling: If any input expression evaluates to null, the entire result is null
- Empty input: Returns empty string (
StringType) when no arguments provided - Array overflow: Throws
QueryExecutionErrorif total array elements exceedMAX_ROUNDED_ARRAY_LENGTH - Type compatibility: All inputs must be the same type - mixed types (e.g., string + binary) will fail type checking
- Nullability: Result is nullable if any child expression is nullable
Code Generation¶
This expression supports Tungsten code generation:
- Generates optimized code paths for each supported data type (
BinaryType,StringType,ArrayType) - Uses
splitExpressionsWithCurrentInputsfor handling large numbers of input expressions - Falls back to interpreted evaluation only during the code generation setup phase
Examples¶
-- String concatenation
SELECT concat('Hello', ' ', 'World') AS greeting;
-- Result: 'Hello World'
-- Array concatenation
SELECT concat(array(1, 2), array(3, 4), array(5)) AS combined_array;
-- Result: [1, 2, 3, 4, 5]
-- Null handling
SELECT concat('Hello', NULL, 'World') AS result;
-- Result: NULL
// DataFrame API usage
import org.apache.spark.sql.functions.concat
df.select(concat($"first_name", lit(" "), $"last_name").as("full_name"))
// Array concatenation
df.select(concat($"array_col1", $"array_col2").as("combined_arrays"))
See Also¶
concat_ws- Concatenate with separatorarray_union- Union of array elements||operator - String concatenation operator