Lower

Overview

The Lower expression converts all characters in a string to lowercase according to the input's collation. It supports every string collation; for UTF8_BINARY, a configuration flag selects ICU (International Components for Unicode) case mappings instead of the JVM's default case mappings.

Syntax

LOWER(str)
lower(str)

// DataFrame API (Scala); Column has no lower() method, so use the function form
lower(col("column_name"))

Arguments

Argument	Type	Description
str	StringType	The input string expression to convert to lowercase

Return Type

StringType with the same collation as the input string.

Supported Data Types

Supports StringType with any collation (UTF8_BINARY, UTF8_LCASE, UNICODE, etc.).

Algorithm

Extracts the collation ID from the input StringType to determine locale-specific case conversion rules
Checks the spark.sql.icu.caseMappings.enabled configuration to decide whether to use ICU or JVM case mappings for UTF8_BINARY collation
Delegates the actual lowercase conversion to CollationSupport.Lower.exec() which handles collation-aware case mapping
Preserves the original string's collation in the result
Uses lazy evaluation for collation ID extraction and ICU flag determination
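
The selection steps above can be sketched as a small, self-contained model. This is not Spark's implementation: `LowerDispatchSketch`, `CaseMapping`, `chooseMapping`, and `LazyLower` are hypothetical names, collations are represented by plain strings, and the sketch assumes that every collation other than UTF8_BINARY takes a single collation-aware path. It models only which mapping is selected and when the configuration flag is read.

```scala
// Hypothetical model of the selection steps; none of these names exist in Spark.
object LowerDispatchSketch {
  sealed trait CaseMapping
  case object JvmMapping extends CaseMapping            // JVM default case mappings
  case object IcuMapping extends CaseMapping            // ICU case mappings
  case object CollationAwareMapping extends CaseMapping // assumed path for all other collations

  // Only UTF8_BINARY consults the ICU flag; other collations ignore it.
  def chooseMapping(collation: String, icuEnabled: Boolean): CaseMapping =
    if (collation != "UTF8_BINARY") CollationAwareMapping
    else if (icuEnabled) IcuMapping
    else JvmMapping

  // Models the lazy fields: the flag is read on first access of `mapping`,
  // and the result is cached for every later access.
  final class LazyLower(collation: String, readIcuFlag: () => Boolean) {
    lazy val mapping: CaseMapping = chooseMapping(collation, readIcuFlag())
  }
}
```

Constructing a `LazyLower` does not call `readIcuFlag`; the first access to `mapping` calls it once, and later accesses reuse the cached value.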

Partitioning Behavior

This expression preserves partitioning since it operates on individual string values without changing data distribution:

Does not require shuffle operations
Maintains existing partition boundaries
Is evaluated independently within each partition, so partitions are processed in parallel
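
As a plain-Scala analogy (no Spark involved), partitions can be modeled as nested sequences. `PartitionSketch` and `lowerPerPartition` are hypothetical names, and `Locale.ROOT` is an assumed stand-in for the collation's rules. Because lowercasing is a per-value map, each value stays in the partition it started in.

```scala
import java.util.Locale

// Hypothetical illustration; each inner Seq stands in for one partition.
object PartitionSketch {
  def lowerPerPartition(partitions: Seq[Seq[String]]): Seq[Seq[String]] =
    partitions.map { partition =>
      // Values are transformed in place; nothing moves between partitions.
      partition.map(s => if (s == null) null else s.toLowerCase(Locale.ROOT))
    }
}
```

The number of partitions and the number of values in each one are the same before and after the call.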

Edge Cases

Null handling: Returns null if input is null (null-intolerant behavior)
Empty string: Returns empty string unchanged
Unicode characters: Properly handles Unicode case conversion based on collation rules
Locale-specific cases: Handles special cases like Turkish dotted/dotless i based on collation
Non-string types: Only StringType inputs are accepted; type checking occurs during analysis
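
The null, empty-string, and Turkish-i cases can be illustrated with the JVM's own case mappings. This is a sketch of the behavior described above, not Spark code: `EdgeCaseSketch` and `lowerOrNull` are hypothetical names, and a `java.util.Locale` stands in for the locale that a collation would carry.

```scala
import java.util.Locale

// Hypothetical helper; mirrors the edge cases using only the JVM standard library.
object EdgeCaseSketch {
  val turkish: Locale = Locale.forLanguageTag("tr")

  // Null in, null out; otherwise lowercase under the given locale.
  def lowerOrNull(s: String, locale: Locale): String =
    if (s == null) null else s.toLowerCase(locale)
}
```

With `Locale.ROOT`, `lowerOrNull("TITLE", Locale.ROOT)` gives `title`; with the Turkish locale the capital `I` maps to dotless `ı` (U+0131), giving `tıtle`. A null input yields null and an empty string is returned unchanged.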

Code Generation

This expression supports Tungsten code generation through the doGenCode method, which generates optimized Java code using CollationSupport.Lower.genCode() for better performance in compiled execution paths.
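
The idea behind the generated code can be sketched with a string template. Everything below is illustrative: `CodegenSketch`, `genLowerCall`, and the emitted `LowerRuntime.exec` call are hypothetical and do not reproduce what `CollationSupport.Lower.genCode()` actually emits. The sketch assumes only that the collation ID and ICU flag are resolved while the code is being generated, so the emitted Java expression carries them as constants.

```scala
// Hypothetical illustration of template-based code generation.
object CodegenSketch {
  // Returns a Java source fragment as a string; nothing is compiled or run here.
  def genLowerCall(inputVar: String, collationId: Int, useIcu: Boolean): String =
    s"LowerRuntime.exec($inputVar, $collationId, $useIcu)"
}
```

For example, `CodegenSketch.genLowerCall("value_1", 0, true)` produces the fragment `LowerRuntime.exec(value_1, 0, true)`, in which only the input variable remains to be evaluated per row.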

Examples

-- Basic lowercase conversion
SELECT LOWER('SparkSQL');
-- Result: sparksql

-- With mixed case and numbers
SELECT LOWER('Hello World 123');
-- Result: hello world 123

-- Null handling
SELECT LOWER(NULL);
-- Result: NULL

// DataFrame API usage (Scala); assumes an existing DataFrame `df`
import org.apache.spark.sql.functions._

df.select(lower(col("name")))
df.select(lower(col("description")))

// With column alias
df.select(lower(col("title")).alias("lowercase_title"))