Collations¶

Overview¶

The collations function is a table-generating function (generator) that returns metadata information about all available collations in the Spark system. It provides detailed information about collation properties including sensitivity settings, locale information, and ICU version details.

Syntax¶

SELECT * FROM collations()

Arguments¶

Argument	Type	Description
None	-	This function takes no arguments

Return Type¶

Returns a table with the following schema: - CATALOG (String, non-nullable): Catalog name - SCHEMA (String, non-nullable): Schema name
- NAME (String, non-nullable): Collation name - LANGUAGE (String, nullable): Language code - COUNTRY (String, nullable): Country code - ACCENT_SENSITIVITY (String, non-nullable): Either "ACCENT_SENSITIVE" or "ACCENT_INSENSITIVE" - CASE_SENSITIVITY (String, non-nullable): Either "CASE_SENSITIVE" or "CASE_INSENSITIVE" - PAD_ATTRIBUTE (String, non-nullable): Padding attribute setting - ICU_VERSION (String, nullable): ICU library version

Supported Data Types¶

This is a parameterless generator function, so no input data types apply.

Algorithm¶

Calls CollationFactory.listCollations() to retrieve all available collation identifiers
Loads metadata for each collation using CollationFactory.loadCollationMeta()
Transforms boolean sensitivity flags into human-readable string values
Constructs InternalRow objects containing collation metadata
Returns an iterable collection of rows representing all collations

Partitioning Behavior¶

As a leaf generator expression:

Does not preserve input partitioning (no input data)
Generates data independently on each partition
Does not require shuffle operations
Output is generated locally on each executor

Edge Cases¶

Returns empty result set if no collations are available in the system
Null values are returned for optional fields (LANGUAGE, COUNTRY, ICU_VERSION) when not available
All mandatory fields (CATALOG, SCHEMA, NAME, sensitivity flags, PAD_ATTRIBUTE) are always populated
No input validation required since function is parameterless

Code Generation¶

This expression extends CodegenFallback, meaning it does not support Tungsten code generation and always falls back to interpreted evaluation mode for each row generation.

Examples¶

-- Get all available collations
SELECT * FROM collations();

-- Filter for UTF8_BINARY collation
SELECT * FROM collations() WHERE NAME = 'UTF8_BINARY';

-- Find accent-sensitive collations
SELECT NAME, LANGUAGE, COUNTRY 
FROM collations() 
WHERE ACCENT_SENSITIVITY = 'ACCENT_SENSITIVE';

// DataFrame API usage
import org.apache.spark.sql.functions._

// Generate collations table
val collationsDF = spark.sql("SELECT * FROM collations()")

// Filter and select specific collations
val utf8Collations = collationsDF
  .filter($"NAME".contains("UTF8"))
  .select($"NAME", $"CASE_SENSITIVITY", $"ACCENT_SENSITIVITY")