PrimaryKeyConstraint¶
Overview¶
PrimaryKeyConstraint is a Catalyst expression that represents a primary key constraint on one or more columns in a table. It extends LeafExpression and TableConstraint to define uniqueness and non-null requirements for specified columns, and can be converted to Spark's V2 constraint representation.
Syntax¶
-- In CREATE TABLE or ALTER TABLE statements
CONSTRAINT constraint_name PRIMARY KEY (column1, column2, ...)
// Programmatic creation
PrimaryKeyConstraint(
columns = Seq("col1", "col2"),
userProvidedName = "pk_name",
tableName = "table_name",
userProvidedCharacteristic = ConstraintCharacteristic(...)
)
Arguments¶
| Argument | Type | Description |
|---|---|---|
| columns | Seq[String] | Sequence of column names that form the primary key |
| userProvidedName | String | Optional user-provided name for the constraint (default: null) |
| tableName | String | Name of the table this constraint belongs to (default: null) |
| userProvidedCharacteristic | ConstraintCharacteristic | Constraint characteristics like enforced/rely flags (default: empty) |
Return Type¶
This is a constraint expression that doesn't return data values. When converted via toV2Constraint, it returns a Spark V2 Constraint object.
Supported Data Types¶
Primary key constraints can be applied to columns of any data type that supports equality comparison and hashing for uniqueness checks.
Algorithm¶
- Validates that the constraint characteristic is not enforced (throws error if enforced is true)
- Generates a default constraint name using pattern
${tableName}_pkif no user name provided - Converts column names to
FieldReference.columnobjects for V2 constraint creation - Creates a V2 primary key constraint with rely/enforced flags and UNVALIDATED status
- Supports copying with modified names, table names, or characteristics
Partitioning Behavior¶
Primary key constraints are metadata-only and do not directly affect data partitioning:
- Does not preserve or alter existing partitioning schemes
- Does not require shuffle operations
- Constraint validation may impact query planning and optimization
Edge Cases¶
- Null handling: Primary key columns implicitly require non-null values
- Empty columns list: May be allowed but would create an invalid constraint
- Enforced characteristic: Explicitly fails if user tries to set enforced=true for PRIMARY KEY
- Missing table name: Generates constraint name as "null_pk" if tableName is null
- Duplicate constraint names: No validation performed at expression level
Code Generation¶
This expression is a constraint definition that participates in Catalyst's rule-based optimization but does not generate runtime code for data processing. It operates at the metadata/schema level.
Examples¶
-- Create table with named primary key constraint
CREATE TABLE users (
id INT,
email STRING,
CONSTRAINT users_pk PRIMARY KEY (id)
);
-- Create table with composite primary key
CREATE TABLE order_items (
order_id INT,
item_id INT,
quantity INT,
PRIMARY KEY (order_id, item_id)
);
// Create primary key constraint programmatically
val pkConstraint = PrimaryKeyConstraint(
columns = Seq("user_id", "account_id"),
userProvidedName = "user_account_pk",
tableName = "user_accounts"
)
// Convert to V2 constraint
val v2Constraint = pkConstraint.toV2Constraint
// Create with custom characteristics
val characteristic = ConstraintCharacteristic(rely = Some(true), enforced = Some(false))
val pkWithChar = pkConstraint.withUserProvidedCharacteristic(characteristic)
See Also¶
TableConstraint- Base trait for table-level constraintsConstraintCharacteristic- Configuration for constraint behaviorLeafExpression- Base class for expressions with no child expressions- Foreign key constraints and check constraints in the same constraint system