AesEncrypt¶

Overview¶

The AesEncrypt expression provides AES (Advanced Encryption Standard) encryption functionality in Spark SQL. It encrypts binary input data using a specified key, encryption mode, padding scheme, initialization vector (IV), and optional additional authenticated data (AAD). This expression is implemented as a runtime replaceable that delegates to native implementation methods for performance.

Syntax¶

aes_encrypt(input, key [, mode [, padding [, iv [, aad]]]])

// DataFrame API
import org.apache.spark.sql.catalyst.expressions.AesEncrypt
AesEncrypt(inputExpr, keyExpr, modeExpr, paddingExpr, ivExpr, aadExpr)

Arguments¶

Argument	Type	Description
input	BinaryType	The binary data to encrypt
key	BinaryType	The encryption key as binary data
mode	StringType	The encryption mode (defaults to "GCM")
padding	StringType	The padding scheme (defaults to "DEFAULT")
iv	BinaryType	The initialization vector (defaults to empty)
aad	BinaryType	Additional authenticated data for GCM mode (defaults to empty)

Return Type¶

Returns BinaryType - the encrypted data as a binary array.

Supported Data Types¶

Input data: Binary type only
Key: Binary type only
Mode: String type with collation support (trim collation supported)
Padding: String type with collation support (trim collation supported)
IV: Binary type only
AAD: Binary type only

Algorithm¶

Accepts up to 6 parameters with automatic default value assignment for optional parameters
Delegates actual encryption to ExpressionImplUtils.aesEncrypt static method via StaticInvoke
Supports multiple constructor overloads for convenience (2-6 parameters)
Uses implicit type casting to coerce inputs to expected types
Implements runtime replacement pattern for efficient native code execution

Partitioning Behavior¶

This expression preserves partitioning behavior:

Does not require data shuffling across partitions
Can be applied per-row independently within each partition
Does not affect the partitioning scheme of the underlying dataset

Edge Cases¶

Null inputs: Follows standard Spark null propagation - any null input produces null output
Empty AAD: When AAD parameter is omitted, defaults to empty binary literal
Empty IV: When IV parameter is omitted, defaults to empty binary literal
Invalid key sizes: Behavior depends on underlying AES implementation in ExpressionImplUtils
Mode/padding combinations: Some mode and padding combinations may not be supported

Code Generation¶

This expression uses runtime replacement with StaticInvoke, which supports:

Full code generation (Tungsten) when possible
Falls back to interpreted mode for complex scenarios
Leverages native implementation through ExpressionImplUtils for optimal performance

Examples¶

-- Basic encryption with default GCM mode
SELECT base64(aes_encrypt('Spark', 'abcdefghijklmnop12345678ABCDEFGH'));

-- Full specification with all parameters
SELECT base64(aes_encrypt(
  'Spark', 
  'abcdefghijklmnop12345678ABCDEFGH', 
  'GCM', 
  'DEFAULT', 
  unhex('000000000000000000000000'), 
  'This is an AAD mixed into the input'
));

// DataFrame API usage
import org.apache.spark.sql.functions._
df.select(base64(expr("aes_encrypt(data, key, 'GCM', 'DEFAULT', iv, aad)")))

// Using expression directly
import org.apache.spark.sql.catalyst.expressions._
val encrypted = AesEncrypt(col("data").expr, col("key").expr)