PrettyPythonUDF¶
Overview¶
PrettyPythonUDF is a placeholder expression used for displaying Python UDF expressions in a human-readable format without debugging information such as result IDs. It serves as a presentation layer for Python aggregate functions during query plan visualization and logging, extending UnevaluableAggregateFunc to indicate it cannot be directly evaluated.
Syntax¶
This expression is not directly invokable by users. Catalyst constructs it internally to represent Python UDFs in query plans, in the form PrettyPythonUDF(name, dataType, children).
Arguments¶
| Argument | Type | Description |
|---|---|---|
| name | String | The name of the Python UDF function |
| dataType | DataType | The return data type of the Python UDF |
| children | Seq[Expression] | The input expressions/arguments to the Python UDF |
Return Type¶
The return type is determined by the dataType parameter, which can be any Spark SQL data type that the underlying Python UDF is configured to return.
Supported Data Types¶
As a placeholder expression, it supports all data types that Python UDFs can return:
- Primitive types (IntegerType, StringType, DoubleType, etc.)
- Complex types (ArrayType, MapType, StructType)

The actual type validation occurs in the underlying Python UDF implementation.
Algorithm¶
- Acts as a non-executable placeholder for display purposes only
- Formats the UDF name and arguments for string representation
- Delegates actual evaluation to the underlying Python UDF implementation
- Cannot be directly evaluated (throws exception if evaluation is attempted)
- Provides specialized formatting for SQL output and aggregate string representations
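The steps above can be sketched in plain Scala without a Spark dependency. The Expr, Attr, and PrettyUdfSketch types below are hypothetical stand-ins for Catalyst's Expression, an attribute reference, and PrettyPythonUDF. They only illustrate a display-only node that formats its name and arguments and refuses evaluation; they are not Spark's actual implementation, and the render method name is an assumption made for this sketch.

```scala
// Simplified stand-ins for Catalyst types (hypothetical, not Spark's classes).
trait Expr {
  def children: Seq[Expr]
  def eval(): Any
}

// A leaf expression that prints as its column name.
final case class Attr(name: String) extends Expr {
  val children: Seq[Expr] = Nil
  def eval(): Any = name
  override def toString: String = name
}

// Display-only node: it can print itself but cannot be evaluated.
final case class PrettyUdfSketch(name: String, children: Seq[Expr]) extends Expr {
  // Evaluation is refused; the real work belongs to the executable UDF node.
  def eval(): Any =
    throw new UnsupportedOperationException(s"Cannot evaluate expression: $this")

  // Plain rendering: name(arg1, arg2, ...)
  override def toString: String = s"$name(${children.mkString(", ")})"

  // Aggregate-style rendering with an optional DISTINCT marker.
  def render(isDistinct: Boolean): String = {
    val distinct = if (isDistinct) "DISTINCT " else ""
    s"$name($distinct${children.mkString(", ")})"
  }
}

object PrettyUdfDemo {
  def main(args: Array[String]): Unit = {
    val udf = PrettyUdfSketch("python_sum", Seq(Attr("col1"), Attr("col2")))
    println(udf)               // python_sum(col1, col2)
    println(udf.render(true))  // python_sum(DISTINCT col1, col2)
  }
}
```

Keeping the formatting logic on a node that cannot be evaluated makes it hard to execute the placeholder by accident: any code path that calls eval fails immediately instead of returning a misleading value.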
Partitioning Behavior¶
Since this is a placeholder expression:
- Does not directly affect partitioning (inherits from underlying UDF)
- Does not require shuffle by itself
- Partitioning behavior depends on the actual Python UDF implementation it represents
Edge Cases¶
- Null handling: Always returns nullable = true, indicating the expression can produce null values
- Evaluation attempts: Throws UnsupportedOperationException if direct evaluation is attempted, since it extends UnevaluableAggregateFunc
- Empty children: Handles empty argument lists gracefully in string formatting
- Display formatting: Provides consistent formatting across different string representation methods
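A few of these cases can be checked with a small stand-alone model. PlaceholderUdf below is a hypothetical plain-Scala stand-in, not Spark's class. It assumes the name(arg1, ...) string format described on this page and an UnsupportedOperationException on evaluation.

```scala
import scala.util.{Failure, Try}

// Hypothetical stand-in for a display-only UDF node (not Spark's class).
final case class PlaceholderUdf(name: String, args: Seq[String]) {
  // Mirrors the "always nullable" behaviour described above.
  val nullable: Boolean = true

  // mkString on an empty Seq yields "", so a no-argument UDF prints as name().
  override def toString: String = s"$name(${args.mkString(", ")})"

  // Evaluation is never supported on the placeholder.
  def eval(): Any =
    throw new UnsupportedOperationException(s"Cannot evaluate expression: $this")
}

object EdgeCaseDemo {
  def main(args: Array[String]): Unit = {
    val noArgs = PlaceholderUdf("python_now", Seq.empty)
    println(noArgs)           // python_now()
    println(noArgs.nullable)  // true

    // Wrap the call in Try to observe the failure without crashing.
    Try(noArgs.eval()) match {
      case Failure(e: UnsupportedOperationException) =>
        println(s"rejected: ${e.getMessage}")
      case other =>
        println(s"unexpected: $other")
    }
  }
}
```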
Code Generation¶
This expression does not support code generation:
- Extends UnevaluableAggregateFunc, making it non-evaluable
- Falls back to the underlying Python UDF's execution model
- Code generation is handled by the actual Python UDF implementation, not this placeholder
Examples¶
-- This expression appears in query plans when using Python UDFs:
-- EXPLAIN output might show:
python_sum(col1, col2)
python_avg(DISTINCT col3)
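// Internal usage in Catalyst (not user-facing):
import org.apache.spark.sql.catalyst.expressions.PrettyPythonUDF
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

val prettyUDF = PrettyPythonUDF(
  name = "my_python_agg",
  dataType = DoubleType,
  children = Seq(col("value").expr)
)
// String representation (illustrative output; an unresolved attribute may
// print with a leading quote, as in 'value):
prettyUDF.toString   // "my_python_agg(value)"
prettyUDF.sql(true)  // "my_python_agg(DISTINCT value)"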
See Also¶
- UnevaluableAggregateFunc - Parent class for non-evaluable aggregate expressions
- NonSQLExpression - Trait for expressions not directly expressible in SQL
- PythonUDF - Actual executable Python UDF expression
- UserDefinedAggregateFunction - Interface for user-defined aggregate functions