Skip to content

DynamicPruningSubquery

Overview

DynamicPruningSubquery is a specialized subquery expression used for runtime partition pruning optimization in Spark SQL. It dynamically filters partitions at runtime based on the results of a broadcast subquery, allowing the query optimizer to skip reading irrelevant partitions during join operations.

Syntax

This is an internal optimization expression that is automatically generated by Spark's Catalyst optimizer during adaptive query execution. It is not directly accessible through SQL syntax or DataFrame API.

Arguments

Argument Type Description
pruningKey Expression The expression used for pruning on the probe side
buildQuery LogicalPlan The logical plan that produces the broadcast values
buildKeys Seq[Expression] The expressions from the build side used for filtering
broadcastKeyIndices Seq[Int] The indices of the filtering keys collected from the broadcast
onlyInBroadcast Boolean Whether the filter should be executed only if beneficial or only if broadcast can be reused
exprId ExprId Unique identifier for the expression (optional, auto-generated)
hint Option[HintInfo] Optional hint information for query optimization

Return Type

This expression is unevaluable and does not return a value during normal expression evaluation. It serves as a marker for the optimizer to apply dynamic pruning.

Supported Data Types

The pruning key and corresponding build key must have matching data types. All primitive data types supported by Spark SQL are compatible, as long as the pruning key and build key data types are identical.

Algorithm

  • Collects broadcast values from the build side of a join operation

  • Compares the pruning key expression against the broadcasted filter values

  • Dynamically determines which partitions can be pruned based on the broadcast results

  • Only executes if the broadcast can be reused through ReuseExchange (when onlyInBroadcast is true) or always executes if deemed beneficial

  • Validates that broadcast key indices correspond to valid positions in the build keys sequence

Partitioning Behavior

How this expression affects partitioning:

  • Does not preserve partitioning as it modifies the data flow through pruning

  • Does not require additional shuffle operations as it leverages existing broadcast operations

  • Reduces the amount of data that needs to be processed by eliminating partitions early

Edge Cases

  • Null handling behavior: Non-nullable expression (nullable returns false)

  • Empty buildKeys sequence makes the expression unresolved

  • Invalid broadcastKeyIndices (negative or out of bounds) cause resolution failure

  • Mismatched data types between pruning key and build key prevent proper resolution

  • Currently supports only single broadcasting key (broadcastKeyIndices.size must equal 1)

Code Generation

This expression extends Unevaluable, meaning it does not support code generation and is not evaluated during normal expression processing. It is consumed by the optimizer during query planning phases before code generation occurs.

Examples

-- This is an internal optimization feature
-- No direct SQL syntax available
-- Automatically applied during joins with broadcast hints
SELECT * FROM large_table lt
JOIN broadcast_table bt ON lt.key = bt.key
WHERE bt.filter_column = 'some_value'
// This is an internal optimization feature
// No direct DataFrame API usage
// Automatically applied during adaptive query execution
// when broadcast joins are detected

See Also

  • SubqueryExpression - Parent class for subquery operations
  • DynamicPruning - Marker trait for dynamic pruning expressions
  • BroadcastExchangeExec - Physical operator that works with dynamic pruning
  • ReuseExchange - Optimization for reusing broadcast results