InterpretedPredicate

Overview

InterpretedPredicate is a Catalyst expression evaluator that wraps a boolean expression and evaluates it in interpreted mode (non-code-generated). It extends BasePredicate and is primarily used for filtering operations where the predicate expression needs to be evaluated row by row to return a boolean result.

Syntax

// Internal Catalyst usage - not directly exposed to SQL/DataFrame API
InterpretedPredicate(expression: Expression)

Arguments

Argument | Type | Description
expression | Expression | The boolean expression to be evaluated against input rows

Return Type

Boolean - returns true if the predicate condition is satisfied, false otherwise.

Supported Data Types

InterpretedPredicate itself doesn't restrict data types - it depends on the wrapped expression. The wrapped expression must evaluate to a Boolean type, but can operate on:

  • All primitive types (numeric, string, boolean)
  • Complex types (arrays, maps, structs)
  • Date and timestamp types
  • Any data type supported by the underlying expression

Algorithm

  • Prepares the wrapped expression during initialization with optional subexpression elimination
  • On evaluation, sets the input row in the runtime context if subexpression elimination is enabled
  • Delegates the actual evaluation to the wrapped expression's eval method
  • Casts the result to Boolean type before returning
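
The steps above can be sketched in plain Scala without a Spark dependency. Everything in this block (Row, Expr, Lit, Ref, Equal, ToyInterpretedPredicate) is a hypothetical stand-in for the Catalyst types, and subexpression elimination is left out, so treat it as an illustration of the delegate-then-cast flow rather than Spark's implementation.

```scala
// Toy stand-ins for Catalyst's InternalRow and Expression (hypothetical, not Spark's API).
object InterpretedPredicateSketch {
  type Row = IndexedSeq[Any]

  sealed trait Expr { def eval(row: Row): Any }

  // A constant value.
  final case class Lit(value: Any) extends Expr {
    def eval(row: Row): Any = value
  }

  // Reads the column at the given position of the input row.
  final case class Ref(ordinal: Int) extends Expr {
    def eval(row: Row): Any = row(ordinal)
  }

  // SQL-style equality: a null on either side yields null.
  final case class Equal(left: Expr, right: Expr) extends Expr {
    def eval(row: Row): Any = {
      val l = left.eval(row)
      if (l == null) null
      else {
        val r = right.eval(row)
        if (r == null) null else l == r
      }
    }
  }

  // Holds the expression, delegates to its eval, then casts the result to Boolean.
  final case class ToyInterpretedPredicate(expression: Expr) {
    def eval(row: Row): Boolean = expression.eval(row).asInstanceOf[Boolean]
  }
}
```

With these definitions, ToyInterpretedPredicate(Equal(Ref(0), Lit(25))) accepts a row whose first column is 25 and rejects any other row, including one whose first column is null.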

Partitioning Behavior

InterpretedPredicate evaluation behavior:

  • Preserves existing partitioning as it only evaluates expressions row-wise
  • Does not require shuffle operations
  • Can be pushed down to individual partitions for local evaluation
  • Commonly used in filter operations that can be applied per-partition
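
A minimal sketch of why row-wise predicates leave partitioning untouched, assuming a dataset modelled as a vector of partitions. PartitionLocalFilterSketch and filterPartitions are hypothetical names used only for illustration.

```scala
object PartitionLocalFilterSketch {
  // Applies the predicate inside each partition independently.
  // Rows are only dropped, never moved, so the partition count and the
  // assignment of surviving rows to partitions stay the same.
  def filterPartitions[A](partitions: Vector[Vector[A]])(predicate: A => Boolean): Vector[Vector[A]] =
    partitions.map(_.filter(predicate))
}
```

Because each partition is processed on its own, nothing in this model requires data exchange between partitions; a partition may end up empty, but it still exists.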

Edge Cases

  • Null handling depends entirely on the wrapped expression's null semantics
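  • If the wrapped expression evaluates to null, the Scala cast to Boolean unboxes it to false, so the row is treated as not matching; a ClassCastException arises only when the result is a non-Boolean value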
  • Empty input behavior is handled at the operator level, not within the predicate itself
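  • initialize(partitionIndex) runs once per partition and initializes any nondeterministic subexpressions; the subexpression elimination cache is reset each time a new input row is set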
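
How a Scala cast to Boolean treats null and non-Boolean values can be checked without Spark. The helper names below (BooleanCastSketch, castToBoolean, castFailsWithClassCastException) are hypothetical; only asInstanceOf is standard Scala.

```scala
object BooleanCastSketch {
  // The same kind of cast the predicate applies to the expression result.
  def castToBoolean(result: Any): Boolean = result.asInstanceOf[Boolean]

  // Reports whether the cast throws ClassCastException for the given value.
  def castFailsWithClassCastException(result: Any): Boolean =
    try { castToBoolean(result); false }
    catch { case _: ClassCastException => true }
}
```

In Scala, unboxing null to a primitive Boolean yields false rather than throwing, while a value of another type, such as a String, fails the cast.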

Code Generation

InterpretedPredicate always evaluates in interpreted mode and generates no code: each call goes through the wrapped expression's eval() method, row by row. The code-generated counterpart is built by GeneratePredicate, which Spark normally prefers; InterpretedPredicate serves as the alternative when code generation is disabled or fails.
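
The relationship between the compiled and interpreted paths can be pictured as a try-then-fall-back factory. This is a sketch under assumptions: PredicateFallbackSketch, create, compile, and interpret are hypothetical names, and the real selection logic in Spark also depends on configuration.

```scala
import scala.util.control.NonFatal

object PredicateFallbackSketch {
  type Predicate[A] = A => Boolean

  // Tries the compiled path first; any non-fatal failure falls back to the interpreted path.
  def create[A](compile: () => Predicate[A], interpret: () => Predicate[A]): Predicate[A] =
    try compile()
    catch { case NonFatal(_) => interpret() }
}
```

Either path yields an object with the same evaluation contract, so callers do not need to know which one they received.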

Examples

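// Internal Catalyst usage example (requires spark-catalyst on the classpath)
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{BoundReference, EqualTo, InterpretedPredicate, Literal}
import org.apache.spark.sql.types.IntegerType
// col("age") returns a Column, not a Catalyst Expression; bind to ordinal 0 of the input row instead
val expr = EqualTo(BoundReference(0, IntegerType, nullable = true), Literal(25))
val predicate = InterpretedPredicate(expr)
val inputRow = InternalRow(25) // single-column row holding age = 25
val result = predicate.eval(inputRow) // Boolean; true for this row
// Conceptual DataFrame operation that might use InterpretedPredicate internally
df.filter($"age" === 25) // May use InterpretedPredicate when code generation is not used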

See Also

  • BasePredicate - Parent abstract class
  • GeneratePredicate - Generator for the code-generated alternative
  • ExpressionsEvaluator - Base evaluation interface
  • SQLConf.subexpressionEliminationEnabled - Configuration flag for optimization