Exists¶

Overview¶

The Exists expression represents an existential subquery predicate in SQL that returns true if the subquery produces at least one row. It extends SubqueryExpression and implements the EXISTS clause functionality, allowing queries to test for the existence of rows in a correlated or uncorrelated subquery.

Syntax¶

EXISTS (subquery)

// Not directly available in DataFrame API - generated by Catalyst optimizer

Arguments¶

Argument	Type	Description
plan	LogicalPlan	The logical plan representing the subquery to be evaluated
outerAttrs	Seq[Expression]	Outer references from the subquery to the outer query (default: empty)
exprId	ExprId	Unique identifier for this expression (default: auto-generated)
joinCond	Seq[Expression]	Join conditions for correlated subqueries (default: empty)
hint	Option[HintInfo]	Optional query hints for optimization (default: None)

Return Type¶

Boolean - returns true if the subquery produces at least one row, false otherwise.

Supported Data Types¶

The Exists expression itself doesn't operate on specific data types as it evaluates the existence of rows rather than values. The subquery can contain any supported Spark SQL data types.

Algorithm¶

Executes the provided logical plan (subquery) with any outer attribute references resolved
Evaluates join conditions if present for correlated subqueries
Returns true if the subquery execution produces one or more rows
Returns false if the subquery execution produces zero rows
Short-circuits evaluation once the first row is found for performance optimization

Partitioning Behavior¶

Does not preserve partitioning as it requires global evaluation of the subquery
May require shuffle operations depending on the subquery complexity and correlation
Correlated subqueries with join conditions typically require broadcast or shuffle joins

Edge Cases¶

Always returns non-nullable boolean (nullable = false)
Empty subquery results return false
NULL values in subquery rows still count as existing rows (returns true)
Correlated subqueries with no matching outer attributes behave as uncorrelated
Subquery execution errors propagate to the outer query

Code Generation¶

This expression is marked as Unevaluable, meaning it cannot be directly evaluated through standard code generation. It must be rewritten by the Catalyst optimizer (typically into joins) before code generation can occur.

Examples¶

-- Uncorrelated EXISTS subquery
SELECT * FROM employees 
WHERE EXISTS (SELECT * FROM departments WHERE dept_id > 10);

-- Correlated EXISTS subquery
SELECT * FROM employees e
WHERE EXISTS (SELECT * FROM departments d WHERE d.manager_id = e.emp_id);

-- EXISTS with complex subquery
SELECT * FROM orders o
WHERE EXISTS (
  SELECT * FROM order_items oi 
  WHERE oi.order_id = o.id AND oi.quantity > 5
);

// Generated internally by Catalyst - not directly accessible in DataFrame API
// Equivalent DataFrame operations would use joins or other transformations