Exists¶
Overview¶
The Exists expression represents an existential subquery predicate in SQL that returns true if the subquery produces at least one row. It extends SubqueryExpression and implements the EXISTS clause functionality, allowing queries to test for the existence of rows in a correlated or uncorrelated subquery.
Syntax¶
Arguments¶
| Argument | Type | Description |
|---|---|---|
| plan | LogicalPlan | The logical plan representing the subquery to be evaluated |
| outerAttrs | Seq[Expression] | Outer references from the subquery to the outer query (default: empty) |
| exprId | ExprId | Unique identifier for this expression (default: auto-generated) |
| joinCond | Seq[Expression] | Join conditions for correlated subqueries (default: empty) |
| hint | Option[HintInfo] | Optional query hints for optimization (default: None) |
Return Type¶
Boolean - returns true if the subquery produces at least one row, false otherwise.
Supported Data Types¶
The Exists expression itself doesn't operate on specific data types as it evaluates the existence of rows rather than values. The subquery can contain any supported Spark SQL data types.
Algorithm¶
-
Executes the provided logical plan (subquery) with any outer attribute references resolved
-
Evaluates join conditions if present for correlated subqueries
-
Returns true if the subquery execution produces one or more rows
-
Returns false if the subquery execution produces zero rows
-
Short-circuits evaluation once the first row is found for performance optimization
Partitioning Behavior¶
-
Does not preserve partitioning as it requires global evaluation of the subquery
-
May require shuffle operations depending on the subquery complexity and correlation
-
Correlated subqueries with join conditions typically require broadcast or shuffle joins
Edge Cases¶
-
Always returns non-nullable boolean (nullable = false)
-
Empty subquery results return false
-
NULL values in subquery rows still count as existing rows (returns true)
-
Correlated subqueries with no matching outer attributes behave as uncorrelated
-
Subquery execution errors propagate to the outer query
Code Generation¶
This expression is marked as Unevaluable, meaning it cannot be directly evaluated through standard code generation. It must be rewritten by the Catalyst optimizer (typically into joins) before code generation can occur.
Examples¶
-- Uncorrelated EXISTS subquery
SELECT * FROM employees
WHERE EXISTS (SELECT * FROM departments WHERE dept_id > 10);
-- Correlated EXISTS subquery
SELECT * FROM employees e
WHERE EXISTS (SELECT * FROM departments d WHERE d.manager_id = e.emp_id);
-- EXISTS with complex subquery
SELECT * FROM orders o
WHERE EXISTS (
SELECT * FROM order_items oi
WHERE oi.order_id = o.id AND oi.quantity > 5
);
// Generated internally by Catalyst - not directly accessible in DataFrame API
// Equivalent DataFrame operations would use joins or other transformations
See Also¶
InSubquery- for IN subquery expressionsScalarSubquery- for scalar subquery expressionsSubqueryExpression- base class for all subquery expressionsListQuery- for subqueries that return multiple values