Skip to content

Exists

Overview

The Exists expression represents an existential subquery predicate in SQL that returns true if the subquery produces at least one row. It extends SubqueryExpression and implements the EXISTS clause functionality, allowing queries to test for the existence of rows in a correlated or uncorrelated subquery.

Syntax

EXISTS (subquery)
// Not directly available in DataFrame API - generated by Catalyst optimizer

Arguments

Argument Type Description
plan LogicalPlan The logical plan representing the subquery to be evaluated
outerAttrs Seq[Expression] Outer references from the subquery to the outer query (default: empty)
exprId ExprId Unique identifier for this expression (default: auto-generated)
joinCond Seq[Expression] Join conditions for correlated subqueries (default: empty)
hint Option[HintInfo] Optional query hints for optimization (default: None)

Return Type

Boolean - returns true if the subquery produces at least one row, false otherwise.

Supported Data Types

The Exists expression itself doesn't operate on specific data types as it evaluates the existence of rows rather than values. The subquery can contain any supported Spark SQL data types.

Algorithm

  • Executes the provided logical plan (subquery) with any outer attribute references resolved

  • Evaluates join conditions if present for correlated subqueries

  • Returns true if the subquery execution produces one or more rows

  • Returns false if the subquery execution produces zero rows

  • Short-circuits evaluation once the first row is found for performance optimization

Partitioning Behavior

  • Does not preserve partitioning as it requires global evaluation of the subquery

  • May require shuffle operations depending on the subquery complexity and correlation

  • Correlated subqueries with join conditions typically require broadcast or shuffle joins

Edge Cases

  • Always returns non-nullable boolean (nullable = false)

  • Empty subquery results return false

  • NULL values in subquery rows still count as existing rows (returns true)

  • Correlated subqueries with no matching outer attributes behave as uncorrelated

  • Subquery execution errors propagate to the outer query

Code Generation

This expression is marked as Unevaluable, meaning it cannot be directly evaluated through standard code generation. It must be rewritten by the Catalyst optimizer (typically into joins) before code generation can occur.

Examples

-- Uncorrelated EXISTS subquery
SELECT * FROM employees 
WHERE EXISTS (SELECT * FROM departments WHERE dept_id > 10);

-- Correlated EXISTS subquery
SELECT * FROM employees e
WHERE EXISTS (SELECT * FROM departments d WHERE d.manager_id = e.emp_id);

-- EXISTS with complex subquery
SELECT * FROM orders o
WHERE EXISTS (
  SELECT * FROM order_items oi 
  WHERE oi.order_id = o.id AND oi.quantity > 5
);
// Generated internally by Catalyst - not directly accessible in DataFrame API
// Equivalent DataFrame operations would use joins or other transformations

See Also

  • InSubquery - for IN subquery expressions
  • ScalarSubquery - for scalar subquery expressions
  • SubqueryExpression - base class for all subquery expressions
  • ListQuery - for subqueries that return multiple values