Spark Internals Reference Guide

A reference guide to Apache Spark SQL internals, covering Catalyst expressions and physical execution operators.

Structure

  • Expressions - Catalyst expression reference (functions, operators, literals)
  • Operators - Physical execution operator reference (joins, aggregates, exchanges)

Page Structure

Each reference page includes the following sections, depending on whether it documents an expression or an operator:

For Expressions

  • Overview
  • Syntax (SQL and DataFrame API)
  • Arguments
  • Return Type
  • Supported Data Types
  • Algorithm
  • Partitioning Behavior
  • Edge Cases
  • Code Generation Support
  • Examples
  • See Also

For Operators

  • Overview
  • When Used (planner rules)
  • Input Requirements
  • Output Properties
  • Algorithm
  • Memory Usage
  • Partitioning Behavior
  • Metrics
  • Code Generation Support
  • Configuration Options
  • Edge Cases
  • Examples
  • See Also