Skip to content

ToJavaArray

Overview

The ToJavaArray expression converts Spark's internal ArrayData representation to a native Java array. This expression is primarily used internally by Spark for interoperability with Java code and external systems that require native Java array types.

Syntax

This expression is typically used internally and not directly exposed in SQL syntax. It may be used programmatically in custom expressions or UDFs.

ToJavaArray(arrayExpression)

Arguments

Argument Type Description
array Expression The array expression to be converted to a Java array

Return Type

Returns a native Java array type corresponding to the element type of the input ArrayType. The specific Java array type depends on the element type (e.g., int[], String[], Object[]).

Supported Data Types

  • ArrayType with any supported element type
  • Supports nested arrays (multidimensional arrays)
  • Compatible with all Spark SQL data types that can be represented as array elements

Algorithm

  • Evaluates the input array expression to get the ArrayData object
  • Determines the appropriate Java array type based on the element data type
  • Calls the corresponding ArrayData.to{XXX}Array() method to perform the conversion
  • Returns the native Java array object
  • Leverages constant folding optimization when the input expression is foldable

Partitioning Behavior

This expression does not affect partitioning behavior:

  • Preserves existing partitioning as it operates on individual array values
  • Does not require shuffle operations
  • Executed locally on each partition

Edge Cases

  • Null handling: If the input array expression evaluates to null, returns null
  • Empty arrays: Empty ArrayData is converted to an empty Java array of the appropriate type
  • Null elements: Null elements within the array are preserved in the resulting Java array
  • Type compatibility: Element types must be convertible to corresponding Java types

Code Generation

This expression supports Tungsten code generation when possible. The code generation optimization depends on the underlying array element types and the specific conversion method used.

Examples

// This expression is primarily used internally
// Example in custom expression implementation
case class CustomArrayProcessor(input: Expression) extends UnaryExpression {
  override def child: Expression = ToJavaArray(input)
  // Process the Java array...
}
// Example usage in UDF context
import org.apache.spark.sql.catalyst.expressions.ToJavaArray
// Used internally when Spark needs to convert ArrayData to Java arrays
// for external function calls or interoperability

See Also

  • ArrayData - Spark's internal array representation
  • Array functions (array, array_contains, array_join)
  • Collection expressions in Spark Catalyst
  • ConstantFolding optimizer rule