VectorInnerProduct¶

Overview¶

The VectorInnerProduct expression computes the inner (dot) product of two float vectors represented as arrays. This is a runtime replaceable expression that delegates the actual computation to VectorFunctionImplUtils.vectorInnerProduct method for optimized vector operations.

Syntax¶

vector_inner_product(array1, array2)

// DataFrame API usage
import org.apache.spark.sql.functions._
df.select(expr("vector_inner_product(vector1, vector2)"))

Arguments¶

Argument	Type	Description
left	Array[Float]	The first float vector/array
right	Array[Float]	The second float vector/array

Return Type¶

FloatType - Returns a single float value representing the inner product of the two input vectors.

Supported Data Types¶

This expression only supports:

Input: ArrayType(FloatType, _) for both arguments
Output: FloatType

Any other input data types will result in a DataTypeMismatch error with the UNEXPECTED_INPUT_TYPE error subclass.

Algorithm¶

The expression evaluation follows these steps:

Validates that both input expressions are arrays of float type
Delegates to VectorFunctionImplUtils.vectorInnerProduct via StaticInvoke
The static method computes the dot product: Σ(left[i] * right[i]) for all valid indices
Returns the computed scalar result as a float value
Includes the pretty name "vector_inner_product" for error reporting context

Partitioning Behavior¶

This expression has neutral partitioning behavior:

Preserves existing partitioning as it operates row-wise on individual records
Does not require shuffle operations since computation is local to each row
Can be pushed down and executed in parallel across partitions

Edge Cases¶

Null handling: Behavior depends on the underlying VectorFunctionImplUtils.vectorInnerProduct implementation
Mismatched array lengths: Error handling delegated to the runtime implementation
Empty arrays: Likely returns 0.0, but depends on the utility method implementation
NaN/Infinity values: Standard IEEE 754 floating-point arithmetic rules apply
Type validation: Strict type checking - only Array[Float] accepted for both inputs

Code Generation¶

This expression uses StaticInvoke for code generation, which:

Generates efficient bytecode that directly calls the static utility method
Supports Tungsten code generation for optimized execution
Avoids interpreted expression evaluation overhead
Provides better performance for vector operations

Examples¶

-- Calculate inner product of two feature vectors
SELECT vector_inner_product(
  array(1.0, 2.0, 3.0), 
  array(4.0, 5.0, 6.0)
) AS dot_product;
-- Result: 32.0 (1*4 + 2*5 + 3*6)

-- Use with table columns
SELECT id, vector_inner_product(features1, features2) as similarity
FROM vectors_table;

// DataFrame API usage
import org.apache.spark.sql.functions._

val df = spark.createDataFrame(Seq(
  (Array(1.0f, 2.0f, 3.0f), Array(4.0f, 5.0f, 6.0f)),
  (Array(1.0f, 0.0f), Array(0.0f, 1.0f))
)).toDF("vec1", "vec2")

df.select(expr("vector_inner_product(vec1, vec2)").as("dot_product")).show()