VectorL2Distance¶
Overview¶
The VectorL2Distance expression computes the L2 (Euclidean) distance between two float vector arrays. This is a runtime-replaceable expression that delegates the actual computation to the VectorFunctionImplUtils.vectorL2Distance method through static invocation.
Syntax¶
Arguments¶
| Argument | Type | Description |
|---|---|---|
| left | Array[Float] | The first float array vector |
| right | Array[Float] | The second float array vector |
Return Type¶
FloatType - Returns a single float value representing the L2 distance between the two input vectors.
Supported Data Types¶
This expression only supports:
- Input arrays must be of type
ArrayType(FloatType, _)for both left and right expressions - No other numeric types (Double, Integer, etc.) are supported
- Both arrays must contain float elements
Algorithm¶
The expression uses a runtime replacement pattern for evaluation:
- Validates that both input expressions are float arrays during type checking
- Delegates actual computation to
VectorFunctionImplUtils.vectorL2DistanceviaStaticInvoke - Passes the left array, right array, and expression name ("vector_l2_distance") as parameters
- The underlying implementation computes the Euclidean distance between the two vectors
Partitioning Behavior¶
This expression has neutral partitioning behavior:
- Preserves existing partitioning as it operates on individual rows
- Does not require shuffle operations
- Can be evaluated independently on each partition
Edge Cases¶
- Null handling: Behavior depends on the underlying
VectorFunctionImplUtils.vectorL2Distanceimplementation - Array length mismatch: Runtime behavior is determined by the static invocation target
- Empty arrays: Handling is delegated to the underlying utility method
- Type validation: Strict type checking ensures only float arrays are accepted, with specific error messages for mismatched types
Code Generation¶
This expression uses runtime replacement rather than direct code generation:
- Implements
RuntimeReplaceableinterface - Replaces itself with a
StaticInvokeexpression during query planning - The
StaticInvokeexpression may support code generation depending on Spark's capabilities for static method calls
Examples¶
-- Calculate L2 distance between two float vectors
SELECT vector_l2_distance(array(1.0, 2.0, 3.0), array(4.0, 5.0, 6.0)) as distance;
-- Use in WHERE clause for similarity search
SELECT id, features
FROM vectors_table
WHERE vector_l2_distance(features, array(1.0, 0.0, 1.0)) < 2.0;
// DataFrame API usage
import org.apache.spark.sql.functions._
df.select(expr("vector_l2_distance(vector1, vector2)").as("distance"))
// With column references
df.withColumn("similarity",
expr("vector_l2_distance(features, reference_vector)"))
See Also¶
- Other vector distance functions in the
vector_funcsgroup VectorFunctionImplUtilsclass for underlying implementation detailsStaticInvokeexpression for runtime replacement pattern