BitmapConstructAgg¶
Overview¶
BitmapConstructAgg is an imperative aggregate function that constructs a bitmap by setting bits at positions specified by input long values. It accumulates multiple bit positions into a single binary bitmap structure during aggregation.
Syntax¶
Arguments¶
| Argument | Type | Description |
|---|---|---|
| child | Expression (Long) | The bit position to set in the bitmap (must evaluate to LongType) |
| mutableAggBufferOffset | Int | Offset for mutable aggregation buffer (internal) |
| inputAggBufferOffset | Int | Offset for input aggregation buffer (internal) |
Return Type¶
BinaryType - Returns a fixed-size binary array representing the constructed bitmap.
Supported Data Types¶
- Input: LongType only (implicit casting supported via ImplicitCastInputTypes)
- Output: BinaryType (binary array)
Algorithm¶
- Initializes a fixed-size binary buffer filled with zeros using
BitmapExpressionUtils.NUM_BYTES - For each input position, validates the bit position is within valid range (0 to 8 * bitmap.length - 1)
- Calculates byte position using integer division (position / 8) and bit offset using modulo (position % 8)
- Sets the corresponding bit using bitwise OR operation:
bitmap(bytePosition) | (1 << bit) - Merges bitmaps during aggregation using
BitmapExpressionUtils.bitmapMerge
Partitioning Behavior¶
This expression affects partitioning as follows:
- Does not preserve partitioning due to aggregation nature
- Requires shuffle for global aggregation across partitions
- Partial aggregation can occur within partitions before shuffle
Edge Cases¶
- Null handling: Null input positions are ignored (skipped without error)
- Empty input: Returns a bitmap filled with all zeros as the default result
- Invalid positions: Throws
QueryExecutionErrors.invalidBitmapPositionErrorfor positions < 0 or >= (8 * bitmap.length) - Buffer overflow: Fixed-size bitmap prevents dynamic expansion beyond
BitmapExpressionUtils.NUM_BYTES
Code Generation¶
This expression uses interpreted mode execution as it extends ImperativeAggregate, which does not support Tungsten code generation. All operations fall back to the eval() method implementation.
Examples¶
-- Construct bitmap from user IDs
SELECT bitmap_construct_agg(user_id) FROM events GROUP BY session_id;
-- Create bitmap of active days (0-6 for week days)
SELECT bitmap_construct_agg(day_of_week) FROM user_activity GROUP BY user_id;
// DataFrame API usage
import org.apache.spark.sql.functions._
// Construct bitmap of product categories
df.groupBy("store_id")
.agg(expr("bitmap_construct_agg(category_id)").alias("category_bitmap"))
// Create user activity bitmap
userEvents.groupBy("user_id")
.agg(expr("bitmap_construct_agg(event_type_id)").alias("activity_bitmap"))
See Also¶
- BitmapExpressionUtils.bitmapMerge - Used internally for merging bitmaps
- Other bitmap-related expressions in the bitmap functions group
- ImperativeAggregate - Base class for aggregation expressions