BitmapCount
Overview
The `BitmapCount` expression (SQL function `bitmap_count`) counts the number of set bits (1s) in a bitmap stored as binary data. Each set bit marks one member of the set the bitmap represents, so the result is that set's cardinality. This gives efficient cardinality counting for sets represented as bitmaps.
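To make the set-to-bitmap relationship concrete, the Scala sketch below encodes a small set of non-negative integers as a byte array and counts its set bits. The bit layout (position `p` sets bit `p % 8` of byte `p / 8`) is an assumption chosen for illustration, and `BitmapSketch`, `encode`, and `countSetBits` are hypothetical names, not part of Spark.

```scala
// Hypothetical helpers illustrating bitmap cardinality; not Spark code.
object BitmapSketch {
  // Encode a set of non-negative positions as a bitmap.
  // Assumed layout: position p sets bit (p % 8) of byte (p / 8).
  def encode(positions: Set[Int]): Array[Byte] = {
    val size = if (positions.isEmpty) 0 else positions.max / 8 + 1
    val bitmap = new Array[Byte](size)
    positions.foreach { p =>
      bitmap(p / 8) = (bitmap(p / 8) | (1 << (p % 8))).toByte
    }
    bitmap
  }

  // Count set bits; masking with 0xFF treats each (signed) byte as unsigned.
  def countSetBits(bitmap: Array[Byte]): Long =
    bitmap.map(b => java.lang.Integer.bitCount(b & 0xFF).toLong).sum
}
```

For example, `BitmapSketch.encode(Set(0, 3, 9))` produces the bytes `0x09, 0x02`, and `countSetBits` on that array returns 3, the size of the original set.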
Syntax
`bitmap_count(bitmap_binary)`
Arguments
| Argument | Type | Description |
|---|---|---|
| bitmap_binary | BinaryType | The binary representation of a bitmap whose set bits should be counted |
Return Type
Returns `LongType`: a 64-bit signed integer holding the number of set bits.
Supported Data Types
- Input: only `BinaryType` is supported
- Output: `LongType`
Algorithm
- Validates that the input expression is of `BinaryType`, returning a type mismatch error for any other data type.
- Delegates the actual bit counting to the `BitmapExpressionUtils.bitmapCount()` method.
- Uses `StaticInvoke` to call that utility method, which indicates that the expression is implemented as a `RuntimeReplaceable` expression.
- Sets `returnNullable = false` on the replacement expression, declaring that the static method itself never returns null; null inputs are still handled by `StaticInvoke`'s null propagation.
- Counts the set bits by processing the binary data directly.
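The steps above can be sketched in plain Scala. The type ADT (`DataType`, `BinaryType`, `StringType`), the `Either`-based error, and the object name `BitmapCountAlgorithm` are hypothetical stand-ins for Catalyst's data types and type-check result. The counting loop is an assumed implementation (a per-byte population count), not a copy of `BitmapExpressionUtils.bitmapCount()`.

```scala
// Hypothetical, simplified stand-ins for Catalyst data types.
sealed trait DataType
case object BinaryType extends DataType
case object StringType extends DataType

object BitmapCountAlgorithm {
  // Validation step: reject any input type other than BinaryType.
  def checkInputType(dt: DataType): Either[String, Unit] = dt match {
    case BinaryType => Right(())
    case other      => Left(s"type mismatch: expected BinaryType, got $other")
  }

  // Counting step (assumed): sum the population count of every byte.
  // Like a method declared with returnNullable = false, it never returns null.
  def bitmapCount(bitmap: Array[Byte]): Long = {
    var count = 0L
    for (b <- bitmap) count += java.lang.Integer.bitCount(b & 0xFF)
    count
  }
}
```

Keeping validation separate from counting mirrors the split the text describes: type checking happens once at analysis time, while the counting method runs for every row.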
Partitioning Behavior
This expression preserves the existing partitioning because:
- It operates on individual rows without requiring data movement across partitions.
- No shuffle is required, because it is a deterministic function of a single column value.
- Each partition can process its bitmap data independently.
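Because the function reads only the current row, applying it partition by partition yields the same values as applying it to all rows at once. The sketch below checks that property with partitions modeled as plain Scala collections, a simplification for illustration; `PartitionSketch` and its methods are hypothetical names, and the counting logic is an assumed per-byte population count.

```scala
object PartitionSketch {
  // Row-local function: depends only on its own input bytes.
  def bitmapCount(bitmap: Array[Byte]): Long =
    bitmap.map(b => java.lang.Integer.bitCount(b & 0xFF).toLong).sum

  // Apply the function within each partition, with no data exchanged between them.
  def perPartition(partitions: Seq[Seq[Array[Byte]]]): Seq[Long] =
    partitions.flatMap(_.map(bitmapCount))

  // Apply the function to all rows gathered into one collection.
  def allAtOnce(partitions: Seq[Seq[Array[Byte]]]): Seq[Long] =
    partitions.flatten.map(bitmapCount)
}
```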
Edge Cases
- Null input: returns null, following standard Spark null propagation rules.
- Invalid binary format: behavior depends on the underlying `BitmapExpressionUtils.bitmapCount()` implementation.
- Empty binary data: returns 0, because there are no set bits to count.
- Large bitmaps: the `LongType` result can represent counts up to 2^63 - 1, which is ample for very large bitmaps.
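A minimal sketch of these edge cases, assuming a per-byte population count and modeling SQL `NULL` as Scala `None` (both assumptions for illustration; `EdgeCaseSketch`, `nullSafeCount`, and `maxPossibleCount` are hypothetical names). It also computes an upper bound: a JVM array's length is an `Int`, so one byte array holds at most `Int.MaxValue` bytes and therefore at most `8 * Int.MaxValue` set bits.

```scala
object EdgeCaseSketch {
  // Assumed counting logic: per-byte popcount, treating bytes as unsigned.
  def bitmapCount(bitmap: Array[Byte]): Long =
    bitmap.map(b => java.lang.Integer.bitCount(b & 0xFF).toLong).sum

  // SQL NULL modeled as None: a null input propagates to a null result.
  def nullSafeCount(input: Option[Array[Byte]]): Option[Long] =
    input.map(bitmapCount)

  // Largest possible count for a single JVM byte array.
  val maxPossibleCount: Long = 8L * Int.MaxValue
}
```

Here `nullSafeCount(None)` is `None`, `nullSafeCount(Some(Array.empty[Byte]))` is `Some(0L)`, and `maxPossibleCount` is 17,179,869,176. That bound exceeds `Int.MaxValue` (2,147,483,647), so a 32-bit result could overflow, while it is far below `Long.MaxValue`.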
Code Generation
This expression is `RuntimeReplaceable` and is rewritten into a `StaticInvoke` before execution, which means:
- It does not implement code generation itself; the `StaticInvoke` replacement generates code that calls the static Java method `BitmapExpressionUtils.bitmapCount()` directly.
- In interpreted (non-codegen) execution, `StaticInvoke` calls the same static method through reflection.
- In either mode, the static method invocation may be optimized by the JVM's JIT compiler at runtime.
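The two execution paths can be illustrated with an ordinary static JDK method. This is an analogy, not Spark's code: `java.lang.Long.bitCount` stands in for `BitmapExpressionUtils.bitmapCount()`, which is assumed not to be on the classpath, and `InvocationSketch` is a hypothetical name. The reflective path looks the method up once and invokes it with boxed arguments; the direct path is the kind of plain call that generated code contains.

```scala
object InvocationSketch {
  // Direct static call, analogous to what generated code emits.
  def direct(x: Long): Int = java.lang.Long.bitCount(x)

  // Method handle looked up once, analogous to an interpreted static invocation.
  private val method =
    classOf[java.lang.Long].getMethod("bitCount", java.lang.Long.TYPE)

  // Reflective call: arguments are boxed and the result is unboxed.
  def reflective(x: Long): Int =
    method.invoke(null, Long.box(x)).asInstanceOf[Integer].intValue
}
```

Both functions return the same value; the reflective version typically adds boxing and dispatch overhead, which is one reason code generation is preferred when available.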
Examples

```sql
-- Count set bits in a bitmap column
SELECT bitmap_count(user_bitmap) FROM user_segments;

-- Count set bits in a precomputed bitmap for each segment
SELECT segment_id, bitmap_count(combined_bitmap) AS user_count
FROM segment_bitmaps;
```

```scala
// DataFrame API usage (assumes an existing DataFrame `df`)
import org.apache.spark.sql.functions.expr

df.select(expr("bitmap_count(bitmap_column)").alias("bit_count"))

// Display the counts for the user_segments column
df.select(expr("bitmap_count(user_segments)"))
  .show()
```
See Also
- Other bitmap-related expressions in the `misc_funcs` group
- The `BitmapExpressionUtils` utility class for bitmap operations
- Binary data manipulation functions