# BitwiseCount

## Overview

The BitwiseCount expression (SQL function `bit_count`) returns the number of set bits (1s) in the binary representation of an integral or boolean value. This quantity is also known as the population count (popcount). It is a unary expression: it takes a single argument and produces one result per input row.
## Syntax

`bit_count(expr)`

## Arguments
| Argument | Type | Description |
|---|---|---|
| expr | Integral or Boolean | The expression whose bits should be counted |
## Return Type
IntegerType - Always returns an integer representing the count of set bits.
## Supported Data Types

- BooleanType: counts 1 for true, 0 for false
- ByteType: 8-bit signed integers
- ShortType: 16-bit signed integers
- IntegerType: 32-bit signed integers
- LongType: 64-bit signed integers
## Algorithm

- For BooleanType: returns 1 if true, 0 if false.
- For integral types: uses `java.lang.Long.bitCount()`, which computes the population count, i.e. the number of bit positions that contain a 1 in the binary representation.
- Integral inputs are widened to long before `bitCount` is applied, regardless of their original type. The widening sign-extends negative values.
- Null inputs produce null outputs (null-intolerant behavior).
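The rules above can be sketched in plain Scala. This is a minimal model under stated assumptions, not Spark's source. `BitCountSketch` and `evalBitCount` are hypothetical names, and `Option[Any]` stands in for a nullable SQL value, with `None` playing the role of NULL.

```scala
// Hypothetical helper mirroring the evaluation rules above; not Spark's internal API.
// Option[Any] models a nullable SQL value: None plays the role of NULL.
object BitCountSketch {
  def evalBitCount(input: Option[Any]): Option[Int] = input.map {
    // BooleanType: true counts as 1, false as 0
    case b: Boolean => if (b) 1 else 0
    // Narrower integral types are widened to Long (sign-extending), then counted
    case b: Byte  => java.lang.Long.bitCount(b.toLong)
    case s: Short => java.lang.Long.bitCount(s.toLong)
    case i: Int   => java.lang.Long.bitCount(i.toLong)
    case l: Long  => java.lang.Long.bitCount(l)
    case other    => throw new IllegalArgumentException(s"unsupported input type: $other")
  }
}
```

For example, `BitCountSketch.evalBitCount(Some((-1).toByte))` returns `Some(64)`, reflecting the widening to long described above, while `evalBitCount(None)` returns `None`.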
## Partitioning Behavior
- Preserves partitioning: Yes, this is a unary transformation that operates row-by-row
- Requires shuffle: No, computation is purely local to each partition
- Partition-wise operation: Each row is processed independently within its partition
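The following toy illustration shows why no shuffle is required. It models partitions with plain Scala collections rather than Spark. `PartitionSketch` and its sample values are hypothetical. Each output depends only on its own input row, so mapping each partition separately yields the same values as mapping the flattened data.

```scala
// Toy model using plain Scala collections, not Spark: each inner Seq stands in for one partition.
object PartitionSketch {
  val partitions: Seq[Seq[Long]] = Seq(Seq(0L, 7L), Seq(8L, -1L))

  // Each partition is transformed on its own; no value moves between partitions.
  val perPartition: Seq[Seq[Int]] =
    partitions.map(part => part.map(v => java.lang.Long.bitCount(v)))

  // Flattening first gives the same values in the same order, so no data exchange is needed.
  val flattened: Seq[Int] = partitions.flatten.map(v => java.lang.Long.bitCount(v))
}
```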
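## Edge Cases

- Null handling: returns null for null inputs (nullIntolerant = true).
- Boolean handling: special case where true → 1 and false → 0.
- Negative numbers: values use two's complement representation and are sign-extended to 64 bits before counting. As a result, negative inputs of narrower types also count the extended sign bits; for example, a ByteType value of -1 returns 64, not 8.
- Zero input: returns 0 (no bits set).
- Maximum values: a non-negative value can have at most width - 1 set bits (7 for ByteType, 15 for ShortType, 31 for IntegerType, 63 for LongType). Because of sign extension, -1 of any integral type returns 64, the overall maximum.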
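The sign-extension effect on narrow types is easiest to see side by side. The following sketch assumes, per the Algorithm section, that a ByteType value is widened to long before counting. `SignExtensionDemo` and `maskedToByteWidth` are hypothetical helpers; the latter is included only to show what a width-limited count would return instead.

```scala
// Contrasts the sign-extended count with a count restricted to a byte's own 8 bits.
// Both helpers are hypothetical; maskedToByteWidth exists only for comparison.
object SignExtensionDemo {
  // Widen to Long (sign-extending), then count: the behavior described for ByteType input
  def signExtended(b: Byte): Int = java.lang.Long.bitCount(b.toLong)

  // Count only the low 8 bits, as a width-limited popcount would
  def maskedToByteWidth(b: Byte): Int = java.lang.Integer.bitCount(b & 0xFF)

  // Largest count a non-negative value of each type can reach (width - 1 bits)
  val maxNonNegative: Seq[Int] = Seq(
    java.lang.Long.bitCount(Byte.MaxValue.toLong),
    java.lang.Long.bitCount(Short.MaxValue.toLong),
    java.lang.Long.bitCount(Int.MaxValue.toLong),
    java.lang.Long.bitCount(Long.MaxValue)
  )
}
```

For `(-1).toByte`, `signExtended` gives 64 while `maskedToByteWidth` gives 8. For `(-128).toByte`, the results are 57 and 1.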
## Code Generation

This expression supports code generation (Tungsten). Its `doGenCode` method emits a specialized code path for each kind of input:

- For BooleanType: an inline ternary operator, `($c) ? 1 : 0`
- For integral types: a direct call to `java.lang.Long.bitCount($c)`
- The interpreted path (`nullSafeEval`) is used when generated code is not in effect, for example when code generation is disabled.
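The choice of template can be modeled as a small string builder. In this sketch, `CodegenSketch`, `genBitCountCode`, and its parameters are hypothetical. Spark's actual `doGenCode` works with `CodegenContext` and `ExprCode`, which are not modeled here.

One design point the sketch highlights is that a single integral template covers all four widths. Java's widening primitive conversion sign-extends a byte, short, or int argument passed to `bitCount(long)`, so the generated Java needs no explicit cast.

```scala
// Toy model of the template choice; names and parameters are hypothetical.
object CodegenSketch {
  // `childCode` is the Java expression (e.g. a local variable name) holding the child value
  def genBitCountCode(childIsBoolean: Boolean, childCode: String): String =
    if (childIsBoolean) s"($childCode) ? 1 : 0"
    else s"java.lang.Long.bitCount($childCode)"
}
```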
## Examples

```sql
-- Count bits in integers
SELECT bit_count(0);      -- Returns: 0
SELECT bit_count(7);      -- Returns: 3 (binary: 111)
SELECT bit_count(8);      -- Returns: 1 (binary: 1000)
SELECT bit_count(-1);     -- Returns: 64 (-1 is sign-extended to a 64-bit long, so all bits are set)

-- Count bits in boolean values
SELECT bit_count(true);   -- Returns: 1
SELECT bit_count(false);  -- Returns: 0

-- With null values
SELECT bit_count(NULL);   -- Returns: NULL
```
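The integer results listed above can be double-checked outside Spark with `java.lang.Long.bitCount`. This sketch assumes the SQL literals are INT values that are widened to long, as described in the Algorithm section. `ExampleCheck` is a hypothetical name.

```scala
// Recomputes the integer results from the SQL examples; inputs are widened to Long
// before counting, mirroring the cast to long.
object ExampleCheck {
  val inputs: Seq[Int] = Seq(0, 7, 8, -1)
  val results: Seq[(Int, Int)] = inputs.map(n => n -> java.lang.Long.bitCount(n.toLong))

  def main(args: Array[String]): Unit =
    results.foreach { case (n, count) => println(s"bit_count($n) = $count") }
}
```

Running `ExampleCheck.main` prints one line per input, ending with `bit_count(-1) = 64`.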
```scala
// DataFrame API usage
import org.apache.spark.sql.functions._

// `df` and its column names are placeholders for an existing DataFrame
df.select(expr("bit_count(id)")).show()

// Using with different data types
df.select(
  expr("bit_count(byte_col)"),
  expr("bit_count(short_col)"),
  expr("bit_count(int_col)"),
  expr("bit_count(long_col)"),
  expr("bit_count(boolean_col)")
).show()
```
## See Also

- Bitwise AND (`&`), OR (`|`), and XOR (`^`) expressions
- `shiftleft()`, `shiftright()`: bitwise shift operations
- `bin()`: binary string representation
- Other bitwise manipulation functions in the `bitwise_funcs` group
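As a cross-check related to `bin()`, counting the '1' characters in a value's 64-bit binary string should agree with `bit_count`. The sketch uses `java.lang.Long.toBinaryString`, which renders negative values in full two's-complement form. Treating its output as equivalent to `bin()` for LongType inputs is an assumption. `BinCrossCheck` and its methods are hypothetical names.

```scala
// Compares a character count over the binary string with Long.bitCount.
// Assumption: toBinaryString's rendering matches bin() for the same long value.
object BinCrossCheck {
  def onesInBinaryString(v: Long): Int =
    java.lang.Long.toBinaryString(v).count(_ == '1')

  def agreesWithBitCount(v: Long): Boolean =
    onesInBinaryString(v) == java.lang.Long.bitCount(v)
}
```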