## Sunday, January 20, 2013

### Sparse bitset in scala, benchmarking 5 implementations

Recently in a Scala project, I've hit a problem with sparse bit set. Typically, I was moving around set of 30~50 integers between 0 and 2047 with operation such as xor, and, circular shift etc. Basic operations, but repeated extensively, were soon to become a bottleneck in my application. The first implementation used scala BigInt (the main reason was the straightforwards shift call) but it later appeared that other solutions were more suited to the problem.
In this post, I will profile basic operation on  5 implementation alternatives: Scala BigInt, mutable and immutable BitSet versus Java BigInteger and BitSet (Java type can be used inside the Scala application of course).
Source code is available on github

## Measure method

In Scala, function are first class citizen. A function f  can easily be passed to a measure function which will execute it n times ( for precision matter)
def timeInNanos(n: Int, f: () => Unit) = {
val t0 = System.currentTimeMillis()
i = 0;
while (i < n) {
f();
i+=1
}
val t1 = System.currentTimeMillis()
( 1000000*(t1 - t0) / n)
}

And measuring the time of an addition of two integers is simply
timeInNanos(1000000,  { val i:Int = 2 +3})

#### Warming up

In practice, to limit overhead in the profiling loop, we add a first loop executing the function a few times.

#### Validating the measure

What is the precision of the method? What is the overhead of the loop and the function call? A first reflex at this stage is to ensure that the returned results make some sense.
So let's benchmark an empty function timeInNanos(1000000,  {}). Result returned is 0 nanoseconds, which is convincing (and will be much lower than typical measures later in this post).
We can now benchmark a sleep of 2 milliseconds timeInNanos(1000, ()=>{Thread.sleep(2)}). Results gives back 2.1ms, which an understandable overhead of 5% for the Thread call initialization

#### Data provider

To benchmark an operator, it would be dangerous to call 10 million times a function with the same parameters (the compiler could find that out, somehow). In our situation, benchmarking single or dual operator of object, we populate a list with a few thousand random objects to be profiled, and define an iterator that will loop (and cycle) across this list.
A typical time for pulling the next element is 17ns. We can even subtract this value (once or twice) when evaluating the cost of an operator

#### Hardware & version

scala 2.9.1(that's the one I use in my scalatra application) was run on a mac book pro 8GB, i7, 2Ghz processor.

## 5 Bitset implementations

The goal was to benchmark 5 different way of representing set of integer, below a certain limit (here 2048). The underlying problem lies with sparse bit set, i.e. containing between 30 to 50 values. Looking for the pessimistic side, I  initialized the object with 50 random values.
3 Scala & 2 Java representation were tested:

## Results

6 calls were measured (ran 10^7 times, subtracting overhead due to the iterating steps). We show here the function call and the results for each of them. See the recapitulation table at the end of the post.

#### bitCount

Implementation code t (ns)
sc BigInt `o.bitCount` 3
sc mut.BitSet `o.size` 625
sc imm.BitSet `o.size` 614
jv BitSet `o.cardinality()` 35
jv BigInteger `o.bitCount` 3

#### isEmpty?

Implementation code t (ns)
sc BigInt `o == 0` 54
sc mut.BitSet `o.isEmpty` 622
sc imm.BitSet `o.isEmpty` 604
jv BitSet `o.isEmpty()` 1
jv BigInteger `o == BigInteger.ZERO` 4

#### Right shift

Implementation code t (ns)
sc BigInt `o >> 1` 210
sc mut.BitSet
sc imm.BitSet
jv BitSet `o.get(1, o.length())` 161
jv BigInteger `o.shiftRight(1)` 211

#### right circular shift

Implementation code t (ns)
sc BigInt ``` val bit0 = o.testBit(0) val p = o >> 1 if (bit0) p.setBit(maxBit - 1) else p``` 221
sc mut.BitSet
sc imm.BitSet
jv BitSet ```val p = bi.get(1, bi.length()) if (o.get(0)) { p.set(maxBit-1) } else p``` 158
jv BigInteger ```bit0 = bi.testBit(0) val p = it.next.shiftRight(1) if (bit0) p.setBit(maxBit - 1) else p``` 232

#### bitwise and

Implementation code t (ns)
sc BigInt `o1 & o2` 492
sc mut.BitSet `o1 & o2` 189
sc imm.BitSet `o1 & o2` 207
jv BitSet `o1.and(o2)` 23
jv BigInteger `o1.and(o2)` 464

#### bitwise or

Implementation code t (ns)
sc BigInt `o1 | o2` 431
sc mut.BitSet `o1 | o2` 104
sc imm.BitSet `o1 | o2` 170
jv BitSet `o1.or(o2)` 46
jv BigInteger `o1.or(o2)` 521

#### bitwise exclusive or

Implementation code t (ns)
sc BigInt `o1 ^ o2` 432
sc mut.BitSet `o1 ^ o2` 253
sc imm.BitSet `o1 ^ o2` 171
jv BitSet `o1.xor(o2)` 51
jv BigInteger `o1.xor(o2)` 516

## Conclusion

#### Summary table

Implementation bit count isEmpty shift c.shift and or xor
sc BigInt 3 54 210 221 492 431 432
sc mut.BitSet 625 622 - - 189 104 253
sc immut.BitSet 614 604 - - 207 170 171
jv BitSet 35 1 161 158 23 46 51
jv BigInteger 3 4 211 232 464 521 516

The match is closed. Java BitSet is a clear winner in all heats!