Sunday, January 20, 2013

Sparse bitset in scala, benchmarking 5 implementations

Recently in a Scala project, I've hit a problem with sparse bit set. Typically, I was moving around set of 30~50 integers between 0 and 2047 with operation such as xor, and, circular shift etc. Basic operations, but repeated extensively, were soon to become a bottleneck in my application. The first implementation used scala BigInt (the main reason was the straightforwards shift call) but it later appeared that other solutions were more suited to the problem.
In this post, I will profile basic operation on  5 implementation alternatives: Scala BigInt, mutable and immutable BitSet versus Java BigInteger and BitSet (Java type can be used inside the Scala application of course).
Source code is available on github

Measure method

In Scala, function are first class citizen. A function f  can easily be passed to a measure function which will execute it n times ( for precision matter)
 def timeInNanos(n: Int, f: () => Unit) = {
    val t0 = System.currentTimeMillis()
     i = 0;
    while (i < n) {
      f();
      i+=1
    }
    val t1 = System.currentTimeMillis()
    ( 1000000*(t1 - t0) / n)
  }

And measuring the time of an addition of two integers is simply
timeInNanos(1000000,  { val i:Int = 2 +3})

Warming up

In practice, to limit overhead in the profiling loop, we add a first loop executing the function a few times.

Validating the measure

What is the precision of the method? What is the overhead of the loop and the function call? A first reflex at this stage is to ensure that the returned results make some sense. 
So let's benchmark an empty function timeInNanos(1000000,  {}). Result returned is 0 nanoseconds, which is convincing (and will be much lower than typical measures later in this post).
We can now benchmark a sleep of 2 milliseconds timeInNanos(1000, ()=>{Thread.sleep(2)}). Results gives back 2.1ms, which an understandable overhead of 5% for the Thread call initialization

Data provider

To benchmark an operator, it would be dangerous to call 10 million times a function with the same parameters (the compiler could find that out, somehow). In our situation, benchmarking single or dual operator of object, we populate a list with a few thousand random objects to be profiled, and define an iterator that will loop (and cycle) across this list.
A typical time for pulling the next element is 17ns. We can even subtract this value (once or twice) when evaluating the cost of an operator

Hardware & version

scala 2.9.1(that's the one I use in my scalatra application) was run on a mac book pro 8GB, i7, 2Ghz processor.

5 Bitset implementations

The goal was to benchmark 5 different way of representing set of integer, below a certain limit (here 2048). The underlying problem lies with sparse bit set, i.e. containing between 30 to 50 values. Looking for the pessimistic side, I  initialized the object with 50 random values.
3 Scala & 2 Java representation were tested:

Results

6 calls were measured (ran 10^7 times, subtracting overhead due to the iterating steps). We show here the function call and the results for each of them. See the recapitulation table at the end of the post.

bitCount

Implementation code t (ns)
sc BigInt o.bitCount 3
sc mut.BitSet o.size 625
sc imm.BitSet o.size 614
jv BitSet o.cardinality() 35
jv BigInteger o.bitCount 3

isEmpty?

Implementation code t (ns)
sc BigInt o == 0 54
sc mut.BitSet o.isEmpty 622
sc imm.BitSet o.isEmpty 604
jv BitSet o.isEmpty() 1
jv BigInteger o == BigInteger.ZERO 4

Right shift

Implementation code t (ns)
sc BigInt o >> 1 210
sc mut.BitSet
sc imm.BitSet
jv BitSet o.get(1, o.length()) 161
jv BigInteger o.shiftRight(1) 211

right circular shift

Implementation code t (ns)
sc BigInt val bit0 = o.testBit(0)
val p = o >> 1
if (bit0) p.setBit(maxBit - 1) else p
221
sc mut.BitSet
sc imm.BitSet
jv BitSet val p = bi.get(1, bi.length())
if (o.get(0)) {
p.set(maxBit-1)
} else p
158
jv BigInteger bit0 = bi.testBit(0)
val p = it.next.shiftRight(1)
if (bit0) p.setBit(maxBit - 1) else p
232

bitwise and

Implementation code t (ns)
sc BigInt o1 & o2 492
sc mut.BitSet o1 & o2 189
sc imm.BitSet o1 & o2 207
jv BitSet o1.and(o2) 23
jv BigInteger o1.and(o2) 464

bitwise or

Implementation code t (ns)
sc BigInt o1 | o2 431
sc mut.BitSet o1 | o2 104
sc imm.BitSet o1 | o2 170
jv BitSet o1.or(o2) 46
jv BigInteger o1.or(o2) 521

bitwise exclusive or

Implementation code t (ns)
sc BigInt o1 ^ o2 432
sc mut.BitSet o1 ^ o2 253
sc imm.BitSet o1 ^ o2 171
jv BitSet o1.xor(o2) 51
jv BigInteger o1.xor(o2) 516

Conclusion

Summary table

Implementation bit count isEmpty shift c.shift and or xor
sc BigInt 3 54 210 221 492 431 432
sc mut.BitSet 625 622 - - 189 104 253
sc immut.BitSet 614 604 - - 207 170 171
jv BitSet 35 1 161 158 23 46 51
jv BigInteger 3 4 211 232 464 521 516

The match is closed. Java BitSet is a clear winner in all heats!






No comments:

Post a Comment