the big g@me: Sparse bitset in scala, benchmarking 5 implementations

Recently in a Scala project, I've hit a problem with sparse bit set. Typically, I was moving around set of 30~50 integers between 0 and 2047 with operation such as xor, and, circular shift etc. Basic operations, but repeated extensively, were soon to become a bottleneck in my application. The first implementation used scala BigInt (the main reason was the straightforwards shift call) but it later appeared that other solutions were more suited to the problem.
In this post, I will profile basic operation on 5 implementation alternatives: Scala BigInt, mutable and immutable BitSet versus Java BigInteger and BitSet (Java type can be used inside the Scala application of course).
Source code is available on github

Measure method

In Scala, function are first class citizen. A function f can easily be passed to a measure function which will execute it n times ( for precision matter)

def timeInNanos(n: Int, f: () => Unit) = {

val t0 = System.currentTimeMillis()

i = 0;

while (i < n) {

f();

i+=1

}

val t1 = System.currentTimeMillis()

( 1000000*(t1 - t0) / n)

}

And measuring the time of an addition of two integers is simply
timeInNanos(1000000, { val i:Int = 2 +3})

Warming up

In practice, to limit overhead in the profiling loop, we add a first loop executing the function a few times.

Validating the measure

What is the precision of the method? What is the overhead of the loop and the function call? A first reflex at this stage is to ensure that the returned results make some sense.

So let's benchmark an empty function timeInNanos(1000000, {}). Result returned is 0 nanoseconds, which is convincing (and will be much lower than typical measures later in this post).

We can now benchmark a sleep of 2 milliseconds timeInNanos(1000, ()=>{Thread.sleep(2)}). Results gives back 2.1ms, which an understandable overhead of 5% for the Thread call initialization

Data provider

To benchmark an operator, it would be dangerous to call 10 million times a function with the same parameters (the compiler could find that out, somehow). In our situation, benchmarking single or dual operator of object, we populate a list with a few thousand random objects to be profiled, and define an iterator that will loop (and cycle) across this list.

A typical time for pulling the next element is 17ns. We can even subtract this value (once or twice) when evaluating the cost of an operator

Hardware & version

scala 2.9.1(that's the one I use in my scalatra application) was run on a mac book pro 8GB, i7, 2Ghz processor.

5 Bitset implementations

The goal was to benchmark 5 different way of representing set of integer, below a certain limit (here 2048). The underlying problem lies with sparse bit set, i.e. containing between 30 to 50 values. Looking for the pessimistic side, I initialized the object with 50 random values.

3 Scala & 2 Java representation were tested:

scala BigInt [code]
scala mutable.BitSet [code]
scala mutable.BitSet [code]
java BitSet [code]
java BigInteger [code]

Results

6 calls were measured (ran 10^7 times, subtracting overhead due to the iterating steps). We show here the function call and the results for each of them. See the recapitulation table at the end of the post.

bitCount

Implementation	code	t (ns)
sc BigInt	`o.bitCount`	3
sc mut.BitSet	`o.size`	625
sc imm.BitSet	`o.size`	614
jv BitSet	`o.cardinality()`	35
jv BigInteger	`o.bitCount`	3

isEmpty?

Implementation	code	t (ns)
sc BigInt	`o == 0`	54
sc mut.BitSet	`o.isEmpty`	622
sc imm.BitSet	`o.isEmpty`	604
jv BitSet	`o.isEmpty()`	1
jv BigInteger	`o == BigInteger.ZERO`	4

Right shift

Implementation	code	t (ns)
sc BigInt	`o >> 1`	210
sc mut.BitSet
sc imm.BitSet
jv BitSet	`o.get(1, o.length())`	161
jv BigInteger	`o.shiftRight(1)`	211

right circular shift

Implementation	code	t (ns)
sc BigInt	`val bit0 = o.testBit(0) val p = o >> 1 if (bit0) p.setBit(maxBit - 1) else p`	221
sc mut.BitSet
sc imm.BitSet
jv BitSet	`val p = bi.get(1, bi.length()) if (o.get(0)) { p.set(maxBit-1) } else p`	158
jv BigInteger	`bit0 = bi.testBit(0) val p = it.next.shiftRight(1) if (bit0) p.setBit(maxBit - 1) else p`	232

bitwise and

Implementation	code	t (ns)
sc BigInt	`o1 & o2`	492
sc mut.BitSet	`o1 & o2`	189
sc imm.BitSet	`o1 & o2`	207
jv BitSet	`o1.and(o2)`	23
jv BigInteger	`o1.and(o2)`	464

bitwise or

Implementation	code	t (ns)
sc BigInt	`o1 \| o2`	431
sc mut.BitSet	`o1 \| o2`	104
sc imm.BitSet	`o1 \| o2`	170
jv BitSet	`o1.or(o2)`	46
jv BigInteger	`o1.or(o2)`	521

bitwise exclusive or

Implementation	code	t (ns)
sc BigInt	`o1 ^ o2`	432
sc mut.BitSet	`o1 ^ o2`	253
sc imm.BitSet	`o1 ^ o2`	171
jv BitSet	`o1.xor(o2)`	51
jv BigInteger	`o1.xor(o2)`	516

Conclusion

Summary table

Implementation	bit count	isEmpty	shift	c.shift	and	or	xor
sc BigInt	3	54	210	221	492	431	432
sc mut.BitSet	625	622	-	-	189	104	253
sc immut.BitSet	614	604	-	-	207	170	171
jv BitSet	35	1	161	158	23	46	51
jv BigInteger	3	4	211	232	464	521	516

The match is closed. Java BitSet is a clear winner in all heats!

the big g@me

Sunday, January 20, 2013

Sparse bitset in scala, benchmarking 5 implementations