Saturday, May 10, 2014

Viewing graphs in the browser


Entities linked through relations are everywhere. Social networks, biological entities, publications and any database tables joined by a foreign key are all examples, yet they are often not represented as graphs. Many tools exist to visualize them, and this post introduces my five favorites for interacting with a graph in a modern web browser: graphviz, neo4j, cytoscape.js, sigma.js and d3.js.
We will by no means cover all the cool features of these tools (hey, this is a simple blog post, not a book!) but rather give a short introduction to each of them. We will focus on small graphs with force layout methods, as they are the most general-purpose choice.

Monday, August 5, 2013

Scala: 6 silver bullets

Bye bye Java, Hello Scala

In the JVM world, Scala is certainly the rising star. Created at EPFL in 2001, it is strongly gaining in popularity. Depending on the index, it now ranks as a "serious" language, reaching far beyond the academic world and adopted by mainstream companies (Twitter backend, eBay research, Netflix, Foursquare etc.).
For data scientists, this language is a breeze. Rising above the religious war between functional and object-oriented believers, it has succeeded by merging the best of both worlds, with a strong drive towards "let's be practical."
If Grails/Groovy was a big step forward in productivity on the JVM, Scala goes even further, mixing static typing (thus efficiency) with many improvements in language structure, collections handling and concurrency, backed by solid frameworks and a very active community.
In this post, I have picked six major (and subjective) improvements to show my hardcore Java colleagues how jumping on this train promises a great journey.

Sunday, January 20, 2013

Sparse bitset in scala, benchmarking 5 implementations

Recently, in a Scala project, I hit a problem with sparse bit sets. Typically, I was moving around sets of 30 to 50 integers between 0 and 2047, with operations such as xor, and, circular shift etc. These basic operations, repeated extensively, soon became a bottleneck in my application. The first implementation used Scala BigInt (mainly because of its straightforward shift call), but it later appeared that other solutions were better suited to the problem.
In this post, I will profile basic operations on five alternative implementations: Scala BigInt, mutable and immutable BitSet, versus Java BigInteger and BitSet (Java types can of course be used inside a Scala application).
Source code is available on GitHub.
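
To make the operations concrete, here is a minimal Java sketch of a circular shift on two of the Java-side candidates named above, BigInteger and BitSet. It is not the benchmarked code from the post: the class name `SparseBits`, the helper names and the fixed 2048-bit width are assumptions chosen to match the 0 to 2047 range described here.

```java
import java.math.BigInteger;
import java.util.BitSet;

// Hypothetical helper class, for illustration only (not the post's benchmarked code).
final class SparseBits {
    // Assumed universe size: integers between 0 and 2047.
    static final int WIDTH = 2048;
    // Mask with the WIDTH lowest bits set, used to drop bits shifted past the top.
    static final BigInteger MASK =
            BigInteger.ONE.shiftLeft(WIDTH).subtract(BigInteger.ONE);

    // Build a BigInteger whose set bits are the given indices.
    static BigInteger fromIndices(int... indices) {
        BigInteger acc = BigInteger.ZERO;
        for (int i : indices) {
            acc = acc.setBit(i);
        }
        return acc;
    }

    // Circular left shift on a BigInteger: two shifts, one or, one mask.
    static BigInteger rotateLeft(BigInteger bits, int k) {
        int s = Math.floorMod(k, WIDTH);
        return bits.shiftLeft(s).or(bits.shiftRight(WIDTH - s)).and(MASK);
    }

    // Circular left shift on a BitSet: only the set bits are visited,
    // which is where a sparse set (30 to 50 elements) may pay off.
    static BitSet rotateLeft(BitSet bits, int k) {
        int s = Math.floorMod(k, WIDTH);
        BitSet out = new BitSet(WIDTH);
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            out.set((i + s) % WIDTH);
        }
        return out;
    }
}
```

The xor and and operations need no helper: BigInteger offers `xor` and `and` methods that return new values, while BitSet offers `xor` and `and` methods that modify the receiver in place. Which variant is faster depends on the workload, which is what the benchmark in the post is meant to settle.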

Tuesday, December 11, 2012

Sustainable JavaScript

JavaScript is certainly one of the most popular programming languages [1] [2] [3], and the ever-increasing popularity of complex web applications will not reverse that trend anytime soon.
Despite some intrinsic flaws of the language and its environment, trends have emerged to turn JavaScript into a clearer, more structured, deployable and testable environment. In this post, I will go through some of them. The selection is subjective: it reflects techniques we use daily in large-scale projects that compute and display rich information in bioinformatics.

I will mainly give a short introduction to underscore.js, require.js and jasmine, the tools that recently made my days shine again. I will skip discussions about jQuery, Chrome developer tools, HTML5 and backbone, which are also invaluable, among many others.

Example code, usable as is or in Eclipse (Aptana Studio is perfect), is available on Google Drive.
Part of a large scale tech jobs ad campaign in the Bay area

Saturday, August 25, 2012

Continuous Deployment in Perl: Code & Folks

Pre-review version of an article published in a Perl issue of the Software Developer's Journal (May 2012 issue).
This article was written together with my friend Pierre-Antoine Queloz (

Continuous Integration is the tactic of decreasing the latency between the implementation of a new piece of code and its integration into the overall project. It is the backbone of Continuous Deployment, which is often defined as releasing software very frequently in order to satisfy customer needs and get their feedback as soon as possible. Both have shown their benefits and play an important role in the success of the current Agile software development trend.

Monday, December 19, 2011

Comparing hostip and geoip

What is the best IP address geolocation system?

Seeking to analyze web log files by the country of request origin, I recently proposed a simple Grails plugin wrapping hostip. Of course, the major competitor in the field is maxmind geoip, and they provide a free light version of their database. This database can be queried through their Java API or a geoip Grails plugin.
The time has come for numbers. How does each module perform on the same set of IP addresses? Even with 3% uncertainty, maxmind geoip lite clearly proves to be the best.

Wednesday, December 14, 2011

hostip grails plugin: from IP address to localisation

Getting a location from an IP address is fairly straightforward with available web services. However, for massive reports (such as analyzing web logs from the remoteIP perspective), this is not practical, even with caching.
The idea of the plugin is to take the freely distributed hostip database, mirror it locally and provide a Grails service to query it.
There is another post, comparing hostip with maxmind geoip. It should certainly be read before jumping to hostip.