Saturday, October 18, 2014

Embracing Technology Change In Scientific Software

“πάντα χωρεῖ καὶ οὐδὲν μένει” [Everything changes and nothing stands still]  - Heraclitus of Ephesus.
True in 500 BC, this statement certainly still holds in today's software world, whether at micro scale, when replacing a third-party library or continuously refactoring, or at macro scale, when adopting a new language or framework, or making a major paradigm shift such as going parallel. Evolution is part of the development cycle.
Change can be planned, when a prototype paves the way for production software, or unexpected, when a technological breakthrough emerges during the lifespan of a project.
Although every domain is affected by this phenomenon, scientific software has a couple of peculiarities. Because it often serves research, the purpose of the code itself will pivot in unforeseen directions. Meanwhile, developers, as talented as they might be, rarely come from a software engineering background and sometimes (that might be an understatement) do not consider the software building process a relevant priority.
Embracing evolution raises challenges in multiple dimensions. We will try to cover both the technical perspective and the leadership one.


Contents:  Build To Change - Where (and When) To Go? - Foster Evolution in Your Team.

Kayaking through the ice to reach our glacier, Patagonia, 2000

Saturday, May 10, 2014

Viewing graphs in the browser

(graphs)-[:ARE]->(everywhere)

Entities linked through relations are just everywhere. Social networks, biological entities, publications and any database tables joined by a foreign key are a few of the countless examples, yet they are often not represented as graphs. Many tools exist to visualize them, and this post will introduce my five favorites for interacting with a graph in a modern web browser: graphviz, neo4j, cytoscape.js, sigma.js and d3.js.
We will by no means cover all the cool features of these tools (hey, this is a simple blog post, not a book!) but rather give a short introduction to each of them. We will focus on small graphs with force layout methods, as they are the most general-purpose ones.
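To make the idea of "relations as a graph" concrete, here is a minimal sketch that turns a list of labelled relations into Graphviz DOT text, which the `dot` tool can then render. The class name `GraphDotSketch`, the `Edge` helper and the sample rows are hypothetical, chosen only for illustration; node names are assumed not to contain double quotes, since no escaping is done.

```java
import java.util.Arrays;
import java.util.List;

class GraphDotSketch {
    // A directed, labelled relation: (source)-[:LABEL]->(target).
    // Hypothetical helper type, not taken from any of the tools above.
    static final class Edge {
        final String source;
        final String label;
        final String target;

        Edge(String source, String label, String target) {
            this.source = source;
            this.label = label;
            this.target = target;
        }
    }

    // Serialize the relations as a Graphviz "digraph", one edge per line.
    // Assumption: names contain no double quotes (no escaping is performed).
    static String toDot(List<Edge> edges) {
        StringBuilder sb = new StringBuilder("digraph G {\n");
        for (Edge e : edges) {
            sb.append("  \"").append(e.source).append("\" -> \"")
              .append(e.target).append("\" [label=\"")
              .append(e.label).append("\"];\n");
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        List<Edge> edges = Arrays.asList(
            new Edge("graphs", "ARE", "everywhere"),
            // A foreign key read as an edge: an order row pointing to a customer row.
            new Edge("order_42", "PLACED_BY", "customer_7"));
        System.out.print(toDot(edges));
    }
}
```

Saving the printed text to a file and running it through `dot -Tsvg` should produce an image that any modern browser can display.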
Kite drying, after a Greenland ice cap crossing, 2002

Monday, August 5, 2013

Scala: 6 silver bullets

Bye bye Java, Hello Scala

In the JVM world, Scala is certainly the rising star. Created at EPFL in 2001, it is steadily gaining in popularity. Depending on the index, it now ranks as a "serious" language, reaching far beyond the academic world and adopted by mainstream companies (Twitter backend, eBay research, Netflix, Foursquare etc.).
For data scientists, this language is a breeze. Rising above the religious war between functional and object-oriented believers, it succeeds by merging the best of both worlds, with a strong drive towards "let's be practical."
If Grails/Groovy was a big step forward in productivity on the JVM, Scala goes even further, mixing static typing (and thus efficiency) with many improvements in language structure, collection handling and concurrency, backed by solid frameworks and a very active community.
In this post, I have picked six major (and subjective) improvements to show my hardcore Java colleagues how jumping on this train promises a great journey.

Sunday, January 20, 2013

Sparse bitset in scala, benchmarking 5 implementations

Recently, in a Scala project, I hit a problem with sparse bit sets. Typically, I was moving around sets of 30 to 50 integers between 0 and 2047, with operations such as xor, and, circular shift etc. These operations are basic but, repeated extensively, they were soon to become a bottleneck in my application. The first implementation used Scala BigInt (mainly because of its straightforward shift call), but it later appeared that other solutions were better suited to the problem.
In this post, I will profile basic operations on five alternative implementations: Scala BigInt, mutable and immutable BitSet, versus Java BigInteger and BitSet (Java types can of course be used inside a Scala application).
Source code is available on GitHub
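For readers unfamiliar with the operations involved, the following is a minimal sketch, not the benchmark code from the repository, of xor and circular shift on a 2048-bit universe, using two of the candidates named above: Java `BigInteger` and `BitSet`. The class name `SparseBitsSketch`, its helper methods and the sample values are hypothetical and only serve the illustration; inputs are assumed to be non-negative and below 2048.

```java
import java.math.BigInteger;
import java.util.BitSet;

class SparseBitsSketch {
    // Size of the universe: values range over 0..2047, as described in the post.
    static final int WIDTH = 2048;
    // Mask with the low WIDTH bits set, used to drop bits shifted past the top.
    static final BigInteger MASK =
        BigInteger.ONE.shiftLeft(WIDTH).subtract(BigInteger.ONE);

    // Build a BigInteger whose set bits are the given values.
    static BigInteger toBigInteger(int... values) {
        BigInteger acc = BigInteger.ZERO;
        for (int v : values) {
            acc = acc.setBit(v);
        }
        return acc;
    }

    // Build a BitSet whose set bits are the given values.
    static BitSet toBitSet(int... values) {
        BitSet bs = new BitSet(WIDTH);
        for (int v : values) {
            bs.set(v);
        }
        return bs;
    }

    // Circular left shift by k positions within WIDTH bits (BigInteger flavor).
    static BigInteger rotateLeft(BigInteger x, int k) {
        int s = Math.floorMod(k, WIDTH);
        return x.shiftLeft(s).or(x.shiftRight(WIDTH - s)).and(MASK);
    }

    // Circular left shift by k positions within WIDTH bits (BitSet flavor).
    // Only the set bits are visited, which suits sparse sets.
    static BitSet rotateLeft(BitSet x, int k) {
        int s = Math.floorMod(k, WIDTH);
        BitSet out = new BitSet(WIDTH);
        for (int i = x.nextSetBit(0); i >= 0; i = x.nextSetBit(i + 1)) {
            out.set((i + s) % WIDTH);
        }
        return out;
    }

    // Non-destructive xor: BitSet.xor mutates its receiver, so work on a copy.
    static BitSet xor(BitSet a, BitSet b) {
        BitSet out = (BitSet) a.clone();
        out.xor(b);
        return out;
    }

    public static void main(String[] args) {
        BigInteger a = toBigInteger(3, 100, 2047);
        BigInteger b = toBigInteger(3, 500);
        System.out.println(a.xor(b).bitCount());                   // prints 3
        System.out.println(rotateLeft(toBitSet(3, 100, 2047), 1)); // prints {0, 4, 101}
        System.out.println(xor(toBitSet(3, 100), toBitSet(3, 500))); // prints {100, 500}
    }
}
```

The two flavors differ in cost profile: the `BigInteger` shift allocates new full-width values on each call, while the `BitSet` loop touches only the bits that are set. Which one wins in practice is exactly what a benchmark has to settle.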

Tuesday, December 11, 2012

Sustainable JavaScript

JavaScript is certainly one of the most popular programming languages [1] [2] [3], and the ever-increasing popularity of complex web applications will not reverse that trend anytime soon.
Despite some intrinsic flaws of the language and its environment, practices have emerged to turn JavaScript into a clearer, more structured, deployable and testable environment. In this post, I will go through some of them. The selection is subjective: it reflects techniques we use daily in large-scale projects that compute and display rich information in bioinformatics.

I will mainly give a short introduction to underscore.js, require.js and jasmine, the tools that recently made my days shine again. I will skip discussions of jQuery, Chrome Developer Tools, HTML5 and backbone, which are also invaluable, among many others.

Example code, usable as is or in Eclipse (Aptana Studio is perfect), is available on Google Drive
Part of a large-scale tech jobs ad campaign in the Bay Area

Saturday, August 25, 2012

Continuous Deployment in Perl: Code & Folks



Pre-review version, published in the Perl issue of the Software Developer's Journal (May 2012)
This article was written together with my friend Pierre-Antoine Queloz (paqueloz@gmail.com)


Continuous Integration is the tactic of decreasing the latency between the implementation of a new piece of code and its integration into the overall project. It is the backbone of Continuous Deployment, which is often defined as releasing software very frequently in order to satisfy customer needs and get their feedback as soon as possible. Both have shown their benefits and play an important role in the success of the current Agile software development trend.

Monday, December 19, 2011

Comparing hostip and geoip

What is the best IP address geolocation system?

Seeking to analyze web log files by country of request origin, I recently proposed a simple plugin wrapping the hostip.info API. Of course, the major competitor in the field is MaxMind GeoIP, and they provide a free light version of their database. This database can be queried through their Java API or a geoip Grails plugin.
The time has come for numbers. How does each module perform on the same set of IP addresses? Even with 3% uncertainty, MaxMind GeoIP lite proves to be clearly the best.