Saturday, May 10, 2014

Viewing graphs in the browser

(graphs)-[:ARE]->(everywhere)

Entities linked through relations are everywhere. Social networks, biological entities, publications, or any database tables joined by a foreign key are just a few of the countless examples, yet they are often not represented as graphs. Many tools exist to visualize them, and this post introduces my five favorites for interacting with a graph in a modern web browser: graphviz, neo4j, cytoscape.js, sigma.js and d3.js.
We will by no means cover all the cool features of these tools (hey, this is a simple blog post, not a book!) but rather give a short introduction to each of them. We will focus on small graphs with force layout methods, as these are the most general purpose.
Kite drying, after a Greenland ice cap crossing, 2002

 

Materials

Some companion code can be found here. Open web/index.html in a web browser to view actual examples (because of ajax calls and web workers, you must serve the directory through some kind of http server, such as the one built into an IDE or apache httpd).

Generating random graphs

For the sake of the demonstration, it is easier to work on fake graphs, with minimal information, adaptable sizes and no business knowledge. Vertices and edges contain different types of properties to illustrate some of the tools' functionality.
Running the perl script as perl/build-random-graph.pl /tmp/aa 40 3 will create a graph with 40 nodes, 3 disconnected clusters and some more densely connected subsets of nodes, and save various files with the /tmp/aa prefix. See more in the github readme file.

Nodes

Each vertex contains several properties:
  • a unique id like "n_123";
  • a category, which is in fact a color name;
  • a type, either "X" or "Y";
  • a numerical size, uniformly distributed between 0 and 100;
  • a cluster name, like "c_12";
  • a description, generated as a lorem ipsum sentence.

Edges

 An edge links two nodes and contains the properties:
  • a from id referring to a node id;
  • a to id;
  • a cluster name;
  • a probability value, between 0 and 1;
  • a rate numerical value, uniformly distributed between 0 and 100.
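
The companion perl script is not reproduced here. As a rough sketch of the data model above, the following JavaScript builds a comparable random graph. The names buildRandomGraph and makeRng, the color palette, the placeholder description and the rule used to draw edges are all assumptions made for illustration, not the script's actual logic (in particular, the "more densely connected subsets" are not modeled).

```javascript
// Small seeded pseudo-random generator (mulberry32), so runs are reproducible.
function makeRng(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const COLORS = ["red", "blue", "green", "orange", "purple"]; // hypothetical palette

function buildRandomGraph(nNodes, nClusters, edgesPerNode, seed) {
  const rand = makeRng(seed);
  const pick = (arr) => arr[Math.floor(rand() * arr.length)];

  const nodes = [];
  for (let i = 0; i < nNodes; i++) {
    nodes.push({
      id: "n_" + i,
      category: pick(COLORS),
      type: rand() < 0.5 ? "X" : "Y",
      size: rand() * 100,
      cluster: "c_" + (i % nClusters),
      description: "Lorem ipsum dolor sit amet." // placeholder text
    });
  }

  // Edges only join nodes of the same cluster, so clusters stay disconnected.
  const edges = [];
  for (const from of nodes) {
    const peers = nodes.filter((n) => n.cluster === from.cluster && n !== from);
    for (let k = 0; k < edgesPerNode && peers.length > 0; k++) {
      const to = pick(peers);
      edges.push({
        id: "e_" + edges.length,
        from: from.id,
        to: to.id,
        cluster: from.cluster,
        probability: rand(),
        rate: rand() * 100
      });
    }
  }
  return { nodes, edges };
}

const graph = buildRandomGraph(40, 3, 2, 42);
console.log(graph.nodes.length, "nodes,", graph.edges.length, "edges");
```

The result has the same { nodes, edges } shape as the json file used further down, so it can feed the browser examples directly.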

 

Graphviz

King comes first! Graphviz is a very comprehensive tool. From their own web site:
"Graphviz is open source graph visualization software. Graph visualization is a way of representing structural information as diagrams of abstract graphs and networks. It has important applications in networking, bioinformatics,  software engineering, database and web design, machine learning, and in visual interfaces for other technical domains."
Even though graphviz allows a rare variety of customization and provides a powerful algorithm to draw near-optimal edge paths, the result it produces is often seen as static. It can be viewed in the graphviz application or exported to various formats.

Formatting the graph

Although it accepts several input formats, the native dot one is both simple and comprehensive. It allows customization at the global level (graph, node, edge) or for a single entity.

The random graph builder described above produces a /tmp/aa.dot file which contains:
graph random{
  node [
    style=filled,
    fontsize=25
       ];
n_0 [color=red, width=1.01031734877059, height=1.01031734877059, tooltip="Fugit quia mollitia sint ullam est.", href="http://www.graphviz.org"]
...

n_0 -- n_6 [label=e_0]
...

}
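
To make the mapping from graph properties to dot attributes concrete, here is a minimal JavaScript sketch that serializes a { nodes, edges } object into the format shown above. The function names toDot and dotQuote are hypothetical, and the node side length 0.5 + size/100 is only a guess that happens to be consistent with the sample values in this post; the perl script may compute it differently. The href attribute is left out.

```javascript
// Quote a string as a double-quoted dot attribute value.
// Only double quotes are escaped here, which is enough for plain sentences.
function dotQuote(s) {
  return '"' + String(s).replace(/"/g, '\\"') + '"';
}

function toDot(graph) {
  const lines = ["graph random{", "  node [style=filled, fontsize=25];"];
  for (const n of graph.nodes) {
    const side = 0.5 + n.size / 100; // assumed scaling of size to inches
    lines.push(
      `${n.id} [color=${n.category}, width=${side}, height=${side}, ` +
      `tooltip=${dotQuote(n.description)}]`
    );
  }
  for (const e of graph.edges) {
    // "--" is the undirected edge operator of the dot language
    lines.push(`${e.from} -- ${e.to} [label=${e.id}]`);
  }
  lines.push("}");
  return lines.join("\n");
}

const sample = {
  nodes: [
    { id: "n_0", category: "red", size: 51.03, description: "Fugit quia mollitia sint ullam est." },
    { id: "n_6", category: "blue", size: 10, description: "Lorem ipsum." }
  ],
  edges: [{ id: "e_0", from: "n_0", to: "n_6" }]
};
console.log(toDot(sample));
```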

Heading to the browser

Any software can output this type of text format.  It can then be transformed into SVG by the graphviz dot binary:
dot -Tsvg /tmp/aa.dot  >/tmp/aa-graphviz.svg
The color, height and width attributes control the appearance of the nodes, while tooltip and href react to mouse over and click.
Edges and nodes can of course also be decorated with more attributes (see the documentation).
Opening the SVG in a browser, or even better, embedding it in a web page, gives a rich and interactive graph view.

Yes, but it's not really interactive...

Of course, the graphviz architecture does not make it easy to add nodes or move them around. But the maturity of the software sometimes makes it an appealing solution.

 

Neo4j 

Neo4j is one of the most elaborate graph database systems. Quoted from their web site:
"Graphs for Everyone
Neo4j is the world's leading graph database. People everywhere are using Neo4j to find graphs in every industry, connecting data to make sense of everything."
Although it is not primarily a tool to view data, it offers an interesting way to do so. Moreover, its ease of use (I could get it up and running and start playing with my data within a few minutes) makes it an attractive option.

Formatting the data

Several routes are possible to import data. Recent improvements show a great gain of speed for bulk upload, but for our simple example we can stick to the cypher syntax. Cypher allows you to insert, query, remove and otherwise interact with the database.
The random graph builder described above produces a /tmp/aa-cypher.txt file looking like:
CREATE (n_0:MyNode {description:'Fugit quia mollitia sint ullam est.', category:'red', cluster:'c_0', size:51.0317348770588})
...

CREATE  (n_0)-[:MyLink {rate:21.4877748514894}]->(n_6)
...

;
It can be imported into the database via the web console or from the command line:
bin/neo4j-shell -file /tmp/aa-cypher.txt
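
For illustration, here is a minimal JavaScript sketch producing the same kind of cypher text from a { nodes, edges } object. The function names toCypher and cypherString are hypothetical and this is not the perl script's code. All CREATE clauses belong to one single statement, closed by the final semicolon, so that the identifiers n_0, n_6 and so on are still bound when the relationships are created.

```javascript
// Escape a value as a single-quoted Cypher string literal.
function cypherString(s) {
  return "'" + String(s).replace(/\\/g, "\\\\").replace(/'/g, "\\'") + "'";
}

function toCypher(graph) {
  const lines = [];
  for (const n of graph.nodes) {
    lines.push(
      `CREATE (${n.id}:MyNode {description:${cypherString(n.description)}, ` +
      `category:${cypherString(n.category)}, cluster:${cypherString(n.cluster)}, ` +
      `size:${n.size}})`
    );
  }
  for (const e of graph.edges) {
    lines.push(`CREATE (${e.from})-[:MyLink {rate:${e.rate}}]->(${e.to})`);
  }
  lines.push(";"); // one terminator for the whole statement
  return lines.join("\n");
}

console.log(toCypher({
  nodes: [
    { id: "n_0", description: "Fugit quia mollitia sint ullam est.", category: "red", cluster: "c_0", size: 51.03 },
    { id: "n_6", description: "Lorem ipsum.", category: "blue", cluster: "c_0", size: 12.5 }
  ],
  edges: [{ from: "n_0", to: "n_6", rate: 21.49 }]
}));
```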

Viewing data

REST calls can be made and graphs exported to json or TSV to be displayed by other tools, but the neo4j web application also offers a way to interact with the data.
To display all the nodes of cluster 'c_0', simply enter the cypher query:
MATCH (n) WHERE n.cluster = 'c_0' RETURN n;
Then, clicking on a node or an edge will display the details.

 

Cytoscape

Cytoscape was originally an open source desktop application.
"Cytoscape is an open source software platform for visualizing molecular interaction networks and biological pathways and integrating these networks with annotations, gene expression profiles and other state data. Although Cytoscape was originally designed for biological research, now it is a general platform for complex network analysis and visualization."

 Cytoscapeweb & Cytoscape.js

A desktop application doesn't fit today's post, although the tool is pretty powerful and has gained a lot of success. However, it was also released in a browser-oriented version, cytoscapeweb. This first implementation was powered by flash but allowed interactions with JavaScript.
A more recent project has since emerged: cytoscape.js, based on html5. We will focus here on this latter project.

Input Data (code)

We are back here in a classic JavaScript world, so the basic json file /tmp/aa.json contains it all:
{
   "nodes" : [
      {
         "cluster" : "c_0",
         "type" : "Y",
         "category" : "blue",
         "id" : "n_0",
         "description" : "Assumenda atque atque nostrum veniam quia sunt.",
         "size" : 71.9857610890433
      },

      ...
   ],
   "edges" : [
      {
         "rate" : 24.1752238846782,
         "cluster" : "c_0",
         "to" : "n_9",
         "from" : "n_0",
         "probability" : 0.254679689733507,
         "id" : "e_0"
      },
      ...

   ]
}
Nevertheless, we must tweak the format a little to adapt it to the library. With the help of underscore.js, we can transform the loaded graph:
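var cyGraph = {
    // cytoscape.js expects the properties of each element under a "data" key
    nodes: _.map(graph.nodes, function (n) {
        return {data: n};
    }),
    // edge endpoints must be named "source" and "target"
    edges: _.map(graph.edges, function (e) {
        return {data: {source: e.from, target: e.to, rate: e.rate}};
    })
};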


Using cytoscape.js

Without unrolling the complete documentation, here are a few cool things:
  • layout:{name:'arbor'} is the force-directed layout. It uses web workers for efficiency, which is an interesting implementation.
  • Mappers allow direct, linear or more elaborate transformations of your node/edge properties into style values:
    'height': 'mapData(size, 0, 100, 20, 60)',
    'background-color': 'data(category)',
    'line-color': 'mapData(rate, 0, 100, blue, red)'
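
To show what such a linear mapper computes, here is a small standalone JavaScript sketch. It is not the cytoscape.js implementation; mapLinear and mapColor are hypothetical names, and the clamping of out-of-range values follows my reading of the documentation.

```javascript
// Linear mapping of a data value onto an output range, clamped at both ends.
function mapLinear(value, inMin, inMax, outMin, outMax) {
  const t = Math.min(1, Math.max(0, (value - inMin) / (inMax - inMin)));
  return outMin + t * (outMax - outMin);
}

// Same idea for colors, given as [r, g, b] triples.
function mapColor(value, inMin, inMax, fromRgb, toRgb) {
  const rgb = fromRgb.map((c, i) =>
    Math.round(mapLinear(value, inMin, inMax, c, toRgb[i]))
  );
  return `rgb(${rgb.join(",")})`;
}

// Counterpart of 'mapData(size, 0, 100, 20, 60)' for a node of size 50
console.log(mapLinear(50, 0, 100, 20, 60)); // 40
// Counterpart of 'mapData(rate, 0, 100, blue, red)' for an edge of rate 25
console.log(mapColor(25, 0, 100, [0, 0, 255], [255, 0, 0])); // rgb(64,0,191)
```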
There is of course way more to the library: more layouts, more node/edge properties to tune, dynamic addition and deletion of nodes. And even more that I have not (yet) discovered...

 

Sigma.js

This library is something of an outlier in this post, as I have not been very fond of its ability to arrange nodes efficiently. However, it is a very useful companion to the desktop application gephi, a powerful tool to arrange large data sets with various layout schemes.
For the following example, the arctic data set (1715 nodes and 6676 edges) is imported and the ForceAtlas 2 layout is used in gephi to arrange the nodes.
Gephi graph (left) imported and displayed by sigma.js (right)

Importing Data (code)

Data can be arranged with gephi through the ForceAtlas 2 method. With large graphs, gently adjusting the various parameters can speed up the convergence process. The graph can then be exported to the gexf format and imported into sigma via a simple piece of code:
sigma.parsers.gexf(
    'data/arctic.gexf', {
        container: 'graph'
    });
Even though sigma.js offers layout algorithms mirroring those of gephi, I have personally found them buggy and very slow.

D3.js

Mike Bostock's d3.js library is certainly the major general-purpose visualization library available for the browser. Browsing its incredible gallery is the best way to be convinced (and inspired).
"D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG and CSS. D3’s emphasis on web standards gives you the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components and a data-driven approach to DOM manipulation."

JavaScript code (code)

The code linked as an example is largely inspired by the d3 documentation.
As for cytoscape, we need to transform our json graph into a d3-ready structure. This means setting the node objects as the source and target attributes of the links:
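// index the nodes by id
var nodes = {};
_.each(graph.nodes, function (n) {
    nodes[n.id] = n;
});
// attach the node objects as source/target (note: _.extend modifies e in place)
var links = graph.edges.map(function (e) {
    return _.extend(e, {
        source: nodes[e.from],
        target: nodes[e.to]
    });
});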

Then build the graph object:
        force
            .nodes(_.values(nodes)) //_.values takes the hash values
            .links(links)
            .start();
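
To give an intuition of what a force layout does once started, here is a naive standalone JavaScript sketch in the spirit of the Fruchterman-Reingold algorithm: all nodes repel each other, linked nodes attract each other, and the maximal step shrinks at each iteration. This is a teaching sketch under my own assumptions (the name naiveForceLayout, the constants and the circular starting positions are all mine); d3's real implementation is considerably more refined and faster.

```javascript
// nodes: [{id}], links: [{source: id, target: id}]; returns a Map id -> {x, y}
function naiveForceLayout(nodes, links, { size = 100, iterations = 200 } = {}) {
  const k = size / Math.sqrt(nodes.length); // ideal edge length
  // Deterministic start: spread the nodes on a circle.
  const pos = new Map(nodes.map((n, i) => {
    const a = (2 * Math.PI * i) / nodes.length;
    return [n.id, { x: (size / 2) * Math.cos(a), y: (size / 2) * Math.sin(a) }];
  }));

  for (let it = 0; it < iterations; it++) {
    const temperature = (size / 10) * (1 - it / iterations); // max step, cooling down
    const disp = new Map(nodes.map((n) => [n.id, { x: 0, y: 0 }]));

    // Repulsion between every pair of nodes: k^2 / distance.
    for (let i = 0; i < nodes.length; i++) {
      for (let j = i + 1; j < nodes.length; j++) {
        const a = pos.get(nodes[i].id), b = pos.get(nodes[j].id);
        const dx = a.x - b.x, dy = a.y - b.y;
        const d = Math.max(Math.hypot(dx, dy), 0.01);
        const f = (k * k) / d;
        const da = disp.get(nodes[i].id), db = disp.get(nodes[j].id);
        da.x += (dx / d) * f; da.y += (dy / d) * f;
        db.x -= (dx / d) * f; db.y -= (dy / d) * f;
      }
    }
    // Attraction along each link: distance^2 / k.
    for (const l of links) {
      const a = pos.get(l.source), b = pos.get(l.target);
      const dx = a.x - b.x, dy = a.y - b.y;
      const d = Math.max(Math.hypot(dx, dy), 0.01);
      const f = (d * d) / k;
      const da = disp.get(l.source), db = disp.get(l.target);
      da.x -= (dx / d) * f; da.y -= (dy / d) * f;
      db.x += (dx / d) * f; db.y += (dy / d) * f;
    }
    // Move each node along its net force, limited by the temperature.
    for (const n of nodes) {
      const p = pos.get(n.id), v = disp.get(n.id);
      const len = Math.max(Math.hypot(v.x, v.y), 0.01);
      const step = Math.min(len, temperature);
      p.x += (v.x / len) * step;
      p.y += (v.y / len) * step;
    }
  }
  return pos;
}

const layout = naiveForceLayout(
  [{ id: "a" }, { id: "b" }, { id: "c" }, { id: "d" }],
  [{ source: "a", target: "b" }, { source: "c", target: "d" }]
);
console.log(layout.get("a"), layout.get("b"));
```

The pairwise repulsion loop is quadratic in the number of nodes, which is one reason why real libraries rely on smarter approximations for anything but small graphs.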


Much more with d3.js

Staying with the force layout (d3 provides much more), it is possible to produce curved edges for more clarity, fisheye distortion to view details, or a matrix diagram.
But even more, d3 allows you to customize any aspect of nodes or edges by drawing virtually any type of diagram linked to the data. The sky is the limit!
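
As a small illustration of the curved edges mentioned above, a pattern commonly seen in d3 examples draws each link as an SVG arc whose radius equals the distance between its two nodes. The sketch below is standalone JavaScript; the function name linkArc is my own choice, and it assumes that the source and target of a link are node objects carrying x and y coordinates, as is the case once the force layout runs.

```javascript
// SVG path data for a link drawn as a circular arc between its two end nodes.
function linkArc(link) {
  const dx = link.target.x - link.source.x;
  const dy = link.target.y - link.source.y;
  const r = Math.sqrt(dx * dx + dy * dy); // arc radius = distance between the nodes
  return `M${link.source.x},${link.source.y}` +
         `A${r},${r} 0 0,1 ${link.target.x},${link.target.y}`;
}

console.log(linkArc({ source: { x: 0, y: 0 }, target: { x: 3, y: 4 } }));
// M0,0A5,5 0 0,1 3,4
```

The returned string is meant to be set as the d attribute of an SVG path element each time the layout updates the node positions.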

 

Conclusions: TIMTOWTDI

Each of the presented tools clearly has some strengths.
Graphviz can be preferred when edge paths are important and when high customization of all characteristics is needed, but interactivity is not crucial.
The neo4j viewer is better suited to debugging/exploring an existing database and is certainly the simplest tool in these situations.
Cytoscape.js offers an interesting, highly customizable alternative. It can also be chosen for a simple graph viewing web application, as it does not depend on more complex libraries to learn, such as d3.js.
The main advantage of sigma.js is the handling of large networks, and it can be a companion for overviews to a tool like gephi. Exporting from gephi directly to SVG could be an option to consider, and may produce better looking graphs.
Finally, d3.js does not stand out in the graph layout landscape in terms of capabilities, but the library offers the richest toolkit available to display graphical elements on a web page. To that extent, benefiting from the power of d3 for other components in an application can be a big win. In the longer term, the momentum around this project and the solidity of its community make d3.js a strong choice.

Post scriptum

 While I was giving a talk on the topic, a few other tools popped up. Among the most interesting is the R/d3 wrapper http://christophergandrud.github.io/d3Network/.
