Tuesday, December 11, 2012

Sustainable JavaScript

JavaScript is certainly one of the most popular programming languages [1] [2] [3], and the ever-increasing popularity of complex web applications will not reverse that trend any time soon.
Despite some intrinsic flaws of the language and its environment, practices have emerged that turn JavaScript into a clearer, more structured, deployable and testable environment. In this post, I will go through some of them. This is a subjective point of view: it reflects techniques we use daily in large-scale projects that compute and display rich information in bioinformatics.

Here I will mainly give a short introduction to underscore.js, require.js and jasmine, the tools that recently made my days shine again. I will skip jQuery, the Chrome developer tools, HTML5 and Backbone, which are also invaluable, among many others.

Example code, usable as is or in Eclipse (Aptana Studio is perfect), is available on Google Drive.
Part of a large scale tech jobs ad campaign in the Bay area

-"JavaScript sucks"

Despite its ubiquity, JavaScript is often cited as a much-hated language in the developer community. Cryptic, unreadable, poorly organized, not portable and counter-intuitive are among the most recurrent criticisms.
And when the question turns to how to adopt good practices (code organization, testing, continuous deployment), the answer is: "yep, but we found no one-stop shop, we tried a couple of them and... we gave up."
To learn more about some flaws of the language itself, visit JavaScript: the real bad parts, or if you want a really heated argument about a single semicolon, go here, thanks to Andrew Dupont.

-"but JavaScript is magic"

Looking at applications such as Gmail, Google Docs, Cloud9 IDE, etc., the sky is the limit. Just think of modern CSS features, HTML5, WebGL. In scientific applications, we can use its power for graphical rendering and even for some moderately intensive computations that would otherwise require a round trip to the server.
And the language is escaping from the browser to the server side, with the rise of NoSQL databases (MongoDB, CouchDB) and Node.js.
The ease of deployment (a web browser is the main target), the intrinsic freedom of a dynamic language, the galaxy of available libraries, techniques and frameworks, and the heterogeneity of the community make it hard to find a simple path for our daily life as programmers.

If you want to go further, I recommend JavaScript: The Good Parts, by Douglas Crockford, a short but effective book.

What is sustainability?

"Sustainability in a general sense is the capacity to support, maintain or endure [...] In ecology, sustainability describes how biological systems remain diverse, robust, and productive over time." (wikipedia)
If we go back to software development, sustainability could encompass the abilities:
  • to adapt to the environment: use productive libraries to improve the language itself;
  • to evolve: refactor, survive moving targets;
  • to be robust: enforced by testing;
  • to meet the environment and the customer: continuous deployment.
Here I propose some personal solutions that I have recently converged on. There are surely plenty of other paths to the same goal.

Illustration: the problem of the day

We need an example to illustrate this post. I will not go for yet another ToDo list but head for a bioinformatics problem, as it is my daily bread. Don't panic, it should be rather straightforward:
  • a protein is a string of upper case letters (they represent 20 amino acids),
  • this protein can be digested by a cleavage enzyme, which for us is equivalent to a regular expression, and produces a list of substrings named peptides; trypsin is a typical cleavage enzyme,
  • peptides are also strings of upper case letters, with an integer offset (their position relative to the protein).
So, we'll have a page with a textarea containing a protein sequence and a button to launch the digestion process, which keeps only the peptides longer than a given length and reports them in a list.
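To make the data model concrete before bringing in any library, here is a minimal, dependency-free sketch. The function name digest and its minLength parameter are my own illustration (they are not part of the example code), and the rule is deliberately simplified: it cuts after every K or R and ignores the proline exception discussed later.

```javascript
// Hypothetical helper, not part of the original example code.
// Simplified rule: cut after every K or R (the proline exception is ignored here).
function digest(proteinSequence, minLength) {
  var sequence = proteinSequence.replace(/\s+/g, ''); // drop whitespace and line breaks
  var peptides = [];
  var start = 0;
  for (var i = 0; i < sequence.length; i++) {
    var isCleavageSite = sequence[i] === 'K' || sequence[i] === 'R';
    var isLast = i === sequence.length - 1;
    if (isCleavageSite || isLast) {
      // a peptide is a substring plus its offset in the protein
      peptides.push({ sequence: sequence.slice(start, i + 1), offset: start });
      start = i + 1;
    }
  }
  // keep only the peptides with at least minLength letters
  return peptides.filter(function (p) {
    return p.sequence.length >= minLength;
  });
}

var peptides = digest('MALWMRLLPK\nTRGFFYTPK', 5);
```

With this input, the short fragments LLPK and TR are dropped, and MALWMR (offset 0) and GFFYTPK (offset 12) are kept.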

Available libraries: the 2012 winner is... underscore.js

Among jQuery, Backbone, Bootstrap and HTML5 wrappers, I had to pick the one that changed my day: underscore.js
Core JavaScript is rather limited, and we tend to re-invent the wheel again and again: sorting a list, extracting the same attribute from a list of maps, checking validity, looping, filtering, getting the minimum value, testing membership. Zipping, shuffling or intersecting arrays. Mapping data to HTML templates, and so on.
Underscore is a neat little library that provides plenty of those common constructs.

Let's have a list of peptides, sequence and offset:
var peptideList = [{
    sequence : "MALWMR",
    offset : 0
  }, {
    sequence : "LLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGER",
    offset : 6
  }, {
    sequence : "GFFYTPK",
    offset : 46
  }, {
    sequence : "TR",
    offset : 53
  }, {
    sequence : "R",
    offset : 55
  }, {
    sequence : "EAEDLQVGQVELGGGPGAGSLQPLALEGSLQK",
    offset : 56
  }, {
    sequence : "R",
    offset : 88
  }, {
    sequence : "GIVEQCCTSICSLYQLENYCN",
    offset : 89
}];

Extracting an attribute

We want to make an array with all the sequences:
var otherList = _.pluck(peptideList, "sequence");

Transforming each element

Each element into a composed string:
var otherList = _.collect(peptideList, function(pept) {
                  return pept.sequence + "(" + pept.offset + ")";
                });

Sorting on a function

Let's sort the list on the sequence length
var sorted = _.sortBy(peptideList, function(pept) {
                return pept.sequence.length;
            });

Chaining underscore functions

We can transform a structure by successive calls
_.chain(peptideList).
  groupBy(function(pept) {
     return pept.sequence
  }).
  each(function(g) {
     $("#target-4").append(g.length + ' x ' + g[0].sequence + '<br/>')
  });

Applying templates

Mapping an object to a piece of html is also a common pattern
var tmpl = "<b><%= sequence %></b> (<%= offset %>)";
//Underscore 1.4 signature; since 1.7, _.template(tmpl) returns a function to call with the data
var content =
  _.collect(peptideList, function(pept) {
        return _.template(tmpl, pept);
  }).join('<br/>');
There is much more to the library and a very good documentation on their web page.
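To make clear what these helpers return, here is a rough plain-JavaScript equivalent of pluck, sortBy and groupBy on a shortened peptide list. It is only a sketch of the semantics, not Underscore's implementation, and the variable names are my own.

```javascript
// Plain-JavaScript equivalents, for illustration only (not Underscore's source).
var peptides = [
  { sequence: 'GFFYTPK', offset: 46 },
  { sequence: 'TR', offset: 53 },
  { sequence: 'R', offset: 55 },
  { sequence: 'R', offset: 88 }
];

// like _.pluck(peptides, 'sequence'): one attribute per element
var sequences = peptides.map(function (p) {
  return p.sequence;
});

// like _.sortBy(peptides, length): returns a sorted copy, the input is untouched
var byLength = peptides.slice().sort(function (a, b) {
  return a.sequence.length - b.sequence.length;
});

// like _.groupBy(peptides, sequence): one array of peptides per distinct sequence
var groups = {};
peptides.forEach(function (p) {
  (groups[p.sequence] = groups[p.sequence] || []).push(p);
});
```

Here groups.R holds the two peptides sharing the sequence R, which is what the chained groupBy/each example counts.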

Organizing the source code: breaking into pieces

For our Protein/Enzyme/Peptide problem, we need three types of objects, residing in independent files for clarity's sake (they could later be organized in folders).

Protein.js

var Protein = function(sequence) {
    this.sequence = sequence.replace(/\s+/g, '');
    return this
};

Peptide.js

var Peptide = function(sequence, start){
    this.sequence=sequence;
    this.start= start;
    return this;
};

Peptide.prototype.length = function(){
    return this.sequence.length;
};

Trypsin.js

var Trypsin = function() {
    this.rule = '(.+?[KR])(?!P)';
};
/**
 * digest a protein sequence with the given regular expression and
 * return the list of peptides with more than 4 letters
 */
Trypsin.prototype.cleave = function(prot) {
    var re = new RegExp(this.rule, 'g');

    var offset = 0;
    var allPeptides = [];
    var m; //declared locally, to avoid leaking a global variable

    while ((m = re.exec(prot.sequence)) !== null) {
        allPeptides.push(new Peptide(m[0], offset));
        offset += m[0].length;
    }

    return _.filter(allPeptides, function(p) {
        return p.length() >= 5;
    });
};

Dependencies

These files have dependencies, either on each other (Trypsin instantiates new Peptide()) or on the external world (underscore.js, for example).
The classic setup would be to put the *.js files into a js/ folder and list them all, in the correct order, in the html <head>.
<head>
  <script src="js/lib/jquery-1.8.3.js"></script>
  <script src="js/lib/underscore.js"></script>
  <script src="js/Protein.js"></script>
  <script src="js/Peptide.js"></script>
  <script src="js/Trypsin.js"></script>
</head>

And a web page with some action

<body>
  <textarea id="prot-seq" style="width: 50em;height: 10em">
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAED
LQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN</textarea>
       <button id="but-digest" >digest</button>
       <h4>Peptides</h4> 
       <ul id="peptide-list"></ul>
       
       
       <script language="JavaScript">
           var ul = $('#peptide-list');
           var enzyme= new Trypsin();
           
           $('#but-digest').click(function(){
              ul.empty();
              var prot = new Protein($("#prot-seq").val());
              
              _.each(enzyme.cleave(prot), function(pept){
                  ul.append('<li>'+pept.sequence+'</li>');
              });
              
           });
           
       </script>
</body>

And we soon feel overwhelmed by:

  • the header size explodes (think of having >100 js files),
  • dependency management is a nightmare,
  • a page load needs one HTTP request per JavaScript file,
  • it becomes just ugly...

Require.js: a JavaScript file and module loader [source code]

The idea is simple:
  • "classes", singletons or text content are each defined in a resource file. A definition can depend on other resources: it is evaluated only once those dependencies are loaded, and it exports one entity (class, object, text, etc.).
  • a script can require some resources. Requirements can of course be chained.
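The two ideas above can be sketched with a toy, synchronous registry. This is only an illustration of the dependency resolution, not how Require.js works internally (the real library loads files asynchronously over the network). The names toyDefine, toyRequire and resolve are hypothetical, and circular dependencies are not handled.

```javascript
// Toy, synchronous module registry: an illustration only, NOT the Require.js implementation.
// toyDefine / toyRequire are hypothetical names, chosen to avoid confusion with the real API.
var factories = {}; // module name -> { deps, factory }
var cache = {};     // module name -> exported value (built at most once)

function toyDefine(name, deps, factory) {
  factories[name] = { deps: deps, factory: factory };
}

function resolve(name) {
  if (!(name in cache)) {
    var entry = factories[name];
    if (!entry) {
      throw new Error('unknown module: ' + name);
    }
    // build the dependencies first, then hand them to the factory
    cache[name] = entry.factory.apply(null, entry.deps.map(resolve));
  }
  return cache[name];
}

function toyRequire(deps, callback) {
  return callback.apply(null, deps.map(resolve));
}

var buildCount = 0;
toyDefine('Peptide', [], function () {
  buildCount++;
  return function Peptide(sequence, start) {
    this.sequence = sequence;
    this.start = start;
  };
});
toyDefine('Trypsin', ['Peptide'], function (Peptide) {
  return {
    first: function (seq) {
      return new Peptide(seq, 0);
    }
  };
});

var result = toyRequire(['Trypsin', 'Peptide'], function (trypsin, Peptide) {
  return trypsin.first('MALWMR') instanceof Peptide;
});
```

Even though Peptide is requested twice (once by Trypsin, once by the main script), its factory runs only once. That caching is what makes returning a singleton from a module safe.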

Simple define: Protein.js

define([], function() {
  var Protein = function(sequence) {
    this.sequence = sequence.replace(/\s+/g, '');
    return this
  };
  return Protein;
})

Define, depending on a library and a text template: Peptide.js

define(['underscore', 'text!templates/peptide-simple.html'], function(_, tmpl) {
  var Peptide = function(sequence, start) {
    this.sequence = sequence;
    this.start = start;
    return this;
  };

  Peptide.prototype.length = function() {
    return this.sequence.length;
  };

  //prettyString is based on a text underscore template
  Peptide.prototype.prettyString = function() {
    return _.template(tmpl, this)
  };

  //the exposed variable
  return Peptide;
});

And local dependencies: Trypsin.js

define(['underscore', 'mypackage/Peptide'], function(_, Peptide) {
  var Trypsin = function() {
    this.rule = '(?:[RK]|.+?(?:[RK]|$))(?=[^P]|$)';
  };

  Trypsin.prototype.cleave = function(prot) {
    [SNIP]
  };

  //we return a singleton (because require ensures that this resource will be loaded only once)
  return new Trypsin();
});

OK, now we have blocks that depend on each other, but we still have to put them together.

File organisation [source code]

Let's have a look at a possible file structure:
src/
  index.html
  js/
    app.js
    lib/
      jquery-1.8.3.js
      underscore.js
      require/
        domReady.js
        require.js
        text.js
    mypackage/
      Peptide.js
      Protein.js
      Trypsin.js
  templates/
    peptide-simple.html
More or less everything resides under src/js. This organisation is a personal choice: require does not constrain you to put files in a given hierarchy (you can put everything into a single root directory).

Defining path and main script: app.js

Let's now let require.js know where everything lies
//require configuration
requirejs.config({
  baseUrl : 'js',
  paths : {
    //define aliases, either to final resources
    jQuery : 'lib/jquery-1.8.3',
    underscore : 'lib/underscore',

    //a couple of useful require.js plugins
    text : 'lib/require/text',
    domReady : 'lib/require/domReady',

    //directory of underscore templates
    templates : '../templates'
  },
  /*
   * shim: Configure the dependencies and exports for older,
   * traditional "browser globals" scripts that do not use define()
   * to declare the dependencies and set a module value. Example (underscore 1.4+, jQuery)
   *
   * Ok, that's a little bit odd at first, but that's just a cut and paste once per project
   * shim also allows defining dependencies between requirements in one place.
   */
  shim : {
    'jQuery' : {
      exports : '$'
    },
    'underscore' : {
      exports : '_'
    }
  }
});

/*
 * state the requirements and the names they will have inside the function
 * $ will stand for the resource exposed by jQuery
 * Protein for the class exposed by Protein.js
 * trypsin (note the lower case, because I like it) stands for a singleton
 *
 * NB: we only list what we need in the "main" function, the other dependencies will be pulled in
 */
require(['domReady!', 'jQuery', 'underscore', 'mypackage/Protein', 'mypackage/Trypsin'], function(doc, $, _, Protein, trypsin) {
  var ul = $('#peptide-list');

  $('#but-digest').click(function() {
    ul.empty();
    var prot = new Protein($("#prot-seq").val());

    _.each(trypsin.cleave(prot), function(pept) {
      ul.append('<li>' + pept.prettyString() + '</li>');
    });
  });
});

Index.html


Finally, index.html contains a minimalist <head> section and the html elements (although the body could be populated later).

<html>
  <head>
    <!-- call the requirejs library and load the bootstrap file js/app.js-->
    <script data-main="js/app" src="js/lib/require/require.js"></script>
  </head>
  <body>
    <textarea id="prot-seq" style="width: 50em;height: 10em">
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAED
LQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN</textarea>
    <button id="but-digest" >digest</button>
    <h4>Peptides</h4>
    <ul id="peptide-list"></ul>
  </body>
</html>

Asynchronous download

The network tab of the developer tools shows how the files are loaded
app.js pulls in all the needed dependencies

Optimization for deployment with r.js

Require.js offers a clean way to organize a JavaScript project. Even though it provides cache-ready HTTP headers, the application can soon have hundreds of un-minified files to load.
Fortunately, r.js comes to the rescue.
Let's create a new directory deploy/ (also available within the example) holding the setup, with build/ as the target:
deploy/
  lib/
    r.js
  app.build.js
Install node, cd deploy/ and run
node lib/r.js -o app.build.js

Et voilà! All JavaScript files are packed into one big uglified build/js/app.js
build/index.html refers to it and makes the minimal number of requests:
The file app.build.js is quite similar to app.js. Well, I personally break the DRY principle here and certainly deserve a few dozen push-ups. DRY extremists can find their way to more factorization here ;)
There are many more options to r.js (optimize: 'none' is a good place to start if the packaged application does not work). You can find them here.
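For readers who have not seen a build profile, here is a sketch of what deploy/app.build.js could look like for the file layout above. It is not the file shipped with the example: the relative paths are my assumptions, and it uses the mainConfigFile option to reuse the paths and shim settings from app.js rather than copying them.

```javascript
// deploy/app.build.js: a sketch of a build profile, not the file from the example.
// All paths are assumptions based on the file layout shown earlier.
({
  appDir : '../src',                   // source tree to copy and optimize
  baseUrl : 'js',                      // relative to appDir
  dir : '../build',                    // output directory
  mainConfigFile : '../src/js/app.js', // reuse paths and shim from requirejs.config()
  optimize : 'uglify',                 // switch to 'none' to debug a broken build
  modules : [{
    name : 'app'                       // bundle app.js and everything it depends on
  }]
})
```

The modules entry names the main script, so the optimizer follows its dependencies and writes the packed result to build/js/app.js.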

Testing with jasmine

As often with JavaScript, There Is Also More Than One Way To Do It. I present here one unit testing framework, because it works well with require and allows for headless continuous integration with phantomjs. The source code also contains an example_jasmine project.

Setup

In a test directory:
test/
  index.html
  js/
    Peptide-tests.js
    Trypsin-tests.js
  lib/jasmine-1.3.1
      MIT.LICENSE
      jasmine-html.js
      jasmine.css
      jasmine.js

index.html

This is where jasmine is set up and the list of test suites is given. We use require.js, with its configuration embedded in the html page (once again, DRY could reduce some copy/paste).
<html>
  <head>
    <title>Jasmine Spec Runner</title>

    <link rel="shortcut icon" type="image/png" href="lib/jasmine-1.3.1/jasmine_favicon.png">
    <link rel="stylesheet" type="text/css" href="lib/jasmine-1.3.1/jasmine.css">

    <script type="text/javascript" src="lib/jasmine-1.3.1/jasmine.js"></script>
    <script type="text/javascript" src="lib/jasmine-1.3.1/jasmine-html.js"></script>
    <script src="../src/js/lib/require/require.js"></script>

  </head>

  <body>
    <H1>haha</H1>
    <script language="JavaScript">
      requirejs.config({
        //relative to the local directory
        baseUrl : '../src/js',
        paths : {
          jQuery : 'lib/jquery-1.8.3',
          underscore : 'lib/underscore',

          text : 'lib/require/text',
          domReady : 'lib/require/domReady',

          templates : '../templates',

          //directory where are the test suites
          suites : '../../test/js'
        },
        shim : {
          'jQuery' : {
            exports : '$'
          },
          'underscore' : {
            exports : '_'
          }
        }
      });

      var reqs = ['suites/Peptide-tests',
                  'suites/Trypsin-tests'];

      require(reqs, function() {
        jasmine.getEnv().updateInterval = 1000;

        jasmine.getEnv().addReporter(new jasmine.HtmlReporter());
        jasmine.getEnv().execute();

      });

    </script>
  </body>
</html>

Test files

In our example, two test files are located in the test/js directory.

Peptide-tests.js

OK, for the purpose of the demo, we check that the constructor works and that prettyString() returns a meaningful string
define(['mypackage/Peptide'], function(Peptide) {
  describe('Peptide', function() {
    it("Constructor", function() {
      expect(new Peptide('HAHAHA') instanceof Peptide).toBe(true);
    });
  });
  describe('prettyString', function() {
    //expectations must live inside an it() spec to be run and reported
    it("returns the formatted string", function() {
      expect(new Peptide('HAHAHA', 42).prettyString()).
             toEqual('<b>HAHAHA</b> (42)');
    });
  });
});

Trypsin-tests.js

This case is more interesting. A cleavage enzyme cuts a protein string after K or R, except if it is followed by a P. JavaScript regular expressions do not allow lookbehind (only lookahead), so putting together the regular expression is a bit more cumbersome.

Therefore, we can set up a check function and call it with a list of test sequences (and add more when we find bugs...)
define(['underscore', 'mypackage/Protein', 'mypackage/Trypsin'], function(_, Protein, trypsin) {
  describe('check cleavages', function() {
    var checkCleavages = function(comment, seq, expected, options) {
      var pepts = trypsin.cleave(new Protein(seq), options);
      it(comment + ' ' + seq + ' -> (' + expected.join(',') + ')', function() {
        expect(_.pluck(pepts, 'sequence')).toEqual(expected);
      });
    };

    checkCleavages('one long peptide', 'ABCDEFGHT', ['ABCDEFGHT']);
    checkCleavages('one short', 'ABC', []);
    checkCleavages('ABC, minlength=4', 'ABC', [], {
      minLength : 4
    });
    checkCleavages('ABC, minlength=3', 'ABC', ['ABC'], {
      minLength : 3
    });

    checkCleavages('cleave in 3', 'AAABDKAAEFGRAAHIJ', ['AAABDK', 'AAEFGR', 'AAHIJ']);
    checkCleavages('dont cut before P', 'ABDKPEFG', ['ABDKPEFG']);
    checkCleavages('last letter is a K', 'AAABDKAAEFGK', ['AAABDK', 'AAEFGK']);
  });
});
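The lookahead-only rule can also be tried on its own, outside Jasmine and Require.js. The sketch below reuses the rule string from the Require.js version of Trypsin.js; the helper cleaveSequences is a hypothetical name of mine, and it returns bare strings without any minimum length filter.

```javascript
// The rule string is the one from the Require.js version of Trypsin.js;
// cleaveSequences is a hypothetical helper written for this illustration.
var rule = '(?:[RK]|.+?(?:[RK]|$))(?=[^P]|$)';

function cleaveSequences(sequence) {
  var re = new RegExp(rule, 'g');
  var fragments = [];
  var m;
  // each match ends after a K or R not followed by P, or at the end of the sequence
  while ((m = re.exec(sequence)) !== null) {
    fragments.push(m[0]);
  }
  return fragments;
}

var plain = cleaveSequences('AAABDKAAEFGRAAHIJ');
var beforeProline = cleaveSequences('ABDKPEFG');
var consecutive = cleaveSequences('GFFYTPKTRR');
```

The first call gives AAABDK, AAEFGR and AAHIJ. The second keeps ABDKPEFG in one piece because the K is followed by a P. The third gives GFFYTPK, TR and R, which shows that the first alternative [RK] is needed to match a lone residue right after a cut.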

Running the tests

In a browser, simply open the page test/index.html. Green is good.
Don't forget to make some tests fail on purpose, just to see that everything is working ;)
It is also possible to select one test or one test subtree, to cycle faster on a subset.
a green jasmine report

(Much) more with jasmine

We have just presented a very simplified setup for unit testing. It is straightforward to use jasmine with more features:
  • many more testing functionalities here
  • set up a fake server for ajax calls with sinon (the response is based on the request method and url regular expressions)
  • launch unit tests from the command line with phantom.js for continuous integration (e.g. Jenkins)
  • use tests to display graphical views (e.g. SVG), to have a centralized page with all rendering widgets

Conclusions

In this post, we presented a few tools to make JavaScript development a bit easier, more adaptable, more sharable and more fun. Once more, it was only a subjective tour...
