Bye bye Java, Hello Scala
In the JVM world, Scala is certainly the rising star. Created at EPFL in 2001, it is strongly gaining in popularity. Depending on the index you look at, it now ranks as a "serious" language, reaching far beyond the academic world and adopted by mainstream companies (Twitter backend, eBay research, Netflix, Foursquare, etc.). For data scientists, this language is a breeze. Rising above the religious war between functional and object-oriented believers, it succeeds in merging the best of both worlds, with a strong "let's be practical" drive.
If Grails/Groovy was a big step forward in productivity on the JVM, Scala goes even further, mixing static typing (and thus efficiency) with many improvements in language structure, collection handling and concurrency, backed by solid frameworks and a very active community.
In this post, I have picked six major (and subjective) improvements, to show my hardcore Java colleagues why jumping on this train promises a great journey.
Bullet #1: Object orientation
Immutable variables
Variables declared with val are immutable, and val is the idiomatic default: once a value has been defined, you cannot change it (var declares a mutable variable). Immutability is key to the functional paradigm, where calling a function with the same arguments should always return the same value, since no state is stored. It can be disturbing at first, but one soon realizes that most of our code can be written with immutable variables:
val x:Int = 42
x=12 //error: reassignment to val
var i:Int = 0
i+=1 //OK
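To illustrate that a mutable counter is often unnecessary, here is a small sketch that computes the same sum twice: once with a var, once with vals only, using foldLeft. The values are arbitrary and chosen only for illustration.

```scala
val values = List(3, 5, 7)

// Imperative style: a mutable accumulator updated in a loop
var total = 0
for (v <- values) total += v

// Functional style: no reassignment, foldLeft threads the accumulator through
val totalImmutable = values.foldLeft(0)((acc, v) => acc + v)

println(total)          // 15
println(totalImmutable) // 15
```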
Type inference
Why specify the variable type wherever it can be inferred (apart from readability, for public methods for example)?
val x = 42 // x is an Int
Lists and maps
We'll go further on collections later on, but from the definition point of view:
val l1 = List(1,2,5) // List[Int]
val l2 = List(1,2,5.0) // List[Double]
val l3 = List(1,2,"x") // List[Any]
And maps follow the same approach:
val m1 = Map("x"->3, "y"->42) // Map[String, Int]
null is discouraged
In Java, null stands either for "does not exist" or for "actually has a null value". Scala defines a parameterized type for that: Option[T], which is either Some(value) or None (here, Option[Int]). That will make full sense below with pattern matching.
val m = Map("x"->3, "y"->42) // Map[String, Int]
println(m("x")) // -> 3
println(m("z")) // -> NoSuchElementException
println(m.get("x")) // -> Some(3)
println(m.get("z")) // -> None
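Even before pattern matching, an Option can be consumed safely with a few standard methods. A minimal sketch on the same map, using getOrElse to supply a default and map to transform a value only when it is present:

```scala
val m = Map("x" -> 3, "y" -> 42)

// getOrElse supplies a default when the key is absent
val z = m.get("z").getOrElse(0)
// Map also offers getOrElse directly
val z2 = m.getOrElse("z", 0)

// map transforms the value only if it is present
val doubledX = m.get("x").map(_ * 2) // Some(6)
val doubledZ = m.get("z").map(_ * 2) // None
```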
Tuples
It can be convenient to move around pairs or longer tuples of values without the need to define temporary classes:
val p = (1,4.0) // (Int, Double), i.e. Tuple2[Int, Double]
println(p._2) // -> 4.0
val l = List((1, 4.0, "x"), (2, 1.2, "y"))
// List[(Int, Double, String)]
println(l(0)._1) // -> 1
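Tuples are often more readable when destructured rather than accessed through _1, _2. A short sketch. minMax is a hypothetical helper that shows a function returning two results without a dedicated class:

```scala
// A function returning two results without a dedicated class
def minMax(xs: List[Int]): (Int, Int) = (xs.min, xs.max)

// Destructure the returned pair into two named vals
val (lo, hi) = minMax(List(4, 1, 9))

val l = List((1, 4.0, "x"), (2, 1.2, "y"))
// Destructure each triple inside a for comprehension, ignoring the Double
val labels = for ((i, _, name) <- l) yield s"$name$i"
```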
Class
class Peak(val x: Double, val intens: Double) {
  def sqrtIntens = math.sqrt(intens) // parameterless method
  lazy val hundred = { // evaluated once,
    println("hundred call") // when first needed
    new Peak(x, 100 * intens)
  }
  override def toString = s"[$x, $intens]"
}
And let's use our class
val p = new Peak(5, 3)
val i2 = p.sqrtIntens // no need for parentheses
println(p.hundred)
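To check that a lazy val really is evaluated only once, and only on first access, the println can be swapped for a counter. In this sketch initCount is a hypothetical variable added only to observe the initializer:

```scala
// Hypothetical counter, only here to observe when the lazy initializer runs
var initCount = 0

class Peak(val x: Double, val intens: Double) {
  lazy val hundred: Peak = {
    initCount += 1 // side effect standing in for the original println
    new Peak(x, 100 * intens)
  }
  override def toString = s"[$x, $intens]"
}

val p = new Peak(5, 3)
val before = initCount // creating p did not evaluate hundred
val h1 = p.hundred     // first access runs the initializer
val h2 = p.hundred     // second access reuses the cached value
val after = initCount
```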
Singleton object
Scala is truly object oriented. Why bother with static members and MyClass.getInstance()?
object SuperStat {
  def mean(values: List[Double]) = values.sum / values.size
  def variance(values: List[Double]) = {
    val mu = mean(values)
    val sum2 = values.map(x => math.pow(x - mu, 2))
      .sum
    sum2 / (values.size - 1) // returned value
  }
}
println(SuperStat.mean(List(1,2,3,4,5)))
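Since variance divides by values.size - 1, it computes the sample (bias-corrected) variance rather than the population variance. Written out, with a worked check on the list 1 to 5 used above:

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \mu)^2

% For x = 1, 2, 3, 4, 5 (n = 5):
\mu = \frac{15}{5} = 3, \quad
\sum_{i=1}^{5} (x_i - \mu)^2 = 4 + 1 + 0 + 1 + 4 = 10, \quad
s^2 = \frac{10}{4} = 2.5
```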
Companion object
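If an object is defined in the same source file as a class, with the same name, it becomes the class's companion object and has a "special" relation to it. It can access the class's private members, and it is the natural home for factory methods and implicit conversions.
import scala.language.implicitConversions // avoids the feature warning
object Peak {
  implicit def fromPair(p: (Double, Double)): Peak =
    new Peak(p._1, p._2)
}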
val p:Peak = (1.3, 42.0)
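The factory role can be sketched too. A companion's apply method lets clients write Peak(...) without new, and it can use a private constructor. The non-negative intensity check is a hypothetical rule added only to show why a factory can be useful:

```scala
// The constructor is private: instances can only be created through the companion
class Peak private (val x: Double, val intens: Double) {
  override def toString = s"[$x, $intens]"
}

object Peak {
  // Peak(...) calls Peak.apply(...), which may access the private constructor
  def apply(x: Double, intens: Double): Peak = {
    require(intens >= 0, "intensity must be non-negative") // hypothetical validation rule
    new Peak(x, intens)
  }
}

val p = Peak(1.3, 42.0) // no `new` needed
```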
Traits
Why shouldn't an interface be able to implement methods, and why shouldn't a class inherit from multiple implemented parents? While abstract classes still exist, a trait plays a more general role. It can declare abstract methods, to be defined in the implementing class, as well as concrete methods and fields (but, in Scala 2, no constructor parameters).
trait HasIntensity {
  def intens: Double
  def sqrtIntens = math.sqrt(intens)
}
class Peak(val x: Double, val intens: Double) extends HasIntensity { /* ... */ }
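Inheriting from several implemented parents simply means mixing in more than one trait with `with`. In this sketch, HasPosition is a hypothetical second trait. The constructor vals implement the abstract members of both traits:

```scala
trait HasIntensity {
  def intens: Double
  def sqrtIntens: Double = math.sqrt(intens)
}

// Hypothetical second trait, with its own concrete method
trait HasPosition {
  def x: Double
  def isLeftOf(other: HasPosition): Boolean = x < other.x
}

// Both traits contribute implemented methods to Peak
class Peak(val x: Double, val intens: Double) extends HasIntensity with HasPosition

val a = new Peak(1.0, 16.0)
val b = new Peak(2.0, 9.0)
val root = a.sqrtIntens  // 4.0
val left = a.isLeftOf(b) // true
```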
Bullet #2: Collections
- List[T]: an immutable linked list, meant to be traversed, with constant-time access to its head (reaching the last element requires a full traversal);
- Vector[T]: with fast random access by integer index;
- Map[K, V]: for dictionary or hash map structures.
Instantiation
As shown above:
val l = List(1,2,3,4)
val m = Map("x"->3, "y"->1984)
By default, collections are also immutable, although mutable flavors exist too, for example:
import scala.collection.mutable.ListBuffer
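A short sketch of the mutable flavor. A ListBuffer is updated in place and can then be frozen into an immutable List, and a mutable Map is a convenient way to count occurrences. The words counted here are arbitrary:

```scala
import scala.collection.mutable
import scala.collection.mutable.ListBuffer

val buf = ListBuffer(1, 2, 3)
buf += 4                // appends in place
0 +=: buf               // prepends in place
val frozen = buf.toList // immutable snapshot of the buffer

// Counting occurrences with a mutable map
val counts = mutable.Map[String, Int]()
for (w <- List("a", "b", "a")) {
  counts(w) = counts.getOrElse(w, 0) + 1
}
```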
Adding values
val l1 = List(1,2,3,4)
val l2 = l1 :+ 5 // List(1,2,3,4,5)
val l3 = l1 ::: List(5,6,7) // List(1,2,3,4,5,6,7)
Access
val l = List(1,2,3,4)
l(2) // 3
l.head // 1
l.last // 4
l.tail // List(2,3,4)
l.take(2) // List(1,2)
l.drop(2) // List(3,4)
And so much more
val l = List(1,2,3,4)
l.map(_ * 10) // List(10,20,30,40)
l.find(_%2 == 0) // Some(2)
l.filter(_%2 == 0) // List(2,4)
l.partition(_%2==0) // (List(2,4), List(1,3))
l.combinations(2) // Iterator over List(1,2), List(1,3), ...
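A few more everyday operations on the same list, as a sketch. foldLeft and reduce aggregate, zip pairs two lists element by element, and sliding yields overlapping windows:

```scala
val l = List(1, 2, 3, 4)

val total   = l.foldLeft(0)(_ + _)            // 10
val product = l.reduce(_ * _)                 // 24
val zipped  = l.zip(List("a", "b", "c", "d")) // List((1,a), (2,b), (3,c), (4,d))
val windows = l.sliding(2).toList             // List(List(1,2), List(2,3), List(3,4))
val pairs   = l.combinations(2).toList        // the 6 pairs, as Lists
```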
A more complex example
Let's pretend we have a list of peaks, as defined above: Peak(val x:Double, val intens:Double). We'd like to group them into integer bins on x, sum up the intensities in each bin, and keep only the 2 bins with the highest total intensity.
peaks.groupBy(p => math.floor(p.x))
  .map({pl => (pl._1, pl._2.map(_.intens).sum)})
  .toList
  .sortBy(-_._2)
  .take(2)
Each operation returns a new collection, to which the next operator can be applied, so the whole computation reads as a concise succession of steps.
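To run the pipeline end to end, here is a self-contained sketch. The peak values are made up for illustration, and the map step uses a pattern-matching anonymous function instead of pl._1/pl._2. That is a stylistic choice, not the original code:

```scala
class Peak(val x: Double, val intens: Double)

// Hypothetical sample data, for illustration only
val peaks = List(
  new Peak(1.2, 10), new Peak(1.8, 5),
  new Peak(2.5, 30),
  new Peak(3.1, 2), new Peak(3.9, 4)
)

val top2 = peaks
  .groupBy(p => math.floor(p.x))                         // Map[Double, List[Peak]]
  .map { case (bin, ps) => (bin, ps.map(_.intens).sum) } // total intensity per bin
  .toList
  .sortBy(-_._2)                                         // highest total first
  .take(2)
// Bins: 1.0 -> 15.0, 2.0 -> 30.0, 3.0 -> 6.0
```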
for loops
Ever written for(int i=0; i<arr.length; i++){ arr[i] }? Well, we can do better. Let's consider two nested loops that build a list of points:
val l1 = List(1,2,3,4)
val l2 = List(3,4)
for {
  i <- l1
  j <- l2 if i < j
} yield (i, j) // List((1,3), (1,4), (2,3), (2,4), (3,4))
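A for comprehension is syntactic sugar over collection methods. As a rough sketch (the exact translation is done by the compiler), the loop above corresponds to flatMap, withFilter and map:

```scala
val l1 = List(1, 2, 3, 4)
val l2 = List(3, 4)

// Approximately what the compiler turns the for/yield into
val desugared = l1.flatMap(i => l2.withFilter(j => i < j).map(j => (i, j)))

val viaFor = for {
  i <- l1
  j <- l2 if i < j
} yield (i, j)

val same = desugared == viaFor // true
```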
Bullet #3: pattern matching
Pattern matching is a functional programming technique: depending on what is passed (its value, type or structure), the matching case is selected.
def useless(x: Any) = x match {
  case x: String => "hello " + x
  case i: Int if i < 10 => "small"
  case i: Int => "large"
  case _ => "whatever"
}
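This is also where Option, introduced earlier, makes full sense. A minimal sketch that matches on the result of Map.get, with a guard. The describe function is hypothetical:

```scala
val m = Map("x" -> 3, "y" -> 42)

// Hypothetical helper: each case handles one shape of the Option
def describe(key: String): String = m.get(key) match {
  case Some(v) if v > 10 => s"$key is large ($v)"
  case Some(v)           => s"$key is $v"
  case None              => s"$key is missing"
}

val dx = describe("x")
val dy = describe("y")
val dz = describe("z")
```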
List structure
For example, we have the very ambitious goal of swapping the numbers of a list of integers two by two.
def biRev(l: List[Int]): List[Int] = l match {
  case Nil => List()
  case x :: Nil => List(x)
  case x1 :: x2 :: xs => List(x2, x1) ::: biRev(xs)
}
biRev(List(1,2,3,4,5)) //List(2,1,4,3,5)
With regular expressions
Let's consider a function converting to kilometers. If the argument is an integer, it is returned as is; if it is a string matching \d+miles, the number is converted into kilometers; and so on.
val reMiles = """(\d+)miles""".r
val reKm = """(\d+)km""".r
def toKm(x: Any) = x match {
  case x: Int => x
  case reMiles(v) => (v.toInt * 1.609).toInt
  case reKm(v) => v.toInt
  case _ =>
    throw new IllegalArgumentException(s"cannot convert $x")
}
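Here is a variant of the same function, with a few calls to show its behavior. It first narrows the argument to String before applying the regex extractors, so it does not depend on how a given Scala version types Regex extractors against an Any scrutinee. Note that a regex pattern must match the whole string:

```scala
val reMiles = """(\d+)miles""".r
val reKm = """(\d+)km""".r

def toKm(x: Any): Int = x match {
  case i: Int => i
  case s: String => s match {
    case reMiles(v) => (v.toInt * 1.609).toInt // truncated, not rounded
    case reKm(v)    => v.toInt
    case _          => throw new IllegalArgumentException(s"cannot convert $s")
  }
  case _ => throw new IllegalArgumentException(s"cannot convert $x")
}

val a = toKm(7)         // 7
val b = toKm("10miles") // 16 (16.09 truncated)
val c = toKm("42km")    // 42
```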
Bullet #4: concurrency
Easy list parallelization
Imagine we have a heavy function to be called on each member of a list (here, it just sleeps for 100 ms). Immutability makes it much easier to parallelize such code over the available cores, with the .par call:
val t0 = System.currentTimeMillis()
def ten(i: Int) = {
  Thread.sleep(100)
  println(System.currentTimeMillis() - t0) // elapsed time in ms
  i * 10
}
// From Scala 2.13 on, .par requires the separate scala-parallel-collections module
(0 to 20).toList.par.map(ten).sum
Actors
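Again, Scala's functional leaning makes it easy for actors to communicate via message passing. Here a master sends an integer to a slave, which decreases it by 1 and sends it back at each step. Once the counter reaches zero, the master sends the word "stop". Pattern matching is used to select the correct behavior. (Note that the scala.actors library used here was deprecated in Scala 2.10 in favor of Akka actors.)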
import scala.actors.Actor
import scala.actors.Actor._

object main extends App {
  class MinusActor(val master: Actor) extends Actor {
    def act() {
      loop {
        react {
          case (i: Int) => {
            println(s"$i--")
            master ! (i - 1) // send the decremented value back
          }
          case "stop" => {
            println("ciao")
            exit
          }
          case _ => println("whatever")
        }
      }
    }
  }
  class MasterActor extends Actor {
    val minusActor = new MinusActor(this)
    def act() {
      minusActor.start
      minusActor ! 10
      loop {
        react {
          case i: Int if i > 0 =>
            minusActor ! i
          case _ =>
            minusActor ! "stop"
            exit
        }
      }
    }
  }
  new MasterActor().start()
  Thread.sleep(1000)
}
And there is much more with Akka, futures and async.
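As a taste of futures, here is a minimal sketch using only the standard library's scala.concurrent package (no Akka). It revisits the ten function from the parallel-collection example, without the timing printout. Await is used only to obtain the result in a script; production code usually composes futures instead of blocking on them:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

def ten(i: Int): Int = {
  Thread.sleep(100) // simulated heavy work
  i * 10
}

// One Future per element; they run concurrently on the global execution context
val futures: List[Future[Int]] = (0 to 20).toList.map(i => Future(ten(i)))

// Future.sequence turns List[Future[Int]] into Future[List[Int]]
val total: Future[Int] = Future.sequence(futures).map(_.sum)

// Block only at the very end, with a timeout
val result = Await.result(total, 10.seconds) // 10 * (0 + 1 + ... + 20) = 2100
```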
Bullet #5: the ecosystem
No matter how brilliant, a language cannot succeed if it is not supported by a strong ecosystem, which encompasses many aspects.
Java integration
Scala code is compiled into JVM bytecode. A very good side effect is that existing Java libraries can be used transparently from Scala code. Here is a simple example using Apache Commons Math:
import org.apache.commons.math3.stat.descriptive.moment.Variance
// Standard deviation, as the square root of the (bias-corrected) variance
object SD {
  val variance = new Variance()
  def apply(values: Seq[Double]): Double = {
    math.sqrt(variance.evaluate(values.toArray))
  }
  def apply(values: Seq[Double],
            weights: Seq[Double]): Double = {
    math.sqrt(variance.evaluate(values.toArray, weights.toArray))
  }
}
IDE
While Typesafe supports an Eclipse package, NetBeans is used by many. Some development environments, such as Activator/Play!, run in the browser and let you use any text editor for the source code.
Web frameworks
If Play! is the most comprehensive one, lighter alternatives, such as Scalatra, are available.
RDBMS integration
More than an ORM, Slick is a mainstream solution: it offers the comfort of an ORM with the flexibility of manipulating lists the Scala way. The generated SQL is optimized for the connected database.
And NoSQL
Any Java driver is usable. But some tools are natively Scala-oriented, such as Spark (in-memory cluster computing) or ReactiveMongo.
Bullet #6: REPL
Experimenting is a key part of discovering a language. A REPL (Read-Eval-Print Loop) lets you see code executed on the fly.
worksheets
Even better, the Eclipse IDE (at least) supports worksheets: enter code, and each time the file is saved, it is evaluated. This makes it possible to evaluate objects defined in the project's source base interactively.
This is the shortest bullet in this list, but it can sometimes be the most effective.
Plenty more bullets
- xml parsing: mixing xpath and the list manipulation for a mixed SAX/DOM approach;
- http queries;
- file system traversal and files parsing;
- Json serialization;
- dynamic classes (scala.Dynamic), to handle calls to arbitrary methods;
- regular expression (ok, Perl is the king, but Scala is not bad);
- macros;
- string interpolation and reverse;
- optional ';'
- more Java integration;
- abstract classes;
- profiling with YourKit or JProfiler
- context bound & parametric types
- streams
- foldLeft /:
- for loops structure
- map getOrElse
- case class
- implicit
- Mockito, EasyMock
- ...
Cons
- backward incompatibilities between major versions: a jar compiled with 2.8 is not usable with 2.9. That may be the price to pay for a maturing language that is not tied to full backward compatibility. It may be a serious issue for some, but personally, upgrading my code from 2.8 to 2.9 and 2.10, with its dependencies, has never taken more than a couple of hours.
- sbt is a pain: OK. But I have never made my way smoothly through Maven either. For most of the issues I faced, googling was enough, and even though I cannot claim a full understanding of the tool, it does the job. Slowly, for sure, but it does it.
- It's hard to hire people. That should hopefully change, with more and more adopters and major companies committing to the language.
- It's hard to learn: that is common wisdom among Scala developers. It is definitely not a language a normal developer picks up in a week (unlike Groovy, which a Java developer can pick up much faster). Reading a book, or better, following Martin Odersky's online course, can be a worthwhile investment. But common wisdom also says that once you have picked up the basics, the return on investment is great.
Further links
Google is your friend (well, nowadays, that's not totally true), but here are my favorites:
- coursera.org functional programming in Scala course by Martin Odersky, and there should be a more advanced one coming in fall 2013;
- Scala for the Impatient by Cay S. Horstmann is a good introduction to the language, especially for Java folks;
- Programming in Scala by Martin Odersky, Lex Spoon and Bill Venners is a pretty comprehensive one;
- daily scala blog, for inspiration;
- scala days conference (videos available);
- and definitely Scala meetups, if you are fortunate enough to have a local community.