Monday 6 September 2010

Option in Scala vs null in Java

When I first came to Scala a few months ago, one of the things I'd heard was to avoid using null and use the Option class instead.

Great, I thought. So instead of:

String x = ...
if (x != null) {
  doSomething(x);
}

I'd write:

val x: Option[String] = ...
if (x.isDefined) {
  doSomething(x.get)
}

Or perhaps:

val x: Option[String] = ...
x match {
  case Some(x1) => doSomething(x1)
  case None => // do nothing
}

Much as I was loving scala, neither of these was a great advance. In fact, even without Java's semicolons, they're both still more characters to type, and they certainly don't aid readability.

Until one day, with the help of my colleague Daithi, the realisation came: treat an Option as a collection with zero or one entries.

So now I'd write:

val x: Option[String] = ...
x.map(doSomething)

("map", defined for all collections, creates a new collection with each element transformed using the function supplied. For Options, this means None => None, and Some(a) => Some(f(a)).  "map" is another one of those methods that my java background couldn't see the point of, but now I find myself using over and over again.  More on this in a later post, perhaps.)
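To make the "collection with zero or one entries" idea concrete, here is a minimal sketch. The object name and the shout helper are made up for illustration; map, foreach and getOrElse are the standard Option methods.

```scala
object OptionAsCollection {
  // Hypothetical helper for illustration: upper-cases its argument.
  def shout(s: String): String = s.toUpperCase

  val present: Option[String] = Some("hello")
  val absent: Option[String] = None

  // map transforms the value if there is one, and leaves None alone.
  val shouted: Option[String] = present.map(shout)
  val stillNone: Option[String] = absent.map(shout)

  // foreach runs a side effect zero or one times, just as it would
  // for a collection with zero or one elements.
  def lengths(opt: Option[String]): List[Int] = {
    val seen = scala.collection.mutable.ListBuffer[Int]()
    opt.foreach(s => seen += s.length)
    seen.toList
  }

  // getOrElse supplies a default at the point the value is finally needed.
  val greeting: String = absent.map(shout).getOrElse("(nothing)")
}
```

Note there is no isDefined check and no match anywhere: the "is it there?" question is answered inside map and foreach.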

For a real life example, take a look at the scala client for the Guardian's open platform content api.  This code comes from Api.scala, which is a simple builder for the content api url:

trait PaginationParameters extends Parameters {
    var _pageSize: Option[Int] = None
    var _page: Option[Int] = None

    def pageSize(newPageSize: Int): this.type = {
      _pageSize = Some(newPageSize); this
    }

    def page(newPage: Int): this.type  = {
      _page = Some(newPage); this
    }

    // Each Option contributes zero or one (name -> value) tuples to the map.
    override def parameters = super.parameters ++
      _pageSize.map("page-size" -> _) ++
      _page.map("page" -> _)
}

ExampleUsageTest shows this in action.

The interesting code is in the override of parameters, which returns a Map[String, Any] of the query string parameters to pass to the api. Each Option is mapped to a tuple of the parameter name and its value. This is then added onto the map returned by the base class using ++ which returns a new map with the tuple added.

Map[String, Any].++ expects to be passed a TraversableOnce[(String, Any)], i.e. a collection of String -> Any tuples.  Since _pageSize.map("page-size" -> _) returns either None or Some("page-size" -> value), and we can treat an Option as a collection with zero or one entries, we end up either adding nothing to the map (None) or adding a "page-size" -> value entry to the map.
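Stripped of the trait, the same trick can be tried on its own. This is only a sketch: the OptionalParams object, the base map and the withPaging method are invented for illustration and are not part of the content api client.

```scala
object OptionalParams {
  // Hypothetical base parameters, standing in for super.parameters.
  val base: Map[String, Any] = Map("format" -> "json")

  // Each Option becomes zero or one (name -> value) tuples,
  // and ++ adds whatever is there to the map.
  def withPaging(pageSize: Option[Int], page: Option[Int]): Map[String, Any] =
    base ++ pageSize.map("page-size" -> _) ++ page.map("page" -> _)
}
```

So OptionalParams.withPaging(Some(10), None) gives a map with "format" and "page-size" entries and no "page" entry at all, without a single if.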

A relatively long explanation, mostly because as an (ex-?)Java dev I still sometimes have to understand exactly what's going on underneath.  When I first started developing in C, I had to understand what machine code was generated. When I first started developing in C++, I had to understand what C was generated.  Luckily both wore off, and I expect it will happen again soon.

But read the resulting code. It's concise, and its intent is clear. And hey, no null checks!

NB: After writing most of this post, I found an excellent, far better written, write up of using Option sensibly here.  Tony's scala Option cheat sheet is worth a read too.

Wednesday 31 March 2010

Expressing intent in scala

Scala gets some bad press because it's possible to get carried away with some of its shiny new toys, especially the ability to create methods with non-alpha names. Some of this I agree with - methods like >:> really don't aid readability for me.

Don't throw out the baby with the bathwater though. I like scala because it makes it much easier for me to express my intent.

I've been working on some code that builds a solr index of a large database. To do this efficiently, it needs to divide the data set into smaller blocks for processing in parallel.

The ids in the table are autogenerated, and for historical reasons are far from monotonically incrementing. For the data retrieval queries to be efficient we need to find the primary keys for each block. The (oracle) query looks a bit like this:

SELECT id FROM
   (SELECT id, ROWNUM rownumber
      FROM [the_large_table] ORDER BY id)
WHERE MOD(rownumber, 1000) = 0

Then we had a bit of java code to build batch objects from this:

public List<Batch> getBatches
    (int batchSize) {
  List<Integer> idList = 
    // result of the query above
  int maxId =  
    // find the max id from the table

  int currentId = 0;

  List<Batch> batches = 
    new ArrayList<Batch>();

  int num = 1;

  for (Integer nextId : idList) {
    batches.add(new Batch(num++, currentId, nextId-1));
    currentId = nextId;
  }

  batches.add(new Batch(num, currentId, maxId));

  return batches;
}

I converted this to the following scala yesterday:

// A recursive method needs an explicit result type.
private def buildBatches(
      num: Int,
      idList: List[Int]): List[Batch] = {
  idList match {
    case startOfThis :: startOfNext :: tail =>
      new Batch(num, startOfThis, startOfNext-1) ::
        buildBatches(num+1, startOfNext :: tail)
    case _ => Nil
  }
}

def getBatches(batchSize: Int) = {
  val idList = // ...
  val maxId = // ...

  buildBatches(1, 0 :: idList ::: maxId :: Nil)
}
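For anyone wanting to try this without a database, here is a self-contained sketch. The Batch case class and its field names are my own stand-ins for the real class, and getBatches takes the id list and max id as arguments instead of querying for them.

```scala
object Batches {
  // Hypothetical stand-in for the real Batch class:
  // a numbered batch covering an inclusive range of ids.
  case class Batch(num: Int, firstId: Int, lastId: Int)

  // Take the first two boundaries, build a batch from them,
  // then carry on from the second boundary.
  def buildBatches(num: Int, ids: List[Int]): List[Batch] = ids match {
    case startOfThis :: startOfNext :: tail =>
      Batch(num, startOfThis, startOfNext - 1) ::
        buildBatches(num + 1, startOfNext :: tail)
    case _ => Nil
  }

  // Pretend the boundary list has a zero on the front and the max on the end.
  def getBatches(idList: List[Int], maxId: Int): List[Batch] =
    buildBatches(1, 0 :: idList ::: maxId :: Nil)
}
```

With boundaries List(1000, 2000) and a max id of 2500, this gives batches covering 0-999, 1000-1999 and 2000-2499. One thing the sketch makes visible: written this way the final batch stops at maxId - 1, whereas the Java loop ends its last batch at maxId, so passing maxId + 1 as the final boundary would be one way to reproduce the Java ranges exactly.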

Importantly, when I discussed how to build up the batches with a colleague, we talked about the operation as "take the first two entries in the list, build a batch from that, then go on to the next two entries in the list. Oh, and we need to pretend that the list from the database has a zero on the beginning and the max on the end." That's exactly what the code does.

"::" is the scala list prepend ("cons") operator: it adds an element to the front of a list. So 1 :: 2 :: 3 :: Nil is equivalent to List(1, 2, 3). Nil represents an empty list. (You need the Nil on the end because :: is right-associative and is a method on the list to its right, so the chain has to end in a list.)

The magic here is the match in the buildBatches method (it looks better when it doesn't have quite so many line breaks!). The match works like this:

List(1, 2, 3, 4) match {
  case a :: b :: c =>
    println(a)
    println(b)
    println(c)
  case _ =>
}
// output:
// 1
// 2
// List(3, 4)
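The same operators build lists as well as take them apart. A small sketch (the object and value names are just for illustration) of :: and :::, including the mixed form used to pad the id list:

```scala
object ConsDemo {
  // :: is right-associative: this parses as 1 :: (2 :: (3 :: Nil)).
  val built: List[Int] = 1 :: 2 :: 3 :: Nil

  // ::: joins two lists rather than prepending a single element.
  val joined: List[Int] = List(1, 2) ::: List(3, 4)

  // Mixing the two: a zero on the front, an extra value on the end.
  val padded: List[Int] = 0 :: List(10, 20) ::: 30 :: Nil

  // The match in the other direction: head, second element, and the rest.
  val parts: Option[(Int, Int, List[Int])] = List(1, 2, 3, 4) match {
    case a :: b :: c => Some((a, b, c))
    case _ => None
  }
}
```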

This is why I'm liking scala: it enables me to express my intent in a way that less malleable languages like java do not. As always, the most important thing is to focus on readability, not on the shiny toys.

A future blog post will (hopefully) talk about the 20 lines of scala code that hide the complexity of jdbc from the rest of the code.

Saturday 23 January 2010

Scala futures

I've been working with scala recently, and I've hugely enjoyed the experience.  It's changing the way I think about writing code, without having to bin the confidence I have about tuning and scaling the JVM in production.  Nor do I have to get my head round a completely new set of libraries.  Yup I still get the joy of using java.util.Date if I really want to (or, when my forehead gets tired of banging against that wall, I can stick with using org.joda.time.DateTime).

I am no doubt hopelessly old fashioned, but personally my brain works well with statically typed compiled languages.  When done well, it helps me get (non-trivial) stuff done faster.  This is definitely a suitable-for-how-my-mind-works preference: many people I hugely respect get on much better with dynamic languages.  Suck 'em all and see.  Scala is static typing done well: its type inference means it just happens most of the time and only occasionally do I need to be explicit.

I've enjoyed scala so much that programming in Java now seems so "yuk".

A simple example of what I love about scala: futures done well.  Want to kick a long running activity off in another thread?  Here's how:

import scala.actors.Futures._

object App {
  def main(args: Array[String]) = {
    println("About to do something...")

    future {
      Thread.sleep(1000) // stand-in for the slow work
      println("Slow thing finished")
    }

    println("I don't wait")
  }
}

// output is:
// About to do something...
// I don't wait
// Slow thing finished

This isn't a language feature of course: it's just part of the library.

Want to do a whole bunch of things in parallel and wait for them all?

package scala_actor

import scala.actors.Future
import scala.actors.Futures._

object App {
  def main(args: Array[String]) = {

    var results = List[Future[Int]]()
    for (i <- 1 to 10) {
      println("Sending " + i + "...")
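      val f = future {
        println("Processing " + i + "...")
        Thread.sleep(500) // stand-in for the slow work
        println("Processed " + i)
        i // the value the future completes with
      }
      println("Sent " + i)
      results = results ::: List(f)
    }

    // future() blocks until that future's result is available
    results.foreach(future =>
      println("result: " + future()))
  }
}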

//  output is: (on my machine)
//  Sending 1...
//  Processing 1...
//  Sent 1
//  Sending 2...
//  Processing 2...
//  Sent 2
//  Sending 3...
//  Processing 3...
//  Sent 3
//  Sending 4...
//  Processing 4...
//  Sent 4
//  Sending 5...
//  Processing 5...
//  Sent 5
//  Sending 6...
//  Processing 6...
//  Sent 6
//  Sending 7...
//  Processing 7...
//  Sent 7
//  Sending 8...
//  Processing 8...
//  Sent 8
//  Sending 9...
//  Processing 9...
//  Sent 9
//  Sending 10...
//  Processing 10...
//  Sent 10
//  Processed 1
//  result: 1
//  Processed 2
//  Processed 3
//  result: 2
//  result: 3
//  Processed 4
//  Processed 5
//  result: 4
//  Processed 6
//  result: 5
//  result: 6
//  Processed 7
//  Processed 8
//  result: 7
//  result: 8
//  Processed 9
//  Processed 10
//  result: 9
//  result: 10
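The same "start everything, then wait for all of it" shape can also be sketched with scala.concurrent.Future, which is a different API from the scala.actors.Futures used above and assumes a Scala version that ships it (2.10 or later). The ParallelTasks object and the slowIdentity task are invented for illustration.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelTasks {
  // Hypothetical slow task: sleeps briefly, then returns its input.
  def slowIdentity(i: Int): Int = {
    Thread.sleep(50)
    i
  }

  def runAll(n: Int): List[Int] = {
    // Start every task first; each Future is scheduled as soon as it is created.
    val futures: List[Future[Int]] =
      (1 to n).toList.map(i => Future(slowIdentity(i)))

    // Turn the list of futures into one future of a list (order preserved),
    // then block until every result is in.
    Await.result(Future.sequence(futures), 10.seconds)
  }
}
```

Calling ParallelTasks.runAll(10) returns the results in the order the tasks were started, however the work happened to be interleaved across threads.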

I'm loving it.

Saturday 16 January 2010

Obligatory first post

So, a number of years after the horse has bolted, and indeed as the popularity of blogging is very much on the wane, I finally start a blog.  Possibly better late than never, but let's see how things go before forming a judgement on that.

This is a technical blog and I'll be trying to keep things focussed on code with little waffle.

For those who don't know me, I've been developing software for many more years than I care to confess. I started commercial development with Microsoft Visual C++ 1.0 (20 floppy disk install iirc), went via VB and C#, and currently do most development in Java.  I plan to move to bash server pages next.  Maybe.

I currently work on the software behind, which is mostly java with spring/hibernate/velocity.  Anything I write here is of course my own opinions only and nothing to do with my employer.