Terse Systems

Building a Development Environment With Docker


TL;DR

I’ve written a cheat sheet for Docker, and I have a GitHub project for it. Here’s the thinking that went into why Docker, and how best to use it.

The problem

You want to build your own development environment from scratch, and you want it to be as close to a production environment as possible.

Solutions

Development environments usually just… evolve. There are usually several attempts at producing a consistent development environment, even between developers. Eventually, through trial and error, a common set of configuration files and install instructions turns into something that resembles a scaled down and testable version of the production environment, managed through version control and a set of bash scripts.

But even when it gets to that point, it’s not over, because modern environments can involve dozens of different components, all with their own configuration, often communicating with each other over TCP/IP or, even worse, talking to a third party API like S3. To replicate the production environment, these lines of communication must be drawn — but they can’t all be squashed into one single machine. Something has to give.

Solution #1: Shared dev environment

The first solution is to set up an environment with exactly the same machines in the same way as production, only scaled down for development. Then, everyone uses it.

This works only if there are no conflicts between developers, and resource use and contention are not a problem. Oh, and as long as no team ever wants to swap out one of those components for its own version.

If you need to access the environment from outside the office, you’ll need a VPN. And if you’re on a flaky network or on a plane, you’re out of luck.

Solution #2: Virtual Machines

The second solution is to put as much of the environment as possible onto the developer’s laptop.

Virtualization tools such as VirtualBox will allow you to create an isolated dev environment. You can package VMs into boxes with Vagrant, and create fresh VMs from a template as needed. They each have their own IP address, and you can get them to share filesystems.

However, VMs are not small. You can chew up gigabytes very easily providing the OS and packages for each VM, and those VMs do not share CPU or memory when running together. If you have a complex environment, you will reach a point where you either run out of disk space or memory, or you break down and start packaging multiple components inside a single VM, producing an environment which may not reflect production and is far more fragile and complex.

Solution #3: Docker

Docker solves the isolation problem. Docker provides (consistent, reproducible, disposable) containers that make components appear to be running on different machines, while sharing CPU and memory underneath, and provides TCP/IP forwarding and filesystems that can be shared between containers.

So, here’s how you build a development environment in Docker.

Docker Best Practices

Build from Dockerfile

The only sane way to put together a dev environment in Docker is to use raw Dockerfiles and a private registry. Pull from the central Docker registry only if you must, and keep everything local.

Chef recipes are slow

You might think to yourself, “self, I don’t feel like reinventing the wheel. Let’s just use chef recipes for everything.”

The problem is that creating new containers is something that you’ll do a lot. Every time you create a container, seconds will count, and minutes will be totally unacceptable. It turns out that calling apt-get update is a great way to watch nothing happen for a while.

Use raw Dockerfile

Docker uses a layered, versioned file system called AUFS. Each instruction in a Dockerfile produces a layer, and Docker can recognize instructions it has already run and reuse the cached layer instead of running them again. You want to keep that cache happy: put all the mutable stuff at the very end of the Dockerfile, so you can leverage the cache as much as possible. Chef recipes are a black box to Docker.

The way this breaks down is:

  1. Cache wins.
  2. Chef, ansible, etc, does not use cache.
  3. Raw Dockerfile uses cache.
  4. Raw Dockerfile wins.
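
To make the layering concrete, here’s a minimal sketch of a cache-friendly Dockerfile; the packages and paths are placeholders, but the ordering is the point:

FROM ubuntu
# Stable, rarely-changing instructions go first, so their layers stay cached
RUN apt-get update && apt-get install -y curl git
# Mutable stuff (your application code) goes last: a code change only
# invalidates these final layers (ADD also busts the cache, see the tips below)
ADD . /opt/app
CMD ["/bin/bash"]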

There’s another way to leverage Docker, and that’s to use an image that doesn’t start off from ubuntu or basebox. You can use your own base image.

The Basics

Install an internal Docker registry

Install an internal registry (the fast way) and run it as a daemon:

docker run -name internal_registry -d -p 5000:5000 samalba/docker-registry

Alias server to localhost:

echo "127.0.0.1      internal_registry" >> /etc/hosts

Check internal_registry exists and is running on port 5000:

apt-get install -y curl
curl --get --verbose http://internal_registry:5000/v1/_ping

Install Shipyard

Shipyard is a web application that provides an easy to use interface for seeing what Docker is doing.

Open up a port in your Vagrantfile:

config.vm.network :forwarded_port, :host => 8005, :guest => 8005

Install shipyard from the central index:

SHIPYARD=$(docker run \
  -name shipyard \
  -p 8005:8000 \
  -d \
  shipyard/shipyard)

You will also need to replace /etc/init/docker.conf with the following:

description "Docker daemon"

start on filesystem and started lxc-net
stop on runlevel [!2345]

respawn

script
        /usr/bin/docker -d -H tcp://0.0.0.0:4243 -H unix:///var/run/docker.sock
end script

Then reboot the VM.

Once the server has rebooted and you’ve waited for a bit, you should have shipyard up. The credentials are “shipyard/admin”.

  • Go to http://localhost:8005/hosts/ to see Shipyard’s hosts.
  • In the Vagrant VM, run ifconfig eth0 and look for “inet addr:10.0.2.15” — enter that IP address.

Create base image

  • Create a Dockerfile with initialization code such as apt-get update / apt-get install etc.: this is your base.
  • Build and tag your base image with docker build -t internal_registry:5000/base ., then push it to the internal registry with docker push internal_registry:5000/base.

Build from your base image

Have all of your other Dockerfiles pull from “base” instead of ubuntu.
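
For example, assuming the base image was tagged as above, every downstream Dockerfile starts with:

FROM internal_registry:5000/base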

Keep playing around until you have your images working.

Push your images

Push all of your images into the internal registry.
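
The image name here is only a placeholder, but the pattern is to tag each image against the registry host and then push it:

docker tag myapp internal_registry:5000/myapp
docker push internal_registry:5000/myapp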

Save off your registry

If you need to blow away your Vagrant VM or set someone else up, it’s much faster to do it with all the images still intact:

docker export internal_registry > internal_registry.tar
gzip internal_registry.tar
mv internal_registry.tar.gz /vagrant
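
On the new VM, something like the reverse should bring it back: docker import reads the tarball and creates a fresh image that you can run again.

gunzip /vagrant/internal_registry.tar.gz
cat /vagrant/internal_registry.tar | docker import - internal_registry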

Tips

  • The ADD instruction blows away the cache, so don’t use it if you can avoid it (a bug, possibly fixed by now).
  • There’s a limit to the number of layers you can have, so pack your apt-get commands onto a single line.
  • Keep common instructions at the top of the Dockerfile to leverage the cache as long as possible.
  • Use tags when building (Always pass the -t option to docker build).
  • Never map the public port in a Dockerfile.

Exposing Services

If you are running a bunch of services in Docker and want to expose them through VirtualBox to the host OS, you need to do something like this in your Vagrantfile:

(49000..49900).each do |port|
  config.vm.network :forwarded_port, :host => port, :guest => port
end

Let’s start up Redis:

docker pull johncosta/redis
docker run -p 6379 -d johncosta/redis

Then find the port:

docker ps
docker port <redis_container_id> 6379

Then connect to the 49xxx port that VirtualBox exposes.
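
For example, if docker port reports 49153, you’d connect from the host OS with something like:

redis-cli -h localhost -p 49153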

Cleanup

docker ps -a | grep 'weeks ago' | awk '{print $1}' | xargs docker rm

Or to eliminate all containers:

docker rm `docker ps -a -q`

Running from an existing volume

docker run -i -t -volumes-from 5ad9f1d9d6dc mytag /bin/bash


Play in Practice


I gave a talk on Play in Practice at the SF Scala meetup recently. Thanks to Stackmob for hosting us and providing pizza.

I went into describing how to implement CQRS in Play, but there was a fairly long question and answer section about Play as well. I couldn’t go into detail on some of the answers and missed some others, so I’ll fill in the details here.


Core API

The core API is Action, which takes in a Request and returns a Result. The Request is immutable, but you can wrap it with extra information, which you’ll typically do with action composition. 2.1.1 introduced EssentialAction, which uses (RequestHeader => Iteratee[Array[Byte], Result]) instead of Action’s (Request => Result) and makes building Filters easier.

Again, Play’s core is simple. About as simple as you can get.
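
To make that concrete, here’s roughly the smallest possible action:

import play.api.mvc._

object Application extends Controller {
  // An Action is essentially a function from Request to Result
  def index = Action { request =>
    Ok("Got request [" + request + "]")
  }
}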

Streaming

Streaming is handled by Iteratees, which can be a confusing topic for many people. There are good writeups here and here. lila is the best application to look at for streaming, especially for sockets and hubs.

Having good streaming primitives is something that I didn’t get into that much in the talk, but is still vitally important to “real time web” stuff.

Filters

If you want to do anything that you’d consider as part of a “servlet pipeline”, you use Filters, which are designed to work with streams.

An example of a good Filter is to automatically uncompress an asset — here’s an example that uses an Enumeratee:

class GunzipFilter extends EssentialFilter {
  def apply(next: EssentialAction) = new EssentialAction {
    def apply(request: RequestHeader) = {
      if (request.headers.get("Content-Encoding").exists(_ == "gzip")) {
        Gzip.gunzip() &>> next(request)
      } else {
        next(request)
      }
    }
  }
}

Note that this only does uncompression: Automatic streaming gzip compression of templates is not available “out of the box” in 2.1.2, but it should be available in Play 2.2.

Templating

Play comes packaged with its own template language, Twirl, but you’re not required to use it. There is an integration into Scalate that gives you Mustache, Jade, Scaml and SSP. There’s also an example project that shows how to integrate Play with Freemarker.

One thing that Play doesn’t address directly is how to set up a structure for page layouts. Play provides you with index.scala.html and main.scala.html, but doesn’t provide you with any more structure than that. If you set up a header and footer and allow for subdirectories to use their own templates, you can minimize the amount of confusion in the views.

There’s an example in RememberMe, and this is the approach that lila takes as well.

Another thing is that Play’s default project template is intentionally minimal. If you use Backbone and HTML5 templates, then a custom giter8 template like mprihoda/play-scala may suit you better.

JSON

Play’s JSON API is very well done, and is a great way to pass data around without getting into the weeds or having to resort to XML. It goes very well with case classes.

The documentation isn’t bad, but Pascal Voitot (the author of play-json) has a series of blog posts that go the extra mile: reading JSON with JsPath, writing JSON formats, transforming JSON, and even defining JSON macros.
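
As a small illustration of how well it goes with case classes (User here is just an example), the 2.1 JSON macros will derive the plumbing for you:

import play.api.libs.json._

case class User(name: String, age: Int)

// Macro-generated Reads and Writes for the case class
implicit val userFormat = Json.format[User]

val js = Json.toJson(User("alice", 42))   // {"name":"alice","age":42}
val parsed = js.validate[User]            // JsSuccess(User(alice,42),)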

Forms

Form handling is one of those things that is never intuitive for me. The documentation helps, but really if you want to know how to do validation, using the sample forms application is the best way to pick things up. There are many useful nuggets that aren’t explicitly discussed in the documentation. In particular, the ability to make custom constraints is extremely useful.

Routing

There’s only one routing API replacement that I know of, Play Navigator, a routing DSL for REST services. However, you can use custom data types in the routing table using QueryStringBindable and PathBindable, and save yourself some “string2foo” conversion.
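
As a rough sketch (UserId is a hypothetical wrapper type), a PathBindable looks something like this, after which route parameters can be declared as UserId rather than String:

import play.api.mvc.PathBindable

case class UserId(value: Long)

object UserId {
  implicit object bindable extends PathBindable[UserId] {
    def bind(key: String, value: String): Either[String, UserId] =
      try Right(UserId(value.toLong))
      catch { case e: NumberFormatException => Left("Cannot parse " + key + " as a UserId") }
    def unbind(key: String, id: UserId): String = id.value.toString
  }
}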

Asynchronous Operation

Talking about Akka (and the other async code) in Play is tricky for a couple of reasons.

The first reason is that “async” involves a number of different concepts, all of which are complex and worthy of blog posts in themselves. Sadek Drobi gives a nice overview, and there’s an exhaustive mailing list discussion about how the asynchronous code in Play works.

The second bit of trickiness is that Play 2.0 and Play 2.1 async features do not work in quite the same way.

Play 2.0 uses Akka for almost everything internally.

Play 2.1 does not use Akka to handle incoming requests, or iteratees, or internal code. It uses scala.concurrent.Future instead with its own thread pools.

Play 2.1 also uses a default thread pool, which is Akka backed — ActorSystem("play") — and is used for the application code, i.e. the stuff inside Action.

This is important, because blog posts like James Ward’s Optimizing Play 2 for Database Driven Apps are only applicable to Play 2.0, not 2.1. For 2.1, use the thread pools documentation.

In addition to the “play” actor system, there’s a Play Akka plugin. The Akka plugin is actually packaged with Play itself, and you can find it under play.api.libs.concurrent.Akka.

So, if Play already uses Akka under the hood, then why define an Akka plugin?

I believe it’s because the Akka plugin defines a distinct ActorSystem("application") that can be used for backend tasks like sending email, and can be configured without impacting the “play” ActorSystem. The Akka plugin provides a useful default and enforces separation between Play’s actors and the application’s actors.
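
As a quick sketch of what that looks like from application code (EmailActor and SendEmail are hypothetical):

import akka.actor.{Actor, Props}
import play.api.Play.current
import play.api.libs.concurrent.Akka

case class SendEmail(to: String, body: String)

class EmailActor extends Actor {
  def receive = {
    case SendEmail(to, body) =>
      println("sending email to " + to)  // talk to the mail server here
  }
}

// Akka.system is the plugin's ActorSystem("application"), kept separate from Play's internals
val emailActor = Akka.system.actorOf(Props[EmailActor], name = "email")
emailActor ! SendEmail("user@example.com", "Welcome!")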

CQRS

Given that most of the CQRS talks I’ve read have been from the enterprise perspective, it was nice to talk about CQRS in the context of functional programming and statelessness.

Message passing is something that is typically mentioned in inter-process communication, or in message-oriented middleware. Akka — a message passing architecture at the thread level — allows us to build “zero coupling” systems. As message passing patterns, CQRS and DDD are a good set of idioms for thinking about domain logic together, especially since they already assume eventual consistency and indeterminate time.

Authentication

If you’re using Scala, there are two good authentication options, RememberMe (ahem) and SecureSocial. SecureSocial has better documentation and has been around longer, but RememberMe has better resistance to some attacks. I’m working to integrate RememberMe’s functionality into SecureSocial, but you’ll want to check out both of them.

There’s also a pure Java authentication option: Play Authenticate. I haven’t used this, but the code looks reasonable.

If you’d rather go it alone or need a basic starter application, you may find Play20StartApp useful (password reset, account confirmation, etc.)

Authorization

Deadbolt 2 is the best known authorization framework. You can use things like Shiro, but you’re better off with something specifically designed for Play.

Security

Play does fairly well on security compared to other frameworks. For example, it will set a CORS header to protect against clickjacking, will sign the session cookie with an HMAC to protect against broken authentication, supports SSL, etc.

However, there are some things that Play doesn’t do.

Play doesn’t encrypt the session cookie, so you shouldn’t store any sensitive information in there.

Play won’t protect you from replay attacks, as Play is stateless by default. You can specify a nonce or request counter to counteract this, and RememberMe uses a token based approach for persistent login cookies.

Play won’t protect you against injection attacks. You can specify value classes to validate your input against raw strings.

Play won’t protect you against security misconfiguration. You should have a release checklist.

Play won’t protect you from insecure cryptography practices. Education helps, but there’s a lot of misinformation out there as well; watch this video (and slides) and be wary of things you read on Stack Overflow and Hacker News.

Play won’t protect you from failure to restrict URL access; that’s up to the authorization framework.

Play does have cross site request forgery protection, but it will only be effective if you enable the filter and explicitly pass the CSRF helper function in through every single form. There is an authenticity token approach as well, though I haven’t used it.

Most importantly, Play won’t tell you about how web application security fails. I recommend The Tangled Web as an excellent overview on how web applications are stitched together out of different technologies, and how to secure them.

Logging

The underlying logger for Play is Logback. Logback is one of the few hardcoded dependencies in Play, which has caused some issues. Fortunately, Play uses Logback through the SLF4J logging API, but there’s no option built into Play to allow Logback to be swapped out easily. There are reports of people swapping out Logback for other logging frameworks, but I haven’t tried them.

There have also been issues with the logging configuration conflicting in places or being unclear. One thing that has tripped people up repeatedly is that all the logging configuration must be done in one place. You can’t have some logging configuration in application.conf and some configuration in logger.xml.

While Play uses SLF4J under the hood, it doesn’t expose SLF4J functionality in play.api.Logger. In fact, there are only two method signatures for logging:

  def error(message: => String) : Unit
  def error(message: => String, error: => Throwable) : Unit

This doesn’t really cover the way I like to log, and it doesn’t provide even the features that are available in SLF4J, such as parameterized logging. My own answer was to ignore the Play logging API entirely and write a Logging wrapper directly against SLF4J (with kestrel combinators, natch), but you may want to use something out of the box.
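
A rough sketch of what rolling your own wrapper against SLF4J might look like (minus the kestrel combinators):

import org.slf4j.{Logger, LoggerFactory}

trait Logging {
  lazy val logger: Logger = LoggerFactory.getLogger(this.getClass)

  // Parameterized logging, straight from SLF4J: error("bad foo {}", foo)
  def error(message: String, params: AnyRef*): Unit = logger.error(message, params: _*)
  def error(message: String, t: Throwable): Unit = logger.error(message, t)
}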

If you’d rather use something off the shelf, Typesafe Logging uses SLF4J and provides you with this:

  def error(message: String): Unit
  def error(message: String, params: AnyRef*): Unit
  def error(message: String, t: Throwable): Unit
  def error(marker: Marker, message: String): Unit
  def error(marker: Marker, message: String, params: AnyRef*): Unit
  def error(marker: Marker, message: String, t: Throwable): Unit

Or you can use loglady, which uses the Python API style with printf syntax:

  def error(message: String,  params: Any*) : Unit
  def error(thrown: Throwable, message: String,  params: Any*) : Unit

WAR packaging

I said in the Q&A that I didn’t think you could package Play 2 applications as WAR files. Well, it turns out that there is a plugin available, and it works with Servlet 3.0 and 2.5 containers (Tomcat 6/7, Jetty 7/8/9, JBoss 5/6/7, etc). You may need to tweak the logger to work in the container correctly.

I don’t know how Play’s performance is affected by running inside a servlet container; let me know if it works for you.

Asset Packaging

Javascript assets in Play are minified using Google Closure — this happens automatically on play dist. They also can be gzipped using a custom SBT script.

This makes a good enough solution for most people. If you are really intent on minimizing your asset overhead, you should consider putting your assets on a static file server backed by HAProxy, or putting them on CDN.

Email

Email is one of those things that I think should be divorced as much as possible from Play. It’s backend and async by nature, and this makes it something that is best handled through Akka.

akka-email is available on Github and gives you a starting place to build up a message passing infrastructure for email.

Metrics

Instrumenting applications is important. Sadly, every metrics solution has its own API, so you can’t easily switch between them. However, there’s no shortage of options.

  • New Relic recently came out with support for Play 2.
  • Ostrich, the Twitter metrics library.
  • Metrics, with the metrics-scala wrapper from Erik van Oosten, cross-compiled for multiple Scala versions. This is what I use.
  • Pillage, which has a Scala option (I have not tried this).
  • statsd module for Play 2.

The Typesafe Console is the best monitoring tool to use if you are using Akka, but that depends on having a Typesafe subscription if you want to use it in production.

Load and Stress Testing

Determining a load plan is hard, and involves some amount of educated guessing. Fortunately, most applications simply don’t get that much load, even ones you’d think would be busy.

Gatling and wrk are good ways of stressing a system, but they don’t reflect normal user behavior. Apache JMeter is very good at modelling random user behavior, but is clunky. A good and arguably the most realistic load test is to hire a couple of hundred users from Mechanical Turk to pound on the site at once, but this may not be very convenient.

Deployment

There are a number of different ways to deploy Play projects. Using play dist gets you most of the way, but you may want to deploy with Ansible or Chef or Fabric. Or you can use upstart or even git hooks.

If you just want to push changes to a staging server as they happen, you can do this with rsync -avz --delete -e ssh $deployed_code staging:/opt/play-app, although this isn’t so great for production.

Java Support

The Java and Scala APIs are very similar. However, there are a couple of notable differences, which come out of Java’s lack of closure support:

  • The Java API does not support Iteratees.
  • The Java API does not have an implicit execution context.

The play.libs.F library goes a fair way to providing Scala’s functional programming constructs in Java.

More?

If you have suggestions or want to point something out, please email me at will.sargent@gmail.com, and I’ll fill out this post with more details.

Error Handling in Scala


The previous post was mostly about programming “in the small” where the primary concern is making sure the body of code in the method does what it’s supposed to and doesn’t do anything else. This blog post is about what to do when code doesn’t work — how Scala signals failure and how to recover from it, based on some insightful discussions.

First, let’s define what we mean by failure.

  • Unexpected internal failure: the operation fails as the result of an unfulfilled expectation, such as a null pointer reference, violated assertions, or simply bad state.
  • Expected internal failure: the operation fails deliberately as a result of internal state, e.g. a blacklist or circuit breaker.
  • Expected external failure: the operation fails because it is told to process some raw input, and will fail if the raw input cannot be processed.
  • Unexpected external failure: the operation fails because a resource that the system depends on is not there: there’s a loose file handle, the database connection fails, or the network is down.

Java has one explicit construct for handling failure: Exception. Usage has differed in Java throughout the years — IO and JDBC use checked exceptions throughout, while other APIs like org.w3c.dom rely on unchecked exceptions. According to Clean Code, the best practice is to use unchecked exceptions in preference to checked exceptions, but there’s still debate over whether unchecked exceptions are always appropriate.

Exceptions

Scala makes “checked vs unchecked” very simple: it doesn’t have checked exceptions. All exceptions are unchecked in Scala, even SQLException and IOException.

The way you catch an exception in Scala is by defining a PartialFunction on it:

val input = new BufferedReader(new FileReader(file))
try {
  try {
    for (line <- Iterator.continually(input.readLine()).takeWhile(_ != null)) {
      Console.println(line)
    }
  } finally {
    input.close()
  }
} catch {
  case e:IOException => errorHandler(e)
}

Or you can use control.Exception, which provides some interesting building blocks. The docs say “focuses on composing exception handlers”, which means that this set of classes supplies most of the logic you would put into a catch or finally block.

import scala.util.control.Exception

Exception.handling(classOf[RuntimeException], classOf[IOException]) by println apply {
  throw new IOException("foo")
}

Using the control.Exception methods is fun and you can string together exception handling logic to create automatic resource management, or an automated exception logger. On the other hand, it’s full of sharp things like allCatch. Leave it alone unless you really need it.

Another important caveat is to make sure that you are catching the exceptions that you think you’re catching. A common mistake (mentioned in Effective Scala) is to use a default case in the partial function:

try {
  operation()
} catch {
  case e => errorHandler(e)
}

This will catch absolutely everything, including OutOfMemoryError and other errors that would normally terminate the JVM.

If you want to catch “everything” that would normally happen, then use NonFatal:

import scala.util.control.NonFatal

try {
  operation()
} catch {
  case NonFatal(exc) => errorHandler(exc)
}

Exceptions don’t get mentioned very much in Scala, but they’re still the bedrock for dealing with unexpected failure. For unexpected internal failure, there’s a set of assertion methods called require, assert, and assume, which all use throwables under the hood.
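
For example, a toy sketch: require guards a caller’s input (throwing IllegalArgumentException), while assert checks your own invariants (throwing AssertionError).

def withdraw(balance: Int, amount: Int): Int = {
  require(amount > 0, "amount must be positive")          // unexpected external failure
  val newBalance = balance - amount
  assert(newBalance <= balance, "balance must not grow")  // unexpected internal failure
  newBalance
}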

Option

Option represents optional values, returning an instance of Some(A) if A exists, or None if it does not. It’s ubiquitous in Scala code, to the point where it fades into invisibility. The cheat sheet is the best way to get a handle on it.

It’s almost impossible to use Option incorrectly, but there is one caveat: Some(null) is valid. If you have code that returns null, wrap it in Option() to convert it:

val optionResult = Option(null) // optionResult is None.

Either

Either is a disjoint union construct. It returns either an instance of Left[L] or an instance of Right[R]. It’s commonly used for error handling, where by convention Left is used to represent failure and Right is used to represent success. It’s perfect for dealing with expected external failures such as parsing or validation.

import java.util.StringTokenizer

case class FailResult(reason:String)

def parse(input:String) : Either[FailResult, String] = {
  val r = new StringTokenizer(input)
  if (r.countTokens() == 1) {
    Right(r.nextToken())
  } else {
    Left(FailResult("Could not parse string: " + input))
  }
}

Either is like Option in that it makes an abstract idea explicit by introducing an intermediate object. Unlike Option, it does not have a flatMap method, so you can’t use it in for comprehensions — not safely at any rate. You can use a left or right projection if you’re not interested in handling failure:

val rightFoo = for (outputFoo <- parse(input).right) yield outputFoo

More typically, you’ll use fold:

parse(input).fold(
  error => errorHandler(error),
  success => { ... }
)

You’re not limited to using Either for parsing or validation, of course. You can use it for CQRS.

case class UserFault()
case class UserCreatedEvent()

def createUser(user:User) : Either[UserFault, UserCreatedEvent]

or arbitrary binary choices:

def whatShape(shape:Shape) : Either[Square, Circle]

Either is powerful, but it’s trickier than Option. In particular, it can lead to deeply nested code. It can also be misunderstood. Take the following Java lookup method:

public Foo lookup(String id) throws FooException // thrown if not found or db exception

Scala has Option, so we can use that. But what if the database goes down? Using the error reporting convention of Either might suggest the following:

def lookup() : Either[FooException,Option[Foo]]

But this is awkward. If you return Either because something might fail unexpectedly, then immediately half your API becomes littered with Either[Throwable, T].

Ah, but what if you’re modifying a new object?

def modify(inputFoo:Foo) : Either[FooException,Foo]

If you’re dealing with expected failure and there’s good odds that the operation will fail, then returning Either is fine: create a case class representing failure FailResult and use Either[FailResult,Foo].

Don’t return exceptions through Either. If you want a construct to return exceptions, use Try.

Try

Try is similar to Either, but instead of returning any class in a Left or Right wrapper, it returns Failure[Throwable] or Success[T]. It’s an analogue for the try-catch block: it replaces try-catch’s stack based error handling with heap based error handling. Instead of having an exception thrown and having to deal with it immediately in the same thread, it disconnects the error handling and recovery.

Try can be used in for comprehensions: unlike Either, it implements flatMap. This means you can do the following:

val sumTry = for {
  int1 <- Try(Integer.parseInt("1"))
  int2 <- Try(Integer.parseInt("2"))
} yield {
  int1 + int2
}

and if there’s an exception returned from the first Try, then the for comprehension will terminate early and return the Failure.

You can get access to the exception through pattern matching:

sumTry match {
  case Failure(thrown) => {
    Console.println("Failure: " + thrown)
  }
  case Success(s) => {
    Console.println(s)
  }
}

Or through failed:

if (sumTry.isFailure) {
  val thrown = sumTry.failed.get
}

Try will let you recover from exceptions at any point in the chain, so you can defer recovery to the end:

val sum = (for {
  int1 <- Try(Integer.parseInt("one"))
  int2 <- Try(Integer.parseInt("two"))
} yield {
  int1 + int2
}) recover {
  case e => 0
}

Or recover in the middle:

val sum = for {
  int1 <- Try(Integer.parseInt("one")).recover { case e => 0 }
  int2 <- Try(Integer.parseInt("two"))
} yield {
  int1 + int2
}

There’s also a recoverWith method that will let you swap out a Failure:

val sum = for {
  int1 <- Try(Integer.parseInt("one")).recoverWith {
    case e: NumberFormatException => Failure(new IllegalArgumentException("Try 1 next time"))
  }
  int2 <- Try(Integer.parseInt("2"))
} yield {
  int1 + int2
}

You can mix Either and Try together to coerce methods that throw exceptions internally:

val either : Either[String, Int] = Try(Integer.parseInt("1")).transform({ i => Success(Right(i)) }, { e => Success(Left("FAIL")) }).get
Console.println("either is " + either.fold(l => l, r => r))

Try isn’t always appropriate. If we go back to the first exception example, this is the Try analogue:

val input = new BufferedReader(new FileReader(file))
val results = Seq(
  Try {
    for (line <- Iterator.continually(input.readLine()).takeWhile(_ != null)) {
      Console.println(line)
    }
  },
  Try(input.close())
)

results.foreach { result =>
  result.recover {
    case e:IOException => errorHandler(e)
  }
}

Note the kludge to get around the lack of a finally block to close the stream. Viktor Klang and Som Snytt suggested using a value class and transform to enhance Try:

implicit class TryOps[T](val t: Try[T]) extends AnyVal {
  def eventually[Ignore](effect: => Ignore): Try[T] = {
    val ignoring = (_: Any) => { effect; t }
    t transform (ignoring, ignoring)
  }
}

Try(1 / 0).map(_ + 1) eventually { println("Oppa Gangnam Style") }

Which is cleaner, at the cost of some magic.

Try was originally invented at Twitter to solve a specific problem: when using Future, the exception may be thrown on a different thread than the caller, and so can’t be returned through the stack. By returning an exception instead of throwing it, the system is able to reify the bottom type and let it cross thread boundaries to the calling context.

Try is new enough that people are still getting comfortable with it. I think that it’s a useful addition when try-catch blocks aren’t flexible enough, but it does have a snag: returning Try in a public API means exceptions must be dealt with by the caller. Using Try also implies to the caller that the method has captured all non fatal exceptions itself. If you’re doing this in your trait:

def modify(foo:Foo) : Try[Foo]

Then Try should be at the top to ensure exception capture:

def modify(foo:Foo) : Try[Foo] = Try {
  Foo()
}

Because exceptions must be dealt with by the caller, you are placing more trust in the caller to handle or delegate a failure appropriately. With try-catch blocks, doing nothing means that the exception can pass up the stack to a top level exception handler. With Try, exceptions must be either returned or handled by each method in the chain, just like checked exceptions.

To pass the exception along, use map:

def fooToString(foo:Foo) : Try[String] = {
  modify(foo).map { outFoo =>
   outFoo.toString()
  }
}

Or to rethrow the exception up the stack if the return type is Unit:

def doStuff : Unit = {
  val modifiedFoo = modify(foo).get // throws the exception if failure
}

And you want to avoid this:

modify(foo) match {
  case Failure(f) => {
    // database failure?  don't care, swallow exception.
  }
  case Success(s) => {
    ...
  }
}

If you have a system that needs specific error logging or error recovery, it’s probably safer to stick to unchecked exceptions.

TL;DR

  • Throw Exception to signal unexpected failure in purely functional code.
  • Use Option to return optional values.
  • Use Option(possiblyNull) to avoid instances of Some(null).
  • Use Either to report expected failure.
  • Use Try rather than Either to return exceptions.
  • Use Try rather than a catch block for handling unexpected failure.
  • Use Try when working with Future.
  • Exposing Try in a public API has a similar effect as a checked exception. Consider using exceptions instead.

Problems Scala Fixes


When I tell people I write code in Scala, a typical question is well, why? When it comes to writing code, most of my work is straightforward: SQL database on the backend, some architectural glue, CRUD, some exception handling, transactions handlers and an HTML or JSON front end. The tools have changed, but the problems are usually the same: you could get a website up in 5 minutes with Rails or Dropwizard. So why pick Scala?

It’s a tough question to answer off the bat. If I point to the language features, it doesn’t get the experience across. It’s like explaining why I like English by reading from a grammar book. I don’t like Scala because of its functional aspects or its higher kinded type system. I like Scala because it solves practical, real world problems for me.

You can think of Scala as Java with all the rough edges filed off, with new features that make it easier to write correct code and harder to create bugs. Scala is not a purist’s language — it goes out of its way to make it easy for Java programmers to dip their toes in the pool. You can literally take your Java code and hit a key to create working Scala code.

So what problems does Scala solve?

Let’s start with the single biggest problem in programming, the design flaw that’s caused more errors than anything else combined. Null references.

Solving for Null

Scala avoids null pointer references by providing a special type called Option. Methods that return Option[A] (where A is the type that you want, e.g. Option[String]) will give you an object that is either a wrapper object called ‘Some’ around your type, or None. There are a number of different ways you can use Option, but I’ll just mention the ones I use most. You can chain Options together in Scala using for comprehensions:

  for {
    foo <- request.params("foo")
    bar <- request.params("bar")
  } yield myService.process(foo, bar)

or through a map:

  request.params("foo").map { foo => logger.debug(foo) }

or through pattern matching.

  request.params("foo") match {
    case Some(foo) => { logger.debug(foo) }
    case None => { logger.debug("no foo :-(") }
  }

Not only is this easy, but it’s also safer. You can flirt with NPEs by calling myOption.get, but if you do that, you deserve what you get. Not having to deal with NPEs is a pleasure.

Right Type in the Right Place

What’s the second biggest problem in programming? It’s a huge issue in security and in proving program correctness: invalid, unchecked input.

Take the humble String. The work of manipulating strings is one of the biggest hairballs in programming — they’re pulled in from the environment or embedded in the code itself, and then programs try to figure out how best to deal with them. In one case, a string is displayed to the user and it’s done. In another case, an SQL query is embedded as a query parameter on a web page and passed straight through to the database. To the compiler, they’re just strings and there is no difference between them. But there are some types of strings that are suitable to pass to databases, and some which are not. Ideally, we’d like to tell the compiler that SQL and query parameters have different types. Scala makes this easy.

With the Shapeless library, you can add distinguishing type information to objects and ensure that you can’t pass random input in:

import shapeless.TypeOperators._
type SqlString = Newtype[String, Any]
val x: SqlString = newtype("SELECT * FROM USER")

I’ve called out strings because it’s a good example, but you can also do this for repository IDs. No more of this:

  case class User(id: Int, firstName:String)
  def lookup(id:Int) : User

When you can have this:

  case class User(id: Id[User], firstName:String)
  def lookup(id:Id[User]) : User

You can also use this to validate input on the front end. One of the big problems with regular expressions is that when you parse a random string for certain kinds of input, you get back… more strings. You may be validating a string as a username (no spaces, no odd characters), but what you’ve got at the end is a string that says it’s a username.

val rawInput = request.params("foo")
if (isUsername(rawInput)) {
  val username = rawInput
}

You can replace that with something nicer.

   val username : Option[Username] = parseUsername(rawInput)

This embeds the constraint in the type itself. You can design your API to accept Username instead of String, and so enforce a kind of whitelisting.

Can you do this in Java? Yes, but it’s inconvenient. Scala’s type system makes it easy for you, and in 2.10 there will be Value Classes, which will provide this functionality in the core language itself.
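
For reference, here’s roughly what that looks like as a 2.10 value class (Username is the same illustrative type as above):

// A distinct type to the compiler, but no extra allocation at runtime
class Username(val underlying: String) extends AnyVal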

Doing the gruntwork for you

The previous example can be improved though. Really, we just want a Username at the end — we don’t want to have to call parseUsername on it. Fortunately, Scala rewards the lazy with implicit conversions. If you define a method like this and use the implicit keyword:

  implicit def string2username(input : String) : Option[Username] = parseUsername(input)

And do this:

  val username : Option[Username] = rawInput

Then the compiler is smart enough to see that a String isn’t an Option[Username], and looks through any implicit methods available to do the conversion.

There is an element of ‘magic’ to implicit conversions, especially when you’re reading someone else’s code and trying to figure out where the conversion is happening. You can find the appropriate implicit through the REPL, or through IDEA.

Providing Context

There are many cases in programming where everything depends on a Context object in some way: either you’re using a database connection, or you rely on a security principal, or you’re resolving objects from a request or JAAS / LDAP / Spring context… the list goes on. Whatever it is, it’s passed in by the system, it’s absolutely essential, and you can count on most of your API to depend on it in some way. A typical Java way to deal with this is to make it part of the parameter list, or try to ignore it and make it a ThreadLocal object.

   public void doStuff(Context context);

Scala has a better way to deal with this: you can specify implicit parameters on a method.

   def doStuff(implicit context:Context)

which means that anything marked as implicit that is in scope will be applied:

   implicit val context = new Context()
   doStuff  // uses val context automatically.

This is all handled by the compiler: just set up the implicits and Scala will do the rest.

A place for everything

So now you have a number of implicit methods, value classes, type definitions, and whatnot. In Scala, there’s a place to keep all this stuff that is so intuitive, you may not think of it as a place at all. It’s the package object.

Package objects are supremely useful. You define a file called package.scala, then in the file you put

package object mypackagename {
  implicit def string2username(input : String) : Option[Username] = parseUsername(input)
}

and after that point, anything with ‘import mypackagename._’ will import the package object as well. One less thing to think about.

Free Data Transfer Objects

Case classes. So called because they’re used in case statements (see below).

case class Data(propertyOne:String, propertyTwo:Int)

Immutable, convenient, and packed with functionality. They make creating data types or DTOs trivial. They’re cool.

Free Range (Organic) Checking

Scala contains a powerful pattern matching feature. You can think of it as a switch statement on steroids.

a match {
   case Something => doThis
   case SomethingElse => doThat
}

There are so many things that feed into pattern matching — extractor objects, aliases, matching on types, regular expressions and wildcards — it’s the ‘regexp’ of Scala. It takes in an object as input, filters it, and manipulates it in exactly the way you want.

But the thing I really like about pattern matching is what it doesn’t let you do. It doesn’t let you miss something.

There’s a feature called sealed classes which lets you define all the valid subtypes in a single file. If you define a trait with the sealed keyword inside a file, then the classes you define inside that file that extend that trait are the ONLY classes that can extend that trait.

sealed trait Message { def msg: String }
case class Success(msg:String) extends Message
case class Failure(msg:String) extends Message

The compiler knows this, and so when you use pattern matching against that trait, it knows that the value must be one of the case classes defined. If not all of the case classes are covered in the match, it will print out a warning message saying that you don’t have an exhaustive match.

def log(msg: Message) = msg match {
  case Success(str) => println("Success: " + str)
  case Failure(str) => println("Failure: " + str)
}

And More

But that’s enough for now. I hope this gives you an idea of why I like Scala. If you have any features dear to your heart, add them to the comments and let me know what makes you happy.

Remember Me Cookies for Play 2.0


I’ve been working with Play 2.0 for a while now, and in many ways it’s the ideal web framework for me: it’s a light framework that gets a request, puts together a result (either at once or in chunks using an iteratee pattern), and provides some HTML templates and form processors for ease of use. It lets you change code and templates while the server is running, and gives you an asset pipeline for compressing LESS and CoffeeScript into minified CSS and Javascript out of the box.

That being said, it’s a new web framework, and the biggest issue right now is all the boring infrastructure that goes on top of it to make a framework deal with authentication, authorization, and even boring things like resetting a password.

On the Java side, Yvonnick Esnault has a good starter application (disclaimer: I contributed some code), or you can use Play Authenticate.

On the Scala side, play20-auth is a good starting point for an authentication system. However, it didn’t do token based authentication, aka “Remember Me” cookies. Adding this feature turns out to be tricky if you’re new to Scala, because extending the request pipeline in Play 2.0 Scala requires that you know a functional style of programming called “action composition”.

So here’s a boilerplate project play20-rememberme that does authentication with remember me functionality (although it doesn’t have the password reset or confirm features added to Play20StartApp).

UPDATE: Now works with Play 2.1.