Showing posts with label scala. Show all posts

Tuesday, November 26, 2013

Unicast enumerators let you debug in production.

Have you ever had to debug something broken in production? Have you ever had to grep through mountains of logs to find a stack trace? What if I told you you could get log data back in your API call response? But wait, there's more! What if you could run jobs via an API endpoint, allowing them to be scheduled from other services?

The ghost of Billy Mays here, and before I'm off to haunt that dick bag sham wow guy, I just wanted to tell you about this amazing opportunity to use Play Framework's unicast enumerators.

It's no secret that Klout fucks with iteratees. We use them all over the place: our collectors, our targeting framework, our HBase and ElasticSearch libraries. Recently, we discovered one weird trick to debug code in production using this framework. Bugs hate us!

Iteratees

If you don't know what an enumerator or iteratee is, usher yourself to the Play website to have a 20 megaton knowledge bomb dropped on your dome piece. If you're too lazy, let me give you a mediocre 10,000-foot overview: enumerators push out inputs, enumeratees transform those inputs, and iteratees are sinks that consume those inputs.

For example, in the following code snippet:

List(1,2,3).map(_ + 1).foldLeft(0)(_ + _)

The list is the enumerator, the mapping is the enumeratee, and the folding is the iteratee. Of course, in the iteratee world everything is done piecemeal and asynchronously. The 1 gets pushed to the _ + 1 enumeratee, which then pushes a 2 to the fold iteratee, which accumulates it in the sink.
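Here's a toy, self-contained model of the three roles (names and shapes are made up for illustration; Play's real API lives in play.api.libs.iteratee and is actually asynchronous, returning futures instead of plain values):

```scala
// A toy model: enumerator pushes inputs, enumeratee transforms them,
// iteratee folds them into a sink. Nothing here is async.
case class ToyEnumerator[E](items: List[E]) {
  // The "enumeratee": sits between source and sink, transforming inputs.
  def through[E2](f: E => E2): ToyEnumerator[E2] = ToyEnumerator(items.map(f))
  // The "iteratee": a fold that consumes inputs one at a time.
  def run[A](zero: A)(step: (A, E) => A): A = items.foldLeft(zero)(step)
}

val total = ToyEnumerator(List(1, 2, 3)).through(_ + 1).run(0)(_ + _)
// total == 9, same as the plain List version above
```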

Play Framework

Interestingly enough, Play's HTTP handling is actually done with iteratees. Your controller actions return byte-array enumerators, which get consumed by various iteratees. Everything else is just a bunch of sugar on top of this low-level interface.

Also, Play 2.1 has a unicast enumerator, which allows you to imperatively push inputs via a channel. Yeah, yeah, we all know that imperative code is uglier than Steve Buscemi and Paul Scheer's unholy love child, but pushing inputs allows you to do some interesting things. If you haven't solved the fucking mystery yet, and are still as confused as you were by the ending to Prometheus, let me give you a giant fucking hint: you can push debug statements via the channel, which get spat out by the enumerator. This enumerator then gets consumed by Play's iteratee.

So here's my implementation of this pattern:

The call starts with a controller wrapper, intended to be used like this:

The event stream then gets passed down to the service method as an implicit.
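A hypothetical usage sketch (DebugAction, EventStream, and the service names are this post's conventions, not Play API, and the original snippet is lost, so this is a reconstruction of the shape described above):

```scala
// Controller: wrap the action, receive an EventStream, pass it down implicitly.
def score = DebugAction { implicit events: EventStream =>
  Ok(scoreService.computeScore())
}

// Service: push log lines into the stream wherever it's handy.
def computeScore()(implicit events: EventStream): String = {
  events.push("Fetching something from HBase")
  // ... the actual work ...
  "{\"billymays\" : \"badass\"}"
}
```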

When you curl the endpoint with a special header, the log pushes are returned as a chunked HTTP response AS THEY ARE PUSHED.

curl -H 'X-Klout-Use-Stream:debug' 'http://rip.billy.com/api/blah.json'

[info] [2013-11-26 09:57:50.931] Fetching something from HBase
[info] [2013-11-26 09:57:51.932] HBase returned in 1s. Durrgh.
[complete] [2013-11-26 23:25:32.499] {"billymays" : "badass"}
200

Here's the wrapper code itself. The implementation of the EventStream case class is left as an exercise for the reader.

Basically, what's going on here is that I look through the headers for my special header and check whether it maps to a predefined event level. If it doesn't, I do the usual shit. If it does, I call Concurrent.unicast. This factory takes a function that uses a channel of type T, and returns an enumerator of type T. I take this channel and put it into an EventStream case class. This case class gets passed down to the business logic, which typically uses it as an implicit.
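The original wrapper snippet is lost, so here's a sketch of what that pattern looks like under those assumptions (EventStream and the header name are from this post; error handling and level mapping are elided):

```scala
import play.api.libs.iteratee.Concurrent
import play.api.mvc._

// Sketch only: carries the unicast channel down to the business logic.
case class EventStream(channel: Option[Concurrent.Channel[String]]) {
  def push(msg: String): Unit = channel.foreach(_.push(s"[info] $msg"))
}

def DebugAction(block: EventStream => String) = Action { request =>
  request.headers.get("X-Klout-Use-Stream") match {
    case Some(level) => // assume it maps to a predefined event level
      // unicast hands us an imperative channel; everything pushed on it
      // comes out of the returned enumerator as a chunk of the response.
      val enumerator = Concurrent.unicast[String] { channel =>
        val result = block(EventStream(Some(channel)))
        channel.push(s"[complete] $result")
        channel.eofAndEnd()
      }
      Ok.chunked(enumerator)
    case None =>
      Ok(block(EventStream(None))) // no header: the usual shit
  }
}
```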

Word.

Application

In addition to streaming logs per request in production, you can, by extension, stream logs from a long-running request and call it a job. This means you can use external services like Jenkins to hit an API endpoint and treat it like a scheduled job.

Yay! Also, Deltron.

Saturday, June 15, 2013

Protip: layer cake

This post is about Scala's cake pattern. If you don't know what it is, educate yourself here. The tl;dr: the cake pattern is the canonical way of doing type-safe dependency injection in Scala.

An unfortunate problem with the cake pattern is registry bloat. The Klout API (surprise surprise) happens to be a pretty large application. Our registry got to be over 100 lines long. It's not a huge deal, but it's ugly as fuck. And if there's one thing we love at Klout more than cursing, it's damn hell ass beautiful code.

That is not beautiful.

To shrink down the registry and just organize our code better in general, we've started using what we're tentatively calling the layer cake pattern. And much like Daniel Craig's character, we're all in it to be good middlemen.

Instead of directly adding components to the application cake, we build a sub-cake of related components. For example, our perks system is composed of 6 components that talk to each other. We then add this sub-cake (called a feature) to the application cake. External dependencies are handled as self-types on the sub-cake. This is an extra step, but I would argue it's compensated by the time you save when you add components to a feature: you don't have to add those components to the application cakes, since they already include the feature.
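Here's a minimal sketch of the idea with a two-component perks feature (the component names are invented for illustration, not Klout's actual code):

```scala
// Two related components in classic cake style.
trait PerkStoreComponent {
  trait PerkStore { def find(id: Int): String }
  def perkStore: PerkStore
}

trait PerkServiceComponent { self: PerkStoreComponent =>
  class PerkService { def describe(id: Int): String = "perk: " + perkStore.find(id) }
  def perkService: PerkService
}

// The sub-cake ("feature"): bundles the related components into one trait.
// External dependencies would be declared here as self-types, so the
// application cake never sees the individual components.
trait PerkFeature extends PerkStoreComponent with PerkServiceComponent

// The application cake mixes in whole features, not individual components.
object Application extends PerkFeature {
  val perkStore = new PerkStore { def find(id: Int) = "#" + id }
  val perkService = new PerkService
}
```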

We haven't completely switched over to this pattern yet (as we are also adherents of suffering-oriented programming), but we're definitely feeling the code organization benefits.

Monday, April 2, 2012

Routes lookup in Play Framework 2.0



Apologies for the gap in blog posts. I got sick after getting back from Ultra and crypto-class has been beating me about the face and neck.

This will be another post about getting around some Play 2.0 jankiness: namely, how you can't get the controller and action name from the request object anymore.

This solution is going to make you cringe. It might even make you vomit. I've looked around the framework a decent amount and I just don't see a better way of doing this.

Ready?

Yes, the solution is to populate your own list of compiled routes and test every single one until you get the right route.



Yeah you read that right.

There is no runtime lookup because the routes file is compiled to Scala code and then to byte code. There is no way to look anything up. The one piece of information the request gives you about where you came from is the URI, but since URIs map to routes dynamically (for example, you can specify /pics/wangs/{wangid}), you still have to test each route.

If you haven't closed your browser in disgust yet, I'll show you how this is done.

Populating the list




Notice how I got "routes" as a resource stream instead of just opening the "conf/routes" file. You have to be careful here, since when you deploy with "play dist", there is no conf folder.

This code should be pretty straightforward. I use the RouteFileParser in RoutesCompiler to extract the information I care about from the routes file. Then I use that to generate a list of routesFuncs, which can be "unapplied" against a request.

Doing lookups




Here we run through the entire list, unapplying each Route against the request you're testing. You can of course use find to stop at the first matching route; that is left as an exercise for the reader.
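The original snippets are lost, so here's a self-contained sketch of the linear scan (Play's real generated route extractors are more involved; the Route shape and regex translation below are simplified for illustration):

```scala
// Simplified stand-in for a compiled route: a method plus a URI pattern.
case class Route(method: String, pattern: String, controller: String, action: String) {
  // Turn /pics/wangs/{wangid} into an anchored regex and test the path.
  private val regex = ("^" + pattern.replaceAll("""\{[^/]+\}""", "[^/]+") + "$").r
  def matches(m: String, path: String): Boolean =
    method == m && regex.findFirstIn(path).isDefined
}

val routes = List(
  Route("GET", "/pics/wangs/{wangid}", "Pics", "show"),
  Route("GET", "/users/{id}/score", "Users", "score")
)

// The algorithmic atrocity itself: test every route until one matches.
def lookup(m: String, path: String): Option[Route] =
  routes.find(_.matches(m, path))
```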

I used the reverse of this method to fix routes parsing in swagger-play2. You can see that code here.


While the reverse lookup solution is decently fast, since we can just use maps, the forward lookup solution is quite inefficient. The cost is low enough (2ms for a lookup on 100 routes) that we don't care too much, especially since we're doing HBase lookups and all kinds of IO-bound crap for our API.

If you do find a better solution, please let me know. The performance isn't an issue, but the fact that this algorithmic atrocity is out there just makes my blood boil.



Bonus level!



Oh you're still here. Well, let me drop another knowledge bomb on your dome that may prove useful.

At Klout, we have a library for dealing with Play 2.0, also written as a Play 2.0 application. The problem is that when an application depends on this library, the application's routes and application.conf files are overridden by the library's. We have a sample app in the library for testing and it's annoying to constantly move conf/routes to conf/routes2 before publishing.

So instead of getting mad, I choked myself out with a Glad bag. Then I threw this in my Build.scala:



The application.conf part should be straightforward. The other stuff... maybe not. But as I mentioned before, conf/routes is compiled into a whole slew of class files for performance reasons. You can't just exclude conf/routes; you have to exclude all those class files, which is why I have those disgusting-looking regexes.
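The original Build.scala snippet is lost; a sketch of the shape (sbt 0.11-era syntax, and the file names and regexes below are illustrative reconstructions, not the originals):

```scala
// In Build.scala, inside the PlayProject(...) settings:
// keep the library's conf files and compiled routes out of the published jar.
mappings in (Compile, packageBin) ~= { ms =>
  ms.filterNot { case (_, path) =>
    path == "application.conf" ||
    path == "routes" ||
    path.matches("""controllers/routes.*\.class""") ||  // reverse-routing classes
    path.matches("""Routes(\$.*)?\.class""")            // the compiled router itself
  }
}
```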

Don't worry your pretty little head about it though. Just throw this in the PlayProject().Settings section of the Build.scala file of your Play 2.0 library.

Enjoy!

Wednesday, March 21, 2012

Framework ids in Play 2.0





The problem



Pretend you're a sysops guy (for some of you, this will not require much imagination). You have an application that needs to be deployed on multiple environments (dev, stage, QA, prod, etc.), each with different component dependencies: prod might use the sql1.prod.root MySQL server while dev would use sql-dev.stage.root.






config.xml: The technical term is "cluster-fuck"


Back in the day, you would just partition these configs into different files: a site.conf.prod file, a site.conf.dev file, etc., renamed to site.conf as needed. As you can imagine, even with deployment automation, this really, really sucks, since two environments might need to share a large part of their configuration. While SQL server configuration can be exclusive to each environment, things like pool sizes and connection timeouts should be roughly the same across many environments.

Play 1.2 had framework ids, which were designed to solve this problem. You could pass them in from the command line, like play run --%prod, or specify them directly using play id. When specifying --%prod, all keys prefixed with %prod override the base keys: for example, %prod.http.port will override http.port. This lets you partition your environments however you like while always having a fallback. Suppose you always want a connection timeout of 200ms. Well, now you can, without specifying it individually for prod, QA, etc.
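In conf-file terms, that partitioning looks something like this (the keys and values are invented for illustration):

```
http.port = 9000
db.url = "jdbc:mysql://sql-dev.stage.root/app"
connection.timeout = 200ms        # shared fallback for every environment

%prod.http.port = 80
%prod.db.url = "jdbc:mysql://sql1.prod.root/app"
# no %prod.connection.timeout, so prod falls back to 200ms
```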

Unfortunately, Play 2.0 did away with framework ids and there does not appear to be any replacement in sight.

The solution





Thankfully, Play 2.0 allows you to create your own configuration by overriding the Global object. If you override the configuration method of Global (an undocumented feature), you can pass back any configuration you want at load time. Unfortunately, you don't get access to the default implementation of configuration or any configuration it would have created; you have to rebuild the whole thing yourself. To do this, I used the same ConfigFactory library that Play's default config parsing uses.

When you use this library to parse application.conf, it also inserts a few useful parameters. The one we care about is sun.java.command: the full command that was used to launch your app, including any environment parameters you might have sent in. Unfortunately, this does not work with "play start", which forks a Netty server from your sbt launch process, so sun.java.command just shows the Netty command with no command line args. I got around this by creating an application.tag config variable that the app can fall back on in the absence of command line args. At Klout, we use Capistrano to deploy our APIs to both staging and production environments, which lets me inject an application.tag=%stage or application.tag=%prod line into application.conf during deployment, so deployment is still a one-step process. It's a bit janky, but it works well.
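The fallback logic boils down to something like this (a self-contained sketch; the real version lives inside Global and feeds the detected tag into the %tag key-override step):

```scala
// Prefer a --%tag flag found in sun.java.command; otherwise fall back to
// the application.tag value injected into application.conf at deploy time.
def detectTag(javaCommand: String, applicationTag: Option[String]): Option[String] =
  """%\w+""".r.findFirstIn(javaCommand).orElse(applicationTag)

detectTag("play.core.server.NettyServer --%prod", None)   // Some("%prod")
detectTag("play.core.server.NettyServer", Some("%stage")) // Some("%stage")
```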

We're in the process of switching to running Play 2.0 distributions created with "play dist." This gives us back sun.java.command since you can start the Netty server straight from the shell and not from sbt.

Be vewy vewy careful


I found out that some config variables, notably the Akka ones, are read directly from the application.conf file. If you try to do a %prod.play.akka._____, it will do nothing. There is a pull request from my colleague David Ross to fix that.