Tuesday, February 02, 2016

Using inject to remove mutated state in Ruby

This article has a great pattern to allow filtering of ActiveRecord results based on query parameters, but it also has a snippet of code that really bugs me:
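The snippet was shaped something like this (Product and the param names here are stand-ins):

# Each matching query param narrows the result set by reassigning
# `results` from inside the block.
def filter_products(params)
  results = Product.all
  params.slice(:category, :status).each do |key, value|
    results = results.where(key => value) if value.present?
  end
  results
end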

The thing that bugs me is using an .each block to mutate state outside the block and then return it. The power of blocks and Enumerable methods like .each is one of Ruby's strong points, but it also gets abused. In this case there is no need to mutate the results object, and the code can be condensed and cleaned up as follows:
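Here's a sketch, using the same stand-in names:

# inject threads the accumulator through the block, so nothing outside
# the block gets reassigned.
def filter_products(params)
  params.slice(:category, :status).inject(Product.all) do |results, (key, value)|
    value.present? ? results.where(key => value) : results
  end
end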

Now we don't have any state being mutated. In a function like this the risk is minimal and it might be a moot point, but I think it's a good habit to get into. Whenever you reach for .each, it's worth asking: could I use .map or .reduce/.inject instead and avoid mutating state?

Rant over.

Sunday, January 31, 2016

Rails, React and ES6

I'm currently working on a side project. I wanted this to be something I could make solid progress on but also learn from, so I chose a mix of familiar and unfamiliar tech. The familiar is Rails for the backend and the new is React for the frontend.

However, getting Rails and React to play nicely isn't that easy, especially when you drop in ES6, Babel and Browserify. I'm going to come back to all of this, but right now I'm going to talk about one particular issue that drove me nuts and on which Google and Stack Overflow were largely silent.

TL;DR if you get the following error when using a combination of React, Rails and ES6, then read on.

Warning: React.createElement: type should not be null, undefined, boolean, or number. It should be a string (for DOM elements) or a ReactClass (for composite components).

followed by:

Uncaught Invariant Violation: Element type is invalid: expected a string (for built-in components) or a class/function (for composite components) but got: object.

The quick answer to why this happens is the following:

When defining classes with ES6 you need to export them as default. See this link for some details about why. If you don't export as default you'll see the error above. So the class definition should look something like:

import React from 'react';

class User extends React.Component {
  ...
}
export default User;

Even if you do export as default, the module is exported with a default property, which then messes things up: if you use a require instead of an import you will see the same error again (full disclosure: I don't really understand why yet; JS is new to me and I need to find out more about what export default does). To resolve this you can do the following:

window.User = require('./components/User.jsx').default;
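For comparison, ES6 import syntax unwraps the default export for you, which is why the problem only bites when you use require:

// Equivalent using import instead of require:
import User from './components/User.jsx';
window.User = User;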

This all came to light due to this GitHub issue. As indicated in the link above, this is also an issue in Jest. In fact, I think it's an issue anywhere you use require instead of import.

This is a pain because, for reasons I'll come back to, you need to use vanilla JS in your Rails component.js. That's it for now. I'm going to come back and talk more about the setup to get this stack working nicely, but right now I wanted to get this down in case anyone else runs into it.

Monday, December 21, 2015

Understanding Contramap

Recently I've been trying to get my head around what contravariant functors and contramap are used for in practice. In particular, I've been searching for some practical examples that turn the types of the function into something concrete. This all started after implementing a lot of JSON parsing and writing using Play's JSON library. In doing so I'd periodically have to use contramap to implement a Writes, and I wanted to understand what it actually did.

This led me here: http://blog.tmorris.net/posts/functors-and-things-using-scala/index.html and here: http://igstan.ro/posts/2013-10-31-contravariant-functors-an-intuition.html.

And now my own attempt to explain.

The problem:

The thing that confused me straight up was this:
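It was the contramap signature, which looks something like this (naming approximate):

trait Contravariant[F[_]] {
  // Note the direction of f: it goes from B to A, yet we end up with an F[B].
  def contramap[A, B](fa: F[A])(f: B => A): F[B]
}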


Being relatively new to functional programming, my mind jumps straight to lists when I think about functors. How can you have a List[Int], a function that converts a String to an Int, and somehow generate a List[String]? The function is backwards; surely this can't be possible? The catch is that lists support the covariant map function, but not contramap.

An example:

Below I take what I've learnt from the above blogs and apply it to a very basic serialisation example. For me this finally made the idea of contramap stick. But before we talk about types and functions, I want to walk through the concrete example.

Serialisation is a pretty common concept. If we look at JSON, the idea of taking a string of characters in a generic structure and converting it into an object in memory is well understood. When we talk about reading JSON we are implementing an abstraction that is covariant and which implements map. For example, if you have a container (let's call it a Reads[T]) that can read a string into a JsObject (so this would be a Reads[JsObject]), and you have a function that can convert a JsObject into the type you are interested in (let's call it Company), then it should be possible to create a Reads[Company]. The function would look something like:
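A sketch, specialised to these types:

// If we can read a JsObject, and we can turn a JsObject into a Company,
// we can read a Company.
def map(reads: Reads[JsObject])(f: JsObject => Company): Reads[Company]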

The implementation of this would probably use the underlying Reads[JsObject] to convert our serialised format to a JsObject, and then our function to convert the JsObject to a Company. So this matches the definition of a covariant map:
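In its general form:

trait Functor[F[_]] {
  // Covariant: f runs in the same direction as the transformation.
  def map[A, B](fa: F[A])(f: A => B): F[B]
}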

But what about contramap?

For this example, let's forget about JsObject and JSON and take a really simple example of serialising an object to a string. Let's define a very basic serialisation abstraction.
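Something minimal like this will do (a standalone trait rather than Play's Writes):

// A Writes[T] knows how to turn a T into a String.
trait Writes[T] {
  def writes(value: T): String
}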

Along with this we create a simple class and its serialisation implementation.
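The names here are illustrative:

// A simple key/value class...
case class Writeable(key: String, value: String)

// ...and a Writes instance for it.
val writeableWrites: Writes[Writeable] = new Writes[Writeable] {
  def writes(w: Writeable): String = s"${w.key}: ${w.value}"
}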
Very basic: just print out a key and a value. Now let's implement contramap. This will allow us to serialise any object using only the traits defined above.
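A sketch of the implementation:

// contramap: given a Writes[A] and a way to turn a B into an A,
// we get a Writes[B].
def contramap[A, B](writesA: Writes[A])(f: B => A): Writes[B] =
  new Writes[B] {
    def writes(b: B): String = writesA.writes(f(b))
  }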


Which matches the contramap functor defined by Tony. In this case A is Writeable, B is Test and F is Writes. If we switch these out we have:
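Roughly:

// The same signature with the concrete types substituted in:
def contramap(writesWriteable: Writes[Writeable])(f: Test => Writeable): Writes[Test]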

And finally we can define ourselves a Writes[Test] using contramap.
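A sketch, with a hypothetical Test class:

case class Test(name: String, count: Int)

// Tell contramap how to turn a Test into a Writeable, and we get a
// Writes[Test] for free.
val testWrites: Writes[Test] =
  contramap(writeableWrites)((t: Test) => Writeable(t.name, t.count.toString))

testWrites.writes(Test("answer", 42)) // "answer: 42"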

Friday, October 16, 2015

Scala: Functions can be Keys of Maps

This came up in a discussion we had recently about what could be a key for a map in various languages. I jokingly suggested that we should take it to the extreme and have a function as a key for the map. On second thought, there was no reason we shouldn't be able to do this in Scala.

And it turns out it's possible, with some limitations. I'm not sure why you'd do this; my only use case was to do with dependency injection. If you have a function that accepts a function, there might be a case where you'd like to know which implementation you were given. In most cases you would not (and probably should not) care: that's the whole reason for injecting the implementation. Having said that, you may want it for logging purposes, so you can tell which implementation was used.

Anyway, the use case doesn't really matter: this was all about can you, not would you.

So here's the REPL output of my little experiment. The big limitation here is that the function needs to be assigned to a val for this to be useful, and then that val must be used whenever the function is invoked. This seems intuitive to me: without having the actual function implementation, we can't tell at runtime whether an anonymous function or partially applied function is identical to a function assigned to a val.

scala> def add(a: Int, b:Int) = a + b
add: (a: Int, b: Int)Int

scala> val addFunc = add _
addFunc: (Int, Int) => Int = <function2>

scala> def sub(a: Int, b: Int) = a - b
sub: (a: Int, b: Int)Int

scala> val subFunc = sub _
subFunc: (Int, Int) => Int = <function2>

scala> val funcMap = Map(addFunc -> "add", subFunc -> "sub")
funcMap: scala.collection.immutable.Map[(Int, Int) => Int,String] = Map(<function2> -> add, <function2> -> sub)

scala> def operation(a: Int, b: Int, op: (Int, Int) => Int, opMap: Map[(Int,Int) => Int, String]) = {
     |   val opString = opMap.getOrElse(op, "Unknown Function")
     |   println("Function: " + opString + " a: " + a + " b: " + b)
     |   op(a, b)
     | }
operation: (a: Int, b: Int, op: (Int, Int) => Int, opMap: Map[(Int, Int) => Int,String])Int

scala> operation(1, 2, addFunc, funcMap)
Function: add a: 1 b: 2
res0: Int = 3

scala> operation(1, 2, subFunc, funcMap)
Function: sub a: 1 b: 2
res1: Int = -1

scala> operation(1, 2, (a: Int, b: Int) => { a * b }, funcMap)
Function: Unknown Function a: 1 b: 2
res2: Int = 2

scala> operation(1, 2, add _, funcMap)
Function: Unknown Function a: 1 b: 2
res3: Int = 3

Sunday, April 26, 2015

Ruby services in Docker using Passenger

This weekend I was working on a little project I wanted to deploy to Digital Ocean. The app was a simple server written in Ruby using Sinatra to serve some JS. I already had an instance running in Digital Ocean with Docker installed, so I figured I'd just copy one of my existing Dockerfiles and deploy it that way.

Having a look at the Dockerfiles I'd used for my existing projects, I realised they all used the standard Ubuntu base and then a dreaded curl | sudo bash install. I'd recently read this Hacker News article about the state of sysadmin in a world with containers, and whilst I disagree with many of the points, I thought it was time to look for a more robust way of dealing with containers.

Enter Phusion and their base-image and passenger Dockerfiles. Phusion aims to provide a set of stable, secure and better-configured Docker images than what most people would be used to. In my case this was definitely true.

The base-image is a standard Ubuntu build with a number of tweaks to make it more Docker friendly. The main thing I picked up on is that they use a version of runit as a lightweight process supervisor to manage the processes running in your container.

The passenger images are aimed at application deployment. There are a number of builds for different versions of Ruby, Node etc., and they also come with Nginx and a few other services bundled together.

Getting down to it, I decided I'd use the ruby21 passenger image with everything disabled (i.e., Nginx) and just use runit to start my app. This seemed to be the best way to get things up and running as I already had scripts set up to run my app using unicorn. I didn't really want to port the app over to use Nginx and Passenger just yet.

So my Dockerfile ended up looking like this. It's very simple, which I like.
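It was something like this (the image tag, app name and paths are illustrative):

FROM phusion/passenger-ruby21:latest

# Use the base image's init system so runit supervises everything.
CMD ["/sbin/my_init"]

# Add the app and register its runit service script.
ADD . /home/app/myapp
RUN mkdir -p /etc/service/myapp
ADD run.sh /etc/service/myapp/run
RUN chmod +x /etc/service/myapp/run

# Clean up APT when done, as the Phusion docs recommend.
RUN apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*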

My run script was as follows:
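Something along these lines (the app user and paths are again illustrative):

#!/bin/sh
# runit runs this script and supervises whatever it execs.
cd /home/app/myapp
exec /sbin/setuser app ./start.sh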

And my start script simply called unicorn:
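Roughly this, with the config path a placeholder (note the -D):

#!/bin/sh
# -D tells unicorn to daemonize, which turns out to be the problem below.
bundle exec unicorn -c config/unicorn.rb -E production -p 8080 -D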

A few things to note:
* Don't put an ENTRYPOINT in your Dockerfile.
* Instead, use the built-in runit and my_init: you use my_init to start and monitor your daemons rather than an entry point.

The doco explains how to do this (add a script to /etc/service/my_service_name/run) and it pretty much works out of the box

... except it didn't. On running the service and tailing the logs I got the following:

*** Booting runit daemon...
*** Runit started as PID 9
Rack env: production
Running using: 8080 and rack env: production
Apr 26 05:58:08 67f2f5a43939 syslog-ng[21]: syslog-ng starting up; version='3.5.3'
Rack env: production
Running using: 8080 and rack env: production
master failed to start, check stderr log for details
Rack env: production
Running using: 8080 and rack env: production
master failed to start, check stderr log for details
Rack env: production
Running using: 8080 and rack env: production
master failed to start, check stderr log for details

So the runit process wasn't picking up that the service was running, and was constantly trying to restart it. This is obviously not the desired behaviour.

I spent a fair bit of time googling around and reading the doco for the base-image. I also looked at runit in more detail to understand what was going on. It wasn't until I re-read the passenger page that I picked up on this:

Note that the shell script must run the daemon without letting it daemonize/fork it. Usually, daemons provide a command line flag or a config file option for that.

Oh. Right. runit wants to manage the process itself. It doesn't want to be looking for a pid file associated with a daemon that is already running as a daemon; it wants to treat it as a foreground process. The solution is to drop the -D out of the call to unicorn as follows:
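Same sketch as before, minus the daemonize flag:

#!/bin/sh
# Without -D unicorn stays in the foreground, so runit can supervise it.
bundle exec unicorn -c config/unicorn.rb -E production -p 8080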

And there you go. Now I have a small Dockerfile from a well maintained repository that I can easily use to run any kind of ruby service, be it a worker or a web app.

TL;DR: if you have a Ruby process you want to run under the passenger image, don't daemonize it!


Thursday, April 23, 2015

Scala Functional Toolkit: Handling exceptions in for comprehensions


As we develop more and more code in Scala, we are trying to stick to a few key principles. One of these is to avoid throwing exceptions at all costs. In pretty much every case this can be avoided by using Options or Eithers; however, sometimes it gets a little messy.

One such example we came across recently is when you are dealing with Futures, in particular doing something like a web request.

For example: let's say you want to request some data from an API using Play's WebService library (or any HTTP library). Once you've got the result back you want to check whether the response was a 200; if so, you want to parse the JSON body. Finally, this should all be happening inside a Future so that it doesn't block your code.

In this case, the initial Future may succeed, but contain a 400, in which case you still want to fail.

Here is a naive implementation:
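Something like this, using Play's WS client (names are illustrative):

import play.api.libs.json.JsValue
import play.api.libs.ws.WSClient
import scala.concurrent.{ExecutionContext, Future}

// Naive version: throw inside the map and hope someone recovers it.
def fetchJson(ws: WSClient, url: String)(implicit ec: ExecutionContext): Future[JsValue] =
  ws.url(url).get().map { response =>
    if (response.status == 200) response.json
    else throw new RuntimeException(s"Request failed: ${response.status}")
  }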

This is the kind of thing we don't want to do. Regardless of how it is handled by the caller, we'd prefer not to throw Exceptions anywhere in our code.

So we can get around this using Future.failed. However, as we are failing the future once the response has been received (that is, the initial request future has now completed), we need to use flatMap instead of map; otherwise our failure case would be a Future[Future[JsValue]] while our success case would be a Future[JsValue].
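Which looks something like:

// Both branches now produce a Future, hence flatMap rather than map.
def fetchJson(ws: WSClient, url: String)(implicit ec: ExecutionContext): Future[JsValue] =
  ws.url(url).get().flatMap { response =>
    if (response.status == 200) Future.successful(response.json)
    else Future.failed(new RuntimeException(s"Request failed: ${response.status}"))
  }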

Ok. This is better. We are no longer throwing exceptions; we are using the Future.failed method instead. However, it is a bit clunky. We can make this slightly nicer using a for comprehension, but to do that we need to create a predicate method:
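A sketch of the helper:

// Evaluate the success block if the condition holds; otherwise fail
// the Future with the exception specified.
def predicate[T](condition: Boolean)(success: => T)(failure: => Exception): Future[T] =
  if (condition) Future.successful(success) else Future.failed(failure)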

This method takes a boolean condition and then depending on the result either evaluates the success block, or returns a failed future with the exception specified. Using this we can now re-write the function as follows:
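Something like:

// The same logic as a for comprehension.
def fetchJson(ws: WSClient, url: String)(implicit ec: ExecutionContext): Future[JsValue] =
  for {
    response <- ws.url(url).get()
    json <- predicate(response.status == 200)(response.json)(
              new RuntimeException(s"Request failed: ${response.status}"))
  } yield json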

You can play around with this a little to create more versatile or specific variants. Whilst in the end this is all syntactic sugar, it has been useful in making our code more concise.

Sunday, October 26, 2014

A letter to the Australian Government regarding metadata collection

26th October 2014

Circulation:

Teresa Gambaro
George Brandis
Scott Ludlam
Christine Milne
Bill Shorten

I am writing to you today to express my deep concern regarding the proposed data retention bills that are soon to be presented to the Senate. I want to make it clear that I am not writing to discuss the question of whether metadata collection constitutes surveillance. Nor do I wish to discuss the morality of mass surveillance and whether it violates a fundamental human right. Finally, I am not going to delve into the ease with which this surveillance can be circumvented using readily accessible technologies such as Tor or VPNs.

Although all of these issues are concerning to me, there are others who can frame more elegant and knowledgeable arguments on these points. My objection to these changes stems instead from my firsthand experience as a Software Engineer; in particular, my previous experience as a developer of security software and my current experience working with big data analytics.
My concerns are pragmatic and relate to the realities of delivering the infrastructure required to support the proposed legislation. In particular:

·       The cost to the ISP and the end user

·       The risk to the Australian people posed by collecting and storing this information

·       The potential for unseen follow-on effects resulting from decreased competition between ISPs.

The scale of the metadata to be collected is enormous. One estimate puts it at around 1 petabyte per day. If you tried to store this on Blu-ray discs, you would need 20,000 discs each day. In my current job we deal with terabytes of data, and I have firsthand experience of the difficulties involved in managing this volume of data (which is much smaller than the daily volume required to support this legislation). The cost of storing this data alone is significant, let alone the cost to move that data around.

This is the point. The data is useless unless it is accessible.

To be accessible it has to be indexed efficiently and readily available. Or, it needs to be archived and stored on media that can be loaded when needed. Both of these approaches present issues; one is computational, the other is physical (for example, storing 20,000 Blu-ray discs per day). There are then considerations around redundancy and backup. For this to be useful, the services recording this data need to be highly available, as do the services storing the data.

All of these technological hurdles can be overcome. Companies such as Google, Facebook and Amazon deal with these volumes of data as part of their daily business. These companies succeed because they have world leaders in technology, building and maintaining systems to allow them to do it. But ISPs don’t. This isn’t their core business and it delivers no value to them or their customers. There is no way for them to recoup the costs of managing this volume of data, other than being subsidized by the government or by passing on costs to their customers. Either way, the taxpayers of Australia foot the bill.

Let us say that these obstacles are overcome and that the impact to the consumer is somehow managed so as not to drive the cost of Internet connections too high. We are now in the situation where vast amounts (in fact all) of the metadata describing the online behavior of Australia's citizens are stored in a small number of locations hosted by the ISPs. This represents a sizeable target to anyone with nefarious intentions. In the past 18 months we have seen two serious vulnerabilities appear: first OpenSSL's Heartbleed, and more recently, the Shellshock vulnerability in Bash. OpenSSL and Bash are two technologies that form the cornerstone of computing. They power an enormous amount of the Internet, yet despite being built and maintained by some of the leaders in the field of computer science, they still had flaws and these flaws were exploited.

These two cases are presented to illustrate the difficulty in securing software. The data that these laws require the ISPs to retain is of critical importance. So much so that many Australians feel uncomfortable with their own government possessing that data. How would these people feel if this data was obtained by a third party that was not bound by the laws of Australia? This is the very real risk of metadata collection. Once again, the ISP is being asked to take on a significant burden that is not part of their core business and that they are not equipped to handle. In this case the responsibility is higher as the risk is greater and they will be competing against a potentially highly skilled adversary.

As a final point, in the event that ISPs are able to implement a highly secure and cost effective solution to meet the requirements of this legislation, these requirements will form an enormous barrier to entry for any future ISPs wishing to enter the market. The cost overheads this legislation would place on the operation of an ISP would not only be a strong deterrent to entering the industry, but could also result in the exit of smaller ISPs from the industry. The net result would be a smaller number of ISPs operating under punishing business conditions.

In the end it will be the consumers who suffer. The true cost of this legislation includes the time and resources to create the infrastructure, but also the risk to the Australian public that their data is compromised by a third party, and the net effects of decreased competition and lower service levels. Australia already lags behind much of the developed world with our Internet infrastructure. The scrapping of the NBN ensures that this will remain the case. These laws risk pushing us further behind.

As a result of these concerns I have the following questions:

·       What estimates have been done on the cost of this metadata collection? This includes costs for the ISPs and for the government bodies accessing the data when required. What safeguards will be put in place to ensure that these costs are not passed on to the consumers? In short, can the government guarantee that these changes will not directly or indirectly increase the cost of Internet in Australia?

·       What specific measures are being taken by the government to ensure the metadata is secure and only accessible by those authorized to access it? Have the security protocols been agreed on; if so, what are they? These details should not be concealed under the guise of national security—obfuscation does not create secure systems. Secure systems are built through strong collaboration with industry and constant peer review.

·       What protections are being put in place to ensure that competition between ISPs does not suffer because of the dramatically increased cost of collecting and managing this metadata? What thought has been put into the approval and verification process that would be required of new and existing ISPs to ensure they meet the standard of the metadata collection? What body will oversee this and how will it be administered?

I expect that these questions have already been raised while discussing the legislation. If not, this legislation is nowhere near ready for passage through parliament. If they have been discussed, then the answers need to be shared with the Australian people so that they fully understand the impact on the cost of living, the risk to their personal data and the threat to competition that this legislation poses.

Sincerely


David Healy