Maybe in Java 17 Nov 2011
One of the more elegant concepts in a lot of functional languages is that of Maybe. Haskell drove home the magic of Maybe for me. The general idea is representing the possibility of a value. In Haskell, you’ll generally use it such that processing will continue if the value is needed and present, or short circuit if the value is needed and not present. The end result of a computation which includes a Maybe will also be a Maybe, but one representing the result of the computation.
In expression-oriented languages like Haskell, that works out very nicely, but I spend most of my time working in Java, which is decidedly statement-oriented. Concepts will sometimes move nicely between worlds, sometimes not. When I started working on Atlas recently I decided to see how well Maybe ported. I started with Nat Pryce’s maybe-java and took off from there.
Nat’s class encourages a model analogous to Haskell’s, executing within the context of a Maybe instance by passing functions into the Maybe instance, for example:
Maybe<String> name = Maybe.definitely("Brian");
Maybe<String> message = name.to(new Function<String, String>() {
    public String apply(String s)
    {
        return "hello, " + s;
    }
});
System.out.println(message.otherwise("No one here!"));
This treats the sequence of Maybe instances as an evaluation context or control flow, which works nicely in some languages, but sadly, as with most attempts to do higher-order functions in Java, it got awkward rather quickly. Part of it is purely syntactic, since the syntax isn’t optimized for it, but part of it is semantic as well. Idiomatic Java uses exceptions for non-happy-path control flow, and most of the libraries which provide the main reason for using Java behave this way.
Given that, I switched from using Maybe to control evaluation to using Maybe purely to represent the possibility of a value, and things fell into place very nicely – even playing within Java’s exception idioms. Take for example this snippet:
SSHCredentials creds = space.lookup(credentialName, SSHCredentials.class)
    .otherwise(space.lookup(SSHCredentials.DEFAULT, SSHCredentials.class))
    .otherwise(new IllegalStateException("unable to locate any ssh credentials"));
In this case there may exist named credentials; if not, there may exist some default credentials; and if there are neither, the world explodes. In the typical case you would see either a test for existence followed by use, or a fetch and a check for null. Both are, to my mind, less clear and certainly more error prone (largely because you need to remember to check everywhere, particularly in a this-or-that situation).
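To make the chain above concrete, here is a minimal sketch of a Maybe that supports it. The method names (`definitely`, `unknown`, `isKnown`, `otherwise`) follow the snippets in this post, but the implementation is an assumption for illustration, not the actual class from maybe-java or Atlas, and `MaybeDemo` is a hypothetical name.

```java
// Hypothetical sketch: a Maybe used purely to represent the possibility of a value.
final class Maybe<T> {
    private final T value; // null means "no value"

    private Maybe(T value) {
        this.value = value;
    }

    static <T> Maybe<T> definitely(T value) {
        if (value == null) {
            throw new IllegalArgumentException("value must not be null");
        }
        return new Maybe<T>(value);
    }

    static <T> Maybe<T> unknown() {
        return new Maybe<T>(null);
    }

    boolean isKnown() {
        return value != null;
    }

    // Fall back to a plain default value.
    T otherwise(T defaultValue) {
        return isKnown() ? value : defaultValue;
    }

    // Fall back to another Maybe, so lookups can be chained.
    Maybe<T> otherwise(Maybe<T> alternative) {
        return isKnown() ? this : alternative;
    }

    // End of the chain: return the value, or throw the supplied exception.
    T otherwise(RuntimeException failure) {
        if (isKnown()) {
            return value;
        }
        throw failure;
    }
}

class MaybeDemo {
    public static void main(String[] args) {
        Maybe<String> named = Maybe.unknown();                  // no named credentials
        Maybe<String> fallback = Maybe.definitely("default-key"); // defaults exist
        String creds = named
            .otherwise(fallback)
            .otherwise(new IllegalStateException("unable to locate any ssh credentials"));
        System.out.println(creds);
    }
}
```

The three `otherwise` overloads are told apart by argument type, which is what lets the value, Maybe, and exception forms share one name. That breaks down for a `Maybe<RuntimeException>`, so a real implementation might prefer distinct method names.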
Other bits of using Maybe extensively are not completely clear, but I am pretty confident that I will be using some evolution of this flavor of Maybe in most of my Java-based code going forward.
Using s3 URLs with Ruby's open-uri 11 Aug 2011
Ruby’s open-uri is a wonderful hack, and I recently got to figure out how to plug in additional URL schemes. Here is a quick and dirty hack to allow URLs of the form s3://<bucket>/<object>:
require 'aws/s3'
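module URI
  class S3 < Generic
    def initialize(*args)
      # args[2] is the host (the bucket); args[5] is the path, minus its leading slash
      @bucket, @file = args[2], args[5][1, args[5].length]
      super(*args)
    end

    # Swap the s3:// URL for an HTTP URL and let open-uri do the rest
    def open(&block)
      http_url = AWS::S3::S3Object.url_for @file, @bucket
      URI.parse(http_url).open(&block)
    end
  end
  @@schemes['S3'] = S3
end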
It uses the AWS::S3 library, but could be adapted pretty easily to the AWS SDK for Ruby. It does require the normal initialization but then just works :-)
open("s3://skife/whiteboard.jpg") do |f|
  # do stuff with the contents...
end
Fundamental Components in a Distributed System 26 Jul 2011
In the last several weeks I have had a surprising number of conversations about the fundamental building blocks of a large web-based system. I thought I’d write up the main bits of a good way to do it. This is far from the only way, but most reasonably large systems will wind up with most of this stuff. We’ll start at the base and work our way up.
Operational Platform
At the very base of the system you need to have networking gear, servers, the means to put operating systems onto the servers, bring them up to baseline configuration, and monitor their operational status (disk, memory, CPU, etc). There are lots of good tools here. Getting the initial bits onto disk will usually be determined by the operating system you are using, but after that Chef or Puppet should become your friend. You’ll use these to know what is out there and bring servers up to a baseline. I personally believe that Chef or Puppet should be used to handle things like accounts, DNS, and stable things common to a small number of classes of server (app server, database server, etc).
The operational platform provides the raw material on which the system will run, and the tools here are chosen to manage that raw material. This is different than application management.
Deployment
The first part of application management is a means of getting application components onto servers and controlling them. I generally prefer deploying complete, singular packages which bundle up all their volatile dependencies. Tools like Galaxy and Cast do this nicely. Think hard about how development, debugging, and general work with these things will go, as being pleasant to work with during dev, test, and downtime will trump idealism in production.
Configuration
Your configuration system is going to be intimately tied to your deployment system, so think about these things together. Aside from separating the types of configuration you want, there are a lot of tradeoffs. In general, I like immutable configuration obtained at startup or deployment time. A new set of configs means a restart. In this case, you can either have the deployment system provide it to the application, or have the application itself fetch it. Some folks really like dynamic configuration; in that case Zookeeper is going to be your friend. Most things don’t reload config well without a restart though, and I like having a local copy of the config, so… YMMV.
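As a rough illustration of the immutable, read-once style, here is a minimal sketch. The class name `StartupConfig` and the keys are made up for the example. It assumes the configuration arrives as a properties-format stream, whether handed over by the deployment system or fetched by the application itself.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical class: a configuration snapshot read once at startup and never mutated.
final class StartupConfig {
    private final Map<String, String> values;

    private StartupConfig(Map<String, String> values) {
        // Defensive copy plus unmodifiable view: changing config means a restart.
        this.values = Collections.unmodifiableMap(new HashMap<String, String>(values));
    }

    static StartupConfig load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> copy = new HashMap<String, String>();
        for (String name : props.stringPropertyNames()) {
            copy.put(name, props.getProperty(name));
        }
        return new StartupConfig(copy);
    }

    // Fail fast at startup rather than discovering a missing key later.
    String require(String key) {
        String value = values.get(key);
        if (value == null) {
            throw new IllegalStateException("missing config key: " + key);
        }
        return value;
    }

    String get(String key, String defaultValue) {
        String value = values.get(key);
        return value != null ? value : defaultValue;
    }

    public static void main(String[] args) {
        // Example keys are invented for the demo.
        StartupConfig config = load(new StringReader("http.port=8080\n"));
        System.out.println(config.require("http.port"));
    }
}
```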
Application Monitoring
Application level monitoring and operational level monitoring are very similar, and can frequently be combined in one tool, but are conceptually quite different. For one thing, operational monitoring is usually available out of the box, from good old Nagios to newer tools like ‘noit. Generally you will want to track the same kinds of things, but how you get them and what they mean will vary by the application. Monitoring is a huge topic, go google it :-)
Discovery
Assuming you have somewhere to deploy, and the ability to deploy, your services need to be able to find each other. I prefer dynamic, logical service discovery where services register their availability and connection information (host and port, base url, etc) and then everything finds each other via the discovery service. A lot of folks use Zookeeper for this nowadays, and most everyone I know who has used it loves it. One of the best architecty type guys I know would probably have its baby if he could, based on how effusive he is about it. That said, you can do lots of different things.
I have heard discussion about using the reporting capabilities of a tool like Galaxy, or the CMDB capabilities of Chef to accomplish this, but I think these are ill suited. Firstly, they operate on either concrete deployment units, or on specific low level roles, rather than on logical services. Secondly, they are quite outside the lifecycle control of the service itself.
In an ideal world the location of your discovery system is the only well known address in the whole system. Some things don’t participate well in discovery out of the box – those being fully formed components such as databases, caches, and so on. How you integrate these will vary, but two good techniques I have seen are the use of companion processes which interact with the discovery service, and static entries in the discovery service. In the case of a companion process, the companion generally does a very basic health check (is the server running?) and provides a local view of whatever is needed from the service. In the case of static entries, the entry may be placed and removed by the startup script, or via some alternate channel (doing it by hand, etc).
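To show the shape of logical discovery, here is a minimal in-memory sketch. Everything in it (the `DiscoveryRegistry` class, the method names, the service names and addresses) is hypothetical. A real deployment would put this behind a network service such as Zookeeper, with health checks and expiry that are omitted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical in-memory stand-in for a discovery service.
final class DiscoveryRegistry {
    // Logical service name -> connection info announced for it.
    private final Map<String, List<String>> endpoints =
        new ConcurrentHashMap<String, List<String>>();

    // Called by a service (or its companion process) when it becomes available.
    void announce(String service, String endpoint) {
        endpoints.computeIfAbsent(service, k -> new CopyOnWriteArrayList<String>())
                 .add(endpoint);
    }

    // Called on shutdown, or by whatever removes a static entry.
    void withdraw(String service, String endpoint) {
        List<String> known = endpoints.get(service);
        if (known != null) {
            known.remove(endpoint);
        }
    }

    // Clients ask for a logical service, never for a specific host.
    List<String> lookup(String service) {
        List<String> known = endpoints.get(service);
        return known == null ? new ArrayList<String>() : new ArrayList<String>(known);
    }

    public static void main(String[] args) {
        DiscoveryRegistry registry = new DiscoveryRegistry();
        // A service registering itself dynamically.
        registry.announce("billing", "10.0.0.5:8080");
        // A static entry for a database, as a startup script might place it.
        registry.announce("orders-db", "10.0.0.9:3306");
        System.out.println(registry.lookup("billing"));
    }
}
```

The only thing a client needs to know ahead of time is how to reach the registry itself, which matches the idea that the discovery system is the one well known address.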
Yet More of the Long Tail Treasure Trove 15 Jul 2011
Another edition of the long tail treasure trove in blog form.
SQLite JDBC
JDBC driver for SQLite which embeds the Mac, Linux, and Windows binaries for SQLite. It will load the C library on demand, and you just go about your merry way. I love SQLite, I work in Java frequently. Win!
Connector/MXJ
MySQL server in a jar. Seriously, just embed MySQL in your Java stuff. Magical for testing, etc.
ConcurrentLinkedHashMap
Exactly what the name says, concurrent linked hash map. Martin and I really wanted this back in the day. Now Ben Manes (sorry, don’t have a good link for him) wrote a really good implementation.
Greplin’s Bloom Filter Library
Nice bloom filter library for Java.
SSHJ
Young project, but very easy to use library for SSH in Java.
JLine2
Better readline for Java
ReflectASM
High perf Java reflection via bytecode gen.
snakeyaml
Not-sucky YAML in Java
QueueFile
Crazy Bob’s one-class on disk FIFO queue.
Mail
Ah, finally something non-Java! Mail is a very pleasant email library for Ruby.
CMPH
The C Minimal Perfect Hashing Library. Perfect hashes are fun. This finds them for you.
Diff Match and Patch
Diff, fuzzy matching, and patching in C++, C#, Java, JavaScript, Lua, Objective-C, and Python.
Making Really Executable Jars 20 Jun 2011
One of the more annoying things about writing command line applications in Java is that the Java model of an executable is the so-called executable jar, which is executed via an incantation like:
$ java -jar ./waffles-1.2.3.jar --some-flag=blue hello
There has long been a hack known in some circles, but not widely known, to make jars really executable, in the chmod +x sense. The hack takes advantage of the fact that jar files are zip files, and zip files allow arbitrary cruft to be prepended to the zip file itself (this is how self-extracting zip files work).
To do this for jar files, on unix-like operating systems, create a little shell script which looks like:
#!/bin/sh
exec java -jar "$0" "$@"
You can make it fancier, doing things like looking for JAVA_HOME and so on, but the above is enough to get started. Make sure to add a few newlines at the end, they are very important. If you leave them out it will not work.
Now that you have your little shell script, cat the executable jar you want onto the end of it, set the script +x, and go to town. If your script is named waffles, then you would do that like:
$ cat ./waffles-1.2.3.jar >> ./waffles
$ chmod +x ./waffles
$ ./waffles --some-flag=blue hello
and there you go! I have a little maven plugin that will do this for you automagically, but haven’t had a chance to get it into central yet. I guess I should probably stop writing and go do so…
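For anyone who wants to do the same thing from a build step, the cat and chmod steps above can be sketched in plain Java. This is not the Maven plugin mentioned above, just an illustration under assumptions: the class name `ReallyExecutable` is made up, and the stub quotes `$0` as well as `$@` so that paths with spaces survive.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: prepend a shell stub to a jar and mark the result executable.
final class ReallyExecutable {
    // Launcher stub, ending with the extra newlines the post says are important.
    static final String STUB = "#!/bin/sh\nexec java -jar \"$0\" \"$@\"\n\n\n";

    static void wrap(Path jar, Path target) throws IOException {
        try (OutputStream out = Files.newOutputStream(target)) {
            // Stub first, then the untouched jar bytes (the "cat" step).
            out.write(STUB.getBytes(StandardCharsets.US_ASCII));
            Files.copy(jar, out);
        }
        // Owner-executable bit only, roughly "chmod u+x".
        File file = target.toFile();
        if (!file.setExecutable(true)) {
            throw new IOException("could not mark " + target + " executable");
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ReallyExecutable ./waffles-1.2.3.jar ./waffles
        wrap(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```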
Updates
- David Phillips suggested putting the $@ in quotes, as it can contain spaces. I have updated the post to do so.
- Sven Schober pointed out a bug in the original form of the shell script I posted. I forgot the extremely important $0. That is what I get for writing from memory and not unit testing my blog posts! The post has been fixed.
- Jeffrey McManus found a typo, I had chomd instead of chmod. Fixed, thank you! I really need to find a way to unit test blog posts!