I am always amazed when I read an article from 2004 and find interesting goodies. I’m probably late to the game on a lot of these articles, as I didn’t really dive into programming as a career until 2005, but I just read The Simplest Thing that Could Possibly Work, a conversation with Ward Cunningham by Bill Venners. The article was published on January 19, 2004, but it is truly timeless.

The Shortest Path

Simplicity is the shortest path to a solution.

“Shortest” doesn’t necessarily refer to lines of code or number of characters, but I see it more as the path that requires the least amount of complexity. As he mentions in the article, if someone releases a 20 page proof to a math problem and then later on, someone releases a 10 page proof for the same problem, the 10 page proof is not necessarily more simple.

The 10 page proof could use some form of mathematics that is not widely used in the community and takes some time to comprehend. This means the 10 page version could be less simple as it requires learning to understand, whereas the 20 page uses generally understood concepts.

I think this is a balance that we always fight with as programmers. What is simple? I can usually say simple or not simple when I look at code, but it is hard to define the rules for simplicity.

Work Today Makes You Better Tomorrow

The effort you expend today to understand the code will make you a more powerful programmer tomorrow.

This is one of the concepts that has made the biggest difference in my programming knowledge over the past few years. The first time that I really did this was when I wrote about class and instance variables a few years back. Ever since then, when I come across something that I don’t understand but feel I should, I spend the time to understand it. I have grown immensely because of this and would recommend that you do the same if you aren’t already.

Narrow What You Think About

We had been thinking about too much at once, trying to achieve too complicated a goal, trying to code it too well.

This is something that I have been practicing a lot lately. You know how sometimes you just feel overwhelmed and don’t want to start a feature or project? What I’ve found is that when I feel this way it is because I’m trying to think about too much at once.

Over and over in the article, Ward encourages us to think about the simplest possible thing that could work. Notice he did not say the simplest thing that would work, but rather the simplest thing that could work.

This is something that I’ve noticed recently while pairing with Brandon Keepers. Both of us almost apologize for some of the code we first implement, as we are afraid the other will think that is all we are capable of. What is funny is that we both realize that you have to start with something, and thus we never judge. It is far easier to work incrementally towards a brilliant solution than to think it up in your head and instantly code it.

Start with a test. Make the test pass. Rinse and repeat. Small, tested changes that solve only the immediate problem at hand always end up with a simpler solution than trying to do it all in one fell swoop. I’ve also found I’m more productive this way as I have fewer moments of wondering what to do next. The failing test tells me.
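As a tiny illustration of that loop, here is a sketch in Python. The slugify function and its test are hypothetical, not taken from the article; the point is only that the test came first and the implementation does no more than that test demands.

```python
def slugify(title):
    """Simplest thing that could possibly work: lowercase words joined by hyphens."""
    return "-".join(title.lower().split())


def test_spaces_become_hyphens():
    # Written first; it failed until slugify() existed.
    assert slugify("The Shortest Path") == "the-shortest-path"


test_spaces_become_hyphens()
```

The next requirement (say, stripping punctuation) would start with another failing test rather than with more code.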

Anyway, I thought the article was interesting enough that I would post some of the highlights here and encourage you all to read it. If you know of some oldie, but goodie articles, link them up in the comments below.



So you decide you want a way for a number of people to post short articles to a Web site, and maybe allow for other people to leave comments. What do you do? That’s easy: jump to the white board and start sketching out a vast array of boxes, blobs, and arrows that define a sophisticated content management system with multi-level admin roles, content versioning, threaded discussion boards, syntax highlighting, the works.


Don’t be too quick to judge. It’s easy to fall into the trap of defining the wrong requirements for a project.

Part of the reason is that building software is (or should be) fun, and often bigger projects are more fun. There is also the tendency to think about “what if”, imagining all the things that maybe one day who knows you never can tell might be needed.

People also tend to think in terms of what’s familiar, of how things have been done in the past or by others.

There are many ways to satisfy the needs described in the first paragraph. Some don’t require writing any software at all.

For the Ruby Best Practices blog, the general goals were modest. Allow a core set of people to easily add new content. Allow other people to contribute content as well, but only with the OK from someone in that core group. (BTW, there are eight of us in the RBP blog team. See here for my take on the RBP team logo)

We wanted to allow the use of Textile, and not make people use a browser for editing. Basically, turn simple flat files into a Web site with minimal fuss.

Korma takes an interesting approach to building Web sites. The whole app is ~230 lines of Ruby. Its key function is to take content from a Git repository, run it through some templating, and write out the site files.

Relying on Git for that back end is stunningly perfect. Git provides versioning, access control, and distributed contributions.

It becomes the database layer common to most blogging tools. For free. Right off the bat, no need to write any admin tools or content versioning code.

At the heart of Korma is the grit gem. As the project blurb says, “Grit gives you object oriented read/write access to Git repositories via Ruby.” Very sweet.

The korma.rb file takes two arguments. The first is required, and is the path to a git repository. The second argument is optional; it tells Korma what directory to use for the generated content, and defaults to ‘www’, relative to where you invoke the program.

The app uses grit to reach into the current contents of the repo and parse the needed files. Files have to be committed to be accessible to Korma.

There is a configuration file that describes some basic site metadata, such as the base URL, the site title, and the author list. When called, korma.rb grabs this config file, sets some Korma::Blog properties, and then writes out the static files.
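To give a feel for the "config plus content in, static files out" idea, here is a rough sketch in Python. It is not Korma's code (Korma is Ruby and reads its content from Git through grit); the SITE_CONFIG keys, the page template, and the render_site name are all assumptions made up for this illustration.

```python
import tempfile
from pathlib import Path
from string import Template

# Hypothetical site metadata, standing in for Korma's configuration file.
SITE_CONFIG = {"title": "Example Blog", "base_url": "http://example.com"}

PAGE = Template("<html><head><title>$site: $title</title></head>"
                "<body><h1>$title</h1><p>$body</p></body></html>")


def render_site(posts, out_dir, config=SITE_CONFIG):
    """Write one static HTML file per post and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for slug, (title, body) in sorted(posts.items()):
        path = out / f"{slug}.html"
        path.write_text(PAGE.substitute(site=config["title"], title=title, body=body))
        written.append(path)
    return written


with tempfile.TemporaryDirectory() as www:
    pages = render_site({"hello": ("Hello", "First post.")}, www)
    print(pages[0].name)  # hello.html
```

Because the output is just files on disk, any plain web server can serve the result.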

An early version of Korma used Sinatra; basically, Korma was a lightweight Web app to serve the blog posts. But as simple as it was, it was overkill, since there was no real dynamic content. It made no sense to have the Web app regenerate the HTML on each request, since it changed so infrequently.

A next version replaced the Web app part with static files, making it a straightforward command-line program. This solved another problem: how to automate the regeneration of files. The answer: use Git’s post-commit hook to invoke the program.

For example:

   # File .git/hooks/post-commit   
   /usr/local/bin/ruby /home/james/data/vendor/korma/korma.rb /home/james/data/vendor/rbp-blog /home/james/data/vendor/korma/www

Early versions also used Haml for site-wide templates. Not being a fan of Haml, I added in a configurable option to use Erb. It was nice and all, but it was a feature without a requirement. No one was asking for configurable templating, so that option was dropped and Erb replaced Haml as the default.

If you are wondering why working code was removed, consider that any code you have is something you have to maintain. As bug-free and robust as you may like to think it, no code is easier to maintain than no code. Configurable templating was simply not a problem we needed to solve, and a smaller, simpler code base is more valuable than a maybe “nice to have.”

There was some discussion about the need or value of allowing comments. In the end they were deemed good, but there was no good argument for hosting them as part of the blog site itself. That meant a 3rd-party solution (in this case, Disqus) was perfectly acceptable. Again, a goal was to have comments, not to write a commenting system (as entertaining as that may be).

Using Git allows for yet another feature for free: easy one-off contributions. In many systems, allowing anyone to contribute means creating an account, granting some sort of access, managing all the new user annoyances. Or, an existing user has to play proxy, accepting content and entering it into the system on behalf of the real author. That’s work! With git, one can clone the repo, add new content, and issue a pull request back to the master branch. Anyone with commit rights to the master can then merge it in (or politely decline). No-code features FTW.

None of the blog design requirements are written in stone, and they may change tomorrow, but by identifying the real needs, addressing what was deemed essential, offloading what we could, and skipping the feature bling, we have a system that is easy to understand, easy to maintain, and easy to change.

Fixing The Web With Stylish

May 20, 2009

Design becomes beautiful when there’s nothing left to take away. Google understands this; too many other web applications don’t.

Many apps grow new features simply to differentiate themselves from the competition, or to increase their target market by supporting multiple conflicting workflows. But every single thing I don’t need detracts from the usability of the things I do. In the past I’ve had to abandon apps that do more than I need, but not any more.

Now I can fix them.

Stylish is a Firefox plugin that lets you write your own CSS per site, overriding the existing CSS. This means you can set display:none on everything unnecessary.

It’s a one-click install. You’ll also need Firebug to assist in identifying elements you want to block.

Once it’s installed, open Firebug and point to the offending feature. Figure out a way to identify it with a CSS selector. In some cases this’ll be easy – it might have a classname like AnnoyingFeature. In others you’ll have to use CSS3 selectors. For example, you can do:

label[id*="requested_by_id"] {display:none}

This matches every label element with an ‘id’ attribute containing ‘requested_by_id’. This is helpful if the site’s using ids like ‘story_34524_requested_by_id’.

But what about if you had this?

  <span class='annoying'>annoyance 1</span>
  <span>annoyance 2</span>

It’d be nice to block the parent div, thereby taking out ‘annoyance 2’, but you can’t; there’s no “parent selector” even in CSS3. Greasemonkey could fix this; Stylish can’t.

While you’re there, you can also fix other usability issues like textboxes being too small – it’s as simple as ‘textarea {height:300px}’.

Stylish has a live preview, and it’s only one click to turn off custom styles, so you can afford to be both aggressive and experimental.

You can share custom styles. I’d like to suggest that styles solely designed to remove distractions, rather than ‘re-skin’ sites, include ‘Simplify’ in the name to make them easy to find. Over the next few days I’ll be redesigning some of the web tools I use frequently to make them more of a joy to work with.

Help kill Internet Explorer 6

April 19, 2009


Today Done21 is proud to announce IE6 Update, a great new tool to encourage your Internet Explorer 6 website visitors to update their browser. Check it out, and if you end up using IE6 Update, tell us about it on Twitter using the hashtag #ie6update!

Now, some back story…

Internet Explorer 6 is the plague of the Internet. Sure there are other battles, like net neutrality or censorship, but I think it’s safe to say that IE6 has held back innovation more than anything else, because of its slow JavaScript engine, incorrect rendering of web pages, bad security, and more. At the time of this writing, Net Applications says 18% of Internet users still use Internet Explorer 6. Even worse, many sites have a much higher percentage of Internet Explorer 6 visitors.

There have been numerous efforts to put an end to Internet Explorer 6 and we’d like to thank all of them, particularly these:

While we applaud any campaign to put an end to IE6, most efforts we’ve seen are very developer centric. And although there have been many other great sites that politely ask users to update, I think developers have made the bold assumption that your average IE6 user is capable of distinguishing between their web browser and the Internet. It’s kind of a hard thing for us tech types to understand, but just take a look at Microsoft’s marketing for Internet Explorer 8:

[Image: IE8’s homepage]

[Image: “Upgrade your Internet”]

[Image: IE8 makes your web better… what?]

[Image: IE8 banner advertisement, “Download a new Internet!”]

See what I mean? Microsoft markets its browser as the Internet itself. It’s no wonder it’s been so hard for developers to explain the situation to their site visitors. We, the web community, have politely asked web surfers to upgrade from IE6. We’ve tried. Everyone has tried. So now it’s time to trick them into upgrading. That’s right, trick IE6 users into moving on. Tricking users is a big taboo, but if we’re talking about tricking users into upgrading from IE6 to a newer browser, we feel that it’s like tricking them into receiving free money.

How can we do this? Well, you know that yellow bar that sometimes appears in Internet Explorer at the top of the browser window? It’s called the Information Bar. It usually displays information about security updates, missing plug-ins, and things of that nature. We decided that the best way to get behind enemy lines would be to fake the Information Bar and offer the user a browser update. Check it out:

IE6 Update

To use IE6 Update, all you have to do is go to this website, and then copy and paste the IE6 Update code into your site. It’s that easy.


You know that feeling? The one where you are in the zone. Every keystroke is intentional and the sound of them in succession sounds almost like music. Your chair feels comfy and your desk is clean, or at least mostly clean. The code you are producing could be framed and put on a wall and even your mom would admire its beauty. You’ve had that feeling right?

Contrast what I just presented, with that other feeling. You know it. I know you know it. You have a slight buzz in the front of your brain. Typos begin to abound and you are starting to think your fingers went drinking without you. You shift to and fro but your chair continues to feel like a block of wood. Your code is so convoluted that you sometimes don’t understand what you just wrote 30 minutes ago. In fact, the thing that should “just work” isn’t working and hasn’t been for the past 45 minutes.

What To Do?

So what are you to do? Push through you say! Mush, mush! I know in an hour I’ll once again feel fresh and write code that sounds like a harp from the Heavens. I’ve already spent 45 minutes, what’s another 15? Besides, I’m so close. FAIL! I have news for you Walter Cronkite, you won’t. Your code will begin to feel more and more like the spaghetti [insert previous language here] you used to write. You’ll get more uncomfortable, and most likely the buzz in your head will last for hours after you stop, quite possibly ruining the rest of your night.

Great Programming Requires Inspiration

I believe the ability to give up and to know when to give up is what sets great programmers apart from good, average and poor ones. Programming is art. Art is created through inspiration. Anyone can push through and churn out crappy code. The question is, are you self-aware to the point that you know when it’s time to stop?

Do Something Else

Next time you feel the buzz creeping into the front of your skull and your undies start riding up, stop! As the wait staff at Logan’s Roadhouse says, “Stop what you’re doin and swallow what you’re chewin.” Pet your cat. Read a few chapters from a book. Complete the next mission on Grand Theft Auto. Don’t push through. Do something other than programming that inspires you and I guarantee the inspiration for what you were working on will return.

What I Do

So what do I do when I hit the wall? I’ve never been much of a reader but lately I’ve turned to books. I’m not a fan of fiction, but I have discovered an ardent interest in business, especially now that I’m a business owner. Books like Good to Great, The Fred Factor, 4 Hour Work Week and Blink are inspiring me to think differently and write better code.

Got Git? HOWTO git and github

February 16, 2009


Getting started with Github:

Github is the new cool git repository website (social network for geeks). I asked a friend for an invite and got started. Creating a github account and repository is really easy. The instructions are well laid out. The last step is to export your svn repo into the git directory and then commit and push. I’ve been told that I’ll be given invites to send out, so if anyone needs a github invite, let me know.

Now what?

So now I had my git repo set up on github and I started sending out the repo’s public address. And before I knew it I had my first patch. Now what? I read Dr. Nic’s blog post about git when it came out, so I went back and read it again. After reading it a few times I realized two things: (1) git is very complicated and (2) Dr. Nic is a way bigger Star Wars geek than I ever gave him credit for. I called my friend Josh Owens (of the Web 2.0 Show) and asked for some help. Josh talked me through my first merge and push, but I still didn’t quite get it. The next day I was trying to merge another patch (branch) and since Josh wasn’t around I went into #github where Tom Preston-Werner (mojombo) helped sort me out quite a bit.


I am still new at git so I may have things wrong, so if you find an error or omission, please say so in the comments.

Git explained (I think):

Understatement: Git is very different from Subversion. Git is a distributed source control system, which means you can work disconnected from the main repo (branch) and still commit. But you commit to your local repo (branch). The basic flow is (some crucial steps have been left out for now. I’ll fill those in later. Don’t use the following as a step by step guide, that comes later):

  1. git clone. This basically tells your local git to go out and pull the files from the server.
  2. Make your changes.
  3. git commit. This commits your files to your LOCAL repository, not the one on the server.
  4. git push. This “pushes” your committed changes up to the server.
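To make the local-versus-server distinction concrete, here is a toy model in Python. It has nothing to do with how git actually stores anything; the Repo class and its methods are invented purely for illustration.

```python
class Repo:
    """Toy repository: just an ordered list of commit messages."""

    def __init__(self, commits=None):
        self.commits = list(commits or [])

    def clone(self):
        # "git clone": start a local repo with a copy of the server's history.
        local = Repo(self.commits)
        local.origin = self
        return local

    def commit(self, message):
        # "git commit": recorded locally only; the server is untouched.
        self.commits.append(message)

    def push(self):
        # "git push": send the commits the server does not have yet.
        missing = self.commits[len(self.origin.commits):]
        self.origin.commits.extend(missing)


server = Repo(["initial import"])
local = server.clone()
local.commit("fix typo")
print(len(server.commits))  # 1: the new commit exists only locally
local.push()
print(len(server.commits))  # 2: now the server has it too
```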

OK, so far so good. This feels a lot like svn. So much that it’s time to get overconfident and think we get git. But we don’t. And our confidence is about to be shaken like a dry martini.

But I have my own repo and people are trying to commit to me. What do I do, how does that work?

You are the master:

In git everything revolves around branches (github calls them forks). When you create a git repo, that main branch is called “master.” Your master branch is kind of like what trunk is in svn. When someone wants to fork/branch your master, they go to your page in github and click the fork button. Now they have a fork/branch of your master branch. When they are ready for you to check out their changes and merge theirs back into your master they’ll send you a message via github. You’ll get this message in your inbox and have no idea what to do with it. Cool.

The first thing to notice is that the url for their branch looks a lot like the url to your master. My “public clone url” is git:// Dr. Nic’s looks like this: git:// The similarities matter.

To get Dr. Nic’s branch and merge it into my master I do the following (let’s assume I have my master cloned to /luvd):

NOTE: Put the contents of this pastie in the top of your ~/.bash_profile file to see which branch you are currently working in. This is demonstrated in the examples below. The part in parens is the branch. (Another version of the same thing.) (After changing your ~/.bash_profile file you’ll either have to restart terminal or source the file to see the changes.)

  1. /luvd(master) $ git remote add drnic git://
  2. /luvd(master) $ git pull
  3. /luvd(master) $ git checkout -b drnic/master
  4. /luvd(drnic/master) $ git pull drnic master
  5. [look at the files, rake, test, etc]
  6. /luvd(drnic/master) $ git status
  7. /luvd(drnic/master) $ git checkout master
  8. /luvd(master) $ git merge drnic/master
  9. /luvd(master) $ git status
  10. /luvd(master) $ git commit -a
  11. /luvd(master) $ git push
  • Step 1 tells my local git repo to add a new remote repository. This means that the same repo can pull and push to two different servers. There is nothing analogous in svn. The syntax is “git remote add {name} {url}.” I am naming this remote “drnic,” because I am going to pull his branch.
  • Step 2 simply pulls my local master. This is like doing an svn up. I do this because I gave commit rights to someone else and I want to make sure I have the latest changes in my local master. Step 4 also does a pull, but there we have to specify where we are pulling from. Here it defaults to the “origin,” which is my git master on github.
  • Step 3 is a bit confusing because I am not doing anything like an svn checkout. A git checkout basically changes the branch I am working with. The -b switch means to create the new branch. Notice that in steps 1-3 I was in branch master (master) but after I “git checkout -b drnic/master” I am in branch drnic/master (drnic/master). Get ready: Rather than have each branch in a different directory on the file system, all the branches live in the same directory. What? Exactly. Try this: Open Textmate and open a file that you know has changed between the two branches. Now go into terminal and “git checkout {other branch}.” Go back to Textmate and notice that the file did actually change. What? Exactly. Isn’t this cool? But it means you must use the .bash_profile hack to know what branch you’re working in. Otherwise you will go crazy. You can also do “git branch” which will tell you what branches are available locally and which one you are in.
  • Step 4 just says pull all of the files from the remote repo named “drnic” (which we created in step 1) and the branch, in this case “master.” Syntax: “git pull {name of remote} {name of branch}.” This is kind of like an svn checkout.
  • Step 6 shows which files have modifications. If you didn’t change anything then you can go on to step 7, otherwise you might have to git commit to your local branch of drnic’s branch of your master. (Don’t feel bad if you have to reread that a few times :)
  • Step 7 puts us back in the context of our master branch.
  • Step 8 merges the local drnic branch into our master branch.
  • Step 10 commits all modified files, including unadded ones, to the local master repo. Unadded? Yes, in git a file with local modifications needs to be added back into the repo for committing. You can do this with “git add {file}” or just use “git commit -a” to commit all the modified files, added and unadded alike. (Brand-new files that git has never tracked still need a “git add” first.)
  • Step 11 pushes your local changes back to your origin (master branch on github). Now others may fork/branch your master and start again. Easy right?

You just want to patch someone else’s repo:

This is really simple. Here are the steps:

  1. Go to github and click the “fork” button.
  2. git clone git://
  3. cd lovd-by-less
  4. Make your changes
  5. git status
  6. git commit -a
  7. git push
  8. Go back to github and click the “pull request” button.

I’m guessing that by now this is really clear.

  1. Step 1 will create your own fork of the repo where you can make your changes. Click this on the page of the person you want to fork from.
  2. Steps 2-7 you should understand by now. (I hope. :)
  3. Step 8 will send a message to the person notifying them that you have something for them to see. Click this from your repo page.


Recently, a lot of new non-relational databases have cropped up both inside and outside the cloud. One key message this sends is, “if you want vast, on-demand scalability, you need a non-relational database”.

If that is true, then is this a sign that the once mighty relational database finally has a chink in its armor? Is this a sign that relational databases have had their day and will decline over time? In this post, we’ll look at the current trend of moving away from relational databases in certain situations and what this means for the future of the relational database.

Relational databases have been around for over 30 years. During this time, several so-called revolutions flared up briefly, all of which were supposed to spell the end of the relational database. All of those revolutions fizzled out, of course, and none even made a dent in the dominance of relational databases.

First, Some Background

A relational database is essentially a group of tables (entities). Tables are made up of columns and rows (tuples). Those tables have constraints, and relationships are defined between them. Relational databases are queried using SQL, and result sets are produced from queries that access data from one or more tables. Multiple tables being accessed in a single query are “joined” together, typically by a criterion defined in the table relationship columns. Normalization is a data-structuring model used with relational databases that ensures data consistency and removes data duplication.
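As a small, concrete illustration of tables, relationships, and a join, here is a sketch using Python's built-in sqlite3 module. The customers and orders tables and their contents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 5.0), (12, 2, 40.0);
""")

# The two tables are joined on the relationship column, customer_id.
rows = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Ada', 30.0), ('Grace', 40.0)]
```

Each customer's name is stored once and reached through the join, which is the duplication-removing effect of normalization described above.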

Relational databases are facilitated through Relational Database Management Systems (RDBMS). Almost all database systems we use today are RDBMS, including those of Oracle, SQL Server, MySQL, Sybase, DB2, Teradata, and so on.

The reasons for the dominance of relational databases are not trivial. They have continually offered the best mix of simplicity, robustness, flexibility, performance, scalability, and compatibility in managing generic data.

However, to offer all of this, relational databases have to be incredibly complex internally. For example, a relatively simple SELECT statement could have hundreds of potential query execution paths, which the optimizer would evaluate at run time. All of this is hidden to us as users, but under the cover, RDBMS determines the “execution plan” that best answers our requests by using things like cost-based algorithms.
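Most systems will show you the plan they picked if you ask. As one hedged illustration, SQLite (used here only because it ships with Python; the big commercial systems have their own equivalents) exposes it through EXPLAIN QUERY PLAN. The table, the index name, and the plan helper below are made up for the example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")


def plan(sql):
    """Return the optimizer's chosen plan as a list of human-readable steps."""
    # The last column of each EXPLAIN QUERY PLAN row is the description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Filtering on the indexed column lets the optimizer search the index...
print(plan("SELECT total FROM orders WHERE customer_id = 7"))
# ...while filtering on an unindexed column leaves it scanning the table.
print(plan("SELECT total FROM orders WHERE total > 100"))
```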

The Problem with Relational Databases

Even though RDBMS have provided database users with the best mix of simplicity, robustness, flexibility, performance, scalability, and compatibility, their performance in each of these areas is not necessarily better than that of an alternate solution pursuing one of these benefits in isolation. This has not been much of a problem so far because the universal dominance of RDBMS has outweighed the need to push any of these boundaries. Nonetheless, if you really had a need that couldn’t be answered by a generic relational database, alternatives have always been around to fill those niches.

Today, we are in a slightly different situation. For an increasing number of applications, one of these benefits is becoming more and more critical; and while still considered a niche, it is rapidly becoming mainstream, so much so that for an increasing number of database users this requirement is beginning to eclipse others in importance. That benefit is scalability. As more and more applications are launched in environments that have massive workloads, such as web services, their scalability requirements can, first of all, change very quickly and, secondly, grow very large. The first scenario can be difficult to manage if you have a relational database sitting on a single in-house server. For example, if your load triples overnight, how quickly can you upgrade your hardware? The second scenario can be too difficult to manage with a relational database in general.

Relational databases scale well, but usually only when that scaling happens on a single server node. When the capacity of that single node is reached, you need to scale out and distribute that load across multiple server nodes. This is when the complexity of relational databases starts to rub against their potential to scale. Try scaling to hundreds or thousands of nodes, rather than a few, and the complexities become overwhelming, and the characteristics that make RDBMS so appealing drastically reduce their viability as platforms for large distributed systems.

For cloud services to be viable, vendors have had to address this limitation, because a cloud platform without a scalable data store is not much of a platform at all. So, to provide customers with a scalable place to store application data, vendors had only one real option. They had to implement a new type of database system that focuses on scalability, at the expense of the other benefits that come with relational databases.

These efforts, combined with those of existing niche vendors, have led to the rise of a new breed of database management system.

The New Breed

This new kind of database management system is commonly called a key/value store. In fact, no official name yet exists, so you may see it referred to as document-oriented, Internet-facing, attribute-oriented, distributed database (although this can be relational also), sharded sorted arrays, distributed hash table, and key/value database. While each of these names points to specific traits of this new approach, they are all variations on one theme, which we’ll call key/value databases.

Whatever you call it, this “new” type of database has been around for a long time and has been used for specialized applications for which the generic relational database was ill-suited. But without the scale that web and cloud applications have brought, it would have remained a mostly unused subset. Now, the challenge is to recognize whether it or a relational database would be better suited to a particular application.

Relational databases and key/value databases are fundamentally different and designed to meet different needs. A side-by-side comparison only takes you so far in understanding these differences, but to begin, let’s lay one down:

No Entity Joins

Key/value databases are item-oriented, meaning all relevant data relating to an item are stored within that item. A domain (which you can think of as a table) can contain vastly different items. For example, a domain may contain customer items and order items. This means that data are commonly duplicated between items in a domain. This is accepted practice because disk space is relatively cheap. But this model allows a single item to contain all relevant data, which improves scalability by eliminating the need to join data from multiple tables. With a relational database, such data needs to be joined to be able to regroup relevant attributes.
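A plain Python dictionary is enough to sketch the item-oriented idea. The key names and attributes below are invented for illustration and do not follow any particular product's API.

```python
# A "domain" modelled as a plain dict: key -> item (itself a dict of attributes).
# Customer items and order items live side by side in the same domain.
domain = {
    "customer:1": {"type": "customer", "name": "Ada", "city": "London"},
    "order:10": {
        "type": "order",
        "total": 25.0,
        # Customer details are duplicated into the order, so no join is needed.
        "customer_name": "Ada",
        "customer_city": "London",
    },
}


def get_item(key):
    """A single lookup by key returns everything stored about the item."""
    return domain[key]


order = get_item("order:10")
print(order["customer_name"], order["total"])  # Ada 25.0
```

The price of skipping the join is visible in the data: the customer's name and city now exist in two places and have to be kept in step by the application.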

But while the need for relationships is greatly reduced with key/value databases, certain ones are inevitable. These relationships usually exist among core entities. For example, an ordering system would have items that contain data about customers, products, and orders. Whether these reside on the same domain or separate domains is irrelevant; but when a customer places an order, you would likely not want to store both the customer and product’s attributes in the same order item.

Instead, orders would need to contain relevant keys that point to the customer and product. While this is perfectly doable in a key/value database, these relationships are not defined in the data model itself, and so the database management system cannot enforce the integrity of the relationships. This means you can delete customers and products even while existing orders still point to them. The responsibility of ensuring data integrity falls entirely to the application.
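Here is a minimal sketch of what that responsibility looks like, again with a plain dictionary standing in for the store. All of the names are hypothetical.

```python
store = {
    "customer:1": {"name": "Ada"},
    "order:10": {"customer_key": "customer:1", "total": 25.0},
}


def orders_for(customer_key):
    """Find the orders that point at a given customer."""
    return [k for k, item in store.items() if item.get("customer_key") == customer_key]


def delete_customer_checked(key):
    # The application has to enforce the relationship itself.
    if orders_for(key):
        raise ValueError(f"{key} still has orders: {orders_for(key)}")
    del store[key]


def delete_customer_unchecked(key):
    # The store will happily delete the item, leaving the order dangling.
    del store[key]


try:
    delete_customer_checked("customer:1")
except ValueError as err:
    print("refused:", err)

delete_customer_unchecked("customer:1")
print(store["order:10"]["customer_key"] in store)  # False
```

Nothing stops a code path from calling the unchecked version, which is exactly the risk described above.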

Key/Value Stores: The Good

There are two clear advantages of key/value databases over relational databases.

Suitability for Clouds

The first benefit is that they are simple and thus scale much better than today’s relational databases. If you are putting together a system in-house and intend to throw dozens or hundreds of servers behind your data store to cope with what you expect will be a massive demand in scale, then consider a key/value store.

Because key/value databases easily and dynamically scale, they are also the database of choice for vendors who provide a multi-user, web services platform data store. The database provides a relatively cheap data store platform with massive potential to scale. Users typically only pay for what they use, but their usage can increase as their needs increase. Meanwhile, the vendor can scale the platform dynamically based on the total user load, with little limitation on the entire platform’s size.

More Natural Fit with Code

Relational data models and application code object models are typically built differently, which leads to incompatibilities. Developers overcome these incompatibilities with code that maps relational models to their object models, a process commonly referred to as object-to-relational mapping. This process, which essentially amounts to “plumbing” code and has no clear and immediate value, can take up a significant chunk of the time and effort that goes into developing the application. On the other hand, many key/value databases retain data in a structure that maps more directly to object classes used in the underlying application code, which can significantly reduce development time.
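A small sketch shows why the fit can feel more natural. It assumes a store that accepts nested attribute structures, modeled here as a plain dictionary; the Customer class and the save/load helpers are hypothetical.

```python
from dataclasses import dataclass, asdict, field


@dataclass
class Customer:
    customer_id: str
    name: str
    emails: list = field(default_factory=list)


store = {}  # in-memory stand-in for a key/value database


def save(customer):
    """Store the object's fields directly as the item's attributes."""
    store[f"customer:{customer.customer_id}"] = asdict(customer)


def load(customer_id):
    """Rebuild the object straight from the stored attributes."""
    return Customer(**store[f"customer:{customer_id}"])


original = Customer("1001", "Ada Smith", ["ada@example.com"])
save(original)
restored = load("1001")
print(restored == original)
```

There is no separate table for the email list and no mapping layer in between. The object's shape is the item's shape, which is the saving the paragraph above describes.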

Other arguments in favor of this type of data storage, such as “Relational databases can become unwieldy” (whatever that means), are less convincing. But before jumping on the key/value database bandwagon, consider the downsides.

Key/Value Stores: The Bad

The inherent constraints of a relational database ensure that data at the lowest level have integrity. Data that violate integrity constraints cannot physically be entered into the database. These constraints don’t exist in a key/value database, so the responsibility for ensuring data integrity falls entirely to the application. But application code often carries bugs. Bugs in a properly designed relational database usually don’t lead to data integrity issues; bugs in a key/value database, however, quite easily lead to data integrity issues.

One of the other key benefits of a relational database is that it forces you to go through a data modeling process. If done well, this modeling process creates in the database a logical structure that reflects the data it is to contain, rather than reflecting the structure of the application. Data, then, become somewhat application-independent, which means other applications can use the same data set and application logic can be changed without disrupting the underlying data model. To facilitate this process with a key/value database, try replacing the relational data modeling exercise with a class modeling exercise, which creates generic classes based on the natural structure of the data.

And don’t forget about compatibility. Unlike relational databases, cloud-oriented databases have little in the way of shared standards. While they all share similar concepts, they each have their own API, specific query interfaces, and peculiarities. So, you will need to really trust your vendor, because you won’t simply be able to switch down the line if you’re not happy with the service. And because almost all current key/value databases are still in beta, that trust is far riskier than with old-school relational databases.

Limitations on Analytics

In the cloud, key/value databases are usually multi-tenanted, which means that a lot of users and applications will use the same system. To prevent any one process from overloading the shared environment, most cloud data stores strictly limit the total impact that any single query can cause. For example, with SimpleDB, you can’t run a query that takes longer than 5 seconds. With Google’s AppEngine Datastore, you can’t retrieve more than 1000 items for any given query.

These limitations aren’t a problem for your bread-and-butter application logic (adding, updating, deleting, and retrieving small numbers of items). But what happens when your application becomes successful? You have attracted many users and gained lots of data, and now you want to create new value for your users or perhaps use the data to generate new revenue. You may find yourself severely limited in running even straightforward analysis-style queries. Things like tracking usage patterns and providing recommendations based on user histories may be difficult at best, and impossible at worst, with this type of database platform.

In this case, you will likely have to implement a separate analytical database, populated from your key/value database, on which such analytics can be executed. Think in advance of where and how you would be able to do that. Would you host it in the cloud or invest in on-site infrastructure? Would latency between you and the cloud-service provider pose a problem? Does your current cloud-based key/value database support this? If you have 100 million items in your key/value database, but can only pull out 1000 items at a time, how long would queries take?
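The last question is worth a back-of-the-envelope calculation. The item count and page size below come from the example above; the 0.2 seconds per query is purely an assumed figure, so substitute whatever you measure against your own provider.

```python
import math

TOTAL_ITEMS = 100_000_000   # from the example above
ITEMS_PER_QUERY = 1_000     # per-query retrieval cap
SECONDS_PER_QUERY = 0.2     # assumed round trip; measure your own

# Number of sequential queries needed to page through every item.
queries_needed = math.ceil(TOTAL_ITEMS / ITEMS_PER_QUERY)
total_seconds = queries_needed * SECONDS_PER_QUERY
total_hours = total_seconds / 3600

print(f"{queries_needed:,} queries, about {total_hours:.1f} hours")
```

Under that assumption, a full export takes 100,000 sequential queries and roughly five and a half hours, before any analysis has even started.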

Ultimately, while scale is a consideration, don’t put it ahead of your ability to turn data into an asset of its own. All the scaling in the world is useless if your users have moved on to your competitor because it has cooler, more personalized features.

Cloud-Service Contenders

A number of web service vendors now offer multi-tenanted key/value databases on a pay-as-you-go basis. Most of them meet the criteria discussed to this point, but each has unique features and varies from the general standards described thus far. Let’s take a look now at particular databases, namely SimpleDB, Google AppEngine Datastore, and SQL Data Services.

Amazon: SimpleDB

SimpleDB is an attribute-oriented key/value database available on the Amazon Web Services platform. SimpleDB is still in public beta; in the meantime, users can sign up online for a “free” version — free, that is, until you exceed your usage limits.

SimpleDB has several limitations. First, a query can only execute for a maximum of 5 seconds. Secondly, there are no data types apart from strings. Everything is stored, retrieved, and compared as a string, so date comparisons won’t work unless you convert all dates to ISO 8601 format. Thirdly, the maximum size of any string is limited to 1024 bytes, which limits how much text (e.g. product descriptions) you can store in a single attribute. But because the schema is dynamic and flexible, you can get around the limit by adding “ProductDescription1,” “ProductDescription2,” etc. The catch is that an item is limited to 256 attributes. While SimpleDB is in beta, domains can’t be larger than 10 GB, and entire databases cannot exceed 1 TB.
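Here is a sketch of how an application might work around the strings-only and 1024-byte limits. It does not call any SimpleDB API; the helper names are hypothetical, and zero-padding integers is an extra assumption that follows from everything being compared as a string.

```python
from datetime import datetime, timezone

MAX_ATTRIBUTE_BYTES = 1024  # per-attribute limit described above


def encode_date(dt):
    """ISO 8601 in UTC, so string order matches chronological order."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def encode_int(n, width=10):
    """Zero-pad non-negative integers so "9" does not sort after "10"."""
    return str(n).zfill(width)


def split_text(name, text, limit=MAX_ATTRIBUTE_BYTES):
    """Spread long text over name1, name2, ... attributes of at most `limit` bytes."""
    data = text.encode("utf-8")
    chunks = {}
    start = 0
    while start < len(data):
        end = min(start + limit, len(data))
        # Back up so a multi-byte UTF-8 character is never cut in half.
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        chunks[f"{name}{len(chunks) + 1}"] = data[start:end].decode("utf-8")
        start = end
    return chunks


earlier = encode_date(datetime(2009, 2, 9, 8, 30, tzinfo=timezone.utc))
later = encode_date(datetime(2009, 11, 1, 17, 0, tzinfo=timezone.utc))
print(earlier < later)                 # string comparison agrees with time order
print(encode_int(9) < encode_int(10))  # padded numbers compare correctly

parts = split_text("ProductDescription", "x" * 2500)
print(sorted(parts))
```

Each chunk uses up one of the item's 256 attributes, so this trick only stretches so far.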

One key feature of SimpleDB is that it uses an eventual consistency model. This consistency model is good for concurrency, but means that after you have changed an attribute for an item, those changes may not be reflected in read operations that immediately follow. While the chances of this actually happening are low, you should account for such situations. For example, you don’t want to sell the last concert ticket in your event booking system to five people because your data wasn’t consistent at the time of sale.
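The toy model below imitates the effect, a read that lags behind a write. It is only an illustration of the symptom and says nothing about how SimpleDB is actually implemented; the class and its methods are invented for this sketch.

```python
class EventuallyConsistentStore:
    """Toy model: writes land on one replica and reach the other only on sync()."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def put(self, key, value):
        self.primary[key] = value          # the replica is not updated yet

    def get(self, key):
        return self.replica.get(key)       # reads may hit the lagging replica

    def sync(self):
        self.replica = dict(self.primary)  # replication catches up


store = EventuallyConsistentStore()
store.put("tickets_left", "1")
store.sync()

store.put("tickets_left", "0")             # the last ticket has just been sold
stale = store.get("tickets_left")          # an immediate read still sees the old value
store.sync()
fresh = store.get("tickets_left")          # after replication the read is correct
print(stale, fresh)
```

Any buyer whose request is served between the write and the sync still sees a ticket available, which is exactly the overselling scenario described above.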

Google: AppEngine Datastore

Google’s AppEngine Datastore is built on BigTable, Google’s internal storage system for handling structured data. In and of itself, the AppEngine Datastore is not a direct access mechanism to BigTable, but can be thought of as a simplified interface on top of BigTable.

The AppEngine Datastore supports much richer data types within items than SimpleDB, including list types, which contain collections within a single item.

You will almost certainly use this data store if you plan on building applications within the Google AppEngine. However, unlike with SimpleDB, you cannot currently interface with the AppEngine Datastore (or with BigTable) using an application outside of Google’s web service platform.

Microsoft: SQL Data Services

SQL Data Services is part of the Microsoft Azure Web Services platform. The SDS service is also in beta and so is free but has limits on the size of databases. SQL Data Services is actually an application itself that sits on top of many SQL servers, which make up the underlying data storage for the SDS platform. While the underlying data stores may be relational, you don’t have access to these; SDS is a key/value store, like the other platforms discussed thus far.

Microsoft seems to be alone among these three vendors in acknowledging that while key/value stores are great for scalability, they come at the great expense of data management, when compared to RDBMS. Microsoft’s approach seems to be to strip to the bare bones to get the scaling and distribution mechanisms right, and then over time build up, adding features that help bridge the gap between the key/value store and relational database platform.

Non-Cloud Service Contenders

Outside the cloud, a number of key/value database software products exist that can be installed in-house. Almost all of these products are still young, either in alpha or beta, but most are also open source; having access to the code, you can perhaps be more aware of potential issues and limitations than you would with closed-source vendors.


CouchDB

CouchDB is a free, open-source, document-oriented database. Derived from the key/value store, it uses JSON to define an item’s schema. CouchDB is meant to bridge the gap between document-oriented and relational databases by allowing “views” to be dynamically created using JavaScript. These views map the document data onto a table-like structure that can be indexed and queried.
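Real CouchDB views are written in JavaScript, as noted above. The Python below only imitates the underlying idea, a map function that turns documents into sortable rows, and is not CouchDB's API. The documents and function names are made up for the example.

```python
# Documents as they might be stored; the fields are hypothetical.
documents = [
    {"_id": "a1", "type": "order", "customer": "Ada", "total": 40},
    {"_id": "b2", "type": "customer", "name": "Ada"},
    {"_id": "c3", "type": "order", "customer": "Bo", "total": 15},
]


def orders_by_customer(doc):
    """Map step: emit a (key, value) row for each order document."""
    if doc.get("type") == "order":
        yield doc["customer"], doc["total"]


def build_view(docs, map_fn):
    """Apply the map function to every document and sort the rows by key."""
    rows = [row for doc in docs for row in map_fn(doc)]
    return sorted(rows)


view = build_view(documents, orders_by_customer)
print(view)
```

The result is a table-like list of rows sorted by key, built from documents that share no fixed schema.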

At the moment, CouchDB isn’t really a distributed database. It has replication functions that allow data to be synchronized across servers, but this isn’t the kind of distribution needed to build highly scalable environments. The CouchDB community, though, is no doubt working on this.

Project Voldemort

Project Voldemort is a distributed key/value database that is intended to scale horizontally across a large number of servers. It spawned from work done at LinkedIn and is reportedly used there for a few systems that have very high scalability requirements. Project Voldemort also uses an eventual consistency model, based on Amazon’s.

Project Voldemort is very new; its website went up in only the last few weeks.


Mongo

Mongo is the database system being developed at 10gen by Geir Magnusson and Dwight Merriman (whom you may remember from DoubleClick). Like CouchDB, Mongo is a document-oriented JSON database, except that it is designed to be a true object database, rather than a pure key/value store. Originally, 10gen focused on putting together a complete web services stack; more recently, though, it has refocused mainly on the Mongo database. The beta release is scheduled for mid-February.


Drizzle

Drizzle can be thought of as a counter-approach to the problems that key/value stores are meant to solve. Drizzle began life as a spin-off of the MySQL (6.0) relational database. Over the last few months, its developers have removed a host of non-core features (including views, triggers, prepared statements, stored procedures, query cache, ACL, and a number of data types), with the aim of creating a leaner, simpler, faster database system. Drizzle can still store relational data; as Brian Aker of MySQL/Sun puts it, “There is no reason to throw out the baby with the bath water.” The aim is to build a semi-relational database platform tailored to web- and cloud-based apps running on systems with 16 cores or more.

Making a Decision

Ultimately, there are four reasons why you would choose a non-relational key/value database platform for your application:

  1. Your data is heavily document-oriented, making it a more natural fit with the key/value data model than the relational data model.
  2. Your development environment is heavily object-oriented, and a key/value database could minimize the need for “plumbing” code.
  3. The data store is cheap and integrates easily with your vendor’s web services platform.
  4. Your foremost concern is on-demand, high-end scalability — that is, large-scale, distributed scalability, the kind that can’t be achieved simply by scaling up.

But in making your decision, remember the database’s limitations and the risks you face by branching off the relational path.

For all other requirements, you are probably best off with the good old RDBMS. So, is the relational database doomed? Clearly not. Well, not yet at least.

Did you know that you can add a quick link to your site that lets people who use Twitter make your link their status very quickly?

Enough with the theorycraft; let’s get to the real thing. How would you like a cool link like this? When people click on it, bam, their status advertises your site :)

For example, try clicking this link (no virus, guaranteed):  Tweet This

With a bit of HTML, it would be very easy to do so:

<a href="">Tweet This</a>

That is all the magic. Happy tweeting!