While learning how to use Amazon Web Services, a new word entered my vocabulary – idempotent. It is a disconcerting word, sounding a little like something that unlucky men might want to get a prescription to treat.
Idempotence within a system means that repetitions of an action do not effect the outcome. I am not going to go in to the etymology or the mathematical definitions of the concept, but I will attempt to explain why as a programmer this is interesting.
The Audition
No matter how many times a function is performed with the same inputs, the output is the same. The time has come for the audition, the curtain raises, enter stage right the programmer, beaming at the audience.
Programmer mutters under her breath the reassuring mantra “arrange, act, assert” takes a bow and shows the audience a few empty boxes. She has done this all before; she knows what to expect. She requests a few valuable items from members of the audience and through a myriad of different impressive feats, these items fly through the air and land in the boxes. Then she shows the results to the audience members who respond with enthusiastic applause.
Between each routine, the programmer arranges everything back to a known starting point, emptying the boxes. She requests a single item from each audience member. Using magic, she transports each item to a box. She shows the approving audience that yes, the right item is in the right box then returns it.
This kind of magic is possible when the audience is well behaved and predictable. A request for one item results in one item given.
There’s a few unfamiliar faces who come in late and sit at the front. They heckle, they have their own magic. Programmer on stage thinks she can handle them using the same tricks she has always known. Amazon sisters SimpleDB know differently.
The Heckle
Some of the Amazon web services like SQS and SDB promise that something will happen at least once. While the programmer is used to receiving one or more item from the audience, she starts noticing something a little strange about the things coming from these Amazon hecklers. At some point in midair it appears that one unique item becomes two or more. She has to work harder to catch these clones. She is glad she was prepared and had asserted what she was expecting; or those clones could have flown by unnoticed and turned in to nasty dangerous bugs. The kind that have crept up behind others and caused nasty injuries. She hates bugs.
What value is there in letting this disruptive element to the show? Are these non-idempotent Amazon services providing any benefit? There are two forms of scalability to consider – up and out.
Scaling up is a more traditional solution. This requires adding capacity to a service provided on a machine which is achieved by means of adding memory, processors and hard drive space.
Scaling out is to add more (cheaper) machines to handle traffic to a service. Instead of one mega server machine, many drones can provide the same service while it may appear as one mega machine to the outside world. Keeping the worker drones synchronized is difficult. While eventually all the drones will know what the others have done, the overhead of synchronization means they will not know instantly.
Server capacity can be added by adding another drone during peak demand. Drones can be quietly dispatched when they are no longer required, saving energy and money. A well architected system is more resilient because a new drone can take over seamlessly from a failed one.
The challenge of potentially large lag times before data is consistent forces a programmer to think differently. Similar challenges exist in writing well-behaved multithreaded systems; except here there are more things to go wrong during data transmission and storage.
Solutions
There were two approaches she decided to take in order to deal with these clones.
For tricks where a single item was expected back, she arranged a box to vaporize the old when receiving the new. This was a simple variable overwrite; each new unique value had a unique box that was created if it did not exist. If it did exist, the contents were replaced by the next received item. Unfortunately that did not solve tricks that required aggregation. The vaporizing box was no good for keeping count of things.
The second approach she attempted was to remember the items she had already seen and throw the unexpected clones off stage. This was okay when she could keep all that information in memory. There was a limit to how many unique things she could remember before she has to ask for timeouts. She had to start to use a list to check against things she has already dealt with to weed out the mischievous clones.
The programmer was just one consumer; the audience producing the items to store away could scale out to hundreds… thousands… millions. While some of the audience were okay with waiting, others grew unimpressed and left. Being an idempotent consumer was turning out to be a bottle neck.
The trick with different models of watches all going to the same box no longer worked. The audience would believe six watches were thrown, but the box would could contain many more – it was cluttered with clones. Some audience members only cared if they could get their watch back, but for the accountants in the audience the total was important.
Timing and order was a challenge. Sometimes item clones would pop out of the air a long time after receiving the original items. Things would pause midflight and arrive inexplicably after things thrown later.
The hecklers were introducing all kinds of complexity to what used to be a simple routine. Things would be dealt with eventually, but it was getting difficult to predict the end of an act.
Conclusion
So the imagery is a little silly but with any luck if you are a programmer with any level of experience that did not know what idempotent meant… you have some idea now. Writing scalable idempotent consumption of a service that sends the same thing more than once can be tricky.
The Amazon Queue Service (SQS) may deliver a message more than once. Amazon SimpleDB may repeat a unique result across different query pages. A simple for each bar do foo will not suffice if foo should only be executed once per bar. Foo must be idempotent, or the collection of bar filtered.
The two common approaches I covered boil down to uniquely identifying everything and storing using that identity, and processing things stored by tracking the unique identify of things already processed.
Keeping track of data acted upon is necessary for things that should only happen once. For example, a cash machine should only ever deduct the money taken out of an account for a unique withdrawal transaction once; otherwise the account balance would be incorrect.
Work can be split up by techniques that involve using “shards” and “map reduce” – both out of scope for this particular article. Though as a note, the reduce part of map reduce in implementations I have seen must be idempotent.
Programmers with a functional language in their arsenal probably have an edge here over those purely from a purely object-oriented background because of the way those languages make you think about data.
Keep in mind two things if you are considering data processing to cloud technologies. Keep the word “idempotent” in mind and pay attention to the “at least once” small print in the documentation. Secondly some services only promise to be eventually correct – decide when that is good enough. Do your users really need to know what the state is as of right that second, or is a minute ago good enough?
I am still learning myself, so if you have any insight or articles of your own please share them in the comments.