The limits of recovering from application logic failures

I have been blathering on all week about how to prepare for application logic failures in services and how to potentially recover from the damage those errors cause. I have yammered on about command journals (twice), tombstones, versioning, etc. But none of these techniques is magical. They all have very serious limits, which mean that in most non-trivial cases the best one can really do is say to the user, “Here is the command I screwed up, here are the specific mistakes made, here is what the values should have been, do you want to repair this damage?” Below I explore three specific examples of those limits, which I call read syndrome, put syndrome and e-tag effect.

This article is part of a series. Click here to see summary and complete list of articles in the series.

Continue reading The limits of recovering from application logic failures

Tombstoning on top of Windows Azure Table Store

After command journaling, probably the next most effective protection against application logic errors is tombstoning (keeping a copy of the last version of a deleted row). In this article I propose a design for adding tombstoning to Windows Azure Table Store using two tables: a main table and a tombstone table.
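To make the two-table idea concrete, here is a minimal sketch in Python. It is not the design from the full article, which this excerpt does not spell out. It assumes two in-memory dictionaries standing in for the main table and the tombstone table, keyed by a (partition key, row key) pair. The names `TombstonedTable`, `put`, `delete` and `undelete` are hypothetical, and no real Windows Azure Table Store API is used.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Assumption: a row is identified by (partition key, row key).
Key = Tuple[str, str]


@dataclass
class TombstonedTable:
    """Hypothetical in-memory stand-in for a main table plus a tombstone table."""
    main: Dict[Key, dict] = field(default_factory=dict)
    tombstones: Dict[Key, dict] = field(default_factory=dict)

    def put(self, key: Key, row: dict) -> None:
        # Store a copy so later changes by the caller do not alter the table.
        self.main[key] = dict(row)

    def delete(self, key: Key) -> bool:
        # Move the last version of the row into the tombstone table
        # instead of discarding it.
        row = self.main.pop(key, None)
        if row is None:
            return False
        self.tombstones[key] = row
        return True

    def undelete(self, key: Key) -> bool:
        # Restore only if a tombstone exists and the key has not been reused.
        if key in self.main or key not in self.tombstones:
            return False
        self.main[key] = self.tombstones.pop(key)
        return True
```

In this sketch a mistaken delete can be undone with `undelete` as long as no new row has been written under the same key. A real store would also have to cope with the two writes not being atomic, which the sketch ignores.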

This article is part of a series. Click here to see summary and complete list of articles in the series.

Continue reading Tombstoning on top of Windows Azure Table Store

Thoughts on implementing a command journal

I had previously concluded that command journaling (creating a journal of all the external user commands and internal maintenance commands I issue) is really useful for recovering from self-inflicted data corruption. In this article I look into the various techniques I can use to implement a command journal so as to trade off between system performance and the journal’s utility in recovery.
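As a rough illustration of what such a journal might record, here is a minimal Python sketch. It is one possible shape under stated assumptions, not the implementation the full article settles on: each command is appended as one JSON line to a text sink, tagged with a sequence number, a timestamp and whether it was an external user command or an internal maintenance command. The names `CommandJournal`, `record` and `read_journal` are hypothetical.

```python
import io
import json
import time
from typing import IO, Iterator


class CommandJournal:
    """Hypothetical append-only journal: one JSON line per command."""

    def __init__(self, sink: IO[str]) -> None:
        self._sink = sink
        self._seq = 0

    def record(self, source: str, command: str, args: dict) -> int:
        # source is assumed to be "external" (user) or "internal" (maintenance).
        self._seq += 1
        entry = {
            "seq": self._seq,
            "time": time.time(),
            "source": source,
            "command": command,
            "args": args,
        }
        self._sink.write(json.dumps(entry) + "\n")
        # Flushing on every command favors recovery over performance.
        self._sink.flush()
        return self._seq


def read_journal(sink: IO[str]) -> Iterator[dict]:
    # Rewind and yield the journal entries in the order they were written.
    sink.seek(0)
    for line in sink:
        if line.strip():
            yield json.loads(line)


# Usage: an in-memory buffer stands in for durable storage.
buf = io.StringIO()
journal = CommandJournal(buf)
journal.record("external", "update_profile", {"user": "u1"})
journal.record("internal", "compact_rows", {})
```

Flushing after every write is where the performance versus recovery trade-off shows up in this sketch: batching writes would be faster but would risk losing the most recent commands in a crash.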

This article is part of a series. Click here to see summary and complete list of articles in the series.

Continue reading Thoughts on implementing a command journal

Techniques to Ease Recovering from Self Inflicted Data Corruption

In a previous article I argued that, even with the protections Windows Azure Table Store provides for my data, I can still screw things up myself and so need to put in place protections against my own mistakes. Below I walk through the three scenarios I previously listed and explain how command journaling, tombstoning and versioning could make recovering from my errors much easier.

This article is part of a series. Click here to see summary and complete list of articles in the series.

Continue reading Techniques to Ease Recovering from Self Inflicted Data Corruption

The Limits of Command Journals

In a previous article I argued that I needed some kind of journaling/backup for my Windows Azure Tables in order to make it easier for me to recover from my own screw-ups. One type of journaling I suggested was command journaling. In this article I look at the practical limitations of command journals and conclude that, while they are (somewhat) useful for notifying users who might have been affected by data corruption, they aren’t likely in the general case to be replayable, so their real value is probably less than it might appear.

This article is part of a series. Click here to see summary and complete list of articles in the series.

Continue reading The Limits of Command Journals

Do I need to backup/journal my Windows Azure Table Store?

Windows Azure provides a highly scalable, reliable, fault-resistant table store. So in theory my service can dump data into the table store and walk away secure in the knowledge that I’ll get back what I put in and that the data will be there when I need it. So is there any reason I should care about backing up or journaling my Windows Azure Tables? As I argue below, the answer is yes. But the reason isn’t to protect me against Azure’s mistakes, it’s to protect me from myself.

This article is part of a series. Click here to see summary and complete list of articles in the series.

Continue reading Do I need to backup/journal my Windows Azure Table Store?

Why does OAuth need request tokens?

OAuth's current access dance is based on getting a request token that is later exchanged for an access token. Introducing the request token takes what could have been a 4 round trip protocol and makes it into a 6 round trip protocol. Couldn't we just simplify OAuth down to 4 round trips by getting rid of the request token altogether? Or is there some critical use case enabled by request tokens that makes all the complexity worth the price?

[5/26/2009 – Updated with Q&A on open redirectors]

[6/2/2009 – Updated with a note from Allen Tom on another way to prevent open redirector attacks]

Continue reading Why does OAuth need request tokens?

Claims, Tickets and HTTP – Security protocols for services

I'm writing an enterprise service. A request comes in. Do I honor the request or reject it? Answering this apparently trivial access control question has spawned whole universes of interlocking protocols: Kerberos, Shibboleth, SAML, WS-*, Liberty, OAuth, OpenID and so on. Before I can pick which protocol to use I need to define my requirements.

DISCLAIMER: Although I am an architect on .NET Services' Access Control Service nothing said in this document necessarily represents the opinions of my employer, my friends, my enemies or my teddy bears. No warranty express or implied. Your mileage may vary. Do not remove tag.

Continue reading Claims, Tickets and HTTP – Security protocols for services

What do program managers on the Cosmos team do anyway?

In previous articles (here and here) I have talked about what software program managers do. And in another previous article I talked about Cosmos. In this article I bring the two topics together and talk about what Cosmos program managers actually do. (For those just joining us, Cosmos is Microsoft's internal platform for reliably storing and processing petabytes of information, such as all of Microsoft's log data from its various websites.) The issue of what PMs on the Cosmos team do is near and dear to my heart because I'm the lead program manager for Cosmos and we are hiring!

Continue reading What do program managers on the Cosmos team do anyway?

What is Microsoft's Cosmos service?

Cosmos is Microsoft's internal data storage/query system for analyzing enormous amounts (as in petabytes) of data. As the lead Program Manager for Cosmos I can't say too much about it but what I can do is take a tour of the information that Microsoft has published about Cosmos. So read on if you are interested in the architecture Microsoft uses to store and query petabytes of data and what technical issues Microsoft's approach brings up.

Continue reading What is Microsoft's Cosmos service?