Distributed Authorization in Task Delegation

In my previous post, I discussed how distributed authorization might be facilitated. Today, I want to discuss what effects such authorization tokens can have if we relay them, effectively achieving delegation.

“The first runner in a relay - kindergarten Sports Festival.” by MIKI Yoshihito. (#mikiyoshihito) is licensed under CC BY 2.0

Distributing authorization has the immediate effect that it (mostly) eliminates authorization servers. In practice, there always needs to be a machine to issue such tokens, of course – but it does not have to be consulted for every imaginable action someone takes. As I briefly discussed previously, authorization tokens can last a fairly long time (but should not last infinitely), and we can query for revocations quite carefully.
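To make that a little more concrete, here is a minimal sketch of the verifying side. It assumes a token carries an identifier and an expiry time, and that the verifier keeps a local copy of the revocation list which it refreshes only occasionally. The names (`Token`, `RevocationCache`, `fetch_revoked`) are made up for illustration, and signature verification is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass(frozen=True)
class Token:
    token_id: str
    expires_at: float  # seconds since the epoch


@dataclass
class RevocationCache:
    """Locally cached revocation list, refreshed at most every max_age seconds."""
    fetch_revoked: Callable[[], Set[str]]  # hypothetical call to the issuer
    max_age: float
    revoked: Set[str] = field(default_factory=set)
    fetched_at: float = float("-inf")

    def is_revoked(self, token_id: str, now: float) -> bool:
        # Only contact the issuer when the cached list has grown too old.
        if now - self.fetched_at > self.max_age:
            self.revoked = self.fetch_revoked()
            self.fetched_at = now
        return token_id in self.revoked


def is_usable(token: Token, cache: RevocationCache, now: float) -> bool:
    # Signature verification is deliberately omitted in this sketch.
    return now < token.expires_at and not cache.is_revoked(token.token_id, now)
```

The point of the cache is that the issuing machine is asked once per `max_age` interval rather than once per action. The price is that a revocation may go unnoticed for up to `max_age` seconds.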

This attempt to eliminate authorization servers has two major benefits, and one downside.

Central authorization servers are choke points that can slow down distributed systems, because they’re involved in (most) every action taken. Removing them increases performance.
Such servers also become single points of failure. Removing them increases reliability.
The downside is that just as we have (mostly) accepted the web to be “eventually consistent”, we’ll have to do the same with authorization.

Eventual Consistency

Eventual consistency is a term that was flung around a lot more a good decade ago than it is nowadays, so let’s have a very quick recap.

A system is consistent if all nodes have the same view of the system state. In practice, this does not have to encompass the entire system, but is restricted to those nodes that participate in something or other.

For example, if Alice and Prilidiano both have a messenger app open that transmits messages directly, say via Bluetooth or local WiFi, then the system is consistent once Prilidiano receives the message Alice sent (or vice versa). This consistent state is of course the one we want to achieve.

We can easily have consistent states in synchronous systems. If Alice’s app does not consider the message sent until Prilidiano’s app has returned an acknowledgement, that’s where we are.

In an asynchronous system, Alice’s app may consider the message received once the send operation is done, without waiting for acknowledgement. This means there is a short period of time in which the system is inconsistent – Alice considers the message arrived, but Prilidiano is still receiving it.

Of course, with a direct connection, this takes split seconds. So let’s consider the more realistic situation where Alice and Prilidiano do not speak with each other, but use Ted as a relay. We’ll also introduce a problem here, and therefore consider a sequence of events:

Alice sends the message to Ted. Alice knows the message is sent, but cannot consider it received yet.
Ted sends the message to Prilidiano. Prilidiano considers the message received.
Prilidiano sends an acknowledgement to Ted.
Alice drives into a tunnel and loses internet connectivity.
Ted tries to forward the acknowledgement to Alice and fails.
Alice leaves the tunnel again and regains internet connectivity.
A second attempt by Ted to forward the acknowledgement succeeds.

When that last step is completed, the system is consistent again. But we don’t know how long Alice stays in the tunnel – it’s definitely longer than the split seconds of direct communication. While no state is lost in this kind of situation, the overall system only becomes consistent eventually.
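The sequence of events can be played through in a few lines. This is only a toy simulation under my own assumptions: `Node` and `Relay` are hypothetical names, "online" is a simple flag, and Ted retries whenever `flush()` is called.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    online: bool = True
    inbox: list = field(default_factory=list)


@dataclass
class Relay:
    """Ted: buffers messages and retries until the recipient is reachable."""
    pending: deque = field(default_factory=deque)

    def submit(self, recipient: Node, message: str) -> None:
        self.pending.append((recipient, message))

    def flush(self) -> int:
        """Try to deliver everything once; return how many messages stay queued."""
        for _ in range(len(self.pending)):
            recipient, message = self.pending.popleft()
            if recipient.online:
                recipient.inbox.append(message)
            else:
                self.pending.append((recipient, message))  # keep for a later retry
        return len(self.pending)


alice, prilidiano, ted = Node("Alice"), Node("Prilidiano"), Relay()

ted.submit(prilidiano, "hello")   # Alice sends the message to Ted
ted.flush()                       # Ted sends the message to Prilidiano
ted.submit(alice, "ack: hello")   # Prilidiano sends an acknowledgement to Ted
alice.online = False              # Alice drives into the tunnel
first_attempt = ted.flush()       # forwarding fails, the ack stays queued
alice.online = True               # Alice leaves the tunnel
second_attempt = ted.flush()      # the second attempt succeeds

print(first_attempt, second_attempt, alice.inbox)  # 1 0 ['ack: hello']
```

Between the two `flush()` calls, Alice and Prilidiano disagree about the state of the message, and nothing in the code bounds how long that lasts.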

This is, of course, how messenger apps always work. It’s exceedingly rare for an app to completely lose a message. In the worst case, there will be some kind of error reporting that a message could not be delivered within a reasonable time frame. The web is eventually consistent pretty much by design, after all.

Consistency with Multiple Parties

Ted in the preceding example is actually not very smart. She’s not supposed to be. That is, she’s just passing on messages, and doesn’t and shouldn’t care about the contents.

What if it was not a messenger, though? Or rather, what if the message was very specific? What if, for example, Prilidiano was a printer, and Alice was the owner of some document that she wanted to have printed?

The message could then be “Hey Prilidiano, please print this document”, with the document attached. That’s a big message, but it doesn’t change much about the sequence above. But it’s worth noting that for some definition of consistency, the system is only consistent once Prilidiano has actually printed the document.

Except, that is, if Alice knows she’s about to drive into that tunnel. Or if she just has a bad data plan, and doesn’t want to send the document. Let’s introduce Dave, her data server.

OK, now the situation is this: Alice uses Ted to relay a message to Prilidiano. And the message is “Hey Prilidiano, could you please print this document that Dave holds?”. Here, the system can become consistent only after Dave and Prilidiano negotiated the file transfer.

Task Delegation

I think you might know where I’m going with this.

Distributed authorization makes it possible for Alice to include a token that authorizes Prilidiano to request the file from Dave. Alice is not really involved in the file transfer or authorization negotiation between Prilidiano and Dave, except that she initiated the entire process by sending an authorization token. Additionally, Ted cannot use the same token to retrieve the file herself, because it is tied to Prilidiano’s identity.
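As a rough sketch of the check Dave performs, assume the token carries the name of the party it was issued to, the resource, and an expiry. One loud caveat: the standard library has no public-key signatures, so an HMAC over a secret shared between Alice and Dave stands in for the signature a real distributed scheme would use. All names here are hypothetical, and Dave is assumed to have authenticated the requester by other means.

```python
import hashlib
import hmac
import json

# Assumption: Alice and Dave share this secret. The HMAC is a stand-in for
# a public-key signature, not a recommendation.
ALICE_DAVE_SECRET = b"demo-secret-shared-by-alice-and-dave"


def issue_token(subject: str, resource: str, expires_at: int) -> dict:
    """Alice: create a token that names who may fetch which resource."""
    claims = {"sub": subject, "res": resource, "exp": expires_at}
    payload = json.dumps(claims, sort_keys=True).encode()
    tag = hmac.new(ALICE_DAVE_SECRET, payload, hashlib.sha256).hexdigest()
    return {"claims": claims, "tag": tag}


def dave_authorizes(token: dict, requester: str, resource: str, now: int) -> bool:
    """Dave: check integrity first, then that the token names this requester."""
    payload = json.dumps(token["claims"], sort_keys=True).encode()
    expected = hmac.new(ALICE_DAVE_SECRET, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["tag"]):
        return False
    claims = token["claims"]
    return (claims["sub"] == requester
            and claims["res"] == resource
            and now < claims["exp"])


token = issue_token("prilidiano", "report.pdf", expires_at=2_000)
print(dave_authorizes(token, "prilidiano", "report.pdf", now=1_000))  # True
print(dave_authorizes(token, "ted", "report.pdf", now=1_000))         # False
```

Ted can see the token while relaying it, but presenting it under her own identity fails the `sub` check, and rewriting `sub` breaks the integrity tag.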

We have successfully delegated a task with authorization.

But wait! As usual, we have to worry about Eve and her constant attempts at eavesdropping!

Let’s for the sake of simplicity assume that Alice uses Ted as a relay because she knows the latter is trustworthy. Ted is not Eve’s target; Eve has no chance there. And in any case, Eve is not interested in whether Alice wants to print the document. She wants to know the document’s content. So she’s most likely to try and eavesdrop on the conversation between Prilidiano and Dave.

Alice is crafty, though, and only stores encrypted documents with Dave. Which means Eve has no chance to discern the document contents, and everything is fine.

Oh no! We forgot about Prilidiano! Prilidiano doesn’t know how to decrypt the document! He needs a decryption key before he can print it! So he goes and asks Alice…

… and that’s when Alice goes into the tunnel. Without a decryption key, Prilidiano can’t print the document, and Alice will be… well, not angry, just very, very disappointed.

Symmetric Key Token

The simplest approach to solving this is to embed the symmetric decryption key into the authorization token that Alice forwards to Prilidiano. If the symmetric key is encrypted with Prilidiano’s public key, only Prilidiano can use it.

Ted won’t be able to ask Dave for the file because the token does not name her. And if she happens to get the file anyway, she can’t decrypt it, because the key is available for Prilidiano only.

At the same time, Prilidiano can safely present the token to Dave. Dave can verify it to give Prilidiano access to the file, but also can’t process the decryption key. The file encryption remains safe; only Prilidiano can use the key.
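Here is a toy version of wrapping the file key for Prilidiano. Please read it as an illustration of the envelope idea and nothing more: it uses finite-field Diffie-Hellman with a deliberately tiny prime and an XOR mask derived via SHA-256, because that is all the Python standard library offers. A real system would use an established public-key scheme. The function names and parameters are my own inventions.

```python
import hashlib
import secrets
from typing import Tuple

# Toy parameters for illustration only; far too small for real use.
P = 2**127 - 1  # a Mersenne prime
G = 3


def keypair() -> Tuple[int, int]:
    """Return (private, public) for the toy Diffie-Hellman group."""
    private = secrets.randbelow(P - 3) + 2
    return private, pow(G, private, P)


def _mask(shared: int, length: int) -> bytes:
    # Derive a one-time mask (at most 32 bytes) from the shared value.
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()[:length]


def wrap_key(file_key: bytes, recipient_public: int) -> Tuple[int, bytes]:
    """Alice: hide file_key so that only the recipient's private key recovers it."""
    ephemeral_private, ephemeral_public = keypair()
    shared = pow(recipient_public, ephemeral_private, P)
    mask = _mask(shared, len(file_key))
    return ephemeral_public, bytes(a ^ b for a, b in zip(file_key, mask))


def unwrap_key(wrapped: Tuple[int, bytes], private: int) -> bytes:
    """Prilidiano: recompute the shared value and remove the mask."""
    ephemeral_public, masked = wrapped
    shared = pow(ephemeral_public, private, P)
    mask = _mask(shared, len(masked))
    return bytes(a ^ b for a, b in zip(masked, mask))


prilidiano_private, prilidiano_public = keypair()
file_key = secrets.token_bytes(32)

# The wrapped key travels inside the token; Ted and Dave see only this.
token = {"sub": "prilidiano", "wrapped_key": wrap_key(file_key, prilidiano_public)}

print(unwrap_key(token["wrapped_key"], prilidiano_private) == file_key)  # True
```

Dave can still check the rest of the token without ever touching `wrapped_key`, which is exactly the separation described above.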

This kind of approach also has historic uses. It’s basically the envelope information of a file encrypted with OpenPGP in order to be safely stored and sent over the Internet. I think that it can be trusted reasonably well.

There are, however, a couple of partially unsolved issues with this approach. They have less to do with the use case as presented, and more with the system within which the use case applies.

The first and most obvious one is one of key exchange. Now in itself that’s a solved problem; Diffie-Hellman and variations of it abound, and choosing one is not particularly hard. The main reason to point this out is that up until now we’ve sort of glossed over how Alice, Prilidiano, Ted and Dave get to know about each other, and we’d like the key exchange not to throw a spanner into the works.
The next issue is that while the above makes for a good case of delegating authorization, it doesn’t really describe the entire system very well. Specifically, if Dave is a file server, chances are that Alice didn’t just dump a file there, but keeps updating it.

That means we have two cases to consider now. In one, the token describes exactly one state or version of the file, and that is the only thing Prilidiano can download; this is very likely the case for printing. In the other, Prilidiano should get the latest, or some other, version of the file (within the lifetime of the token).

We somehow have to deal with file versions in our authorization scheme.
Finally, if Dave is the kind of file server we’ll often use in e.g. a professional context, then it’s likely that not only Alice has made modifications to it, but Bob as well. The file, then, isn’t really just a single artifact, but something mixed together from Alice’s and Bob’s authorship. How does that fit into the authorization scheme?
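Of these issues, the version question is the easiest to sketch. Assume, purely for illustration, that the token carries a small field describing which versions it covers, and that Dave numbers versions with increasing integers. `VersionGrant` and `resolve_version` are hypothetical names; the multi-author question is not addressed here at all.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VersionGrant:
    """Hypothetical token field describing which file versions are covered."""
    exact: Optional[int] = None        # only this version, e.g. for printing
    max_version: Optional[int] = None  # anything up to and including this one


def resolve_version(grant: VersionGrant, available: List[int]) -> Optional[int]:
    """Dave: pick the version to serve, or None if the grant covers nothing."""
    if grant.exact is not None:
        return grant.exact if grant.exact in available else None
    candidates = [v for v in available
                  if grant.max_version is None or v <= grant.max_version]
    return max(candidates, default=None)


versions_on_dave = [1, 2, 3]
print(resolve_version(VersionGrant(exact=2), versions_on_dave))  # 2
print(resolve_version(VersionGrant(), versions_on_dave))         # 3
```

An empty grant means "whatever is latest when Prilidiano asks", which is the case where the token's lifetime starts to matter.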

Technically, these issues are adjacent to the main problem of delegating authorization; we can consider the main problem to be solved by mixing an encryption key into a distributed authorization token.

In practice, however, use cases are not that simple. We’ll have to consider how authorship by multiple participants fits into the scheme, and what that means for authorization processes – all the while we want to make sure that Alice and Prilidiano never really need to be online at the same time. We want to use Ted to buffer messages between them, and Dave to hold the actual file data that Prilidiano needs.

Candidates

The first thing that comes to mind – my mind, that is – is that this issue is actually already kind of solved, if you just bang two existing technologies together.

First, the idea that a document must be something static has been deprecated for quite some time. A lot of document formats, including those used by popular office suites, internally track updates to the document, while the software only presents the final results. We know this, because the software allows us to review changes to documents (if all the correct boxes are checked, and so forth). All we really need to do is connect each change to some kind of identity we can authenticate. That’s not too hard.
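A minimal sketch of that idea, assuming a change is just "delete some characters at a position, then insert some text", tagged with an author. Real office formats are far richer than this, and the `author` field here is only a string rather than an authenticated identity.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Change:
    author: str    # identity that would be authenticated in a real system
    position: int  # where in the current text the change applies
    delete: int    # how many characters to remove at that position
    insert: str    # text to add at that position


def replay(changes: List[Change]) -> str:
    """Rebuild the document by applying every change, starting from nothing."""
    text = ""
    for change in changes:
        before = text[:change.position]
        after = text[change.position + change.delete:]
        text = before + change.insert + after
    return text


log = [
    Change("alice", 0, 0, "Hello world"),
    Change("bob", 6, 5, "printer"),
    Change("alice", 13, 0, "!"),
]
print(replay(log))                       # Hello printer!
print(sorted({c.author for c in log}))   # ['alice', 'bob']
```

The final text is all a reader normally sees, but the log still records who contributed what.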

Second, if a document is a series of updates, well, so is a chat. The Signal messenger is based on a series of public specifications, which we could possibly adapt from “a series of chat messages” to “a series of updates to a document”, right?

In principle, yes – and indeed, there is much we can learn from these specifications. But in practice, the situation is also sufficiently different that we’ll need a different scheme.

Chat assumes that the group of recipients is known before updates are added. That is, Alice would have to add every conceivable printer she might ever use to the group of authorized parties before she or Bob could write the document. That is just not how documents are handled.
If you look closely into the Signal specs, specifically into Sesame, you’ll see that the solution taken here is to encrypt every message with a key specific to the recipient. That is, every message is duplicated for each recipient. Although Signal updated its group handling, this part hasn't changed.

While that is feasible for small messages and/or groups of recipients, it swiftly becomes a burden when Alice and Bob are, for example, collaboratively editing a terapixel image that they might want to publish at a later stage.
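Some back-of-the-envelope arithmetic shows the difference. The numbers are assumptions of mine: a 32-byte wrapped key per recipient, one byte per pixel for the image, and no protocol overhead whatsoever.

```python
def per_recipient_bytes(payload: int, recipients: int) -> int:
    """Every recipient gets their own encrypted copy of the payload."""
    return payload * recipients


def shared_key_bytes(payload: int, recipients: int, wrapped_key: int = 32) -> int:
    """One encrypted payload, plus one small wrapped key per recipient."""
    return payload + recipients * wrapped_key


chat_message = 1_000   # bytes, assumed
terapixel = 10**12     # bytes, assuming one byte per pixel
recipients = 10

print(per_recipient_bytes(chat_message, recipients))  # 10000
print(shared_key_bytes(chat_message, recipients))     # 1320
print(per_recipient_bytes(terapixel, recipients))     # 10000000000000
print(shared_key_bytes(terapixel, recipients))        # 1000000000320
```

For the chat message, nobody will notice the extra few kilobytes. For the image, duplicating per recipient means shipping ten terabytes instead of one.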

Nevertheless, treating a document as a series of changes (from a null state or from some initial upload) will be key to solving this kind of problem. For the purposes of the next couple of articles, we’ll assume there’s a generic method for doing so – we just want to mix distributed authorization tokens and key exchange into this system so that each change can be encrypted, and the decryption key passed to authorized parties.

It’s also worth noting that while Signal’s approach doesn’t map so well to the document use case, the document use case could still easily be used to represent a group chat.

But all that is going to have to wait for the next article.

Cast of Characters

In this article, I’ve tried to stick to the typical cast of characters, but I had to come up with new roles. For the sake of completeness, here’s the list I’m using:

Alice (standard cast) is the protagonist, owner of a file, and initiator of actions.
Bob (standard cast) is another author contributing to the same file.
Dave (standard cast) in this story becomes less generic and is a data server.
Eve (standard cast) is an eavesdropper.
Prilidiano, he who remembers things of the past, is a networked printer or print server.
Ted (standard cast) is a trusted arbitrator, in this case just a communications intermediary. I used the generic Carol name in this space first, so I used she/her pronouns. After changing the name to the more appropriate Ted, I decided to keep the pronouns. Deal with it.