Some of the reasons why Universal Plug and Play (UPnP) did not adopt Service Location Protocol V2 (SLP a.k.a Rendezvous) as its discovery protocol.
Pre-Note
This was an analysis I performed on the Service Location Protocol (SLP) while I worked at Microsoft on the UPnP project. We were evaluating SLP as the basis for UPnP discovery but decided against it for the reasons below. This analysis was made publicly available to the SLP working group chair and others. We discussed these issues and in some cases they agreed with our concerns but at the time SLP V2 was just about to be standardized and understandably they didn't want to start over again. So UPnP instead decided to move forward with the Simple Service Discovery Protocol (SSDP).
This paper assumes the reader is familiar with SLP. Specifically:
DA – Directory Agent – Collects service announcements and responds to service discovery requests.
SA – Service Agent – Advertises services
UA – User Agent – Someone looking for a service
BTW, there is heavy irony to section 8 given that I ended up designing an algorithm to deal with the problem but the decision was made to not include it in the UPnP end product.
1 Executive Summary
My main takeaway from SLP V2 is that it is the wrong design. The right design would appear to be a dirt simple discovery protocol. A discovery exchange would consist of a URI in the request and a URI in the response. No attributes or search would be supported in the discovery protocol. High-end devices would use the discovery protocol to find the directory server that would provide them with all the attribute and search support they need. Low-end devices cannot make use of attributes and search anyway. Low-end devices expect very few responses and will depend on intelligence based in other parts of the network to provide them with the right search results.
More specifically, SLP demonstrates four general types of problem:
1) Extensibility – SLP's design prevents the creation of value added extensions without central approval from IANA/IETF.
2) Re-Inventing Wheels – SLP replicates functionality in systems such as LDAP, HTTP, XML, URIs, DTDs, etc. without providing any significant benefit over any of them.
3) Security – SLP mandates use of X.509, which requires a complicated administrative structure, prevents decentralized implementation and foists ASN.1 upon us.
4) Scalability – SLP provides no mechanisms to prevent network overload when many SLP enabled clients and services exist on the same network without a directory.
So it would seem that any way you look at SLP, it is a bad choice that barely meets our needs today and will prove a technological dead end as we move forward. Thus SLP is in neither our customers' interest nor ours. We should actively examine alternatives.
2 Playing the Numbers
Imagine you have a digital camera with 1394 networking and you want to print out a picture on your home network. You have one, maybe two, printers total. So your camera, not having detected a DA, sends out a request asking for printers and gets back two responses. We will assume we have hacked SLP so one of the attributes returned is a "pretty" name for each printer, e.g. "Den printer" and "Kid's room printer". The built-in LCD screen displays two names, you select one and the picture is printed.
Now you're at work. You have the same digital camera and you want to print out a picture. The camera detects a DA and sends its request to it. There are 50 printers in your building and around 1,500 on campus. Depending on how the DA handles scopes, it will return between 50 and 1,500 results. Trying to select a printer from 50 to 1,500 on a tiny LCD screen is not a happy thought. More likely than not you will hook your digital camera into your PC and then use the PC to select the printer of your choice.
Imagine, however, that your digital camera did something that SLP does not directly support, although with industrious use of attributes we could certainly hack it in. What if the camera identified itself in its request? Alternatively, imagine that once your camera made its request the DA could contact your camera to find out more about it. Think of it as reverse discovery. The DA figures out that the device is a camera, checks the network connection, sees it is coming from your office and decides to be smart and display to the camera the printers you have pre-configured for use with your PC. More likely than not this will display one, maybe two, selections which will do the job. If not, the user can always fall back to the previous scenario.
The point is that devices that are designed for small networks will only work with small networks. Having attributes and queries and such in SLP does not provide real value. Either the device is designed to handle a very small number of responses, in which case just asking for a single class of device and choosing from the results is the best action, or the device is designed to handle an enormous number of responses, in which case we want to give it the full power of LDAP/DASL etc. SLP offers a middle ground that meets neither device's needs and costs both unnecessary complexity.
This then argues for having a discovery protocol that is unbelievably dumb. Its only purpose is to provide the crudest level of functionality when there is no DA. If there is a DA, the device switches directly to the directory to do the heavy hitter stuff. Dumb devices will not be able to do the directory bit. Instead they will either be configured by third party smart devices or they will be probed by the DA, which will "do the right thing" for them. In either case, keep your discovery protocol brain dead stupid.
Broadcast an opaque URI representing the requested service and get back one or more URIs indicating where the service may be found. No attributes. No queries. No fancy message formats. No big response codes. No extension fields. Broadcast a URI, get back a URI, THAT'S IT.
The first request is a URI asking for a directory. If that doesn't work, then ask directly for the desired service.
SAs who want to register with DAs should use the normal directory registration mechanisms that already exist in applications like LDAP.
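The exchange described above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the Responder class, the broadcast and discover functions, and the "example:" URIs are all hypothetical names invented for this sketch, and a real implementation would send multicast packets rather than loop over a list.

```python
from typing import Dict, List, Tuple

# Hypothetical URI meaning "any directory"; the essay does not define one.
DIRECTORY_URI = "example:directory"


class Responder:
    """A network node that answers only for the opaque URIs it offers."""

    def __init__(self, offers: Dict[str, str]) -> None:
        self.offers = offers  # requested URI -> location URI

    def answer(self, requested_uri: str) -> List[str]:
        # Opaque comparison only: no attributes, no queries, no parsing.
        location = self.offers.get(requested_uri)
        return [location] if location else []


def broadcast(network: List[Responder], requested_uri: str) -> List[str]:
    """Stand-in for a multicast request: every node sees the same URI."""
    replies: List[str] = []
    for node in network:
        replies.extend(node.answer(requested_uri))
    return replies


def discover(network: List[Responder], service_uri: str) -> Tuple[str, List[str]]:
    """Ask for a directory first; if none answers, ask for the service."""
    directories = broadcast(network, DIRECTORY_URI)
    if directories:
        return ("directory", directories)
    return ("service", broadcast(network, service_uri))


home = [
    Responder({"example:printer": "http://192.0.2.10/print"}),
    Responder({"example:printer": "http://192.0.2.11/print"}),
]
office = home + [Responder({DIRECTORY_URI: "ldap://directory.example.com"})]

print(discover(home, "example:printer"))
print(discover(office, "example:printer"))
```

On the home network the camera gets the two printer URIs directly. On the office network it gets a directory URI back and everything richer (attributes, search) is left to the directory protocol.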
3 Namespace Management
SLP is from the integer school of protocol extension. You assign an integer field and then let people register to get an integer that maps to their favorite functionality. This design requires everyone to go through a central authority every time they want to extend anything in any way. Inevitably everyone fails to do this and numbers end up colliding.
SLP also uses the moral equivalent of "x-" headers. The "x-" convention is that one creates a private extension by adding an "x-" in front of the extension name. There are two problems with this:
1) Eventually a critical number of the "x-" extensions become very popular and, because no one wants to lose compatibility with the devices that already support the "x-" header, an "x-" name finds its way into the mainstream.
2) There is absolutely no collision protection provided here. How long before there are 50 x-printer attributes?
The solution to both integer fields and "x-" headers is to use URIs. URIs combine the best of centralized and decentralized design. Need your own private namespace that you don't want to have to share with anyone? No problem. Either you can register for a scheme name and then do whatever you please with the rest of the namespace or you can just generate a scheme name from the UUID-[Insert UUID here] namespace. In the latter case you never have to coordinate anything with anyone. In all cases, you end up being able to extend at will without fear of collisions.
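A minimal sketch of the difference, assuming a "uuid-<UUID>:" spelling for the privately generated prefix. That spelling simply mirrors the placeholder above and is not a registered scheme; the function and variable names are likewise invented for illustration.

```python
import uuid


def private_namespace() -> str:
    """Mint a prefix that needs no coordination with anyone.

    The "uuid-<UUID>:" form is an illustrative assumption only.
    """
    return f"uuid-{uuid.uuid4()}:"


vendor_a = private_namespace()
vendor_b = private_namespace()

# Both vendors independently pick the same local name.
x_style = {"x-printer", "x-printer"}
uri_style = {vendor_a + "printer", vendor_b + "printer"}

print(len(x_style))    # the two "x-" names are indistinguishable
print(len(uri_style))  # the prefixed names stay distinct
```

The two "x-printer" names collapse into one entry, while the prefixed names remain separate without either vendor having registered anything.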
Below I give examples I found in the spec of non-extensible namespaces.
3.1 Function Ids
Only 256 functions are possible. This may seem like a lot, until people start adding their own.
3.2 Error Codes
SLP uses a flat 16 bit error code space. SLP provides for error extensions, but these extensions have their own problems.
3.3 Extension Ids
Extension Ids are an integer namespace that is divided into optional standardized extensions, mandatory standardized extensions and private extensions. The first two classes require central registration. The last one is a classic "x-" header.
3.4 Attribute Names
Attribute names must be defined with the service type. There is no way to add new names later without centrally registering them. Thus it isn't possible to "mark up" a template with new interesting but potentially proprietary information, at least not without risking collision. They should have just used URLs for attribute names. See the section on Data Extensibility for more on the power of adding new information in a backwards-compatible manner.
3.5 Block Structure Identifiers
These identifiers are used to specify what sort of authentication is being provided, again, a flat integer namespace.
3.6 Scope Names
Scope names have well… no scoping. Scope names are dependent on location to determine uniqueness. Everyone in the same location is supposed to somehow coordinate to ensure that they all use reasonable scope names. This is an understandable story given SLP's background as an Ethernet discovery protocol. But now imagine trying to use scope names with broadcast based media like wireless and power line networking. As I argued previously, we need to get rid of scope names as first class constructs altogether. Instead the scope names need to be public keys which have a pretty name associated with them. BTW, we also need to differentiate between MADCAP type scopes/relative addressing and group scoping.
4 Data
4.1 Data Extensibility
SLP has its own data format that it uses for its attribute values. Attributes can be one of four different types including a catch all opaque type. SLP does not provide any way to extend an existing value in a decentralized manner.
Imagine, for example, one has the attribute printername. It is currently defined as a string. However you want to be able to slip in a programmatic ID that current clients should ignore but your clients can leverage. How does one stick an ID into printername="My Printer" without it showing up to down level clients?
One can add a new attribute called ProgrammaticID, if one is willing to centrally register it. Unfortunately, the new attribute cannot be registered without getting approval to up the version number of the associated template. Any time a new attribute is added to a template its minor version number must be increased. If they had just used URIs to name their attributes, we wouldn't have this problem. Better yet, if they had used XML for their attribute values we could have taken:
<d:printername>My Printer</d:printername>
and turned it into:
<d:printername>My Printer<e:programmaticID>1234</e:programmaticID></d:printername>
Down level clients (following the WebDAV XML ignore rule) would never "see" the new element and would act as if they had been sent the original value.
Decentralized backwards compatible data extensibility is a good thing and SLP does not support it.
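The ignore rule can be sketched with Python's standard xml.etree.ElementTree module. The namespace URIs bound to the d: and e: prefixes are assumptions made up for this example (the snippets above do not declare any), and the two reader functions are hypothetical stand-ins for a down level and an up level client.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; the essay's snippets leave d: and e: undeclared.
D = "urn:example:d"
E = "urn:example:e"

original = f'<d:printername xmlns:d="{D}">My Printer</d:printername>'
extended = (
    f'<d:printername xmlns:d="{D}" xmlns:e="{E}">My Printer'
    f"<e:programmaticID>1234</e:programmaticID></d:printername>"
)


def downlevel_name(xml_text: str) -> str:
    """A client that knows only printername: reads the text, skips children."""
    root = ET.fromstring(xml_text)
    return (root.text or "").strip()


def uplevel_id(xml_text: str):
    """A client that also understands the extension element, if present."""
    root = ET.fromstring(xml_text)
    node = root.find(f"{{{E}}}programmaticID")
    return node.text if node is not None else None


print(downlevel_name(original), "|", downlevel_name(extended))
print(uplevel_id(original), "|", uplevel_id(extended))
```

The down level reader returns the same name for both documents, while the newer reader finds the ID only where it has been added.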
4.2 Templates are just a pretty name for DTD
SLP's templates are what we would call a DTD, except they use a different format. Templates are a simple declaration mechanism to allow one to programmatically provide record information to DAs and others about exactly what attributes a service supports and what values those attributes take.
4.3 Information Leakage
The question the previous sections beg is – so what? So what if SLP invents its own attribute format and its own DTD format. After all, SLP is a self-contained system, the templates are really only used with SLP. Aren't they?
The funny thing about data is, it leaks. This harkens back to the old saying "Data wants to be free." This saying has nothing to do with anarchistic disrespect for intellectual property. Rather it is an observation. Data tends to get around, it hates it whenever someone tries to force it to stay in one place.
For example, that lovely little template, that harmless bit of data in its own completely unique data format that no one else understands. I wonder if it would show up in my SNMP requests (or equivalent)? Funny, the same data I used to discover a service is strikingly similar to the data I used to manage a service. Might this data also appear in my LDAP database as part of my inventory and network management system? Might it show up in my XML based procurement system? Data gets around. This is why it is so important to have consistent schemas and a consistent language to describe them in.
4.3.1 Service Types
SLP chose a naming system that is like URIs, except it uses a completely different syntax. Its format is expressed roughly as a "string" "." "naming authority", e.g. printer.xerox. A naming authority is some other organization that has to register its name with IANA but then owns all the strings before its name.
This mechanism may seem harmless enough until one starts to think of the way people will want to use service types. This is a classic example of information leakage. Rather than having a well known data type format with extremely well defined semantics, a.k.a. a URI, one now has this funky format with its own rules. As the data leaks, the complications will increase. How much simpler things would be if they had just used URIs. Better yet, think how wonderful things would be if they had just used XML. In that case not only can one use the decentralized management of URIs through the XML namespace mechanism but even better one could also use the annotation facilities XML provides.
4.4 Service URLs
It is amazing how many times people have to figure out that URIs MUST be opaque. Any time you try to smash data into a URI it always bites you. The literature on this point is extensive with http://www.xent.com/FoRK-archive/feb98/0238.html being one of the simplest and most obvious explanations.
Service URIs are used by SLP to instruct UAs on how to contact a service. They specify the type of service, the type of protocol, the address and path of the resource to talk to as well as any useful attribute information. All this packed into a single URI. All of which is completely non-extensible.
For what possible reason would anyone want to shove a complex data statement that would tax XML's expressive capabilities into a linear non-extensible data structure that suffers from all the problems described in previous sections?
5 Search
As I argued previously, there are really two types of devices: those that are stupid and will always be stupid, and those that are not. Stupid devices do not need half a query language and smart devices want a fully featured query language.
6 Message Format
SLP defines yet another message based client/server format. The world doesn't need yet another client/server protocol. We have one that does the job quite nicely, HTTP. Let's just use it.
7 Authentication
SLP does not provide for a mechanism to encrypt its messages. This allows people on broadcast networks to sniff for the sorts of services one is trying to discover. Do you want to go to the airport and broadcast to the world what sort of services you are seeking? One can imagine criminals who scan broadcasts along roadways for people asking for AAA. I will leave it to the reader to imagine what sort of devices one's wireless network inside one's house might contain and how potentially embarrassing it would be were anyone to know said devices were present.
However, even if encryption were added or if authentication were sufficient, SLP's requirement of mandatory support for DSA with X.509 v3 certificates is a showstopper. Just in case your system doesn't have enough heavyweight parsers, your digital camera now gets to implement ASN.1. But much worse than that, the entire design of X.509 requires a directory because it uses a certificate path to identify principals. In addition it requires the various rules for transfer of authority, such as requiring that only CAs can transfer trust. Even if one has a CA, one still can't directly express what authority is being transferred. One has to check the directory for that.
In practice this means saying something as simple as "This camera is authorized to talk to this printer" ends up requiring a directory lookup.
X.509 is the poster child for heavyweight, difficult to process and impossible to manage networks. We need a simple, lightweight, decentralized security solution.
8 Bad Behavior in Directoryless Environments
SLP uses a reserved administrative multicast channel to allow clients and services to discover each other and their directory. The problem with this solution is that in many networks the number of machines in a single administrative multicast channel can reach upwards of 100,000. Hence if one has many clients and services that happen to support SLP hanging out on the same network with no directory, the entire network will very quickly be overwhelmed with network traffic. One can conjure up all sorts of horrible scenarios. Imagine that the power goes off and all the machines come back on-line at once. Now imagine all of those machines simultaneously issuing requests looking for a directory.
A real world example would be useful. When a client or service starts up it is expected to wait a random period of time between 0 and 3 seconds (CONFIG_START_WAIT) before trying to discover the directory using a multicast request. Once a request is issued the issuer is expected to wait 2 seconds (CONFIG_RETRY) before repeating the request. The issuer may retry the request after 4 seconds have passed and may try one last time at 8 seconds. In other words, the wait interval is increased exponentially. The third re-try is the last one as the issuer must not attempt discovery after 15 seconds (CONFIG_MC_MAX), 2+4+8 = 14 seconds.
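The retry schedule just described can be reproduced in a few lines. The constant names come from the text; the retry_waits function is a hypothetical helper written for this sketch, not something taken from the SLP specification.

```python
CONFIG_START_WAIT = 3  # seconds; upper bound of the random start-up delay
CONFIG_RETRY = 2       # seconds; wait before the first retry
CONFIG_MC_MAX = 15     # seconds; no multicast discovery after this point


def retry_waits(initial: int = CONFIG_RETRY, limit: int = CONFIG_MC_MAX) -> list:
    """Return the successive wait intervals, doubling until the limit is hit."""
    waits = []
    wait, elapsed = initial, 0
    while elapsed + wait <= limit:
        waits.append(wait)
        elapsed += wait
        wait *= 2
    return waits


waits = retry_waits()
print(waits, sum(waits))            # wait intervals and their total
print(1 + len(waits))               # requests per machine: initial plus retries
print(CONFIG_START_WAIT + sum(waits))  # worst-case window in seconds
```

A fourth retry would need a 16 second wait, which would push the total past CONFIG_MC_MAX, so the schedule stops at three.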
Assume that all 100,000 machines simultaneously want to perform directory discovery after a power outage, that their initial requests are evenly distributed over the 3 second start-up window, and that each discovery request is 512 bytes. Each machine sends four requests (the initial request plus three retries) within 3 + 14 = 17 seconds, so the total network load will be:
((512*100000)*4)/17 = 12,047,058 bytes/second = 11,764 Kbytes/second = 94,117 Kbits/second (taking 1 Kbyte as 1,024 bytes)
Of course this is a very crude calculation and isn't exactly correct. What I have done is take the total number of bytes that will be sent and spread them evenly over the time period in which they will be transmitted. The actual network behavior will be worse than my calculation indicates because of clustering of requests, but the math is fair enough to demonstrate the problem.
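For readers who want to check the arithmetic, here is a small sketch of the same averaged calculation. The function name is made up for this example, and the conversion assumes 1 Kbyte = 1,024 bytes, which is the convention that matches the digits in the figure above.

```python
def average_load_bytes_per_second(machines: int, request_bytes: int,
                                  requests_per_machine: int,
                                  window_seconds: int) -> float:
    """Total bytes sent, spread evenly over the whole window."""
    return machines * request_bytes * requests_per_machine / window_seconds


load = average_load_bytes_per_second(
    machines=100_000, request_bytes=512,
    requests_per_machine=4,   # initial request plus three retries
    window_seconds=17,        # 3 second start wait plus 2 + 4 + 8
)
kbytes = load / 1024
kbits = kbytes * 8

print(int(load), "bytes/second")
print(int(kbytes), "Kbytes/second")
print(int(kbits), "Kbits/second")
```

Because this is an average, it says nothing about the peaks that occur when requests cluster, which is the case the text warns will be worse.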
Without some sort of throttling mechanism SLP presents a clear and present danger to existing networks.
This then raises the question: why would anyone be daft enough to deploy SLP on 100,000 machines without implementing a directory?
To understand why this is a concern one needs to understand the deployment scenario we wish to support. We want UPnP to be "zero admin". This means that clients and services can discover each other without anyone having to configure anything. This is very important for certain scenarios, such as the home. This means that Windows platforms supporting UPnP will ship with UPnP set to "on" so that the UPnP machine will automatically be able to find appropriate services and offer them to the user. If UPnP is defaulted to off then its utility is reduced to zero. Users won't know it's there and won't be able to discover their services. Furthermore we can't ask users questions like "Should UPnP be turned on?" UPnP's target audience is average users who aren't equipped to understand the question.
In the case above the real problem is that SLP needs to understand that it should turn off. SLP does not have any provisions that enable it to de-activate.
Thus any discovery protocol we adopt must provide for a mechanism to automatically turn itself off when run in inappropriate environments, such as a network with 100,000 clients on a single local administrative multicast scope and no directory. Note the use of the term "automatically". We must not require administrators to perform any action in order to successfully use a Windows machine with UPnP support in the case where they do not wish to use UPnP. That is, if an administrator buys a bunch of UPnP enabled Windows machines and doesn't even know what UPnP is they must not be required to deploy some magic mechanism to turn UPnP off. Each Windows box must, on its own, determine there is a problem and turn off UPnP discovery without damaging the network.
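One way such a self-deactivating check could look is sketched below. This is purely a hypothetical illustration and not the algorithm mentioned in the pre-note: the DiscoveryGuard class, its fields and the threshold value are all assumptions invented here. The idea is simply that a machine counts the distinct peers it hears multicasting and stops multicasting itself once the network looks too crowded.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class DiscoveryGuard:
    """Hypothetical guard: stop multicasting once too many peers are heard."""

    max_peers: int = 1000  # assumed threshold, not taken from any spec
    peers_seen: Set[str] = field(default_factory=set)
    multicast_allowed: bool = True

    def observe_request(self, sender: str) -> None:
        # Count each distinct sender once; repeats do not inflate the total.
        self.peers_seen.add(sender)
        if len(self.peers_seen) > self.max_peers:
            # No administrator involved: the machine turns itself off.
            self.multicast_allowed = False


guard = DiscoveryGuard(max_peers=3)
for sender in ["10.0.0.1", "10.0.0.2", "10.0.0.2", "10.0.0.3"]:
    guard.observe_request(sender)
print(guard.multicast_allowed)  # three distinct peers: still within the limit

guard.observe_request("10.0.0.4")
print(guard.multicast_allowed)  # fourth distinct peer: discovery switches off
```

A real mechanism would also need to decide when it is safe to turn discovery back on and how to measure the network without adding to the load, neither of which this sketch attempts.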