Archive for May, 2011

I have been promising myself (and others) to write about the ColdFusion UUID implementation for quite a while now and I feel like I have been procrastinating long enough. So at long last the definitive guide to ColdFusion UUIDs, based on many years of experience and a few conversations with the ColdFusion engineering team over beer at the MAX.

What is a UUID

A UUID is an Universally Unique Identifier which is just a fancy name for a 128-bit integer. While a 128-bit integer is a really large number, it is not an infinite number, so it is not really unique, it is just so rare for a conflict to occur that we normally just presume it is actually unique. This 128-bit integer is typically represented as a hexadecimal string split into 5 groups by hyphens in the pattern 8-4-4-4-12. This UUID is typically generated from one of 5 different algorithms:

  1. MAC address based
  2. DCE based
  3. MD5 hash based
  4. Random
  5. SHA-1 hash based

Each of these versions offers different guarantees for uniqueness and randomness. For ColdFusion developers the import version are 1 and 4.

MAC address based UUIDs

The algorithm for a MAC address based UUID is based on 3 different components:

  1. timestamp
  2. clock sequence
  3. node identifier

The timestamp is a 60-bit integer counting the number of 100 nanosecond increments since the beginning of the Gregorian calendar in 1582. The clock sequence is an initially random number used to prevent duplicate UUIDs when the time is reset backwards for instance through an NTP client. The node identifier is a supposedly unique identification for the node on which the UUID is generated. Since this node identifier is typically the MAC address of one of the NICs of the system this version is commonly referred to as a MAC based UUID.

From this algorithm a few things stand out:

  1. The timestamp will overflow in stardate 3400 or something and from that moment on the generated UUIDs may conflict with earlier generated UUIDs. But since I doubt anybody was generating UUIDs in 1582 it is safe to assume the first actual conflicts from that will occur a few hundred years later.
  2. The UUID is only as unique as the MAC address is. While MAC addresses are supposedly unique anybody who has run a somewhat larger network like a campus network will know that in reality they are not.
  3. It is impossible to generate more then 10 million version 1 UUIDs per second per node due to the 100 nanosecond timestamp resolution.
  4. MAC based UUIDs are actually quite predictable.

The MAC based algorithm is the algorithm used in ColdFusion.

Random UUIDs

Random UUIDs are generated mostly random. The version number and 2 other bits are restricted, but the other 122 bits are generated from a random source. This means:

  1. Version 4 UUIDs are unpredictable.
  2. Version 4 UUIDs are more likely to conflict than version 1 UUIDs. Still for all practical purposes they are unique.
  3. The quality and speed of the generation of version 4 UUIDs depends on your entropy source.

Amongst others, java.util.UUID is one of the implementations of a version 4 UUID generator.

UUIDs in ColdFusion

UUIDs are generated in ColdFusion through the createUUID() function. This function generates UUIDs using the version 1 algorithm (MAC address based).  The one thing that makes these UUIDs stand out very much is that they have a non-standard string representation. Instead of being grouped in 5 groups with the pattern 8-4-4-4-12 they are grouped in 4 groups with the pattern 8-4-4-16. I have been told this was an unintentional deviation that was not discovered until after shipping and then backward compatibility was deemed more important than conforming to the string representation of others.

The ColdFusion createUUID() function gets interesting with the rewrite to Java in ColdFusion MX. At that time Java had no API to find the MAC address of a NIC in the system, so on Windows a little bit of native code in NeoUUID.dll was used to find the MAC address and on other platforms a MAC address was faked. When doing a native Java deployment on Windows (EAR/WAR file) the system would also fall back the same as on other platforms. In addition the timestamp resolution of the Sun JVMs was rather limited (10 milliseconds on Windows, 1 millisecond on other platforms). Since you can generate only one UUID per clock tick, the theoretical limit for the number of UUIDs generated per second was 100 on Windows (64 on multi-core systems).

A particular problem in this version was a bug in the Sun JVM where using createUUID() would cause the system clock to move forward a little bit. Under heavy use the clock would move forward up to 12 seconds per minute. Then when the time was resynchronized with the NTP server and the server clock went back a minute or so, the generation of UUIDs was stalled until the system was back in the future. Very much the intended behavior of a UUID generation algorithm that values uniqueness over everything else, but still an unpleasant surprise.

With the arrival of ColdFusion 9 createUUID() got a speed boost. The implementation was rewritten from using a millisecond time API to use a new Java API that provides timestamps with a nanosecond resolution. That means the theoretical limit of 100 or 1000 UUIDs per second got increased to 10 million per second. The practical limit is still a bit lower because the clock tick is not really 1 nanosecond, but the speed improvement is still very significant. The speed of createUUID() now actually varies depending on the clock speed of the hardware you use to run the test.

GUIDs in ColdFusion

In addition to a UUID datatype ColdFusion also has a GUID datatype. This is another 128-bit integer that is unfortunately incompatible with ColdFusion UUIDs because it uses the 8-4-4-4-12 string representation . On the other hand it has the huge benefit that it is compatible with the way the rest of the world represents UUIDs so we can natively exchange them with Java, databases etc. instead of having to serialize them to a string. I have written previously about the performance benefits you can reap if you use a native uniqueidentifier datatype in MS SQL Server instead of a string representation.

What ColdFusion does not have is a native function to generate GUIDs. Typically this is solved by generating GUIDs from UUIDs by just inserting another hyphen, or by falling back to the Java java.util.UUID class. Just remember that when you use the ColdFusion createUUID() function you get better uniqueness guarantees since it is a version 1 UUID, while when using java.util.UUID you get better performance since it is a version 4 UUID (if you have sufficient entropy).