I have been promising myself (and others) to write about the ColdFusion UUID implementation for quite a while now and I feel like I have been procrastinating long enough. So at long last the definitive guide to ColdFusion UUIDs, based on many years of experience and a few conversations with the ColdFusion engineering team over beer at the MAX.

What is a UUID

A UUID is a Universally Unique Identifier, which is just a fancy name for a 128-bit integer. While a 128-bit integer is a really large number, it is not an infinite number, so it is not really unique; it is just so rare for a conflict to occur that we normally presume it is unique. This 128-bit integer is typically represented as a hexadecimal string split into 5 groups by hyphens in the pattern 8-4-4-4-12. A UUID is typically generated by one of 5 different algorithms:

  1. MAC address based
  2. DCE based
  3. MD5 hash based
  4. Random
  5. SHA-1 hash based

Each of these versions offers different guarantees for uniqueness and randomness. For ColdFusion developers the important versions are 1 and 4.

MAC address based UUIDs

The algorithm for a MAC address based UUID is based on 3 different components:

  1. timestamp
  2. clock sequence
  3. node identifier

The timestamp is a 60-bit integer counting the number of 100 nanosecond increments since the beginning of the Gregorian calendar in 1582. The clock sequence is an initially random number used to prevent duplicate UUIDs when the time is reset backwards for instance through an NTP client. The node identifier is a supposedly unique identification for the node on which the UUID is generated. Since this node identifier is typically the MAC address of one of the NICs of the system this version is commonly referred to as a MAC based UUID.
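The timestamp arithmetic can be sketched in Java. This is only an illustration, not the ColdFusion implementation: the class and method names are made up for the example, and the offset constant is the number of 100 nanosecond intervals between 15 October 1582 and the Unix epoch.

```java
import java.time.Instant;

// Illustrative sketch (hypothetical names): the 60-bit version 1 timestamp
// counts 100 ns intervals since the start of the Gregorian calendar.
final class V1Timestamp {
    // 100 ns intervals between 1582-10-15T00:00:00Z and 1970-01-01T00:00:00Z.
    static final long GREGORIAN_OFFSET = 0x01B21DD213814000L;

    static long fromInstant(Instant t) {
        // Convert seconds and nanoseconds since the Unix epoch to 100 ns ticks.
        long ticks = t.getEpochSecond() * 10_000_000L + t.getNano() / 100;
        // Shift to the Gregorian epoch and keep only the low 60 bits.
        return (ticks + GREGORIAN_OFFSET) & 0x0FFFFFFFFFFFFFFFL;
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(fromInstant(Instant.now())));
    }
}
```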

From this algorithm a few things stand out:

  1. The timestamp will overflow in stardate 3400 or something and from that moment on the generated UUIDs may conflict with earlier generated UUIDs. But since I doubt anybody was generating UUIDs in 1582 it is safe to assume the first actual conflicts from that will occur a few hundred years later.
  2. The UUID is only as unique as the MAC address is. While MAC addresses are supposedly unique anybody who has run a somewhat larger network like a campus network will know that in reality they are not.
  3. It is impossible to generate more than 10 million version 1 UUIDs per second per node due to the 100 nanosecond timestamp resolution.
  4. MAC based UUIDs are actually quite predictable.

The MAC based algorithm is the algorithm used in ColdFusion.
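To see why version 1 UUIDs are predictable, you can read the three components straight back out of one. The sketch below uses java.util.UUID for that; the class and helper names are invented for the example, and the sample value is the example UUID from RFC 4122, not one generated by ColdFusion.

```java
import java.util.UUID;

// Illustrative sketch (hypothetical names): a version 1 UUID exposes its
// timestamp, clock sequence and node identifier to anybody who sees it.
final class V1Fields {
    static String nodeHex(String uuid) {
        // node() returns the 48-bit node identifier, usually a MAC address.
        return String.format("%012x", UUID.fromString(uuid).node());
    }

    public static void main(String[] args) {
        String sample = "f81d4fae-7dec-11d0-a765-00a0c91e6bf6"; // example from RFC 4122
        UUID u = UUID.fromString(sample);
        System.out.println("version   = " + u.version());
        System.out.println("timestamp = " + u.timestamp());      // 100 ns ticks since 1582
        System.out.println("clock seq = " + u.clockSequence());
        System.out.println("node      = " + nodeHex(sample));
    }
}
```

Note that timestamp(), clockSequence() and node() throw an UnsupportedOperationException for UUIDs that are not version 1.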

Random UUIDs

Random UUIDs are generated mostly randomly. The 4-bit version number and 2 other bits are fixed, but the other 122 bits come from a random source. This means:

  1. Version 4 UUIDs are unpredictable.
  2. Version 4 UUIDs are more likely to conflict than version 1 UUIDs. Still for all practical purposes they are unique.
  3. The quality and speed of the generation of version 4 UUIDs depends on your entropy source.

Amongst others, the randomUUID() method of java.util.UUID is one of the implementations of a version 4 UUID generator.
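A quick way to check this from Java is to generate one and inspect the fixed bits. The class name below is just for the example; randomUUID(), version() and variant() are standard java.util.UUID methods.

```java
import java.util.UUID;

// Illustrative sketch: generate a random UUID and inspect its fixed bits.
final class RandomUuidDemo {
    static UUID next() {
        // Backed by a cryptographically strong random number generator.
        return UUID.randomUUID();
    }

    public static void main(String[] args) {
        UUID u = next();
        // The version is always 4 and the variant always 2; the rest is random.
        System.out.println(u + " version=" + u.version() + " variant=" + u.variant());
    }
}
```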

UUIDs in ColdFusion

UUIDs are generated in ColdFusion through the createUUID() function. This function generates UUIDs using the version 1 algorithm (MAC address based).  The one thing that makes these UUIDs stand out very much is that they have a non-standard string representation. Instead of being grouped in 5 groups with the pattern 8-4-4-4-12 they are grouped in 4 groups with the pattern 8-4-4-16. I have been told this was an unintentional deviation that was not discovered until after shipping and then backward compatibility was deemed more important than conforming to the string representation of others.

The ColdFusion createUUID() function gets interesting with the rewrite to Java in ColdFusion MX. At that time Java had no API to find the MAC address of a NIC in the system, so on Windows a little bit of native code in NeoUUID.dll was used to find the MAC address and on other platforms a MAC address was faked. When doing a native Java deployment on Windows (EAR/WAR file) the system would also fall back the same as on other platforms. In addition the timestamp resolution of the Sun JVMs was rather limited (10 milliseconds on Windows, 1 millisecond on other platforms). Since you can generate only one UUID per clock tick, the theoretical limit for the number of UUIDs generated per second was 100 on Windows (64 on multi-core systems).

A particular problem in this version was a bug in the Sun JVM where using createUUID() would cause the system clock to move forward a little bit. Under heavy use the clock would move forward up to 12 seconds per minute. Then when the time was resynchronized with the NTP server and the server clock went back a minute or so, the generation of UUIDs was stalled until the system was back in the future. Very much the intended behavior of a UUID generation algorithm that values uniqueness over everything else, but still an unpleasant surprise.
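The stalling behavior follows directly from the rule of one UUID per clock tick. The sketch below is not the ColdFusion code, just a minimal Java illustration of that rule with invented names: the generator refuses to hand out a timestamp that is not strictly later than the previous one, so if the clock is set back it simply waits.

```java
import java.util.function.LongSupplier;

// Illustrative sketch (hypothetical names): hand out strictly increasing
// clock ticks, waiting whenever the clock has not advanced or went backwards.
final class TickGate {
    private final LongSupplier clock;
    private long last = Long.MIN_VALUE;

    TickGate(LongSupplier clock) {
        this.clock = clock;
    }

    synchronized long nextTick() {
        long now = clock.getAsLong();
        while (now <= last) {      // same tick as before, or the clock was reset
            Thread.yield();        // stall until the clock has caught up
            now = clock.getAsLong();
        }
        last = now;
        return now;
    }

    public static void main(String[] args) {
        // With a millisecond clock this can never exceed 1000 ticks per second.
        TickGate gate = new TickGate(System::currentTimeMillis);
        for (int i = 0; i < 3; i++) {
            System.out.println(gate.nextTick());
        }
    }
}
```

With a clock that is a minute ahead of the freshly synchronized system time, the loop spins for that full minute, which is exactly the stall described above.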

With the arrival of ColdFusion 9 createUUID() got a speed boost. The implementation was rewritten from using a millisecond time API to use a new Java API that provides timestamps with a nanosecond resolution. That means the theoretical limit of 100 or 1000 UUIDs per second got increased to 10 million per second. The practical limit is still a bit lower because the clock tick is not really 1 nanosecond, but the speed improvement is still very significant. The speed of createUUID() now actually varies depending on the clock speed of the hardware you use to run the test.

GUIDs in ColdFusion

In addition to a UUID datatype ColdFusion also has a GUID datatype. This is another 128-bit integer that is unfortunately incompatible with ColdFusion UUIDs because it uses the 8-4-4-4-12 string representation. On the other hand it has the huge benefit that it is compatible with the way the rest of the world represents UUIDs, so we can natively exchange them with Java, databases etc. instead of having to serialize them to a string. I have written previously about the performance benefits you can reap if you use a native uniqueidentifier datatype in MS SQL Server instead of a string representation.

What ColdFusion does not have is a native function to generate GUIDs. Typically this is solved by generating GUIDs from UUIDs by just inserting another hyphen, or by falling back to the Java java.util.UUID class. Just remember that when you use the ColdFusion createUUID() function you get better uniqueness guarantees since it is a version 1 UUID, while when using java.util.UUID you get better performance since it is a version 4 UUID (if you have sufficient entropy).
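The hyphen trick is simple enough to sketch. The helper below is written in Java with made-up names and assumes the input is a well-formed 35 character ColdFusion UUID (8-4-4-16); it converts to the 36 character 8-4-4-4-12 form and back.

```java
// Illustrative sketch (hypothetical names): convert between the ColdFusion
// 8-4-4-16 representation and the standard 8-4-4-4-12 representation.
final class CfUuid {
    // The extra hyphen goes after the first 4 characters of the last group.
    private static final int HYPHEN_INDEX = 23;

    static String toGuid(String cfUuid) {
        if (cfUuid.length() != 35) {
            throw new IllegalArgumentException("expected 35 characters: " + cfUuid);
        }
        return cfUuid.substring(0, HYPHEN_INDEX) + "-" + cfUuid.substring(HYPHEN_INDEX);
    }

    static String fromGuid(String guid) {
        if (guid.length() != 36 || guid.charAt(HYPHEN_INDEX) != '-') {
            throw new IllegalArgumentException("expected 8-4-4-4-12 format: " + guid);
        }
        return guid.substring(0, HYPHEN_INDEX) + guid.substring(HYPHEN_INDEX + 1);
    }

    public static void main(String[] args) {
        String guid = toGuid("09DD30F3-CE03-4FF1-92D3067126FF904E");
        System.out.println(guid);
        // The converted value can be parsed by java.util.UUID.
        System.out.println(java.util.UUID.fromString(guid).version());
    }
}
```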

During SoTR 2011 I presented on using XFA PDF forms (a.k.a. LiveCycle forms) with ColdFusion. Slides and materials are now available for download.

An after-conference addition is that Chandan Kumar from Adobe confirmed that the issue with overwritedata=”yes” in the cfpdf tag is resolved in ColdFusion, so you don’t need to add it to all cfpdf populate operations anymore once the fix is released / installed.

Last week I had the second Flex 4 Crash Course session at the Adobe office in Amsterdam as an introduction to Flex for people with no previous Flex experience. (Although there were some familiar faces in the audience.) The training material was provided by Adobe and I am not allowed to publish all the originals, but I can share the slides with all the links to external resources.

flex4_crash_course_slides_extract

In a recent discussion on cf-talk the question was asked how to improve the performance of ColdFusion when working with very large XML documents. One of the solutions proposed was to use StAX, and that got me thinking. StAX is a stream processor that works very differently from what you may be used to from other XML processors. Instead of viewing an XML document as a whole, with elements in context of their parents, children and siblings, it just treats the whole document as a sequence of items. Each of these items can be of type elementstart, elementend, comment, entity etc. The way you work with this is you iterate through all the items in your document and process them one by one. Working that way is sufficiently different to make it necessary to rewrite all your processing from scratch if you want to switch from the built-in processor to StAX, which makes it a solution that is not so attractive.

But what if we combine a preprocessing step in StAX, which splits the large XML document into smaller pieces, with the regular processing in ColdFusion? StAX is Java, so it is easy to integrate into ColdFusion, and I wrote a sample implementation to test whether this would help. It has some limitations, such as only handling elements, element text and attributes, but it seems to work just fine (and the code is open for improvement). With this I benchmarked some XML files I downloaded from the internet, with the following results:
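To give an idea of what such a splitter looks like, here is a minimal Java sketch of the approach. It is not the code of xmlSplitter.cfc: the class and method names are invented, it uses the javax.xml.stream API that ships with the JDK, and it collects the pieces in memory instead of writing them to disk. Like the real thing it only copies elements, element text and attributes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Illustrative sketch (hypothetical names): cut a document into one small
// XML fragment per occurrence of the element named splitOn.
final class XmlSplitSketch {
    static List<String> split(String xml, String splitOn) {
        List<String> records = new ArrayList<>();
        try {
            XMLStreamReader in = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
            StringWriter buffer = null;
            XMLStreamWriter out = null;   // non-null while inside a record
            int depth = 0;                // nesting depth inside the current record
            while (in.hasNext()) {
                int event = in.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if (out == null && in.getLocalName().equals(splitOn)) {
                        buffer = new StringWriter();
                        out = outFactory.createXMLStreamWriter(buffer);
                    }
                    if (out != null) {
                        depth++;
                        out.writeStartElement(in.getLocalName());
                        for (int i = 0; i < in.getAttributeCount(); i++) {
                            out.writeAttribute(in.getAttributeLocalName(i),
                                    in.getAttributeValue(i));
                        }
                    }
                } else if (event == XMLStreamConstants.CHARACTERS && out != null) {
                    out.writeCharacters(in.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT && out != null) {
                    out.writeEndElement();
                    if (--depth == 0) {   // the record element itself was closed
                        out.flush();
                        out.close();
                        records.add(buffer.toString());
                        out = null;
                    }
                }
            }
            in.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not split document", e);
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<root><item id=\"1\">a</item><item id=\"2\"><b>x</b></item></root>";
        split(xml, "item").forEach(System.out::println);
    }
}
```

Each returned fragment is a small, well-formed XML document that can be handed to the regular ColdFusion XML functions one at a time.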

Source file                                                        Source size  Split on  Records  Time
http://www.ins.cwi.nl/projects/xmark/Assets/standard.gz            111 MB       regions   1        24274 ms
http://www.ins.cwi.nl/projects/xmark/Assets/standard.gz            111 MB       mailbox   21750    146999 ms
ftp://ftp.nlm.nih.gov/nlmdata/sample/medline/medsamp2011h.xml.zip  164 MB       30000     30000    472043 ms

As you can see, how you split a document has a significant impact. I presume this is mostly due to the impact the write operations have on my laptop with a slow 5400 rpm hard disk. On the other hand, in the best case scenario the parsing speed is over 4 MB per second. Memory consumption stayed under 200 MB for the whole server, so it looks like there are some scenarios where this might be useful.

Code for xmlSplitter.cfc, tested on CF 9.01, 64-bit with StAX 1.2.0 and Java 1.6u24 64-bit.

Product:               Seapine TestTrack Pro
Vulnerable versions:   2010.x, 2011.x
Vulnerability:         predictable session cookies
Vendor informed:       2010-09-07
Fix available:         no

Info:
TestTrack Pro is an issue tracking application from Seapine

Vulnerability:
TestTrack Pro offers a SOAP interface which works as follows:
- connect with username and password to retrieve a list of available
  projects: getProjectList(username, password);
- connect with username and password to retrieve a session login cookie
  on a project: projectLogon(project, username, password);
- query the system to retrieve project data using the session login
  cookie to authenticate: getRecordListForTable (cookie, .....);
- log off the session: databaseLogoff(cookie).

The session login cookies generated by the server are predictable. Below
is a log file from the connections showing the date and time of a log
entry, and then the cookie used for authentication:
"09/07/10","11:18:19","1246111"
"09/07/10","11:18:22","1246115"
"09/07/10","11:18:44","1246123"
"09/07/10","11:18:46","1246127"
"09/07/10","11:18:51","1246132"
"09/07/10","11:18:53","1246139"
"09/07/10","11:19:16","1246144"
"09/07/10","11:19:18","1246151"
"09/07/10","11:19:33","1246156"
"09/07/10","11:19:35","1246163"
"09/07/10","11:19:51","1246167"
"09/07/10","11:19:53","1246175"

The absolute value of the session cookie is related to the server
uptime, starting near 0 when the server is just started and increasing
monotonically afterwards.
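
The lack of entropy is easy to quantify from the log above. The Java
sketch below (class and method names are made up for this illustration)
only computes the differences between consecutive cookie values from
that log; it does not talk to a server.

```java
// Illustrative sketch (hypothetical names): measure the gaps between the
// consecutive session cookies listed in the log above.
final class CookieGaps {
    static final int[] LOGGED = {
        1246111, 1246115, 1246123, 1246127, 1246132, 1246139,
        1246144, 1246151, 1246156, 1246163, 1246167, 1246175
    };

    static int[] gaps(int[] cookies) {
        int[] result = new int[cookies.length - 1];
        for (int i = 1; i < cookies.length; i++) {
            result[i - 1] = cookies[i] - cookies[i - 1];
        }
        return result;
    }

    static int maxGap(int[] cookies) {
        int max = 0;
        for (int gap : gaps(cookies)) {
            max = Math.max(max, gap);
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(gaps(LOGGED)));
        System.out.println("largest gap: " + maxGap(LOGGED));
    }
}
```

In this sample no two consecutive cookies are more than 8 apart, over a
period of 94 seconds.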

History:
2010-09-07 Seapine was informed and assigned case number 121426
2010-09-08 Seapine confirmed the issue as a known issue and scheduled a
           fix in 'an upcoming 2011.0.x maintenance release'.
2010-12-20 TestTrack 2011.1 was released without a fix.
2010-12-24 Seapine was asked to publish a security bulletin detailing
           risks and mitigations despite no fix being available
2011-02-02 Seapine was informed this issue would be publicly disclosed
2011-02-13 Submitted to Bugtraq and published on my blog

One of the ways to translate LiveCycle Designer PDFs is using XLIFF. Since an XDP file is essentially an XML file the process is to run an XSLT transformation over the XDP to extract all the text strings. Not just the captions, but also all the tooltips, screenreader texts, image alt’s etc. Then you can send the generated document with strings to a translator, and some time later you get a translation back that you can merge into your template using a different XSLT transformation. The whole process is described pretty well in Using XLIFF for translating Adobe® LiveCycle® Designer ES form designs on Adobe DevNet.

While putting together a few extra exercises for the LiveCycle ES2 Designer Specialist training I stumbled over something weird in the XSLT templates from Adobe. While the tools for translating XLIFF files all presume the original string is in the source element and the translated string is in the target element, the Adobe XSLT templates presume the translated string is put back into the source element. So for an English to Dutch translation tools create:

<trans-unit id="09DD30F3-CE03-4FF1-92D3-067126FF904E" resname="09DD30F3-CE03-4FF1-92D3-067126FF904E">
    <source>Awesome!</source>
    <target>Keigaaf!</target>
</trans-unit>

But the XSLT templates Adobe distributes with LiveCycle Designer (look in the %programfiles%\Adobe\Adobe LiveCycle Workbench ES2\Adobe LiveCycle Designer ES2\FormTranslation\ folder) expect:

<trans-unit id="09DD30F3-CE03-4FF1-92D3-067126FF904E" resname="09DD30F3-CE03-4FF1-92D3-067126FF904E">
    <source>Keigaaf!</source>
</trans-unit>

Luckily this is easy to fix (once you figure out what is going on) by changing the XSLT template for merging the translations back into the XDP file to select the target node instead of the source node at line 66 of mergestrings.xslt:

<xslt:variable name="translatedNode" select="$s2x[@id=$idToGet][1]/target" />

A version of mergestrings.xslt with this change is available for download.

This was written in response to a forum question, but I figured it might be useful for more people.

There are many ways to deploy ColdFusion code to a server. Probably the most prevalent, especially considering shared hosting, is using FTP to upload CFML templates to the server.  Tools such as DreamWeaver and CFBuilder allow you to do so right from your IDE. Another way to do it is to run some Ant script or batch file and extract the sources straight from source control to the server. With a little bit of effort you get much more control and much more reproducible results.  At Prisma IT we prefer to go a step further and deliver the ColdFusion applications we build as Enterprise ARchives (EARs) to our clients.  This allows us even more control, especially when we don’t have any.

Let me explain that a bit. We have several clients where we do all their development, but final deployment is done on the client's infrastructure. If we are lucky, we may have read-only access to the User Acceptance Testing servers, but sometimes we don’t even have that. In those cases deploying an application is completely up to the client (or their hosting partner). That leaves us no wiggle room to deal with stuff that could go wrong during a deployment. With EAR files we eliminate a huge number of risks from the process. An EAR file is a full application, so there is no risk that some files get forgotten. And we have the MD5 to prove it. Since it gets deployed to its own temporary folder, there is no chance of any old files remaining on the server and slipping into the new deployment (the cfclasses folder is famous for that).
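Checking the MD5 of a delivered EAR file takes only a few lines. The Java sketch below shows one way to do it; the class name is invented and the path to the EAR file is whatever you pass on the command line.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative sketch (hypothetical names): compute the MD5 checksum of a
// delivered archive so it can be compared with the one from the build server.
final class EarChecksum {
    static String md5Hex(InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);      // stream, so large EARs are fine
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the EAR file to verify.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(md5Hex(in));
        }
    }
}
```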

The one thing you need to solve for this is how to configure your application. If the client had to log in to the ColdFusion Administrator after deploying the application to configure datasources, mappings etc., it would be just as easy to do something wrong. So what we do instead is to have the client place a properties file on the class path with a bunch of configuration settings. Standard ones, such as the IP address of the outgoing mail server and the folder for logfiles, and application specific ones such as the location on the SAN where all the documents are stored. Then in the onApplicationStart() the application parses that and configures itself. Each of these settings is checked when it is loaded into the application, so if there is a path configured, a directoryExists() will make sure it actually exists.
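The load-and-validate step looks roughly like the Java sketch below. Our real implementation lives in CFML in onApplicationStart(); the class name, the property keys (app.mailServer, app.logDir) and the values here are all made up for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Illustrative sketch (hypothetical names and keys): load settings from a
// properties source and fail fast when a configured folder does not exist.
final class AppConfig {
    static Properties load(Reader source) {
        Properties settings = new Properties();
        try {
            settings.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return settings;
    }

    // Equivalent of the directoryExists() check done when a path is loaded.
    static String requireDirectory(Properties settings, String key) {
        String value = settings.getProperty(key);
        if (value == null || !Files.isDirectory(Paths.get(value))) {
            throw new IllegalStateException("setting " + key + " is not an existing folder: " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        // In a deployed application the Reader would wrap a file found on the
        // class path, e.g. via getResourceAsStream(); a string is used here.
        Properties settings = load(new StringReader("app.mailServer=192.0.2.10\napp.logDir=.\n"));
        System.out.println("mail server: " + settings.getProperty("app.mailServer"));
        System.out.println("log folder:  " + requireDirectory(settings, "app.logDir"));
    }
}
```

Failing at application start means a wrong path shows up immediately in the deployment, not days later when the first document is written.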

The added benefit is that it becomes very easy to move an application around. Once you have written your properties files for test, QA and production, they stay the same. You just move an EAR file with a release from one environment to the next and it configures itself as soon as it starts. The EAR files themselves get generated by Ant on our build server to make sure they are completely reproducible. And to protect our intellectual property and deter others from mucking around in them, they only contain compiled source code. And since an EAR is a standard format, it works on different JEE servers too. (Mostly JRun and occasionally JBoss for us.)

As with any solution, this process has downsides. Working with compiled EAR files is obviously not a good idea if you push small changes to a live server three times a day. It is a very ‘heavy’ process, because in each EAR you are packaging ColdFusion as well (100+ MB). And building EAR files without a ColdFusion Administrator does not just mean the client can not mess the configuration up anymore, it also means you can not fix the configuration anymore either. But all in all, it is serving us well.

It has been quiet for a while, but with good reason. Over the last 2 months I have been travelling a lot. And with a lot I mean Scotland, India, England, USA and England again, all for business. And the way it goes is that by the time you get back to the hotel from your appointments you have loads of email from the office waiting for you. With all that travel I had a grand total of one day off in India and 2 in the USA, which I spent away from the computer.

But now that I am back I have started to tie up some of the loose ends for the ForumClient for the Adobe forums. First and foremost, I have made it portable so it can now also access the Jive forums. The main reason for that is that it offers me another server to run tests against so that I can more easily determine whether issues are between the keyboard and the chair or if they are real server issues. Unfortunately with the number of bugs in the server software and the lack of documentation this is a real necessity. Most interesting for users is probably that forums, threads and messages now have right-click menus to mark as (un)read and that I squashed most of the bugs in the counts of unread messages. And I have started some work on getting messages to display better by adding some CSS to the message display.

Last but not least, at MAX I sat down with some people from Adobe and we had a good discussion on some possible future directions. One of those is a Flex version for mobile users (try the current forums on a mobile to see how badly that is needed) which Adobe would need to support by publishing a cross domain policy file. Second we had some discussion on the consequences of making this Open Source. I have decided that I will be publishing the source code for this client at some time in the future. No definite timeline, but it won’t be before a new version of ClearSpace has been deployed for the Adobe forums.

Download version 0.1.0 and give it a try.

I have uploaded version 0.0.5 of ForumClient. I really didn’t want to do a release just yet (it is halfway a SOAP to REST rewrite), but the previous version had expired and people couldn’t use it anymore. More soon.


It doesn’t look like much yet, but the alpha 2 is available for download. Server communication stuff should mostly work, except for the gazillion bugs and missing functions in the Jive webservices. UI is modelled after the Thunderbird NNTP client, so you download a list of forums, then you subscribe to certain forums and then messages for those forums will be downloaded into a local SQLite database so you can even read them offline.