Posts tagged ‘ColdFusion’

I have been promising myself (and others) to write about the ColdFusion UUID implementation for quite a while now and I feel like I have been procrastinating long enough. So at long last the definitive guide to ColdFusion UUIDs, based on many years of experience and a few conversations with the ColdFusion engineering team over beer at the MAX.

What is a UUID

A UUID is an Universally Unique Identifier which is just a fancy name for a 128-bit integer. While a 128-bit integer is a really large number, it is not an infinite number, so it is not really unique, it is just so rare for a conflict to occur that we normally just presume it is actually unique. This 128-bit integer is typically represented as a hexadecimal string split into 5 groups by hyphens in the pattern 8-4-4-4-12. This UUID is typically generated from one of 5 different algorithms:

  1. MAC address based
  2. DCE based
  3. MD5 hash based
  4. Random
  5. SHA-1 hash based

Each of these versions offers different guarantees for uniqueness and randomness. For ColdFusion developers the import version are 1 and 4.

MAC address based UUIDs

The algorithm for a MAC address based UUID is based on 3 different components:

  1. timestamp
  2. clock sequence
  3. node identifier

The timestamp is a 60-bit integer counting the number of 100 nanosecond increments since the beginning of the Gregorian calendar in 1582. The clock sequence is an initially random number used to prevent duplicate UUIDs when the time is reset backwards for instance through an NTP client. The node identifier is a supposedly unique identification for the node on which the UUID is generated. Since this node identifier is typically the MAC address of one of the NICs of the system this version is commonly referred to as a MAC based UUID.

From this algorithm a few things stand out:

  1. The timestamp will overflow in stardate 3400 or something and from that moment on the generated UUIDs may conflict with earlier generated UUIDs. But since I doubt anybody was generating UUIDs in 1582 it is safe to assume the first actual conflicts from that will occur a few hundred years later.
  2. The UUID is only as unique as the MAC address is. While MAC addresses are supposedly unique anybody who has run a somewhat larger network like a campus network will know that in reality they are not.
  3. It is impossible to generate more then 10 million version 1 UUIDs per second per node due to the 100 nanosecond timestamp resolution.
  4. MAC based UUIDs are actually quite predictable.

The MAC based algorithm is the algorithm used in ColdFusion.

Random UUIDs

Random UUIDs are generated mostly random. The version number and 2 other bits are restricted, but the other 122 bits are generated from a random source. This means:

  1. Version 4 UUIDs are unpredictable.
  2. Version 4 UUIDs are more likely to conflict than version 1 UUIDs. Still for all practical purposes they are unique.
  3. The quality and speed of the generation of version 4 UUIDs depends on your entropy source.

Amongst others, java.util.UUID is one of the implementations of a version 4 UUID generator.

UUIDs in ColdFusion

UUIDs are generated in ColdFusion through the createUUID() function. This function generates UUIDs using the version 1 algorithm (MAC address based).  The one thing that makes these UUIDs stand out very much is that they have a non-standard string representation. Instead of being grouped in 5 groups with the pattern 8-4-4-4-12 they are grouped in 4 groups with the pattern 8-4-4-16. I have been told this was an unintentional deviation that was not discovered until after shipping and then backward compatibility was deemed more important than conforming to the string representation of others.

The ColdFusion createUUID() function gets interesting with the rewrite to Java in ColdFusion MX. At that time Java had no API to find the MAC address of a NIC in the system, so on Windows a little bit of native code in NeoUUID.dll was used to find the MAC address and on other platforms a MAC address was faked. When doing a native Java deployment on Windows (EAR/WAR file) the system would also fall back the same as on other platforms. In addition the timestamp resolution of the Sun JVMs was rather limited (10 milliseconds on Windows, 1 millisecond on other platforms). Since you can generate only one UUID per clock tick, the theoretical limit for the number of UUIDs generated per second was 100 on Windows (64 on multi-core systems).

A particular problem in this version was a bug in the Sun JVM where using createUUID() would cause the system clock to move forward a little bit. Under heavy use the clock would move forward up to 12 seconds per minute. Then when the time was resynchronized with the NTP server and the server clock went back a minute or so, the generation of UUIDs was stalled until the system was back in the future. Very much the intended behavior of a UUID generation algorithm that values uniqueness over everything else, but still an unpleasant surprise.

With the arrival of ColdFusion 9 createUUID() got a speed boost. The implementation was rewritten from using a millisecond time API to use a new Java API that provides timestamps with a nanosecond resolution. That means the theoretical limit of 100 or 1000 UUIDs per second got increased to 10 million per second. The practical limit is still a bit lower because the clock tick is not really 1 nanosecond, but the speed improvement is still very significant. The speed of createUUID() now actually varies depending on the clock speed of the hardware you use to run the test.

GUIDs in ColdFusion

In addition to a UUID datatype ColdFusion also has a GUID datatype. This is another 128-bit integer that is unfortunately incompatible with ColdFusion UUIDs because it uses the 8-4-4-4-12 string representation . On the other hand it has the huge benefit that it is compatible with the way the rest of the world represents UUIDs so we can natively exchange them with Java, databases etc. instead of having to serialize them to a string. I have written previously about the performance benefits you can reap if you use a native uniqueidentifier datatype in MS SQL Server instead of a string representation.

What ColdFusion does not have is a native function to generate GUIDs. Typically this is solved by generating GUIDs from UUIDs by just inserting another hyphen, or by falling back to the Java java.util.UUID class. Just remember that when you use the ColdFusion createUUID() function you get better uniqueness guarantees since it is a version 1 UUID, while when using java.util.UUID you get better performance since it is a version 4 UUID (if you have sufficient entropy).

During SoTR 2011 I presented on using XFA PDF forms (a.k.a. LiveCycle forms) with ColdFusion. Slides and materials are now available for download.

A after-conference addition is that Chandan Kumar from Adobe confirmed that the issue with overwritedata=”yes” in the cfpdf tag is resolved in ColdFusion so you don’t need to add it to all cfpdf populate operations anymore once the fix is released / installed.

In a recent discussion on cf-talk the question was asked how to improve the performance of ColdFusion when working with very large XML documents. One of the solutions proposed was to use StAX and that got me thinking. StAX is a stream processor works very different from what you may be used to from other XML processors. Instead of viewing an XML document as a whole and elements in context to their parents, children and siblings, it just treats the whole document as a sequence of items. Each of these elements can be of type elementstart, elementend, comment, entity etc. The way you work with this is you iterate through all the items in your document and process them one by one. Working that way is sufficiently different to make it necessary to rewrite all your processing from scratch if you want to switch from the built-in processor to StAX which makes it a solution that is not so attractive.

But what if we combine a preprocessing step in StAX to split the large XML document into smaller pieces with the regular processing in ColdFusion? StAX is Java so it is easy to integrate it into ColdFusion and to test this I wrote a sample implementation to test if this would help. It has some limitations such as only handling elements, element text and attributes, but it seems to work just fine (and the code is open for improvement). With this I benchmarked some XML files I downloaded from internet with the following results:

Source file Source size Split on Records Time
http://www.ins.cwi.nl/projects/xmark/Assets/standard.gz 111 MB regions 1 24274 ms
http://www.ins.cwi.nl/projects/xmark/Assets/standard.gz 111 MB mailbox 21750 146999 ms
ftp://ftp.nlm.nih.gov/nlmdata/sample/medline/medsamp2011h.xml.zip 164 MB 30000 30000 472043 ms

As you can see how you are splitting a document has a significant impact. I presume this is mostly due to the impact the write operations have on my laptop with a slow 5400 rpm harddisk. On the other hand in the best case scenario the parsing speed is over 4 MB per second. Memory consumption stayed under 200 MB for the whole server so it looks like there are some scenario’s where this might be useful.

Code for xmlSplitter.cfc, tested on CF 9.01, 64-bit with StAX 1.2.0 and Java 1.6u24 64-bit.

This was written in response to a forum question, but I figured it might be useful for more people.

There are many ways to deploy ColdFusion code to a server. Probably the most prevalent, especially considering shared hosting, is using FTP to upload CFML templates to the server.  Tools such as DreamWeaver and CFBuilder allow you to do so right from your IDE. Another way to do it is to run some Ant script or batch file and extract the sources straight from source control to the server. With a little bit of effort you get much more control and much more reproducible results.  At Prisma IT we prefer to go a step further and deliver the ColdFusion applications we build as Enterprise ARchives (EARs) to our clients.  This allows us even more control, especially when we don’t have any.

Let me explain that a bit. We have several clients where we do all their development, but final deployment is done on the clients infrastructure. If we are lucky, we may have read-only access to the User Acceptance Testing servers, but sometimes we don’t even have that. In those cases deploying an application is completely up to the client (or their hosting partner). That leaves us no wiggle room to deal with stuff that could go wrong during a deployment. With EAR files we eliminate a huge number of risks from the process. An EAR file is a full application, so there is no risk that some files get forgotten. And we have the MD5 to prove it. Since it gets deployed to its own temporary folder, there is no chance of any old files remaining on the server ans slipping in to the server (the cfclasses folder is famous for that).

The one thing you need to solve for this is how to configure your application. If the client had to log in to the ColdFusion Administrator after deploying the application to configure datasources, mappings etc., it would be just as easy to do something wrong. So what we do instead is to have the client place a properties file on the class path with a bunch of configuration settings. Standard ones, such as the IP address of the outgoing mail server and the folder for logfiles, and application specific ones such as the location on the SAN where all the documents are stored. Then in the onApplicationStart() the application parses that and configures itself. Each of these settings is checked when it is loaded into the application, so if there is a path configured, a directoryExists() wil make sure it actually exists.

The added benefit is that it becomes very easy to move an application around. Once you have written your properties files for test, QA and production, they stay the same. You just move an EAR file with a release from one environment to the next and it configures itself as soon as it starts. The EAR files themselves get generated by Ant on our build server to make sure they are completely reproducible. And to protect our intellectual property and deter others from mucking around in them, they only contain compiled source code. And since an EAR is a standard format, it works on different JEE servers too. (Mostly JRun and occasionally JBoss for us.)

As any solution, this process has downsides. Working with compiled EAR files is obviously not a good idea if you push small changes to a live server three times a day. It is a very ‘heavy’ process, because in each EAR you are packaging ColdFusion as well (100+ MB). And building EAR files without a ColdFusion Administrator does not just mean the client can not mess the configuration up anymore, it also means you can not fix the configuration anymore either. But all in all, it is serving us well.

Since I am in Bangalore for a training I dropped in on the “Adobe Flash Platform Tools Preview” this evening. The agenda promised short sessions on Flash Builder 4, Flash Catalyst and LiveCycle ES, followed by food and networking. The Flash Builder (previously known as Flex Builder) session was solid. It showed Data Centric Development, where services are defined in Flash Builder and can then easily be wired into the UI because all the code for the services and value objects is generated. (See Raghu’s blog for the demo screencast). Next up was Flash Catalyst, showing a design - development workflow where a .psd file was transformed to a .fxp, which was then imported in Flash Builder to wire the data in through the new services management. Last was LiveCycle ES. Unfortunately, but understandably considering the audience, this was all about data management and not process management. What was new for me was that apparently this now wires directly into Hibernate so you don’t need to write any server side code anymore, you can have everything generated.

The Q&A focused mainly on the designer - developer workflow with Flash catalyst. The main question that was repeated several times in different words was whether this workflow put any additional constraints on the designer. And each time the answer was that good development on the design side, including the judicious use of layers, was all it took. I think this reflected the audience of architects and project managers, from developers I would expect more technical oriented questions.

Afterward the food and networking were great. Not just the Flash team from Adobe was there, but also people from the LiveCycle team and the ColdFusion team., so I got an opportunity to thank some people in person for fixes and new features I am not allowed to mention yet. And I also met up with some of the people we do business with in India.

This Sunday at long last my first real server “Spike” died. Spike wasn’t really my server, but it was the first server that wasn’t just for my entertainment and I carried final responsibility for. Purchased in February 2001 it entered service as a shared hosting server for a not-for-profit in March 2001. In the 8 years and 1 month it ran the only real problem it had was a worn down CPU fan causing an overheated CPU, until Sunday morning at long last the primary harddisk died and it was retired from service.

With its demise I am truly saying goodbye to an era (or perhaps to a relic): Spike ran trusty old Windows NT4 SP6a with ColdFusion Enterprise 4.5.2. With it gone, the youngest production machine I have access to is a Windows 2003 system (7 years younger then NT4) with CF 7 (6 years younger then CF 4.5). Spike itself is replaced with a machine with Windows 2003 and CF 8.0.1. And the contrast between how it worked then and now it works now is quite profound. At least in the area of security configuration, CFML is sufficiently backward compatible to just drop it on the new server.

2001, the year Spike was configured, was just before the height of the ‘hackable internet’. A few months after it was taken into production we saw the release of Code Red, followed shortly by Nimda. At that time, a large part of the servers connected to internet was vulnerable to attacks. (Nowadays vulnerabilities are more of a client problem or arguably a user problem then s server problem.) And that showed itself in the way Spike was build. It took me several weeks to come up with a stable and secure configuration, with all sorts of weird constraints. To build Spike I followed the NSA guidelines for configuring a Windows NT system, which for instance meant I wasn’t supposed to install any graphical driver, because no driver was NSA certified. And the way I ended up running ColdFusion, with Sandbox Security configured to impersonate OS accounts, has once even earned me the comment from a Macromedia engineer to be the only one in the world with that configuration in production. But the result was there: even with the onslaught of Code Red and Nimda it took 7 months before there was a Windows patch that was applicable for the hardened configuration.

Contrast this with how I threw a new server online. Windows installation was a default installation, after which I had to add components instead of remove them. When installing IIS I had to add filetypes and extensions, instead of remove them. When configuring Sandbox Security for ColdFusion I could easily find anything I wanted on the subject, because there are dozens of people blogging about it. Obviously some of the ease of installing a new server is due to more experience on my side, but I think it is hard to deny that the “secure by default” mindset has made inroads.

Today was the CF Insider Workshop in Brussels. Line-up was the same as last week in Amsterdam, Claude Englebert from Adobe and Simon Slooten and me from Prisma IT. Mark van Hedel was supposed to join us for the ColdFusion and AJAX session, but he broke his leg last week so he couldn’t be there and Simon took over his session. The event in Amsterdam was pretty well attended with 20+ attendees, in Brussels we had a few less (but they have another event in French tomorrow). But the good news is that after years of a slumber Adobe is finally talking about ColdFusion in Europe again. Some of the attendents were even Adobe employees from other departments who wanted to know what that ColdFusion thing was about.

I have uploaded my slides from the CF Insider workshop to the Prisma IT website:

What’s new in ColdFusion 8

The ColdFusion Server Monitor

Adobe and Prisma IT are organizing the Dutch and Flemish versions of the ColdFusion Insider Workshop that is touring Europe. First we will have the Dutch one on January 22th in the Adobe office in Amsterdam and then on Januari 27th the Flemish on in Brussels. There will be speakers from Adobe and Prisma IT. Attendance is free if you get an entrance voucher by filling out the registration form.

I will be doing a presentation on the ColdFusion Server Monitor and a more general one on how ColdFusion makes life easier with all the cool new features in ColdFusion 8. If you have a good suggestion I may be persuaded to add your topic to my “OMG, nobody has questions, how do I fill the awkward silence” emergency slides.

At Prisma IT we have an application that has been under constant development for 2 years. Over that time it has seen lots of feature creep and changing requirements, and no refactoring. As you can imagine that doesn’t make for pretty code or an optimal database model. Due to a retargeting of that application and a much more focused requirements process we are finally getting ready to start reworking that application. Over the past month I have been working on and off (mostly in the train on my way to clients) on putting more structure in the long list with things we (I) really would like to change about the technology of the application. I have also been running some experiments on the code to see what the impact of certain changes would be. Amongst others this includes the experiments with changing the datatype used in MS SQL Server to store ColdFusion UUIDs.

One of the last things I wanted was to get a feel for the cost/benefit ratio of a conversion from MS SQL Server to PostgreSQL. Continue reading ‘Converting a CF application from MS SQL Server to PostgreSQL?’ »

Lately I have posted a number of posts on the state of shared hosting security. Unfortunately we have to conclude that only by taking drastic measures a hoster can begin to protect his customers from eachother. And even when the hoster locks dow everything to the best of his abilities, which disables lots of useful functionality, there are some significant holes left. Therefore I have compiled this wishlist of things I would like to see improved in Centaur to make shared hosting security a bit better and easier to deal with.

Documentation

Setting up shared hosting is largely a process of trial-and-error. There is absolutely no documentation available that tells us which tags will blow up when we Sandbox the temp folder. That cfdocument will error out if the fonts folder is not allowed in the Sanbox. There is nothing to warn us that cfdump will break if we disable createObject(JAVA). What we need is better documentation and preferably a whitepaper that shows us how to configure a secure shared hosting environment step by step.

A locked down application scope

Frameworks are storing more and more data in the application scope. The application scope is currently completely unprotected: everybody can see everything. We need a way to lock down the application scope so we can only access it from just one Sandbox.

Fine grained Java permissions

The biggest useful feature we need to disable in order to protect servers is access to Java objects. We need much more control over which actions a Java class can perform before we can open up access to Java. For starters, we need knobs to disable many actions from the Runtime class. We need to be able to disable the loading of arbitrary classes by users (what is the point in disabling createObject(COM) if users can load their own version of JIntegra?), we need to stop users from killing the server with exitVM, halt etc., we need to stop them from spawning other processes etc. Then we need to have all access to SecurityPermission disabled so that users can’t simply reset the security and we need to have ReflectPermission disabled to keep our private stuff private.

Java provides all the API we need for that, we just don’t have any knobs in ColdFusion to switch them on for our Sandboxes

Sane default settings for Sandboxes

If you create a new Sandbox from scratch it has access to many things it shouldn’t. A secure by default configuration would for instance disable all datasources and leave it up to the administrator to enable them on an as needed basis. Same goes for tags like cfexecute and cfobject.

On the other hand the default Sandboxes do not have access to to for instance the /WEB-INF/cftags/ folder. Without that access cfinterface may not work. What is the harm in giving Sandboxes Read and Execute access to that folder by default? Isn’t the code in those folders written by and trusted by Adobe?

There are many more issues that would help ColdFusion becomming a better hosted platform, and I will address a few more in the future, but these are the 4 most important issues in the area of shared hosting security that Adobe needs to address. All of these have been submitted to Adobe, but I am sure a few more votes and ideas won’t hurt.