Archive for the ‘ColdFusion’ Category

Hidden below the “Show Advanced Settings” button on the datasource settings page for MS SQL Server lies the “String format” setting. According to the manual this setting is used to “Enable this option if your application uses Unicode data in DBMS-specific Unicode datatypes, such as National Character or nchar.” In other words, this allows you to use N fields in your database without adding those pesky N’s to your SQL and without butchering your data in the character set conversion process.

What that doesn’t tell you is that this setting can have serious performance consequences. When you enable this setting all variables send to the database using cfqueryparam are sent in their N format. Even the ones where the corresponding database field is not an N field but a regular field. And when that happens you have an encoding mismatch between the parameter and the column. Not really a problem when it is part of an insert statement, but disastrous when it is part of a predicate. Because an encoding mismatch will often prevent MS SQL Server from using indexes to execute the query, the query plan you nicely tuned from the SQL Server Management Studio (without unicode) will degrade into table scans.

Same encoding:
Index Seek(OBJECT:([playground].[dbo].[testCase].[testCase_reg]), SEEK:([playground].[dbo].[testCase].[regular]=[@1]) ORDERED FORWARD)

Encoding mismatch:
Table Scan(OBJECT:([playground].[dbo].[testCase]), WHERE:(CONVERT_IMPLICIT(nvarchar(36),[playground].[dbo].[testCase].[regular],0)=[@P1]))

In a simple testcase with only 256K short records this already accounts for a speed degradation from 1 to 100 milliseconds. I suspect that in real scenario’s the performance impact may very well be magnitudes larger because wide columns will make your I/O explode and there are all sorts of concurrency penalties if you switch from index access to full table scans.

The good news is that it is pretty easy to identify whether you are having this problem. Just profile your SQL Server with the option to trace the event SHOWPLAN ALL and in all your execution plans search for the string CONVERT_IMPLICIT. The bad news is that this is hard to fix. Switching “String format” off means your data gets garbled, switching it on means your server dies from an overload, so pretty much the only solution is to switch your entire database to N fields. That makes you wonder why other databases can do without this setting. For PostgreSQL I know the answer: charsets are a database wide setting so this situation can’t happen, you just switch your entire database to UTF-8 (and since it is UTF-8 and not UCS-16 like in MS SQL Server that doesn’t have the I/O overhead). But how about MySQL, Oracle etc., are they smart enough to do the conversion once on the side of that parameter instead of for every row on the side of the query or is it something else entirely?

Download the MS SQL Server String format testcase

I had a client with a pretty weird issue with client certificates last week and I thought I would share and maybe save somebody some time.

The client needed to consume a SOAP webservice that was protected with 2-way SSL, i.e. with both a server and a client certificate. Because of the new support for client certificates they had stepped away from the usual createObject() or cfinvoke code and moved to cfhttp, but they were getting connection errors on the ColdFusion side and HTTP 400 statuscodes in the logfiles of the webservice. That statuscode mean “Bad Request” and indeed the logfile did not show the correct resource path to the webservice but an empty path.

The first step towards solving the problem was to import the certificate chain of the server into ColdFusion. Steven Erat has an excellent article about this on his blog and sure enough the landscape changed. Instead of the HTTP 400 error the message in the cfhttp result and the webservice logfiles now was a HTTP 403 (Authentication Failed), but it did show the correct resource path. So obviously the next problem lay with the client certificates.

The client certificate appeared to be a normal client certificate with private key in PKCS#12 format, but from a different certificate chain then the server format. So I first had the client double-check that the chain was correctly installed on the server. Unfortunately it wasn’t that easy. I will spare you the details of what I tried, but in the end it turned out to be that the certificate was incorrectly encoded. These are the actual conversion steps I took to convert the certificate to a working state:

  • import the certificate in the Windows Certificate store through the MMC;
  • export the certificate with private key, whole chain and without strong encryption in PFX formet;
  • convert the PFX encoded certificate to PEM using OpenSSL:
    openssl pkcs12 -in raw.pfx -out intermediate.pem -nodes
  • re-order the certificates inside the PEM file to the following order:
    1. Identity certificate
    2. Intermediate certificate
    3. Root certificate
    4. Private Key
  • convert the PEM encoded certificate to PKCS#12 using OpenSSL:
    openssl pkcs12 -export -out final.pkcs -in final.pem

It is probably possible to skip / merge a few of these steps (you probably just need to convert to PEM, reorder and convert back), but I really didn’t have time to explore that, I just know that this worked for me.

Update 2013-09-29:

Several people have reported this procedure also fixes the following error which may occur after an update to ColdFusion 10:

Error while trying to get the SSL client certificate:
java.security.UnrecoverableKeyException: Could not decrypt key: Could not
decode key from BER. (Invalid encoding: expected tag not there. ).

In part 1 I showed how to adapt ColdFusion to work with PostgreSQLs native UUID datatype. In this part, I will show how to adapt PostgreSQL to accept ColdFusions native UUID datatype.

UUIDs: equal, but not the same

As ColdFusion developers are very aware thestring representation of a UUID in ColdFusionis a 35 character string. However, the rest of the world has decided on a 36 character string representation. Both in standards (RFC 4122, ITU-T X.667 and The Open Group) as in implementations (MS SQL Servers newID(), Javas java.util.UUID etc.). PostgreSQL is no exception and uses the 36 character string representation. As a result, the output of the ColdFusion CreateUUID() function will not be accepted into a PostgreSQL UUID field and the output of a PostgreSQL UUID field will not be recognized by ColdFusion as a UUID. I actually believe this is a bug in ColdFusions type detection and conversion system. Since CFML is a typeless language, it should just convert both the 35 and the 36 character string representation. ColdFusion will on the fly convert both a m/d/yy string and a yyyy-mm-dd string to a date object, so why not convert both UUID string representations to their native integer representation?

Patching PostgreSQL

Unfortunately ColdFusion is a closed source product and I can not adapt it to work the way I want it to work. However, PostgreSQL is Open Source and I can adapt it. A little digging in the source code reveals that the input and output formats for the UUID datatype are defined in uuid.c and it is trivial to adapt them to use the ColdFusion format. Just apply the patch for PostgreSQL 8.3 to accept a 35-character string representation as UUID and recompile PostgreSQL. Once you have done that, the following code will work:

CREATE TABLE test (testID UUID);

<cfquery>
INSERT INTO test (testID)
VALUES (<cfqueryparam cfsqltype="cf_sql_other" value="#CreateUUID()#">)
</cfquery>

Warnings:

  • c is not a language I am comfortable in so the code is probably suboptimal;
  • I do not intend to submit this as a patch to PostgreSQL since they follow all the standards and ColdFusion does not;
  • a permanent solution to the problem of ColdFusion UUIDs clashing with the rest of the world should come from Adobe.

So to go from ColdFusion to PostgreSQL we now can use the 36-character representation as an inline variable, with the updated JDBC driver as a bind variable, and with this patch also using a 35-character string. What is left to do is patching the JDBC driver to convert the 16-byte representation it receives from PostgreSQL to a 35-character string ColdFusion understand. Looks like that requires writing a new type. Unfortunately I will be busy for the rest of the month with a new three week Flex/LiveCycle class we will be teaching so I won’t be able to start on that for a while.

I set up a little test to measure the performance difference between native PostgreSQL UUIDs () and ColdFusion UUIDs stored as a CHAR(35) field in PostgreSQL. First the schema.

CREATE TABLE uuidanswer (
 answerID UUID,
 displayOrder INT
);
CREATE TABLE uuidtestAssignment (
     testAssignmentID UUID
);
CREATE TABLE uuidquestionScore (
     testAssignmentID UUID,
     answerID UUID,
     questionIDID UUID
);
load data (400000 answers, 1000000 questionscores and 77000 testassignments)
CREATE INDEX uuidanswer_idx
  ON uuidanswer (answerID);
CREATE INDEX uuidtestAssignment_idx
  ON uuidtestAssignment (testAssignmentID);
CREATE INDEX uuidquestionScore_idx
  ON uuidquestionScore (testassignmentid, questionid, answerid);
ANALYZE uuidanswer;
ANALYZE uuidtestAssignment;
ANALYZE uuidquestionScore;

For the string version all UUID datatypes were replaced with a CHAR(35) datatype and in the data loading stage the appropriate hyphens where added or removed. This is a very much trimmed down version of a real query from one of our applications. It aggregates the score of a candidate in some sort of Human Resource testing scenario. The real one joins 11 tables of a 19 GB table but I am too lazy to move all that data to PostgreSQL just for a little demonstration. Then the following query was run against the dataset 1000 times:

SELECT SUM(displayorder)
FROM
 uuidanswer ca
 INNER JOIN uuidquestionscore cqs USING (answerid)
 INNER JOIN uuidtestassignment cta USING (testassignmentid)
WHERE
 cta.testassignmentid = '164ab167-1517-0f51-7cef-1c6654e721b5'
 OR
 cta.testassignmentid = '01F3168A423B2BA34E914AEBFE9C8757'

The results, averaged over 5 runs were:

  • CHAR(35): 2680 ms
  • UUID: 2210 ms

Due to the test setup the difference here is probably smaller then it is in a real production environment. Since all the data fits into cache the effects of I/O are pretty much removed from the equation. With a dataset that is larger then RAM the results will favor the smaller dataset more (and the total size of the UUID dataset is 164 MB versus 311 MB for the CHAR(35) dataset).

UUID datatype and cfqueryparam

But obviously it is desirable from a security perspective to use cfqueryparam for all data values. A quick check of the cfqueryparam docs reveals that there is no cfsqltype for UUIDs, so we have to fall back to cf_sql_other for the UUID case. Which unfortunately blows up with ColdFusion 8 because the included driver is the 8.1 JDBC 3 driver, and the UUID datatype is only supported in the 8.3 drivers. Replacing postgresql-8.1-407.jdbc3.jar with postgresql-8.3-603.jdbc3.jar and restarting solves that problem. (An enhancement request for updating the PostgreSQL JDBC has been entered, but a few more voices wouldn’t hurt.) Now the query statement is changed to:

SELECT SUM(displayorder)
FROM
 uuidanswer ca
 INNER JOIN uuidquestionscore cqs USING (answerid)
 INNER JOIN uuidtestassignment cta USING (testassignmentid)
WHERE
 cta.testassignmentid = <cfqueryparam cfsqltype="cf_sql_other" value="164ab167-1517-0f51-7cef-1c6654e721b5" />
 OR
 cta.testassignmentid = <cfqueryparam cfsqltype="cf_sql_other" value="01F3168A423B2BA34E914AEBFE9C8757" />

The results, averaged over 5 runs were:

  • CHAR(35): 2640 ms
  • UUID: 2250 ms

So next to all the data validation and integrity benefits there are real performance benefits to using the UUID datatype.

Today the PostgreSQL Global Development Group announced the release of PostgreSQL 8.3. The major new features that are most exciting to me are:

  • TSearch has been integrated into the core so PostgreSQL now has Full Text Searching without the need to install additional modules;
  • all the performance features;
  • a UUID datatype for storing 128-bit integers.

I don’t think I need to explain why Full Text Searching is an important addition, but let me expand a little bit on my other favorites.

Performance features

Let me start by saying that PostgreSQL hasn’t been slow in a very long time. Sure, it comes out of the box with an arcane configuration so it will start on a P90 with 64 MB of RAM, but once you have a configuration that matches your hardware it will perform. But where it shines is scalability. Back in October 2006 I had the privilege of assisting tweakers.net in optimizing PostgreSQL for their database benchmark and the outcome there was that “eight cores deliver 7.4 times as much as a single core“. Other benchmarks consistently show that if you have the hardware, PostgreSQL will use it.

That is not to say there is nothing to improve. There are several specific scenario’s where PostgreSQL will not be very fast. One of those scenario’s was the repeated update of a single record. Which is actually a pretty common scenario for websites that store user sessions in the database and update the record on each page hit. The reason for this is that PostgreSQL uses MultiVersion Concurrency Control and hence doesn’t overwrite records on updates, but invalidates them and writes a new copy. Repeating this over and over again would strain the garbage collection mechanism and bloat the table. To improve with that scenario PostgreSQL now has Heap Only Tuples (HOT), where an update that does not change an indexed field will only change the heap and not the index. Simple, efficient, and up to 4 times as fast.

Amongst the other performance improvements is a lot of work to plan OUTER JOINs more efficiently and to smooth out the performance spikes caused by background database maintenance processes.

UUID support

While technically they are 128-bit integers, I historically store UUIDs as CHAR or VARCHAR fields of 35 or 36 (depending on which application is talking to the DB) characters. Due to overhead and byte alignment issues that means a UUID takes 40 bytes of space, instead of the 16 bytes it would need if it were just an 128-bit integer. And that space usage is repeated for every index, so for a simple table with 2 UUIDs that creates a many-to-many relationship and has 2 indexes this is the difference between 200 and 80 bytes.

That isn’t all though. Even though the operations performed on UUIDs are pretty limited (equality testing and sorting are pretty much the only relevant operations), they are also impacted by the difference between a string and an integer. Operations on integers are simple, bitwise operations. Operations on strings have to take the locale into account which means convoluted (multi-byte) functions. So not only is the storage smaller and thus faster, operations are faster too.

In a follow up I will provide some benchmarks of the performance difference between native UUIDs and VARCHAR UUIDs. Also, ColdFusion users can look forward to a way to get the PostgreSQL UUID datatype working from ColdFusion and for a patch that will make PostgreSQL accept the 35-byte string representation of ColdFusion UUIDs.