I set up a little test to measure the performance difference between native PostgreSQL UUIDs () and ColdFusion UUIDs stored as a CHAR(35) field in PostgreSQL. First the schema.

CREATE TABLE uuidanswer (
 answerID UUID,
 displayOrder INT
);
CREATE TABLE uuidtestAssignment (
     testAssignmentID UUID
);
CREATE TABLE uuidquestionScore (
     testAssignmentID UUID,
     answerID UUID,
     questionIDID UUID
);
load data (400000 answers, 1000000 questionscores and 77000 testassignments)
CREATE INDEX uuidanswer_idx
  ON uuidanswer (answerID);
CREATE INDEX uuidtestAssignment_idx
  ON uuidtestAssignment (testAssignmentID);
CREATE INDEX uuidquestionScore_idx
  ON uuidquestionScore (testassignmentid, questionid, answerid);
ANALYZE uuidanswer;
ANALYZE uuidtestAssignment;
ANALYZE uuidquestionScore;

For the string version all UUID datatypes were replaced with a CHAR(35) datatype and in the data loading stage the appropriate hyphens where added or removed. This is a very much trimmed down version of a real query from one of our applications. It aggregates the score of a candidate in some sort of Human Resource testing scenario. The real one joins 11 tables of a 19 GB table but I am too lazy to move all that data to PostgreSQL just for a little demonstration. Then the following query was run against the dataset 1000 times:

SELECT SUM(displayorder)
FROM
 uuidanswer ca
 INNER JOIN uuidquestionscore cqs USING (answerid)
 INNER JOIN uuidtestassignment cta USING (testassignmentid)
WHERE
 cta.testassignmentid = '164ab167-1517-0f51-7cef-1c6654e721b5'
 OR
 cta.testassignmentid = '01F3168A423B2BA34E914AEBFE9C8757'

The results, averaged over 5 runs were:

  • CHAR(35): 2680 ms
  • UUID: 2210 ms

Due to the test setup the difference here is probably smaller then it is in a real production environment. Since all the data fits into cache the effects of I/O are pretty much removed from the equation. With a dataset that is larger then RAM the results will favor the smaller dataset more (and the total size of the UUID dataset is 164 MB versus 311 MB for the CHAR(35) dataset).

UUID datatype and cfqueryparam

But obviously it is desirable from a security perspective to use cfqueryparam for all data values. A quick check of the cfqueryparam docs reveals that there is no cfsqltype for UUIDs, so we have to fall back to cf_sql_other for the UUID case. Which unfortunately blows up with ColdFusion 8 because the included driver is the 8.1 JDBC 3 driver, and the UUID datatype is only supported in the 8.3 drivers. Replacing postgresql-8.1-407.jdbc3.jar with postgresql-8.3-603.jdbc3.jar and restarting solves that problem. (An enhancement request for updating the PostgreSQL JDBC has been entered, but a few more voices wouldn’t hurt.) Now the query statement is changed to:

SELECT SUM(displayorder)
FROM
 uuidanswer ca
 INNER JOIN uuidquestionscore cqs USING (answerid)
 INNER JOIN uuidtestassignment cta USING (testassignmentid)
WHERE
 cta.testassignmentid = <cfqueryparam cfsqltype="cf_sql_other" value="164ab167-1517-0f51-7cef-1c6654e721b5" />
 OR
 cta.testassignmentid = <cfqueryparam cfsqltype="cf_sql_other" value="01F3168A423B2BA34E914AEBFE9C8757" />

The results, averaged over 5 runs were:

  • CHAR(35): 2640 ms
  • UUID: 2250 ms

So next to all the data validation and integrity benefits there are real performance benefits to using the UUID datatype.

2 Comments

  1. “it could be bunnies” » Blog Archive » PostgreSQL UUIDs and ColdFusion (2) says:

    [...] part 1 I showed how to adapt ColdFusion to work with PostgreSQLs native UUID datatype. In this part, I [...]

  2. “it could be bunnies” » Blog Archive » Storing ColdFusion G/UUIDs in MS SQL Server says:

    [...] February I posted on how storing UUIDs using the native UUID datatype in PostgreSQL was about 15% faster then using a VARCHAR(35) datatype. In March I posted about the performance [...]