Archive for February, 2008

With the recent release of FlexBuilder 3 I have been trying to work through some of the instructions for customizing the LiveCycle Workspace using FlexBuilder 3 instead of FlexBuilder 2 (if that works we don’t need to include FlexBuilder 2 in the software for the Adobe TechU). I didn’t expect too many problems because according to Customizing the LiveCycle® Workspace ES User Interface Flex SDK 2.0.1 is supported and you can pick that SDK version in FlexBuilder 3. And for the most part it did the work. The only problem I ran into was compiling the localization swf file. According to the instructions that compilation should work with the following command:

compc -locale=es -output=export/es/workspace_rb.swc -source-path
./es "C:/program files/adobe/flex builder 2/flex sdk 2/frameworks/locale/"
-include-resource-bundles alc_wks_client_msg alc_wks_client_trace
alc_wks_client_ui SharedResources collections controls core data effects
formatters logging messaging rpc skins states styles utils validators --

Now with FlexBuilder 3 this didn’t work because some of the paths to the SDK are different. With the correct paths the compile command becomes:

"C:\Program Files\Adobe\Flex Builder 3\sdks\2.0.1\bin\compc" -locale=es
  -output=spanish/workspace_rb.swc -source-path ./spanish
  "C:\Program Files\Adobe\Flex Builder 3\sdks\2.0.1\frameworks\locale"
  -include-resource-bundles alc_wks_client_msg alc_wks_client_trace
  alc_wks_client_ui SharedResources collections controls core data
  effects formatters logging messaging rpc skins states styles utils
  validators --

This didn’t compile in FlexBuilder 3 because the data properties file couldn’t be found. That is supposedly an easy fix: just copy the data.properties file from wherever FlexBuilder 2 got it to the project path in FlexBuilder 3. Only I couldn’t find a data.properties file anywhere in FlexBuilder 2. And while removing data from the compile command solved the compile time problem, it did result in a totally unacceptable runtime error. So in the end I just created my own data.properties file with the following data:

# Dummy data.properties file
foo=bar

With this file in the same directory as the localized files the compilation ran correctly and I haven’t received any runtime errors yet. It is quite likely I will get them at some point when the Workspace wants to present some message that is supposed to be in data.properties, but I think this will do until the next LiveCycle release that I expect to add support for FlexBuilder 3.

I had a client with a pretty weird issue with client certificates last week and I thought I would share and maybe save somebody some time.

The client needed to consume a SOAP webservice that was protected with 2-way SSL, i.e. with both a server and a client certificate. Because of the new support for client certificates they had stepped away from the usual createObject() or cfinvoke code and moved to cfhttp, but they were getting connection errors on the ColdFusion side and HTTP 400 statuscodes in the logfiles of the webservice. That statuscode mean “Bad Request” and indeed the logfile did not show the correct resource path to the webservice but an empty path.

The first step towards solving the problem was to import the certificate chain of the server into ColdFusion. Steven Erat has an excellent article about this on his blog and sure enough the landscape changed. Instead of the HTTP 400 error the message in the cfhttp result and the webservice logfiles now was a HTTP 403 (Authentication Failed), but it did show the correct resource path. So obviously the next problem lay with the client certificates.

The client certificate appeared to be a normal client certificate with private key in PKCS#12 format, but from a different certificate chain then the server format. So I first had the client double-check that the chain was correctly installed on the server. Unfortunately it wasn’t that easy. I will spare you the details of what I tried, but in the end it turned out to be that the certificate was incorrectly encoded. These are the actual conversion steps I took to convert the certificate to a working state:

  • import the certificate in the Windows Certificate store through the MMC;
  • export the certificate with private key, whole chain and without strong encryption in PFX formet;
  • convert the PFX encoded certificate to PEM using OpenSSL:
    openssl pkcs12 -in raw.pfx -out intermediate.pem -nodes
  • re-order the certificates inside the PEM file to the following order:
    1. Identity certificate
    2. Intermediate certificate
    3. Root certificate
    4. Private Key
  • convert the PEM encoded certificate to PKCS#12 using OpenSSL:
    openssl pkcs12 -export -out final.pkcs -in final.pem

It is probably possible to skip / merge a few of these steps (you probably just need to convert to PEM, reorder and convert back), but I really didn’t have time to explore that, I just know that this worked for me.

Update 2013-09-29:

Several people have reported this procedure also fixes the following error which may occur after an update to ColdFusion 10:

Error while trying to get the SSL client certificate:
java.security.UnrecoverableKeyException: Could not decrypt key: Could not
decode key from BER. (Invalid encoding: expected tag not there. ).

Through the Planet PostgreQL site I found a nice paper that compares database internals. It provides an interesting read, even if you are not really into databases.

Architecture of a Database System
Joseph M. Hellerstein, Michael Stonebraker and James Hamilton

The common misconception that a serializable transaction will single thread access to an entire table or even an entire database is pretty hard to eradicate. And the sad thing is that everybody just keeps repeating other supposedly informed sources instead of doing a little experimentation. So I have gathered some scripts that will show you how 2 serializable transactions are writing to the same database table simultaneously. When you run these scripts, run them from top to bottom, the commands on the left through one connection, the commands on the right through another connection.

PostgreSQL

There is no need to set up any configuration in PostgreSQL as it will be fully concurrent out of the box.

CREATE TABLE test (ID INT PRIMARY KEY);

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
INSERT INTO test (id) VALUES (1);

                              SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
                              BEGIN TRANSACTION;
                              INSERT INTO test (id) VALUES (2);
                              COMMIT;
                              SELECT * FROM test WHERE id = 2;
                              SELECT * FROM test;

COMMIT;
SELECT * FROM test;

As you can see, despite having 2 serializable transactions, the transaction that was started last commits first. And none of the queries block at any point.

MS SQL Server

By default MS SQL Server 2005 is somewhat less concurrent then PostgreSQL because it does predicate locking. As long as you stay out of the way of the predicates of other transactions it is still concurrent, but when you cross other running transactions you might need to wait for those other transactions to commit or rollback. So here is the first example.

CREATE TABLE test (ID INT PRIMARY KEY);

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
INSERT INTO test (id) VALUES (1);

                              SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
                              BEGIN TRANSACTION;
                              INSERT INTO test (id) VALUES (2);
                              COMMIT;
                              SELECT * FROM test WHERE id = 2;
                              SELECT * FROM test;

COMMIT;
SELECT * FROM test;

Again you can see the transaction that was started last commits first. But while the query for record 2 from the test table didn’t block, the query for the entire table blocked until the first transaction was committed. This is caused by predicate locking.

So what does serializable mean?

Mathematically speaking serializable means that it is possible to replay all the queries that were executed simultaneously one after another in a single threaded fashion and still get the same results. In terms of the SQL standard it doesn’t mean that. In terms of the SQL standard it just means that 3 specific events are not possible (and I quote this directly from SQL/IEC 9075-2:2003):

  • Dirty read: SQL-transaction T1 modifies a row. SQL-transactionT2 then reads that row before T1 performs a COMMIT. If T1 then performs a ROLLBACK, T2 will have read a row that was never committed and that may thus be considered to have never existed.
  • Non-repeatable read: SQL-transaction T1 reads a row. SQL-transaction T2 then modifies or deletes that row and performs a COMMIT. If T1 then attempts to reread the row, it may receive the modified value or discover that the row has been deleted.
  • Phantom: SQL-transaction T1 reads the set of rows N that satisfy some <search condition>. SQL-transaction T2 then executes SQL-statements that generate one or more rows that satisfy the <search condition> used by the SQL-transaction T1. If SQL-transaction T1 then repeats the initial read with the same <search confition>, it obtains a different collection of rows.

The above is what serializable means in a database. It does not mean mathematical serializability. It does not mean full table locks. It just means that these three specific events can not occur.

Today was the start of the Adobe LiveCycle Technical U training program. This is a 3 week training specifically for Adobe partners on LiveCycle and Flex. The following is the set of other training courses that is combined into this one:

  • Flex 3: Developing Rich Client Applications
  • Flex 3: Data and Communications
  • Flex 3: Programming the Visual Experience
  • LiveCycle: Designing Forms
  • LiveCycle: Building Applications

Yes, that is not a typo, we are actually covering Flex 3 in this training. This is one of the test runs with the beta courseware. But wait, there is more. Apart from bundling those 5 existing courses, there is also 6 days of additional, new material:

  • custom LiveCycle content (topics include database connectivity, LDAP, email etc.)
  • custom Flex content (Cairngorn)
  • LiveCycle and Flex integration
  • a 3-day ‘build a real application’ labs session

Since this is a newly developed training it will be given by a team of people from Adobe Training and from Prisma IT and it will be extensively reviewed. For now we just had the introductions and one of my coworkers started on the Flex content, but next week I will be delivering the LiveCycle bits of the training (including the JDBC and LDAP chapters I wrote myself). The last week will be given by my boss and the people from Adobe.

In part 1 I showed how to adapt ColdFusion to work with PostgreSQLs native UUID datatype. In this part, I will show how to adapt PostgreSQL to accept ColdFusions native UUID datatype.

UUIDs: equal, but not the same

As ColdFusion developers are very aware thestring representation of a UUID in ColdFusionis a 35 character string. However, the rest of the world has decided on a 36 character string representation. Both in standards (RFC 4122, ITU-T X.667 and The Open Group) as in implementations (MS SQL Servers newID(), Javas java.util.UUID etc.). PostgreSQL is no exception and uses the 36 character string representation. As a result, the output of the ColdFusion CreateUUID() function will not be accepted into a PostgreSQL UUID field and the output of a PostgreSQL UUID field will not be recognized by ColdFusion as a UUID. I actually believe this is a bug in ColdFusions type detection and conversion system. Since CFML is a typeless language, it should just convert both the 35 and the 36 character string representation. ColdFusion will on the fly convert both a m/d/yy string and a yyyy-mm-dd string to a date object, so why not convert both UUID string representations to their native integer representation?

Patching PostgreSQL

Unfortunately ColdFusion is a closed source product and I can not adapt it to work the way I want it to work. However, PostgreSQL is Open Source and I can adapt it. A little digging in the source code reveals that the input and output formats for the UUID datatype are defined in uuid.c and it is trivial to adapt them to use the ColdFusion format. Just apply the patch for PostgreSQL 8.3 to accept a 35-character string representation as UUID and recompile PostgreSQL. Once you have done that, the following code will work:

CREATE TABLE test (testID UUID);

<cfquery>
INSERT INTO test (testID)
VALUES (<cfqueryparam cfsqltype="cf_sql_other" value="#CreateUUID()#">)
</cfquery>

Warnings:

  • c is not a language I am comfortable in so the code is probably suboptimal;
  • I do not intend to submit this as a patch to PostgreSQL since they follow all the standards and ColdFusion does not;
  • a permanent solution to the problem of ColdFusion UUIDs clashing with the rest of the world should come from Adobe.

So to go from ColdFusion to PostgreSQL we now can use the 36-character representation as an inline variable, with the updated JDBC driver as a bind variable, and with this patch also using a 35-character string. What is left to do is patching the JDBC driver to convert the 16-byte representation it receives from PostgreSQL to a 35-character string ColdFusion understand. Looks like that requires writing a new type. Unfortunately I will be busy for the rest of the month with a new three week Flex/LiveCycle class we will be teaching so I won’t be able to start on that for a while.

I set up a little test to measure the performance difference between native PostgreSQL UUIDs () and ColdFusion UUIDs stored as a CHAR(35) field in PostgreSQL. First the schema.

CREATE TABLE uuidanswer (
 answerID UUID,
 displayOrder INT
);
CREATE TABLE uuidtestAssignment (
     testAssignmentID UUID
);
CREATE TABLE uuidquestionScore (
     testAssignmentID UUID,
     answerID UUID,
     questionIDID UUID
);
load data (400000 answers, 1000000 questionscores and 77000 testassignments)
CREATE INDEX uuidanswer_idx
  ON uuidanswer (answerID);
CREATE INDEX uuidtestAssignment_idx
  ON uuidtestAssignment (testAssignmentID);
CREATE INDEX uuidquestionScore_idx
  ON uuidquestionScore (testassignmentid, questionid, answerid);
ANALYZE uuidanswer;
ANALYZE uuidtestAssignment;
ANALYZE uuidquestionScore;

For the string version all UUID datatypes were replaced with a CHAR(35) datatype and in the data loading stage the appropriate hyphens where added or removed. This is a very much trimmed down version of a real query from one of our applications. It aggregates the score of a candidate in some sort of Human Resource testing scenario. The real one joins 11 tables of a 19 GB table but I am too lazy to move all that data to PostgreSQL just for a little demonstration. Then the following query was run against the dataset 1000 times:

SELECT SUM(displayorder)
FROM
 uuidanswer ca
 INNER JOIN uuidquestionscore cqs USING (answerid)
 INNER JOIN uuidtestassignment cta USING (testassignmentid)
WHERE
 cta.testassignmentid = '164ab167-1517-0f51-7cef-1c6654e721b5'
 OR
 cta.testassignmentid = '01F3168A423B2BA34E914AEBFE9C8757'

The results, averaged over 5 runs were:

  • CHAR(35): 2680 ms
  • UUID: 2210 ms

Due to the test setup the difference here is probably smaller then it is in a real production environment. Since all the data fits into cache the effects of I/O are pretty much removed from the equation. With a dataset that is larger then RAM the results will favor the smaller dataset more (and the total size of the UUID dataset is 164 MB versus 311 MB for the CHAR(35) dataset).

UUID datatype and cfqueryparam

But obviously it is desirable from a security perspective to use cfqueryparam for all data values. A quick check of the cfqueryparam docs reveals that there is no cfsqltype for UUIDs, so we have to fall back to cf_sql_other for the UUID case. Which unfortunately blows up with ColdFusion 8 because the included driver is the 8.1 JDBC 3 driver, and the UUID datatype is only supported in the 8.3 drivers. Replacing postgresql-8.1-407.jdbc3.jar with postgresql-8.3-603.jdbc3.jar and restarting solves that problem. (An enhancement request for updating the PostgreSQL JDBC has been entered, but a few more voices wouldn’t hurt.) Now the query statement is changed to:

SELECT SUM(displayorder)
FROM
 uuidanswer ca
 INNER JOIN uuidquestionscore cqs USING (answerid)
 INNER JOIN uuidtestassignment cta USING (testassignmentid)
WHERE
 cta.testassignmentid = <cfqueryparam cfsqltype="cf_sql_other" value="164ab167-1517-0f51-7cef-1c6654e721b5" />
 OR
 cta.testassignmentid = <cfqueryparam cfsqltype="cf_sql_other" value="01F3168A423B2BA34E914AEBFE9C8757" />

The results, averaged over 5 runs were:

  • CHAR(35): 2640 ms
  • UUID: 2250 ms

So next to all the data validation and integrity benefits there are real performance benefits to using the UUID datatype.

Today the PostgreSQL Global Development Group announced the release of PostgreSQL 8.3. The major new features that are most exciting to me are:

  • TSearch has been integrated into the core so PostgreSQL now has Full Text Searching without the need to install additional modules;
  • all the performance features;
  • a UUID datatype for storing 128-bit integers.

I don’t think I need to explain why Full Text Searching is an important addition, but let me expand a little bit on my other favorites.

Performance features

Let me start by saying that PostgreSQL hasn’t been slow in a very long time. Sure, it comes out of the box with an arcane configuration so it will start on a P90 with 64 MB of RAM, but once you have a configuration that matches your hardware it will perform. But where it shines is scalability. Back in October 2006 I had the privilege of assisting tweakers.net in optimizing PostgreSQL for their database benchmark and the outcome there was that “eight cores deliver 7.4 times as much as a single core“. Other benchmarks consistently show that if you have the hardware, PostgreSQL will use it.

That is not to say there is nothing to improve. There are several specific scenario’s where PostgreSQL will not be very fast. One of those scenario’s was the repeated update of a single record. Which is actually a pretty common scenario for websites that store user sessions in the database and update the record on each page hit. The reason for this is that PostgreSQL uses MultiVersion Concurrency Control and hence doesn’t overwrite records on updates, but invalidates them and writes a new copy. Repeating this over and over again would strain the garbage collection mechanism and bloat the table. To improve with that scenario PostgreSQL now has Heap Only Tuples (HOT), where an update that does not change an indexed field will only change the heap and not the index. Simple, efficient, and up to 4 times as fast.

Amongst the other performance improvements is a lot of work to plan OUTER JOINs more efficiently and to smooth out the performance spikes caused by background database maintenance processes.

UUID support

While technically they are 128-bit integers, I historically store UUIDs as CHAR or VARCHAR fields of 35 or 36 (depending on which application is talking to the DB) characters. Due to overhead and byte alignment issues that means a UUID takes 40 bytes of space, instead of the 16 bytes it would need if it were just an 128-bit integer. And that space usage is repeated for every index, so for a simple table with 2 UUIDs that creates a many-to-many relationship and has 2 indexes this is the difference between 200 and 80 bytes.

That isn’t all though. Even though the operations performed on UUIDs are pretty limited (equality testing and sorting are pretty much the only relevant operations), they are also impacted by the difference between a string and an integer. Operations on integers are simple, bitwise operations. Operations on strings have to take the locale into account which means convoluted (multi-byte) functions. So not only is the storage smaller and thus faster, operations are faster too.

In a follow up I will provide some benchmarks of the performance difference between native UUIDs and VARCHAR UUIDs. Also, ColdFusion users can look forward to a way to get the PostgreSQL UUID datatype working from ColdFusion and for a patch that will make PostgreSQL accept the 35-byte string representation of ColdFusion UUIDs.