Hidden below the “Show Advanced Settings” button on the datasource settings page for MS SQL Server lies the “String format” setting. According to the manual this setting is used to “Enable this option if your application uses Unicode data in DBMS-specific Unicode datatypes, such as National Character or nchar.” In other words, this allows you to use N fields in your database without adding those pesky N’s to your SQL and without butchering your data in the character set conversion process.
What that doesn’t tell you is that this setting can have serious performance consequences. When you enable this setting all variables send to the database using cfqueryparam are sent in their N format. Even the ones where the corresponding database field is not an N field but a regular field. And when that happens you have an encoding mismatch between the parameter and the column. Not really a problem when it is part of an insert statement, but disastrous when it is part of a predicate. Because an encoding mismatch will often prevent MS SQL Server from using indexes to execute the query, the query plan you nicely tuned from the SQL Server Management Studio (without unicode) will degrade into table scans.
Same encoding: Index Seek(OBJECT:([playground].[dbo].[testCase].[testCase_reg]), SEEK:([playground].[dbo].[testCase].[regular]=[@1]) ORDERED FORWARD) Encoding mismatch: Table Scan(OBJECT:([playground].[dbo].[testCase]), WHERE:(CONVERT_IMPLICIT(nvarchar(36),[playground].[dbo].[testCase].[regular],0)=[@P1]))
In a simple testcase with only 256K short records this already accounts for a speed degradation from 1 to 100 milliseconds. I suspect that in real scenario’s the performance impact may very well be magnitudes larger because wide columns will make your I/O explode and there are all sorts of concurrency penalties if you switch from index access to full table scans.
The good news is that it is pretty easy to identify whether you are having this problem. Just profile your SQL Server with the option to trace the event SHOWPLAN ALL and in all your execution plans search for the string CONVERT_IMPLICIT. The bad news is that this is hard to fix. Switching “String format” off means your data gets garbled, switching it on means your server dies from an overload, so pretty much the only solution is to switch your entire database to N fields. That makes you wonder why other databases can do without this setting. For PostgreSQL I know the answer: charsets are a database wide setting so this situation can’t happen, you just switch your entire database to UTF-8 (and since it is UTF-8 and not UCS-16 like in MS SQL Server that doesn’t have the I/O overhead). But how about MySQL, Oracle etc., are they smart enough to do the conversion once on the side of that parameter instead of for every row on the side of the query or is it something else entirely?
Download the MS SQL Server String format testcase
“it could be bunnies” » Blog Archive » Storing ColdFusion G/UUIDs in MS SQL Server says:
[...] 15% faster then using a VARCHAR(35) datatype. In March I posted about the performance effects of the “String Format” setting for MS SQL Server databases when using cfqueryparam. Recently I have been working on checking the performance effects when [...]
2008/12/11, 12:16