The common misconception that a serializable transaction will single thread access to an entire table or even an entire database is pretty hard to eradicate. And the sad thing is that everybody just keeps repeating other supposedly informed sources instead of doing a little experimentation. So I have gathered some scripts that will show you how 2 serializable transactions are writing to the same database table simultaneously. When you run these scripts, run them from top to bottom, the commands on the left through one connection, the commands on the right through another connection.
PostgreSQL
There is no need to set up any configuration in PostgreSQL as it will be fully concurrent out of the box.
CREATE TABLE test (ID INT PRIMARY KEY); SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; BEGIN TRANSACTION; INSERT INTO test (id) VALUES (1); SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; BEGIN TRANSACTION; INSERT INTO test (id) VALUES (2); COMMIT; SELECT * FROM test WHERE id = 2; SELECT * FROM test; COMMIT; SELECT * FROM test;
As you can see, despite having 2 serializable transactions, the transaction that was started last commits first. And none of the queries block at any point.
MS SQL Server
By default MS SQL Server 2005 is somewhat less concurrent then PostgreSQL because it does predicate locking. As long as you stay out of the way of the predicates of other transactions it is still concurrent, but when you cross other running transactions you might need to wait for those other transactions to commit or rollback. So here is the first example.
CREATE TABLE test (ID INT PRIMARY KEY); SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; BEGIN TRANSACTION; INSERT INTO test (id) VALUES (1); SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; BEGIN TRANSACTION; INSERT INTO test (id) VALUES (2); COMMIT; SELECT * FROM test WHERE id = 2; SELECT * FROM test; COMMIT; SELECT * FROM test;
Again you can see the transaction that was started last commits first. But while the query for record 2 from the test table didn’t block, the query for the entire table blocked until the first transaction was committed. This is caused by predicate locking.
So what does serializable mean?
Mathematically speaking serializable means that it is possible to replay all the queries that were executed simultaneously one after another in a single threaded fashion and still get the same results. In terms of the SQL standard it doesn’t mean that. In terms of the SQL standard it just means that 3 specific events are not possible (and I quote this directly from SQL/IEC 9075-2:2003):
- Dirty read: SQL-transaction T1 modifies a row. SQL-transactionT2 then reads that row before T1 performs a COMMIT. If T1 then performs a ROLLBACK, T2 will have read a row that was never committed and that may thus be considered to have never existed.
- Non-repeatable read: SQL-transaction T1 reads a row. SQL-transaction T2 then modifies or deletes that row and performs a COMMIT. If T1 then attempts to reread the row, it may receive the modified value or discover that the row has been deleted.
- Phantom: SQL-transaction T1 reads the set of rows N that satisfy some <search condition>. SQL-transaction T2 then executes SQL-statements that generate one or more rows that satisfy the <search condition> used by the SQL-transaction T1. If SQL-transaction T1 then repeats the initial read with the same <search confition>, it obtains a different collection of rows.
The above is what serializable means in a database. It does not mean mathematical serializability. It does not mean full table locks. It just means that these three specific events can not occur.
Brad Wood says:
Thanks for the good explanation of transaction isolation levels. This is only something I have fairly recently come to grasp. Good, understandable commentary isn’t always around to help.
2008/02/12, 17:18Tom Kitta says:
Thanks a lot for informative post. I just discovered a problem with one of our DBs where I have supposedly rare phantoms show up once per about 10000 transactions. Query A checks whatever user response was recorded in DB, if it was not query B inserts it. Once in 10000 times query A says “nothing in DB” while an insert is pending. B initiates another insert and I end up with two records that have identical data and identical time stamp. I am hoping placing A and B in serialize block will solve the issue.
2008/02/28, 01:04T1 says:
What is the best research book on this subject?
2008/03/29, 14:44Jochem says:
If you are looking for the best book on a specific database, start with the manual. Just read it cover to cover.
If you are looking for something that is not specific to a particular database, I would recommend starting with Architecture of a Database System. If you want to get really theoretical you could try Concurrency Control and Recovery in Database Systems.
But what you should never forget is that you can just try things. You can just start a query editor and write queries. And then start another one and see how they interact.
2008/03/29, 19:35Bernd says:
Good article - but I still can’t solve my problem Maybe you can help me with that.
I have an XML Document stored in an Oracle DB. There are 4 CF functions (delete,rename,insert, update) to make changes in the XML document. I need to make sure that while one function changes the document no other function can alter the document. As there are a couple of CF Servers I can’t use CF lock. The lock needs to take place in the DB. How can I do that?
Thanks in advance
Bernd
2008/04/30, 15:10murat yĆ¼cel says:
thanks a lot it helped me so much
2008/12/08, 10:07