MySQL/MariaDB cursors and temp tables

In MariaDB and MySQL, cursors create a temporary table.

Does this statement deserve a whole blog post? Apparently not. However, in some cases one does not expect a temporary table to be created:

  • SELECT ... FOR UPDATE: An exclusive lock is created, yes, but you still read data from a temporary table.
  • SELECT FROM a temporary table: you are reading from a temporary tables, yes, but an internal temporary table is created anyway.
  • Impossible WHERE and LIMIT 0.

A quick example:

CREATE TEMPORARY TABLE t
(
        id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY
);

CREATE PROCEDURE p()
BEGIN
        DECLARE c CURSOR FOR
                SELECT id FROM t WHERE 0 LIMIT 0 FOR UPDATE;
        OPEN c;
        CLOSE c;
END;

MySQL [test]> SHOW STATUS LIKE 'Created_tmp_tables';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| Created_tmp_tables | 31    |
+--------------------+-------+
1 row in set (0.00 sec)

MySQL [test]> CALL p();
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> SHOW STATUS LIKE 'Created_tmp_tables';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| Created_tmp_tables | 32    |
+--------------------+-------+
1 row in set (0.00 sec)

I am not complaining, and I don’t even know if this behavior can be changed. But one should certainly be aware of this behavior. For example, one could think that creating a temporary table one time and then loop on that table with cursors several times is an optimization – but that’s not the case.

Federico

MariaDB/MySQL Cursors: a brief Tutorial

Versione italiana

In MariaDB and MySQL, Cursors can only be used within a Stored Program, are slow, and have very limited functionalities. That said, they can still be useful in some cases. This page explains how to use them in action, with a trivial example.

Here is our example Stored Procedure which uses a Cursor:

DELIMITER ||
    
DROP TABLE IF EXISTS `test`.`tab1`;
CREATE TABLE `test`.`tab1` (`c` TINYINT UNSIGNED);
INSERT INTO `test`.`tab1` (`c`) VALUES
    (1),
    (2),
    (3);
    
DROP PROCEDURE IF EXISTS `test`.`demo`;
CREATE PROCEDURE `test`.`demo`()
    READS SQL DATA
BEGIN
    DECLARE `eof` BOOL;
    DECLARE `val` INT;
    
    DECLARE `crs` CURSOR FOR
        SELECT `c`
            FROM `test`.`tab1`;
    
    DECLARE CONTINUE HANDLER
        FOR NOT FOUND
        SET eof = TRUE;
    
    OPEN `crs`;
    `lp`: LOOP
        FETCH `crs` INTO `val`;
        IF `eof` IS TRUE THEN
            LEAVE `lp`;
        END IF;
        
        SELECT `val`;
    END LOOP;
    CLOSE `crs`;
    
    -- workaround for server bug
    DO (SELECT `c` FROM `test`.`tab1` WHERE FALSE);
END;
    
||
DELIMITER ;


This Procedure is totally useless in the Real World, but serves us as an example: it returns a different ResultSet for each row in test.tab1 table. Now, I’m going to explain how I implemented it.

First, I DECLARE some variables. eof is very important, becuase it indicates if the Cursor has read all rows from its query; for now, it’s FALSE. We’ll use it later. BOOL doesn’t really exist in Maria and My, and is mapper to TINYINT, while TRUE is just a synonym for 1, and FALSE is a synonym for 0.But using the BOOL, TRUE and FALSE keywords makes code more readable. val is the variable which contains the value that we’re going to read from the table.

Then, I DECLARE the Cursor. In this statement there are the Cursor’s name (crs) and its query. The query must be hardcoded here, because dynamic SQL is not allowed for Cursors. However, there is a workaround for this: see my post Cursors for Dynamic SQL.

Then, I declare an Error HANDLER for the Errors which are in the NOT FOUND class. Here we are interested in the 1329 error, which occurs when the Cursor has read all the rows from the query, and tries to read another row. I’ll explain this better later.

Now I OPEN the Cursor. Its query is executed at this point.

And then, there is a LOOP on the rows. What happens here? Well, first we try to FETCH a new row. If this operation succedes the value of the only field we’re SELECTing is copied into val. We check the value of eof, and since it is FALSE, we skip the IF part. And then we produce a ResultSet with the value we have read, and it is sent to the client. This happens three times, because we created a table with three rows.

The fourth time, the FETCH statement fails, because there are no more rows. An Error raises, so we go into the HANDLER which we defined before. Here, the eof variable is set to TRUE. Since it is a CONTINUE HANDLER, after that statement the Routine execution is resumed and finds again the IF. This time eof is TRUE, so we step into the IF. And we find a LEAVE statement which refers to the lp label. lp is the LOOP, so we step out of the LOOP.

Then, we CLOSE the Cursor.

Now, we have a little problem. Versions of MySQL older than 5.6, and MariaDB, have a bug (which MariaDB is solving, however) that makes the 1329 Error survive after the HANDLER. If we don’t do something to remove that error, it will be sent to the client. But if we run a query against a physical table, the Error is forgotten (the exact reason is beyond the purpose of this post). We don’t want another ResultSet to be returned, so we use the SELECT as a subquery of the DO statement (which doesn’t return anything). WHERE FALSE is used to make sure that we only add an acceptable overhead to the Procedure.

Exercises

This time I’ll give you some simple excercises. If you solve them, you can post your code in a comment, if you want. If you can’t solve them, you can post some questions so I’ll help you. I suggest you try these excercises, because you don’t really learn much without practice.

1) Modify the demo Procedure so that it terminates if it finds more than one NULL value.

2) Write a new Procedure which reads rows from the PROCESSLIST table, in the information_schema database, and KILLs all connections that are idle and exist since more than N seconds – where N is a Procedure’s parameter.

Enjoy!

MariaDB/MySQL: Cursors for Dynamic SQL

Note: In this post, I always refer to MariaDB. As far as I know, the techniques and the limitation I’m describing here also apply to Percona Server and MySQL, including 5.6. But since I didn’t test the examples, I just refer to MariaDB (tested on 5.5).

Terminology

A query is a SQL statement that returns some results (SELECT, SHOW, DESCRIBE, EXPLAIN, HELP).

A Stored Program is a Stored Routine, a Trigger or an Event.

A Stored Routine is a Stored Procedure or a Stored Function.

A Dynamic SQL statement is astring composed by a Stored Program, which will be executed as a Prepared Statement. Even within normal Prepared Statements you can insert some values, for example you can “pass” them an ID which will be used in the WHERE clause. But there are many limitations: you can’t compose the ORDER BY clause, or decide what columns should be SELECTed, for example. An example of Dynamic SQL is:

SET @sql = CONCAT('SELECT `', @col_name, '` AS `value` FROM `', @tab_name, '` LIMIT 1;');

The Problem

MariaDB cursors can’t be used with Dynamic SQL. Their syntax is:

DECLARE <cursor_name> CURSOR FOR <query>

But is a hard-coded SQL statement, and can not be a variable. So, if you can’t hard-code your query because it should contain some variable parts, you can’t use a cursor to read the result set.

Hey, wait a moment… can’t you? Well, actually you can. You must write some code and use some dirty tricks, but you can.

The solution (Code)

Here’s the code we will use to implement “Dynamic Cursors”. I will explain it later:

DELIMITER ||
 
 
DROP PROCEDURE IF EXISTS `dyn_lock`;
CREATE PROCEDURE `dyn_lock`(IN `query` TEXT, IN `lock_name` CHAR(64), IN `timeout` TINYINT UNSIGNED)
	NO SQL
	LANGUAGE SQL
	COMMENT 'Get a lock for dyn cursor'
BEGIN
	-- output from GET_LOCK()
	DECLARE lock_res TINYINT UNSIGNED;
 
	-- error in query
	DECLARE CONTINUE HANDLER
		FOR SQLEXCEPTION
		SIGNAL SQLSTATE VALUE '45000' SET
				MESSAGE_TEXT  = '[dyn_lock] Dynamic SQL returned an error';
 
	-- get lock or exit with err
	SET lock_res = GET_LOCK(`lock_name`, IFNULL(`timeout`, 5));
	IF (lock_res IS NULL) THEN
		SIGNAL SQLSTATE VALUE '45000' SET
				MESSAGE_TEXT  = '[dyn_lock] Could not acquire lock: Unknown error';
	ELSEIF (lock_res = 0) THEN
		SIGNAL SQLSTATE VALUE '45000' SET
				MESSAGE_TEXT  = '[dyn_lock] Could not acquire lock: Timeout expired';
	END IF;
 
	-- create dynamic view
	SET @dyn_sql = CONCAT('CREATE OR REPLACE VIEW `', lock_name, '` AS ', query, ';');
	PREPARE stmt_dyn_view FROM @dyn_sql;
	EXECUTE stmt_dyn_view;
	DEALLOCATE PREPARE stmt_dyn_view;
END;
 
 
DROP PROCEDURE IF EXISTS `dyn_cursor`;
CREATE PROCEDURE `dyn_cursor`(IN `query` TEXT, IN `timeout` TINYINT UNSIGNED, OUT x TEXT)
	READS SQL DATA
	LANGUAGE SQL
	COMMENT 'Simulate dynamic cursor'
BEGIN
	-- we assume there is only 1 column to be read;
	-- TEXT is good for any type
	DECLARE val TEXT;
	-- no more rows flag
	DECLARE eof BOOL;
 
	-- select data from a view. only the view is really dynamic
	DECLARE cur_dyn CURSOR FOR
		SELECT * FROM dyn_view;
 
	-- set eof flag
	DECLARE CONTINUE HANDLER
		FOR NOT FOUND
		SET eof = TRUE;
 
	CALL `dyn_lock`(`query`, 'dyn_view', `timeout`);
 
	-- open cursor on view
	OPEN cur_dyn;
 
	-- now that we have the cursor, we can release lock
	DO RELEASE_LOCK('dyn_view');
 	
	lp_dyn: LOOP
		IF eof = TRUE THEN
			LEAVE lp_dyn;
		END IF;
		
		FETCH cur_dyn INTO val;
	END LOOP;
 
	CLOSE cur_dyn;
 
	SET x = val;
END;
 
 
||
DELIMITER ;

Just few notes before explaining this technique:
* The lp_dyn loop, in the dyn_cursor() PROCEDURE, is the part where we do something with the cursor we created. I could have leave out that part because it’s not really part of the general solution, but I needed some code which demonstrates that my solution works. If you leave out that part, then you don’t need the x OUT parameter and the val local variable.
* The user should call dyn_cursor(), not dyn_lock(). The latter is called by the former. This will be explained later.
* dyn_cursor() properly works if you pass a query which only produces 1 column. The last value returned by the query is assigned to the x OUT parameter. SELECT the variable you pass, to check that your query works and the result set has been correctly read by the cursor.
* To test the procedure, you can CALL something like these (call them from different connections to test concurrency):

CALL `dyn_cursor`('SELECT `ENGINE` FROM `information_schema`.`ENGINES`', NULL, @var1);
CALL `dyn_cursor`('SELECT 1 AS `uno` FROM `information_schema`.`TABLES`', NULL, @var2);
CALL `dyn_cursor`('SELECT -;', NULL, @var2);

The solution (Theory)

The technique is quite simple. Here’s the work flow.

The dynamic query is received through the query IN parameter. This happens in my example – please, note that in real-world examples, you could compose that string in the same PROCEDURE. Or it could be a FUNCTION, instead of a PROCEDURE.

We need to acquire a lock, so we call dyn_lock(). It is not the table-level lock you would acquire with a LOCK TABLE statement, but another type of lock. It’s more like a logical named lock entirely handled by four SQL functions (GET_LOCK(), RELEASE_LOCK(), IS_USED_LOCK(), IS_FREE_LOCK()). This kind of locks is interesting, because you may want to acquire locks that are not bound to a table, or you are using techniques that are affected by table locks in a strange way (transactions, triggers, etc). I will not fully explain the mechanism in this post, but here’s what we do. We acquire a lock named 'dyn_view' using GET_LOCK(). If that lock has been acquired by another connection, we wait for the number of seconds specified as `timeout` parameter. If `timeout` is NULL, the default value we decide is 5 (please, use the value you think is most appropriate in your situation). We could define two default values, one in dyn_cursor() and one in dyn_lock(). Later, the reason for this will be clear.

If the lock can not be acquired, dyn_lock() throws an error. The SQLSTATE '45000' is the one suggested by the MySQL documentation for user errors. The message is not complete, because this is just an example. But it should contain the name of the Stored Routine that thrown it, if you want make debug easier.

Then, dyn_lock() creates a view with the specified name. And then the execution exits from dyn_lock(). Here we open the cursor (alas, this can’t be done within dyn_lock() because cursors exist at a Routine level), and we release the lock using RELEASE_LOCK().

Let’s talk about the view. Its AS clause is the statement passed through `query` parameter. So, doing a SELECT * on that view is like directly running `query`. The performances are the same, or almost the same. The advantage is that SELECT * FROM `dyn_view` can be hard-coded in a cursor definition (we did it in dyn_cursor()), but `query` is a string that can be generated dynamically.

Views can not be temporary in MariaDB/MySQL (they can in Postgres). This means that when you create/replace a view, other connections could use it. In our case, we must prevent a concurrency like this:
1) A creates `dyn_view`;
2) Just 0.0000001 seconds after, B replaces `dyn_view`;
3) A executes the query on `dyn_view`, knowing not that it has been replaced.
To avoid this, we use the lock I described.

We want to release the lock as soon as possible, to increase performances. We can do this just after opening the cursor. It will work on a query, not on the view itself – so, we don’t need it anymore, and can allow other connections to replace it. In fact, the view is always created (if it doesn’t still exists) or replaced. It is never drop, to speed up things.

It is important that the view and the lock have the same name and are created within dyn_lock(). This way, dyn_lock() contains (part of) the generalized code. Not only dyn_cursor(), but also other Stored Routines, can call dyn_cursor() to get a lock and a view with a given name. If several Routines call dyn_cursor() passing the the same name, they can’t be executed at the same time. For the sake of brevity, I will not show here this case here, but it can be useful. If different Routines call dyn_cursor() with different names, they can be executed at the same time. But still two connections can not execute the same Routine at the same time. However, it is important that Routines release the lock name they acquired as soon as possible (again, see RELEASE_LOCK()).

To Do (or not To Do)

GET_LOCKS() accepts a timeout in seconds, not microseconds. For short queries, you may want to specify a value like 0.5. Or 1.5. If this limitation is a problem for you, you may consider to re-implement dyn_lock() and write an unlocking PROCEDURE. You could acquire locks by inserting your CONNECTION_ID() into a MEMORY (but not temporary!) table, and release the lock by deleting that ID. To check if a lock has already been acquired by someone else, you could use a SELECT ... FOR UPDATE. But there is a complication: a query could be KILLed before it releases its lock. So, you also need to write in the table a TIMESTAMP that indicates when the lock has been acquired. After a given timeout (which can be constant or variable), the lock expires.

If you re-implement the locking system, and you have a high concurrency, you may also consider to use more than one view. The SELECTs on the view must be hard-coded, so the number of allowed views and their names must be hard-coded too. But you could allow 3 view called dyn_view1, dyn_view2, dyn_view3, and create 3 locks with the same names in the MEMORY table. But if you use the GET_LOCK() and RELEASE_LOCK() functions, you need to use only 1 lock, and thus only 1 view.

Please, note that re-implementing the locking system will improve concurrency and timeout granularity, but will add complexity, and thus should reduce the performances in an environment where concurrency is almost irrilevant.

For the sake of brevity, errors generated by the Dynamic SQL are not handled here. But you should handle them.

Limitations and Feature Requests for MariaDB

* WARNING: because of a bug, in MySQL you must avoid to pass negative numbers to GET_LOCK() as timeout, because it is an endless timeout. In MariaDB, this only happens if you pass very low negative numbers (MDEV-4017).
* First of all: if MariaDB adds support for cursors on dynamic SQL statements, all this code will no longer be necessary.
* LIMITATION: This technique can not be used recursively – that is, the query you pass to dyn_cursor() can not contain a call to dyn_cursor(). This is because dyn_lock() uses a Prepared Statement. MariaDB/MySQL’s Prepared Statements exist at a connection-level, and are not recursive. I hope that this in the future we will have a way to implement this type of recursivity.
* #MDEV-4028 GET_LOCK() should accept a timeout argument with granularity in microseconds.
* Dynamic SQL could generate errors or warnings. In that case, it would be useful to get an error message specifying the name of the Prepared Statement and the Stored Routine that thrown it.
* If a Stored Routine could return a cursor or a result set, we could move the OPEN and DO RELEASE_LOCK() lines into dyn_lock(). If we could do this, the code would be easier to maintain.

I use a lot of Stored Programs, and I am working (when I have the time to do it) on a couple of libraries I would like to release publicly. But there are strong limitations on what you can do with Routines, at the moment. I will write a post about what features I think that could make a developer’s life easier. Maybe some of them seem useless, until one starts to use Routines in real life – then, you understand that they could help you writing better (mantainable, general, faster) code. And maybe some of them are not so difficult to implement… at least, I hope so.

See also:

Enjoy!