FOREACH in MySQL/MariaDB stored procedures

One of the annoying limitations of MySQL/MariaDB stored procedures is the lack of a FOREACH construct, which loops on each row returned by a query.

In practice, this forces users to write a lot of code just to tell MySQL how to fetch rows and exit properly. Nesting 2 loops of this kind simply results in unmaintenable code (don’t trust me, just try).

Now, I’m writing a library of views and procedures that I’ll share as open source in the next days, and I decided to finally write my foreach. Well, sort of. It is impossible to use the current stored procedures language to write a flexible foreach, because to loop rows you need a cursor. And cursors are based on a hard-coded query. In this old post I proposed a verbose, ugly, but working solution, but it has a limitation: the number of columns returned by the query must still be fixed. So, I used that technique (based on views) in my procedure, but I had to write different procedures: foreach_1(), foreach_2(), foreach_3(). If you need to read more rows you can modify the code easily, but I think that for an open source library 3 columns is reasonable.

Here I decided to share an experimental prototype. If you have a better idea on how to achieve a similar result, I’ll be glad to trash this hack and use your idea instead.

Also note that for this procedure I used MariaDB 10.2 improved PREPARE statement. If you want to run it on MySQL or older MariaDB versions, you’ll have to make some easy changes and test the procedure.

The code

CREATE PROCEDURE foreach_2(IN in_sql TEXT, IN in_callback_type VARCHAR(9), IN in_callback_body TEXT)
    NOT DETERMINISTIC
    READS SQL DATA
BEGIN
    DECLARE p_sql TEXT DEFAULT NULL;
    DECLARE p_eof BOOL DEFAULT FALSE;
    DECLARE v_p1, v_p2 TEXT DEFAULT NULL;
 
    DECLARE crs_foreach CURSOR FOR
    SELECT p1, p2 FROM vw_foreach;

    DECLARE CONTINUE HANDLER
        FOR NOT FOUND
    BEGIN
        SET p_eof := TRUE;
    END;

    SET in_callback_type := UPPER(in_callback_type);
     IF NOT (in_callback_type IN ('SQL', 'PROCEDURE')) THEN
         SIGNAL SQLSTATE VALUE '45000'
        SET MESSAGE_TEXT = '[foreach_2] Invalid in_callback_type';
    END IF;

    DO GET_LOCK('crs_foreach', 1);

    SET p_sql := CONCAT('CREATE OR REPLACE VIEW vw_foreach AS ', in_sql, ';');
    PREPARE stmt_foreach_2 FROM p_sql;
    EXECUTE stmt_foreach_2;

    OPEN crs_foreach;

    DO RELEASE_LOCK('crs_foreach');
 
    lp_while: WHILE TRUE DO
        FETCH crs_foreach INTO v_p1, v_p2;
 
        IF p_eof THEN
            LEAVE lp_while;
        END IF;

        IF in_callback_type IN ('SQL') THEN
            SET @p1 := v_p1;
            SET @p2 := v_p2;
            PREPARE stmt_foreach_2 FROM in_callback_body;
        ELSE
            PREPARE stmt_foreach_2 FROM CONCAT('CALL `', in_callback_body, '`(', QUOTE(v_p1), ', ', QUOTE(v_p2), ');');
        END IF;
        EXECUTE stmt_foreach_2;
    END WHILE;

    CLOSE crs_foreach;
    DEALLOCATE PREPARE stmt_foreach_2;
END

Now, the usage. There are 2 ways to use this procedure.

Using a callback procedure

First, create a callback procedure which will handle the rows returned by your query:

CREATE PROCEDURE p_echo(IN in_p1 TEXT, IN in_p2 TEXT)
BEGIN
    SELECT CONCAT_WS('.', in_p1, in_p2) AS r;
END

As you can see, the procedure must accept 2 values (foreach_2). The names don’t really matter, but p1 and p2 are clear, at least in this example.

Now, invoke foreach_2():

CALL foreach_2(
    'SELECT TABLE_SCHEMA AS p1, TABLE_NAME AS p2 FROM information_schema.TABLES LIMIT 3;',
    'PROCEDURE',
    'p_echo'
);

We passed the query producing the results we want to loop. Then we told foeach_2() that we want to use a callback procedure, and specified its name.

Using a callback SQL statement

Writing a procedure to perform a simple tasks would be annoying. That’s why I decided to also support callback statements:

CALL foreach_2(
    'SELECT TABLE_SCHEMA AS p1, TABLE_NAME AS p2 FROM information_schema.TABLES LIMIT 3;',
    'SQL',
    'SELECT @p1 AS p1, @p2 AS p2;'
);

As you can see, the callback query can access the values using user variables: @p1 and @p2.

These variables exist at session level, so it is possible that we are overwriting something. But I think there are no reasonable ways to avoid these collisions.

Enjoy!

Advertisements

MySQL/MariaDB cursors and temp tables

In MariaDB and MySQL, cursors create a temporary table.

Does this statement deserve a whole blog post? Apparently not. However, in some cases one does not expect a temporary table to be created:

  • SELECT ... FOR UPDATE: An exclusive lock is created, yes, but you still read data from a temporary table.
  • SELECT FROM a temporary table: you are reading from a temporary tables, yes, but an internal temporary table is created anyway.
  • Impossible WHERE and LIMIT 0.

A quick example:

CREATE TEMPORARY TABLE t
(
        id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY
);

CREATE PROCEDURE p()
BEGIN
        DECLARE c CURSOR FOR
                SELECT id FROM t WHERE 0 LIMIT 0 FOR UPDATE;
        OPEN c;
        CLOSE c;
END;

MySQL [test]> SHOW STATUS LIKE 'Created_tmp_tables';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| Created_tmp_tables | 31    |
+--------------------+-------+
1 row in set (0.00 sec)

MySQL [test]> CALL p();
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> SHOW STATUS LIKE 'Created_tmp_tables';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| Created_tmp_tables | 32    |
+--------------------+-------+
1 row in set (0.00 sec)

I am not complaining, and I don’t even know if this behavior can be changed. But one should certainly be aware of this behavior. For example, one could think that creating a temporary table one time and then loop on that table with cursors several times is an optimization – but that’s not the case.

Federico

MariaDB/MySQL Cursors: a brief Tutorial

Versione italiana

In MariaDB and MySQL, Cursors can only be used within a Stored Program, are slow, and have very limited functionalities. That said, they can still be useful in some cases. This page explains how to use them in action, with a trivial example.

Here is our example Stored Procedure which uses a Cursor:

DELIMITER ||
    
DROP TABLE IF EXISTS `test`.`tab1`;
CREATE TABLE `test`.`tab1` (`c` TINYINT UNSIGNED);
INSERT INTO `test`.`tab1` (`c`) VALUES
    (1),
    (2),
    (3);
    
DROP PROCEDURE IF EXISTS `test`.`demo`;
CREATE PROCEDURE `test`.`demo`()
    READS SQL DATA
BEGIN
    DECLARE `eof` BOOL;
    DECLARE `val` INT;
    
    DECLARE `crs` CURSOR FOR
        SELECT `c`
            FROM `test`.`tab1`;
    
    DECLARE CONTINUE HANDLER
        FOR NOT FOUND
        SET eof = TRUE;
    
    OPEN `crs`;
    `lp`: LOOP
        FETCH `crs` INTO `val`;
        IF `eof` IS TRUE THEN
            LEAVE `lp`;
        END IF;
        
        SELECT `val`;
    END LOOP;
    CLOSE `crs`;
    
    -- workaround for server bug
    DO (SELECT `c` FROM `test`.`tab1` WHERE FALSE);
END;
    
||
DELIMITER ;


This Procedure is totally useless in the Real World, but serves us as an example: it returns a different ResultSet for each row in test.tab1 table. Now, I’m going to explain how I implemented it.

First, I DECLARE some variables. eof is very important, becuase it indicates if the Cursor has read all rows from its query; for now, it’s FALSE. We’ll use it later. BOOL doesn’t really exist in Maria and My, and is mapper to TINYINT, while TRUE is just a synonym for 1, and FALSE is a synonym for 0.But using the BOOL, TRUE and FALSE keywords makes code more readable. val is the variable which contains the value that we’re going to read from the table.

Then, I DECLARE the Cursor. In this statement there are the Cursor’s name (crs) and its query. The query must be hardcoded here, because dynamic SQL is not allowed for Cursors. However, there is a workaround for this: see my post Cursors for Dynamic SQL.

Then, I declare an Error HANDLER for the Errors which are in the NOT FOUND class. Here we are interested in the 1329 error, which occurs when the Cursor has read all the rows from the query, and tries to read another row. I’ll explain this better later.

Now I OPEN the Cursor. Its query is executed at this point.

And then, there is a LOOP on the rows. What happens here? Well, first we try to FETCH a new row. If this operation succedes the value of the only field we’re SELECTing is copied into val. We check the value of eof, and since it is FALSE, we skip the IF part. And then we produce a ResultSet with the value we have read, and it is sent to the client. This happens three times, because we created a table with three rows.

The fourth time, the FETCH statement fails, because there are no more rows. An Error raises, so we go into the HANDLER which we defined before. Here, the eof variable is set to TRUE. Since it is a CONTINUE HANDLER, after that statement the Routine execution is resumed and finds again the IF. This time eof is TRUE, so we step into the IF. And we find a LEAVE statement which refers to the lp label. lp is the LOOP, so we step out of the LOOP.

Then, we CLOSE the Cursor.

Now, we have a little problem. Versions of MySQL older than 5.6, and MariaDB, have a bug (which MariaDB is solving, however) that makes the 1329 Error survive after the HANDLER. If we don’t do something to remove that error, it will be sent to the client. But if we run a query against a physical table, the Error is forgotten (the exact reason is beyond the purpose of this post). We don’t want another ResultSet to be returned, so we use the SELECT as a subquery of the DO statement (which doesn’t return anything). WHERE FALSE is used to make sure that we only add an acceptable overhead to the Procedure.

Exercises

This time I’ll give you some simple excercises. If you solve them, you can post your code in a comment, if you want. If you can’t solve them, you can post some questions so I’ll help you. I suggest you try these excercises, because you don’t really learn much without practice.

1) Modify the demo Procedure so that it terminates if it finds more than one NULL value.

2) Write a new Procedure which reads rows from the PROCESSLIST table, in the information_schema database, and KILLs all connections that are idle and exist since more than N seconds – where N is a Procedure’s parameter.

Enjoy!

MariaDB/MySQL: Cursors for Dynamic SQL

Note: In this post, I always refer to MariaDB. As far as I know, the techniques and the limitation I’m describing here also apply to Percona Server and MySQL, including 5.6. But since I didn’t test the examples, I just refer to MariaDB (tested on 5.5).

Terminology

A query is a SQL statement that returns some results (SELECT, SHOW, DESCRIBE, EXPLAIN, HELP).

A Stored Program is a Stored Routine, a Trigger or an Event.

A Stored Routine is a Stored Procedure or a Stored Function.

A Dynamic SQL statement is astring composed by a Stored Program, which will be executed as a Prepared Statement. Even within normal Prepared Statements you can insert some values, for example you can “pass” them an ID which will be used in the WHERE clause. But there are many limitations: you can’t compose the ORDER BY clause, or decide what columns should be SELECTed, for example. An example of Dynamic SQL is:

SET @sql = CONCAT('SELECT `', @col_name, '` AS `value` FROM `', @tab_name, '` LIMIT 1;');

The Problem

MariaDB cursors can’t be used with Dynamic SQL. Their syntax is:

DECLARE <cursor_name> CURSOR FOR <query>

But is a hard-coded SQL statement, and can not be a variable. So, if you can’t hard-code your query because it should contain some variable parts, you can’t use a cursor to read the result set.

Hey, wait a moment… can’t you? Well, actually you can. You must write some code and use some dirty tricks, but you can.

The solution (Code)

Here’s the code we will use to implement “Dynamic Cursors”. I will explain it later:

DELIMITER ||
 
 
DROP PROCEDURE IF EXISTS `dyn_lock`;
CREATE PROCEDURE `dyn_lock`(IN `query` TEXT, IN `lock_name` CHAR(64), IN `timeout` TINYINT UNSIGNED)
	NO SQL
	LANGUAGE SQL
	COMMENT 'Get a lock for dyn cursor'
BEGIN
	-- output from GET_LOCK()
	DECLARE lock_res TINYINT UNSIGNED;
 
	-- error in query
	DECLARE CONTINUE HANDLER
		FOR SQLEXCEPTION
		SIGNAL SQLSTATE VALUE '45000' SET
				MESSAGE_TEXT  = '[dyn_lock] Dynamic SQL returned an error';
 
	-- get lock or exit with err
	SET lock_res = GET_LOCK(`lock_name`, IFNULL(`timeout`, 5));
	IF (lock_res IS NULL) THEN
		SIGNAL SQLSTATE VALUE '45000' SET
				MESSAGE_TEXT  = '[dyn_lock] Could not acquire lock: Unknown error';
	ELSEIF (lock_res = 0) THEN
		SIGNAL SQLSTATE VALUE '45000' SET
				MESSAGE_TEXT  = '[dyn_lock] Could not acquire lock: Timeout expired';
	END IF;
 
	-- create dynamic view
	SET @dyn_sql = CONCAT('CREATE OR REPLACE VIEW `', lock_name, '` AS ', query, ';');
	PREPARE stmt_dyn_view FROM @dyn_sql;
	EXECUTE stmt_dyn_view;
	DEALLOCATE PREPARE stmt_dyn_view;
END;
 
 
DROP PROCEDURE IF EXISTS `dyn_cursor`;
CREATE PROCEDURE `dyn_cursor`(IN `query` TEXT, IN `timeout` TINYINT UNSIGNED, OUT x TEXT)
	READS SQL DATA
	LANGUAGE SQL
	COMMENT 'Simulate dynamic cursor'
BEGIN
	-- we assume there is only 1 column to be read;
	-- TEXT is good for any type
	DECLARE val TEXT;
	-- no more rows flag
	DECLARE eof BOOL;
 
	-- select data from a view. only the view is really dynamic
	DECLARE cur_dyn CURSOR FOR
		SELECT * FROM dyn_view;
 
	-- set eof flag
	DECLARE CONTINUE HANDLER
		FOR NOT FOUND
		SET eof = TRUE;
 
	CALL `dyn_lock`(`query`, 'dyn_view', `timeout`);
 
	-- open cursor on view
	OPEN cur_dyn;
 
	-- now that we have the cursor, we can release lock
	DO RELEASE_LOCK('dyn_view');
 	
	lp_dyn: LOOP
		IF eof = TRUE THEN
			LEAVE lp_dyn;
		END IF;
		
		FETCH cur_dyn INTO val;
	END LOOP;
 
	CLOSE cur_dyn;
 
	SET x = val;
END;
 
 
||
DELIMITER ;

Just few notes before explaining this technique:
* The lp_dyn loop, in the dyn_cursor() PROCEDURE, is the part where we do something with the cursor we created. I could have leave out that part because it’s not really part of the general solution, but I needed some code which demonstrates that my solution works. If you leave out that part, then you don’t need the x OUT parameter and the val local variable.
* The user should call dyn_cursor(), not dyn_lock(). The latter is called by the former. This will be explained later.
* dyn_cursor() properly works if you pass a query which only produces 1 column. The last value returned by the query is assigned to the x OUT parameter. SELECT the variable you pass, to check that your query works and the result set has been correctly read by the cursor.
* To test the procedure, you can CALL something like these (call them from different connections to test concurrency):

CALL `dyn_cursor`('SELECT `ENGINE` FROM `information_schema`.`ENGINES`', NULL, @var1);
CALL `dyn_cursor`('SELECT 1 AS `uno` FROM `information_schema`.`TABLES`', NULL, @var2);
CALL `dyn_cursor`('SELECT -;', NULL, @var2);

The solution (Theory)

The technique is quite simple. Here’s the work flow.

The dynamic query is received through the query IN parameter. This happens in my example – please, note that in real-world examples, you could compose that string in the same PROCEDURE. Or it could be a FUNCTION, instead of a PROCEDURE.

We need to acquire a lock, so we call dyn_lock(). It is not the table-level lock you would acquire with a LOCK TABLE statement, but another type of lock. It’s more like a logical named lock entirely handled by four SQL functions (GET_LOCK(), RELEASE_LOCK(), IS_USED_LOCK(), IS_FREE_LOCK()). This kind of locks is interesting, because you may want to acquire locks that are not bound to a table, or you are using techniques that are affected by table locks in a strange way (transactions, triggers, etc). I will not fully explain the mechanism in this post, but here’s what we do. We acquire a lock named 'dyn_view' using GET_LOCK(). If that lock has been acquired by another connection, we wait for the number of seconds specified as `timeout` parameter. If `timeout` is NULL, the default value we decide is 5 (please, use the value you think is most appropriate in your situation). We could define two default values, one in dyn_cursor() and one in dyn_lock(). Later, the reason for this will be clear.

If the lock can not be acquired, dyn_lock() throws an error. The SQLSTATE '45000' is the one suggested by the MySQL documentation for user errors. The message is not complete, because this is just an example. But it should contain the name of the Stored Routine that thrown it, if you want make debug easier.

Then, dyn_lock() creates a view with the specified name. And then the execution exits from dyn_lock(). Here we open the cursor (alas, this can’t be done within dyn_lock() because cursors exist at a Routine level), and we release the lock using RELEASE_LOCK().

Let’s talk about the view. Its AS clause is the statement passed through `query` parameter. So, doing a SELECT * on that view is like directly running `query`. The performances are the same, or almost the same. The advantage is that SELECT * FROM `dyn_view` can be hard-coded in a cursor definition (we did it in dyn_cursor()), but `query` is a string that can be generated dynamically.

Views can not be temporary in MariaDB/MySQL (they can in Postgres). This means that when you create/replace a view, other connections could use it. In our case, we must prevent a concurrency like this:
1) A creates `dyn_view`;
2) Just 0.0000001 seconds after, B replaces `dyn_view`;
3) A executes the query on `dyn_view`, knowing not that it has been replaced.
To avoid this, we use the lock I described.

We want to release the lock as soon as possible, to increase performances. We can do this just after opening the cursor. It will work on a query, not on the view itself – so, we don’t need it anymore, and can allow other connections to replace it. In fact, the view is always created (if it doesn’t still exists) or replaced. It is never drop, to speed up things.

It is important that the view and the lock have the same name and are created within dyn_lock(). This way, dyn_lock() contains (part of) the generalized code. Not only dyn_cursor(), but also other Stored Routines, can call dyn_cursor() to get a lock and a view with a given name. If several Routines call dyn_cursor() passing the the same name, they can’t be executed at the same time. For the sake of brevity, I will not show here this case here, but it can be useful. If different Routines call dyn_cursor() with different names, they can be executed at the same time. But still two connections can not execute the same Routine at the same time. However, it is important that Routines release the lock name they acquired as soon as possible (again, see RELEASE_LOCK()).

To Do (or not To Do)

GET_LOCKS() accepts a timeout in seconds, not microseconds. For short queries, you may want to specify a value like 0.5. Or 1.5. If this limitation is a problem for you, you may consider to re-implement dyn_lock() and write an unlocking PROCEDURE. You could acquire locks by inserting your CONNECTION_ID() into a MEMORY (but not temporary!) table, and release the lock by deleting that ID. To check if a lock has already been acquired by someone else, you could use a SELECT ... FOR UPDATE. But there is a complication: a query could be KILLed before it releases its lock. So, you also need to write in the table a TIMESTAMP that indicates when the lock has been acquired. After a given timeout (which can be constant or variable), the lock expires.

If you re-implement the locking system, and you have a high concurrency, you may also consider to use more than one view. The SELECTs on the view must be hard-coded, so the number of allowed views and their names must be hard-coded too. But you could allow 3 view called dyn_view1, dyn_view2, dyn_view3, and create 3 locks with the same names in the MEMORY table. But if you use the GET_LOCK() and RELEASE_LOCK() functions, you need to use only 1 lock, and thus only 1 view.

Please, note that re-implementing the locking system will improve concurrency and timeout granularity, but will add complexity, and thus should reduce the performances in an environment where concurrency is almost irrilevant.

For the sake of brevity, errors generated by the Dynamic SQL are not handled here. But you should handle them.

Limitations and Feature Requests for MariaDB

* WARNING: because of a bug, in MySQL you must avoid to pass negative numbers to GET_LOCK() as timeout, because it is an endless timeout. In MariaDB, this only happens if you pass very low negative numbers (MDEV-4017).
* First of all: if MariaDB adds support for cursors on dynamic SQL statements, all this code will no longer be necessary.
* LIMITATION: This technique can not be used recursively – that is, the query you pass to dyn_cursor() can not contain a call to dyn_cursor(). This is because dyn_lock() uses a Prepared Statement. MariaDB/MySQL’s Prepared Statements exist at a connection-level, and are not recursive. I hope that this in the future we will have a way to implement this type of recursivity.
* #MDEV-4028 GET_LOCK() should accept a timeout argument with granularity in microseconds.
* Dynamic SQL could generate errors or warnings. In that case, it would be useful to get an error message specifying the name of the Prepared Statement and the Stored Routine that thrown it.
* If a Stored Routine could return a cursor or a result set, we could move the OPEN and DO RELEASE_LOCK() lines into dyn_lock(). If we could do this, the code would be easier to maintain.

I use a lot of Stored Programs, and I am working (when I have the time to do it) on a couple of libraries I would like to release publicly. But there are strong limitations on what you can do with Routines, at the moment. I will write a post about what features I think that could make a developer’s life easier. Maybe some of them seem useless, until one starts to use Routines in real life – then, you understand that they could help you writing better (mantainable, general, faster) code. And maybe some of them are not so difficult to implement… at least, I hope so.

See also:

Enjoy!