Reusing Prepared Statements in MariaDB 10.1

I may be wrong, but I think that MariaDB has a strange characteristic: it has many good features, some of which were not implemented intentionally. Well, this sentence is weird too, I know. But I have some examples: I’m not sure that the SEQUENCE engine was designed to generate a sequence of dates and/or times, but it can. And I believe that the CONNECT author had no idea that someone would use his engine to create cursors for CALL statements; but I do.

Now I have a new example. MariaDB 10.1 supports compound statements outside of stored procedures. Which means that you can write IF or WHILE in your install.sql files to create your databases in a dynamic way. This is done via the BEGIN NOT ATOMIC construct.
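For example, a hypothetical install.sql fragment (the myapp names are made up for illustration) could seed a lookup table with a WHILE loop, something a plain SQL script cannot do:

```sql
-- Anonymous compound statement in an install script.
-- DELIMITER makes the client send the whole block at once.
DELIMITER ||
BEGIN NOT ATOMIC
    DECLARE i INT DEFAULT 1;
    WHILE i <= 12 DO
        INSERT IGNORE INTO myapp.month_number (id) VALUES (i);
        SET i := i + 1;
    END WHILE;
END ||
DELIMITER ;
```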

I played with this feature, like I usually do with new features. And what I’ve found out is amazing for me: BEGIN NOT ATOMIC allows us to nest prepared statements!

Uh, wait… maybe what I’ve just written sounds weird to you. Maybe you’re thinking: “prepared statements have been nestable since they were first implemented!”. That is only true in the documentation. The docs don’t lie, of course, but things don’t work that way out there, in the real world’s complexity. PREPARE, EXECUTE and DEALLOCATE PREPARE statements cannot themselves be prepared, and this limitation can be very frustrating if you try to write a reusable (perhaps public) stored procedure library.

I tried to explain the reason in this post, but it was becoming way too long, so I had to delete that boring explanation. I’ll just mention an example. You can write a procedure that prepares and executes a statement; but the prepared statement cannot call the procedure itself, recursively. Why? Because you cannot reuse its name, and cannot dynamically generate a name for the new prepared statement. If this explanation is too short, just code and you’ll find out.
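The limitation itself is easy to reproduce. In my tests, trying to prepare a PREPARE fails immediately:

```sql
-- PREPARE itself cannot be prepared:
PREPARE p FROM 'PREPARE inner_stmt FROM ''SELECT 1''';
-- ERROR 1295 (HY000): This command is not supported in
-- the prepared statement protocol yet
```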

How can BEGIN NOT ATOMIC possibly fix this problem? Well, for a reason that’s far beyond me, the following blocks can be prepared and executed:

  • BEGIN NOT ATOMIC PREPARE ... ; END;
  • BEGIN NOT ATOMIC EXECUTE ... ; END;
  • BEGIN NOT ATOMIC DEALLOCATE PREPARE ... ; END;
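A minimal check that the trick works:

```sql
-- DEALLOCATE cannot be prepared directly, but wrapped in a block it can:
PREPARE p FROM 'BEGIN NOT ATOMIC DEALLOCATE PREPARE some_stmt; END';
-- EXECUTE p would now deallocate some_stmt (assuming it exists);
DEALLOCATE PREPARE p;
```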

Now you may be thinking that this feature is totally useless. But it isn’t. Thanks to this change, I’ve written a procedure that:

  • Executes a specified SQL string.
  • Autogenerates a “free” name for that statement, or uses an id passed by the user.
  • Returns the autogenerated id, so you can reuse it, or deallocate the statement.

Writing this procedure has been a bit annoying, because after all it uses a dirty trick. But now the procedure is written, and the dirty trick is encapsulated in it. You can use it as if it was a native feature, and forget the trick. Here’s the code:

CREATE DATABASE IF NOT EXISTS _;

CREATE TABLE IF NOT EXISTS _.prepared_statement
(
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
)
    ENGINE = MEMORY
;

DELIMITER ||
CREATE PROCEDURE _.exec(IN p_sql TEXT, INOUT p_stmt_id INTEGER UNSIGNED)
BEGIN
    IF p_stmt_id IS NULL THEN
        INSERT INTO _.prepared_statement (id) VALUES (DEFAULT);
        SET p_stmt_id := LAST_INSERT_ID();
    END IF;

    -- strip any trailing ';' from the user's SQL, then terminate the
    -- PREPARE statement ourselves
    SET @_SQL_exec := CONCAT(
        'BEGIN NOT ATOMIC PREPARE stmt_dyn_', p_stmt_id, ' '
        , 'FROM ', QUOTE(TRIM(TRAILING ';' FROM p_sql)), '; '
        , 'END;'
    );
    PREPARE _stmt_exec FROM @_SQL_exec;
    EXECUTE _stmt_exec;

    SET @_SQL_exec := CONCAT(
        'BEGIN NOT ATOMIC EXECUTE stmt_dyn_', p_stmt_id, '; END;'
    );
    PREPARE _stmt_exec FROM @_SQL_exec;
    EXECUTE _stmt_exec;
    DEALLOCATE PREPARE _stmt_exec;
    SET @_SQL_exec := NULL;
END ||
DELIMITER ;

How do I use it? Very simple:

MariaDB [_]> -- redundant: @id is not set, so it's NULL
MariaDB [_]> SET @id := NULL;
Query OK, 0 rows affected (0.00 sec)
MariaDB [_]> CALL _.exec('SELECT 42 AS answer', @id);
+--------+
| answer |
+--------+
|     42 |
+--------+
1 row in set (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
MariaDB [_]> -- reuse @id
MariaDB [_]> CALL _.exec('SHOW SCHEMAS LIKE \'info%\'', @id);
+--------------------+
| Database (info%)   |
+--------------------+
| information_schema |
+--------------------+
1 row in set (0.00 sec)
Query OK, 0 rows affected (0.00 sec)

I am writing a general purpose stored procedure library, which should make stored procedure development friendlier. It will include this procedure, as well as a procedure for deallocating a specified statement, or all statements you prepared. As soon as the code is interesting and tested, I’ll make it public.
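That deallocation procedure isn’t shown here; a minimal sketch of how a single-statement version might look (the names are mine, not necessarily the library’s):

```sql
-- Hypothetical companion to _.exec(): drop one dynamic statement by id.
DELIMITER ||
CREATE PROCEDURE _.dealloc(IN p_stmt_id INTEGER UNSIGNED)
BEGIN
    SET @_SQL_dealloc := CONCAT(
        'BEGIN NOT ATOMIC DEALLOCATE PREPARE stmt_dyn_', p_stmt_id, '; END;'
    );
    PREPARE _stmt_dealloc FROM @_SQL_dealloc;
    EXECUTE _stmt_dealloc;
    DEALLOCATE PREPARE _stmt_dealloc;
    SET @_SQL_dealloc := NULL;
    -- free the id for reuse
    DELETE FROM _.prepared_statement WHERE id = p_stmt_id;
END ||
DELIMITER ;
```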

Enjoy!
Federico

SQL Games

I have just created a GitHub repository called sql_games. It contains games implemented as stored procedures, which can run on MariaDB or (with some changes) on Percona Server or Oracle MySQL.

You play the games via the command-line client. You call a procedure to make your move, then a text or an ASCII image appears.

Of course the same call should produce a different effect, depending on the game’s current state. To remember the state I use both user variables and temporary tables.
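As an illustration of the idea (a sketch, not actual code from the repository), a game’s state can survive between CALLs within one session like this:

```sql
-- Hypothetical per-session state for a hangman-style game.
CREATE TEMPORARY TABLE IF NOT EXISTS hangman_guess
(
    letter CHAR(1) NOT NULL PRIMARY KEY
);
-- a user variable keeps a counter across calls
SET @hangman_errors := IFNULL(@hangman_errors, 0);
-- each move records the guessed letter; duplicate guesses are ignored
INSERT IGNORE INTO hangman_guess (letter) VALUES ('e');
```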

Why did I do that? Mainly because it was fun. I can’t explain why. And I can’t say that it would be fun for everyone: perhaps, where I’ve found interesting challenges, someone else would find a cause of frustration. But yes, it has been fun for me.

Also, I did it because I could. This means that others can do it. Stored procedures are not a useless and nasty feature that users should ignore – they are useful tools. Yes, several times I’ve complained that they need important improvements. But I complain because I like them.

Currently, three games are implemented:

  • Anagram – The anagram game. See a randomly generated anagram and try to guess the word. You can choose a language, and a min/max word length.
  • Bulls And Cows – If you don’t know the game, take a look at the Wikipedia page.
  • Hangman – Try to guess letters and see an ASCII hangman materializing slowly while you fail.

Each game will be installed in a different database. CALL help() in the proper database to see how to play.

Drop Primary Key if exists in MariaDB

MariaDB improved the use of IF EXISTS and IF NOT EXISTS clauses in two ways: they added them to almost all DDL statements (along with OR REPLACE), and added them to several ALTER TABLE sub-commands.

So one can run, for example:

ALTER TABLE nice_table DROP INDEX IF EXISTS ugly_index;
DROP INDEX IF EXISTS ugly_index ON nice_table;

However, there is no such thing as DROP PRIMARY KEY IF EXISTS. Why? I think that we can all easily guess: a primary key is too important for us to be unsure whether it exists or not. That’s a reasonable assumption… in theory.

In practice, all limitations are, at least in rare cases, not so reasonable. Yes, today I had to write a procedure that drops the PK if it exists, knowing that sometimes it doesn’t. Explaining the logic behind this weird behavior requires space and time (which seem to be the same thing, if we trust modern physics). I’ll just say that I need to do complex normalization tasks – data from 1 table to 9 tables… really. In the process I need to accept duplicates, and later remove them.

So, can we easily drop/create a PK IF [NOT] EXISTS? Yes, because from a DDL’s point of view, a PK is just an index called `PRIMARY`. So:

ALTER TABLE nice_table DROP INDEX IF EXISTS `PRIMARY`;
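A quick round trip against a scratch table shows the idempotence:

```sql
CREATE OR REPLACE TABLE pk_test
(
    id INT NOT NULL PRIMARY KEY
);
-- drops the PK; running it again is a no-op instead of an error:
ALTER TABLE pk_test DROP INDEX IF EXISTS `PRIMARY`;
ALTER TABLE pk_test DROP INDEX IF EXISTS `PRIMARY`;
-- and the PK can be recreated afterwards:
ALTER TABLE pk_test ADD PRIMARY KEY (id);
```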

Enjoy!
Federico

MariaDB/MySQL: ON REPLACE triggers

There are several reasons why, generally, in MySQL/MariaDB one should not use REPLACE as a shortcut for SELECT + (UPDATE or INSERT). One of these reasons is the way REPLACE fires triggers. Of course it does not fire UPDATE triggers, and there is no such thing as a REPLACE trigger; DELETE and INSERT triggers are fired instead. What is not obvious is the order in which REPLACE fires triggers (UPDATE: this behavior was undocumented; I documented it here):

  1. BEFORE INSERT;
  2. BEFORE DELETE (if a row is being replaced);
  3. AFTER DELETE (if a row is being replaced);
  4. AFTER INSERT.

Knowing this, we can create REPLACE triggers. Or, better said, we can create a Stored Procedure that is invoked before or after each row’s replacement. It is not exactly easy, but it can be done. Some user variables will help us in two ways:

  • They will tell each trigger which triggers have been fired before;
  • they will be used to access NEW values in DELETE triggers.

Now, take a look at the following example. The notes below will make some points clearer.

CREATE OR REPLACE TABLE xmp
(
    a INT NOT NULL PRIMARY KEY,
    b INT NOT NULL
)
    ENGINE = InnoDB
;

DELIMITER ||
CREATE PROCEDURE xmp_on_replace(IN old_a INT, IN new_a INT, IN old_b INT, IN new_b INT)
    CONTAINS SQL
BEGIN
    SET @out := CONCAT(
        'In column b, we are replacing ', old_b, ' with ', new_b, '. Cool!'
    );
END ||

CREATE TRIGGER xmp_bi
    BEFORE INSERT
    ON xmp
    FOR EACH ROW
BEGIN
    SET @maybe_replacing := TRUE;
    SET @new_a := NEW.a;
    SET @new_b := NEW.b;
END ||

CREATE TRIGGER xmp_ai
    AFTER INSERT
    ON xmp
    FOR EACH ROW
BEGIN
    SET @maybe_replacing := FALSE;
END ||

CREATE TRIGGER xmp_bd
    BEFORE DELETE
    ON xmp
    FOR EACH ROW
BEGIN
    IF @maybe_replacing THEN
        SET @maybe_replacing := FALSE;
        CALL xmp_on_replace(OLD.a, @new_a, OLD.b, @new_b);
        SET @new_a := NULL;
        SET @new_b := NULL;
    END IF;
END ||
DELIMITER ;

When no row is being replaced, the workflow will be the following:

  1. The BEFORE INSERT trigger sets some user variables, which will only be useful if a row is replaced.
  2. The AFTER INSERT trigger resets these variables.

If a row is being replaced, the following workflow will take place:

  1. The BEFORE INSERT trigger will set some variables.
  2. The BEFORE DELETE trigger will reset them and call xmp_on_replace(); both OLD.* and NEW.* values are passed, thanks to the user variables.
  3. xmp_on_replace() gives us some information about the row being replaced.

Now, let’s test this example:

MariaDB [test]> INSERT INTO xmp VALUES (2, 0);
Query OK, 1 row affected (0.01 sec)

MariaDB [test]> REPLACE xmp VALUES (1, 100), (2, 200), (3, 300);
Query OK, 4 rows affected (0.01 sec)
Records: 3 Duplicates: 1 Warnings: 0

MariaDB [test]> SELECT @out;
+-------------------------------------------------+
| @out                                            |
+-------------------------------------------------+
| In column b, we are replacing 0 with 200. Cool! |
+-------------------------------------------------+
1 row in set (0.00 sec)

Hey, it works as expected!

Enjoy!
Federico

How much memory should we assign to MariaDB?

How much memory should we assign to MariaDB?

Before answering, I’ll show you a procedure I use to get the amount of memory I’m using. See the comments to make it work on Oracle MySQL.

DELIMITER ||
CREATE PROCEDURE _.show_buffers()
	CONTAINS SQL
	COMMENT 'Show information about buffers size'
BEGIN
	SET @memory_per_thread := (
			  @@global.read_buffer_size
			+ @@global.read_rnd_buffer_size
			+ @@global.sort_buffer_size
			+ @@global.thread_stack
			+ @@global.join_buffer_size
			+ @@global.binlog_cache_size
		);
	SET @global_memory := (
			  @@global.innodb_buffer_pool_size
			+ @@global.innodb_additional_mem_pool_size
			+ @@global.innodb_log_buffer_size
			+ (SELECT SUM(FULL_SIZE) FROM information_schema.KEY_CACHES) -- MySQL: @@global.key_buffer_size
			+ @@global.aria_pagecache_buffer_size -- only MariaDB
			+ @@global.query_cache_size
		);
	SELECT
		  @global_memory AS global_memory
		, @memory_per_thread AS memory_per_thread
		, @@global.max_connections AS max_connections
		, @memory_per_thread * @@global.max_connections AS max_session_memory
		, CAST((@global_memory + @memory_per_thread * @@global.max_connections) AS UNSIGNED) AS max_total_memory
		;
	SET @memory_per_thread := NULL;
	SET @global_memory := NULL;
END ||
DELIMITER ;


Results on my dev machine:

MariaDB [(none)]> CALL _.show_buffers() \G
*************************** 1. row ***************************
     global_memory: 419430400
 memory_per_thread: 2949120
   max_connections: 151
max_session_memory: 445317120
  max_total_memory: 864747520
1 row in set (0.00 sec)


So, what do these columns mean?

  • global_memory: Memory used by global buffers.
  • memory_per_thread: Memory used by each thread (session buffers).
  • max_connections: Max allowed concurrent connections (the max_connections variable).
  • max_session_memory: Memory used by each thread, multiplied by the max allowed concurrent connections.
  • max_total_memory: Total memory occupied by MariaDB with the maximum number of concurrent connections.

Let’s see which settings affect these values.

global_memory = innodb_buffer_pool_size + innodb_additional_mem_pool_size + innodb_log_buffer_size + key_buffer_size + aria_pagecache_buffer_size + query_cache_size.

memory_per_thread = read_buffer_size + read_rnd_buffer_size + sort_buffer_size + thread_stack + join_buffer_size + binlog_cache_size.
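As a sanity check, plugging the sample numbers above into these formulas gives back the reported totals:

```sql
SELECT 2949120 * 151             AS max_session_memory  -- 445317120
     , 419430400 + 2949120 * 151 AS max_total_memory;   -- 864747520
```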

If you use non-common storage engines, you may want to add more settings to these formulas. For example, if you use TokuDB, you should add tokudb_cache_size.

So, basically, my advice is: tune these values so that max_total_memory reaches 80% of your memory. Start by setting query_cache_size and the caches of unused storage engines to 0, then tune the buffer pool. Usually you don’t want to touch the session buffers.
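In the 10.0/10.1 era the buffer pool size is not dynamic, so this tuning happens in my.cnf; a sketch (the values are placeholders, not recommendations):

```
[mysqld]
# caches of unused features set to zero
query_cache_type = 0
query_cache_size = 0
# then tune the main buffer (placeholder value)
innodb_buffer_pool_size = 6G
```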

Why 80%? Well, a higher percentage could be OK, but I consider 80% safer for the general case. Because:

  • The system needs memory, too.
  • The InnoDB buffer pool slightly exceeds the limit you set.
  • Some operations (like building a big index or repairing a non-transactional table) require memory.
  • Some connections could resize their session buffers.
  • There are more buffers that we didn’t consider here.
  • You don’t want MariaDB/MySQL to swap. But if you turn off swap and run out of memory, you could experience various failures, or even crashes.

Creating JSON documents with MariaDB

You probably know about two great MariaDB features: Dynamic Columns and Virtual Columns. I don’t think we have found all their possible use cases yet: more ideas will come in the future, and hopefully someone will write about them. Here, I will describe a use case which involves both Virtual Columns and Dynamic Columns.

Now, suppose that we have relational data in MariaDB and we want to periodically export it as JSON, so we can feed it to a variety of other data sources. For most data formats, the best export solution is generally the CONNECT storage engine; but CONNECT doesn’t currently support JSON, so this option is not available.

Here I will introduce you to the best solution I know. Then, you will find slightly more complex solutions for specific cases.

Our data

Before we can start having fun with MariaDB features, I will show our data. Dear data, this is my reader… say hello!

CREATE OR REPLACE TABLE category
(
	  id INTEGER AUTO_INCREMENT PRIMARY KEY
	, name VARCHAR(50) NOT NULL
)
	ENGINE InnoDB
	CHARACTER SET utf8
	;

INSERT INTO category (name) VALUES ('Chair'), ('Tables');

CREATE OR REPLACE TABLE product
(
	  id INTEGER AUTO_INCREMENT PRIMARY KEY
	, category INTEGER NOT NULL
	, name VARCHAR(50) NOT NULL
	, price DECIMAL(9, 2) NOT NULL
	, FOREIGN KEY (category) REFERENCES category (id)
)
	ENGINE InnoDB
	CHARACTER SET utf8
	;

INSERT INTO product (category, name, price) VALUES
	  (1, 'Black Chair', '49.99')
	, (1, 'Red Chair', '44.00')
	, (2, 'Glass Table', '250.00')
	;

COLUMN_JSON()

MariaDB supports Dynamic Columns. The COLUMN_JSON() function is used to transform a Dynamic Column value into JSON. The apparent problem here is: we don’t have a dynamic column, we have normal atomic columns.

Well, no problem. After all, a Dynamic Column is just a BLOB (binary) value encoded in a certain way, so that Dynamic Column functions can parse and modify it. We can create a dynamic value using COLUMN_CREATE(), and modify it with COLUMN_ADD(), COLUMN_DELETE(), etc.
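Before the full query, a one-liner shows the round trip from atomic values to JSON:

```sql
SELECT COLUMN_JSON(COLUMN_CREATE('id', 1, 'name', 'Black Chair')) AS json;
-- {"id":1,"name":"Black Chair"}
```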

That said, the following example query should be self-explanatory:

SELECT CONCAT('[\n\t', GROUP_CONCAT(
		COLUMN_JSON(
			COLUMN_ADD(
				COLUMN_CREATE('id', id)
				, 'name', name
				, 'price', price
			)
		)
		ORDER BY id
		SEPARATOR ',\n\t'
	), '\n]') AS json
	FROM product \G


Its output is:

*************************** 1. row ***************************
json: [
	{"id":1,"name":"Black Chair","price":49.99},
	{"id":2,"name":"Red Chair","price":44.00},
	{"id":3,"name":"Glass Table","price":250.00}
]
1 row in set (0.00 sec)

Neat, isn't it? Of course it has some limits:

  • It's expensive. No temporary tables are needed, but MariaDB does a full table scan and transforms the rows on the fly.
  • What if we want to build a JSON array from multiple (joined) tables? Unless there is a time of day when MariaDB has nothing to do, we don't want to do more than one table scan.

Let's see how we can optimize the basic idea.

Pre-transforming each product

MariaDB also supports Virtual Columns (other DBMSs call them Generated Columns or Computed Columns). These fields are deterministic expressions that don't involve subqueries, and they can be computed on the fly or permanently stored on disk. They are useful in a variety of situations, sometimes in combination with Dynamic Columns.

In our example, we can use a PERSISTENT Virtual Column to store the JSON representation of each product, so that it doesn't need to be generated on the fly. Here's how:

ALTER TABLE product ADD COLUMN IF NOT EXISTS json TEXT AS (
	COLUMN_JSON(
		COLUMN_ADD(
			COLUMN_CREATE('id', id)
			, 'name', name
			, 'price', price
		)
	)
) PERSISTENT;

To get a complete JSON array from these objects:

SELECT CONCAT(
	  '[\n\t'
	, GROUP_CONCAT(json ORDER BY id SEPARATOR ',\n\t')
	, '\n]') AS json
	FROM product \G

Output:

*************************** 1. row ***************************
json: [
	{"id":1,"name":"Black Chair","price":49.99},
	{"id":2,"name":"Red Chair","price":44.00},
	{"id":3,"name":"Glass Table","price":250.00}
]
1 row in set (0.00 sec)

Pre-transforming the complete array

Maybe we don't want to query a single table. Since Virtual Columns cannot be computed with a subquery, we cannot use them to put data from other tables into each product's JSON object.

A good alternative is storing the JSON objects in a summary table that is updated every time someone modifies the category or product tables.

I don't like to do this kind of thing within the application, because more than one application could modify the tables, and we could also modify them via SQL queries (or via a GUI). To reliably guarantee integrity in such cases, I prefer to use foreign keys and triggers.

I will not show the complete example here, because it is too complex: three kinds of actions (INSERT/UPDATE/DELETE) should be handled (or forbidden) for both the product and category tables. I will just show the summary table, and the stored procedure that creates a JSON object for one product. INSERT and UPDATE triggers will call this procedure when necessary. Here's the code:

CREATE OR REPLACE TABLE json
(
	  product INTEGER NOT NULL PRIMARY KEY
	, json TEXT
)
	ENGINE InnoDB
	CHARACTER SET utf8
	;

DELIMITER ||
CREATE PROCEDURE make_product_json(IN p_product_id INTEGER)
	MODIFIES SQL DATA
BEGIN
	REPLACE INTO json SELECT
		p_product_id
		, COLUMN_JSON(
			COLUMN_ADD(
				COLUMN_CREATE('id', p.id)
				, 'name', p.name
				, 'price', p.price
				, 'category', c.name
			)
		) AS json
		FROM product p
		LEFT JOIN category c
			ON p.category = c.id
		WHERE p.id = p_product_id
	;
END
||
DELIMITER ;
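For completeness, a sketch of what one of the missing triggers could look like (this is my illustration, not the article's full example). The JSON is built from NEW.* plus a lookup on category, so the trigger never reads its own subject table:

```sql
DELIMITER ||
-- Hypothetical trigger: keep the summary table in sync on INSERT.
CREATE TRIGGER product_ai
	AFTER INSERT
	ON product
	FOR EACH ROW
BEGIN
	REPLACE INTO json
		SELECT NEW.id
			, COLUMN_JSON(
				COLUMN_CREATE(
					  'id', NEW.id
					, 'name', NEW.name
					, 'price', NEW.price
					, 'category', c.name
				)
			)
		FROM category c
		WHERE c.id = NEW.category
	;
END ||
DELIMITER ;
```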

MariaDB and Stored Procedures: errata & repository

In the article How MariaDB makes Stored Procedures usable I explained how to use the MariaDB CONNECT Storage Engine to workaround Stored Procedures limitations. Please read that article, if you didn’t, for details.

Since the technique to work around such limitations is verbose and error-prone, I also proposed a procedure which makes it easy and convenient. Even though this is a simple country blog, I immediately received positive feedback after that article. But then, people started to write me: “Hey, it doesn’t work!”. The reason is… MariaDB/MySQL bugs. While investigating these problems, I was able to report the following bugs:

I’m sure there are others, but isolating such bugs to produce decent test cases can be difficult. I will do, if I can.

Basically, what I found while developing the procedures I needed was that some combinations of (correct) statements could mysteriously trigger bugs that are not triggered by other combinations of statements. Despite those procedures being trivial, testing their behavior was frustrating.

I was able to develop a version of these procedures which does not trigger any bug, and I put them in the materialize repository, on GitHub.

I can finally suggest that people try them. If you have any problems (because there are bugs in the procedures, or in MariaDB, or you don’t understand something) feel free to contact me at info [At, federico - razzoli /Dot, com (replace [At, with @ and /Dot, with .). I will help. But be very patient or pay me: remember that 3 days are not a long time for a free service. :)

Enjoy!