MariaDB/MySQL missing features: View comments

The COMMENT clause

All database components (and database themselves) should have a COMMENT clause in their CREATE/ALTER statements, and a *_COMMENT column in the information_schema. For example:

        COMMENT 'We''re lazy. Let''s do nothing!'
        DO NULL;

In fact most database objects have those clauses in MySQL/MariaDB, but not all. Views are an exception.

Comments in code

MariaDB and MySQL have multiple syntaxes for comments. Including executable comments (commented code that is only executed on some MySQL/MariaDB versions).

One can use comments in stored procedures and triggers, and those codes are preserved:

        -- We're lazy. Let's do nothing!
        DO NULL;

But, there are a couple problems:

  • This doesn’t work for views.
  • mysql client strips comments away, unless it’s started with --comments parameter. So, by default, procedures created with mysql have no comments.


Views have no comment. No comments in metadata, no comments in code.

This prevents us to create self-documenting databases. Even if names are self-documenting, we may still need to add notes like “includes sold-out products”, or “very slow”.

Criticism makes us better

As a final note, let me say that this post is not an attack against MariaDB or MySQL. It is criticism, yes, because I like MariaDB (and MySQL). Criticism helps, keeps projects alive, encourages discussions.

To explain what I mean, I’ll show you a negative example. Recently I’ve attended a public talk from a LibreOffice Italia’s guy. It shown us a chart demonstrating that, according a generic “independent American study”, LibreOffice has no bug, while MS Office is full of bugs. The guy seems to think that software lint-like can automatically search for bugs. First I wondered how can LibreOffice survive with a community that is totally unable to produce criticism. Then I realized why it is the most buggy piece of software I’ve ever tried.

Hiding problems is the safest way to make those problems persist.

I’m happy that my favorite DBMS’s get the necessary amount of criticism.


Using apt-get update in a Dockerfile

Dockerfiles are the source code of Docker images. While they are amazingly simple, they hide some traps that the Dockerfile authors should be aware of.

Now, how does Docker build an image from a Dockerfile? It runs its statements sequentially and synchronously. For each statement, it starts a new container from the existing image, it runs the statement in the container, and creates a new image from the container.

You may think that this process is quite slow, and you are right. That’s why Docker has a cache – that means, doesn’t rebuild images that are already present. If the first statements in the file are cached, Docker skips them, until it find a statement that is not cached (because it was added or changed after the last build). From that point, it won’t use the old cache anymore.

That’s why statements adding metadata that are subject to changes (such as most LABELs) should be at the end of the Dockerfile: we don’t want Docker to rebuild all intermediary images just because we added a label.

But there are other consequences. Think about the following lines:

RUN apt-get update
RUN apt-get install apache2

Now consider the following line:

RUN apt-get update && apt-get install apache2

Apparently, they have the same effects. But what happens if you change the package list, adding an Apache module or replacing Apache with Nginx? In the first example, apt-get update is not executed again, in the second case it is. Usually we prefer the second option: periodically updating the packages is up to the user. Image maintainers should not rebuild their images each time a package version is released.

But there’s another case. Suppose that we don’t want to change the package list, but we need apt-get update to be executed again to fix some package security bug. We could build the Dockerfile with the --no-cache option. Well, it works, but it completely ignores the existing cache. That option is better used occasionally to address cache problems, not periodically to keep the system updated.

Another way is adding a cheap statement before apt-get update. We also want that the purpose of the new statements to be self-documenting. The almost-standard way is:

ENV UPDATED_ADD 2016-02-23
RUN apt-get update && apt-get install apache2

This sets an environment variable that can also be used to find out when the Dockerfile was last modified up to that line. The rest of the file could be modified months later, so you can set that variable again if necessary. Remember that it only represents the Dockerfile last changes: an image could still be built with --no-cache.


Reusing Prepared Statements in MariaDB 10.1

I may be wrong, but I think that MariaDB has a strange characteristic: it has many good features, some of which were not implemented intentionally. Well, this sentence is weird too, I know. But I have some examples: I’m not sure that the SEQUENCE engine was designed to generate a sequence of dates and/or times; but it can. And I believe that the CONNECT author had no idea that someone would have used his engine to create cursors for CALL statements; but I do.

Now I have a new example. MariaDB 10.1 supports compound statements out of stored procedures. Which means that you can write IF or WHILE in your install.sql files to create your databases in a dynamic way. This is done via the BEGIN NOT ATOMIC construct.

I played with this feature, like I usually do with new features. And what I’ve found out is amazing for me: BEGIN NOT ATOMIC allows us to nest prepared statements!

Uh, wait… maybe what I’ve just written sounds weird to you. Maybe you’re thinking: “prepared statements can be nested since they were first implemented!”. Which is only true in the documentation. The docs doesn’t lie of course, but it doesn’t work out there, in the real world’s complexity. PREPARE, EXECUTE and DEALLOCATE PREPARE statements cannot be prepared, and this limitation can be very frustrating if you try to write a reusable (perhaps public) stored procedure library.

I tried to explain the reason in this post, but it was becoming way too long, so I had to delete that boring explanation. I’ll just mention an example. You can write a procedure that prepares and executes a statement; but the prepared statement cannot call the procedure itself, recursively. Why? Because you cannot reuse its name, and annot dynamically generate a name for the new prepared statement. If this explanation is too short, just code and you’ll find out.

How can BEGIN NOT ATOMIC possibly fix this problem? Well, for a reason that’s far beyond me, the following blocks can be prepared and executed:


Now you may be thinking that this feature is totally useless. But it isn’t. Thanks to this change, I’ve written a procedure that:

  • Executes a specified SQL string.
  • Autogenerates a “free” name for that statement, or uses an id passed by the user.
  • Returns the autogenerated id, so you can reuse it, or deallocate the statement.

Writing this procedure has been a bit annoying, because after all it uses a dirty trick. But now the procedure is written, and the dirty trick is encapsulated in it. You can use it as if it was a native feature, and forget the trick. Here’s the code:


CREATE TABLE IF NOT EXISTS _.prepared_statement

    IF p_stmt_id IS NULL THEN
        INSERT INTO _.prepared_statement (id) VALUES (DEFAULT);
        SET p_stmt_id := LAST_INSERT_ID();
    END IF;
    SET @_SQL_exec := CONCAT(
        'BEGIN NOT ATOMIC PREPARE stmt_dyn_', p_stmt_id, ' '
        , 'FROM ', QUOTE(p_sql), IF(RIGHT(p_sql, 1) = ';', ' ', '; ')
        , 'END;'
    PREPARE _stmt_exec FROM @_SQL_exec;
    EXECUTE _stmt_exec;
    SET @_SQL_exec := CONCAT(
        'BEGIN NOT ATOMIC EXECUTE stmt_dyn_', p_stmt_id, '; END;'
    PREPARE _stmt_exec FROM @_SQL_exec;
    EXECUTE _stmt_exec;
    DEALLOCATE PREPARE _stmt_exec;
    SET @_SQL_exec := NULL;

How do I use it? Very simple:

MariaDB [_]> -- redundant: @id is not set, so it's NULL
MariaDB [_]> SET @id := NULL;
Query OK, 0 rows affected (0.00 sec)
MariaDB [_]> CALL _.exec('SELECT 42 AS answer', @id);
| answer |
| 42 |
1 row in set (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
MariaDB [_]> -- reuse @id
MariaDB [_]> CALL _.exec('SHOW SCHEMAS LIKE \'info%\'', @id);
| Database (info%) |
| information_schema |
1 row in set (0.00 sec)
Query OK, 0 rows affected (0.00 sec)

I am writing a general purpose stored procedure library, that should make stored procedures developing more friendly. It will include this procedure, as well as a procedure for deallocating a specified statement, or all statements you prepared. As soon as the code is interesting and tested, I’ll make it public.


SQL Games

I have just created a GitHub repository called sql_games. It contains games implemented as stored procedures, that can run on MariaDB or (with some changes) on Percona Server or Oracle MySQL.

You play the games via the command-line client. You call a procedure to make your move, then a text or an ASCII image appears.

Of course the same call should produce a different effect, depending on the game’s current state. To remember the state I use both user variables and temporary tables.

Why did I do that? Mainly because it was funny. I can’t explain why. And I can’t tell that it is funny for anyone: perhaps, where I’ve found interesting challenges, someone else would find a cause a frustration. But yes, it has been funny for me.

Also, I did it because I could. This means that others can do it. Stored procedures are not a useless and nasty feature that users should ignore – they are useful tools. Yes, several times I’ve complained that they need important improvements. But I complain because I like them.

Currently, three games are implemented:

  • Anagram – The anagram game. See a randomly generated anagram and try to guess the word. You can choose a language, and a min/max word length.
  • Bulls And Cows – If you don’t know the game, take a look at the Wikipedia page.
  • Hangman –  Try to guess letters and see an ASCII hangman materializing slowly while you fail.

Each game will be installed in a different database. CALL help() in the proper database to see how to play.

Drop Primary Key if exists in MariaDB

MariaDB improved the use of IF EXISTS and IF NOT EXISTS clauses in two ways: they added them to almost all DDL statements (along with OR REPLACE), and added them to several ALTER TABLE sub-commands.

So one can run, for example:

ALTER TABLE nice_table DROP INDEX IF EXISTS ugly_index;
DROP INDEX IF EXISTS ugly_index ON nice_table;

However, there is no such thing as DROP IF EXISTS PRIMARY KEY. Why? I think that we all can easily guess: a primary key is too important to be unsure if it exists or not. That’s a reasonable assumption… in theory.

In practice, all limitations are, at least in rare cases, not so reasonable. Yes, today I had to write a procedure that drops the PK if it exists, knowing that sometimes it doesn’t. Explaining the logic behing this weird behavior requires space and time (that seem to be the same thing, if we trust modern physics). I’ll just say that I need to do complex normalization tasks – data from 1 table to 9 tables… really. In the process I need to accept duplicates, and later remove them.

So, can we easily drop/create a PK IF [NOT] EXISTS? Yes, because from a DDL’s point of view, a PK is just an index called `PRIMARY`. So:



MariaDB/MySQL: ON REPLACE triggers

There are several reasons why, generally, in MySQL/MariaDB one should not use REPLACE as a shortcut for SELECT + (UPDATE or INSERT). One of these reasons is the way REPLACE fires triggers. Of course it does not fire UPDATE triggers, and there is no such thing as a REPLACE trigger; DELETE and INSERT triggers are fired instead. What is not obvious is the order in which REPLACE activates triggers (UPDATE: this beavior was undocumented; I documented it here):

  2. BEFORE DELETE (if a row is being replaced);
  3. AFTER DELETE (if a row is being replaced);

Knowing this, we can create REPLACE triggers. Or, better said, we can create a Stored Procedure that is invoked before or after each row’s replacement. It is not exactly easy, but it can be done. Some user variables will help us in two ways:

  • They will tell each trigger which triggers have been fired before;
  • they will be used to acces NEW values in a DELETE trigger.

Now, take a look at the following example. The notes below will make some points clearer.

    ENGINE = InnoDB

CREATE PROCEDURE xmp_on_replace(IN old_a INT, IN new_a INT, IN old_b INT, IN new_b INT)
    SET @out := CONCAT(
        'In column b, we are replacing ', old_b, ' with ', new_b, '. Cool!'
END ||

    ON xmp
    SET @maybe_replacing := TRUE;
    SET @new_a := NEW.a;
    SET @new_b := NEW.b;
END ||

    ON xmp
    SET @maybe_replacing := FALSE;
END ||

    ON xmp
    IF @maybe_replacing THEN
        SET @maybe_replacing := FALSE;
        CALL xmp_on_replace(OLD.a, @new_a, OLD.b, @new_b);
        SET @new_a := NULL;
        SET @new_b := NULL;
    END IF;
END ||

When no row is being replaced, the workflow will be the following:

  1. The BEFORE INSERT triggers sets some user variables, that will only be useful if a row will be replaced.
  2. The AFTER INSERT trigger resets these variables.

If a row is being replaced, the following workflow will take place:

  1. The BEFORE INSERT trigger will set some variables.
  2. The BEFORE DELETE trigger will reset them and call xmp_on_replace(); both OLD.* and NEW.* values are passed, thanks to the user variables.
  3. xmp_on_replace() gives us some information about the row being replaced.

Now, let’s test this example:

MariaDB [test]> INSERT INTO xmp VALUES (2, 0);
Query OK, 1 row affected (0.01 sec)

MariaDB [test]> REPLACE xmp VALUES (1, 100), (2, 200), (3, 300);
Query OK, 4 rows affected (0.01 sec)
Records: 3 Duplicates: 1 Warnings: 0

MariaDB [test]> SELECT @out;
| @out                                            |
| In column b, we are replacing 0 with 200. Cool! |
1 row in set (0.00 sec)

Hey, it works as expected!


How much memory should we assign to MariaDB?

How much memory should we assign to MariaDB?

Before answering, I’ll show you a procedure I use to get the amount of memory I’m using. See the comments to make it work on Oracle MySQL.

CREATE PROCEDURE _.show_buffers()
	COMMENT 'Show information about buffers size'
	SET @memory_per_thread := (
			+ @@global.read_rnd_buffer_size
			+ @@global.sort_buffer_size
			+ @@global.thread_stack
			+ @@global.join_buffer_size
			+ @@global.binlog_cache_size
	SET @global_memory := (
			+ @@global.innodb_additional_mem_pool_size
			+ @@global.innodb_log_buffer_size
			+ (SELECT SUM(FULL_SIZE) FROM information_schema.KEY_CACHES) -- MySQL: @@global.key_buffer_size
			+ @@global.aria_pagecache_buffer_size -- only MariaDB
			+ @@global.query_cache_size
		  @global_memory AS global_memory
		, @memory_per_thread AS memory_per_thread
		, @@global.max_connections AS max_connections
		, @memory_per_thread * @@global.max_connections AS max_session_memory
		, CAST((@global_memory + @memory_per_thread * @@global.max_connections) AS UNSIGNED) AS max_total_memory
	SET @memory_per_thread := NULL;
	SET @global_memory := NULL;

Results on my dev machine:

MariaDB [(none)]> CALL _.show_buffers() \G
*************************** 1. row ***************************
     global_memory: 419430400
 memory_per_thread: 2949120
   max_connections: 151
max_session_memory: 445317120
  max_total_memory: 864747520
1 row in set (0.00 sec)

So, what do these columns mean?

  • global_memory: Memory used by global buffers.
  • memory_per_thread: Memory used by each thread (session buffers).
  • max_connections: Max allowed concurrent connectins (max_connections variable).
  • max_session_memory: Memory used by each thread, multiplied by the max allowed concurrent connections.
  • max_total_memory: Total memory occupied by MariaDB with the maximum of concurrent connections.

Let’s see which settings affect these values.

global_memory = innodb_buffer_pool_size + innodb_additional_mem_pool_size + innodb_log_buffer_size + key_buffer_size + aria_pagecache_buffer_size + query_cache_size.

memory_per_thread = read_buffer_size + read_rnd_buffer_size + sort_buffer_size + thread_stack + join_buffer_size + binlog_cache_size.

If you use non-common storage engines, you may want to add more settings to these formulas. For example, if you use TokuDB, you should add tokudb_cache_size.

So, basically, my advice is: Tune these values so that max_total_memory reaches 80% your memory. Start by setting query_cache_size and unused storage engines caches to 0, then tune the buffer pool. Usually you don’t want to touch session buffers.

Why 80%? Well, an higher percentage could be ok, but I consider 80% safer for the general case. Because:

  • The system needs memory, too.
  • InnoDB buffer pool slightly exceeds the limit you set.
  • Some operations (like building a big index or repairing a non-transactional table) require memory.
  • Some connections could resize their session buffers.
  • There are more buffers, that we didn’t consider here.
  • You don’t want MariaDB/MySQL to swap. But if you turn off the swap and you run out of memory, you could experience various failures, or even crashes.