MySQL/MariaDB cursors and temp tables

In MariaDB and MySQL, cursors create a temporary table.

Does this statement deserve a whole blog post? Apparently not. However, in some cases one does not expect a temporary table to be created:

  • SELECT ... FOR UPDATE: An exclusive lock is created, yes, but you still read data from a temporary table.
  • SELECT FROM a temporary table: you are reading from a temporary tables, yes, but an internal temporary table is created anyway.
  • Impossible WHERE and LIMIT 0.

A quick example:

CREATE TEMPORARY TABLE t
(
        id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY
);

CREATE PROCEDURE p()
BEGIN
        DECLARE c CURSOR FOR
                SELECT id FROM t WHERE 0 LIMIT 0 FOR UPDATE;
        OPEN c;
        CLOSE c;
END;

MySQL [test]> SHOW STATUS LIKE 'Created_tmp_tables';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| Created_tmp_tables | 31    |
+--------------------+-------+
1 row in set (0.00 sec)

MySQL [test]> CALL p();
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> SHOW STATUS LIKE 'Created_tmp_tables';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| Created_tmp_tables | 32    |
+--------------------+-------+
1 row in set (0.00 sec)

I am not complaining, and I don’t even know if this behavior can be changed. But one should certainly be aware of this behavior. For example, one could think that creating a temporary table one time and then loop on that table with cursors several times is an optimization – but that’s not the case.

Federico

MaxScale binaries are now available

UPDATE: Short story: ignore the next update and read the post. Long story: the original post was a mistake, as explained in the next update. But then, MariaDB released free MaxScale binaries and everything I wrote in the post is now correct.

UPDATE 2016-04-14: It seems that I was mistaken. MaxScale download page is a bit different from MariaDB Enterprise page, and does not explicitly require us to accept terms of use before download. But we accept those terms while creating an account.

So, MaxScale binaries cannot be used in production without paying for MariaDB enterprise. Thanks to the persons who commented this post and pointed my mistake. My apologies to my readers.

I won’t delete this post because I don’t want the comments to disappear, as they express opinions of some community members.

My jestarday’s post Comments on MaxScale binaries followed up a post from Percona’s blog. It had much more visits than any other post I wrote before. It was linked by Peter Zaitsev and Oli Senhauser on social networks. No, this is not a self-advertisement, I’m just saying that the problem I’ve talked about is considered important by the community.

Today, MaxScale binaries are available! Not because of me (obviously), but because MariaDB must have found out that the community badly wants those binaries.

MaxScale 1.4.1 was released today, and it is available from the Database Downloads page on MariaDB.com. You can click on MaxScale and then you can select the version (1.4.1, 1.3.0, 1.2.1) and the system (Debian, Ubuntu, RHEL/CentOS, SLES, both available as deb/rpm or tarball). Registration is required for download, but this is acceptable, as long as the binaries are freely available.

There are no restrictive terms of use. Here is how the copyright note starts:

This source code is distributed as part of MariaDB Corporation MaxScale. It is free
software: you can redistribute it and/or modify it under the terms of the
GNU General Public License as published by the Free Software Foundation,
version 2.

The only problem is the lack for a repository – but now that the binaries are freely available, I expect most Linux distros to provide their packages.

I downloaded the Ubuntu 14.04 version on my Mint machine, and everything worked as expected:

fede-mint-0 ~ # dpkg -i /home/federico/Downloads/maxscale-1.4.1-1.ubuntu_trusty.x86_64.deb 
Selecting previously unselected package maxscale.
(Reading database ... 184280 files and directories currently installed.)
Preparing to unpack .../maxscale-1.4.1-1.ubuntu_trusty.x86_64.deb ...
Unpacking maxscale (1.4.1) ...
Setting up maxscale (1.4.1) ...
Processing triggers for man-db (2.6.7.1-1ubuntu1) ...
Processing triggers for libc-bin (2.19-0ubuntu6.7) ...
fede-mint-0 ~ # maxadmin -uadmin -pmariadb
MaxScale> show servers
Server 0x1b7a310 (server1)
	Server:                              127.0.0.1
	Status:                              Auth Error, Down
	Protocol:                    MySQLBackend
	Port:                                3306
	Node Id:                     -1
	Master Id:                   -1
	Slave Ids:                   
	Repl Depth:                  -1
	Number of connections:               0
	Current no. of conns:                0
	Current no. of operations:   0

So, thanks MariaDB! I love software projects that listen to their community needs. This should be a lesson for another company – we all know who I am talking about.

Federico

Comments on MaxScale binaries

I’m writing this post after reading Downloading MariaDB MaxScale binaries, from Percona’s MySQL Performance Blog.

I was already aware about the problem: MaxScale is open source, but the binaries are not free. You can download them and use them for testing, but if you want to use them in production, you’ll need to buy MariaDB Enterprise.

Note that MaxScale Docker images seem to have the same problem. I’ve tried some of them, but all those I’ve tried were running MariaDB Enterprise binaries, so using them in production is illegal (unless you pay).

The alternative is… compiling MaxScale. I had problems in doing so and couldn’t solve those problems myself. From the MaxScale Google Group, I see that Justin Swanhart had the same problems… so I don’t feel particularly stupid for that.

After some questions on that group, 2 posts were published on MariaDB blog:

When I tried them, the first one worked for me, while the former didn’t.

But in any case, even if you are able to compile MaxScale on your Linux/BSD of choice, updates will be a problem. You will need to compile all next releases of MaxScale, which is simply not a viable solution for many companies.

This prevents MaxScale from being as widely used as it could be. The lack of free binaries is a problem. I understand their commercial choice – it is legit, but I don’t agree. First, because open source shouldn’t work in this way. Second, because the lack of free binaries could bring some customers to them, but… most probably, it simply prevents lots of people from trying MaxScale at all.

This would be negative for any software project, in my opinion. But I think that it is particularly negative for MaxScale. Why? Because it is amazingly versatile and writing a new module is amazingly simple. This combination of characteristics would make it greatly benefit from being widely adopted – people could write new modules and distribute them.

Please, MariaDB, reconsider this choice.

Federico

MariaDB/MySQL missing features: View comments

The COMMENT clause

All database components (and database themselves) should have a COMMENT clause in their CREATE/ALTER statements, and a *_COMMENT column in the information_schema. For example:

CREATE PROCEDURE do_nothing()
        COMMENT 'We''re lazy. Let''s do nothing!'
BEGIN
        DO NULL;
END;
SELECT ROUTINE_COMMENT FROM information_schema.ROUTINES;

In fact most database objects have those clauses in MySQL/MariaDB, but not all. Views are an exception.

Comments in code

MariaDB and MySQL have multiple syntaxes for comments. Including executable comments (commented code that is only executed on some MySQL/MariaDB versions).

One can use comments in stored procedures and triggers, and those codes are preserved:

CREATE PROCEDURE do_nothing()
BEGIN
        -- We're lazy. Let's do nothing!
        DO NULL;
END;

But, there are a couple problems:

  • This doesn’t work for views.
  • mysql client strips comments away, unless it’s started with --comments parameter. So, by default, procedures created with mysql have no comments.

So…

Views have no comment. No comments in metadata, no comments in code.

This prevents us to create self-documenting databases. Even if names are self-documenting, we may still need to add notes like “includes sold-out products”, or “very slow”.

Criticism makes us better

As a final note, let me say that this post is not an attack against MariaDB or MySQL. It is criticism, yes, because I like MariaDB (and MySQL). Criticism helps, keeps projects alive, encourages discussions.

To explain what I mean, I’ll show you a negative example. Recently I’ve attended a public talk from a LibreOffice Italia’s guy. It shown us a chart demonstrating that, according a generic “independent American study”, LibreOffice has no bug, while MS Office is full of bugs. The guy seems to think that software lint-like can automatically search for bugs. First I wondered how can LibreOffice survive with a community that is totally unable to produce criticism. Then I realized why it is the most buggy piece of software I’ve ever tried.

Hiding problems is the safest way to make those problems persist.

I’m happy that my favorite DBMS’s get the necessary amount of criticism.

Federico

Using apt-get update in a Dockerfile

Dockerfiles are the source code of Docker images. While they are amazingly simple, they hide some traps that the Dockerfile authors should be aware of.

Now, how does Docker build an image from a Dockerfile? It runs its statements sequentially and synchronously. For each statement, it starts a new container from the existing image, it runs the statement in the container, and creates a new image from the container.

You may think that this process is quite slow, and you are right. That’s why Docker has a cache – that means, doesn’t rebuild images that are already present. If the first statements in the file are cached, Docker skips them, until it find a statement that is not cached (because it was added or changed after the last build). From that point, it won’t use the old cache anymore.

That’s why statements adding metadata that are subject to changes (such as most LABELs) should be at the end of the Dockerfile: we don’t want Docker to rebuild all intermediary images just because we added a label.

But there are other consequences. Think about the following lines:

RUN apt-get update
RUN apt-get install apache2

Now consider the following line:

RUN apt-get update && apt-get install apache2

Apparently, they have the same effects. But what happens if you change the package list, adding an Apache module or replacing Apache with Nginx? In the first example, apt-get update is not executed again, in the second case it is. Usually we prefer the second option: periodically updating the packages is up to the user. Image maintainers should not rebuild their images each time a package version is released.

But there’s another case. Suppose that we don’t want to change the package list, but we need apt-get update to be executed again to fix some package security bug. We could build the Dockerfile with the --no-cache option. Well, it works, but it completely ignores the existing cache. That option is better used occasionally to address cache problems, not periodically to keep the system updated.

Another way is adding a cheap statement before apt-get update. We also want that the purpose of the new statements to be self-documenting. The almost-standard way is:

ENV UPDATED_ADD 2016-02-23
RUN apt-get update && apt-get install apache2

This sets an environment variable that can also be used to find out when the Dockerfile was last modified up to that line. The rest of the file could be modified months later, so you can set that variable again if necessary. Remember that it only represents the Dockerfile last changes: an image could still be built with --no-cache.

Enjoy!

Reusing Prepared Statements in MariaDB 10.1

I may be wrong, but I think that MariaDB has a strange characteristic: it has many good features, some of which were not implemented intentionally. Well, this sentence is weird too, I know. But I have some examples: I’m not sure that the SEQUENCE engine was designed to generate a sequence of dates and/or times; but it can. And I believe that the CONNECT author had no idea that someone would have used his engine to create cursors for CALL statements; but I do.

Now I have a new example. MariaDB 10.1 supports compound statements out of stored procedures. Which means that you can write IF or WHILE in your install.sql files to create your databases in a dynamic way. This is done via the BEGIN NOT ATOMIC construct.

I played with this feature, like I usually do with new features. And what I’ve found out is amazing for me: BEGIN NOT ATOMIC allows us to nest prepared statements!

Uh, wait… maybe what I’ve just written sounds weird to you. Maybe you’re thinking: “prepared statements can be nested since they were first implemented!”. Which is only true in the documentation. The docs doesn’t lie of course, but it doesn’t work out there, in the real world’s complexity. PREPARE, EXECUTE and DEALLOCATE PREPARE statements cannot be prepared, and this limitation can be very frustrating if you try to write a reusable (perhaps public) stored procedure library.

I tried to explain the reason in this post, but it was becoming way too long, so I had to delete that boring explanation. I’ll just mention an example. You can write a procedure that prepares and executes a statement; but the prepared statement cannot call the procedure itself, recursively. Why? Because you cannot reuse its name, and annot dynamically generate a name for the new prepared statement. If this explanation is too short, just code and you’ll find out.

How can BEGIN NOT ATOMIC possibly fix this problem? Well, for a reason that’s far beyond me, the following blocks can be prepared and executed:

  • BEGIN NOT ATOMIC PREPARE ... ; END;
  • BEGIN NOT ATOMIC EXECUTE ... ; END;
  • BEGIN NOT ATOMIC DEALLOCATE PREPARE ... ; END;

Now you may be thinking that this feature is totally useless. But it isn’t. Thanks to this change, I’ve written a procedure that:

  • Executes a specified SQL string.
  • Autogenerates a “free” name for that statement, or uses an id passed by the user.
  • Returns the autogenerated id, so you can reuse it, or deallocate the statement.

Writing this procedure has been a bit annoying, because after all it uses a dirty trick. But now the procedure is written, and the dirty trick is encapsulated in it. You can use it as if it was a native feature, and forget the trick. Here’s the code:

CREATE DATABASE IF NOT EXISTS _;

CREATE TABLE IF NOT EXISTS _.prepared_statement
(
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
)
    ENGINE = MEMORY
;

CREATE PROCEDURE _.exec(IN p_sql TEXT, INOUT p_stmt_id INTEGER UNSIGNED)
BEGIN
    IF p_stmt_id IS NULL THEN
        INSERT INTO _.prepared_statement (id) VALUES (DEFAULT);
        SET p_stmt_id := LAST_INSERT_ID();
    END IF;
    
    SET @_SQL_exec := CONCAT(
        'BEGIN NOT ATOMIC PREPARE stmt_dyn_', p_stmt_id, ' '
        , 'FROM ', QUOTE(p_sql), IF(RIGHT(p_sql, 1) = ';', ' ', '; ')
        , 'END;'
    );
    PREPARE _stmt_exec FROM @_SQL_exec;
    EXECUTE _stmt_exec;
    
    SET @_SQL_exec := CONCAT(
        'BEGIN NOT ATOMIC EXECUTE stmt_dyn_', p_stmt_id, '; END;'
    );
    PREPARE _stmt_exec FROM @_SQL_exec;
    EXECUTE _stmt_exec;
    DEALLOCATE PREPARE _stmt_exec;
    SET @_SQL_exec := NULL;
END;

How do I use it? Very simple:

MariaDB [_]> -- redundant: @id is not set, so it's NULL
MariaDB [_]> SET @id := NULL;
Query OK, 0 rows affected (0.00 sec)
MariaDB [_]> CALL _.exec('SELECT 42 AS answer', @id);
+--------+
| answer |
+--------+
| 42 |
+--------+
1 row in set (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
MariaDB [_]> -- reuse @id
MariaDB [_]> CALL _.exec('SHOW SCHEMAS LIKE \'info%\'', @id);
+--------------------+
| Database (info%) |
+--------------------+
| information_schema |
+--------------------+
1 row in set (0.00 sec)
Query OK, 0 rows affected (0.00 sec)

I am writing a general purpose stored procedure library, that should make stored procedures developing more friendly. It will include this procedure, as well as a procedure for deallocating a specified statement, or all statements you prepared. As soon as the code is interesting and tested, I’ll make it public.

Enjoy!
Federico

SQL Games

I have just created a GitHub repository called sql_games. It contains games implemented as stored procedures, that can run on MariaDB or (with some changes) on Percona Server or Oracle MySQL.

You play the games via the command-line client. You call a procedure to make your move, then a text or an ASCII image appears.

Of course the same call should produce a different effect, depending on the game’s current state. To remember the state I use both user variables and temporary tables.

Why did I do that? Mainly because it was funny. I can’t explain why. And I can’t tell that it is funny for anyone: perhaps, where I’ve found interesting challenges, someone else would find a cause a frustration. But yes, it has been funny for me.

Also, I did it because I could. This means that others can do it. Stored procedures are not a useless and nasty feature that users should ignore – they are useful tools. Yes, several times I’ve complained that they need important improvements. But I complain because I like them.

Currently, three games are implemented:

  • Anagram – The anagram game. See a randomly generated anagram and try to guess the word. You can choose a language, and a min/max word length.
  • Bulls And Cows – If you don’t know the game, take a look at the Wikipedia page.
  • Hangman –  Try to guess letters and see an ASCII hangman materializing slowly while you fail.

Each game will be installed in a different database. CALL help() in the proper database to see how to play.