Comments on MaxScale binaries

I’m writing this post after reading Downloading MariaDB MaxScale binaries, from Percona’s MySQL Performance Blog.

I was already aware of the problem: MaxScale is open source, but the binaries are not free. You can download them and use them for testing, but if you want to use them in production, you need to buy MariaDB Enterprise.

Note that MaxScale Docker images seem to have the same problem. I’ve tried some of them, and all of those were running MariaDB Enterprise binaries, so using them in production is illegal (unless you pay).

The alternative is… compiling MaxScale. I had problems doing so and couldn’t solve them myself. From the MaxScale Google Group, I see that Justin Swanhart had the same problems… so I don’t feel particularly stupid about it.

After some questions on that group, two posts were published on the MariaDB blog.

When I tried them, the first one worked for me, while the second didn’t.

But in any case, even if you are able to compile MaxScale on your Linux/BSD of choice, updates will be a problem. You will need to compile every future release of MaxScale, which is simply not a viable solution for many companies.

This prevents MaxScale from being as widely used as it could be. The lack of free binaries is a problem. I understand their commercial choice – it is legitimate, but I don’t agree with it. First, because open source shouldn’t work this way. Second, because while the lack of free binaries could bring them some customers, most probably it simply prevents lots of people from trying MaxScale at all.

This would be negative for any software project, in my opinion. But I think that it is particularly negative for MaxScale. Why? Because it is amazingly versatile and writing a new module is amazingly simple. This combination of characteristics would make it greatly benefit from being widely adopted – people could write new modules and distribute them.

Please, MariaDB, reconsider this choice.

Federico

MariaDB/MySQL missing features: View comments

The COMMENT clause

All database components (and databases themselves) should have a COMMENT clause in their CREATE/ALTER statements, and a *_COMMENT column in the information_schema. For example:

DELIMITER ||
CREATE PROCEDURE do_nothing()
        COMMENT 'We''re lazy. Let''s do nothing!'
BEGIN
        DO NULL;
END ||
DELIMITER ;
SELECT ROUTINE_COMMENT FROM information_schema.ROUTINES;

In fact most database objects have those clauses in MySQL/MariaDB, but not all. Views are an exception.
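Tables, for example, already have both a COMMENT clause and a TABLE_COMMENT column. A minimal sketch (the table is made up):

CREATE TABLE product
(
    id INT NOT NULL PRIMARY KEY
)
    COMMENT 'Includes sold-out products'
;
SELECT TABLE_COMMENT
    FROM information_schema.TABLES
    WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'product';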

Comments in code

MariaDB and MySQL have multiple syntaxes for comments, including executable comments (commented code that is only executed on some MySQL/MariaDB versions).
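A quick overview of the syntaxes (the version-matching details differ slightly between MySQL and MariaDB, so take the annotations below as a sketch):

SELECT 1; -- single-line comment (the space after -- is required)
SELECT 1; # single-line comment, shell style
SELECT 1 /* inline comment */ + 1;
SELECT 1 /*!50700 + 1 */;   -- the '+ 1' only executes on servers >= 5.7.0
SELECT 1 /*M!100100 + 1 */; -- the '+ 1' only executes on MariaDB >= 10.1.0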

One can use comments in stored procedures and triggers, and those comments are preserved:

DELIMITER ||
CREATE PROCEDURE do_nothing()
BEGIN
        -- We're lazy. Let's do nothing!
        DO NULL;
END ||
DELIMITER ;

But there are a couple of problems:

  • This doesn’t work for views.
  • The mysql client strips comments away, unless it is started with the --comments option (see the example below). So, by default, procedures created via mysql have no comments.
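For example, to preserve comments when installing procedures from a file (the file name is hypothetical):

mysql --comments -u root -p test < procedures.sql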

So…

Views have no comments: no comments in metadata, no comments in code.

This prevents us from creating self-documenting databases. Even if names are self-documenting, we may still need to add notes like “includes sold-out products”, or “very slow”.

Criticism makes us better

As a final note, let me say that this post is not an attack against MariaDB or MySQL. It is criticism, yes, precisely because I like MariaDB (and MySQL). Criticism helps: it keeps projects alive and encourages discussion.

To explain what I mean, I’ll give a negative example. Recently I attended a public talk by a guy from LibreOffice Italia. He showed us a chart demonstrating that, according to a generic “independent American study”, LibreOffice has no bugs, while MS Office is full of them. He seemed to think that lint-like software can automatically find bugs. At first I wondered how LibreOffice can survive with a community that is totally unable to produce criticism. Then I realized why it is the most buggy piece of software I’ve ever tried.

Hiding problems is the safest way to make those problems persist.

I’m happy that my favorite DBMSs get the necessary amount of criticism.

Federico

Using apt-get update in a Dockerfile

Dockerfiles are the source code of Docker images. While they are amazingly simple, they hide some traps that Dockerfile authors should be aware of.

Now, how does Docker build an image from a Dockerfile? It runs the file’s statements sequentially and synchronously. For each statement, it starts a new container from the existing image, runs the statement in the container, and creates a new image from the container.

You may think that this process is quite slow, and you are right. That’s why Docker has a cache – that is, it doesn’t rebuild intermediary images that are already present. If the first statements in the file are cached, Docker skips them, until it finds a statement that is not cached (because it was added or changed after the last build). From that point on, it won’t use the old cache anymore.

That’s why statements adding metadata that is subject to change (such as most LABELs) should be at the end of the Dockerfile: we don’t want Docker to rebuild all the intermediary images just because we added a label.

But there are other consequences. Think about the following lines:

RUN apt-get update
RUN apt-get install -y apache2

Now consider the following line:

RUN apt-get update && apt-get install -y apache2

Apparently, they have the same effect. But what happens if you change the package list, adding an Apache module or replacing Apache with Nginx? In the first example, apt-get update is not executed again; in the second, it is. Usually we prefer the second option: periodically updating the packages is up to the user, and image maintainers should not rebuild their images each time a new package version is released.

But there’s another case. Suppose that we don’t want to change the package list, but we need apt-get update to run again, to pick up a security fix for some package. We could build the image with the --no-cache option. Well, it works, but it ignores the existing cache completely. That option is better used occasionally, to work around cache problems, not periodically to keep the system updated.
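For the record, such a full rebuild looks like this (the image name is just an example):

docker build --no-cache -t example/apache2 .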

Another way is to add a cheap statement before apt-get update. We also want the purpose of the new statement to be self-documenting. The almost-standard way is:

ENV UPDATED_AT 2016-02-23
RUN apt-get update && apt-get install -y apache2

This sets an environment variable that can also be used to find out when the Dockerfile was last modified, up to that line. The rest of the file could be modified months later, so you can set the variable again further down if necessary. Remember that it only tracks changes to the Dockerfile: an image could still have been built more recently with --no-cache.
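You can also read the variable back from a built image (again, the image name is an example):

docker inspect --format '{{.Config.Env}}' example/apache2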

Enjoy!

Reusing Prepared Statements in MariaDB 10.1

I may be wrong, but I think that MariaDB has a strange characteristic: it has many good features, some of which were not implemented intentionally. Well, this sentence is weird too, I know. But I have some examples: I’m not sure that the SEQUENCE engine was designed to generate sequences of dates and/or times; but it can. And I believe that the CONNECT author had no idea that someone would use his engine to create cursors for CALL statements; but I did.

Now I have a new example. MariaDB 10.1 supports compound statements outside of stored procedures, which means that you can write IF or WHILE in your install.sql files to create your databases in a dynamic way. This is done via the BEGIN NOT ATOMIC construct.

I played with this feature, like I usually do with new features. And what I’ve found out is amazing for me: BEGIN NOT ATOMIC allows us to nest prepared statements!

Uh, wait… maybe what I’ve just written sounds weird to you. Maybe you’re thinking: “prepared statements could be nested since they were first implemented!”. Which is only true in the documentation. The docs don’t lie, of course, but things don’t work that way out there, in the real world’s complexity. PREPARE, EXECUTE and DEALLOCATE PREPARE statements cannot themselves be prepared, and this limitation can be very frustrating if you try to write a reusable (perhaps public) stored procedure library.
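You can check the limitation yourself (I’m quoting the error from memory, so the exact message may vary between versions):

MariaDB [test]> PREPARE stmt FROM 'PREPARE inner_stmt FROM ''SELECT 1''';
ERROR 1295 (HY000): This command is not supported in the prepared statement protocol yet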

I tried to explain the reason in this post, but it was becoming way too long, so I had to delete that boring explanation. I’ll just mention an example. You can write a procedure that prepares and executes a statement; but the prepared statement cannot call the procedure itself, recursively. Why? Because you cannot reuse its name, and you cannot dynamically generate a name for the new prepared statement. If this explanation is too short, just code and you’ll find out.

How can BEGIN NOT ATOMIC possibly fix this problem? Well, for a reason that’s far beyond me, the following blocks can be prepared and executed:

  • BEGIN NOT ATOMIC PREPARE ... ; END;
  • BEGIN NOT ATOMIC EXECUTE ... ; END;
  • BEGIN NOT ATOMIC DEALLOCATE PREPARE ... ; END;
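A quick check (the statement names are arbitrary):

PREPARE blk FROM 'BEGIN NOT ATOMIC PREPARE stmt FROM ''SELECT 42''; END';
EXECUTE blk;            -- prepares stmt
EXECUTE stmt;           -- returns 42
DEALLOCATE PREPARE blk;
DEALLOCATE PREPARE stmt;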

Now you may be thinking that this feature is totally useless. But it isn’t. Thanks to this change, I’ve written a procedure that:

  • Executes a specified SQL string.
  • Autogenerates a “free” name for that statement, or uses an id passed by the user.
  • Returns the autogenerated id, so you can reuse it, or deallocate the statement.

Writing this procedure has been a bit annoying, because after all it uses a dirty trick. But now the procedure is written, and the dirty trick is encapsulated in it. You can use it as if it were a native feature, and forget the trick. Here’s the code:

CREATE DATABASE IF NOT EXISTS _;

-- each row's id marks a prepared statement name as taken
CREATE TABLE IF NOT EXISTS _.prepared_statement
(
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
)
    ENGINE = MEMORY
;

DELIMITER ||
CREATE PROCEDURE _.exec(IN p_sql TEXT, INOUT p_stmt_id INTEGER UNSIGNED)
BEGIN
    -- no id passed in: autogenerate a free one
    IF p_stmt_id IS NULL THEN
        INSERT INTO _.prepared_statement (id) VALUES (DEFAULT);
        SET p_stmt_id := LAST_INSERT_ID();
    END IF;

    -- prepare the user's SQL inside a BEGIN NOT ATOMIC block;
    -- any trailing ';' is stripped from p_sql, and the statement
    -- terminator is always added outside the quoted string
    SET @_SQL_exec := CONCAT(
        'BEGIN NOT ATOMIC PREPARE stmt_dyn_', p_stmt_id, ' '
        , 'FROM ', QUOTE(TRIM(TRAILING ';' FROM p_sql)), '; '
        , 'END;'
    );
    PREPARE _stmt_exec FROM @_SQL_exec;
    EXECUTE _stmt_exec;

    -- execute it, again via a BEGIN NOT ATOMIC block
    SET @_SQL_exec := CONCAT(
        'BEGIN NOT ATOMIC EXECUTE stmt_dyn_', p_stmt_id, '; END;'
    );
    PREPARE _stmt_exec FROM @_SQL_exec;
    EXECUTE _stmt_exec;
    DEALLOCATE PREPARE _stmt_exec;
    SET @_SQL_exec := NULL;
END ||
DELIMITER ;

How do I use it? Very simple:

MariaDB [_]> -- redundant: @id is not set, so it's NULL
MariaDB [_]> SET @id := NULL;
Query OK, 0 rows affected (0.00 sec)
MariaDB [_]> CALL _.exec('SELECT 42 AS answer', @id);
+--------+
| answer |
+--------+
|     42 |
+--------+
1 row in set (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
MariaDB [_]> -- reuse @id
MariaDB [_]> CALL _.exec('SHOW SCHEMAS LIKE \'info%\'', @id);
+--------------------+
| Database (info%)   |
+--------------------+
| information_schema |
+--------------------+
1 row in set (0.00 sec)
Query OK, 0 rows affected (0.00 sec)

I am writing a general-purpose stored procedure library that should make stored procedure development friendlier. It will include this procedure, as well as a procedure for deallocating a specified statement, or all the statements you prepared. As soon as the code is interesting and tested, I’ll make it public.
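To give an idea, the deallocation procedure could look something like this (a hypothetical sketch, not the final library code):

DELIMITER ||
CREATE PROCEDURE _.dealloc(IN p_stmt_id INTEGER UNSIGNED)
BEGIN
    -- deallocate a statement that was prepared via _.exec()
    SET @_SQL_dealloc := CONCAT(
        'BEGIN NOT ATOMIC DEALLOCATE PREPARE stmt_dyn_', p_stmt_id, '; END;'
    );
    PREPARE _stmt_dealloc FROM @_SQL_dealloc;
    EXECUTE _stmt_dealloc;
    DEALLOCATE PREPARE _stmt_dealloc;
    SET @_SQL_dealloc := NULL;
    -- free the id for reuse
    DELETE FROM _.prepared_statement WHERE id = p_stmt_id;
END ||
DELIMITER ;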

Enjoy!
Federico

SQL Games

I have just created a GitHub repository called sql_games. It contains games implemented as stored procedures, which can run on MariaDB or (with some changes) on Percona Server or Oracle MySQL.

You play the games via the command-line client. You call a procedure to make your move, then a text or an ASCII image appears.

Of course the same call should produce a different effect, depending on the game’s current state. To remember the state, I use both user variables and temporary tables.
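For example, the state tracking could look roughly like this (a made-up sketch, not the actual repository code):

-- session-scoped state: it survives between CALLs and dies with the connection
CREATE TEMPORARY TABLE IF NOT EXISTS game_state (
    secret_word VARCHAR(64) NOT NULL,
    guessed_letters VARCHAR(64) NOT NULL DEFAULT '',
    errors TINYINT UNSIGNED NOT NULL DEFAULT 0
);
SET @game_in_progress := TRUE;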

Why did I do that? Mainly because it was fun. I can’t explain why. And I can’t claim that it would be fun for everyone: perhaps, where I’ve found interesting challenges, someone else would find a cause of frustration. But yes, it has been fun for me.

Also, I did it because I could. This means that others can do it. Stored procedures are not a useless and nasty feature that users should ignore – they are useful tools. Yes, several times I’ve complained that they need important improvements. But I complain because I like them.

Currently, three games are implemented:

  • Anagram – The anagram game. See a randomly generated anagram and try to guess the word. You can choose a language, and a min/max word length.
  • Bulls And Cows – If you don’t know the game, take a look at the Wikipedia page.
  • Hangman – Try to guess letters and see an ASCII hangman materializing slowly while you fail.

Each game will be installed in a different database. CALL help() in the proper database to see how to play.
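For example (the database name here is hypothetical; use the one created by the game’s installer):

USE hangman;
CALL help();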

Drop Primary Key if exists in MariaDB

MariaDB improved the IF EXISTS and IF NOT EXISTS clauses in two ways: it added them to almost all DDL statements (along with OR REPLACE), and to several ALTER TABLE sub-commands.

So one can run, for example:

ALTER TABLE nice_table DROP INDEX IF EXISTS ugly_index;
DROP INDEX IF EXISTS ugly_index ON nice_table;

However, there is no such thing as DROP PRIMARY KEY IF EXISTS. Why? I think that we can all easily guess: a primary key is too important for there to be any doubt about whether it exists. That’s a reasonable assumption… in theory.

In practice, all limitations are, at least in rare cases, not so reasonable. Yes, today I had to write a procedure that drops the PK if it exists, knowing that sometimes it doesn’t. Explaining the logic behind this weird behavior requires space and time (which seem to be the same thing, if we trust modern physics). I’ll just say that I need to do complex normalization tasks – data from 1 table to 9 tables… really. In the process I need to accept duplicates, and later remove them.

So, can we easily drop/create a PK IF [NOT] EXISTS? Yes, because from a DDL point of view, a PK is just an index called `PRIMARY`. So:

ALTER TABLE nice_table DROP INDEX IF EXISTS `PRIMARY`;
DROP INDEX IF EXISTS `PRIMARY` ON nice_table;

Enjoy!
Federico

MariaDB/MySQL: ON REPLACE triggers

There are several reasons why, in MySQL/MariaDB, one should generally not use REPLACE as a shortcut for SELECT + (UPDATE or INSERT). One of these reasons is the way REPLACE fires triggers. Of course it does not fire UPDATE triggers, and there is no such thing as a REPLACE trigger; DELETE and INSERT triggers are fired instead. What is not obvious is the order in which REPLACE activates triggers (UPDATE: this behavior was undocumented; I documented it here):

  1. BEFORE INSERT;
  2. BEFORE DELETE (if a row is being replaced);
  3. AFTER DELETE (if a row is being replaced);
  4. AFTER INSERT.

Knowing this, we can create REPLACE triggers. Or, better said, we can create a Stored Procedure that is invoked before or after each row’s replacement. It is not exactly easy, but it can be done. Some user variables will help us in two ways:

  • They will tell each trigger which triggers have been fired before;
  • they will be used to access NEW values in a DELETE trigger.

Now, take a look at the following example. The notes below will make some points clearer.

CREATE OR REPLACE TABLE xmp
(
    a INT NOT NULL PRIMARY KEY,
    b INT NOT NULL
)
    ENGINE = InnoDB
;

DELIMITER ||
CREATE PROCEDURE xmp_on_replace(IN old_a INT, IN new_a INT, IN old_b INT, IN new_b INT)
    CONTAINS SQL
BEGIN
    SET @out := CONCAT(
        'In column b, we are replacing ', old_b, ' with ', new_b, '. Cool!'
    );
END ||

CREATE TRIGGER xmp_bi
    BEFORE INSERT
    ON xmp
    FOR EACH ROW
BEGIN
    -- flag a possible replacement, and save the NEW values
    -- for the BEFORE DELETE trigger
    SET @maybe_replacing := TRUE;
    SET @new_a := NEW.a;
    SET @new_b := NEW.b;
END ||

CREATE TRIGGER xmp_ai
    AFTER INSERT
    ON xmp
    FOR EACH ROW
BEGIN
    -- the row was inserted without replacing anything: reset the flag
    SET @maybe_replacing := FALSE;
END ||

CREATE TRIGGER xmp_bd
    BEFORE DELETE
    ON xmp
    FOR EACH ROW
BEGIN
    -- if this DELETE comes from a REPLACE, fire our "ON REPLACE trigger"
    IF @maybe_replacing THEN
        SET @maybe_replacing := FALSE;
        CALL xmp_on_replace(OLD.a, @new_a, OLD.b, @new_b);
        SET @new_a := NULL;
        SET @new_b := NULL;
    END IF;
END ||
DELIMITER ;

When no row is being replaced, the workflow will be the following:

  1. The BEFORE INSERT trigger sets some user variables, which will only be useful if a row is replaced.
  2. The AFTER INSERT trigger resets the flag.

If a row is being replaced, the following workflow will take place:

  1. The BEFORE INSERT trigger will set some variables.
  2. The BEFORE DELETE trigger will reset them and call xmp_on_replace(); both OLD.* and NEW.* values are passed, thanks to the user variables.
  3. xmp_on_replace() gives us some information about the row being replaced.

Now, let’s test this example:

MariaDB [test]> INSERT INTO xmp VALUES (2, 0);
Query OK, 1 row affected (0.01 sec)

MariaDB [test]> REPLACE xmp VALUES (1, 100), (2, 200), (3, 300);
Query OK, 4 rows affected (0.01 sec)
Records: 3 Duplicates: 1 Warnings: 0

MariaDB [test]> SELECT @out;
+-------------------------------------------------+
| @out                                            |
+-------------------------------------------------+
| In column b, we are replacing 0 with 200. Cool! |
+-------------------------------------------------+
1 row in set (0.00 sec)

Hey, it works as expected!

Enjoy!
Federico