MariaDB storage engines

This is a list of MariaDB storage engines that are not distributed with MySQL. I think that most of them will work with MySQL, but not all – at least CassandraSE doesn’t.

Engine Description Introduced
XtraDB A fully-compatible fork of InnoDB, mantained by Percona Big Bang
Aria A crash-safe MyISAM, also used for internal temptables Big Bang
TokuDB A transactional engine with innovative buffers and high compression, by TokuTek 10.0
mroonga Supports fulltext searches with Chinese, Japanese and Korean languages 10.0
SPIDER Shards tables through multiple servers 10.0
OQGRAPH Used to implement tree structures 10.0
Sequence Returns a sequence of numbers as a resultset 10.0
CONNECT Support several external data sources (data files, DBMS’s…) 10.0
CassandraSE A bridge to Apache Cassandra 10.0
SphinxSE A bridge to Sphinx 5.2
FederatedX A richer fork of Federated Big Bang

Notes:

  • XtraDB is the default and should be preferred to InnoDB, unless it has a bug which affects you (but XtraDB fixes several InnoDB bugs).
  • OQGRAPH was introduced during the Big Bang or soon after, but disabled in 5.5 and reintroduced in 10.0. See this post.
  • FederatedX is called Federated, so it’s possible that you are already using it and you don’t know.
  • CONNECT will hopefully obsolete CSV and FederatedX, but some features are still missing. For example, CONNECT cannot store NULL values in flat files.
  • With MariaDB you need to specify SQL_MODE='NO_ENGINE_SUBSTITUTION'. If you mistype the engine name, or try to use a disabled engine, you want to get an error, rather than silently switch to XtraDB.

Obsolete storage engines:

PBXT (Big Bang – 5.3) – A transactional engine which aimed to be “somewhere between InnoDB and MyISAM”. Seemed to support foreign keys better than InnoDB (I couldn’t reproduce a tirgger+FK bug and it had more options), but alas it’s dead.

Enjoy!

Advertisements

MariaDB 10: Performing fast & inaccurate statistics

As I already wrote, COUNT(*) can be expensive in InnoDB, because this value is not stored in indexes. Some other storage engines may not store that value. Also, COUNT(), MAX(), MIN() are immediate only if executed on an indexed column. But we don’t want too many indexes, and some exotic storage engines do not support indexes – thus, we may need to execute an aggregate function on a non-indexed column.

Now, the question is: do you really need such values to be exact? If you need to report statistical values from your database, probably an error is acceptable. In this case, MariaDB 10 provides a solution: engine-independent statistics.

Note that using this feature may have a cost. Updating statistics for a table means that the server does a full table scan. That is probably why engine-independent statistics are disabled by default. But if you are already use them, or if you think you can benefit from them, you can use the statistics to retrieve the results of some aggregate queries, in a way which will not be always accurate.

Here is how to translate the queries.

COUNT():

SELECT `cardinality` FROM `mysql`.`table_stats` WHERE db_name='test' AND table_name='t';

MIN() and MAX():

SELECT `min_value`, `max_value` FROM `mysql`.`column_stats` WHERE db_name='test' AND table_name='t' AND column_name='a';

AVG(LENGTH()):

SELECT `avg_length` FROM `mysql`.`column_stats` WHERE db_name='test' AND table_name='t' AND column_name='a';

COUNT(*) WHERE IS NULL:

SELECT COUNT(*) / (SELECT `nulls_ratio` FROM `mysql`.`column_stats` WHERE db_name='test' AND table_name='t' AND column_name='a') FROM t2;

Note that this could be a division by 0, so you should handle NULL as a result.

MariaDB: DELETE … RETURNING

MariaDB 10.0.5 supports DELETE ... RETURNING, like PostgreSQL. It deletes some rows and returns the specified columns (or, more generally, the specified SELECT expressions) from the deleted rows. As for regular DELETEs, Com_delete is incremented and Com_select is not.

Let’s see what this feature does and what it does not.

You cannot:

  • Cannot use it with INSERT or CREATE TABLE to create a table containing data from the deleted rows (a sort of delete log, or easy-to-restore backup).
  • Cannot use it as a subquery.
  • Cannot use it as a cursor, and there is no INTO. So you cannot process the results in a stored routine, because there is no way to copy deleted data into variables.
  • Cannot use aggregating functions. But why should you? If the purpose is to get a row count, you can use SELECT ROW_COUNT().

You can:

  • Include a subquery, virtual column, stored function, or anything else in the RETURNING clause.
  • RETURNING * works, AS works.
  • Use it as a prepared statement.
  • Use it in a stored procedure. But in practice, you can only return a resultset.

Some of the things I included in this list should be obvious: I just stated that some documented features can work together. But I’m afraid this is not obvious in MySQL and MariaDB: they have great features, but in many cases they don’t work together.

UPDATE: Some of the limitations can be workarounded with this trick.

Thanks, MariaDB team!

DELETE RETURNING is another reason to use MariaDB. And I see a task for UPDATE ... RETURNING.

Enjoy!

MariaDB: ALTER TABLE IF [NOT] EXISTS

Since when I remember (and I can remember the 3.23 version!) MySQL always supported the IF EXISTS and IF NOT EXISTS options for many DDL statements. They help developers in writing install and uninstall scripts: with IF NOT EXISTS, a table is created if needed, and there is no error (so, the script’s execution goes on) if the table is already there.

However, the case when a table exists but must be modified is problematic, because the ALTER statements never supported those options… until MariaDB 10. The old way to avoid errors was:

DROP TABLE IF EXISTS tab_name;
CREATE TABLE tab_name ... ;

But this way it is not possible to preserve current table’s data.

The cleanest solution was probably writing a script, or a Stored Procedure, which queries the information_schema database to check if a column exists, and conditionally CREATE/DROP it. Or, to avoid such queries, a more error-prone script could read the database version in use and upgrade it.

MariaDB 10.0.2 supports IF EXISTS and IF NOT EXISTS options for several ALTER TABLE sub-commands. Here’s the list of those clauses (copied from the MariaDB KnowledgeBase):

ADD COLUMN
ADD INDEX
ADD FOREIGN KEY
ADD PARTITION
 
DROP COLUMN
DROP INDEX
DROP FOREIGN KEY
DROP PARTITION
 
CHANGE COLUMN
MODIFY COLUMN

Also, these options are supported for CREATE INDEX and DROP INDEX.

Note that the rarely used ALTER COLUMN syntax has not an IF EXISTS option, so the longer MODIFY syntax must be used instead.

Also note that the PostgreSQL syntax is not supported; thus, this will not work:

ALTER TABLE IF EXISTS tab_name ... ;

At this point, you are probably wondering: “But then, is this feature totally useless for me? Is there still no way to alter a table only if it exists?”. Hey, don’t be dramatic! The solution is easy, and it’s probably better than Postgres-style:

-- create the table as we want it to be
CREATE TABLE IF NOT EXISTS tab_name ... ;
-- if an older table version exists, update it
ALTER TABLE name
    DROP COLUMN IF NOT EXISTS col_name col,
    DROP INDEX IF EXISTS idx_name;

This feature is not compatible with Oracle MySQL, and MariaDB’s executable comments cannot help here. You may write two different upgrade scripts, one for MariaDB and one for Oracle MySQL… but this feature should simplify your work, not make it harder.

So you can just switch to MariaDB and leave Oracle MySQL. This is an option even for public software projects. MariaDB has more interesting features, it is more open, and is probably the future. Also, migrating to MariaDB is amazingly simple: in most cases, all you need to do is to install MariaDB. So, don’t be prisoner of Oracle just because MySQL is still the most widely used DBMS: things will change.

Enjoy!

MariaDB/MySQL: Procedures to easily work with the Diagnostics Area

Versione italiana

UPDATE 2013-08-30: I fixed 2 bugs and create a GitHub repo called sql_diagnostix. Sorry for not doing this before. If I (or someone else) find new bugs, I’ll update the repo.

The problem

To quickly see information about the errors and warnings generated by the last executed statement, we can use SHOW WARNINGS. However this statement’s results cannot be accessed via SQL, so they cannot be used for error handling within a stored program.

For that purpose we can use GET DIAGNOSTICS, which has two problems:

  • It requires a lot of code
  • There is not a trivial way to quicly show all info from Diagnostics Area.

The first problem may discourage developers to use GET DIAGNOSTICS at all.

The second problem is not so important in most cases, because the missing info are SQLSTATE and the information which can only be set via SIGNAL and RESIGNAL, and cannot be read by an external program (SCHEMA_NAME, CURSOR_NAME, etc). However SQLSTATE can be important for you and setting/reading all SIGNAL‘s clauses could help debugging and problem solving… that’s why they exist 🙂

Oracle MySQL 5.7 has two Diagnostics Areas, and supports the STACKED keyword to access the second DA from an error HANDLER. However, this doesn’t help much: it just allows to execute another table-based statement before reading information about the error conditions which occurred. So, the STACKED DA doesn’t seem to solve any real life problem.

The solution

I created three Stored Procedures to address these problems:

  • _.materialize_diagnostics_area()
  • _.show_diagnostics_area()
  • _.show_full_diagnostics_area()

All the Procedures create a MEMORY temporary table called DIAGNOSTICS_AREA in a database called _. If the table already exists, it is DROPped and reCREATEd. Incidentally they also empty server’s Diagnostics Area, but the information will be avaible in that table until you call one of those routines again.

_.DIAGNOSTICS_AREA has a column called ID, which indicates each condition’s position in the Diagnostics Area. Other columns names and values are equal to the conditions properties in the diagnostics area. The only exception is the SQLSTATE column, which has the name that can be used with SIGNAL and RESIGNAL, not the name used in the diagnostics area (RETURNED_SQLSTATE).

Since both Stored Programs and external programs will be able to access _.DIAGNOSTICS_AREA, this technique exposes all the information that you can set with SIGNAL to external programs (MYSQL_ERRNO, etc).

Directly call materialize_diagnostics_area() if you want to easily copy the DA into a table and later run queries on it.

show_full_diagnostics_area() shows all the information, after populating the table. It’s a sort of SHOW WARNINGS with a lot of columns.

show_diagnostics_area() only shows the ID, SQLSTATE, MYSQL_ERRNO, MESSAGE_TEXT columns.

Example:

MariaDB [test]> INSERT INTO `t` VALUES (1/0), (1/0), (1/0), (-1);
Query OK, 4 rows affected, 4 warnings (0.07 sec)
Records: 4  Duplicates: 0  Warnings: 4
 
MariaDB [test]> CALL _.show_diagnostics_area();
+----+----------+-------------+--------------------------------------------+
| ID | SQLSTATE | MYSQL_ERRNO | MESSAGE_TEXT                               |
+----+----------+-------------+--------------------------------------------+
|  1 | 22012    |        1365 | Division by 0                              |
|  2 | 22012    |        1365 | Division by 0                              |
|  3 | 22012    |        1365 | Division by 0                              |
|  4 | 22003    |        1264 | Out of range value for column 'c' at row 4 |
+----+----------+-------------+--------------------------------------------+
4 rows in set (0.09 sec)

Limitations

If your Stored Program needs to be fast, use these Procedures only for debug.

As mentioned above, each call will erase old data from _.DIAGNOSTICS_AREA. This is by design.

An information is present in SHOW WARNING‘s output but missing from _.DIAGNOSTICS_AREA: the Level column. The reason is that I wasn’t able to find in MySQL documentation how to distinguish a Warning from a Note, except using SHOW WARNINGS. If there is no way, or a way exists but is not documented, I can’t add that information.

5.7’s STACKED diagnostics area is not used.

The code

Okay, I wrote enough. Now, if you are interested:

Download the SQL code

I won’t explain the code, because it seems to me very simple and it’s commented. Just take a look to materialize_diagnostics_area() and you’ll understand what I did.

Enjoy!

Quickly shutdown MariaDB 10

Italian version

How do you normally shutdown a MariaDB 5.5 or MySQL server?

mysqladmin shutdown -uroot -p

But MariaDB 10 (beta quality, at the time of this writing) has a quick way to do the same:

C:\Documents and Settings\utente1>mysql -uroot -p
Enter password:
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 3
Server version: 10.0.4-MariaDB mariadb.org binary distribution

Copyright (c) 2000, 2013, Oracle, Monty Program Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> SELECT 'I\'m alive!';
+------------+
| I'm alive! |
+------------+
| I'm alive! |
+------------+
1 row in set (0.00 sec)

MariaDB [(none)]> SHUTDOWN;
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> SELECT 'I\'m gone away';
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
ERROR 2003 (HY000): Can't connect to MySQL server on 'localhost' (10061 "Unknown
 error")
ERROR: Can't connect to the server

unknown [(none)]> \q
Bye

Update: SHUTDOWN works in stored programs. This allows to write an event which turns the server off everyday at a certain time, or when a task is finished. This is safer than a cron job which kills it, and works on any system.

DELIMITER ||
CREATE EVENT `test`.`shutd`
    ON SCHEDULE
        EVERY 1 DAY
        STARTS '2014-01-01 20:00:00'
    COMMENT 'Shutdown Maria when the office closes'
DO BEGIN
    SHUTDOWN;
END;
||
DELIMITER ;

Enjoy!

MariaDB’s Sequence Storage Engine

Versione italiana

MariaDB 10.0.3 introduces a new Storage Engine: Sequence. It isn’t in MySQL or in older MariaDBs versions. Sequence is a special engine, which does not create or read any table. It only generates on the fly a sequence of integers which is returned to your query, and then dies. The sequence’s bounds and increment depend from the table name.

Very quick start

To install:

INSTALL PLUGIN sequence SONAME 'ha_sequence';

NOTE: On MariaDB (unlike Oracle MySQL) there is no need to add ‘.dll’ to the command on Windows, so the command is platform-independent. Thanks Vladislav Vaintroub for this information!

To create a sequence, you must first select a default database, using the USE command. Then issue this query:

SELECT * FROM seq_1_to_100;

To create a sequence specifying an increment different from 1:

SELECT * FROM seq_1_to_100_step_2;

A sequence can’t involve negative numbers, but can be descending:

SELECT * FROM seq_10_to_1;

Please, note that descending sequences always generate the same number as ascending sequence, in reverse order. This means that the highest number of a descending sequence sometimes is not the one you typed:

MariaDB [test]> SELECT * FROM seq_8_to_1_step_3;
+-----+
| seq |
+-----+
|   7 |
|   4 |
|   1 |
+-----+
3 rows in set (0.00 sec)

What sequence is not

Sequence Storage Engine is not an SQL SEQUENCE like the ones you can use in PostgreSQL or IBM, or FirebirdSQL’s generators. This means that a sequence exists only during the execution of a query. You cannot use a value from a sequence in a query and then use the next value in another query.

Also, sequence can’t generate negative numbers and can’t rotate when it reaches the maximum/minimum value (like they do in PostgreSQL).

Some sequence use cases

Other people don’t believe it, but developers have a lot of fantasy. It is needed to solve problems when there isn’t an obvious solution, and can’t find an algorithm designed by someone else. So, I’m sure that creative uses of the sequence Storage Engine will be found in the future. But for now, I’d like to share some ideas.

Almost all the following techniques can be used to populate tables with generated data. This can be useful, because when it can be done using some queries, you avoid network communications – which can be slow or error prone.

Find holes in a column

Sometimes a column should contain a sequence of integer values and holes shouldn’t be there. A common example is a column used to sort some elements in a web page: while holes shouldn’t be a problem for queries, they can make re-sorting operations buggy. In the following example we’ll create a table with a lot of holes, and then we’ll list the missing values using a sequence:

CREATE TABLE t1 (c TINYINT UNSIGNED) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1), (2), (3), (5), (10);
SELECT s.seq FROM seq_1_to_10 s LEFT JOIN t1 t ON s.seq = t.c WHERE t.c IS NULL;

Build a combination of numbers

SELECT s1.seq, s2.seq FROM seq_1_to_3 s1 JOIN seq_1_to_3 s2 ORDER BY 1, 2;

But you can also combinate numbers and characters (see below).

Find multiples of 3, minor than 100

SELECT seq FROM seq_3_to_100_step_3;

Find multiples of both 3 and 5

SELECT s1.seq FROM seq_5_to_100_step_5 s1 INNER JOIN seq_3_to_100_step_3 s2 ON s1.seq = s2.seq;

Optimized: find multiples of (2 or one of its powers)

SELECT seq FROM seq_1_to_100 WHERE NOT seq & 1; -- ...of 2
SELECT seq FROM seq_1_to_100 WHERE NOT seq & 3; -- ...of 4
SELECT seq FROM seq_1_to_100 WHERE NOT seq & 7; -- ...of 8
SELECT seq FROM seq_1_to_100 WHERE NOT seq & 15; -- ...of 16
...

Build a sequence of 1-char strings

The following query includes lowercase chars, uppercase chars and digits. But I’m sure that anyone can modify it to exlude one of these groups, or include more groups (for example, accented vowels).

-- Sequence of 1-char strings
SELECT CHAR(seq) AS ch
    FROM (
                -- lowercase
                (SELECT seq FROM seq_97_to_122 l)
            UNION
                -- uppercase
                (SELECT seq FROM seq_65_to_90 u)
            UNION
                -- digits
                (SELECT seq FROM seq_48_to_57 d)
        ) ch;

Build a sequence of 2-char strings

Ouch… this UNION is tricky. But someone could still use it.

SELECT CONCAT(ch1.ch1, ch2.ch2) AS ch
    FROM (
        (SELECT CHAR(seq) AS ch1
            FROM (
                        -- lowercase
                        (SELECT seq FROM seq_97_to_122 l1)
                    UNION
                        -- uppercase
                        (SELECT seq FROM seq_65_to_90 u1)
                    UNION
                        -- digits
                        (SELECT seq FROM seq_48_to_57 d1)
                ) s1
        )
    ) ch1
    CROSS JOIN (
        (SELECT CHAR(seq) AS ch2
            FROM (
                        -- lowercase
                        (SELECT seq FROM seq_97_to_122 l2)
                    UNION
                        -- uppercase
                        (SELECT seq FROM seq_65_to_90 u2)
                    UNION
                        -- digits
                        (SELECT seq FROM seq_48_to_57 d2)
                ) s2
        )
    ) ch2
    ORDER BY ch1, ch2;

Sorry for the aliases confusion (particularly ch1 and ch2). I ran out of fantasy before finishing the query.

Sequence of dates

This is interesting, in my opinion. Many booking applications, during the installation, populate a table with days or hours when a reservation could start. This example shows a simple way to fill a table with all days in January:

SELECT DATE ('2014.01.01' + INTERVAL (s.seq - 1) DAY) AS d
    FROM (SELECT seq FROM seq_1_to_30) s;

Sequence of hours, halfes of an hour, etc

-- Hours in a day
SELECT CAST('00:00:00' AS TIME) + INTERVAL (s.seq - 1) HOUR AS t
    FROM (SELECT seq FROM seq_1_to_24) s;
-- Halfes of an hour in a day
SELECT CAST('00:00:00' AS TIME) + INTERVAL (30 * s.seq) MINUTE AS t
    FROM (SELECT seq FROM seq_1_to_48) s;

Working days

The last two examples are not so useful, alone. Usually you don't need to fill a table with all days in a month/year, or all hours in a day. Instead, you only need to write working days and working hours. To find working days, you can use DAYOFWEEK() function to exclude saturdays and sundays. Of course this doesn't automatically exclude christmas or other holidays - but I think that the best solution is deleting them after filling the table.

SELECT DATE ('2014-01-01' + INTERVAL (s.seq - 1) DAY) AS d
    FROM (SELECT seq FROM seq_1_to_30) s
    -- exclude sunday (1) and saturday (7)
    WHERE DAYOFWEEK(DATE ('2014-01-01' + INTERVAL (s.seq - 1) DAY)) BETWEEN 2 AND 6;

Just a note on implementation. Why do I start dates from day 1 and then I perform a - 1, rather than directly starting from day 0? For two reasons. First, in Setting a strict SQL_MODE article, I explain that the NO_ZERO_DATE and NO_ZERO_IN_DATE flags in SQL_MODE should be set. Second: '2014-01-00' + INTERVAL 1 DAY returns NULL, and it is the correct behavior, not a bug. If you don't understand the reason (but also if you understand it), please set those flags. Permissive SQL_MODEs pretend to be your friends, but one of these days they will kill you!

Working hours

SELECT {dt '2013-01-01 01:00:00'} + INTERVAL (wd.d - 1) DAY + INTERVAL (dh.h - 1) HOUR AS wh
    FROM
        (
            -- working days in a month
            SELECT seq AS d FROM seq_1_to_30
        ) wd
    CROSS JOIN
        (
            -- daily working hours
            (SELECT seq AS h FROM seq_9_to_12)
            UNION
            (SELECT seq AS h FROM seq_14_to_17)
        ) dh
    ORDER BY 1;

Here we have one subquery returning the working days, JOINed with a subquery returning working hours. Note that working hours have a hole (workers lunch), so the second subquery is a UNION of 2 sequences.

And... yes, carthesian product is evil, but here it's exactly what we need. The CROSS keyword makes clearer that if you modify the query and make a mistake, you could create a SELECT returning a big number of rows.

To-Do and suggestions

In a future article, I will show a Stored Function which dynamicall composes a query to populate a table with DATETIME values, in a given time range, with a given granularity.

Do you think that one of these query is not well-optimized? Do you have other interesting ideas which involve the sequence Storage Engine? Please, comment this article.

Enjoy!