MariaDB/MySQL missing features: View comments

The COMMENT clause

All database components (and database themselves) should have a COMMENT clause in their CREATE/ALTER statements, and a *_COMMENT column in the information_schema. For example:

CREATE PROCEDURE do_nothing()
        COMMENT 'We''re lazy. Let''s do nothing!'
BEGIN
        DO NULL;
END;
SELECT ROUTINE_COMMENT FROM information_schema.ROUTINES;

In fact most database objects have those clauses in MySQL/MariaDB, but not all. Views are an exception.

Comments in code

MariaDB and MySQL have multiple syntaxes for comments. Including executable comments (commented code that is only executed on some MySQL/MariaDB versions).

One can use comments in stored procedures and triggers, and those codes are preserved:

CREATE PROCEDURE do_nothing()
BEGIN
        -- We're lazy. Let's do nothing!
        DO NULL;
END;

But, there are a couple problems:

  • This doesn’t work for views.
  • mysql client strips comments away, unless it’s started with --comments parameter. So, by default, procedures created with mysql have no comments.

So…

Views have no comment. No comments in metadata, no comments in code.

This prevents us to create self-documenting databases. Even if names are self-documenting, we may still need to add notes like “includes sold-out products”, or “very slow”.

Criticism makes us better

As a final note, let me say that this post is not an attack against MariaDB or MySQL. It is criticism, yes, because I like MariaDB (and MySQL). Criticism helps, keeps projects alive, encourages discussions.

To explain what I mean, I’ll show you a negative example. Recently I’ve attended a public talk from a LibreOffice Italia’s guy. It shown us a chart demonstrating that, according a generic “independent American study”, LibreOffice has no bug, while MS Office is full of bugs. The guy seems to think that software lint-like can automatically search for bugs. First I wondered how can LibreOffice survive with a community that is totally unable to produce criticism. Then I realized why it is the most buggy piece of software I’ve ever tried.

Hiding problems is the safest way to make those problems persist.

I’m happy that my favorite DBMS’s get the necessary amount of criticism.

Federico

Advertisements

MariaDB/MySQL: Cursors for Dynamic SQL

Note: In this post, I always refer to MariaDB. As far as I know, the techniques and the limitation I’m describing here also apply to Percona Server and MySQL, including 5.6. But since I didn’t test the examples, I just refer to MariaDB (tested on 5.5).

Terminology

A query is a SQL statement that returns some results (SELECT, SHOW, DESCRIBE, EXPLAIN, HELP).

A Stored Program is a Stored Routine, a Trigger or an Event.

A Stored Routine is a Stored Procedure or a Stored Function.

A Dynamic SQL statement is astring composed by a Stored Program, which will be executed as a Prepared Statement. Even within normal Prepared Statements you can insert some values, for example you can “pass” them an ID which will be used in the WHERE clause. But there are many limitations: you can’t compose the ORDER BY clause, or decide what columns should be SELECTed, for example. An example of Dynamic SQL is:

SET @sql = CONCAT('SELECT `', @col_name, '` AS `value` FROM `', @tab_name, '` LIMIT 1;');

The Problem

MariaDB cursors can’t be used with Dynamic SQL. Their syntax is:

DECLARE <cursor_name> CURSOR FOR <query>

But is a hard-coded SQL statement, and can not be a variable. So, if you can’t hard-code your query because it should contain some variable parts, you can’t use a cursor to read the result set.

Hey, wait a moment… can’t you? Well, actually you can. You must write some code and use some dirty tricks, but you can.

The solution (Code)

Here’s the code we will use to implement “Dynamic Cursors”. I will explain it later:

DELIMITER ||
 
 
DROP PROCEDURE IF EXISTS `dyn_lock`;
CREATE PROCEDURE `dyn_lock`(IN `query` TEXT, IN `lock_name` CHAR(64), IN `timeout` TINYINT UNSIGNED)
	NO SQL
	LANGUAGE SQL
	COMMENT 'Get a lock for dyn cursor'
BEGIN
	-- output from GET_LOCK()
	DECLARE lock_res TINYINT UNSIGNED;
 
	-- error in query
	DECLARE CONTINUE HANDLER
		FOR SQLEXCEPTION
		SIGNAL SQLSTATE VALUE '45000' SET
				MESSAGE_TEXT  = '[dyn_lock] Dynamic SQL returned an error';
 
	-- get lock or exit with err
	SET lock_res = GET_LOCK(`lock_name`, IFNULL(`timeout`, 5));
	IF (lock_res IS NULL) THEN
		SIGNAL SQLSTATE VALUE '45000' SET
				MESSAGE_TEXT  = '[dyn_lock] Could not acquire lock: Unknown error';
	ELSEIF (lock_res = 0) THEN
		SIGNAL SQLSTATE VALUE '45000' SET
				MESSAGE_TEXT  = '[dyn_lock] Could not acquire lock: Timeout expired';
	END IF;
 
	-- create dynamic view
	SET @dyn_sql = CONCAT('CREATE OR REPLACE VIEW `', lock_name, '` AS ', query, ';');
	PREPARE stmt_dyn_view FROM @dyn_sql;
	EXECUTE stmt_dyn_view;
	DEALLOCATE PREPARE stmt_dyn_view;
END;
 
 
DROP PROCEDURE IF EXISTS `dyn_cursor`;
CREATE PROCEDURE `dyn_cursor`(IN `query` TEXT, IN `timeout` TINYINT UNSIGNED, OUT x TEXT)
	READS SQL DATA
	LANGUAGE SQL
	COMMENT 'Simulate dynamic cursor'
BEGIN
	-- we assume there is only 1 column to be read;
	-- TEXT is good for any type
	DECLARE val TEXT;
	-- no more rows flag
	DECLARE eof BOOL;
 
	-- select data from a view. only the view is really dynamic
	DECLARE cur_dyn CURSOR FOR
		SELECT * FROM dyn_view;
 
	-- set eof flag
	DECLARE CONTINUE HANDLER
		FOR NOT FOUND
		SET eof = TRUE;
 
	CALL `dyn_lock`(`query`, 'dyn_view', `timeout`);
 
	-- open cursor on view
	OPEN cur_dyn;
 
	-- now that we have the cursor, we can release lock
	DO RELEASE_LOCK('dyn_view');
 	
	lp_dyn: LOOP
		IF eof = TRUE THEN
			LEAVE lp_dyn;
		END IF;
		
		FETCH cur_dyn INTO val;
	END LOOP;
 
	CLOSE cur_dyn;
 
	SET x = val;
END;
 
 
||
DELIMITER ;

Just few notes before explaining this technique:
* The lp_dyn loop, in the dyn_cursor() PROCEDURE, is the part where we do something with the cursor we created. I could have leave out that part because it’s not really part of the general solution, but I needed some code which demonstrates that my solution works. If you leave out that part, then you don’t need the x OUT parameter and the val local variable.
* The user should call dyn_cursor(), not dyn_lock(). The latter is called by the former. This will be explained later.
* dyn_cursor() properly works if you pass a query which only produces 1 column. The last value returned by the query is assigned to the x OUT parameter. SELECT the variable you pass, to check that your query works and the result set has been correctly read by the cursor.
* To test the procedure, you can CALL something like these (call them from different connections to test concurrency):

CALL `dyn_cursor`('SELECT `ENGINE` FROM `information_schema`.`ENGINES`', NULL, @var1);
CALL `dyn_cursor`('SELECT 1 AS `uno` FROM `information_schema`.`TABLES`', NULL, @var2);
CALL `dyn_cursor`('SELECT -;', NULL, @var2);

The solution (Theory)

The technique is quite simple. Here’s the work flow.

The dynamic query is received through the query IN parameter. This happens in my example – please, note that in real-world examples, you could compose that string in the same PROCEDURE. Or it could be a FUNCTION, instead of a PROCEDURE.

We need to acquire a lock, so we call dyn_lock(). It is not the table-level lock you would acquire with a LOCK TABLE statement, but another type of lock. It’s more like a logical named lock entirely handled by four SQL functions (GET_LOCK(), RELEASE_LOCK(), IS_USED_LOCK(), IS_FREE_LOCK()). This kind of locks is interesting, because you may want to acquire locks that are not bound to a table, or you are using techniques that are affected by table locks in a strange way (transactions, triggers, etc). I will not fully explain the mechanism in this post, but here’s what we do. We acquire a lock named 'dyn_view' using GET_LOCK(). If that lock has been acquired by another connection, we wait for the number of seconds specified as `timeout` parameter. If `timeout` is NULL, the default value we decide is 5 (please, use the value you think is most appropriate in your situation). We could define two default values, one in dyn_cursor() and one in dyn_lock(). Later, the reason for this will be clear.

If the lock can not be acquired, dyn_lock() throws an error. The SQLSTATE '45000' is the one suggested by the MySQL documentation for user errors. The message is not complete, because this is just an example. But it should contain the name of the Stored Routine that thrown it, if you want make debug easier.

Then, dyn_lock() creates a view with the specified name. And then the execution exits from dyn_lock(). Here we open the cursor (alas, this can’t be done within dyn_lock() because cursors exist at a Routine level), and we release the lock using RELEASE_LOCK().

Let’s talk about the view. Its AS clause is the statement passed through `query` parameter. So, doing a SELECT * on that view is like directly running `query`. The performances are the same, or almost the same. The advantage is that SELECT * FROM `dyn_view` can be hard-coded in a cursor definition (we did it in dyn_cursor()), but `query` is a string that can be generated dynamically.

Views can not be temporary in MariaDB/MySQL (they can in Postgres). This means that when you create/replace a view, other connections could use it. In our case, we must prevent a concurrency like this:
1) A creates `dyn_view`;
2) Just 0.0000001 seconds after, B replaces `dyn_view`;
3) A executes the query on `dyn_view`, knowing not that it has been replaced.
To avoid this, we use the lock I described.

We want to release the lock as soon as possible, to increase performances. We can do this just after opening the cursor. It will work on a query, not on the view itself – so, we don’t need it anymore, and can allow other connections to replace it. In fact, the view is always created (if it doesn’t still exists) or replaced. It is never drop, to speed up things.

It is important that the view and the lock have the same name and are created within dyn_lock(). This way, dyn_lock() contains (part of) the generalized code. Not only dyn_cursor(), but also other Stored Routines, can call dyn_cursor() to get a lock and a view with a given name. If several Routines call dyn_cursor() passing the the same name, they can’t be executed at the same time. For the sake of brevity, I will not show here this case here, but it can be useful. If different Routines call dyn_cursor() with different names, they can be executed at the same time. But still two connections can not execute the same Routine at the same time. However, it is important that Routines release the lock name they acquired as soon as possible (again, see RELEASE_LOCK()).

To Do (or not To Do)

GET_LOCKS() accepts a timeout in seconds, not microseconds. For short queries, you may want to specify a value like 0.5. Or 1.5. If this limitation is a problem for you, you may consider to re-implement dyn_lock() and write an unlocking PROCEDURE. You could acquire locks by inserting your CONNECTION_ID() into a MEMORY (but not temporary!) table, and release the lock by deleting that ID. To check if a lock has already been acquired by someone else, you could use a SELECT ... FOR UPDATE. But there is a complication: a query could be KILLed before it releases its lock. So, you also need to write in the table a TIMESTAMP that indicates when the lock has been acquired. After a given timeout (which can be constant or variable), the lock expires.

If you re-implement the locking system, and you have a high concurrency, you may also consider to use more than one view. The SELECTs on the view must be hard-coded, so the number of allowed views and their names must be hard-coded too. But you could allow 3 view called dyn_view1, dyn_view2, dyn_view3, and create 3 locks with the same names in the MEMORY table. But if you use the GET_LOCK() and RELEASE_LOCK() functions, you need to use only 1 lock, and thus only 1 view.

Please, note that re-implementing the locking system will improve concurrency and timeout granularity, but will add complexity, and thus should reduce the performances in an environment where concurrency is almost irrilevant.

For the sake of brevity, errors generated by the Dynamic SQL are not handled here. But you should handle them.

Limitations and Feature Requests for MariaDB

* WARNING: because of a bug, in MySQL you must avoid to pass negative numbers to GET_LOCK() as timeout, because it is an endless timeout. In MariaDB, this only happens if you pass very low negative numbers (MDEV-4017).
* First of all: if MariaDB adds support for cursors on dynamic SQL statements, all this code will no longer be necessary.
* LIMITATION: This technique can not be used recursively – that is, the query you pass to dyn_cursor() can not contain a call to dyn_cursor(). This is because dyn_lock() uses a Prepared Statement. MariaDB/MySQL’s Prepared Statements exist at a connection-level, and are not recursive. I hope that this in the future we will have a way to implement this type of recursivity.
* #MDEV-4028 GET_LOCK() should accept a timeout argument with granularity in microseconds.
* Dynamic SQL could generate errors or warnings. In that case, it would be useful to get an error message specifying the name of the Prepared Statement and the Stored Routine that thrown it.
* If a Stored Routine could return a cursor or a result set, we could move the OPEN and DO RELEASE_LOCK() lines into dyn_lock(). If we could do this, the code would be easier to maintain.

I use a lot of Stored Programs, and I am working (when I have the time to do it) on a couple of libraries I would like to release publicly. But there are strong limitations on what you can do with Routines, at the moment. I will write a post about what features I think that could make a developer’s life easier. Maybe some of them seem useless, until one starts to use Routines in real life – then, you understand that they could help you writing better (mantainable, general, faster) code. And maybe some of them are not so difficult to implement… at least, I hope so.

See also:

Enjoy!

MariaDB: How to create a Black Hole when BLACKHOLE is disabled

Versione italiana

The BLACKHOLE Storage Engine causes data to fall into a black hole. Not only the data you are INSERTing or LOADing, but also existing data. In other words, BLACKHOLE disables a table and erases its contents. It can be useful for debugging (don’t execute query, but verify if they cause an error) or other special purposes.

Thought, hosting providers generally have BLACKHOLE disabled. I’m not sure about how that simple Storage Engine can damage their service, but it’s usually you can’t use it, it’s a fact.

But then, how can you emulate a BLACKHOLE table? The answer is simple: just use a VIEW!

First, we need a table. The process is trivial and riskless, so I will not write a CREATE TABLE. Use one of your existing tables. The following example assumes that your table is called `t_user` – replace that name with the real one.

Then, we create a view. It is of the simplest type: it just mirrors an existing table. You can read from it and write in it. Here’s the code:

CREATE OR REPLACE VIEW `t_user`
	AS SELECT * FROM `user`;

You will use `t_user` instead of `user` for all your operations. At some point, you want to disable that view – ie, turn it into a Black Hole!

The process, as I said, is trivial. Just re-define `user`, but with 2 changes:
* Add a WHERE FALSE clause. Obviously the view will not show any data (because no row can possibly satisfy a WHERE FALSE).
* Add a WITH CHECK OPTION clause. It will make the view not-writable (because a row can only be INSERTed if it satisfies WHERE FALSE, which is impossible).

Here’s the code:

CREATE OR REPLACE VIEW `t_user`
	AS SELECT * FROM `user` WHERE FALSE
	WITH CHECK OPTION;

When you are ready to turn the Black Hole into a habitable planet, execute the first statement again.

The OR REPLACE clause is useful in this case, because you don’t need to DROP an re-CREATE the view – you replace it with a single atomic operation.

Enjoy!

Working with different tables using a view

The problem and the possible solutions

Most online stores have a common problem: they sell several families of products and each of them has different attributes. For example, shirts have a color and a size, while cell phones have a brand and a model. For this reason, a reasonable solution may be to store them in different tables. But they also have some important common attributes, like price or avaibility; and those attributes are read by many queries, so it would be comfortable to have those families in one table.

The worst solution is to create one table with a lots of columns which will be used only for few records. The historical solution is view-based. MariaDB introduces Virtual Columns, so there is now a Virtual Columns-based solution.

In this article we will discuss the view-based solution. Virtual Columns will be explained in a future article.

Ok, let’s CREATE what we need

Here’s the SQL code:

CREATE TABLE `shirt`
(
	`id`          TINYINT UNSIGNED NOT NULL AUTO_INCREMENT,
	`price`       DECIMAL(10, 2) NOT NULL,
	`avaibility`  SMALLINT NOT NULL,
	`size`        ENUM('xl','l','m','s','xs') NOT NULL,
	`color`       ENUM('yellow','white','black') NOT NULL,
	PRIMARY KEY (`id`)
);

CREATE TABLE `cell`
(
	`id`          TINYINT UNSIGNED NOT NULL AUTO_INCREMENT,
	`price`       DECIMAL(10, 2) NOT NULL,
	`avaibility`  SMALLINT NOT NULL,
	`brand`       CHAR(50) NOT NULL,
	`model`       CHAR(50) NOT NULL,
	PRIMARY KEY (`id`)
);

CREATE VIEW `product`
	AS
		(
		SELECT 
			`id` + 0 AS `id`, `id` AS `origin_id`, 'shirt' AS `family`, `price`, `avaibility`
			FROM `shirt`
		) UNION (
		SELECT 
			`id` + 1000 AS `id`, `id` AS `origin_id`, 'cell' AS `family`, `price`, `avaibility`
			FROM `cell`
		);

I think that the tables don’t need to be explain. There is a table for every product family, and they could be 100 or 1000; but for this tutorial, 2 are enough.

The view contains the common attributes of the different product families (price, avaibility). To gather them in one view, we use a UNION statement. Plus, the view contains some columns that need some to be explained.

Points of VIEW

`family`: This is a constant value, but rows located in different tables have a different value for `family`. That’s exactly what we want, so that we can distinguish shirts from phones.

`id`: We want to have a unique id for all products, disregarding their family. But differenct tables may contain many identical `id` values, so we can’t rely on their `id`. That’s why we need to create an artificial value inside the view.
Note that in this example shirts and phones have a TINYINT Primary Key. This is reasonable, because a store can hardly sell more than 100 different types of shirts or cell phones, and the maximum value for TINYINT UNSIGNED is 255. We need to find a comfortable value that is higher than 255: 1000 is a good candidate.
Now, we decide that id’s of product of different families are in different ranges. So the shirts id’s are 1..255. But id’s of cell phones are 1000..1255, so the value is unique. We do this by adding 1000 to the cell phones id’s.
This value can be used for SELECTs which read common values for specific rows in the view, not in the basetables.

`origin_id`: We can’t use the `id` value to UPDATE or DELETE individual rows, because the VIEWs based on a UNION are not updatable. To execute those operations, we must run our statements directly on the tables. But to do this, we need to know their original `id`. That’s why we have an `origin_id` field: it matches to the `id` columns in the basetables.

The distinction between `id` and `origin_id` is not very common. I like it, because it allows me to SELECT a signgle row from the view, when/if I need it; however, this rarely happens. So you may want to include the basetables `id` in your view without any alias, and forget the artificial value.

It’s important to remember that the view is not insertable or updatable: you will SELECT from the view, but insert, update and delete from the basetables.

Enjoy!

Emulating the CHECK clause in MariaDB

Versione italiana

MariaDB (like MySQL and its forks) does not have a CHECK clause in the CREATE TABLE statement. But you can emulate it using VIEWs.

Data Definition

First, let’s create a table:

CREATE TABLE `t_person`
(
	`id`       INTEGER UNSIGNED NOT NULL AUTO_INCREMENT ,
	`name`     CHAR(30) NOT NULL ,
	`surname`  CHAR(30) NOT NULL ,
	`email`    CHAR(50) NOT NULL ,
	PRIMARY KEY (`id`)
);

The table is ready to store our data, but we need to check that data is valid. When new row is inserted, we want to check that `surname` contains at least 2 characters and `email` contains ‘@’ and ‘.’ characters.

So, let’s create a view that would show only data that follow these simple rules:

CREATE VIEW `person`
	AS SELECT * FROM `t_person`
		WHERE LENGTH(`surname`) > 1
			AND `email` LIKE '%_@_%._%'
	WITH CHECK OPTION;

The view is INSERTable, because it is based on only one basetable and has no grouping functions/clauses. The WITH CHECK OPTION clause allow new rows to be inserted only if they satisfy the view’s WHERE condition.

Data Manipulation

Now, you can insert data through the view. Let’s start with an invalid row:

INSERT
	INTO `person`
		(`name`, `surname`, `email`)
	VALUES
		('Mario', 'Rossi', 'mario.rossi');

`email` does not contain a ‘@’, so the row is rejected with a good error message:
[Err] 1369 - CHECK OPTION failed 'test.person'

So, let’s try to insert a valid row:

INSERT
	INTO `person`
		(`name`, `surname`, `email`)
	VALUES
		('Mario', 'Rossi', 'mario.rossi@rossionline.it');

The row is successfully inserted!

You can SELECT your data from `t_person` or from `person`, as you prefer. If you use `person`, than you (or other developers) can pretend that the view is a table and `t_person` does not exists – it’s transparent. However, SELECTing rows from `person` adds a useless WHERE that is always true but affects the performances.

Enjoy!

See also: