Observations about MariaDB’s Virtual Columns

I have played a bit with MariaDB’s Virtual Columns. The documentation seems to be incomplete, especially it does not list all limitations. I edited the page, and I hope that what I written is correct. In this post, I want to list everything I’ve found: I didn’t write some of these things in the wiki, because I’m not sure if they are bugs, or because I’m not sure they are worth being written there.

If there are errors (or you know about other limitations) please add a comment, and I will fix this post ASAP.

Enjoy!

  • Virtual Columns can only be used with Storage Engine which supports them. The only official Storage Engines which support Virtual Columns are: InnoDB, Aria, MyISAM.
  • All datatypes can be used for Virtual Columns. They can be Zerofill. But you can’t specify NULL/NOT NULL. Of course, no DEFAULT value (the default value is the result of the SQL expression). Comments are ok.
  • You can’t use Virtual Columns in Triggers because of this bug. Should be solved in the next version of each tree (see bug page).
  • INSERT statements should use the SET clause or include all Virtual Columns – assign DEFAULT values explicitly. Of coure you can not modify a value with UPDATE.
  • Virtual Columns expressions can only use non-Virtual columns from the same table, operators and built-in deterministic functions. Everything else is not allowed. Constant values (including not-null-column IS NULL) are not allowed.
    • Please, note that if you don’t write correctly a function’s name, you will see this error: `Function or expression is not allowed for column ‘v2’`. The fuction you tried to use may be ok, so check if you written correctly its name.
  • MODIFY and CHANGE operations in ALTER TABLE are not supported for Virtual Columns. You need to DROP + ADD the column. This is always true, even if you are not changing anything.
  • The only way to get a Virtual Column’s expression is perform a SHOW CREATE TABLE and parse its output (maybe not so hard, but not convenient).
  • Administrative commands seem to work correctly. CHECKSUM counts stored columns, and does not count non-stored columns – this is correct. ALTER TABLE ORDER BY works.

Foreign Keys

Still I did not test exensively FOREIGN KEYs, but here’s what I’ve found out.

  • All columns in FOREIGN KEYs, or referred by FOREIGN KEYs, can be Persistent. Non-stored columns can’t be used (this seems to me consistent, as referred columns need to be indexed, and a foreign key itself is an index).
  • ON UPDATE CASCADE, ON UPDATE SET NULL and ON DELETE SET NULL are not allowed (you will receive a clear error message, so I’m sure this is not a bug; also, a computed value can not be modified).
    (I’ve noticed that SET DEFAULT fails with a generic error 150, but this is not related to Virtual Columns, and I tried this just because this clause was supported by PBXT.)
  • In my tests, all contraints were enforced as expected. All error messages were ok.
  • I combined virtual/non-virtual FOREIGN KEYs columns, and referred columns in several ways: everything worked good.

MariaDB and MySQL’s optimizer does not recognize tautologies and contradictions

In logic, a tautology is an expression which is always TRUE, no matter what values are assigned to variables (col = col). The opposite of tautologies are contradictions: a contradiction is an expression which is always FALSE (col col).

I wondered if MariaDB’s optimizer recognizes tautologies and contradictions, so I played with some stupid queries. The procedure I used to verify the tautologies is simple:

EXPLAIN EXTENDED <query>;
SHOW WARNINGS;

As you probably know, if you execute SHOW WARNINGS after EXPLAIN EXTENDED, it returns the query internally rewritten by MariaDB. This way, it’s easy to verify if a tautology/contradiction has been recognized: WHERE <tautology> should be replaced with: WHERE 1, while WHERE <contradiction> should be replaced with: WHERE 0. Just to be sure, I check if EXPLAIN says we are going to have a full table scan… just to verify a tautology.

It turns out that the optimizer recognize very few tautologies. For example:

<const_value>
<const_expression>
col = col
col != col
NOT NOT col = col
(col = val) AND (col  val)

(col = val) OR NOT (col = val)

Tuatologies and contradictions that are NOT recognized: (some of them are partly optimized, however)

col IN (col, ...)
col IN (val1, val2) = col NOT IN (val1, val2)
col IN (val1, val2) = (col = val1 OR col = val2))
(col > val) AND (col < val)
(col = val) = (val = col)
(col AND val) = (val AND col) // etc
(col = val) AND NOT (col = val)
(col = val) OR (col <> val)
(col IS NULL) OR (col IS NOT NULL)
(col = val) XOR NOT (col = val)
(col = val1) = (col = val1) AND (col = val2)

I could have tried more tautologies/contadictions, but if these one are not recognized, I can reasonably say that more complex ones are not recognized.

Please, note that I didn't try these in any other DBMS. So I am NOT saying that other software projects has a superior optimizer. Probably they have not. However, I find these results quite amazing. It is probably rare for a DBMS to meet one of these tautologies in a query. But, I guess that sometimes it happens… and it's really hard for a user to guess that col IN (col, ...) is not properly optimized.

UPDATE
Henrik Ingo commented this post on Facebook, and I wish to share his thoughts with everyone:
It’s easy for us app developers to think that a RDBMS contains some form of AI or general purpose logic engine that will solve our “obvious” where clauses. I’ve had similar surprises, but in real world cases, too. It’s good for app developers to understand that the optimizations that exist in a RDBMS are there only because someone actually spent lot of time implementing them.
http://openlife.cc/blogs/2012/september/hand-holding-postgresql-simple-query

JavaScript Is NOT Java!

Versione italiana

When I tell that JavaScript has nothing to do with Java, people say that I am exaggerating. And maybe they think I am fighting some strange kind of holy war. But I am just trying to highlight a real problem. If you use Lisp, and someone convinces you that you are using Java… well, your programs simply can’t be decent. For 2 reasons: you are not using the features of Lisp, and you can not use the features of Java.

Here are some facts that people can hardly question:

  • Java has classes and interfaces, JavaScript has not. This is because Java is Object Oriented and JavaScript is not (altought it has an object type).
  • JavaScript has a function type and lazy evaluation, Java has not. This is because JavaScript is a functional language and Java is not.
  • In JavaScript you can add properties to an object at runtime, or even change its parent; in Java you can’t. That’s because JavaScript is dynamic and Java is static.
  • In JavaScript, there is no reason to use the new operator. And usually, there is a good reason to avoid it. In Java you can’t even say “Hello World” without new.
  • In JavaScript you always should nest functions. In Java you can’t, and (if you know what Java is) you don’t want to do anything like that.
  • Java has private and protected properties/methods. In JavaScript you encapsulate data in another way.
  • Yes, in JavaScript you can emulate class inheritance. But you should use Prototypes. Don’t pretend you are using classes, because JavaScript doesn’t have classes.
  • a[“b”] is an array in Java, but is an object in JavaScript.

I know that this list is not an explanation of what JavaScript is. But I hope I made clear that JavaScript is NOT similar to Java.
Is JavaScript a subset of Java? No. A dynamic functional language can’t be a subset of a static object oriented language. Unless you think that Haskell is a subset of C++.
Are their syntaxes similar? No. JavaScript syntax is a dirty cheap, because it makes you beilive you are instantiating classes or accessing an element from an associative array, when you’re doing something totally different.

JavaScript is similar to Lisp, not Java. And programming in Java using Lisp is one of the most (hard + stupid + useless) things a developer could try to do. But it’s not your fault. It’s ECMA‘s fault. Ask them why they cheated you.

Validating data using a TRIGGER

Versione italiana

I wrote a post about how to emulate the CHECK clause in MariaDB and MySQL using a VIEW. Here is a different approach: this time, we will use TRIGGERs to reach the same goal. And then we will discuss which approach is better.

Let’s CREATE what we need

Let’s CREATE a simple table:

-- make sure that `person` can be created
DROP VIEW   IF EXISTS `person`;
DROP TABLE  IF EXISTS `person`;

-- table whose data must be validated
CREATE TABLE `person`
(
	`id`  mediumint UNSIGNED NOT NULL AUTO_INCREMENT,
	`name`  char(50) NOT NULL,
	`email`  char(80) NOT NULL,
	PRIMARY KEY (`id`)
)
	ENGINE = Aria;

`person` contains a Primary Key, a field called `name`, and `email`. We will just care about the `email`. We want to validate it using a LIKE pattern. But how?

Let’s CREATE the TRIGGER

TRIGGERs can be set on the base DML operations (INSERT, UPDATE, DELETE). If we want to be sure that data are valid, we should use both INSERT and UPDATE TRIGGERs, but for brevity here I will just show the INSERT TRIGGER. TRIGGERs can be executed BEFORE or AFTER the operation they are associated to; of course, if we want to prevent invalid data to be written, we need the BEFORE option.

-- validate email
CREATE TRIGGER `person_validate_insert`
	BEFORE INSERT
	ON `person`
	FOR EACH ROW
BEGIN
	IF NEW.`email` NOT LIKE '%_@%_.__%' THEN
		SIGNAL SQLSTATE VALUE '45000'
			SET MESSAGE_TEXT = '[table:person] - `email` column is not valid';
	END IF;
END;

TRIGGERs can’t have a comment, so I always try to use descriptive names. The first word is person, which is the name of the table; then, I explain what the TRIGGER does: ‘validate insert’.

The TRIGGER checks if the new `email` is valid. If it is, execution continues and the INSERT can be performed. But if it is not, an error is thrown. This stops the execution, and the INSERT will not be performed.

Just a note about the error. The SQLSTATE is ‘45000’, as suggested by MySQL’s documentation – user-defined errors should set SQLSTATE to ‘45000’. The message starts with an info wrapped between the [square brackets]. When I throw an error from a Stored Routine, I write the name of the routine; when I throw it from a TRIGGER, I write the table name. Then, there is a descriptive error message. “Descriptive” means that the user is told what he was trying to do (INSERT) and why the operation aborted (`email` column is not valid).

Let’s test!

I assume that everything I don’t test (if it can be tested), simply does not work. So, let’s INSERT a valid row:

MariaDB [test]> INSERT INTO `person`
    ->          (`name`, `email`)
    ->  VALUES
    ->          ('Emiliano Zapata', 'zapata@revolucion.mx');
Query OK, 1 row affected (0.59 sec)

Great! Now, let’s see what happens when we try to INSERT invalid data:

MariaDB [test]> INSERT INTO `person`
    ->          (`name`, `email`)
    ->  VALUES
    ->          ('John Doe', 'misterdoe@nowhere');
ERROR 1644 (45000): [table:person] - `email` column is not valid

Great! That’s exactly the error we expected to see.

What is the best technique?

Now that I discussed two different techniques (read how to use a VIEW), I am supposed to explain which one is better. But I’ll just write some notes, that should clarify the pro’s and the con’s of each method:

* If you are using MariaDB 5.3 or older, or MySQL 5.4 or older, the SIGNAL statement is not avaible. You can use a SELECT on a table which does not exists, which is a bit less clean (you know why you see that error, but it can be confusing for other developers).
* Hosting companies often do not allow you to use TRIGGERs – in that case, you can only use the VIEW technique.
* The INSERT/UPDATE‘s performance should not be notably affected by this choice. DELETE and SELECT performance may be a little bit slower for the VIEW method.
* The TRIGGER method needs 2 different triggers (on insert, on update; maybe on delete, if you want to check data on other tables to ensure database’s consinstency). The VIEW method needs a view and renamed table. The second option is probably more convenient… but this is TRUE for me, and may be FALSE for you. (a boolean way to say that it’s subjective)
* This also means that the TRIGGER method requires duplicated code (same statements in INSERT and in UPDATE TRIGGERs). You can avoid this issue by using a Stored Routine, but it is slower.

Since there is not an important difference in performances or ease of mantainence of these methods, the choice is up to you.

Enjoy!