Creating JSON documents with MariaDB

You probably know about two MariaDB great features: Dynamic Columns and Virtual Colums. I don’t think that we already found all possible use cases for them: more ideas will come in the future, and hopefully someone will write about them. Here, I will describe a use case which involves both Virtual Columns and Dynamic Columns.

Now, suppose that we have relational data in MariaDB, and we want to periodically export them in JSON, so we can export them to a variety of other data sources. For most data format, the best exporting solution is generally the CONNECT storage engine; but CONNECT doesn’t currently support JSON, so this option is not available.

Here I will introduce you to the best solution I know. Then, you will find slightly more complex solutions for specific cases.

Our data

Before we can start having fun with MariaDB features, I will show our data. Dear data, this is my reader… say hello!

CREATE OR REPLACE TABLE category
(
	  id INTEGER AUTO_INCREMENT PRIMARY KEY
	, name VARCHAR(50) NOT NULL
)
	ENGINE InnoDB
	CHARACTER SET utf8
	;

INSERT INTO category (name) VALUES ('Chair'), ('Tables');

CREATE OR REPLACE TABLE product
(
	  id INTEGER AUTO_INCREMENT PRIMARY KEY
	, category INTEGER NOT NULL
	, name VARCHAR(50) NOT NULL
	, PRICE DECIMAL(9, 2) NOT NULL
	, FOREIGN KEY (category) REFERENCES category (id)
)
	ENGINE InnoDB
	CHARACTER SET utf8
	;

INSERT INTO product (category, name, price) VALUES
	  (1, 'Black Chair', '49.99')
	, (1, 'Red Chair', '44.00')
	, (2, 'Glass Table', '250.00')
	;

COLUMN_JSON()

MariaDB supports Dynamic Columns. The COLUMN_JSON() function is used to trasform a Dynamic Column value into JSON. The apparent problem here is: we don’t have a dynamic column, we have normal atomic columns.

Well, no problem. After all, a Dynamic Column is just a BLOB (binary) value encoded in a certain way, so that Dynamic Column functions can parse and modify it. We can create a dynamic value using COLUMN_CREATE(), and modify it with COLUMN_ADD(), COLUMN_DELETE(), etc.

That said, the following example query should be self-explanatory:

SELECT CONCAT('[\n\t', GROUP_CONCAT(
		COLUMN_JSON(
			COLUMN_ADD(
				COLUMN_CREATE('id', id)
				, 'name', name
				, 'price', price
			)
		)
		ORDER BY id
		SEPARATOR ',\n\t'
	), '\n]') AS json
	FROM product \G


Its output is:

*************************** 1. row ***************************
json: [
	{"id":1,"name":"Black Chair","price":49.99},
	{"id":2,"name":"Red Chair","price":44.00},
	{"id":3,"name":"Glass Table","price":250.00}
]
1 row in set (0.00 sec)

Neat, isn't it? Of course it has some limits:

  • It's expensive. No temptables are needed, but MariaDB does a full table scan and transforms the rows on the fly.
  • What if we want to build a JSON array from multiple (joined) tables? Unless there is a daytime when MariaDB does nothing, we don't want to do more than one table scan.

Let's see how we can optimize the basic idea.

Pre-transforming each product

MariaDB also supports Virtual Columns (other DBMS's call them Generated Columns or Computed Columns). These fields are deterministic expressions that don't involve subqueries, and they can be computer on the fly or permanently stored on-disk. They are useful in a variety of situations, sometimes in combination with Dynamic columns.

In our example, we can use PERSISTENT Virtual Columns to store JSON representations of individual products, so that they don't need be generated on the fly. Here's how:

ALTER TABLE product ADD COLUMN IF NOT EXISTS json TEXT AS (
	COLUMN_JSON(
		COLUMN_ADD(
			COLUMN_CREATE('id', id)
			, 'name', name
			, 'price', price
		)
	)
) PERSISTENT;

To get a complete JSON array from these objects:

SELECT CONCAT(
	  '[\n\t'
	, GROUP_CONCAT(json ORDER BY id SEPARATOR ',\n\t')
	, '\n]') AS json
	FROM product \G

Output:

*************************** 1. row ***************************
json: [
	{"id":1,"name":"Black Chair","price":49.99},
	{"id":2,"name":"Red Chair","price":44.00},
	{"id":3,"name":"Glass Table","price":250.00}
]
1 row in set (0.00 sec)

Pre-trasforming the complete array

Maybe we don't want to query a single table. Since Virtual Columns cannot be computed with a subquery, we cannot use them to put data from other tables into each product's JSON object.

A good alternative is storing JSON object into a summary table, that is updated every time someone modifies the category or product tables.

I don't like to do this kind of things withing the application, because more than one application could modify the tables, and we could also modify the tables via SQL queries (or via a GUI). To reliably guarantee integrity in such cases, I prefer to use foreign keys and triggers.

I will not show the complete example here, because it is too complex: three kind of actions (INSERT/UPDATE/DELETE) should be handled (or forbidden) for both the product and category tables. I will just show the summary table, and the stored function that creates a JSON object for one product. INSERT and UPDATE triggers will call this procedure when necessary. Here's the code:

CREATE OR REPLACE TABLE json
(
	  product INTEGER NOT NULL PRIMARY KEY
	, json TEXT
)
	ENGINE InnoDB
	CHARACTER SET utf8
	;

DELIMITER ||
CREATE PROCEDURE make_product_json(IN p_product_id INTEGER)
	MODIFIES SQL DATA
BEGIN
	REPLACE INTO json SELECT
		p_product_id
		, COLUMN_JSON(
			COLUMN_ADD(
				COLUMN_CREATE('id', p.id)
				, 'name', p.name
				, 'price', p.price
				, 'category', c.name
			)
		) AS json
		FROM product p
		LEFT JOIN category c
			ON p.category = c.id
		WHERE p.id = p_product_id
	;
END
||
DELIMITER ;

Descending indexes in MariaDB

Since the dawn of time, MySQL indexes have a limit: they cannot be descending. Yes, ALTER TABLE and CREATE INDEX can contain the ASC and DESC keywords. It is perfectly legal in MySQL. But this does not mean that descending indexes are created. This is a well-known MySQL feature: when it cannot do something, it just pretends to. Well… someone thinks it is a feature. I think it is a bug (a bug is an unexpected behaviour), but what can we do.

The lack of support for descending indexes is only an issue when we need to create an index in which at least one column is ascending and at least one column is descending. For example, MySQL pretends to understand the following statement, but the resulting index won’t probably help us:

CREATE INDEX my_index ON my_table (my_column ASC, your_column DESC);

An ORDER BY my_column ASC, your_column DESC won’t be able to use this index.

MariaDB has a feature which provides a good workaround for the lack of descending indexes: the PERSISTENT columns. If you don’t know them, please check the MariaDB KnowledgeBase first.

You don’t know them but you didn’t follow the link, right? But you can’t escape my telepathic powers!

How can PERSISTENT columns help us creating mixed order index? If you didn’t already guess, I’ll tell you. Suppose we have a column A and we want it to be part of an ASC+DESC index. We’ll need to create a PERSISTENT column rev_A, whose values are calculated based on the values in A. The important thing here is that the order of rev_A is reversed. Here is an example:

CREATE TABLE `exam_scores`
(
	  `id` INT SIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
	, `student_name` VARCHAR(200) NOT NULL
	, `exam_code` CHAR(1)
	, `exam_score` SMALLINT UNSIGNED NOT NULL
	, `exam_date` DATETIME NOT NULL
	
	-- auxiliary columns
	, `rev_exam_code` TINYINT UNSIGNED AS (255 - ORD(exam_code)) PERSISTENT
	, `rev_exam_score` SMALLINT SIGNED AS (0 - exam_score) PERSISTENT
	, `rev_exam_date` SMALLINT UNSIGNED AS (DATEDIFF('2050-01-01', exam_date)) PERSISTENT
	
	-- mixed indexes
	, INDEX `desc_exam_code` (`rev_exam_code`, `student_name`, `exam_code`)
	, INDEX `desc_exam_score` (`rev_exam_score`, `student_name`)
	, INDEX `desc_exam_date` (`rev_exam_date`, `student_name`, `exam_date`)
)
	ENGINE = InnoDB
	DEFAULT CHARACTER SET = 'utf8'
;
-- this is needed because some rev* columns return negative values
SET @@global.sql_mode := CONCAT(@@global.sql_mode, ',NO_UNSIGNED_SUBTRACTION');
SET @@session.sql_mode := @@global.sql_mode;
INSERT INTO `exam_scores`
		(`student_name`, `exam_code`, `exam_score`, `exam_date`)
	VALUES
		  ('Anthony Stark', 'F', 100, '1994-01-01')
		, ('Klark Kent', 'G', 60, '2000-01-01')
		, ('Corto Maltese', 'A', 50, '2007-01-01')
		, ('Tin Tin', 'Z', 900, '2005-01-01')
		, ('Philip Mortimer', 'C', 20, '2000-01-01')
		, ('Dago', 'M', 80, '2004-01-01')
	;

Yes, this table sucks. It is not relational, etc, etc. But we are talking about mixed order indexing in MariaDB, not relational theory.

Is it clear now? I hope so. Let’s see an example query. We want to list the students: recent exams first, and if two dates are identical, sort them alphabetically.

MariaDB [test]> EXPLAIN SELECT `student_name`, `exam_date`
    -> FROM `exam_scores`
    -> ORDER BY `rev_exam_date` ASC, `student_name` ASC;
+------+-------------+-------------+-------+---------------+----------------+---------+------+------+-------------+
| id   | select_type | table       | type  | possible_keys | key            | key_len | ref  | rows | Extra       |
+------+-------------+-------------+-------+---------------+----------------+---------+------+------+-------------+
|    1 | SIMPLE      | exam_scores | index | NULL          | desc_exam_date | 613     | NULL |    7 | Using index |
+------+-------------+-------------+-------+---------------+----------------+---------+------+------+-------------+
1 row in set (0.00 sec)

The desc_exam_score index shows that including the original column in the index can be unnecessary:

MariaDB [test]> EXPLAIN SELECT `student_name`, -`rev_exam_score` FROM `exam_scores` ORDER BY `rev_exam_score` ASC, `student_name` ASC;
+------+-------------+-------------+-------+---------------+-----------------+---------+------+------+-------------+
| id   | select_type | table       | type  | possible_keys | key             | key_len | ref  | rows | Extra       |
+------+-------------+-------------+-------+---------------+-----------------+---------+------+------+-------------+
|    1 | SIMPLE      | exam_scores | index | NULL          | desc_exam_score | 605     | NULL |    6 | Using index |
+------+-------------+-------------+-------+---------------+-----------------+---------+------+------+-------------+
1 row in set (0.00 sec)

-column is obviously a quick operation.

The example table shows how to create a reversed index part for numbers, dates and ASCII 1-char strings. Multi-char ASCII strings are quite easy, too. I don’t know how to create a reversed Unicode column. If you find a way, please leave a comment below.

Enjoy!

Observations about MariaDB’s Virtual Columns

I have played a bit with MariaDB’s Virtual Columns. The documentation seems to be incomplete, especially it does not list all limitations. I edited the page, and I hope that what I written is correct. In this post, I want to list everything I’ve found: I didn’t write some of these things in the wiki, because I’m not sure if they are bugs, or because I’m not sure they are worth being written there.

If there are errors (or you know about other limitations) please add a comment, and I will fix this post ASAP.

Enjoy!

  • Virtual Columns can only be used with Storage Engine which supports them. The only official Storage Engines which support Virtual Columns are: InnoDB, Aria, MyISAM.
  • All datatypes can be used for Virtual Columns. They can be Zerofill. But you can’t specify NULL/NOT NULL. Of course, no DEFAULT value (the default value is the result of the SQL expression). Comments are ok.
  • You can’t use Virtual Columns in Triggers because of this bug. Should be solved in the next version of each tree (see bug page).
  • INSERT statements should use the SET clause or include all Virtual Columns – assign DEFAULT values explicitly. Of coure you can not modify a value with UPDATE.
  • Virtual Columns expressions can only use non-Virtual columns from the same table, operators and built-in deterministic functions. Everything else is not allowed. Constant values (including not-null-column IS NULL) are not allowed.
    • Please, note that if you don’t write correctly a function’s name, you will see this error: `Function or expression is not allowed for column ‘v2’`. The fuction you tried to use may be ok, so check if you written correctly its name.
  • MODIFY and CHANGE operations in ALTER TABLE are not supported for Virtual Columns. You need to DROP + ADD the column. This is always true, even if you are not changing anything.
  • The only way to get a Virtual Column’s expression is perform a SHOW CREATE TABLE and parse its output (maybe not so hard, but not convenient).
  • Administrative commands seem to work correctly. CHECKSUM counts stored columns, and does not count non-stored columns – this is correct. ALTER TABLE ORDER BY works.

Foreign Keys

Still I did not test exensively FOREIGN KEYs, but here’s what I’ve found out.

  • All columns in FOREIGN KEYs, or referred by FOREIGN KEYs, can be Persistent. Non-stored columns can’t be used (this seems to me consistent, as referred columns need to be indexed, and a foreign key itself is an index).
  • ON UPDATE CASCADE, ON UPDATE SET NULL and ON DELETE SET NULL are not allowed (you will receive a clear error message, so I’m sure this is not a bug; also, a computed value can not be modified).
    (I’ve noticed that SET DEFAULT fails with a generic error 150, but this is not related to Virtual Columns, and I tried this just because this clause was supported by PBXT.)
  • In my tests, all contraints were enforced as expected. All error messages were ok.
  • I combined virtual/non-virtual FOREIGN KEYs columns, and referred columns in several ways: everything worked good.