Creating JSON documents with MariaDB

You probably know about two MariaDB great features: Dynamic Columns and Virtual Colums. I don’t think that we already found all possible use cases for them: more ideas will come in the future, and hopefully someone will write about them. Here, I will describe a use case which involves both Virtual Columns and Dynamic Columns.

Now, suppose that we have relational data in MariaDB, and we want to periodically export them in JSON, so we can export them to a variety of other data sources. For most data format, the best exporting solution is generally the CONNECT storage engine; but CONNECT doesn’t currently support JSON, so this option is not available.

Here I will introduce you to the best solution I know. Then, you will find slightly more complex solutions for specific cases.

Our data

Before we can start having fun with MariaDB features, I will show our data. Dear data, this is my reader… say hello!

CREATE OR REPLACE TABLE category
(
	  id INTEGER AUTO_INCREMENT PRIMARY KEY
	, name VARCHAR(50) NOT NULL
)
	ENGINE InnoDB
	CHARACTER SET utf8
	;

INSERT INTO category (name) VALUES ('Chair'), ('Tables');

CREATE OR REPLACE TABLE product
(
	  id INTEGER AUTO_INCREMENT PRIMARY KEY
	, category INTEGER NOT NULL
	, name VARCHAR(50) NOT NULL
	, PRICE DECIMAL(9, 2) NOT NULL
	, FOREIGN KEY (category) REFERENCES category (id)
)
	ENGINE InnoDB
	CHARACTER SET utf8
	;

INSERT INTO product (category, name, price) VALUES
	  (1, 'Black Chair', '49.99')
	, (1, 'Red Chair', '44.00')
	, (2, 'Glass Table', '250.00')
	;

COLUMN_JSON()

MariaDB supports Dynamic Columns. The COLUMN_JSON() function is used to trasform a Dynamic Column value into JSON. The apparent problem here is: we don’t have a dynamic column, we have normal atomic columns.

Well, no problem. After all, a Dynamic Column is just a BLOB (binary) value encoded in a certain way, so that Dynamic Column functions can parse and modify it. We can create a dynamic value using COLUMN_CREATE(), and modify it with COLUMN_ADD(), COLUMN_DELETE(), etc.

That said, the following example query should be self-explanatory:

SELECT CONCAT('[\n\t', GROUP_CONCAT(
		COLUMN_JSON(
			COLUMN_ADD(
				COLUMN_CREATE('id', id)
				, 'name', name
				, 'price', price
			)
		)
		ORDER BY id
		SEPARATOR ',\n\t'
	), '\n]') AS json
	FROM product \G


Its output is:

*************************** 1. row ***************************
json: [
	{"id":1,"name":"Black Chair","price":49.99},
	{"id":2,"name":"Red Chair","price":44.00},
	{"id":3,"name":"Glass Table","price":250.00}
]
1 row in set (0.00 sec)

Neat, isn't it? Of course it has some limits:

  • It's expensive. No temptables are needed, but MariaDB does a full table scan and transforms the rows on the fly.
  • What if we want to build a JSON array from multiple (joined) tables? Unless there is a daytime when MariaDB does nothing, we don't want to do more than one table scan.

Let's see how we can optimize the basic idea.

Pre-transforming each product

MariaDB also supports Virtual Columns (other DBMS's call them Generated Columns or Computed Columns). These fields are deterministic expressions that don't involve subqueries, and they can be computer on the fly or permanently stored on-disk. They are useful in a variety of situations, sometimes in combination with Dynamic columns.

In our example, we can use PERSISTENT Virtual Columns to store JSON representations of individual products, so that they don't need be generated on the fly. Here's how:

ALTER TABLE product ADD COLUMN IF NOT EXISTS json TEXT AS (
	COLUMN_JSON(
		COLUMN_ADD(
			COLUMN_CREATE('id', id)
			, 'name', name
			, 'price', price
		)
	)
) PERSISTENT;

To get a complete JSON array from these objects:

SELECT CONCAT(
	  '[\n\t'
	, GROUP_CONCAT(json ORDER BY id SEPARATOR ',\n\t')
	, '\n]') AS json
	FROM product \G

Output:

*************************** 1. row ***************************
json: [
	{"id":1,"name":"Black Chair","price":49.99},
	{"id":2,"name":"Red Chair","price":44.00},
	{"id":3,"name":"Glass Table","price":250.00}
]
1 row in set (0.00 sec)

Pre-trasforming the complete array

Maybe we don't want to query a single table. Since Virtual Columns cannot be computed with a subquery, we cannot use them to put data from other tables into each product's JSON object.

A good alternative is storing JSON object into a summary table, that is updated every time someone modifies the category or product tables.

I don't like to do this kind of things withing the application, because more than one application could modify the tables, and we could also modify the tables via SQL queries (or via a GUI). To reliably guarantee integrity in such cases, I prefer to use foreign keys and triggers.

I will not show the complete example here, because it is too complex: three kind of actions (INSERT/UPDATE/DELETE) should be handled (or forbidden) for both the product and category tables. I will just show the summary table, and the stored function that creates a JSON object for one product. INSERT and UPDATE triggers will call this procedure when necessary. Here's the code:

CREATE OR REPLACE TABLE json
(
	  product INTEGER NOT NULL PRIMARY KEY
	, json TEXT
)
	ENGINE InnoDB
	CHARACTER SET utf8
	;

DELIMITER ||
CREATE PROCEDURE make_product_json(IN p_product_id INTEGER)
	MODIFIES SQL DATA
BEGIN
	REPLACE INTO json SELECT
		p_product_id
		, COLUMN_JSON(
			COLUMN_ADD(
				COLUMN_CREATE('id', p.id)
				, 'name', p.name
				, 'price', p.price
				, 'category', c.name
			)
		) AS json
		FROM product p
		LEFT JOIN category c
			ON p.category = c.id
		WHERE p.id = p_product_id
	;
END
||
DELIMITER ;