How much memory should we assign to MariaDB?

Before answering, I’ll show you a procedure I use to estimate the amount of memory MariaDB may use. See the comments in the code to make it work on Oracle MySQL.

DELIMITER ||
CREATE PROCEDURE _.show_buffers()
	CONTAINS SQL
	COMMENT 'Show information about buffers size'
BEGIN
	-- buffers allocated by each connection (session buffers)
	SET @memory_per_thread := (
			  @@global.read_buffer_size
			+ @@global.read_rnd_buffer_size
			+ @@global.sort_buffer_size
			+ @@global.thread_stack
			+ @@global.join_buffer_size
			+ @@global.binlog_cache_size
		);
	-- buffers allocated only once, globally
	SET @global_memory := (
			  @@global.innodb_buffer_pool_size
			+ @@global.innodb_additional_mem_pool_size
			+ @@global.innodb_log_buffer_size
			+ (SELECT SUM(FULL_SIZE) FROM information_schema.KEY_CACHES) -- MySQL: @@global.key_buffer_size
			+ @@global.aria_pagecache_buffer_size -- only MariaDB
			+ @@global.query_cache_size
		);
	SELECT
		  @global_memory AS global_memory
		, @memory_per_thread AS memory_per_thread
		, @@global.max_connections AS max_connections
		, @memory_per_thread * @@global.max_connections AS max_session_memory
		, CAST((@global_memory + @memory_per_thread * @@global.max_connections) AS UNSIGNED) AS max_total_memory
		;
	SET @memory_per_thread := NULL;
	SET @global_memory := NULL;
END ||
DELIMITER ;


Results on my dev machine:

MariaDB [(none)]> CALL _.show_buffers() \G
*************************** 1. row ***************************
     global_memory: 419430400
 memory_per_thread: 2949120
   max_connections: 151
max_session_memory: 445317120
  max_total_memory: 864747520
1 row in set (0.00 sec)


So, what do these columns mean?

  • global_memory: Memory used by global buffers.
  • memory_per_thread: Memory used by each thread (session buffers).
  • max_connections: Max allowed concurrent connections (the max_connections variable).
  • max_session_memory: Memory used by each thread, multiplied by the max allowed concurrent connections.
  • max_total_memory: Total memory used by MariaDB when the maximum number of concurrent connections is reached.

Let’s see which settings affect these values.

global_memory = innodb_buffer_pool_size + innodb_additional_mem_pool_size + innodb_log_buffer_size + key_buffer_size + aria_pagecache_buffer_size + query_cache_size.

memory_per_thread = read_buffer_size + read_rnd_buffer_size + sort_buffer_size + thread_stack + join_buffer_size + binlog_cache_size.
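
Plugging in the numbers from my dev machine above: memory_per_thread = 2949120 bytes (about 2.8 MB) and global_memory = 419430400 bytes (400 MB); 2949120 × 151 = 445317120 bytes of potential session memory, and 419430400 + 445317120 = 864747520 bytes, about 825 MB of max_total_memory.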

If you use less common storage engines, you may want to add more settings to these formulas. For example, if you use TokuDB, you should add tokudb_cache_size.
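
For example, a minimal sketch for TokuDB (the tokudb_cache_size variable only exists if the TokuDB plugin is loaded):

SELECT @@global.tokudb_cache_size; -- add this value to the @global_memory sum in show_buffers()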

So, basically, my advice is: tune these values so that max_total_memory reaches 80% of your memory. Start by setting query_cache_size and the caches of unused storage engines to 0, then tune the buffer pool. Usually you don’t want to touch session buffers.
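
As a concrete starting point, here is an illustrative my.cnf sketch for a server with 8 GB of RAM dedicated to MariaDB; the numbers are examples to adapt, not recommendations:

[mysqld]
# most of the memory budget goes to the InnoDB buffer pool
innodb_buffer_pool_size    = 5G
# disable the query cache
query_cache_size           = 0
# shrink the caches of engines not used for user data
key_buffer_size            = 16M
aria_pagecache_buffer_size = 16M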

Why 80%? Well, a higher percentage could be OK, but I consider 80% safer for the general case, because:

  • The system needs memory, too.
  • InnoDB buffer pool slightly exceeds the limit you set.
  • Some operations (like building a big index or repairing a non-transactional table) require memory.
  • Some connections could resize their session buffers.
  • There are more buffers that we didn’t consider here.
  • You don’t want MariaDB/MySQL to swap. But if you turn off swap and run out of memory, you could experience various failures, or even crashes.

Creating JSON documents with MariaDB

You probably know about two great MariaDB features: Dynamic Columns and Virtual Columns. I don’t think we have found all their possible use cases yet: more ideas will come in the future, and hopefully someone will write about them. Here, I will describe a use case which involves both Virtual Columns and Dynamic Columns.

Now, suppose that we have relational data in MariaDB, and we want to periodically export them as JSON, so we can send them to a variety of other systems. For most data formats, the best exporting solution is generally the CONNECT storage engine; but CONNECT doesn’t currently support JSON, so this option is not available.

Here I will introduce you to the best solution I know. Then, you will find slightly more complex solutions for specific cases.

Our data

Before we can start having fun with MariaDB features, I will show our data. Dear data, this is my reader… say hello!

CREATE OR REPLACE TABLE category
(
	  id INTEGER AUTO_INCREMENT PRIMARY KEY
	, name VARCHAR(50) NOT NULL
)
	ENGINE InnoDB
	CHARACTER SET utf8
	;

INSERT INTO category (name) VALUES ('Chair'), ('Tables');

CREATE OR REPLACE TABLE product
(
	  id INTEGER AUTO_INCREMENT PRIMARY KEY
	, category INTEGER NOT NULL
	, name VARCHAR(50) NOT NULL
	, price DECIMAL(9, 2) NOT NULL
	, FOREIGN KEY (category) REFERENCES category (id)
)
	ENGINE InnoDB
	CHARACTER SET utf8
	;

INSERT INTO product (category, name, price) VALUES
	  (1, 'Black Chair', '49.99')
	, (1, 'Red Chair', '44.00')
	, (2, 'Glass Table', '250.00')
	;

COLUMN_JSON()

MariaDB supports Dynamic Columns. The COLUMN_JSON() function is used to transform a Dynamic Column value into JSON. The apparent problem here is: we don’t have a Dynamic Column, we have normal atomic columns.

Well, no problem. After all, a Dynamic Column is just a BLOB (binary) value encoded in a certain way, so that Dynamic Column functions can parse and modify it. We can create a dynamic value using COLUMN_CREATE(), and modify it with COLUMN_ADD(), COLUMN_DELETE(), etc.
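
For example, this builds a Dynamic Column value from scratch and prints it as JSON:

SELECT COLUMN_JSON(
	COLUMN_CREATE('id', 1, 'name', 'Black Chair')
) AS json;
-- {"id":1,"name":"Black Chair"}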

That said, the following example query should be self-explanatory:

SELECT CONCAT('[\n\t', GROUP_CONCAT(
		COLUMN_JSON(
			COLUMN_ADD(
				COLUMN_CREATE('id', id)
				, 'name', name
				, 'price', price
			)
		)
		ORDER BY id
		SEPARATOR ',\n\t'
	), '\n]') AS json
	FROM product \G


Its output is:

*************************** 1. row ***************************
json: [
	{"id":1,"name":"Black Chair","price":49.99},
	{"id":2,"name":"Red Chair","price":44.00},
	{"id":3,"name":"Glass Table","price":250.00}
]
1 row in set (0.00 sec)

Neat, isn't it? Of course it has some limits:

  • It's expensive. No temporary tables are needed, but MariaDB does a full table scan and transforms the rows on the fly.
  • What if we want to build a JSON array from multiple (joined) tables? Unless there is a time of day when MariaDB does nothing, we don't want to do more than one table scan.

Let's see how we can optimize the basic idea.

Pre-transforming each product

MariaDB also supports Virtual Columns (other DBMSs call them Generated Columns or Computed Columns). These fields are deterministic expressions that don't involve subqueries, and they can be computed on the fly or permanently stored on disk. They are useful in a variety of situations, sometimes in combination with Dynamic Columns.

In our example, we can use PERSISTENT Virtual Columns to store JSON representations of individual products, so that they don't need to be generated on the fly. Here's how:

ALTER TABLE product ADD COLUMN IF NOT EXISTS json TEXT AS (
	COLUMN_JSON(
		COLUMN_ADD(
			COLUMN_CREATE('id', id)
			, 'name', name
			, 'price', price
		)
	)
) PERSISTENT;

To get a complete JSON array from these objects:

SELECT CONCAT(
	  '[\n\t'
	, GROUP_CONCAT(json ORDER BY id SEPARATOR ',\n\t')
	, '\n]') AS json
	FROM product \G

Output:

*************************** 1. row ***************************
json: [
	{"id":1,"name":"Black Chair","price":49.99},
	{"id":2,"name":"Red Chair","price":44.00},
	{"id":3,"name":"Glass Table","price":250.00}
]
1 row in set (0.00 sec)

Pre-transforming the complete array

Maybe we don't want to query just a single table. Since Virtual Columns cannot be computed with a subquery, we cannot use them to put data from other tables into each product's JSON object.

A good alternative is storing the JSON objects in a summary table that is updated every time someone modifies the category or product tables.

I don't like to do this kind of thing within the application, because more than one application could modify the tables, and we could also modify the tables via SQL queries (or via a GUI). To reliably guarantee integrity in such cases, I prefer to use foreign keys and triggers.

I will not show the complete example here, because it is too complex: three kinds of actions (INSERT/UPDATE/DELETE) should be handled (or forbidden) for both the product and category tables. I will just show the summary table, and the stored procedure that creates a JSON object for one product. INSERT and UPDATE triggers will call this procedure when necessary; a minimal trigger sketch follows the code. Here's the code:

CREATE OR REPLACE TABLE json
(
	  product INTEGER NOT NULL PRIMARY KEY
	, json TEXT
)
	ENGINE InnoDB
	CHARACTER SET utf8
	;

DELIMITER ||
CREATE PROCEDURE make_product_json(IN p_product_id INTEGER)
	MODIFIES SQL DATA
BEGIN
	REPLACE INTO json SELECT
		p_product_id
		, COLUMN_JSON(
			COLUMN_ADD(
				COLUMN_CREATE('id', p.id)
				, 'name', p.name
				, 'price', p.price
				, 'category', c.name
			)
		) AS json
		FROM product p
		LEFT JOIN category c
			ON p.category = c.id
		WHERE p.id = p_product_id
	;
END
||
DELIMITER ;
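
Just to sketch the idea, here is what the simplest of those triggers could look like (a minimal sketch; the trigger name is mine, and a DELETE trigger on product would instead remove the corresponding row from json):

DELIMITER ||
CREATE TRIGGER product_after_insert
	AFTER INSERT ON product
	FOR EACH ROW
BEGIN
	-- keep the summary table in sync
	CALL make_product_json(NEW.id);
END ||
DELIMITER ;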

MariaDB and Stored Procedures: errata & repository

In the article How MariaDB makes Stored Procedures usable I explained how to use the MariaDB CONNECT Storage Engine to work around Stored Procedure limitations. Please read that article for details, if you haven’t already.

Since the technique to work around such limitations is verbose and error-prone, I also proposed a procedure which makes it easy and convenient. Even though this is a simple country blog, I immediately received positive feedback after that article. But then, people started writing to me: “Hey, it doesn’t work!”. The reason is… MariaDB/MySQL bugs. While investigating these problems, I was able to report several bugs.

I’m sure there are others, but isolating such bugs to produce decent test cases can be difficult. I will, if I can.

Basically, what I found while developing the procedures I needed is that some combinations of (correct) statements can mysteriously trigger bugs that are not triggered by other combinations of statements. Despite those procedures being trivial, testing their behavior was frustrating.

I was able to develop a version of these procedures which does not trigger any bug, and I put them in the materialize repository, on GitHub.

I can finally suggest that people try them. If you have any problems (because there are bugs in the procedures, or in MariaDB, or you don’t understand something) feel free to contact me at info [At, federico - razzoli /Dot, com (replace [At, with @ and /Dot, with .). I will help. But be very patient or pay me: remember that 3 days are not a long time for a free service. :)

Enjoy!

How MariaDB makes Stored Procedures usable

I already wrote about how MariaDB makes debugging stored procedures much easier via the SQL Error Log. But I recently found out that MariaDB provides a usable workaround for some big limitations of their procedural SQL.

First, SELECT is not the only SQL statement which returns a resultset. Other examples are DELETE RETURNING, CALL, SHOW, EXPLAIN and administrative commands like ANALYZE TABLE or CHECK TABLE. But these commands cannot be used in place of SELECT in the following contexts:

  • Subqueries, derived tables, JOINs, UNIONs
  • CREATE TABLE ... SELECT
  • INSERT/REPLACE INTO ... SELECT
  • CREATE VIEW ... AS SELECT
  • DECLARE CURSOR ... SELECT
  • SET @my_var := SELECT

This means that there is no way to read the output of these statements within a stored procedure or trigger.
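
For example, both of the following are rejected with a syntax error, even though the statements they wrap do return resultsets:

CREATE TABLE t1 AS SHOW MASTER STATUS; -- syntax error
CREATE VIEW v1 AS CHECK TABLE product; -- syntax error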

Also, some statements cannot be executed in stored procedures and/or in prepared statements. An example is CHECK TABLE: even if you don’t care about the resultset (which would be a serious limitation), you cannot execute it in a stored procedure or in a prepared statement.

MariaDB and MySQL support the DO statement, which is identical to SELECT except that it doesn’t return any resultset. However, stored procedures can return one or more resultsets too, and you cannot invoke them with DO. This means that in some contexts you may have to deal with annoying resultsets, or perhaps you cannot call those procedures at all (within triggers or functions).
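
For example, SLEEP() makes the difference easy to see:

DO SLEEP(1);     -- pauses for one second, returns nothing
SELECT SLEEP(1); -- same effect, but returns a resultset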

I faced all these problems while developing STK/Unit and in other projects.

But MariaDB lets us work around these limitations – and the funny thing is that its developers are probably not aware of this! Here is an example:

CREATE OR REPLACE TABLE show_master_status
	ENGINE = CONNECT
	TABLE_TYPE = MYSQL
	SRCDEF = 'SHOW MASTER STATUS'
	CONNECTION = '_'
;

DELIMITER ||
CREATE PROCEDURE test()
BEGIN
	DECLARE eof BOOL DEFAULT FALSE;
	DECLARE v_file, v_position TEXT;
	
	DECLARE cur CURSOR FOR
		SELECT `File`, `Position` FROM show_master_status;
	
	DECLARE CONTINUE HANDLER
		FOR NOT FOUND
	BEGIN
		SET eof := TRUE;
	END;
	
	OPEN cur;
	`lbl_loop`:
	LOOP
		FETCH cur INTO v_file, v_position;
		IF eof IS TRUE THEN
			LEAVE lbl_loop;
		END IF;
		
		SELECT CONCAT('I''ll do something with ', v_file, v_position);
	END LOOP;
	CLOSE cur;
END ||
DELIMITER ;

MariaDB [_]> CALL test(); 
+--------------------------------------------------------+ 
| CONCAT('I''ll do something with ', v_file, v_position) | 
+--------------------------------------------------------+ 
| I'll do something with binlog.00000242965              | 
+--------------------------------------------------------+ 
1 row in set (0.01 sec) 

Query OK, 0 rows affected (0.01 sec)

As you can see, with this trick I was able to use a cursor with SHOW MASTER STATUS.

How does the trick work? If you know CONNECT, you have probably already guessed. CONNECT is a storage engine which allows users to read data from several kinds of data sources as if they were regular SQL tables. Supported data sources include several file formats, remote DBMSs and more. In this case I defined a table which connects to a MariaDB/MySQL server (TABLE_TYPE=MYSQL). The server is… localhost. Since I specified a SRCDEF table option, the data source is not a table on that server: it is the resultset of the specified query (SHOW MASTER STATUS). This means that, when the table is queried, the CONNECT engine opens a connection to the local server, executes SHOW MASTER STATUS, and returns the results of that statement. But since this work is done by the CONNECT engine, from the MariaDB point of view we are just querying a table. So, we can use that query in all contexts where SELECT works.

Also note that we don’t have to specify the table structure. CONNECT supports table discovery, which means that it automatically knows which columns and types are needed.
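
In theory, the same trick covers the other problematic statements mentioned above. For example, here is a sketch I haven't tested, but which follows exactly the same pattern, to read the output of CHECK TABLE from SQL:

CREATE OR REPLACE TABLE check_product
	ENGINE = CONNECT
	TABLE_TYPE = MYSQL
	SRCDEF = 'CHECK TABLE product'
	CONNECTION = '_'
;

SELECT * FROM check_product;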

Confused? I hope not. Look again at the example – everything will be clear.

Of course, having to CREATE a TABLE each time is not very comfortable, even with table discovery. So I wrote a stored procedure which does the boring work for us:

UPDATE:
I removed the code because it triggered MariaDB bugs. See this post and this GitHub repository for working code.

Enjoy!

GNU Affero GPL license: (trying to) debunking myths

There are a lot of misconceptions about the GNU AGPL license. I’m writing this brief article to try debunking them and explain what this license really is.

Let’s start from the GNU GPL. It makes a licensed piece of code (the covered work) forever free. If you download/receive a copy of GPL’d software, you can use it in any way, study the code, modify the code, and redistribute the software as is or in a modified version. There are some limitations: if you redistribute the software, you can’t do things which limit the freedom of your users. For example, you can’t distribute the software as closed source.

This is considered a strong limitation by non-free software producers and even by some free software producers. I completely disagree. Anyone can produce non-free software; but why should free software developers help them do so? Do non-free software developers help us in any way? No, they don’t.

However, the GPL has a deficiency. The key is the concept of distribution. If Jenny gives Butch a USB stick containing some software, she’s distributing it. The same applies if she sends the software via email. In these cases, the GPL grants the basic rights of the end user. When the GPL was written, this was enough.

Nowadays, software is often used via a network. No distribution takes place: Butch connects to Jenny’s server, and uses her copy of the software. He doesn’t receive the software itself, so the GPL cannot grant him any rights. But… he is the end user! Most software users today don’t even know if the software they use is running on their hardware or somewhere else. In many cases, you just download a client. And no one even tells you it is just a client.

The AGPL addresses this problem. It is basically a GPL license with an additional clause: section 13. This section is brief and very easy to understand (like most parts of the license) and the key part is: if you modify the Program, your modified version must prominently offer all users interacting with it remotely through a computer network (…) an opportunity to receive the Corresponding Source of your version by providing access to the Corresponding Source from a network server at no charge. This includes non-system libraries used by the software.

So, the AGPL simply does what the GPL can no longer do, now that we often don’t run programs on our own computers.

Of course you can use it on non-network software. It is generally a good idea: someday it might become network software, or a piece of your code could be used in a network software.

I can’t imagine why anyone could honestly think that the GPL is ethically right and the AGPL is not. Of course there may be commercial reasons to pretend you think so. It depends on your company’s business model. If your company modifies GNU GPL software on your server, and you don’t want users to know this, then yes, you probably must pretend that the AGPL violates your basic human rights.

License compatibility

There is however a caveat. Many people think that all free licenses can be used together. False: mixing incompatible licenses is just as illegal as copying Microsoft Windows for a friend. GPL is a great license which comes with a price: not all free licenses are compatible with it. AGPL is an even more free license, which comes with a greater price: even GPLv2 is not compatible with it.

Enjoy!

Debugging MariaDB stored procedures with the SQL Error Log

A very old version of the MySQL site contained a nice header: Speed, Power, Ease of Use. I don’t think that stored programs (routines, triggers, events) were supported at that time. Now they are, and… developing them is amazingly hard.

There are many problems, for example the language is not flexible and the execution is sloooow. But the biggest problem is that there is no debug API.

More specifically, one of the biggest problems is that, if you have complex procedures (accessing multiple tables, with error handling, and/or calling other procedures), in practice you have no idea which warnings occur within them.

MariaDB 10.0 makes things much easier by adding the SQL_ERROR_LOG plugin. Well, I don’t think that they had stored programs in mind when they developed it. But what’s important for us is that… it makes debugging stored programs decent.

This article explains how. Its use in application debugging is not covered here, because it seems to me quite obvious and I want to focus on stored procedures.

Enabling SQL_ERROR_LOG

We will use the SQL_ERROR_LOG plugin. Its purpose is to log SQL errors into a text file. To enable it:

INSTALL SONAME 'sql_errlog';

Note that specifying the file extension is not necessary; this makes the statement platform-independent. The MySQL syntax is still supported.

When the plugin is installed, the logging is always active. To stop the logging, uninstall the plugin:

UNINSTALL PLUGIN SQL_ERROR_LOG;

Unfortunately, you might see a warning like this:

Warning (Code 1620): Plugin is busy and will be uninstalled on shutdown

I don’t know any way to avoid this. But I don’t think that SQL_ERROR_LOG is meant to be run on a production server; I consider it a development tool. However, if this is a problem for you, just ask the list or file a bug.

Errors are written to a file called sql_errors.log, in the data directory:

MariaDB [test]> SELECT CONCAT(@@global.datadir, @@global.sql_error_log_filename) AS sql_errlog;
+--------------------------------------+
| sql_errlog                           |
+--------------------------------------+
| /usr/local/mysql/data/sql_errors.log |
+--------------------------------------+
1 row in set (0.00 sec)


To make debugging (and life) easier, I rotate the files very often:

SET @@global.sql_error_log_rotate = 1;

The format is the following:

2014-06-30  1:22:30 root[root] @ localhost [] ERROR 1051: Unknown table 'test.unknown_table' : DROP TABLE `unknown_table`

Limitations

  • The session id is not written to the log. The hostname and the user account are written, but they don't help us distinguish one session from another. I suppose the plugin is meant to be used by application developers on their local machines.
  • In theory, only errors are logged - no warnings, no notes. But keep on reading.
  • The file format is not safely parseable. It's designed for humans.

Debug Messages

Look at the following procedure:

DROP PROCEDURE IF EXISTS do_something;
DELIMITER ||
CREATE PROCEDURE do_something(IN p TEXT)
    READS SQL DATA
BEGIN
    DECLARE CONTINUE HANDLER
        FOR SQLSTATE '45000'
    BEGIN END;
    
    IF @debug = TRUE THEN
        SET GLOBAL sql_error_log_rotate = 1;
    END IF;
    
    IF @debug = TRUE THEN
        SET @error = CONCAT('p = ', IF(p IS NULL, 'NULL', QUOTE(p)));
        SIGNAL SQLSTATE '45000' SET
            MYSQL_ERRNO = 9999,
            MESSAGE_TEXT = @error;
    END IF;
    
    -- why do something? let's do nothing!
    DO NULL;
END ||
DELIMITER ;
SET @debug = TRUE;
CALL do_something('x');

Got the trick? If the @debug variable is set to TRUE, an error is produced. The error is immediately suppressed by an empty CONTINUE HANDLER, so the procedure's execution continues and nothing happens. Well, almost nothing: the error is logged!

So:

  1. Write your CONTINUE HANDLERs.
  2. Set @debug to 1.
  3. CALL your procedure.
  4. cat the sql_errors.log.

Warning Logging

Look at this example:

DROP PROCEDURE IF EXISTS do_something;
DELIMITER ||
CREATE PROCEDURE do_something()
    READS SQL DATA
BEGIN
    DECLARE CONTINUE HANDLER
        FOR SQLWARNING
    BEGIN
        DECLARE CONTINUE HANDLER
            FOR SQLSTATE '45000'
        BEGIN END;
        
        GET DIAGNOSTICS CONDITION 1
              @errno = MYSQL_ERRNO
            , @error = MESSAGE_TEXT
            ;
        IF @debug = TRUE THEN
            SIGNAL SQLSTATE '45000' SET
                MYSQL_ERRNO = @errno,
                MESSAGE_TEXT = @error;
        END IF;
        RESIGNAL;
    END;
    
    IF @debug = TRUE THEN
        SET GLOBAL sql_error_log_rotate = 1;
    END IF;
    
    -- here goes the 'real' procedure.
    -- let's produce a warning to trigger the main HANDLER...
    DO 1/0;
END ||
DELIMITER ;

I know: this is tricky. Writing this has been hard. But don't worry: reusing this template will be quite easy.

The main HANDLER catches the warnings (in this case division by 0, if you have a good SQL_MODE). Then we get the warning properties (GET DIAGNOSTICS) and we use them to issue an error (SIGNAL). The error is suppressed by the internal empty HANDLER, but it is logged. Finally, the error is RESIGNAL'd, because we don't want the debug code to suppress warnings.

A final note. Do you want to log warnings only if they appear within a particular portion of your procedure? You can. Just enclose those lines in a BEGIN ... END construct, and define your handlers inside the construct.
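
A minimal sketch of that structure (the inner HANDLER body is the same logging pattern used above):

DROP PROCEDURE IF EXISTS do_something;
DELIMITER ||
CREATE PROCEDURE do_something()
    READS SQL DATA
BEGIN
    -- warnings raised here are NOT logged
    DO 1/0;
    
    BEGIN
        -- same logging HANDLER as in the previous example
        DECLARE CONTINUE HANDLER
            FOR SQLWARNING
        BEGIN
            DECLARE CONTINUE HANDLER
                FOR SQLSTATE '45000'
            BEGIN END;
            
            GET DIAGNOSTICS CONDITION 1
                  @errno = MYSQL_ERRNO
                , @error = MESSAGE_TEXT
                ;
            IF @debug = TRUE THEN
                SIGNAL SQLSTATE '45000' SET
                    MYSQL_ERRNO = @errno,
                    MESSAGE_TEXT = @error;
            END IF;
            RESIGNAL;
        END;
        
        -- warnings raised here ARE logged
        DO 1/0;
    END;
END ||
DELIMITER ;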

Enjoy!

MariaDB/MySQL: Making ENGINE clause mandatory

I got this idea from one of Valerii Kravchuk’s MySQL bug reports:

http://bugs.mysql.com/bug.php?id=71978

In theory, I completely agree that MySQL and its forks should not allow us to set a default storage engine which cannot be used to create a table. You can see the same problem with MariaDB’s SEQUENCE. The MySQL & forks philosophy seems to be: ignore your mistakes, so you can repeat them forever. This can turn a mistype into major data loss.

Unless you only use InnoDB and your magic powers tell you that this will never change, the ENGINE clause should be mandatory in your MySQL installation. Since there is no clean way to make it mandatory, setting a “weird” storage engine as default seems to be a decent workaround. I don’t like it, but it can prevent human mistakes.

MariaDB [test]> SET SESSION default_storage_engine = 'performance_schema';
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> CREATE TABLE x (id INT) ;
ERROR 1005 (HY000): Can't create table `test`.`x` (errno: 131 "Command not supported by database")
Error (Code 1005): Can't create table `test`.`x` (errno: 131 "Command not supported by database")
Error (Code 1683): Invalid performance_schema usage.
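
If you want to make the workaround permanent, here is a sketch (I'd test it carefully first, because the default engine also affects every CREATE TABLE without an ENGINE clause run by tools and scripts):

SET GLOBAL default_storage_engine = 'performance_schema';

-- or, in my.cnf:
-- [mysqld]
-- default_storage_engine = performance_schema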

Enjoy!