MySQL 8.0 Resource Groups

MySQL 8.0 is out. Is this a great piece of news? No.

But MySQL 8.0 is wonderful, and this IS a great piece of news!

It has lots of interesting features, really. Oracle advertised some of them very well with talks at conferences, blog posts, etc. However I am very pleased by a features that they did not advertised at all: resource groups.

The documentation describes them in detail, but here is a small recap.

As we all know, MySQL has system (background) threads, and user (foreground) threads. Until now, nothing could be done to change their priority. All we could do was to tune InnoDB concurrency tickets to make sure that long running queries don’t prevent other queries from using CPU time. Basically, what we do is splitting the execution of a query to improve concurrency.

Now we can also affect threads priority. By default, all threads have the same priority. We can increase the priority of system threads, or decrease the priority of user threads. In any case, no user thread can have more priority than any system thread. Furthermore, we can limit the execution of some threads to one or more virtual CPUs of our choice.

How are these things done? By creating a resource group and setting THREAD_PRIORITY and VCPU attributes. THREAD_PRIORITY is a number (-20..0 for system threads, 0..19 for user threads; lower numbers have higher priority). VCPU is a virtual CPU number (see which ones are available with lscpu), or a range, or a list of numbers/ranges. Then, we can assign running threads to one of the groups we created.

For example, to create a group for system threads:

CREATE RESOURCE GROUP sql_thread
    TYPE = SYSTEM
    VCPU = 0,1
    THREAD_PRIORITY = -10;

To assign an existing thread to that group, check its id with SHOW PROCESSLIST, and then run something like:

SET RESOURCE GROUP sql_thread FOR 10;

Some possible ideas:

  • Increase priority of replication threads to reduce slave lag;
  • Increase priority of the even scheduler;
  • Decrease priority of long-running analytical queries.

I plan to spend some time to experiment what happens if we change the priority of specific system threads on a busy server. I expect this feature to be helpful during peaks of CPU usage.

Unfortunately, resource groups can only be assigned to running threads. There is no way to tell MySQL that connections from a certain user should use a certain group. However, we can achieve the same result by making a change in the applications. After connecting, the application can run:

SET RESOURCE GROUP my_group;

Federico

Advertisements

About READ UNCOMMITTED

Transaction isolation levels are the least understood feature of relational databases. Most developers don’t know them and just use the default one. Actually, a relevant part of them even believe they use MySQL without transactions.

Amongst isolation levels, READ UNCOMMITTED is the least understood. So here’s a quick note about what it is and why – if you know about it – you probably have false beliefs.

Basically, READ UNCOMMITTED is totally inconsistent. It sees changes (new, deleted, modified rows) made by other transactions, that didn’t COMMIT yet. And actually, it’s possible that those transactions will fail, which leads READ UNCOMMITTED to see something that will never happen.

Despite this, it is extremely useful in some cases. For example:

  • To run SELECTs which read a huge amount of rows for analytics.
  • To DELETE rows not used by anyone (archived historical contents).
  • To INSERT rows when no one else can possibly insert new rows.

When using the default isolation level, or even READ COMMITTED, if your transaction involves many rows (say, millions) your statements will probably be slow, and will probably use much CPU time, maybe will also cause replication lag.

READ UNCOMMITTED is not magic and cannot make a slow query fast. But it can save a lot of transaction logs processing, making a query a bit faster and avoiding consuming many of resources for no reason.

That said, its drawbacks should also be noted:

  • Bugs. Precisely because it is the least used (tested) isolation level.
  • It would be a mistake to assume that READ UNCOMMITTED is lock-free. When trying to modify a locked row, it’s queued. When writing anything, it puts a lock. In some cases (like the ones listed above) this is irrelevant, but you need to understand this.
  • Selecting all rows from a table while it’s been intensively INSERTed is slower with READ UNCOMMITTED.

One last important detail. When I explain isolation levels, people tend to think that READ UNCOMMITTED doesn’t take snapshots. This is wrong: it takes a snapshots for every statement it runs, but snapshots include uncommitted changes.

EDIT: Actually there is no snapshot, see the comments. But despite my mistake, the following part remains true.

This may sound slightly obscure. To understand better, create a test table and run something like:

SELECT a FROM my_test WHERE a > SLEEP(a);

The query will evaluate each row, and the evaluation will cause a wait of some seconds. So you have the time to add a new row from another connection. The row will not appear in the results of these query – it’s not in the snapshot that has been taken when the query was issued. However, if you run the query again, the new row will be there, because a new snapshot is taken.

Links for the curious:

Federico

About USE/FORCE/IGNORE INDEX

USE/FORCE/IGNORE INDEX syntax, or index hints, are nice shortcuts to make sure that MySQL will (or will not) use a certain index. But it comes with some drawbacks:

USE/FORCE INDEX will not allow to use an index not mentioned in the list

This could be by design, though in the case of USE INDEX it sounds weird to me. Why? Because if none of the indexes mentioned in the list is usable, a full table scan will happen.

Why is this a problem? Because in the real world queries are generated dynamic and evolve over time. Today’s optimisations could be tomorrow’s wrong hints. I had a case of a wrong USE INDEX preventing the use of the primary key.

Produces an error if the index doesn’t exist

Again, this could be by design, but in the case of IGNORE INDEX this seems to me not ideal. A warning would be much better. Even better, I’d like to have this behaviour governed by a variable.

Why is this a problem? Because indexes can be deleted. Maybe it’s because queries change, maybe it’s because they were wrong from the start (possibly not even created by a DBA). But then, dropping an index can generate errors for existing applications.

Unfortunately, IMHO, documenting the usage of such hints is too difficult.

Federico

Nice surprises in CockroachDB

Dear reader, this is not a detailed technical post. I started experimenting with CockroachDB what I found is enough to excite my curiosity. That’s why I am sharing a couple of snippets to you.

To be clear: I am not the kind of guy that sees X and claims that Y is obsolete. I will still use MySQL and MariaDB in the future, as well as PostgreSQL. I have very good reason for that. I don’t even have a use case for CockroachDB, currently. That said, there are some reasons why DBAs should take a look at this project – such reasons are beyond the scope of this post, but are clearly highlighted in Cockroach documentation.

Basically, I made some tests on DDLs whose results pleased me; incidentally, I’ll add some comments on SQL, the GUI and error handling. But note that I am a Cockroach newbie. I didn’t yet start to look at Cockroach internals – and the same goes for RocksDB, which handles Cockroach storage layer. So I observe behaviour, but I don’t know the underlying mechanisms. If you have questions, I am most probably not the best person to answer. Continue reading

My 2018 Databases Wishlist

Well, the most important wishes I have for 2018 are a bit out of topic for this blog: forms of organisation without a formal authority, schools not teaching religions, and so on. But in this post, I will write about databases… as usual.

So, here is my whishlist, for what it matters.

More research on Learned Indexes

If you don’t know what I’m talking about, see this paper. Having a data structure faster than B-Trees is exciting. But of course I’d like to see also considerations on write performance.

Progress on using ML for database tuning

See this article. I don’t think that Machine Learning will ever be able to replace (good) DBAs, but having a tool which suggests tuning based on real workload sounds great. It can be a valid help for DBAs.

More research on stored functions transformation

Stored functions are useful but slow. But see this paper. It seems it is possible to transform imperative programs to queries, improving the complexity by some orders of magnitude.

On a side note, MariaDB implemented a lot of syntax from Oracle for stored procedures. While this sounds like a precise commercial strategy, the technical improvement on this area is great. Still, what I’d like to see is better performance, as well as support for external languages.

Galera 4

Let me be clear, I didn’t read any announcement that Galera 4 will be released this year. But they announced exciting news over time, and still the new version isn’t here. At some point, it should be released (hopefully).

Transactional DDL in the MySQL ecosystem

MySQL 8.0 has support for atomic DDL statements. They did it in a good way: it’s engine independent and, while it uses InnoDB information_schema tables, any engine is free to add support for this feature. They stated that this is the basis for transactional DDL, but we are not yet there. MariaDB has a task for transactional DDL.

EDIT: Thanks to Valerii Kravchuk for pointing me MDEV-11424 – Instant ALTER TABLE of failure-free record format changes. It is clearly worth adding it to my wishlist: please Maria, get it done!

Engines, engines, engines

RocksDB is great, please consolidate it. TokuDB can improve in many ways, please don’t stop investing on it. Next version of SPIDER will be in MariaDB 10.3, I hope that the development will be a bit more active in the future.

Don’t kill MyISAM. It is still useful in some cases. For Catawiki use cases, I find it better than InnoDB for temporary tables. Also JFG has a great use case example.

More progress on Tarantool and CockroachDB

Tarantool is a great database, originally NoSQL. It is extremely scriptable (actually it can be seen as a Lua application server) and its modules allow to read and write data from a wide variety of data sources, including MySQL replication. Recently, SQL support has been added.

CockroachDB is an open source RDBMS design to scale geographically. It uses distributed transaction. It also allows to tune the redundancy of data at table level and define replication zones.

Great conferences

I will be both at M18 (I’m not sponsored by my company, but I chosen to go anyway) and Percona Live. At M18 I will give a talk titled Somewhere between schema and schemaless. Of course I also submitted proposal for Percona Live, let’s see if they get accepted.

Random thoughts after Percona Live

Percona Live Europe 2017 has ended. As always it’s been a great opportunity to hear great technical talks, ask questions, get in touch with people, drink beer, collect nice & useless gadgets. And, on a personal note, to meet former colleagues and former customers, some of which I never met before. And even joke with a customer about my sleepy voice when I had to start all my Monday mornings with a Skype call with him, at 8, at home. I revealed him I tried everything, including singing, but my voice didn’t improve. Oh, well, if he messaged me after I left Percona, I probably did a decent job.

Here are some completely random thoughts in random order.

  • MariaDB wasn’t there. There was a single talk, officially from the Foundation (I can’t really distinguish the two entities, and I’m not alone). They have their own conference, M18 will be the second edition. So most people have to choose if to attend Percona or Maria. The only consequence I care about is that they’re splitting the community and making every part weaker. As I wrote for the BSL time ago, I can only hope they will change their mind.
  • Tarantool wasn’t there. So bad, I believe in it.
  • Technologies that are growing fast or strongly promoted: ClickHouse, ProxySQL, Vitess, MyRocks storage engine. Wow.
    • But I’ve attended a very nice talk on a TokuDB real life story. Don’t underestimate it.
  • At Percona Live talks, you often hear questions about performance and quality. With MariaDB talk, I was the only one to ask these things (and actually my questions were almost a DoS attack, sorry Vicentiu). I’m not saying that MariaDB has not quality, I’m saying that users have probably a different approach.
  • Sharding is becoming a hot topic.
  • Open source philosophy is consolidating and expanding. It goes far beyond licenses and marketing claims, it is the attitude and pleasure to collaborate and do things that, without a community, wouldn’t be as strong.
  • As remarked by PZ, the first criteria for choosing a technology is usability. There are very good reasons for this, given the current complexity of big or even medium infrastructures. But it’s still a bit dangerous.
  • If you don’t have a reason to be there next year, I’ll give you one: Irish red beer.

Enjoy.
Federico

FOREACH in MySQL/MariaDB stored procedures

One of the annoying limitations of MySQL/MariaDB stored procedures is the lack of a FOREACH construct, which loops on each row returned by a query.

In practice, this forces users to write a lot of code just to tell MySQL how to fetch rows and exit properly. Nesting 2 loops of this kind simply results in unmaintenable code (don’t trust me, just try).

Now, I’m writing a library of views and procedures that I’ll share as open source in the next days, and I decided to finally write my foreach. Well, sort of. It is impossible to use the current stored procedures language to write a flexible foreach, because to loop rows you need a cursor. And cursors are based on a hard-coded query. In this old post I proposed a verbose, ugly, but working solution, but it has a limitation: the number of columns returned by the query must still be fixed. So, I used that technique (based on views) in my procedure, but I had to write different procedures: foreach_1(), foreach_2(), foreach_3(). If you need to read more rows you can modify the code easily, but I think that for an open source library 3 columns is reasonable.

Here I decided to share an experimental prototype. If you have a better idea on how to achieve a similar result, I’ll be glad to trash this hack and use your idea instead.

Also note that for this procedure I used MariaDB 10.2 improved PREPARE statement. If you want to run it on MySQL or older MariaDB versions, you’ll have to make some easy changes and test the procedure.

The code

CREATE PROCEDURE foreach_2(IN in_sql TEXT, IN in_callback_type VARCHAR(9), IN in_callback_body TEXT)
    NOT DETERMINISTIC
    READS SQL DATA
BEGIN
    DECLARE p_sql TEXT DEFAULT NULL;
    DECLARE p_eof BOOL DEFAULT FALSE;
    DECLARE v_p1, v_p2 TEXT DEFAULT NULL;
 
    DECLARE crs_foreach CURSOR FOR
    SELECT p1, p2 FROM vw_foreach;

    DECLARE CONTINUE HANDLER
        FOR NOT FOUND
    BEGIN
        SET p_eof := TRUE;
    END;

    SET in_callback_type := UPPER(in_callback_type);
     IF NOT (in_callback_type IN ('SQL', 'PROCEDURE')) THEN
         SIGNAL SQLSTATE VALUE '45000'
        SET MESSAGE_TEXT = '[foreach_2] Invalid in_callback_type';
    END IF;

    DO GET_LOCK('crs_foreach', 1);

    SET p_sql := CONCAT('CREATE OR REPLACE VIEW vw_foreach AS ', in_sql, ';');
    PREPARE stmt_foreach_2 FROM p_sql;
    EXECUTE stmt_foreach_2;

    OPEN crs_foreach;

    DO RELEASE_LOCK('crs_foreach');
 
    lp_while: WHILE TRUE DO
        FETCH crs_foreach INTO v_p1, v_p2;
 
        IF p_eof THEN
            LEAVE lp_while;
        END IF;

        IF in_callback_type IN ('SQL') THEN
            SET @p1 := v_p1;
            SET @p2 := v_p2;
            PREPARE stmt_foreach_2 FROM in_callback_body;
        ELSE
            PREPARE stmt_foreach_2 FROM CONCAT('CALL `', in_callback_body, '`(', QUOTE(v_p1), ', ', QUOTE(v_p2), ');');
        END IF;
        EXECUTE stmt_foreach_2;
    END WHILE;

    CLOSE crs_foreach;
    DEALLOCATE PREPARE stmt_foreach_2;
END

Now, the usage. There are 2 ways to use this procedure.

Using a callback procedure

First, create a callback procedure which will handle the rows returned by your query:

CREATE PROCEDURE p_echo(IN in_p1 TEXT, IN in_p2 TEXT)
BEGIN
    SELECT CONCAT_WS('.', in_p1, in_p2) AS r;
END

As you can see, the procedure must accept 2 values (foreach_2). The names don’t really matter, but p1 and p2 are clear, at least in this example.

Now, invoke foreach_2():

CALL foreach_2(
    'SELECT TABLE_SCHEMA AS p1, TABLE_NAME AS p2 FROM information_schema.TABLES LIMIT 3;',
    'PROCEDURE',
    'p_echo'
);

We passed the query producing the results we want to loop. Then we told foeach_2() that we want to use a callback procedure, and specified its name.

Using a callback SQL statement

Writing a procedure to perform a simple tasks would be annoying. That’s why I decided to also support callback statements:

CALL foreach_2(
    'SELECT TABLE_SCHEMA AS p1, TABLE_NAME AS p2 FROM information_schema.TABLES LIMIT 3;',
    'SQL',
    'SELECT @p1 AS p1, @p2 AS p2;'
);

As you can see, the callback query can access the values using user variables: @p1 and @p2.

These variables exist at session level, so it is possible that we are overwriting something. But I think there are no reasonable ways to avoid these collisions.

Enjoy!