My 2019 Database Wishlist

2019

Last year I published my 2018 Database Wishlist, which I recently revisited to check what happened and what didn’t. Time for a 2019 wishlist.

I am not going to list items from my 2018 list, even if they didn’t happen or they partially happened. Not because I changed my mind about their importance. Just because I wrote about them recently, and I don’t want to be more boring than I usually am. Continue reading

SQL Common Errors: confusing WHERE and ON

You probably know the ON clause of SQL JOINs. It is a condition that determines which rows from a table are related to a row from another table.

Conceptually, this is extremely different from the WHERE clause, that determines which row combinations will be returned to the client. However, sometimes a condition can be moved from the WHERE clause to the ON clause, because it will still prevent rows from being excluded. But please keep on reading, because the TL;DR of this post is: don’t do that.

Actually, why would one move conditions from WHERE to ON? Because many developers think that this is an optimization. They believe that the ON clause is executed before the WHERE clause, so moving the conditions will cause them to be evaluated at an earlier stage, avoiding some useless work.

Now, one may actually show some examples where moving a condition actually speeds up a query. This can happen in some uncommon cases, and the root cause is that the optimizer is not able to find the optimal execution plan. Don’t focus on the ON clause. Just focus on building proper indexes or finding out why the optimizer is not using them. But this is a wider topic that will not be covered here. Continue reading

Revisiting my 2018 Database Wishlist

It is December and 2018 is going to end. In January, when it just started, I wrote my 2018 Database Wishlist. Probably next January I’ll write another list. But first, it makes sense to review the last one. If some of my wishes actually happen, I really should know that – I don’t want to miss something nice, or forget topics that I considered interesting less than one year ago. Well, let’ s do some copy/paste and some googling… Continue reading

On Percona Community Blog

I liked Percona Community Blog from the beginning. First of all, the idea is great. There is no other community blog for the MySQL ecosystem.

Well, Oracle has its own planet.mysql.com – and I have to say, they are correct: as far as I know, they never censored posts about MariaDB and Percona Server, nor opinions that they don’t like. I wrote heavy criticism about Oracle, sometimes using strong terms (“another dirty trick”), but they never censored me. I like to be fair regardless who/what I’m talking about, so this is a good time to spend some good words about them. Not the first time, anyway. That said, their blogroll only advertises a very small number of blogs. Very good ones of course (except for mine?), but this has the inevitable side effect of obfuscating the rest of the world. If John Doe writes an enlightening post about MySQL, I’ll never read it, because everything I need to know appears on Planet MySQL. Continue reading

Open source databases and bug trackers

All software has bugs. I rarely feel scared because of a bug – even famous ones that are sometimes in the news. People should manage software wisely, and be able to workaround bugs they hit.

Instead, I’m more concerned about how vendors treat their bugs. In particular, I care about transparency, and how projects process their bugs.

Here’s what I mean. Continue reading

Hidden caches catch your data

This article is different from my usual posts. It explains things that may be obvious to many database professionals – not all of them though.

The idea came indirectly from my friend Francesco Allertsen. He has a weekly mailing list he uses to share links to interesting articles he reads on the web. One of them was The hidden components of Web caching. Its purpose is to list all caches that play some role when we interact with a web site. An interesting idea, even if I find it incomplete. So I thought it was a good idea to talk about caches that we hit whenever we interact with a database. Continue reading

MySQL vs. MariaDB: WAIT, NOWAIT, SKIP LOCKED

NOWAIT, WAIT and SKIP LOCKED are syntaxes added in MySQL 8.0 and MariaDB 10.3. The idea came from AliSQL (MySQL fork by Alibaba). It was revisited in MySQL, and I am not aware if MariaDB used the original implementation. EDIT: As Morgan Tocker points out in a comment, originally Ali Baba filed a feature request to MySQL.

While MySQL and MariaDB syntaxes are similar, there are important differences and the compatibility is only apparent. This article discusses these differences.

WAIT

This syntax is only available in MariaDB. It means that, if a row or table that we want to read is write-locked, we can wait up to the specified number of seconds. If the lock is not released after the timeout occurs, the query will fail.

NOWAIT

If a table or row we need to read is write-locked, the query will not be queued; instead, it will fail immediately.

Incompatibilities:

  • MariaDB supports this syntax for some DDL statements (ALTER TABLE and its shortcuts), LOCK TABLES, and SELECT. MySQL only supports it for SELECT.
  • MySQL only supports this syntax in combination with FOR UPDATE or FOR SHARE. In order to introduce an incompatibility, they sacrificed the support of this feature for SELECTs in SERIALIZABLE mode, that have an implicit LOCK IN SHARE MODE clause. Fortunately this is an edge case, but it is another case where Oracle marketing strategies affect users in a bad way.
  • MySQL implements FOR UPDATE OF and FOR SHARE OF. This is interesting, and not only for the NOWAIT feature, because it allows us to JOIN multiple tables without locking them all. Thanks, Oracle engineers.
  • MySQL and MariaDB report different error codes and messages.
    MySQL says: ERROR 3572 (HY000): Statement aborted because lock(s) could not be acquired immediately and NOWAIT is set
    MariaDB says: ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

SKIP LOCKED

This is only in MySQL. It excludes locked rows from the resultset, making our queries fast. The documentation warns us that the resultset will be inconsistent. This is implicit in the future, but it is worth emphatizing. However, consistency is not always so important, and skipping rows seems to me a great way to solve some performance problems.

EDIT: Morgan’s comment points out that SKIP LOCKED is also non-deterministic. Again, I believe this is clear if you understand what this feature does, but still, maybe I should point it out. I think it could be compare to READ UNCOMMITTED: they are different optimizations, but in both cases the results you get depend on what other connections are doing. So results are inconsistent and not deterministic by nature.

My conclusions

MariaDB implementation of NOWAIT is clearly more complete. They have WAIT syntax to set a timeout; they implemented this syntax in more SQL statements; and the syntax is compatible with implicit LOCK IN SHARE MODE.

Despite this, IMHO, MySQL wins. They have SKIP LOCKED, which is very interesting. Also the above mentioned FOR UPDATE OF syntax is a nice feature.

Once again, Oracle spent some resources to add incompatibilities. This does not affect the quality of MySQL, but it’s still a damage to the community, which should be able to use both MySQL and MariaDB in the same environments, with the same tools and the same libraries. But the distance between these DBMS’s is growing constantly.

References

Federico

MySQL 8.0 Resource Groups

MySQL 8.0 is out. Is this a great piece of news? No.

But MySQL 8.0 is wonderful, and this IS a great piece of news!

It has lots of interesting features, really. Oracle advertised some of them very well with talks at conferences, blog posts, etc. However I am very pleased by a features that they did not advertised at all: resource groups.

The documentation describes them in detail, but here is a small recap.

As we all know, MySQL has system (background) threads, and user (foreground) threads. Until now, nothing could be done to change their priority. All we could do was to tune InnoDB concurrency tickets to make sure that long running queries don’t prevent other queries from using CPU time. Basically, what we do is splitting the execution of a query to improve concurrency.

Now we can also affect threads priority. By default, all threads have the same priority. We can increase the priority of system threads, or decrease the priority of user threads. In any case, no user thread can have more priority than any system thread. Furthermore, we can limit the execution of some threads to one or more virtual CPUs of our choice.

How are these things done? By creating a resource group and setting THREAD_PRIORITY and VCPU attributes. THREAD_PRIORITY is a number (-20..0 for system threads, 0..19 for user threads; lower numbers have higher priority). VCPU is a virtual CPU number (see which ones are available with lscpu), or a range, or a list of numbers/ranges. Then, we can assign running threads to one of the groups we created.

For example, to create a group for system threads:

CREATE RESOURCE GROUP sql_thread
    TYPE = SYSTEM
    VCPU = 0,1
    THREAD_PRIORITY = -10;

To assign an existing thread to that group, check its id with SHOW PROCESSLIST, and then run something like:

SET RESOURCE GROUP sql_thread FOR 10;

Some possible ideas:

  • Increase priority of replication threads to reduce slave lag;
  • Increase priority of the even scheduler;
  • Decrease priority of long-running analytical queries.

I plan to spend some time to experiment what happens if we change the priority of specific system threads on a busy server. I expect this feature to be helpful during peaks of CPU usage.

Unfortunately, resource groups can only be assigned to running threads. There is no way to tell MySQL that connections from a certain user should use a certain group. However, we can achieve the same result by making a change in the applications. After connecting, the application can run:

SET RESOURCE GROUP my_group;

Federico

About READ UNCOMMITTED

Transaction isolation levels are the least understood feature of relational databases. Most developers don’t know them and just use the default one. Actually, a relevant part of them even believe they use MySQL without transactions.

Amongst isolation levels, READ UNCOMMITTED is the least understood. So here’s a quick note about what it is and why – if you know about it – you probably have false beliefs.

Basically, READ UNCOMMITTED is totally inconsistent. It sees changes (new, deleted, modified rows) made by other transactions, that didn’t COMMIT yet. And actually, it’s possible that those transactions will fail, which leads READ UNCOMMITTED to see something that will never happen.

Despite this, it is extremely useful in some cases. For example:

  • To run SELECTs which read a huge amount of rows for analytics.
  • To DELETE rows not used by anyone (archived historical contents).
  • To INSERT rows when no one else can possibly insert new rows.

When using the default isolation level, or even READ COMMITTED, if your transaction involves many rows (say, millions) your statements will probably be slow, and will probably use much CPU time, maybe will also cause replication lag.

READ UNCOMMITTED is not magic and cannot make a slow query fast. But it can save a lot of transaction logs processing, making a query a bit faster and avoiding consuming many of resources for no reason.

That said, its drawbacks should also be noted:

  • Bugs. Precisely because it is the least used (tested) isolation level.
  • It would be a mistake to assume that READ UNCOMMITTED is lock-free. When trying to modify a locked row, it’s queued. When writing anything, it puts a lock. In some cases (like the ones listed above) this is irrelevant, but you need to understand this.
  • Selecting all rows from a table while it’s been intensively INSERTed is slower with READ UNCOMMITTED.

One last important detail. When I explain isolation levels, people tend to think that READ UNCOMMITTED doesn’t take snapshots. This is wrong: it takes a snapshots for every statement it runs, but snapshots include uncommitted changes.

EDIT: Actually there is no snapshot, see the comments. But despite my mistake, the following part remains true.

This may sound slightly obscure. To understand better, create a test table and run something like:

SELECT a FROM my_test WHERE a > SLEEP(a);

The query will evaluate each row, and the evaluation will cause a wait of some seconds. So you have the time to add a new row from another connection. The row will not appear in the results of these query – it’s not in the snapshot that has been taken when the query was issued. However, if you run the query again, the new row will be there, because a new snapshot is taken.

Links for the curious:

Federico

About USE/FORCE/IGNORE INDEX

USE/FORCE/IGNORE INDEX syntax, or index hints, are nice shortcuts to make sure that MySQL will (or will not) use a certain index. But it comes with some drawbacks:

USE/FORCE INDEX will not allow to use an index not mentioned in the list

This could be by design, though in the case of USE INDEX it sounds weird to me. Why? Because if none of the indexes mentioned in the list is usable, a full table scan will happen.

Why is this a problem? Because in the real world queries are generated dynamic and evolve over time. Today’s optimisations could be tomorrow’s wrong hints. I had a case of a wrong USE INDEX preventing the use of the primary key.

Produces an error if the index doesn’t exist

Again, this could be by design, but in the case of IGNORE INDEX this seems to me not ideal. A warning would be much better. Even better, I’d like to have this behaviour governed by a variable.

Why is this a problem? Because indexes can be deleted. Maybe it’s because queries change, maybe it’s because they were wrong from the start (possibly not even created by a DBA). But then, dropping an index can generate errors for existing applications.

Unfortunately, IMHO, documenting the usage of such hints is too difficult.

EDIT: On my professional website, I’ve published The prerils of ALTER TABLE in MySQL/MariaDB. It includes the problem discussed here of course, but it is a more comprehensive overview of the problems you may encounter during a migration.

Federico