Open source databases and bug trackers

All software has bugs. I am rarely scared by a bug – not even by the famous ones that sometimes make the news. People should manage software wisely, and be able to work around the bugs they hit.

What concerns me more is how vendors treat their bugs. In particular, I care about transparency, and about how projects process their bugs.

Here’s what I mean.

Transparency:

  1. Ability to search for new, active, and closed bugs.
  2. Public tests, so you can check for yourself whether bugs are fixed, and whether regression bugs made their way back into the code.
  3. Public version maintenance policy – if I use an old version, I should know if bugfixes are still backported.
  4. Optional: See developers’ comments and their activity on a specific bug.
  5. Optional: See which version should contain a fix (if planned) and when that version will be released (if planned).

I want to highlight the word Optional. Points 1, 2 and 3, in my opinion, should be in place for any project that is not a personal hobby. If they are missing (quite often, I have to say…) the project can still be useful and cool, but it is not being managed properly from a QA point of view.

Bug processing:

I am not able to write a detailed list here. I think that this part depends on project characteristics: what the program does, if it is led by a company or not, how active the community is in development, and so on. Still, common sense should apply. Users are affected by bugs, therefore bugs should be fixed when it is reasonable to do so.

When it comes to open source RDBMSs…

Yes, I’m only talking about relational databases. And I only consider the most widely known ones. With this, I don’t mean to disrespect any project. There can be great reasons to use NoSQL or less known SQL databases, but NoSQL is a huge set of different products, and I only want to write about what I know (or “know”).

MySQL

MySQL has a public bug tracker, which is good. But they also have a private one, and this has some bad consequences:

  • If you read about a bug that is still active, you cannot be sure that the information is up to date.
  • Some bugs are private, for security reasons. Those reasons are often, if not always, quite obscure. Note that MySQL typically runs on open source layers whose security bugs are public.
  • For the same reasons, not all tests are public. This reduces community QA capabilities.

But you can read more detailed information about this topic on the blog of the (former?) MySQL Entomologist.

Another flaw is that not all MySQL components have the same level of quality. If you make heavy use of certain features (XA transactions, stored procedures…) you can reasonably expect to hit bugs. This is partly understandable, because Oracle needs to improve the features that are most used and useful. What is much less understandable is how an area of the product that was disappointing and buggy many years ago is still disappointing and buggy.

About the bug tracker… I’d like to highlight something that is rarely mentioned: its search capabilities are powerful. Maybe not so user friendly – you need to learn how to use the tool. But once you do, you will be able to find everything you need.

Percona Server

Percona Server is a fork of MySQL which tries to keep the differences as small as possible. Thus, MySQL bugs are generally also Percona Server bugs, while the extra features Percona implements can introduce specific bugs of their own.

If they fix MySQL bugs, you will find this in their documentation.

Specific bugs can be found in Percona Server JIRA.

You can usually find internal details and tests. You know who works on a bug and when. Generally you don’t know in advance which version will contain a fix.

I think they do very well, and no, the fact I worked for Percona does not affect my opinion. Keep in mind, however, that Percona is a customer-centric company. This means that generally they fix bugs that affect their customers. And, obviously, they merge bug fixes that happen in mainstream MySQL code.

MariaDB

They also have a JIRA to report and search bugs. In their JIRA you can also find tests and a good deal of technical information written by the developers. More than once, while reading about a bug, I have found interesting details about InnoDB internals.

In MariaDB JIRA you can also find some information that you can’t find in Percona JIRA. You can see the active sprint’s activity, so you know which bugs they are working on and what their status is (todo, in progress, in review, done). And you can see the next releases, including the next planned release dates (keep in mind that they are not commitments). This is a very high level of transparency and potentially a valuable source of information, for example if you are planning upgrades or if you have pressing problems with a specific bug. And of course it’s still interesting if you are just curious.

I also want to highlight that contributions and new features are discussed in the same way. Also, developers often interact with the community on their mailing lists, which makes the whole project much more transparent.

PostgreSQL

PostgreSQL does not have a bug tracker. Bugs can be reported to the dedicated mailing list or via a web form which sends a message to the mailing list. The archives are the only available tool to see information about Postgres bugs. In other words, there is no way to see the list of open bugs. This is sad.

However, Percona recently started to offer support for PostgreSQL. Some time ago, their Careers page indicated that they were looking for someone able to work on Postgres code (yes, I keep an eye on some companies’ career pages, because they are a source of information about their plans). Therefore my hope is that they intend to do the same great QA work with Postgres that they have always done with MySQL, and that they will push for a publicly available, professional bug tracker.

CockroachDB

CockroachDB uses GitHub issues to track bugs. The tool is quite poor compared to JIRA or bugs.mysql.com, but it has the most important features that you can reasonably expect.

Usually there are not many technical details about bugs. And you don’t know what is happening around a bug, unless someone adds a new comment.

Why do I always criticise?

Actually I don’t. But I’m aware that, if a person expresses 3 positive thoughts and 1 criticism, in many cases only the latter will be remembered. This is natural.

Anyway, my spirit is always: please do better. I use my time to criticise what I like, not what I dislike.

My criticism could also be wrong, in which case there could at least be a good discussion in the comments.

Federico


Hidden caches catch your data

This article is different from my usual posts. It explains things that may be obvious to many database professionals – not all of them though.

The idea came indirectly from my friend Francesco Allertsen. He has a weekly mailing list he uses to share links to interesting articles he reads on the web. One of them was The hidden components of Web caching. Its purpose is to list all caches that play some role when we interact with a web site. An interesting idea, even if I find it incomplete. So I thought it was a good idea to talk about caches that we hit whenever we interact with a database.

Why should we care?

But first, a note on why we should care:

  • Caches increase the speed of IO by orders of magnitude. Latency numbers that everyone should know gives you an idea of that. The first comment suggests using a solar system image to visualise the scale; curiously, I had the same idea, and I used such an image for some slides I made in the past.
  • When reliability is important, caches can get in the way dangerously. Caches are volatile, so they won’t survive a crash or other types of hardware/software failure (aka bugs and limitations). So when we write data, for certain use cases (financial transactions, etc.) at no time should data be cached but not written to disk. Or this can only happen temporarily, before the database says “Ok Mr. Application, I got your data, you can resume your job”.

What guarantees this requirement? Well, caches can be write-through, write-around or write-back. Write-through and write-around caches are reliable in this respect, because data is always written to disk before control is returned to the writer. Write-back caches are not reliable, because data is made persistent asynchronously, after control has been returned. But of course they are much faster, because they allow grouping more IO operations together (better throughput), because latency is very low, and because there is no stall if the IO capacity is currently saturated.

It’s all about tradeoffs

Depending on the use case, we will have to choose the proper tradeoff between reliability and performance. For example, InnoDB allows us to (see the sketch after this list):

  • Flush changes to disks at every commit: even if mysqld crashes, no data loss is possible if you don’t hit any software/hardware bug;
  • Flush changes to the filesystem cache at every commit: a filesystem crash is the only event that can cause data loss, and it is not likely;
  • Flush data to disk once a second, or at even longer intervals.
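
In MySQL/InnoDB, these three strategies correspond to the values of the innodb_flush_log_at_trx_commit variable. A minimal sketch:

SET GLOBAL innodb_flush_log_at_trx_commit = 1; -- flush to disk at every commit: safest, slowest
SET GLOBAL innodb_flush_log_at_trx_commit = 2; -- write to the filesystem cache at commit, flush to disk about once a second
SET GLOBAL innodb_flush_log_at_trx_commit = 0; -- write and flush about once a second: fastest, least safe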

Also, when we make the choice, we should take data redundancy into account. For example, if we run a Galera Cluster, we have at least 3 copies of the data on different servers. Or we could store data on a RAID array, which also guarantees that we have multiple copies of the data. Failover guarantees that our services don’t break if one copy gets damaged, and we can restore it from another copy. In such cases, even if we deal with critical data, we don’t necessarily need InnoDB to store data in the most reliable way – which is the slowest.

Types of caches

Enough blah blah. Here is the list of caches that could get in the way when we try to persist our data, starting from the lowest levels:

Disk caches – Both spinning disks and SSDs can have caches. In your laptop, this cache is most probably write-back. This can usually be changed with something like hdparm.

Drive controllers and RAIDs caches – These can be write-through or write-back, and usually they are configurable. Notice that they could also be battery-backed, which means that they will survive a crash (unless the device has no power for a long time). Battery-backed caches can safely be write-back, but for other caches a write-through strategy could be necessary. Battery-backed RAIDs need a periodic learning cycle. A learning cycle slows down all operations noticeably, but it is necessary to make sure that the battery is fully charged. This operation should be scheduled carefully.

Filesystem cache – You can use it in a write-through or write-back fashion. This topic is amazingly complex (just like the previous ones), so I’ll just give you a link to a wonderful article: Files are Hard.

Virtual machines – Virtual machines have a disk interface cache. Its write strategy depends on the cache mode. There are several cache modes, but here we’ll only mention the reliable ones: none, which means that the VM doesn’t cache data (but the host system can), and writethrough, whose meaning should now be clear. Virtual machines also have a filesystem cache, of course. Note that having reliable settings on the VM guarantees that data changes will survive if the VM or anything running in it crashes; but if the host doesn’t have reliable settings and it crashes, the most recent changes could be lost. Still, in production, a hypervisor typically runs many VMs. If many VMs bypass the cache, the hypervisor’s IO capacity can easily be saturated. It could be better to build a database cluster with VMs running on separate physical hosts, to reduce the risk of data loss in case of crashes – in other words, often it’s better to rely on redundancy and failover, rather than on the reliability of individual systems.

Transaction log buffer – Terminology differs from DBMS to DBMS (WAL, transaction logs…), but the idea is that changes are persistent when they hit these logs. They will also need to be written to data files, but if they are in the logs they are safe. These logs have a buffer, which contains data not yet flushed. This idea can be confusing for some, so I’ll make it clear: this speeds things up, but doesn’t cause any danger. If your flush strategy is a write-through one, the buffer will contain not yet committed changes, which are flushed on commit – and only after the flush does the DBMS report success. Regardless of your flush strategy, some changes are flushed when the buffer gets full.

Binary log buffer – There is not necessarily a binary log separate from the transaction logs. MySQL has one because its architecture requires it – the binary log contains all changes to data and is handled by the server, while the transaction logs contain the information necessary to replay or roll back transactions and are handled by InnoDB (actually even non-transactional storage engines can have logs, but I’m not going to discuss this here). Considerations about the transaction logs apply to the binary log as well, keeping in mind that its purpose is different (incremental backups and replication, not crash recovery). In Postgres you have WAL files, which are used both for incremental backups/replication and for crash recovery.
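
In MySQL, the binary log has its own durability knob: the sync_binlog variable. A hedged sketch of the possible settings:

SET GLOBAL sync_binlog = 1;   -- fsync the binary log at every commit: safest
SET GLOBAL sync_binlog = 0;   -- let the operating system decide when to flush: fastest
SET GLOBAL sync_binlog = 100; -- fsync every 100 commit groups: a middle ground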

Buffer pool – Most databases (Postgres is a famous exception) have a buffer pool to cache frequently accessed data and indexes. It can even contain dirty pages: changes that are not yet written to data files. This makes things much faster. And again: changes are persistent when they are written to transaction logs. Even after a crash, data files can be repaired using transaction logs.
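
If you are curious about how many dirty pages your buffer pool currently holds, InnoDB exposes this through status variables (a MySQL/MariaDB example):

SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty'; -- pages modified but not yet written to data files
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_total'; -- total pages in the buffer pool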

Session buffers, work mem – These buffers speed up parts of query execution, like joins and sorting. However they have nothing to do with writes.

Query cache – Older MySQL versions, MariaDB, and maybe other DBMS’s (not sure, sorry) have a query cache. This can speed up reads when the very same query is run often. “Very same” means that hashes of the queries are compared, so any difference is relevant, including whitespace. Every time a table is written to, all cached queries mentioning the table are invalidated. This, together with its well-known scalability problems, usually makes it a bad idea, at least in MariaDB/MySQL (there are exceptions – for example, if you have low concurrency, a reasonable number of very slow queries and not many writes).
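
To check whether the query cache is earning its keep, or to turn it off, something like this works in MariaDB and older MySQL versions (a sketch, assuming the server was built with query cache support):

SHOW GLOBAL STATUS LIKE 'Qcache%'; -- hits, inserts and low-memory prunes tell the story
SET GLOBAL query_cache_type = 0;   -- disable the query cache
SET GLOBAL query_cache_size = 0;   -- and release its memory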

Proxy caches – Proxies, like ProxySQL, can also have a query cache. It can have problems and it is not necessarily useful, but at least it is supposed to be built with scalability in mind (because proxies are about scalability).

Redis & friends – This should be obvious: retrieving data from a cache system (like Redis or Memcached) is much faster than retrieving it from MySQL. Usually this data has a TTL (time to live), which determines when it will expire, and it can also be invalidated manually. Keep in mind that this makes response times unpredictable: if the data is cached the response time is X, if it has expired the time is Y – where X and Y could be very different. It is even more unpredictable if the cache is not big enough to contain all your hot data. So you should be careful about what to cache, unless your dataset is small. Note that these caches could also use the disks: for example, older Redis versions had Virtual Memory (currently deprecated). But we will not dig into this, as our focus is the persistent database. The point is: these caches can avoid database queries, but not always.

Application – No matter how fast a proxy’s query cache, Redis and Memcached are: retrieving data from local RAM is much faster. No network round trip, no other server’s response time involved. Of course you shouldn’t cache a big amount of data locally, or your memory will not be enough and your application could suffer. And cache invalidation can be a very complex problem. But still, for the hottest small data, local memory is the fastest option. To avoid making response times unpredictable, it’s better to keep application-level caches updated, instead of running queries when an entry expires. Writes to the database are still necessary, and they can be synchronous or asynchronous, depending on how critical the data is.

Trust no one

A famous fairy said that some lies have short legs and others have a long nose. If hard disks, controllers and even filesystems had noses, some of them would have a long nose.

I will not dig into this complex topic myself, but the takeaway of this paragraph is: don’t trust them. They sometimes lie about consistency, so that benchmarks are more impressive and marketing people are happy. Instead, try diskchecker.pl. It will tell you if something in your system is lying. It will not tell you whether it is the hard disk, or the controller, or something in the OS. But it will tell you whether the data it writes is actually persisted immediately.

If your data are on the cloud, you cannot use this tool – because it involves shutting down the physical server suddenly while a file is being written. I am not aware of any tool or procedure to check if your cloud provider is lying about persistence. If you know one, please write a comment to this post. That would be much appreciated.

Databases don’t lie – at least, I am not aware of any DBMS or object store lying about persistence. But they have bugs, just like any piece of software, so you should check them periodically. Here is a PostgreSQL example.

Federico

MySQL vs. MariaDB: WAIT, NOWAIT, SKIP LOCKED

NOWAIT, WAIT and SKIP LOCKED are syntaxes added in MySQL 8.0 and MariaDB 10.3. The idea came from AliSQL (a MySQL fork by Alibaba). It was revisited in MySQL, and I don’t know whether MariaDB used the original implementation. EDIT: As Morgan Tocker points out in a comment, originally Alibaba filed a feature request to MySQL.

While MySQL and MariaDB syntaxes are similar, there are important differences and the compatibility is only apparent. This article discusses these differences.

WAIT

This syntax is only available in MariaDB. It means that, if a row or table that we want to read is write-locked, we can wait up to the specified number of seconds. If the lock is not released within the timeout, the query will fail.
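
A minimal example of the MariaDB syntax (the table t is hypothetical):

SELECT * FROM t WHERE id = 1 FOR UPDATE WAIT 5; -- give up with an error after 5 seconds
ALTER TABLE t WAIT 5 ADD COLUMN c INT;          -- MariaDB also accepts WAIT in DDL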

NOWAIT

If a table or row we need to read is write-locked, the query will not be queued; instead, it will fail immediately.
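
For example (again with a hypothetical table):

SELECT * FROM t WHERE id = 1 FOR UPDATE NOWAIT; -- fails immediately if the row is write-locked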

Incompatibilities:

  • MariaDB supports this syntax for some DDL statements (ALTER TABLE and its shortcuts), LOCK TABLES, and SELECT. MySQL only supports it for SELECT.
  • MySQL only supports this syntax in combination with FOR UPDATE or FOR SHARE. In order to introduce an incompatibility, they sacrificed support for this feature in SELECTs run in SERIALIZABLE mode, which have an implicit LOCK IN SHARE MODE clause. Fortunately this is an edge case, but it is another case where Oracle marketing strategies affect users in a bad way.
  • MySQL implements FOR UPDATE OF and FOR SHARE OF. This is interesting, and not only for the NOWAIT feature, because it allows us to JOIN multiple tables without locking them all. Thanks, Oracle engineers.
  • MySQL and MariaDB report different error codes and messages.
    MySQL says: ERROR 3572 (HY000): Statement aborted because lock(s) could not be acquired immediately and NOWAIT is set
    MariaDB says: ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

SKIP LOCKED

This is only in MySQL. It excludes locked rows from the resultset, making our queries fast. The documentation warns us that the resultset will be inconsistent. This is implicit in the feature, but it is worth emphasising. However, consistency is not always so important, and skipping rows seems to me a great way to solve some performance problems.
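
The classic use case is a job queue, where any unlocked row will do. A hypothetical sketch (the jobs table and its columns are made up):

BEGIN;
SELECT id FROM jobs WHERE status = 'pending'
    ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED; -- rows locked by other workers are silently skipped
-- process the job here, mark it done, then:
COMMIT;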

EDIT: Morgan’s comment points out that SKIP LOCKED is also non-deterministic. Again, I believe this is clear if you understand what this feature does, but still, maybe I should point it out. I think it can be compared to READ UNCOMMITTED: they are different optimizations, but in both cases the results you get depend on what other connections are doing. So the results are inconsistent and non-deterministic by nature.

My conclusions

MariaDB’s implementation of NOWAIT is clearly more complete. They have the WAIT syntax to set a timeout; they implemented this syntax in more SQL statements; and the syntax is compatible with the implicit LOCK IN SHARE MODE.

Despite this, IMHO, MySQL wins. They have SKIP LOCKED, which is very interesting. Also the above mentioned FOR UPDATE OF syntax is a nice feature.

Once again, Oracle spent some resources to add incompatibilities. This does not affect the quality of MySQL, but it still damages the community, which should be able to use both MySQL and MariaDB in the same environments, with the same tools and the same libraries. But the distance between these DBMS’s is growing constantly.

Federico

MySQL 8.0 Resource Groups

MySQL 8.0 is out. Is this a great piece of news? No.

But MySQL 8.0 is wonderful, and this IS a great piece of news!

It has lots of interesting features, really. Oracle advertised some of them very well with talks at conferences, blog posts, etc. However, I am very pleased by a feature that they did not advertise at all: resource groups.

The documentation describes them in detail, but here is a small recap.

As we all know, MySQL has system (background) threads and user (foreground) threads. Until now, nothing could be done to change their priority. All we could do was tune InnoDB concurrency tickets to make sure that long running queries don’t prevent other queries from using CPU time. Basically, what we do there is split the execution of a query to improve concurrency.

Now we can also affect thread priorities. By default, all threads have the same priority. We can increase the priority of system threads, or decrease the priority of user threads. In any case, no user thread can have a higher priority than any system thread. Furthermore, we can limit the execution of some threads to one or more virtual CPUs of our choice.

How are these things done? By creating a resource group and setting THREAD_PRIORITY and VCPU attributes. THREAD_PRIORITY is a number (-20..0 for system threads, 0..19 for user threads; lower numbers have higher priority). VCPU is a virtual CPU number (see which ones are available with lscpu), or a range, or a list of numbers/ranges. Then, we can assign running threads to one of the groups we created.

For example, to create a group for system threads:

CREATE RESOURCE GROUP sql_thread
    TYPE = SYSTEM
    VCPU = 0,1
    THREAD_PRIORITY = -10;

To assign an existing thread to that group, check its thread id in the performance_schema.threads table (note: this is the THREAD_ID column, which is not the same as the id shown by SHOW PROCESSLIST), and then run something like:

SET RESOURCE GROUP sql_thread FOR 10;

Some possible ideas:

  • Increase the priority of replication threads to reduce slave lag (see the sketch after this list);
  • Increase the priority of the event scheduler;
  • Decrease the priority of long-running analytical queries.
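
For instance, to raise the priority of the replication applier thread, one might do something like this (a hedged sketch: the thread name follows MySQL 8.0 naming, and sql_thread is the group created above):

SELECT THREAD_ID, NAME FROM performance_schema.threads
    WHERE NAME LIKE '%slave_sql%';    -- find the applier thread
SET RESOURCE GROUP sql_thread FOR 87; -- 87 stands for the THREAD_ID returned above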

I plan to spend some time experimenting with what happens if we change the priority of specific system threads on a busy server. I expect this feature to be helpful during peaks of CPU usage.

Unfortunately, resource groups can only be assigned to running threads. There is no way to tell MySQL that connections from a certain user should use a certain group. However, we can achieve the same result by making a change in the applications. After connecting, the application can run:

SET RESOURCE GROUP my_group;

Federico

About READ UNCOMMITTED

Transaction isolation levels are the least understood feature of relational databases. Most developers don’t know them and just use the default one. Actually, a significant number of them even believe they use MySQL without transactions.

Amongst isolation levels, READ UNCOMMITTED is the least understood. So here’s a quick note about what it is and why – if you know about it – you probably have false beliefs.

Basically, READ UNCOMMITTED is totally inconsistent. It sees changes (new, deleted, modified rows) made by other transactions that haven’t COMMITted yet. And actually, it’s possible that those transactions will fail, which leads READ UNCOMMITTED to see something that will never happen.

Despite this, it is extremely useful in some cases. For example:

  • To run SELECTs which read a huge amount of rows for analytics.
  • To DELETE rows not used by anyone (archived historical contents).
  • To INSERT rows when no one else can possibly insert new rows.

When using the default isolation level, or even READ COMMITTED, if your transaction involves many rows (say, millions) your statements will probably be slow, will probably use a lot of CPU time, and may also cause replication lag.

READ UNCOMMITTED is not magic and cannot make a slow query fast. But it can save a lot of transaction log processing, making a query a bit faster and avoiding the consumption of many resources for no reason.
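
For instance, the analytics case listed above might look like this (the orders table is hypothetical):

SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id; -- big read, no consistency needed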

That said, its drawbacks should also be noted:

  • Bugs. Precisely because it is the least used (tested) isolation level.
  • It would be a mistake to assume that READ UNCOMMITTED is lock-free. When it tries to modify a locked row, it is queued. When it writes anything, it takes locks. In some cases (like the ones listed above) this is irrelevant, but you need to be aware of it.
  • Selecting all rows from a table while it is being intensively INSERTed into is slower with READ UNCOMMITTED.

One last important detail. When I explain isolation levels, people tend to think that READ UNCOMMITTED doesn’t take snapshots. This is wrong: it takes a snapshot for every statement it runs, but the snapshots include uncommitted changes.

EDIT: Actually there is no snapshot, see the comments. But despite my mistake, the following part remains true.

This may sound slightly obscure. To understand better, create a test table and run something like:

SELECT a FROM my_test WHERE a > SLEEP(a);

The query will evaluate each row, and the evaluation will cause a wait of some seconds. So you have time to add a new row from another connection. The row will not appear in the results of this query – it’s not in the snapshot that was taken when the query was issued. However, if you run the query again, the new row will be there, because a new snapshot is taken.

Federico

About USE/FORCE/IGNORE INDEX

USE/FORCE/IGNORE INDEX syntax, or index hints, are nice shortcuts to make sure that MySQL will (or will not) use a certain index. But they come with some drawbacks:

USE/FORCE INDEX will not allow MySQL to use an index that is not mentioned in the list

This could be by design, though in the case of USE INDEX it sounds weird to me. Why? Because if none of the indexes mentioned in the list is usable, a full table scan will happen.

Why is this a problem? Because in the real world queries are generated dynamically and evolve over time. Today’s optimisations could be tomorrow’s wrong hints. I once had a case of a wrong USE INDEX preventing the use of the primary key.
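
A hypothetical example of how a stale hint can backfire (table, index and values are made up):

-- The hint was added when the query filtered by name. Now the query
-- filters by the primary key, but USE INDEX still restricts the
-- optimiser to idx_name, so it falls back to a full table scan.
SELECT * FROM users USE INDEX (idx_name) WHERE id = 42;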

Produces an error if the index doesn’t exist

Again, this could be by design, but in the case of IGNORE INDEX it seems not ideal to me. A warning would be much better. Even better, I’d like this behaviour to be governed by a variable.

Why is this a problem? Because indexes can be deleted. Maybe it’s because queries change, maybe it’s because they were wrong from the start (possibly not even created by a DBA). But then, dropping an index can generate errors for existing applications.

Unfortunately, IMHO, documenting the usage of such hints is too difficult.

Federico

My 2018 Databases Wishlist

Well, the most important wishes I have for 2018 are a bit out of topic for this blog: forms of organisation without a formal authority, schools not teaching religions, and so on. But in this post, I will write about databases… as usual.

So, here is my wishlist, for what it’s worth.

More research on Learned Indexes

If you don’t know what I’m talking about, see this paper. Having a data structure faster than B-Trees is exciting. But of course I’d also like to see considerations on write performance.

Progress on using ML for database tuning

See this article. I don’t think that Machine Learning will ever be able to replace (good) DBAs, but having a tool which suggests tuning based on real workload sounds great. It can be a valid help for DBAs.

More research on stored functions transformation

Stored functions are useful but slow. But see this paper. It seems it is possible to transform imperative programs into queries, improving their complexity by some orders of magnitude.

On a side note, MariaDB implemented a lot of syntax from Oracle for stored procedures. While this sounds like a precise commercial strategy, the technical improvement in this area is great. Still, what I’d like to see is better performance, as well as support for external languages.

Galera 4

Let me be clear, I didn’t read any announcement that Galera 4 will be released this year. But they announced exciting news over time, and still the new version isn’t here. At some point, it should be released (hopefully).

Transactional DDL in the MySQL ecosystem

MySQL 8.0 has support for atomic DDL statements. They did it in a good way: it’s engine independent and, while it uses InnoDB information_schema tables, any engine is free to add support for this feature. They stated that this is the basis for transactional DDL, but we are not yet there. MariaDB has a task for transactional DDL.

EDIT: Thanks to Valerii Kravchuk for pointing me MDEV-11424 – Instant ALTER TABLE of failure-free record format changes. It is clearly worth adding it to my wishlist: please Maria, get it done!

Engines, engines, engines

RocksDB is great, please consolidate it. TokuDB can improve in many ways, please don’t stop investing in it. The next version of SPIDER will be in MariaDB 10.3; I hope that its development will be a bit more active in the future.

Don’t kill MyISAM. It is still useful in some cases. For Catawiki use cases, I find it better than InnoDB for temporary tables. Also JFG has a great use case example.

More progress on Tarantool and CockroachDB

Tarantool is a great database, originally NoSQL. It is extremely scriptable (actually, it can be seen as a Lua application server) and its modules allow reading and writing data from a wide variety of data sources, including MySQL replication. Recently, SQL support has been added.

CockroachDB is an open source RDBMS designed to scale geographically. It uses distributed transactions. It also allows tuning the redundancy of data at the table level and defining replication zones.

Great conferences

I will be at both M18 (I’m not sponsored by my company, but I chose to go anyway) and Percona Live. At M18 I will give a talk titled Somewhere between schema and schemaless. Of course I also submitted proposals for Percona Live, let’s see if they get accepted.