My 2019 Database Wishlist

2019

Last year I published my 2018 Database Wishlist, which I recently revisited to check what happened and what didn’t. Time for a 2019 wishlist.

I am not going to list items from my 2018 list, even if they didn’t happen or they partially happened. Not because I changed my mind about their importance. Just because I wrote about them recently, and I don’t want to be more boring than I usually am.

External languages for MySQL and MariaDB

MariaDB 10.3 implemented a parser for PL/SQL stored procedures. This could be good for their business, as it facilitates the migration from Oracle. But it isn’t an answer to the community request of supporting external languages, like C or Python.

Oracle theoretically allows to use languages supported by GraalVM in MySQL. But unspecified legal problems seem to stop them from releasing this. In any case, unfortunately, this feature is only available on GraalVM.

External languages are really desirable. Antony Curtis wrote a patch for this years ago, but neither MySQL or MariaDB included it. Ronal Bouman wrote mysqlv8udfs, a UDF to run JavaScript code via Google Chrome’s JavaScript engine, but it was never included in MySQL or MariaDB.

Please, Oracle and MariaDB: do it. Don’t just start another discussion in a mailing list, do it for real.

ClickHouse without ZooKeeper

ClickHouse needs ZooKeeper to setup a cluster. ZooKeeper is based on the JVM and it’s hard to use. Having to use it because there is no alternative is always quite annoying.

In their repo, there is a feature request to support Consul instead of ZooKeeper. To recap: the main problem is that they use some ZooKeeper unique features, but they plan to refactor the ZooKeeper client library, and after that it is possible they they will implement this feature.

I wrote about Consul some time ago, when I was a Percona consultant. I consider it a very good solution. The other alternative is etcd. It’s worth noting that, despite it being widely used, the CNCF still considers it as an incubating project.

Sphinx: implement JOINs

Sphinx seems to be forgot by many. It is not trendy anymore. There have not been cool titles on the tech sites involving Sphinx for years. So, is it slowly dying? No, it’s simply stable.

There are more trendy alternatives nowadays, yes. Like ElasticSearch and Solr. So why do I care about Sphinx? Well, first, it’s quite KISS (keep it simple, stupid!). It doesn’t have lots of features just because it was theoretically possible to implement them. Sorry for repeating myself, but basically… it’s stable. Second, it supports a subset of SQL, it’s kind of relational, and its language is compatible with MySQL. It could invent a new model and a new language that are supported by nothing else in the world, but they (and not many other modern database vendors) realised that it wouldn’t be a good thing.

But it misses JOINs. I don’t want to run complex JOINs on it, it will never be the right tool for that. Still, this would open more opportunities for Sphinx users. They always mentioned JOINs as something “currently missing”, so I still hope to see them implemented.

PostgreSQL: setup a real bug tracker

As I already mentioned, PostgreSQL doesn’t have a bug tracker. There is a mailing list. A problem that this surely causes is that it’s impossible to do structured searches – for example: get a list of confirmed bugs in version 11 involving foreign keys. I suspect there is another major problem: this could prevent some people from reporting bugs.

PostgreSQL is a great project, but please take bugs seriously.

Percona, be more conservative about your unique features

Every major version removes some features from the previous one. I understand why you do that, and I appreciate it. You want to keep the delta between Percona Server and MySQL as small as possible. If a feature is not used by enough customers, it implies some unjustified difference to maintain. I know that this policy is important to maintain your fork’s quality high.

Yet, I feel that you tend to remove too much stuff. Suppose I start to use a feature now and after some time I have to stop. If I think about it, I wish you didn’t implement it at all in the first place. So my point is not necessarily “maintain more features”, it could be “develop less features” as well.

Open source databases and cloud providers

We have read about some open source databases going proprietary because they spend money to innovate, while cloud providers simply take their customers away without giving anything in return. In particular, Amazon modifies open source software to create its own products, and sells them without paying the original vendors.

What can I say about Amazon… nothing, just a sarcastic “thank you”.

But I’ve something to ask database vendors. Do you know what MaxScale is? If not, it’s because this strategy of going proprietary is not as smart as you may think. Users will simply move away and forget you. Some of those users are paying customers. But even many of those who never paid you did contribute to your product – and indirectly, to your company’s incomes. How? With bug reporting, by writing technical contents about your products, by talking about how good your software is, etc. I don’t know of any estimation of how much these things contribute to a company’s economy, but I would be very interested in reading such a thing.

Federico

Advertisements

Open source databases and bug trackers

All software has bugs. I rarely feel scared because of a bug – even famous ones that are sometimes in the news. People should manage software wisely, and be able to workaround bugs they hit.

Instead, I’m more concerned about how vendors treat their bugs. In particular, I care about transparency, and how projects process their bugs.

Here’s what I mean.

Transparency:

  1. Ability to search for new, active, and closed bugs.
  2. Public tests to check yourself if bugs are fixed, and if regression bugs made their way back into the code.
  3. Public version maintenance policy – if I use an old version, I should know if bugfixes are still backported.
  4. Optional: See developers comments, see their activity on a specific bug.
  5. Optional: See which version should contain a fix (if planned) and when that version will be released (if planned).

I want to highlight the word Optional. Points 1, 2 and 3, in my opinion, should be in place for any project that is not a personal hobby. If they miss (quite often, I have to say…) the project can still be useful and cool, but it is not being managed properly from QA point of view.

Bug processing:

I am not able to write a detailed list here. I think that this part depends on project characteristics: what the program does, if it is led by a company or not, how active the community is in development, and so on. Still, common sense should apply. Users are affected by bugs, therefore bugs should be fixed when it is reasonable to do so.

When it comes to open source RDBMSs…

Yes, I’m only talking about relational databases. And I’m only consider the most widely known ones. With this, I don’t want to disrespect any project. There can be great reasons to use NoSQL or less known SQL databases, but NoSQL is a huge set of different products, and I only want to write about what I know (or “know”).

MySQL

MySQL has a public bug tracker, which is good. But they also have a private one, and this has some bad consequences:

  • If you read about a bug that is still active, you cannot be sure that information is up to date.
  • Some bugs are private, for security reasons. Those reasons are often quite obscure, if not always. Note that MySQL typically runs on open source layers whose security bugs are public.
  • For the same reasons, not all tests are public. This reduces community QA capabilities.

But you can read more detailed information about this topic in Blog of (former?) MySQL Entomologist.

Another flaw is that not all MySQL components have the same level of quality. If you make heavy use of certain features (XA transactions, stored procedures…) you can reasonably expect to hit bugs. This is partly understandable, because Oracle needs to improve the features that are most used and useful. What is much less understandable is how an area of the product that was disappointing and buggy many years ago is still disappointing and buggy.

About the bug tracker… I’d like to highlight something that is rarely mentioned: its search capabilities are powerful. Maybe not so user friendly – you need to know how to use the tool. But once you know, you will be able to find everything you need.

Percona Server

Percona Server is a fork of MySQL which tries to keep the differences as small as possible. Thus, generally MySQL bugs are also Percona Server bugs, and all the features they implement could cause some specific bugs.

If they fix MySQL bugs, you will find this in their documentation.

Specific bugs can be found in Percona Server JIRA.

You can usually find internal details and tests. You know who works on a bug and when. Generally you don’t know in advance which version will contain a fix.

I think they do very well, and no, the fact I worked for Percona does not affect my opinion. Keep in mind, however, that Percona is a customer-centric company. This means that generally they fix bugs that affect their customers. And, obviously, they merge bug fixes that happen in mainstream MySQL code.

MariaDB

They also have a JIRA to report and search bugs. In their JIRA you can also find tests and several technical information written by the developers. It happened to me more than once to read more information about a bug and find interesting details about InnoDB internals.

In MariaDB JIRA you can also find some information that you can’t find in Percona JIRA. You can see the active sprint’s activity, so you know which bugs they are working on and what their status is (todo, in progress, in review, done). And you can see the next releases, including the next planned release dates (keep in mind that they are not commitments). This is a very high level of transparency and potentially a valuable source of information, for example if you are planning upgrades or if you have pressing problems with a specific bug. And of course it’s still interesting if you are just curious.

I also want to highlight that they discuss in the same way contributions and new features. Also, developers often interact with the community using their mailing lists, which makes the whole project much more transparent.

PostgreSQL

PostgreSQL does not have a bug tracker. Bugs can be reported to the dedicated mailing list or via a web form which sends a message to the mailing list. The archives are the only available tool to see information about Postgres bugs. In other words, there is no way to see the list of open bugs. This is sad.

However, Percona recently started to offer support for PostgreSQL. Their Careers page indicated that they were looking for someone able to work with Postgres code, some time ago (yes, I keep an eye on some companies career pages, because they are a source of information about their plans). Therefore my hope is that they intend to do a great QA work with Postgres like they always did with MySQL, and that they will push to have a publicly available professional bug tracker.

CockroachDB

CockroachDB uses GitHub issues to track bugs. The tool is quite poor compared to JIRA or bugs.mysql.com, but it has the most important features that you can reasonably expect.

Usually there are not many technical details about bugs. And you don’t know what is happening around a bug, unless there is a new comment.

Why do I always criticise?

Actually I don’t. But I’m aware that, if a person expresses 3 positive thoughs and 1 criticism, in many cases only the latter will be remembered. This is natural.

Anyway, my spirit is always: please do better. I use my time to criticise what I like, not what I dislike.

My criticism could also be wrong, in which case there could at least be a good discussion in the comments.

Federico