SQL Common Errors: confusing WHERE and ON

You probably know the ON clause of SQL JOINs. It is a condition that determines which rows from a table are related to a row from another table.

Conceptually, this is extremely different from the WHERE clause, that determines which row combinations will be returned to the client. However, sometimes a condition can be moved from the WHERE clause to the ON clause, because it will still prevent rows from being excluded. But please keep on reading, because the TL;DR of this post is: don’t do that.

Actually, why would one move conditions from WHERE to ON? Because many developers think that this is an optimization. They believe that the ON clause is executed before the WHERE clause, so moving the conditions will cause them to be evaluated at an earlier stage, avoiding some useless work.

Now, one may actually show some examples where moving a condition actually speeds up a query. This can happen in some uncommon cases, and the root cause is that the optimizer is not able to find the optimal execution plan. Don’t focus on the ON clause. Just focus on building proper indexes or finding out why the optimizer is not using them. But this is a wider topic that will not be covered here.

ON means not WHERE

Let me show you an example. I used MariaDB here, but I expect the same results with all DBMSs I know.

SELECT a.full_name, b.title
    FROM author a
    LEFT JOIN book b
        ON a.id = b.author_id
    WHERE b.title LIKE 'the%'
;
+------------+-----------------------------+
| full_name  | title                       |
+------------+-----------------------------+
| H.G. Wells | The invisible man           |
| H.G. Wells | The Island of Doctor Moreau |
+------------+-----------------------------+
2 rows in set (0.000 sec)

We get a list of the books whose title starts with “the” (case-insensitive), and their authors. As you can see, the ON clause is used to join the rows correctly, and the WHERE clause determines which rows are to be filtered in.

In this database, I inserted another author: Edgar Allan Poe. But he doesn’t appear, because none of the books I inserted starts with “The”.

But what if I move the WHERE condition to ON?

SELECT a.full_name, b.title
    FROM author a
    LEFT JOIN book b
        ON a.id = b.author_id
        AND b.title LIKE 'The%'
;
+-----------------+-----------------------------+
| full_name       | title                       |
+-----------------+-----------------------------+
| H.G. Wells      | The invisible man           |
| H.G. Wells      | The Island of Doctor Moreau |
| Edgar Allan Poe | NULL                        |
+-----------------+-----------------------------+
3 rows in set (0.000 sec)

Edgar Allan Poe appears! Why? SQL is supid!? MariaDB is buggy?! These are typical reactions, but… no, the query is wrong.

This is a LEFT JOIN. Rows from the left tables are returned even if they never match the ON clause.

With an INNER JOIN, actually you will not notice any difference. But what if you are looking for books who title starts with “the” OR whose author name starts with a letter < "M"? Try it yourself, as an exercise. You will see that you cannot always fix the problem by using an INNER JOIN: you need to use ON properly : -)

Federico

Advertisements

Revisiting my 2018 Database Wishlist

It is December and 2018 is going to end. In January, when it just started, I wrote my 2018 Database Wishlist. Probably next January I’ll write another list. But first, it makes sense to review the last one. If some of my wishes actually happen, I really should know that – I don’t want to miss something nice, or forget topics that I considered interesting less than one year ago. Well, let’ s do some copy/paste and some googling…

More research on Learned Indexes

I’m not sure if more research actually happened – I hope so, and I hope that we’ll see its results at some point. At least, it seems that the topic was not forgotten. It was mentioned at least at Artificial Intelligence conference in May, and at Stanford Seminar in October.

It’s worth noting that Wikipedia still doesn’t have a page for Learned Index.

Progress on using ML for database tuning

The Overtune website didn’t publish anything new – it just removed some previously available information. It’s now possible to register to become a tester for a beta version, so it is reasonable to think that there has been some progress.¬†The repository is actually active. No new public articles, so bad. I’ll definitely stay tuned.

More research on stored functions transformation

I can’t find anything newer than¬†the original paper, except for a post from Microsoft Research. I found no evidence that anyone not working at Microsoft considers this research interesting.

Galera 4

MariaDB 10.4 plans mention that Galera 4 will be included. But this could be just another optimistic hypothesis, so basically… I still see nothing new.

Transactional DDL in the MySQL ecosystem

MDEV-4259 – transactional DDL is still open, no fix version was set. They indeed dedicated resources to MDEV-11424 – Instant ALTER TABLE of failure-free record format changes.

Oracle doesn’t say much about what will be in future MySQL versions. However, someone from Oracle said that atomic ALTER TABLE is a foundation for transactional DDL, which could mean that they’re closer to that than MariaDB. So, let’s hope we’ll see this feature in the next version – but there was no claim from them about that.

Engines, engines, engines

The storage engines I mentioned are still available for MySQL and MariaDB, or come with Percona Server. Oracle still didn’t kill MyISAM. However, no, I didn’t see any big change in SPIDER.

More progress on Tarantool and CockroachDB

Apparently, Tarantool 2 (still not stable) fixed a lot of bugs and improved its SQL support. This includes removing ON CONFLICT REPLACE for UNIQUE indexes, that is also problematic for MySQL.

Cockroach actually added a lot of features. Amongst other things, I want to report the cost-based optimizer and the CDC. IMPORT command allows to import dumps from MySQL and PostgreSQL, as well as CockroachDB itself and CSV files.

Final thoughts

Some things simply didn’t happen.

Learned index structures and machine learning to tune database performance apparently weren’t forgotten, so hopefully we’ll see something interesting in the future.

Tarantool and CockroachDB show interesting enhancements. MySQL third-party storage engines didn’t introduce anything fancy, but keep on doing a good job.

Federico

On Percona Community Blog

I liked Percona Community Blog from the beginning. First of all, the idea is great. There is no other community blog for the MySQL ecosystem.

Well, Oracle has its own planet.mysql.com – and I have to say, they are correct: as far as I know, they never censored posts about MariaDB and Percona Server, nor opinions that they don’t like. I wrote heavy criticism about Oracle, sometimes using strong terms (“another dirty trick”), but they never censored me. I like to be fair regardless who/what I’m talking about, so this is a good time to spend some good words about them. Not the first time, anyway. That said, their blogroll only advertises a very small number of blogs. Very good ones of course (except for mine?), but this has the inevitable side effect of obfuscating the rest of the world. If John Doe writes an enlightening post about MySQL, I’ll never read it, because everything I need to know appears on Planet MySQL.

Percona Community Blog may have the same side effect… or maybe not, or the effect could be weaker at least. I saw outstanding contents by my well-known friend JF, yes. But I also saw articles by people that I don’t know, and I never saw on Planet MySQL. So I believe PCB is proving itself quite inclusive.

I started to publish some contents there. First, I used it to promote my talk at Percona Live Europe in Frankfurt, MariaDB System-Versioned Tables. Then I published an article on the same topic, Some notes on MariaDB system-versioned tables. Even if recently I’m not writing as much as I used to do some years ago, I believe that you will see more posts from me in the near future. PCB is a great place to publish stuff.

One could object that PBC contains the name of a private company and is hosted on its own website, so it is not genuinely a community project. Which is absolutely true. But if you want to see something better in the MySQL ecosystem, you will have to create it, because currently it doesn’t exist.

So, is this blog going to die? Absolutely not. This is my personal space. Any third-party website, no matter how good, can disappear or delete our contents, and there is nothing you can do about it. A personal space is there till you want it to be there. I don’t know how I will decide what will go here and what will go on PCB, I’ll have to think more about it.

Furthermore, being in several places is a form of redundancy, if we decide that our presence on the web is important for us. That is why I always keep my profiles on LinkedIn and Facebook a bit active, and some days ago I even created a YouTube playlist with my webinar recordings – only three, two of which in Italian, but still.

Well, enough babbling. Just a final word: if you have something interesting to say about open source databases, you should definitely propose it to PBC. Making it even more interesting is up to us!

Federico