MariaDB 10: Performing fast & inaccurate statistics

As I already wrote, COUNT(*) can be expensive in InnoDB, because this value is not stored in indexes. Some other storage engines may not store that value. Also, COUNT(), MAX(), MIN() are immediate only if executed on an indexed column. But we don’t want too many indexes, and some exotic storage engines do not support indexes – thus, we may need to execute an aggregate function on a non-indexed column.

Now, the question is: do you really need such values to be exact? If you need to report statistical values from your database, probably an error is acceptable. In this case, MariaDB 10 provides a solution: engine-independent statistics.

Note that using this feature may have a cost. Updating statistics for a table means that the server does a full table scan. That is probably why engine-independent statistics are disabled by default. But if you are already use them, or if you think you can benefit from them, you can use the statistics to retrieve the results of some aggregate queries, in a way which will not be always accurate.

Here is how to translate the queries.

COUNT():

SELECT `cardinality` FROM `mysql`.`table_stats` WHERE db_name='test' AND table_name='t';

MIN() and MAX():

SELECT `min_value`, `max_value` FROM `mysql`.`column_stats` WHERE db_name='test' AND table_name='t' AND column_name='a';

AVG(LENGTH()):

SELECT `avg_length` FROM `mysql`.`column_stats` WHERE db_name='test' AND table_name='t' AND column_name='a';

COUNT(*) WHERE IS NULL:

SELECT COUNT(*) / (SELECT `nulls_ratio` FROM `mysql`.`column_stats` WHERE db_name='test' AND table_name='t' AND column_name='a') FROM t2;

Note that this could be a division by 0, so you should handle NULL as a result.

About these ads

4 thoughts on “MariaDB 10: Performing fast & inaccurate statistics

  1. column_stats.nulls_ratio is implemented. Its meaning is a “fraction of rows that have $column_name IS NULL”. I’ve just tried a simple example, if the column has NULL values, then column_stats.nulls_ratio is non-zero after ANALYZE command.

    • Thank you for letting me know, I’ll fix the post! But there is still something I don’t understand, or maybe I’ve hit a bug:
      MariaDB [test]> SELECT (SELECT COUNT(*) FROM t) / (SELECT COUNT(*) FROM t WHERE b IS NULL);
      +———————————————————————+
      | (SELECT COUNT(*) FROM t) / (SELECT COUNT(*) FROM t WHERE b IS NULL) |
      +———————————————————————+
      | 1.4422 |
      +———————————————————————+
      1 row in set (4.74 sec)

      MariaDB [test]> ANALYZE TABLE t;
      +——–+———+———-+———-+
      | Table | Op | Msg_type | Msg_text |
      +——–+———+———-+———-+
      | test.t | analyze | status | OK |
      +——–+———+———-+———-+
      1 row in set (0.57 sec)

      MariaDB [test]> SELECT nulls_ratio FROM mysql.column_stats WHERE db_name=’test’ AND table_name=’t’ AND column_name=’b';
      +————-+
      | nulls_ratio |
      +————-+
      | 0.0000 |
      +————-+
      1 row in set (0.00 sec)
      MariaDB [test]> SHOW CREATE TABLE t\G
      *************************** 1. row ***************************
      Table: t
      Create Table: CREATE TABLE `t` (
      `a` int(11) DEFAULT NULL,
      `b` int(11) DEFAULT NULL,
      `c` int(11) NOT NULL DEFAULT ‘0’,
      KEY `idx_a` (`a`)
      ) ENGINE=InnoDB DEFAULT CHARSET=latin1
      1 row in set (0.00 sec)

Leave a comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s