MariaDB 10: Performing fast & inaccurate statistics

As I already wrote, COUNT(*) can be expensive in InnoDB, because this value is not stored in indexes. Some other storage engines may not store that value. Also, COUNT(), MAX(), MIN() are immediate only if executed on an indexed column. But we don’t want too many indexes, and some exotic storage engines do not support indexes – thus, we may need to execute an aggregate function on a non-indexed column.

Now, the question is: do you really need such values to be exact? If you need to report statistical values from your database, probably an error is acceptable. In this case, MariaDB 10 provides a solution: engine-independent statistics.

Note that using this feature may have a cost. Updating statistics for a table means that the server does a full table scan. That is probably why engine-independent statistics are disabled by default. But if you are already use them, or if you think you can benefit from them, you can use the statistics to retrieve the results of some aggregate queries, in a way which will not be always accurate.

Here is how to translate the queries.

COUNT():

SELECT `cardinality` FROM `mysql`.`table_stats` WHERE db_name='test' AND table_name='t';

MIN() and MAX():

SELECT `min_value`, `max_value` FROM `mysql`.`column_stats` WHERE db_name='test' AND table_name='t' AND column_name='a';

AVG(LENGTH()):

SELECT `avg_length` FROM `mysql`.`column_stats` WHERE db_name='test' AND table_name='t' AND column_name='a';

COUNT(*) WHERE IS NULL:

SELECT COUNT(*) / (SELECT `nulls_ratio` FROM `mysql`.`column_stats` WHERE db_name='test' AND table_name='t' AND column_name='a') FROM t2;

Note that this could be a division by 0, so you should handle NULL as a result.