On MySQL Partial Indexes
June 28, 2011 § 1 Comment
MySQL partial indexes are a great way to reduce the size of your indexes. In Rails apps, the default string column is a
VARCHAR(255) and adding an index to it can create large indexes. Since very few of the columns you use will ever actually be 255 characters in length, and many everyday attributes and columns have high entropy in some prefix substring, partial indexes make for great compromises.
Another quick thing to note is that if you are using the InnoDB storage you can’t use full indexes on
VARCHAR(255) columns in compound indexes because of the 767 byte limit on the index key size.
When working with partial indexes it can be helpful to know exactly how much of the column is covered uniquely by an index of a given size. Fernando Ipar has a pretty nifty little SQL query that will give you a rudimentary peek into how well a partial index will perform. The query will tell you what percentage of rows are uniquely identified by the index. You can check out his blog post about it over here. Here is the general form of the query:
-- SELECT COUNT(DISTNICT(SUBSTR(<column>,1,<partial index length>))) / COUNT(DISTINCT(<column>)) * 100 FROM <table>; SELECT COUNT(DISTNICT(SUBSTR(name,1,10))) / COUNT(DISTINCT(name)) * 100 FROM customers;
A Little Problem
With all the goodness that partial indexes offer, I have found at least one draw back. It seems that partial indexes cannot be used with aggregation functions like
GROUP BY. Even if the partial index does not uniquely identify each row in the table, one would think that MySQL would be able to use the partial index to at least help the
Update (11/8/2011): Someone posted an interesting answer to my question about this problem on stackoverflow. They made the point that using an index for a hint can’t really buy you anything when doing grouping operations. If the index doesn’t cover the entire string then the partial index might be able to tell if they are different, but it can’t tell for sure if they are the same. It’ll have to go to the table itself for confirmation, and if it is having to go to the table a bunch for confirmation then it might as well just to a table scan. The table scan will be more likely to have the nicer properties of a sequential read while using a partial index for hints and then going to the table for confirmation could create a bunch of random reads. There is probably some tipping point here that would make using the partial index’s hints favorable, but one would probably be better served shrinking the size of the column and indexing the full thing if you want to use the index with grouping operations.
Update (10/30/2011): Turns out this post shows up when some searches for mysql partial index in Google. Figured I might want to make it a little more helpful for those who end up here.
-- The most basic way to create a new partial index on a column -- CREATE INDEX <index name> ON <table name> (<column name>(<number of characters to index>)); CREATE INDEX part_of_name ON customers (name(10));
# To create a partial index with a Rails migration # add_index(<table name>, <column name>, :name => <index name>, :length => <partial length>) add_index(:customers, :name, :name => 'part_of_name', :length => 10)
[…] Manual. This investigation was prompted by an answer to a question I posted on Stackoverflow about partial indexes and the group by operation. LD_AddCustomAttr("AdOpt", "1"); LD_AddCustomAttr("Origin", "other"); […]