One thing that I’ve found with very large tables in MySQL is that they suck when you need to return a large subset of ordered records. The MySQL call for this is this (baseline explain shows filesort of 1,000,000 rows). This demo table has 10,000,000 rows.
select detail1, detail2, city, country from data (...lots of joins...) where country = 'us' order by username desc limit 200000,1000; -- Time: 43.503ms
Even with indexes setup, on my backup server, this takes 20-40 seconds to execute: an extremely painful exercise (the real select has three joins and IF conditions)
No matter what indexes where configured, forced or ignored , the query just would not execute fast enough. This is where Redis was used to provide some much needed help.
Redis has an extremely useful sorted sets, which were used here to provide another query condition to MySQL, thus dropping the number of rows in the query. Once the script code has been updated to ZADD the country code to a sorted set and the Redis cache updated with the MySQL data, the process looks like this (in semi-php)
$start = $redis->zrevrange('us', 210000,210000); (returns 5299000)
$end = $redis->zrevrange('us',200000,200000); (returns 5421000)
The two Redis calls to get the values, takes milliseconds.
If you then feed the values into MySQL and get the new call of
select detail1, detail2, country from data (...lots of joins...) where country = 'us' and id >= 5299091 and id < 5421000; -- Time: 0.406ms
As I am updating Redis on insert anyway, adding another Redis command to the pipeline, adds next to nothing to the overall processing. Sure there is a risk of data getting out of sync if you purge data directly, outside the scope of the control scripts but I don’t do that. It wouldn’t take more than a couple of minutes to add deletion code and maybe a check/resync but this speed is such a major benefit to my site and it stops a Google crawl from killing MySQL.
Pros – Order by Magnitude speed increases, much lower system load, faster web pages, less site lag
Cons – Redis uses 600mb of memory for the data, possible data sync issues, some extra coding.
Wishes – that MySQL would talk to Redis and that Redis sorted sets could take a packed int as the score (as an option) , instead of just a double as this would save a lot of memory. It might even fit into the ziplist