This third part of MySQL optimizing tutorial mini series shows an importance of table order during JOIN. The difference between regular JOIN and optimized one is dramatic, so keep reading! This post may save your time and headaches in some MySQL problem handling.
Once upon a time I bought an application for article submissions, written in PHP. Yes, it sounds incredibly, but it happened. Primary reason was my laziness to write another application from scratch, secondary one was a time pressure. Nevertheless, I did it so. Today I know that it was a mistake, because the time spent by correction was at least same or very close to time I would spent by development of application from scratch. The bug described in this post was just one of tens found in this commercial (and not cheap!) web application. But let’s move deeper!
Neverending Page Loads
At beginning the application worked fine, but along with increasing number of database (MySQL) entries, the page loading time began increase too. After couple of months, when key database table had about 50.000 entries, the page loads exceded acceptable time and visitors patience. The site was absolutely useless! First I started to investigate security of the web site and server as itself. No security issues were found, so I moved into the application source code. I enabled MySQL log_slow_queries as described in my previous part of this MySQL optimization mini series.
What a Surprise!
After couple of hours I checked MySQL slow queries log and found very interesting entries in there. As I mentioned earlier, I set up the long_query_time to 5 seconds but the query_time found in slow_log was pretty far from this treshold. You probably wouldn’t believe (I did not too) but number which I saw there was 00:56:14. Yes, you can see right, it was almost one minute to perform MySQL query! I’m sure you all agree with me, that this query time is strongly NOT acceptable and must be fixed as soon as I lose all visitors.
What Now?
Once I found potencial problem described in paragraph above I started to investigate the problematic query. Slow log field named sql_text showed me that doubtful query and I immediately had my enemy in front of me.
select *, unix_timestamp(StartDate) as SD from al_articles as art inner join al_users as u on art.AuthorID = u.UserID left join al_categoryassociations as ca on art.ArticleID = ca.ArticleID inner join al_categories as c on ca.CategoryID = c.CategoryID where art.Featured=0 and art.Visible=1 and c.inrecent=1 and art.Status=1 and unix_timestamp(art.StartDate) <= unix_timestamp(now()) and art.IsFinished=1 and (unix_timestamp(art.ExpiryDate) >= unix_timestamp(now()) or art.EnableExpiry=0) order by art.StartDate DESC, art.ArticleID DESC limit 7
First I checked whether all indexes are set up properly, but there seemed to be everything OK. I’m not going to bother you with long story about hours of MySQL query, manual, google outputs and source code research. I rather jump at the end, to share the solution with you.
Remedial Action
Well, after couple of hours I found that problem will probably be the JOIN, more exactly – the MySQL optimizer puts the tables in the wrong order while joining them. This causes those neverending query executions. What I did now, is just add the STRAIGHT_JOIN into the SELECT statement as shown below.
select STRAIGHT_JOIN *, unix_timestamp(StartDate) as SD from al_articles as art inner join al_users as u on art.AuthorID = u.UserID left join al_categoryassociations as ca on art.ArticleID = ca.ArticleID inner join al_categories as c on ca.CategoryID = c.CategoryID where art.Featured=0 and art.Visible=1 and c.inrecent=1 and art.Status=1 and unix_timestamp(art.StartDate) <= unix_timestamp(now()) and art.IsFinished=1 and (unix_timestamp(art.ExpiryDate) >= unix_timestamp(now()) or art.EnableExpiry=0) order by art.StartDate DESC, art.ArticleID DESC limit 7
I was pretty excited reloading the page after this correction. It was done in a second and slow_log didn’t write any entry, which means that execution had to be under 5 seconds! What a difference comparing with 56 seconds, doesn’t it? I was pretty happy to solve this problem and saved my visitors!
Summary
As you can see from the story above, order of table joins may have dramatic influence on MySQL performance. The only sentence I found in MySQL manual was: STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer puts the tables in the wrong order. But you never know whether your case is not that one of those few! So, once you encounter disproportionately long page loads, I suggest to read my previous MySQL optimization article and let MySQL log slow queries. You’ll get extremely valuable information which could lead to fix an issue. Use the STRAIGHT_JOIN if it’s necessary to tell MySQL to join the tables in the order that they are specified in your statement.
unix_timestamp(art.StartDate) = unix_timestamp(now())
will NEVER use the fields index,
because the date is calculated/converted.
The indexed field must be “pure”,
for instance, if you have an indexed datetime field and do date comparison with it, it won’t uses the index on that field.
You should rewrite it as:
art.StartDate >= current_date
art.ExpiryDate >= current_date
or if it’s a DateTime
art.StartDate >= now()
art.ExpiryDate >= now()
If you really need some calculation to be done, then you must rewrite the condition such that the indexed field is pure and the condition is not and can be computed before the index is used…
For instance:
DATE_SUB(CURRENT_DATE,INTERVAL 30 DAY) <= art.ExpiryDate;
== (constant <= field)
is faster than:
CURRENT_DATE <= DATE_ADD(art.ExpiryDate,INTERVAL 30 DAY);
== (constant <= computed(field))
Also, you are using two fields to be ordered by which are not "explicitly" being stated in your SELECT uncomputed…
Normally you should never uses *,
except for debugging, if you upgrade your schema, then your application logic may break.
Did you try EXPLAIN SELECT ?
I absolutely understand what you are writing about, Fred. But in this case I’ve just pasted whole statement as it was originaly writen by the authors of that software. I realize that there are more deficiencies, but this one was used for demonstration purposes only. Nevertheless, I appreciate your comment, I’m sure it will be helpfull for all readers.
Do you proof read your articles?
If so, you should have someone else do it for you.
Good topic though.
Why? Is there something wrong with it?
Simple but great stuff. I hit with the same problem and your article helped me. Thanks Teddy.
Join order does not matter in MYSQL. The optimizer does a quite good job at figuring out the best join order. Do you have indexes on the columns you are using in the join condition? If not, MySQL has no data to use for choosing the best join order and seems to just use the same order as used in the query. I did not consider
such a case as you usually have those indexes to get those such queries done
in a reasonable time.If MySQL gets the join order wrong although there are indexes, try to run OPTIMIZE TABLE on the tables to get up to date statistics.
Great. I was in trouble by the same problem and your article helped me very much.
Thank you in advance.