Misusing GROUP BY to return unpredictable results Murphys Law

suggest change
SELECT item.item_id, uses.category,   /* nonstandard */ 
       COUNT(*) number_of_uses 
  FROM item 
  JOIN uses ON item.item_id, uses.item_id
 GROUP BY item.item_id

will show the rows in a table called item, and show the count of related rows in a table called uses. It will also show the value of a column called uses.category.

This query works in MySQL (before the ONLY_FULL_GROUP_BY flag appeared). It uses MySQL’s nonstandard extension to GROUP BY.

But the query has a problem: if several rows in the uses table match the ON condition in the JOIN clause, MySQL returns the category column from just one of those rows. Which row? The writer of the query, and the user of the application, doesn’t get to know that in advance. Formally speaking, it’s unpredictable: MySQL can return any value it wants.

Unpredictable is like random, with one significant difference. One might expect a random choice to change from time to time. Therefore, if a choice were random, you might detect it during debugging or testing. The unpredictable result is worse: MySQL returns the same result each time you use the query, until it doesn’t. Sometimes it’s a new version of the MySQL server that causes a different result. Sometimes it’s a growing table causing the problem. What can go wrong, will go wrong, and when you don’t expect it. That’s called Murphy’s Law.

The MySQL team has been working to make it harder for developers to make this mistake. Newer versions of MySQL in the 5.7 sequence have a sql_mode flag called ONLY_FULL_GROUP_BY. When that flag is set, the MySQL server returns the 1055 error and refuses to run this kind of query.

Feedback about page:

Feedback:
Optional: your email if you want me to get back to you:



Table Of Contents