How fast is MySQL?

Indexes allow for fast retrieval of data. There are several types of indexes in MySQL, and each has its own use cases and performance characteristics. Every table should have a primary key unless you really know what you are doing.

The primary key can span multiple columns; such a composite key guarantees that the combination of values is unique. MySQL also offers spatial indexes for geographic data; I have never worked in that area and never used that index. A unique index is just like the primary key in that it guarantees that the columns that are part of the index will be unique, but unlike the primary key it is an extra structure: maintaining it takes space and memory, and for large indexes you may need enough memory to handle index operations. A prefix index allows us to index only the leading part of a string column.
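As a sketch of the index types just described (the table and column names here are hypothetical, not from the original article), a single CREATE TABLE can carry several of them side by side:

```sql
-- Hypothetical table showing several index types together.
CREATE TABLE customer (
  id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
  email      VARCHAR(255) NOT NULL,
  full_name  VARCHAR(200) NOT NULL,
  created_at DATETIME NOT NULL,
  PRIMARY KEY (id),                    -- every table should have one
  UNIQUE KEY uq_email (email),         -- uniqueness, at the cost of space and memory
  KEY idx_name_prefix (full_name(10))  -- prefix index: only the first 10 characters
) ENGINE=InnoDB;
```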

Prefix indexes are useful for wildcard string searches that match the beginning of a value. Descending indexes are regular indexes whose sort order is reversed; they are supported by MySQL 8, which you should be using anyway, and are useful for processing the newest data first. A composite index spans several columns, which allows us to use the same index when we put many columns in the WHERE clause.

If we instead create a separate index for each column, MySQL's execution plan will find all the matching rows for each WHERE condition and then keep only the ones that match every index. Finding all the results and then merging them is a massive waste of resources! With a single composite index over those columns, MySQL's execution plan finds only the rows matching the first column's condition, and then it filters just those rows using the second column's condition.
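A minimal sketch of this, assuming a hypothetical orders table (the names are illustrative, not from the original article):

```sql
-- One composite index instead of two separate single-column indexes.
-- With separate indexes on customer_id and status, MySQL may scan both
-- and intersect the results; the composite index avoids that merge.
CREATE TABLE orders (
  id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  customer_id INT UNSIGNED NOT NULL,
  status      VARCHAR(20) NOT NULL,
  KEY idx_customer_status (customer_id, status)
) ENGINE=InnoDB;

-- Both conditions are resolved by the single composite index:
SELECT id FROM orders WHERE customer_id = 42 AND status = 'shipped';
```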

And so on for additional columns. The drawback of an index over more than one column is the index size; sometimes such an index can get quite big, as can be seen in the picture below, taken using phpMyAdmin. For the queries we want to optimize, we need to check the execution plan, also called the query plan. The reason is that what we think is the right way may not be the right way for MySQL. Furthermore, our indexes may not be usable for the query we are running.

The actual table can contain fewer rows than EXPLAIN showed, because its row counts are estimates. This means you need to make sure that the columns checked in the WHERE clause are indexed, or that the table is small enough that a scan will be fast. In this example, we are doing a calculation on the primary-key column; look what MySQL does: it performs a full table scan, even though we have a primary key! The way to solve it is to add a precalculated field, index it, and use that indexed field in the WHERE clause.
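The elided example probably looked something like the following sketch (table name and column are hypothetical; the generated-column workaround assumes MySQL 5.7 or newer):

```sql
-- A calculation on the indexed column hides the index from MySQL:
SELECT * FROM orders WHERE id + 1 = 10;   -- full table scan

-- Workaround: a stored, indexed, precalculated column.
ALTER TABLE orders
  ADD COLUMN id_plus_one INT AS (id + 1) STORED,
  ADD INDEX idx_id_plus_one (id_plus_one);

SELECT * FROM orders WHERE id_plus_one = 10;  -- can use the new index

-- Often the simplest fix is moving the arithmetic to the constant side:
SELECT * FROM orders WHERE id = 10 - 1;
```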

The same considerations apply when joining two or more tables, for all types of joins: inner join, left join, right join, and so on. Though we glossed over this before, MySQL actually creates the handler instances early in the optimization stage. The optimizer uses them to get information about the tables, such as their column names and index statistics. This is enough for a query that does an index scan. Not everything is a handler operation; for example, the server manages table locks.

As explained in Chapter 1, anything that all storage engines share is implemented in the server, such as date and time functions, views, and triggers. To execute the query, the server just repeats the instructions until there are no more rows to examine. The final step in executing a query is to reply to the client. If the query is cacheable, MySQL will also place the results into the query cache at this stage.

The server generates and sends results incrementally. Think back to the single-sweep multijoin method we mentioned earlier. As soon as MySQL processes the last table and generates one row successfully, it can and should send that row to the client.

This has two benefits: it lets the server avoid holding the row in memory, and it means the client starts getting the results as soon as possible. Some of these limitations will probably be eased or removed entirely in future versions, and some have already been fixed in versions not yet released as GA (generally available). In particular, there are a number of subquery optimizations in the MySQL 6 source code, and more are in progress.

MySQL sometimes optimizes subqueries very badly. A query such as "find all films featuring a given actor" feels natural to write with an IN() subquery. We said an IN() list is generally very fast, so you might expect MySQL to optimize the subquery into a simple list of constants for the outer query.

Unfortunately, exactly the opposite happens: MySQL rewrites the query so that the subquery becomes correlated with the outer table and is executed once per outer row. Sometimes this can still be faster than a JOIN. MySQL has been criticized thoroughly for this particular type of subquery execution plan. Although it definitely needs to be fixed, the criticism often confuses two different issues: execution order and caching. Rewriting the query yourself lets you take control over both aspects. Future versions of MySQL should be able to optimize this type of query much better, although this is no easy task.
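The elided queries likely resembled the classic example built on the sakila sample database; a sketch along those lines:

```sql
-- The natural IN() subquery:
SELECT * FROM sakila.film
WHERE film_id IN (
  SELECT film_id FROM sakila.film_actor WHERE actor_id = 1);

-- What MySQL (in the versions discussed here) actually executes:
-- a correlated EXISTS, run once per row of film.
SELECT * FROM sakila.film
WHERE EXISTS (
  SELECT * FROM sakila.film_actor
  WHERE actor_id = 1 AND film_actor.film_id = film.film_id);

-- The manual rewrite as a JOIN puts you in control:
SELECT film.* FROM sakila.film
  INNER JOIN sakila.film_actor USING (film_id)
WHERE actor_id = 1;
```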

There are very bad worst cases for any execution plan, including the inside-out execution plan that some people think would be simple to optimize. Instead of relying on rules of thumb, benchmark and make your own decision.

Sometimes a correlated subquery is a perfectly reasonable, or even optimal, way to get a result. This is an example of the early-termination algorithm we mentioned earlier in this chapter. So, in theory, MySQL will execute the queries almost identically. In reality, benchmarking is the only way to tell which approach is really faster.

We benchmarked both queries on our standard setup; the results are shown in the accompanying table. Sometimes a subquery can be faster. For example, it can work well when you just want to see rows from one table that match rows in another table.

The following join, which is designed to find every film that has an actor, will return duplicates because some films have multiple actors. But what are we really trying to express with this query, and is it obvious from the SQL? Again, we benchmarked to see which strategy was faster. In this example, the subquery performs much faster than the join. We showed this lengthy example to illustrate two points: you should not heed categorical advice about subqueries, and you should use benchmarks to prove your assumptions about query plans and execution speed.
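A sketch of the two elided queries, assuming the sakila sample database:

```sql
-- The join returns one row per (film, actor) pair, hence duplicates:
SELECT film.film_id
FROM sakila.film
  INNER JOIN sakila.film_actor USING (film_id);

-- DISTINCT or GROUP BY can deduplicate, but the EXISTS form states the
-- intent ("films having at least one actor") directly:
SELECT film_id
FROM sakila.film
WHERE EXISTS (
  SELECT 1 FROM sakila.film_actor
  WHERE film_actor.film_id = film.film_id);
```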

Index merge algorithms, introduced in MySQL 5.0, let MySQL use more than one index per table in a query. There are three variations on the algorithm: union for OR conditions, intersection for AND conditions, and unions of intersections for combinations of the two.

The following query uses a union of two index scans, as you can see by examining the Extra column. The merge can be costly, especially if not all of the indexes are very selective, because the parallel scans then return lots of rows to the merge operation. This is another reason to design realistic benchmarks. Equality propagation can also have unexpected costs. Propagating an IN() list to equivalent columns is normally helpful, because it gives the query optimizer and execution engine more options for where to actually execute the IN() check.
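The elided query was likely similar to this sketch against sakila's film_actor table, where the primary key leads on actor_id and a secondary index covers film_id:

```sql
-- An OR across two differently indexed columns triggers an index merge:
EXPLAIN SELECT film_id, actor_id
FROM sakila.film_actor
WHERE actor_id = 1 OR film_id = 1;

-- The Extra column then reports something like:
--   Using union(PRIMARY,idx_fk_film_id); Using where
```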

But when the list is very large, it can result in slower optimization and execution. Hash joins are a feature offered by some other database servers, but not by MySQL in the versions covered here. However, you can emulate hash joins using hash indexes.

MySQL has historically been unable to do loose index scans, which scan noncontiguous ranges of an index; instead, MySQL scans the entire range of rows between the end points. An example will help clarify this. Suppose we have a table with an index on columns (a, b), and we want to run a query that puts a range condition only on b. Because the leading column is not constrained, MySQL scans from one end of the index to the other. The figure shows what the strategy would look like if MySQL were able to skip ahead: a loose index scan, which MySQL cannot currently do, would be more efficient.

Beginning in MySQL 5.0, loose index scans are possible in certain limited circumstances, such as finding minimum and maximum values in a grouped query. This is a good optimization for this special purpose, but it is not a general-purpose loose index scan. Until MySQL supports general-purpose loose index scans, the workaround is to supply a constant or list of constants for the leading columns of the index. We showed several examples of how to get good performance with these types of queries in our indexing case study in the previous chapter.
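A sketch of the workaround, using a hypothetical table t with an index on (a, b):

```sql
-- Range on the second index column alone: MySQL scans the whole index.
SELECT * FROM t WHERE b BETWEEN 2 AND 3;

-- Workaround: pin the leading column to a list of constants so each
-- (a, b-range) pair becomes a tight range scan. This assumes the set
-- of distinct values of a is small and known in advance.
SELECT * FROM t WHERE a IN (1, 2, 3) AND b BETWEEN 2 AND 3;
```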

However, in this case, MySQL will scan the whole table, which you can verify by profiling the query. The general strategy of supplying constants often works well when MySQL would otherwise choose to scan more rows than necessary. Purists will object; true, but sometimes you have to compromise your principles to get high performance. Consider a query that updates each row with the number of similar rows in the table: MySQL will not let you SELECT from the table you are updating, but you can work around this limitation with a derived table, because MySQL materializes it as a temporary table. In the rest of this section, we give advice on how to optimize certain kinds of queries.
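The derived-table workaround might look like this sketch (table and column names are hypothetical):

```sql
-- Selecting from the table being updated is rejected (ERROR 1093):
-- UPDATE tbl AS o
--   SET cnt = (SELECT COUNT(*) FROM tbl WHERE type = o.type);

-- Wrapping the aggregation in a derived table makes MySQL materialize
-- it as a temporary table first, which sidesteps the restriction:
UPDATE tbl
  INNER JOIN (
    SELECT type, COUNT(*) AS c
    FROM tbl
    GROUP BY type
  ) AS der USING (type)
SET tbl.cnt = der.c;
```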

Most of the advice in this section is version-dependent, and it may not hold for future versions of MySQL. You can do a web search and find more misinformation on this topic than we care to think about. COUNT() is a special function that works in two very different ways: it counts values, or it counts rows. If you specify a column name or other expression inside the parentheses, COUNT() counts how many times that expression has a value, that is, is not NULL. This is confusing for many people, in part because values and NULL are confusing.

The Internet is not necessarily a good source of accurate information on this topic, either. One of the most common mistakes we see is specifying column names inside the parentheses when you want to count rows; use COUNT(*) instead. This communicates your intention clearly and avoids poor performance.
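The distinction can be shown in one query (the orders table and its nullable amount column are hypothetical):

```sql
-- COUNT(*) counts rows; COUNT(expr) counts non-NULL values of expr.
SELECT COUNT(*)      AS all_rows,        -- every row, NULLs included
       COUNT(amount) AS non_null_amounts -- only rows where amount IS NOT NULL
FROM orders;
```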

With MyISAM and no WHERE clause, MySQL can optimize COUNT(*) away entirely, because that storage engine always knows exactly how many rows are in the table. MyISAM does not have any magical speed optimizations for counting rows when the query has a WHERE clause, or for the more general case of counting values instead of rows. It may be faster than other storage engines for a given query, or it may not be; that depends on a lot of factors. The following example uses the standard World database to show how you can efficiently find the number of cities whose ID is greater than 5.

You might write this query in the obvious way, counting every matching row. But if you negate the condition and subtract the number of cities whose IDs are less than or equal to 5 from the total number of cities, you can reduce the rows examined to five.
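The two elided forms were likely close to this sketch against the World sample database:

```sql
-- Obvious form: examines every row with ID > 5.
SELECT COUNT(*) FROM world.City WHERE ID > 5;

-- Negated form: the total count becomes a constant during optimization,
-- and the range scan touches only the five rows with ID <= 5.
SELECT (SELECT COUNT(*) FROM world.City) - COUNT(*)
FROM world.City WHERE ID <= 5;
```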

This version reads fewer rows because the subquery is turned into a constant during the query optimization phase, as you can see with EXPLAIN. A frequent question on mailing lists and IRC channels is how to retrieve counts for several different values in the same column with just one query, to reduce the number of queries required.

For example, say you want to create a single query that counts how many items have each of several colors. A query built from conditional aggregate expressions solves this problem.
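A sketch of that query, assuming a hypothetical items table with a color column:

```sql
-- SUM() of a boolean works because true is 1 and false is 0 in MySQL:
SELECT SUM(color = 'blue') AS blue,
       SUM(color = 'red')  AS red
FROM items;

-- Equivalent COUNT() form: the expression is NULL (and thus not
-- counted) whenever the condition is false:
SELECT COUNT(color = 'blue' OR NULL) AS blue,
       COUNT(color = 'red'  OR NULL) AS red
FROM items;
```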

Your only other option for optimizing COUNT() within MySQL itself is to use a covering index, which we discussed in Chapter 3. Beyond that, consider summary tables (also covered in Chapter 3), and possibly an external caching system such as memcached. This topic is actually spread throughout most of the book, but we mention a few highlights:

Consider the join order when adding indexes. Unused indexes are extra overhead. Be careful when upgrading MySQL, because the join syntax, operator precedence, and other behaviors have changed at various times. What used to be a normal join can sometimes become a cross product, a different kind of join that returns different results, or even invalid syntax.

The most important advice we can give on subqueries is that you should usually prefer a join where possible, at least in current versions of MySQL. We covered this topic extensively earlier in this chapter. Subqueries are the subject of intense work by the optimizer team, and upcoming versions of MySQL may have more subquery optimizations.

The server is getting smarter all the time, and the cases where you have to tell it how to do something instead of what results to return are becoming fewer. MySQL optimizes these two kinds of queries similarly in many cases, and in fact converts between them as needed internally during the optimization process. Either one can be more efficient for any given query, for example when grouping by an actor's ID versus by name. However, sometimes your only concern will be making MySQL execute the query as quickly as possible.

Purists will be satisfied with writing the query with a subquery. But sometimes the cost of creating and filling the temporary table required for the subquery is high compared to the cost of fudging pure relational theory a little bit. Remember, the temporary table created by the subquery has no indexes. A variation on grouped queries is to ask MySQL to do superaggregation within the results, with WITH ROLLUP. You may be able to force the grouping method with the hints we mentioned earlier in this section.

You can also nest a subquery in the FROM clause or use a temporary table to hold intermediate results. A frequent problem is having a high value for the offset. If your query looks like LIMIT 10000, 20, it is generating 10,020 rows and throwing away the first 10,000 of them, which is very expensive. Assuming all pages are accessed with equal frequency, such queries scan half the table on average. To optimize them, you can either limit how many pages are permitted in a pagination view, or try to make the high offsets more efficient.

One simple technique to improve efficiency is to do the offset on a covering index, rather than on the full rows. You can then join the result to the full rows and retrieve the additional columns you need. This can be much more efficient, because it lets the server examine as little data as possible in an index without accessing rows, and then, once the desired rows are found, join them against the full table to retrieve the other columns.
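A sketch of this "deferred join" pattern, assuming the sakila film table (the offset values are illustrative):

```sql
-- Naive offset: reads and discards 10,000 full rows.
SELECT * FROM sakila.film ORDER BY title LIMIT 10000, 20;

-- Deferred join: do the offset on a covering index scan of film_id,
-- then join back to fetch the other columns for only the 20 keepers.
SELECT film.*
FROM sakila.film
  INNER JOIN (
    SELECT film_id FROM sakila.film
    ORDER BY title LIMIT 10000, 20
  ) AS lim USING (film_id);
```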

Sometimes you can also convert the limit to a positional query, which the server can execute as an index range scan. For example, if you precalculate and index a position column, you can rewrite the query as a range over that position. If you really need to optimize pagination systems, you should probably use precomputed summaries. You can also use Sphinx; see Appendix C for more information.

The SQL_CALC_FOUND_ROWS option just tells the server to generate and throw away the rest of the result set, instead of stopping when it reaches the desired number of rows.

Assuming there are 20 results per page, the query should then use a LIMIT of 21 rows and display only 20; if the 21st row exists, there is a next page. Another possibility is to fetch and cache many more rows than you need, say 1,000, and then retrieve them from the cache for successive pages.

This strategy lets your application know how large the full result set is, but it is quite expensive. Hints work differently: you place the appropriate hint in the query whose plan you want to modify, and it is effective for only that query. Check the MySQL manual for the exact syntax of each hint.

Some of them are version-dependent. HIGH_PRIORITY and LOW_PRIORITY tell MySQL how to prioritize the statement relative to other statements that are trying to access the same tables. These hints are effective on storage engines with table-level locking, but you should never need them on InnoDB or other engines with fine-grained locking and concurrency control.

Be careful when using them on MyISAM, because they can disable concurrent inserts and greatly reduce performance. The DELAYED hint lets the INSERT statement to which it is applied return immediately and places the inserted rows into a buffer, from which they are inserted in bulk when the table is free.

The second usage of STRAIGHT_JOIN forces a join order on the two tables between which the hint appears. SQL_BUFFER_RESULT tells the optimizer to put the results into a temporary table and release table locks as soon as possible. SQL_CACHE and SQL_NO_CACHE instruct the server that the query either is or is not a candidate for caching in the query cache; see the next chapter for details on how to use them.

SELECT ... FOR UPDATE and SELECT ... LOCK IN SHARE MODE enable you to place locks on the matched rows, which can be useful when you want to lock rows you know you are going to update later, or when you want to avoid lock escalation and just acquire exclusive locks as soon as possible. When using these hints with InnoDB, be aware that they may disable some optimizations, such as covering indexes.

USE INDEX, IGNORE INDEX, and FORCE INDEX tell the optimizer which indexes to use or ignore for finding rows in a table (for example, when deciding on a join order). The optimizer_search_depth variable tells the optimizer how exhaustively to examine partial plans. The optimizer_prune_level variable, which is enabled by default, lets the optimizer skip certain plans based on the number of rows examined. Both options control optimizer shortcuts.

These shortcuts are valuable for good performance on complex queries, but they can cause the server to miss optimal plans for the sake of efficiency. User-defined variables work especially well for queries that benefit from a mixture of procedural and relational logic. Purely relational queries treat everything as unordered sets that the server somehow manipulates all at once; MySQL takes a more pragmatic approach.

This can be a weakness, but it can be a strength if you know how to exploit it, and user-defined variables can help. User-defined variables are temporary containers for values, which persist as long as your connection to the server lives.

They were case sensitive in MySQL versions prior to 5.0, so beware of compatibility issues. The best thing to do is to initially assign 0 to variables you want to use for integers, 0.0 for floating-point numbers, or '' (the empty string) for strings. The optimizer might optimize away these variables in some situations, preventing them from doing what you want.

Order of assignment, and indeed even the time of assignment, can be nondeterministic and depend on the query plan the optimizer chose. One of the most important features of variables is that you can assign a value to a variable and use the resulting value at the same time. In other words, an assignment is an L-value. Still, it has its uses—one of which is ranking.

We start with a query that finds the actors and the number of movies each appears in, then add a rank that changes only when the movie count changes. Debugging such problems can be tough, but it can really pay off.
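A sketch of that ranking query against sakila (variable and alias names are illustrative; this single-pass idiom depends on user-variable evaluation order, which newer MySQL versions deprecate in favor of window functions):

```sql
SET @curr_cnt := 0, @prev_cnt := 0, @rank := 0;

SELECT actor_id,
       @curr_cnt := cnt AS cnt,
       @rank := IF(@prev_cnt <> @curr_cnt, @rank + 1, @rank) AS ranking,
       @prev_cnt := @curr_cnt AS dummy   -- side effect only
FROM (
  SELECT actor_id, COUNT(*) AS cnt
  FROM sakila.film_actor
  GROUP BY actor_id
  ORDER BY cnt DESC
  LIMIT 10
) AS der;
```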

Ranking in SQL normally requires quadratic algorithms, such as counting the distinct number of actors who played in a greater number of movies. A user-defined variable solution can be a linear algorithm, quite an improvement. An easy fix in such cases is to add another level of temporary tables to the query, using a subquery in the FROM clause. Most problems with user variables come from assigning to them and reading them at different stages in the query.

The solution to this problem is to assign and read in the same stage of query execution; try it and see. This trick is very helpful when you want to do variable assignments solely for their side effects: it lets you hide the return value and avoid extra columns, such as the dummy column we showed in a previous example.

In fact, this is one of the best uses for user-defined variables. For example, you can rewrite expensive queries, such as rank calculations with subqueries, as cheap once-through UPDATE statements. Normalize your schema. Normalization makes sure that all fields in a table belong to one domain of data being described. For example, in the employee table the fields could be id, name, and social security number, but those three fields have nothing to do with the department.

Only a department id on the employee row should say which department the employee belongs to; this implies that the department's details should live in another table. Use optimal data types. MySQL supports different data types, and choosing the correct type to store your data is vital for good performance. Different data types serve different purposes. When creating your tables, you need to understand what type of data each column will hold and choose the most fitting data type.
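A minimal sketch of the employee/department split (all names are hypothetical):

```sql
-- Department details live in their own table...
CREATE TABLE department (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) NOT NULL
) ENGINE=InnoDB;

-- ...and the employee row carries only the foreign key.
CREATE TABLE employee (
  id                     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name                   VARCHAR(100) NOT NULL,
  social_security_number CHAR(11) NOT NULL,
  department_id          INT UNSIGNED NOT NULL,
  FOREIGN KEY (department_id) REFERENCES department (id)
) ENGINE=InnoDB;
```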

Use integer types if you expect all values to be numbers; when it comes to computation, MySQL does better with integer values than with text data types. VARCHAR, which stores variable-length character strings, is the most common string data type.

Make the length of the data type as small as possible: VARCHAR(10), for example, will generally perform better than a much wider VARCHAR. Allowing NULL values (the absence of any value in a column) in your database is a really bad idea unless the field can logically have a NULL value.

The presence of NULL values can adversely affect your query results. For instance, if you want to get the sum of all orders in a database, the anticipated result can misbehave if a particular order record has a NULL amount.

The biggest downside to having many columns is extra I/O and storage overhead. Wide tables can be extremely expensive, and it is ideal not to go above a hundred columns unless your business logic specifically necessitates it. As opposed to creating one wide table, splitting it into logical structures can be beneficial. Suppose you are creating an employee table and realize that an employee can have multiple addresses; moving the addresses into their own table can reduce network usage when fetching large result sets.
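A sketch of that one-to-many split (all names are hypothetical):

```sql
-- Addresses move out of the wide employee table into their own table.
CREATE TABLE employee_address (
  id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  employee_id INT UNSIGNED NOT NULL,
  street      VARCHAR(255) NOT NULL,
  city        VARCHAR(100) NOT NULL,
  FOREIGN KEY (employee_id) REFERENCES employee (id)
) ENGINE=InnoDB;

-- Queries that do not need addresses never pay for them:
SELECT id, name FROM employee WHERE id = 1;
```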

Reduce the number of join statements in your queries where you can.


