1. What is the index
Index is a data structure that can improve the efficiency of database query. It can be compared to a dictionary directory, which can help you quickly find the corresponding records. Indexes are generally stored in files on disk, which occupy physical space. Just as water can carry a boat, it can also capsize it. Proper indexes can improve query efficiency, and excessive indexes will affect the insertion and update functions of database tables
2. What types of MySQL indexes are
Data structure dimension
B+tree index: all data is stored in the leaf node, the complexity is O (logn), and is suitable for range query. Hash index: It is suitable for equivalent query, with high retrieval efficiency and can be achieved at one time. Full-text index: Both MyISAM and InnoDB support the use of full-text index, which is generally created on the text type char, text, varchar. R-Tree index: used to create SPATIAL index for GIS data type
physical storage dimension
Clustered index: A clustered index is an index created with a primary key. The data in the table is stored in the leaf node. (Innodb storage engine) Non-clustered index: A non-clustered index is an index created with a non-primary key. The primary key and index column are stored in the leaf node. (Innodb storage engine)
logical dimension
Primary key index: a special unique index that does not allow null values. Normal index: the basic index type in MySQL, which allows null and duplicate values. Joint index: The index created by multiple fields follows the leftmost prefix principle. Unique index: The value in the index column must be unique, but null values are allowed. Spatial index: MySQL 5.7 supports spatial index, which follows the rules of OpenGIS geometric data model
3. When will the index expire
If the query condition contains or, the index may be invalidated. If the field type is string, where must be enclosed in quotation marks, otherwise the index may be invalidated by the like wildcard. Joint index. The condition column in the query is not the first column in the joint index, and the index is invalid. Use the built-in function of mysql on the index column, and the index will become invalid. The index is invalidated when the index column is operated (for example,+, -, *,/). When (!=or<>, not in) is used on an index field, the index may become invalid. The use of is null or is not null on the index field may cause the index to become invalid. The fields associated with the left-connected query or the right-connected query have different coding formats, which may cause the index to become invalid. MySQL estimates that using full table scanning is faster than using indexes, so indexes are not used
4. Which scenarios are not suitable for indexing
Tables with a small amount of data are not suitable for indexing and updating frequently, nor for fields with low index differentiation. They are not suitable for indexing (such as gender) where, group by, order by, and other fields that are not used later. There is no need to build an index that already has redundant indexes (for example, there is a joint index of a and b, and there is no need to build a index separately)
5 Why use B+tree and why not binary tree
You can look at the problem from several dimensions, such as whether the query is fast enough, whether the efficiency is stable, how much data is stored, and how many times the disk is searched. Why is it not a binary tree, why is it not a balanced binary tree, why is it not a B tree, but rather a B+tree?
Why is it not a general binary tree
If the binary tree is specialized into a linked list, it is equivalent to a full table scan. Compared with binary search tree, balanced binary tree has more stable search efficiency and faster overall search speed.
Why is it not a balanced binary tree
We know that the query efficiency of data in memory is much faster than that of data on disk. If the data structure of the tree is used as an index, we need to read a node from the disk every time we search for data, that is, a disk block, but the balanced binary tree only stores one key value and data per node. If it is a B-tree, it can store more node data, and the height of the tree will also be reduced, so the number of times to read the disk will be reduced, and the query efficiency will be faster.
Then why not B tree but B+tree
The non-leaf nodes of the B+tree do not store data, but only the key value. The B+tree node stores not only the key value, but also the data. The default size of the page in innodb is 16KB. If data is not stored, more key values will be stored. The corresponding tree order (the node’s sub-node tree) will be larger, and the tree will be shorter and fatter. In this way, the number of times we search data for disk IO will be reduced again, and the efficiency of data query will be faster. All the data of the B+tree index are stored in the leaf node, and the data is arranged in order, linked by the list. Then B+tree makes range search, sorting search, grouping search and de-research extremely simple
6. A B+tree index tree lookup process
assumes the following table structure and initializes these data
CREATE TABLE `employee` (
select * from Temployee where age=32;
Then draw the id primary key index. Let’s draw the cluster index structure chart first, as follows:
The execution process of this SQL query statement is as follows:
Search idx_ Age index tree, loading disk block 1 into memory, due to 32< 43. Search the left branch to the disk address block 2. Load disk block 2 into memory due to 32< 36. Search the left-way branch to disk address block 4. Load disk block 4 into memory, continue to traverse in memory, find the record with age=32, get id=400. After getting id=400, return to the id primary key index tree. Search the id primary key index tree and load disk block 1 into memory because 300< 400< 500, so select the middle branch to address disk block 3. Although the id=400 is found in disk block 3, it is not a leaf node, so it will continue to search. To disk addressing block 8. Load disk block 8 into memory, traverse the memory, find the record with id=400, and get the data of R4 line. OK, it’s done
7. What is return form? How to reduce return
When the queried data is in the index tree and cannot be found, you need to go back to theprimary key index treeto get it. This process is calledtable return.
For example, the query SQL used in section 6
select * from employee where age=32;
8. What is an overlay index
If we change the select * of SQL query to select id, age, it is actuallyno need to return to the table. Because the values of id and age are in idx_ On the leaf node of the age index tree, this involves only points covering the index.
Overwrite index is that the selected data column can be obtained only from the index without returning to the table. In other words, the query column should be overwritten by the created index.
9. Talk about the leftmost prefix principle of the index
The leftmost prefix principle of the index can be the leftmost N fields of theunion index. For example, if you build a composite index (a, b, c), you can actually build three indexes (a), (a, b), and (a, b, c), which greatly improves the index reuse ability.
Of course, the leftmost prefix can also be the leftmost M character of thestring index。 For example, your common index tree is purple:
This SQL: select * from employee where name like ‘small%’ order by age desc; It also hits the index.
10. Did you know about index push down? What is the index push down
to give you this SQL:
select*fromemployeewherenamelike'small%'andage=28and sex='0';
If it isbefore MySQL 5.6, in idx_ name_ Age index tree, find out all the people whose first word is”small”, get their primary key id, then go back to the table to find the data row, and then compare other fields such as age and gender. As shown in the figure:
Some friends may find it strange, idx_ name_ Isn’t age (name, age) a federated index? Why is it not more efficient to select the word”small” and then return to the table without looking at the age? Therefore, MySQL 5.6 introducedindex push down optimization, which can judge the fields contained in the index first during the index traversal process, directly filter out the records that do not meet the conditions, and reduce the number of table returns.
Therefore, after MySQL 5.6, select the words containing”small”, and filter them in the order of age=28
11. How to add an index to a large table
If the data size of a table is more than ten million, how to add an index to this table?
We need to know that when adding indexes to a table,will lock the table. If you do not operate carefully, there may be production accidents. You can refer to the following methods:
- Create a new table B with the same data structure as the original table A. Add the new index to be added in the new table B. Import the data from the original table A to the new table Brename The new table B is the table name A of the original table, and the original table A is replaced by another table name
12. How do I know whether the statement is indexed
explain View the SQL execution plan, so that you can know whether the index has been hit.
When explain is used with SQL, MySQL will display information about the statement execution plan from the optimizer.
Generally speaking, we need to focus on type, rows, filtered, extra and key.
1.2.1 type
type represents theconnection type, which is an important indicator of index execution. The following performance is from good to bad: system> const > eq_ ref > ref > ref_ or_ null > index_ merge > unique_ subquery > index_ subquery > range > index > ALL
System: This type requires only one piece of data in the database table. It is a special case of const type, which generally does not appear. Const: The data can be found through one index. It is generally used for the primary key or unique index as the condition. This kind of scanning is very efficient and fast. eq_ Ref: commonly used for primary key or unique index scanning, generally referring to the associated query using primary key ref: commonly used for non-primary key and unique index scanning. ref_ or_ Null: This connection type is similar to ref, but the difference is that MySQL will additionally search for the row index containing NULL value_ Merge: The index merge optimization method is used, and the query uses more than two indexes. unique_ Subquery: similar to eq_ Ref, the condition uses the in subquery index_ Subquery: different from unique_ Subquery, used for non-unique indexes, can return duplicate values. Range: commonly used for range queries, such as between And or In operations index: Full index scan ALL: Full table scan
1.2.2 rows
This column represents the number of rows MySQL needs to read to find the records we need. For InnoDB tables, this number is an estimate, not necessarily an accurate value.
1.2.3 filtered
This column is a percentage value, which is the percentage of qualified records in the table. In short, this field represents the proportion of the number of records that meet the conditions after the data returned by the storage engine is filtered.
1.2.4 extras
This field contains other information about how MySQL parses queries. It usually has these values:
Using filesort: indicates sorting by file, which usually occurs when the specified sorting and index sorting are inconsistent. Generally seen in the order by statement Using index: indicates whether the overwrite index is used. Using temporary: indicates whether temporary tables are used. The performance is particularly poor and needs to be optimized. It is commonly seen in group by statements or union statements. Using where: indicates where condition filtering is used. Using index condition: the new index push down after MySQL 5.6. Data filtering is performed at the storage engine layer, rather than at the service layer, and the existing data of the index is used to reduce the data returned to the table
1.2.5 key
This column represents theindex actually used. General cooperation possible_ Keys column.
13. What is the difference between Hash index and B+tree? How did you choose when designing the index
The B+tree can query the range, but the hash index cannot. B+tree supports the leftmost principle of joint index, while Hash index does not. B+trees support order by sorting, while hash indexes do not. Hash index is more efficient than B+tree in equivalent query. (But if there are many duplicate values in the index column, the hash conflict will reduce the efficiency). When B+tree uses like for fuzzy query, the words after like (such as the beginning of%) can play an optimization role. Hash index can't perform fuzzy query at all
14. What are the advantages and disadvantages of indexing
Advantages:
Index can speed up data query and reduce query time. Unique index can ensure the uniqueness of data in each row of the database table
Disadvantages:
It takes time to create and maintain indexes. Indexes need to occupy physical space. In addition to the data space occupied by data tables, each index also needs to occupy a certain amount of physical space. When the data in the table is added, deleted, or modified, the indexes also need to be dynamically maintained
15. The difference between clustered index and non-clustered index
Clustered index is not a separate index type, but adata storage method. It represents the index where the index structure and data are stored together. A nonclustered index is an index whoseindex structure and data are stored separately.
Next, let’s talk about different storage engines~
In MySQL’s InnoDB storage engine, the biggest difference between clustered index and non-clustered index is whether the leaf node stores a whole row of records. The clustered index leaf node stores an entire row of records, while the non-clustered index leaf node stores the primary key information
A table can only have one clustered index (because the general clustered index is theprimary key index), while a non-clustered index can have more than one table. Generally speaking, compared with non-clustered indexes, clustered indexes have higher query efficiency because there is no need to return to the table
In MyISM storage engine, its primary key index and common index are non-clustered indexes. Because data and index are separated, leaf nodes usean address to point to the real table data.
Original article,If reprinted, please indicate the source:https://www.todaynews.cc/news/1gzqMdl.html