Current location - Education and Training Encyclopedia - Graduation thesis - How to optimize the database
How to optimize the database
First, database design optimization.

1, without cursor.

Using cursors not only takes up memory, but also locks the table in an incredible way, which can make all the performance optimization that DBA can do equal to zero. Each selection in the cursor is equivalent to a selection.

2. Create an appropriate index

Whenever an index is added to a table, select is faster, but insert and delete are much slower, because creating a maintenance index requires a lot of extra work.

(1) Fields processed by functions cannot use indexes.

(2) When a condition includes multiple fields in the table, it cannot be indexed.

Step 3 use transactions

For some time-consuming operations, using transactions can achieve good optimization results.

4. Beware of deadlock

Visit your watch in a certain order. If you lock table A first and then table B, you should lock them in this order in all stored procedures. If the stored procedure locks table B first and then table A, this may lead to deadlock.

5. Don't open large data sets.

6. Don't use server-side cursors.

Compared with the server-side cursor, the client-side cursor can reduce the system overhead of the server and the network, and can also reduce the locking time.

7. Don't ignore the problem of modifying the same record at the same time.

Sometimes, two users will modify the same record at the same time, so that the latter reviser will modify the previous reviser's operation and lose some updates. To handle this situation, create a timestamp field, check it before writing, merge changes if allowed, and prompt the user if there is a conflict.

8. Try not to use text data types.

Don't use text unless you use it to deal with a large amount of data. Because it is not easy to query, it is slow, and it will waste a lot of space if it is not used well. Generally speaking, varchar can handle data better.

9. Avoid using calculations on indexed columns.

In the where clause, if the indexed column is part of a function, the optimizer will use a full table scan instead of an index. For example:

(inefficient) select ... from [dept] where [sal] *12 > 25000; (Efficient) Select ... from [dept] where [sal] > 25000/12; 12 10, different types of indexes have different performance, so the efficient one should be used first.

The index search efficiency of numerical type is higher than that of string type, and the index efficiency of fixed-length strings char and nchar is higher than that of variable-length strings varchar and nvarchar.

(Inefficient) Selection ... Select from the table names with user name =' Zhang San' and age > =2 1 (valid) ... from tablename where age > = 2 1 and user name =' Zhang San' 12 seconds, and optimize the SQL statement.

1, do not use select *

The benefits of specifying required columns in select are as follows:

(1) Reduce memory consumption and network bandwidth.

(2) It is safer

(3) Give the query optimizer a chance to read all the necessary columns from the index.

2. Use parameter query

Mainly to prevent SQL injection and improve security.

3. use exists or not exists instead of in or not in.

(valid) select * from [EMP] where [empno] > 0 and exists (select 'X' from [dept] where [dept]). [deptno]=[emp]。 [deptno] and [loc] =' melb'); (inefficient) select * from [EMP] where [EMPNO] > 0 and [deptno] in (select [deptno] from [dept] where [loc] =' melb'); 124, empty or non-empty operation.

To judge whether a field is empty, an index is generally not applied, because an index does not index empty values. Null cannot be used as an index, and any column with null value will not be included in the index. That is, if a column has a null value, even if it is indexed, it will not improve performance. Any statement optimizer that uses is null or not null in the where clause is not allowed to use indexes.

Recommended scheme: replace it with other operations with the same function, such as: A is not null and is changed to a>0 or a>? Wait a minute.

5,< and > operation

If it is larger or smaller than the general situation, there is no need to adjust it, because it will use index search if it has an index, but it can be optimized in some cases. If a table has 6,543,800+0,000 records, the effect of executing >: 2 and > =3 is very different.

(inefficient) select * from [EMP] where [deptno] > 2; (efficient) select * from [EMP] where [deptno] > = 3; 126, like to operate.

Like operation can apply wildcard query, and the combination of wildcards in it may reach almost any query, but if it is not used well, it will cause performance problems, such as lide? %5400%? This kind of query does not refer to the index, but to something like? X5400%? Reference range index.

7. Conditional order after 7.where affects

The conditional order after the where clause will have a direct impact on the query of large data tables. For example:

Select * from ZL _ yhjbqk where dy _ DJ =' 1kV or less' and xh _ BZ =1;

Select * from ZL _ yhjbqk, where dy _ DJ = 1 and dy _ DJ =' 1 below kilovolts'; For the above two queries, 123, neither field has an index, so both fields are scanned in the whole table. The first SQL DY _ DJ =' 1 kV? The ratio of conditions in the record set is 99%, while the ratio of xh_bz= 1 is only 0.5%. In the first SQL, 99% of the records were compared with dy_dj and xh_bz. In the second SQL, 0.5% of the records are compared with dy_dj and xh_bz, and it can be concluded that the CPU utilization rate of the second SQL is obviously lower than that of the first SQL.

8. Replace or with union (for indexed columns)

Generally speaking, it is effective to replace or in the where clause with union. Using on an indexed column will cause a full table scan. Note: This rule is only valid for multiple index columns. If a column is not indexed, the query efficiency may be reduced because you have no choice or. In the following example, there are indexes on both loc_id and region.

(inefficient) select loc _ id, loc _ desc, begion from location where loc _ id =10 or begion =' Melbourne'; (efficient) select loc _ id, loc _ desc, Begion from location where loc _ id =10 union select loc _ id, loc _ desc _ Begion from location where begion =' melbourne'; 129, optimize the grouping basis

In order to improve the efficiency of group by statement, we can filter out unnecessary records before group by.

(inefficient) select [job], avg ([sal]) from [EMP] group by [job] having job =' president' or job =' manager'; (efficient) select [job], avg ([sal]) from [EMP] where [job] =' president' or job =' manager' group by [job]; 12 10, using stored procedures.

You can consider using stored procedures to encapsulate those complex SQL statements or business logic, which has several advantages:

The execution plan of (1) stored procedure can be cached in memory for a long time, which reduces the recompilation time.

(2) Stored procedures reduce the complicated interaction between the client and the server.

(3) If you need to make some modifications after the program is released, you can directly modify the stored procedure without modifying the program to avoid reinstalling the deployment program.

1 1, use sp _ configure' to query the governor overhead limit? Or set QUERY_GOVERNOR_COST_LIMIT to limit the resources consumed by the query. When the resource consumed by the evaluation query exceeds the limit, the server automatically cancels the query and kills it before the query. Set Lock Time Set the lock time.

12. Use select top or set rowcount to limit the number of rows in the operation.

13. If the query is not indexed in or or etc. Please use the explicit statement to specify the index: select * from person member (index = IX _ title) where to process id in (? A man? ,? Woman? )。

14. If you want to insert a large binary value in the Image column, use a stored procedure, and never insert it with an embedded insert (I don't know if it is JAVA). Because in this way, the application first converts the binary value into a string (twice its size), and the server receives the character and then converts it into a binary value. There is no such action in the stored procedure: Method: Create the procedure p _ insert as insert into table (f image) values (@ image), and call this stored procedure in the foreground to pass in binary parameters, which obviously improves the processing speed.

15, select EMP _ name form employee where salary > analysis; 3000 In this statement, if salary is of Float type, the optimizer will optimize it for conversion (float, 3000). Because 3000 is an integer, we should use 3000.0 in programming, instead of letting DBMS convert it at runtime. Conversion of the same character and integer data.

Third, the method of processing more than one million data to improve the query speed.

1, try to avoid using it in the where clause! = or

2. We should consider building indexes on the columns involved in where and order by.

3. Try to avoid judging the null value of the field in the where clause, otherwise the whole table will be scanned.

4. Just avoid using the or join condition in the where clause, otherwise it will lead to a full table scan.

Select id from where num = 10 or num = 20 is rewritten as select id from where num =10 union all select id from where num = 201235. Try to avoid using leading percent signs.

Select id from which names like "%ABC%"16, in and not in should also be used carefully. Exists and not exists can be used in many cases, otherwise the whole table will be scanned.

7. If parameters are used in the where clause, it will also lead to a full table scan.

Select id from where num = @ num can be changed to force the query to use an index.

Select id from t with(index) where num=@num 1238. Try to avoid expression operations on fields in the where clause, otherwise the whole table will be scanned.

Select the id, where num/2 = 100 1 should be changed to:

select id from t where num = 100 * 2 19。 Try to avoid functional operations on fields in the where clause, otherwise the whole table will be scanned.

Select id from substring(name, 1, 3) =' ABC' 1 should be changed to:

Select the id, where the name is similar to "ABC%"110. Not all indexes are valid for the query. SQL optimizes the query according to the data in the table. When there is a lot of duplicate data in the index column, the SQL query may not use the index.

1 1, the more indexes, the better. Indexes submit the efficiency of select, but reduce the efficiency of insert and update. The number of indexes in a table should not exceed 6.

12. Try using a numeric field. If the field only contains numerical information, try not to design it as characters, which will reduce the performance of query and connection and increase the storage overhead. Because the engine compares each character in the string one by one when processing queries and connections, it only needs to compare it once for numeric types.

13. Use varchar/nvarchar instead of char/nchar as much as possible, because the storage space of variable-length fields is very small from the beginning, which can save storage space; Secondly, for queries, it is obviously more efficient to search in a relatively small area.

14. Do not use select anywhere, but use a specific field list, and do not return unused fields.

15. Try to avoid using cursors because cursors are inefficient. If the data of cursor operation exceeds 10000 rows, consider rewriting.

16. Try to avoid large transaction operations and improve system concurrency.

17, using set rowcount to achieve high performance paging.

Fourth, the database primary key selection Common database primary key selection methods are:

● Automatic growth field ● Uniqueidentifier●? Comb (close)? 123 1 type, advantages in the field of automatic growth:

(1) is simple and efficient. 1 Disadvantages:

(1) Self-increment generally uses int type, which is limited by the number of data. (2) It will be troublesome to merge the data in the database. 122, GUID Advantages:

(1) is safe and unique. (2) There will be no data merging problems like self-added fields. 12 Disadvantages:

(1) has a length of 16 bytes, which takes up a lot of storage space. (2) The data type is irregular, and it is time-consuming to build an index on it, which is less efficient than using self-increasing fields. Copyright statement: This article is a CSDN blogger? Good and bad are all experiences. The original text conforms to CC 4.0 BY-SA copyright agreement. Please attach the original source link and this statement. Original link: /Li tangyuan/ article/details /96867580

How to optimize the database

Label: How to protect data types? Full table scanning csdn wildcard calculation replacement.