Current location - Education and Training Encyclopedia - Graduation thesis - Seeking SQL database papers
Seeking SQL database papers
Research on SQL query optimization in ORACLE

Database performance has always been the focus of decision makers and technicians, and an important factor affecting database performance is the inefficiency of SQL query statements. This paper first analyzes four common reasons that lead to the low performance of SQL query statements and the general steps of SQL tuning, and then analyzes how to reduce I/O operations, how to avoid high-cost operations on query results in query statements, and how to improve the query efficiency when multiple tables are connected.

Keywords Oracle Bone Inscriptions; ; SQL optimization; connect

1 Introduction

With the continuous development of network applications, system performance has attracted more and more attention of decision makers. There are many factors that affect system performance, and inefficient SQL statements are one of the important reasons that cannot be ignored. This paper first analyzes the common reasons that lead to low SQL performance, then analyzes the general steps that should be followed in SQL tuning, and finally studies how to reduce I/O, avoid the high-cost operation of query results and how to improve SQL performance in multi-table connection. In view of the dominant position of ORACLE in the database market at present, this article will only discuss ORACLE.

Two factors affecting SQL performance

There are many factors that affect the performance of SQL, such as unreasonable initialization parameters, inaccurate system and schema statistics, and the correct judgment of the optimizer (CBO). These factors are often closely related to DBA. From the perspective of SQL statements, the author believes that there are four important reasons that affect the performance of SQL:

(1) Perform costly operations on large recordsets, such as using predicates that cause sorting.

(2) There are too many I/O operations (including physical I/O and logical I/O), and the most typical one is that the appropriate index is not established, which leads to the full table scanning of the query table.

(3) Too many useless records have been processed. For example, when multiple tables are connected, the improper position of filter conditions leads to too many useless records in the intermediate result set.

(4) The functions provided by the database are not fully utilized, such as the parallel processing of queries.

The fourth reason is relatively simple to deal with. This paper will discuss how to improve the performance of SQL query statements for the first three reasons.

3 the general steps of SQL optimization

SQL optimization generally needs to go through several steps, such as finding problems, analyzing problems, proposing solutions, applying measures and testing performance, as shown in figure 1. "Finding the problem is half the solution", so locating the problem SQL is a very important step in the process of SQL tuning, which can generally be carried out with the help of ORACLE's own performance optimization tools such as STATSPACK, TKPROF and AUTOTRACE. At the same time, we should pay attention to the research of dynamic performance views such as V$SQL, V$MYSTAT and V$SYSSTAT.

Figure1General steps of SQL optimization

4 optimization of SQL statements

4. 1 Optimize sorting operation

The cost of sorting is very high, and when the predicates that cause the result set to be sorted are used in query statements, SQL performance will inevitably be affected.

4. 1. 1 sorting process analysis

When the data set to be sorted is not too large, the server completes the sorting operation in the memory (sorting area). If sorting requires more memory space, the server will do the following:

(1) Divides the data into several small sets and sorts each set.

(2) The server applies for temporary space from the disk, writes the sorted intermediate results into the temporary section, and then sorts other sets.

(3) After all the collections are sorted, the server will merge and get the final result. If the sorting area is too small to merge at one time, it will be divided into several times.

From the above analysis, we can see that sorting is a very expensive operation, which will consume a lot of CPU time and memory, and trigger disk paging and exchange operations, so try to avoid sorting operations in SQL statements.

4.1.2 Operation leading to sorting in SQL

The operations that cause sorting in SQL query statements are roughly: ORDER BY and GROUP BY clauses; Unique modifiers; Union, intersection and subtraction operators; Sorting, merging and joining when multiple tables are connected.

4. 1.3 How to avoid sorting?

1) to build an appropriate index.

Index fields that are frequently sorted and connected. After the index is established, when the server sends a sorting request to these fields, the index will be directly referenced without sorting; When performing an equijoin query, if the index of the join field is not established, the server performs sort merge join, and the process of join operation is as follows:

Fully scan two or more connected tables respectively;

Sorting all row sets in each table respectively;

Merge sorting results.

If the field used to establish the connection has been indexed, the server performs a nested circular connection, which does not require any sorting, and the process is as follows:

Fully scan the bridge;

Performing index unique scanning on each returned row by using the value of the connection field;

Use the ROWID value returned by the index scan to locate records from the table;

Merge matching records in the master table and the slave table.

Therefore, indexing can avoid most sorting operations.

2) replace UNION with UNIION ALL.

UNION will filter out duplicate records after table linking, so after table linking, it will sort the generated result set, delete duplicate records and return the results. In most applications, duplicate records will not be generated, and the most common is the union of process tables and history tables. Therefore, the UNION ALL operator is used instead of UNION, because the UNION ALL operation only combines two results and returns.

4.2 optimizing I/O

Too many I/O operations will occupy CPU time, consume a lot of memory and occupy too many latches, so I/O of SQL needs to be optimized. The most effective way to optimize I/O is to use index scanning instead of full table scanning.

Application of Function-based Index

Function-based indexing (FBI) provides the ability to index computed columns and use these indexes in queries. The essence of the FBI is to preprocess the intermediate results required by the query. If the FBI matches the embedded function in the query statement exactly, CBO will automatically enable index range scanning to replace full table scanning when generating the query plan. Check the following code snippets and use AUTOTRACE to observe the changes in the execution plan before and after the creation of the FBI.

select * from EMP where upper(ename)= ' SCOTT '

Before the creation of the FBI, it was obviously a full table scan.

keep to the plan

……

Employee's 1 0 table access (full) (cost =2 cards = 1 byte =22)

Idle & gt creates an index EMP _ UPPER _ FIRST _ NAME for EMPLOYEES(UPPER(FIRST_NAME)).

Index has been created.

Run the same query again,

keep to the plan

……

Access to 1 0 table of "employee" (by index ROWID) (cost = 1 card = 1 byte =22)

2 1' EMP _ UPPER _ FIRST _ NAME' (not unique) index (range scanning) (cost = 1 card = 1)

This simple example fully illustrates the role of the FBI in SQL query optimization. The functions used by the FBI can be functions created by users themselves. The more complex the function, the more obvious the optimization effect of creating the FBI based on this function on SQL query performance.

4.2.2 Applying materialized views and query rewriting

Materialized view is a pre-calculated result set, which usually includes complex operations such as aggregation and multi-table join. Materialized views are automatically maintained in the database and refreshed according to users' requirements. Query rewriting mechanism is to rewrite the query submitted by the user into a completely different but functionally equivalent query with an alternative object (such as materialized view) in the database. Query rewriting is transparent to users. Users write query statements to access the database completely according to the routine, and the optimizer (CBO) automatically decides whether to rewrite the query submitted by users. Query rewriting is a very effective method to improve query performance, especially for high-cost operations such as summary and multi-table connection in data warehouse environment.

The following is a very simple example, which shows the role of materialized views and query rewriting in optimizing the performance of SQL queries.

Select department, department number, department name and count (*)

From employees, departments

Where employee department number = department department number

Group by department, department number, department name

The query plan and main statistics are as follows:

Implementation plan:

-

……

2 1 hash connection (cost =5 cards = 14 bytes =224)

3 2 table access for "department" (full) (cost =2 cards =4 bytes =52)

4 2 table access of "EMP" (full) (cost =2 cards = 14 bytes =42)

Main statistics:

-

305 recursive calls

46 times of unanimous acquisition

Create materialized view EMP_DEPT:

Create materialized view emp_dept Build Now

Refresh on demand

Enable query rewriting

be like

Select department, department number, department name and count (*)

From employees, departments

Where employee department number = department department number

Group by department, department number, department name

/

Execute the query again. The implementation plan and main statistics are as follows:

Implementation plan:

-

……

1 0 table access of' EMP_DEPT' (full) (cost =2 cards =327 bytes = 1 1445)

Main statistics:

-

79 recursive calls

28 consistent acquisitions

It can be seen that before the materialized view is established, the whole table of the two tables is scanned first, then the hash connection is carried out, and then the grouping sorting and selection operations are carried out; After the materialized view was established, CBO automatically transformed the above complex operations into a full scan of the materialized view EMP_DEPT, and the related statistics were greatly improved, with the number of recursive calls reduced from 305 to 79 and the number of logical I/O(CONSISTENT GETS) reduced from 46 to 28.

4.2.3 Read frequently accessed small tables into the cache.

Logical I/o is always faster than physical I/o. If there are small tables frequently accessed by applications in the database, these tables can be forcibly read into the KEEP pool to avoid physical I/O.

4.3 Multi-table connection optimization

Multi-table join is the biggest manifestation of query complexity, and multi-table join operation often consumes a lot of CPU time and memory, so multi-table join query performance optimization is often the focus and difficulty of SQL optimization.

4.3. 1 Eliminate external connection

By eliminating external connections, queries are not only easier to read, but also performance can usually be improved. The general idea is that there are the following forms of queries:

SELECT …, outer join table. column

FROM SOME_TABLE,OUTER_JOINED_TO_TABLE

Where …=OUTER_JOINED_TO_TABLE(+)

Can be converted into the following form of query:

SELECT …,(SELECT COLUMN FROM OUTER _ JOINED _ TO _ TABLE WHERE…)FROM SOME _ TABLE;

4.3.2 Predicates are pushed forward to optimize intermediate results.

The low performance of multi-table connection is mostly due to the unreasonable order of connection operation and filtering operation. When writing multi-table join queries, most users always do join operations before applying filter conditions, which leads to the server doing too much useless work. Aiming at this kind of problem, its optimization idea is to push the filter predicate forward as far as possible, so that the unqualified records can be filtered out in advance and only a few qualified records can be connected, which can double the efficiency of SQL query.

Standard connection query is as follows:

Select a.prod_name,sum(b.sale_quant),

sum(c.sale_quant),sum(d.sale_quant)

From product a, telemarketing b, online sales c, store sales d

Where a. production mark = b. production mark, a. production mark = c. production mark.

And a.prod_id=d.prod_id and a.order _ date & gt system date -90.

Group by a.prod_id;

Enable the embedded view and set the condition a.order _ date >; Sysdate-90 moves forward, and the optimized code is as follows:

Select a.prod_name, b.tele_sale_sum, c.online_sale_sum and d.store_sale_sum from product a.

(select sum(sal _ quant)tele_sale _ sum from product,tele _ sale

In which product.order _ date & gt sysdate-90 and product.prod _ id = tele _ sale.prod _ id) b,

(select sum(sal _ quant)online _ sale _ sum

From products, telemarketing

In which product.order _ date & gt sysdate-90 and product.prod _ id = online _ sale.prod _ id) c,

(select sum(sal _ quant)store _ sale _ sum

From Products, Stores _ Sales

In which product.order _ date & gt sysdate-90 and product.prod _ id = store _ sale.prod _ id) d,

Where a.prod_id=b.prod_id and

A.prod_id=c.prod_id and a.prod _ id = d.prod _ id;

5 concluding remarks

SQL language plays a very important role in database application, and its performance directly affects the availability of the whole information system. This paper analyzes how to optimize the I/O of SQL query, avoid high-cost sorting operation and optimize multi-table connection from three main aspects that affect the performance of SQL. It should be emphasized that understanding the problems solved by SQL statements is more important than SQL tuning itself, so SQL tuning needs the close cooperation of system analysts, developers and database administrators.

refer to

Thomas Kate. Effective Oracle design: designing and building high-performance Oracle applications [M], McGral- Hill Company, 2003.

[2] Kevin Ronnie, George Koch, Oracle Bone Inscriptions 9i: complete reference [M], McGrath-Hill Company, 2002.

[3] Oracle9i SQL Reference Release 2(9.2)[OL/M], 2002. 10. /technology/

[4] Oracle9i data warehouse guide release 2(9.2) [OL/M], 2002.03 ./ Technology/

[5]Alexey Danchenkov, Donald Burleson, Oracle Tuning: Authoritative Reference [OL/M], Rampant Techpress, 2006.

[6] Oracle9i Database Concept 2nd Edition (9.2) [OL/M], 2002.08 ./ Technology/

[7] Refer to release 2(9.2) [OL/M], 2002. 12 for plsql packages and types provided by Oracle9i. /technology/