1. To be shared, a statement must be exactly identical to one already in the shared pool; only then can its parsed form be reused.
2. Keep variable names as consistent as possible (see the sketch after this list).
3. Use outer joins judiciously.
4. Avoid excessively deep nesting.
5. Make more use of concurrency.
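To see why points 1 and 2 matter, here is a minimal Oracle-flavored sketch (the emp table and the :emp_no bind variable are made-up names):
-- These two statements differ only in their literals, but the shared
-- pool treats them as different statements and hard-parses each one:
select ename from emp where empno = 1234;
select ename from emp where empno = 5678;
-- With a bind variable, the statement text is identical on every
-- execution, so the cached parse tree and plan can be reused:
select ename from emp where empno = :emp_no;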
The general steps for optimizing statements are:
1. Tune the SGA so that its use is optimal.
2. Optimize the SQL statements themselves, using tools such as EXPLAIN PLAN and SQL trace (see the sketch after this list).
3. Adjust the database structure.
4. Adjust the project structure.
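As a minimal sketch of step 2 (Oracle syntax; the query is hypothetical, and a PLAN_TABLE must already exist):
-- Ask the optimizer to record its plan without running the query:
explain plan for
select ename from emp where empno = 1234;
-- On Oracle 9i and later, the recorded plan can be read back with:
select * from table(dbms_xplan.display);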
Experience in writing statements:
1. Use indexes for queries on large tables.
2. Minimize the use of IN and EXISTS.
3. Use set operations such as UNION, MINUS, and INTERSECT.
4. For queries on large tables, avoid applying conversions such as TO_CHAR, TO_DATE, and TO_NUMBER to columns wherever possible (see the sketch after this list).
5. Use indexes wherever possible, and place conditions that involve indexes at the front. Create indexes where it is possible and necessary.
6. Avoid full table scans as much as possible, and include as many limiting conditions as possible so that the target data can be found faster.
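A minimal sketch of tips 4 and 5 (Oracle-flavored; the record table and its indexed trans_date column are made-up names):
-- The conversion is applied to the column row by row, so the index on
-- trans_date cannot be used and every row is scanned:
select count(*) from record where to_char(trans_date, 'YYYYMMDD') = '19991201';
-- Converting the constant instead leaves the indexed column untouched:
select count(*) from record where trans_date = to_date('19991201', 'YYYYMMDD');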
How to make your SQL run faster
Computer Department, Bank of Communications Changchun Branch
Ren Liang
--- People often fall into a trap when using SQL: they focus on whether the results are correct and overlook the large performance differences between different ways of writing the same query. The difference is especially pronounced in large or complex database environments, such as online transaction processing (OLTP) or decision support systems (DSS). In practical work I have found that bad SQL usually comes from three sources: unreasonable index design, insufficient join conditions, and non-optimizable WHERE clauses. After they were appropriately optimized, running speed improved markedly. Below I examine each of these three aspects:
--- To illustrate the problem more intuitively, the execution time of every SQL example has been tested; any time under 1 second is indicated as (< 1 second).
One, Unreasonable index design:
--- Example: Table record has 620,000 rows. Consider how the following SQLs run under different indexes:
--- 1. A non-clustered index on date
select count(*) from record where date > '19991201' and date < '19991214' and amount > 2000 (25 seconds)
select date, sum(amount) from record group by date (55 seconds)
select count(*) from record where date > '19990901' and place in ('BJ', 'SH') (27 seconds)
--- Analysis:
--- There are a large number of duplicate values in date. Under a non-clustered index, rows are physically stored at random on the data pages, so a range search must perform a table scan to find every row within the range.
--- 2. A clustered index on date
select count(*) from record where date > '19991201' and date < '19991214' and amount > 2000 (14 seconds)
select date, sum(amount) from record group by date (28 seconds)
select count(*) from record where date > '19990901' and place in ('BJ', 'SH') (14 seconds)
--- Analysis:
--- Under a clustered index, data is physically ordered on data pages, with repeated values grouped together. Therefore, during range searches, it can first locate the start and end points of the range and only scan data pages within this range, avoiding extensive scans and improving query speed.
--- 3. A composite index on place, date, amount
select count(*) from record where date > '19991201' and date < '19991214' and amount > 2000 (26 seconds)
select date, sum(amount) from record group by date (27 seconds)
select count(*) from record where date > '19990901' and place in ('BJ', 'SH') (< 1 second)
--- Analysis:
--- This is an unreasonable composite index, because its leading column is place. The first and second SQLs do not reference place, so they make no use of the index. The third SQL does use place, and every column it references is contained in the composite index, forming index coverage, so it is very fast.
--- 4. A composite index on date, place, amount
select count(*) from record where date > '19991201' and date < '19991214' and amount > 2000 (< 1 second)
select date, sum(amount) from record group by date (11 seconds)
select count(*) from record where date > '19990901' and place in ('BJ', 'SH') (< 1 second)
--- Analysis:
--- This is a reasonable composite index. With date as its leading column, every SQL can use the index, and the first and third SQLs form index coverage, so performance is optimal.
--- Summary:
--- The index created by default is a non-clustered index, but it is not always the best choice. Reasonable index design must be based on analyzing and predicting the various queries. In general:
--- ① Columns with many duplicate values on which range searches (between, >, <, >=, <=) and order by or group by frequently occur can be considered for a clustered index;
--- ② Columns that are frequently accessed together, each containing duplicate values, can be considered for a composite index;
--- ③ A composite index should, as far as possible, give the key queries index coverage, and its leading column must be the most frequently used column.
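--- For reference, the four index designs above could be created as follows (Transact-SQL; the index names are made up):
create index idx_record_date on record (date)                -- 1: non-clustered on date
create clustered index idx_record_date_c on record (date)    -- 2: clustered on date
create index idx_record_pda on record (place, date, amount)  -- 3: leading column place
create index idx_record_dpa on record (date, place, amount)  -- 4: leading column date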
Two, Insufficient join conditions:
--- Example: Table card has 7,896 rows, with a non-clustered index on card_no. Table account has 191,122 rows, with a non-clustered index on account_no. Let's see how two SQLs perform under different table join conditions:
select sum(a.amount) from account a, card b where a.card_no = b.card_no (20 seconds)
--- Modify the SQL to:
select sum(a.amount) from account a, card b where a.card_no = b.card_no and a.account_no = b.account_no (< 1 second)
--- Analysis:
--- Under the first join condition, the optimal query plan uses account as the outer table and card as the inner table, exploiting the index on card. The number of I/Os can be estimated by the formula:
--- 22,541 pages of the outer table account + (191,122 rows of the outer table account × 3 pages searched in the inner table card for each outer row) = 595,907 I/Os
--- Under the second join condition, the optimal query plan uses card as the outer table and account as the inner table, exploiting the index on account. The number of I/Os can be estimated by the formula:
--- 1,944 pages of the outer table card + (7,896 rows of the outer table card × 4 pages searched in the inner table account for each outer row) = 33,528 I/Os
--- Clearly, only with sufficient join conditions will the truly optimal plan be executed.
--- Summary:
--- 1. Before a multi-table operation actually runs, the query optimizer lists several possible join plans based on the join conditions and picks the one with the lowest system overhead. The join conditions should take full account of the tables that have indexes and the tables with many rows; the choice of outer and inner table can be decided by the formula: (number of matching rows in the outer table) × (number of searches in the inner table per outer row), where the smallest product is the best plan.
--- 2. To view the execution plan, use set showplan on to turn on the showplan option, which shows the join order and which indexes are used; for more detailed information, execute dbcc traceon(3604, 310, 302) under the sa role, as in the sketch below.
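--- A minimal sketch of this check (Sybase-style Transact-SQL, run from isql; the query is the one from the example above):
set showplan on
go
select sum(a.amount) from account a, card b
where a.card_no = b.card_no and a.account_no = b.account_no
go
set showplan off
go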
Three, Non-optimizable WHERE clauses:
--- 1. Example: The following SQL condition statements have appropriate indexes on the columns, yet the execution speed is very slow:
select * from record where substring(card_no, 1, 4) = '5378' (13 seconds)
select * from record where amount / 30 < 1000 (11 seconds)
select * from record where convert(char(10), date, 112) = '19991201' (10 seconds)
--- Analysis:
--- Any operation on a column in the WHERE clause is evaluated row by row when the SQL runs, forcing a table scan and preventing use of the index on that column. If those results could instead be computed at query compile time, the SQL optimizer could optimize the clause, use the index, and avoid the table scan. Thus, rewrite the SQL as follows:
select * from record where card_no like '5378%' (< 1 second)
select * from record where amount < 1000 * 30 (< 1 second)
select * from record where date = '1999/12/01' (< 1 second)
--- You will find the SQL becomes significantly faster!
--- 2. Example: Table stuff has 200,000 rows, with a non-clustered index on id_no. Consider the following SQL:
select count(*) from stuff where id_no in ('0', '1') (23 seconds)
--- Analysis:
--- The IN in the WHERE condition is logically equivalent to OR, so the parser converts in ('0', '1') into id_no = '0' or id_no = '1' for execution. We would expect it to search each OR clause separately and add up the results, thereby using the index on id_no. In practice, however (according to showplan), it adopts the "OR strategy": it first fetches the rows satisfying each OR clause into a work table in the temporary database, then builds a unique index to remove duplicate rows, and finally computes the result from this temporary table. So the actual process uses no index on id_no, and completion time also depends on the performance of tempdb.
--- Practice shows that the more rows the table has, the worse the performance of the work table becomes. When stuff has 620,000 rows, the execution time reaches 220 seconds! It would be better to separate the OR clauses:
select count(*) from stuff where id_no = '0'
select count(*) from stuff where id_no = '1'
--- Get two results, then perform a simple addition calculation. Since each statement uses the index, the execution time is only 3 seconds, and with 620,000 rows, the time is only 4 seconds. Alternatively, use a better method and write a simple stored procedure:
create proc count_stuff as
begin
declare @a int
declare @b int
declare @c int
declare @d char(10)
-- count each id_no value separately, so each query can use the index on id_no
select @a = count(*) from stuff where id_no = '0'
select @b = count(*) from stuff where id_no = '1'
-- add the two partial counts and print the total
select @c = @a + @b
select @d = convert(char(10), @c)
print @d
end
--- Directly calculate the result, and the execution time is just as fast as above!
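--- The procedure is then invoked with a single statement:
exec count_stuff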
--- Summary:
--- Clearly, "optimizable" means the WHERE clause can make use of an index, while a non-optimizable clause results in table scans or extra overhead.
--- 1. Any operation on a column leads to a table scan; this includes database functions and computed expressions. When writing a query, move the computation to the constant side of the comparison wherever possible.
--- 2. IN and OR clauses often resort to work tables, which defeats the indexes. If the values do not produce large numbers of duplicates, consider splitting the clause into separate statements; the split statements should be able to use indexes.
--- 3. Be good at using stored procedures, which make SQL more flexible and efficient.
--- From the above examples we can see that the essence of SQL optimization is to use statements the optimizer can recognize, to make full use of indexes, to reduce the I/O of table scans, and to avoid table searches as much as possible. In fact, SQL performance optimization is a complex process; the points above are only its manifestation at the application level. Deeper study would also involve resource allocation at the database level, traffic control at the network level, and overall design at the operating-system level.