Use subqueries for filtering
SELECT statement is used for queries in SQL. All the SELECT statements used before are simple queries, that is, a single statement that retrieves data from a single database table. However, SQL can also create subqueries, that is, queries nested in other queries.
Example:
Data table: The database tables used this time are all relational tables. Orders are stored in two tables. Each order contains the order number, customer ID and order date, and is stored as a row in the Orders table. The items of each order are stored in the related OrderItems table. Orders table does not store customer information, only customer ID. The actual customer information is stored in the Customers table.
If you need to retrieve all the customers who ordered RGAN0 1 now, how should you retrieve them? The steps are as follows:
① Retrieve the numbers of all orders containing RGAN0 1 project.
② Retrieve the id of all customers with the order number listed in the previous step.
③ Retrieve the customer information of all customer IDs returned in the previous step.
Each of the above steps can be executed independently as a query.
You can use the results returned by one SELECT statement for the WHERE clause of another SELECT statement, or you can use subqueries to merge three queries into one statement.
① The meaning of the first statement is clear, that is, to retrieve the order_num column for all order items with prod_id of RGAN0 1.
Analysis: Through this statement, we know which order contains the item to be retrieved.
② Next, query the customer ID related to the orders 20007 and 20008 retrieved from the above statement. You can use the IN clause here.
You can combine the following two queries to turn the first query into a subquery.
Analysis: In a SELECT statement, subqueries are always handled from the inside out. When processing the above SELECT statement, DBMS actually performs two operations.
First, execute the query in parentheses (), which returns two order numbers: 20007 and 20008.
These two values are then passed to the WHERE clause of the outer query IN the comma-separated format required by the in operator. The external query becomes:
The result retrieved by this statement is the same as that returned by the previous hard-coded WHERE clause.
③ All customer IDs of the ordered product RGAN0 1 are obtained from the above statements: 100004 and 100005. Next, retrieve the customer information of these customer IDs.
You can also convert the WHERE clause into a subquery, so that you don't have to hard-code these customer IDs.
Analysis: DBMS actually has to execute three SELECT statements to complete the above statements. The innermost subquery returns the ordinal number, which is used in the WHERE clause of the outer subquery. The outer subquery returns the customer ID column, which is used in the WHERE clause of the outermost query. The outermost query returns the final required data.
It can be seen that using subqueries in the WHERE clause can write powerful and flexible SQL statements. There is no limit to the number of subqueries that can be nested, but in practical applications, it is not appropriate to nest too many subqueries due to performance limitations.
Note: A SELECT statement as a subquery can only query one column, and retrieving multiple columns will return an error. Moreover, using subqueries is not always the most efficient way to perform this kind of data retrieval.
Use a subquery as a calculated field
Another way to use subqueries is to create calculated fields.
Example: You need to display the total number of orders for each customer in the Customers table. Orders and corresponding customer ids are stored in the Orders table. To do this, you need the following steps:
① Retrieve the customer list from the customer table.
② For each customer retrieved, count the number of Orders in the Orders table.
Here, we can use the SELECT COUNT(*) introduced earlier to count the rows in the table, and filter a specific customer ID through the WHERE clause to count only the orders of that customer.
Statistics of customer 10000 1 orders are as follows:
To execute COUNT(*) for each customer, you need to use it as a subquery, as shown below:
Analysis: This SELECT statement returns three columns for each customer in the Customers table: cust_name, cust_state and orders. Orders is a calculated field, which is established by the subquery in parentheses. This subquery is executed once for each customer retrieved. In this example, the subquery was executed five times because five customers were retrieved.
The WHERE clause in the subquery is slightly different from the previous WHERE clause because it uses fully qualified column names, not just column names (cust_id). It specifies the table name and column name (Orders.cust_id and Customers.cust_id). The following WHERE clause tells SQL to compare the cust_id in the Orders table with the cust_id currently retrieved from the Customers table:
When column names are easily confused, you must separate table names from column names with periods. In this example, there are two cust_id columns: one in Customers and the other in Orders. If there is no fully qualified name, the DBMS will think that the cust_id itself in the Orders table should be compared. Because:
The above statement always returns the total number of Orders in the orders table, which is not what we want, as shown below:
As can be seen from the above, when constructing a statement, if multiple tables are involved and the same column names are not distinguished, the DBMS will throw an error message.
When manipulating multiple tables in a SELECT statement, it is a good practice to use fully qualified column names to avoid ambiguity.
Finally, the characteristics of subquery are summarized:
① The subquery must be enclosed in parentheses.
② There can only be one column in the SELECT clause of the subquery, unless there are multiple columns in the main query, which are used to compare with the columns selected by the subquery.
③ The subquery cannot use ORDER BY, but the main query can. In subqueries, GROUP BY can play the same role as ORDER BY.
④ A subquery that returns multiple rows of data can only be used for multivalued operators, such as the IN operator.
⑤ The selection list cannot contain any references to values of type BLOB, ARRAY, CLOB or NCLOB.
⑥ Subqueries cannot be directly used for aggregate functions.
⑦ BETWEEN operator cannot be used for subqueries, but it can be used for subqueries.
So much for this introduction. The next article will explain the use of connections and advanced connections.
See you next time!