By: | Updated: 2019-05-15 | Comments | Related: 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | More > Temp Tables Show
ProblemA SQL Server T-SQL correlated subquery is a special kind of temporary data store in which the result set for an inner query depends on the current row of its outer query. In contrast, an SQL Server T-SQL uncorrelated subquery has the same result set no matter what row is current in its outer query. This section reviews a couple of correlated subquery examples and compares them to alternative formulations based on joins for derived tables. The comparisons rely on an examination of the result sets as well as the execution plans for the alternative formulations. These comparisons shed light on the efficiency of uncorrelated subqueries versus joins between derived tables. The next temporary data store tutorial section will focus more thoroughly on derived tables. SolutionFor the examples below we are using the AdventureWorks2014 database. Download a copy and restore to your instance of SQL Server to test the below scripts. SQL Server Correlated Subquery as a SELECT List ItemThe introduction to subqueries section included coverage of how to use uncorrelated and correlated subqueries as select list items. This section revisits that earlier application of uncorrelated and correlated subqueries from three perspectives.
Here's a previously presented code block that demonstrates the use of an uncorrelated subquery and a correlated subquery as items in the select list of a query.
-- select list items with uncorrelated and correlated subqueries SELECT outer_cat.ProductCategoryID cat_id, outer_cat.Name outer_cat_name, subcat.Name subcat_name, ( SELECT COUNT(ProductSubcategoryID) subcat_id_count FROM [AdventureWorks2014].[Production].[ProductCategory] outer_cat INNER JOIN [AdventureWorks2014].[Production].[ProductSubcategory] subcat ON outer_cat.ProductCategoryID = subcat.ProductCategoryID ) total_sub_cat_count, ( SELECT COUNT(ProductSubcategoryID) subcat_id_count FROM [AdventureWorks2014].[Production].[ProductCategory] cat INNER JOIN [AdventureWorks2014].[Production].[ProductSubcategory] subcat ON cat.ProductCategoryID = subcat.ProductCategoryID GROUP BY cat.name HAVING cat.name = outer_cat.Name ) sub_cat_count FROM [AdventureWorks2014].[Production].[ProductCategory] outer_cat INNER JOIN [AdventureWorks2014].[Production].[ProductSubcategory] subcat ON outer_cat.ProductCategoryID = subcat.ProductCategoryID It is often possible to reformulate T-SQL queries based on subqueries with joins and WHERE clause criteria. This redesign of a query can eliminate the need for correlated subqueries. Some believe that this reformulation can improve the performance of a query. A good presentation of the issues for and against replacing correlated subqueries with joins appears in this blog (be sure to read comments too). To help evaluate the impact of replacing subqueries with joins, the preceding script sample is re-designed to use joins instead of subqueries. See the next code block for a join-based implementation of the preceding query.
-- with joins and derived tables SELECT cat.ProductCategoryID cat_id, cat.Name cat_name, subcat.Name subcat_name, subcat_id_count total_sub_cat_count, cat_id_count_by_cat_id.cat_id_count_by_cat_id sub_cat_count FROM [AdventureWorks2014].[Production].[ProductCategory] cat INNER JOIN [AdventureWorks2014].[Production].[ProductSubcategory] subcat ON cat.ProductCategoryID = subcat.ProductCategoryID CROSS JOIN ( -- count of subcategories across categories SELECT COUNT(ProductSubcategoryID) subcat_id_count FROM [AdventureWorks2014].[Production].[ProductSubcategory] subcat ) subcat_id_count LEFT JOIN ( -- count of subcategories within categories SELECT ProductCategoryID, COUNT(*) cat_id_count_by_cat_id FROM [AdventureWorks2014].[Production].[ProductSubcategory] subcat GROUP BY ProductCategoryID ) cat_id_count_by_cat_id ON cat.ProductCategoryID = cat_id_count_by_cat_id.ProductCategoryID Here are excerpts from the result set with uncorrelated subqueries and correlated subqueries versus the result set for the join-based approach with derived tables.
The intersect operator is a useful tool for comparing two result sets no matter what the order of rows is in each one. An intersect operator between the result sets for the two approaches returns thirty-seven rows. In other words, the thirty-seven-row result set from each query is identical. The following code block shows how to use the intersect operator to make the comparison. -- all 37 rows from each query intersect with one another -- select list items with uncorrelated and correlated subqueries SELECT outer_cat.ProductCategoryID cat_id, outer_cat.Name outer_cat_name, subcat.Name subcat_name, ( SELECT COUNT(ProductSubcategoryID) subcat_id_count FROM [AdventureWorks2014].[Production].[ProductCategory] outer_cat INNER JOIN [AdventureWorks2014].[Production].[ProductSubcategory] subcat ON outer_cat.ProductCategoryID = subcat.ProductCategoryID ) total_sub_cat_count, ( SELECT COUNT(ProductSubcategoryID) subcat_id_count FROM [AdventureWorks2014].[Production].[ProductCategory] cat INNER JOIN [AdventureWorks2014].[Production].[ProductSubcategory] subcat ON cat.ProductCategoryID = subcat.ProductCategoryID GROUP BY cat.name HAVING cat.name = outer_cat.Name ) sub_cat_count FROM [AdventureWorks2014].[Production].[ProductCategory] outer_cat INNER JOIN [AdventureWorks2014].[Production].[ProductSubcategory] subcat ON outer_cat.ProductCategoryID = subcat.ProductCategoryID INTERSECT -- with joins and derived tables SELECT cat.ProductCategoryID cat_id, cat.Name cat_name, subcat.Name subcat_name, subcat_id_count total_sub_cat_count, cat_id_count_by_cat_id.cat_id_count_by_cat_id sub_cat_count FROM [AdventureWorks2014].[Production].[ProductCategory] cat INNER JOIN [AdventureWorks2014].[Production].[ProductSubcategory] subcat ON cat.ProductCategoryID = subcat.ProductCategoryID CROSS JOIN ( -- count of subcategories SELECT COUNT(ProductSubcategoryID) subcat_id_count FROM [AdventureWorks2014].[Production].[ProductSubcategory] subcat ) subcat_id_count LEFT JOIN ( -- count of subcategories within categories SELECT ProductCategoryID, COUNT(*) cat_id_count_by_cat_id FROM [AdventureWorks2014].[Production].[ProductSubcategory] subcat GROUP BY ProductCategoryID ) cat_id_count_by_cat_id ON cat.ProductCategoryID = cat_id_count_by_cat_id.ProductCategoryID The next screen shot displays the thirty-seven rows returned by the intersect operator between the result sets for the two approaches.
The next screen shot shows the execution plan comparison of the subquery-based and join-based queries.
SQL Server Correlated and Uncorrelated Subqueries in WHERE ClausesWhile correlated and uncorrelated subqueries can be used for SELECT list items, it is probably more common to encounter them in WHERE clauses for SELECT statements. The code blocks in this tutorial section demonstrate the application of correlated and uncorrelated subqueries in this more usual context. The code samples pull and manipulate data from the AdventureWorks2014 database. This example also requires some pre-processing steps like those needed for any highly normalized production database. The ultimate goal of the code samples in this section is to list the employees who belong to departments with more than the average number of employees across departments. This is a typical kind of requirement in which you are likely to use correlated or uncorrelated subqueries. The development of the employee list involves
The code in this tutorial section illustrates subquery-based and join-based approaches to the linking and filtering steps. The AdventureWorrks2014 tables are representative of those from a production database. The normalized status of many production databases often requires pre-processing before any task can be performed, such as listing the employees who belong to departments with more than the average number of employees across departments. The following database diagram presents the four AdventureWorks2014 tables used to list the employees who belong to departments with more than the average number of employees across departments.
The following code sample shows a relatively simple query for listing employees from the inner join of the Employee table and the EmployeeDepartmentHistory table. The query returns a single row for each employee, represented by NationalIDNumber, that occurs more than once in the EmployeeDepartmentHistory table. The column values for each row include the NationalIDNumber field value and a count field with an em_count alias. -- employees with historical membership in more than one department SELECT NationalIDNumber, COUNT(edh.BusinessEntityID) em_count FROM [AdventureWorks2014].[HumanResources].[Employee] em INNER JOIN [AdventureWorks2014].[HumanResources].EmployeeDepartmentHistory edh ON edh.BusinessEntityID = em.BusinessEntityID GROUP BY NationalIDNumber HAVING COUNT(edh.BusinessEntityID) > 1 The result set from the preceding query appears below.
The next code sample shows an approach to enumerating employees that had more than one row in the EmployeeDepartmentHistory table within the for_mrd derived table. Each employee in the outermost query's result set has a row with column values for BusinessEntityID and StartDate from the EmployeeDepartmentHistory table, and the most recent date (mrd) for an employee in the EmployeeDepartmentHistory table. The last_value windows function computes the mrd value partitioned by BusinessEntityID. Because a window function cannot be referenced in a WHERE clause, the query with the windows function is nested within an outermost query that receives the value of the windows function along with BusinessEntityID and StartDate value for each row. A WHERE clause in the outermost query selects the row for each employee with a StartDate value equal to the mrd value. -- most recently worked department for -- employees who worked in more than one department -- outer query is required because you cannot use -- windows function (last_value) in a where clause, -- but you can assign a windows function an alias (mrd) whose -- value is referenced in an outer query SELECT BusinessEntityID, StartDate, mrd FROM ( SELECT edh.BusinessEntityID, edh.StartDate, edh.DepartmentID, LAST_VALUE(startdate) OVER (PARTITION BY edh.BusinessEntityID ORDER BY edh.BusinessEntityID) mrd FROM [AdventureWorks2014].[HumanResources].EmployeeDepartmentHistory edh WHERE edh.BusinessEntityID IN ( -- employees with historical membership in more than one department SELECT edh.BusinessEntityID FROM [AdventureWorks2014].[HumanResources].[Employee] em INNER JOIN [AdventureWorks2014].[HumanResources].EmployeeDepartmentHistory edh ON edh.BusinessEntityID = em.BusinessEntityID GROUP BY edh.BusinessEntityID HAVING COUNT(edh.BusinessEntityID) >1 ) ) for_mrd WHERE mrd = StartDate The final pre-processing script builds on the logic from the two preceding queries and the relationships between the AdventureWorks2014 source tables. This query populates the #em_with_dep temp table. The temp table serves as the source table for facilitating the computation of the list the employees who belong to departments with more than the average number of employees across departments. The #em_with_dep temp table has a separate row for each of the 290 employees in the AdventureWorks2014 database. The temp table has more columns than is strictly necessary for deriving a list of employees who come from a department with more than the average number of employees across departments. The code block below is relatively straightforward in spite of it not being short. The following bullets highlight three major design issues to help readers adapt this kind of code for their purposes.
Here is an excerpt for the list generated by the final select statement in the preceding script. It shows the first 16 rows from the result set from the #em_with_dep temp table. There are 290 rows in total within the table. The remainder of this section shows how to process with either subquery-based techniques or join-based techniques the #em_with_dep temp table to generate a list of the employees belonging to departments with more than the average number of employees across departments. Here are three basic queries that support computing a list of employees in departments with more than the average number of employees across departments. These queries will subsequently be combined by subquery-based techniques and join-based techniques to achieve the goal of this example. Before illustrating how to combine the queries, let's examine each query separately.
-- Three basic queries for listing employees that belong -- to departments with more than the average number of -- employees across departments -- employees count by department name SELECT dep_name, COUNT(dep_name) em_count FROM #em_with_dep GROUP BY dep_name -- employees with dep_name and em_count_by_department SELECT FirstName, LastName, Gender, JobTitle, #em_with_dep.dep_name, em_count_by_dep_name.em_count FROM #em_with_dep LEFT JOIN ( SELECT dep_name, COUNT(dep_name) em_count FROM #em_with_dep GROUP BY dep_name ) em_count_by_dep_name ON #em_with_dep.dep_name = em_count_by_dep_name.dep_name -- avg employees per department SELECT AVG(CAST(em_count AS real)) avg_dep_count FROM ( SELECT COUNT(dep_name) em_count FROM #em_with_dep GROUP BY dep_name ) avg_count_by_dep_name If you feel the need, you can learn more about the three preceding queries with the following full and partial result set listings.
The next script combines the preceding three basic scripts with a correlated subquery and an uncorrelated subquery to compute a list of employees from departments with greater than the average number of employees per department.
-- employees from a department with a count -- greater than the average count per department -- employees with dep_name and em_count_by_department -- based on a correlated subquery SELECT FirstName, LastName, Gender, JobTitle, #em_with_dep.dep_name, em_count_by_dep_name.em_count FROM #em_with_dep LEFT JOIN ( SELECT dep_name, COUNT(dep_name) em_count FROM #em_with_dep GROUP BY dep_name ) em_count_by_dep_name ON #em_with_dep.dep_name = em_count_by_dep_name.dep_name WHERE em_count_by_dep_name.em_count > ( -- avg employees per department SELECT AVG(CAST(em_count AS real)) avg_dep_count FROM ( SELECT COUNT(dep_name) em_count FROM #em_with_dep GROUP BY dep_name ) avg_count_by_dep_name ) Here's an excerpt from the result set for the preceding query. It shows the first ten rows from the result set.
Some application developers may find developing solutions with subqueries as a natural process. For example, the preceding query creates a correlated subquery and then left joins it to the outermost query. The left join result set is for all employees. Then, a where clause draws on an uncorrelated subquery with a constraint for only rows for employees with a departmental employee count greater than the average count of employees across departments. Other application developers may just care to form a joined result set with both the count of the number of employees for the department in which an employee works and the average number of employees across all departments. Then, a filter can be applied to the joined result set that retains only rows where the departmental count for employees is greater than the average count of employees across departments. The following script shows how to develop the application this way.
-- employees from a department with a count -- greater than the average department count -- based on a left join and a cross join SELECT FirstName, LastName, Gender, JobTitle, #em_with_dep.dep_name, em_count_by_dep_name.em_count --, avg_count_across_dep.avg_dep_count FROM #em_with_dep LEFT JOIN ( SELECT dep_name, COUNT(dep_name) em_count FROM #em_with_dep GROUP BY dep_name ) em_count_by_dep_name ON #em_with_dep.dep_name = em_count_by_dep_name.dep_name CROSS JOIN ( -- avg employees per department SELECT AVG(CAST(em_count AS real)) avg_dep_count FROM ( SELECT COUNT(dep_name) em_count FROM #em_with_dep GROUP BY dep_name ) avg_count_by_dep_name ) avg_count_across_dep The final query in this tutorial section (see the next script) illustrates the use of the intersect operator to confirm that the join-based solution with a derived table returns the same result set as a subquery-based solution framework. The query returned 179 rows. Recall that 179 is the number of employees in the Production department. Therefore, the join-based and subquery-based approaches return the same set of 179 employees. -- verify identical outcomes -- employees from a department with a count -- greater than the average count per department -- employees with dep_name and em_count_by_department -- based on a correlated subquery SELECT FirstName, LastName, Gender, JobTitle, #em_with_dep.dep_name, em_count_by_dep_name.em_count FROM #em_with_dep LEFT JOIN ( SELECT dep_name, COUNT(dep_name) em_count FROM #em_with_dep GROUP BY dep_name ) em_count_by_dep_name ON #em_with_dep.dep_name = em_count_by_dep_name.dep_name WHERE em_count_by_dep_name.em_count > ( -- avg employees per department SELECT AVG(CAST(em_count AS real)) FROM ( SELECT COUNT(dep_name) em_count FROM #em_with_dep GROUP BY dep_name ) avg_count_by_dep_name ) INTERSECT -- employees from a department with a count -- greater than the average department count -- based on a left join and a cross join SELECT FirstName, LastName, Gender, JobTitle, #em_with_dep.dep_name, em_count_by_dep_name.em_count --, avg_count_across_dep.avg_dep_count FROM #em_with_dep LEFT JOIN ( SELECT dep_name, COUNT(dep_name) em_count FROM #em_with_dep GROUP BY dep_name ) em_count_by_dep_name ON #em_with_dep.dep_name = em_count_by_dep_name.dep_name CROSS JOIN ( -- avg employees per department SELECT AVG(CAST(em_count AS real)) avg_dep_count FROM ( SELECT COUNT(dep_name) em_count FROM #em_with_dep GROUP BY dep_name ) avg_count_by_dep_name ) avg_count_across_dep WHERE em_count > avg_dep_count Is one solution better in the sense that it requires fewer computing resources even though both solutions return the same employees? Also, while the T-SQL code is different between the two approaches, is the SQL Server execution plan the same or different between the two approaches? We can answer both these questions by comparing the execution plans from both solutions. The following screen shot shows the SQL Server execution plan for the subquery-based approach above and the join-based approach below. There are two main take-aways.
For the SELECT item and WHERE clause examples covered in this tutorial section, there is no difference in resources used between subquery-based solutions and join-based solutions. Furthermore, for the WHERE clause examples, there was not even a difference in the execution plan. Therefore, at least for examples like those covered in this tutorial section, you can freely use either solution strategy. It remains for others to prove with different examples when subquery-based approaches are slower or even materially different in their execution plan than join-based approaches. Next StepsHere are some links to resources that you may find useful to help you grow your understanding of content from this section of the tutorial.
Related ArticlesPopular ArticlesAbout the authorRick Dobson is a Microsoft Certified Technical Specialist and well accomplished SQL Server and Access author. View all my tips Article Last Updated: 2019-05-15 What is the difference between correlated and nonA correlated subquery is an inner subquery which is referenced by the main outer query such that the inner query is considered as being executed repeatedly. A noncorrelated subquery is subquery that is independent of the outer query and it can executed on its own without relying on main outer query.
What do you mean by correlated subquery?A correlated subquery is a SQL query that depends on values executed by an outer query in order to complete. Because a correlated subquery requires the outer query to be executed first, the correlated subquery must run once for every row in the outer query.
Which is better performance correlated subquery or normal subquery?Speed and Performance
A correlated subquery is much slower than a non-correlated subquery because in the former, the inner query executes for each row of the outer query. This means if your table has n rows then whole processing will take the n * n = n^2 time, as compared to 2n times taken by a non-correlated subquery.
Is in not in correlated subquery?Answer: In non-correlated subqueries the subquery should be executed before the outer query and the non-correlated subquery is executed only once. The data from the outer query and the subquery are independent and one execution of the subquery will work for all the rows from the outer query.
|