Simple • Straightforward

So easy a Caveman
Programmer can do it.

›› Let's Get Started

Breaking Up Queries

Is Hard To Do? No!

Have you ever seen one of those really big queries with multiple sub-joins and it seems like every table in the database is part of the calculation? Those are typically bad queries for 2 reasons:

  1. A couple years down the road, no one has a clue what is going on
  2. It tends to run slowly

The best thing that can be done is to break apart the query into multiple sections. Here are some things to look for when breaking down large queries.



Sub Joins

Pull out the sub joins into temp tables that only contain the needed data. An example is below.

Original Query - Trying to pull the previous warehouse size before it expanded to be greater than 10K sq feet
SELECT w.WarehouseName, whs.Size AS PreviousSize, whe.*
JOIN Warehouses w
  ON w.PK_Warehouses = i.FK_Warehouses
JOIN Cities c
  ON c.PK_Cities = w.FK_Cities
JOIN WarehouseSizes ws
  ON ws.PKWarehouseSizes = w.FKPKWarehouseSizes
JOIN WarehouseSizes_History whs
  ON whs.PKWarehouseSizes_History = w.FK_WarehouseSizes
JOIN (SELECT MAX(whsh.LastModifiedDate) AS LastmodifiedDate, whsh.PKWarehouseSizes_History
    FROM WarehouseSizes_History whsh
    JOIN WarehouseSizes whs
      ON whs.PKWarehouseSizes = whsh
    WHERE whs.Size > whsh.Size
    GROUP BY whsh.PKWarehouseSizes_History
    ) mws
  ON mws.PKWarehouseSizes_History = whs.PKWarehouseSizes_History
  AND mws.LastModifiedDate = whs.LastModifiedDate
JOIN WarehouseEmployees whe
  ON whe.PK_WarehouseEmployees = w.FK_WarehouseEmployees
WHERE ws.Size > 10000



Updated Query - Moving the sub join to a temp table
SELECT MAX(whsh.LastModifiedDate) AS LastmodifiedDate, whsh.PKWarehouseSizes_History
INTO #Temp
FROM WarehouseSizes_History whsh
JOIN WarehouseSizes whs
  ON whs.PKWarehouseSizes = whsh
WHERE whs.Size > whsh.Size
AND whs.Size > 10000
GROUP BY whsh.PKWarehouseSizes_History

SELECT w.WarehouseName, whs.Size AS PreviousSize, whe.*
JOIN Warehouses w
  ON w.PK_Warehouses = i.FK_Warehouses
JOIN Cities c
  ON c.PK_Cities = w.FK_Cities
JOIN WarehouseSizes ws
  ON ws.PKWarehouseSizes = w.FKPKWarehouseSizes
JOIN WarehouseSizes_History whs
  ON whs.PKWarehouseSizes_History = w.FK_WarehouseSizes
JOIN #Temp mws
  ON mws.PKWarehouseSizes_History = whs.PKWarehouseSizes_History
  AND mws.LastModifiedDate = whs.LastModifiedDate
JOIN WarehouseEmployees whe
  ON whe.PK_WarehouseEmployees = w.FK_WarehouseEmployees
WHERE ws.Size > 10000



Add Indexes for Joins

Remember that temp tables are really useful for breaking up large queries. Notice how we even reduced the data being returned in the temp table by limiting the sizes there as well. Those kind of data reductions help a lot when joining into the larger queries. Imagine if the history size was 500,000 rows, and by adding that size statement onto the temp table, we reduce it to 10,000.

Feel free to add indexes to the temp table you create as well. In the above example, we could add the following line after creating the temp table:

CREATE INDEX #IDX_PK_Date ON #Temp (PKWarehouseSizes_History, LastmodifiedDate)

Now the joins onto the table Primary Keys will do index lookups instead of table scans, which is something it would have had a hard time doing in the sub join.



Don't be Afraid to Start Over

Sometimes it is just better to start over. If the code you are looking at is fairly old, there is a good chance that it probably needs to be re-done when running slowly. Find out what it is supposed to do and break up the query to run in multiple parts. In our above example, pretend it was joining onto 15-20 other tables. Since we know the types of warehouse sizes we wanted to see, we could get that data first. Afterwards, we could join into other tables to pull additional data or further reduce the data set.



Comments

Don't forget to write comments about the new sections you are writing and what you are calculating. Remember, no one will remember the details of 1 procedure a year from now, especially in environments where there are thousands of lines of code.





Comments

Name:
Title:   
Commenting Currently Disabled


Nick
8/24/2011 11:41:55 PM
Create Clustered Indexes on Temp Tables


Most people do not think to create a clustered index on a temp table but if you've already put in all the information you need in the temp table and plan to return multiple columns from the temp table it may be best to create a clustered index so you do not have to do expensive lookups from a non-clustered index.
Chelsia
8/3/2011 7:23:11 PM



Wow, this is in every repcset what I needed to know.
Arto Ahlstedt
7/13/2011 4:37:45 PM
It Depends


The optimizer sometimes finds the cost of thorough optimizing prohibitive and then the execution plan remains sub-optimal. This is true especially with complex expressions in (nested) subqueries. Also when identical subquery results are used in several subsequent queries, temp table approach may make a huge difference for limited number of rows alone.

7/3/2011 9:45:05 PM



Maybe true for SQL server but generally resorting to temp tables are a result of the optimiser not doing it for you for some reason. If you can get a single query to work with the proper plan it will generally go faster than a manually split one. You can optimise and then pin the plan you want.