Inner Join Review
The most commonly used join is an INNER JOIN. This type of join combines rows from two tables only when they match on the joining condition. Usually the primary key from one table matches a foreign key on another table, but join conditions can be more complex than that.(Note: Most of the information in this article can be applied to views as well as tables. For simplicity, the word "table" will be used to mean table or view unless stated otherwise. Keys are not defined on views, but the underlying table’s key column or columns are often included in the view. To keep things simple, let's assume that is the case.)
INNER JOIN will retrieve a results row only where there is a perfect match between the two tables in the join condition. You will also often see one row from one of the tables matching multiple rows in the other table. For example, one customer can have many orders. One order can have many order details. The data on the one side will be repeated for each row on the many side. The following query is an example showing how the information from the Sales.SalesOrderHeader is repeated on each matching row:
SELECT s.SalesOrderID,
OrderDate,ProductID FROM Sales.SalesOrderHeader AS
s
INNER JOIN Sales.SalesOrderDetail
AS d ON s.SalesOrderID = d.SalesOrderID
ORDER BY s.SalesOrderID, ProductID
Outer Join Introduction
OUTER JOIN is used to join two tables even if there is not a match.
An OUTER JOIN can be used to return a list of all the customers and the
orders even if no orders have been placed for some of the customers. A
keyword, RIGHT or LEFT, is used to specify which side of the join
returns all possible rows. I like using LEFT because it makes sense to
me to list the most important table first. Except for one example
demonstrating RIGHT OUTER JOIN, this article will use left joins. Just
a note: the keywords INNER and OUTER are optional. The next example returns a list of all the customers and the SalesOrderID for the orders that have been placed, if any.
SELECT c.CustomerID, s.SalesOrderID
FROM Sales.Customer c
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID
It uses the LEFT keyword because the Sales.Customer table is located on
the left side and we want all rows returned from that table even if
there is no match in the Sales.SalesOrderHeader table. This is an
important point. Notice also that the CustomerID column is the primary
key of the Sales.Customer table and a foreign key in the
Sales.SalesOrderHeader table. This means that there must be a valid
customer for every order placed. Writing a query that returns all
orders and the customers if they match doesn’t make sense. The LEFT
table should always be the primary key table when performing a LEFT
OUTER JOIN.
If the location of the tables in the query are switched, the RIGHT keyword is used and the same results are returned:
SELECT c.CustomerID, s.SalesOrderID FROM Sales.SalesOrderHeader
s
RIGHT OUTER JOIN Sales.Customer c ON c.CustomerID = s.CustomerID
Notice that I didn’t change the join condition at all. It doesn’t
matter which side of the equal sign the columns are listed; only where
the tables are named is it important.
If I have a LEFT OUTER JOIN, what is returned from the table on the
right side of the join where there is not a match? Each column from the
right side will return a NULL. Try this query which lists the
non-matching rows first:
SELECT c.CustomerID, s.SalesOrderID FROM Sales.Customer cLEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerIDORDER BY s.SalesOrderID
By adding a WHERE clause to check for a NULL SalesOrderID, you can
find all the customers who have not placed an order. My copy of
AdventureWorks returns 66 customers with no orders:
Occasionally, you will need to be more specific. How can you find all the customers who have not placed an order in 2002? There are several ways to solve this problem. You could create a view of all the orders placed in 2002 and join the view on the Sales.Customer table. Another option is to create a CTE, or Common Table Expression, of the orders placed in 2002. This example shows how to use a CTE to get the required results:SELECT c.CustomerID, s.SalesOrderID FROM Sales.Customer cLEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerIDWHERE s.SalesOrderID IS NULL
WITH s AS
( SELECT SalesOrderID, customerID
FROM Sales.SalesOrderHeader
WHERE OrderDate between '1/1/2002' and '12/31/2002'
)
SELECT c.CustomerID, s.SalesOrderID FROM Sales.Customer c
LEFT OUTER JOIN s ON c.customerID = s.customerID WHERE s.SalesOrderID IS NULL
My favorite technique to solve this problem is much simpler.
Additional criteria, in this case filtering on the OrderDate, can be
added to the join condition. The query joins all customers to the
orders placed in 2002. Then the results are restricted to those where
there is no match. This query will return exactly the same results as
the previous, more complicated query:
SELECT c.CustomerID, s.SalesOrderID FROM Sales.Customer c LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID and s.OrderDate between '1/1/2002' and '12/31/2002' WHERE s.SalesOrderID IS NULL
Using Aggregates with Outer Joins
Aggregate queries introduce another pitfall watch out for. The following example is an attempt to list all the customers and the count of the orders that have been placed. Can you spot the problem?SELECT c.CustomerID, count(*) OrderCountFROM Sales.Customer c LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID
GROUP BY c.CustomerIDORDER BY OrderCountNow the customers with no orders look like they have placed one order. That is because this query is counting the rows returned. To solve this problem, count the SalesOrderID column. NULL values are eliminated from the count.
SELECT c.CustomerID, count(SalesOrderID) OrderCountFROM Sales.Customer c
LEFT OUTER JOIN Sales.SalesOrderHeader sON c.CustomerID = s.CustomerID
GROUP BY c.CustomerIDORDER BY OrderCount
Multiple Joins
Once more than two tables are involved in the query, things get a bit more complicated. When a table is joined to the RIGHT table, a LEFT OUTER JOIN must be used. That is because the NULL rows from the RIGHT table will not match any rows on the new table. An INNER JOIN causes the non-matching rows to be eliminated from the results. If the Sales.SalesOrderDetail table is joined to the Sales.SalesOrderHeader table and an INNER JOIN is used, none of the customers without orders will show up. NULL cannot be joined to any value, not even NULL.To illustrate this point, when I add the Sales.SalesOrderDetail table to one of the previous queries that checked for customers without orders, I get back no rows at all.
SELECT c.CustomerID, s.SalesOrderID, d.SalesOrderDetailID FROM Sales.Customer c LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID INNER JOIN Sales.SalesOrderDetail d ON s.SalesOrderID = d.SalesOrderID
WHERE s.SalesOrderID IS NULLTo get correct results, change the INNER JOIN to a LEFT JOIN.
SELECT c.CustomerID, s.SalesOrderID, d.SalesOrderDetailID FROM Sales.Customer c LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID LEFT OUTER JOIN Sales.SalesOrderDetail d ON s.SalesOrderID = d.SalesOrderID
WHERE s.SalesOrderID IS NULL
What about additional tables joined to Sales.Customer, the table on the left? Must outer joins be used? If it is possible that there are some rows without matches, it must be an outer join to guarantee that no results are lost. The Sales.Customer table has a foreign key pointing to the Sales.SalesTerritory table. Every customer’s territory ID must match a valid value in Sales.SalesTerritory. This query returns 66 rows as expected because it is impossible to eliminate any customers by joining to Sales.SalesTerritory:
SELECT c.CustomerID, s.SalesOrderID, t.Name FROM Sales.Customer c LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID INNER JOIN Sales.SalesTerritory t ON c.TerritoryID = t.TerritoryID
WHERE SalesOrderID IS NULLSales.SalesTerritory is the primary key table; every customer must match a valid territory. If you wanted to write a query that listed all territories, even those that had no customers, an outer join will be used. This time, Sales.Customers is on the right side of the join.
SELECT t.Name, CustomerID FROM Sales.SalesTerritory t LEFT OUTER JOIN Sales.Customer c ON t.TerritoryID = c.TerritoryID
Conclusion
Queries with outer joins can be tricky to write. Extra time and care must be spent making sure the results are correct. Think about the relationship between the tables and make sure that the outer join is continued down the path. This article covered almost everything you need to know about outer joins.
Responses
0 Respones to "Writing Joins in T-SQL"
Post a Comment