Saturday, May 1, 2010

SQL Joins Explained – Inner Joins and Outer Joins using SQL Server 2008



SQL Joins Explained – Inner Joins and Outer Joins using SQL Server 2008


Why do I need SQL Joins again?


Relational databases like SQL Server focus on the concept of “Normalization” which reduces data redundancy.  What this really means is that each subject or group of data in a sql database should really be stored in only one table.  In other words if you have customer data, you need to keep that data in CUSTOMERS table whereas if you have order detail information, this needs to be stored in an ORDERS table. Following this routine eliminates data redundancy in sql tables and makes relational databases like SQL Server efficient in processing data. At the same time it creates a challenge in pulling data together from different tables into a unified sql view. This is where SQL Joins come into play. The Joins facilitate in bringing data from various SQL tables by using T-SQL (Transact SQL) queries. In our case of CUSTOMERS and ORDERS example, a Join in SQL2008 could combine information together from both these tables in our database. We will show you how to do just that here shortly using SQL Server 2008.

SQL Joins are able to pull information by using Primary Key (PK) and Foreign Key (FK) relationships.  Primary Key is a column in a SQL table that uniquely identifies all the rows. In order to understand these concepts, we are going to use Northwind sample database from Microsoft SQL Server. We will be using a few tables from this database to walk you through the Joins in SQL.  More information on this database can be found online at this location, Northwind database

Coming back to SQL joins, in order to understand Primary and Foreign keys, let us take a look at this figure:



In CUSTOMERS, the Primary (aka Parent) table, a column CustomerID with unique values can be used to find a specific customer. This column or field serves as the Primary Key for CUSTOMERS SQL table and is highlighted in blue. Similarly in the Secondary table, ORDERS, we also have a Primary Key column OrderID which is highlighted in blue. This column is the unique identifier for records in the ORDERS table. We have the data and inner join from CUSTOMERS SQL table and ORDERS SQL table in this Excel Spreadsheet


One customer in this SQL Server database can have many orders, as such the database relationship between CUSTOMERS and ORDERS table is one to many. In order to relate CUSTOMERS table to the ORDERS table using Joins in SQL 2008 Server, you would need to add another column CustomerID in the ORDERS table. This column then becomes the Foreign Key for ORDERS table and is highlighted in red. If a customer has placed any orders, this CustomerID column in ORDERS table will contain the same value as CustomerID from CUSTOMERS table. This mechanism is exactly how SQL joins help in bringing related information together from many SQL tables in a unified view.

Types of SQL Joins:


There are two types of SQL Joins, SQL INNER Join and SQL OUTER Join.  A SQL OUTER Join can be subdivided into a Left OUTER Join, Right OUTER Join and a Full OUTER join.

SQL INNER Join:


An SQL INNER join is used to pull matching data from two tables. This is the type of join that is commonly used in databases like SQL Server. In order to understand inner joins in SQL, we are going to use Venn Diagrams which are helpful in understanding the concept.  Take a look at the following figure:




Here we are representing a table by a circle. On the left side we have the CUSTOMERS circle and on the right side we have the ORDERS circle.  The portion where the two circles intersect is common to both and represents Customers that have placed an Order.  As such this common region represents an INNER join in SQL.

We are going to show you the syntax for an INNER join query next:

SELECT columns
FROM table_1 JOIN table_2
ON table_1. primarykey = table_2.foreignkey

Notice the words in bold are SQL keywords and we use the ON clause to match rows from primary and secondary tables. In our database the query for inner join will look like this:   
  
SELECT
CUSTOMERS.CUSTOMERID,
CUSTOMERS.CompanyName,
Orders.CustomerID,
Orders.OrderDate
FROM CUSTOMERS
INNER JOIN ORDERS
ON CUSTOMERS.CUSTOMERID=ORDERS.CUSTOMERID
ORDER BY CUSTOMERS.CUSTOMERID


We have included an output of this data in the following screen capture.





Notice that the two matching columns are highlighted in blue. We have the complete data from Customers SQL table, Orders SQL table and Inner Join in this Excel Spreadsheet

SQL OUTER Join:


A SQL OUTER join is used to bring matching and non matching data from two tables.  There are two types of outer joins in SQL, Left Outer Join and Right Outer Join. 

Left OUTER Join:   


In order to understand Left Outer Joins, let us study this possible scenario:

-What if we wanted to find out CUSTOMERS in our database that has not placed any ORDERS so far?  In other words who is not buying our great products yet?

We could use a Left OUTER join to get this information.  In a Left OUTER join all the data from the main table, CUSTOMERS in our case and any matching data from the ORDERS is returned.  We can further explain a Left OUTER join with the following Venn diagram.



As you can see from the blue line with arrows, a Left OUTER join will not only include the intersection of the two circles (matching data) but that also the portion from Customers circle that is not matching with any Orders.

Here’s the join query for Left Outer Join in SQL Server database.

SELECT
CUSTOMERS.CUSTOMERID,
CUSTOMERS.CompanyName,
Orders.CustomerID,
Orders.OrderDate
FROM CUSTOMERS
LEFT OUTER JOIN ORDERS
ON CUSTOMERS.CUSTOMERID=ORDERS.CUSTOMERID
WHERE Orders.CustomerID IS NULL

The above sql query will return 3 customers that do not have any orders yet. The screen shot from SQL Server Management is shown below




 
One thing we wanted to point out is the last optional line (using IS NULL). This condition forces SQL Server to return Customer rows with no Orders. We have the complete data from Customers SQL table, Orders SQL table and Left Outer Join in this Excel Spreadsheet

Right OUTER Join:

Next we are going to look at Right Outer Join which is the logical opposite of a Left Outer Join. Here’s the scenario:

-What if we were trying to figure out which CATEGORIES in our SQL database have been added recently that do not have any corresponding PRODUCTS yet. How can we solve this problem using a Right Outer Join?

Let’s look at the Venn diagram shown below:



Here you will see that the Right Outer Join is highlighted in dark orange pointed by the blue line. This area includes all the records from CATEGORIES table and matching rows from PRODUCTS table.  If we were to write the transact SQL for this Right Outer Join, it would be as follows:

SELECT 
PRODUCTS.PRODUCTID,
PRODUCTS.PRODUCTNAME,
CATEGORIES.CATEGORYID,
CATEGORIES.CATEGORYNAME
FROM PRODUCTS
RIGHT OUTER JOIN
CATEGORIES
ON PRODUCTS.CATEGORYID = CATEGORIES.CATEGORYID
WHERE PRODUCTS.CATEGORYID IS NULL

The above sql query will return 2 categories that do not have any products yet. Once again the optional IS NULL condition on the last line limits only Categories without any products. If you omit the WHERE clause, you will get a true right join in line with the Venn diagram. A screen capture of this result is as follows.


sql right outer join


We have the complete data from CATEGORIES SQL table, PRODUCTS SQL table, Left Outer Join and Right Outer Join in this Excel Spreadsheet.


Related Links on Inner and Outer Joins


-SQL JOIN

-Examples of Inner and Outer Joins
-Free SQL Server Tutorials

-Inner joins in Oracle



TAGS: SQL, SQL JOIN, INNER JOIN, OUTER JOIN, SQL Server, Primary Key, Foreign Key




Kash


Learning SQL server 2008.com

No comments: