The concept of SQL joins is a must-have skill. Every developer should be able to use it in different use-cases with the best performance possible. This tutorial reveal the difference between Inner Join and Outer Join (Left Outer Join and Right Outer Join) in practical situations. Confusion and mixing between the concepts will be cleared up as we go on. Theoretical parts will be minimized as possible. Who loves theoretical parts anyway?!
This is the first article. It talks about general concepts about SQL Joins. In the second article (here), you can find performance comparison (benchmark) between different types of joins.
For better understanding, this article assumes that the reader understands the following:
- Basics of databases [Table Structure, Relationships, Keys, …]
- Simple SQL Select queries.
Why do we use joins?
Databases provide a great way to save data in multiple tables where each table is responsible for saving one aspect(part) of the data. This ensures data consistency and reduces data redundancy as each part of the data is saved only once.
For example, if we have a sample database for employees where we save:
- Employees’ basic details: (name, SSN, Birthdate).
- Departments at which the employee(s) work, where we assume that each emplyee works at maximum of one department.
- Bonuses for the employee.
Of course, a diagram worth a thousand words:
That is a good design as data are only saved once (no redundancy), but what if I want to get the details of an employee and his department?
It seems very easy and straight forward, just get his DepartmentID and search for it in the Departments table and then get the data of the department. Fair enough.
As always, the devil is in the details:
- What if we want all the employees with their corresponding departments?
- What if some of the employees doesn’t have a department (for example, fresh employee)?
- And the problem is even harder for bonuses: If we want to get the bonuses of multiple employees, will we get each id and search for each of them in the table separately?!
- What if the tables are large, no I mean huge? What will be the performance?
Not very easy huh! But Joins save the situation as we will see.
Firstly, SQL introduces multiple types of joins and the name indicates they join multiple tables together. Let’s take them one by one and study when they are used in practical situations and how we write them and Finally we make a benchmark for the performance of SQL Joins (with intro to indexes).
While the syntax for writing queries with JOIN is considered very easy, the practical concepts are usually confusing.
Let’s start with Inner Join and then we can discuss the need for Outer Join for practical use-cases.
It simply combines the rows in two tables based on columns having corresponding values.
Example 1: Get employees with their departments’ details:
SELECT * FROM Employee INNER JOIN Department ON Department.ID =Employee.DepartmentID
|001-01-1933||Bob||(005)-015 4567||1981-08-21||1||1||ITemail@example.com||(005)-555 5555|
|009-02-1422||Dan||(005)-014 9876||1993-03-27||2||2||Salesfirstname.lastname@example.org||(005)-555 5556|
But wait a second: Where is Alice?!
Let’s take a look at the tables:
Alice isn’t connected to any department (doesn’t have DeparmentID), so it has been omitted from the result. Inner join only shows related data in the two tables based on the [on statement] ( Department.ID =Employee.DepartmentID ).
Example 2: To get employees with employees’ bonus(es):
SELECT * FROM Employee INNER JOIN EmployeeBonus on Employee.SSN = EmployeeBonus.EmployeeSSN
|001-01-1933||Bob||(005)-015 4567||1981-08-21||1||001-01-1933||Over time||2017-05-05||100|
|001-01-1933||Bob||(005)-015 4567||1981-08-21||1||001-01-1933||Early delivery||2017-05-07||150|
What if we want to get all the employees whether they have bonuses or not AND get related bonus if there is any. The answer is simply Outer Joins (Left, Right).
There are two types of Outer Joins (Left Outer Join and Right Outer Join). Instead of having the same effect on the results as in Inner Joins, In outer joins, the two tables affects the results differently:
- Main Table from which all the rows are retrieved.
- Related-Only Table from which only rows related to Main Table are retrieved.
In Left Joins: The Main Table is LeftTable.
SELECT …. FROM LeftTable LEFT JOIN RightTable ON …;
Gets all the rows in the LeftTable and if related rows exist in the RigthTable get them also.
In Right Joins: The Main Table here is RightTable.
SELECT …. FROM LeftTable Right JOIN RightTable ON …;
Gets all the rows in the RightTable and if related rows exist in the LeftTable get them also.
Example 1: Get all employees and get related department(s) if exists
Using Left Join:
SELECT Employee.SSN, Employee.Name as 'Employee Name', Department.Name as 'Department Name', Department.Email as 'Department Email' FROM Employee LEFT OUTER JOIN Department ON Department.ID = Employee.DepartmentID
Using Right Join:
We just exchange Employee and Department in the FROM part:
SELECT Employee.SSN, Employee.Name as 'Employee Name', Department.Name as 'Department Name', Department.Email as 'Department Email' FROM Department RIGHT OUTER JOIN Employee ON Department.ID = Employee.DepartmentID
|SSN||Employee Name||Department Name||Department Email|
Example 2: Get all employees with bonus(es) if exist.
Using Left Join:
SELECT Employee.SSN, Employee.Name as 'Employee Name', EmployeeBonus.Reason , EmployeeBonus.BonusDate,EmployeeBonus.Value FROM Employee LEFT OUTER JOIN EmployeeBonus ON Employee.SSN = EmployeeBonus.EmployeeSSN
Using Right Join:
SELECT Employee.SSN, Employee.Name as 'Employee Name', EmployeeBonus.Reason , EmployeeBonus.BonusDate,EmployeeBonus.Value FROM EmployeeBonus RIGHT OUTER JOIN Employee ON Employee.SSN = EmployeeBonus.EmployeeSSN
Example 3: Get all employees with all details (bonus(es) and department) if there exists.
SELECT Employee.SSN, Employee.Name as 'Employee Name', Department.Name as 'Department Name', Department.Email as 'Department Email', EmployeeBonus.Reason , EmployeeBonus.BonusDate,EmployeeBonus.Value FROM Employee LEFT OUTER JOIN EmployeeBonus ON Employee.SSN = EmployeeBonus.EmployeeSSN LEFT OUTER JOIN Department ON Department.ID = Employee.DepartmentID
|SSN||Employee Name||Department Name||Department Email||Reason||BonusDate||Value|
Other Use Cases in real life:
- if we want to make a report to get all the details for all the employees and related data [Outer Join].
- If we want to get the details of an Bob and his bonuses(if exists) [Outer Join].
- If we want to get the details of a non-empty department with the details of the employees who work in it (Department inner join Employee) [Why?]
Joins are very important for extracting related details from different table in any database systems.
Inner Join: Gets ONLY the common rows between two tables.
Outer Joins (Left/ Right): Get ALL the rows from one table [Main Table] and related rows from the other table [Related-Only Table].
You can find the used database here, so that you can try all the code yourself.
In the second article (here), you can find performance comparison (benchmark) between different types of joins with tips and tricks on Indexes.