6.8: Inner and Outer Joins
- Page ID
- 23927
\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)
\( \newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\)
( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\)
\( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)
\( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\)
\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)
\( \newcommand{\Span}{\mathrm{span}}\)
\( \newcommand{\id}{\mathrm{id}}\)
\( \newcommand{\Span}{\mathrm{span}}\)
\( \newcommand{\kernel}{\mathrm{null}\,}\)
\( \newcommand{\range}{\mathrm{range}\,}\)
\( \newcommand{\RealPart}{\mathrm{Re}}\)
\( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)
\( \newcommand{\Argument}{\mathrm{Arg}}\)
\( \newcommand{\norm}[1]{\| #1 \|}\)
\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)
\( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\AA}{\unicode[.8,0]{x212B}}\)
\( \newcommand{\vectorA}[1]{\vec{#1}} % arrow\)
\( \newcommand{\vectorAt}[1]{\vec{\text{#1}}} % arrow\)
\( \newcommand{\vectorB}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vectorC}[1]{\textbf{#1}} \)
\( \newcommand{\vectorD}[1]{\overrightarrow{#1}} \)
\( \newcommand{\vectorDt}[1]{\overrightarrow{\text{#1}}} \)
\( \newcommand{\vectE}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash{\mathbf {#1}}}} \)
\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)
\(\newcommand{\avec}{\mathbf a}\) \(\newcommand{\bvec}{\mathbf b}\) \(\newcommand{\cvec}{\mathbf c}\) \(\newcommand{\dvec}{\mathbf d}\) \(\newcommand{\dtil}{\widetilde{\mathbf d}}\) \(\newcommand{\evec}{\mathbf e}\) \(\newcommand{\fvec}{\mathbf f}\) \(\newcommand{\nvec}{\mathbf n}\) \(\newcommand{\pvec}{\mathbf p}\) \(\newcommand{\qvec}{\mathbf q}\) \(\newcommand{\svec}{\mathbf s}\) \(\newcommand{\tvec}{\mathbf t}\) \(\newcommand{\uvec}{\mathbf u}\) \(\newcommand{\vvec}{\mathbf v}\) \(\newcommand{\wvec}{\mathbf w}\) \(\newcommand{\xvec}{\mathbf x}\) \(\newcommand{\yvec}{\mathbf y}\) \(\newcommand{\zvec}{\mathbf z}\) \(\newcommand{\rvec}{\mathbf r}\) \(\newcommand{\mvec}{\mathbf m}\) \(\newcommand{\zerovec}{\mathbf 0}\) \(\newcommand{\onevec}{\mathbf 1}\) \(\newcommand{\real}{\mathbb R}\) \(\newcommand{\twovec}[2]{\left[\begin{array}{r}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\ctwovec}[2]{\left[\begin{array}{c}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\threevec}[3]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\cthreevec}[3]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\fourvec}[4]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\cfourvec}[4]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\fivevec}[5]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\cfivevec}[5]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\mattwo}[4]{\left[\begin{array}{rr}#1 \amp #2 \\ #3 \amp #4 \\ \end{array}\right]}\) \(\newcommand{\laspan}[1]{\text{Span}\{#1\}}\) \(\newcommand{\bcal}{\cal B}\) \(\newcommand{\ccal}{\cal C}\) \(\newcommand{\scal}{\cal S}\) \(\newcommand{\wcal}{\cal W}\) \(\newcommand{\ecal}{\cal E}\) \(\newcommand{\coords}[2]{\left\{#1\right\}_{#2}}\) \(\newcommand{\gray}[1]{\color{gray}{#1}}\) \(\newcommand{\lgray}[1]{\color{lightgray}{#1}}\) \(\newcommand{\rank}{\operatorname{rank}}\) \(\newcommand{\row}{\text{Row}}\) \(\newcommand{\col}{\text{Col}}\) \(\renewcommand{\row}{\text{Row}}\) \(\newcommand{\nul}{\text{Nul}}\) \(\newcommand{\var}{\text{Var}}\) \(\newcommand{\corr}{\text{corr}}\) \(\newcommand{\len}[1]{\left|#1\right|}\) \(\newcommand{\bbar}{\overline{\bvec}}\) \(\newcommand{\bhat}{\widehat{\bvec}}\) \(\newcommand{\bperp}{\bvec^\perp}\) \(\newcommand{\xhat}{\widehat{\xvec}}\) \(\newcommand{\vhat}{\widehat{\vvec}}\) \(\newcommand{\uhat}{\widehat{\uvec}}\) \(\newcommand{\what}{\widehat{\wvec}}\) \(\newcommand{\Sighat}{\widehat{\Sigma}}\) \(\newcommand{\lt}{<}\) \(\newcommand{\gt}{>}\) \(\newcommand{\amp}{&}\) \(\definecolor{fillinmathshade}{gray}{0.9}\)Whenever we use a query to retrieve data from two or more tables, the database query processor performs an operation called a join. In this section, we discuss inner joins, outer joins, and Cartesian products. Following that we discuss some interesting special cases: self-join, anti-join, non-equi joins.
If we have previously established relationships between tables, and if we have more than one table in a query, then MS Access will create joins based on those relationships. If necessary, we can alter, delete, or include new relationships.
MS Access creates joins where rows join if the join fields are equal in value; such joins are called equi-joins. If we create a query for the University database and add the Department and Course tables to the relationships area of the query we have:
If you edit the relationship line (double-click it), you see the join properties:
Here, we can see the join is based on the common attribute deptCode. If you click on the Join Type button, you will get information on the type of join used. You will see (as the following diagram shows) that Access has selected the first of three options:
Joins can be further characterized as inner or outer joins. Option 1 is an inner join.
Options 2 and 3 are outer joins. One of these would also be called a Left Outer Join and the other a Right Outer Join. If you examine the SQL statement generated you will see which is used; Left and Right choices are related to the textual expression of the SQL statement – which table name is leftmost/rightmost in the From clause.
Inner Join
All of the joins we have seen up to this point have been inner joins. For a row of one table to be included in the result of an inner join, the row must match a row in the other table. Because all joins so far have also been equi-joins the matching is based on the values of the join fields of one table being equal to the values of the join fields of the other table. Consider the inner join between Department and Course based on deptCode:
If the tables have the contents shown below:
Course |
||||
---|---|---|---|---|
Dept Code |
Course Number |
Title |
Description |
Credit Hours |
ACS |
1453 |
Introduction to Computers |
This course will introduce students to the basic concepts of computers: types of computers, hardware, software, and types of application systems. |
3 |
ACS |
1803 |
Introduction to Information Systems |
This course examines applications of information technology to businesses and other organizations. |
3 |
Department |
||||
---|---|---|---|---|
Dept Code |
Dept Name |
Location |
Phone |
Chair |
ACS |
Applied Computer Science |
3D07 |
(204) 786-0300 |
Simon Lee |
ENG |
English |
3D05 |
(204) 786-9999 |
April Jones |
MATH |
Mathematics |
2R33 |
(204) 786-0033 |
Peter Smith |
then the result of running the query is
Dept Name |
Course Number |
Title |
---|---|---|
Applied Computer Science |
1453 |
Introduction to Computers |
Applied Computer Science |
1803 |
Introduction to Information Systems |
In the above result notice there is no result line for English or Mathematics; this is because for the sample data there were no rows in Course that joined to the English or Mathematics rows in Department. Both rows in Course have a value of “ACS” in the deptCode field and so they joined to the ACS row in Department.
This query demonstrates a distinguishing characteristic of the inner join: only rows that match other rows are included in the results.
Exercises (Set 1)
Exercise \(\PageIndex{1}\)
Consider the Library database:
- Write a query that joins Loan and Member. List the member name and date due.
- Write a query that joins Loan and Book. List the book title and date due.
- Write a query that joins all three tables and lists the member name, book title, and date due.
Exercise \(\PageIndex{2}\)
Consider the two tables A and B below.
X | Y | Z |
1 | 3 | 5 |
2 | 4 | 6 |
4 | 9 | 9 |
X | Y | Q |
1 | 3 | 5 |
1 | 4 | 6 |
2 | 4 | 7 |
3 | 4 | 5 |
- How many rows are in the result if A and B are joined based on the attribute X?
- How many rows are in the result if A and B are joined based on both attributes X and Y?
Outer Join
Consider the Company database. Suppose we wanted to produce a report that lists each department and its employees, and that we must include every department. The two tables would be joined based on equal values of the dept id field. We want all departments and we know that an inner join will not include a department if there are no employees for the department to join to. To get all departments included when we are joining two tables we must use an outer join.
Consider the query that is started below:
By default the join is an inner join, but with MS Access you can get an outer join if you edit the relationship and specify either option 2 or option 3, as shown in the dialogue below:
By choosing option 2 your query will include all departments whether or not the department can join to an employee. If there is no employee for a department to join to, then the row is joined to a row of nulls. When you do this notice the change in the relationship line – it is now a directed line; this is how MS Access illustrates outer joins:
The first few rows of the result are:
deptId |
department |
dept |
lastName |
---|---|---|---|
4 |
Special Operations |
||
3 |
Sales |
3 |
Long |
3 |
Sales |
3 |
Craft |
3 |
Sales |
3 |
Watkins |
Notice that the Special Operations department joined to a null row.
Exercises (Set 2)
Exercise \(\PageIndex{3}\)
Consider the Company database and list each department and the number of employees in the department.
Exercise \(\PageIndex{4}\)
Consider the Orders database.
- Create a query to list each customer and their orders (order id and order date). Are there any customers who have not placed an order?
- Modify the above query to list each customer and the number of orders they have placed (include all customers).
Exercise \(\PageIndex{5}\)
Consider the library database.
- Create a query that will list every book and the date it was borrowed. Include all books in your result.
- Create a query to list every library member and the dates they borrowed books. Include all members
- Try creating a query that will list books that have never been borrowed.
- Try creating a query to list any members who have not borrowed a book.
Cartesian Product
Suppose you create a query, but without a join criteria. This is easily done by clicking on the relationship line and deleting it. When criteria for matching rows is not present, then each row of one table will join to each row of the other table.
This type of join is called a Cartesian Product and these can easily have very large result sets. If Department has 4 rows and Employee has 100 rows then the Cartesian Product has (4x100=) 400 rows. Databases used in practice have hundreds, thousands, even millions of rows; a Cartesian Product may take a long, long time to run.
Exercises (Set 3)
Exercise \(\PageIndex{6}\)
Consider the Sales database and its Store and Product tables. Construct a query to list the storeID and the productID. When you add Store and Product to the relationships area there is a line joining the two tables. Delete the join line. Run the query. Notice how many rows there are; the number of rows in the result set is the number of stores times the number of products.Exercise \(\PageIndex{1}\)
Exercise \(\PageIndex{7}\)
Consider the Sales database and its Store, Product, and Sales tables. Suppose we want to obtain a list that shows for each store and product the total quantity sold. Note that the end user wants to see every store and product combination.
The second query is developed as a join between the query CP and the table Sales. CP is outer-joined to Sales in order that every combination of Store and Product is in the final result.
- Hint
-
An approach you can use with MS Access is to create two queries. The first of these performs a cross product of store and product (call this CP).
Self-Join
A self-join, also called a recursive join, is a case where a table is joined to itself.
Consider the Company database and suppose we must obtain a list of employees who report to another employee named Raphael Delaney (i.e. List the employees Raphael Delaney supervises). To do this we need to find the row in Employee for Raphael Delaney and then join that row to other rows of Employee where the supervisor field is equal to the empId field for Raphael. When we build the query in MS Access we simply add the Employee table to the relationships area twice. One copy of Employee will be named Employee_1. Consider the following query:
Note the following:
- the criteria specifies the row in Employee will be that of Raphael Delaney
- the join line connects supervisor to empId and so rows of Employee_1 will be employees who report to Raphael.
Exercises (Set 4)
Exercise \(\PageIndex{8}\)
Consider the Genealogy database and develop queries to obtain:
- the father of Peter Chan.
- the mother of Peter Chan.
- the father and mother of Peter Chan.
- the children of Peter Chan.
- the grandchildren of Peter Chan.
Exercise \(\PageIndex{9}\)
Consider the Orders database and the Employee table.
- Write a query to list the employee who does not report to anyone.
- Write a query to list each employee and the number of employees they supervise.
Anti-Join
Suppose we need to list persons in our Company database that are not supervising anyone. One way of looking at this problem is to say we need to find those people that do not join to someone else based on the supervises relationship. That is, we need to find those employees whose employee id does not appear in the supervisor field of any employee.
To do this with MS Access we can construct a query that uses an outer join to connect an employee to another employee based on employeeID equaling supervisor, but where the supervisor value is null. That is, we are looking for an employee who, in an outer join, does not join to another employee. See the query below:
This query involves a join, specifically an outer join, and because it retrieves that rows that do not join, it is sometimes referred to as a special case – an anti-join.
Exercise \(\PageIndex{10}\)
Consider the Genealogy database and develop a query to find people that do not have any sons.
Exercise \(\PageIndex{11}\)
Consider the Genealogy database and develop a query to find people that do not have any daughters.
Non-Equi Join
A non-equi join is any join where the join criteria does not specify equals, “=”.
Suppose we wish to list all persons in the Genealogy database who are younger than, say, Peter Chan. One approach to getting the results is to join the row for Peter Chan to another row in Person where the birthdate of Peter Chan is greater than the birthdate of the other person. This type of join would be a “greater than” join as opposed to an equi-join. Proceed in the following way:
- Add Person to the relationships area twice so there is a Person table and a Person_1 table. If there are any relationship lines delete them.
- In the criteria line for Person fields: for firstName type “Peter” and for LastName type “Chan”.
- In the criteria line for birthDate in Person_1 type “> [Person].[birthDate]”
In this way you are creating a “greater than” join.
- Include attributes from Person_1 to display these younger people.
- Run your query.
Exercises (Set 5)
Exercise \(\PageIndex{12}\)
Consider the genealogy database.
- Run the example from above.
- Modify the example to list those people who are older than Peter Chan.