|FROM ||Ruben Safir
|SUBJECT ||Subject: [LIU Comp Sci] (fwd) Re: relational algrabra division?
|From owner-learn-outgoing-at-mrbrklyn.com Sat Dec 13 23:26:57 2014
Received: by mrbrklyn.com (Postfix)
id E766E16115D; Sat, 13 Dec 2014 23:26:56 -0500 (EST)
Received: by mrbrklyn.com (Postfix, from userid 28)
id D1314161163; Sat, 13 Dec 2014 23:26:56 -0500 (EST)
Received: from mailbackend.panix.com (mailbackend.panix.com [184.108.40.206])
by mrbrklyn.com (Postfix) with ESMTP id 2259916115D
for ; Sat, 13 Dec 2014 23:26:56 -0500 (EST)
Received: from panix2.panix.com (panix2.panix.com [220.127.116.11])
by mailbackend.panix.com (Postfix) with ESMTP id D7CCC13F98
for ; Sat, 13 Dec 2014 23:26:55 -0500 (EST)
Received: by panix2.panix.com (Postfix, from userid 20529)
id AF86F33C87; Sat, 13 Dec 2014 23:26:55 -0500 (EST)
From: Ruben Safir
Subject: [LIU Comp Sci] (fwd) Re: relational algrabra division?
User-Agent: tin/2.0.0-20110823 ("Ardenistiel") (UNIX) (NetBSD/6.1.5 (i386))
Date: Sat, 13 Dec 2014 23:26:55 -0500 (EST)
-- forwarded message --
X-Received: by 10.50.27.65 with SMTP id r1mr2469700igg.7.1418390449334;
Fri, 12 Dec 2014 05:20:49 -0800 (PST)
X-Received: by 10.182.65.229 with SMTP id a5mr7778obt.26.1418390449132; Fri,
12 Dec 2014 05:20:49 -0800 (PST)
Date: Fri, 12 Dec 2014 05:20:48 -0800 (PST)
Injection-Info: glegroupsg2000goo.googlegroups.com; posting-host=18.104.22.168;
Subject: Re: relational algrabra division?
Injection-Date: Fri, 12 Dec 2014 13:20:49 +0000
Content-Type: text/plain; charset=ISO-8859-1
Xref: panix comp.databases.theory:75273
Relational division is one of the eight basic operations in Codd's relation=
al algebra. The idea is that a divisor table is used to partition a dividen=
d table and produce a quotient or results table. The quotient table is made=
up of those values of one column for which a second column had all of the =
values in the divisor. There is a really good presentation on four ways to =
do this at: http://www.cs.arizona.edu/people/mccann/research/divpresentatio=
This is easier to explain with an example. We have a table of pilots and th=
e planes they can fly (dividend); we have a table of planes in the hangar (=
divisor); we want the names of the pilots who can fly every plane (quotient=
) in the hangar. To get this result, we divide the PilotSkills table by the=
planes in the hangar.=20
CREATE TABLE PilotSkills=20
(pilot CHAR(15) NOT NULL,=20
plane CHAR(15) NOT NULL,=20
PRIMARY KEY (pilot, plane));
'Celko' 'Piper Cub'
'Higgins' 'B-52 Bomber'
'Higgins' 'F-14 Fighter'
'Higgins' 'Piper Cub'
'Jones' 'B-52 Bomber'
'Jones' 'F-14 Fighter'
'Smith' 'B-1 Bomber'
'Smith' 'B-52 Bomber'
'Smith' 'F-14 Fighter'
'Wilson' 'B-1 Bomber'
'Wilson' 'B-52 Bomber'
'Wilson' 'F-14 Fighter'
'Wilson' 'F-17 Fighter'
CREATE TABLE Hangar
(plane CHAR(15) NOT NULL PRIMARY KEY);
PilotSkills DIVIDED BY Hangar
In this example, Smith and Wilson are the two pilots who can fly everything=
in the hangar. Notice that Higgins and Celko know how to fly a Piper Cub, =
but we don't have one right now. In Codd's original definition of relationa=
l division, having more rows than are called for is not a problem.=20
The important characteristic of a relational division is that the CROSS JOI=
N (Cartesian product) of the divisor and the quotient produces a valid subs=
et of rows from the dividend. This is where the name comes from, since the =
CROSS JOIN acts like a multiplication operator.=20
Division with a Remainder
There are two kinds of relational division. Division with a remainder allow=
s the dividend table to have more values than the divisor, which was Codd's=
original definition. For example, if a pilot can fly more planes than just=
those we have in the hangar, this is fine with us. The query can be writte=
n in SQL-89 as
SELECT DISTINCT pilot=20
FROM PilotSkills AS PS1
WHERE NOT EXISTS=20
WHERE NOT EXISTS=20
FROM PilotSkills AS PS2
WHERE (PS1.pilot =3D PS2.pilot)
AND (PS2.plane =3D Hangar.plane)));
The quickest way to explain what is happening in this query is to imagine a=
n old World War II movie where a cocky pilot has just walked into the hanga=
r, looked over the fleet, and announced, "There ain't no plane in this hang=
ar that I can't fly!" We are finding the pilots for whom there does not exi=
st a plane in the hangar for which they have no skills. The use of the NOT =
EXISTS() predicates is for speed. Most SQL systems will look up a value in =
an index rather than scan the whole table. The SELECT * clause lets the que=
ry optimizer choose the column to use when looking for the index.=20
This query for relational division was made popular by Chris Date in his te=
xtbooks, but it is not the only method nor always the fastest. Another vers=
ion of the division can be written so as to avoid three levels of nesting. =
While it is not original with me, I have made it popular in my books.=20
FROM PilotSkills AS PS1, Hangar AS H1
WHERE PS1.plane =3D H1.plane
GROUP BY PS1.pilot=20
HAVING COUNT(PS1.plane) =3D (SELECT COUNT(plane) FROM Hangar);
There is a serious difference in the two methods. Burn down the hangar, so =
that the divisor is empty. Because of the NOT EXISTS() predicates in Date's=
query, all pilots are returned from a division by an empty set. Because of=
the COUNT() functions in my query, no pilots are returned from a division =
by an empty set.=20
In the sixth edition of his book, INTRODUCTION TO DATABASE SYSTEMS (Addison=
-Wesley; 1995 ;ISBN 0-201-82458-2), Chris Date defined another operator (DI=
VIDEBY ... PER) which produces the same results as my query, but with more =
The second kind of relational division is exact relational division. The di=
vidend table must match exactly to the values of the divisor without any ex=
FROM PilotSkills AS PS1
LEFT OUTER JOIN
Hangar AS H1
ON PS1.plane =3D H1.plane
GROUP BY PS1.pilot=20
HAVING COUNT(PS1.plane) =3D (SELECT COUNT(plane) FROM Hangar)
AND COUNT(H1.plane) =3D (SELECT COUNT(plane) FROM Hangar);
This says that a pilot must have the same number of certificates as there p=
lanes in the hangar and these certificates all match to a plane in the hang=
ar, not something else. The "something else" is shown by a created NULL fro=
m the LEFT OUTER JOIN.
Please do not make the mistake of trying to reduce the HAVING clause with a=
little algebra to:
HAVING COUNT(PS1.plane) =3D COUNT(H1.plane)=20
because it does not work; it will tell you that the hangar has (n) planes i=
n it and the pilot is certified for (n) planes, but not that those two sets=
of planes are equal to each other.
Note on Performance=20
The nested EXISTS() predicates version of relational division was made popu=
lar by Chris Date's textbooks, while the author is associated with populari=
zing the COUNT(*) version of relational division. The Winter 1996 edition o=
f DB2 ON-LINE MAGAZINE (http://www.db2mag.com/96011ar:htm) had an article e=
ntitled "Powerful SQL:Beyond the Basics" by Sheryl Larsen which gave the re=
sults of testing both methods. Her conclusion for DB2 was that the nested E=
XISTS() version is better when the quotient has less than 25% of the divide=
nd table's rows and the COUNT(*) version is better when the quotient is mor=
e than 25% of the dividend table.=20
-- end of forwarded message --