Attributes, The Keys

<< Entity-Relationship Data Model, Classification of entity types, Attributes

Relationships:Types of Relationships in databases >>

Database Management System (CS403)

Lecture No. 08

Reading Material

"Database Systems Principles, Design and Implementation"

Section 5.4

written by Catherine Ricardo, Maxwell Macmillan.

Overview of Lecture

o Concept of Key and its importance

o Different types of keys

Attributes

Def 1:

An attribute is any detail that serves to identify, qualify, classify, quantify, or

otherwise express the state of an entity occurrence or a relationship.

Def 2:

Attributes are data objects that either identify or describe entities.

Identifying entity type and then assigning attributes or other way round; it's an "egg or

hen" first problem. It works both ways; differently for different people. It is possible that

we first identify an entity type, and then we describe it in real terms, or through its

attributes keeping in view the requirements of different users' groups. Or, it could be

other way round; we enlist the attribute included in different users' requirements and then

group different attributes to establish entity types. Attributes are specific pieces of

information, which need to be known or held. An attribute is either required or optional.

When it's required, we must have a value for it, a value must be known for each entity

occurrence. When it's optional, we could have a value for it, a value may be known for

each entity occurrence.

The Keys

Attributes act as differentiating agents among different entity types, that is, the

differences between entity types must be expressed in terms of attributes. An entity type

can have many instances; each instance has got a certain value against each attribute

defined as part of that particular entity type. A key is a set of attributes that can be used to

Database Management System (CS403)

identify or access a particular entity instance of an entity type (or out of an entity set).

The concept of key is beautiful and very useful; why and how. An entity type may have

many instances, from a few to several thousands and even more. Now out of many

instances, when and if we want to pick a particular/single instance, and many times we do

need it, then key is the solution. For example, think of whole population , the

data of all Pakistanis lying at one place, say with NADRA people. Now if at sometime

we need to identify a particular person out of all this data, how can we do that? Can we

use name for that, well think of any name, like Mirza Zahir Iman Afroz, now we may

find many people with this name in Pakistan. Another option is the combination of name

and father name, then again, Amjad Malik s/o Mirza Zahir Iman Afroz, there could be so

many such pairs. There could be many such examples. However, if you think about

National ID Card number, then no matter whatever is the population , you will

always be able to pick precisely a single person. That is the key. While defining an entity

type we also generally define the key of that entity type. How do we select the key, from

the study of the real-world system; key attribute(s) already exist there, sometimes they

don't then the designer has to define one. A key can be simple, that is, consisting of

single attribute, or it could be composite which consists of two or more attributes.

Following are the major types of keys:

Super Key

Candidate Key

Primary Key

Alternate Key

Secondary Key

Foreign Key

The last one will be discussed later, remaining 5 are discussed in the following:

o Super key

A super key is a set of one or more attributes which taken collectively, allow us to

identify uniquely an entity instance in the entity set. This definition is same as of a

key, it means that the super key is the most general type of key. For example,

consider the entity type STUDENT with attributes registration number, name, father

name, address, phone, class, admission date. Now which attribute can we use that can

uniquely identify any instance of STUDENT entity type. Of course, none of the name,

father name, address, phone number, class, admission date can be used for this

purpose. Why? Because if we consider name as super key, and situation arises that we

need to contact the parents of a particular student. Now if we say to our registration

department that give us the phone number of the student whose name is Ilyas Hussain,

the registration department conducts a search and comes up with 10 different Ilyas

Hussain, could be anyone. So the value of the name attribute cannot be used to pick a

particular instance. Same happens with other attributes. However, if we use the

registration number, then it is 100% sure that with a particular value of registration

number we will always find exactly a single unique entity instance. Once you

identified the instance, you have all its attributes available, name, father name,

Database Management System (CS403)

everything. The entity type STUDENT and its attributes are shown graphically in the

figure 1 below, with its super key "regNo" underlined.

name

regNo

EMPLOYEE

fName

address

phoneNo

Fig. 1: An entity type, its defining attributes

and super key (underlined)

Once specific characteristic with super key is that, as per its definition any combination

of attributes with the super key is also a super key. Like, in the example just discussed

where we have identified regNo as super key, now if we consider any combination of

regNo with any other attribute of STUDENT entity type, the combination will also be a

super key. For example, "regNo, name", "regNo, fName, address", "name, fName,

regNo" and many others, all are super keys.

o Candidate key

A super key for which no subset is a super key is called a candidate key, or the

minimal super key is the candidate key. It means that there are two conditions for the

candidate key, one, it identifies the entity instances uniquely, as is required in case of

super key, second, it should be minimum, that is, no proper subset of candidate key is

a key. So if we have a simple super key, that is, that consists of single attribute, it is

definitely a candidate key, 100%. However, if we have a composite super key and if

we take any attribute out of it and remaining part is not a super key anymore then that

composite super key is also a candidate key since it is minimal super key. For

example, one of the super keys that we identified from the entity type STUDENT of

figure 1 is "regNo, name", this super key is not a candidate key, since if we remove

the regNo attribute from this combination, name attribute alone is not able to identify

the entity instances uniquely, so it does not satisfy the first condition of candidate key.

On the other hand if we remove the attribute name from this composite key then the

regNo alone is sufficient to identify the instances uniquely, so "regNo, name" does

have a proper subset (regNo) that can act as a super key; violation of second

condition. So the composite key "regNo, name" is a super key but it is not a candidate

key. From here we can also establish a fact that every candidate key is a super key but

not the other way round.

o Primary Key

A candidate key chosen by the database designer to act as key is the primary key. An

entity type may have more than one candidate keys, in that case the database designer has

Database Management System (CS403)

to designate one of them as primary key, since there is always only a single primary key

in an entity type. If there is just one candidate key then obviously the same will be

declared as primary key. The primary key can also be defined as the successful candidate

key. Figure 2 below contains the entity type STUDENT of figure 1 but with an additional

attribute nIdNumber (national ID card Number).

nIdNumber

name

regNo

EMPLOYEE

fName

address

phoneNo

Fig. 2: An entity type, its defining attributes

and two candidate keys

In figure 2, we can identify two different attributes that can individually identify the

entity instances of STUDENT and they are regNo and nIdNumber, both are minimal

super keys so both are candidate keys. Now in this situation we have got two candidate

keys. The one that we choose will be declared as primary key, other will be the alternate

key. Any of the candidate keys can be selected as primary key, it mainly depends on the

database designer which choice he/she makes. There are certain things that are generally

considered while making this decision, like the candidate key that is shorter, easier to

remember, to type and is more meaningful is selected as primary key. These are general

recommendations in this regard, but finally it is the decision of the designer and he/she

may have his/her own reasons for a particular selection that may be entirely different

from those mentioned above. The relation that holds between super and candidate keys

also holds between candidate and primary keys, that is, every primary key (PK) is a

candidate key and every candidate key is a super key.

A certain value that may be associated with any attribute is NULL, that means "not

given" or "not defined". A major characteristic of the PK is that it cannot have the NULL

value. If PK is a composite, then none of the attributes included in the PK can have the

NULL, for example, if we are using "name, fName" as PK of entity type STUDENT,

then none of the instances may have NULL value in either of the name or fName or both.

o Alternate Keys

Candidate keys which are not chosen as the primary key are known as alternate keys.

For example, we have two candidate keys of STUDENT in figure 2, regNo and

nIdNumber, if we select regNo as PK then the nIdNumber will be alternate key.

o Secondary Key

Many times we need to access certain instances of an entity type using the value(s) of one

or more attributes other than the PK. The difference in accessing instances using the

Database Management System (CS403)

value of a key or non-key attribute is that the search on the value of PK will always return

a single instance (if it exists), where as uniqueness is not guaranteed in case of non-key

attribute. Such attributes on which we need to access the instances of an entity type that

may not necessarily return unique instance is called the secondary key. For example, we

want to see how many of our students belong to Multan, in that case we will access those

instances of the STUDENT entity type that contain "Multan" in their address. In this case

address will be called secondary key, since we are accessing instances on the basis of its

value, and there is no compulsion that we will get a single instance. Keep one thing in

mind here, that a particular access on the value of a secondary key MAY return a single

instance, but that will be considered as chance or due to that particular state of entity set.

There is not the compulsion or it is not necessary for secondary key to return unique

instance, where as in case of super, candidate, primary and alternate keys it is compulsion

that they will always return unique instance against a particular value.

Summary

Keys are fundamental to the concept almost any data model including the E-R data model

because they enable the unique identity of an entity instance. There are different type of

keys that may exist in an entity type.

Exercises:

� Define attributes of the entity types CAR, BOOK, MOVIE; draw them

graphically

� Identify different types of keys in each one of them

Table of Contents: