what is Normalization means..?

Answer Posted / latesh sarode

REDUCING DATA REDUNDANCY

A table structure has been suggested for employees database.
After testing the structure with data we find that there is
a problem in adding, deleting and modifying data.

We see that the table structure could lead to inconsistency
in data and that it occupies lot of disk space.

The table structure is as follows
Employee code
First name
Last name
Candidate code
Address
City
Telno
Qualification
Birthdate
Skill

The table with some sample data is as follows
Emp code First Name Last Name City …………. Skill
00001 Angela Smith Mumbai Communication
00001 Angela Smith Mumbai Presentation
00001 Angela Smith Mumbai Team Leading
00002 Betty Williams Pune Communication
00002 Betty Williams Pune Presentation
00002 Betty Williams Pune Convincing
00002 Betty Williams Pune Planning
00003 Carol Jones Hyderabad Communication
00003 Carol Jones Hyderabad Team leading
00003 Carol Jones Hyderabad Inter personal




Solution

Identify how to reduce data redundancy and inconsistency in
updating, adding and deleting data.

Understanding Data Redundancy

Redundancy means repetition of data. Redundancy increases
the time involved in updating, adding and deleting data. It
also increases utilization of disk space and hence disk I/O.

Redundancy can therefore lead to

Update anomalies – Inserting, modifying and deleting data
may cause inconsistencies

Inconsistencies – Errors are most likely to occur when facts
are repeated

Un-necessary utilization of disk space

We can use systematic approaches like normalization to
reduce redundancy and duplicity

“Normalization is a scientific method of breaking down
complex table structures into simple table structures by
using certain rules. We can therefore reduce redundancy in a
table and eliminate the problems of inconsistency and disk
space usage.”










Normalization has numerous benefits

It enables faster sorting and index creation, few nulls and
an increase in compactness of database.

Normalization helps to simplify the structure of tables

“Performance of an application is directly linked to the
database design”

A poor design hinders the performance of the system

Logical design of the database lays the foundation for an
optimal database

Rules to be followed to achieve a good database design

Each table should have an identifier (tablename)

Each table should store data for single type of entity

Columns that accept nulls should be avoided

Repetition of columns or values should be avoided

Normalization results in formation of tables that satisfy
certain specified rules and represent certain normal forms

The normal forms are used to ensure that various types of
anomalies and inconsistencies are not introduced in the
database.

A table structure is always in certain normal form






First Normal Form (1 NF)

A table is said to be in 1st Normal Form when each cell
contains precisely one value.

Following is the content of project table

Emp code Dept Proj Code Hours
E 101 Systems P27 90
P51 101
P20 60
E 305 Sales P27 109
P 22 98
E 508 Admin P 51 NULL
P27 72

The data in the table is not normalized because the cells in
Projcode and Hours have more than 1 value.

By applying 1NF definition to the project table we would
arrive at the following:-

Emp code Dept Proj Code Hours
E 101 Systems P27 90
E 101 Systems P51 101
E 101 Systems P20 60
E 305 Sales P27 109
E 305 Sales P 22 98
E 508 Admin P 51 NULL
E 508 Admin P27 72








Functional Dependency
Normalization theory is based on the fundamental notion of
functional dependency

Given a relation R, attribute A is functionally dependent on
attribute B if each value of A in R is associated with
precisely one value of B

In other words attribute A is functionally dependent on B if
and only if for each value of B there is exactly one value of A
Thus attribute B is called determinant

For Example consider the employee table

Code Name City
E1 Rajesh Delhi
E2 Ashish Mumbai
E3 Manish Kolkata
Given a particular value of code there is precisely one
corresponding value for name. For example for code E1 there
is exactly one value of Name (Rajesh). Hence name is
functionally dependent on code. Similarly there is exactly
one value of city for each value of code. Hence attribute
city is functionally dependent on code. The attribute code
is determinant. We can also say that Code determines name
and city.













Second Normal Form (2NF)

A table is said to be in 2nd normal form when it is in 1NF
and every attribute in the row is functionally dependent
upon the whole key and not just part of the key.

Consider the project table

Empcode
Projcode
Dept
Hours

The table has following rows

Empcode Projcode Dept Hours
E101 P27 Systems 90
E305 P27 Finance 10
E508 P51 Admin NULL
E101 P51 Systems 101
E101 P20 Systems 60
E508 P27 Admin 72

The situation could lead to following problems

Insertion - The department of a particular employee cannot
be recorded until the employee is assigned a project

Updation - For a given employee the employee code and
department are repeated several times. Hence if an employee
is transferred to another department, this change will have
to be recorded in every row of the employee table pertaining
to that employee. Any omission will lead to inconsistencies.

Deletion - If an employee completes work on a project the
employees record will be deleted as a result the information
regarding the department to which the employee belongs will
also be lost.



The primary key here is composite (Empcode + Projcode)

The table satisfies definition of 1NF, we now need to check
if it satisfies 2NF

In the table for each value of empcode, there is more than 1
value of hours. Thus for empcode E101 there are 3 values of
hours 90,101 and 60 respectively.

Hence hours is not functionally dependent on Empcode.

Similarly for each value of projcode there is more than
1value of hours thus for projcode P27, there are three
values of hours 90, 10 and 72.

However for a combination of ecode and projcode there is
exactly one value of hours hence hours is functionally
dependent on the whole key ecode + projcode

Now we must check if dept is functionally dependent on the
whole key ecode + projcode

For each value of ecode there is exactly 1 value of dept,
for ecode E101 there is exactly 1 value system department.
Hence dept is functionally dependent on ecode.

However for each value of projcode there is more than 1
value of department, for projcode P27 there are 2 values of
department System and Finance. Hence Dept is not
functionally dependent on p4rojcode. Dept is therefore
functionally dependent on part of key (which is ecode) and
not functionally dependent on whole key (empcode + projcode).

Therefore the table Project is not in 2NF, for a table to be
in 2NF, the non key attributes must be fully functionally
dependent on the whole key and not just part of key.

Converting a table into 2NF

• Find and remove attributes that are functionally dependent
on only a part of the key and not on the whole key, place
them in a different table.
• Group the remaining attributes.

To convert the project table into 2NF we must remove the
attributes that are not fully functionally dependent on the
whole key and place them in a different table along with the
attribute that it is functionally dependent on.

In the above example since Dept is not fully functionally
dependent on the whole key ecode+projcode, we place dept
along with ecode in a separate table called Employeedept

Now the project table will contain Ecode, Projcode and Hours
EmployeeDept

Ecode Dept
E101 Systems
E305 Sales
E508 Admin

Project
Ecode Projcode Hours
E101 P27 90
E101 P51 101
E101 P20 60
E305 P27 10
E508 P51 NULL
E508 P27 72






Third Normal Form (3 NF)

A relation is said to be in 3 NF when it is in 2 NF and
every non key attribute is functionally dependent only on
the primary key.

Consider the employee table
Ecode Dept Depthead
E101 Systems E901
E305 Finance E906
E402 Sales E906
E508 Admin E908
E607 Finance E909
E608 Finance E909

The problem with dependencies of this kind are :-

Insertion - The department head of a new department that
does not have any employees at present cannot be entered in
the depthead column . This is because the primary key is
unknown.

Updation - For a given department, the code for a particular
department head (depthead) is repeated several times. Hence,
if a depthead moves to another department, the change will
have to be made consistently across the table.

Deletion - If the record of an employee is deleted,
information regarding the head of the department will also
be deleted, hence there is loss of information.

We must check if the table is in 3 NF, since each cell in
the table has a single value the table is in 1 NF
The primary key in employee table is ecode . For each value
of ecode there is exactly one value of dept. Hence the
attribute dept is functionally dependent on the primary key,
Ecode. Similarly for each value of Ecode there is exactly
one value of depthead. Hence depthead is functionally
dependent on the primary key ecode. Hence all the attributes
are functionally dependent on the whole key Ecode. Hence
table is in 2 NF

However the attribute depthead is dependent on the attribute
dept also. As per 3 NF all non-key attributes have to be
functionally dependent only on the primary key. This table
is not in 3 NF since depthead is functionally dependent o
dept which is not a primary key.

Converting a table into 3 NF

Find and remove non key attributes that are functionally
dependent on attributes that are not the primary key. Place
them in a different table.
Group the remaining attributes.
To convert the table employee into 3NF we must remove the
column depthead since it is not functionally dependent only
on the primary key Ecode and place it in another table
called department along with the attribute dept which it is
functionally dependent on.

Employee
Ecode Dept
E101 Systems
E305 Finance
E402 Sales
E508 Admin
E607 Finance
E608 Finance

Department
Dept Depthead
Systems E901
Sales E906
Admin E908
Finance E909

Is This Answer Correct ?    4 Yes 1 No



Post New Answer       View All Answers


Please Help Members By Posting Answers For Below Questions

Does facebook use graph database?

607


What is a sequence? Explain it with one example?

647


WHAT IS NEED BY METION DATFMT AND TIMFMT IN H-SEPICITION IN RPGLE?

2479


what are wrappers and build ups and what it's use?

1694


Explain transaction manager?

619






How to do Data integrity testing? Who does this testing (Developer or tester)?

1610


what is the role of auditor in auditing data oriented applications?

1925


What is the purpose of TNS?

660


Which is the most reliable programming language for RDBMS Relational Database Management System for Multi user applicaton. For PC application or web application ?

2030


What is table scan and index scan?

613


What are different types of joins used in?

628


What is the purpose of firebase?

546


how to reject duplicates in source sequential file and if by filter option then where is that filter option in the sequential file.

1107


how node connect with cpu?

1664


How to implement database security?

535