Normalization is the process of efficiently organizing data
in a database. There are two goals of the normalization
process: eliminating redundant data (for example, storing
the same data in more than one table) and ensuring data
dependencies make sense (only storing related data in a
table). Both of these are worthy goals as they reduce the
amount of space a database consumes and ensure that data is
logically stored.
Database normalization is a technique for designing
relational database tables to minimize duplication of
information and, in so doing, to safeguard the database
against certain types of logical or structural problems,
namely data anomalies.
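To make "data anomalies" concrete, here is a minimal sketch of an update anomaly using plain Python dictionaries. The table layout, the supplier name, and the city values are all made up for illustration; they are not taken from any real schema.

```python
# Denormalized layout (hypothetical): the supplier's city is
# repeated on every order row.
orders = [
    {"order_id": 1, "supplier": "Acme", "supplier_city": "Leeds"},
    {"order_id": 2, "supplier": "Acme", "supplier_city": "Leeds"},
]

# An update that touches only one of the duplicated values...
orders[0]["supplier_city"] = "York"

# ...leaves the same supplier with two different cities.
cities = {row["supplier_city"] for row in orders if row["supplier"] == "Acme"}
inconsistent = len(cities) > 1

# Normalized layout: the city is stored once, keyed by supplier.
suppliers = {"Acme": {"city": "Leeds"}}
orders_n = [
    {"order_id": 1, "supplier": "Acme"},
    {"order_id": 2, "supplier": "Acme"},
]

# A single update is now seen by every order that references it.
suppliers["Acme"]["city"] = "York"
cities_n = {suppliers[o["supplier"]]["city"] for o in orders_n}

print(inconsistent, cities_n)
```

In the first layout the fact "Acme is in Leeds" lives in two places, so a partial update can contradict itself. In the second it lives in one place, so it cannot.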
1NF Eliminate Repeating Groups - Make a separate table for
each set of related attributes, and give each table a
primary key.
2NF Eliminate Redundant Data - If an attribute depends on
only part of a composite (multi-attribute) key, remove it to
a separate table.
3NF Eliminate Columns Not Dependent On Key - If attributes
do not contribute to a description of the key, remove them
to a separate table.
BCNF Boyce-Codd Normal Form - If there are non-trivial
dependencies between candidate key attributes, separate them
out into distinct tables.
4NF Isolate Independent Multiple Relationships - No table
may contain two or more 1:n or n:m relationships that are
not directly related.
5NF Isolate Semantically Related Multiple Relationships -
There may be practical constraints on information that
justify separating logically related many-to-many relationships.
ONF Optimal Normal Form - a model limited to only simple
(elemental) facts, as expressed in Object Role Model notation.
DKNF Domain-Key Normal Form - a model free from all
modification anomalies.
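As an illustration of the first rule in this list (eliminating repeating groups), the sketch below splits numbered columns such as `phone1` and `phone2` into a separate table with one row per value. The function name `to_1nf`, the column names, and the sample customers are assumptions made up for this example.

```python
def to_1nf(rows, key, group_columns, value_name):
    """Move repeating-group columns into a child table, one row per value.

    Returns (parent_rows, child_rows). Child rows carry the parent key
    so the two tables can be joined back together.
    """
    parent, child = [], []
    for row in rows:
        # Parent keeps every column that is not part of the repeating group.
        parent.append({k: v for k, v in row.items() if k not in group_columns})
        for col in group_columns:
            if row.get(col) is not None:  # skip empty slots
                child.append({key: row[key], value_name: row[col]})
    return parent, child


# Hypothetical sample data with a repeating group (phone1, phone2).
customers = [
    {"customer_id": 1, "name": "Ann", "phone1": "555-0100", "phone2": "555-0101"},
    {"customer_id": 2, "name": "Bob", "phone1": "555-0102", "phone2": None},
]

parent, child = to_1nf(customers, "customer_id", ["phone1", "phone2"], "phone")
print(parent)
print(child)
```

The child table can hold any number of phone numbers per customer, which the fixed `phone1`/`phone2` columns could not.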
First Normal Form (1NF): A relation is said to be in 1NF if
it has only single-valued (atomic) attributes; neither
repeating groups nor arrays are permitted.
Second Normal Form (2NF): A relation is said to be in 2NF if
it is in 1NF and every non-key attribute is fully
functionally dependent on the primary key.
Third Normal Form (3NF): We say that a relation is in 3NF if
it is in 2NF and has no transitive dependencies.
Boyce-Codd Normal Form (BCNF): A relation is said to be in
BCNF if and only if every determinant in the relation is a
candidate key.
Fourth Normal Form (4NF): A relation is said to be in 4NF if
it is in BCNF and contains no non-trivial multivalued
dependencies other than those determined by a candidate key.
Fifth Normal Form (5NF): A relation is said to be in 5NF if
and only if every join dependency in the relation is implied
by the candidate keys of the relation.
Domain-Key Normal Form (DKNF): We say that a relation is in
DKNF if it is free of all modification anomalies. Insertion,
deletion, and update anomalies all count as modification
anomalies.
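The BCNF condition in the definitions above can be checked mechanically once the functional dependencies of a relation are written down. The following is a minimal sketch, not a full normalization tool. It uses the usual formal reading that a determinant must be a superkey (its attribute closure covers the whole relation). The attribute names and the dependencies are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs`.

    `fds` is a list of (lhs, rhs) pairs, each a tuple of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """List the non-trivial dependencies whose determinant is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        if not trivial and closure(lhs, fds) != set(all_attrs):
            violations.append((lhs, rhs))
    return violations


# Hypothetical relation: order_id is the key, but customer_city
# depends on customer_id rather than directly on the key.
attrs = {"order_id", "customer_id", "customer_city"}
fds = [
    (("order_id",), ("customer_id",)),
    (("customer_id",), ("customer_city",)),
]

print(bcnf_violations(attrs, fds))
# -> [(('customer_id',), ('customer_city',))]
```

The reported dependency is also a transitive dependency, so this relation fails 3NF as well. Moving `customer_id` and `customer_city` into their own table removes the violation.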
Normalization is a systematic way of ensuring that a
database structure is suitable for general-purpose querying
and free of certain undesirable characteristics: insertion,
update, and deletion anomalies that could lead to a loss of
data integrity.
This topic is a slight digression from a data warehouse (DW)
point of view, but it is better that we know about
normalization. Once we understand normalization along with
DW facts and dimensions, the schema concepts become clearer.
It also helps us understand why reporting is easier and
faster from a DW. Five normal forms are commonly listed (1NF
to 5NF), but for now it is enough to understand the first
three. Normalization helps in reducing data
redundancy. As we move towards higher normalization
1NF: This normal form states that the tables we use must not
contain duplicate rows. In other words, every table must
have a primary key defined, and each column must hold a
single value per row.
2NF: This normal form states that data redundancy can be
reduced if attributes that depend on only one part of a
composite primary key are moved to a separate table. Not
only does this reduce data redundancy, it also helps retain
data when a delete is done. For example, consider a table
that has the following columns: Part Id, State, City, and
Country. Here, assume Part Id and Country form the composite
primary key, and that the attributes State and City depend
only on Country. 2NF states that in such a case the table
should be split into two tables: one with Part Id and
Country as the columns, the other with Country, State, and
City as the columns. In the original single table, deleting
all the rows with Part Id = ‘X’ would also lose the
country-related data stored on those rows. After the split
this would not happen, because that data lives in the second
table.
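The split described in this example can be sketched with Python's built-in `sqlite3` module. The table names `part_country` and `country_location` and the sample values are made up for illustration; only the columns come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Country-level facts, stored once per country.
    CREATE TABLE country_location (
        country TEXT PRIMARY KEY,
        state   TEXT NOT NULL,
        city    TEXT NOT NULL
    );
    -- Which part belongs to which country (composite key).
    CREATE TABLE part_country (
        part_id TEXT NOT NULL,
        country TEXT NOT NULL REFERENCES country_location(country),
        PRIMARY KEY (part_id, country)
    );
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO country_location VALUES ('Country A', 'State A', 'City A')")
conn.execute("INSERT INTO part_country VALUES ('X', 'Country A')")

# Deleting every row for part 'X' only touches part_country.
conn.execute("DELETE FROM part_country WHERE part_id = 'X'")

remaining_parts = conn.execute("SELECT * FROM part_country").fetchall()
remaining_locations = conn.execute(
    "SELECT country, state, city FROM country_location"
).fetchall()

print(remaining_parts)      # no part rows left
print(remaining_locations)  # the country's state and city are still stored
```

Had all four columns been kept in one table, the same `DELETE` would have removed the only record of the country's state and city.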
3NF: This normal form states that if a non-key attribute
depends on another non-key attribute rather than directly on
the primary key, the table has to be split along that
dependency. Consider the same example as above, but now
assume that Part Id alone is the primary key. State and City
depend on Country, and only through Country on Part Id. This
table is already in 1NF and 2NF, but to achieve 3NF we would
do the same split as above.
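A split like this loses no information: joining the two tables on Country gives back the original rows. The sketch below shows this in plain Python. The helper name `split_transitive` and the sample rows are assumptions for illustration, and the helper raises an error if the data contradicts the assumed dependency of State and City on Country.

```python
def split_transitive(rows, determinant, dependents):
    """Split off columns that depend on a non-key column.

    Returns (main_rows, lookup), where lookup maps each determinant
    value to its dependent column values.
    """
    lookup = {}
    main = []
    for row in rows:
        det = row[determinant]
        dep_values = {col: row[col] for col in dependents}
        # setdefault stores the first value seen; later rows must agree.
        if lookup.setdefault(det, dep_values) != dep_values:
            raise ValueError(f"{determinant}={det!r} maps to conflicting values")
        main.append({k: v for k, v in row.items() if k not in dependents})
    return main, lookup


# Hypothetical rows: part_id is the primary key.
parts = [
    {"part_id": "P1", "country": "Country A", "state": "State A", "city": "City A"},
    {"part_id": "P2", "country": "Country A", "state": "State A", "city": "City A"},
    {"part_id": "P3", "country": "Country B", "state": "State B", "city": "City B"},
]

main, countries = split_transitive(parts, "country", ["state", "city"])

# Joining the two tables on country reconstructs the original rows.
rejoined = [{**row, **countries[row["country"]]} for row in main]

print(len(countries), rejoined == parts)
```

Three part rows shrink to two country entries, which is the redundancy that 3NF removes.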