Normalisation

  • Normalisation is a step towards DB optimisation
  • Normalisation is used to minimise the redundancy from a relations.
  • It is also used to eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.
  • Normalisation divides the composite attributes into individual attributes OR larger table into smaller and links them using relationships.

Functional Dependency (FD)

  • It’s a relationship between the primary key attribute (usually) of the relation to that of the other attribute of the relation.
  • X -> Y, the left side of FD is known as a Determinant, the right side of the production is known as a Dependent.
  • Types of FD
    1. Trivial FD
      • A → B has trivial functional dependency if B is a subset of A.
      • A → A, B → B are also Trivial FD.
    2. Non-trivial FD
      1. A → B has a non-trivial functional dependency if B is not a subset of A. [i.e A intersection B is NULL].
  • Rules of FD (Armstrong’s axioms)
    • Reflexive
      1. If ‘A’ is a set of attributes and ‘B’ is a subset of ‘A’. Then, A→ B holds.
      2. If A ⊇ B then A → B.
    • Augmentation
      1. If B can be determined from A, then adding an attribute to this functional dependency won’t change anything.
      2. If A→ B holds, then AX→ BX holds too. ‘X’ being a set of attributes.
    • Transitivity
      1. If A determines B and B determines C, we can say that A determines C.
      2. if A→ B and B→ C then A→ C.

Why Normalisation?

  • To avoid redundancy in the DB, not to store redundant data.
  • Issue if we have redundant data
    • Insertion, deletion and updation anomalies arises

Anomalies

  • Anomalies means abnormalities
  • There are three types of anomalies introduced by data redundancy
    • Insertion anomaly
      • When certain data (attribute) can not be inserted into the DB without the presence of other data.
    • Deletion anomaly
      • The delete anomaly refers to the situation where the deletion of data results in the unintended loss of some other important data.
    • Updation anomaly (or modification anomaly)
      • The update anomaly is when an update of a single data value requires multiple rows of data to be updated.
      • Due to updation to many places, may be Data inconsistency arises, if one forgets to update the data at all the intended places.
  • Due to these anomalies
    • DB size increases
    • DB performance become very slow

Types of Normal forms

  1. First Normal Form (1NF)
    1. Every relation cell must have atomic value.
    2. Relation must not have multi-valued attributes
  2. Second Normal Form (2NF)
    1. Relation must be in 1NF
    2. There should not be any partial dependency
      1. All non-prime attributes must be fully dependent on PK.
      2. Non prime attribute can not depend on the part of the PK.
  3. Third Normal Form (3NF)
    1. Relation must be in 2NF
    2. No transitivity dependency exists.
      1. Non-prime attribute should not find a non-prime attribute
  4. Boyce Codd normal form (BCNF)
    1. BCNF is the advance version of 3NF. It is stricter than 3NF.
    2. Relation must be in 3NF
    3. FD: A -> B, A must be a super key.
      1. We must not derive prime attribute from any prime or non-prime attribute.

Advantages of Normalisation

  1. Normalisation helps to minimise data redundancy.
  2. Greater overall database organisation.
  3. Data consistency is maintained in DB.