Data purging simply means deleting data from the DW (data warehouse).
Sometimes, while loading data into a staging or target table,
you need to load fresh data every time (called a full
load). In this case you need to purge the complete
stage/target table before loading it with fresh data.
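A minimal sketch of the full-load purge described above, using Python's built-in sqlite3 module as a stand-in for the warehouse. The table `stg_sales` and its columns are hypothetical. SQLite has no TRUNCATE, so an unqualified DELETE empties the table; databases such as Oracle, SQL Server, and PostgreSQL would typically use TRUNCATE TABLE instead.

```python
import sqlite3

def full_load(conn, table, rows):
    """Purge the whole staging table, then load it with fresh rows.

    Both steps run in one transaction, so a failed load leaves the old data intact.
    `table` is assumed to be a trusted identifier (it is interpolated into the SQL),
    and the (id, amount) columns belong to the hypothetical stg_sales table.
    """
    with conn:  # commits on success, rolls back on any exception
        conn.execute(f"DELETE FROM {table}")  # purge: remove every existing row
        conn.executemany(f"INSERT INTO {table} (id, amount) VALUES (?, ?)", rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_sales (id INTEGER, amount REAL)")
full_load(conn, "stg_sales", [(1, 10.0), (2, 20.0)])
full_load(conn, "stg_sales", [(3, 30.0)])  # second run replaces the data, it does not append
print(conn.execute("SELECT id, amount FROM stg_sales").fetchall())  # [(3, 30.0)]
```

Wrapping the purge and the load in a single transaction is the design choice that matters here: readers never see an empty or half-loaded staging table if the load fails.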
Data purging is just deleting the old data from the tables
in the database.
In my opinion, data purging is not needed for transaction
tables/records such as sales, purchases, etc.
Suppose you have a database application that records
security information such as logins, logouts, modifications to
the data, or audit trails/history of modifications. This
information will obviously require additional
space. Since it is used by security administrators only for
investigation purposes, you can take a backup of it
and then delete it.
Beyond that, it depends on your requirements: if at any given
point in time you must retain data for the last N months,
keep those months and delete the rest.
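The "back up, then delete, keep the last N months" approach can be sketched as follows, again with sqlite3 as a stand-in. The tables `audit_log` and `audit_log_archive` are hypothetical, dates are assumed to be stored as ISO `YYYY-MM-DD` strings (which compare correctly as text), and "last N months" is assumed to mean the current month plus the N preceding whole months. In practice the archive would usually live in a separate database or backup file rather than in the same one.

```python
import sqlite3
from datetime import date

def months_ago(today: date, n: int) -> date:
    """First day of the calendar month that is n months before today's month."""
    total = today.year * 12 + (today.month - 1) - n
    return date(total // 12, total % 12 + 1, 1)

def archive_and_purge(conn, retain_months, today):
    """Copy audit rows older than the retention window to the archive, then delete them."""
    cutoff = months_ago(today, retain_months).isoformat()
    with conn:  # backup and delete commit (or roll back) together
        # 1. Back up the old rows.
        conn.execute(
            "INSERT INTO audit_log_archive SELECT * FROM audit_log WHERE event_date < ?",
            (cutoff,),
        )
        # 2. Delete exactly the rows that were just archived.
        cur = conn.execute("DELETE FROM audit_log WHERE event_date < ?", (cutoff,))
    return cur.rowcount  # number of rows purged

conn = sqlite3.connect(":memory:")
for t in ("audit_log", "audit_log_archive"):
    conn.execute(f"CREATE TABLE {t} (id INTEGER, event TEXT, event_date TEXT)")
conn.executemany("INSERT INTO audit_log VALUES (?, ?, ?)", [
    (1, "login", "2023-01-15"),
    (2, "logout", "2023-06-30"),
    (3, "update", "2024-03-02"),
])
purged = archive_and_purge(conn, retain_months=6, today=date(2024, 5, 20))
print(purged)  # 2 (cutoff is 2023-11-01)
```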
Occasionally, it is necessary to remove large amounts of
data from a data warehouse. A very common scenario is the
rolling window, in which older data is
rolled out of the data warehouse to make room for new data.
However, sometimes other data might need to be removed from
a data warehouse. Suppose that a retail company has
previously sold products from Company ABC, and that Company
ABC has subsequently gone out of business. The business
users of the warehouse may decide that they are no longer
interested in seeing any data related to Company ABC, so
this data should be deleted.
This process is data purging.
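A minimal sketch of the Company ABC scenario, assuming a simple star schema with a hypothetical `sales` fact table and a `products` dimension that records each product's supplier. Fact rows are deleted before the dimension rows so that no sale is left pointing at a deleted product. Large warehouses often do this in batches or with partition operations; this sketch uses plain DELETE statements.

```python
import sqlite3

def purge_supplier(conn, supplier):
    """Delete every sale and product row related to one supplier; return (facts, dims) deleted."""
    with conn:  # both deletes succeed or neither does
        facts = conn.execute(
            "DELETE FROM sales WHERE product_id IN "
            "(SELECT product_id FROM products WHERE supplier = ?)",
            (supplier,),
        ).rowcount
        dims = conn.execute(
            "DELETE FROM products WHERE supplier = ?", (supplier,)
        ).rowcount
    return facts, dims

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, supplier TEXT)")
conn.execute("CREATE TABLE sales (sale_id INTEGER, product_id INTEGER, qty INTEGER)")
conn.executemany("INSERT INTO products VALUES (?, ?)", [(1, "ABC"), (2, "XYZ")])
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [(10, 1, 5), (11, 2, 3), (12, 1, 7)])
result = purge_supplier(conn, "ABC")
print(result)  # (2, 1)
```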
The purge process moves data between, and deletes data from, three categories of data, or data sets:
The current data set, which contains data that needs to be available to users. Users can change or review the data.
The history data set, which contains data that can only be reviewed or aggregated by users. Users can't change the data.
The dormant data set, which contains data that falls out of the availability threshold but needs to be stored to meet legal or business requirements. This data is archived, so it can't be changed, reviewed, or aggregated by users.
The purge process is configurable and dynamic, so new categories or subcategories can be added to accommodate changes in legal and business requirements. For example, the current data set could be further classified into read-only data and updateable data. By hosting the read-only data on read-only filegroups, you can improve SQL Server performance.
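To illustrate the configurable current/history/dormant classification, here is a hedged sketch that assigns each record to a data set by the age of its last activity. The thresholds (90 days, 2 years, 7 years) and the `Record` type are hypothetical, since the source gives no concrete values; records older than every tier are marked for purging. Because the tiers are plain configuration, adding a subcategory is just a matter of inserting another entry.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical tier configuration: (data set name, maximum age in days).
# Order matters: a record lands in the first tier whose limit it does not exceed.
TIERS = [
    ("current", 90),
    ("history", 365 * 2),
    ("dormant", 365 * 7),
]

@dataclass
class Record:
    key: int
    last_activity: date

def classify(record, today, tiers=TIERS):
    """Return the data set a record belongs to, or 'purge' if it outlived every tier."""
    age = (today - record.last_activity).days
    for name, max_age in tiers:
        if age <= max_age:
            return name
    return "purge"

def plan_purge(records, today, tiers=TIERS):
    """Group record keys by target data set; the 'purge' group holds rows to delete."""
    plan = {}
    for r in records:
        plan.setdefault(classify(r, today, tiers), []).append(r.key)
    return plan

today = date(2024, 1, 1)
records = [
    Record(1, date(2023, 12, 1)),
    Record(2, date(2022, 6, 1)),
    Record(3, date(2019, 1, 1)),
    Record(4, date(2010, 1, 1)),
]
plan = plan_purge(records, today)
print(plan)  # {'current': [1], 'history': [2], 'dormant': [3], 'purge': [4]}
```

Splitting the current data set into updateable and read-only subcategories would only require replacing `("current", 90)` with, for example, `("updateable", 30)` and `("read_only", 90)`.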
In Oracle, use the PURGE statement to remove a table or index from your
recycle bin and release all of the space associated with the
object, to remove the entire recycle bin, or to remove
part or all of a dropped tablespace from the recycle bin.
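The PURGE forms listed above can be summarized with a small, hypothetical helper that only assembles the statement text following Oracle's documented syntax (PURGE TABLE/INDEX, PURGE RECYCLEBIN or DBA_RECYCLEBIN, PURGE TABLESPACE with an optional USER clause). It is not part of any library and does not execute anything; you would pass the resulting string to your Oracle driver's cursor. Names are interpolated directly, so they are assumed to be trusted identifiers.

```python
def purge_statement(kind, name=None, user=None):
    """Build the text of an Oracle PURGE statement (does not execute it)."""
    kind = kind.upper()
    if kind in ("TABLE", "INDEX"):
        # Remove one dropped object from the recycle bin and release its space.
        if not name:
            raise ValueError(f"PURGE {kind} needs an object name")
        return f"PURGE {kind} {name}"
    if kind in ("RECYCLEBIN", "DBA_RECYCLEBIN"):
        # Empty your own recycle bin, or (with sufficient privileges) everyone's.
        return f"PURGE {kind}"
    if kind == "TABLESPACE":
        if not name:
            raise ValueError("PURGE TABLESPACE needs a tablespace name")
        # Optional USER clause limits the purge to one user's objects in that tablespace.
        return f"PURGE TABLESPACE {name}" + (f" USER {user}" if user else "")
    raise ValueError(f"unsupported PURGE target: {kind}")

print(purge_statement("table", "sales_2019"))      # PURGE TABLE sales_2019
print(purge_statement("tablespace", "users", "scott"))  # PURGE TABLESPACE users USER scott
```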
Information sources may be unreliable and may purge data. In
contrast, information in the warehouse is under the
control of the warehouse users; it can be stored safely and
reliably for as long as necessary.