The Customize Windows

Technology Journal

By Abhishek Ghosh February 21, 2021 6:44 pm Updated on February 21, 2021

What Does Data Cleansing Mean?

Data cleansing covers various methods for removing and correcting data errors in databases or other information systems. Such errors include incorrect data (wrong from the start or outdated), redundant data, inconsistent data, and incorrectly formatted data. Key steps in data cleansing are duplicate detection (finding and merging records that describe the same thing) and data fusion (merging and completing patchy data). Data cleansing helps to improve information quality. However, information quality also depends on many other characteristics of data sources (credibility, relevance, availability, costs, etc.) that data cleansing cannot improve.

 

Data Cleansing Process

The process of cleaning up the data is divided into five successive steps:

  • Make a backup copy of the file/table
  • Data Quality – Setting Data Requirements
  • Analysis of the data
  • Standardization
  • Cleanup of the data

Data Quality Requirements

High-quality and reliable data must meet certain requirements:

  • valid data: same data type, values within defined maximums, etc.
  • complete data
  • uniform data: same unit (e.g. currency, weight, length)
  • data with integrity: the data must be protected from intentional and/or unintentional manipulation.
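
As a rough illustration, the first three requirements can be expressed as automated checks. The sketch below assumes a hypothetical order record with the fields `customer`, `amount` and `currency`; the limits and the allowed currency are invented for the example. Integrity (protection against manipulation) is an access-control concern and is not covered here.

```python
# Hypothetical rules for a hypothetical "order" record; adjust to your own schema.
MAX_AMOUNT = 10_000.0
ALLOWED_CURRENCIES = {"EUR"}
REQUIRED_FIELDS = ("customer", "amount", "currency")


def check_record(record):
    """Return a list of requirement violations found in one record (a dict)."""
    problems = []

    # Completeness: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")

    # Validity: the amount must be numeric and must not exceed the maximum.
    amount = record.get("amount")
    if amount not in (None, ""):
        if isinstance(amount, bool) or not isinstance(amount, (int, float)):
            problems.append("amount is not numeric")
        elif amount > MAX_AMOUNT:
            problems.append("amount exceeds maximum")

    # Uniformity: all amounts must be given in the same unit (one currency).
    currency = record.get("currency")
    if currency not in (None, "") and currency not in ALLOWED_CURRENCIES:
        problems.append("unexpected currency")

    return problems
```

Running `check_record` over every row gives a simple report for the analysis step: records with an empty result meet the stated requirements, all others need attention.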

Analysis of data

Once the requirements have been clarified, the data must be checked, for example with the help of checklists, to determine whether it has the required quality.

Standardization of data before cleanup

For a successful cleanup, the data must first be structured and then standardized. Structuring brings the data into a uniform format: for example, a date is converted into a uniform date format (01.09.2009), or composite data is broken down into its components, e.g. a customer’s name into Salutation, Title, First Name and Last Name. In most cases, such structuring is not trivial and is carried out with the help of complex parsers.
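
A deliberately simplified sketch of both structuring examples follows. The list of input date formats and the sets of salutations and titles are assumptions made for illustration; real-world names and dates are far messier, which is why production systems rely on dedicated parsers.

```python
from datetime import datetime

# Assumed input formats; a real data set may need more, and some are ambiguous.
DATE_FORMATS = ("%d.%m.%Y", "%Y-%m-%d", "%d/%m/%Y")
# Assumed vocabularies for the name parser.
SALUTATIONS = {"Mr.", "Mrs.", "Ms."}
TITLES = {"Dr.", "Prof."}


def to_uniform_date(text):
    """Try each known format and return the date as DD.MM.YYYY, or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%d.%m.%Y")
        except ValueError:
            continue
    return None


def split_name(full_name):
    """Break a name into salutation, title, first name and last name."""
    parts = full_name.split()
    result = {"salutation": "", "title": "", "first_name": "", "last_name": ""}
    if parts and parts[0] in SALUTATIONS:
        result["salutation"] = parts.pop(0)
    if parts and parts[0] in TITLES:
        result["title"] = parts.pop(0)
    if parts:
        # Naive assumption: the last remaining word is the last name.
        result["last_name"] = parts.pop()
    result["first_name"] = " ".join(parts)
    return result
```

Values that `to_uniform_date` cannot parse come back as `None`, so they can be routed to manual review instead of being guessed.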

During standardization, the existing values are mapped to a standardized value list. This can be done, for example, for academic titles or company name suffixes.
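
Mapping to a value list can be as simple as a lookup table. The following sketch uses a hypothetical list of academic titles; the spellings in `TITLE_MAP` are assumptions, not a complete or authoritative list.

```python
# Hypothetical standardized value list for academic titles.
TITLE_MAP = {
    "dr": "Dr.",
    "doctor": "Dr.",
    "prof": "Prof.",
    "professor": "Prof.",
}


def standardize(raw, mapping):
    """Map a raw value to its standardized form, or None if it is unknown."""
    # Normalize case, surrounding whitespace and a trailing dot before lookup.
    key = raw.strip().rstrip(".").lower()
    return mapping.get(key)
```

Returning `None` for unknown values, rather than passing them through unchanged, makes gaps in the value list visible so that the list can be extended.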

Cleaning up data

There are six methods to clean up the data that can be applied individually or in combination:

  • Derive from other data: The correct values are derived from other data (e.g. salutation from the gender).
  • Replace with other data: The corrupted data is replaced by other data (e.g. from other systems).
  • Use default values: Default values are used instead of the incorrect data.
  • Remove incorrect data: The data is filtered out and not further processed.
  • Remove duplicates: Duplicates are identified through duplicate detection, the non-redundant data is consolidated from the duplicates, and a single data set is formed from them.
  • Split summary: In contrast to the removal of duplicates, incorrectly summarized data is separated again.
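
Several of these methods can be combined in one pass. The sketch below is only an illustration under stated assumptions: records are plain dictionaries, the e-mail address is assumed to identify a customer uniquely, and the gender-to-salutation mapping and the default country are invented for the example.

```python
SALUTATION_BY_GENDER = {"f": "Ms.", "m": "Mr."}  # assumed mapping
DEFAULT_COUNTRY = "unknown"  # assumed default value


def clean(records):
    """Filter, de-duplicate and complete a list of customer dicts."""
    merged = {}
    for rec in records:
        # Remove incorrect data: skip records without a usable e-mail address.
        email = (rec.get("email") or "").strip().lower()
        if "@" not in email:
            continue
        # Remove duplicates: records with the same e-mail are consolidated
        # into one, each filling the gaps left by the others.
        target = merged.setdefault(email, {"email": email})
        for key, value in rec.items():
            if key != "email" and value and not target.get(key):
                target[key] = value

    for rec in merged.values():
        # Derive from other data: salutation from the gender.
        if not rec.get("salutation"):
            rec["salutation"] = SALUTATION_BY_GENDER.get(rec.get("gender"), "")
        # Use default values for fields that are still empty.
        if not rec.get("country"):
            rec["country"] = DEFAULT_COUNTRY
    return list(merged.values())
```

Defaults are applied only after consolidation, so a placeholder never hides a real value that another duplicate could have supplied.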

Storage of the faulty data

Before cleaning up the data, save a copy of the original, erroneous data, and do not simply delete it after the cleanup. Otherwise the corrections cannot be traced, and the process is not audit-proof.

An alternative is to store the corrected value in an additional column. Because this requires additional disk space, the approach is recommended only when few columns of a record need correcting. Another option is to store the corrected record in an additional row, which increases the storage requirement even more, so it is practical only when a small number of records need correcting. When a large number of columns and rows need correcting, the last option is to create a separate table.
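
To make two of these options concrete, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The table and column names (`customer`, `city_corrected`, `correction`) are hypothetical and chosen only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, city TEXT)")
conn.execute("INSERT INTO customer (id, city) VALUES (1, 'Berlni')")

# Option: keep the faulty value and store the correction in an extra column.
conn.execute("ALTER TABLE customer ADD COLUMN city_corrected TEXT")
conn.execute(
    "UPDATE customer SET city_corrected = ? WHERE id = ?", ("Berlin", 1)
)

# Option: record every correction in a separate table instead.
conn.execute(
    """CREATE TABLE correction (
           table_name TEXT, row_id INTEGER, column_name TEXT,
           old_value TEXT, new_value TEXT)"""
)
conn.execute(
    "INSERT INTO correction VALUES (?, ?, ?, ?, ?)",
    ("customer", 1, "city", "Berlni", "Berlin"),
)

# The original value is still available next to the corrected one.
row = conn.execute(
    "SELECT city, city_corrected FROM customer WHERE id = 1"
).fetchone()
```

In both variants the original value stays available, which is what keeps the cleanup traceable.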
