
The Customize Windows

Technology Journal


By Abhishek Ghosh August 27, 2018 3:54 pm Updated on August 27, 2018

Knowledge Discovery in Databases : Part II


In Part I of Knowledge Discovery in Databases, we discussed database systems, fundamentals of statistics and Big Data, and the fundamentals of knowledge discovery in databases. In this second part, we discuss the process of Knowledge Discovery in Databases (KDD) and its methods. Overall, knowledge discovery in databases is an effective way to extract knowledge from data: as more data is collected from all areas, more knowledge can be extracted, and the new knowledge can then be used for further evaluations. However, the statistical basis of any newly found knowledge must be verified in order to avoid serious mistakes later.

 

Knowledge Discovery in Databases : Process of the KDD

 

Data selection

The first step is to gain an understanding of the application and of the application knowledge that is already available. Based on this, a goal is defined for reaching previously unknown knowledge; the desired knowledge must also add useful value to the application. In this first process step, the data in which the knowledge is to be searched is selected. In the simplest case, we access an existing database, within which different tables can be selected. If no database is available, the data must be collected manually during data selection, for example through surveys or similar methods.
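As a minimal sketch of this data selection step, consider restricting a KDD process to one subset of an existing database. The table, columns, and values below are invented for illustration, not taken from the article:

```python
# Hypothetical example: selecting the data in which knowledge is to be
# searched. Table and column names are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 120.0), ("south", 80.0), ("north", 95.5)])

# Data selection: restrict the KDD process to one subset (here: one region).
subset = conn.execute(
    "SELECT region, amount FROM sales WHERE region = ?", ("north",)
).fetchall()
print(subset)  # [('north', 120.0), ('north', 95.5)]
```

Selecting such subsets with a query, rather than copying data into a separate file, is exactly the kind of functionality that makes integrating KDD with database systems attractive.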


After data collection, proper management of the data is an advantage. Data mining is often performed on a file created specifically for this purpose, but because most data is managed in commercial database systems, this creates redundancy. A database system, on the other hand, offers functionality that can be used profitably in all process steps; for example, subsets can be selected easily and efficiently. It is therefore increasingly desirable to integrate the KDD process with commercial database systems.

Preprocessing

The goal of this process step is to integrate the required data and to create consistency. Missing attribute values are also filled in so that no gaps falsify the data mining process. Preprocessing, together with the transformation step, usually generates the greatest effort within a KDD process. This effort can be reduced by the use of a data warehouse: a durable, integrated collection of data from different sources, maintained for the purpose of analysis and decision support.

When data is obtained in different ways from different sources, it must be integrated, because different names may have been used for the same attribute. In addition, inconsistencies such as conflicting values of the same attribute or spelling errors must be resolved.
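A small sketch of such an integration step, assuming two hypothetical sources that use different names ("zip" vs. "postal_code") for the same attribute and inconsistent capitalization:

```python
# Sketch: integrating records from two sources that name the same attribute
# differently; all source names and values here are invented.
SOURCE_A = [{"name": "Alice", "zip": "10115"}]
SOURCE_B = [{"name": "bob", "postal_code": "10117"}]

# Map every source-specific attribute name onto one canonical name.
CANONICAL = {"zip": "postal_code", "postal_code": "postal_code", "name": "name"}

def integrate(record):
    # Rename attributes and resolve obvious inconsistencies (here: case).
    out = {CANONICAL[k]: v for k, v in record.items()}
    out["name"] = out["name"].title()
    return out

merged = [integrate(r) for r in SOURCE_A + SOURCE_B]
print(merged)
# [{'name': 'Alice', 'postal_code': '10115'},
#  {'name': 'Bob', 'postal_code': '10117'}]
```

In practice the canonical mapping itself encodes application knowledge and usually has to be built by hand.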

A survey can also introduce so-called noise, whereby a random pattern superimposes itself on the actual patterns. Such noise usually arises from measurement errors or from intentionally unanswered questions. Depending on the algorithm used, missing attribute values must also be specified more precisely; for example, a distinction can be made between a measurement that was not performed and a measurement error.
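The distinction between "not measured" and "measurement error" can be sketched as follows; the sentinel values and the mean-imputation policy are illustrative choices, not prescribed by the KDD process:

```python
# Sketch of preprocessing missing values, distinguishing "measurement not
# performed" (None) from "measurement error" (a sentinel). Values invented.
NOT_MEASURED = None
ERROR = "ERR"

readings = [4.0, NOT_MEASURED, 5.0, ERROR, 3.0]

# Mean of the valid readings, used to fill the gaps.
valid = [v for v in readings if isinstance(v, float)]
mean = sum(valid) / len(valid)  # 4.0

# Errors are dropped; unmeasured values are imputed so that no gaps
# falsify the later data mining step.
cleaned = [mean if v is NOT_MEASURED else v for v in readings if v != ERROR]
print(cleaned)  # [4.0, 4.0, 5.0, 3.0]
```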

Transformation

In this step, the preprocessed data is transformed into a representation suitable for knowledge discovery in databases. Not all known attributes of the data are relevant for the data mining process; one of the typical transformations is therefore attribute selection.

Although many algorithms already make their own selection of attributes, too many attributes can affect both the efficiency and the result of the data mining. Manual attribute selection is advantageous if there is sufficient application knowledge about the meaning of the attributes and the given data mining task. Otherwise, an automatic attribute selection must be performed.

An exhaustive algorithm that considers all subsets of the set of attributes is too expensive, so heuristic algorithms are used instead. Typically, starting from the empty set (or from the total set) of attributes, the attribute is greedily added (or removed) that achieves the best score for the resulting attribute set with respect to the given data mining task.
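The greedy forward variant of this heuristic can be sketched as follows. The score function here is a toy stand-in for a real quality measure of a data mining result, and the attribute names and usefulness values are invented:

```python
# Sketch of greedy forward attribute selection: start from the empty set and
# repeatedly add the attribute that most improves the score.
def toy_score(attrs):
    # Hypothetical usefulness per attribute, penalized for set size.
    usefulness = {"age": 3.0, "income": 2.0, "zip": 0.5}
    return sum(usefulness[a] for a in attrs) - 0.6 * len(attrs) ** 2

def forward_select(candidates, score):
    selected, best = set(), score(set())
    improved = True
    while improved:
        improved = False
        for a in candidates - selected:
            s = score(selected | {a})
            if s > best:
                best, best_attr, improved = s, a, True
        if improved:
            selected.add(best_attr)
    return selected

print(forward_select({"age", "income", "zip"}, toy_score))
# {'age', 'income'}
```

The backward variant works symmetrically, starting from the total set and removing the attribute whose removal improves the score most.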

Some data mining algorithms cannot process numeric attributes, only categorical ones. This requires a discretization, that is, a transformation of numeric attributes into categorical attributes. Simple methods divide the range of values into intervals of equal length, or into intervals containing equally many attribute values. More complex methods take class membership into account and form intervals that maximize information about class affiliation: attribute values of objects of the same class are assigned to the same interval as far as possible.
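The two simple discretization methods mentioned above can be sketched in a few lines; the sample values and the interval count are invented:

```python
# Sketch of equal-width and equal-frequency discretization of a numeric
# attribute into k categorical interval indices.
values = [1, 2, 3, 4, 10, 11, 12, 100]

def equal_width(vals, k):
    lo, hi = min(vals), max(vals)
    w = (hi - lo) / k
    # Map each value to an interval index 0..k-1 of equal length.
    return [min(int((v - lo) / w), k - 1) for v in vals]

def equal_frequency(vals, k):
    order = sorted(range(len(vals)), key=lambda i: vals[i])
    bins = [0] * len(vals)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(vals)  # same count of values per interval
    return bins

print(equal_width(values, 2))      # [0, 0, 0, 0, 0, 0, 0, 1]
print(equal_frequency(values, 2))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Note how the outlier 100 dominates the equal-width split, while the equal-frequency split is insensitive to it; this is the usual trade-off between the two methods.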

Data Mining

Data mining is the application of efficient algorithms that find the valid patterns contained in a database. The two terms data mining and knowledge discovery in databases are, however, now often used synonymously.

Data mining is a collective term for various computer-aided procedures used to analyze large databases. Scientists define it as follows: data mining is a problem-solving methodology that finds a logical or mathematical description, possibly of a complex nature, of patterns and regularities in a set of data. Data mining thus aims to find patterns in a database that can be represented using logical or mathematical descriptions. In contrast to traditional statistical methods, which are used to validate given hypotheses, data mining offers the possibility of automatically generating new hypotheses.

First, the relevant data mining task is identified. Clustering, and the related discovery of outliers, pursues the goal of dividing the database into groups: the objects within a group should be as similar as possible, and objects of different groups as dissimilar as possible. Outliers are objects that cannot be assigned to any group.
Classification starts from training objects whose attribute values are already assigned to a class. A function is then derived that assigns future objects to classes based on their attribute values.
The goal of generalization is to describe a set of data as compactly as possible by generalizing the attribute values; at the same time, the number of records is reduced.
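The classification task described above can be sketched with a minimal one-nearest-neighbour rule over a single numeric attribute; the training objects and class labels are invented for illustration:

```python
# Sketch of classification: training objects with known classes yield a
# function that assigns classes to future objects. Data is hypothetical.
training = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]

def classify(value):
    # Assign the class of the closest training object (1-nearest-neighbour).
    return min(training, key=lambda t: abs(t[0] - value))[1]

print(classify(1.5))  # low
print(classify(7.0))  # high
```

Real classifiers work over many attributes and more robust decision functions, but the principle of deriving a class-assigning function from labeled training objects is the same.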

Based on the application objective and the type of data, a suitable algorithm is subsequently selected. Data with categorical attributes requires a different algorithm than data with numeric attributes.

Interpretation

In the last step of knowledge discovery in databases, the patterns found are presented appropriately. Many found patterns, or many used attributes, complicate this task; a visualization is often more helpful than a textual output. Once the representation of the found patterns has been optimized, the patterns are evaluated by an expert, who draws on existing application knowledge in relation to the initially defined goals.

If, according to the expert, not all goals have been reached, the knowledge discovery process has to be run through again; any entry point can be chosen. For example, the data mining step can be repeated with the same algorithm but other parameters. Once the expert declares the evaluation successful, the knowledge found is documented. The newly acquired knowledge can then serve as a basis for future knowledge discovery processes in order to develop further knowledge.


 

Overview of Methods of Knowledge Discovery in Databases

 

The methods used refer to the data mining step within the KDD process. The basics have already been touched on above and will be revisited and expanded in this series. One of the main purposes of these methods is to identify patterns in data; for this, a number of algorithms are used which often have their origins in mathematics and statistics. Common methods are listed below and will be explained in detail in the next part of this series.

  1. Generalization
  2. Clustering – Cluster analysis with multivariate statistical methods, Clustering with Artificial Neural Networks
  3. Classification
  4. Association analysis
  5. Regression analysis

 

Conclusion of Part II

 

This part revolved mostly around data science. The topic of Knowledge Discovery in Databases is too big to fit within one or two articles. In the next article, we will go into the details of the methods of Knowledge Discovery in Databases, give application examples, and draw conclusions.


Abhishek Ghosh

About Abhishek Ghosh

Abhishek Ghosh is a Businessman, Surgeon, Author and Blogger. You can keep touch with him on Twitter - @AbhishekCTRL.


About This Article

Cite this article as: Abhishek Ghosh, "Knowledge Discovery in Databases : Part II," in The Customize Windows, August 27, 2018, April 2, 2023, https://thecustomizewindows.com/2018/08/knowledge-discovery-in-databases-part-ii/.

Source:The Customize Windows, JiMA.in
