The Customize Windows

Technology Journal


By Abhishek Ghosh August 3, 2019 6:52 pm Updated on August 4, 2019

Uses of Text Mining in Web Content Mining : Part I

The English philosopher Francis Bacon said “Knowledge is power”. It is as true today as it was almost 420 years ago. In the past, however, acquiring knowledge took either luck or tremendous diligence, whereas the technological advances of recent years have brought a true heyday. Where people once had to work through piles of books or invest enormous effort in research, information technologies, which are gaining importance in the course of globalization, now offer this knowledge on a silver platter. The computer plays a major role in this context. Initially used to solve complex computing tasks, it is now used by billions of people worldwide for all sorts of private and professional activities.

The resulting flood of data offers enormous potential. Since the early 1990s, efforts have been made to examine this data with software in order to derive insights that were previously unrecognizable. This process is referred to as knowledge discovery and is divided into categories based on the type of data (e.g. tables or text) and its source (Internet, intranet, etc.). Because the topic is increasingly relevant to both science and business, this series of articles examines one of these disciplines, Text Mining, and presents its possible applications in so-called Web Content Mining.

Text Mining

The term text mining first appeared in 1999 as part of a Pacific Asia Knowledge Discovery and Data Mining workshop. Although this work is now roughly two decades old, there is still no clear definition. According to Mehler and Wolff, there are a total of six different names for this research field, which differ depending on the task at hand:

  • Text Mining
  • Text Data Mining
  • Textual Data Mining
  • Text Knowledge Engineering
  • Knowledge Discovery in Texts
  • Knowledge Discovery in Textual Databases

Four perspectives on text mining can be derived from this variety of terms :

  1. The information retrieval perspective, which considers that text mining is merely a further development of information retrieval (IR).
  2. The data mining perspective, which sees text mining as data mining on textual data.
  3. The methodological perspective, which defines text mining simply as a collection of methods for evaluating texts.
  4. The knowledge-oriented perspective, which aims to generate new insights and information from existing data.

Generally speaking, IR only describes how to find existing knowledge, not how to gain new insights, which is what text mining commonly aims for. For this reason, the IR perspective is no longer pursued today.

The remaining perspectives have one thing in common: text mining is seen as a discipline of Knowledge Discovery. This rests on the premise of gaining new insights from data, or of providing information that users did not previously know was contained in the processed data.

Distinction from Data Mining

Text Mining can be classified as part of Knowledge Discovery (KD). Data mining and text mining are not identical steps within KD; rather, text mining can be seen as an extension of data mining. Data mining requires the data to have a certain structure, while text mining usually deals with weakly structured and unstructured data.

While data mining usually consists of three phases (identification, preparation & function selection, distribution analysis), text mining extends this process with an additional step: filtering out specific features from weakly structured or unstructured data.
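
As a rough illustration of this extra feature-extraction step, the following minimal Python sketch turns a piece of unstructured text into a structured table of term frequencies. The function name `extract_term_features` and the sample sentence are made up for this example; real text mining pipelines use far richer features.

```python
import re
from collections import Counter


def extract_term_features(text):
    """Turn unstructured text into a structured term-frequency table."""
    # Lowercase the text and keep only runs of letters as tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


# Hypothetical sample document.
doc = "Text mining extends data mining. Text is unstructured data."
features = extract_term_features(doc)
print(features.most_common(3))
```

Once the text is in this tabular form, the usual data mining phases can in principle be applied to it.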

Web Mining

Web Mining describes the process of applying data mining techniques to the Internet (World Wide Web). It follows that web mining, like data mining and text mining, is a sub-discipline of knowledge discovery (KDD) and can therefore be used to extract previously unknown patterns and new insights from data on the Internet.

The initial assumption that the Internet was too unstructured for data mining techniques has largely been refuted. The main problem in web mining is the so-called “labeling problem”: by nature, most data mining techniques require some kind of labeling of the data, such as whether a web page is a homepage or not. The web mining process itself consists of four sub-processes:

  1. Resource discovery, i.e. obtaining data that is available either online or offline, comparable to IR.
  2. Information selection and pretreatment, i.e. preparing the data found in step 1, for example by removing “stop words”.
  3. Generalization, the step in which data mining techniques are applied to uncover patterns in the data found. Human intervention plays an important role here, since the Web is an interactive medium.
  4. Analysis, i.e. the validation and interpretation of the results.
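
The four sub-processes can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the two HTML snippets stand in for crawled pages, `STOP_WORDS` is a deliberately tiny hypothetical list, and "generalization" is reduced to counting terms.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def discover_resources():
    """Step 1: stand-in for crawling; returns pages already held offline."""
    return {
        "page1.html": "<p>The mining of text is a part of web mining.</p>",
        "page2.html": "<p>Web mining applies data mining to the web.</p>",
    }


def preprocess(html):
    """Step 2: strip tags, lowercase, tokenize and drop stop words."""
    text = re.sub(r"<[^>]+>", " ", html)
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def generalize(token_lists):
    """Step 3: a very simple 'pattern', term counts over all pages."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts


pages = discover_resources()
counts = generalize(preprocess(html) for html in pages.values())
# Step 4: a human inspects, validates and interprets the result.
print(counts.most_common(2))
```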

Web Mining is generally divided into three subcategories that represent the various parts of the Internet that can be examined using the various mining techniques  :

  • Web Usage Mining
  • Web Structure Mining
  • Web Content Mining

Web usage mining describes applying data mining techniques to web data in order to identify usage patterns and to adapt web applications to better suit users’ usage behavior. Web usage mining is divided into three phases :

  1. Pretreatment
  2. Pattern recognition
  3. Pattern analysis

In the pretreatment phase, the existing data is prepared so that it can be processed further by mining techniques. First, the usage data, i.e. data about user and server sessions, is processed. It shows which user visited which website or part of a page, when, and how often. The content (text, images, scripts, multimedia files, etc.) is then converted into a usable format so that specific content can be related to the user’s usage behavior. Finally, the structure of the visited pages is pretreated in a similar way to the content.
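
One common pretreatment task for usage data is grouping individual page requests into user sessions. The sketch below assumes the log has already been parsed into `(user, timestamp, url)` tuples and uses a 30-minute inactivity timeout; both the record layout and the timeout are assumptions made for this example, not something prescribed by the process described above.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold


def sessionize(records):
    """Group (user, timestamp, url) records into per-user sessions."""
    by_user = defaultdict(list)
    for user, ts, url in sorted(records, key=lambda r: (r[0], r[1])):
        by_user[user].append((ts, url))

    sessions = []
    for user, visits in by_user.items():
        current = [visits[0][1]]
        last_ts = visits[0][0]
        for ts, url in visits[1:]:
            # A long gap between two requests starts a new session.
            if ts - last_ts > SESSION_TIMEOUT:
                sessions.append((user, current))
                current = []
            current.append(url)
            last_ts = ts
        sessions.append((user, current))
    return sessions


# Hypothetical, already-parsed log records.
records = [
    ("u1", datetime(2019, 8, 3, 10, 0), "/home"),
    ("u1", datetime(2019, 8, 3, 10, 5), "/articles"),
    ("u2", datetime(2019, 8, 3, 10, 1), "/contact"),
    ("u1", datetime(2019, 8, 3, 12, 0), "/home"),
]
sessions = sessionize(records)
print(sessions)
```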

In the pattern recognition phase, mining techniques are used to bring previously unknown knowledge to light, in line with the KDD approach. Techniques used here include statistical analyses, association rules, clustering, classification, sequential patterns and dependency modeling.
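
To make one of these techniques concrete, here is a minimal sketch of association rules between pairs of pages, using the standard support and confidence measures. The function name `pair_rules`, the sample sessions and the `min_support` threshold are assumptions for illustration; production systems typically use algorithms such as Apriori over larger itemsets.

```python
from collections import Counter
from itertools import combinations


def pair_rules(sessions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for page pairs."""
    n = len(sessions)
    item_counts = Counter()
    pair_counts = Counter()
    for pages in sessions:
        unique = sorted(set(pages))  # count each page once per session
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of sessions containing both pages
        if support >= min_support:
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    return rules


# Hypothetical sessions, each a list of visited pages.
sessions = [
    ["/home", "/articles"],
    ["/home", "/articles", "/contact"],
    ["/home"],
    ["/articles", "/contact"],
]
rules = pair_rules(sessions)
for rule in rules:
    print(rule)
```

In this toy data, every session that contains `/contact` also contains `/articles`, so that rule has a confidence of 1.0.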

Pattern analysis examines the patterns found in phase 2. Essentially, uninteresting patterns are filtered out and the relevant ones are condensed into new findings, for example by means of SQL queries or OLAP operations.
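
A minimal sketch of such SQL-based filtering, using Python's built-in `sqlite3` module. The table name `patterns`, its columns, the sample rows and the thresholds are all hypothetical and chosen only to show the idea.

```python
import sqlite3


def interesting_patterns(rows, min_support, min_confidence):
    """Store mined patterns in an in-memory table and filter them with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patterns ("
        "antecedent TEXT, consequent TEXT, support REAL, confidence REAL)"
    )
    conn.executemany("INSERT INTO patterns VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        "SELECT antecedent, consequent FROM patterns "
        "WHERE support >= ? AND confidence >= ? "
        "ORDER BY confidence DESC",
        (min_support, min_confidence),
    ).fetchall()
    conn.close()
    return result


# Hypothetical output of a pattern recognition step.
rows = [
    ("/contact", "/articles", 0.50, 1.00),
    ("/home", "/articles", 0.50, 0.67),
    ("/contact", "/home", 0.25, 0.50),
]
kept = interesting_patterns(rows, min_support=0.3, min_confidence=0.8)
print(kept)
```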

Interest in web usage mining is not only scientific; businesses have long since recognized the enormous potential of understanding customers better.

Web Structure Mining describes the process of applying data mining techniques to the structure of web data.

The aim of this approach is to gain information about the content based on the structure of web pages and to identify similarities within a collection of data. In general, two types of structures can be assumed :

  1. Intra-page structure, i.e. data that has a certain structure within a page, such as the arrangement of different HTML or XML tags within a web page, which is mainly used in the area of web content mining.
  2. Inter-page structure, i.e. data that has a certain structure between several pages, such as hyperlinks that connect pages together.
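
Both kinds of structure can be read from a page with Python's standard `html.parser` module. In this sketch the class name `StructureParser` and the sample HTML are made up for illustration: tag counts stand for the intra-page structure, and the collected `href` values for the inter-page structure.

```python
from collections import Counter
from html.parser import HTMLParser


class StructureParser(HTMLParser):
    """Collect tag counts (intra-page) and hyperlinks (inter-page)."""

    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()
        self.links = []

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page content.
html = (
    "<html><body><h1>Title</h1>"
    '<p>Text <a href="/part-2">next</a></p><p>More</p>'
    "</body></html>"
)
parser = StructureParser()
parser.feed(html)
print(parser.tag_counts)
print(parser.links)
```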

The beginnings of web structure mining can be found in the area of social network analysis. There, incoming and outgoing links are examined to recognize patterns within the resulting hierarchy.
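
A very simple form of such link analysis is counting incoming and outgoing links per page. The following sketch assumes the link graph is available as a list of `(source, target)` pairs; the page names are hypothetical, and real systems typically go on to more elaborate link-based measures.

```python
from collections import Counter


def link_degrees(edges):
    """Count incoming and outgoing links for every page in a link graph."""
    out_degree = Counter(src for src, _ in edges)
    in_degree = Counter(dst for _, dst in edges)
    return in_degree, out_degree


# Hypothetical hyperlinks as (source page, target page) pairs.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "C")]
in_degree, out_degree = link_degrees(edges)
print(in_degree.most_common(1))   # page with the most incoming links
print(out_degree.most_common(1))  # page with the most outgoing links
```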

Another push in web structure mining came from the assumption that every document, even a text that is unstructured in its own right, shares similar structures with other texts on a similar topic. Together, these two ideas led to the realization that even unstructured data, as frequently found on the Internet, can be analyzed by examining hyperlinks and the use of labels (names). Based on the two types of structure described above, it can further be said that web structure mining is not always strictly separated from web content mining.

Web Content Mining

Web Content Mining focuses on capturing the content of web pages and, based on this, either improving the information available to users in the IR sense or modeling the data in databases so that, for example, search engines can deliver more effective results. From this, two possible perspectives on web content mining can be deduced:

  1. agent-based or IR-view
  2. database view

The first variant comprises three subsets. The first uses intelligent search agents to search for, organize and interpret relevant information based on domain characteristics and user profiles. The second uses agents that filter or categorize information using IR techniques and, similar to web structure mining, examine link structures to create cluster hierarchies. The third uses personalized web agents that can learn user preferences and discover sources of information.
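
As a hedged illustration of how an agent might filter pages against a user profile with a basic IR technique, the sketch below ranks pages by the cosine similarity of plain term-frequency vectors. The function names, the profile text, the page texts and the threshold are all assumptions for this example; real agents use much richer models of content and user preferences.

```python
import math
import re
from collections import Counter


def tf_vector(text):
    """Represent a text as a term-frequency vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def filter_pages(profile, pages, threshold=0.2):
    """Keep pages similar enough to the profile, best match first."""
    p = tf_vector(profile)
    scored = [(cosine(p, tf_vector(text)), url) for url, text in pages.items()]
    return [url for score, url in sorted(scored, reverse=True) if score >= threshold]


# Hypothetical user profile and page texts.
profile = "text mining web content"
pages = {
    "/a": "web content mining with text mining tools",
    "/b": "camera modes for digital photography",
}
relevant = filter_pages(profile, pages)
print(relevant)
```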

The second variant uses either multi-level databases, which arrange the data according to its degree of structure and generalization, or query systems, which, for example, summarize weakly structured data and can build a database from the information found.

Continue reading the next part.

About Abhishek Ghosh

Abhishek Ghosh is a Businessman, Surgeon, Author and Blogger. You can keep in touch with him on Twitter - @AbhishekCTRL.

About This Article

Cite this article as: Abhishek Ghosh, "Uses of Text Mining in Web Content Mining : Part I," in The Customize Windows, August 3, 2019, March 20, 2023, https://thecustomizewindows.com/2019/08/uses-of-text-mining-in-web-content-mining-part-i/.
