The Customize Windows

Technology Journal

By Abhishek Ghosh August 3, 2019 6:52 pm Updated on August 4, 2019

Uses of Text Mining in Web Content Mining : Part I

The English philosopher Francis Bacon said, “Knowledge is power”. This is as true today as it was more than 400 years ago. In the past, however, acquiring knowledge took either luck or tremendous diligence, which is why technological advances have brought about a true heyday in recent years. Where you once had to work through stacks of books or invest an enormous amount of research, information technologies, which are gaining importance in the course of globalization, now offer this knowledge on a silver platter. The computer plays a major role here. Initially used to solve complex computing tasks, it is now used by billions of people worldwide for all sorts of private and professional activities.

The resulting flood of data offers enormous potential. Since the early 1990s, efforts have been made to examine this data with software in order to derive insights that were previously unrecognizable. This process is referred to as knowledge discovery and is divided into categories based on the type of data (e.g. tables or text) and its source (e.g. the Internet or an intranet). Because the topic is increasingly relevant to both science and business, this series of articles examines one of these disciplines, text mining, and presents its possible applications in so-called Web Content Mining.


Text Mining

 

The term text mining first appeared in 1999 as part of a Pacific Asia Knowledge Discovery and Data Mining workshop. Although the term has been in use for about two decades, there is still no clear definition. According to Mehler and Wolff, there are a total of six different names for this research field, which differ depending on the task at hand :

  • Text Mining
  • Text Data Mining
  • Textual Data Mining
  • Text Knowledge Engineering
  • Knowledge Discovery in Texts
  • Knowledge Discovery in Textual Databases

Four perspectives on text mining can be derived from this variety of terms :

  1. The information retrieval perspective, which considers that text mining is merely a further development of information retrieval (IR).
  2. The data mining perspective, which sees text mining as data mining on textual data.
  3. The methodological perspective, which defines text mining simply as a collection of methods for evaluating texts.
  4. The knowledge-oriented perspective, which aims to generate new insights and information from existing data.

Generally speaking, IR only describes ways of finding existing knowledge, not of gaining new insights as text mining commonly does, which is why the first perspective is no longer pursued today.

The remaining perspectives share the view that text mining is a discipline of Knowledge Discovery. This rests on the premise of gaining new insights from data, or of providing information that users did not previously know was contained in the processed data.

 

Delimitation of Data Mining

 

Text Mining can be classified as part of Knowledge Discovery (KD). Data mining and text mining are not identical steps within KD; rather, text mining can be regarded as an extension of data mining. Data mining requires a certain structure in the data, whereas text mining usually extends to weakly structured and unstructured data.

While data mining usually consists of three phases (identification, preparation & function selection, distribution analysis), text mining extends this process with a step that filters out special features from weakly structured or unstructured data.
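The extra feature-extraction step can be illustrated with a minimal sketch. It assumes a simple bag-of-words representation, which is only one of many possible feature schemes; the function names `tokenize` and `term_document_matrix` and the two sample sentences are made up for this illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def term_document_matrix(documents):
    """Turn raw strings into a sorted vocabulary and one count vector per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for terms that do not occur in a document.
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows

# Hypothetical sample documents
docs = [
    "Text mining works on unstructured text.",
    "Data mining works on structured tables.",
]
vocabulary, rows = term_document_matrix(docs)
for doc, row in zip(docs, rows):
    print(row, "<-", doc)
```

Once the text has been turned into count vectors like these, it has the kind of tabular structure that conventional data mining techniques expect.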

Web Mining

Web Mining describes the process of applying data mining techniques to the Internet (World Wide Web). It follows that web mining, like data mining and text mining, is a sub-discipline of Knowledge Discovery in Databases (KDD) and can therefore be used to obtain unknown patterns and new insights from data on the Internet.

The initial assumption that the Internet is too unstructured for data mining techniques has largely been refuted. The main problem in web mining is the so-called “labeling problem”: by nature, most data mining techniques require some kind of labeling of the data, such as whether or not a website is a homepage. The web mining process itself consists of four sub-processes :

  1. Resource discovery, i.e. obtaining data that is available either online or offline, comparable to IR.
  2. Information selection and preprocessing, i.e. preparing the data found in step 1 by, for example, removing “stop words”.
  3. Generalization, the step in which data mining techniques are applied to reveal patterns in the data found. Human intervention plays an important role here, since the Web is an interactive medium.
  4. The analysis or the validation and interpretation of the results.
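The four sub-processes can be sketched end to end in a few lines. This is only an illustration under stated assumptions: resource discovery is simulated with two in-memory pages instead of real downloads, the stop-word list is deliberately tiny, and "generalization" is reduced to counting in how many pages a term occurs. All names (`PAGES`, `STOP_WORDS`, `preprocess`, `generalize`, `analyze`) are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Step 1 (resource discovery) is simulated: these pages stand in for fetched documents.
PAGES = {
    "a.html": "<html><body><h1>Text mining</h1><p>Mining the web for text.</p></body></html>",
    "b.html": "<html><body><p>The web is a source of text and data.</p></body></html>",
}

STOP_WORDS = {"the", "a", "is", "of", "and", "for"}  # tiny illustrative list

class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document and ignores the tags."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def preprocess(html):
    """Step 2: strip markup, lowercase, tokenize, and drop stop words."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    tokens = re.findall(r"[a-z]+", " ".join(extractor.parts).lower())
    return [t for t in tokens if t not in STOP_WORDS]

def generalize(pages):
    """Step 3: count in how many pages each term occurs (document frequency)."""
    doc_freq = Counter()
    for html in pages.values():
        doc_freq.update(set(preprocess(html)))
    return doc_freq

def analyze(doc_freq, min_pages=2):
    """Step 4: keep only the terms shared by at least min_pages pages."""
    return sorted(t for t, n in doc_freq.items() if n >= min_pages)

shared_terms = analyze(generalize(PAGES))
print(shared_terms)
```

In a real setting, step 3 would apply an actual mining technique such as clustering or classification, and a person would review the result of step 4.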

Web Mining is generally divided into three subcategories, each covering a part of the Internet that can be examined with mining techniques :

  • Web Usage Mining
  • Web Structure Mining
  • Web Content Mining

Web usage mining describes applying data mining techniques to web data in order to identify usage patterns and to adapt web applications to better suit users’ usage behavior. Web usage mining is divided into three phases :

  1. preprocessing
  2. pattern recognition
  3. pattern analysis

In the preprocessing phase, the existing data is prepared so that mining techniques can process it further. First, the usage data, i.e. data about user and server sessions, is handled. It shows which user visited which website or part of a page, when, and how often. The content (text, images, scripts, multimedia files, etc.) is then converted into a usable format so that specific content can be related to the user’s usage behavior. Finally, the structure of the visited pages is preprocessed in a similar way to the content.
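Preparing usage data typically starts with grouping individual requests into sessions. The following sketch assumes that the log has already been parsed into `(user, timestamp, page)` tuples with timestamps in seconds, and that a gap of more than 30 minutes starts a new session. Both the record layout and the timeout are assumptions made for this example, not something prescribed by the text.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; a common but arbitrary choice

def build_sessions(records, timeout=SESSION_TIMEOUT):
    """Group (user, timestamp, page) records into per-user sessions.

    A new session starts when the gap to the user's previous request
    exceeds the timeout.
    """
    by_user = defaultdict(list)
    for user, timestamp, page in sorted(records, key=lambda r: (r[0], r[1])):
        by_user[user].append((timestamp, page))

    sessions = []
    for user, visits in by_user.items():
        current = [visits[0][1]]
        for (prev_ts, _), (ts, page) in zip(visits, visits[1:]):
            if ts - prev_ts > timeout:
                sessions.append((user, current))
                current = []
            current.append(page)
        sessions.append((user, current))
    return sessions

# Hypothetical, already-parsed log records
log = [
    ("alice", 0, "/home"),
    ("alice", 120, "/articles"),
    ("bob", 60, "/home"),
    ("alice", 4000, "/home"),
    ("alice", 4100, "/contact"),
]
sessions = build_sessions(log)
for user, pages in sessions:
    print(user, pages)
```

Each resulting session lists the pages one user requested in one visit, which is the unit that the later phases work on.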

In the pattern recognition phase, mining techniques are used to bring previously unknown knowledge to light, in line with the KDD approach. Techniques used here include statistical analysis, association rules, clustering, classification, sequential patterns and dependency modeling.

The pattern analysis phase examines the patterns found in phase 2. Essentially, it attempts to filter out uninteresting patterns and to turn the relevant ones into new findings, for example by means of SQL queries or OLAP operations.
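As a rough illustration of phases 2 and 3, the sketch below counts how often two pages occur in the same session (a very reduced form of association analysis) and then discards pairs below a minimum support. The sessions, the function names and the 0.5 threshold are assumptions chosen for the example; a plain Python filter stands in for the SQL or OLAP step.

```python
from collections import Counter
from itertools import combinations

def pair_support(sessions):
    """Phase 2: fraction of sessions in which each pair of pages occurs together."""
    counts = Counter()
    for pages in sessions:
        for pair in combinations(sorted(set(pages)), 2):
            counts[pair] += 1
    return {pair: n / len(sessions) for pair, n in counts.items()}

def interesting_pairs(support, min_support=0.5):
    """Phase 3: discard pairs whose support is below the threshold."""
    return {pair: s for pair, s in support.items() if s >= min_support}

# Hypothetical sessions, one list of visited pages per session
sessions = [
    ["/home", "/articles", "/contact"],
    ["/home", "/articles"],
    ["/home", "/tools"],
    ["/articles", "/contact"],
]
frequent = interesting_pairs(pair_support(sessions))
for pair, support in sorted(frequent.items()):
    print(pair, support)
```

Pairs that survive the filter are candidates for findings such as "visitors who read the articles page often also open the contact page".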

Finally, interest in web usage mining is not only scientific: businesses have long recognized the enormous potential of understanding customers better.

Web Structure Mining describes the process of applying data mining techniques to the structure of Web data.

The aim of this approach is to gain information about the content based on the structure of web pages and to identify similarities within a collection of data. In general, two types of structures can be assumed :

  1. Intra-page structure, i.e. data that has a certain structure within a page, such as the arrangement of different HTML or XML tags within a web page. This is mainly used in the area of web content mining.
  2. Inter-page structure, i.e. data that has a certain structure between several pages, such as hyperlinks that connect pages together.
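Both kinds of structure can be read from a page with Python's standard `html.parser` module. The sketch below is a minimal example under simple assumptions: the sequence of opening tags stands in for the intra-page structure, the `href` targets of `<a>` elements stand in for the inter-page structure, and the sample page is invented.

```python
from html.parser import HTMLParser

class StructureExtractor(HTMLParser):
    """Records the tag sequence (intra-page) and link targets (inter-page)."""
    def __init__(self):
        super().__init__()
        self.tags = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

# Hypothetical sample page
page = '<html><body><h1>Title</h1><p>See <a href="/part-2">part 2</a>.</p></body></html>'
extractor = StructureExtractor()
extractor.feed(page)
print("intra-page:", extractor.tags)
print("inter-page:", extractor.links)
```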

The beginnings of web structure mining can be found in the area of social network analysis. There, incoming and outgoing links are examined to recognize patterns within the resulting hierarchy.
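A very small version of this kind of link analysis is to count incoming and outgoing links per page. The sketch assumes the link graph is already available as a dictionary from page to linked pages; the graph and the function name `link_degrees` are hypothetical, and real link analysis uses considerably more refined measures.

```python
from collections import Counter

def link_degrees(links):
    """Count incoming and outgoing hyperlinks for every page in the graph."""
    out_degree = Counter()
    in_degree = Counter()
    for source, targets in links.items():
        out_degree[source] += len(targets)
        for target in targets:
            in_degree[target] += 1
    return in_degree, out_degree

# Hypothetical link graph: page -> pages it links to
links = {
    "index": ["about", "articles"],
    "about": ["index"],
    "articles": ["index"],
}
in_degree, out_degree = link_degrees(links)
most_linked = max(in_degree, key=in_degree.get)
print(most_linked, dict(in_degree), dict(out_degree))
```

A page with many incoming links, here `index`, tends to sit near the top of the resulting hierarchy.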

Another push in web structure mining came from the assumption that every document, even a text that is unstructured in its own right, shares similar structures with other texts on a similar topic. These two ideas led to the realization that even unstructured data, as frequently found on the Internet, can be analyzed by examining hyperlinks and the use of labels (names). Based on the two types of structures described above, it can further be said that web structure mining is not always strictly separated from web content mining.

Web Content Mining

Web Content Mining focuses on capturing the content of Web pages and, based on this, either improving how users find information in the IR sense, or modeling the data in databases so that, for example, search engines can deliver more effective results. From this, two possible perspectives on web content mining can be deduced :

  1. agent-based or IR-view
  2. database view

The first variant comprises three subsets. The first uses intelligent search agents to search for, organize and interpret relevant information based on domain characteristics and user profiles. The second uses agents that filter or categorize information using IR techniques and, similar to web structure mining, examine link structures to create cluster hierarchies. The third uses personalized web agents that can learn user preferences and discover sources of information.

The second variant uses either multi-level databases, which arrange the data according to its degree of structure and generalization, or query systems, which, for example, summarize weakly structured data and can build a database from the information found.
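The database view can be hinted at with a small sketch that pulls a few fields out of weakly structured pages and stores them in an in-memory SQLite database for querying. The pages, the table layout and the crude regular-expression extraction are all assumptions made for illustration; a real system would use a proper HTML parser and a richer schema.

```python
import re
import sqlite3

# Hypothetical weakly structured pages
PAGES = {
    "/text-mining": "<title>Text Mining</title><p>Text mining finds patterns in text.</p>",
    "/web-mining": "<title>Web Mining</title><p>Web mining applies data mining to the web.</p>",
}

def build_database(pages):
    """Extract a title and a word count per page and store them in SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, title TEXT, words INTEGER)")
    for url, html in pages.items():
        match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
        title = match.group(1).strip() if match else ""
        text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping, for the sketch only
        conn.execute("INSERT INTO pages VALUES (?, ?, ?)", (url, title, len(text.split())))
    conn.commit()
    return conn

conn = build_database(PAGES)
rows = conn.execute(
    "SELECT url, title FROM pages WHERE title LIKE ? ORDER BY url", ("%Mining%",)
).fetchall()
print(rows)
```

Once the information is in a table, it can be queried like any other structured data, which is the point of the database view.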

Continue reading the next part.


About Abhishek Ghosh

Abhishek Ghosh is a Businessman, Surgeon, Author and Blogger. You can keep in touch with him on Twitter - @AbhishekCTRL.

About This Article

Cite this article as: Abhishek Ghosh, "Uses of Text Mining in Web Content Mining : Part I," in The Customize Windows, August 3, 2019, June 28, 2022, https://thecustomizewindows.com/2019/08/uses-of-text-mining-in-web-content-mining-part-i/.

Source:The Customize Windows, JiMA.in
