
The Customize Windows

Technology Journal


By Abhishek Ghosh August 3, 2019 6:52 pm Updated on August 4, 2019

Uses of Text Mining in Web Content Mining : Part I


English philosopher Francis Bacon said “Knowledge is power”. This is as true today as it was almost 420 years ago. In the past, however, acquiring this knowledge always required luck or tremendous diligence, whereas recent years, thanks to technological advances, have been a true heyday for it. Where one once had to pore over tons of books or invest an enormous amount of research, information technologies, which are gaining importance in the course of globalization, now offer us this knowledge on a silver platter. The computer plays a major role in this context. Initially used to solve complex computing tasks, it is used today by billions of people worldwide for all sorts of private and professional activities.

The resulting flood of data offers enormous potential. Since the early 1990s, efforts have been made to examine these data with software in order to derive new insights that were previously unrecognizable. This process is referred to as knowledge discovery and is divided into different categories based on the type (e.g., tables, text) and the source (Internet, intranet, etc.) of the data. Because this topic is increasingly relevant to both science and the economy, this series of articles examines one of these disciplines, text mining, and presents its applications in so-called web content mining.


Text Mining

 

The term text mining first appeared in 1999 as part of a Pacific Asia Knowledge Discovery and Data Mining workshop. Although these works are now about two decades old, there is still no clear definition. According to Mehler and Wolff, there are six different names for this research field, which differ depending on the task being addressed:


  • Text Mining
  • Text Data Mining
  • Textual Data Mining
  • Text Knowledge Engineering
  • Knowledge Discovery in Texts
  • Knowledge Discovery in Textual Databases

Four perspectives on text mining can be derived from this variety of terms:

  1. The information retrieval perspective, which considers text mining merely a further development of information retrieval (IR).
  2. The data mining perspective, which sees text mining as data mining on textual data.
  3. The methodological perspective, which defines text mining simply as a collection of methods for evaluating texts.
  4. The knowledge-oriented perspective, which aims to generate new insights and information from existing data.

Generally speaking, IR only describes the approach of finding existing knowledge rather than gaining new insights, which is what text mining usually aims for. This is why the IR perspective is no longer pursued today.

The common ground of the remaining perspectives is that text mining is seen as a discipline of knowledge discovery, based on the premise of gaining new insights from data or providing information that users did not previously know was contained in the processed data.

 

Distinction from Data Mining

 

Text mining can be classified as part of knowledge discovery (KD). Data mining and text mining are closely related but not identical steps within KD; text mining can be regarded as an extension of data mining. Data mining requires data with a certain structure, while text mining also extends to weakly structured and unstructured data.

While data mining usually consists of three phases (identification, preparation and feature selection, distribution analysis), text mining extends this process with a step that extracts specific features from weakly structured or unstructured data.
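The additional text mining step can be illustrated with a minimal Python sketch that turns unstructured sentences into a structured document-term table, the kind of input classic data mining methods expect. The tokenizer pattern, the small stop-word list and the function names `extract_features` and `to_feature_table` are illustrative assumptions, not part of any particular text mining tool.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption; real systems use much larger lists).
STOP_WORDS = {"the", "a", "an", "and", "of", "is", "in", "to", "for"}

def extract_features(text: str) -> Counter:
    """Turn unstructured text into a bag-of-words feature vector."""
    # Lowercase and split on anything that is not a letter or digit.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Drop stop words, which carry little information for mining.
    return Counter(t for t in tokens if t not in STOP_WORDS)

def to_feature_table(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Build a structured document-term matrix from raw documents."""
    vectors = [extract_features(d) for d in docs]
    vocabulary = sorted(set().union(*vectors))
    # One row per document, one column per vocabulary term.
    rows = [[v[term] for term in vocabulary] for v in vectors]
    return vocabulary, rows

docs = ["The price of the camera is low.",
        "Low price and a good camera."]
vocab, rows = to_feature_table(docs)
print(vocab)
print(rows)
```

For the two sample sentences the vocabulary is `camera`, `good`, `low`, `price`, and each row counts how often these terms occur in one document. Once text is in this tabular form, ordinary data mining methods can work on it.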

Web Mining

Web mining describes the process of applying data mining techniques to the Internet (World Wide Web). It follows that web mining, like data mining and text mining, is a sub-discipline of knowledge discovery in databases (KDD) and can thus be used to obtain unknown patterns and new insights from data on the Internet.

The initial assumption that the Internet was too unstructured for data mining techniques has largely been refuted. The main problem in web mining is the so-called “labeling problem”: by nature, most data mining techniques require some kind of tagging of the data, such as whether a web page is a homepage or not. The web mining process itself consists of four sub-processes:

  1. Resource discovery, i.e., obtaining data that is available either online or offline, comparable to IR.
  2. Information selection and data preprocessing, i.e., preparing the data found in step 1, for example by removing “stop words” or the like.
  3. Generalization, the step in which data mining techniques are applied to uncover patterns in the data found. Human intervention plays an important role here, since the Web is an interactive medium.
  4. Analysis, i.e., the validation and interpretation of the results.
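The four sub-processes can be sketched as a toy pipeline. This is a hedged illustration only: the in-memory `PAGES` dictionary stands in for real resource discovery (no crawling happens), the stop-word list is an assumption, and generalization is reduced to counting in how many pages each term appears. The human judgement that step 3 calls for is not modelled.

```python
import re
from collections import Counter

# Hypothetical in-memory "web": URL -> raw HTML. A real crawler would fetch these.
PAGES = {
    "http://example.com/a": "<h1>Text mining</h1><p>Mining text on the web</p>",
    "http://example.com/b": "<p>Web mining and text mining</p>",
    "http://example.com/c": "<p>Photography tips</p>",
}
STOP_WORDS = {"and", "on", "the"}  # assumed, deliberately tiny

def discover_resources(pages):
    """Step 1: resource discovery (here we simply return the known documents)."""
    return list(pages.items())

def preprocess(html):
    """Step 2: information selection and preprocessing."""
    text = re.sub(r"<[^>]+>", " ", html)          # strip markup
    tokens = re.findall(r"[a-z]+", text.lower())  # crude tokenizer
    return [t for t in tokens if t not in STOP_WORDS]

def generalize(token_lists):
    """Step 3: generalization; count how many documents contain each term."""
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    return doc_freq

def analyze(doc_freq, min_docs=2):
    """Step 4: analysis; keep only patterns supported by several documents."""
    return sorted(term for term, n in doc_freq.items() if n >= min_docs)

resources = discover_resources(PAGES)
tokens = [preprocess(html) for _, html in resources]
print(analyze(generalize(tokens)))
```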

Web mining is generally divided into three subcategories, which represent the different parts of the Internet that can be examined using mining techniques:

  • Web Usage Mining
  • Web Structure Mining
  • Web Content Mining

Web usage mining describes the application of data mining techniques to web data in order to identify usage patterns and to adapt web applications to users’ usage behavior. Web usage mining is divided into three phases:

  1. Preprocessing
  2. Pattern recognition
  3. Pattern analysis

 

In the preprocessing phase, the existing data is prepared so that it can be processed further by mining techniques. First, the usage data, i.e., data about user and server sessions, is processed. It shows which user visited which website or part of a page, when, and how often. The content (text, images, scripts, multimedia files, etc.) is then converted into a usable format so that specific content can be linked to the user’s usage behavior. Finally, the structure of the visited pages is preprocessed in the same way as the content.
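Sessionization, a typical preprocessing task for usage data, might look like the following sketch. It assumes the server log has already been parsed into `(user, timestamp, URL)` tuples and uses a 30-minute inactivity timeout, a common heuristic rather than a fixed rule; the log entries are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical, already-parsed server log entries: (user/IP, timestamp, URL).
LOG = [
    ("u1", "2019-08-03 10:00", "/home"),
    ("u1", "2019-08-03 10:05", "/text-mining"),
    ("u2", "2019-08-03 10:07", "/home"),
    ("u1", "2019-08-03 11:30", "/home"),        # long gap, so a new session
    ("u2", "2019-08-03 10:09", "/web-mining"),
]
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed heuristic

def sessionize(log):
    """Group page requests into per-user sessions split by inactivity."""
    by_user = defaultdict(list)
    for user, ts, url in log:
        by_user[user].append((datetime.strptime(ts, "%Y-%m-%d %H:%M"), url))
    sessions = []
    for user, hits in by_user.items():
        hits.sort()  # chronological order
        current = [hits[0][1]]
        for (prev_t, _), (t, url) in zip(hits, hits[1:]):
            if t - prev_t > SESSION_TIMEOUT:
                # Too long since the last request: close the session.
                sessions.append((user, current))
                current = []
            current.append(url)
        sessions.append((user, current))
    return sessions

for user, pages in sessionize(LOG):
    print(user, pages)
```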

In the pattern recognition phase, mining techniques are used to bring previously unknown knowledge to light, following the KDD approach. The techniques used include statistical analyses, association rules, clustering, classification, sequential patterns and dependency modeling.
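Association rules are one of the techniques listed above. The sketch below computes support and confidence for rules between pairs of pages in a handful of hypothetical sessions. It is deliberately limited to single-page rules and is not a full implementation of an algorithm such as Apriori; the thresholds `min_support` and `min_confidence` are arbitrary example values.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sessions (sets of visited pages), e.g. the output of preprocessing.
SESSIONS = [
    {"/home", "/text-mining", "/web-mining"},
    {"/home", "/text-mining"},
    {"/home", "/photography"},
    {"/text-mining", "/web-mining"},
]

def pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Find rules A -> B between single pages (a small slice of association rule mining)."""
    n = len(sessions)
    page_count = Counter(p for s in sessions for p in s)
    pair_count = Counter(pair for s in sessions for pair in combinations(sorted(s), 2))
    rules = []
    for (a, b), together in pair_count.items():
        support = together / n  # share of sessions containing both pages
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / page_count[lhs]  # P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, round(confidence, 2)))
    return sorted(rules)

for lhs, rhs, sup, conf in pair_rules(SESSIONS):
    print(f"{lhs} -> {rhs}  support={sup:.2f} confidence={conf:.2f}")
```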

The pattern analysis phase examines the patterns found in phase 2. Essentially, it tries to filter out uninteresting patterns and to turn the relevant ones into new findings, for example by means of SQL queries or OLAP operations.
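Filtering patterns with SQL could look like this sketch, which loads a few hypothetical rules into an in-memory SQLite database through Python’s standard `sqlite3` module and keeps only those whose support reaches an assumed threshold of 0.5.

```python
import sqlite3

# Hypothetical rules, e.g. produced in the pattern recognition phase:
# (left-hand page, right-hand page, support, confidence).
rules = [
    ("/home", "/text-mining", 0.50, 0.67),
    ("/web-mining", "/text-mining", 0.50, 1.00),
    ("/home", "/photography", 0.25, 1.00),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rules (lhs TEXT, rhs TEXT, support REAL, confidence REAL)")
conn.executemany("INSERT INTO rules VALUES (?, ?, ?, ?)", rules)

# Drop uninteresting (low-support) patterns and rank the rest by confidence.
query = """
    SELECT lhs, rhs, confidence
    FROM rules
    WHERE support >= ?
    ORDER BY confidence DESC, lhs
"""
interesting = conn.execute(query, (0.5,)).fetchall()
print(interesting)
conn.close()
```

The rule `/home -> /photography` has perfect confidence but occurs in too few sessions, so the support filter discards it.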

Interest in web usage mining is not only scientific: businesses have long since recognized the enormous potential of understanding their customers better.

Web structure mining describes the process of applying data mining techniques to the structure of web data.

The aim of this approach is to gain information about content from the structure of web pages and to identify similarities within a collection of data. In general, two types of structure can be distinguished:

  1. Intra-page structure, i.e., structure within a single page, such as the arrangement of different HTML or XML tags within a web page; this is mainly used in web content mining.
  2. Inter-page structure, i.e., structure across several pages, such as the hyperlinks that connect pages.
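Both types of structure can be read from HTML with Python’s standard `html.parser` module. In this sketch, the hypothetical `StructureParser` class records each tag with its nesting depth (intra-page structure) and collects the `href` targets of links (inter-page structure). The list of void elements without a closing tag is abbreviated and is an assumption of the sketch.

```python
from html.parser import HTMLParser

# Elements that have no closing tag (abbreviated list, an assumption of this sketch).
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}

class StructureParser(HTMLParser):
    """Collects intra-page structure (tag nesting) and inter-page structure (links)."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.tags = []   # (depth, tag) pairs: the page's internal layout
        self.links = []  # href targets: edges to other pages

    def handle_starttag(self, tag, attrs):
        self.tags.append((self.depth, tag))
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth -= 1

page = ('<html><body><h1>Archive</h1><p>See <a href="/part-2">Part II</a> and '
        '<a href="https://example.org/">this site</a>.<br></p></body></html>')
parser = StructureParser()
parser.feed(page)
parser.close()
print(parser.tags)
print(parser.links)
```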

The beginnings of web structure mining lie in social network analysis, where incoming and outgoing links are examined to recognize patterns within the resulting hierarchy.
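The article does not name a specific link analysis method. As one well-known example from web structure mining, Kleinberg’s HITS algorithm scores pages as hubs (many useful outgoing links) and authorities (many incoming links from good hubs). The following is a minimal power-iteration sketch on a made-up four-page link graph; the fixed iteration count is an assumption rather than a convergence test.

```python
import math

# Hypothetical link graph: page -> pages it links to (outgoing links).
LINKS = {
    "A": ["C", "D"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": [],
}

def hits(links, iterations=50):
    """Compute hub and authority scores with Kleinberg's HITS iteration."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of hub scores of the pages linking in (incoming links).
        auth = {p: sum(hub[q] for q in pages if p in links.get(q, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub: sum of authority scores of the pages linked to (outgoing links).
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth

hub, auth = hits(LINKS)
print("authorities:", sorted(auth.items(), key=lambda kv: -kv[1]))
print("hubs:", sorted(hub.items(), key=lambda kv: -kv[1]))
```

Page D, which every other page links to, receives the highest authority score, while A and B, which link to both C and D, become the strongest hubs.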

Web structure mining received another push from the assumption that every document, even unstructured text, shares similar structures with other texts on a similar topic. Together, these two ideas led to the realization that even unstructured data, as frequently found on the Internet, can be analyzed by examining hyperlinks and the use of labels (names). Given the two types of structure described above, web structure mining is not always strictly separated from web content mining.

Web Content Mining

Web content mining focuses on capturing the content of web pages and, based on it, either improving the information delivered to users in the IR sense or modeling the data in databases so that, for example, search engines can deliver more effective results. Two perspectives on web content mining follow from this:

  1. Agent-based or IR view
  2. Database view

The first variant comprises three groups of agents. Intelligent search agents search for, organize and interpret relevant information based on domain characteristics and user profiles. Information filtering and categorization agents filter or categorize information using IR techniques and, similar to web structure mining, examine link structures to create cluster hierarchies. Personalized web agents, the third group of the IR view, learn user preferences and discover sources of information.
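The filtering and categorization agents of the IR view rely on classic IR techniques. A common one is TF-IDF weighting combined with cosine similarity; the sketch below uses it to assign a document to the most similar category profile. The category profiles and function names are hypothetical, and a real agent would learn such profiles from user data rather than hard-code them.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Weight each term by term frequency times inverse document frequency."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for c in counts for term in c)  # document frequency
    return [{t: tf * math.log(n / df[t]) for t, tf in c.items()} for c in counts]

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical category profiles; a real agent would derive them from data.
CATEGORIES = {
    "photography": "camera lens exposure aperture photo",
    "data mining": "data mining patterns clustering classification",
    "virtualization": "virtual machine hypervisor server",
}

def categorize(document):
    """Return the category whose profile is most similar to the document."""
    names = list(CATEGORIES)
    vectors = tfidf_vectors(list(CATEGORIES.values()) + [document])
    doc_vec = vectors[-1]
    scores = {name: cosine(vec, doc_vec) for name, vec in zip(names, vectors)}
    return max(scores, key=scores.get)

print(categorize("Clustering and classification find patterns in data"))
```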

The second variant uses either multi-level databases, which arrange the data according to its degree of structure and generalization, or query systems, which, for example, summarize weakly structured data and build a database of the information found.
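The query-system idea of building a database from weakly structured data can be sketched as follows. The snippets, the regular expression and the `products` table are illustrative assumptions; real pages would need far more robust extraction, and multi-level databases are not modelled here.

```python
import re
import sqlite3

# Hypothetical weakly structured snippets, e.g. text scraped from product pages.
SNIPPETS = [
    "Camera X100 - price: 499 EUR - in stock",
    "Lens 35mm, price: 199 EUR (sold out)",
    "Tripod Pro / price: 89 EUR / in stock",
]
# Assumed pattern: product name first, then a separator and "price: <number> EUR".
PATTERN = re.compile(r"^(?P<name>.+?)\s*[-,/]\s*price:\s*(?P<price>\d+)\s*EUR")

def build_database(snippets):
    """Extract structured records from weakly structured text into a table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT, price INTEGER, in_stock INTEGER)")
    for text in snippets:
        match = PATTERN.search(text)
        if match:  # skip snippets that do not fit the assumed pattern
            conn.execute(
                "INSERT INTO products VALUES (?, ?, ?)",
                (match["name"], int(match["price"]), "in stock" in text),
            )
    return conn

conn = build_database(SNIPPETS)
# The structured table can now be queried like any database.
rows = conn.execute(
    "SELECT name, price FROM products WHERE in_stock = 1 ORDER BY price"
).fetchall()
print(rows)
```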

Continue reading the next part.


About Abhishek Ghosh

Abhishek Ghosh is a businessman, surgeon, author and blogger. You can keep in touch with him on Twitter: @AbhishekCTRL.
