
The Customize Windows

Technology Journal


By Abhishek Ghosh December 3, 2023 5:44 pm Updated on December 3, 2023

What is Web Mining


Web mining is the application of data mining techniques to the (partially) automated extraction of information from the Internet, especially the World Wide Web. It draws on procedures and methods from information retrieval, machine learning, statistics, pattern recognition and data mining. Three objects of investigation can be distinguished:

  • The content (web content mining) – for example, using information retrieval methods.
  • The structure of linking (web structure mining) – for example, using webometric methods. In web structure mining, pages are often classified as hubs and authorities: a good hub links to many valuable pages (authorities), and a valuable page is one that many good hubs link to.
  • User behaviour (web usage mining) – for example, through the analysis of log files.

The term screen scraping generally covers all methods of reading text from computer screens. Today, however, the term is used almost exclusively in relation to web pages (hence web scraping or web harvesting). In this sense, screen scraping refers specifically to techniques for obtaining information by extracting the required data from a page in a targeted way. Search engines use crawlers to search the World Wide Web, analyze web pages and collect data such as web feeds or email addresses. Screen scraping techniques are also used in web mining.

A program for extracting data from web pages is also called a wrapper.
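Crawlers repeatedly fetch a page, parse it and queue the links it contains. The parsing step can be sketched with Python's standard-library `html.parser` and `urllib.parse`; this is an illustrative example, not how any particular search engine works. `LinkCollector` and `extract_links` are hypothetical names, and fetching, robots.txt handling and the visit queue are deliberately left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Turn relative links such as "/about" into absolute URLs.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list[str]:
    """Return every link on the page, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


page = '<p><a href="/about">About</a> <a href="https://example.org/feed.xml">Feed</a>'
print(extract_links(page, "https://example.com/index.html"))
# ['https://example.com/about', 'https://example.org/feed.xml']
```

A crawler would push these URLs onto a queue of pages still to visit; `html.parser` is used here because it tolerates unclosed tags such as the `<p>` above.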


After the web page has been downloaded, the first question for extracting the data is whether the exact location of the data on the page is known (e.g. second table, third column). If it is, there are several ways to extract the data. One is to treat the downloaded pages as plain character strings and extract the desired data with, for example, regular expressions.
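A minimal sketch of the string-plus-regular-expression approach, assuming a hypothetical page layout in which every price sits in a `<td class="price">` cell. The pattern only works as long as the markup stays exactly like this, which is why robustness to layout changes matters.

```python
import re

# Hypothetical markup: each price sits in a <td class="price"> cell.
PRICE_CELL = re.compile(r'<td class="price">\s*([^<]+?)\s*</td>', re.IGNORECASE)


def extract_prices(html: str) -> list[str]:
    """Return the text of every price cell, treating the page as a plain string."""
    return PRICE_CELL.findall(html)


page = """
<table>
  <tr><td>Widget</td><td class="price"> 9.99 </td></tr>
  <tr><td>Gadget</td><td class="price">24.50</td></tr>
</table>
"""
print(extract_prices(page))  # ['9.99', '24.50']
```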

If the website is XHTML-compliant, it is a good idea to use an XML parser. Numerous techniques are available for accessing XML (SAX, DOM, XPath, XQuery). Often, however, pages are delivered only as (possibly even malformed) HTML, which is not well-formed XML. With a suitable parser, it may still be possible to produce an XML-compliant document. Alternatively, the HTML can be cleaned up with HTML Tidy before parsing. Some screen scrapers use a query language designed specifically for HTML.
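For XHTML, Python's standard-library `xml.etree.ElementTree` provides a tree of elements with a limited XPath subset. The sketch below, built on a made-up sample page, pulls out the third column of the second table. Real XHTML declares the `http://www.w3.org/1999/xhtml` namespace, so element names must be qualified with a prefix (here `h`, an arbitrary choice). Malformed HTML makes `ET.fromstring` raise `ParseError`, which is where a tolerant parser or HTML Tidy comes in.

```python
import xml.etree.ElementTree as ET

# Real XHTML puts every element in this namespace; "h" is just a local prefix.
XHTML_NS = {"h": "http://www.w3.org/1999/xhtml"}


def third_column_of_second_table(xhtml: str) -> list[str]:
    """Return the cell texts of column 3 in table 2 (positions are 1-based)."""
    root = ET.fromstring(xhtml)  # raises ET.ParseError on non-well-formed markup
    # In ElementTree, table[2] means "the second <table> among its siblings".
    table = root.find(".//h:table[2]", XHTML_NS)
    if table is None:
        return []
    cells = table.findall("h:tr/h:td[3]", XHTML_NS)
    return [(cell.text or "").strip() for cell in cells]


PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <table><tr><td>ignored</td></tr></table>
    <table>
      <tr><td>Widget</td><td>blue</td><td>9.99</td></tr>
      <tr><td>Gadget</td><td>red</td><td>24.50</td></tr>
    </table>
  </body>
</html>"""

print(third_column_of_second_table(PAGE))  # ['9.99', '24.50']
```

Unlike the regular-expression approach, this version does not care about attribute order, whitespace or line breaks inside the markup, only about the document tree.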

One criterion for the quality of an extraction mechanism is its robustness to changes in the structure of the website, which requires fault-tolerant extraction algorithms. In many cases, however, the structure of the website is not known in advance (e.g. when using crawlers). Data such as prices or times must then be recognized and interpreted without fixed positional specifications.
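When the page structure is unknown, one fault-tolerant fallback is to ignore the markup altogether and search the visible text for patterns that look like the data in question. The sketch below recognizes euro/dollar prices and 24-hour clock times with two illustrative regular expressions; the patterns and the function name `find_prices_and_times` are assumptions for this example and cover only a few common notations.

```python
import re

# Illustrative patterns only: "€12.50", "$ 3", "12,99 EUR", and times like "09:30".
PRICE = re.compile(r"(?:[€$]\s?\d+(?:[.,]\d{2})?|\d+(?:[.,]\d{2})?\s?(?:EUR|USD))")
TIME = re.compile(r"\b(?:[01]\d|2[0-3]):[0-5]\d\b")


def find_prices_and_times(text: str) -> dict[str, list[str]]:
    """Scan free text for price-like and time-like tokens, wherever they occur."""
    return {"prices": PRICE.findall(text), "times": TIME.findall(text)}


text = "Doors open at 19:30. Tickets cost €12.50 at the box office or 10,00 EUR online."
print(find_prices_and_times(text))
# {'prices': ['€12.50', '10,00 EUR'], 'times': ['19:30']}
```

Recognizing a token is only half the job; interpreting it (e.g. reading "10,00" as ten with a decimal comma) requires further, locale-aware rules.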

Types of Web Mining

Web usage mining attempts to detect regularities in the use of websites or web resources. To do so, it processes and analyzes the secondary data generated by users' interaction with a web resource, such as server log entries. Web usage mining also includes, for example, the analysis of the customer journey.
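As an illustration of log-file analysis, the following sketch parses entries in the Common Log Format, counts successful page views per path and records each client's sequence of requested pages as a rough click path. Grouping by IP address is a crude stand-in for real session identification, and the sample log lines and the `analyze` function are made up for this example. Lines in the combined format, which appends referrer and user-agent fields, also match because the pattern ignores anything after the response size.

```python
import re
from collections import Counter, defaultdict

# host ident user [time] "METHOD path PROTOCOL" status size  (Common Log Format)
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def analyze(lines):
    """Count page views per path and record each host's click path in log order."""
    views = Counter()
    journeys = defaultdict(list)
    for line in lines:
        m = CLF.match(line)
        if m is None or m["method"] != "GET" or not m["status"].startswith("2"):
            continue  # skip malformed lines, non-GET requests and non-2xx responses
        views[m["path"]] += 1
        journeys[m["host"]].append(m["path"])
    return views, dict(journeys)


log = [
    '10.0.0.1 - - [03/Dec/2023:17:44:01 +0000] "GET / HTTP/1.1" 200 5120',
    '10.0.0.1 - - [03/Dec/2023:17:44:09 +0000] "GET /products HTTP/1.1" 200 2048',
    '10.0.0.2 - - [03/Dec/2023:17:45:00 +0000] "GET / HTTP/1.1" 200 5120',
    '10.0.0.1 - - [03/Dec/2023:17:45:30 +0000] "GET /checkout HTTP/1.1" 200 1024',
    '10.0.0.2 - - [03/Dec/2023:17:45:02 +0000] "GET /missing HTTP/1.1" 404 0',
]
views, journeys = analyze(log)
print(views.most_common(1))  # [('/', 2)]
print(journeys["10.0.0.1"])  # ['/', '/products', '/checkout']
```

The per-host path lists are the raw material for customer-journey analysis, for example counting which page sequences most often lead to `/checkout`.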

Web structure mining attempts to identify the link structure underlying a website or domain. A model is built from the topology of its hyperlinks, optionally together with a description of those links. Such a model can help to categorize and rank a website and allows conclusions to be drawn about similarities between websites and how they relate to each other. For example, content-rich pages (so-called authorities) and overview pages (so-called hubs) can be found for a given topic.
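The hub and authority terminology comes from Kleinberg's HITS algorithm: a page's authority score is the sum of the hub scores of the pages linking to it, its hub score is the sum of the authority scores of the pages it links to, and both are normalized and recomputed until they settle. A minimal plain-Python sketch on a hypothetical five-page link graph, using a fixed number of iterations instead of a convergence test:

```python
import math


def hits(links: dict[str, list[str]], iterations: int = 50):
    """Return (hub, authority) scores for a directed link graph {page: [targets]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum of hub scores of the pages that link to p.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub update: sum of authority scores of the pages p links to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize both vectors to unit length so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


links = {
    "overview-a": ["article-1", "article-2", "article-3"],
    "overview-b": ["article-1", "article-2"],
    "article-1": ["article-2"],
    "article-3": [],
}
hub, auth = hits(links)
print(max(hub, key=hub.get), max(auth, key=auth.get))  # overview-a article-2
```

The overview page that links to the most well-cited articles ends up as the best hub, and the article linked from the most good hubs ends up as the best authority.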

Web content mining deals with detecting regularities in the content of a web resource and is one area of application for text mining. Data on the web consists of unstructured data such as text documents, semi-structured data such as HTML documents, and more structured data such as tables or dynamically generated HTML pages. In general, the content of a website comprises different types of data, such as text, images, audio, video, metadata and hyperlinks. Mining several of these data types together is referred to as "multimedia data mining" and can be understood as part of web content mining. However, most of the web's content consists of unstructured text. Text mining can therefore be understood both as a specific form of web content mining and as a broader research field that extends beyond it. The methods used are general data mining methods; statistical and computational-linguistic techniques transform the texts into a form suitable for data mining.
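One simple example of such a transformation is TF-IDF weighting, which turns each text into term weights that are high when a term is frequent in one document but rare across the collection. The sketch below assumes the page text has already been extracted from the HTML, uses a naive lowercase word tokenizer and the plain `tf * log(N / df)` variant; other implementations often use smoothed or normalized variants, so their numbers will differ. The sample sentences are made up.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Naive tokenizer: lowercase runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(documents: list[str]) -> list[dict[str, float]]:
    """Weight each term by tf(t, d) * log(N / df(t)) for every document d."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


pages = [
    "web mining of web pages",
    "text mining of unstructured text",
    "usage mining of server logs and usage paths",
]
weights = tf_idf(pages)
print([max(w, key=w.get) for w in weights])  # ['web', 'text', 'usage']
print(weights[0]["mining"])  # 0.0
```

A term such as "mining" that occurs in every document gets weight zero because it does not help to tell the documents apart; the resulting vectors can be fed into standard clustering or classification methods.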
