The Customize Windows

Technology Journal

By Abhishek Ghosh August 6, 2019 2:34 pm Updated on August 8, 2019

Uses of Text Mining in Web Content Mining : Part IV

The third part of Uses of Text Mining in Web Content Mining announced that this part would cover the application areas, or tasks, of text mining. In essence, the different methods analyze texts and make implicit information explicit. They then relate information from different texts to each other, and highlight and visualize those relations. The descriptions below give an overview of the main technologies and their role in text mining.

 

Information extraction

 

Information extraction processes text to identify selected information, such as specific types of names or specified characteristics of events. For names, it is sufficient to find them in the text and recognize their type. For events, the critical information (people, objects, date, location, etc.) must be extracted from the text and transferred into a given structure. In information extraction, this given structure is called a template.

The following example is a template for extracting information about management changes at companies:

<TEMPLATE> :=
    DOC_NR: "NUMBER"
    CONTENT: <SUCCESSION_EVENT> *

<SUCCESSION_EVENT> :=
    SUCCESSION_ORG: <ORGANIZATION>
    POST: "POSITION TITLE" | "no title"
    IN_AND_OUT: <IN_AND_OUT> +
    VACANCY_REASON: {DEPART_WORKFORCE, REASSIGNMENT, NEW_POST_CREATED, OTH_UNK}

<IN_AND_OUT> :=
    IO_PERSON: <PERSON>
    NEW_STATUS: {IN, IN_ACTING, OUT, OUT_ACTING}
    ON_THE_JOB: {YES, NO, UNCLEAR}
    OTHER_ORG: <ORGANIZATION>
    REL_OTHER_ORG: {SAME_ORG, RELATED_ORG, OUTSIDE_ORG}

<ORGANIZATION> :=
    ORG_NAME: "NAME"
    ORG_ALIAS: "ALIAS" *
    ORG_DESCRIPTOR: "DESCRIPTOR"
    ORG_TYPE: {GOVERNMENT, COMPANY, OTHER}
    ORG_LOCALE: LOCALE-STRING {{LOC_TYPE}} *
    ORG_COUNTRY: NORMALIZED-COUNTRY-OR-REGION |
                 COUNTRY-OR-REGION STRING *

<PERSON> :=
    PER_NAME: "NAME"
    PER_ALIAS: "ALIAS" *
    PER_TITLE: "TITLE" *

LOC_TYPE :: {CITY, PROVINCE, COUNTRY, REGION, UNK}
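
To make the idea of a template more concrete, the slots above can be mirrored as typed records that an extraction system fills in. The following is only a minimal Python sketch: the class names (Person, Organization, InAndOut, SuccessionEvent), the reduced set of slots and the example sentence are assumptions made for illustration and are not part of the template definition.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class NewStatus(Enum):
    """Allowed values of the NEW_STATUS slot."""
    IN = "IN"
    IN_ACTING = "IN_ACTING"
    OUT = "OUT"
    OUT_ACTING = "OUT_ACTING"


@dataclass
class Person:
    per_name: str
    per_title: List[str] = field(default_factory=list)


@dataclass
class Organization:
    org_name: str
    org_type: str = "COMPANY"  # one of GOVERNMENT, COMPANY, OTHER


@dataclass
class InAndOut:
    io_person: Person
    new_status: NewStatus


@dataclass
class SuccessionEvent:
    succession_org: Organization
    post: str = "no title"
    in_and_out: List[InAndOut] = field(default_factory=list)
    vacancy_reason: str = "OTH_UNK"


# Hypothetical input sentence (invented for this sketch):
# "Jane Doe succeeds John Roe, who retired, as CEO of Example Corp."
event = SuccessionEvent(
    succession_org=Organization("Example Corp"),
    post="CEO",
    in_and_out=[
        InAndOut(Person("Jane Doe"), NewStatus.IN),
        InAndOut(Person("John Roe"), NewStatus.OUT),
    ],
    vacancy_reason="DEPART_WORKFORCE",
)
print(event)
```

The filled record is what an information extraction system would hand to a database or a downstream application. The extraction step itself, locating the names and the event in the text, is not shown here.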

Information extraction systems are evaluated along two dimensions, precision and recall (completeness), which are combined in the so-called “F-measure”. Because the systems require neither complete text comprehension nor a complete grammatical analysis, they can achieve comparatively high precision and recall. An information extraction system is always designed for a specific information need and systematically analyzes texts for predefined data, phrases and text segments.
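
As a rough illustration of how these scores relate, the sketch below computes precision, recall and the F-measure for a set of extracted items against a hand-made reference set. The function name, the beta parameter default and the example data are assumptions chosen for this sketch.

```python
def precision_recall_f(extracted: set, gold: set, beta: float = 1.0):
    """Return (precision, recall, F-measure) of extracted items vs. a gold set."""
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta ** 2
    # Weighted harmonic mean; beta = 1 weights precision and recall equally.
    f_measure = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_measure


# Invented example: four items should have been found, three were returned.
gold = {"Jane Doe", "John Roe", "Example Corp", "CEO"}
extracted = {"Jane Doe", "Example Corp", "Acme"}
p, r, f = precision_recall_f(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

Two of the three returned items are correct (precision 2/3), and two of the four expected items were found (recall 1/2). The F-measure combines both into one number so that systems can be compared.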

 

Topic Detection and Tracking

 

Topic Detection and Tracking (TDT) refers to automatic techniques for finding thematically related material in data streams. Such techniques can be very valuable in applications where efficient and timely access to information is essential.

One use of TDT is a free notification service that automatically emails the user when new results for a given search term appear on the Internet.

Beyond news sources, TDT is used in many other areas. In science, it can ensure that the latest references and publications in a particular area of research are always at hand. On the stock market, it can supply traders with the latest information and news about a company so that they can reconsider or adjust their investments. A company can also use a TDT system to monitor itself, its competitors and the products on the market.
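
The tracking half of TDT can be sketched very simply: a topic is described by a few seed words, and every new document in the stream is compared with it. The Python sketch below uses bag-of-words vectors and cosine similarity. This is one simple choice among many, not the method of any particular TDT system, and the function names, the threshold of 0.3 and the headlines are invented for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-case the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def track(topic_seed: str, stream, threshold: float = 0.3):
    """Yield the documents of the stream that are similar enough to the topic."""
    topic = bag_of_words(topic_seed)
    for doc in stream:
        if cosine(topic, bag_of_words(doc)) >= threshold:
            yield doc


# Invented headlines standing in for a live news stream.
stream = [
    "Example Corp shares fall after earnings report",
    "Local bakery wins regional bread award",
    "Analysts raise Example Corp earnings forecast",
]
hits = list(track("Example Corp earnings shares", stream))
print(hits)
```

A notification service would send each hit to the subscribed user instead of printing it. Detecting new, previously unknown topics is a separate task that this sketch does not cover.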

 

 

Automatic Text Summarization

 

A text summary is commonly defined as follows: a summary is a text that is produced from one or more texts, that contains a significant portion of the information in the original text(s), and that is no longer than half of the original text(s). The main task of a summary is therefore to reproduce the key messages of the text with a reduced number of words.

There are two different approaches to creating summaries: extraction and abstraction. In sentence extraction, a combination of statistical heuristics assigns each sentence an individual score. The highest-scoring sentences are considered the most salient and are extracted to become part of the summary. The abstraction approach simplifies and compresses the text. This presupposes an understanding of the topics and the ability to rewrite the text. Given these requirements, abstraction is much more difficult to program than extraction, which is why extraction is much more common in automated text summarization.
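
Sentence extraction can be sketched in a few lines. In the Python example below, the only heuristic is the average frequency of a sentence's content words within the whole text; real systems combine several heuristics. The stop-word list, the function name and the sample text are assumptions made for this sketch.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "are", "to", "in", "it", "on"}


def summarize(text: str, max_sentences: int = 1) -> list:
    """Return the highest-scoring sentences in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [
        [w for w in re.findall(r"[a-z]+", s.lower()) if w not in STOPWORDS]
        for s in sentences
    ]
    freq = Counter(w for words in tokenized for w in words)
    # Score = average text-wide frequency of the sentence's content words.
    scores = [
        sum(freq[w] for w in words) / len(words) if words else 0.0
        for words in tokenized
    ]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(top[:max_sentences])]


text = (
    "Text mining extracts information from text. "
    "The weather was nice yesterday. "
    "Mining text helps summarize text documents."
)
summary = summarize(text, max_sentences=2)
print(summary)
```

The off-topic sentence about the weather scores lowest because its words occur only once in the text, so it is dropped from the two-sentence summary.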

An important application area of automatic text summarization is search engines. Internet search engines let the user search countless web pages for specific content, but presenting the results to the user is a problem. Here automatic text summarization systems can play an important role by summarizing the results so that the user can assess the relevance of the hits more quickly. Google, the world’s most popular search engine, has been using such a technique for several years to present its results in simplified form.

Because dynamic web applications increasingly blur the boundaries of individual documents, summarization algorithms today also need to be able to process multiple text documents at once.

A simple example of automatic text summarization is the “AutoSummarize” feature of older versions of Microsoft Word. The user could choose the percentage of the total text to extract for the summary. (The feature was removed in Word 2010.) Another use case arises when researchers, medical staff, or companies receive thousands of documents relevant to them.

The possibility of summarizing websites for smaller devices has also been discussed. Websites are designed for large monitors and are not very reader-friendly on smartphones. Automatic summarization would convert them into a meaningful, easy-to-read and above all searchable format. Today, with the proliferation of smartphones, it is more likely that separate, more compact websites are created for mobile devices.

 

Categorization

 

The goal of text categorization is to classify documents into a fixed number of predefined categories. Each document can be assigned to several categories, exactly one, or none. The documents to be classified can be texts, pictures, music or anything else, and each type poses its own classification challenges. The documents can be classified by topic or by other characteristics (author, year, type, etc.).

For categorization, a document is treated as just a collection of words; it is not processed in the same way as in, for example, information extraction. Instead, the words are simply counted, and the counts identify the main topics of the document. For this purpose, a thesaurus is often used for the given topics, and relationships are determined by searching for narrower terms, synonyms and related terms.
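
The counting approach can be illustrated with a toy thesaurus. In the Python sketch below, every category is described by a handful of related terms, and a document is assigned to each category whose terms it mentions often enough. The thesaurus entries, the function name and the min_hits parameter are hypothetical and only serve the illustration.

```python
import re
from collections import Counter

# Hypothetical mini-thesaurus: category -> related terms and synonyms.
THESAURUS = {
    "finance": {"stock", "shares", "investment", "market", "trader"},
    "medicine": {"patient", "doctor", "therapy", "diagnosis", "clinic"},
}


def categorize(text: str, min_hits: int = 1) -> dict:
    """Map each matching category to the number of thesaurus terms found."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        category: sum(counts[term] for term in terms)
        for category, terms in THESAURUS.items()
    }
    return {category: n for category, n in scores.items() if n >= min_hits}


print(categorize("The trader sold shares when the stock market fell."))
print(categorize("The doctor bought shares."))
print(categorize("Nothing relevant here."))
```

The three calls show the three possible outcomes: exactly one category, several categories, and no category at all.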

As with automatic text summarization and Topic Detection and Tracking, categorization can be used to further determine the relevance of a document to the information being sought. For example, many companies offer customer support or need to answer individual customer questions. When their documents are categorized by a system, the end user can find the required information much more quickly. For today’s search engines, automatic text categorization has become indispensable, because new documents are published on the Internet, and old ones removed, far too quickly for manual handling. For this reason, the technique is used in popular search engines to provide an up-to-date and closely linked set of results.

 

Clustering

 

Clustering is a technique for grouping similar documents together while keeping the groups as different from each other as possible. It differs from categorization in that the documents are grouped on the fly rather than by predefined topics. Another advantage is that a document can appear in several groups, which ensures that a relevant document is not omitted from the search results. A basic clustering algorithm creates a vector of topics for each document and evaluates how well the document fits into the different clusters.

Clustering algorithms group a set of documents into subsets, or clusters. Their goal is to create clusters that are internally coherent but clearly different from each other. In other words, documents within a cluster should be as similar as possible, and documents in one cluster should be as dissimilar as possible from documents in other clusters. If, for example, you search for the term “cell”, the search results may contain entries from the clusters “biology”, “battery” and “prison”.
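
The “cell” example can be reproduced with a very small single-pass clustering sketch in Python. Each document becomes a word-count vector and is added to the most similar existing cluster, or opens a new cluster if none is similar enough. This is a deliberately simple stand-in for the many clustering algorithms in use; the function names, the threshold of 0.4 and the six mini-documents are invented for illustration.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Turn a document into a sparse word-count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(docs, threshold: float = 0.4):
    """Single-pass clustering; returns lists of document indices."""
    clusters = []  # each entry: {"centroid": Counter, "members": [indices]}
    for i, doc in enumerate(docs):
        vec = vectorize(doc)
        best = max(clusters, key=lambda c: cosine(vec, c["centroid"]), default=None)
        if best is not None and cosine(vec, best["centroid"]) >= threshold:
            best["centroid"] += vec  # summed counts act as the cluster centroid
            best["members"].append(i)
        else:
            clusters.append({"centroid": vec, "members": [i]})
    return [c["members"] for c in clusters]


docs = [
    "cell membrane biology nucleus",
    "battery cell voltage charge",
    "prison cell inmate guard",
    "biology of the cell nucleus",
    "battery charge and voltage",
    "prison guard and inmate",
]
groups = cluster(docs)
print(groups)
```

No topics are given in advance; the three groups emerge from the word overlap alone. This sketch puts every document into exactly one cluster. A variant in which a document may appear in several groups would add it to every cluster whose similarity exceeds the threshold.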

The fifth and final part discusses Concept Linkage, Information Visualization and Question-Answer Systems, and draws conclusions.


About Abhishek Ghosh

Abhishek Ghosh is a Businessman, Surgeon, Author and Blogger. You can keep in touch with him on Twitter - @AbhishekCTRL.


About This Article

Cite this article as: Abhishek Ghosh, "Uses of Text Mining in Web Content Mining : Part IV," in The Customize Windows, August 6, 2019, February 2, 2023, https://thecustomizewindows.com/2019/08/uses-of-text-mining-in-web-content-mining-part-iv/.

Source:The Customize Windows, JiMA.in
