The Customize Windows

Technology Journal

By Abhishek Ghosh August 6, 2019 2:34 pm Updated on August 8, 2019

Uses of Text Mining in Web Content Mining : Part IV

In the third part of Uses of Text Mining in Web Content Mining, we said that this part would discuss the application areas, or tasks, of text mining. In essence, the different methods analyze texts and make implicit information explicit. They then form relations between pieces of information from different texts, and highlight and visualize them. The descriptions below give an overview of the many technologies and their role in text mining.

 

Information extraction

 

Information extraction involves processing text to identify selected information, such as specific types of names or specified characteristics of events. For names, it is sufficient to find them in the text and recognize their type. For events, the critical information (people, objects, date, location, etc.) must be extracted from the text and entered into a predefined structure. In information extraction, this predefined structure is called a template.

The following example is a template for extracting information about management changes at companies:

<TEMPLATE> :=
    DOC_NR: "NUMBER"
    CONTENT: <SUCCESSION_EVENT> *

<SUCCESSION_EVENT> :=
    SUCCESSION_ORG: <ORGANIZATION>
    POST: "POSITION TITLE" | "no title"
    IN_AND_OUT: <IN_AND_OUT> +
    VACANCY_REASON: {DEPART_WORKFORCE, REASSIGNMENT, NEW_POST_CREATED, OTH_UNK}

<IN_AND_OUT> :=
    IO_PERSON: <PERSON>
    NEW_STATUS: {IN, IN_ACTING, OUT, OUT_ACTING}
    ON_THE_JOB: {YES, NO, UNCLEAR}
    OTHER_ORG: <ORGANIZATION>
    REL_OTHER_ORG: {SAME_ORG, RELATED_ORG, OUTSIDE_ORG}

<ORGANIZATION> :=
    ORG_NAME: "NAME"
    ORG_ALIAS: "ALIAS" *
    ORG_DESCRIPTOR: "DESCRIPTOR"
    ORG_TYPE: {GOVERNMENT, COMPANY, OTHER}
    ORG_LOCALE: LOCALE-STRING {{LOC_TYPE}} *
    ORG_COUNTRY: NORMALIZED-COUNTRY-OR-REGION | COUNTRY-OR-REGION STRING *

<PERSON> :=
    PER_NAME: "NAME"
    PER_ALIAS: "ALIAS" *
    PER_TITLE: "TITLE" *

LOC_TYPE :: {CITY, PROVINCE, COUNTRY, REGION, UNK}
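
To make the idea of filling a template more concrete, here is a minimal Python sketch. It is only an illustration under strong assumptions: the classes `SuccessionEvent` and `InAndOut`, the regular expression `PATTERN` and the sample sentence are all hypothetical, and the pattern covers a single sentence shape. Real extraction systems use far richer linguistic rules or trained models. Only a few slots of the template (SUCCESSION_ORG, POST, IO_PERSON, NEW_STATUS) are modeled.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InAndOut:
    person: str      # corresponds to IO_PERSON / PER_NAME
    new_status: str  # subset of NEW_STATUS: "IN" or "OUT"


@dataclass
class SuccessionEvent:
    organization: str  # corresponds to SUCCESSION_ORG / ORG_NAME
    post: str          # corresponds to POST
    in_and_out: List[InAndOut] = field(default_factory=list)


# Hypothetical pattern: "<Org> named <First Last> as [its] [new] <post>, succeeding <First Last>"
PATTERN = re.compile(
    r"(?P<org>[A-Z][\w&]*(?: [A-Z][\w&]*)*) named (?P<new>[A-Z]\w+ [A-Z]\w+) "
    r"as (?:its )?(?:new )?(?P<post>[a-z ]+?), succeeding (?P<old>[A-Z]\w+ [A-Z]\w+)"
)


def extract(sentence: str) -> Optional[SuccessionEvent]:
    """Fill the template from one sentence, or return None if nothing matches."""
    m = PATTERN.search(sentence)
    if m is None:
        return None
    return SuccessionEvent(
        organization=m.group("org"),
        post=m.group("post").strip() or "no title",
        in_and_out=[
            InAndOut(m.group("new"), "IN"),   # incoming person
            InAndOut(m.group("old"), "OUT"),  # outgoing person
        ],
    )


event = extract(
    "Acme Corp named Jane Doe as its new chief executive officer, succeeding John Smith."
)
print(event)
```

The point of the sketch is that the system does not need to understand the whole sentence. It only looks for the pieces that fit the slots of the predefined structure.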

Information extraction systems are evaluated along two dimensions, precision and recall (completeness), which are combined in the so-called “F-measure”. Because the systems require neither complete text comprehension nor a complete grammatical analysis, they can achieve high precision and recall on the specific task they are built for. Information extraction is always designed for a specific information need and systematically analyzes texts for predefined data, phrases and text segments.
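
As a small worked example, the F-measure can be computed from the counts of correct, spurious and missed extractions. The function name `f_measure` and the sample counts are assumptions made for illustration. The formula is the standard weighted harmonic mean of precision and recall, where `beta = 1` weights both equally.

```python
def f_measure(true_positives, false_positives, false_negatives, beta=1.0):
    """Return (precision, recall, F) for the given extraction counts."""
    found = true_positives + false_positives      # everything the system extracted
    expected = true_positives + false_negatives   # everything it should have extracted
    precision = true_positives / found if found else 0.0
    recall = true_positives / expected if expected else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical run: 8 correct slot fills, 2 spurious ones, 8 missed ones.
precision, recall, f1 = f_measure(8, 2, 8)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.3f}")
```

With these counts, precision is 0.8 and recall is 0.5, so F1 is about 0.615: the measure punishes a system that is precise but misses much of the information.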

 

Topic Detection and Tracking

 

Topic Detection and Tracking (TDT) refers to automatic techniques for finding thematically related material in data streams. These techniques can be very valuable in applications where efficient and timely access to information is essential.

One use of TDT is a free notification service that automatically emails the user whenever new results for a given search term appear on the Internet.

Beyond news sources, TDT is used in many other areas. In science, it can ensure that the latest references and publications in a particular area of research are always at hand. Likewise, it can be used on the stock market to provide traders with the latest information and news about a company so that they can reconsider or adjust their investments. A company can also use a TDT system to monitor itself, its competitors and the products on the market.
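
The tracking half of TDT can be sketched very roughly as a filter over a stream of incoming items. The following Python sketch is a deliberately simplified assumption, not how production TDT systems work: the topic is described by a hand-picked set of terms, and an item is reported when it contains at least `min_hits` of them. The function names, the topic terms and the headlines are all hypothetical.

```python
import re
from typing import Iterable, Iterator, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def track(stream: Iterable[str], topic_terms: Set[str], min_hits: int = 2) -> Iterator[str]:
    """Yield every item in the stream that shares at least min_hits terms with the topic."""
    for item in stream:
        if len(tokenize(item) & topic_terms) >= min_hits:
            yield item


# Hypothetical topic and news stream.
topic = {"acme", "merger", "acquisition", "shares"}
headlines = [
    "Acme shares jump after merger rumours",
    "Local bakery wins award",
    "Regulators review Acme acquisition",
    "Acme opens new office",
]
alerts = list(track(headlines, topic))
print(alerts)
```

Because `track` is a generator, it works on an unbounded stream and reports each match as soon as it arrives, which is the property a notification service needs.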

 

 

Automatic text summarization

 

The term “text summary” is generally defined as follows: a summary is a text that is produced from one or more texts, that contains a significant portion of the information in the original text(s), and that is no longer than half of the original text(s). Hence the main task of a summary is to reproduce the key messages of the text with a reduced number of words.

There are two different approaches to creating summaries: extraction and abstraction. In sentence extraction, a combination of statistical heuristics assigns each sentence an individual score. The highest-scoring sentences are considered the most salient and are extracted to form the summary. The abstraction approach simplifies and compresses the text. This presupposes an understanding of the topics and the ability to rewrite the text. Given these requirements, abstraction is clearly much harder to program than extraction, which is why extraction is far more common in automated text summarization.
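
The sentence extraction approach can be illustrated with a minimal Python sketch. It assumes one single heuristic, the average frequency of a sentence's non-stopword terms across the whole text, whereas real systems combine several heuristics. The function `summarize`, the tiny `STOPWORDS` set and the sample text are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "of", "from", "to", "and", "is", "was", "in", "on", "for"}


def content_words(sentence):
    """Return the lowercase non-stopword tokens of a sentence."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Keep the highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore document order
    return " ".join(sentences[i] for i in keep)


text = (
    "Text mining extracts information from text. "
    "The weather was pleasant yesterday. "
    "Text mining systems summarize text automatically."
)
summary = summarize(text, max_sentences=2)
print(summary)
```

The off-topic sentence about the weather scores lowest because its words occur only once in the text, so it is dropped. Note that nothing is rewritten: the summary consists of original sentences only, which is exactly what distinguishes extraction from abstraction.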

An essential area of application for automatic text summarization is search engines. Internet search engines let the user search countless web pages for specific content, but presenting the results to the user is a problem. Here, automatic text summarization systems can play an important role by summarizing the results so that the user can assess the relevance of the hits more quickly. Google, the world’s most popular search engine, has used such a technique for several years to present its results in simplified form.

Because dynamic web applications increasingly blur the boundaries of individual documents, the algorithms used today must also be able to process multiple text documents.

A simple example of automatic text summarization is the “AutoSummarize” feature in older versions of Microsoft Word. The user could choose what percentage of the total text to extract for the summary. (The feature was removed in Word 2010.) Another use case arises when researchers, medical staff, or companies receive thousands of documents relevant to them.

Summarizing websites for smaller devices has also been discussed. Websites are designed for large monitors and are not very reader-friendly on smartphones. Automatic summarization would convert them into a meaningful, easy-to-read and, above all, searchable format. Today, with the proliferation of smartphones, it is more likely that stand-alone, more compact websites will be created for mobile devices instead.

 

Categorization

 

The goal of text categorization is to classify documents into a fixed number of predefined categories. Each document can be assigned to several categories, to exactly one, or to none. The documents to be classified can be texts, pictures, music or anything else, and each type poses its own classification challenges. The documents can be classified by topic or by other characteristics (author, year, type, etc.).

For categorization, a document is treated simply as a collection of words; it is not processed in the same way as in, for example, information extraction. Only the words are counted, and the counts identify the main topics of the document. For this purpose, a thesaurus of the given topics is often used, and the relationships are determined by searching for sub-terms, synonyms and related terms.
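
The counting approach described above can be sketched in a few lines of Python. The `THESAURUS` here is a hypothetical, hand-made mapping from category to related terms, and the threshold `min_hits` is an assumption. A real system would use a much larger thesaurus or a trained classifier.

```python
import re
from collections import Counter

# Hypothetical thesaurus: each predefined category lists its related terms.
THESAURUS = {
    "finance": {"stock", "stocks", "market", "investment", "shares", "trader"},
    "sports": {"match", "team", "goal", "league", "player"},
}


def categorize(text, thesaurus=THESAURUS, min_hits=1):
    """Return all categories whose terms occur at least min_hits times, best first."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))  # the document as a bag of words
    scores = {cat: sum(counts[t] for t in terms) for cat, terms in thesaurus.items()}
    hits = [cat for cat, score in scores.items() if score >= min_hits]
    return sorted(hits, key=lambda cat: (-scores[cat], cat))


print(categorize("The team scored a late goal and won the match."))
print(categorize("Stocks fell as the market reacted to the team owner's investment."))
print(categorize("A recipe for bread."))
```

The three calls show the three possible outcomes: exactly one category, several categories, and no category at all.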

As with automatic text summarization and Topic Detection and Tracking, categorization can be used to further narrow down the relevance of a document to the information you are looking for. For example, many companies offer customer support or need to answer individual customer questions. If their documents are categorized by a system, the end user can get the information they need much more quickly. For today’s search engines, automatic text categorization has become indispensable, because new documents are published on the Internet, and old ones removed, far too quickly for manual classification. For this reason, popular search engines use the technique to always provide an up-to-date and closely related set of results.

 

Clustering

 

Clustering is a technique for grouping similar documents together while keeping the groups as different from each other as possible. It differs from categorization in that the documents are grouped on the fly rather than against predefined topics. A further advantage is that a document can appear in several groups, which ensures that a relevant document is not omitted from the search results. A basic clustering algorithm creates a vector of topics for each document and evaluates how well the document fits into the different clusters.

Clustering algorithms group a set of documents into subsets or clusters. The algorithms’ goal is to create clusters that are coherent internally, but clearly different from each other. In other words, documents within a cluster should be as similar as possible, and documents in one cluster should be as dissimilar as possible from documents in other clusters. If, for example, you were to search for the term “cell”, the search results could be clustered into groups such as “biology”, “battery” and “prison”.

In the fifth and final part, we will discuss Concept Linkage, Information Visualization, Question Answering Systems and draw conclusions.
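
A basic version of this idea can be sketched in Python as a single-pass ("leader") clustering over word-count vectors. This is one simple technique chosen for illustration, not the specific algorithm any search engine uses. The function names, the similarity `threshold` and the six short documents are assumptions. Unlike the description above, this sketch places each document in exactly one cluster.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Represent a document as a vector of word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs, threshold=0.5):
    """Assign each document to the best-fitting cluster, or open a new one."""
    clusters = []  # each cluster: summed term vector plus member indices
    for index, doc in enumerate(docs):
        vec = vectorize(doc)
        best, best_sim = None, 0.0
        for candidate in clusters:
            sim = cosine(vec, candidate["centroid"])
            if sim > best_sim:
                best, best_sim = candidate, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(index)
            best["centroid"].update(vec)  # summed counts act as an unnormalized centroid
        else:
            clusters.append({"centroid": Counter(vec), "members": [index]})
    return [c["members"] for c in clusters]


# Hypothetical search results for the query "cell".
docs = [
    "cell membrane protein biology",
    "battery cell voltage lithium",
    "prison cell inmate guard",
    "biology cell membrane nucleus",
    "lithium battery cell charge",
    "prison guard inmate cell block",
]
groups = cluster(docs)
print(groups)
```

No topics are given in advance: the three groups emerge only from the similarity between the documents, which is the key difference from categorization.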

About Abhishek Ghosh

Abhishek Ghosh is a Businessman, Surgeon, Author and Blogger. You can keep in touch with him on Twitter - @AbhishekCTRL.
