
The Customize Windows

Technology Journal


By Abhishek Ghosh June 19, 2018 11:08 pm Updated on June 19, 2018

Configure Apache Tika With WordPress to Search, Get Meta of PDF/Doc Files


In our previously published article, How to Install Apache Tika on Ubuntu Server, we covered the basics of Apache Tika. Apache Tika can be combined with PHP. It detects content and extracts metadata and text from different file types, and it can identify more than 1400 file types. Tika is related to the Apache Nutch codebase, and a Python port is also available. Tika can be set up on a server in different ways to integrate with various blogging platforms and CMS, including WordPress. Here is how to configure Apache Tika with WordPress to search and get the metadata of PDF, Doc, Excel, text and other types of files.

This is another example of integrating a Big Data tool with WordPress. Other examples involve search functions: we have an article on Apache Solr vs. Elasticsearch For WordPress Search. Apache Nutch and Apache Tika are practically parts of search and crawl, and both can be combined with Apache Solr for other purposes. However, to use Apache Tika with WordPress we do not need to go through Apache Solr, because we only want some functions within WordPress Admin.


How to Configure Apache Tika With WordPress

 

Installing Apache Tika is the difficult part for new users, so we wrote the Apache Tika installation guide in some detail with them in mind. As the first step, install Apache Tika on the same server on which WordPress is running. Tika can, of course, run on a separate server, but configuring that setup may be difficult for a new user.

Apart from installing Apache Tika, WordPress needs two plugins. The first is Search Everything:

https://wordpress.org/plugins/search-everything/

The second is a WordPress plugin named Masala:

https://github.com/nanodust/masala

Masala means spice, and Indian masalas are quite popular in America. Tikka means a small piece of meat, fish and so on. Together they make Tikka Masala, and the whole world knows chicken tikka butter masala. Apache projects are often given playful names, some drawn from Sanskrit and Buddhist words, reportedly to avoid copyright matters and for fun. Apache Tika is Tikka's Tika: a delicious piece for Apache Solr.

Configure Apache Tika for the file types you need and check on the command line that it can extract metadata. Then install the plugin and read its source code. The plugin requires Tika's jar to be installed somewhere on your server and assumes that Java is installed on the server where WordPress is running. Place Apache Tika's jar file in the project's root folder and configure its path in the masala.php file. The plugin does not have much detailed documentation.
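The command-line check mentioned above could be scripted roughly as follows. This is only a sketch in Python, not part of the Masala plugin: it assumes the Tika app jar is run with its `--json` option, which prints the extracted metadata as JSON, and the function names are made up for illustration.

```python
import json
import subprocess


def build_tika_command(jar_path, file_path):
    """Return the argument list for a Tika CLI metadata dump in JSON form."""
    # --json asks the Tika app to print the extracted metadata as JSON.
    return ["java", "-jar", jar_path, "--json", file_path]


def parse_tika_json(output):
    """Turn Tika's JSON output into a Python object; empty output gives {}."""
    output = output.strip()
    return json.loads(output) if output else {}


def extract_metadata(jar_path, file_path):
    """Run Tika on one file and return its metadata.

    Needs Java and the Tika app jar on the machine; raises
    subprocess.CalledProcessError if the command fails.
    """
    result = subprocess.run(
        build_tika_command(jar_path, file_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_tika_json(result.stdout)
```

Calling `extract_metadata("/opt/tika/tika-app.jar", "sample.pdf")` (both paths are hypothetical, adjust them to your server) should return the metadata if Tika and Java are set up correctly. If the call raises an error, the plugin will most likely fail in the same way.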

When you upload content such as a PDF or DOC file, the plugin processes the file after upload and inserts its metadata. You can then search the attachment's metadata, and the attachment will be listed in the search results.
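To make the idea of searching attachments by metadata concrete, here is a small Python sketch. It is not how the Masala or Search Everything plugins are implemented. It only illustrates the behaviour described above, using a made-up in-memory list of attachments in place of the WordPress database.

```python
def search_attachments(attachments, query):
    """Return titles of attachments whose title or metadata values match query."""
    needle = query.strip().lower()
    if not needle:
        return []
    hits = []
    for item in attachments:
        # Search the title and every metadata value, ignoring case.
        haystack = [item["title"]]
        haystack.extend(str(value) for value in item["metadata"].values())
        if any(needle in text.lower() for text in haystack):
            hits.append(item["title"])
    return hits


# Made-up sample data standing in for uploaded files and their metadata.
uploads = [
    {"title": "Report.pdf",
     "metadata": {"Author": "Jane Doe", "Content-Type": "application/pdf"}},
    {"title": "Notes.doc",
     "metadata": {"Author": "Sam Roe"}},
]
```

With this sample data, `search_attachments(uploads, "jane")` finds Report.pdf through its Author field even though the word does not appear in the file name.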

If you are using Apache Solr for WordPress search, the metadata itself will be searchable, as in most search engines.


About Abhishek Ghosh

Abhishek Ghosh is a Businessman, Surgeon, Author and Blogger. You can keep in touch with him on Twitter - @AbhishekCTRL.
