
What Apache Kafka Can Do?

By Abhishek Ghosh · May 4, 2020

In increasingly complex application landscapes, handling data flows is becoming ever more difficult. Read how the big data technology Apache Kafka can help. Linking data from different systems sits at the top of the to-do list in application integration, and there are many solutions for it, each with advantages and disadvantages. Apache Kafka promises to solve the problem in the way users have always wanted, and it is being adopted by more and more companies. This is not surprising: together with Apache Avro as a payload format, it forms a solid basis for data-distribution tasks of any kind – whether at the table level, at the business-object level or for real-time replication. From Apache Kafka's perspective, these are just different use cases. Even complex transformations can be bolted on in a variety of ways, from conventional ETL tools to stream-processing tools.

Big data tools that reach their limits

Many technologies for data integration have been invented over the years. ETL (Extract-Transform-Load) tools, for example, focus primarily on transforming data; they are usually powerful, but they work well only in batch processing. EII (Enterprise Information Integration) tools promised to make this easier by not copying the data at all and instead linking the sources together at runtime.

Enterprise Application Integration (EAI) has yet another focus: application objects, not tables, should be the basis. After all, the key information is that, for example, a new customer master record has been created – not that five tables have changed. Real-time replication, finally, can only copy data, but it does so with very low latency. Viewed from a technical perspective, all of these approaches constantly run into physical limits and contradictions.

The right big data technology?

The current coexistence of these approaches and tools can be seen nicely at many companies. Software manufacturers offer a wide range of ETL tools (see the Gartner Magic Quadrant for Data Integration), while the EII approach, i.e. data federation, is implemented in many business intelligence tools.

At an abstract level, an application object represents a logical data model, and tables are one possible physical implementation. A much more direct representation is storage in JSON, XML or an even more suitable format; the main thing is that it allows a hierarchical structure. In the customer example above, the object should contain the customer name, the customer's various addresses, possible contact information and the like. If the data integration layer fully supports such a format, there is no longer any reason to build separate tools for data and application integration: a table is then just a particularly trivial, flat business object. Such non-relational structures are common in the big data world anyway, and Apache Avro is a suitable format because it combines an embedded format definition, efficient storage and fast processing.

The second basic decision revolves around how many consumers there are for the respective data. Real life has never been that simple, and today's IT landscapes are becoming ever more complex. It makes little sense for, say, ten target systems to torment the source system every few seconds with the question "What has changed?" – that only creates a high base load on the ERP system. In such a landscape it is much smarter to have a central distributor: the ERP system passes its changes to that service once, and all data consumers fetch them from there.
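
To make the business-object idea concrete, here is a minimal sketch of a hierarchical customer record in Avro, using the fastavro Python library (the library choice, field names and values are illustrative assumptions, not the article's prescription):

    # A hierarchical "customer" business object as an Avro schema,
    # written to a buffer and read back. The schema travels with the data.
    from io import BytesIO
    from fastavro import parse_schema, writer, reader

    schema = parse_schema({
        "type": "record",
        "name": "Customer",
        "fields": [
            {"name": "name", "type": "string"},
            {"name": "addresses", "type": {"type": "array", "items": {
                "type": "record", "name": "Address",
                "fields": [
                    {"name": "street", "type": "string"},
                    {"name": "city", "type": "string"},
                ],
            }}},
            {"name": "contact", "type": ["null", "string"], "default": None},
        ],
    })

    record = {
        "name": "ACME Corp",
        "addresses": [{"street": "1 Main St", "city": "Springfield"}],
        "contact": "info@acme.example",
    }

    buf = BytesIO()
    writer(buf, schema, [record])    # serialize one business object
    buf.seek(0)
    print(next(iter(reader(buf))))   # round-trip: read the object back

A flat table row is just the degenerate case of such an object: a record with only scalar fields.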

This approach, too, has been tried several times; remember the SOA (Service-Oriented Architecture) movement, or IBM MQ as the best-known representative of the enterprise message bus category. None of these solutions was bad, but most were overwhelmed by real-world requirements. Often it was simple things that prevented widespread use. Two points deserve particular mention: the payload format and the handling of the queues.

If you couple two systems, you have to agree on an interface definition. What exactly does a customer master record look like – fields, data types, permitted values, etc.? Any change to the interface means that sender and receiver must be updated synchronously. That is feasible with two systems, but when many systems are involved it quickly becomes confusing and practically impossible.

What is needed is a technique that prescribes a fixed structure definition on the one hand and allows a certain tolerance on the other. The classic way to do this is through different API versions. But databases show that it can be easier: what happens if you add a column to a table? With clean programming, nothing – all read accesses continue to work. In the big data world this approach has been developed further under the name schema evolution, and Apache Avro supports it very nicely.
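
To illustrate schema evolution in the same hedged sketch style (again with fastavro; names and values are illustrative): a record written with an old schema is read with a newer schema that adds a field with a default – the Avro analogue of adding a column to a table.

    # Old writer, new reader: the v2 schema adds "loyalty_tier" with a
    # default, so v1 records still deserialize cleanly.
    from io import BytesIO
    from fastavro import parse_schema, writer, reader

    v1 = parse_schema({
        "type": "record", "name": "Customer",
        "fields": [{"name": "name", "type": "string"}],
    })
    v2 = parse_schema({
        "type": "record", "name": "Customer",
        "fields": [
            {"name": "name", "type": "string"},
            {"name": "loyalty_tier", "type": "string", "default": "standard"},
        ],
    })

    buf = BytesIO()
    writer(buf, v1, [{"name": "ACME Corp"}])    # written before the change
    buf.seek(0)
    for row in reader(buf, reader_schema=v2):   # read with the new schema
        print(row)  # {'name': 'ACME Corp', 'loyalty_tier': 'standard'}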

The advantages Apache Kafka brings

In the past, messaging mostly used a publish/subscribe model: a change is sent, and everyone who has registered as a recipient receives the corresponding data. That sounds obvious at first, but it does not meet real requirements. As a consumer, you want to decide which data you get and when you get it. In many cases that will be immediately – in real time – but there are other scenarios:

  1. The data warehouse only reads once a day.
  2. In the event of an error in the processing logic, the last few hours must be processed again.
  3. During development and testing, you want to get the same data over and over again.

This is exactly where Apache Kafka shines. It covers all of these scenarios without creating new drawbacks, in a way that is simple, convenient and fast. The logical principle: all change records are appended to the end of a log. Every consumer tells Kafka where to read from, and Kafka delivers the data accordingly; the connection is then left open, so that new records arrive with a latency in the millisecond range (a minimal consumer sketch follows the list below). Typical use cases include:

  • Messaging
  • Website activity tracking
  • Log aggregation
  • Stream processing
  • Event sourcing
  • Commit log
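
How a consumer controls its own read position can be sketched with the kafka-python client (one of several Kafka clients; the broker address, topic and partition below are placeholder assumptions):

    # The consumer, not the broker, decides where reading starts: here we
    # take manual control of one partition and rewind it to offset 0 to
    # reprocess the whole log, e.g. after a bug in the processing logic.
    from kafka import KafkaConsumer, TopicPartition

    consumer = KafkaConsumer(bootstrap_servers="localhost:9092")
    tp = TopicPartition("customer-changes", 0)
    consumer.assign([tp])     # manual partition assignment
    consumer.seek(tp, 0)      # replay from the start of the log
    for message in consumer:  # keeps the connection open; new records
        print(message.offset, message.value)  # arrive within milliseconds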

This kills two birds with one stone: the consumer gets exactly the data it requested – it stays in control – and it gets the data in real time, at least until it closes the connection itself. Furthermore, several instances of the same consumer can run in parallel, with Kafka taking care of load balancing. If another consumer instance is started, or an existing instance stops responding, Kafka handles the situation automatically. This makes programming simple and robust. So if you have to deal with data integration, pay attention to how well your architecture and tools harmonize with Apache Kafka.
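
The load-balancing behaviour can be sketched the same way: start the script below twice with the same group_id (again kafka-python, with placeholder broker, topic and group names), and Kafka splits the topic's partitions between the two instances; if one dies, the other takes over automatically.

    # Two or more instances of this consumer share the work: Kafka assigns
    # each partition of the topic to exactly one member of the group and
    # rebalances when members join or fail.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "customer-changes",
        bootstrap_servers="localhost:9092",
        group_id="warehouse-loader",   # shared group => load balancing
        auto_offset_reset="earliest",  # where a new group starts reading
    )
    for message in consumer:
        print(message.partition, message.offset, message.value)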
