The Customize Windows

Technology Journal


By Abhishek Ghosh May 4, 2020 8:33 pm Updated on May 4, 2020

What Can Apache Kafka Do?

In increasingly complex application landscapes, handling data flows is becoming more and more difficult. Linking data from different systems sits at the top of the to-do list in application integration, and there are many solutions for it, each with advantages and disadvantages. Apache Kafka promises to solve the problem the way users have always wanted, and it is now used by a growing number of companies.

This is not surprising: for data distribution tasks, Kafka, together with Apache Avro as a payload format, forms a solid basis for almost any data integration, whether at table level, at business-object level, or for real-time replication. From Kafka's perspective, these are simply different use cases. Even complex transformations can be attached in a variety of ways, from conventional ETL tools to stream-processing tools.

 

Big data tools that reach their limits

 

Many technologies for data integration have been invented over the years. ETL (Extract-Transform-Load) tools, for example, focus primarily on transforming data; they are usually powerful, but they only work well as batch processes. EII (Enterprise Information Integration) tools promised to make this easier by not copying the data at all, instead linking the sources together at runtime.

Enterprise Application Integration (EAI) has yet another focus: the basis should not be tables but application objects. After all, the key information is that, for example, a new customer master record has been created, not that five tables have changed. Real-time replication, finally, can only copy data, but it does so with very low latency. Looking at all of these approaches from a technical perspective, you constantly run into physical limits and contradictions.


 

The right big data technology?

 

The current mix of approaches and tools can be observed at many companies. Software manufacturers offer a wide range of ETL tools (see the Gartner Magic Quadrant for Data Integration), while the EII approach, i.e. data federation, is implemented in many business intelligence tools.

At an abstract level, an application object represents a logical data model, and tables are one possible physical implementation. A much more direct representation would be storage in JSON, XML or an even more suitable format; the main requirement is that it allows a hierarchical structure of the data. In the customer example above, such an object would contain the customer name, the customer's various addresses, contact information and the like. If the data integration layer fully supports this format, there is no longer any reason to build separate tools for data and application integration: a table is then just a particularly trivial, flat business object. Such non-relational structures are common in the big data world anyway, and Apache Avro is a suitable format because it combines the format definition with efficient storage and fast processing.

The second basic decision revolves around how many consumers there are for the respective data. Today's IT landscapes are becoming increasingly complex, and it makes little sense for, say, ten target systems to poll the source system every few seconds with the question "What has changed?"; that only creates a high base load on the ERP system. In such a landscape, it is much smarter to have a central distributor: the ERP system passes the data to this service once, and all data consumers fetch it from there.
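To make the "hierarchical business object instead of five tables" idea concrete, here is a minimal sketch in Python. The field names and values are purely illustrative, not a real ERP schema; JSON stands in for Avro as the serialization format.

```python
import json

# A hypothetical "customer" business object: one hierarchical document
# instead of several separate relational tables (customer, addresses,
# contacts, ...). All names here are made up for illustration.
customer = {
    "customer_id": 4711,
    "name": "Example GmbH",
    "addresses": [
        {"type": "billing", "city": "Berlin"},
        {"type": "shipping", "city": "Hamburg"},
    ],
    "contacts": [
        {"channel": "email", "value": "info@example.com"},
    ],
}

# Serialized, the whole object travels as a single message payload;
# consumers receive "a customer changed", not "five tables changed".
payload = json.dumps(customer)
restored = json.loads(payload)
assert restored["addresses"][1]["city"] == "Hamburg"
```

A flat table row is just the degenerate case of such an object: a dictionary with no nested lists.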

This approach has been tried several times before; remember the SOA (Service-Oriented Architecture) movement, or IBM MQ as the best-known representative of the enterprise message bus category. None of these solutions was bad, but most were overwhelmed by real-world requirements. Often it was simple things that prevented widespread use. Two points deserve particular mention: the payload format and the handling of the queues.

If you couple two systems, you have to agree on an interface definition: what exactly does a customer master record look like, which fields, data types, permitted values and so on? Any change to the interface means that both the sender and the receiver must be updated synchronously. With two systems this is possible, but once many systems are involved it quickly becomes confusing and practically impossible.

You need a technique that prescribes a fixed structure definition on the one hand and allows a certain tolerance on the other. The classic way to do this is through different versions of an API. But there is a simpler approach, used constantly with databases: what happens if you add a column to a table? With clean programming, nothing; all existing read accesses continue to work. In the big data world, this idea has been developed further and is called schema evolution. Apache Avro supports this very nicely.
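The core of schema evolution can be sketched in a few lines. This is a deliberately simplified illustration of the idea, not Avro's actual schema-resolution rules: a reader schema adds a field with a default value, so records written under the old schema remain readable.

```python
# Simplified sketch of Avro-style schema evolution. The reader schema
# maps field name -> default value (None means "required, no default").
# "loyalty_tier" is a hypothetical newly added field.
NEW_SCHEMA = {"customer_id": None, "name": None, "loyalty_tier": "standard"}

def read_with_schema(record, reader_schema):
    """Resolve a record against a reader schema, filling in defaults
    for fields the (older) writer did not know about."""
    resolved = {}
    for field, default in reader_schema.items():
        if field in record:
            resolved[field] = record[field]
        elif default is not None:
            resolved[field] = default  # new field, old record: use default
        else:
            raise ValueError(f"missing required field: {field}")
    return resolved

# A record written before "loyalty_tier" existed still resolves cleanly:
old_record = {"customer_id": 1, "name": "Alice"}
print(read_with_schema(old_record, NEW_SCHEMA))
# {'customer_id': 1, 'name': 'Alice', 'loyalty_tier': 'standard'}
```

This is exactly the "add a column, nothing breaks" behaviour from databases, carried over to message payloads.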


The advantages Apache Kafka brings

 

In the past, messaging mostly used a pure publish/subscribe model: a change is sent, and everyone who has registered as a data recipient receives the corresponding data. That sounds obvious at first, but it does not meet real requirements. As consumers, we want to decide which data we get and when we get it. In many cases that will be immediately, in real time, but there are other scenarios:

  1. The data warehouse only reads once a day.
  2. In the event of an error in the processing logic, the last few hours must be processed again.
  3. During development, you want to get the same data over and over again during testing.

This is exactly the advantage of Apache Kafka: it covers all of these cases without creating new drawbacks, and it does so in a way that is simple, convenient and fast. The logical principle is that all change records are appended to the end of a log. Each consumer says where it wants to read from, and Kafka transfers the data accordingly. The connection is kept open, however, so that new records arrive with a latency in the millisecond range. Typical use cases include:

  • Messaging
  • Website activity tracking
  • Log aggregation
  • Stream processing
  • Event sourcing
  • Commit log
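The append-and-read-from-an-offset principle described above can be modelled in a few lines. This toy class is only a mental model of a single Kafka topic partition, with none of the real broker's persistence, partitioning or networking:

```python
class MiniLog:
    """Toy model of a Kafka topic partition: an append-only log from
    which every consumer reads starting at an offset of its choosing."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read_from(self, offset):
        """Return all records from the given offset onward; the consumer
        decides where to start (beginning, last commit, latest...)."""
        return self._records[offset:]

log = MiniLog()
for change in ["cust#1 created", "cust#1 updated", "cust#2 created"]:
    log.append(change)

# A data warehouse reading once a day resumes at its last committed offset:
assert log.read_from(1) == ["cust#1 updated", "cust#2 created"]
# A developer re-testing, or a consumer recovering from a bug,
# simply reads from offset 0 again:
assert log.read_from(0)[0] == "cust#1 created"
```

All three scenarios from the numbered list above reduce to the same operation: read from a chosen offset.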

This kills two birds with one stone: the consumer gets exactly the data it requested, so it stays in control, and it gets that data in real time, at least until it closes the connection itself. Furthermore, several instances of the same consumer can run in parallel, with Kafka taking care of load balancing. If another consumer instance starts, or an existing instance stops responding, Kafka handles the rebalancing automatically. This makes the programming model simple and robust. So if you have to deal with data integration, pay attention to how well your architecture and tools harmonize with Apache Kafka.
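The load-balancing behaviour can also be sketched. Kafka assigns the partitions of a topic across the instances of a consumer group and reassigns them on every membership change; the round-robin function below is a simplified stand-in for that coordinator logic, not Kafka's actual assignment algorithm:

```python
def assign_partitions(partitions, consumers):
    """Round-robin partition assignment: a simplified model of what a
    Kafka consumer-group coordinator does on every rebalance."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = [0, 1, 2, 3, 4, 5]

# Two instances of the same consumer share the load:
print(assign_partitions(partitions, ["A", "B"]))
# {'A': [0, 2, 4], 'B': [1, 3, 5]}

# Instance B stops responding; a rebalance hands everything to A:
print(assign_partitions(partitions, ["A"]))
# {'A': [0, 1, 2, 3, 4, 5]}
```

The application code never changes; starting or stopping instances is enough to scale consumption up or down.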
