Do I Need Messages or Topics? A Comparison Between SQS and Kafka


You’re in the process of designing an application and need to establish a communication layer between two independent services within the application stack. You’ve heard of Amazon Simple Queue Service (“SQS”) and Apache Kafka, both of which seem to meet the demands of your application. How would you go about making an optimal choice between these two technologies?

What Are SQS and Kafka?

SQS is an AWS service designed for facilitating the exchange of data using messages: applications can receive, send, and delete individual messages through queues created within the service. Launched in 2004, SQS became widely accessible to the public in 2006. As a managed service, SQS requires an AWS account, and you won’t have to provision hardware or patch software to use it. Furthermore, SQS supports a pay-as-you-go pricing model based on API calls.

Apache Kafka is an open-source platform that allows developers to stream events to topics and deliver them to consumers. Since its release in 2011, an impressive list of corporations has adopted Apache Kafka for message and event processing, including Cloudflare, LinkedIn, Netflix, and PayPal. Kafka supports cloud-native, hybrid, and on-premises deployments.

A Quick Comparison

The following represents a high-level view of how Apache Kafka and SQS compare:

What’s Right for Me?

When designing an asynchronous software solution, one of the first questions to ask is whether your platform requires in-order processing and whether it can tolerate messages being received more than once. If you can tolerate messages that are potentially delivered out of order or more than once, an SQS standard queue could work. Kafka, on the other hand, natively delivers the messages on a topic in order, provided that the topic has a single partition; with multiple partitions, ordering is guaranteed only within each partition. With that in mind, how would you know which technology to choose?
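To make the partition point concrete, here is a minimal sketch of how key-based partitioning affects ordering. It is a simplified model, not Kafka itself: `partition_for` and `publish` are hypothetical helpers, and CRC32 is only a stand-in hash (Kafka's default partitioner uses a different hash function).

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition (CRC32 is a stand-in, not Kafka's real hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def publish(events, num_partitions):
    """Append each (key, value) event to the log of the partition chosen by its key."""
    partitions = defaultdict(list)
    for key, value in events:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


events = [
    ("customer-a", 1), ("customer-b", 1), ("customer-a", 2),
    ("customer-b", 2), ("customer-a", 3),
]

# With one partition, the log preserves the global publish order.
# With several, order is preserved per partition (and so per key) only.
for count in (1, 3):
    logs = publish(events, num_partitions=count)
    for partition, log in sorted(logs.items()):
        print(count, partition, log)
```

In this model, events that share a key always land in the same partition, so per-key order survives even when global order does not. If per-key ordering is all your application needs, a single partition may be stricter than necessary.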

Suppose your system needs to process an individual nightly report for each of 5,000 customers. The order in which the reports are processed doesn't matter to either the customers or the application platform. What platform might you use? While you could use either SQS or Kafka, an SQS queue provides the simplest way of partitioning the work into individual tasks.
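The shape of that solution, one message per customer with several workers pulling from a shared queue, can be sketched locally. This is an illustration under assumptions: Python's in-process `queue.Queue` stands in for an SQS queue, and `generate_report` is a hypothetical placeholder for the real job.

```python
import queue
import threading


def generate_report(customer_id: int) -> str:
    """Hypothetical placeholder for the real nightly report job."""
    return f"report-for-customer-{customer_id}"


def worker(tasks: queue.Queue, results: list, lock: threading.Lock) -> None:
    """Pull tasks until the queue is empty; completion order is irrelevant."""
    while True:
        try:
            customer_id = tasks.get_nowait()
        except queue.Empty:
            return
        report = generate_report(customer_id)
        with lock:
            results.append(report)


def run_nightly_reports(customer_ids, num_workers: int = 4) -> list:
    """Enqueue one task per customer, then let the workers drain the queue."""
    tasks: queue.Queue = queue.Queue()
    for customer_id in customer_ids:
        tasks.put(customer_id)
    results: list = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, lock))
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


reports = run_nightly_reports(range(1, 5001))
print(len(reports))
```

Because no task depends on another, adding workers is the only scaling decision, which is why a plain queue fits this workload well.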

 

When it comes to streaming logs from an application service, Kafka's streaming semantics make it the better fit. An SQS standard queue could be used, but it may deliver duplicate messages or deliver them out of order. A FIFO queue would address these issues, but it would increase costs and be limited to a maximum throughput of 300 TPS per API action without batching.
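If you did use a standard queue for this, the consumer would have to compensate for duplicates and reordering itself. A rough sketch of that extra work follows; it assumes the producer attaches a sequence number to every log line, and the `process_log_stream` helper and its tuple format are hypothetical.

```python
def process_log_stream(deliveries):
    """Return log lines in sequence order, dropping duplicate deliveries.

    Each delivery is a (sequence_number, line) tuple. The sequence number is
    an assumption: the producer would have to attach it to every message.
    """
    seen = {}
    for sequence_number, line in deliveries:
        # Keep the first copy of each sequence number; later copies are duplicates.
        seen.setdefault(sequence_number, line)
    return [seen[number] for number in sorted(seen)]


# One duplicate ("db connected") and one out-of-order delivery.
deliveries = [
    (2, "db connected"),
    (1, "service starting"),
    (2, "db connected"),
    (3, "ready"),
]
print(process_log_stream(deliveries))
# ['service starting', 'db connected', 'ready']
```

This version works on a finished batch. A continuously running consumer would also need to decide how long to buffer while waiting for late messages, which is part of why a log-oriented platform is the more natural choice here.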

How Much Do SQS and Kafka Cost?

The next question to ask is how much monetary and human capital your organization is willing, or is able, to dedicate to the solution. As its name implies, SQS provides a relatively simple interface to create, manage, and administer queues. As a free-tier eligible service, the first 1,000,000 calls to SQS per month are free. The table below provides some clarity as to what your AWS spend might be in the us-east-1 region:

* A single FIFO queue cannot support more than 300 TPS. TPS values represent the total number of requests to SQS across all FIFO queues within the us-east-1 region.

** Costs obtained using AWS Pricing Calculator [14].

As you can imagine, the expenses associated with SQS can become quite significant as your application scales. While avoiding the need to purchase hardware, be prepared for potential cost increases during traffic spikes.
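For a rough feel of how request-based pricing grows with traffic, the arithmetic can be sketched as below. The per-million price is passed in as a parameter, and the value used in the example is a placeholder assumption rather than a quoted AWS rate; take real figures from the AWS Pricing Calculator [14]. The sketch also assumes a 30-day month and a flat rate, and it ignores volume tiers and data transfer.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month
FREE_REQUESTS_PER_MONTH = 1_000_000    # free-tier allowance mentioned above


def monthly_sqs_cost(tps: float, price_per_million: float) -> float:
    """Estimate monthly request charges (USD) for a sustained request rate."""
    requests = tps * SECONDS_PER_MONTH
    billable = max(0.0, requests - FREE_REQUESTS_PER_MONTH)
    return billable / 1_000_000 * price_per_million


# Placeholder price in USD per million requests (an assumption, not a quote).
ASSUMED_PRICE = 0.40

for tps in (1, 100, 1000):
    print(tps, round(monthly_sqs_cost(tps, ASSUMED_PRICE), 2))
```

The point is the shape rather than the exact numbers: cost rises linearly with sustained TPS once the free tier is used up.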

On the other hand, Apache Kafka can be scaled by adding more nodes. Kafka scaling is a multi-dimensional topic, but in many cases bandwidth (and therefore message size) becomes a limiting factor. To give a broad idea of what Kafka might cost, the table below represents a monthly cost matrix for Amazon Managed Streaming for Apache Kafka (Amazon MSK) in the us-east-1 region, with no data ingress or egress:

* Costs obtained using AWS Pricing Calculator [14].

When comparing these two cost models, it’s crucial to understand the scale required by your application. If two applications need to elect a leader for a job that might last a couple of hours every night, SQS can perform that task at a low, possibly free, price point. SQS is also relatively inexpensive for workloads below 1,000 TPS. However, as scale increases, it may be more practical to invest in real or virtual hardware, opting for upfront costs over ongoing usage expenses.

Additionally, it’s worth noting that while SQS does not use a publish-subscribe model, a queue can subscribe to an Amazon Simple Notification Service (SNS) topic to achieve the same effect. From a cost perspective, fanning out from SNS to SQS effectively multiplies TPS by the number of subscribing queues, potentially resulting in substantial AWS bills. Therefore, if your application requires a pub-sub model at high scale, Apache Kafka is likely to be less expensive than SQS.
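The multiplication is simple, but it is easy to underestimate. A small sketch, with a hypothetical `fanout_tps` helper and made-up traffic numbers:

```python
def fanout_tps(publish_tps: float, subscriber_queues: int) -> float:
    """Messages per second landing in SQS when every queue gets its own copy."""
    if subscriber_queues < 0:
        raise ValueError("subscriber_queues must not be negative")
    return publish_tps * subscriber_queues


# A 200 TPS publisher (illustrative figure) fanned out to 1, 5, and 20 queues.
for queues in (1, 5, 20):
    print(queues, fanout_tps(200, queues))
```

Each subscribing queue is billed for its own copy of the traffic. In Kafka, by contrast, multiple consumer groups read the same topic without the messages being written again for each group.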

What About Data Compliance?

Depending on your application’s data, it may be subject to certain governmental or industry-specific regulations. These regulations may dictate where your data can be physically stored, and how the data is encrypted both at rest and during transit. SQS is a cloud-only service, and you must store your messages within the AWS cloud. As mentioned previously, Kafka can be flexibly deployed in either the cloud or on-premises.

    

SQS and Kafka both natively support encrypting data in transit. For data at rest, SQS offers two server-side encryption (SSE) options: SSE-SQS, which uses SQS-managed keys, and SSE-KMS, which uses keys managed in AWS Key Management Service. SSE-SQS uses AES-256 encryption and is offered at no cost to AWS customers, while SSE-KMS incurs additional cost. Kafka does not support server-side encryption, meaning that software clients will be responsible for encrypting and decrypting message payloads.

The Best Choice? It Depends - But We Can Help.

As SQS and Kafka have significant use case overlap, it’s crucial to understand your business’s specific requirements. For rapid prototyping or situations where your application scale is uncertain, SQS offers a cost-effective and efficient way to process data asynchronously. However, if your application requires publishing data to multiple consumers, particularly at scale, Kafka may present a simpler and more economical option.

Finally, consider the human cost associated with managing your messaging layer. Do you have the resources and expertise to maintain your own cluster? Cloudflare's experience, highlighted in a postmortem, underscores the importance of this consideration: their analytics platform went down due to a power outage in the data center that hosted an Apache Kafka cluster. If your application demands a highly available architecture capable of withstanding power outages, you’ll need to establish a regionally distinct set of nodes elsewhere, or let AWS handle it for you with Amazon Managed Streaming for Apache Kafka or Amazon SQS.

Sources:

  1. Amazon Simple Queue Service Released. Jeff Barr. https://aws.amazon.com/blogs/aws/amazon_simple_q/
  2. Open-sourcing Kafka, LinkedIn’s distributed messaging queue. Jun Rao. https://www.linkedin.com/blog/member/archive/open-source-linkedin-kafka
  3. Powered By. https://kafka.apache.org/powered-by
  4. Amazon SQS queue types. https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-queue-types.html
  5. Resilience in Amazon SQS. https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-resilience.html
  6. Quotas related to messages. https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/quotas-messages.html
  7. Configuring server-side encryption (SSE) for a queue using SQS-managed encryption keys (console). https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-configure-sqs-sse-queue.html
  8. Configuring server-side encryption (SSE) for a queue (console). https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-configure-sse-existing-queue.html
  9. Identity and access management in Amazon SQS. https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-authentication-and-access-control.html
  10. Broker Configs. https://kafka.apache.org/documentation/#brokerconfigs
  11. Encryption and Authentication using SSL. https://kafka.apache.org/documentation/#security_ssl
  12. Authorization and ACLs. https://kafka.apache.org/documentation/#security_authz
  13. Apache License, Version 2.0. https://www.apache.org/licenses/LICENSE-2.0.html
  14. AWS Pricing Calculator. https://calculator.aws/#/
  15. Post Mortem on the Cloudflare Control Plane and Analytics Outage. https://blog.cloudflare.com/post-mortem-on-cloudflare-control-plane-and-analytics-outage

More by

Arthur Edmunds
