{"id":67521,"date":"2024-09-30T14:02:47","date_gmt":"2024-09-30T08:32:47","guid":{"rendered":"https:\/\/www.tothenew.com\/blog\/?p=67521"},"modified":"2024-10-01T15:03:32","modified_gmt":"2024-10-01T09:33:32","slug":"configuring-aws-lambda-as-a-kafka-producer-with-sasl_ssl-and-kerberos-gssapi-for-secure-communication","status":"publish","type":"post","link":"https:\/\/www.tothenew.com\/blog\/configuring-aws-lambda-as-a-kafka-producer-with-sasl_ssl-and-kerberos-gssapi-for-secure-communication\/","title":{"rendered":"Configuring AWS Lambda as a Kafka Producer with SASL_SSL and Kerberos\/GSSAPI for Secure Communication"},"content":{"rendered":"<p>Kafka is a distributed streaming platform designed for real-time <a href=\"https:\/\/www.tothenew.com\/data-analytics\">data<\/a> pipelines, stream processing, and data integration. <a href=\"https:\/\/www.tothenew.com\/aws-data-analytics\">AWS<\/a> Lambda, on the other hand, is a serverless compute service that executes your code in response to events, managing the underlying compute resources for you. In organizations where Kafka plays a central role in streaming and data integration, implementing a serverless, event-driven solution that processes files and seamlessly produces records to Kafka is an efficient and scalable approach.<\/p>\n<p>In this blog, we\u2019ll walk through how to configure an AWS Lambda function as a Kafka producer, reading files from Amazon S3 and sending data to Kafka in event-driven batch mode. We\u2019ll use the confluent-kafka-python library with SASL_SSL and GSSAPI configurations for secure communication. Since pre-built Linux wheels of confluent-kafka-python do not support SASL Kerberos\/GSSAPI, we must install librdkafka and its dependencies separately and build confluent-kafka-python from source. 
We will containerize the Lambda function along with the confluent-kafka build, deploy it via AWS ECR (Elastic Container Registry), and integrate it with AWS Secrets Manager to supply Kafka credentials and certificates.<\/p>\n<h4>Read More: <a href=\"https:\/\/www.tothenew.com\/blog\/efficient-data-migration-from-mongodb-to-s3-using-pyspark\/\">Efficient Data Migration from MongoDB to S3 using PySpark<\/a><\/h4>\n<h3>Architecture<\/h3>\n<p>The architecture below demonstrates a data pipeline that utilizes various AWS components to show how a containerized Lambda can be used to produce records for Kafka.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-67600 size-full\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-11.33.26-PM.png\" alt=\"Architecture\" width=\"2074\" height=\"1188\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-11.33.26-PM.png 2074w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-11.33.26-PM-300x172.png 300w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-11.33.26-PM-1024x587.png 1024w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-11.33.26-PM-768x440.png 768w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-11.33.26-PM-1536x880.png 1536w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-11.33.26-PM-2048x1173.png 2048w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-11.33.26-PM-624x357.png 624w\" sizes=\"(max-width: 2074px) 100vw, 2074px\" \/><\/p>\n<p>&nbsp;<\/p>\n<h3>Steps to Configure AWS Lambda as a Kafka Producer<\/h3>\n<h4>1. Create the Python-based Lambda code for the producer<\/h4>\n<p>First, we need to write the Python code for our Lambda producer, which we will later package as a Docker container. 
We will use the confluent-kafka-python library for this.<\/p>\n<ul>\n<li><strong>Define functions to retrieve environment variables and secret values from AWS Secrets Manager<\/strong><\/li>\n<\/ul>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-67509 size-full\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-5.52.57-PM.png\" alt=\"Code snippet for defining Environment variable\" width=\"1280\" height=\"640\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-5.52.57-PM.png 1280w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-5.52.57-PM-300x150.png 300w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-5.52.57-PM-1024x512.png 1024w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-5.52.57-PM-768x384.png 768w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-5.52.57-PM-624x312.png 624w\" sizes=\"(max-width: 1280px) 100vw, 1280px\" \/><\/p>\n<p>&nbsp;<\/p>\n<ul>\n<li><strong>Define producer configuration<\/strong><\/li>\n<\/ul>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-67510 size-full\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-5.59.01-PM.png\" alt=\"Defining Producer Config\" width=\"1560\" height=\"828\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-5.59.01-PM.png 1560w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-5.59.01-PM-300x159.png 300w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-5.59.01-PM-1024x544.png 1024w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-5.59.01-PM-768x408.png 768w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-5.59.01-PM-1536x815.png 1536w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-5.59.01-PM-624x331.png 624w\" sizes=\"(max-width: 1560px) 100vw, 1560px\" 
\/><\/p>\n<p>&nbsp;<\/p>\n<ul>\n<li><strong>Define lambda handler<\/strong><\/li>\n<\/ul>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-67513 size-full\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-6.03.15-PM.png\" alt=\"Lambda handler\" width=\"1582\" height=\"1320\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-6.03.15-PM.png 1582w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-6.03.15-PM-300x250.png 300w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-6.03.15-PM-1024x854.png 1024w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-6.03.15-PM-768x641.png 768w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-6.03.15-PM-1536x1282.png 1536w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-6.03.15-PM-624x521.png 624w\" sizes=\"(max-width: 1582px) 100vw, 1582px\" \/><\/p>\n<p>&nbsp;<\/p>\n<h4>Read More: <a href=\"https:\/\/www.tothenew.com\/blog\/driving-efficiency-and-cost-reduction-kafka-migration-to-aws-msk-for-a-leading-advertising-firm\/\">Driving Efficiency and Cost Reduction: Kafka Migration to AWS MSK for a Leading Advertising Firm<\/a><\/h4>\n<h4>2. 
Create a Dockerfile<\/h4>\n<p>Here\u2019s a simple Dockerfile to containerize the Lambda function with the necessary dependencies including librdkafka and confluent-kafka-python, enabling SASL_SSL communication using Kerberos\/GSSAPI.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-67617\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-25-at-12.42.25-AM.png\" alt=\"Dockerfile\" width=\"1578\" height=\"1596\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-25-at-12.42.25-AM.png 1578w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-25-at-12.42.25-AM-297x300.png 297w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-25-at-12.42.25-AM-1012x1024.png 1012w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-25-at-12.42.25-AM-768x777.png 768w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-25-at-12.42.25-AM-1519x1536.png 1519w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-25-at-12.42.25-AM-624x631.png 624w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-25-at-12.42.25-AM-120x120.png 120w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-25-at-12.42.25-AM-24x24.png 24w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-25-at-12.42.25-AM-48x48.png 48w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-25-at-12.42.25-AM-96x96.png 96w\" sizes=\"(max-width: 1578px) 100vw, 1578px\" \/><\/p>\n<h4>3. 
Build and Push Docker Image to ECR<\/h4>\n<p>Once the Dockerfile is ready, we can build and push the image to AWS ECR.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-67589 size-full\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-11.01.20-PM.png\" alt=\"Build Image\" width=\"1950\" height=\"358\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-11.01.20-PM.png 1950w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-11.01.20-PM-300x55.png 300w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-11.01.20-PM-1024x188.png 1024w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-11.01.20-PM-768x141.png 768w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-11.01.20-PM-1536x282.png 1536w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-11.01.20-PM-624x115.png 624w\" sizes=\"(max-width: 1950px) 100vw, 1950px\" \/><\/p>\n<h4>4. Create Lambda Function from ECR Image<\/h4>\n<p>In the AWS Management Console, create a new Lambda function and choose the Container image option. 
Select the image from your ECR repository.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-67587 size-full\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-1.33.57-AM.png\" alt=\"Container Lambda\" width=\"1812\" height=\"1176\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-1.33.57-AM.png 1812w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-1.33.57-AM-300x195.png 300w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-1.33.57-AM-1024x665.png 1024w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-1.33.57-AM-768x498.png 768w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-1.33.57-AM-1536x997.png 1536w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-1.33.57-AM-624x405.png 624w\" sizes=\"(max-width: 1812px) 100vw, 1812px\" \/><\/p>\n<p>&nbsp;<\/p>\n<h4>5. Configure Kafka Secrets in AWS Secrets Manager<\/h4>\n<p>Store the Kafka credentials, including the SSL certificate and Kerberos keytab, in AWS Secrets Manager so that the Lambda function can retrieve them securely at runtime.<\/p>\n<h4>6. Set Environment Variables<\/h4>\n<p>Set the environment variables that the Lambda function needs to locate the secrets and read its other settings.<\/p>\n<h4>7. Provide IAM Permissions<\/h4>\n<p>Provide the necessary IAM permissions for:<\/p>\n<ul>\n<li>Accessing S3 from AWS Lambda.<\/li>\n<li>Accessing ECR from AWS Lambda.<\/li>\n<li>Accessing Secrets Manager from AWS Lambda.<\/li>\n<\/ul>\n<h4>8. 
Testing the Lambda Function<\/h4>\n<p>To test the Lambda function locally before deploying and running it on AWS:<\/p>\n<ul>\n<li>Run the Docker image for the Lambda producer and provide all necessary environment variables.<img decoding=\"async\" loading=\"lazy\" class=\"wp-image-67614 size-full\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-25-at-12.27.26-AM.png\" alt=\"lambda container testing\" width=\"1932\" height=\"242\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-25-at-12.27.26-AM.png 1932w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-25-at-12.27.26-AM-300x38.png 300w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-25-at-12.27.26-AM-1024x128.png 1024w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-25-at-12.27.26-AM-768x96.png 768w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-25-at-12.27.26-AM-1536x192.png 1536w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-25-at-12.27.26-AM-624x78.png 624w\" sizes=\"(max-width: 1932px) 100vw, 1932px\" \/><\/li>\n<li>Send a sample event using the curl command in another terminal window. 
The CSV file named in the event must exist at the S3 location.<img decoding=\"async\" loading=\"lazy\" class=\"wp-image-67592 size-full\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-11.06.02-PM.png\" alt=\"curl sample event\" width=\"1338\" height=\"1284\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-11.06.02-PM.png 1338w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-11.06.02-PM-300x288.png 300w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-11.06.02-PM-1024x983.png 1024w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-11.06.02-PM-768x737.png 768w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-11.06.02-PM-624x599.png 624w, \/blog\/wp-ttn-blog\/uploads\/2024\/09\/Screenshot-2024-09-24-at-11.06.02-PM-24x24.png 24w\" sizes=\"(max-width: 1338px) 100vw, 1338px\" \/><\/li>\n<li>The Lambda container running on the local machine should be triggered and start processing.<\/li>\n<li>The Lambda function should read all CSV records, transform them according to its logic, and produce them to the Kafka topic.<\/li>\n<li>The data should then be available in the Kafka topic.<\/li>\n<li>Once all scenarios are tested, you can deploy your image to AWS and start using Lambda as a Kafka producer.<\/li>\n<\/ul>\n<h3>Conclusion<\/h3>\n<p>Containerizing AWS Lambda allows us to efficiently manage complex dependencies and extend its capabilities with external libraries like confluent-kafka-python and librdkafka. This provides a secure way to integrate AWS Lambda with Kafka as a producer using SASL_SSL and GSSAPI. Additionally, it ensures a consistent, stable environment across Dev, Test, and Prod stages. 
This architecture offers a scalable, secure solution for building robust data pipelines that seamlessly integrate with S3 and Kafka.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Kafka is a distributed streaming platform designed for real-time data pipelines, stream processing, and data integration. AWS Lambda, on the other hand, is a serverless compute service that executes your code in response to events, managing the underlying compute resources for you. In organizations where Kafka plays a central role in streaming and data integration, [&hellip;]<\/p>\n","protected":false},"author":1953,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":238},"categories":[6194],"tags":[248,1197,5388,1604,1545],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/67521"}],"collection":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/users\/1953"}],"replies":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/comments?post=67521"}],"version-history":[{"count":20,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/67521\/revisions"}],"predecessor-version":[{"id":68071,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/67521\/revisions\/68071"}],"wp:attachment":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/media?parent=67521"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/categories?post=67521"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/tags?post=67521"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}