Handling Data Loss in AWS Lambda

AWS Lambda, the fancy serverless environment, has become really handy of late. From reducing costs to scaling, serverless has grabbed everyone’s attention. But without the right configuration, data loss can become a big problem in these environments.

But why do we even care about data loss here?

Let’s consider this scenario: you are using Lambda to send data to, let’s say, Splunk. There can be hiccups in the network between Lambda and Splunk at any given time, and whatever data was being sent to Splunk at that moment is lost.

Losing data? Oh no, there’s your nightmare 😜
Let’s just thank AWS for having a way to back up this data, which the rest of this article walks through.
We have a saviour, YES!! 😇

So, how is this done?

The asynchronous invocation configuration in Lambda controls what happens to events when Lambdas are invoked asynchronously, that is, without the caller waiting for a response. There are three inputs that you need to know to use this feature:
Maximum age of event – This is the amount of time an unprocessed event is kept in the queue (between 1 minute and 6 hours).
Retry attempts – The number of times Lambda retries an event when the function fails. Must be 0-2.
Dead letter queue service – Events that still could not be processed can be sent to an SNS topic or an SQS queue through this config.
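
As a rough sketch, the three inputs above map onto two boto3 Lambda client calls: put_function_event_invoke_config for the maximum event age and retry attempts, and update_function_configuration for the dead letter queue. The helper names (build_async_settings, configure_async_invoke), the function name and the queue ARN are made up for illustration, and the client is passed in as a parameter so the sketch is not tied to any particular account.

```python
def build_async_settings(max_event_age_seconds, retry_attempts, dlq_arn):
    """Validate the three inputs and return the keyword arguments for each API call."""
    if not 60 <= max_event_age_seconds <= 21600:
        raise ValueError("maximum event age must be between 60 and 21600 seconds")
    if retry_attempts not in (0, 1, 2):
        raise ValueError("retry attempts must be 0, 1 or 2")
    if ":sqs:" not in dlq_arn and ":sns:" not in dlq_arn:
        raise ValueError("dead letter queue target must be an SQS or SNS ARN")
    return {
        "invoke_config": {
            "MaximumEventAgeInSeconds": max_event_age_seconds,
            "MaximumRetryAttempts": retry_attempts,
        },
        "dead_letter_config": {"TargetArn": dlq_arn},
    }


def configure_async_invoke(lambda_client, function_name,
                           max_event_age_seconds, retry_attempts, dlq_arn):
    """Apply the settings using a boto3 Lambda client (or anything shaped like one)."""
    settings = build_async_settings(max_event_age_seconds, retry_attempts, dlq_arn)
    # Maximum age of event and retry attempts live in the event invoke config.
    lambda_client.put_function_event_invoke_config(
        FunctionName=function_name, **settings["invoke_config"]
    )
    # The dead letter queue is part of the function configuration itself.
    lambda_client.update_function_configuration(
        FunctionName=function_name,
        DeadLetterConfig=settings["dead_letter_config"],
    )
    return settings
```

With boto3 installed and credentials in place, this could be called as configure_async_invoke(boto3.client("lambda"), "splunk-forwarder", 300, 1, "arn:aws:sqs:us-east-1:123456789012:splunk-dlq"), where the function name and ARN are placeholders. The function’s execution role also needs permission to send messages to the queue.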

Demonstration with an example

Let’s consider the following inputs –
Maximum age of event – 5 mins
Retry attempts – 1
Dead letter queue service – SQS
This means that whenever there is an error in the Lambda, the event is retried once. If the retry fails too, or if the event has sat in the queue for more than 5 minutes without being processed, it is sent to SQS.
All the data which you thought you would lose when the connection between Lambda and Splunk is interrupted is now in SQS.
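
One detail worth spelling out: the retry and the dead letter queue only come into play when the invocation actually fails, so the handler has to let the error escape instead of catching and logging it. Here is a minimal sketch, assuming the data is sent to Splunk’s HTTP Event Collector. The environment variable names SPLUNK_HEC_URL and SPLUNK_HEC_TOKEN and the extra send parameter are assumptions for illustration, not part of the original setup.

```python
import json
import os
import urllib.request


def send_to_splunk(event, timeout=5):
    """POST one event to Splunk's HTTP Event Collector (URL and token are assumed env vars)."""
    request = urllib.request.Request(
        os.environ["SPLUNK_HEC_URL"],  # e.g. https://<splunk-host>:8088/services/collector/event
        data=json.dumps({"event": event}).encode("utf-8"),
        headers={"Authorization": "Splunk " + os.environ["SPLUNK_HEC_TOKEN"]},
        method="POST",
    )
    # urlopen raises URLError/HTTPError on network hiccups or error responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def lambda_handler(event, context, send=send_to_splunk):
    # No try/except on purpose: if sending fails, the exception propagates,
    # the invocation counts as failed, and Lambda can retry it and then
    # hand the event over to the dead letter queue.
    status = send(event)
    return {"forwarded": True, "status": status}
```

The send parameter is only there so the handler can be exercised without a real Splunk endpoint; Lambda itself calls the handler with just the event and the context.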

Bonus

Now that you have all the lost data in SQS, create a Lambda that runs every 5 or 10 minutes (it’s your choice) and moves the data to S3. Then add one more Lambda, run manually, that transfers the data from S3 to Splunk once the connection from Lambda to Splunk is re-established.
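
The SQS-to-S3 step could look roughly like the sketch below. It uses the boto3 calls receive_message, put_object and delete_message_batch; the function name drain_queue_to_s3, the key prefix and the batch limit are assumptions for illustration, and the clients are passed in as parameters rather than created inside.

```python
import json
import time


def drain_queue_to_s3(sqs, s3, queue_url, bucket, prefix="lambda-dlq/", max_batches=50):
    """Move messages from the dead letter queue to S3, one S3 object per batch."""
    moved = 0
    for _ in range(max_batches):
        response = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
            WaitTimeSeconds=1,
        )
        messages = response.get("Messages", [])
        if not messages:
            break  # queue looks empty, stop early

        # Timestamp plus first message ID keeps keys distinct across batches.
        key = f"{prefix}{int(time.time() * 1000)}-{messages[0]['MessageId']}.json"
        body = json.dumps([m["Body"] for m in messages])
        s3.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))

        # Delete only after the S3 write succeeded, so a failure here
        # leaves the messages in the queue for the next run.
        sqs.delete_message_batch(
            QueueUrl=queue_url,
            Entries=[
                {"Id": str(i), "ReceiptHandle": m["ReceiptHandle"]}
                for i, m in enumerate(messages)
            ],
        )
        moved += len(messages)
    return moved
```

Inside the scheduled Lambda, the handler would call this with boto3.client("sqs") and boto3.client("s3"), and an EventBridge schedule such as rate(5 minutes) would trigger it. A fuller version would also check the delete response for entries that failed to delete.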

Figure 1. The Whole Flow

There are numerous ways you can do this. For example, you could use Kinesis for the flow above, and then you wouldn’t need all these custom Lambdas. But think about the cost there. The process you adopt depends on the scenario you are trying to solve.