Implementing retries for messages in Azure storage queues

A common scenario when processing messages from a queue is retrying a message that could not be processed. For Azure storage queues this happens by default when you consume the queue through a trigger such as the one in the Azure WebJobs SDK or Azure Functions. However, this default mechanism is not always very useful: it retries a message 5 times in quick succession (i.e. without any delay, within a matter of seconds) and then puts it on the poison queue.

I can imagine many scenarios where this is not what we want. For example, assume that you are building an integration layer between different applications. The purpose of this integration layer is to reliably transfer data from one application to another. Queuing is an essential part of this kind of architecture: if the receiving application is unavailable for some reason, the sender can still send, and the data is transferred to the receiving application once it becomes available again. Five retries within a few seconds do not help here, because the receiver is unlikely to recover that quickly.

To achieve this goal, it is necessary to implement a retry strategy with some kind of fixed or increasing delay between subsequent retries of a message. One of the most common such strategies is exponential backoff, where the delay between retry attempts increases exponentially.
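To make the difference concrete, here is a minimal sketch that computes the delay for a fixed strategy and for an exponential one. The `BackoffDemo` class and its method names are purely illustrative assumptions and not part of any Azure API.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class BackoffDemo
{
    // Fixed strategy: every retry waits the same amount of time.
    public static TimeSpan FixedDelay(TimeSpan delay, int attempt) => delay;

    // Exponential strategy: the wait doubles with each attempt (attempt is 1-based).
    // There is no overflow guard here, so keep the attempt count small.
    public static TimeSpan ExponentialDelay(TimeSpan baseDelay, int attempt)
    {
        if (attempt < 1) throw new ArgumentOutOfRangeException(nameof(attempt));
        return TimeSpan.FromTicks(baseDelay.Ticks * (1L << (attempt - 1)));
    }
}
```

With a one-second base delay, the exponential strategy waits 1, 2, 4, 8 and 16 seconds for attempts 1 to 5, whereas the fixed strategy waits one second every time.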

This is very simple to implement with Azure storage queues. As mentioned in my previous post, Azure storage queues work with a leasing mechanism: when an application pulls a message from the queue, it is given a lease on the message for a specified amount of time (called the visibility timeout), and once that duration expires the message reappears on the queue to be processed again.

A simple strategy, in this case, is to pick up the message and give it a visibility timeout that increases exponentially with each attempt. Let’s look at an example.

Example

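using System;
using System.Threading.Tasks;
// Assumes the legacy WindowsAzure.Storage NuGet package.
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;

namespace StorageAccountDemo.Queue.Receiver.Retry
{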
    public interface IRetryIntervalGenerator
    {
        TimeSpan GetNext(int dequeueCount);
    }

    public class ExponentialRetryIntervalGenerator : IRetryIntervalGenerator
    {
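        public TimeSpan GetNext(int dequeueCount)
        {
            // DequeueCount is 1 on the first delivery, so this yields 1, 4, 9, 16... seconds.
            // Strictly speaking this is quadratic growth; Math.Pow(2, dequeueCount) would be truly exponential.
            return TimeSpan.FromSeconds(dequeueCount * dequeueCount);
        }
    }

    public class AzureStorageQueueReceiverWithRetry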
    {
        private readonly IRetryIntervalGenerator _retryIntervalGenerator;

        public AzureStorageQueueReceiverWithRetry(IRetryIntervalGenerator retryIntervalGenerator)
        {
            this._retryIntervalGenerator = retryIntervalGenerator;
        }
        public async Task ReceiveMessagesAsync()
        {
            var storageAccount = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
            var queueClient = storageAccount.CreateCloudQueueClient();
            var queueRef = queueClient.GetQueueReference("filequeue");
            await queueRef.CreateIfNotExistsAsync();

            // Dequeue a single message (the default visibility timeout is 30 seconds).
            var message = await queueRef.GetMessageAsync();
            if (message != null)
            {
                // Push the visibility timeout out so that, if processing fails, the message
                // only reappears after the delay chosen for this attempt.
                await queueRef.UpdateMessageAsync(message, _retryIntervalGenerator.GetNext(message.DequeueCount),
                    MessageUpdateFields.Visibility);
                Console.WriteLine($"Received message : {message.AsString}");

                if (message.DequeueCount >= 4)
                {
                    Console.WriteLine("Could not process the message after 4 attempts. Sending message to poison queue");
                    //TODO - Send message to poison queue and delete the message
                    //Delete the message so that it does not reappear on the queue
                    await queueRef.DeleteMessageAsync(message);
                }
            }
            else
            {
                Console.WriteLine("No message(s) on the queue");
            }

            Console.ReadLine();
        }
    }
}

In the above code, we inject a retry interval generator and use it when calling UpdateMessageAsync. When a message is received, its visibility timeout is determined by the retry interval generator based on which attempt this is. The generator here is very simple: I am just multiplying the dequeue count by itself, which produces delays of 1, 4, 9, ... seconds. Strictly speaking, these grow quadratically rather than exponentially, but the idea is the same: each retry waits longer than the previous one. This is repeated a given number of times, after which the message is moved manually to the poison queue (the equivalent of a dead-letter queue, if you are familiar with that terminology).
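If you want delays that truly double on every attempt, a generator along the following lines could be swapped in. This is only a sketch: the class name `CappedExponentialRetryIntervalGenerator` and its constructor parameters are my own assumptions, and the interface is repeated so that the snippet compiles on its own. A cap is worth having because Azure Storage does not accept a visibility timeout longer than 7 days.

```csharp
using System;

// Repeated from the example above so that this snippet is self-contained.
public interface IRetryIntervalGenerator
{
    TimeSpan GetNext(int dequeueCount);
}

// Hypothetical generator: delay = baseDelay * 2^(dequeueCount - 1), limited to maxDelay.
public class CappedExponentialRetryIntervalGenerator : IRetryIntervalGenerator
{
    private readonly TimeSpan _baseDelay;
    private readonly TimeSpan _maxDelay;

    public CappedExponentialRetryIntervalGenerator(TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (baseDelay <= TimeSpan.Zero) throw new ArgumentOutOfRangeException(nameof(baseDelay));
        if (maxDelay < baseDelay) throw new ArgumentOutOfRangeException(nameof(maxDelay));
        _baseDelay = baseDelay;
        _maxDelay = maxDelay;
    }

    public TimeSpan GetNext(int dequeueCount)
    {
        // DequeueCount is 1 on the first delivery; treat anything lower as the first attempt.
        var exponent = Math.Max(dequeueCount, 1) - 1;

        // Work in floating point so that large exponents saturate at the cap instead of overflowing.
        var seconds = _baseDelay.TotalSeconds * Math.Pow(2, exponent);
        return seconds >= _maxDelay.TotalSeconds ? _maxDelay : TimeSpan.FromSeconds(seconds);
    }
}
```

Passing `new CappedExponentialRetryIntervalGenerator(TimeSpan.FromSeconds(2), TimeSpan.FromHours(1))` to `AzureStorageQueueReceiverWithRetry` would give delays of 2, 4, 8 and 16 seconds for the first four attempts, and never more than one hour however often the message is dequeued.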

I hope this post gives you some ideas for making your queuing infrastructure resilient. If you are implementing or designing Azure solutions that use storage queues, I would highly recommend these two courses (course one and two), which go beyond the usual Azure material.
