Marcin Drobik

software journeyman notes

Uploading files to Azure Blob through Owin-Katana

Intro

While implementing file uploads for my Katana-based, Azure-hosted blog, I learned that Katana has no built-in support for the multipart/form-data content type. In this format, the parts of the request body are separated by a boundary string that is specified in the Content-Type header.
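
To make the boundary idea concrete, here is a minimal sketch of pulling the boundary out of the Content-Type header. The class and method names (MultipartBoundary, GetBoundary) are made up for this illustration and are not part of Katana or of my implementation. It also assumes the boundary value contains no semicolons.

```csharp
using System;

public static class MultipartBoundary
{
    // Hypothetical helper: extracts the boundary parameter from a header value such as
    //   multipart/form-data; boundary=----WebKitFormBoundaryAbC123
    public static string GetBoundary(string contentType)
    {
        foreach (var part in contentType.Split(';'))
        {
            var trimmed = part.Trim();
            if (trimmed.StartsWith("boundary=", StringComparison.OrdinalIgnoreCase))
            {
                // The value may be wrapped in quotes, so strip them if present.
                return trimmed.Substring("boundary=".Length).Trim('"');
            }
        }
        throw new FormatException("Content-Type has no boundary parameter.");
    }
}
```

In the request body, each part is then introduced by a line consisting of two hyphens followed by that boundary, and the last part is closed by the same line with two more hyphens appended.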

multipart/form-data implementation

I decided to implement the multipart/form-data handling myself. I looked at Nancy and at Stack Overflow examples for guidance and noticed that all of those solutions read the entire stream to find the boundaries and then pass the data around. I wanted something different: a stream that I could pass around and that ends as soon as the boundary is encountered while reading it.

Why? This way I can pass the wrapped request stream directly to Azure Blob Storage, which then reads it. Because the stream ends when the boundary is found, I only keep a small buffer in memory for the boundary search, no matter how big the file is.

PatternLimitedStream

The biggest problem in implementing multipart/form-data was finding a pattern in a stream of bytes. I used this opportunity to write my own generic Boyer-Moore implementation. With that in place, it was simple to create a custom Stream.
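
For readers who have not seen this family of algorithms, below is a rough sketch of byte-pattern search. Note that it is the simpler Boyer-Moore-Horspool variant (bad-character shifts only), not the generic Boyer-Moore code from my repository, and the names BytePatternSearch and IndexOf are chosen just for this example.

```csharp
using System;

public static class BytePatternSearch
{
    // Returns the index of the first occurrence of pattern in data[0..count), or -1.
    public static int IndexOf(byte[] data, int count, byte[] pattern)
    {
        if (pattern.Length == 0) return 0;

        // Bad-character table: how far the window may move, keyed by the byte
        // currently aligned with the last position of the pattern.
        var shift = new int[256];
        for (int i = 0; i < shift.Length; i++) shift[i] = pattern.Length;
        for (int i = 0; i < pattern.Length - 1; i++)
            shift[pattern[i]] = pattern.Length - 1 - i;

        int pos = 0;
        while (pos <= count - pattern.Length)
        {
            // Compare the window against the pattern from right to left.
            int j = pattern.Length - 1;
            while (j >= 0 && data[pos + j] == pattern[j]) j--;
            if (j < 0) return pos;

            pos += shift[data[pos + pattern.Length - 1]];
        }
        return -1;
    }
}
```

The point of the shift table is that most mismatches let the search skip several bytes at once instead of advancing one byte at a time.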

The stream, which I called PatternLimitedStream, is based on the Decorator design pattern. It's available on GitHub.

All of the interesting work happens in the Read override. It actually reads more bytes than requested and stores them in an internal search buffer, which is then searched for the pattern. If the pattern is found at some position, Read returns only the data up to that position. If not, it returns the requested data and leaves the extra bytes in the buffer for the next read.
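
The read-ahead logic described above can be sketched roughly as follows. This is a simplified illustration, not the code from GitHub: the class name PatternLimitedStreamSketch, the 4096-byte buffer size and the naive byte-by-byte search (used here instead of Boyer-Moore to keep it short) are all assumptions of this example.

```csharp
using System;
using System.IO;

public class PatternLimitedStreamSketch : Stream
{
    private readonly Stream inner;
    private readonly byte[] pattern;
    private readonly byte[] search;  // look-ahead buffer (size is an arbitrary choice)
    private int buffered;            // number of valid bytes in the look-ahead buffer
    private bool innerEnded;

    public PatternLimitedStreamSketch(Stream inner, byte[] pattern)
    {
        if (pattern.Length == 0) throw new ArgumentException("Pattern must not be empty.");
        this.inner = inner;
        this.pattern = pattern;
        search = new byte[Math.Max(4096, pattern.Length * 2)];
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        // Read ahead until the look-ahead buffer is full or the inner stream ends.
        while (!innerEnded && buffered < search.Length)
        {
            int n = inner.Read(search, buffered, search.Length - buffered);
            if (n == 0) innerEnded = true; else buffered += n;
        }

        int found = IndexOfPattern();
        int available =
            found >= 0 ? found                 // hand out only the bytes before the pattern
            : innerEnded ? buffered            // no pattern will ever come, hand out the rest
            : buffered - (pattern.Length - 1); // keep a tail that may start a pattern

        int toCopy = Math.Min(count, available);
        if (toCopy <= 0) return 0;             // 0 signals end of stream to the caller

        Array.Copy(search, 0, buffer, offset, toCopy);
        Array.Copy(search, toCopy, search, 0, buffered - toCopy); // keep the leftovers
        buffered -= toCopy;
        return toCopy;
    }

    // Naive search over the buffered bytes; returns -1 when the pattern is absent.
    private int IndexOfPattern()
    {
        for (int i = 0; i <= buffered - pattern.Length; i++)
        {
            int j = 0;
            while (j < pattern.Length && search[i + j] == pattern[j]) j++;
            if (j == pattern.Length) return i;
        }
        return -1;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

Wrapping a stream that contains "file data--boundary trailing" with the pattern "--boundary" and copying it out yields only "file data". The consumer never sees the boundary and simply observes a normal end of stream.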

MultiPartFormDataStream

At this point I could use the PatternLimitedStream to implement the multipart/form-data protocol itself. I chose the same strategy - the Decorator Pattern.

MultiPartFormDataStream adds a SeekNextFile method, which finds the file boundary, reads the part headers (Content-Type and Content-Disposition) and places the stream position just before the first byte of the file data.

Internally it uses PatternLimitedStream, so after a call to SeekNextFile, Read stops right after the last byte of the file. When you call SeekNextFile again, it moves to the beginning of the next file (or returns false if there are none).
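
As an illustration of the header-reading step, here is a small sketch that extracts the form name and file name from a Content-Disposition line. It is not the parsing code used inside MultiPartFormDataStream; the names PartHeaderSketch and ParseContentDisposition are invented for this example, and it assumes that the quoted values contain no semicolons.

```csharp
using System;
using System.Collections.Generic;

public static class PartHeaderSketch
{
    // Parses a line such as:
    //   Content-Disposition: form-data; name="upload"; filename="photo.jpg"
    // into its key/value parameters (here: name and filename).
    public static Dictionary<string, string> ParseContentDisposition(string headerLine)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        // Everything after the first colon is the header value.
        int colon = headerLine.IndexOf(':');
        var segments = headerLine.Substring(colon + 1).Split(';');

        foreach (var segment in segments)
        {
            int eq = segment.IndexOf('=');
            if (eq < 0) continue; // skips the bare "form-data" token

            var key = segment.Substring(0, eq).Trim();
            var value = segment.Substring(eq + 1).Trim().Trim('"');
            result[key] = value;
        }
        return result;
    }
}
```

For the example line in the comment, the resulting dictionary maps name to upload and filename to photo.jpg.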

MultiPartFormDataStreamReader illustrates how it can be used:

public static IEnumerable<File> GetFiles(MultiPartFormDataStream stream)
{
    while (stream.SeekNextFile())
    {
        yield return new File
        {
            Content = ReadFully(stream),
            FileName = stream.CurrentFileName,
            FormName = stream.CurrentFormName,
            ContentType = stream.CurrentContentType
        };
    }
}
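
ReadFully is not shown above. One plausible way to write such a helper, assuming all it has to do is drain the stream into a byte array, is sketched below; the containing class name StreamHelpers is made up for this example.

```csharp
using System.IO;

public static class StreamHelpers
{
    // Reads the stream until it reports its end and returns everything as one array.
    // With MultiPartFormDataStream, "the end" is the end of the current file.
    public static byte[] ReadFully(Stream input)
    {
        using (var memory = new MemoryStream())
        {
            input.CopyTo(memory);
            return memory.ToArray();
        }
    }
}
```

Keep in mind that a helper like this holds each file in memory in full, so it suits small uploads; for large files it is better to hand the stream itself to the consumer.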

Uploading files to Azure Blob Storage using Owin Context

At this point, we have everything we need to put incoming files into Azure Blob Storage.

var multiPartStream = new MultiPartFormDataStream(owinContext.Request.Body, owinContext.Request.ContentType);

var storageAccount = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("BlogStorage"));
var blogClient = storageAccount.CreateCloudBlobClient();
var blogFilesContainer = blogClient.GetContainerReference("blogfiles");

while (multiPartStream.SeekNextFile())
{
    var blob = blogFilesContainer.GetBlockBlobReference(Guid.NewGuid() + "-" + multiPartStream.CurrentFileName);
    blob.UploadFromStream(multiPartStream);
}

The important thing is that we never read the entire request body into memory - we just decorate it and pass it to the Azure SDK, which reads it and sends it to the Azure Storage server.
