Introduction

Recently I needed to solve the following problem at work:

We need to upload a lot of large files to a third-party storage provider, and it needs to be as efficient as possible.

This problem is more or less the same as dealing with large HTTP responses. A rule of thumb for I/O in .NET is to always think STREAM STREAM STREAM; let's see if it holds true here as well.

The third party provides an API endpoint that accepts binary data.

POST /files/{filename}

Implementations

Read file content into a string

Note: This one is here purely for educational purposes. Using ReadAllText is not a good idea, since it only works correctly on text files. It will not "just work" on binary files.
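
To make the note about binary files concrete, here is a minimal sketch (not part of the original code) that writes a few bytes to a temp file, reads them back with File.ReadAllText and re-encodes the string as UTF-8, which is roughly what StringContent does with it. The class name BinaryRoundTrip and the sample bytes are made up for illustration.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class BinaryRoundTrip
{
    // Writes the bytes to a temp file, reads them back as text and
    // re-encodes the text as UTF-8, mimicking ReadAllText + StringContent.
    public static byte[] ThroughString(byte[] original)
    {
        var path = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(path, original);

            // Without a byte order mark, ReadAllText decodes the file as UTF-8.
            // Bytes that are not valid UTF-8 are replaced during decoding.
            var text = File.ReadAllText(path);

            return Encoding.UTF8.GetBytes(text);
        }
        finally
        {
            File.Delete(path);
        }
    }

    // True when the bytes survive the trip through a string unchanged.
    public static bool SurvivesRoundTrip(byte[] original)
    {
        return ThroughString(original).SequenceEqual(original);
    }
}
```

Calling BinaryRoundTrip.SurvivesRoundTrip(new byte[] { 0x00, 0x80, 0xC3, 0x28, 0xFF }) returns false because those bytes are not valid UTF-8, while plain ASCII text such as Encoding.ASCII.GetBytes("hello") comes back unchanged.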

The first implementation uses File.ReadAllText to read the full content of the file into a string. It then uses StringContent to build the request body. As a comment in the .NET source code for StringContent explains:

"A StringContent is essentially a ByteArrayContent. We serialize the string into a byte-array in the constructor using encoding information provided by the caller..."

So we first allocate a string that contains the entire content of the file, and then we also allocate a byte array holding its encoded bytes.

UploadFileLocalCommand_String.cs

public class UploadFileLocalCommand_String : IUploadFileCommand
{
    private readonly HttpClient _httpClient;

    public UploadFileLocalCommand_String(IHttpClientFactory httpClientFactory)
    {
        _httpClient = httpClientFactory.CreateClient(nameof(UploadFileLocalCommand_String));
        _httpClient.BaseAddress = new Uri("http://localhost:5000");
    }

    public async Task<HttpStatusCode> UploadFile(string filename)
    {
        // DON'T DO THIS
        var file = File.ReadAllText(Path.Combine(Config.ExampleFilesAbsolutePath, filename));
        var content = new StringContent(file, Encoding.UTF8);
        var request = new HttpRequestMessage(HttpMethod.Post, $"/files/{filename}.string")
        {
            Content = content
        };

        using (var response = await _httpClient.SendAsync(request))
        {
            return response.StatusCode;
        }
    }
}

I've seen this pattern a bunch of times:

  1. Get all the content (using File.ReadAllText, or response.Content.ReadAsStringAsync() when dealing with http).
  2. Store it in a string.
  3. Do something with the string (manipulation, deserialization...).

Don't do this. It allocates like crazy and performs really badly. There are much nicer APIs to use, as you will see.
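
As one example of a nicer API for the deserialization case, here is a small sketch that deserializes JSON directly from a stream with System.Text.Json instead of going through an intermediate string. It is not taken from the original code: the FileInfoDto type and the StreamingJson helper are hypothetical, and the sketch assumes the payload is UTF-8 encoded JSON.

```csharp
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical payload type, used only for this illustration.
public sealed class FileInfoDto
{
    public string Name { get; set; }
    public long Size { get; set; }
}

public static class StreamingJson
{
    // Deserializes UTF-8 JSON straight from the stream, so the whole
    // payload never has to be materialized as a single string.
    public static async Task<FileInfoDto> ReadAsync(Stream utf8Json)
    {
        return await JsonSerializer.DeserializeAsync<FileInfoDto>(utf8Json);
    }
}
```

For an HTTP response, the stream could come from response.Content.ReadAsStreamAsync(), ideally after sending the request with HttpCompletionOption.ResponseHeadersRead so the body is not buffered up front.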

Read file content into a byte array

Now we are using File.ReadAllBytes instead of File.ReadAllText.
So instead of storing the file content in a string, we store the raw bytes in a byte array.
We then use ByteArrayContent instead of StringContent, meaning that we "only" allocate the byte array, instead of allocating both a string and a byte array as we did in the first example.

UploadFileLocalCommand_Bytes.cs

public class UploadFileLocalCommand_Bytes : IUploadFileCommand
{
    private readonly HttpClient _httpClient;

    public UploadFileLocalCommand_Bytes(IHttpClientFactory httpClientFactory)
    {
        _httpClient = httpClientFactory.CreateClient(nameof(UploadFileLocalCommand_Bytes));
        _httpClient.BaseAddress = new Uri("http://localhost:5000");
    }

    public async Task<HttpStatusCode> UploadFile(string filename)
    {
        // DON'T DO THIS
        var file = File.ReadAllBytes(Path.Combine(Config.ExampleFilesAbsolutePath, filename));
        var content = new ByteArrayContent(file);
        var request = new HttpRequestMessage(HttpMethod.Post, $"/files/{filename}.bytes")
        {
            Content = content
        };

        using (var response = await _httpClient.SendAsync(request))
        {
            return response.StatusCode;
        }
    }
}

This is an improvement over our first example, but it still has one huge drawback: it reads the whole file into memory and stores it in a byte array. We can do better.

Read file content with a stream

We are now using File.OpenRead instead of File.ReadAllBytes. This gives us a FileStream that we can pass to StreamContent to stream the content of the file.

UploadFileLocalCommand_Stream.cs

public class UploadFileLocalCommand_Stream : IUploadFileCommand
{
    private readonly HttpClient _httpClient;

    public UploadFileLocalCommand_Stream(IHttpClientFactory httpClientFactory)
    {
        _httpClient = httpClientFactory.CreateClient(nameof(UploadFileLocalCommand_Stream));
        _httpClient.BaseAddress = new Uri("http://localhost:5000");
    }

    public async Task<HttpStatusCode> UploadFile(string filename)
    {
        using (var file = File.OpenRead(Path.Combine(Config.ExampleFilesAbsolutePath, filename)))
        {
            var content = new StreamContent(file);
            var request = new HttpRequestMessage(HttpMethod.Post, $"/files/{filename}.stream")
            {
                Content = content
            };
            using (var response = await _httpClient.SendAsync(request))
            {
                return response.StatusCode;
            }
        }
    }
}

Using a stream allows us to work on the file in small chunks instead of loading the whole file into memory.

When using a stream, the flow works (kind of) like this:

  1. Get a small chunk from the file (buffer).
  2. Send this chunk to the receiver (in our case our FileUpload API).
  3. Repeat 1-2 until the whole file has been sent.
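
The three steps above can be sketched as a plain copy loop between two streams. This is only an illustration of the idea, not what StreamContent literally runs; in real code Stream.CopyToAsync does the same job. The ChunkedCopy name and the default buffer size are assumptions made for this example.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class ChunkedCopy
{
    // Copies source to destination one buffer at a time and returns the
    // total number of bytes copied. Only one buffer is ever allocated.
    public static async Task<long> CopyInChunksAsync(
        Stream source, Stream destination, int bufferSize = 81920)
    {
        var buffer = new byte[bufferSize];
        long total = 0;
        int read;

        // 1. Get a small chunk from the source (ReadAsync returns 0 at the end).
        while ((read = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            // 2. Send this chunk to the receiver.
            await destination.WriteAsync(buffer, 0, read);
            total += read;
        }

        // 3. The loop has repeated steps 1-2 until the source was exhausted.
        return total;
    }
}
```

Memory use stays bounded by bufferSize no matter how large the source is, which matches the pattern in the Allocated column for UploadFile_Stream.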

Benchmarks

|            Method |    filename |         Mean |     StdDev |       Gen 0 |       Gen 1 |     Gen 2 |     Allocated |
|------------------ |------------ |-------------:|-----------:|------------:|------------:|----------:|--------------:|
| UploadFile_String |    1MB.test |     3.877 ms |  0.0527 ms |    500.0000 |    500.0000 |  500.0000 |    5156.46 KB |
|  UploadFile_Bytes |    1MB.test |     2.402 ms |  0.1362 ms |           - |           - |         - |    1028.28 KB |
| UploadFile_Stream |    1MB.test |     2.317 ms |  0.1189 ms |           - |           - |         - |       5.95 KB |
| UploadFile_String |   10MB.test |    50.732 ms |  7.7102 ms |   3500.0000 |   2000.0000 | 1000.0000 |   51313.21 KB |
|  UploadFile_Bytes |   10MB.test |    43.655 ms |  3.3186 ms |           - |           - |         - |   10244.29 KB |
| UploadFile_Stream |   10MB.test |    31.291 ms |  2.5565 ms |           - |           - |         - |      14.11 KB |
| UploadFile_String |  100MB.test |   573.321 ms | 46.4071 ms |  27500.0000 |  15000.0000 | 2500.0000 |  512969.54 KB |
|  UploadFile_Bytes |  100MB.test |   256.709 ms | 28.3533 ms |           - |           - |         - |  102405.95 KB |
| UploadFile_Stream |  100MB.test |   257.108 ms | 70.1054 ms |           - |           - |         - |     171.29 KB |
| UploadFile_String | 1000MB.test | 5,184.615 ms | 74.6092 ms | 256000.0000 | 172000.0000 | 5000.0000 | 5129272.25 KB |
|  UploadFile_Bytes | 1000MB.test | 2,490.773 ms | 60.2330 ms |           - |           - |         - | 1024006.73 KB |
| UploadFile_Stream | 1000MB.test | 1,910.646 ms | 60.3543 ms |           - |           - |         - |        998 KB |
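
To get a rough feel for the Allocated column without a full benchmark run, the sketch below compares the bytes allocated on the current thread when reading a file with File.ReadAllBytes versus copying it through a FileStream. This is not the harness that produced the table above; AllocationProbe is a hypothetical helper, it leaves HttpClient out entirely, and the absolute numbers will vary by runtime and machine.

```csharp
using System;
using System.IO;

public static class AllocationProbe
{
    // Returns the number of bytes allocated on the current thread
    // while the given action runs.
    public static long Measure(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // Reads the whole file into one byte array.
    public static long ReadAllBytesAllocations(string path)
    {
        return Measure(() =>
        {
            var bytes = File.ReadAllBytes(path);
            GC.KeepAlive(bytes);
        });
    }

    // Streams the file in chunks into a sink that discards the data.
    public static long StreamAllocations(string path)
    {
        return Measure(() =>
        {
            using (var file = File.OpenRead(path))
            {
                file.CopyTo(Stream.Null);
            }
        });
    }
}
```

For a file of a few megabytes, StreamAllocations should come out well below ReadAllBytesAllocations, since the streaming path only needs its copy buffers while ReadAllBytes needs an array the size of the file.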

All code in this post can be found here together with a "dummy" API that saves the file.