Zip files on S3 with AWS Lambda and Node

Gordon Johnston - Sep 11 '19 - - Dev Community

This post was updated 20 Sept 2022 to improve reliability with large numbers of files.

  • Update the stream handling so streams are only opened to S3 when the file is ready to be processed by the Zip Archiver. This fixes timeouts that could be seen when processing a large number of files.
  • Use keep-alive with S3 and limit the number of connected sockets.

It's not an uncommon requirement: packaging files on S3 into a Zip file so a user can download multiple files in a single package. Maybe it's common enough for AWS to offer this functionality themselves one day. Until then, you can write a short script to do it.

If you want to provide this service in a serverless environment such as AWS Lambda you have two main constraints that define the approach you can take.

1 - /tmp is only 512 MB. Your first idea might be to download the files from S3, zip them up, and upload the result. This will work fine until you fill up /tmp with the temporary files!

2 - Memory is constrained to 3 GB. You could store the temporary files on the heap, but you hit the same limit. Even in a regular server environment you're not going to want a simple zip function to take 3 GB of RAM!

So what can you do? The answer is to stream the data from S3, through an archiver and back onto S3.

Fortunately this Stack Overflow post and its comments pointed the way and this post is basically a rehash of it!

The code below is TypeScript, but the JavaScript is just the same with the types removed.

Start with the imports you need

import * as Archiver from 'archiver';
import * as AWS from 'aws-sdk';
import * as https from 'https';
import * as lazystream from 'lazystream';
import { Readable, Stream } from 'stream';

Firstly, configure the aws-sdk so that it will use keep-alives when communicating with S3, and also limit the maximum number of connections. This improves efficiency and helps avoid hitting an unexpected connection limit. Instead of this section you could set AWS_NODEJS_CONNECTION_REUSE_ENABLED=1 in your Lambda environment.

    // Set the S3 config to use keep-alives
    const agent = new https.Agent({ keepAlive: true, maxSockets: 16 });

    AWS.config.update({ httpOptions: { agent } });

Let's start by creating the streams to fetch the data from S3. To prevent timeouts to S3, the streams are wrapped with 'lazystream'; this delays the actual opening of each stream until the archiver is ready to read its data.

Let's assume you have a list of keys in keys. For each key we need to create a ReadStream. To track the keys and streams, let's create an S3DownloadStreamDetails type. The 'filename' will ultimately be the filename inside the Zip, so you can do any transformation you need for that at this stage.

    type S3DownloadStreamDetails = { stream: Readable; filename: string };
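For example, if your keys carry a folder prefix you don't want to appear inside the Zip, you could strip it when building the filename. A hypothetical helper (not part of the original code) might look like this:

```typescript
// Hypothetical helper: derive the Zip entry name from an S3 key by
// keeping only the last path segment, e.g. 'uploads/2022/photo.jpg' -> 'photo.jpg'.
const filenameForKey = (key: string): string => {
    const parts = key.split('/');
    return parts[parts.length - 1] || key;
};
```

You would then use `filenameForKey(key)` instead of `key` for the `filename` field below.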

Now, for our array of keys, we can iterate over it to create the S3DownloadStreamDetails objects

    const s3DownloadStreams: S3DownloadStreamDetails[] = keys.map((key: string) => {
        return {
            stream: new lazystream.Readable(() => {
                console.log(`Creating read stream for ${key}`);
                return s3.getObject({ Bucket: 'Bucket Name', Key: key }).createReadStream();
            }),
            filename: key,
        };
    });
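The key trick from the 2022 update is that nothing talks to S3 until the archiver actually pulls data. A minimal sketch of what 'lazystream' does for us, using only Node's stream module with a local factory standing in for the S3 call (this is an illustration, not the library's actual implementation):

```typescript
import { Readable } from 'stream';

// Minimal lazy readable: the factory runs only on the first _read(),
// so an S3 connection would not be opened until the archiver asks for data.
class LazyReadable extends Readable {
    private inner: Readable | null = null;

    constructor(private readonly factory: () => Readable) {
        super();
    }

    _read(): void {
        if (!this.inner) {
            // First read: open the underlying stream now, not at construction time.
            this.inner = this.factory();
            this.inner.on('data', (chunk) => {
                if (!this.push(chunk)) this.inner!.pause(); // respect backpressure
            });
            this.inner.on('end', () => this.push(null));
            this.inner.on('error', (err) => this.destroy(err));
        } else {
            this.inner.resume();
        }
    }
}
```

With hundreds of files queued in the archiver, this means at most a handful of S3 connections are ever open at once, instead of one per file from the start.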

Now prepare the upload side by creating a Stream.PassThrough object and assigning it as the Body of the params for an S3.PutObjectRequest.


    const streamPassThrough = new Stream.PassThrough();
    const params: AWS.S3.PutObjectRequest = {
        ACL: 'private',
        Body: streamPassThrough,
        Bucket: 'Bucket Name',
        ContentType: 'application/zip',
        Key: 'The Key on S3',
        StorageClass: 'STANDARD_IA', // Or as appropriate
    };

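To see why the PassThrough works here: it is just a pipe fitting with a writable end (which the archiver writes into) and a readable end (which s3.upload consumes). A self-contained sketch with a local "uploader" standing in for S3 (the names here are illustrative, not AWS APIs):

```typescript
import { PassThrough, Readable, Writable } from 'stream';

const passThrough = new PassThrough();

// Stand-in for s3.upload(): consumes the readable end of the PassThrough.
const received: Buffer[] = [];
const fakeUploader = new Writable({
    write(chunk, _encoding, callback) {
        received.push(Buffer.from(chunk));
        callback();
    },
});
passThrough.pipe(fakeUploader);

// Stand-in for the archiver: writes into the writable end.
Readable.from([Buffer.from('zip '), Buffer.from('bytes')]).pipe(passThrough);
```

Because data flows through in chunks, the full archive never needs to exist in memory or on disk at once.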

Now we can start the upload process.

    const s3Upload = s3.upload(params, (error: Error): void => {
        if (error) {
            console.error(`Got error creating stream to s3 ${error.name} ${error.message} ${error.stack}`);
            throw error;
        }
    });


If you want to monitor the upload process, for example to give feedback to users, you can attach a handler to the httpUploadProgress event like this.

    s3Upload.on('httpUploadProgress', (progress: { loaded: number; total: number; part: number; key: string }): void => {
        console.log(progress); // { loaded: 4915, total: 192915, part: 1, key: 'foo.jpg' }
    });

Now create the archiver

    const archive = Archiver('zip');
    archive.on('error', (error: Archiver.ArchiverError) => { throw new Error(`${error.name} ${error.code} ${error.message} ${error.path} ${error.stack}`); });

Now we can connect the archiver to pipe data to the upload stream and append all the download streams to it

    await new Promise((resolve, reject) => {

        console.log('Starting upload');

        streamPassThrough.on('close', resolve);
        streamPassThrough.on('end', resolve);
        streamPassThrough.on('error', reject);

        archive.pipe(streamPassThrough);
        s3DownloadStreams.forEach((streamDetails: S3DownloadStreamDetails) => archive.append(streamDetails.stream, { name: streamDetails.filename }));
        archive.finalize();
    }).catch((error: { code: string; message: string; data: string }) => { throw new Error(`${error.code} ${error.message} ${error.data}`); });

Finally wait for the uploader to finish

    await s3Upload.promise();

and you're done.

I've tested this with archives of over 10 GB and it works like a charm. I hope this has helped you out.
