Building A Serverless Image Processing SaaS
If you google “Image Processing SaaS” you will find many image processing services. You might have already used some of them, or at least heard of them, such as Imgix, Bitline, Cloudinary, etc. Here’s a more comprehensive list. In this tutorial, I will show you how to build your own serverless service using AWS and Zappa. So I’m happy to announce imgy, the tiny image processing service we will be building in this tutorial. 😀
Most features of all these services revolve around a real-time image transformation API: a simple RESTful web service with a single operation that looks like the following:
For example, in the above HTTP request, the API receives the input image image.png, dynamically modifies it (in this case, resizing it to 300px) and returns the modified image as a binary HTTP response.
Very simple, right?
Of course, we can have many image processing modifiers, such as scaling, format conversion, quality compression, etc., but we will get to that later. For now, what you need to understand is that the API is a single method/resource.
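To make the request shape concrete, here is a small sketch that builds an imgy URL from an S3 key and a set of operations. It assumes, as the Flask handler description later in this post suggests, that the S3 key is the URL path and the operations travel in the query string. build_image_url is a hypothetical helper and the host name is a placeholder, not a real deployment.

```python
from urllib.parse import quote, urlencode


def build_image_url(base_url, s3_key, **ops):
    """Build a hypothetical imgy request URL: <base_url>/<s3_key>?<ops>.

    Keyword arguments such as w=300 or fm="png" become query-string
    operations; operations set to None are skipped.
    """
    path = quote(s3_key)  # "/" stays unescaped, so nested keys like "a/b.png" work
    query = urlencode({k: v for k, v in ops.items() if v is not None})
    url = f"{base_url.rstrip('/')}/{path}"
    return f"{url}?{query}" if query else url


print(build_image_url("https://example.execute-api.us-west-2.amazonaws.com/api",
                      "image.png", w=300))
# https://example.execute-api.us-west-2.amazonaws.com/api/image.png?w=300
```

Adding more keyword arguments (for example fm="webp", q=80) simply stacks more operations onto the same single resource.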
This tutorial assumes you know at least the basics of some AWS products, mainly CloudFormation, CloudFront, S3, API Gateway and Lambda. It also assumes some Python and Flask knowledge. I also won’t go into much detail about why you should be building serverless applications; there are plenty of articles out there on that.
In this tutorial we will use:
- Python 3.6
So we want to build this thing serverlessly, right? We will use part of the AWS product arsenal for that. Here’s what the general architecture looks like:
Thanks to Cloudcraft for this nice diagram 🙂
- API Gateway: API Gateway is responsible for handling incoming HTTP requests. Its main job is to proxy the requests to Lambda, and forward the response back to the user.
- CloudFront: It works as a CDN (Content Delivery Network), which, in plain English, means it caches responses (the resulting images) and serves them from the edge location closest to the user. A CloudFront distribution is attached by default to the API Gateway deployment, BUT guess what, it’s not designed for caching 😒, as you can see here. So we will need to add our own CloudFront distribution.
- Lambda: This is where our business logic lives: all the code that handles the HTTP request, which basically means transforming the input image and generating a binary image response. It will be written in Python 3.6 and takes advantage of the fact that Lambda instances come with ImageMagick built in. To be more precise, we will be using Wand, “a simple ImageMagick binding for Python”. Lambda scales automatically with demand.
- S3: One of the assumptions of our service is that the input images we want to process are available in S3. That means images must be uploaded directly to S3 prior to the request. This is actually a very scalable serverless architecture, in the sense that we won’t have the headache of managing an upload server, since S3 handles that beautifully for us.
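Since clients upload straight to S3, they need a way to write there without holding your AWS credentials. One common approach, an assumption here rather than part of imgy, is an S3 presigned POST: your backend signs a short-lived upload form and the client posts the file directly to S3. presign_upload is a hypothetical helper; s3_client is expected to be a boto3 S3 client (boto3.client("s3")), passed in as a parameter so the function is easy to test.

```python
def presign_upload(s3_client, bucket, key, expires_in=3600, max_bytes=10 * 1024 * 1024):
    """Return a presigned POST (url + form fields) that lets a client upload
    `key` directly to `bucket`, without going through our own servers.

    `s3_client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    return s3_client.generate_presigned_post(
        Bucket=bucket,
        Key=key,
        # Reject empty uploads and anything larger than max_bytes (10 MiB by default).
        Conditions=[["content-length-range", 1, max_bytes]],
        ExpiresIn=expires_in,  # seconds the signed form stays valid
    )
```

The returned dict has a url and a fields mapping; the client sends a multipart/form-data POST to url containing every entry of fields plus a file field. Once the upload succeeds, the image is ready to be requested through the API.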
The full code for imgy is available HERE. And here are some of the important aspects of it.
This is coded in Flask, which does most of the heavy lifting for us in terms of HTTP handling. We basically download the source image (given by s3_key) from S3, then apply all specified operations (ops) coming from the query string. After that, we convert the image to binary and attach it to the HTTP response via Flask’s send_file helper. I got part of that snippet from here.
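Here is a minimal sketch of that flow, not imgy’s actual code. It assumes the S3 key is the URL path; fetch_source, mimetype_for and create_app are hypothetical names, and transform stands for any function that modifies a Wand image in place based on the query-string arguments. Flask, boto3 and Wand are imported inside create_app so the small helpers can be used (and tested) without them installed.

```python
import io

# Assumed mapping from ImageMagick format names to MIME types; extend as needed.
MIMETYPES = {"png": "image/png", "jpeg": "image/jpeg", "jpg": "image/jpeg",
             "gif": "image/gif", "webp": "image/webp"}


def mimetype_for(fmt):
    """Return the MIME type for an ImageMagick format name such as 'PNG'."""
    return MIMETYPES.get(fmt.lower(), "application/octet-stream")


def fetch_source(s3_client, bucket, s3_key):
    """Download the source image from S3 into memory and return its bytes."""
    buf = io.BytesIO()
    s3_client.download_fileobj(bucket, s3_key, buf)
    return buf.getvalue()


def create_app(bucket, transform):
    """Build the Flask app; transform(img, args) edits a Wand image in place."""
    # Imported lazily so the helpers above work without these packages.
    import boto3
    from flask import Flask, request, send_file
    from wand.image import Image

    app = Flask(__name__)
    s3 = boto3.client("s3")

    @app.route("/<path:s3_key>")
    def render(s3_key):
        with Image(blob=fetch_source(s3, bucket, s3_key)) as img:
            transform(img, request.args)  # e.g. ?w=300&fm=png
            body = img.make_blob()        # encodes using the image's current format
            mimetype = mimetype_for(img.format)
        return send_file(io.BytesIO(body), mimetype=mimetype)

    return app
```

With Zappa, app_function must point at a module-level app object, so you would add something like app = create_app(imgy_bucket, transform) at the bottom of the module.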
Wand for the Win
Below you can see the image modification part using Wand. This code can easily be extended with more operations, since new image modifiers can be stacked up. Notice that they are independent of each other; that’s how ImageMagick is designed.
All transformations below are currently supported and they can be applied independently of each other:
w: sets image width
h: sets image height
fm: sets the image format, e.g. png, jpeg, etc. Any format supported by ImageMagick works.
q: sets the compression quality, for lossy formats. The value must be between 1 and 100.
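As a sketch of how these four operations could be validated and applied with Wand (parse_ops and apply_ops are hypothetical names, not imgy’s code): img is assumed to be a wand.image.Image, and only its resize() method and its format and compression_quality properties are used. Wand’s resize() sets both dimensions, so when only one of w or h is given, the sketch computes the other to preserve the aspect ratio; that is a design choice of this example.

```python
def parse_ops(args):
    """Validate the supported query-string operations and return a clean dict.

    `args` can be any mapping, e.g. Flask's request.args. Raises ValueError
    on non-numeric or out-of-range values.
    """
    ops = {}
    for key in ("w", "h"):
        if key in args:
            value = int(args[key])
            if value <= 0:
                raise ValueError(f"{key} must be a positive integer")
            ops[key] = value
    if "fm" in args:
        ops["fm"] = args["fm"].lower()
    if "q" in args:
        q = int(args["q"])
        if not 1 <= q <= 100:
            raise ValueError("q must be between 1 and 100")
        ops["q"] = q
    return ops


def apply_ops(img, ops):
    """Apply validated ops to a wand.image.Image (modified in place)."""
    w, h = ops.get("w"), ops.get("h")
    if w or h:
        # Fill in the missing dimension so the aspect ratio is kept.
        if w and not h:
            h = max(1, round(img.height * w / img.width))
        elif h and not w:
            w = max(1, round(img.width * h / img.height))
        img.resize(w, h)
    if "fm" in ops:
        img.format = ops["fm"]
    if "q" in ops:
        img.compression_quality = ops["q"]
```

A ValueError from parse_ops is a natural place to return an HTTP 400 instead of letting the request fail with a 500.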
Until recently, supporting binary responses in API Gateway was very clumsy, and binary responses are key to making our solution work. Fortunately, AWS improved that support last year and it’s now much simpler to integrate 👏 👏. Zappa took advantage of this and added binary support in version 0.36.0.
Now, let’s add the necessary headers to our responses so that CloudFront is able to cache images. First, we register a function with the @app.after_request decorator so that it runs on all responses. Then we tell CloudFront to cache the images for CACHE_MAX_AGE seconds, using the max-age directive of the Cache-Control header. Finally, we mark the cache as public rather than private. That’s it, we are all set for caching.
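A minimal sketch of such a hook follows; add_cache_headers is a hypothetical name, and only marking 200 responses as cacheable (so CloudFront does not cache errors) is a choice of this example. It works on any response object with a status_code and a dict-like headers attribute, which Flask responses have. Flask’s response.cache_control object (its public and max_age attributes) is an equivalent way to set the same header.

```python
def add_cache_headers(response, max_age):
    """Mark successful responses as publicly cacheable for max_age seconds.

    Meant to be called from a Flask @app.after_request hook. Non-200
    responses are left untouched so errors are not cached at the edge.
    """
    if response.status_code == 200:
        response.headers["Cache-Control"] = f"public, max-age={int(max_age)}"
    return response


# Registration in the Flask app (app and CACHE_MAX_AGE defined elsewhere):
#
# @app.after_request
# def cache(response):
#     return add_cache_headers(response, CACHE_MAX_AGE)
```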
You can verify this is working when the response includes the header x-cache: Hit from cloudfront instead of x-cache: Miss from cloudfront. You can also check the Lambda function logs: the function is no longer invoked once the image has been cached.
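To script that check, here is a small sketch using only the standard library; cache_result and check_url are hypothetical helpers, and the URL you pass would be your CloudFront URL. CloudFront can also report other values, such as "RefreshHit from cloudfront", which this sketch lumps into "other".

```python
from urllib.request import urlopen


def cache_result(x_cache_value):
    """Classify a CloudFront X-Cache header value as 'hit', 'miss' or 'other'."""
    value = (x_cache_value or "").strip().lower()
    if value.startswith("hit"):
        return "hit"
    if value.startswith("miss"):
        return "miss"
    return "other"  # e.g. "RefreshHit from cloudfront" or a missing header


def check_url(url):
    """Fetch url and classify its X-Cache response header."""
    with urlopen(url) as resp:
        return cache_result(resp.headers.get("X-Cache"))
```

Calling check_url twice on the same image URL should typically return "miss" and then "hit", as long as the response was cacheable.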
One important feature imgy should have is support for CORS (Cross-Origin Resource Sharing) requests, meaning we want to allow requests coming from external domains. For example, a plain <img> tag on another site can display our images without CORS, but if that site loaded an image with JavaScript (e.g. fetch) or read its pixels from a <canvas>, the browser would block it unless CORS was enabled.
CORS support is achieved by simply adding
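One minimal way to do this, assuming a wildcard origin is acceptable, is to set the Access-Control-Allow-Origin header yourself in an after_request hook. add_cors_headers is a hypothetical name; the flask-cors extension (CORS(app)) is a ready-made alternative.

```python
def add_cors_headers(response, allowed_origin="*"):
    """Allow cross-origin reads by setting Access-Control-Allow-Origin.

    "*" lets any site read the image; pass a specific origin such as
    "https://www.example.com" to restrict access to that site.
    """
    response.headers["Access-Control-Allow-Origin"] = allowed_origin
    return response


# Registration in the Flask app (app defined elsewhere):
#
# @app.after_request
# def cors(response):
#     return add_cors_headers(response)
```

A wildcard keeps CloudFront caching simple, since every client receives the same header; echoing a per-request origin back would require caching on the Origin header as well.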
Let’s Deploy
First, configure your environment. In zappa_settings.json you SHALL change:
s3_bucket: the S3 bucket where we will store our (Zappa) deployment packages. You don’t need to create it in AWS; Zappa will do it for you. NOTE: make sure the default AWS CLI profile has administrator permissions.
You MAY also configure:
aws_region: the AWS region where you want to deploy the app
In imgy/settings.py you SHALL change:
imgy_bucket: the S3 bucket from which we will read the input images.
You MAY also configure:
imgy_cache_max_age: the max-age value, in seconds, of the Cache-Control header.
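A minimal sketch of what imgy/settings.py could look like, assuming you prefer to read the values from environment variables (Zappa’s environment_variables setting can provide them). The variable names IMGY_BUCKET and IMGY_CACHE_MAX_AGE and the default values are hypothetical.

```python
# imgy/settings.py (sketch): environment variables win, otherwise the
# hypothetical defaults below are used.
import os

# S3 bucket holding the input images (placeholder name, change it).
imgy_bucket = os.environ.get("IMGY_BUCKET", "my-imgy-images")

# Cache-Control max-age in seconds; defaults to one day.
imgy_cache_max_age = int(os.environ.get("IMGY_CACHE_MAX_AGE", 86400))
```

Keeping the values in the environment lets you point different Zappa stages at different buckets without editing code.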
To install and deploy, run the following commands:
virtualenv -p python3.6 venv
source venv/bin/activate
pip install -r requirements.txt
zappa deploy
Adding CloudFront (optional)
You may add a custom CloudFront distribution, which adds a caching layer to your service. Be aware that this step takes about 15-20 minutes to finish. At the end, a CloudFront URL will be generated; use it instead of the API Gateway URL.
E.g.: if the API Gateway URL is https://vk05slewjg.execute-api.us-west-2.amazonaws.com/api, then your id is vk05slewjg.
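The API Gateway ID is the first label of the execute-api host name. If you would rather not copy it by hand, a small helper (hypothetical, standard library only) can extract it:

```python
from urllib.parse import urlparse


def api_id_from_url(url):
    """Return the REST API ID, i.e. the first label of an execute-api host."""
    host = urlparse(url).hostname or ""
    api_id, _, rest = host.partition(".")
    if not api_id or not rest.startswith("execute-api."):
        raise ValueError(f"not an API Gateway execute-api URL: {url}")
    return api_id


print(api_id_from_url("https://vk05slewjg.execute-api.us-west-2.amazonaws.com/api"))
# vk05slewjg
```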
NOTE: you might need to update the AWS CLI to the latest version:
pip install awscli --upgrade --user
Install it in your OS environment, not in the virtualenv.
To remove all AWS components created:
./remove_cloud_front.sh
zappa undeploy
Post your comments below if you run into any issues or have questions.