AWS Lambda is great for ad hoc services that don’t need any additional infrastructure to manage. I’ve used it for a couple of tasks that sync S3 buckets.
The workflow goes like this:
- Register a lambda function
- Set up the appropriate role and ARN permissions
- Set up a trigger, i.e. a circumstance that should invoke this function
- Build code to respond to the trigger
- Upload, debug, etc
So this weekend I built an AWS Lambda in Python to transform some text files stored in EDN format into JSON and then partition them according to one key. EDN is a JSON-like format from the Clojure world (https://en.wikipedia.org/wiki/Extensible_Data_Notation). These EDN files were on S3 and gzip compressed.
I built the lambda in Python, using boto3 and edn_format to free the data from EDN. I packaged those dependencies up into a zipfile and shipped it to a staging environment.
It worked marvelously on files up to 1MB in size. Then larger files started timing out… because AWS Lambda has an upper time limit of 300 seconds per execution. I found the culprit files, mostly ~7MB of gzipped EDN, ran them locally under a profiler, and realized the issue was in deserializing EDN data in Python. Whoops! As you might expect, EDN libraries are few and far between compared to JSON ones, and they tend to be less robust and don’t delegate to C extensions.
Now Clojure is the logical choice for this EDN -> JSON partitioning task. But AWS only officially supports Java, Python and Node.js.
But Clojure is really just Java under the hood… so I found an article with the basic guidelines and set to work (article: https://aws.amazon.com/blogs/compute/clojure/).
The trick to using Clojure is to expose a handler method with the signature AWS Lambda expects, plus a few project.clj settings.
project.clj - Note the uberjar profile with :aot :all and the AWS Lambda library from Clojars.
Include [com.amazonaws/aws-lambda-java-core "1.0.0"] as a dependency and set :profiles {:uberjar {:aot :all}}
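For reference, a minimal project.clj along those lines might look like the sketch below. Only the aws-lambda-java-core dependency and the uberjar profile come from the description above; the project name and version are taken from the jar name used in the deploy scripts, and the Clojure and data.json versions are assumptions, so substitute whatever your project actually uses.

```clojure
;; Sketch only: the Clojure and data.json versions are assumptions.
(defproject example "0.1.0-SNAPSHOT"
  :description "EDN to JSON partitioning Lambda (illustrative)"
  :dependencies [[org.clojure/clojure "1.8.0"]
                 [org.clojure/data.json "0.2.6"]
                 [com.amazonaws/aws-lambda-java-core "1.0.0"]]
  ;; AOT-compile every namespace when building the uberjar, so the
  ;; class generated by gen-class ends up in the jar as a real .class file.
  :profiles {:uberjar {:aot :all}})
```

Without the :aot setting, gen-class emits nothing and Lambda has no handler class to load.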
Then, to help with the AWS Lambda protocol, I followed the instructions from the original article, along with a secondary source of information from @kobmic on Github. I’m particularly happy with their implementation of the deflambda macro, copied here:
;; convenience macro for generating gen-class and handleRequest
;; (expects the namespace to alias clojure.java.io as io and clojure.data.json as json)
(defmacro deflambda [name args & body]
  (let [class-name (->> (clojure.string/split (str name) #"-")
                        (mapcat clojure.string/capitalize)
                        (apply str))
        fn-name (symbol (str "handle-" name "-event"))]
    `(do (gen-class
           :name ~(symbol class-name)
           :prefix ~(symbol (str class-name "-"))
           :implements [com.amazonaws.services.lambda.runtime.RequestStreamHandler])
         (defn ~(symbol (str class-name "-handleRequest")) [this# is# os# context#]
           (let [~fn-name (fn ~args ~@body)
                 w# (io/writer os#)]
             ;; parse the JSON event, run the body on it, write the result back as JSON
             (-> (json/read (io/reader is#) :key-fn keyword)
                 (~fn-name)
                 (json/write w#))
             (.flush w#))))))
Used like this:
(deflambda s3-split [event]
  (example.core/handler event))
In the AWS Lambda dashboard, the handler name is then S3Split::handleRequest.
So where the Python version of this code was timing out at 300 seconds without completing the task, my Clojure lambda burns through it in 20-70 seconds and has been working well.
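To give a feel for the EDN -> JSON partitioning step itself, here is a minimal sketch of how it could look in Clojure. This is not the code that runs in my lambda: the namespace and function names are hypothetical, and it assumes each gzipped file is a sequence of top-level EDN maps. It uses the built-in clojure.edn reader plus clojure.data.json.

```clojure
(ns example.partition
  (:require [clojure.edn :as edn]
            [clojure.data.json :as json]
            [clojure.java.io :as io])
  (:import (java.io PushbackReader)
           (java.util.zip GZIPInputStream)))

(defn read-edn-records
  "Reads every top-level EDN form from a gzip-compressed input stream.
  Assumes the file is a plain sequence of EDN maps."
  [in]
  (with-open [rdr (PushbackReader. (io/reader (GZIPInputStream. in)))]
    ;; doall forces the lazy seq before with-open closes the reader
    (doall
      (take-while #(not= ::eof %)
                  (repeatedly #(edn/read {:eof ::eof} rdr))))))

(defn partition-as-json
  "Groups records by the value found under key k and serializes each
  record to a JSON string. Returns a map of key value -> vector of JSON lines."
  [records k]
  (into {}
        (map (fn [[v group]] [v (mapv json/write-str group)]))
        (group-by k records)))
```

Each entry of the resulting map could then be written to its own S3 object, one JSON record per line.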
Additional Code for Deploying/Updating/Building
Create lambda function
#!/usr/bin/env bash
aws lambda create-function \
--function-name example-lambda \
--handler S3Split::handleRequest \
--runtime java8 \
--memory-size 512 \
--timeout 120 \
--role arn:aws:iam::<ID>:role/example-role-lambda \
--zip-file fileb://./target/example-0.1.0-SNAPSHOT-standalone.jar
Update lambda function
#!/usr/bin/env bash
aws lambda update-function-code \
--function-name example-lambda \
--zip-file fileb://./target/example-1.0.0-SNAPSHOT-standalone.jar
Build