neria: An AWS Lambda function for Named Entity Recognition from any website

TL;DR: I developed an AWS Lambda function, exposed through a Function URL, that performs basic Named Entity Recognition, i.e. it detects names of people and places in text from any website. It requires a URL and a jQuery-like selector which specifies which part of the page to extract the text from.

Background

I work on and off on a Java toy project, articulated, which scrapes articles from a few Malawian news sites. I use it mostly to play with different technologies and ideas, and to try to keep up with the local news. One of the features of that project is basic Named Entity Recognition, which I decided to build as a separate service in Go after facing issues with Java libraries for similar functionality, but that's a story for another day.

Building an AWS Lambda Function for NER

I decided to re-implement that service as a Lambda function because it seemed like the kind of thing that would make sense to have a lambda function for. As I was going about that, it struck me that the service could be made more useful if it was generic enough to be used with any website.

The result is a project dubbed neria-lambda which can be deployed on AWS's Lambda platform. You send a request with a URL to a website and a jQuery-like selector which specifies which DOM elements (i.e. which part of the page) to extract the text from. After the text is fetched, the service uses prose to extract Named Entities from it, which basically means it detects names of people or places in the text. (Side note: it turns out the prose library was archived by the maintainer at some point.)
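To make that flow a bit more concrete, here is a minimal sketch of it in Go using only the standard library. It is not the code from neria-lambda: the `Pipeline`, `Entity`, `FetchText` and `Extract` names are hypothetical, the two stages are stubbed with canned values, and the rule that a non-empty `Text` field skips the fetch step is my assumption based on the request shape shown later in this post. In a real deployment the extraction stage would be backed by prose and the fetch stage by an HTTP client plus a selector library.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Request mirrors the JSON payload the function accepts.
type Request struct {
	Url      string
	Selector string
	Text     string
}

// Entity is a hypothetical result type: the matched text plus a label.
type Entity struct {
	Text  string
	Label string
}

// Pipeline holds the two stages as plain functions so that real
// implementations (HTTP fetch + selector, NER library) can be plugged in.
type Pipeline struct {
	FetchText func(url, selector string) (string, error)
	Extract   func(text string) ([]Entity, error)
}

// Run fetches the text (unless it was supplied directly, which is an
// assumption about how the Text field is used) and extracts entities.
func (p Pipeline) Run(req Request) ([]Entity, error) {
	text := strings.TrimSpace(req.Text)
	if text == "" {
		if req.Url == "" || req.Selector == "" {
			return nil, errors.New("either Text, or both Url and Selector, must be provided")
		}
		fetched, err := p.FetchText(req.Url, req.Selector)
		if err != nil {
			return nil, fmt.Errorf("fetching text: %w", err)
		}
		text = strings.TrimSpace(fetched)
	}
	if text == "" {
		return nil, errors.New("selector matched no text")
	}
	return p.Extract(text)
}

func main() {
	// Both stages are stubs returning canned values, for illustration only.
	p := Pipeline{
		FetchText: func(url, selector string) (string, error) {
			return "Chilima says Malawi is a good place to invest.", nil
		},
		Extract: func(text string) ([]Entity, error) {
			return []Entity{
				{Text: "Chilima", Label: "PERSON"},
				{Text: "Malawi", Label: "GPE"},
			}, nil
		},
	}

	entities, err := p.Run(Request{Url: "https://example.com/article", Selector: "#content"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range entities {
		fmt.Printf("%s (%s)\n", e.Text, e.Label)
	}
}
```

Keeping the stages as function fields is just a convenience for the sketch: it lets the flow be exercised without network access or third-party dependencies.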

Here is an example request in Insomnia:

neria-00.png

Deploying the Lambda Function

As much as I love having things in production, I don't like bleeding money, so I won't share my Function URL's public URL. Instead, this section explains how you can deploy your own instance of the Lambda on AWS.

Step 1: Clone the repository, compile the Go program and create a zip file named main.zip. The instructions are below:

$ git clone https://github.com/zikani03/neria

$ cd neria/neria-lambda

$ GOOS=linux GOARCH=amd64 go build -o main main.go

$ zip main.zip main

NOTE: Windows users should use the build.ps1 PowerShell script for this process.

Step 2: Log in to the AWS console and create a Lambda function.

neria-01.png

neria-02.png

neria-03.png

Step 3: Once the Lambda Function is created, upload the main.zip from step 1.

neria-04.png

neria-05.png

neria-06.png

Step 4: Create a Function URL, which will enable you to access the Lambda via a public URL. (Of course, you can configure it to be accessible only to certain IAM roles, etc. I'm assuming you know your way around this AWS stuff.)

neria-07.png

Step 5: Once you create the Function URL you can test the lambda with cURL, Postman or Insomnia with the following example request:

{
    "Url": "https://www.nyasatimes.com/chilima-says-malawi-is-a-best-investment-place-in-sadc-region-and-beyond/",
    "Selector": "#content div.nyasa-content",
    "Text": ""
}

Experience of deploying a Lambda Function URL

I found the experience of deploying a Lambda Function URL an interesting one. The AWS documentation was easy to follow, and it was especially useful to reference their samples from the GitHub repos.

What I found really interesting is that the function is deployed by uploading a zip file (with an option to get it from S3). I expected a Heroku-like experience for the process but was reminded that sometimes simple approaches go a long way.

Conclusion

In this article I described my experience building a useful Lambda function which you can also deploy for your own use. The project could improve in how it does Named Entity Recognition, but since I'm using a library that has since been archived, I can't really say much on when and how. It was an interesting experience and one that's given me a couple of ideas (one for a PaaS platform).

Thanks for reading. Feel free to share feedback on Twitter - @zikani03