How to use EmbeddingGemma to generate embeddings in Go
Embeddings, or vector embeddings, are numeric representations of text or other data that enable many machine learning tasks, such as classification and clustering. If you are familiar with the concept of embeddings, or have an interest in machine learning topics such as RAG (retrieval-augmented generation), you may find it exciting that Google recently released EmbeddingGemma.
EmbeddingGemma is compelling for its relatively small size and its potential for local, on-device applications.
From their article:
EmbeddingGemma generates embeddings, which are numerical representations - in this case, of text (such as sentences and documents) - by transforming it into a vector of numbers to represent meaning in a high-dimensional space. […] EmbeddingGemma empowers developers to build on-device, flexible, and privacy-centric applications. It generates embeddings of documents directly on the device's hardware, helping ensure sensitive user data is secure.
In this article, we will do something a bit unconventional: we will use EmbeddingGemma to generate embeddings from a Go program. Typically, it is advisable to work with embeddings from a language like Python, which has a richer ecosystem and better libraries for this kind of work.
Prerequisites
Go 1.24+
Ollama v0.11.10+
We will use an example similar to the one in the Hugging Face blog post. To follow along with this tutorial, you will need to have Ollama installed (version v0.11.10 or later) and to pull the embeddinggemma:300m model:
$ ollama --version
$ ollama pull embeddinggemma:300m
Once this completes successfully, you will have the EmbeddingGemma model on your machine, ready to interact with via Ollama to generate embeddings.
Let's get to coding. Start by creating a new Go project:
mkdir embeddings-tutorial
cd embeddings-tutorial
go mod init embeddings-tutorial
touch main.go
Before we look at the Go code, let's discuss what it does.
The code below is a small program that interacts with Ollama via the LangChain Go library. In the program we have a user query (a question) and a list of facts (think of those as a database) that we can query against. Both the query and the facts are hardcoded as strings, but they could come from anywhere: a text file, stdin, etc. Once we have the database and the user query, we convert both to vector embeddings (remember, embeddings are a numeric representation) using Ollama via the llm.CreateEmbedding function.
Once the embeddings are created, we need a way to use them to perform a search, which is essentially an operation that compares the vector embedding of the query with the embeddings of the facts in our “database” to see which fact most closely matches the query. To do this, we use the HNSW data structure from the github.com/habedi/hann package. The package implements a few algorithms for working with high-dimensional data, but we pick HNSW as it is one of the most commonly used approaches for working with embeddings:
“HNSW - addresses the challenge of quickly finding items “close” to a query point in large datasets, which is computationally expensive with traditional methods like brute-force comparisons.” - from Milvus.io
Copy the following content into main.go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/habedi/hann/core"
	"github.com/habedi/hann/hnsw"
	"github.com/tmc/langchaingo/llms/ollama"
)

const modelID = "embeddinggemma:300m"

const query = "Which planet is known as the Red Planet?"

var documents = []string{
	"Venus is often called Earth's twin because of its similar size and proximity.",
	"Mars, known for its reddish appearance, is often referred to as the Red Planet.",
	"Jupiter, the largest planet in our solar system, has a prominent red spot.",
	"Saturn, famous for its rings, is sometimes mistaken for the Red Planet.",
}

// preprocess prefixes each document with the prompt format EmbeddingGemma
// expects for document embeddings.
func preprocess(content []string) []string {
	res := make([]string, len(content))
	for idx, item := range content {
		res[idx] = fmt.Sprintf("title: none | text: %s", item)
	}
	return res
}

func main() {
	ctx := context.Background()

	llm, err := ollama.New(
		ollama.WithModel(modelID),
	)
	if err != nil {
		log.Fatalf("failed to connect to ollama: %v", err)
	}

	// generate embeddings for all documents in a single batch
	embeddings, err := llm.CreateEmbedding(ctx, preprocess(documents))
	if err != nil {
		log.Fatalf("failed to generate embedding from ollama: %v", err)
	}
	fmt.Println("generated embeddings successfully, now you gotta use 'em", "dimension", len(embeddings[0]))

	dimension := 768 // embeddinggemma:300m produces 768-dimensional vectors
	m := 4           // maximum number of neighbor connections per node
	ef := 16         // search breadth factor
	distanceFuncName := "cosine"
	index := hnsw.NewHNSW(dimension, m, ef, core.Distances[distanceFuncName], distanceFuncName)

	for idx, embedding := range embeddings {
		err = index.Add(idx, embedding)
		if err != nil {
			log.Fatalf("failed to index embedding in HNSW data structure: %v", err)
		}
	}

	// now that we have built an index, we can query it; the query gets
	// its own EmbeddingGemma prompt format
	queryTask := fmt.Sprintf("task: search result | query: %s", query)
	queryEmbedding, err := llm.CreateEmbedding(ctx, []string{queryTask})
	if err != nil {
		log.Fatalf("failed to generate query embedding from ollama: %v", err)
	}

	numNeighbors := 3
	neighbors, err := index.Search(queryEmbedding[0], numNeighbors)
	if err != nil {
		log.Fatalf("failed to search for query in index: %v", err)
	}

	fmt.Println("Results: ")
	for _, n := range neighbors {
		// cosine distance -> relevance: closer vectors score nearer to 1
		fmt.Printf("Relevance: %f \t Content: %s\n", 1-n.Distance, documents[n.ID])
	}
}
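As an aside, with only four documents an HNSW index is overkill: the same search can be done with a brute-force scan over every embedding, which is also a useful way to sanity-check an approximate index. A minimal sketch of that idea (the helper names here are my own, not part of hann):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosineSimilarity is a small helper for the brute-force comparison.
func cosineSimilarity(a, b []float32) float64 {
	var dot, normA, normB float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		normA += float64(a[i]) * float64(a[i])
		normB += float64(b[i]) * float64(b[i])
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

// bruteForceSearch compares the query embedding against every document
// embedding and returns document indices, most similar first.
func bruteForceSearch(query []float32, docs [][]float32) []int {
	idxs := make([]int, len(docs))
	for i := range idxs {
		idxs[i] = i
	}
	sort.Slice(idxs, func(i, j int) bool {
		return cosineSimilarity(query, docs[idxs[i]]) > cosineSimilarity(query, docs[idxs[j]])
	})
	return idxs
}

func main() {
	// toy 3-dimensional embeddings standing in for the real 768-dimensional ones
	docs := [][]float32{
		{0.1, 0.9, 0.0},
		{0.8, 0.1, 0.1},
		{0.0, 0.2, 0.9},
	}
	query := []float32{0.9, 0.0, 0.2}
	fmt.Println("ranking:", bruteForceSearch(query, docs)) // prints "ranking: [1 2 0]"
}
```

HNSW earns its keep only once the “database” grows to thousands or millions of vectors, where a full linear scan per query becomes too slow.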
Now that we have looked at and understood the code, run the following commands:
$ go mod tidy
$ go run main.go
You will get output that looks something like this:

As you can see, the answer we want, about Mars, is ranked the highest (it has the highest relevance score)!
This result is good, as it gives us confidence that the query we searched for matched the most relevant items in our “database”. This technique can be used to implement semantic search in applications, as well as recommendations based on unstructured input/text.
Warning and/or disclaimer: I should mention that while this works, I wouldn't advise using this approach in production. You are better off using solutions from the Python ecosystem, which has more robust libraries for this sort of thing.
I hope you found this somewhat interesting.