Embedding, or vectorization, is the process of converting content into vectors of numbers that, when compared mathematically, quantify how similar that content is in meaning. Embeddings power semantic search engines, text classification, and cluster analysis, as well as the retrieval component of retrieval-augmented generation (RAG) applications. Isaacus currently offers the world’s most accurate legal AI embedder, Kanon 2 Embedder, available through our embedding endpoint.
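To make the idea concrete before touching the API, here is a minimal sketch of comparing embeddings with cosine similarity. The vectors below are made up for illustration; real embedders produce vectors with hundreds or thousands of dimensions.

```python
import numpy as np

# Hypothetical embeddings for three texts (values invented for illustration).
contract_vec = np.array([0.8, 0.1, 0.55])   # "contract"
agreement_vec = np.array([0.75, 0.2, 0.6])  # "agreement" (similar meaning)
weather_vec = np.array([0.05, 0.9, 0.1])    # "weather" (unrelated meaning)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity is the dot product of the unit-normalized vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Texts with similar meanings produce vectors pointing in similar directions,
# so their cosine similarity is higher.
print(cosine_similarity(contract_vec, agreement_vec))  # close to 1
print(cosine_similarity(contract_vec, weather_vec))    # much lower
```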

Usage

Our embedding endpoint takes one or more texts as input and outputs an embedding for each text. The code snippet below demonstrates how you could use our embedding endpoint to assess the semantic similarity of search queries to a legal document. Please consult our quickstart guide first if you haven’t set up your Isaacus account and API client.
import numpy as np  # NOTE: you may need to `pip install numpy`.

from isaacus import Isaacus

# Create an Isaacus API client.
# NOTE see https://docs.isaacus.com/quickstart to learn how to get an API key.
client = Isaacus(api_key="PASTE_YOUR_API_KEY_HERE")

# Download the GitHub terms of service as an example.
tos = client.get(path="https://examples.isaacus.com/github-tos.md", cast_to=str)

# Embed the terms of service.
document_response = client.embeddings.create(
    model="kanon-2-embedder",
    texts=tos,  # You can pass a single text or a list of texts here.
    task="retrieval/document",
)

# Embed our search query.
query_responses = client.embeddings.create(
    model="kanon-2-embedder",
    texts=[
        "What are GitHub's billing policies?",  # This is a relevant query.
        "What are Microsoft's billing policies?",  # This is an irrelevant query.
    ],
    task="retrieval/query",
)

# Unpack the embeddings.
document_embedding = document_response.embeddings[0].embedding

query_embeddings = query_responses.embeddings
relevant_query_embedding = query_embeddings[0].embedding
irrelevant_query_embedding = query_embeddings[1].embedding

# Compute the similarity between the queries and the document.
relevant_similarity = np.dot(relevant_query_embedding, document_embedding)
irrelevant_similarity = np.dot(irrelevant_query_embedding, document_embedding)

# Log the results.
print(f"Similarity of relevant query to the document: {relevant_similarity * 100:.2f}")
print(f"Similarity of irrelevant query to the document: {irrelevant_similarity * 100:.2f}")
The output should look something like this:
Similarity of relevant query to the document: 52.87
Similarity of irrelevant query to the document: 24.86
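In a retrieval setting you typically score one query embedding against many document embeddings at once, which a single matrix-vector product handles efficiently. The sketch below uses made-up, unit-normalized vectors to show the pattern; the dot-product scoring matches the snippet above, which assumes unit-normalized embeddings:

```python
import numpy as np

# Hypothetical unit-normalized document embeddings, stacked as rows
# (values invented for illustration).
doc_embeddings = np.array([
    [0.6, 0.8, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 0.0, 1.0],
])
query_embedding = np.array([0.6, 0.8, 0.0])

# One matrix-vector product scores the query against every document at once.
similarities = doc_embeddings @ query_embedding

# Rank documents from most to least similar.
ranking = np.argsort(similarities)[::-1]
print(ranking)  # [0 1 2]: the document pointing the same way ranks first
```

The same pattern scales to large corpora: stack all document embeddings into one matrix once, then score each incoming query with a single product.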