All Isaacus models are available for private deployment on Amazon SageMaker via the AWS Marketplace. Isaacus SageMaker deployments are fully air-gapped and run entirely within your own AWS account without any external dependencies, making them ideal for customers with heightened data sovereignty, compliance, and security requirements.

Isaacus SageMaker deployments run the Isaacus SageMaker Model Server, which boasts full feature parity with the Isaacus API. The only difference is that, due to SageMaker constraints, all requests must be proxied through the /invocations endpoint, as explained in our API reference. Using the Isaacus SageMaker Python integration package, you can bootstrap the standard Isaacus SDK to work with your SageMaker-deployed models, enabling seamless integration with existing Isaacus SDK-based code.

This guide walks you through how to purchase, set up, and use Isaacus models on SageMaker.

1. Prerequisites

To get started, you’ll need an AWS account with the necessary permissions (e.g., AmazonSageMakerFullAccess and AWSMarketplaceManageSubscriptions) to subscribe to AWS Marketplace products and create SageMaker resources.
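Before proceeding, you may want to confirm which AWS identity your environment is authenticated as. Here's a minimal sketch using boto3 (the AWS SDK for Python; you may need to run pip install boto3 first):
import boto3

# Print the AWS account and identity your current credentials resolve to.
identity = boto3.client("sts").get_caller_identity()
print(identity["Account"], identity["Arn"])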

2. Subscribe to an Isaacus AWS Marketplace model

Navigate to the Isaacus model you wish to deploy on the AWS Marketplace. For this example, we’ll be using Kanon 2 Embedder, available here, but these instructions apply to all Isaacus AWS Marketplace models. Click the large orange button labeled “Continue to Subscribe”. You’ll be taken to a new page presenting the subscription’s pricing and applicable terms of service. Because our models are billed based on your actual usage, you will not incur any immediate costs upon subscribing. After reviewing the terms and pricing, click “Accept offer”. You’ve now successfully subscribed to an Isaacus AWS Marketplace model.

3. Deploy the model

You can deploy the model via the AWS console, the AWS CLI, or an AWS SDK. To deploy via the AWS console:

1. Navigate to “Amazon SageMaker AI”, and then, under the “Inference” dropdown in the left sidebar, click “Marketplace model packages”.
2. Click the “AWS Marketplace subscriptions” tab and then click the name of the Isaacus model you subscribed to earlier.
3. Select the latest version of the model package and then click “Create endpoint”.
4. Give the model a name (e.g., kanon-2-embedder) and select an IAM role with the necessary permissions to create SageMaker resources. If you don’t have a suitable role, open the “IAM role” dropdown and select “Create a new role”, which, by default, should grant all the necessary permissions. Then click “Next”.
5. On the next page, give the endpoint a name (e.g., kanon-2-embedder-001); this is what you’ll use to connect to it. Then create a new endpoint configuration by clicking “Create endpoint configuration”. Feel free to customize the instance type as needed; the default settings should work fine for this guide.
6. Click “Create endpoint”.

Your SageMaker endpoint may take a dozen or so minutes to fully deploy. You can monitor its status in the “Endpoints” section of the SageMaker console. If you encounter any issues, reach out to Isaacus support for assistance.
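If you’d rather script the deployment with an AWS SDK, here’s a minimal boto3 sketch of the same steps. The model package ARN (copyable from the “AWS Marketplace subscriptions” tab), the IAM role ARN, and the instance type below are placeholders that you’ll need to replace with your own values:
import boto3

sagemaker = boto3.client("sagemaker")  # Uses your default region and credentials.

# NOTE These ARNs are placeholders; substitute your own values.
MODEL_PACKAGE_ARN = "arn:aws:sagemaker:us-west-2:123456789012:model-package/..."
EXECUTION_ROLE_ARN = "arn:aws:iam::123456789012:role/MySageMakerRole"

# Create a model from the subscribed Marketplace model package.
sagemaker.create_model(
    ModelName="kanon-2-embedder",
    PrimaryContainer={"ModelPackageName": MODEL_PACKAGE_ARN},
    ExecutionRoleArn=EXECUTION_ROLE_ARN,
    EnableNetworkIsolation=True,  # Marketplace model packages run in network isolation.
)

# Create an endpoint configuration and then the endpoint itself.
sagemaker.create_endpoint_config(
    EndpointConfigName="kanon-2-embedder-config",
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "kanon-2-embedder",
            "InstanceType": "ml.g5.xlarge",  # Placeholder; pick a type supported by the model.
            "InitialInstanceCount": 1,
        }
    ],
)
sagemaker.create_endpoint(
    EndpointName="kanon-2-embedder-001",
    EndpointConfigName="kanon-2-embedder-config",
)

# Block until the endpoint is in service (typically several minutes).
sagemaker.get_waiter("endpoint_in_service").wait(EndpointName="kanon-2-embedder-001")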

4. Install the Isaacus SDK and SageMaker integration

Now that your SageMaker endpoint is set up, install our Python SDK along with the Isaacus SageMaker Python integration, if you don’t already have them.
pip install isaacus isaacus-sagemaker

5. Embed a document

To demonstrate how you can use the Isaacus Python SDK with your SageMaker endpoint, let’s walk through embedding and retrieving a legal document. If you’d like to do something else instead, feel free to follow along with your own use case (note: only the first step of setting up the SDK to connect to your SageMaker endpoint is specific to SageMaker; the rest of the code is standard Isaacus SDK usage). First, we’ll set up the SDK to connect to your endpoint. You can do that by creating a new isaacus_sagemaker.IsaacusSageMakerRuntimeHTTPClient instance that will be passed to your isaacus.Isaacus client via its http_client parameter.
from isaacus import Isaacus
from isaacus_sagemaker import IsaacusSageMakerRuntimeHTTPClient, IsaacusSageMakerRuntimeEndpoint

endpoints = [
    IsaacusSageMakerRuntimeEndpoint(
        name="my-sagemaker-endpoint",
        # region="us-west-2", # Optional, defaults to the client or AWS SDK default region
        # profile="my-aws-profile", # Optional, defaults to the client or AWS SDK default profile
        # models=["kanon-2-embedder"], # Optional, models supported by this endpoint,
        #                              # defaults to all models
    )
]

client = Isaacus(
    http_client=IsaacusSageMakerRuntimeHTTPClient(
        endpoints=endpoints,
        # region="us-west-2", # Optional, defaults to AWS SDK default region
        # profile="my-aws-profile", # Optional, defaults to AWS SDK default profile
        # boto_session_kwargs={"aws_access_key_id": "...",}, # Optional, additional boto3 session kwargs
        # **{}, # Optional, additional httpx.Client kwargs
    )
)
Now that you’ve set up your client, you can use it exactly like you would with the standard Isaacus SDK. Next, let’s grab a legal document to embed. For this example, we’ll use GitHub’s terms of service.
import httpx

tos = httpx.get("https://examples.isaacus.com/github-tos.md").text
Right now, we’re interested in retrieving the GitHub terms of service given a search query about it. To do that, we’ll first embed the document using the .embeddings.create() method of our API client. We’ll also make sure to flag that we’re embedding a document by setting the task parameter to "retrieval/document". This will help our embedder produce an embedding that is optimized specifically for being retrieved.

We recommend always setting the task parameter to either "retrieval/document" or "retrieval/query" when using our embedders for retrieval tasks, even if the text in question is not strictly a document or a query, so long as one text is being treated as something to be retrieved, and the other as something to retrieve it with.

If necessary, you can also request a lower-dimensional embedding using the optional dimensions parameter, which can help speed up similarity comparisons and save on vector storage costs at the cost of some diminution in accuracy. The default (and maximum) dimensionality for Kanon 2 Embedder is 1,792.
document_response = client.embeddings.create(
    model="kanon-2-embedder",
    texts=tos, # You can pass a single text or a list of texts here.
    task="retrieval/document",
)
Since we only passed a single document to the embedder, let’s unpack the first embedding result, accessible at document_response.embeddings[0], in addition to our usage statistics.
document_embedding_data = document_response.embeddings[0]
document_embedding = document_embedding_data.embedding
document_index = document_embedding_data.index

document_embedding_usage = document_response.usage
document_response.embeddings is an array of embedding results, sorted in the same order as the input texts. Each embedding result contains embedding, which is the actual embedding, and index, which is the index of the input text that the embedding corresponds to (starting from 0). document_response.usage contains statistics about the usage of resources in the process of generating the embedding. document_response.usage.input_tokens will give you the number of tokens inputted into the embedder.

Now, let’s embed two search queries, one that is clearly relevant to the document and another that is clearly irrelevant. This time, we set our task parameter to "retrieval/query" to indicate that we’re embedding a search query.
query_responses = client.embeddings.create(
    model="kanon-2-embedder",
    texts=[
        "What are GitHub's billing policies?", # This is a relevant query.
        "What are Microsoft's billing policies?", # This is an irrelevant query.
    ],
    task="retrieval/query",
)

query_embeddings = query_responses.embeddings
relevant_query_embedding = query_embeddings[0].embedding
irrelevant_query_embedding = query_embeddings[1].embedding
To assess the relevance of the queries to the document, we can compute the cosine similarity between their embeddings and the document embedding. Cosine similarity measures how similar two sets of numbers are (specifically, the cosine of the angle between two vectors in an inner product space). In theory, it ranges from -1 to 1, with 1 indicating that the vectors are identical, 0 indicating that they are orthogonal (i.e., completely dissimilar), and -1 indicating that they are diametrically opposed. In practice, however, it tends to range from 0 to 1 for text embeddings (since they are usually non-negative).

Our embedders have been optimized such that the cosine similarity of the embeddings they produce roughly corresponds to how similar the original texts are in meaning. Unlike our universal classifiers, however, our embedders’ scores have not been calibrated to be interpreted as probabilities, only as relative measures of similarity, making them most useful for ranking search results.

For the sake of convenience, our Python example uses numpy’s dot function to compute the dot product of our embeddings, which is equivalent to their cosine similarity since all our embeddings are L2-normalized (you can run pip install numpy to install it if you don’t have it already). If you prefer, you can use another library to compute the cosine similarity of the embeddings (e.g., torch via torch.nn.functional.cosine_similarity), or you could write your own implementation (as we do for our JavaScript example); a minimal pure-Python version is also sketched after the sample output below.
import numpy as np # NOTE You may need to run `pip install numpy`.

relevant_similarity = np.dot(relevant_query_embedding, document_embedding)
irrelevant_similarity = np.dot(irrelevant_query_embedding, document_embedding)

print(f"Similarity of relevant query to the document: {relevant_similarity * 100:.2f}")
print(f"Similarity of irrelevant query to the document: {irrelevant_similarity * 100:.2f}")
The output should look something like this:
Similarity of relevant query to the document: 52.87
Similarity of irrelevant query to the document: 24.86
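If you’d rather not pull in numpy, writing your own cosine similarity in Python is straightforward. Here’s a minimal sketch; the explicit normalization is technically redundant for our embeddings (which are already L2-normalized) but makes the function safe for arbitrary vectors:
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product divided by the product of the vectors' L2 norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(f"Similarity of relevant query to the document: {cosine_similarity(relevant_query_embedding, document_embedding) * 100:.2f}")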
As you should see, the relevant query has a much higher similarity score to the document than the irrelevant query, indicating that our embedder has successfully captured the semantic meaning of the texts. And that’s it! You’ve just successfully embedded a legal document and queries using a SageMaker-deployed Isaacus model.
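If you want to take things further, the same scores slot directly into a ranking workflow. Here’s a minimal sketch of ranking multiple embedded documents against a single query; the documents list below is hypothetical, but each embedding would be produced with task="retrieval/document" exactly as above:
# Hypothetical (title, embedding) pairs, each embedded as shown earlier.
documents = [
    ("GitHub Terms of Service", document_embedding),
    # Add more (title, embedding) pairs here.
]

# Sort documents by their similarity to the query, most similar first.
ranked = sorted(
    documents,
    key=lambda pair: np.dot(relevant_query_embedding, pair[1]),
    reverse=True,
)

for title, embedding in ranked:
    print(f"{title}: {np.dot(relevant_query_embedding, embedding) * 100:.2f}")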