You can invoke your SageMaker-deployed Isaacus models directly through their /invocations endpoint, as explained in our API reference. Alternatively, using the Isaacus SageMaker Python integration package, you can bootstrap the standard Isaacus SDK to work with your SageMaker-deployed models, enabling seamless integration with existing Isaacus SDK-based code.
This guide walks you through how to purchase, set up, and use Isaacus models on SageMaker.
1. Prerequisites
To get started, you’ll need an AWS account with the necessary permissions (e.g., AmazonSageMakerFullAccess and AWSMarketplaceManageSubscriptions) to subscribe to AWS Marketplace products and create SageMaker resources.
2. Subscribe to an Isaacus AWS Marketplace model
Navigate to the Isaacus model you wish to deploy on the AWS Marketplace. For this example, we’ll be using Kanon 2 Embedder, available here, but these instructions apply to all Isaacus AWS Marketplace models. Click the large orange button labeled “Continue to Subscribe”. You’ll be taken to a new page presenting the subscription’s pricing and applicable terms of service. Because our models are charged based on your actual usage, you will not incur any immediate costs upon subscribing. After reviewing the terms and pricing, click “Accept offer”. You’ve now successfully subscribed to an Isaacus AWS Marketplace model.
3. Deploy the model
You can deploy the model via the AWS console, the AWS CLI, or an AWS SDK (a scripted example follows the console steps below). To deploy via the AWS console, navigate to “Amazon SageMaker AI”, and then, under the “Inference” dropdown in the left sidebar, click “Marketplace model packages”. Next, click the “AWS Marketplace subscriptions” tab and then click the name of the Isaacus model you subscribed to earlier. Select the latest version of the model package and click “Create endpoint”. Give the model a name (e.g., kanon-2-embedder) and select an IAM role with the necessary permissions to create SageMaker resources (you may want to create a new role for this purpose by opening the “IAM role” dropdown and selecting “Create a new role”, which, by default, should grant all the necessary permissions). Then click “Next”.
On the next page, give the endpoint a name (e.g., kanon-2-embedder-001), which is what you’ll use to connect to it, and create a new endpoint configuration by clicking “Create endpoint configuration” (feel free to customize the instance type as needed; the default settings should work fine for this guide).
Finally, click “Create endpoint”.
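If you’d rather script the deployment via an AWS SDK, the sketch below shows roughly what that looks like with boto3. The model package ARN, role ARN, resource names, and instance type are all placeholders: copy the real model package ARN from your Marketplace subscription, use an IAM role that can create SageMaker resources, and pick an instance type the model package supports.

```python
import boto3

sagemaker = boto3.client("sagemaker", region_name="us-east-1")

# Placeholder ARNs: substitute the real model package ARN from your
# AWS Marketplace subscription and your own IAM role ARN.
MODEL_PACKAGE_ARN = "arn:aws:sagemaker:us-east-1:123456789012:model-package/kanon-2-embedder"
ROLE_ARN = "arn:aws:iam::123456789012:role/MySageMakerRole"

# Register the subscribed model package as a SageMaker model.
sagemaker.create_model(
    ModelName="kanon-2-embedder",
    ExecutionRoleArn=ROLE_ARN,
    PrimaryContainer={"ModelPackageName": MODEL_PACKAGE_ARN},
    EnableNetworkIsolation=True,  # Marketplace model packages run in network isolation
)

# Describe how the endpoint should be provisioned. The instance type
# here is an assumption; choose one supported by the model package.
sagemaker.create_endpoint_config(
    EndpointConfigName="kanon-2-embedder-config",
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "kanon-2-embedder",
            "InstanceType": "ml.g5.xlarge",
            "InitialInstanceCount": 1,
        }
    ],
)

# Create the endpoint itself; its name is what you'll connect to.
sagemaker.create_endpoint(
    EndpointName="kanon-2-embedder-001",
    EndpointConfigName="kanon-2-embedder-config",
)
```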
Your SageMaker endpoint may take a dozen or so minutes to fully deploy. You can monitor its status in the “Endpoints” section of the SageMaker console. If you encounter any issues, reach out to Isaacus support for assistance.
4. Install the Isaacus SDK and SageMaker integration
Now that your SageMaker endpoint is set up, install our Python SDK along with the Isaacus SageMaker Python integration, if you don’t already have them.
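Assuming both packages are published under the names used in this guide (an assumption; check our installation instructions for the exact package names), that looks like:

```bash
pip install isaacus isaacus-sagemaker
```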
5. Embed a document
To demonstrate how you can use the Isaacus Python SDK with your SageMaker endpoint, let’s walk through embedding and retrieving a legal document. If you’d like to do something else instead, feel free to follow along with your own use case (note: only the first step of setting up the SDK to connect to your SageMaker endpoint is specific to SageMaker; the rest of the code is standard Isaacus SDK usage). First, we’ll set up the SDK to connect to your endpoint. You can do that by creating a new isaacus_sagemaker.IsaacusSageMakerRuntimeHTTPClient instance that will be passed to your isaacus.Isaacus client via its http_client parameter.
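As a minimal sketch, the setup might look like the following. The constructor parameters shown (endpoint_name and region_name) are assumptions; check the integration package’s documentation for its exact signature, including whether the Isaacus client still expects an API key when routed through SageMaker.

```python
import isaacus
import isaacus_sagemaker

# Route all SDK traffic to your SageMaker endpoint instead of the
# Isaacus API. The parameter names below are assumptions: use the
# name you gave your endpoint and the AWS region it lives in.
http_client = isaacus_sagemaker.IsaacusSageMakerRuntimeHTTPClient(
    endpoint_name="kanon-2-embedder-001",
    region_name="us-east-1",
)

client = isaacus.Isaacus(http_client=http_client)
```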
Next, we’ll embed our document using the .embeddings.create() method of our API client. We’ll also flag that we’re embedding a document by setting the task parameter to "retrieval/document". This helps our embedder produce an embedding that is optimized specifically for being retrieved. We recommend always setting the task parameter to either "retrieval/document" or "retrieval/query" when using our embedders for retrieval tasks, even if the text in question is not strictly a document or a query, so long as one text is being treated as something to be retrieved and the other as something to retrieve it with.
If necessary, you can also request a lower-dimensional embedding using the optional dimensions parameter, which can help speed up similarity comparisons and save on vector storage costs at the cost of some diminution in accuracy. The default (and maximum) dimensionality for Kanon 2 Embedder is .
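Continuing from the setup above, here’s a sketch of the document-embedding call. The document text is a placeholder, and the model identifier passed to the model parameter is an assumption; check your model’s listing for the value it expects.

```python
document = (
    'This Non-Disclosure Agreement (the "Agreement") is entered into '
    "between the parties for the purpose of preventing the unauthorized "
    "disclosure of Confidential Information."
)

# Embed the document, flagging it as something to be retrieved.
document_response = client.embeddings.create(
    model="kanon-2-embedder",  # model identifier is an assumption
    texts=[document],
    task="retrieval/document",
)

print(document_response.embeddings[0])
print(document_response.usage)
```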
Here, we print our first embedding result, document_response.embeddings[0], in addition to our usage statistics.
document_response.embeddings is an array of embedding results, sorted in the same order as the input texts. Each embedding result contains embedding, which is the actual embedding, and index, which is the index of the input text that the embedding corresponds to (starting from 0).
document_response.usage contains statistics about the resources used to generate the embeddings. document_response.usage.input_tokens will give you the number of tokens inputted into the embedder.
Now, let’s embed two search queries: one that is clearly relevant to the document and another that is clearly irrelevant. This time, we’ll set our task parameter to "retrieval/query" to indicate that we’re embedding search queries.
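Continuing the sketch (the query texts are placeholders tied to the sample document above):

```python
queries = [
    "What is the purpose of a non-disclosure agreement?",  # relevant
    "How do I bake sourdough bread?",  # irrelevant
]

# Embed the queries, flagging them as texts used to retrieve.
query_response = client.embeddings.create(
    model="kanon-2-embedder",  # model identifier is an assumption
    texts=queries,
    task="retrieval/query",
)
```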
Finally, we’ll use numpy’s dot function to compute the dot product of our embeddings, which is equivalent to their cosine similarity since all our embeddings are L2-normalized (you can run pip install numpy to install it if you don’t have it already). If you prefer, you can use another library to compute the cosine similarity of the embeddings (e.g., torch via torch.nn.functional.cosine_similarity), or you could write your own implementation (as we do in our JavaScript example).
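Under the same assumptions as the sketches above:

```python
import numpy as np

# Pull the raw vector out of the first (and only) document result.
document_embedding = np.array(document_response.embeddings[0].embedding)

for query, result in zip(queries, query_response.embeddings):
    query_embedding = np.array(result.embedding)
    # The dot product equals cosine similarity for L2-normalized vectors.
    similarity = np.dot(document_embedding, query_embedding)
    print(f"{query!r}: {similarity:.4f}")
```

The relevant query should score noticeably higher than the irrelevant one.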