Isaacus charges API calls based on the number of tokens that actually get inputted into a model, not necessarily the number of tokens inputted into an endpoint. This page explains what that distinction means in practice.

If you’re looking for the actual prices of our models, please see our pricing page instead.

Boilerplate tokens

The first difference between the number of tokens inputted into an API endpoint and the number of tokens inputted into a model is that boilerplate tokens can be added to inputs after they are received by the API endpoint.

Boilerplate tokens are typically, but not always, used to structure inputs into whatever format the model expects.

The list below shows the number of boilerplate tokens that are added to inputs for each of our models, alongside a description of what those tokens are used for.

- Kanon Universal Classifier: 33 boilerplate tokens. Statements are formatted alongside input texts in the format <|startoftext|>{statement}<|endoftext|>{text}<|endoftext|>.
- Kanon Universal Classifier Mini: 33 boilerplate tokens. Statements are formatted alongside input texts in the format <|startoftext|>{statement}<|endoftext|>{text}<|endoftext|>.
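For instance, here is a rough sketch of what that formatting looks like for the Kanon Universal Classifiers, using a made-up statement and text purely for illustration:

statement = "This contract contains a non-compete clause."
text = "The Employee agrees that, for a period of two years following..."

model_input = f"<|startoftext|>{statement}<|endoftext|>{text}<|endoftext|>"
# The special tokens wrapped around the statement and text account for
# (part of) the boilerplate tokens described in the list above.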

Chunking

When an input is received that is longer than the maximum input length a model can process in one go, we will, unless chunking is disabled, automatically split that input into smaller chunks and process each chunk separately.

We use our own semchunk algorithm to split inputs in a way that is unlikely to cut off an important sentence or paragraph midway.

Although semchunk is a deterministic algorithm, it can still be difficult to predict exactly how many chunks will be created for any given input, because the algorithm is designed to create chunks that are as semantically meaningful as possible rather than chunks of a fixed size.

If you take advantage of your ability to customize chunk sizes, you should generally expect the number of chunks created to be greater than what you would get using the default chunk size.

The default chunk size is the maximum input length of the model less overhead. That overhead includes not only boilerplate tokens but also, if the model takes a query as input, the number of tokens in the longest statement in that query.
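As a minimal sketch, assuming a hypothetical model with a 512-token maximum input length, the 33 boilerplate tokens of the Kanon Universal Classifiers, and a query whose longest statement is 20 tokens (both the model limit and the statement length are made up for illustration), the default chunk size would work out as follows:

maximum_input_length = 512  # hypothetical model limit, in tokens
number_of_boilerplate_tokens = 33  # from the list above
number_of_tokens_in_longest_statement = 20  # hypothetical

default_chunk_size = (
    maximum_input_length
    - number_of_boilerplate_tokens
    - number_of_tokens_in_longest_statement
)  # 512 - 33 - 20 = 459 tokens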

For every chunk that is created, the number of tokens inputted into a model will increase by the number of tokens in that chunk plus the number of boilerplate tokens added to it.

Additionally, if the chunks are being passed to a model that also takes a statement as input, such as a universal classifier, that statement must be added to each chunk, increasing the number of tokens inputted into the model by the number of tokens in the statement multiplied by the number of chunks.

Finally, the use of a chunk overlap ratio will also increase the number of tokens, and oftentimes the number of chunks, being inputted into a model.
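To make this accounting concrete, here is a rough worked example using entirely hypothetical numbers: a 10,000-token document, 22 chunks, the 33 boilerplate tokens of the Kanon Universal Classifiers, and a single 20-token statement.

number_of_tokens_in_text = 10_000  # hypothetical document length
number_of_chunks = 22  # hypothetical semchunk output, no overlap
number_of_boilerplate_tokens = 33  # added to every chunk
number_of_tokens_in_statement = 20  # hypothetical, added to every chunk

total_input_tokens = number_of_tokens_in_text + number_of_chunks * (
    number_of_boilerplate_tokens + number_of_tokens_in_statement
)  # 10,000 + 22 * (33 + 20) = 11,166 tokens

# A nonzero chunk overlap ratio would further inflate both the number of
# chunks and the total number of tokens.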

Isaacus Query Language (IQL)

When using the Isaacus Query Language (IQL), the number of tokens inputted into a model is multiplied by the number of statements in your query. This is simply a consequence of the fact that each statement in your query has to be evaluated separately, with the results of each statement being combined to form the final output.

If your query has only a single statement, the number of tokens inputted into a model will be no different from passing that statement on its own with IQL disabled.
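As a hypothetical illustration, evaluating a query with three statements against a single 459-token chunk (reusing the made-up numbers from the chunking examples above) would cost roughly three times as much as a single-statement query:

number_of_statements = 3
tokens_per_statement = 459 + 33 + 20  # chunk + boilerplate + statement = 512

total_input_tokens = number_of_statements * tokens_per_statement
# 3 * 512 = 1,536 tokens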

Templates

When you invoke an IQL template, that template is first transformed into a model-specific, Isaacus-optimized query. That query is then passed to the model, and you are charged for the number of tokens in that query.

It is possible for a template’s underlying queries to contain multiple statements, in which case, as explained above, the number of tokens inputted into a model is multiplied by the number of statements in the query. Currently, however, all of our templates use only a single statement.

You can find out how many tokens and statements (if we ever decide to create multi-statement templates) are in a template’s query for a particular model by checking out our template documentation.

The number of tokens and statements in a template’s queries can change at any time without notice (because the very nature of templates is that they are designed to be improved over time), so it is recommended that you check the template documentation for the most up-to-date information.

Approximating costs

The following Python function can be used to approximate the number of tokens that will be inputted into a model, typically within a margin of a couple dozen tokens, though absolutely no warranties or guarantees are made as to its reliability.

import math


def approximate_number_of_input_tokens_for_input(
    number_of_tokens_in_text: int,
    chunk_size: int,
    chunk_overlap_ratio: float,
    number_of_boilerplate_tokens: int,
    # Statement-specific parameters
    number_of_tokens_in_longest_statement: int | None = None,
    average_number_of_tokens_in_statements: int | None = None,
    number_of_statements: int | None = None,
) -> int:
    # Either all of the statement-specific parameters must be provided or
    # none of them (in which case statement-free defaults are used).
    statement_parameters_provided = {
        number_of_tokens_in_longest_statement is None,
        average_number_of_tokens_in_statements is None,
        number_of_statements is None,
    }

    if len(statement_parameters_provided) != 1:
        raise ValueError(
            "You can either provide all of the statement-specific parameters or none of them."
        )

    if number_of_tokens_in_longest_statement is None:
        number_of_tokens_in_longest_statement = 0
        average_number_of_tokens_in_statements = 0
        number_of_statements = 1

    # The effective chunk size is the chunk size less overhead (boilerplate
    # tokens plus, for models that take a query, the longest statement).
    effective_chunk_size = (
        chunk_size
        - number_of_boilerplate_tokens
        - number_of_tokens_in_longest_statement
    )

    # Overlapping chunks increase the effective number of chunks processed.
    number_of_chunks = math.ceil(number_of_tokens_in_text / effective_chunk_size) * (
        1 + chunk_overlap_ratio
    )

    # Every chunk carries its share of the text plus boilerplate tokens and,
    # where applicable, a statement; every statement is evaluated separately.
    approximate_number_of_input_tokens = (
        (
            (number_of_tokens_in_text / number_of_chunks)
            + number_of_boilerplate_tokens
            + average_number_of_tokens_in_statements
        )
        * number_of_statements
        * number_of_chunks
    )

    return math.ceil(approximate_number_of_input_tokens)
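For example, with the same hypothetical numbers used throughout this page (a 10,000-token document, a 512-token chunk size, no chunk overlap, 33 boilerplate tokens, and a single 20-token statement), the function approximates 11,166 input tokens:

approximate_tokens = approximate_number_of_input_tokens_for_input(
    number_of_tokens_in_text=10_000,
    chunk_size=512,
    chunk_overlap_ratio=0.0,
    number_of_boilerplate_tokens=33,
    number_of_tokens_in_longest_statement=20,
    average_number_of_tokens_in_statements=20,
    number_of_statements=1,
)
print(approximate_tokens)  # 11166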