Universal classification, also known as zero-shot classification, is the process of determining whether a statement about a document, for example “this is a confidentiality clause”, is supported by that document. Unlike traditional classifiers, universal classifiers don’t require you to provide any labeled examples beforehand; all you need to do is write out your classification criteria. In practice, universal classifiers can be used for a wide variety of information retrieval and extraction tasks beyond traditional classification, from pulling out particular types of clauses from contracts to scoring the relevance of search results to legal queries. Isaacus currently offers two universal classifiers: Kanon Universal Classifier and Kanon Universal Classifier Mini. Both are available via our universal classification endpoint as well as our reranking endpoint, where they can be used to rank legal documents by their relevance to search queries. For a complete specification of the parameters and response format of the universal classification endpoint, please refer to the API reference documentation.

Usage

To use our universal classification endpoint, all you need is a text and a statement to evaluate the text against. The API will then return a score between 0 and 1 representing the model’s estimate of the likelihood that the text supports the statement, with a score over 0.5 indicating a positive classification. For example, if you wanted to pull out confidentiality clauses from a contract, you could pass the endpoint the contract alongside the statement “This is a confidentiality clause.”, as shown below (consult our quickstart guide first if you haven’t yet set up your Isaacus account and API client).
import httpx

from isaacus import Isaacus

# Create an Isaacus API client.
client = Isaacus(api_key="PASTE_YOUR_API_KEY_HERE")

# Download the GitHub terms of service as an example.
tos = httpx.get("https://examples.isaacus.com/github-tos.md").text

# Classify the contract.
response = client.classifications.universal.create(
    model="kanon-universal-classifier",
    query="This is a confidentiality clause.",
    texts=[tos],
)
Since we only passed the classifier a single document, let’s unpack the first classification result, accessible at response.classifications[0].
classification = response.classifications[0]
chunks = classification.chunks
score = classification.score
Because legal documents can be quite long, the classifier automatically breaks them up into chunks for you (you can disable this by setting the chunking_options parameter to null). Typically, these chunks correspond to individual clauses in the document. Chunks are stored in the classification.chunks attribute, and each has its own text, score, start, and end attributes, where start and end are the start and end character indices of the chunk in the original text. classification.score holds the overall classification score of the document, which is currently set to the highest score of any chunk. As with individual chunks, a document score over 0.5 indicates a positive classification. When you pass multiple documents to the classifier, each document’s classification is available in the response.classifications array, which is sorted from highest to lowest classification score; use the classification.index attribute to recover each document’s position in the original input array. Now, let’s print out the results.
from isaacus.types.classifications.universal_classification_response import Classification

def print_classification(classification: Classification) -> None:
    chunks = classification.chunks
    score = classification.score
    
    # Print the overall classification score.
    print(
        f'Likelihood of there being a confidentiality clause: {score * 100:.2f}%',
        end='\n\n',
    )

    # Filter out chunks with a score below 50%.
    chunks = [c for c in chunks if c.score > 0.5]

    # Print the chunks.
    if chunks:
        print('#' * 18, 'Snippets with a positive classification', '#' * 18, end='\n')
    
    for chunk in chunks:
        chunk_text = chunk.text
        chunk_score = chunk.score
        start, end = chunk.start, chunk.end

        # Print the chunk in the format:
        # ---------- start char = {start} | end char = {end} | score = {chunk_score:.2%} ----------
        # {chunk_text}

        print(
            '-' * 10,
            f'start char = {start:,} | end char = {end:,} | score = {chunk_score:.2%}',
            '-' * 10,
            '\n',
            chunk_text,
            end='\n\n',
        )

print_classification(classification)
The output should look something like this:
Likelihood of there being a confidentiality clause: 98.28%

---------- start char = 19,966 | end char = 22,311 | score = 98.28% ---------- 
 ### 2. Confidentiality of Private Repositories

GitHub considers the contents of private repositories to be confidential to...
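To make the scoring rules described above concrete, here is a minimal local sketch. It does not call the API: `Chunk` and `DocClassification` are hypothetical stand-in dataclasses that mirror the `text`, `score`, `start`, `end`, and `index` fields described above, and the scores are made-up numbers. It recomputes a document's score as its highest chunk score (the current behavior described above), treats only scores strictly over 0.5 as positive, and undoes the highest-to-lowest ordering of `response.classifications` by sorting on `index`.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for the API's response objects (illustration only).
@dataclass
class Chunk:
    text: str
    score: float
    start: int  # Start character index in the original text.
    end: int    # End character index in the original text.


@dataclass
class DocClassification:
    index: int  # Position of the document in the original `texts` list.
    score: float
    chunks: list[Chunk] = field(default_factory=list)


POSITIVE_THRESHOLD = 0.5


def document_score(chunks: list[Chunk]) -> float:
    """Overall score as currently described: the highest chunk score."""
    return max(chunk.score for chunk in chunks)


def is_positive(score: float, threshold: float = POSITIVE_THRESHOLD) -> bool:
    """Only a score strictly over the threshold counts as positive."""
    return score > threshold


def in_input_order(classifications: list[DocClassification]) -> list[DocClassification]:
    """Undo the highest-to-lowest score ordering by sorting on `index`."""
    return sorted(classifications, key=lambda c: c.index)


# Made-up results, already sorted from highest to lowest score.
results = [
    DocClassification(index=1, score=0.9, chunks=[Chunk("clause A", 0.9, 0, 8), Chunk("clause B", 0.2, 9, 17)]),
    DocClassification(index=0, score=0.3, chunks=[Chunk("clause C", 0.3, 0, 8)]),
]

for doc in in_input_order(results):
    # Prints "0 0.3 False", then "1 0.9 True".
    print(doc.index, document_score(doc.chunks), is_positive(doc.score))
```

With real responses, the same `in_input_order` call should work on `response.classifications` directly, assuming each item exposes `index` as described above.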

Isaacus Query Language (IQL)

In the previous example, we used a simple, plain English query to classify confidentiality clauses. That worked fine for our purposes, but what if you need to maximize the accuracy of your classifications? One way is to construct a test dataset and repeatedly evaluate differently worded queries on it until you find the best one. This is time-consuming, but it can yield excellent results. The good news is that, for many legal classification problems, we’ve already done that work for you. Each of our models comes with a set of pre-optimized queries that you can use to classify legal documents with high accuracy and minimal effort. These queries can be accessed using the Isaacus Query Language, or IQL. IQL is the world’s first legal AI query language, that is, a query language designed specifically for analyzing legal documents with AI systems. Any statement you can think of, including the one we used earlier, qualifies as an IQL statement; you just need to wrap it in curly brackets, like so: {This is a confidentiality clause.}. To invoke a pre-optimized query template, express your query in the format {IS <template name>}. A list of available templates can be found in the Isaacus documentation. For example, we could have invoked the {IS confidentiality clause} template to classify confidentiality clauses in the GitHub terms of service instead of writing our own query from scratch. Let’s do that now.
response = client.classifications.universal.create(
    model="kanon-universal-classifier",
    query="{IS confidentiality clause}",
    texts=[tos],
)
classification = response.classifications[0]

print_classification(classification)
This time, our most “confidentiality clause”-like chunks have shuffled around a bit, with the top chunk now being:
---------- start char = 25,715 | end char = 27,954 | score = 79.47% ---------- 
... **Confidentiality Obligations.** You agree that any non-public Beta Preview
information we give you, such as information about a private Beta Preview, will
be considered GitHub’s confidential information (collectively, “Confidential
Information”), ...
That certainly sounds like a confidentiality clause. Our templates cover more than just confidentiality clauses, however. There are templates for pulling out indemnities, force majeure clauses, and termination clauses, and even for unilateral clauses: clauses that benefit or obligate only a single party, which are often a key indicator of a contract’s one-sidedness and potential for unfairness. There are also templates that let you plug in your own description of what you’re looking for via the format {IS <template name> "<template argument>"}. For example, to identify clauses that specifically obligate the party referred to as “you” in the document, you could use the {IS clause obligating "<party name>"} template like so: {IS clause obligating "You"}.
response = client.classifications.universal.create(
    model="kanon-universal-classifier",
    query='{IS clause obligating "You"}',
    texts=[tos],
)
classification = response.classifications[0]

print_classification(classification)
Now, our top chunk is:
---------- start char = 13,300 | end char = 15,403 | score = 78.25% ---------- 
Your use of the Website and Service must not violate any applicable laws,
including copyright or trademark laws, export control or sanctions laws, or
other laws in your jurisdiction. You are responsible for making sure that your
use of the Service is in compliance with laws and any applicable regulations.
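If you build queries programmatically, the IQL formats described so far ({<statement>}, {IS <template name>}, and {IS <template name> "<template argument>"}) can be produced by small string helpers. The sketch below uses hypothetical helpers, `iql_statement` and `iql_template`, which are not part of the Isaacus SDK. Because the rules for escaping quotes inside template arguments aren't covered here, it rejects arguments containing double quotes rather than guessing.

```python
from typing import Optional


def iql_statement(text: str) -> str:
    """Wrap a plain-English statement in curly brackets, e.g. {This is a confidentiality clause.}."""
    return "{" + text + "}"


def iql_template(name: str, argument: Optional[str] = None) -> str:
    """Build {IS <template name>} or {IS <template name> "<template argument>"}."""
    if argument is None:
        return f"{{IS {name}}}"
    if '"' in argument:
        # Escaping rules are an open question in this sketch, so refuse instead of guessing.
        raise ValueError("double quotes in template arguments are not handled by this sketch")
    return f'{{IS {name} "{argument}"}}'


print(iql_statement("This is a confidentiality clause."))  # {This is a confidentiality clause.}
print(iql_template("confidentiality clause"))              # {IS confidentiality clause}
print(iql_template("clause obligating", "You"))            # {IS clause obligating "You"}
```

The resulting string is passed as the `query` argument to `client.classifications.universal.create`, exactly as in the examples above.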
If there isn’t a template for what you’re looking for, you can always try one of our more general templates, such as {IS clause called "<clause name>"} (e.g., {IS clause called "confidentiality"}) or {IS clause that "<clause description>"} (e.g., {IS clause that "imposes a duty of confidentiality"}). We don’t yet have many templates for non-contractual classifications; however, our models have been trained on an equal mix of contracts, cases, and legislation, so you can always write your own queries for anything not covered by an existing template. In addition to invoking query templates, IQL lets you combine statements using the logical operators AND, OR, and NOT, as well as the > and < comparison operators and the + operator for averaging. For example, to identify confidentiality clauses that apply to you and you alone, we could use this query: {IS confidentiality clause} AND {IS clause obligating "You"} AND {IS unilateral clause}.
response = client.classifications.universal.create(
    model="kanon-universal-classifier",
    query='{IS confidentiality clause} AND {IS clause obligating "You"} AND {IS unilateral clause}',
    texts=[tos],
)
classification = response.classifications[0]

print_classification(classification)
That query pulls up the top chunk from before, which is indeed a unilateral confidentiality clause.
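The query-tuning process mentioned at the start of the IQL section, evaluating differently worded queries on a test dataset and keeping the best one, can be sketched as below. This is a minimal illustration under stated assumptions, not an Isaacus feature: `make_isaacus_scorer`, `accuracy`, and `best_query` are hypothetical helpers, the labeled snippets are something you would assemble yourself, and accuracy at the 0.5 threshold is only one possible metric. The scorer wraps the same `client.classifications.universal.create` call used above and reads the overall score of the single document it sends.

```python
from typing import Callable, Sequence

# A scorer maps (query, text) to a score between 0 and 1.
Scorer = Callable[[str, str], float]


def make_isaacus_scorer(client, model: str = "kanon-universal-classifier") -> Scorer:
    """Hypothetical wrapper: one universal classification request per text."""
    def score(query: str, text: str) -> float:
        response = client.classifications.universal.create(
            model=model,
            query=query,
            texts=[text],
        )
        # Only one text was sent, so its classification is the first result.
        return response.classifications[0].score
    return score


def accuracy(query: str, examples: Sequence[tuple[str, bool]], scorer: Scorer, threshold: float = 0.5) -> float:
    """Fraction of labeled examples where (score > threshold) matches the label."""
    correct = sum((scorer(query, text) > threshold) == label for text, label in examples)
    return correct / len(examples)


def best_query(queries: Sequence[str], examples: Sequence[tuple[str, bool]], scorer: Scorer,
               threshold: float = 0.5) -> tuple[str, float]:
    """Evaluate every candidate and return the most accurate one (first wins on ties)."""
    results = {query: accuracy(query, examples, scorer, threshold) for query in queries}
    top = max(results, key=results.get)
    return top, results[top]


# Candidate wordings for the same concept, taken from the examples in this guide.
candidates = [
    "This is a confidentiality clause.",
    "{IS confidentiality clause}",
    '{IS clause called "confidentiality"}',
    '{IS clause that "imposes a duty of confidentiality"}',
]

# With a real client and your own labeled snippets:
# client = Isaacus(api_key="PASTE_YOUR_API_KEY_HERE")
# labeled = [("<clause text>", True), ("<clause text>", False)]
# print(best_query(candidates, labeled, make_isaacus_scorer(client)))
```

All candidates should describe the same concept, since they are scored against the same labels. Sending one request per snippet keeps the sketch simple; passing several snippets in one `texts` list and mapping the results back with `classification.index` would cut down on requests.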