Enrichment

Enrichment is an experimental feature that is currently only available through the Isaacus Beta Program. You can apply for access here.

Enrichment is the process of transforming unstructured documents into rich, structured, hierarchical knowledge graphs representing key sections, concepts, entities, and their relationships to one another. Enrichment is a critical first step in preparing data for use in downstream applications such as agentic search, eDiscovery, and retrieval-augmented generation (RAG) systems demanding high-quality structured inputs. Isaacus currently offers the world’s most advanced enrichment model, Kanon 2 Enricher, capable of simultaneously performing:

Hierarchical segmentation: breaking documents up into their full hierarchical structure of divisions, articles, sections, clauses, and so on.
Entity extraction, disambiguation, classification, and hierarchical linking: extracting references to key entities such as individuals, organizations, governments, locations, dates, citations, and more, and identifying which real-world entities they refer to, classifying them, and linking them to each other (for example, linking companies to their offices, subsidiaries, executives, and contact points; attributing quotations to source documents and authors; classifying citations by type and jurisdiction; etc.).
Text annotation: tagging headings, tables of contents, signatures, junk, front and back matter, entity references, cross-references, citations, definitions, and other common textual elements.

Usage

1. Setup

This guide will walk you through how to use every feature of Kanon 2 Enricher by converting Apple’s terms of service into a rich, highly structured knowledge graph and then rendering the resulting graph in a human-readable format. Since this guide covers a lot of material, including capabilities that may not be relevant to your intended use cases, we encourage you to feed this guide to an LLM and then ask it for help applying Kanon 2 Enricher to your own problems. You can always reach out to us for assistance via our support page. If you’d like to see the complete specification of the request parameters and possible output fields of our enrichment endpoint, please refer to our API reference documentation. To get started, let’s initialize an Isaacus API client and then download Apple’s terms of service. If you don’t already have an API key, you can obtain one by following the first step of our quickstart tutorial.

from isaacus import Isaacus

# Create an Isaacus API client.
client = Isaacus(
    api_key="PASTE_YOUR_API_KEY_HERE" # See https://docs.isaacus.com/quickstart
)

# Download Apple's terms of service as an example document to enrich.
doc_text = client.get("https://examples.isaacus.com/apple-tos.txt", cast_to=str)

2. Enrichment

We’ll now enrich the document by sending it to the /enrichments endpoint, which we can call through the client.enrichments.create() method of our API client. We’ll use the model (string) request parameter to specify that we want to enrich our document with kanon-2-enricher. The texts (array[string] | string) parameter is where we’ll place the document. It works with both an array of strings and a single string, depending on whether you want to enrich multiple documents or just one.

# Enrich the document with Kanon 2 Enricher.
response = client.enrichments.create(
    model="kanon-2-enricher",
    texts=doc_text,  # You can pass a single text or a list of texts as input.
)

The response has two attributes, results (array[Result]) and usage (Usage). The results attribute is an array of objects, one for each input document, sorted in the same order as the documents were input to the model. Each object has two keys, index (integer) and document (Document). index is the index of the input document that the result corresponds to, starting from 0. document is the document enriched into version 1.0.0 of the Isaacus Legal Graph Schema (ILGS). The usage attribute contains a single key, input_tokens (integer), specifying the number of tokens in the document plus overhead tokens. You can cross-reference this number with our pricing page (and any applicable personal discounts or credits) to calculate the cost of an enrichment. To access the enriched document, we can look up response.results[0].document, since we only enriched one document and the results are in the same order as the input.
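When texts is a list, each result’s index ties it back to the input it came from. The sketch below pairs inputs with their enriched documents by index rather than by list position. It uses a hypothetical Result dataclass with only the two fields described above so that it runs on its own; with the real client, passing response.results should work the same way, assuming its items expose index and document attributes as documented.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Result:
    """Hypothetical stand-in for an enrichment result (only the documented fields)."""
    index: int
    document: Any


def pair_results(texts: list[str], results: list[Result]) -> list[tuple[str, Any]]:
    """Pair each input text with its enriched document using the result's index.

    Looking results up by index keeps the pairing correct even if the list of
    results has been filtered or reordered on the client side.
    """
    by_index = {result.index: result.document for result in results}
    missing = [i for i in range(len(texts)) if i not in by_index]
    if missing:
        raise ValueError(f"No result for input indices: {missing}")
    return [(text, by_index[i]) for i, text in enumerate(texts)]


# Usage with placeholder documents standing in for enriched ILGS documents.
texts = ["First document.", "Second document."]
results = [Result(index=1, document="doc B"), Result(index=0, document="doc A")]
print(pair_results(texts, results))
# [('First document.', 'doc A'), ('Second document.', 'doc B')]
```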

# Store the enriched document.
doc = response.results[0].document

3. Rendering

The returned document object, being an instance of an ILGS Document, has the following properties:

text (string): the original, unchanged text of the document.
title (Span | null): the span of the document’s title or null if the document has no title or its title cannot be resolved.
subtitle (Span | null): the span of the document’s subtitle or null if the document has no subtitle or its subtitle cannot be resolved.
type (enum<string>): the type of the document, being one of statute, regulation, decision, contract, or other.
jurisdiction (string | null): the jurisdiction of the document or null if the jurisdiction cannot be resolved.
segments (array[Segment]): an array of segments within the document representing structurally distinct portions of its content.
crossreferences (array[Crossreference]): an array of crossreferences within the document pointing to segments or spans of segments.
locations (array[Location]): an array of locations identified within the document.
persons (array[Person]): an array of persons identified within the document.
emails (array[Email]): an array of email addresses identified within the document belonging to persons.
websites (array[Website]): an array of websites identified within the document belonging to persons.
phone_numbers (array[PhoneNumber]): an array of phone numbers identified within the document belonging to persons.
id_numbers (array[IDNumber]): an array of identification numbers identified within the document belonging to persons.
terms (array[Term]): an array of terms assigned definite meanings within the document.
external_documents (array[ExternalDocument]): an array of external documents cited within the document.
quotes (array[Quote]): an array of quotations within the document.
dates (array[Date]): an array of dates identified within the document belonging to a supported date type.
headings (array[Span]): an array of spans within the document constituting headings.
junk (array[Span]): an array of spans within the document constituting non-operative, non-substantive ‘junk’ content such as headers, footers, page numbers, and OCR artifacts.

We will now walk through each of these properties one by one, outlining how they are accessed, traversed, and rendered.

Text

The text property contains the text of the document. It is completely identical to the text inputted to the enrichment endpoint. It is provided for the sake of convenience, allowing you to access the original text of the document without having to maintain a separate reference to it.

# Verify that the text of the enriched document is unchanged from what was inputted to the endpoint.
assert doc.text == doc_text

Title

The title property contains either a span representing the title of the document or null if the document has no title or its title cannot be resolved.

A span is an object with two keys, start and end, representing a Unicode code point span of discrete text within the document. All spans anywhere in an ILGS Document are globally laminar and well-nested with respect to each other. Similar to XML, it is impossible for any two spans to partially overlap: they can either be completely disjoint or wholly contained. It is also impossible for any two spans of the same type to be duplicated (i.e., have the exact same start and end indices). Some span groups, such as titles and subtitles, are further incapable of nesting. Implicitly, this means that all spans form a single global hierarchy, making it trivial to traverse annotations and reason about their relationships to each other.

Spans are zero-based (i.e., the first Unicode code point in the document is at index 0) and half-open (i.e., the end index is exclusive). They cannot be empty (i.e., the start index must always be less than the end index), and they can never start or end at whitespace (i.e., the start and end of a span will always land on non-whitespace characters). When using programming languages other than Python (which uses zero-based, half-open, Unicode code point-based string indexing), indices may need to be translated accordingly. For example, JavaScript slices into UTF-16 code units instead of Unicode code points.

To render the title of the document, we can slice into the document’s text using the start and end indices of the title’s span. We have added a helper method to all Span objects in our Python SDK called decode() which, when passed the document’s text, will slice out spans for you.
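The span invariants described above can be checked, and translated for other languages, with a few small helpers. This is a self-contained sketch that works on plain (start, end) integers and tuples rather than the SDK’s Span objects; the function names are our own and are not part of the Isaacus SDK.

```python
def to_utf16_span(text: str, start: int, end: int) -> tuple[int, int]:
    """Convert a zero-based, half-open code point span into UTF-16 code unit offsets.

    Useful when handing spans to JavaScript, whose string indices count UTF-16
    code units rather than Unicode code points.
    """
    def utf16_units(prefix: str) -> int:
        # Each UTF-16 code unit is two bytes; astral characters take two units.
        return len(prefix.encode("utf-16-le")) // 2

    return utf16_units(text[:start]), utf16_units(text[:end])


def is_valid_span(text: str, start: int, end: int) -> bool:
    """Check that a span is non-empty, in bounds, and neither starts nor ends on whitespace."""
    return (
        0 <= start < end <= len(text)
        and not text[start].isspace()
        and not text[end - 1].isspace()
    )


def is_laminar(spans: list[tuple[int, int]]) -> bool:
    """Return True if no two spans partially overlap (every pair is disjoint or nested)."""
    stack: list[tuple[int, int]] = []
    # Sorting by (start, -end) puts every enclosing span before the spans it contains.
    for start, end in sorted(spans, key=lambda s: (s[0], -s[1])):
        # Discard open spans that ended at or before this span's start.
        while stack and stack[-1][1] <= start:
            stack.pop()
        # Starting inside the innermost open span but ending after it is a partial overlap.
        if stack and end > stack[-1][1]:
            return False
        stack.append((start, end))
    return True


print(to_utf16_span("a\U0001F600b", 2, 3))  # (3, 4): the emoji is a surrogate pair in UTF-16
print(is_laminar([(0, 10), (2, 5), (5, 8)]))  # True
print(is_laminar([(0, 10), (9, 12)]))  # False: the two spans partially overlap
```

Sorting by (start, -end) is the same ordering the segment renderer later in this guide uses, which is what lets a single stack detect partial overlaps in one pass.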

# Render the title of the document if available.
import isaacus.types.ilgs.v1 as ilgs_v1

def render_spans(spans: list[ilgs_v1.span.Span]) -> str:
    span_reps = []
    
    for span in spans:
        span_text = ' '.join(span.decode(doc.text).split())
        
        if len(span_text) > 30:
            span_text = span_text[:29] + "…"
        
        span_reps.append(f'⟦{span.start}–{span.end} "{span_text}"⟧')
    
    if len(span_reps) > 3:
        span_reps = span_reps[:3] + ["…"]
    
    return "; ".join(span_reps)

if doc.title:
    print(f"Title: {render_spans([doc.title])}")

else:
    print("No title found.")

The title of our document is “Apple Media Services Terms and Conditions”, so this code should¹ output:

Title: ⟦503–544 "Apple Media Services Terms an…"⟧

Subtitle

The subtitle property contains either a span representing the subtitle of the document or null if the document has no subtitle or its subtitle cannot be resolved. A subtitle will typically only be extracted where it is clearly demarcated from the title.

# Render the subtitle of the document if available.
if doc.subtitle:
    print(f"Subtitle: {render_spans([doc.subtitle])}")

else:
    print("No subtitle found.")

Our document does not have a subtitle, so you should¹ see:

No subtitle found.

Type

The type property contains the type of the document, being one of:

statute: primary legislation such as acts, bills, codes, and constitutions.
regulation: secondary legislation such as rules, statutory instruments, and ordinances.
decision: judicial or quasi-judicial decisions such as court judgments, judicial opinions, and tribunal rulings.
contract: contracts, covenants, and agreements.
other: all other types of legal documents that do not fit into any of the predefined types.

# Print the type of the document.
print(f"Type: {doc.type.capitalize()}")

Our document is a contract, so this code should¹ output:

Type: Contract

Jurisdiction

The jurisdiction property contains the jurisdiction of the document or null if the jurisdiction cannot be resolved. Jurisdictions are composed of a country code and, where applicable, a subdivision code prefixed by a hyphen. All 249 ISO 3166-1 alpha-2 country codes are representable in addition to special INT and EU codes for international and European Union law, respectively. All 5,046 ISO 3166-2 codes are also representable in addition to a special FED code for federal law.
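Because a jurisdiction is a country code optionally followed by a hyphen and a subdivision code, it can be split with a single partition. The sketch below is our own helper, not part of the SDK. It assumes that the special FED code appears in the subdivision position (for example, US-FED); the description above does not spell out where FED is placed, so treat that branch as an assumption.

```python
from typing import NamedTuple, Optional


class Jurisdiction(NamedTuple):
    country: str                # ISO 3166-1 alpha-2 code, or the special INT/EU codes
    subdivision: Optional[str]  # ISO 3166-2 subdivision part (or FED), if present


SPECIAL_COUNTRY_CODES = {"INT": "international law", "EU": "European Union law"}


def parse_jurisdiction(code: str) -> Jurisdiction:
    """Split a jurisdiction string such as 'US-CA' into its country and subdivision parts."""
    country, sep, subdivision = code.partition("-")
    return Jurisdiction(country, subdivision if sep else None)


def describe_jurisdiction(code: str) -> str:
    """Produce a short human-readable description of a jurisdiction code."""
    j = parse_jurisdiction(code)
    if j.country in SPECIAL_COUNTRY_CODES:
        return SPECIAL_COUNTRY_CODES[j.country]
    if j.subdivision == "FED":  # Assumed placement of the special FED code.
        return f"federal law of {j.country}"
    if j.subdivision:
        return f"subdivision {j.subdivision} of {j.country}"
    return f"country {j.country}"


print(parse_jurisdiction("US-CA"))  # Jurisdiction(country='US', subdivision='CA')
print(describe_jurisdiction("EU"))  # European Union law
```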

# Print the document's jurisdiction if available.
if doc.jurisdiction:
    print(f"Jurisdiction: {doc.jurisdiction}")

else:
    print("Jurisdiction unknown.")

Apple’s terms of service are governed by the laws of the state of California, so you should¹ see:

Jurisdiction: US-CA

Segments

The segments property contains an array of segments within the document representing structurally distinct portions of its content. A segment object has the following properties:

id (string): a unique identifier for the segment within the document in the format seg:{index} where {index} is a non-negative incrementing integer starting from zero.
kind (enum<string>): the structural kind of the segment, being one of the following:
- container: a structural or semantic grouping of content such as a chapter. containers can contain segments of any kind or none at all.
- unit: a single syntactically independent unit of text such as a paragraph. units can only contain items and figures.
- item: a syntactically subordinate unit of text such as an item in a run-in list. items can only contain other items. items are conceptually distinct from list items—it is perfectly possible to encounter list items that are syntactically independent of their surrounding items just as it is possible to encounter dependent clauses that do not appear as part of a list.
- figure: a visually structured or tabular unit of content such as a diagram, equation, or table. figures cannot contain segments.
type (enum<string> | null): the addressable type of the segment, being one of the supported types listed in our API reference documentation (e.g., chapter, section, paragraph, etc.), or null if the type is unknown or not applicable. Note that:
- The type of a segment may be defined explicitly (e.g., by headings, such as ‘Section 2. Definitions’), implicitly (e.g., by way of reference, such as ‘as defined in Section 2’), or by convention (e.g., [42] in a judgment often denotes a paragraph, independent provisions in statute are often sections, etc.).
- Although many segment types may coincide with syntactic constructs, they should be thought of purely as distinct formal citable units. Most paragraphs (in the syntactic sense) will not have the paragraph type, for example. That type is reserved for segments that would formally be cited as a ‘Paragraph’ within the document’s referential scheme.
- Certain types, such as division and part, are exclusive to the container kind, and others, such as figure and table, are exclusive to the figure kind, as outlined in our API reference documentation.
category (enum<string>): the functional category of the segment, being one of the following:
- front_matter: non-operative contextualizing content occurring at the start of a document such as a preamble or recitals.
- scope: operative content defining the application or interpretation of a document such as definition sections and governing law clauses.
- main: operative, non-scopal content.
- annotation: non-operative annotative content providing explanatory or referential information such as commentary, footnotes, and endnotes.
- back_matter: non-operative contextualizing content occurring at the end of a document such as authority statements.
- other: content that does not fit into any of the other categories.
type_name (Span | null): the span within the segment defining its type name (e.g., "Section" in "Section 2 - Definitions") or null if none exists.
code (Span | null): the span within the segment defining its code (e.g., "2" in "Section 2 - Definitions") or null if none exists.
title (Span | null): the span within the segment defining its title (e.g., "Definitions" in "Section 2 - Definitions") or null if none exists.
parent (string | null): the unique identifier of the ‘parent’ segment immediately containing the segment or null if the segment has no parent (i.e., it is a root-level segment).
children (array[string]): the unique identifiers of the segment’s immediate children.
level (integer): the level of the segment within the document’s segment hierarchy, starting from 0 for root-level segments.
span (Span): the span of the segment within the document’s text.
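The containment rules for segment kinds and the relationship between parent, children, and level can be checked mechanically, which is handy when writing code that relies on them. The sketch below uses a hypothetical Seg dataclass carrying only the fields it reads; since it only touches the id, kind, parent, level, and children attributes described above, passing doc.segments from the SDK should work the same way, though we have not assumed anything else about those objects.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Seg:
    """Hypothetical stand-in for an ILGS segment (only the fields checked here)."""
    id: str
    kind: str  # container | unit | item | figure
    parent: Optional[str]
    level: int
    children: list[str] = field(default_factory=list)


# Child kinds each segment kind may contain, per the kind descriptions above.
ALLOWED_CHILD_KINDS = {
    "container": {"container", "unit", "item", "figure"},
    "unit": {"item", "figure"},
    "item": {"item"},
    "figure": set(),
}


def check_segments(segments: list[Seg]) -> list[str]:
    """Return human-readable problems; an empty list means the hierarchy is consistent."""
    by_id = {s.id: s for s in segments}
    problems = []
    for seg in segments:
        # Every listed child must point back to this segment as its parent.
        for child_id in seg.children:
            if by_id[child_id].parent != seg.id:
                problems.append(f"{seg.id}: child {child_id} has a different parent")
        if seg.parent is None:
            if seg.level != 0:
                problems.append(f"{seg.id}: root segment has level {seg.level}")
            continue
        parent = by_id[seg.parent]
        if seg.level != parent.level + 1:
            problems.append(f"{seg.id}: level {seg.level} but parent level {parent.level}")
        if seg.id not in parent.children:
            problems.append(f"{seg.id}: missing from {parent.id}.children")
        if seg.kind not in ALLOWED_CHILD_KINDS[parent.kind]:
            problems.append(f"{seg.id}: a {parent.kind} cannot contain a {seg.kind}")
    return problems


segments = [
    Seg("seg:0", "container", None, 0, ["seg:1"]),
    Seg("seg:1", "unit", "seg:0", 1, ["seg:2"]),
    Seg("seg:2", "container", "seg:1", 2),
]
print(check_segments(segments))  # ['seg:2: a unit cannot contain a container']
```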

Together, the parent, children, and level fields define the hierarchical structure of the document, with the kind, category, and type labels describing, respectively, the structural, functional, and addressable roles of segments within the document’s hierarchy. To render the document’s segment tree, we can:

Sort segments by (segment.span.start, -segment.span.end) so that they are in document order and every parent segment comes before its children (because spans are well-nested, a parent always starts no later and ends no earlier than its children).
Prefix each segment with indentation and branching characters corresponding to its level: ➣ for root-level segments, └─ for a segment that is the last child of its parent, ├─ for all other children, and │ in the column of each ancestor that still has later siblings to be printed.
Print each segment’s ID, category, kind, type, span, and, where present, type name, code, and title.

# Construct a textual representation of the document's segment tree.
segs_by_id = {seg.id: seg for seg in doc.segments}
segs = sorted(doc.segments, key=lambda s: (s.span.start, -s.span.end))
seg_to_name = {}

seg_tree = []
pipes = []

for seg in segs:
    # Create a textual representation of the segment.
    seg_rep = ""

    # Represent the segment's position in the tree.
    if seg.level == 0:
        seg_rep += "➣ "

    else:
        seg_rep += "  "

    pipes = pipes[:seg.level]
    seg_rep += "".join(("│ " if p else "  ") for p in pipes)

    is_last_child_at_level = True
    
    if seg.parent:
        parent = segs_by_id[seg.parent]
        is_last_child_at_level = (parent.children and parent.children[-1] == seg.id)
        seg_rep += "└─ " if is_last_child_at_level else "├─ "
        
    pipes.append(not is_last_child_at_level)

    # Add the segment's id.
    seg_rep += f"{seg.id}"

    # Add the segment's type name, code, and title, where present.
    seg_name = []

    if seg.type_name:
        seg_name.append(f"{seg.type_name.decode(doc.text)}")

    if seg.code:
        seg_name.append(f"{seg.code.decode(doc.text)}")

    if seg.title:
        seg_name.append(f"{seg.title.decode(doc.text)}")

    if seg_name:
        seg_name = " ".join(seg_name)
        seg_rep += f' "{seg_name}"'
        seg_to_name[seg.id] = seg_name

    # Add the segment's category.
    seg_rep += f" ({seg.category}"

    # Add the segment's kind.
    seg_rep += f"/{seg.kind}"

    # Add the segment's type.
    if seg.type:
        seg_rep += f"/{seg.type}"
    
    seg_rep += ")"

    # Add the segment's span.
    seg_rep += f'  ⟶  {render_spans([seg.span])}'

    # Add the segment representation to the tree.
    seg_tree.append(seg_rep)

# Print the textual representation of the segment tree.
print("\n".join(seg_tree))

You should¹ see something like this as output:

➣ seg:0 (front_matter/unit)  ⟶  ⟦0–499 "[These terms and conditions w…"⟧
➣ seg:1 (front_matter/unit)  ⟶  ⟦546–668 "These terms and conditions cr…"⟧
➣ seg:2 (front_matter/unit)  ⟶  ⟦672–1141 "The following terms and condi…"⟧
➣ seg:3 "TABLE OF CONTENTS" (front_matter/container/table_of_contents)  ⟶  ⟦1145–1779 "TABLE OF CONTENTS A. INTRODUC…"⟧
    ├─ seg:4 (front_matter/item)  ⟶  ⟦1164–1179 "A. INTRODUCTION"⟧
    ├─ seg:5 (front_matter/item)  ⟶  ⟦1181–1212 "B. PAYMENTS, TAXES, AND REFUN…"⟧
    ├─ seg:6 (front_matter/item)  ⟶  ⟦1214–1224 "C. ACCOUNT"⟧
    ├─ seg:7 (front_matter/item)  ⟶  ⟦1226–1236 "D. PRIVACY"⟧
...

Crossreferences

The crossreferences property contains an array of crossreferences within the document pointing to one or more segments. A crossreference object has the following properties:

start (string): the unique identifier of the earliest segment in the referenced span, with ties broken in favor of the least-nested (largest) segment.
end (string): the unique identifier of the latest segment in the referenced span, with ties broken in favor of the least-nested (largest) segment.
span (Span): the span within the document’s text where the crossreference occurs.

If a crossreference points to a single segment, start and end will be identical. We can render crossreferences by:

Decoding their span property.
Looking up the span of text targeted by the crossreference, running from the start of the segment identified by start to the end of the segment identified by end.

# Construct a textual representation of the document's crossreferences.
xref_reps = []

for xref in doc.crossreferences:
    xref_rep = ''
    
    # Add the text of the crossreference itself.
    xref_rep += render_spans([xref.span])
    
    # Add the start and end text of the crossreference's target span.
    target_text = doc.text[segs_by_id[xref.start].span.start:segs_by_id[xref.end].span.end]
    target_text = " ".join(target_text.split()).strip()
    
    if len(target_text) > 30:
        target_text = target_text[:29] + "…"
    
    xref_rep += '  ⟶  ⟦'
    xref_rep += f"{xref.start}–{xref.end}" if xref.start != xref.end else f"{xref.start}"
    xref_rep += f' "{target_text}"⟧'
    
    # Store the crossreference's representation.
    xref_reps.append(xref_rep)

# Print the textual representation of the crossreferences.
print("\n".join(xref_reps))

The output should¹ look like this:

⟦1164–1179 "A. INTRODUCTION"⟧  ⟶  ⟦seg:24 "A. INTRODUCTION This Agreemen…"⟧
⟦1181–1212 "B. PAYMENTS, TAXES, AND REFUN…"⟧  ⟶  ⟦seg:26 "B. PAYMENTS, TAXES, AND REFUN…"⟧
⟦1214–1224 "C. ACCOUNT"⟧  ⟶  ⟦seg:30 "C. ACCOUNT Using our Services…"⟧
⟦1226–1236 "D. PRIVACY"⟧  ⟶  ⟦seg:34 "D. PRIVACY Your use of our Se…"⟧
⟦1238–1254 "E. ACCESSIBILITY"⟧  ⟶  ⟦seg:36 "E. ACCESSIBILITY To learn abo…"⟧
⟦1256–1291 "F. SERVICES AND CONTENT USAGE…"⟧  ⟶  ⟦seg:38 "F. SERVICES AND CONTENT USAGE…"⟧
...
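Rendering shows the text a crossreference points to, but downstream code often needs the individual segments inside that range (for example, to retrieve every clause a reference like "Sections 3 to 5" covers). The sketch below collects every segment lying wholly within the range from the start of the start segment to the end of the end segment. The Span and Seg dataclasses are hypothetical stand-ins with only the fields used; the SDK’s segments expose id and span attributes, so doc.segments should be usable in their place.

```python
from dataclasses import dataclass


@dataclass
class Span:
    start: int
    end: int


@dataclass
class Seg:
    """Hypothetical stand-in for an ILGS segment (only id and span)."""
    id: str
    span: Span


def covered_segments(start_id: str, end_id: str, segments: list[Seg]) -> list[str]:
    """Return IDs of segments lying wholly inside a crossreference's target range."""
    by_id = {s.id: s for s in segments}
    lo, hi = by_id[start_id].span.start, by_id[end_id].span.end
    inside = [s for s in segments if lo <= s.span.start and s.span.end <= hi]
    # Document order, with enclosing segments before the segments they contain.
    inside.sort(key=lambda s: (s.span.start, -s.span.end))
    return [s.id for s in inside]


segs = [
    Seg("seg:0", Span(0, 100)),
    Seg("seg:1", Span(0, 40)),
    Seg("seg:2", Span(41, 70)),
    Seg("seg:3", Span(71, 100)),
]
print(covered_segments("seg:1", "seg:2", segs))  # ['seg:1', 'seg:2']
```

Because start and end break ties in favor of the least-nested segment, a crossreference to a whole container yields that container followed by all of its descendants.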

Locations

The locations property contains an array of locations identified within the document. A location object has the following properties:

id (string): a unique identifier for the location within the document in the format loc:{index} where {index} is a non-negative incrementing integer starting from zero.
name (Span): a span within the document’s text representing the name by which the location is referred to that is the ‘most proper’. As an example, a location referred to as ‘New York City’ in two places in a document, ‘NYC’ in three places, and ‘the Big Apple’ in one place would have its name set to whichever span the model was most confident represented the proper name of the location, likely being one of the ‘New York City’ spans.
type (enum<string>): the type of the location, being one of country, state, city, address, or other.
parent (string | null): the unique identifier of the ‘parent’ location immediately containing the location or null if the location has no ancestors identified in the document.
- Locations with the address and other types can have locations of any type as their parents.
- A location with the city type cannot have a location with the address type as its parent.
- A location with the state type cannot have a location with the address or city types as its parent.
- A location with the country type cannot have a location with the address, city, or state types as its parent.
- It is impossible for a location to be its own ancestor.
children (array[string]): the unique identifiers of any child locations having the location as their immediate parent.
mentions (array[Span]): one or more spans within the document’s text where the location is mentioned.
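The parent-type rules above reduce to a ranking: country, state, city, and address run from broadest to narrowest, a country, state, or city may only sit under a location of the same or broader rank, and address and other locations are unconstrained in both directions. The sketch below encodes that reading of the rules along with the no-cycles rule; the helper names are our own and not part of the SDK.

```python
from typing import Optional

# Rank of the constrained location types, from broadest to narrowest.
RANK = {"country": 0, "state": 1, "city": 2, "address": 3}


def parent_allowed(child_type: str, parent_type: str) -> bool:
    """Apply the parent-type rules listed above for ILGS locations."""
    # address and other locations may have any parent, and other may parent anything.
    if child_type in ("address", "other") or parent_type == "other":
        return True
    # country, state, and city may only sit under a location of the same or broader rank.
    return RANK[parent_type] <= RANK[child_type]


def has_cycle(loc_id: str, parents: dict[str, Optional[str]]) -> bool:
    """Return True if following parent links from loc_id ever revisits a location."""
    seen = set()
    current: Optional[str] = loc_id
    while current is not None:
        if current in seen:
            return True
        seen.add(current)
        current = parents[current]
    return False


print(parent_allowed("city", "address"))  # False
print(parent_allowed("state", "state"))  # True
```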

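Similar to segments, locations form a tree through the parent and children properties. Unlike segments, however, a location has no single span to sort by (only a list of mentions), so instead of sorting we traverse the tree depth-first from its root locations, keeping track of the current level, and reuse the same indentation and branching characters as the segment tree.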

# Construct a textual representation of locations mentioned in the document.
locs_by_id = {loc.id: loc for loc in doc.locations}
locs = [loc for loc in doc.locations if not loc.parent]

loc_tree = []
pipes = []
level = 0

while locs:
    loc = locs.pop(0)
    
    # Create a textual representation of the location.
    loc_rep = ""
    
    # Represent the location's position in the tree.
    if level == 0:
        loc_rep += "➣ "

    else:
        loc_rep += "  "
        
    pipes = pipes[: level]
    loc_rep += "".join(("│ " if p else "  ") for p in pipes)
    
    is_last_child_at_level = True

    if loc.parent:
        parent = locs_by_id[loc.parent]
        is_last_child_at_level = parent.children and parent.children[-1] == loc.id
        loc_rep += "└─ " if is_last_child_at_level else "├─ "

    pipes.append(not is_last_child_at_level)
    
    # Add the location's id.
    loc_rep += f"{loc.id}"

    # Add the location's canonical name.
    loc_rep += f' "{loc.name.decode(doc.text)}"'
    
    # Add the location's type.
    loc_rep += f" ({loc.type})"
    
    # Add mentions.
    loc_rep += f"  ⟶  {render_spans(loc.mentions)}"
    
    # Add the location's representation to the tree.
    loc_tree.append(loc_rep)
    
    # Add the location's children to the stack for representation.
    child_locs = [locs_by_id[child_id] for child_id in loc.children]
    
    if child_locs:
        locs = child_locs + locs
        level += 1
    
    else:
        while loc.parent:
            parent_loc = locs_by_id[loc.parent]
            
            if parent_loc.children[-1] == loc.id:
                level -= 1
                loc = parent_loc
                
            else:
                break
    
# Print the textual representation of the location tree.
print("\n".join(loc_tree))

The output should¹ look like this:

➣ loc:0 "Australia" (country)  ⟶  ⟦53–63 "Australian"⟧; ⟦823–833 "Australian"⟧; ⟦1089–1099 "Australian"⟧; …
    └─ loc:9 "New South Wales, Australia" (state)  ⟶  ⟦41254–41280 "New South Wales, Australia"⟧; ⟦57793–57819 "New South Wales, Australia"⟧
      └─ loc:38 "Level 3, 20 Martin Place, Sydney NSW 2000, Australia" (address)  ⟶  ⟦46594–46646 "Level 3, 20 Martin Place, Syd…"⟧
➣ loc:1 "United States" (country)  ⟶  ⟦38467–38480 "United States"⟧; ⟦38678–38683 "U.S.-"⟧; ⟦39071–39084 "United States"⟧; …
    ├─ loc:2 "California" (state)  ⟶  ⟦40174–40184 "California"⟧; ⟦56709–56719 "California"⟧
    │ ├─ loc:3 "county of Santa Clara, California" (state)  ⟶  ⟦40335–40368 "county of Santa Clara, Califo…"⟧; ⟦56870–56903 "county of Santa Clara, Califo…"⟧
    │ └─ loc:10 "One Apple Park Way, Cupertino, California" (address)  ⟶  ⟦45728–45769 "One Apple Park Way, Cupertino…"⟧; ⟦50050–50069 "Cupertino, CA 95014"⟧
    ├─ loc:11 "Puerto Rico" (state)  ⟶  ⟦45813–45824 "Puerto Rico"⟧; ⟦46140–46151 "Puerto Rico"⟧
    └─ loc:14 "2811 Ponce de Leon Boulevard, Floor 12, Coral Gables, Florida, 33134" (address)  ⟶  ⟦45974–46042 "2811 Ponce de Leon Boulevard,…"⟧
...
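The explicit stack and level bookkeeping in the location renderer can also be expressed recursively, carrying the indentation prefix down the tree instead of tracking pipes. The sketch below is our own generic version operating on plain dicts (each ID maps to a "label" string and a "children" list of IDs), producing the same glyph layout; it could be fed from doc.locations or doc.persons by building labels however you like. Recursion depth equals tree depth, which should be small for these trees.

```python
from typing import Optional


def render_forest(nodes: dict[str, dict], roots: list[str]) -> list[str]:
    """Render parent/children trees with the ➣, ├─, └─, and │ glyphs used above."""
    lines: list[str] = []

    def visit(node_id: str, prefix: str, is_last: Optional[bool]) -> None:
        label = nodes[node_id]["label"]
        if is_last is None:  # Root node.
            lines.append(f"➣ {label}")
            child_prefix = "    "
        else:
            lines.append(f"{prefix}{'└─ ' if is_last else '├─ '}{label}")
            # Continue a vertical bar only while this node still has later siblings.
            child_prefix = prefix + ("  " if is_last else "│ ")
        children = nodes[node_id]["children"]
        for i, child_id in enumerate(children):
            visit(child_id, child_prefix, i == len(children) - 1)

    for root_id in roots:
        visit(root_id, "", None)
    return lines


nodes = {
    "a": {"label": "A", "children": ["b", "c"]},
    "b": {"label": "B", "children": ["d"]},
    "c": {"label": "C", "children": []},
    "d": {"label": "D", "children": []},
}
print("\n".join(render_forest(nodes, ["a"])))
```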

Persons

The persons property contains an array of legal persons identified within the document. A person object has the following properties:

id (string): a unique identifier for the person within the document in the format per:{index} where {index} is a non-negative incrementing integer starting from zero.
name (Span): a span within the document’s text representing the most proper of all the names by which the person is referred to in the document.
type (enum<string>): the legal entity type of the person, being one of the following:
- natural: a human being in their capacity as a natural legal person, including when representing unincorporated entities such as partnerships and trusts.
- corporate: a body corporate such as a company, incorporated partnership, or statutory corporation.
- politic: a body politic, political entity, or part thereof such as a court, state, government, or intergovernmental organization.
role (enum<string>): the role of the person in relation to the subject of the document, being one of the supported roles listed in our API reference documentation (e.g., seller, buyer, governing_jurisdiction, plaintiff, defendant, etc.), including a special other role for persons that do not fit into any of the predefined roles.
parent (string | null): the unique identifier of the person’s immediate parent entity, being either the legal entity that owns or controls the person (e.g., a parent company) or, where the person is identified only in their capacity as a representative of an entity (e.g., a director), the entity they represent; or null if the person has no parent entity mentioned in the document or their parentage is unknown.
children (array[string]): the unique identifiers of any persons having the person as their immediate parent.
residence (string | null): the unique identifier of the location at which the person primarily resides or null if unknown or not mentioned.
mentions (array[Span]): one or more spans within the document’s text where the person is mentioned.
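Because parent links point from subsidiaries and representatives up to the entities that control them or that they represent, following them to the top answers questions such as "which corporate group does this entity belong to?". The sketch below works on a plain mapping from person ID to parent ID (for example, built as {p.id: p.parent for p in doc.persons}); the function names are our own, not part of the SDK.

```python
from typing import Optional


def ultimate_parent(person_id: str, parents: dict[str, Optional[str]]) -> str:
    """Follow parent links up to the top-level entity above a person."""
    current = person_id
    seen = {current}
    while parents[current] is not None:
        current = parents[current]
        if current in seen:  # Guard against malformed input.
            raise ValueError(f"Cycle detected at {current}")
        seen.add(current)
    return current


def group_by_ultimate_parent(parents: dict[str, Optional[str]]) -> dict[str, list[str]]:
    """Map each top-level person to every person (including itself) beneath it."""
    groups: dict[str, list[str]] = {}
    for person_id in parents:
        groups.setdefault(ultimate_parent(person_id, parents), []).append(person_id)
    return groups


parents = {"per:0": None, "per:1": "per:0", "per:2": "per:1", "per:3": None}
print(group_by_ultimate_parent(parents))
# {'per:0': ['per:0', 'per:1', 'per:2'], 'per:3': ['per:3']}
```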

As with segments and locations, persons form a tree through the parent and children properties.

# Construct a textual representation of persons mentioned in the document.
pers_by_id = {pers.id: pers for pers in doc.persons}
pers = [pers for pers in doc.persons if not pers.parent]

per_tree = []
pipes = []
level = 0

while pers:
    per = pers.pop(0)
    
    # Create a textual representation of the person.
    per_rep = ""
    
    # Represent the person's position in the tree.
    if level == 0:
        per_rep += "➣ "

    else:
        per_rep += "  "

    pipes = pipes[: level]
    per_rep += "".join(("│ " if p else "  ") for p in pipes)
    
    is_last_child_at_level = True
    
    if per.parent:
        parent = pers_by_id[per.parent]
        is_last_child_at_level = parent.children and parent.children[-1] == per.id
        per_rep += "└─ " if is_last_child_at_level else "├─ "
    
    pipes.append(not is_last_child_at_level)
    
    # Add the person's id.
    per_rep += f"{per.id}"
    
    # Add the person's canonical name.
    per_rep += f' "{per.name.decode(doc.text)}"'
    
    # Add the person's type.
    per_rep += f" ({per.type}"
    
    # Add the person's role.
    per_rep += f"/{per.role})"
    
    # Add the person's residence.
    if per.residence:
        per_rep += f' @ {per.residence} "{locs_by_id[per.residence].name.decode(doc.text)}"'
    
    # Add mentions.
    per_rep += f"  ⟶  {render_spans(per.mentions)}"
    
    # Add the person's representation to the tree.
    per_tree.append(per_rep)
    
    # Add the person's children to the stack for representation.
    child_pers = [pers_by_id[child_id] for child_id in per.children]
    
    if child_pers:
        pers = child_pers + pers
        level += 1
    
    else:
        while per.parent:
            parent_per = pers_by_id[per.parent]
            
            if parent_per.children[-1] == per.id:
                level -= 1
                per = parent_per
                
            else:
                break

# Print the textual representation of the person tree.
print("\n".join(per_tree))

The output should¹ look like this:

➣ per:0 "Apple Inc." (corporate/employer)  ⟶  ⟦64–69 "Apple"⟧; ⟦164–169 "Apple"⟧; ⟦219–224 "Apple"⟧; …
    ├─ per:1 "Apple Distribution International Ltd." (corporate/non_party) @ loc:40 "Hollyhill Industrial Estate, Hollyhill, Cork, Republic of Ireland"  ⟶  ⟦3098–3135 "Apple Distribution Internatio…"⟧; ⟦3424–3461 "Apple Distribution Internatio…"⟧; ⟦26720–26757 "Apple Distribution Internatio…"⟧; …
    ├─ per:2 "Apple Services Pte. Ltd.," (corporate/contractor) @ loc:16 "7 Ang Mo Kio Street 64, Singapore 569086"  ⟶  ⟦3139–3164 "Apple Services Pte. Ltd.,"⟧; ⟦3465–3490 "Apple Services Pte. Ltd.,"⟧; ⟦26761–26786 "Apple Services Pte. Ltd.,"⟧; …
    ├─ per:6 "Apple Canada Inc.," (corporate/employer) @ loc:12 "120 Bremner Blvd., Suite 1600, Toronto ON M5J 0A8, Canada"  ⟶  ⟦45827–45845 "Apple Canada Inc.,"⟧
    ├─ per:7 "Apple Services LATAM LLC" (corporate/other) @ loc:14 "2811 Ponce de Leon Boulevard, Floor 12, Coral Gables, Florida, 33134"  ⟶  ⟦45937–45961 "Apple Services LATAM LLC"⟧
    ├─ per:8 "iTunes K.K.," (corporate/other) @ loc:36 "Roppongi Hills, 6-10-1 Roppongi, Minato-ku, Tokyo 106-6140, Tokyo"  ⟶  ⟦46434–46446 "iTunes K.K.,"⟧
    └─ per:9 "Apple Pty Limited" (corporate/other) @ loc:38 "Level 3, 20 Martin Place, Sydney NSW 2000, Australia"  ⟶  ⟦46545–46562 "Apple Pty Limited"⟧
...
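Since residence points at a location, and locations form their own tree, a person’s country of residence can be found by walking up from the residence location to the nearest country. The sketch below uses plain mappings from location ID to parent ID and to type (which could be built from doc.locations); the example data mirrors the Australian locations in the location output above (loc:38 under loc:9 under loc:0), and the helper name is our own.

```python
from typing import Optional


def residence_country(
    residence: Optional[str],
    loc_parent: dict[str, Optional[str]],
    loc_type: dict[str, str],
) -> Optional[str]:
    """Return the ID of the nearest country enclosing a residence location, if any."""
    current = residence
    while current is not None:
        if loc_type[current] == "country":
            return current
        current = loc_parent[current]
    return None  # No residence, or no country among its ancestors.


loc_parent = {"loc:0": None, "loc:9": "loc:0", "loc:38": "loc:9"}
loc_type = {"loc:0": "country", "loc:9": "state", "loc:38": "address"}
print(residence_country("loc:38", loc_parent, loc_type))  # loc:0
```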

Websites

The websites property contains an array of websites identified within the document belonging to legal persons. Websites that are not attributable to legal persons will not be extracted. A website object has the following properties:

url (string): the normalized URL of the website in the form https://{host}/.
person (string): the unique identifier of the person that the website belongs to.
mentions (array[Span]): one or more spans where the website is mentioned, including paths and slugs that are not part of the normalized website URL.
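To see how a mention with a path relates to its normalized URL, the sketch below reduces a mention to the documented https://{host}/ form using the standard library. It is our own approximation for illustration; the exact normalization rules the model applies are not specified here, so edge cases may differ.

```python
from urllib.parse import urlsplit


def normalize_website(mention: str) -> str:
    """Approximate the https://{host}/ form for a website mention."""
    text = mention.strip()
    # Without a scheme, urlsplit would treat the host as part of the path.
    if "://" not in text:
        text = "https://" + text
    host = urlsplit(text).hostname  # Lowercased, with any port or credentials removed.
    if not host:
        raise ValueError(f"No host found in {mention!r}")
    return f"https://{host}/"


print(normalize_website("https://www.example.com/legal/terms"))  # https://www.example.com/
print(normalize_website("support.example.com/article/123"))  # https://support.example.com/
```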

# Construct a textual representation of websites mentioned in the document.
web_reps = []

for web in doc.websites:
    web_rep = ""
    
    # Add the website's normalized URL.
    web_rep += f"{web.url}"
    
    # Add the website's linked person entity.
    web_rep += f' @ {web.person} "{pers_by_id[web.person].name.decode(doc.text)}"'
    
    # Add the website's mentions.
    web_rep += f"  ⟶  {render_spans(web.mentions)}"
    
    web_reps.append(web_rep)

# Print the textual representation of the websites.
print('\n'.join(web_reps))

The output should¹ look like this:

https://www.apple.com/ @ per:0 "Apple Inc."  ⟶  ⟦79–148 "https://www.apple.com/au/lega…"⟧; ⟦5449–5511 "https://www.apple.com/legal/i…"⟧; ⟦7775–7817 "https://www.apple.com/accessi…"⟧
https://support.apple.com/ @ per:0 "Apple Inc."  ⟶  ⟦14030–14064 "https://support.apple.com/HT2…"⟧; ⟦19070–19104 "https://support.apple.com/HT2…"⟧
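Each website's url is normalized to the form https://{host}/, while its mentions keep the full paths and slugs. You may need a similar normalization on your side, for example to match extracted websites against URLs from another system. The sketch below approximates it with Python's standard urllib.parse module. It is an assumption, not Kanon 2 Enricher's actual normalization logic: it upgrades every scheme to https and keeps subdomains such as www., which is consistent with the output above but may not match the model in every edge case. The helper name normalize_website is hypothetical.

```python
from urllib.parse import urlsplit


def normalize_website(url: str) -> str:
    """Approximate the https://{host}/ normalization of website URLs (illustrative sketch)."""
    # urlsplit only recognizes a host after "//", so add a scheme to bare hosts like "apple.com/legal".
    if "://" not in url:
        url = "https://" + url
    # .hostname is lower-cased and excludes any port, credentials, path, query, or fragment.
    host = urlsplit(url).hostname
    if not host:
        raise ValueError(f"no host found in {url!r}")
    return f"https://{host}/"


print(normalize_website("http://support.apple.com/HT201359"))    # https://support.apple.com/
print(normalize_website("https://www.apple.com/legal/privacy"))  # https://www.apple.com/
```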

Emails

The emails property contains an array of email addresses identified within the document belonging to legal persons. Email addresses that are not attributable to legal persons will not be extracted. An email object has the following properties:

address (string): the normalized email address.
person (string): the unique identifier of the person that the email address belongs to.
mentions (array[Span]): one or more spans where the email address is mentioned.

# Construct a textual representation of emails mentioned in the document.
email_reps = []

for email in doc.emails:
    email_rep = ""
    
    # Add the normalized email address.
    email_rep += email.address
    
    # Add the email's linked person entity.
    email_rep += f' @ {email.person} "{pers_by_id[email.person].name.decode(doc.text)}"'
    
    # Add the email's mentions.
    email_rep += f"  ⟶  {render_spans(email.mentions)}"
    
    email_reps.append(email_rep)

# Print the textual representation of the emails.
print('\n'.join(email_reps))

The output should¹ look like this:

[email protected] @ per:0 "Apple Inc."  ⟶  ⟦50099–50125 "[email protected]"⟧
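The enricher returns email addresses already normalized. If you want to compare them with addresses from other systems, you will need a comparable normalization on your side. The sketch below shows one conservative approach; it is an assumption, since the enricher's exact rules are not documented here. It strips whitespace and any mailto: prefix and lower-cases only the domain, because domain names are case-insensitive while the local part before the @ may technically be case-sensitive. The helper normalize_email and the sample address are hypothetical.

```python
def normalize_email(raw: str) -> str:
    """Conservatively normalize an email address for comparison (illustrative sketch)."""
    address = raw.strip()
    # Drop a leading "mailto:" as found in HTML links, matching the prefix case-insensitively.
    if address.lower().startswith("mailto:"):
        address = address[len("mailto:"):]
    # Split on the last "@" so the domain is everything after it.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {raw!r}")
    # Lower-case the domain only; the local part is left exactly as written.
    return f"{local}@{domain.lower()}"


print(normalize_email(" mailto:Legal@Example.COM "))  # Legal@example.com
```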

Phone numbers

The phone_numbers property contains an array of valid phone numbers identified within the document belonging to legal persons. Phone numbers that are invalid, impossible, or not attributable to legal persons will not be extracted. A phone number object has the following properties:

number (string): the normalized phone number in E.123 international notation conforming with local conventions on the use of spaces and hyphens as separators.
person (string): the unique identifier of the person that the phone number belongs to.
mentions (array[Span]): one or more spans where the phone number is mentioned.

# Construct a textual representation of phone numbers mentioned in the document.
phone_reps = []

for phone in doc.phone_numbers:
    phone_rep = ""
    
    # Add the normalized phone number.
    phone_rep += phone.number
    
    # Add the phone number's linked person entity.
    phone_rep += f' @ {phone.person} "{pers_by_id[phone.person].name.decode(doc.text)}"'
    
    # Add the phone number's mentions.
    phone_rep += f"  ⟶  {render_spans(phone.mentions)}"
    
    phone_reps.append(phone_rep)

# Print the textual representation of the phone numbers.
print('\n'.join(phone_reps))

The output should¹ look like this:

+1 408-996-1010 @ per:0 "Apple Inc."  ⟶  ⟦50078–50090 "408.996.1010"⟧
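In the output above, the mention 408.996.1010 was normalized to +1 408-996-1010. The sketch below reproduces that transformation for North American (NANP) numbers only, using nothing but the standard re module. It is a simplified, hypothetical helper rather than the enricher's method: it does not check whether an area code or exchange actually exists, and it assumes the country code is +1. For numbers from many countries, a dedicated library such as Google's libphonenumber (or its Python port, phonenumbers) is the usual choice.

```python
import re


def nanp_to_e123(raw: str) -> str:
    """Format a North American number as '+1 XXX-XXX-XXXX' (illustrative sketch only)."""
    # Remove everything that is not a digit: dots, spaces, brackets, hyphens, and "+".
    digits = re.sub(r"\D", "", raw)
    # Strip a leading country code of 1 if the number was written with it.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        raise ValueError(f"not a 10-digit NANP number: {raw!r}")
    return f"+1 {digits[:3]}-{digits[3:6]}-{digits[6:]}"


print(nanp_to_e123("408.996.1010"))  # +1 408-996-1010
```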

ID numbers

The id_numbers property contains an array of identification numbers found within the document belonging to legal persons. ID numbers that are not attributable to legal persons will not be extracted. An ID number object has the following properties:

number (string): the identification number.
person (string): the unique identifier of the person that the identification number belongs to.
mentions (array[Span]): one or more spans where the identification number is mentioned.

# Construct a textual representation of ID numbers mentioned in the document.
idn_reps = []

for idn in doc.id_numbers:
    idn_rep = ""

    # Add the normalized ID number.
    idn_rep += idn.number

    # Add the ID number's linked person entity.
    idn_rep += f' @ {idn.person} "{pers_by_id[idn.person].name.decode(doc.text)}"'

    # Add the ID number's mentions.
    idn_rep += f"  ⟶  {render_spans(idn.mentions)}"

    idn_reps.append(idn_rep)

# Print the textual representation of the ID numbers.
print("\n".join(idn_reps))

The output should¹ look like this:

46 002 510 054 @ per:9 "Apple Pty Limited"  ⟶  ⟦46567–46581 "46 002 510 054"⟧
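Websites, emails, phone numbers, and ID numbers all carry a person field. That makes it straightforward to assemble a contact card for each legal person. The sketch below groups them by person ID. It only reads fields described above, so you should be able to call contact_cards(doc) on a real enriched document and look up names with pers_by_id. The demo_doc object is a hypothetical stand-in built with SimpleNamespace so that the example runs without an API call.

```python
from collections import defaultdict
from types import SimpleNamespace


def contact_cards(doc) -> dict:
    """Group a document's websites, emails, phone numbers, and ID numbers by the person they belong to."""
    cards = defaultdict(lambda: {"websites": [], "emails": [], "phone_numbers": [], "id_numbers": []})
    for web in doc.websites:
        cards[web.person]["websites"].append(web.url)
    for email in doc.emails:
        cards[email.person]["emails"].append(email.address)
    for phone in doc.phone_numbers:
        cards[phone.person]["phone_numbers"].append(phone.number)
    for idn in doc.id_numbers:
        cards[idn.person]["id_numbers"].append(idn.number)
    return dict(cards)


# Hypothetical stand-in for an enriched document, reusing some values from the outputs above.
demo_doc = SimpleNamespace(
    websites=[SimpleNamespace(url="https://www.apple.com/", person="per:0")],
    emails=[],
    phone_numbers=[SimpleNamespace(number="+1 408-996-1010", person="per:0")],
    id_numbers=[SimpleNamespace(number="46 002 510 054", person="per:9")],
)

for person_id, card in contact_cards(demo_doc).items():
    print(person_id, card)
```

Only persons with at least one contact point appear in the result, so subsidiaries without any extracted websites, emails, phone numbers, or ID numbers are simply absent.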

Terms

The terms property contains an array of terms assigned definite meanings within the document. A term object has the following properties:

id (string): a unique identifier for the term within the document in the format term:{index} where {index} is a non-negative incrementing integer starting from zero.
name (Span): the span within the document defining the term’s name. For example, in the phrase ‘“Agreement” means this contract between the parties’, the term’s name would be the span covering ‘Agreement’. Term names are different from and will never overlap with mentions of the term elsewhere in the document.
meaning (Span): the span within the document providing the term’s meaning. For example, in the phrase ‘“Agreement” means this contract between the parties’, the term’s meaning would be the span covering ‘this contract between the parties’.
mentions (array[Span]): spans where the term is mentioned outside of its definition. It is possible for the term to have no mentions if, outside of its definition, it is never referred to in the document.

# Construct a textual representation of terms mentioned in the document.
term_reps = []

for term in doc.terms:
    term_rep = ""
    
    # Add the term's ID.
    term_rep += f"{term.id}"
    
    # Add the term's defined name.
    term_rep += f' "{term.name.decode(doc.text)}"'
    
    # Add the term's meaning.
    term_rep += f' ≝ "{term.meaning.decode(doc.text)}"'
    
    # Add the term's mentions.    
    if term.mentions:
        term_rep += f"  ⟶  {render_spans(term.mentions)}"
    
    term_reps.append(term_rep)

# Print the textual representation of the terms.
print('\n'.join(term_reps))

The output should¹ look like this:

term:0 "Services" ≝ "Apple's services"  ⟶  ⟦2049–2057 "Services"⟧; ⟦2097–2105 "Services"⟧; ⟦2346–2354 "Services"⟧; …
term:1 "Home Country" ≝ "your country or territory of residence"  ⟶  ⟦2556–2568 "Home Country"⟧; ⟦6757–6769 "Home Country"⟧; ⟦15796–15808 "Home Country"⟧; …
term:2 "Usage Rules" ≝ "the rules set forth in this section, all other applicable Apple terms, policies, and guidelines, and all applicable laws"  ⟶  ⟦8222–8233 "Usage Rules"⟧; ⟦8239–8250 "Usage Rules"⟧; ⟦8547–8558 "Usage Rules"⟧; …
term:3 "Apps" ≝ "apps and App Clips for any Apple platform and/or operating system, including any in-app purchases, extensions (such as keyboards), stickers, and Subscriptions made available in such apps or App Clips"  ⟶  ⟦13793–13797 "Apps"⟧; ⟦16284–16288 "Apps"⟧; ⟦25455–25459 "Apps"⟧; …
term:4 "App Provider" ≝ "a third party developer"  ⟶  ⟦3555–3567 "App Provider"⟧; ⟦26552–26565 "App Providers"⟧; ⟦26672–26684 "App Provider"⟧; …
term:5 "Standard EULA" ≝ "Licensed Application End User License Agreement"  ⟶  ⟦27612–27625 "Standard EULA"⟧; ⟦29790–29837 "Licensed Application End User…"⟧; ⟦29840–29853 "Standard EULA"⟧; …
term:6 "Licensor" ≝ "The App Provider or Apple as applicable"  ⟶  ⟦30479–30487 "Licensor"⟧; ⟦30824–30832 "Licensor"⟧; ⟦31912–31920 "Licensor"⟧; …
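Because every term has a name span, a meaning span, and a possibly empty list of mentions, you can turn the terms into a glossary and flag definitions that are never used elsewhere in the document. The sketch below does both. build_glossary only calls decode on the name and meaning spans and reads mentions, so it should work with doc.terms and doc.text. The DemoSpan and DemoTerm classes, however, are hypothetical stand-ins: their start and end attribute names are assumptions, not the SDK's documented span fields.

```python
from dataclasses import dataclass, field


@dataclass
class DemoSpan:
    """Hypothetical stand-in for a span: half-open [start, end) character offsets into the text."""
    start: int
    end: int

    def decode(self, text: str) -> str:
        return text[self.start:self.end]


@dataclass
class DemoTerm:
    """Hypothetical stand-in mirroring the term fields described above."""
    id: str
    name: DemoSpan
    meaning: DemoSpan
    mentions: list = field(default_factory=list)


def build_glossary(terms, text: str):
    """Return a {name: meaning} glossary plus the names of terms never mentioned outside their definition."""
    # If two terms share a name, the later definition wins in this simple dict.
    glossary = {t.name.decode(text): t.meaning.decode(text) for t in terms}
    unused = [t.name.decode(text) for t in terms if not t.mentions]
    return glossary, unused


demo_text = '"Agreement" means this contract between the parties. "Party" means a signatory. This Agreement is binding.'


def span_of(substring: str, after: int = 0) -> DemoSpan:
    """Locate the first occurrence of substring at or after the given offset."""
    start = demo_text.index(substring, after)
    return DemoSpan(start, start + len(substring))


demo_terms = [
    DemoTerm("term:0", span_of("Agreement"), span_of("this contract between the parties"),
             [span_of("Agreement", demo_text.index("This"))]),
    DemoTerm("term:1", span_of("Party"), span_of("a signatory")),
]

glossary, unused = build_glossary(demo_terms, demo_text)
print(glossary)
print("Defined but never used:", unused)
```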

External documents

The external_documents property contains an array of external documents referenced within the document. An external document object has the following properties:

id (string): a unique identifier for the external document within the document in the format exd:{index} where {index} is a non-negative incrementing integer starting from zero.
name (Span): the span containing the most proper name of the external document among all the names by which it is referred to in the document.
type (enum<string>): the type of the external document, being one of the supported document types (statute, regulation, decision, contract, or other).
jurisdiction (string | null): the jurisdiction of the external document formatted the same way as the document’s jurisdiction or null if unknown.
reception (enum<string>): the sentiment of the document towards the external document, being one of the following:
- positive: indicating that the document expresses a favorable view of the external document whether by endorsing or approving it.
- mixed: indicating that the document expresses both favorable and unfavorable views of the external document, for example, by affirming parts of it and disapproving others.
- negative: indicating that the document expresses an unfavorable view of the external document whether by criticizing, repealing, overruling, or explicitly contradicting it.
- neutral: indicating that the document references the external document without expressing any particular sentiment towards it.
mentions (array[Span]): spans where the external document is mentioned by name, for example, ‘the US Constitution’ in ‘the First Amendment to the US Constitution protects freedom of speech’.
pinpoints (array[Span]): spans where specific parts of the external document are referenced, for example, ‘Section 2’ in ‘as defined in Section 2 of the Crimes Act’.

# Construct a textual representation of external documents mentioned in the document.
exds_by_id = {exd.id: exd for exd in doc.external_documents}
exd_reps = []

for exd in doc.external_documents:
    exd_rep = ""
    
    # Add the external document's ID.
    exd_rep += f"{exd.id}"
    
    # Add the external document's canonical name.
    exd_rep += f' "{exd.name.decode(doc.text)}"'
    
    # Add the external document's jurisdiction, being formatted the same as the document's jurisdiction.
    if exd.jurisdiction:
        exd_rep += f' ({exd.jurisdiction}/'
    
    else:
        exd_rep += " ("
    
    # Add the external document's type.
    exd_rep += f"{exd.type}"
    
    # Add the reception of this document to the external document.
    exd_rep += f'/{exd.reception})'
    
    # Add the external document's mentions.
    exd_rep += f"  ⟶  {render_spans(exd.mentions)}"
    
    # Add pinpoint references to the external document.    
    if exd.pinpoints:
        exd_rep += f"  ⟶  {render_spans(exd.pinpoints)}"
    
    # Store the external document's representation.
    exd_reps.append(exd_rep)

# Print the textual representation of the external documents.
print('\n'.join(exd_reps))

The output should¹ look like this:

exd:0 "Competition and Consumer Act 2010 (Cth)" (AU-FED/statute/neutral)  ⟶  ⟦762–801 "Competition and Consumer Act …"⟧; ⟦1028–1067 "Competition and Consumer Act …"⟧; ⟦35411–35462 "AUSTRALIAN COMPETITION AND CO…"⟧; …
exd:1 "https://support.apple.com/HT204030" (US-FED/other/neutral)  ⟶  ⟦4084–4118 "https://support.apple.com/HT2…"⟧
exd:2 "https://www.apple.com/legal/internet-services/itunes/giftcards" (US-FED/other/neutral)  ⟶  ⟦5449–5511 "https://www.apple.com/legal/i…"⟧
exd:3 "http://support.apple.com/HT201359" (US-FED/other/neutral)  ⟶  ⟦5578–5611 "http://support.apple.com/HT20…"⟧
exd:4 "http://support.apple.com/HT212361" (AU-FED/other/neutral)  ⟶  ⟦7469–7502 "http://support.apple.com/HT21…"⟧
exd:5 "https://www.apple.com/legal/privacy" (US-IL/other/neutral)  ⟶  ⟦7604–7639 "https://www.apple.com/legal/p…"⟧
exd:6 "https://support.apple.com/HT210074" (US-FED/other/neutral)  ⟶  ⟦14030–14064 "https://support.apple.com/HT2…"⟧
exd:7 "https://support.apple.com/HT202039" (US-FED/other/neutral)  ⟶  ⟦16545–16579 "https://support.apple.com/HT2…"⟧
exd:8 "https://support.apple.com/HT201060" (other/neutral)  ⟶  ⟦22737–22771 "https://support.apple.com/HT2…"⟧
exd:9 "https://support.apple.com/HT201304" (US-FED/other/neutral)  ⟶  ⟦28619–28653 "https://support.apple.com/HT2…"⟧
exd:10 "48 C.F.R. §2.101" (US-FED/regulation/neutral)  ⟶  ⟦39362–39378 "48 C.F.R. §2.101"⟧
exd:11 "48 C.F.R. §12.212" (US-FED/regulation/neutral)  ⟶  ⟦39501–39518 "48 C.F.R. §12.212"⟧; ⟦39522–39541 "48 C.F.R. §227.7202"⟧; ⟦39574–39591 "48 C.F.R. §12.212"⟧; …
exd:12 "United Nations Convention on the International Sale of Goods" (INT/other/negative)  ⟶  ⟦41367–41427 "United Nations Convention on …"⟧; ⟦57906–57966 "United Nations Convention on …"⟧
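The type and reception fields make it easy to profile how a document relates to the material it cites, for instance, to find external documents that it disapproves of or excludes. The sketch below tallies external documents by (type, reception) and lists the negatively received ones. It assumes type and reception are plain strings, as they print in the output above. The demo_exds list is a hypothetical stand-in that reuses a few IDs and labels from that output; with a real document you would pass doc.external_documents.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_external_documents(exds):
    """Count external documents by (type, reception) and list the IDs of negatively received ones."""
    tally = Counter((exd.type, exd.reception) for exd in exds)
    negative = [exd.id for exd in exds if exd.reception == "negative"]
    return tally, negative


# Hypothetical stand-ins reusing IDs and labels from the output above.
demo_exds = [
    SimpleNamespace(id="exd:0", type="statute", reception="neutral"),
    SimpleNamespace(id="exd:10", type="regulation", reception="neutral"),
    SimpleNamespace(id="exd:11", type="regulation", reception="neutral"),
    SimpleNamespace(id="exd:12", type="other", reception="negative"),
]

tally, negative = summarize_external_documents(demo_exds)
print(tally.most_common())
print("Negatively received:", negative)
```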

Quotes

The quotes property contains an array of quotations within the document. A quote object has the following properties:

source_segment (string | null): the unique identifier of the segment that is the source of the quote or null if the quote is not from a segment within the document.
source_document (string | null): the unique identifier of the external document that is the source of the quote or null if the quote is not from an external document.
source_person (string | null): the unique identifier of the person that is the source of the quote or null if not attributable to a person.
amending (boolean): whether the quote is being used to amend or modify content, typically in other documents.
span (Span): the span where the quote occurs.

It is not possible for a quote to be simultaneously from a segment and an external document.

# Construct a textual representation of quotes mentioned in the document.
quote_reps = []

for quote in doc.quotes:
    quote_rep = ""

    # Add the source segment, document, or person of the quote, if any.
    if quote.source_segment or quote.source_document or quote.source_person:
        if quote.source_person:
            quote_rep += quote.source_person
            quote_rep += f' "{pers_by_id[quote.source_person].name.decode(doc.text)}"'
        
            if quote.source_segment or quote.source_document:
                quote_rep += " @ "
        
        if quote.source_segment:
            quote_rep += quote.source_segment
            
            if quote.source_segment in seg_to_name:
                quote_rep += f' "{seg_to_name[quote.source_segment]}"'
        
        elif quote.source_document:
            quote_rep += quote.source_document
            quote_rep += f' "{exds_by_id[quote.source_document].name.decode(doc.text)}"'
        
        quote_rep += "  ⟶  "
    
    # Add the quote's span.
    quote_rep += render_spans([quote.span])
    
    # If the quote is amending another document, note that.
    if quote.amending:
        quote_rep += " (amending)"
    
    quote_reps.append(quote_rep)

# Print the textual representation of the quotes.
print('\n'.join(quote_reps))

The output should¹ look like this:

⟦177–407 ""[t]erms and conditions are a…"⟧
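Since a quote can name a source segment, a source external document, a source person, or none of these, but never both a segment and an external document, downstream code often needs to branch on where a quote comes from. The sketch below classifies each quote and treats the segment-plus-document combination as an error, mirroring the constraint stated earlier. The category labels and the demo_quotes stand-ins are hypothetical. Under this scheme, the single quote in the output above would be classified as unattributed because it has no source.

```python
from collections import Counter
from types import SimpleNamespace


def quote_source_kind(quote) -> str:
    """Classify a quote by its source, rejecting quotes that cite both a segment and an external document."""
    if quote.source_segment and quote.source_document:
        raise ValueError("a quote cannot come from both a segment and an external document")
    if quote.source_segment:
        return "segment"
    if quote.source_document:
        return "external_document"
    if quote.source_person:
        return "person_only"
    return "unattributed"


# Hypothetical stand-in quotes; with a real document you would iterate over doc.quotes.
demo_quotes = [
    SimpleNamespace(source_segment=None, source_document=None, source_person=None),
    SimpleNamespace(source_segment=None, source_document="exd:0", source_person=None),
]
print(Counter(quote_source_kind(q) for q in demo_quotes))
```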

Dates

The dates property contains an array of dates identified within the document belonging to a supported date type. Dates not belonging to a supported date type will not be extracted. A date object has the following properties:

value (string): the date in ISO 8601 format (YYYY-MM-DD).
type (enum<string>): the type of the date, being one of the following:
- creation: the date the document was created or last updated. There may only be one creation date per document.
- signature: the date the document was signed.
- effective: the date when the document or a part thereof comes into effect (e.g., commencement or enactment dates).
- expiry: the date when the document or a part thereof is no longer in effect.
- delivery: the date when goods or services are to be delivered under the document.
- renewal: the date when one or more of the document’s terms are to be renewed.
- payment: the date when payment is to be made under the document.
- birth: the birth date of a natural person or establishment (e.g., incorporation) date of a non-natural legal person identified in the document. There can only be one birth date linked to a single person and all birth dates must be linked to a person. A person’s birth date will never be after their death date.
- death: the death date of a natural person or dissolution date of a non-natural legal person identified in the document. There can only be one death date linked to a single person and all death dates must be linked to a person. A person’s death date will never be before their birth date.
person (string | null): the unique identifier of the legal person associated with the date or null if the date is not associated with a person.
mentions (array[Span]): one or more spans where the date is mentioned.

from datetime import datetime as dt

# Construct a textual representation of dates mentioned in the document.
date_reps = []

for date in doc.dates:
    date_rep = ""
    
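    # Add the date's normalized value, formatted like "15 September 2025".
    # The day is added manually because strftime's "%-d" (no zero padding) is not supported on Windows.
    parsed = dt.fromisoformat(date.value)
    date_rep += f"{parsed.day} {parsed.strftime('%B %Y')}"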
    
    # Add the date's type.
    date_rep += f" ({date.type})"
    
    # Add the date's linked person entity, if any.
    if date.person:
        date_rep += f' @ {date.person} "{pers_by_id[date.person].name.decode(doc.text)}"'
    
    # Add the date's mentions.
    date_rep += f"  ⟶  {render_spans(date.mentions)}"
    
    date_reps.append(date_rep)

# Print the textual representation of the dates.
print('\n'.join(date_reps))

The output should¹ look like this:

15 September 2025 (creation)  ⟶  ⟦60447–60465 "September 15, 2025"⟧
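If you load these dates into your own data store, you may want to index birth and death dates by person and keep the guarantees listed above as explicit checks so that later processing steps can rely on them. The sketch below does that with the standard datetime module. The Apple terms contain only a creation date, so demo_dates uses hypothetical values and a hypothetical person ID; with a real document you would pass doc.dates.

```python
import datetime
from collections import defaultdict
from types import SimpleNamespace


def birth_and_death_dates(dates) -> dict:
    """Collect each person's birth and death dates and check the constraints described above."""
    events = defaultdict(dict)
    for d in dates:
        if d.type not in ("birth", "death"):
            continue  # other date types are not tied to a person's lifetime
        if d.person is None:
            raise ValueError(f"{d.type} date {d.value} is not linked to a person")
        if d.type in events[d.person]:
            raise ValueError(f"{d.person} has more than one {d.type} date")
        events[d.person][d.type] = datetime.date.fromisoformat(d.value)
    for person, dated in events.items():
        if "birth" in dated and "death" in dated and dated["death"] < dated["birth"]:
            raise ValueError(f"{person}'s death date precedes their birth date")
    return dict(events)


# Hypothetical dates for illustration only.
demo_dates = [
    SimpleNamespace(value="2025-09-15", type="creation", person=None),
    SimpleNamespace(value="1990-04-01", type="birth", person="per:99"),
    SimpleNamespace(value="2020-06-30", type="death", person="per:99"),
]
print(birth_and_death_dates(demo_dates))
```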

Headings

The headings property contains an array of spans within the document constituting headings.

# Print a textual representation of the document's headings.
print("\n".join([render_spans([s]) for s in doc.headings]))

The output should¹ look like this:

⟦503–544 "Apple Media Services Terms an…"⟧
⟦1145–1162 "TABLE OF CONTENTS"⟧
⟦1783–1798 "A. INTRODUCTION"⟧
⟦2783–2814 "B. PAYMENTS, TAXES, AND REFUN…"⟧
⟦6141–6151 "C. ACCOUNT"⟧
⟦7507–7517 "D. PRIVACY"⟧
⟦7644–7660 "E. ACCESSIBILITY"⟧
⟦7822–7857 "F. SERVICES AND CONTENT USAGE…"⟧
⟦14069–14110 "G. TERMINATION AND SUSPENSION…"⟧
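A common use of heading spans is to cut a document into flat, heading-led chunks, for example, as retrieval units in a RAG pipeline. The sketch below pairs each heading with the text that runs up to the next heading. It assumes spans expose start and end character offsets, which is an assumption about the SDK; adapt the attribute names if they differ. DemoSpan is a hypothetical stand-in. Because this split is flat, nested structure is better served by the segments property, which captures the document's full hierarchy.

```python
from collections import namedtuple

# Hypothetical stand-in for a span with half-open [start, end) character offsets.
DemoSpan = namedtuple("DemoSpan", ["start", "end"])


def split_at_headings(text: str, headings) -> list:
    """Split text into (heading, body) pairs, where each body runs until the next heading."""
    ordered = sorted(headings, key=lambda s: s.start)
    chunks = []
    for i, heading in enumerate(ordered):
        body_end = ordered[i + 1].start if i + 1 < len(ordered) else len(text)
        chunks.append((text[heading.start:heading.end], text[heading.end:body_end].strip()))
    # Any text before the first heading (such as a preamble) is deliberately dropped here.
    return chunks


demo_text = "A. INTRODUCTION\nThese terms apply to you.\nB. PAYMENTS\nYou agree to pay.\n"
demo_headings = [
    DemoSpan(demo_text.index(h), demo_text.index(h) + len(h))
    for h in ("A. INTRODUCTION", "B. PAYMENTS")
]
for heading, body in split_at_headings(demo_text, demo_headings):
    print(f"{heading}: {body}")
```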

Junk

The junk property contains an array of spans within the document constituting non-operative, non-substantive ‘junk’ content such as headers, footers, page numbers, and OCR artifacts.

# Print a textual representation of the document's junk content.
print('\n'.join([render_spans([s]) for s in doc.junk]))

The output should¹ look like this:

⟦60469–60537 "Privacy Policy | Terms of Use…"⟧
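Before indexing or embedding a document, you may want to drop its junk content. The sketch below removes a set of spans from a text and handles overlapping spans by merging them as it goes. As with the headings example, it assumes spans expose start and end offsets (an assumption about the SDK), and DemoSpan is a hypothetical stand-in.

```python
from collections import namedtuple

# Hypothetical stand-in for a span with half-open [start, end) character offsets.
DemoSpan = namedtuple("DemoSpan", ["start", "end"])


def strip_spans(text: str, spans) -> str:
    """Return text with every given span removed, tolerating overlapping spans."""
    kept, cursor = [], 0
    for start, end in sorted((s.start, s.end) for s in spans):
        if start > cursor:
            kept.append(text[cursor:start])  # keep the text between the previous span and this one
        cursor = max(cursor, end)  # skip past this span, even if it overlaps an earlier one
    kept.append(text[cursor:])  # keep everything after the last span
    return "".join(kept)


demo_text = "Page 1\nSome operative clause.\nFooter text"
demo_junk = [DemoSpan(demo_text.index(j), demo_text.index(j) + len(j)) for j in ("Page 1\n", "Footer text")]
print(repr(strip_spans(demo_text, demo_junk)))  # 'Some operative clause.\n'
```

Removing junk shifts character offsets, so the spans returned for every other property no longer line up with the cleaned text. Keep the original doc.text around for any span lookups.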

¹ The outputs shown in this documentation may differ slightly from your experience as we continue to improve our models and enrichment capabilities, particularly in relation to models made available through the Isaacus Beta Program.
