Enrichment is an experimental feature that is currently only available through the Isaacus Beta Program. You can apply for access here.
- Hierarchical segmentation: breaking documents up into their full hierarchical structure of divisions, articles, sections, clauses, and so on.
- Entity extraction, disambiguation, classification, and hierarchical linking: extracting references to key entities such as individuals, organizations, governments, locations, dates, citations, and more, and identifying which real-world entities they refer to, classifying them, and linking them to each other (for example, linking companies to their offices, subsidiaries, executives, and contact points; attributing quotations to source documents and authors; classifying citations by type and jurisdiction; etc.).
- Text annotation: tagging headings, tables of contents, signatures, junk, front and back matter, entity references, cross-references, citations, definitions, and other common textual elements.
Usage
1. Setup
This guide will walk you through how to use every feature of Kanon 2 Enricher by converting Apple’s terms of service into a rich, highly structured knowledge graph and then rendering the resulting graph in a human-readable format.
Since this guide covers a lot of material, including capabilities that may not be relevant to your intended use cases, we encourage you to feed this guide to an LLM and then ask it for help applying Kanon 2 Enricher to your own problems. You can always reach out to us for assistance via our support page.
If you’d like to see the complete specification of the request parameters and possible output fields of our enrichment endpoint, please refer to our API reference documentation.
To get started, let’s initialize an Isaacus API client and then download Apple’s terms of service. If you don’t already have an API key, you can obtain one by following the first step of our quickstart tutorial.
2. Enrichment
We’ll now enrich the document by sending it to the /enrichments endpoint, which we can call through the client.enrichments.create() method of our API client.
We’ll use the model (string) request parameter to specify that we want to enrich our document with kanon-2-enricher.
The texts (array[string] | string) parameter is where we’ll place the document. It works with both an array of strings and a single string, depending on whether you want to enrich multiple documents or just one.
The response has two attributes: results (array[Result]) and usage (Usage).
The results attribute is an array of objects, one for each input document, all sorted in the same order as they were inputted to the model. Each object has two keys, index (integer) and document (Document). index is the index of the input document that the result corresponds to, starting from 0. document is the document enriched into version 1.0.0 of the Isaacus Legal Graph Schema (ILGS).
The usage attribute contains a single key, input_tokens (integer), specifying the number of tokens in the document plus overhead tokens. You can cross-reference this number with our pricing page (and any applicable personal discounts or credits) to calculate the cost of an enrichment.
To access the enriched document, we can look up response.results[0].document since we only enriched one document and the results are in the same order as the input.
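The request and response handling described above can be sketched as follows. This is a minimal illustration, not the SDK's own code: enrich_one is a hypothetical helper, and the stub client exists only so the sketch runs without network access or an API key. In practice you would pass in your initialized Isaacus API client, relying only on the client.enrichments.create() method and the model and texts parameters described above.

```python
from types import SimpleNamespace


def enrich_one(client, text, model="kanon-2-enricher"):
    """Enrich a single document and return (document, input_tokens)."""
    # `texts` accepts either a single string or an array of strings.
    response = client.enrichments.create(model=model, texts=text)

    # Results come back in input order, so the first (and only) result
    # belongs to the single document we sent.
    result = response.results[0]
    return result.document, response.usage.input_tokens


class _StubEnrichments:
    """Hypothetical stand-in that mimics the documented response shape."""

    def create(self, model, texts):
        document = SimpleNamespace(text=texts)
        return SimpleNamespace(
            results=[SimpleNamespace(index=0, document=document)],
            # Placeholder count for the demo, not real tokenization.
            usage=SimpleNamespace(input_tokens=len(texts.split())),
        )


stub_client = SimpleNamespace(enrichments=_StubEnrichments())
document, input_tokens = enrich_one(
    stub_client, "These terms govern your use of the service."
)
print(document.text)
print(input_tokens)
```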
3. Rendering
The returned document object, being an instance of an ILGS Document, has the following properties:
- text (string): the original, unchanged text of the document.
- title (Span | null): the span of the document’s title or null if the document has no title or its title cannot be resolved.
- subtitle (Span | null): the span of the document’s subtitle or null if the document has no subtitle or its subtitle cannot be resolved.
- type (enum<string>): the type of the document, being one of statute, regulation, decision, contract, or other.
- jurisdiction (string | null): the jurisdiction of the document or null if the jurisdiction cannot be resolved.
- segments (array[Segment]): an array of segments within the document representing structurally distinct portions of its content.
- crossreferences (array[Crossreference]): an array of crossreferences within the document pointing to segments or spans of segments.
- locations (array[Location]): an array of locations identified within the document.
- persons (array[Person]): an array of persons identified within the document.
- emails (array[Email]): an array of email addresses identified within the document belonging to persons.
- websites (array[Website]): an array of websites identified within the document belonging to persons.
- phone_numbers (array[PhoneNumber]): an array of phone numbers identified within the document belonging to persons.
- id_numbers (array[IDNumber]): an array of identification numbers identified within the document belonging to persons.
- terms (array[Term]): an array of terms assigned definite meanings within the document.
- external_documents (array[ExternalDocument]): an array of external documents cited within the document.
- quotes (array[Quote]): an array of quotations within the document.
- dates (array[Date]): an array of dates identified within the document belonging to a supported date type.
- headings (array[Span]): an array of spans within the document constituting headings.
- junk (array[Span]): an array of spans within the document constituting non-operative, non-substantive ‘junk’ content such as headers, footers, page numbers, and OCR artifacts.
Text
The text property contains the text of the document. It is completely identical to the text inputted to the enrichment endpoint. It is provided for the sake of convenience, allowing you to access the original text of the document without having to maintain a separate reference to it.
Title
The title property contains either a span representing the title of the document or null if the document has no title or its title cannot be resolved.
A span is an object with two keys, start and end, representing a Unicode code point span of discrete text within the document.
All spans anywhere in an ILGS Document are globally laminar and well-nested with respect to each other. Similar to XML, it is impossible for any two spans to partially overlap—they can either be completely disjoint or wholly contained. It is also impossible for any two spans of the same type to be duplicated (i.e., have the exact same start and end indices). Some span groups such as titles and subtitles are further incapable of nesting. Implicitly, this means that all spans form a single global hierarchy, making it trivial to traverse annotations and reason about their relationships to each other.
Spans are zero-based (i.e., the first Unicode code point in the document is at index 0) and half-open (i.e., the end index is exclusive). They cannot be empty (i.e., the start index must always be less than the end index) and they can never start or end at whitespace (i.e., the start and end of a span will always land on non-whitespace characters).
When using programming languages other than Python (which uses zero-based, half-open, Unicode code point-spaced string indexing), indices may need to be translated accordingly. For example, JavaScript slices into UTF-16 code units instead of Unicode code points.
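As a rough illustration of that translation, the sketch below converts a code point span into UTF-16 code unit offsets of the kind JavaScript string slicing expects. to_utf16_span is a hypothetical helper and the sample text is made up for the demonstration.

```python
def to_utf16_span(text, start, end):
    """Translate a code point span into UTF-16 code unit offsets."""

    def units_before(index):
        # Every UTF-16 code unit occupies exactly two bytes in UTF-16-LE.
        return len(text[:index].encode("utf-16-le")) // 2

    return units_before(start), units_before(end)


# The emoji is one code point but two UTF-16 code units.
text = "\u00a7 1 \U0001F600 Terms"
start, end = 6, 11  # zero-based, half-open code point span of "Terms"

print(text[start:end])
print(to_utf16_span(text, start, end))
```

Characters outside the Basic Multilingual Plane shift every later offset by one code unit each, which is why the two index spaces diverge after the emoji.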
To render the title of the document, we can slice into the document’s text using the start and end indices of the title’s span. We have added a helper method to all Span objects in our Python SDK called decode() which, when passed the document’s text, will slice out spans for you.
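Slicing with a span can be sketched in plain Python as below. The example works on a dictionary shaped like an ILGS Document rather than on SDK objects; decode_span is a hypothetical stand-in for the slicing that the SDK's decode() helper is described as doing, and the sample document is invented.

```python
def decode_span(span, text):
    """Return the text covered by a zero-based, half-open span, or None."""
    if span is None:
        return None
    return text[span["start"]:span["end"]]


# Invented sample shaped like the relevant part of an ILGS Document.
document = {
    "text": "Terms of Service\nPlease read these terms carefully.",
    "title": {"start": 0, "end": 16},
    "subtitle": None,
}

title = decode_span(document["title"], document["text"])
subtitle = decode_span(document["subtitle"], document["text"])

print(title or "(no title)")
print(subtitle or "(no subtitle)")
```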
Subtitle
The subtitle property contains either a span representing the subtitle of the document or null if the document has no subtitle or its subtitle cannot be resolved.
A subtitle will typically only be extracted where it is clearly demarcated from the title.
Type
The type property contains the type of the document, being one of:
- statute: primary legislation such as acts, bills, codes, and constitutions.
- regulation: secondary legislation such as rules, statutory instruments, and ordinances.
- decision: judicial or quasi-judicial decisions such as court judgments, judicial opinions, and tribunal rulings.
- contract: contracts, covenants, and agreements.
- other: all other types of legal documents that do not fit into any of the predefined types.
Jurisdiction
The jurisdiction property contains the jurisdiction of the document or null if the jurisdiction cannot be resolved.
Jurisdictions are composed of a country code and, where applicable, a subdivision code prefixed by a hyphen.
All 249 ISO 3166-1 alpha-2 country codes are representable in addition to special INT and EU codes for international and European Union law, respectively.
All 5,046 ISO 3166-2 codes are also representable in addition to a special FED code for federal law.
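Splitting a jurisdiction into its parts might look like the sketch below. parse_jurisdiction is a hypothetical helper, and the sample values are assumptions inferred from the format described above rather than values taken from real output.

```python
def parse_jurisdiction(jurisdiction):
    """Split a jurisdiction into (country, subdivision); either may be None."""
    if jurisdiction is None:
        return None, None
    # The subdivision code, where present, follows the first hyphen.
    country, _, subdivision = jurisdiction.partition("-")
    return country, subdivision or None


# Assumed sample values for illustration only.
for value in ("US-CA", "US-FED", "EU", None):
    print(value, parse_jurisdiction(value))
```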
Segments
The segments property contains an array of segments within the document representing structurally distinct portions of its content.
A segment object has the following properties:
- id (string): a unique identifier for the segment within the document in the format seg:{index} where {index} is a non-negative incrementing integer starting from zero.
- kind (enum<string>): the structural kind of the segment, being one of the following:
  - container: a structural or semantic grouping of content such as a chapter. containers can contain segments of any kind or none at all.
  - unit: a single syntactically independent unit of text such as a paragraph. units can only contain items and figures.
  - item: a syntactically subordinate unit of text such as an item in a run-in list. items can only contain other items. items are conceptually distinct from list items: it is perfectly possible to encounter list items that are syntactically independent of their surrounding items, just as it is possible to encounter dependent clauses that do not appear as part of a list.
  - figure: a visually structured or tabular unit of content such as a diagram, equation, or table. figures cannot contain segments.
- type (enum<string> | null): the addressable type of the segment, being one of the supported types listed in our API reference documentation (e.g., chapter, section, paragraph, etc.), or null if the type is unknown or not applicable. Note that:
  - The type of a segment may be defined explicitly (e.g., by headings, such as ‘Section 2. Definitions’), implicitly (e.g., by way of reference, such as ‘as defined in Section 2’), or by convention (e.g., [42] in a judgment often denotes a paragraph, independent provisions in statute are often sections, etc.).
  - Although many segment types may coincide with syntactic constructs, they should be thought of purely as distinct formal citable units. Most paragraphs (in the syntactic sense) will not have the paragraph type, for example. That type is reserved for segments that would formally be cited as a ‘Paragraph’ within the document’s referential scheme.
  - Certain types such as division and part are exclusive to the container kind and others such as figure and table are exclusive to the figure kind, as outlined in our API reference documentation.
- category (enum<string>): the functional category of the segment, being one of the following:
  - front_matter: non-operative contextualizing content occurring at the start of a document such as a preamble or recitals.
  - scope: operative content defining the application or interpretation of a document such as definition sections and governing law clauses.
  - main: operative, non-scopal content.
  - annotation: non-operative annotative content providing explanatory or referential information such as commentary, footnotes, and endnotes.
  - back_matter: non-operative contextualizing content occurring at the end of a document such as authority statements.
  - other: content that does not fit into any of the other categories.
- type_name (Span | null): the span within the segment defining its type name (e.g., "Section" in "Section 2 - Definitions") or null if none exists.
- code (Span | null): the span within the segment defining its code (e.g., "2" in "Section 2 - Definitions") or null if none exists.
- title (Span | null): the span within the segment defining its title (e.g., "Definitions" in "Section 2 - Definitions") or null if none exists.
- parent (string | null): the unique identifier of the ‘parent’ segment immediately containing the segment or null if the segment has no parent (i.e., it is a root-level segment).
- children (array[string]): the unique identifiers of the segment’s immediate children.
- level (integer): the level of the segment within the document’s segment hierarchy, counting up from root-level segments.
- span (Span): the span of the segment within the document’s text.
The parent, children, and level fields define the hierarchical structure of the document, with the kind, type, and category labels describing the syntactic, functional, and semantic roles of segments within the document’s hierarchy.
To render the document’s segment tree, we can:
- Sort segments such that parent segments always come before their children and they are in order of appearance in the document (sort key: (segment.span.start, -segment.span.end)).
- Prefix each segment with indentation and branching characters (➣ for root-level segments, └─ for the last child, ├─ for all other children, and │ for each ancestor segment having at least one descendant at a given level) corresponding to its level.
- Print each segment’s ID, category, kind, type, span, and, where present, type name, code, and title.
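The three steps above can be sketched as follows. This is one possible rendering, not the guide's original code: it works on plain dictionaries shaped like ILGS segments, derives nesting from the parent field alone, and uses an invented sample document in which only the fields the renderer reads are populated. render_segment_tree, span_of, and segment are hypothetical helpers.

```python
def render_segment_tree(document):
    """Return one formatted line per segment, parents before children."""
    text = document["text"]
    by_id = {seg["id"]: seg for seg in document["segments"]}
    ordered = sorted(
        document["segments"],
        key=lambda seg: (seg["span"]["start"], -seg["span"]["end"]),
    )

    # Group siblings in document order so we can tell which child is last.
    siblings = {}
    for seg in ordered:
        siblings.setdefault(seg["parent"], []).append(seg["id"])

    def is_last(seg):
        return siblings[seg["parent"]][-1] == seg["id"]

    def prefix(seg):
        if seg["parent"] is None:
            return "➣ "
        parts = ["└─ " if is_last(seg) else "├─ "]
        ancestor = by_id[seg["parent"]]
        while ancestor["parent"] is not None:
            # Keep a vertical rule only while the ancestor has later siblings.
            parts.append("   " if is_last(ancestor) else "│  ")
            ancestor = by_id[ancestor["parent"]]
        return "  " + "".join(reversed(parts))

    lines = []
    for seg in ordered:
        labels = [
            text[seg[key]["start"]:seg[key]["end"]]
            for key in ("type_name", "code", "title")
            if seg[key] is not None
        ]
        span = seg["span"]
        line = (
            f"{prefix(seg)}{seg['id']} {seg['category']} {seg['kind']} "
            f"{seg['type'] or '-'} [{span['start']}:{span['end']}] "
            f"{' '.join(labels)}"
        )
        lines.append(line.rstrip())
    return lines


# Invented sample document for illustration.
text = (
    "Section 1 - Scope\nThese terms apply to you.\n"
    "Section 2 - Fees\nFees are due monthly."
)


def span_of(fragment, search_from=0):
    start = text.index(fragment, search_from)
    return {"start": start, "end": start + len(fragment)}


def segment(id_, kind, type_, category, parent, span,
            type_name=None, code=None, title=None):
    return {"id": id_, "kind": kind, "type": type_, "category": category,
            "parent": parent, "span": span, "type_name": type_name,
            "code": code, "title": title}


second = text.index("Section 2")
document = {"text": text, "segments": [
    segment("seg:0", "container", "section", "scope", None,
            span_of("Section 1 - Scope\nThese terms apply to you."),
            span_of("Section"), span_of("1"), span_of("Scope")),
    segment("seg:1", "unit", None, "scope", "seg:0",
            span_of("These terms apply to you.")),
    segment("seg:2", "container", "section", "main", None,
            span_of("Section 2 - Fees\nFees are due monthly."),
            span_of("Section", second), span_of("2"), span_of("Fees")),
    segment("seg:3", "unit", None, "main", "seg:2",
            span_of("Fees are due monthly.")),
]}

lines = render_segment_tree(document)
print("\n".join(lines))
```

Sibling order is derived from the sorted segments rather than from each segment's children array, so the sketch does not depend on how that array is ordered.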
Crossreferences
The crossreferences property contains an array of crossreferences within the document pointing to one or more segments.
A crossreference object has the following properties:
- start (string): the unique identifier of the earliest segment in the referenced span, with ties broken in favor of the least-nested (largest) segment.
- end (string): the unique identifier of the latest segment in the referenced span, with ties broken in favor of the least-nested (largest) segment.
- span (Span): the span within the document’s text where the crossreference occurs.
Where a crossreference points to a single segment, start and end will be identical.
We can render crossreferences by:
- Decoding their span property.
- Looking up the span of text targeted by the crossreference from start_segment.span.start until end_segment.span.end.
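A minimal sketch of those two steps, using plain dictionaries shaped like ILGS objects and an invented two-clause document (render_crossreference and span_of are hypothetical helpers):

```python
def render_crossreference(document, crossreference):
    """Return (reference text, referenced text) for one crossreference."""
    text = document["text"]
    segments = {seg["id"]: seg for seg in document["segments"]}

    # Step 1: decode the span where the crossreference itself occurs.
    span = crossreference["span"]
    reference = text[span["start"]:span["end"]]

    # Step 2: the target runs from the start of the first referenced
    # segment to the end of the last referenced segment.
    start_segment = segments[crossreference["start"]]
    end_segment = segments[crossreference["end"]]
    target = text[start_segment["span"]["start"]:end_segment["span"]["end"]]
    return reference, target


# Invented sample document for illustration.
text = "1. Fees are due monthly.\n2. Late fees accrue as set out in clause 1."


def span_of(fragment):
    start = text.index(fragment)
    return {"start": start, "end": start + len(fragment)}


document = {
    "text": text,
    "segments": [
        {"id": "seg:0", "span": span_of("1. Fees are due monthly.")},
        {"id": "seg:1",
         "span": span_of("2. Late fees accrue as set out in clause 1.")},
    ],
    "crossreferences": [
        {"start": "seg:0", "end": "seg:0", "span": span_of("clause 1")},
    ],
}

rendered = render_crossreference(document, document["crossreferences"][0])
print(rendered)
```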
Locations
The locations property contains an array of locations identified within the document.
A location object has the following properties:
- id (string): a unique identifier for the location within the document in the format loc:{index} where {index} is a non-negative incrementing integer starting from zero.
- name (Span): a span within the document’s text representing the name by which the location is referred to that is the ‘most proper’. As an example, a location referred to as ‘New York City’ in two places in a document, ‘NYC’ in three places, and ‘the Big Apple’ in one place would have its name set to whichever span the model was most confident represented the proper name of the location, likely being one of the ‘New York City’ spans.
- type (enum<string>): the type of the location, being one of country, state, city, address, or other.
- parent (string | null): the unique identifier of the ‘parent’ location immediately containing the location or null if the location has no ancestors identified in the document.
  - Locations with the address and other types can have locations of any type as their parents.
  - A location with the city type cannot have a location with the address type as its parent.
  - A location with the state type cannot have a location with the address or city types as its parent.
  - A location with the country type cannot have a location with the address, city, or state types as its parent.
  - It is impossible for a location to be its own ancestor.
- children (array[string]): the unique identifiers of any child locations having the location as their immediate parent.
- mentions (array[Span]): one or more spans within the document’s text where the location is mentioned.
Locations form a hierarchy through their parent and children properties. Accordingly, we can render the location tree just as we rendered the segment tree.
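A simple indented rendering of the location hierarchy might look like the sketch below. It walks the parent and children properties of plain dictionaries shaped like ILGS locations; the sample address is invented and render_location_tree and location are hypothetical helpers.

```python
def render_location_tree(document):
    """Return indented lines, one per location, parents before children."""
    text = document["text"]
    by_id = {loc["id"]: loc for loc in document["locations"]}
    lines = []

    def visit(loc, depth):
        name = text[loc["name"]["start"]:loc["name"]["end"]]
        lines.append(f"{'  ' * depth}{loc['id']} {loc['type']}: {name}")
        for child_id in loc["children"]:
            visit(by_id[child_id], depth + 1)

    # Start from the locations that have no parent in the document.
    for loc in document["locations"]:
        if loc["parent"] is None:
            visit(loc, 0)
    return lines


# Invented sample document for illustration.
text = "Acme Pty Ltd, 1 Example Street, Sydney, New South Wales, Australia"


def location(id_, type_, name, parent, children):
    start = text.index(name)
    span = {"start": start, "end": start + len(name)}
    return {"id": id_, "type": type_, "name": span, "parent": parent,
            "children": children, "mentions": [span]}


document = {"text": text, "locations": [
    location("loc:0", "country", "Australia", None, ["loc:1"]),
    location("loc:1", "state", "New South Wales", "loc:0", ["loc:2"]),
    location("loc:2", "city", "Sydney", "loc:1", ["loc:3"]),
    location("loc:3", "address", "1 Example Street", "loc:2", []),
]}

lines = render_location_tree(document)
print("\n".join(lines))
```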
Persons
The persons property contains an array of legal persons identified within the document.
A person object has the following properties:
- id (string): a unique identifier for the person within the document in the format per:{index} where {index} is a non-negative incrementing integer starting from zero.
- name (Span): a span within the document’s text representing the person’s most proper name of all the names by which it is referred to in the document.
- type (enum<string>): the legal entity type of the person, being one of the following:
  - natural: a human being in their capacity as a natural legal person, including when representing unincorporated entities such as partnerships and trusts.
  - corporate: a body corporate such as a company, incorporated partnership, or statutory corporation.
  - politic: a body politic, political entity, or part thereof such as a court, state, government, or intergovernmental organization.
- role (enum<string>): the role of the person in relation to the subject of the document, being one of the supported roles listed in our API reference documentation (e.g., seller, buyer, governing_jurisdiction, plaintiff, defendant, etc.), including a special other role for persons that do not fit into any of the predefined roles.
- parent (string | null): the unique identifier of the immediate legal entity that owns or controls the person (e.g., a parent company) or that the person represents if the person is identified only in their capacity as a representative (e.g., a director) of that entity, or null if the person has no parent entity mentioned in the document or their parentage is unknown.
- children (array[string]): the unique identifiers of any persons having the person as their immediate parent.
- residence (string | null): the unique identifier of the location at which the person primarily resides or null if unknown or not mentioned.
- mentions (array[Span]): one or more spans within the document’s text where the person is mentioned.
Persons are linked to each other through their parent and children properties.
Websites
The websites property contains an array of websites identified within the document belonging to legal persons. Websites that are not attributable to legal persons will not be extracted.
A website object has the following properties:
- url (string): the normalized URL of the website in the form https://{host}/.
- person (string): the unique identifier of the person that the website belongs to.
- mentions (array[Span]): one or more spans where the website is mentioned, including paths and slugs that are not part of the normalized website URL.
Emails
The emails property contains an array of email addresses identified within the document belonging to legal persons. Email addresses that are not attributable to legal persons will not be extracted.
An email object has the following properties:
- address (string): the normalized email address.
- person (string): the unique identifier of the person that the email address belongs to.
- mentions (array[Span]): one or more spans where the email address is mentioned.
Phone numbers
The phone_numbers property contains an array of valid phone numbers identified within the document belonging to legal persons. Phone numbers that are not valid, possible, or attributable to legal persons will not be extracted.
A phone number object has the following properties:
- number (string): the normalized phone number in E.123 international notation conforming with local conventions on the use of spaces and hyphens as separators.
- person (string): the unique identifier of the person that the phone number belongs to.
- mentions (array[Span]): one or more spans where the phone number is mentioned.
ID numbers
The id_numbers property contains an array of identification numbers identified within the document belonging to legal persons. ID numbers that are not attributable to legal persons will not be extracted.
An ID number object has the following properties:
- number (string): the identification number.
- person (string): the unique identifier of the person that the identification number belongs to.
- mentions (array[Span]): one or more spans where the identification number is mentioned.
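Because websites, emails, phone numbers, and ID numbers each carry a person identifier, they can be gathered into one record per person. The sketch below does this over plain dictionaries shaped like an ILGS Document; contact_points_by_person is a hypothetical helper, the sample data is invented, and only the keys the helper reads are populated.

```python
# Which key holds the normalized value for each kind of contact point.
VALUE_KEYS = {
    "websites": "url",
    "emails": "address",
    "phone_numbers": "number",
    "id_numbers": "number",
}


def contact_points_by_person(document):
    """Group websites, emails, phone numbers, and ID numbers by person."""
    text = document["text"]
    contacts = {}
    for person in document["persons"]:
        name = text[person["name"]["start"]:person["name"]["end"]]
        contacts[person["id"]] = {"name": name}
        for kind in VALUE_KEYS:
            contacts[person["id"]][kind] = []

    for kind, value_key in VALUE_KEYS.items():
        for entry in document[kind]:
            contacts[entry["person"]][kind].append(entry[value_key])
    return contacts


# Invented sample document for illustration.
document = {
    "text": (
        "Acme Pty Ltd (ACN 000 000 000) can be reached at "
        "help@acme.example or https://acme.example/support."
    ),
    "persons": [{"id": "per:0", "name": {"start": 0, "end": 12}}],
    "websites": [{"url": "https://acme.example/", "person": "per:0"}],
    "emails": [{"address": "help@acme.example", "person": "per:0"}],
    "phone_numbers": [],
    "id_numbers": [{"number": "000 000 000", "person": "per:0"}],
}

contacts = contact_points_by_person(document)
print(contacts["per:0"])
```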
Terms
The terms property contains an array of terms assigned definite meanings within the document.
A term object has the following properties:
- id (string): a unique identifier for the term within the document in the format term:{index} where {index} is a non-negative incrementing integer starting from zero.
- name (Span): the span within the document defining the term’s name. For example, in the phrase ‘“Agreement” means this contract between the parties’, the term’s name would be the span covering ‘Agreement’. Term names are different from and will never overlap with mentions of the term elsewhere in the document.
- meaning (Span): the span within the document providing the term’s meaning. For example, in the phrase ‘“Agreement” means this contract between the parties’, the term’s meaning would be the span covering ‘this contract between the parties’.
- mentions (array[Span]): spans where the term is mentioned outside of its definition. It is possible for the term to have no mentions if, outside of its definition, it is never referred to in the document.
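One way to turn terms into a glossary is sketched below, reusing the ‘Agreement’ example from the property descriptions. The document is a plain dictionary shaped like an ILGS Document, and glossary and span_of are hypothetical helpers.

```python
def glossary(document):
    """Map each defined term's name to its meaning and mention count."""
    text = document["text"]

    def decode(span):
        return text[span["start"]:span["end"]]

    return {
        decode(term["name"]): {
            "meaning": decode(term["meaning"]),
            "mentions": len(term["mentions"]),
        }
        for term in document["terms"]
    }


# Invented sample document for illustration.
text = (
    "\u201cAgreement\u201d means this contract between the parties. "
    "The Agreement is binding."
)


def span_of(fragment, search_from=0):
    start = text.index(fragment, search_from)
    return {"start": start, "end": start + len(fragment)}


document = {"text": text, "terms": [{
    "id": "term:0",
    "name": span_of("Agreement"),
    "meaning": span_of("this contract between the parties"),
    # The second occurrence is a mention outside of the definition.
    "mentions": [span_of("Agreement", text.index("The Agreement"))],
}]}

entries = glossary(document)
print(entries)
```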
External documents
The external_documents property contains an array of external documents referenced within the document.
An external document object has the following properties:
- id (string): a unique identifier for the external document within the document in the format exd:{index} where {index} is a non-negative incrementing integer starting from zero.
- name (Span): a span representing the most proper name of the external document of all the names by which it is referred to in the document.
- type (enum<string>): the type of the external document, being one of the supported document types (statute, regulation, decision, contract, or other).
- jurisdiction (string | null): the jurisdiction of the external document formatted the same way as the document’s jurisdiction or null if unknown.
- reception (enum<string>): the sentiment of the document towards the external document, being one of the following:
  - positive: indicating that the document expresses a favorable view of the external document whether by endorsing or approving it.
  - mixed: indicating that the document expresses both favorable and unfavorable views of the external document, for example, by affirming parts of it and disapproving others.
  - negative: indicating that the document expresses an unfavorable view of the external document whether by criticizing, repealing, overruling, or explicitly contradicting it.
  - neutral: indicating that the document references the external document without expressing any particular sentiment towards it.
- mentions (array[Span]): spans where the external document is mentioned by name, for example, ‘the US Constitution’ in ‘the First Amendment to the US Constitution protects freedom of speech’.
- pinpoints (array[Span]): spans where specific parts of the external document are referenced, for example, ‘Section 2’ in ‘as defined in Section 2 of the Crimes Act’.
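As an illustration of how these properties combine, the sketch below groups external documents by reception and decodes their names and pinpoints. It uses plain dictionaries shaped like ILGS objects; the sample sentence, the act names in it, and the helpers group_by_reception and span_of are all invented for the example.

```python
def group_by_reception(document):
    """Group external documents by reception with decoded names."""
    text = document["text"]

    def decode(span):
        return text[span["start"]:span["end"]]

    grouped = {}
    for external in document["external_documents"]:
        grouped.setdefault(external["reception"], []).append({
            "id": external["id"],
            "name": decode(external["name"]),
            "pinpoints": [decode(span) for span in external["pinpoints"]],
        })
    return grouped


# Invented sample document for illustration.
text = (
    "This Act repeals the Crimes Act. "
    "Evidence is as defined in Section 2 of the Evidence Act."
)


def span_of(fragment):
    start = text.index(fragment)
    return {"start": start, "end": start + len(fragment)}


document = {"text": text, "external_documents": [
    {"id": "exd:0", "name": span_of("Crimes Act"), "type": "statute",
     "jurisdiction": None, "reception": "negative",
     "mentions": [span_of("Crimes Act")], "pinpoints": []},
    {"id": "exd:1", "name": span_of("Evidence Act"), "type": "statute",
     "jurisdiction": None, "reception": "neutral",
     "mentions": [span_of("Evidence Act")],
     "pinpoints": [span_of("Section 2")]},
]}

grouped = group_by_reception(document)
print(grouped)
```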
Quotes
The quotes property contains an array of quotations within the document.
A quote object has the following properties:
- source_segment (string | null): the unique identifier of the segment that is the source of the quote or null if the quote is not from a segment within the document.
- source_document (string | null): the unique identifier of the external document that is the source of the quote or null if the quote is not from an external document.
- source_person (string | null): the unique identifier of the person that is the source of the quote or null if not attributable to a person.
- amending (boolean): whether the quote is being used to amend or modify content, typically in other documents.
- span (Span): the span where the quote occurs.
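Attributing a quote to its sources can be sketched as below, working on plain dictionaries shaped like ILGS objects. The sample sentence is invented, describe_quote and span_of are hypothetical helpers, and whether a real quote span includes the surrounding quotation marks is an assumption this sketch does not settle: here the sample span simply covers the quoted words.

```python
def describe_quote(document, quote):
    """Return (quoted text, list of source descriptions) for one quote."""
    text = document["text"]

    def decode(span):
        return text[span["start"]:span["end"]]

    persons = {person["id"]: person for person in document["persons"]}
    externals = {ext["id"]: ext for ext in document["external_documents"]}

    # Each source field is independently nullable, so check all three.
    sources = []
    if quote["source_person"] is not None:
        person = persons[quote["source_person"]]
        sources.append("person: " + decode(person["name"]))
    if quote["source_document"] is not None:
        external = externals[quote["source_document"]]
        sources.append("external document: " + decode(external["name"]))
    if quote["source_segment"] is not None:
        sources.append("segment: " + quote["source_segment"])
    return decode(quote["span"]), sources


# Invented sample document for illustration.
text = "Jane Doe stated: \u201cThe goods arrived late.\u201d"


def span_of(fragment):
    start = text.index(fragment)
    return {"start": start, "end": start + len(fragment)}


document = {
    "text": text,
    "persons": [{"id": "per:0", "name": span_of("Jane Doe")}],
    "external_documents": [],
    "quotes": [{
        "source_segment": None,
        "source_document": None,
        "source_person": "per:0",
        "amending": False,
        "span": span_of("The goods arrived late."),
    }],
}

described = describe_quote(document, document["quotes"][0])
print(described)
```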
Dates
The dates property contains an array of dates identified within the document belonging to a supported date type. Dates not belonging to a supported date type will not be extracted.
A date object has the following properties:
- value (string): the date in ISO 8601 format (YYYY-MM-DD).
- type (enum<string>): the type of the date, being one of the following:
  - creation: the date the document was created or last updated. There may only be one creation date per document.
  - signature: the date the document was signed.
  - effective: the date when the document or a part thereof comes into effect (e.g., commencement or enactment dates).
  - expiry: the date when the document or a part thereof is no longer in effect.
  - delivery: the date when goods or services are to be delivered under the document.
  - renewal: the date when one or more of the document’s terms are to be renewed.
  - payment: the date when payment is to be made under the document.
  - birth: the birth date of a natural person or establishment (e.g., incorporation) date of a non-natural legal person identified in the document. There can only be one birth date linked to a single person and all birth dates must be linked to a person. A person’s birth date will never be after their death date.
  - death: the death date of a natural person or dissolution date of a non-natural legal person identified in the document. There can only be one death date linked to a single person and all death dates must be linked to a person. A person’s death date will never be before their birth date.
- person (string | null): the unique identifier of the legal person associated with the date or null if the date is not associated with a person.
- mentions (array[Span]): one or more spans where the date is mentioned.
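Since date values are ISO 8601 strings in YYYY-MM-DD form, they can be parsed with Python's standard library. The sketch below groups parsed dates by type; dates_by_type is a hypothetical helper, the sample dates are invented, and mentions are omitted from the sample because the helper does not read them.

```python
from datetime import date


def dates_by_type(document):
    """Group date values by type, parsed into datetime.date objects."""
    grouped = {}
    for entry in document["dates"]:
        parsed = date.fromisoformat(entry["value"])
        grouped.setdefault(entry["type"], []).append(parsed)
    # Sort each group chronologically for stable output.
    return {kind: sorted(values) for kind, values in grouped.items()}


# Invented sample data for illustration.
document = {"dates": [
    {"value": "2024-03-01", "type": "effective", "person": None},
    {"value": "2024-01-15", "type": "signature", "person": None},
    {"value": "2025-03-01", "type": "expiry", "person": None},
]}

grouped = dates_by_type(document)
for kind, values in grouped.items():
    print(kind, [value.isoformat() for value in values])
```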
Headings
The headings property contains an array of spans within the document constituting headings.
Junk
The junk property contains an array of spans within the document constituting non-operative, non-substantive ‘junk’ content such as headers, footers, page numbers, and OCR artifacts.
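One use of junk spans is to produce a cleaned copy of the text for display. The sketch below removes a list of spans from a string; strip_spans and span_of are hypothetical helpers and the sample text is invented.

```python
def strip_spans(text, spans):
    """Return text with every given span removed."""
    pieces = []
    cursor = 0
    for span in sorted(spans, key=lambda item: item["start"]):
        if span["start"] < cursor:
            # Already covered by a span that has been removed.
            continue
        pieces.append(text[cursor:span["start"]])
        cursor = span["end"]
    pieces.append(text[cursor:])
    return "".join(pieces)


# Invented sample text for illustration.
text = (
    "Page 1 of 2\n1. Fees are due monthly.\n"
    "Page 2 of 2\n2. Fees are non-refundable."
)


def span_of(fragment):
    start = text.index(fragment)
    return {"start": start, "end": start + len(fragment)}


junk = [span_of("Page 1 of 2"), span_of("Page 2 of 2")]
cleaned = strip_spans(text, junk)
print(cleaned)
```

Removing text shifts every later index, so other spans from the same document no longer line up with the cleaned copy. Decode everything you need against the original text first.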
1 The outputs shown in this documentation may differ slightly from your experience as we continue to improve our models and enrichment capabilities, particularly in relation to models made available through the Isaacus Beta Program.