heritage package

Submodules

heritage.cli module

Command line interface for Heritage.py.

heritage.cli.dataclass_to_dict(obj: Any) → Any[source]

Convert nested dataclasses into dictionaries.
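A minimal sketch of such a recursive conversion is shown below. It is not the actual implementation of dataclass_to_dict(); the helper to_plain() and the Candidate and Word classes are hypothetical stand-ins used only for illustration.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any, List


def to_plain(obj: Any) -> Any:
    """Recursively convert dataclass instances into plain dicts and lists."""
    if is_dataclass(obj) and not isinstance(obj, type):
        return {f.name: to_plain(getattr(obj, f.name)) for f in fields(obj)}
    if isinstance(obj, (list, tuple)):
        return [to_plain(item) for item in obj]
    if isinstance(obj, dict):
        return {key: to_plain(value) for key, value in obj.items()}
    return obj


@dataclass
class Candidate:  # hypothetical stand-in for a model class
    root: str
    analyses: List[List[str]] = field(default_factory=list)


@dataclass
class Word:  # hypothetical stand-in for a model class
    text: str
    candidates: List[Candidate] = field(default_factory=list)


word = Word("phalam", [Candidate("phala", [["n.", "sg.", "nom."]])])
plain = to_plain(word)
print(plain)
```

The resulting structure contains only dictionaries, lists and scalars, so it can be passed directly to json.dumps().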

heritage.cli.build_platform(args: Namespace) → HeritagePlatform[source]

Create a configured HeritagePlatform instance from CLI arguments.

heritage.cli.configure_parser() → ArgumentParser[source]

Configure the CLI argument parser.

heritage.cli.cmd_analysis(args: Namespace, platform: HeritagePlatform) → int[source]
heritage.cli.cmd_parse(args: Namespace, platform: HeritagePlatform) → int[source]
heritage.cli.cmd_declension(args: Namespace, platform: HeritagePlatform) → int[source]
heritage.cli.cmd_conjugation(args: Namespace, platform: HeritagePlatform) → int[source]
heritage.cli.cmd_sandhi(args: Namespace, platform: HeritagePlatform) → int[source]
heritage.cli.main() → int[source]

Entry point for the CLI.

heritage.constants module

Constants

heritage.heritage module

Python Interface to The Sanskrit Heritage Site

The Sanskrit Heritage Platform can be used in two ways:

  • Web mirror - no installation required - makes HTTP requests

  • Local installation - faster - uses console - no HTTP requests required

Using Local Installation

  • Heritage_Platform/ML/ contains the scripts

  • export QUERY_STRING as a shell variable (referred to as OPTION_STRING in this code, along with the ‘&text=TEXT’ part)

  • execute various scripts, such as ./reader

  • still produces HTML output that needs to be parsed

Note: by default, the input needs to be in Devanagari. The utils.devanagari_to_velthuis() function converts it to Velthuis (VH).
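The local workflow above can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's code: run_cgi_script() is a hypothetical helper, and a small Python one-liner stands in for ./reader so that the example runs without a Heritage Platform installation.

```python
import os
import subprocess
import sys
from typing import Optional, Sequence


def run_cgi_script(
    command: Sequence[str],
    query_string: str,
    cwd: Optional[str] = None,
    timeout: int = 30,
) -> str:
    """Run a CGI-style executable with QUERY_STRING set and return its stdout."""
    env = dict(os.environ, QUERY_STRING=query_string)
    completed = subprocess.run(
        command, env=env, cwd=cwd, capture_output=True, timeout=timeout, check=True
    )
    return completed.stdout.decode("utf-8", errors="replace")


# Stand-in for a real script: it only echoes the QUERY_STRING it received.
# With a local installation one would instead pass something like
# ["./reader"] with cwd pointing at Heritage_Platform/ML/.
echo_command = [sys.executable, "-c", "import os; print(os.environ['QUERY_STRING'])"]
output = run_cgi_script(echo_command, "lex=MW&t=VH&text=raamo+gacchati")
print(output.strip())
```

The real scripts return HTML, which still has to be parsed afterwards.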

class heritage.heritage.frozendict[source]

Bases: dict

heritage.heritage.freezeargs(func)[source]

Transform mutable dictionary arguments into immutable frozen ones

Useful to be compatible with @cache. Should be added on top of @cache
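The general pattern can be sketched as below. This is a common recipe and only an assumption about how frozendict and freezeargs behave; the names FrozenDict, freeze_args and fetch are hypothetical. functools.lru_cache(maxsize=None) is used here as the equivalent of @cache.

```python
import functools


class FrozenDict(dict):
    """Dictionary that is hashable, so it can be part of a cache key."""

    def __hash__(self):
        # Assumes all values are themselves hashable.
        return hash(frozenset(self.items()))


def freeze_args(func):
    """Replace dict arguments with FrozenDict before calling func."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        args = tuple(FrozenDict(a) if isinstance(a, dict) else a for a in args)
        kwargs = {
            key: FrozenDict(value) if isinstance(value, dict) else value
            for key, value in kwargs.items()
        }
        return func(*args, **kwargs)

    return wrapper


calls = []


@freeze_args  # must sit above the cache so the cache only sees hashable args
@functools.lru_cache(maxsize=None)
def fetch(action, options):
    calls.append(action)
    return f"{action}:{sorted(options.items())}"


first = fetch("reader", {"lex": "MW"})
second = fetch("reader", {"lex": "MW"})  # served from the cache
print(first, len(calls))
```

Without the outer decorator, passing a plain dict to the cached function would raise TypeError because dictionaries are unhashable.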

class heritage.heritage.HeritageAnalysis(case: str = None, number: str = None, gender: str = None, tense: str = None)[source]

Bases: object

case: str = None
number: str = None
gender: str = None
tense: str = None
class heritage.heritage.Token[source]

Bases: object

class heritage.heritage.HeritageOutput(html: str)[source]

Bases: object

Heritage Output Parser

Parse output generated by various utilities from Heritage Platform

CLASSES = {'footer': ['enpied']}
process(html: Optional[str] = None)[source]

Process the html and extract basic information

extract_analysis(meta: bool = False, structured: bool = False)[source]

Extract analysis from HTML

Parameters:
  • meta (bool) – If True, include meta information, i.e., parse options and classes. The default is False.

  • structured (bool) – If True, return dataclass-based representations. The default is False (legacy dictionaries).

extract_parse(structured: bool = False)[source]

Extract parse from HTML

extract_declensions(headers: bool = True, structured: bool = False)[source]

Extract declension tables from HTML.

When structured is True, returns a DeclensionTable instance; otherwise returns a nested list of header/body cells.

extract_conjugations(headers: bool = True, structured: bool = False)[source]

Extract conjugation tables from HTML.

When structured is True, returns a list of ConjugationTable objects; otherwise a nested dictionary keyed by table headings.

extract_sandhi()[source]

Extract Sandhi from HTML

extract_lexicon_entry(word_id: str)[source]

Extract entry from a lexicon

extract_search_results(structured: bool = True)[source]

Extract dictionary search results.

static parse_analysis(table: Tag, structured: bool = False)[source]

Parse the analysis of a single word. The analysis format is: [root]{analysis_1 | analysis_2 | ..}

Parameters:

table (bs4.element.Tag) – Valid table element

Returns:

analyses

Return type:

list
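To illustrate the [root]{analysis_1 | analysis_2 | ..} format, the sketch below splits an already-extracted string into roots and alternatives. It is an assumption-laden illustration: the real method receives a bs4 Tag rather than a string, split_analysis() is a hypothetical helper, and the sample input is made up.

```python
import re
from typing import List, Tuple

# One "[root]{...}" group; the body may contain several "|"-separated analyses.
ANALYSIS_PATTERN = re.compile(r"\[(?P<root>[^\]]+)\]\{(?P<body>[^}]*)\}")


def split_analysis(text: str) -> List[Tuple[str, List[str]]]:
    """Return (root, [analysis, ...]) pairs found in the text."""
    results = []
    for match in ANALYSIS_PATTERN.finditer(text):
        parts = [part.strip() for part in match.group("body").split("|")]
        results.append((match.group("root"), [part for part in parts if part]))
    return results


sample = "[phala]{n. sg. nom. | n. sg. acc.}"
parsed = split_analysis(sample)
print(parsed)
```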

class heritage.heritage.HeritagePlatform(base_dir: str = '', base_url: Optional[str] = None, method: str = 'shell', **kwargs)[source]

Bases: object

The Sanskrit Heritage Platform

Access various utilities from The Sanskrit Heritage Platform

Initialize Heritage Class

Parameters:
  • base_dir (str) – Path to the Heritage_Platform repository. The directory should contain an ‘ML’ sub-directory, which in turn contains the scripts.

  • base_url (str, optional) – URL for the Heritage Platform Mirror. If None, the official INRIA website will be used. The default is None.

  • method (str, optional) –

    Method used to obtain results. Results can be obtained either using the web installation or using UNIX shell.

    Possible values are ‘shell’ and ‘web’. The default is ‘shell’.

  • **kwargs

    Additional configuration keywords. Supported values are:

    • request_timeout (int): timeout for HTTP requests in seconds.

    • request_attempts (int): number of HTTP retries before giving up.

INRIA_URL = 'https://sanskrit.inria.fr/cgi-bin/SKT/'
ACTIONS = {'conjugation': {'shell': 'conjugation', 'web': 'sktconjug.cgi'}, 'declension': {'shell': 'declension', 'web': 'sktdeclin.cgi'}, 'dictionary': {'shell': '../MW/', 'web': '../../MW/'}, 'interface': {'shell': 'interface', 'web': 'sktgraph.cgi'}, 'lemma': {'shell': 'lemmatizer', 'web': 'sktlemmatizer.cgi'}, 'parser': {'shell': 'parser', 'web': 'sktparser.cgi'}, 'reader': {'shell': 'reader', 'web': 'sktreader.cgi'}, 'sandhi': {'shell': 'sandhier', 'web': 'sktsandhier.cgi'}, 'search': {'shell': 'indexer', 'web': 'sktindex.cgi'}, 'search_easy': {'shell': 'indexerd', 'web': 'sktsearch.cgi'}, 'user': {'shell': 'user_aid', 'web': 'sktuser.cgi'}}
OPTIONS = {'font': {'default': 'deva', 'description': 'Font for Sanskrit output', 'values': {'deva': 'Devanagari', 'roma': 'Roman (IAST)'}}, 'lex': {'default': 'MW', 'description': 'Lexicon', 'values': {'MW': 'Monier-Williams Dictionary (English)', 'SH': 'Sanskrit Heritage Dictionary (French)'}}, 't': {'default': 'VH', 'description': 'Internal Transliteration Scheme', 'values': {'VH': 'Velthuis'}}}
METHODS = ['shell', 'web']
DEFAULT_METHOD = 'shell'
__init__(base_dir: str = '', base_url: Optional[str] = None, method: str = 'shell', **kwargs)[source]

Initialize Heritage Class

Parameters:
  • base_dir (str) – Path to the Heritage_Platform repository. The directory should contain an ‘ML’ sub-directory, which in turn contains the scripts.

  • base_url (str, optional) – URL for the Heritage Platform Mirror. If None, the official INRIA website will be used. The default is None.

  • method (str, optional) –

    Method used to obtain results. Results can be obtained either using the web installation or using UNIX shell.

    Possible values are ‘shell’ and ‘web’. The default is ‘shell’.

  • **kwargs

    Additional configuration keywords. Supported values are:

    • request_timeout (int): timeout for HTTP requests in seconds.

    • request_attempts (int): number of HTTP retries before giving up.

get_analysis(input_text: str, sentence: bool = True, unsandhied: bool = False, meta: bool = False, structured: bool = True)[source]

Obtain morphological analyses using The Sanskrit Reader Companion

Parameters:
  • input_text (str) – Input text to analyse

  • sentence (bool, optional) – If True, the input is treated as a sentence; otherwise it is treated as a word. The default is True.

  • unsandhied (bool, optional) – If True, the input text is assumed to not contain sandhi. The default is False.

  • meta (bool, optional) – The option is passed to HeritageOutput.extract_analysis(). The default is False.

  • structured (bool, optional) – Return dataclass objects if True, otherwise legacy dictionaries. The default is True.

Returns:

Dictionary of valid morphological analyses with solution_id as keys

Return type:

dict[int, SolutionAnalysis] | dict

get_parse(input_text: str, solution_id: Optional[int] = None, sentence: bool = True, unsandhied: bool = False)[source]

Obtain parse of a sentence using The Sanskrit Reader Companion

Parameters:
  • input_text (str) – Input text to analyse

  • solution_id (int, optional) – Solution ID to parse. If None, the first solution ID is used. The default is None.

  • sentence (bool, optional) – If True, the input is treated as a sentence; otherwise it is treated as a word. The option is passed to HeritagePlatform.get_analysis(). The default is True.

  • unsandhied (bool, optional) – If True, the input text is assumed to not contain sandhi. The option is passed to HeritagePlatform.get_analysis(). The default is False.

Returns:

Parse of the sentence. By default a heritage.models.SolutionAnalysis instance is returned, but legacy dictionary outputs are still supported when using the non-structured APIs.

Return type:

SolutionAnalysis | dict

sandhi(word_1: str, word_2: str, mode: str = 'internal')[source]

Join two words by forming a Sandhi

Parameters:
  • word_1 (str) – The first (left) word in the Sandhi

  • word_2 (str) – The second (right) word in the Sandhi

  • mode (str, optional) –

    Indicates whether the words join to form a single word or not. Possible values are,

    • internal

    • external

    The default is ‘internal’.

Returns:

sandhi – String obtained by forming the Sandhi

Return type:

str

search_inflected_form(word: str, category: str)[source]

Search an inflected form

Parameters:
  • word (str) – Sanskrit Word to search (in Devanagari)

  • category (str) –

    Type of the word
    • Noun: Noun

    • Pron: Pronoun

    • Part: Participle

    • Inde: Indeclinable

    • Absya, Abstvaa, Voca, Iic, Ifc, Iiv, Piic etc.

Returns:

matches – List of matches.

Return type:

list

get_declensions(word: str, gender: str, headers: bool = True, lexicon: Optional[str] = None, structured: bool = True)[source]

Retrieve declension tables from the Grammarian.

Parameters:
  • word (str) – Input word in Devanagari.

  • gender (str) – Gender hint. Accepted values include short forms (m, f, n) and Sanskrit labels (e.g. पु, स्त्री).

  • headers (bool, optional) – If True, include header row information. The default is True.

  • lexicon (str, optional) – Reserved for future use. Currently ignored.

  • structured (bool, optional) – When True (the default), returns a heritage.models.DeclensionTable instance. When False, returns the raw nested list produced by HeritageOutput.extract_declensions().

Returns:

Structured table, legacy list-of-lists, or None when no table can be extracted.

Return type:

DeclensionTable | list | None

get_conjugations(word: str, gana: str, lexicon: Optional[str] = None, headers: bool = True, structured: bool = True)[source]

Retrieve conjugation paradigms from the Grammarian.

Parameters:
  • word (str) – Verbal root in Devanagari.

  • gana (str) – Verbal class (gaṇa) identifier expected by the backend.

  • lexicon (str, optional) – Reserved for future use. Currently ignored.

  • headers (bool, optional) – If True, treat the first row of each table as a heading.

  • structured (bool, optional) – When True (the default), returns a list of heritage.models.ConjugationTable objects. When False, returns the legacy dictionary-of-tables output.

Returns:

Structured tables, legacy mapping, or None on failure.

Return type:

list[ConjugationTable] | dict | None

search_lexicon(word: str, lexicon: Optional[str] = None, structured: bool = True)[source]

Search a word in the dictionary.

Parameters:
  • word (str) – Sanskrit Word to search (in Devanagari)

  • lexicon (str, optional) –

    Lexicon to search the word in. Possible values are,

    • MW: Monier-Williams Dictionary

    • SH: Heritage Dictionary

    The default is ‘MW’.

Returns:

Parsed search results (the default), legacy dictionaries when structured is False, or None when the backend response cannot be parsed.

Return type:

list[SearchResult] | list[dict] | None

get_lexicon_entry(file_name: str, word_id: str)[source]

Fetch a single dictionary entry by its file and anchor identifier.

The implementation reuses the same HTML parser used for direct search results and returns a heritage.models.DictionaryEntry instance.

Parameters:
  • file_name (str) – Name of the HTML file containing the entry.

  • word_id (str) – Anchor identifier within the dictionary page.

Returns:

Parsed entry when available, otherwise None.

Return type:

DictionaryEntry | None

get_result_from_web(url: str, options: dict, attempts: Optional[int] = None, timeout: Optional[int] = None)[source]

Get results from the Heritage Platform web mirror.

Exponential backoff is used in case of network errors.

Parameters:
  • url (str) – URL of the CGI script to call. HeritagePlatform.get_url() can be used to generate supported URLs.

  • options (dict) – Dictionary containing valid options for the script

  • attempts (int, optional) – Number of attempts for the exponential backoff. The default is self.request_attempts.

  • timeout (int, optional) – Timeout for the HTTP request in seconds. The default is self.request_timeout.

Returns:

Result (HTML) obtained. Returns None when every attempt fails.

Return type:

str | None
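The retry behaviour described above can be sketched as follows. This is a generic exponential backoff loop, not the library's implementation: fetch_with_backoff(), the base_delay parameter and the fake fetchers are hypothetical, and the sleep function is injectable so the example runs instantly and without network access.

```python
import time
from typing import Callable, List, Optional


def fetch_with_backoff(
    fetch: Callable[[], str],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch() up to `attempts` times, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError:  # ConnectionError and TimeoutError are subclasses
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return None  # every attempt failed


# Demo with fake fetchers instead of real HTTP requests.
delays: List[float] = []
state = {"calls": 0}


def flaky() -> str:
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("simulated network error")
    return "<html>ok</html>"


def always_failing() -> str:
    raise TimeoutError("simulated timeout")


result = fetch_with_backoff(flaky, attempts=4, sleep=delays.append)
exhausted = fetch_with_backoff(always_failing, attempts=2, sleep=lambda seconds: None)
print(result, delays, exhausted)
```

Callers should check for None before parsing, since that is what is returned when all attempts are exhausted.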

_query_with_backoff(query_url: str, attempts: int, timeout: int)[source]

Fetch a URL with exponential backoff and robust decoding.

Returns decoded response text on success, otherwise None.

static _response_text(response: Response) → str[source]

Return response body decoded as UTF-8, avoiding mojibake.
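The sketch below illustrates why decoding the raw bytes explicitly as UTF-8 matters. It is an assumption about the idea behind this helper, not its actual code; decode_body() is hypothetical and works on plain bytes rather than on a Response object.

```python
def decode_body(raw: bytes) -> str:
    """Decode bytes as UTF-8, replacing undecodable bytes instead of failing."""
    return raw.decode("utf-8", errors="replace")


raw = "रामः".encode("utf-8")

# Decoding UTF-8 bytes with a wrongly guessed single-byte charset produces
# mojibake: every byte becomes its own (wrong) character.
garbled = raw.decode("latin-1")

decoded = decode_body(raw)
print(decoded, len(decoded), len(garbled))
```

The four Devanagari code points occupy three bytes each in UTF-8, so the wrongly decoded string is twelve characters long.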

get_result_from_shell(path: str, options: dict, timeout: int = 30)[source]

Get results from the Heritage Platform’s local installation via shell

Parameters:
  • path (str) – Path to the executable script. HeritagePlatform.get_path() can be used to generate supported paths.

  • options (dict) – Valid options for the script

  • timeout (int, optional) – Timeout in seconds, after which the function will abort. The default is 30.

Returns:

result – Result (HTML) obtained

Return type:

str

get_result(action: str, options: dict, *args, **kwargs)[source]

High-level function to obtain result for various actions

Avoids the hassle of generating the URL or PATH. Utilizes the HeritagePlatform.method attribute to determine whether to fetch through shell or web.

Parameters:
  • action (str) – Action value corresponding to the utility to be used. Refer to HeritagePlatform.ACTIONS

  • options (dict) – Valid options for the specified action

Returns:

Result (HTML) obtained

Return type:

str

get_method()[source]

Get the current method

set_method(method: str)[source]

Set method for fetching the output

Valid methods are listed in HeritagePlatform.METHODS

get_option(opt_name: str)[source]

Get the value of global options

set_option(opt_name: str, opt_value: str)[source]

Set global options

Any of these options, if expected by a particular utility from the Heritage Platform, will be directly used in the QUERY_STRING while fetching the output from that utility

The class variable OPTIONS stores the default values for the options.

Each option contains,

  • a ‘description’ of the option

  • the ‘values’ it can take (and descriptions of those values)

  • a ‘default’ value
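A minimal sketch of how such an option table can be used to derive defaults and validate values is given below. The OPTIONS dictionary is copied from the class attribute documented above; default_options() and is_valid_option() are hypothetical helpers, not part of the documented API.

```python
OPTIONS = {
    "font": {
        "default": "deva",
        "description": "Font for Sanskrit output",
        "values": {"deva": "Devanagari", "roma": "Roman (IAST)"},
    },
    "lex": {
        "default": "MW",
        "description": "Lexicon",
        "values": {
            "MW": "Monier-Williams Dictionary (English)",
            "SH": "Sanskrit Heritage Dictionary (French)",
        },
    },
    "t": {
        "default": "VH",
        "description": "Internal Transliteration Scheme",
        "values": {"VH": "Velthuis"},
    },
}


def default_options() -> dict:
    """Collect the default value of every option."""
    return {name: spec["default"] for name, spec in OPTIONS.items()}


def is_valid_option(name: str, value: str) -> bool:
    """Check that the option exists and the value is one it can take."""
    return name in OPTIONS and value in OPTIONS[name]["values"]


print(default_options())
print(is_valid_option("lex", "SH"), is_valid_option("font", "tamil"))
```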

get_font()[source]

Get current font for Sanskrit Output

set_font(font: str)[source]

Set font for Sanskrit output

get_lexicon()[source]

Get current lexicon

set_lexicon(lexicon: str)[source]

Set lexicon

get_url(action: str)[source]

URL Builder
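One plausible way to build such URLs from INRIA_URL and the ‘web’ entries of ACTIONS is sketched below. Whether get_url() resolves paths this way is an assumption; build_url() and WEB_ACTIONS (a subset of the documented ACTIONS) are hypothetical names.

```python
from urllib.parse import urljoin

INRIA_URL = "https://sanskrit.inria.fr/cgi-bin/SKT/"

# Subset of the 'web' entries from HeritagePlatform.ACTIONS.
WEB_ACTIONS = {
    "reader": "sktreader.cgi",
    "sandhi": "sktsandhier.cgi",
    "dictionary": "../../MW/",
}


def build_url(action: str, base_url: str = INRIA_URL) -> str:
    """Resolve the script name (or relative path) against the base URL."""
    if action not in WEB_ACTIONS:
        raise ValueError(f"Unknown action: {action!r}")
    return urljoin(base_url, WEB_ACTIONS[action])


print(build_url("reader"))
print(build_url("dictionary"))
```

urljoin() also normalises the ‘../../’ segments used by the dictionary entry, so the result is a clean absolute URL.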

get_path(action: str)[source]

Path Builder

valid_installation()[source]

Check if the Heritage Platform installation exists

static prepare_input(input_text: str)[source]

Prepare input:

  • Convert Devanagari to Velthuis

  • Join words with ‘+’ instead of whitespace
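The second step can be sketched as below. The transliteration step is omitted, so the input is assumed to be in Velthuis already; join_words() is a hypothetical helper rather than the actual prepare_input() code.

```python
def join_words(text: str) -> str:
    """Join whitespace-separated words with '+', collapsing repeated spaces."""
    return "+".join(text.split())


prepared = join_words("raamo  vana.m gacchati")
print(prepared)
```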

static identify_gender(gender: str)[source]

heritage.models module

Typed models used by the Heritage Platform wrapper.

class heritage.models.Method(value)[source]

Bases: str, Enum

Execution backend for the Heritage Platform.

SHELL = 'shell'
WEB = 'web'
_generate_next_value_(start, count, last_values)

Generate the next value when not given.

  • name: the name of the member

  • start: the initial start value or None

  • count: the number of existing members

  • last_value: the last value assigned or None
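Because the enumerations in this module derive from both str and Enum, their members compare equal to plain strings. The sketch below re-declares Method locally with the values documented above, purely to illustrate that behaviour without importing the package.

```python
from enum import Enum


class Method(str, Enum):
    """Local re-declaration mirroring the documented members."""

    SHELL = "shell"
    WEB = "web"


# Lookup by value returns the singleton member.
selected = Method("web")

# Members compare equal to their string values thanks to the str mixin.
print(selected is Method.WEB, Method.SHELL == "shell")
print([member.value for member in Method])
```

Looking up an unsupported value, for example Method("ftp"), raises ValueError, which gives a simple way to validate user input.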

class heritage.models.Lexicon(value)[source]

Bases: str, Enum

Available dictionary backends.

MW = 'MW'
SH = 'SH'
_generate_next_value_(start, count, last_values)

Generate the next value when not given.

  • name: the name of the member

  • start: the initial start value or None

  • count: the number of existing members

  • last_value: the last value assigned or None

class heritage.models.Font(value)[source]

Bases: str, Enum

Output font options understood by the CGI scripts.

DEVA = 'deva'
ROMA = 'roma'
_generate_next_value_(start, count, last_values)

Generate the next value when not given.

  • name: the name of the member

  • start: the initial start value or None

  • count: the number of existing members

  • last_value: the last value assigned or None

class heritage.models.SandhiMode(value)[source]

Bases: str, Enum

Modes supported by the sandhi engine.

INTERNAL = 'internal'
EXTERNAL = 'external'
_generate_next_value_(start, count, last_values)

Generate the next value when not given.

  • name: the name of the member

  • start: the initial start value or None

  • count: the number of existing members

  • last_value: the last value assigned or None

class heritage.models.AnalysisCandidate(root: str, analyses: List[List[str]], lexicon_reference: Optional[Tuple[Optional[str], Optional[str]]] = None)[source]

Bases: object

Single candidate returned by the Reader Companion.

root: str
analyses: List[List[str]]
lexicon_reference: Optional[Tuple[Optional[str], Optional[str]]] = None
class heritage.models.WordAnalysis(text: str, category: ~typing.List[~typing.Optional[str]] = <factory>, classes: ~typing.List[str] = <factory>, candidates: ~typing.List[~heritage.models.AnalysisCandidate] = <factory>)[source]

Bases: object

Analysis for a single word/token.

text: str
category: List[Optional[str]]
classes: List[str]
candidates: List[AnalysisCandidate]
class heritage.models.SolutionAnalysis(id: int, words: List[WordAnalysis], parser_options: Optional[Dict[str, str]] = None, roles: Optional[List[WordRole]] = None)[source]

Bases: object

Full solution comprising analyses for each token.

id: int
words: List[WordAnalysis]
parser_options: Optional[Dict[str, str]] = None
roles: Optional[List[WordRole]] = None
class heritage.models.WordRole(text: str, roles: List[str])[source]

Bases: object

Semantic role assignment extracted from the Reader Assistant.

text: str
roles: List[str]
class heritage.models.DeclensionTable(headers: Sequence[str], rows: Sequence[Sequence[str]])[source]

Bases: object

Declension grid produced by the grammarian.

headers: Sequence[str]
rows: Sequence[Sequence[str]]
class heritage.models.ConjugationCell(heading: str, rows: Sequence[Sequence[str]])[source]

Bases: object

Single cell produced inside a conjugation table.

heading: str
rows: Sequence[Sequence[str]]
class heritage.models.ConjugationTable(title: str, cells: Sequence[ConjugationCell])[source]

Bases: object

Grouping for conjugation paradigms.

title: str
cells: Sequence[ConjugationCell]
class heritage.models.DictionaryEntry(lemma: str, html: str, text: str)[source]

Bases: object

Dictionary entry extracted from the Heritage lexicons.

lemma: str
html: str
text: str
class heritage.models.SearchResult(entry: str, link: Optional[str], summary: str)[source]

Bases: object

Single row returned by the lexicon search interface.

entry: str
link: Optional[str]
summary: str

heritage.utils module

Utility Functions

heritage.utils.build_query_string(options: dict) → str[source]

Build a CGI-compatible QUERY_STRING.

Values set to None are dropped and literal + characters are kept intact because the Heritage CGI scripts rely on plus-separated tokens for multi-word inputs.
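The behaviour described above can be sketched as follows. This is an illustration, not the code of build_query_string(); query_string() is a hypothetical helper built on urllib.parse.quote(), and the comparison with urlencode() shows why the standard encoder alone is not suitable here.

```python
from urllib.parse import quote, urlencode


def query_string(options: dict) -> str:
    """Build a QUERY_STRING, skipping None values and keeping literal '+'."""
    pairs = []
    for key, value in options.items():
        if value is None:
            continue
        pairs.append(f"{quote(str(key), safe='+')}={quote(str(value), safe='+')}")
    return "&".join(pairs)


options = {"lex": "MW", "font": None, "text": "raamo+gacchati"}
built = query_string(options)

# The standard encoder percent-encodes '+', which would break the
# plus-separated tokens expected for multi-word input.
standard = urlencode({"text": "raamo+gacchati"})
print(built)
print(standard)
```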

heritage.utils.devanagari_to_velthuis(text: str) → str[source]

Convert Devanagari text to Velthuis

The Heritage Platform uses its own Devanagari (DN) to Velthuis (VH) conversion, which deviates from the standard one (from Wiki or other sources). The following is a translation of the JS function convert() from the Heritage Platform.

Source URL: https://sanskrit.inria.fr/DICO/utf82VH.js

Module contents

Heritage.py – Python Interface to The Sanskrit Heritage Platform