heritage package

Submodules

heritage.cli module

Console script for Heritage.

heritage.cli.main()[source]

Console script for Heritage.py

heritage.constants module

Constants

heritage.heritage module

Python Interface to The Sanskrit Heritage Site

The Sanskrit Heritage Platform can be used in two ways:

  • Web mirror - no installation required - makes HTTP requests

  • Local installation - faster - uses console - no HTTP requests required

Using Local Installation

  • Heritage_Platform/ML/ contains the scripts

  • export QUERY_STRING as a shell variable (referred to as OPTION_STRING in this code, along with the ‘&text=TEXT’ part)

  • execute various scripts, such as ./reader

  • still produces HTML output that needs to be parsed
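The steps above can be sketched as follows. This is a minimal illustration, not the package's own code: run_with_query_string is a hypothetical helper, and a child Python process stands in for ./reader so that the snippet runs without a Heritage Platform installation. The option string reuses the ‘&text=TEXT’ placeholder from the list above.

```python
import os
import subprocess
import sys


def run_with_query_string(command, query_string, cwd=None, timeout=30):
    """Run `command` with QUERY_STRING exported and return its stdout.

    Hypothetical helper for illustration only.
    """
    # Copy the current environment and add QUERY_STRING for the child process.
    env = dict(os.environ, QUERY_STRING=query_string)
    completed = subprocess.run(
        command,
        cwd=cwd,
        env=env,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return completed.stdout


# Stand-in for ./reader: a child process that echoes the variable it received.
STAND_IN = [sys.executable, "-c", "import os; print(os.environ['QUERY_STRING'])"]
output = run_with_query_string(STAND_IN, "t=VH&text=TEXT")
print(output.strip())
```

With a local installation, the call would presumably look like run_with_query_string(["./reader"], option_string, cwd="Heritage_Platform/ML"), and the returned HTML would still need to be parsed.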

By default, input needs to be in Devanagari. The utils.devanagari_to_velthuis() function converts it to Velthuis (VH).

class heritage.heritage.frozendict[source]

Bases: dict

heritage.heritage.freezeargs(func)[source]

Transform mutable dictionary arguments into immutable frozen ones.

Useful for compatibility with @cache. Should be added on top of @cache.
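A common way to write such a pair is sketched below. It is an assumption about how frozendict and freezeargs might look, not a copy of the package's code. functools.lru_cache(maxsize=None) is used here as an equivalent of @cache that also works on older Python versions; count_keys is a hypothetical function for the demonstration.

```python
import functools


class frozendict(dict):
    """A dict that is hashable, so it can be used as a cache key."""

    def __hash__(self):
        # Values must themselves be hashable for this to work.
        return hash(frozenset(self.items()))


def freezeargs(func):
    """Replace dict arguments with frozendict before calling `func`."""

    @functools.wraps(func)
    def wrapped(*args, **kwargs):
        args = tuple(
            frozendict(arg) if isinstance(arg, dict) else arg for arg in args
        )
        kwargs = {
            key: frozendict(value) if isinstance(value, dict) else value
            for key, value in kwargs.items()
        }
        return func(*args, **kwargs)

    return wrapped


calls = []


@freezeargs  # placed on top of the cache decorator
@functools.lru_cache(maxsize=None)
def count_keys(options):
    calls.append(1)  # record each real (uncached) invocation
    return len(options)


count_keys({"lex": "MW", "font": "deva"})
count_keys({"lex": "MW", "font": "deva"})  # served from the cache
```

Without freezeargs, the cached function would raise TypeError because a plain dict is unhashable.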

heritage.heritage.timeout_handler(signum, frame)[source]

class heritage.heritage.HeritageAnalysis(case: str = None, number: str = None, gender: str = None, tense: str = None)[source]

Bases: object

case: str = None
number: str = None
gender: str = None
tense: str = None

class heritage.heritage.Token[source]

Bases: object

class heritage.heritage.HeritageOutput(html: str)[source]

Bases: object

Heritage Output Parser

Parse output generated by various utilities from Heritage Platform

CLASSES = {'footer': ['enpied']}

process(html: Optional[str] = None)[source]

Process the html and extract basic information

extract_analysis(meta: bool = False)[source]

Extract analysis from HTML

Parameters:

meta (bool) – If True, include meta information, i.e., parse options and classes. The default is False.

extract_parse()[source]

Extract parse from HTML

extract_declensions(headers: bool = True)[source]

Extract declensions from HTML

extract_conjugations(headers: bool = True)[source]

Extract conjugations from HTML

extract_sandhi()[source]

Extract Sandhi from HTML

extract_lexicon_entry(word_id: str)[source]

Extract entry from a lexicon

static parse_analysis(table: Tag)[source]

Parse the analysis of a single word. The analysis format is: [root]{analysis_1 | analysis_2 | ..}

Parameters:

table (bs4.element.Tag) – Valid table element

Returns:

analyses

Return type:

list
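The analysis format described above can be illustrated on a plain string. The real method receives a bs4.element.Tag, so the sketch below covers only the string-splitting step. split_analysis is a hypothetical name, and the sample text is made up for illustration rather than taken from actual Heritage Platform output.

```python
import re

# Matches one "[root]{analysis_1 | analysis_2 | ..}" group.
ANALYSIS_PATTERN = re.compile(r"\[(?P<root>[^\]]*)\]\{(?P<analyses>[^}]*)\}")


def split_analysis(text):
    """Split "[root]{a | b}" groups into dictionaries (illustrative helper)."""
    results = []
    for match in ANALYSIS_PATTERN.finditer(text):
        # Alternatives are separated by "|"; drop empty pieces and padding.
        analyses = [
            part.strip()
            for part in match.group("analyses").split("|")
            if part.strip()
        ]
        results.append({"root": match.group("root").strip(), "analyses": analyses})
    return results


parsed = split_analysis("[deva]{nom. sg. m. | voc. sg. m.}")
print(parsed)
```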

class heritage.heritage.HeritagePlatform(base_dir: str = '', base_url: Optional[str] = None, method: str = 'shell', **kwargs)[source]

Bases: object

The Sanskrit Heritage Platform

Access various utilities from The Sanskrit Heritage Platform

Initialize Heritage Class

Parameters:
  • base_dir (str) – Path to the Heritage_Platform repository. The directory should contain an ‘ML’ sub-directory, which in turn contains the scripts.

  • base_url (str, optional) – URL for the Heritage Platform Mirror. If None, the official INRIA website will be used. The default is None.

  • method (str, optional) –

    Method used to obtain results. Results can be obtained either using the web installation or using UNIX shell.

    Possible values are ‘shell’ and ‘web’. The default is ‘shell’.

INRIA_URL = 'https://sanskrit.inria.fr/cgi-bin/SKT/'
ACTIONS = {'conjugation': {'shell': 'conjugation', 'web': 'sktconjug.cgi'}, 'declension': {'shell': 'declension', 'web': 'sktdeclin.cgi'}, 'dictionary': {'shell': '../MW/', 'web': '../../MW/'}, 'interface': {'shell': 'interface', 'web': 'sktgraph.cgi'}, 'lemma': {'shell': 'lemmatizer', 'web': 'sktlemmatizer.cgi'}, 'parser': {'shell': 'parser', 'web': 'sktparser.cgi'}, 'reader': {'shell': 'reader', 'web': 'sktreader.cgi'}, 'sandhi': {'shell': 'sandhier', 'web': 'sktsandhier.cgi'}, 'search': {'shell': 'indexer', 'web': 'sktindex.cgi'}, 'search_easy': {'shell': 'indexerd', 'web': 'sktsearch.cgi'}, 'user': {'shell': 'user_aid', 'web': 'sktuser.cgi'}}
OPTIONS = {'font': {'default': 'deva', 'description': 'Font for Sanskrit output', 'values': {'deva': 'Devanagari', 'roma': 'Roman (IAST)'}}, 'lex': {'default': 'MW', 'description': 'Lexicon', 'values': {'MW': 'Monier-Williams Dictionary (English)', 'SH': 'Sanskrit Heritage Dictionary (French)'}}, 't': {'default': 'VH', 'description': 'Internal Transliteration Scheme', 'values': {'VH': 'Velthuis'}}}
METHODS = ['shell', 'web']
DEFAULT_METHOD = 'shell'
__init__(base_dir: str = '', base_url: Optional[str] = None, method: str = 'shell', **kwargs)[source]

Initialize Heritage Class

Parameters:
  • base_dir (str) – Path to the Heritage_Platform repository. The directory should contain an ‘ML’ sub-directory, which in turn contains the scripts.

  • base_url (str, optional) – URL for the Heritage Platform Mirror. If None, the official INRIA website will be used. The default is None.

  • method (str, optional) –

    Method used to obtain results. Results can be obtained either using the web installation or using UNIX shell.

    Possible values are ‘shell’ and ‘web’. The default is ‘shell’.

get_analysis(input_text: str, sentence: bool = True, unsandhied: bool = False, meta: bool = False)[source]

Obtain morphological analyses using The Sanskrit Reader Companion

Parameters:
  • input_text (str) – Input text to analyse

  • sentence (bool, optional) – If True, the input is treated as a sentence; otherwise, it is treated as a word. The default is True.

  • unsandhied (bool, optional) – If True, the input text is assumed to not contain sandhi. The default is False.

  • meta (bool, optional) – The option is passed to HeritageOutput.extract_analysis(). The default is False.

Returns:

Dictionary of valid morphological analyses with solution_id as keys

Return type:

dict

get_parse(input_text: str, solution_id: Optional[int] = None, sentence: bool = True, unsandhied: bool = False)[source]

Obtain parse of a sentence using The Sanskrit Reader Companion

Parameters:
  • input_text (str) – Input text to analyse

  • solution_id (int, optional) – Solution ID to parse. If None, the first solution ID is used. The default is None.

  • sentence (bool, optional) – If True, the input is treated as a sentence; otherwise, it is treated as a word. The option is passed to HeritagePlatform.get_analysis(). The default is True.

  • unsandhied (bool, optional) – If True, the input text is assumed to not contain sandhi. The option is passed to HeritagePlatform.get_analysis(). The default is False.

Returns:

Parse of the sentence

Return type:

dict

sandhi(word_1: str, word_2: str, mode: str = 'internal')[source]

Join two words by forming a Sandhi

Parameters:
  • word_1 (str) – The first (left) word in the Sandhi

  • word_2 (str) – The second (right) word in the Sandhi

  • mode (str, optional) –

    Indicates whether the words join to form a single word or not. Possible values are,

    • internal

    • external

    The default is ‘internal’.

Returns:

sandhi – String obtained by forming the Sandhi

Return type:

str

search_inflected_form(word: str, category: str)[source]

Search an inflected form

Parameters:
  • word (str) – Sanskrit Word to search (in Devanagari)

  • category (str) –

    Type of the word
    • Noun: Noun

    • Pron: Pronoun

    • Part: Participle

    • Inde: Indeclinable

    • Absya, Abstvaa, Voca, Iic, Ifc, Iiv, Piic etc.

Returns:

matches – List of matches.

Return type:

list

get_declensions(word: str, gender: str, headers: bool = True, lexicon: Optional[str] = None)[source]

get_conjugations(word: str, gana: str, lexicon: Optional[str] = None)[source]

search_lexicon(word: str, lexicon: Optional[str] = None)[source]

Search a word in the dictionary

Parameters:
  • word (str) – Sanskrit Word to search (in Devanagari)

  • lexicon (str, optional) –

    Lexicon to search the word in. Possible values are,

    • MW: Monier-Williams Dictionary

    • SH: Heritage Dictionary

    The default is ‘MW’.

Returns:

matches – List of matches.

Return type:

list

get_lexicon_entry(file_name: str, word_id: str)[source]

get_result_from_web(url: str, options: dict, attempts: int = 3)[source]

Get results from the Heritage Platform web mirror. Exponential backoff is used in case of network errors.

Parameters:
  • url (str) – URL of the CGI script to call. HeritagePlatform.get_url() can be used to generate supported URLs.

  • options (dict) – Dictionary containing valid options for the script

  • attempts (int, optional) – Number of attempts for the exponential backoff. The default is 3.

Returns:

Result (HTML) obtained

Return type:

str
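Retrying with exponential backoff, as mentioned above, can be sketched as follows. This is a generic illustration under stated assumptions: the helper name fetch_with_backoff, the one-second base delay, the doubling factor and the choice of OSError as the retried exception are all assumptions, not the package's documented behaviour. The fetch callable and the sleep function are injected so the snippet runs without network access.

```python
import time


def fetch_with_backoff(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `fetch()` up to `attempts` times, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError as error:  # ConnectionError and TimeoutError are subclasses
            last_error = error
            if attempt < attempts - 1:
                # Wait 1x, 2x, 4x, ... the base delay between attempts.
                sleep(base_delay * 2 ** attempt)
    raise last_error


# Demonstration with a fake fetch that fails twice before succeeding.
outcomes = [ConnectionError("down"), ConnectionError("down"), "<html></html>"]
delays = []


def fake_fetch():
    outcome = outcomes.pop(0)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result = fetch_with_backoff(fake_fetch, attempts=3, sleep=delays.append)
```

Here the waits are recorded in delays instead of actually sleeping, which keeps the demonstration instantaneous.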

get_result_from_shell(path: str, options: dict, timeout: int = 30)[source]

Get results from the Heritage Platform’s local installation via shell

Parameters:
  • path (str) – Path to the executable script. HeritagePlatform.get_path() can be used to generate supported paths.

  • options (dict) – Valid options for the script

  • timeout (int, optional) – Timeout in seconds, after which the function will abort. The default is 30.

Returns:

result – Result (HTML) obtained

Return type:

str

get_result(action: str, options: dict, *args, **kwargs)[source]

High-level function to obtain result for various actions

Avoids the hassle of generating the URL or PATH. Utilizes the HeritagePlatform.method attribute to determine whether to fetch through shell or web.

Parameters:
  • action (str) – Action value corresponding to the utility to be used. Refer to HeritagePlatform.ACTIONS

  • options (dict) – Valid options for the specified action

Returns:

Result (HTML) obtained

Return type:

str

get_method()[source]

Get the current method

set_method(method: str)[source]

Set method for fetching the output

Valid methods are listed in HeritagePlatform.METHODS

get_option(opt_name: str)[source]

Get the value of global options

set_option(opt_name: str, opt_value: str)[source]

Set global options

Any of these options, if expected by a particular utility from the Heritage Platform, will be used directly in the QUERY_STRING while fetching the output from that utility.

The class variable OPTIONS stores the default values for options.

Each option contains,

  • a ‘description’ of the option

  • the ‘values’ it can take (and descriptions of those values)

  • a ‘default’ value
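The structure of OPTIONS suggests a simple get/set pattern, sketched below. OptionStore is a hypothetical class written for this illustration. Whether the real set_option() validates values this way, or which exceptions it raises, is not stated in this documentation and is assumed here. The OPTIONS entries are copied from the class attribute shown earlier.

```python
OPTIONS = {
    "font": {
        "default": "deva",
        "description": "Font for Sanskrit output",
        "values": {"deva": "Devanagari", "roma": "Roman (IAST)"},
    },
    "lex": {
        "default": "MW",
        "description": "Lexicon",
        "values": {
            "MW": "Monier-Williams Dictionary (English)",
            "SH": "Sanskrit Heritage Dictionary (French)",
        },
    },
}


class OptionStore:
    """Hold current option values, starting from each option's default."""

    def __init__(self, spec):
        self._spec = spec
        self._current = {name: entry["default"] for name, entry in spec.items()}

    def get_option(self, opt_name):
        return self._current[opt_name]

    def set_option(self, opt_name, opt_value):
        # Assumed behaviour: reject unknown options and unsupported values.
        if opt_name not in self._spec:
            raise KeyError(f"Unknown option: {opt_name}")
        if opt_value not in self._spec[opt_name]["values"]:
            raise ValueError(f"Invalid value for {opt_name}: {opt_value}")
        self._current[opt_name] = opt_value


store = OptionStore(OPTIONS)
store.set_option("font", "roma")
print(store.get_option("font"), store.get_option("lex"))
```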

get_font()[source]

Get current font for Sanskrit Output

set_font(font: str)[source]

Set font for Sanskrit output

get_lexicon()[source]

Get current lexicon

set_lexicon(lexicon: str)[source]

Set lexicon

get_url(action: str)[source]

URL Builder

get_path(action: str)[source]

Path Builder
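The URL and path builders presumably combine a base location with the per-action entries in ACTIONS. The sketch below shows one way to do that with the standard library. build_url and build_path are hypothetical names, and the use of urljoin and os.path.join is an assumption. INRIA_URL and the two ACTIONS entries are copied from the class attributes above, and the ‘ML’ sub-directory comes from the base_dir description.

```python
import os
from urllib.parse import urljoin

INRIA_URL = "https://sanskrit.inria.fr/cgi-bin/SKT/"
ACTIONS = {
    "reader": {"shell": "reader", "web": "sktreader.cgi"},
    "dictionary": {"shell": "../MW/", "web": "../../MW/"},
}


def build_url(action, base_url=INRIA_URL):
    """Resolve the web entry of an action against the base URL."""
    return urljoin(base_url, ACTIONS[action]["web"])


def build_path(action, base_dir):
    """Locate the shell entry of an action inside the 'ML' sub-directory."""
    return os.path.join(base_dir, "ML", ACTIONS[action]["shell"])


print(build_url("reader"))      # https://sanskrit.inria.fr/cgi-bin/SKT/sktreader.cgi
print(build_url("dictionary"))  # https://sanskrit.inria.fr/MW/
print(build_path("reader", "Heritage_Platform"))
```

Relative entries such as ‘../../MW/’ are resolved by urljoin against the base URL, which is why the dictionary URL ends up outside the cgi-bin directory.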

valid_installation()[source]

Check if the Heritage Platform installation exists

static prepare_input(input_text: str)[source]

Prepare Input

  • Convert Devanagari to Velthuis

  • Join words with ‘+’ instead of whitespace
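The second step can be sketched on its own. The Devanagari to Velthuis conversion is specific to the Heritage Platform and is not reproduced here, so the example starts from text that is already in Velthuis. join_words is a hypothetical helper name.

```python
def join_words(text: str) -> str:
    """Join whitespace-separated words with '+' (illustrative helper)."""
    # str.split() with no argument collapses runs of whitespace.
    return "+".join(text.split())


joined = join_words("tat  tvam asi")
print(joined)  # tat+tvam+asi
```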

static identify_gender(gender: str)[source]

heritage.utils module

Utility Functions

heritage.utils.build_query_string(options: dict) → str[source]

Build QUERY_STRING
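A query string of the form described earlier (options followed by the ‘&text=TEXT’ part) could be assembled as in the sketch below. This is an assumption about the general shape only. build_query_string_sketch is a hypothetical name, and the real function may order, escape or filter options differently. Values are joined without percent-encoding so that a ‘+’ used as a word separator is left untouched.

```python
def build_query_string_sketch(options: dict) -> str:
    """Join key=value pairs with '&', in insertion order (illustrative)."""
    return "&".join(f"{key}={value}" for key, value in options.items())


query = build_query_string_sketch(
    {"lex": "MW", "font": "deva", "t": "VH", "text": "tat+tvam+asi"}
)
print(query)  # lex=MW&font=deva&t=VH&text=tat+tvam+asi
```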

heritage.utils.devanagari_to_velthuis(text: str) → str[source]

Convert Devanagari text to Velthuis

The Heritage Platform uses its own Devanagari (DN) to Velthuis (VH) conversion, which deviates from the standard one (from Wiki or other sources). This function is a translation of the JS function convert() from the Heritage Platform. Source URL: https://sanskrit.inria.fr/DICO/utf82VH.js

Module contents

Heritage.py – Python Interface to The Sanskrit Heritage Platform