Symbol Layer


Datatypes for Symbols and Symbol Alphabets
typedef Symbol	Symbol
	A handle for a symbol name, i.e. a string.
typedef SymbolSet	SymbolSet
	A set of symbols aka an alphabet of symbols.
typedef SymbolIterator	SymbolIterator
	Iterator over the symbols in a SymbolSet.
typedef SymbolPair	SymbolPair
	A pair of symbols representing a transition in a transducer.
typedef SymbolPairSet	SymbolPairSet
	A set of symbol pairs aka an alphabet of symbol pairs.
typedef SymbolPairIterator	SymbolPairIterator
	Iterator over the set of symbol pairs in a SymbolPairSet.
typedef KeyTable	KeyTable
	A table for storing Key-to-Symbol associations.
Defining and Using Symbols
Symbol	define_symbol (const char *s)
	Define a symbol with name s.
bool	is_symbol (const char *s)
	Whether the string s indicates a name for a symbol.
Symbol	get_symbol (const char *s)
	Find the symbol for the symbol name s.
const char *	get_symbol_name (Symbol s)
	Find the symbol name for the symbol s.
bool	is_equal (Symbol s1, Symbol s2)
	Whether the symbol s1 is identical to symbol s2.
Defining and Using Alphabets of Symbols
SymbolSet *	create_empty_symbol_set ()
	Define an empty set of symbols.
SymbolSet *	insert_symbol (Symbol s, SymbolSet *Si)
	Insert s into the set of symbols Si and return the updated set.
bool	has_symbol (Symbol s, SymbolSet *Si)
	Whether symbol s is a member of the set of symbols Si.
Iterators over Symbols
SymbolIterator	begin_sigma_symbol (SymbolSet *Si)
	Beginning of the iterator for the symbol set Si.
SymbolIterator	end_sigma_symbol (SymbolSet *Si)
	End of the iterator for the symbol set Si.
size_t	size_sigma_symbol (SymbolSet *Si)
	Size of the iterator for the symbol set Si.
Symbol	get_sigma_symbol (SymbolIterator Si)
	Get the symbol pointed to by the symbol iterator Si.
Defining and Using Symbol Pairs
SymbolPair *	define_symbolpair (Symbol s1, Symbol s2)
	Define a symbol pair with input symbol s1 and output symbol s2.
Symbol	get_input_symbol (SymbolPair *s)
	Get the input symbol of SymbolPair s.
Symbol	get_output_symbol (SymbolPair *s)
	Get the output symbol of SymbolPair s.
Defining and Using Alphabets of Symbol Pairs
SymbolPairSet *	create_empty_symbolpair_set ()
	Define an empty set of symbol pairs.
SymbolPairSet *	insert_symbolpair (SymbolPair *p, SymbolPairSet *Pi)
	Insert p into the set of symbol pairs Pi and return the updated set.
bool	has_symbolpair (SymbolPair *p, SymbolPairSet *Pi)
	Whether symbol pair p is a member of the set of symbol pairs Pi.
Iterators over Symbol Pairs
SymbolPairIterator	begin_pi_symbol (SymbolPairSet *Pi)
	Beginning of the iterator for the symbol pair set Pi.
SymbolPairIterator	end_pi_symbol (SymbolPairSet *Pi)
	End of the iterator for the symbol pair set Pi.
size_t	size_pi_symbol (SymbolPairSet *Pi)
	Size of the iterator for the symbol pair set Pi.
SymbolPair *	get_pi_symbolpair (SymbolPairIterator pi)
	Get the symbol pair pointed to by the symbol pair iterator pi.
Defining the Connection between Symbols and Transducer Keys.
The relation 1:N between keys and symbols is useful for dealing with equivalence classes of symbols.
KeyTable *	create_key_table ()
	Create an empty key table.
bool	is_key (Key i, KeyTable *T)
	Whether i indicates an existing key in key table T.
bool	is_symbol (Symbol s, KeyTable *T)
	Whether s indicates an existing symbol in key table T.
void	associate_key (Key i, KeyTable *T, Symbol s)
	Associate the key i in the key table T with the symbol s.
Key	get_key (Symbol s, KeyTable *T)
	Find the key for the symbol s in key table T.
Key	get_unused_key (KeyTable *T)
	Return a Key which hasn't been associated to any symbol in key table T.
Symbol	get_key_symbol (Key i, KeyTable *T)
	Find a symbol for the key i in key table T.
KeySet *	get_key_set (KeyTable *T)
	A set of keys in key table T.
SymbolSet *	get_symbol_set (KeyTable *T)
	A set of symbols in key table T.
KeyTable *	read_symbol_table (istream &is, bool binary=false)
	Read a symbol table from istream is and transform it to a key table. binary defines whether the symbol table is in binary or text format.
void	write_symbol_table (KeyTable *T, ostream &os, bool binary=false)
	Transform the key table T to a symbol table and write it to ostream os. binary defines whether the symbol table is written in binary or text format.
KeyTable *	gather_flag_diacritic_table (KeyTable *kt)
	Return a new key table only including those key/symbol pairs which correspond to flag-diacritic symbol names.
Reading Symbol Strings and Transducers
Read transducers (1) in text format from pair strings and input streams and (2) in binary format from files and input streams so that the keys used in the transducer are harmonized according to a key table.
TransducerHandle	longest_match_tokenizer (KeySet *ks, KeyTable *kt)
	Create a left to right longest match tokenizer for symbols in key set ks.
TransducerHandle	longest_match_tokenizer2 (KeyTable *kt)
	Create a left to right longest match tokenizer for the symbols in key table kt.
KeyTable *	recode_key_table (KeyTable *kt, const char *epsilon_replacement)
	Replace the epsilon in kt with epsilon_replacement.
KeyPairVector *	tokenize_string_pair (TransducerHandle tokeniser, const char *upper, const char *lower, KeyTable *inputKeys)
	Change two strings into a transducer aligned character by character according to tokenisation by tokeniser. The path(s) resulting from composing the strings’ UTF-8 representations against tokeniser are paired up into a new transducer from beginning to end. Empty positions at the end are filled with ε’s.
KeyVector *	tokenize_string (TransducerHandle tokeniser, const char *string, KeyTable *inputKeys)
	Change a string into an identity pair transducer as tokenised by tokeniser.
KeyVector *	longest_match_tokenize (TransducerHandle tokenizer, const char *string, KeyTable *inputKeys)
	Use tokenizer to tokenize string.
KeyPairVector *	longest_match_tokenize_pair (TransducerHandle tokenizer, const char *string1, const char *string2, KeyTable *inputKeys)
	Use tokenizer to tokenize string1 and string2 and align the tokenized strings to a key pair vector.
KeyPairVector *	tokenize_pair_string (TransducerHandle tokeniser, char *pairs, KeyTable *inputKeys)
	Tokenise with tokeniser a string of individual characters and colon separated pairs into a transducer.
TransducerHandle	pairstring_to_transducer (const char *str, KeyTable *T)
	Create a one-path transducer as defined in pairstring form in str using the symbols defined in key table T.
TransducerHandle	read_transducer_text (istream &is, KeyTable *T, bool sfst=false)
	Make a transducer as defined in text form in istream is using the key-to-printname relations defined in key table T. The parameter sfst defines whether SFST text format is used, otherwise AT&T format is used.
bool	has_symbol_table (istream &is)
	Whether the transducer coming from istream is has a symbol table stored with it.
TransducerHandle	read_transducer (istream &is, KeyTable *T)
	Read a transducer in binary form from input stream is and harmonize it according to the key table T.
TransducerHandle	harmonize_transducer (TransducerHandle t, KeyTable *T_old, KeyTable *T_new)
	Harmonize transducer t that uses key table T_old according to key table T_new.
Writing Symbol Strings and Transducers
Write transducers (1) in text format into pair strings and output streams and (2) in binary format to output streams so that the print names associated to keys are stored with the transducer.
char *	transducer_to_pairstring (TransducerHandle t, KeyTable *T, bool spaces=true, bool print_epsilons=true)
	A pairstring representation of one-path transducer t using the symbols defined in key table T. spaces defines whether pairs are separated by spaces.
void	print_transducer (TransducerHandle t, KeyTable *T, bool print_weights=false, ostream &ostr=std::cout, bool old=false)
	Print transducer t in text format using the symbols defined in key table T. The parameter print_weights indicates whether weights are included, the output stream ostr indicates where printing is directed. Parameter old indicates whether transducer t should be printed in old SFST text format instead of AT&T format.
void	write_transducer (TransducerHandle t, KeyTable *T, ostream &os=std::cout, bool backwards_compatibility=false)
	Write t in binary form to output stream os. Key table T is stored with the transducer.
void	write_runtime_transducer (TransducerHandle t, KeyTable *kt, FILE *output_file)
	Write a transducer t with key table kt into file output_file. Write its symbols into the file with name symbol_file_name.

Detailed Description

Datatypes and functions related to symbols and the relation between symbols and keys.

Typedef Documentation

typedef KeyTable KeyTable

A table for storing Key-to-Symbol associations.

A key can be associated to several symbols but a symbol is associated to only one key.

Definition at line 57 of file symbol-layer.h.

typedef Symbol Symbol

A handle for a symbol name, i.e. a string.

Symbol is the type of a handle for a symbol that can occur in a cell of an input or output tape, or as an input or output label of a transition in a transducer. It also covers special-use symbols that do not occur on tapes but only as input or output transition labels with a special interpretation (e.g. any, default, failure), which is indicated by an attribute of the transducer.

There is a global, session-specific table of Symbol-to-string relations, called the global symbol cache. In the symbol cache, each Symbol is associated with exactly one string and each string is represented by exactly one Symbol, i.e. the relation between strings and Symbols is one-to-one.

Definition at line 34 of file symbol-layer.h.
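
The one-to-one relation between strings and Symbols can be pictured as string interning. The following is a minimal sketch, not the HFST implementation: SketchSymbol, SketchSymbolCache and its member functions are hypothetical names, and handles are assumed to be small integers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the global symbol cache; handles are vector indices.
using SketchSymbol = std::size_t;

class SketchSymbolCache {
public:
    // Return the handle for name, creating a new handle only if the name is new.
    SketchSymbol define(const std::string& name) {
        auto it = by_name_.find(name);
        if (it != by_name_.end()) return it->second;
        SketchSymbol s = names_.size();
        names_.push_back(name);
        by_name_.emplace(name, s);
        return s;
    }
    // Whether name already has a handle.
    bool is_defined(const std::string& name) const {
        return by_name_.count(name) != 0;
    }
    // Name for a handle; the handle must have been returned by define().
    const std::string& name_of(SketchSymbol s) const {
        assert(s < names_.size());
        return names_[s];
    }
private:
    std::vector<std::string> names_;                        // handle -> name
    std::unordered_map<std::string, SketchSymbol> by_name_; // name -> handle
};
```

Defining the same name twice returns the same handle, so comparing two handles is enough to compare two symbol names.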

typedef SymbolIterator SymbolIterator

Iterator over the symbols in a SymbolSet.

Definition at line 40 of file symbol-layer.h.

typedef SymbolPair SymbolPair

A pair of symbols representing a transition in a transducer.

Definition at line 43 of file symbol-layer.h.

typedef SymbolPairIterator SymbolPairIterator

Iterator over the set of symbol pairs in a SymbolPairSet.

Definition at line 49 of file symbol-layer.h.

typedef SymbolPairSet SymbolPairSet

A set of symbol pairs aka an alphabet of symbol pairs.

Definition at line 46 of file symbol-layer.h.

typedef SymbolSet SymbolSet

A set of symbols aka an alphabet of symbols.

Definition at line 37 of file symbol-layer.h.

Function Documentation

void associate_key	(	Key	i,
		KeyTable *	T,
		Symbol	s
	)

Associate the key i in the key table T with the symbol s.

The symbol that is first associated with a key becomes the primary symbol for that key. If key i has already been associated with one or more symbols not equal to s, the symbol s becomes a parallel symbol for the key i.
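
The primary and parallel symbol rule can be modelled with a small table. This is only a sketch under assumptions: SketchKey and SketchKeyTable are hypothetical, symbols are plain strings instead of handles, and rejecting a symbol that already belongs to another key is an assumed policy, not documented behaviour.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using SketchKey = unsigned short; // hypothetical key type

struct SketchKeyTable {
    std::map<SketchKey, std::vector<std::string>> symbols_of; // first entry is primary
    std::map<std::string, SketchKey> key_of;                  // a symbol has one key

    // Associate key k with symbol s; the first symbol given for k becomes primary,
    // later ones become parallel symbols.
    void associate(SketchKey k, const std::string& s) {
        auto it = key_of.find(s);
        if (it != key_of.end()) {
            assert(it->second == k); // assumption: a symbol may not move to another key
            return;                  // already associated, nothing to do
        }
        key_of[s] = k;
        symbols_of[k].push_back(s);
    }
    // Primary symbol of key k; k must exist in the table.
    const std::string& primary(SketchKey k) const {
        return symbols_of.at(k).front();
    }
};
```

With this model, associating key 0 first with "<>" and then with "<eps>" leaves "<>" as the primary symbol, while both names map back to key 0.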

SymbolPairIterator begin_pi_symbol ( SymbolPairSet * Pi )

Beginning of the iterator for the symbol pair set Pi.

SymbolIterator begin_sigma_symbol ( SymbolSet * Si )

Beginning of the iterator for the symbol set Si.

SymbolSet* create_empty_symbol_set ( )

Define an empty set of symbols.

SymbolPairSet* create_empty_symbolpair_set ( )

Define an empty set of symbol pairs.

KeyTable* create_key_table ( )

Create an empty key table.

The result has no associations defined between symbols and keys.

Symbol define_symbol ( const char * s )

Define a symbol with name s.

SymbolPair* define_symbolpair	(	Symbol	s1,
		Symbol	s2
	)

Define a symbol pair with input symbol s1 and output symbol s2.

SymbolPairIterator end_pi_symbol ( SymbolPairSet * Pi )

End of the iterator for the symbol pair set Pi.

SymbolIterator end_sigma_symbol ( SymbolSet * Si )

End of the iterator for the symbol set Si.

KeyTable* gather_flag_diacritic_table ( KeyTable * kt )

Return a new key table only including those key/symbol pairs which correspond to flag-diacritic symbol names.

Flag-diacritic symbol names begin and end with an '@'.
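
The naming rule can be checked with a few lines of code. The sketch below uses hypothetical helpers (looks_like_flag_diacritic, gather_flags_sketch) over a plain key-to-name map, and it assumes that a lone "@" does not count as a flag diacritic.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: true if name begins and ends with '@'.
// Assumption: at least two characters are required, so "@" alone does not qualify.
bool looks_like_flag_diacritic(const std::string& name) {
    return name.size() >= 2 && name.front() == '@' && name.back() == '@';
}

// Copy only the key/name pairs whose name looks like a flag diacritic.
std::map<int, std::string> gather_flags_sketch(const std::map<int, std::string>& table) {
    std::map<int, std::string> out;
    for (const auto& entry : table)
        if (looks_like_flag_diacritic(entry.second)) out.insert(entry);
    return out;
}
```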

Symbol get_input_symbol ( SymbolPair * s )

Get the input symbol of SymbolPair s.

Key get_key	(	Symbol	s,
		KeyTable *	T
	)

Find the key for the symbol s in key table T.

KeySet* get_key_set ( KeyTable * T )

A set of keys in key table T.

Symbol get_key_symbol	(	Key	i,
		KeyTable *	T
	)

Find a symbol for the key i in key table T.

If there are several symbols associated with the key, the primary symbol (the symbol that was first associated with the key) is returned.

Symbol get_output_symbol ( SymbolPair * s )

Get the output symbol of SymbolPair s.

SymbolPair* get_pi_symbolpair ( SymbolPairIterator pi )

Get the symbol pair pointed to by the symbol pair iterator pi.

Symbol get_sigma_symbol ( SymbolIterator Si )

Get the symbol pointed to by the symbol iterator Si.

Symbol get_symbol ( const char * s )

Find the symbol for the symbol name s.

Precondition:: s must refer to a symbol name. Use is_symbol to check this if you are not sure.

const char* get_symbol_name ( Symbol s )

Find the symbol name for the symbol s.

SymbolSet* get_symbol_set ( KeyTable * T )

A set of symbols in key table T.

Key get_unused_key ( KeyTable * T )

Return a Key which hasn't been associated to any symbol in key table T.

TransducerHandle harmonize_transducer	(	TransducerHandle	t,
		KeyTable *	T_old,
		KeyTable *	T_new
	)

Harmonize transducer t that uses key table T_old according to key table T_new.

See also:: read_transducer

bool has_symbol	(	Symbol	s,
		SymbolSet *	Si
	)

Whether symbol s is a member of the set of symbols Si.

bool has_symbol_table ( istream & is )

Whether the transducer coming from istream is has a symbol table stored with it.

Precondition:: The transducer is in valid format and the end of stream has not been reached. Use read_format to check this.

bool has_symbolpair	(	SymbolPair *	p,
		SymbolPairSet *	Pi
	)

Whether symbol pair p is a member of the set of symbol pairs Pi.

SymbolSet* insert_symbol	(	Symbol	s,
		SymbolSet *	Si
	)

Insert s into the set of symbols Si and return the updated set.

SymbolPairSet* insert_symbolpair	(	SymbolPair *	p,
		SymbolPairSet *	Pi
	)

Insert p into the set of symbol pairs Pi and return the updated set.

bool is_equal	(	Symbol	s1,
		Symbol	s2
	)

Whether the symbol s1 is identical to symbol s2.

bool is_key	(	Key	i,
		KeyTable *	T
	)

Whether i indicates an existing key in key table T.

bool is_symbol	(	Symbol	s,
		KeyTable *	T
	)

Whether s indicates an existing symbol in key table T.

bool is_symbol ( const char * s )

Whether the string s indicates a name for a symbol.

KeyVector* longest_match_tokenize	(	TransducerHandle	tokenizer,
		const char *	string,
		KeyTable *	inputKeys
	)

Use tokenizer to tokenize string.

The transducer tokenizer should be created using the function longest_match_tokenizer2. The key table inputKeys should contain all characters in string and be compatible with tokenizer.

KeyPairVector* longest_match_tokenize_pair	(	TransducerHandle	tokenizer,
		const char *	string1,
		const char *	string2,
		KeyTable *	inputKeys
	)

Use tokenizer to tokenize string1 and string2 and align the tokenized strings to a key pair vector.

The transducer tokenizer should be created using the function longest_match_tokenizer2. The key table inputKeys should contain all characters in string1 and string2 and be compatible with tokenizer. The tokenized strings will be aligned into a key pair vector. The shorter one of the tokenized strings will be padded with zeroes at the end.
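
The padding step can be sketched independently of the tokenizer. In the following, AKey, AKeyPair and align_sketch are hypothetical names, and key 0 is assumed to stand for epsilon.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using AKey = unsigned;                  // hypothetical key type; 0 stands for epsilon
using AKeyPair = std::pair<AKey, AKey>; // (key from string1, key from string2)

// Pair up two tokenized strings position by position,
// padding the shorter one with zeroes at the end.
std::vector<AKeyPair> align_sketch(const std::vector<AKey>& upper,
                                   const std::vector<AKey>& lower) {
    std::size_t n = std::max(upper.size(), lower.size());
    std::vector<AKeyPair> pairs;
    pairs.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        AKey u = i < upper.size() ? upper[i] : 0;
        AKey l = i < lower.size() ? lower[i] : 0;
        pairs.emplace_back(u, l);
    }
    return pairs;
}
```

The result always has the length of the longer input, and padding only ever appears at the end.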

TransducerHandle longest_match_tokenizer	(	KeySet *	ks,
		KeyTable *	kt
	)

Create a left to right longest match tokenizer for symbols in key set ks.

The keytable kt should contain the letters which make up the symbols for keys in ks. The keyset ks should not contain the key epsilon! The resulting transducer can be composed with other transducers to accomplish tokenization.

TransducerHandle longest_match_tokenizer2 ( KeyTable * kt )

Create a left to right longest match tokenizer for the symbols in key table kt.

The keytable kt should contain the letters which make up its multicharacter symbols. Tokenization can be accomplished using functions longest_match_tokenize and longest_match_tokenize_pair.
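
The library builds the tokenizer as a transducer. The strategy itself, left to right and always taking the longest known symbol, can be illustrated on plain strings. This sketch is not the library's algorithm: longest_match_sketch is a hypothetical function, it works on bytes rather than UTF-8 characters, and emitting an unmatched byte as a one-byte token is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Greedy left-to-right longest match over a set of symbol names.
// Assumption: a byte that starts no known symbol becomes a one-byte token.
std::vector<std::string> longest_match_sketch(const std::string& text,
                                              const std::set<std::string>& symbols) {
    std::size_t longest = 0;
    for (const auto& s : symbols) longest = std::max(longest, s.size());
    std::vector<std::string> tokens;
    std::size_t pos = 0;
    while (pos < text.size()) {
        // Try the longest candidate first and shrink until a symbol matches.
        std::size_t len = std::min(longest, text.size() - pos);
        while (len > 1 && !symbols.count(text.substr(pos, len))) --len;
        if (len == 0) len = 1; // empty symbol set: fall back to single bytes
        tokens.push_back(text.substr(pos, len));
        pos += len;
    }
    return tokens;
}
```

Given the symbols c, a, t, + and +pl, the text cat+pl is split into c, a, t and +pl, because +pl is preferred over the shorter +.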

TransducerHandle pairstring_to_transducer	(	const char *	str,
		KeyTable *	T
	)

Create a one-path transducer as defined in pairstring form in str using the symbols defined in key table T.

The transitions must be written one after another separated by a space. (For automatic tokenization of symbols, see tokenize_pair_string.) If the input and output symbols are not equal, they are separated by a colon. If the backslash '\' and colon ':' are part of a symbol name, they must be escaped as "\\" and "\:".

For example the string "a:\: cd:e" represents a transducer with consecutive transitions mapping "a" to ":" and "cd" to "e".
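
The pairstring syntax can be illustrated with a small parser that returns symbol-name pairs instead of a transducer. SymbolNamePair and parse_pairstring_sketch are hypothetical names. The sketch assumes that a backslash makes the following character literal, although only "\\" and "\:" are documented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using SymbolNamePair = std::pair<std::string, std::string>; // hypothetical: (input, output)

// Split a pairstring into (input, output) symbol-name pairs.
// A transition without a colon is an identity pair.
std::vector<SymbolNamePair> parse_pairstring_sketch(const std::string& str) {
    std::vector<SymbolNamePair> pairs;
    std::string input, output;
    bool seen_colon = false, have_token = false;
    auto flush = [&]() {
        if (have_token) pairs.emplace_back(input, seen_colon ? output : input);
        input.clear();
        output.clear();
        seen_colon = false;
        have_token = false;
    };
    for (std::size_t i = 0; i < str.size(); ++i) {
        char c = str[i];
        if (c == ' ') { flush(); continue; }               // transitions are space separated
        if (c == ':' && !seen_colon) {                     // unescaped colon splits input:output
            seen_colon = true;
            have_token = true;
            continue;
        }
        if (c == '\\' && i + 1 < str.size()) c = str[++i]; // escaped character, taken literally
        (seen_colon ? output : input).push_back(c);
        have_token = true;
    }
    flush();
    return pairs;
}
```

Applied to the example string above, the sketch yields the pairs ("a", ":") and ("cd", "e").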

See also:: transducer_to_pairstring

void print_transducer	(	TransducerHandle	t,
		KeyTable *	T,
		bool	print_weights = `false`,
		ostream &	ostr = `std::cout`,
		bool	old = `false`
	)

Print transducer t in text format using the symbols defined in key table T. The parameter print_weights indicates whether weights are included, the output stream ostr indicates where printing is directed. Parameter old indicates whether transducer t should be printed in old SFST text format instead of AT&T format.

In HFST the print_weights parameter is ignored.

In AT&T and SFST format, the newline, horizontal tab, carriage return, vertical tab, formfeed, bell character, backspace, backslash and space are printed as "\n", "\t", "\r", "\v", "\f", "\a", "\b", "\\" and "\0x20". In SFST format, the colon and angle brackets are printed as "\:", "\<" and "\>".
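
The escaping rules listed above can be written as a single function over a symbol name. This is a sketch of the documented mapping only; escape_symbol_sketch is a hypothetical helper and not part of the API.

```cpp
#include <cassert>
#include <string>

// Escape a symbol print name for text output.
// sfst selects the additional SFST escapes for ':', '<' and '>'.
std::string escape_symbol_sketch(const std::string& name, bool sfst) {
    std::string out;
    for (char c : name) {
        switch (c) {
            case '\n': out += "\\n"; break;
            case '\t': out += "\\t"; break;
            case '\r': out += "\\r"; break;
            case '\v': out += "\\v"; break;
            case '\f': out += "\\f"; break;
            case '\a': out += "\\a"; break;
            case '\b': out += "\\b"; break;
            case '\\': out += "\\\\"; break;
            case ' ':  out += "\\0x20"; break;
            case ':': case '<': case '>':
                if (sfst) out += '\\'; // escaped in SFST format only
                out += c;
                break;
            default: out += c;
        }
    }
    return out;
}
```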

See also:: read_transducer_text

KeyTable* read_symbol_table	(	istream &	is,
		bool	binary = `false`
	)

Read a symbol table from istream is and transform it to a key table. binary defines whether the symbol table is in binary or text format.

Key table and symbol table are two ways of representing key-to-string mappings. Key tables are used during a session and symbol tables when moving or storing information between sessions.

During a session, a key table associates keys to symbol handles and the global symbol cache associates symbol handles to strings.

Between sessions, a symbol table associates keys directly to strings, as there is no symbol cache.

A symbol table in OpenFst text format lists each symbol name and its associated key on one line. The symbol name and the associated key are separated by a tabulator. If several symbol names are associated to the same key, the one listed first is considered the primary print name for that key.

An example:


KeyTable          Global symbol cache      Symbol table            Symbol table in text format     
--------          -------------------      ------------            ---------------------------

Key  Symbol       Symbol    string         Key   string            <> TAB 0
                                                                   <eps> TAB 0
 0     0, 1         0         "<>"          0      "<>", "<eps>"   a TAB 1 
 1     2            1         "<eps>"       1      "a"             b TAB 2
 2     4            2         "a"           2      "b"             c TAB 3
 3     5            3         "A"           3      "c" 
                    4         "b"
                    5         "c"
                    6         "d"
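
The text format in the rightmost column can be read with a few lines of code. The sketch below is not the library's reader: SketchSymbolTable and read_symbol_table_sketch are hypothetical, and skipping lines that have no tab or no numeric key is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory symbol table: key -> names, first name is the primary print name.
using SketchSymbolTable = std::map<unsigned, std::vector<std::string>>;

// Read "name TAB key" lines.
// Assumption: lines without a tab or without a numeric key are skipped.
SketchSymbolTable read_symbol_table_sketch(std::istream& is) {
    SketchSymbolTable table;
    std::string line;
    while (std::getline(is, line)) {
        std::size_t tab = line.rfind('\t');
        if (tab == std::string::npos) continue;
        std::istringstream key_field(line.substr(tab + 1));
        unsigned key = 0;
        if (!(key_field >> key)) continue;
        table[key].push_back(line.substr(0, tab)); // listing order decides the primary name
    }
    return table;
}
```

Reading the five lines of the example gives four keys, with key 0 carrying "<>" as its primary name and "<eps>" as a second name.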

TransducerHandle read_transducer	(	istream &	is,
		KeyTable *	T
	)

Read a transducer in binary form from input stream is and harmonize it according to the key table T.

The following notation is used: Ts = the transducer read from istream is and S = the symbol table of transducer Ts.

Harmonization is done in the following way:

If T is empty (made with create_key_table), S is copied to T as such and all keys used in Ts remain the same i.e. no harmonization is done.

If T is not empty, the harmonization goes as follows. For each input and output key in a transition in Ts, the corresponding primary print name is looked up in S. The key value for this print name is then looked up in T and the original input or output key is replaced with this key. Epsilon keys are copied as such (the primary name of epsilon is thus defined solely by T). If a primary print name used in Ts is not found in T, it is added to T and to the global symbol cache at the next free position.

Some special cases: (1) If a key used in Ts is not found in S, it is replaced by next free key in T, but it is not added to T as it has no print name (the side effect is that the key after next free key in T is associated with a dummy Symbol, so it is recommended that all keys used in Ts are in S.) (2) Keys defined in S that are not used in Ts are not copied to T.
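
The main harmonization rule can be sketched on a simplified model in which a transducer is just a list of key pairs and a table maps each key to its primary print name. All names here (HKey, HTransitions, HNames, harmonize_sketch) are hypothetical. The sketch assumes that a non-empty T already contains key 0 and that every key used in Ts is in S, so special case (1) is not modelled.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using HKey = unsigned;
using HTransitions = std::vector<std::pair<HKey, HKey>>; // hypothetical: (input key, output key)
using HNames = std::map<HKey, std::string>;              // key -> primary print name

// Smallest key that has no entry in table t.
HKey next_free_key(const HNames& t) {
    HKey k = 0;
    while (t.count(k)) ++k;
    return k;
}

// Map one key of Ts (named by S) to the key carrying the same name in T.
HKey harmonize_key(HKey k, const HNames& S, HNames& T) {
    if (k == 0) return 0;          // epsilon is copied as such
    auto named = S.find(k);
    assert(named != S.end());      // sketch assumes every used key is in S
    for (const auto& entry : T)
        if (entry.second == named->second) return entry.first;
    HKey fresh = next_free_key(T); // unknown name: add it at the next free position
    T[fresh] = named->second;
    return fresh;
}

// Rewrite the keys of Ts so that they agree with T; T may grow.
HTransitions harmonize_sketch(const HTransitions& Ts, const HNames& S, HNames& T) {
    if (T.empty()) { T = S; return Ts; } // empty T: copy S, keys stay unchanged
    HTransitions out;
    for (const auto& tr : Ts) {
        HKey in = harmonize_key(tr.first, S, T);   // input side first, then output,
        HKey outk = harmonize_key(tr.second, S, T); // so new keys are assigned in a fixed order
        out.emplace_back(in, outk);
    }
    return out;
}
```

Only names that actually occur in Ts reach T, which matches special case (2).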

Precondition:: The transducer read from istream is must have a symbol table stored with it.

Returns:: The harmonized version of the transducer read from istream is. If end of stream is reached, NULL.

TransducerHandle read_transducer_text	(	istream &	is,
		KeyTable *	T,
		bool	sfst = `false`
	)

Make a transducer as defined in text form in istream is using the key-to-printname relations defined in key table T. The parameter sfst defines whether SFST text format is used, otherwise AT&T format is used.

In AT&T and SFST format, the newline, horizontal tab, carriage return, vertical tab, formfeed, bell character, backspace, backslash and space must be escaped as "\n", "\t", "\r", "\v", "\f", "\a", "\b", "\\" and "\0x20". In SFST format, the colon and angle brackets must be escaped as "\:", "\<" and "\>".

An example of a transducer file:

AT&T                                       AT&T UNWEIGHTED               SFST                         

0      0                                   0                             final  0
0      1      a      aa     0.3            0      1      a      aa       0      a:aa   1
0      2      b      b      0              0      2      b      b        0      b      2
1      0      c      C      0.5            1      0      c      C        1      c:C    0
2      1      \n     c      0              2      1      \n     c        2      \n:c   1
2      0      a      A      1.2            2      0      a      A        2      a:A    0
2      2      d      D      1.65           2      2      d      D        2      d:D    2
2      0.5                                 2                             final  2

The syntax of the lines in the text format is one of the following in the AT&T format:

originating_node TAB destination_node TAB input_symbol TAB output_symbol (TAB transition_weight)
final_node (TAB final_weight)

and one of the following in SFST format:

originating_node TAB input_symbol:output_symbol TAB destination_node
final TAB final_node

When AT&T format is used in HFST, weights are ignored. When SFST or AT&T unweighted format is used in HWFST, weights are set to zero.
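
The two AT&T line shapes can be told apart by their number of tab-separated fields. The sketch below uses a hypothetical AttLine record and parse_att_line_sketch function. It assumes integer state numbers, treats a missing weight as zero, and leaves escape sequences in symbol names undecoded.

```cpp
#include <cassert>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one line of AT&T text format.
struct AttLine {
    bool is_final = false;
    int source = 0, target = 0; // target is not set for final lines
    std::string input, output;
    double weight = 0.0;        // zero when the optional weight is absent
};

// Split on tabs and classify the line by its field count.
// Returns false for lines that match neither documented shape.
bool parse_att_line_sketch(const std::string& line, AttLine& out) {
    std::vector<std::string> f;
    std::istringstream fields(line);
    for (std::string field; std::getline(fields, field, '\t');) f.push_back(field);
    try {
        if (f.size() == 1 || f.size() == 2) { // final_node (TAB final_weight)
            out.is_final = true;
            out.source = std::stoi(f[0]);
            out.weight = f.size() == 2 ? std::stod(f[1]) : 0.0;
            return true;
        }
        if (f.size() == 4 || f.size() == 5) { // transition (TAB transition_weight)
            out.is_final = false;
            out.source = std::stoi(f[0]);
            out.target = std::stoi(f[1]);
            out.input = f[2];
            out.output = f[3];
            out.weight = f.size() == 5 ? std::stod(f[4]) : 0.0;
            return true;
        }
    } catch (const std::exception&) {
        // non-numeric state or weight field
    }
    return false;
}
```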

Precondition:: All printnames used in the text format representation of the transducer must be in the key table T.

Returns:: A transducer as defined in is. If end of stream is reached, NULL.

See also:: print_transducer

KeyTable* recode_key_table	(	KeyTable *	kt,
		const char *	epsilon_replacement
	)

Replace the epsilon in kt, with epsilon_replacement.

When tokenizing input-strings, the strings should never contain a substring matching the symbol name of the epsilon key in the KeyTable used in tokenization. Therefore the epsilons in the tokenizer should be replaced by an internal epsilon-symbol, which is unlikely to occur in real input-strings.

recode_key_table returns a KeyTable, which is the same as kt, except the key 0 corresponds to the internal epsilon symbol name epsilon_replacement and the original epsilon symbol name corresponds to the first unused key in kt.
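
The effect on the table can be sketched with a plain key-to-name map. RKeyNames and recode_sketch are hypothetical names, the replacement name "@_EPS_@" in the usage below is only an example, and the sketch assumes that key 0 is defined in kt.

```cpp
#include <cassert>
#include <map>
#include <string>

using RKeyNames = std::map<unsigned, std::string>; // hypothetical: key -> symbol name

// Copy kt, move the original epsilon name to the first unused key,
// and give key 0 the internal epsilon name instead.
RKeyNames recode_sketch(const RKeyNames& kt, const std::string& epsilon_replacement) {
    auto eps = kt.find(0);
    assert(eps != kt.end());           // sketch assumes key 0 (epsilon) is defined
    RKeyNames out = kt;
    unsigned unused = 0;
    while (kt.count(unused)) ++unused; // first key without a symbol in kt
    out[unused] = eps->second;
    out[0] = epsilon_replacement;
    return out;
}
```

For a table with keys 0 to 2, calling recode_sketch(kt, "@_EPS_@") leaves keys 1 and 2 untouched, puts "@_EPS_@" at key 0 and moves the old epsilon name to key 3.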

size_t size_pi_symbol ( SymbolPairSet * Pi )

Size of the iterator for the symbol pair set Pi.

size_t size_sigma_symbol ( SymbolSet * Si )

Size of the iterator for the symbol set Si.

KeyPairVector* tokenize_pair_string	(	TransducerHandle	tokeniser,
		char *	pairs,
		KeyTable *	inputKeys
	)

Tokenise with tokeniser a string of individual characters and colon separated pairs into a transducer.

E.g. a string cat+pl:s will be made to c a t +pl:s given that tokeniser creates such tokens.

Parameters:

	tokeniser	A transducer that, upon composing leftwards against transducer made of UTF-8 characters of string, results in acyclic tokenisation(s) of original path.
	pairs	UTF-8 encoded string for transducer
	inputKeys	`KeyTable` that matches mapping of UTF-8 characters on input side of tokeniser.

Returns:: Transducer that contains as paths all possible aligned tokenisation(s) of upper : lower.

Todo:: does not support ambiguous tokenisations (i.e. with more than one path).

KeyVector* tokenize_string	(	TransducerHandle	tokeniser,
		const char *	string,
		KeyTable *	inputKeys
	)

Change a string into an identity pair transducer as tokenised by tokeniser.

E.g. a string cat will be tokenised as transducer c a t, given that tokeniser creates tokens for c, a, and t.

Parameters:

	tokeniser	A transducer that, upon composing leftwards against transducer made of UTF-8 characters of string, results in acyclic tokenisation(s) of original path.
	string	UTF-8 encoded string for transducer pairs.
	inputKeys	`KeyTable` that matches mapping of UTF-8 characters on input side of tokeniser.

Returns:: Transducer that contains as paths the tokenisation(s) of string by tokeniser.

Todo:: does not support ambiguous tokenisations (i.e. with more than one path).

KeyPairVector* tokenize_string_pair	(	TransducerHandle	tokeniser,
		const char *	upper,
		const char *	lower,
		KeyTable *	inputKeys
	)

Change two strings into a transducer aligned character by character according to tokenisation by tokeniser. The path(s) resulting from composing the strings’ UTF-8 representations against tokeniser are paired up into a new transducer from beginning to end. Empty positions at the end are filled with ε’s.

E.g. strings cat dog are aligned as c:d a:o t:g. Strings ääliö ääliöitä are aligned as ä ä l i ö ε:i ε:t ε:ä. And talo+NOUN+SINGULAR+NOMINATIVE talo as t a l o +NOUN:ε +SINGULAR:ε +NOMINATIVE:ε, given that tokeniser and keytable contain those symbols.

If specific alignment is required, it is possible to specify ε’s manually using the string for ε that is defined in inputKeys.

A tokeniser may be built manually or with functions such as longestMatchTokeniser(...).

Parameters:

	tokeniser	A transducer that, upon composing leftwards against transducer made of UTF-8 characters of string, results in acyclic tokenisation(s) of original path.
	upper	UTF-8 encoded string for input side of transducer.
	lower	UTF-8 encoded string for output side of transducer.
	inputKeys	`KeyTable` that matches mapping of UTF-8 characters on input side of tokeniser.

Returns:: Transducer that contains as paths all possible aligned tokenisation(s) of upper : lower.

Todo:: does not support ambiguous tokenisations (i.e. with more than one path).

char* transducer_to_pairstring	(	TransducerHandle	t,
		KeyTable *	T,
		bool	spaces = `true`,
		bool	print_epsilons = `true`
	)

A pairstring representation of one-path transducer t using the symbols defined in key table T. spaces defines whether pairs are separated by spaces.

The transitions are printed one after another, separated by spaces if so requested. If the input and output symbols are not equal, they are separated by a colon. If the backslash '\' and colon ':' are part of a symbol print name, they are escaped as "\\" and "\:".

The empty transducer is represented by "\empty_transducer" and the epsilon transducer as "EPS" where EPS is the symbol name for epsilon (pairstring_to_transducer recognizes "" as the epsilon transducer, but "EPS" is a more user-friendly notation). If the symbol name for epsilon is not defined, "\epsilon" is returned.

See also:: pairstring_to_transducer

void write_runtime_transducer	(	TransducerHandle	t,
		KeyTable *	kt,
		FILE *	output_file
	)

Write a transducer t with key table kt into file output_file. Write its symbols into the file with name symbol_file_name.

void write_symbol_table	(	KeyTable *	T,
		ostream &	os,
		bool	binary = `false`
	)

Transform the key table T to a symbol table and write it to ostream os. binary defines whether the symbol table is written in binary or text format.

See also:: read_symbol_table

void write_transducer	(	TransducerHandle	t,
		KeyTable *	T,
		ostream &	os = `std::cout`,
		bool	backwards_compatibility = `false`
	)

Write t in binary form to output stream os. Key table T is stored with the transducer.

Parameters:

	t	Transducer to be written
	T	Key table that is stored with the transducer
	os	Where transducer is written
	backwards_compatibility	Whether the transducer is written in SFST/OpenFst compatible format.
