HFST - Helsinki Finite-State Transducer Technology - Python API  version 3.9.0
Quick Start to HFST

Using HFST in your own code

After installing HFST on your computer, start python3 and execute import libhfst.

For example, the following simple program

 import libhfst
 
 tr1 = libhfst.regex('foo:bar')
 tr2 = libhfst.regex('bar:baz')
 tr1.compose(tr2)
 print(tr1)

should print to standard output the following text when run:

 0      1     foo    baz    0
 1      0
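
To see why the result maps foo to baz, it can help to treat each transducer as a set of (input, output) symbol pairs. The sketch below is a pure-Python illustration that does not use libhfst. The function name compose_relations is hypothetical, and HFST itself composes automata state by state rather than enumerating pairs.

```python
def compose_relations(first, second):
    """Compose two relations given as sets of (input, output) pairs.

    A pair (x, z) is in the result if some y exists such that
    (x, y) is in `first` and (y, z) is in `second`.
    """
    return {(x, z) for (x, y) in first for (y2, z) in second if y == y2}


tr1 = {("foo", "bar")}  # plays the role of libhfst.regex('foo:bar')
tr2 = {("bar", "baz")}  # plays the role of libhfst.regex('bar:baz')
composed = compose_relations(tr1, tr2)
print(composed)  # prints {('foo', 'baz')}
```

The intermediate symbol bar disappears from the result, which is why the printed transducer has a single foo:baz transition.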


Structure of the API

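The HFST API is written in module libhfst. Its classes include those used in the examples on this page: HfstTransducer, HfstBasicTransducer and HfstTokenizer, and the exception classes HfstException and TransducerIsCyclicException.

There are also functions in module libhfst that are not part of any class, for example libhfst.fst, and libhfst.regex and libhfst.tokenized_fst, which are used in the examples on this page.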


Examples of HFST functionalities

An example of creating a simple transducer from scratch, converting between transducer formats, testing transducer properties and handling exceptions:

 import libhfst
 # Create an HFST basic transducer [a:b] with transition weight 0.3 and final weight 0.5.
 t = libhfst.HfstBasicTransducer()
 t.add_state(1)
 t.add_transition(0, 1, 'a', 'b', 0.3)
 t.set_final_weight(1, 0.5)

 # Convert to tropical OpenFst format (the default) and push weights toward final state.
 T = libhfst.HfstTransducer(t)
 T.push_weights(libhfst.TO_FINAL_STATE)

 # Convert back to HFST basic transducer.
 tc = libhfst.HfstBasicTransducer(T)
 try:
     # The pushed final weight should be 0.3 + 0.5 = 0.8; allow for rounding.
     if 0.79 < tc.get_final_weight(1) < 0.81:
         print("TEST PASSED")
         exit(0)
     else:
         print("TEST FAILED")
         exit(1)
 # Raised if the state does not exist or is not final.
 except libhfst.HfstException:
     print("TEST FAILED: An exception thrown.")
     exit(1)
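
The expected value 0.8 follows from tropical-semiring arithmetic, where weights along a path are added: pushing toward the final state moves the transition weight 0.3 onto the final weight 0.5. The following pure-Python sketch does not use libhfst and covers only the special case of a single linear path. The function push_to_final is hypothetical and is not HFST's general algorithm.

```python
def push_to_final(transition_weights, final_weight):
    """Push weights toward the final state of a single linear path.

    In the tropical semiring, weights along a path are combined by
    addition, so every transition weight can be moved onto the final
    weight without changing the total weight of the path.
    """
    pushed_transitions = [0.0] * len(transition_weights)
    pushed_final = final_weight + sum(transition_weights)
    return pushed_transitions, pushed_final


# The transducer [a:b] from the example: one transition, one final state.
transitions, final = push_to_final([0.3], 0.5)

# Compare with a tolerance instead of ==, as the example above does.
print(transitions, abs(final - 0.8) < 1e-6)  # prints [0.0] True
```

The total path weight is the same before and after pushing; only its distribution between the transition and the final state changes.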

An example of creating transducers from strings, applying rules to them and printing the string pairs recognized by the resulting transducer:

 import libhfst
 libhfst.set_default_fst_type(libhfst.FOMA_TYPE)
 
 # Create a simple lexicon transducer [[foo bar foo] | [foo bar baz]].
 tok = libhfst.HfstTokenizer()
 tok.add_multichar_symbol('foo')
 tok.add_multichar_symbol('bar')
 tok.add_multichar_symbol('baz')
 
 words = libhfst.tokenized_fst(tok.tokenize('foobarfoo'))
 t = libhfst.tokenized_fst(tok.tokenize('foobarbaz'))
 words.disjunct(t)
 
 # Create a rule transducer that optionally replaces 'bar' with 'baz' between 'foo' and 'foo'.
 rule = libhfst.regex('bar (->) baz || foo _ foo')
 
 # Apply the rule transducer to the lexicon.
 # compose() and minimize() both modify words in place.
 words.compose(rule)
 words.minimize()
 
 # Extract all string pairs from the result and print them to standard output.
 results = 0
 try:
     # Extract paths with tokenization removed: a dictionary that maps each
     # input string to its (output string, weight) pairs.
     results = words.extract_paths(output='dict')
 except libhfst.TransducerIsCyclicException:
     # This should not happen because the transducer is not cyclic.
     print("TEST FAILED")
     exit(1)
 
 # Each value is a sequence of (output string, weight) pairs.
 for input_string, outputs in results.items():
     print('%s:' % input_string)
     for output_string, weight in outputs:
         print('  %s\t%f' % (output_string, weight))

The output:

 foobarfoo:
   foobarfoo     0.000000
   foobazfoo     0.000000
 foobarbaz:
   foobarbaz     0.000000
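
The string pairs above can be reproduced without libhfst by enumerating the rule's choices directly. The sketch below is a plain-Python illustration and assumes that tokenization is greedy longest-match over the multicharacter symbols. The functions tokenize and apply_optional_rule are hypothetical; HFST compiles the rule into a transducer rather than enumerating strings, and weights are omitted here.

```python
from itertools import product


def tokenize(text, multichar_symbols):
    """Split text into symbols, preferring the longest multichar symbol."""
    symbols = sorted(multichar_symbols, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        # Fall back to a single character if no multichar symbol matches.
        match = next((s for s in symbols if text.startswith(s, i)), text[i])
        tokens.append(match)
        i += len(match)
    return tokens


def apply_optional_rule(tokens, target, replacement, left, right):
    """Return every output of 'target (->) replacement || left _ right'."""
    choices = []
    for i, tok in enumerate(tokens):
        in_context = (
            tok == target
            and 0 < i < len(tokens) - 1
            and tokens[i - 1] == left
            and tokens[i + 1] == right
        )
        # An optional rule keeps the original symbol as one alternative.
        choices.append([tok, replacement] if in_context else [tok])
    return ["".join(combo) for combo in product(*choices)]


symbols = ["foo", "bar", "baz"]
results = {
    word: apply_optional_rule(tokenize(word, symbols), "bar", "baz", "foo", "foo")
    for word in ["foobarfoo", "foobarbaz"]
}
for word, outputs in results.items():
    print("%s:" % word)
    for output in outputs:
        print("  %s" % output)
```

Only foobarfoo yields two outputs, because in foobarbaz the symbol bar is followed by baz and the right context foo does not match.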