JAutomata Documentation

Here you'll find a short tutorial on the JAutomata API. Detailed information on classes and methods can be found in the Javadocs.

The Automaton Class

The most important structure in the JAutomata API is the Automaton<L, K> interface. Every automaton class implements this interface, and every automaton is an instance of such a such a class. The following methods of this class define an automaton:
Collection<Object> initialStates() the set of initial states
Collection<Object> transitionsOut(Object state) the set of outgoing transition of the specified state
K initialWeight(Object state) the initial weight of the specified state
K initialWeight(Object state) the final weight of the specified state
Object from(Object transition) the source state of the specified transition
Object to(Object transition) the destination state of the specified transition
L label(Object transition) the label of the specified transition
K weight(Object transition) the weight of the specified transition
Semiring semiring() the semiring over which the automaton is defined
States and transition can be any type of object, such as Integer, String, or other types. The only requirement is that the equals(Object) and hashCode() methods are properly implemented. Automaton<L, K> is generic, with type parameters L and K.

Label types, Acceptors, Transducers, and Multi-Tape Automata

Type parameter L indicates the label type. JAutomata supports acceptors, transducers, and multi-tape automata. The label type specifies to which of these types an automaton belongs. Also, the label type determines the alphabet(s) of the automaton. Acceptors are automata with a single tape. The label type specifies the type of the elements of the alphabet. This can be any type, such as Character or String.

Transducers are automata with two tapes: an input tape and an output tape. In JAutomata, transducers have generic label type TLabel<I, O>, where I is the type of the input alphabet and O is the type of the output alphabet. The type TLabel<I, O> has methods I in() and O out(), which return the input label and the output label. Transducers optionally implement the convenience interface Transducer<I, O, K>, which extends Automaton<TLabel<I, O>, K>. The Transducer<I, O, K> interface has no additional methods, but by using it instead of Automaton<TLabel<I, O>, K> limits the size of type specifications.

Multi-tape automata are automata with an arbitrary number of tapes. In JAutomata, multi-tape automata have generic label type MTLabel<T, L>, where T is the tape type and L is the tape label type. A multi-tape label assigns a tape label to every tape. The type MTLabel<T, L> has a method Collection<T> tapes(), which returns the set of tapes of this label. The method L tapeLabel(T tape) returns the tape label of the specified tape. Multi-tape automata implement the interface MTAutomaton<T, L, K>, which extends Automaton<MTLabel<T, L>, K>. This interface has the method Collection<T< tapes(), which returns the tapes of the automaton. Also, using it instead of Automaton<MTLabel<T, L>, K> limits the size of type specifications.

Unweighted and Weighted Automata and Semirings

JAutomata supports both weighted and unweighted automata. Whether an automaton is weighted depends on its weight type. The weight type an unweighted automaton is Boolean. An unweighted automaton defines a set of strings over an alphabet. Unweighted automata are equivalent to regular expressions, i.e. each set of strings defined by a regular expression can be defined a an unweighted automaton, and vice versa. The weight type of a weighted automaton is Double. A weighted automaton defines a function that maps strings to real number. A common application is to describe probability distributions over strings. In addition to a weight type, every automaton is defined over a semiring. In JAutomata, semirings are instances of classes that implement the Semiring<K> interface, where K is the weight type. The semiring() method returns the semiring of an automaton. Examples of semirings are the boolean semiring, the probabilistic semiring, and the tropical semiring. Unweighted automata are defined over the boolean semiring, which implements Semiring<Boolean>. Weighted automata are defined over semirings that implements Semiring<Double>, such as the probabilistic semiring and the tropical semiring. The probabilistic semiring is used to compute string probabilities, and the tropical semiring is used to compute the shortest path over an automaton.

Topological Orders over States

An automaton can optionally provide a topological order on its states. The topological order is an implementation of Comparator<Object>. An automaton's topological order is returned by its topologicalOrder() method. Topological orders are used by certain algorithms to make them faster. If an automaton has no topological order or if its topological order is not known, then the topologicalOrder() method returns null.

Creating Automata

Automata can be created in a variety of ways. They can be read from files, they can be created using Java code, or they can be created from existing automata, by performing operations on them. Here I briefly describe the first two methods. Operations are described below.

Reading Automata from Files

JAutomata supports file input for acceptors and transducers. Acceptors are read from files using the AcceptorIO class, and transducers are read from files using the TransducerIO class. Both classes contain static methods readUnweighted, readWeighted and read. readUnweighted is used for unweighted automata, and readWeighted is used for weighted automata over the probabilistic semiring. read is used for automata over arbitrary semirings, specified by an argument.

In the following example, an unweighted acceptor is read from a file:

  1. File file = new File("filename");
  2. InputStreamReader isr = new InputStreamReader(new FileInputStream(file, "UTF-8"));
  3. Automaton<Character, Boolean> acceptor = AcceptorIO.readUnweighted(isr);

The first two lines create a Reader for a file. The third line loads an unweighted acceptor from the file.

Similar to the previous example, a weighted transducer over the probabilistic semiring is read from a file in the following way:

  1. File file = new File("filename");
  2. InputStreamReader isr = new InputStreamReader(new FileInputStream(file, "UTF-8"));
  3. Automaton<Character, Double> transducer = TransducerIO.readWeighted(isr);

A weighted transducer over the tropical semiring is read from a file in the following way:

  1. File file = new File("filename");
  2. InputStreamReader isr = new InputStreamReader(new FileInputStream(file, "UTF-8"));
  3. Automaton<Character, Double> transducer = TransducerIO.read(isr, new TropicalSemiring());

Creating Automata Using Java Code

One way of creating an automaton is by creating a class that implements Automaton<L, K> and by creating an instance of that class. Though this is the most flexible way, it is also the most time consuming way. For many purposes, there are some predefined classes available. One of them is EditableAutomaton<L, K>, which allows automata to be defined using its addState and addTransition methods. The states and transitions of this class are integers. The method addState creates a state with the specified initial and final weights, and returns the int that identifies the newly created state. The method addTransition creates a transition with the specified source state, destination state, label and weight, and returns the int that identifies the newly created transition. In the following example, a weighted acceptor over the probabilistic semiring is created that accepts strings that start with "b" followed by an arbitrary number of "a"s ("b", "ba", "baa", "baaa", etc). The string weights of the automaton decrease exponentially, such that their total sum is one.

  1. Automaton<Character, Double> a = new EditableAutomaton<Character, Double>(new RealSemiring()); // probabilistic semiring uses RealSemiring
  2. int s1 = a.addState(1.0, 0.0); // Create initial state (initial weight 1.0, final weight 0.0)
  3. int s2 = a.addState(0.0, 1.0); // Create final state (initial weight 0.0, final weight 1.0)
  4. a.addTransition(s1, s2, 'b', 0.4); // Create transition from s1 to s2
  5. a.addTransition(s2, s2, 'a', 0.6); // Create transition from s2 to s2

Weights, Shortest Paths, and Best Strings

JAutomata provides methods to compute the weights of strings, the n shortest paths, and the n best strings. Methods for these computations are provided by the Automata class (not to be confused with the Automaton<L, K> interface).

The weight of a string can be computed using the Automata.stringWeight(Automaton<L, K> automaton, List<L> string) method. Strings of elements are represented by Lists. For automata with label type Character, there is also the convenience method Automata.stringWeight(Automaton<L, K> automaton, String string), which allows strings of characters to be described by String objects.

The n shortest paths of an automaton can be computed using the Automata.shortestPaths(Automaton<L, K> automaton, int numPaths) method. This method returns a list of up to n paths. A path has type Path>L, K> and contains the sequence of transitions that define the path, its concatenated label, and the weight of the path. In case of a probabilistic automaton, the method returns the paths with the highest probability, and in case of an unweighted automaton, the method returns up to n accepted paths.

The n best strings of an automaton can be computed using the Automata.bestStrings(Automaton<L, K> automaton, int numStrings) method. This method returns a list of up to n paths with unique strings. In case of an unweighted automaton, the method returns up to n accepted strings.

Operations

Operations take one or more automata as input, and produce a new automaton as output. JAutomata supports a number of operations on automata:

Most operations are found as static methods in the Operations class. Transducer-specific operations are found in the Transducers class, and operations specific to multi-taoe automata are found in the MTAutomata class.