Here you'll find a short tutorial on the JAutomata API. Detailed information on classes and methods can be found in the Javadocs.
The most important structure in the JAutomata API is the Automaton<L, K> interface. Every automaton class implements this interface, and every automaton is an instance of such a such a class. The following methods of this class define an automaton:
| Collection<Object> initialStates() | the set of initial states |
| Collection<Object> transitionsOut(Object state) | the set of outgoing transition of the specified state |
| K initialWeight(Object state) | the initial weight of the specified state |
| K initialWeight(Object state) | the final weight of the specified state |
| Object from(Object transition) | the source state of the specified transition |
| Object to(Object transition) | the destination state of the specified transition |
| L label(Object transition) | the label of the specified transition |
| K weight(Object transition) | the weight of the specified transition |
| Semiring |
the semiring over which the automaton is defined |
Type parameter L indicates the label type. JAutomata supports acceptors, transducers, and multi-tape automata. The label type specifies to which of these types an automaton belongs. Also, the label type determines the alphabet(s) of the automaton. Acceptors are automata with a single tape. The label type specifies the type of the elements of the alphabet. This can be any type, such as Character or String.
Transducers are automata with two tapes: an input tape and an output tape. In JAutomata, transducers have generic label type TLabel<I, O>, where I is the type of the input alphabet and O is the type of the output alphabet. The type TLabel<I, O> has methods I in() and O out(), which return the input label and the output label. Transducers optionally implement the convenience interface Transducer<I, O, K>, which extends Automaton<TLabel<I, O>, K>. The Transducer<I, O, K> interface has no additional methods, but by using it instead of Automaton<TLabel<I, O>, K> limits the size of type specifications.
Multi-tape automata are automata with an arbitrary number of tapes. In JAutomata, multi-tape automata have generic label type MTLabel<T, L>, where T is the tape type and L is the tape label type. A multi-tape label assigns a tape label to every tape. The type MTLabel<T, L> has a method Collection<T> tapes(), which returns the set of tapes of this label. The method L tapeLabel(T tape) returns the tape label of the specified tape. Multi-tape automata implement the interface MTAutomaton<T, L, K>, which extends Automaton<MTLabel<T, L>, K>. This interface has the method Collection<T< tapes(), which returns the tapes of the automaton. Also, using it instead of Automaton<MTLabel<T, L>, K> limits the size of type specifications.
JAutomata supports both weighted and unweighted automata. Whether an automaton is weighted depends on its weight type. The weight type an unweighted automaton is Boolean. An unweighted automaton defines a set of strings over an alphabet. Unweighted automata are equivalent to regular expressions, i.e. each set of strings defined by a regular expression can be defined a an unweighted automaton, and vice versa. The weight type of a weighted automaton is Double. A weighted automaton defines a function that maps strings to real number. A common application is to describe probability distributions over strings. In addition to a weight type, every automaton is defined over a semiring. In JAutomata, semirings are instances of classes that implement the Semiring<K> interface, where K is the weight type. The semiring() method returns the semiring of an automaton. Examples of semirings are the boolean semiring, the probabilistic semiring, and the tropical semiring. Unweighted automata are defined over the boolean semiring, which implements Semiring<Boolean>. Weighted automata are defined over semirings that implements Semiring<Double>, such as the probabilistic semiring and the tropical semiring. The probabilistic semiring is used to compute string probabilities, and the tropical semiring is used to compute the shortest path over an automaton.
An automaton can optionally provide a topological order on its states. The topological order is an implementation of Comparator<Object>. An automaton's topological order is returned by its topologicalOrder() method. Topological orders are used by certain algorithms to make them faster. If an automaton has no topological order or if its topological order is not known, then the topologicalOrder() method returns null.
Automata can be created in a variety of ways. They can be read from files, they can be created using Java code, or they can be created from existing automata, by performing operations on them. Here I briefly describe the first two methods. Operations are described below.
JAutomata supports file input for acceptors and transducers. Acceptors are read from files using the AcceptorIO class, and transducers are read from files using the TransducerIO class. Both classes contain static methods readUnweighted, readWeighted and read. readUnweighted is used for unweighted automata, and readWeighted is used for weighted automata over the probabilistic semiring. read is used for automata over arbitrary semirings, specified by an argument.
In the following example, an unweighted acceptor is read from a file:
The first two lines create a Reader for a file. The third line loads an unweighted acceptor from the file.
Similar to the previous example, a weighted transducer over the probabilistic semiring is read from a file in the following way:
A weighted transducer over the tropical semiring is read from a file in the following way:
One way of creating an automaton is by creating a class that implements Automaton<L, K> and by creating an instance of that class. Though this is the most flexible way, it is also the most time consuming way. For many purposes, there are some predefined classes available. One of them is EditableAutomaton<L, K>, which allows automata to be defined using its addState and addTransition methods. The states and transitions of this class are integers. The method addState creates a state with the specified initial and final weights, and returns the int that identifies the newly created state. The method addTransition creates a transition with the specified source state, destination state, label and weight, and returns the int that identifies the newly created transition. In the following example, a weighted acceptor over the probabilistic semiring is created that accepts strings that start with "b" followed by an arbitrary number of "a"s ("b", "ba", "baa", "baaa", etc). The string weights of the automaton decrease exponentially, such that their total sum is one.
JAutomata provides methods to compute the weights of strings, the n shortest paths, and the n best strings. Methods for these computations are provided by the Automata class (not to be confused with the Automaton<L, K> interface).
The weight of a string can be computed using the Automata.stringWeight(Automaton<L, K> automaton, List<L> string) method. Strings of elements are represented by Lists. For automata with label type Character, there is also the convenience method Automata.stringWeight(Automaton<L, K> automaton, String string), which allows strings of characters to be described by String objects.
The n shortest paths of an automaton can be computed using the Automata.shortestPaths(Automaton<L, K> automaton, int numPaths) method. This method returns a list of up to n paths. A path has type Path>L, K> and contains the sequence of transitions that define the path, its concatenated label, and the weight of the path. In case of a probabilistic automaton, the method returns the paths with the highest probability, and in case of an unweighted automaton, the method returns up to n accepted paths.
The n best strings of an automaton can be computed using the Automata.bestStrings(Automaton<L, K> automaton, int numStrings) method. This method returns a list of up to n paths with unique strings. In case of an unweighted automaton, the method returns up to n accepted strings.
Operations take one or more automata as input, and produce a new automaton as output. JAutomata supports a number of operations on automata:
Most operations are found as static methods in the Operations class. Transducer-specific operations are found in the Transducers class, and operations specific to multi-taoe automata are found in the MTAutomata class.