Can someone please tell me what a finite state transducer is?
I have read the Wikipedia article and don't understand a thing.
Can someone please tell me what a finite state transducer is?
I have read the Wikipedia article and don't understand a thing.
A finite state transducer (FST) is a finite state automaton (FSA, FA) which produces output as well as reading input, which means it is useful for parsing (while a "bare" FSA can only be used for recognizing, i.e. pattern matching).
An FST consists of a finite number of states which are linked by transitions labeled with an input/output pair. The FST starts out in a designated start state and jumps to different states depending on the input, while producing output according to its transition table.
FSTs are useful in NLP and speech recognition because they have nice algebraic properties, most notably that they can be freely combined (form an algebra) under composition, which implements relational composition on regular relations (think of this as non-deterministic function composition) while staying very compact. FSTs can do parsing of regular languages into strings in linear time.
As an example, I once implemented morphological parsing as a bunch of FSTs. My main FST for verbs would turn a regular verb, say "walked", into "walk+PAST". I also had an FST for the verb "to be", which would turn "is" into "be+PRESENT+3rd" (3rd person), and similarly for other irregular verbs. All the FSTs were combined into a single one using an FST compiler, which produced a single FST that was much smaller than the sum of its parts and ran very fast. FSTs can be built by a variety of tools that accept an extended regular expression syntax.
A finite state transducer essentially is a finite state automaton that works on two (or more) tapes. The most common way to think about transducers is as a kind of ``translating machine''. They read from one of the tapes and write onto the other. This, for instance, is a transducer that translates
a
s intob
s:
a:b
at the arc means that in this transition the transducer readsa
from the first tape and writesb
onto the second.
Reference: Finite State Transducers
In as simple terms as possible, I understand that an FST is essentially a "thing" that moves from one state to the next based on an input tape and writes to a different output tape. A tape is essentially a set of inputs like characters in a string.
The entire FST is represented by a set of states and links between them. A link is "activated" when its input condition is correct and then gives then next state the adjusted tape.
For example let's say an FST starts with the tape abc
at state 1. A link to state 2 matches a
and changes that to b
. This would get activated, set the output tape to just b
, and pass the remaining bc
to state 2. As you can see, each state is only activated if there is a link to it whose input condition was correct, passes the remaining input to the next state, and writes to a separate output tape. Each FST runs across the tape once and output to another tape once.
To get a more clear understanding of them read and take a look at the diagrams in this article (original broken link).
The direct answer: a finite state transducer is just a finite state automaton - but one that has both input and output words. This is best understood in algebraic terms.
A language, A, over an alphabet X is a subset of the set X* of all finite words drawn from X; i.e. A ⊆ X*. A translation, B, with input alphabet X and output alphabet Y is a subset of B ⊆ X* × Y*. This can be generalized to allow for three or more channels or tiers, e.g. C ⊆ X* × Y* × Z*, though curiously, the concept is apparently not seen in the literature; even though the idea of having multiple tiers or channels appears all over the place in applications, like in multi-agent communication protocols, or in music and video production, as well as in theoretical linguistics.
The set X* is endowed with an algebra, with word concatenation as the product, and the empty word, denoted here as 1, as the identity. It satisfies the two axioms:
Associativity: (uv)w = u(vw), for all u, v, w ∈ X*;
Identity: w1 = w = 1w, for all w ∈ X*;
Together, these comprise the defining properties of a monoid. The set X* is a free monoid, because no other relations hold over it, except those which come from the axioms; so X* is the monoid freely generated from the set X.
One can define the product M × N of any two monoids, M and N, as a monoid, by taking (1,1) as the identity and defining products as (m,n)(m',n') = (mm',nn'), where m, m' ∈ M and n, n' ∈ N.
The product M × N is the result of throwing the elements of M and N together and imposing the relations mn = nm, for m ∈ M and n ∈ N: by punning each m ∈ M as (m,1) ∈ M × N, and each n ∈ N as (1,n) ∈ M × N; with the identity (m,1)(1,n) = (m,n) = (1,n)(m,1) being a punning of the relation mn = nm, itself. Among other things, that means M × N is (almost) never a free monoid.
In particular, X* × Y* is equivalently described as the free monoid (X ∪ Y)*, itself, subject to the identities xy = yx, for x ∈ X and y ∈ Y ... provided that X and Y are non-overlapping, X ∩ Y = ∅.
Finite state automata can be defined over any monoid. This applies equally well to all other families of automata described in the classical literature: push-down automata, turing machines, etc. When the automaton is over a product monoid, particularly one of the form X* × Y*, then it's a transducer.
The respective families are 𝒫X* for languages over X and 𝒫(X* × Y*) for translations over inputs X and outputs Y; where the power set 𝒫S of a set S is defined by the condition Z ∈ 𝒫S ⇔ Z ⊆ S.
The power set 𝒫M of a monoid M inherits the operations of M by taking the singleton set {1} ∈ 𝒫M as the identity, and the defining product AB of sets A, B ∈ 𝒫M as AB = { ab ∈ M: a ∈ A, b ∈ B }. This has the same effect of just throwing each m ∈ M into 𝒫M as {m} ∈ 𝒫M, and endowing the algebra with an ordering relation A ≥ B ⇔ A ⊇ B ⇔ B ⊆ A ⇔ B ≤ A, and a "sum" operator given as unions: ∑Y ≡ ⋃Y, for Y ⊆ 𝒫M. This is the least upper bound operator, or supremum, with respect to the ordering relation "≥": ∑Y = ⋁Y. In particular, you have the regular expression operators:
0 ≡ ⋁∅ = ∅
A + B ≡ ⋁{A,B} = A ∪ B
A* ≡ ⋁_{n≥0} Aⁿ = {1} ∪ A ∪ A² ∪ A³ ⋯
Arbitrary distributivity holds, which can be expressed equivalently in either of the following ways:
A(⋁Y)B = ⋁{ACB ∈ 𝒫M: C ∈ Y}, for all A, B ∈ 𝒫M and Y ⊆ 𝒫M
(⋁Y)(⋁Z) = ⋁{AB ∈ 𝒫M: A ∈ Y, B ∈ Z}, for Y, Z ⊆ 𝒫M
An algebra that extends a monoid by including an ordering relation and a least upper bound operator over arbitrary subsets that satisfies distributivity is called a quantale with unit (here, the unit is {1}). The more general variety of quantales don't have to include a unit.
Both 𝒫X* and 𝒫(X* × Y*) are quantales, as is 𝒫M for arbitrary monoids. The quantale 𝒫X* of languages over X is freely generated from the set X, while (as before) the quantale 𝒫(X* × Y*) of translations over X and Y can be considered to be generated from (X ∪ Y)*, subject to the identities xy = yx, for x ∈ X and y ∈ Y ... provided that X and Y are non-overlapping, X ∩ Y = ∅.
More generally, for any monoid M, the quantale 𝒫M is the result of just taking M, itself, and attaching the ordering relation and least upper bound operator on it, so it is a free extension of M. It inherits the structure of M.
The family ℛM ⊆ 𝒫M of rational subsets of M comprise the family containing the singleton subsets {m} ∈ ℛM, for each m ∈ M; finite least upper bounds (particularly: ∅ ∈ ℛM and A ∪ B ∈ ℛM for each A, B ∈ ℛM); products (AB ∈ ℛM, for A, B ∈ ℛM) and stars (A* ∈ ℛM for A ∈ ℛM).
The family of subsets of a monoid M that are recognized by a finite state automaton are also its rational subsets ℛM. The Kleene Theorem generalizes to arbitrary monoids.
The regular expressions over X are the elements of ℛX*, while the rational transductions over input alphabet X and output alphabet Y are elements of ℛ(X* × Y*).
The elements of ℛM are generally called "rational expressions" (rather than regular expressions), while the specialization to free monoids M = X* is where the term "regular expressions" is used.
The distributivity property gives rise to the following limited forms of distributivity:
The Zero Property: A0B = 0, for A, B ∈ 𝒫M
(Additive) Distributivity: A(B+C)D = ABD + ACD, for A, B, C, D ∈ 𝒫M
*-Continuity: AC + ABC + AB²C + AB³C + ⋯ = A B* C, for A, B, C, ∈ 𝒫M
which applies generally to 𝒫M, but specifically also to the subfamily ℛM.
Since ℛM is, itself, a monoid, you can also consider automata that have rational expressions from ℛM on their arcs. Allowing regular expressions on the arcs is a common convention ... and it generalizes to arbitrary monoids. In that case, the corresponding family is ℛℛM.
As an algebra, the families ℛM are instances of what are known as *-continuous Kleene algebras: they are reduced forms of quantales in which the least upper bound operator (and distributivity property) are only postulated to hold over rational subsets, rather than over arbitrary subsets. That means: if Y ∈ ℛℛM then ⋁Y = ⋃Y ∈ ℛM, and ⋁{ACB: C ∈ Y} = A(⋁Y)B, for A, B ∈ ℛM and Y ∈ ℛℛM.
Larger classes of automata/transducers and language/translation families can be considered - and they can each be generalized to monoids. The same applies to grammars.
So, the family of context-free subsets of a monoid M may be denoted 𝒞M; which specializes to 𝒞X*, for the context-free languages over alphabet X, and the family 𝒞(X* × Y*), whose corresponding class of grammars and automata are, respectively, called "simple syntax directed translations" and "push-down transducers". The algebra 𝒞M is a reduced version of quantale that has least upper bounds over its own context-free subsets, 𝒞𝒞M, which is distributive over the range of subsets. They're are now known to be equivalently described as what are called μ-continuous Chomsky-algebras. They are the same as a Kleene algebra, except that the *-continuity property is generalized to:
μ-continuity: ⋁{A 0 B, A f(0) B, A f(f(0)) B, ⋯} = A (μx f(x)) B
ranging over all Kleene-algebraic functions f(x); where μx f(x) denotes the least fixed point solution to x ≥ f(x). The Kleene-star, for instance, can be defined by, A* = μx (1 + Ax) or by A* = μx (1 + xA). So, the *-continuity property is a limited version of the μ-continuity property.
Structural relations between the different algebras are now known. In an algebraic formalism that inherits the monoid structure, one may be able to define the tensor product A ⊗ B of algebras A and B as the algebra that results from throwing A and B together and subjecting the result to the relations ab = ba for a ∈ A, b ∈ B.
For monoids, the "tensor product" is just the direct product, already described above. The other classes - the *-continuous Kleene algebra, the μ-continuous Chomsky algebra and unital quantales (the ones corresponding to ℛ, 𝒞, 𝒫, respectively) also have their varieties of "tensor product", which might be denoted ⊗ℛ, ⊗𝒞 and ⊗𝒫, respectively, to make the distinction clear.
The chief property is
ℛ(M × N) ≅ ℛM ⊗ℛ ℛN,
𝒞(M × N) ≅ 𝒞M ⊗𝒞 𝒞N,
𝒫(M × N) ≅ 𝒫M ⊗𝒫 𝒫N,
for arbitrary monoids, M and N. So, the *-continuous Kleene algebra of rational transductions with input alphabet X and output alphabet Y can be expressed equivalently as the tensor product of regular expression algebras:
ℛ(X* × Y*) ≅ ℛX* ⊗ℛ ℛY*.
It's now also known that the Chomsky-Schützenberger Theorem, in which Chomsky and Schützenberger very nearly created an algebraic foundation for the upward extension of the regular expressions to "context-free expressions", is a precursor to a result - only recently established - that shows that it actually does; namely showing, for arbitrary monoids M, that
𝒞M ⊆ ℛM ⊗ℛ C₂
where the C₂ is the algebra containing {b,d,p,q} subjected to the identities
bd = 1 = pq, bq = 0 = pd, db + qp = 1.
It can also be established with 𝒞M ⊆ ℛM ⊗ℛ C₂', where C₂' is subjected only to the identities bd = 1 = pq, bq = 0 = pd and not to db + qp = 1. So, in effect, Chomsky co-authored a new formalism for context-free expressions in the early 1960's; without realizing it at the time (he knows it now). He's like the Faraday of formal language and automata theory.
So, among other things, push-down transductions can be expressed as rational transductions on finite state transducers, provided that you also allow elements of C₂' on the arcs. A push-down transducer, itself, can be directly expressed in that way.
A similar family 𝒯M exists for the Turing-computeable subsets of a monoid M; and the concepts of "type 0 grammar", Turing machines & transducers all generalize to arbitrary monoids. The corresponding property 𝒯(M × N) ≅ 𝒯M ⊗𝒯 𝒯N holds. So, Turing transducers can also be expressed as finite state transducers in which elements of C₂ ⊗ℛ C₂ are permitted on the arcs. It is also believed that the following analogue to the algebraic version of the Chomsky-Schützenberger Theorem holds: 𝒯M ⊆ ℛM ⊗ℛ C₂ ⊗ℛ C₂ - which would give rise to a further upward extension of "context-free expressions" to "Turing expressions".
© 2022 - 2024 — McMap. All rights reserved.