Average-case Complexity of Automata-based Computation - AVATAR

Planning

About

Automata and formal languages are omnipresent in Computer Science. In spite of their relatively limited expressive power, regular languages are important in many areas. With an increasing number of applications, the descriptional complexity of formal languages has become a major research topic over the last 25 years. Given a measure, the descriptional complexity of a language is the size of its smallest representation. It is important to know how this size varies when several such representations are combined or transformed, since this has a direct influence on performance and on the amount of resources needed. Having smaller objects improves our control over software, which becomes simpler, more efficient, and more reliable.

Computational models can be either deterministic or nondeterministic. Even for finite automata, where deterministic automata (DFAs) and nondeterministic automata (NFAs) have the same expressive power, there is a trade-off between descriptional and computational complexity. NFAs can provide an exponentially more succinct description than DFAs (determinising an NFA with n states yields, in the worst case, a DFA with 2^n states). On the other hand, many decision problems that are solvable in polynomial time for DFAs, e.g. equivalence or inclusion, are computationally hard for NFAs. Most of the work in this field addresses worst-case complexity. In practice, however, average-case analysis provides more accurate information about the resources needed. Unfortunately, little effort has been made in this direction so far. With the aim of filling this gap, and taking advantage of the team's expertise, this proposal focuses on the descriptional complexity of the conversion of nondeterministic to deterministic models of regular languages or transductions, from an average-case point of view.
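
To make the worst-case blow-up concrete, here is a short, self-contained Python sketch (purely illustrative, not part of the proposal) of the textbook subset construction, run on a classic worst-case family of NFAs:

```python
def determinise(alphabet, delta, start, finals):
    """Subset construction: DFA states are sets of NFA states."""
    start_set = frozenset([start])
    dfa_states, worklist, dfa_delta = {start_set}, [start_set], {}
    while worklist:
        S = worklist.pop()
        for a in alphabet:
            T = frozenset(r for q in S for r in delta.get((q, a), ()))
            dfa_delta[S, a] = T
            if T not in dfa_states:
                dfa_states.add(T)
                worklist.append(T)
    return dfa_states, dfa_delta, {S for S in dfa_states if S & finals}

# Classic worst-case family: words over {a, b} whose n-th symbol from the
# end is 'a'.  The NFA below has n + 1 states; the reachable part of the
# subset construction has exactly 2**n states.
n = 5
delta = {(0, 'a'): {0, 1}, (0, 'b'): {0}}
for i in range(1, n):
    delta[(i, 'a')] = {i + 1}
    delta[(i, 'b')] = {i + 1}

dfa_states, _, _ = determinise('ab', delta, 0, {n})
print(len(dfa_states))  # prints 32 == 2**5
```

Every reachable subset contains state 0, and all 2^n choices for the remaining states occur, so the exponential worst case is attained; the average-case question is how large the reachable part typically is.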

In recent years, we studied the average size of different NFA constructions from regular expressions (REs), using the framework of analytic combinatorics. In some cases one needs generating functions implicitly defined by algebraic curves, and for these we developed new methods to extract the information required for the asymptotic estimates.
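
As a toy instance of that framework (illustrative only; the grammars arising in the project are more involved), already a grammar for binary expression trees yields an algebraic generating function whose coefficients can be estimated by singularity analysis:

```latex
% Toy instance: trees T = leaf | node(T, T), each node and leaf of size 1.
\[
  T(z) = z + z\,T(z)^2
  \quad\Longrightarrow\quad
  T(z) = \frac{1 - \sqrt{1 - 4z^2}}{2z},
\]
% an algebraic function with dominant singularity \rho = 1/2, whence
\[
  [z^{2n+1}]\,T(z) = \frac{1}{n+1}\binom{2n}{n}
  \sim \frac{4^{n}}{\sqrt{\pi}\, n^{3/2}} .
\]
```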

These new techniques will be crucial to tackle the major problem of the average-case complexity of NFA determinisation, which has a decisive impact on operational (state) complexity. The state complexity of a regular language is the number of states of its minimal DFA. The state complexity of an operation on regular languages (its operational state complexity) is the state complexity (in the worst case or on average) of the language resulting from the operation, considered as a function of the state complexities of its operands. For instance, given two DFAs with state complexities n and m, to obtain a DFA for the concatenation language one first computes an NFA for the concatenation, which is then determinised and minimised. The number of states f(n,m) of the final automaton is the operational state complexity of concatenation. The costly operation in this process is determinisation.
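
For concreteness, the classical worst-case analysis of this example (stated here only as contrast for the average-case question) counts the states of the determinised concatenation NFA as pairs (q, S), with q a state of the first DFA and S a set of states of the second, where S must contain the second DFA's initial state whenever q is final; with a single final state this gives

```latex
\[
  f(n,m) \;\le\; n\,2^{m} - 2^{m-1},
\]
```

and this bound is known to be tight in the worst case. How far typical instances fall below it is precisely the kind of question the average-case analysis addresses.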

The problem of restricting the degree of nondeterminism of a model, and its consequences for the descriptional complexity of operations, is another challenge we want to tackle. In this regard, the notion of (un)ambiguity also plays a major role. Ambiguity is a fundamental concept in parsing, pattern matching, XML processing, and security. In most applications, regular expressions (REs) are used instead of NFAs to represent regular languages.

Several classes of unambiguous REs have been studied. Using derivative-based conversions we have enlarged some of those classes, and our present goal is to develop feasible decision procedures for restricted unambiguous REs.
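
The following minimal Python sketch (Python 3.10+, with hypothetical class names; not the project's implementation) shows the Brzozowski derivative, the basic operation underlying such derivative-based conversions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Empty: pass                      # the empty language
@dataclass(frozen=True)
class Eps: pass                        # the empty word
@dataclass(frozen=True)
class Sym: c: str                      # a single symbol
@dataclass(frozen=True)
class Union: l: object; r: object
@dataclass(frozen=True)
class Concat: l: object; r: object
@dataclass(frozen=True)
class Star: e: object

def nullable(e):
    """True iff the empty word belongs to L(e)."""
    match e:
        case Eps() | Star(): return True
        case Union(l, r): return nullable(l) or nullable(r)
        case Concat(l, r): return nullable(l) and nullable(r)
        case _: return False

def deriv(e, a):
    """Brzozowski derivative: the words w such that a.w is in L(e)."""
    match e:
        case Sym(c): return Eps() if c == a else Empty()
        case Union(l, r): return Union(deriv(l, a), deriv(r, a))
        case Concat(l, r):
            d = Concat(deriv(l, a), r)
            return Union(d, deriv(r, a)) if nullable(l) else d
        case Star(f): return Concat(deriv(f, a), Star(f))
        case _: return Empty()

def matches(e, w):
    for a in w:
        e = deriv(e, a)
    return nullable(e)

ab_star = Star(Concat(Sym('a'), Sym('b')))                # (ab)*
print(matches(ab_star, "abab"), matches(ab_star, "aba"))  # True False
```

Iterating the derivative over all symbols yields an automaton for the expression, and the shape of the derivatives reflects the (un)ambiguity of the expression, which is what the envisaged decision procedures exploit.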

A major challenge in current applications is the huge size (and diversity) of alphabets. This is particularly critical in machine learning algorithms that use regular expressions for natural language descriptions. To obtain succinct representations of models over large alphabets, we introduced automata and expressions with symbolic and alphabet-invariant labels. In this project we will study the descriptional complexity of the resulting models, as well as develop feasible algorithms for them.
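
As a rough illustration of the idea (all names here are illustrative, not an existing library's API), a symbolic automaton labels transitions with predicates over the alphabet instead of individual symbols, so one edge can stand for arbitrarily many symbol edges:

```python
# A hand-rolled symbolic automaton for identifiers such as "tmp42":
# a letter followed by letters or digits.
def is_digit(c): return c.isdigit()
def is_alpha(c): return c.isalpha()

transitions = {
    0: [(is_alpha, 1)],                  # start state
    1: [(is_alpha, 1), (is_digit, 1)],   # accepting state
}
accepting = {1}

def accepts(word, state=0):
    for c in word:
        for pred, nxt in transitions.get(state, []):
            if pred(c):                  # first matching predicate fires
                state = nxt
                break
        else:
            return False                 # no transition applies
    return state in accepting

print(accepts("tmp42"))  # True
print(accepts("42tmp"))  # False
```

A single predicate-labelled edge here replaces one edge per matching character of a possibly huge alphabet, which is the source of the succinctness (and alphabet invariance) whose descriptional complexity the project will study.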

Overall, our aim is to push forward average-case complexity results for core problems of many automata-based algorithms with wide applications. We expect to contribute to the development of more efficient automata-based methods through a fine-grained analysis of several classes of regular languages. A main breakthrough would be to obtain, for the first time, average-case results for determinisation, both within the framework of analytic combinatorics and by considering probability distributions for particular language classes.

 

Duration
18 months
Financial Support
49 937,80 EUR
Start Date
End Date

Internal Coordinator

Integrated member
Assistant Professor
Faculdade de Ciências da Universidade do Porto

Internal Members

Financial Support

FCT - Fundação para a Ciência e Tecnologia

Other Members

Stavros Konstantinidis