Handbook of Formal Languages

Editors: G. Rozenberg A. Salomaa

Advisory Board: J. Berstel, C. Calude, K. Culik II, J. Engelfriet, H. Jürgensen, J. Karhumäki, W. Kuich, M. Nivat, G. Paun, A. Restivo, W. Thomas, D. Wood

Springer-Verlag Berlin Heidelberg GmbH

G. Rozenberg A. Salomaa (Eds.)

Handbook of Formal Languages

Volume 2

Linear Modeling: Background and Application

With 87 Figures

Springer

Prof. Dr. Grzegorz Rozenberg, Leiden University, Department of Computer Science, P.O. Box 9512, NL-2300 RA Leiden, The Netherlands

Prof. Dr. Arto Salomaa, Turku Centre for Computer Science, Data City, FIN-20520 Turku, Finland

Library of Congress Cataloging-in-Publication Data

Handbook of formal languages / G. Rozenberg, A. Salomaa (eds.). p. cm. Includes bibliographical references and index. Contents: v. 1. Word, language, grammar - v. 2. Linear modeling: background and application - v. 3. Beyond words. ISBN 978-3-642-08230-6 ISBN 978-3-662-07675-0 (eBook) DOI 10.1007/978-3-662-07675-0

1. Formal languages. I. Rozenberg, Grzegorz. II. Salomaa, Arto. QA267.3.H36 1997 511.3 - DC21 96-47134 CIP

CR Subject Classification (1991): F.4 (esp. F.4.2-3), F.2.2, A.2, G.2, I.3, D.3.1, E.4

ISBN 978-3-642-08230-6

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag Berlin Heidelberg GmbH. Violations are liable for prosecution under the German Copyright Law.

© Springer-Verlag Berlin Heidelberg 1997. Originally published by Springer-Verlag Berlin Heidelberg New York in 1997. Softcover reprint of the hardcover 1st edition 1997

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Cover design: MetaDesign, Berlin Typesetting: Data conversion by lewis & leins, Berlin SPIN: 11326724 45/3111 - 543 2 1 - Printed on acid-free paper

Preface

The need for a comprehensive survey-type exposition on formal languages and related mainstream areas of computer science has been evident for some years. In the early 1970s, when the book Formal Languages by the second-mentioned editor appeared, it was still quite feasible to write a comprehensive book with that title and include also topics of current research interest. This would not be possible anymore. A standard-sized book on formal languages would either have to stay on a fairly low level or else be specialized and restricted to some narrow sector of the field.

The setup becomes drastically different in a collection of contributions, where the best authorities in the world join forces, each of them concentrating on their own areas of specialization. The present three-volume Handbook constitutes such a unique collection. In these three volumes we present the current state of the art in formal language theory. We were most satisfied with the enthusiastic response given to our request for contributions by specialists representing various subfields. The need for a Handbook of Formal Languages was in many answers expressed in different ways: as an easily accessible historical reference, a general source of information, an overall course-aid, and a compact collection of material for self-study. We are convinced that the final result will satisfy such various needs.

The theory of formal languages constitutes the stem or backbone of the field of science now generally known as theoretical computer science. In a very true sense its role has been the same as that of philosophy with respect to science in general: it has nourished and often initiated a number of more specialized fields. In this sense formal language theory has been the origin of many other fields. However, the historical development can be viewed also from a different angle. The origins of formal language theory, as we know it today, come from different parts of human knowledge. This also explains the wide and diverse applicability of the theory. Let us have a brief look at some of these origins. The topic is discussed in more detail in the introductory Chapter 1 of Volume 1.

The main source of the theory of formal languages, most clearly visible in Volume 1 of this Handbook, is mathematics. Particular areas of mathematics important in this respect are combinatorics and the algebra of semigroups and monoids. An outstanding pioneer in this line of research was Axel Thue. Already in 1906 he published a paper about avoidable and unavoidable patterns in long and infinite words. Thue and Emil Post were the two originators of the formal notion of a rewriting system or a grammar. That their work remained largely unknown for decades was due to the difficult accessibility of their writings and, perhaps much more importantly, to the fact that the time was not yet ripe for mathematical ideas, where noncommutativity played an essential role in an otherwise very simple setup.

Mathematical origins of formal language theory come also from mathematical logic and, according to the present terminology, computability theory. Here the work of Alan Turing in the mid-1930s is of crucial importance. The general idea is to find models of computing. The power of a specific model can be described by the complexity of the language it generates or accepts. Trends and aspects of mathematical language theory are the subject matter of each chapter in Volume 1 of the Handbook. Such trends and aspects are present also in many chapters in Volumes 2 and 3.

Returning to the origins of formal language theory, we observe next that much of formal language theory has originated from linguistics. In particular, this concerns the study of grammars and the grammatical structure of a language, initiated by Noam Chomsky in the 1950s. While the basic hierarchy of grammars is thoroughly covered in Volume 1, many aspects pertinent to linguistics are discussed later, notably in Volume 2.

The modeling of certain objects or phenomena has initiated large and significant parts of formal language theory. A model can be expressed by or identified with a language. Specific tasks of modeling have given rise to specific kinds of languages. A very typical example of this are the L systems introduced by Aristid Lindenmayer in the late 1960s, intended as models in developmental biology. This and other types of modeling situations, ranging from molecular genetics and semiotics to artificial intelligence and artificial life, are presented in this Handbook. Words are one-dimensional, therefore linearity is a feature present in most of formal language theory. However, sometimes a linear model is not sufficient. This means that the language used does not consist of words (strings) but rather of trees, graphs, or some other nonlinear objects. In this way the possibilities for modeling will be greatly increased. Such extensions of formal language theory are considered in Volume 3: languages are built from nonlinear objects rather than strings.

We have now already described the contents of the different volumes of this Handbook in brief terms. Volume 1 is devoted to the mathematical aspects of the theory, whereas applications are more directly present in the other two volumes, of which Volume 3 also goes into nonlinearity. The division of topics is also reflected in the titles of the volumes. However, the borderlines between the volumes are by no means strict. From many points of view, for instance, the first chapters of Volumes 2 and 3 could have been included in Volume 1.

We now come to a very important editorial decision we have made. Each of the 33 individual chapters constitutes its own entity, where the subject matter is developed from the beginning. References to other chapters are only occasional and comparable with references to other existing literature. This style of writing was suggested to the authors of the individual chapters by us from the very beginning. Such an editorial policy has both advantages and disadvantages as regards the final result. A person who reads through the whole Handbook has to get used to the fact that notation and terminology are by no means uniform in different chapters; the same term may have different meanings, and several terms may mean the same thing. Moreover, the prerequisites, especially in regard to mathematical maturity, vary from chapter to chapter. On the positive side, for a person interested in studying only a specific area, the material is all presented in a compact form in one place. Moreover, it might be counterproductive to try to change, even for the purposes of a handbook, the terminology and notation already well-established within the research community of a specific subarea. In this connection we also want to emphasize the diversity of many of the subareas of the field. An interested reader will find several chapters in this Handbook having almost totally disjoint reference lists, although each of them contains more than 100 references.

We noticed that guaranteed timeliness of the production of the Handbook gave additional impetus and motivation to the authors. As an illustration of the timeliness, we only mention that detailed accounts about DNA computing appear here in a handbook form, less than two years after the first ideas about DNA computing were published.

Having discussed the reasons behind our most important editorial decision, let us still go back to formal languages in general. Obviously there cannot be any doubt about the mathematical strength of the theory - many chapters in Volume 1 alone suffice to show the strength. The theory still abounds with challenging problems for an interested student or researcher. Mathematical strength is also a necessary condition for applicability, which in the case of formal language theory has proved to be both broad and diverse. Some details of this were already mentioned above. As the whole Handbook abounds with illustrations of various applications, it would serve no purpose to try to classify them here according to their importance or frequency. The reader is invited to study from the Handbook older applications of context-free and contextual grammars to linguistics, of parsing techniques to compiler construction, of combinatorics of words to information theory, or of morphisms to developmental biology. Among the newer application areas the reader may be interested in computer graphics (application of L systems, picture languages, weighted automata), construction and verification of concurrent and distributed systems (traces, omega-languages, grammar systems), molecular biology (splicing systems, theory of deletion), pattern matching, or cryptology, just to mention a few of the topics discussed in the Handbook.

About Volume 2

Some brief guidelines about the contents of the present Volume 2 follow. Problems about complexity occur everywhere in language theory; Chapter 1 gives an overall account. Parsing techniques are essential in applications, both for natural and programming languages. They are dealt with in Chapter 2, while Chapters 3-6 study extensions and variations of classical language theory. While Chapter 3 continues the general theory of context-free languages, Chapters 5 and 6 are motivated by linguistics, and Chapter 4, motivated by artificial intelligence, is also applicable to distributed systems. DNA computing has been an important recent breakthrough - some language-theoretic aspects are presented in Chapter 7. Chapter 8 considers the string editing problem which in various settings models a variety of problems arising from DNA and protein sequences. Chapter 9 considers several methods of word matching that are based on the use of automata. Chapter 10 discusses the relationship between automata theory and symbolic dynamics (the latter area has originated in topology). By its very nature the whole of cryptology can be viewed as a part of language theory. Chapter 11 gives an account of language-theoretic techniques that have turned out to be especially useful in cryptology.

Acknowledgements

We would like to express our deep gratitude to all the authors of the Handbook. Contrary to what is usual in case of collective works of this kind, we hoped to be able to keep the time schedule - and succeeded because of the marvellous cooperation of the authors. Still more importantly, thanks are due because the authors were really devoted to their task and were willing to sacrifice much time and energy in order to achieve the remarkable end result, often exceeding our already high expectations.

The Advisory Board consisting of J. Berstel, C. Calude, K. Culik II, J. Engelfriet, H. Jürgensen, J. Karhumäki, W. Kuich, M. Nivat, G. Paun, A. Restivo, W. Thomas, and D. Wood was of great help to us at the initial stages. Not only was their advice invaluable for the planning of the Handbook but we also appreciate their encouragement and support.

We would also like to extend our thanks from the actual authors of the Handbook to all members of the scientific community who have supported, advised, and helped us in many ways during the various stages of work. It would be a hopeless and maybe also unrewarding task to try to single out any list of names, since so many people have been involved in one way or another.

We are grateful also to Springer-Verlag, in particular Dr. Hans Wössner, Ingeborg Mayer, J. Andrew Ross, and Gabriele Fischer, for their cooperation, excellent in every respect. Last but not least, thanks are due to Marloes Boon-van der Nat for her assistance in all editorial stages, and in particular, for keeping up contacts to the authors.

September 1996 Grzegorz Rozenberg, Arto Salomaa

Contents of Volume 2

Chapter 1. Complexity: A Language-Theoretic Point of View
Cristian Calude and Juraj Hromkovic ... 1
1. Introduction ... 1
2. Theory of computation ... 3
  2.1 Computing fallibilities ... 3
  2.2 Turing machines, Chaitin computers, and Chomsky grammars ... 7
  2.3 Universality ... 8
  2.4 Silencing a universal computer ... 11
  2.5 Digression: A simple grammatical model of brain behaviour ... 12
  2.6 The halting problem ... 13
  2.7 The Church-Turing Thesis ... 14
  2.8 Digression: mind, brain, and computers ... 15
3. Computational complexity measures and complexity classes ... 16
  3.1 Time and space complexities and their properties ... 16
  3.2 Classification of problems according to computational difficulty and nondeterminism ... 25
  3.3 Hard problems and probabilistic computations ... 31
4. Program-size complexity ... 36
  4.1 Dynamic versus program-size complexities ... 36
  4.2 The halting problem revisited ... 38
  4.3 Random strings ... 39
  4.4 From random to regular languages ... 41
  4.5 Trade-offs ... 45
  4.6 More about P =? NP ... 46
5. Parallelism ... 47
  5.1 Parallel computation thesis and alternation ... 47
  5.2 Limits to parallel computation and P-completeness ... 51
  5.3 Communication in parallel and distributive computing ... 52
References ... 54

Chapter 2. Parsing of Context-Free Languages
Klaas Sikkel and Anton Nijholt ... 61
1. Introduction ... 61
  1.1 Parsing algorithms ... 62
  1.2 Parsing technology ... 63
  1.3 About this chapter ... 64
2. An informal introduction ... 66
3. Parsing schemata ... 70
  3.1 Parsing systems ... 70
  3.2 Parsing schemata ... 71
  3.3 Correctness of parsing schemata ... 72
4. Generalization ... 74
  4.1 Some examples ... 74
  4.2 Formalization ... 75
  4.3 Properties of generalization ... 77
5. Filtering ... 79
  5.1 Static filtering ... 79
  5.2 Dynamic filtering ... 80
  5.3 Step contraction ... 82
  5.4 Properties of filtering relations ... 83
6. Some larger examples ... 83
  6.1 Left-corner parsing ... 84
  6.2 De Vreught and Honig's algorithm ... 86
  6.3 Rytter's algorithm ... 90
  6.4 Some general remarks ... 92
7. From schemata to algorithms ... 93
8. Beyond context-free grammars ... 96
9. Conclusions ... 97
References ... 97

Chapter 3. Grammars with Controlled Derivations
Jürgen Dassow, Gheorghe Paun, and Arto Salomaa ... 101
1. Introduction and notations ... 101
2. Some types of controlled derivations and their power ... 103
  2.1 Prescribed sequences ... 103
  2.2 Control by context conditions ... 115
  2.3 Grammars with partial parallelism ... 124
  2.4 Indexed grammars ... 134
  2.5 Hierarchies of families with controlled derivations ... 135
3. Basic properties ... 139
  3.1 Operations on language families ... 139
  3.2 Decision problems ... 141
  3.3 Descriptional complexity ... 145
4. Further topics ... 148
References ... 150

Chapter 4. Grammar Systems
Jürgen Dassow, Gheorghe Paun, and Grzegorz Rozenberg ... 155
1. Introduction ... 155
2. Formal language prerequisites ... 157
3. CD grammar systems ... 158
  3.1 Definitions ... 158
  3.2 Examples ... 160
  3.3 On the generative capacity ... 162
  3.4 Hybrid systems ... 164
  3.5 Increasing the power by teams ... 167
  3.6 Descriptional complexity ... 169
  3.7 Other classes of CD grammar systems ... 172
4. PC grammar systems ... 173
  4.1 Definitions ... 173
  4.2 Examples ... 177
  4.3 On the generative capacity ... 180
  4.4 The context-sensitive case ... 184
  4.5 Non-synchronized PC grammar systems ... 185
  4.6 Descriptional and communication complexity ... 186
  4.7 PC grammar systems with communication by command ... 189
  4.8 Further variants and results ... 194
5. Related models ... 196
  5.1 Eco-grammar systems ... 196
  5.2 Test tube systems ... 201
References ... 207

Chapter 5. Contextual Grammars and Natural Languages
Solomon Marcus ... 215
The year 1957: two complementary strategies ... 215
The origin of contextual grammars ... 216
Motivation of simple contextual grammars and of contextual grammars with choice ... 216
The duality between strings and contexts and the Sestier closure ... 218
Steps in modelling morphological categories ... 219
The contextual approach in a generative perspective ... 221
Contextual grammars can generate both strings and contexts ... 223
Interplay of strings, contexts and contextual grammars with choice ... 225
Going deeper in the interplay strings-contexts ... 227
A higher level of abstraction: parts of speech ... 228
Generative power of contextual grammars ... 229
Further suggestions: restricted contextual grammars, grammar systems and splicing contextual schemes ... 230
References ... 232

Chapter 6. Contextual Grammars and Formal Languages
Andrzej Ehrenfeucht, Gheorghe Paun, and Grzegorz Rozenberg ... 237
1. Introduction ... 237
2. Contextual grammars with unrestricted choice ... 238
  2.1 Preliminaries ... 238
  2.2 Definitions ... 238
  2.3 Examples ... 241
  2.4 Necessary conditions and counterexamples ... 243
  2.5 Generative capacity ... 247
  2.6 Closure properties ... 249
  2.7 Decidability properties ... 253
3. Contextual grammars with restricted choice ... 256
  3.1 Definitions and basic results ... 256
  3.2 Internal contextual grammars with finite choice ... 261
  3.3 External contextual grammars with regular choice ... 264
4. Variants of contextual grammars ... 275
  4.1 Deterministic grammars ... 275
  4.2 One-sided contexts ... 277
  4.3 Leftmost derivation ... 281
  4.4 Parallel derivation ... 282
  4.5 Maximal/minimal use of selectors ... 284
5. Bibliographical notes ... 285
References ... 290

Chapter 7. Language Theory and Molecular Genetics
Thomas Head, Gheorghe Paun, and Dennis Pixton ... 295
1. Introduction ... 295
2. Formal language theory prerequisites ... 298
3. The splicing operation ... 298
  3.1 The uniterated case ... 298
  3.2 The iterated case ... 306
  3.3 The case of multisets ... 319
4. Generative mechanisms based on splicing ... 325
  4.1 Simple H systems ... 326
  4.2 Extended H systems ... 333
5. Splicing circular words ... 335
  5.1 Circular words ... 335
  5.2 Circular splicing ... 336
  5.3 Mixed splicing ... 342
6. Computing by splicing ... 344
7. Bibliographical notes ... 348
Appendix ... 351
References ... 358

Chapter 8. String Editing and Longest Common Subsequences
Alberto Apostolico ... 361
1. Introduction ... 361
  1.1 Approximate string searching ... 363
  1.2 Local similarity searches in DNA and protein sequences ... 363
  1.3 Longest common subsequences ... 364
2. Two basic paradigms for the LCS problem ... 366
  2.1 Hirschberg's paradigm: finding antichains one at a time ... 368
  2.2 Incremental antichain decompositions and the Hunt-Szymanski paradigm ... 371
3. A speed-up for HS ... 372
4. Finger trees ... 375
5. Linear space ... 379
  5.1 Computing the length of a solution ... 380
  5.2 Computing an LCS in O(n(m - l)) time and linear space ... 382
6. Combining few and diverse tools: Hirschberg's paradigm in linear space ... 386
7. Parallel algorithms ... 389
References ... 395

Chapter 9. Automata for Matching Patterns Maxime Crochemore and Christophe Hancart . ...................... 399 1. Pattern matching and automata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399 2. Notations ................................................... 400

2.1 Alphabet and words ...................................... 401 2.2 Languages .............................................. 401 2.3 Regular expressions ...................................... 401 2.4 Finite automata ......................................... 402 2.5 Algorithms for matching patterns .......................... 403

3. Representations of deterministic automata ...................... 405 3.1 Transition matrix ........................................ 405 3.2 Adjacency lists .......................................... 406 3.3 Transition list ........................................... 407 3.4 Failure function .......................................... 407 3.5 Table-compression ....................................... 408

4. Matching regular expressions .................................. 408 4.1 Outline................................................. 408 4.2 Regular-expression-matching automata ..................... 409 4.3 Searching with regular-expression-matching automata ........ 411 4.4 Time-space trade-off ...................................... 414

5. Matching finite sets of words .................................. 414 5.1 Outline ................................................. 414 5.2 Dictionary-matching automata ............................. 415 5.3 Linear dictionary-matching automata ....................... 416 5.4 Searching with linear dictionary-matching automata .......... 420

6. Matching words ..... 422
6.1 Outline ..... 422
6.2 String-matching automata ..... 423
6.3 Linear string-matching automata ..... 426
6.4 Properties of string-matching automata ..... 428
6.5 Searching with linear string-matching automata ..... 431

7. Suffix automata ..... 434
7.1 Outline ..... 434
7.2 Sizes and properties ..... 435

7.2.1 End-positions ..... 435
7.2.2 Suffix function ..... 436
7.2.3 State splitting ..... 437
7.2.4 Sizes of suffix automata ..... 439

7.3 Construction ..... 441
7.3.1 Suffix links and suffix paths ..... 441
7.3.2 On-line construction ..... 442
7.3.3 Complexity ..... 446

7.4 As indexes ..... 447
7.4.1 Membership ..... 448
7.4.2 First position ..... 448
7.4.3 Occurrence number ..... 449
7.4.4 List of positions ..... 450
7.4.5 Longest repeated factor ..... 450

7.5 As string-matching automata ..... 451
7.5.1 Ending factors ..... 451
7.5.2 Optimization of suffix links ..... 452
7.5.3 Searching for rotations ..... 453

7.6 Factor automata ..... 454
7.6.1 Relation to suffix automata ..... 454
7.6.2 Size of factor automata ..... 455
7.6.3 On-line construction ..... 456

Bibliographic notes ..... 459
References ..... 461

Chapter 10. Symbolic Dynamics and Finite Automata
Marie-Pierre Beal and Dominique Perrin ..... 463
1. Introduction ..... 463
2. Symbolic dynamical systems ..... 464
3. Recurrence and minimality ..... 470
4. Sofic systems and shifts of finite type ..... 472
5. Minimal automaton of a subshift ..... 477
6. Codes and finite-to-one maps ..... 480
7. State splitting and merging ..... 484
8. Shift equivalence ..... 487
9. Entropy ..... 490
10. The road coloring problem ..... 496
11. The zeta function of a subshift ..... 498
12. Circular codes, shifts of finite type and Krieger embedding theorem ..... 500
References ..... 503

Chapter 11. Cryptology: Language-Theoretic Aspects
Valtteri Niemi ..... 507
1. Introduction ..... 507
2. Basic notions in cryptology ..... 507
3. Connections between cryptology and language theory ..... 510
4. Public-key systems based on language theory ..... 511

4.1 Wagner-Magyarik system ..... 511
4.2 Salomaa-Welzl system ..... 512
4.3 Subramanian et al. system ..... 513
4.4 Siromoney-Mathew system ..... 514
4.5 Niemi system ..... 514
4.6 Oleshchuk system ..... 515

5. Cryptosystems based on automata theory ..... 516
5.1 Wolfram system ..... 516
5.2 Guan public-key system ..... 516
5.3 Tao-Chen public-key system ..... 517

6. Theoretical cryptologic research based on language theory ..... 518
7. Cryptanalysis based on language theory ..... 519
8. Language-theoretic research inspired by cryptology ..... 520
9. Research associated with language theory and cryptology ..... 521
References ..... 521

Index ..... 525

Contents of Volume 1

1. Formal Languages: an Introduction and a Synopsis
Alexandru Mateescu and Arto Salomaa ..... 1

2. Regular Languages
Sheng Yu ..... 41

3. Context-Free Languages and Pushdown Automata
Jean-Michel Autebert, Jean Berstel, and Luc Boasson ..... 111

4. Aspects of Classical Language Theory
Alexandru Mateescu and Arto Salomaa ..... 175

5. L Systems
Lila Kari, Grzegorz Rozenberg, and Arto Salomaa ..... 253

6. Combinatorics of Words
Christian Choffrut and Juhani Karhumäki ..... 329

7. Morphisms
Tero Harju and Juhani Karhumäki ..... 439

8. Codes
Helmut Jürgensen and Stavros Konstantinidis ..... 511

9. Semirings and Formal Power Series
Werner Kuich ..... 609

10. Syntactic Semigroups
Jean-Eric Pin ..... 679

11. Regularity and Finiteness Conditions
Aldo de Luca and Stefano Varricchio ..... 747

12. Families Generated by Grammars and L Systems
Gheorghe Paun and Arto Salomaa ..... 811

Contents of Volume 3

1. Tree Languages
Ferenc Gecseg and Magnus Steinby ..... 1

2. Tree-Adjoining Grammars
Aravind K. Joshi and Yves Schabes ..... 69

3. Context-Free Graph Grammars
Joost Engelfriet ..... 125

4. Two-Dimensional Languages
Dora Giammarresi and Antonio Restivo ..... 215

5. Basics of Term Rewriting
Matthias Jantzen ..... 269

6. ω-Languages
Ludwig Staiger ..... 339

7. Languages, Automata, and Logic
Wolfgang Thomas ..... 389

8. Partial Commutation and Traces
Volker Diekert and Yves Metivier ..... 457

9. Visual Models of Plant Development
Przemyslaw Prusinkiewicz, Mark Hammel, Jim Hanan, and Radomir Mech ..... 535

10. Digital Images and Formal Languages
Karel Culik II and Jarkko Kari ..... 599

Authors' Addresses

Alberto Apostolico Dipartimento di Elettronica e Informatica, Universita di Padova Via Gradenigo 6/a, I-35131 Padova, Italy and Department of Computer Science, Purdue University 1398 Computer Science Building, West Lafayette, IN 47907-1398, U.S.A. [email protected]

Marie-Pierre Beal Institut Gaspard Monge, Universite de Marne-la-Vallee 2, rue de la Butte verte, F-93166 Noisy-le-Grand, France [email protected]

Cristian Calude Centre for Discrete Mathematics and Theoretical Computer Science The University of Auckland, Private Bag 92019, Auckland, New Zealand [email protected]

Maxime Crochemore Institut Gaspard Monge, Universite de Marne-la-Vallee 2, rue de la Butte verte, F-93166 Noisy-le-Grand, France [email protected]

Jürgen Dassow Faculty of Computer Science, Otto-von-Guericke-University of Magdeburg P.O. Box 4120, D-39016 Magdeburg, Germany [email protected]

Andrzej Ehrenfeucht Department of Computer Science, University of Colorado at Boulder Campus 430, Boulder, CO 80309, U.S.A. [email protected]

Christophe Hancart Laboratoire d'Informatique de Rouen, Faculte des Sciences et Techniques Universite de Rouen, F-76821 Mont-Saint-Aignan Cedex, France [email protected]

Thomas Head Department of Mathematics, University of Binghamton P.O. Box 6000, Binghamton, NY 13902, U.S.A. [email protected]

Juraj Hromkovic Institut für Informatik und Praktische Mathematik, Universität Kiel Olshausenstrasse 40, D-24098 Kiel, Germany [email protected]

Solomon Marcus Faculty of Mathematics, University of Bucharest Str. Academiei, RO-70109 Bucharest, Romania [email protected]

Valtteri Niemi Department of Mathematics and Statistics, University of Vaasa FIN-65101 Vaasa, Finland [email protected]

Anton Nijholt Computer Science Department, University of Twente P.O. Box 217, NL-7500 AE Enschede, The Netherlands [email protected]

Gheorghe Paun Institute of Mathematics of the Romanian Academy P.O. Box 1-764, RO-70700 Bucharest, Romania [email protected]

Dominique Perrin Institut Gaspard Monge, Universite de Marne-la-Vallee 2, rue de la Butte verte, F-93166 Noisy-le-Grand, France [email protected]

Dennis Pixton Department of Mathematics, University of Binghamton P.O. Box 6000, Binghamton, New York 13902, U.S.A. [email protected]

Grzegorz Rozenberg Department of Computer Science, Leiden University P.O. Box 9512, NL-2300 RA Leiden, The Netherlands and Department of Computer Science, University of Colorado at Boulder Campus 430, Boulder, CO 80309, U.S.A. [email protected]

Arto Salomaa Academy of Finland and Turku Centre for Computer Science (TUCS) Lemminkäisenkatu 14 A, FIN-20520 Turku, Finland [email protected]

Klaas Sikkel FIT, CSCW, German National Research Centre for Information Technology (GMD) Schloß Birlinghoven, D-53757 Sankt Augustin, Germany [email protected]