CS173
Tuesday Oct 1, 2002

ACM local programming contest
  Friday Oct 4
  Meet in CSB 601 at noon
  5 problems, 4 hours
  teams of 3

Quiz on Thursday
Read Chapter 10

-----------------
Finite Automata and Regular Expressions

Pattern recognition and generation in computing
  - symbols in a program
  - words in a document
  - characters in a file
  - bits in an image

Finite automata:
  formal (or abstract) machines to recognize patterns
  used in compilers, text editors, communication hardware

Regular expressions:
  formal notation to describe patterns
  can be used to "generate" patterns
  used in programming language manuals

fact: FA and RE are equally powerful
  can use automated procedure to convert one to the other

---------------------
Finite Automata

Write a program that reads a file and tells whether it contains the word 'main'.

  1  repeat                        // look for an 'm'
       read c; if eof, fail
     while c != 'm'
  2  repeat                        // found an 'm'; look for an 'a'
       read c; if eof, fail
     while c == 'm'
     if c != 'a' return to step 1
  3  read c; if eof, fail          // found "ma"; look for an 'i'
     if c == 'm' return to step 2
     if c != 'i' return to step 1
  4  read c; if eof, fail          // found "mai"; look for an 'n'
     if c == 'm' return to step 2
     if c != 'n' return to step 1
  5  got what we wanted; skip remainder of file and succeed

Each "step" is a different place in the recognition process.
Can capture same behavior in a graph
  * node = a step in the process
  * arcs = movement from one step to another
  * labels on the arcs = input required to make a transition

------------------------------------------------------------------------
Definition of Finite Automata

- idealized machine used to recognize patterns within input taken from some set of symbols (alphabet) C.
- Alphabet - general: text, message types, etc.
- FA: accept or reject input based on whether a pattern appears

Parts of an FA
  * a finite set S of N states
  * a special start state
  * a set C of input symbols (the book uses Lambda; some use Gamma)
  * a set of final (or accepting) states F
  * a set of transitions T (many books use delta) from one state to another, labeled with symbols in C
      transition function, maps (state, input symbol) pairs to states
      delta : S x C --> S

Representations: textual, graphical

Can "execute" an FA on an input
  - begin in the start state
  - if the next input symbol matches the label on an outgoing arc, go to that new state
  - continue making transitions on each input symbol
  - if not at EOF and no move is possible, then reject
  - if EOF then accept iff in an accepting state

Wrinkles:
  ? accept before reading entire input
  ? book is unclear
  formulation above: read whole string, then decide, or reject if not possible to transition
    easier to talk about strings that END with a substring
    easier to prove equivalence of deterministic and non-deterministic automata (more later)
  Note: an automaton designed to accept before all input is read converts easily:
    add a self-loop on every input symbol at each accepting state

Some formulations require that the machine never get "stuck".
  Must always have a possible transition
  easy to convert: create a non-accepting dead state with a self-loop
  need this dead state for finding smallest DFA (more later)

Some formulations allow the machine to produce output (or perform actions) at each state (Moore) or on each transition (Mealy).
Bounce filter in book is a Moore machine
Good for communication protocols

------------------------------------------------------------------------
Examples

  * 4-state FA to recognize words with at least 3 x's
  * 3-state FA to recognize Pascal variable names
    (letter followed by one or more letters or digits)
  * 4-state FA to recognize binary strings that end with 111
  * 8-state FA to recognize real numbers in Pascal
    (one or more digits followed by
      (a) a dot followed by one or more digits, and/or
      (b) an E followed by either one or more digits or a plus or minus followed by zero or more digits)
  * 7-state FA for a soda machine that accepts nickels, dimes, and quarters, and requires that you input 30 cents or more.
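
Sketch (not from the book): the "execute an FA" rules and a few of the examples above can be tried out in Python. This is one possible encoding under stated assumptions, not a definitive implementation. All names here (run_fa, table, ENDS_111, AT_LEAST_3X, SODA, main_delta) are made up for illustration. Assumed as well: states are numbered by how much progress has been made, the "at least 3 x's" machine reads words over the two-letter alphabet {x, y}, and coins are written N, D, Q.

```python
def run_fa(delta, start, accepting, text):
    """Execute a deterministic FA on text; return True iff it accepts.

    delta(state, symbol) returns the next state, or None when no move
    is possible (the machine is "stuck").
    """
    state = start
    for symbol in text:
        state = delta(state, symbol)
        if state is None:
            return False  # not at EOF and no move possible: reject
    return state in accepting  # at EOF: accept iff in an accepting state


def table(transitions):
    """Wrap a dict {(state, symbol): next_state} as a transition function."""
    return lambda state, symbol: transitions.get((state, symbol))


# 4-state FA: binary strings that end with 111.
# State k = number of trailing 1's seen so far, capped at 3.
ENDS_111 = {}
for k in range(4):
    ENDS_111[(k, "0")] = 0
    ENDS_111[(k, "1")] = min(k + 1, 3)

# 4-state FA: words with at least 3 x's (alphabet {x, y} is an assumption).
# State k = number of x's seen so far, capped at 3.
AT_LEAST_3X = {}
for k in range(4):
    AT_LEAST_3X[(k, "x")] = min(k + 1, 3)
    AT_LEAST_3X[(k, "y")] = k

# 7-state soda machine: states 0, 5, ..., 30 cents, where 30 means "30 or more".
COINS = {"N": 5, "D": 10, "Q": 25}  # coin letters are an assumption
SODA = {}
for cents in range(0, 35, 5):
    for coin, value in COINS.items():
        SODA[(cents, coin)] = min(cents + value, 30)


def main_delta(state, c):
    """Transition function for the 5-step 'main' recognizer (steps 1..5)."""
    if state == 5:
        return 5  # self-loop: skip remainder of file and succeed
    if c == "m":
        return 2  # any 'm' (re)starts a possible match
    expected = {2: "a", 3: "i", 4: "n"}
    if expected.get(state) == c:
        return state + 1
    return 1  # mismatch: go back to looking for an 'm'


if __name__ == "__main__":
    print(run_fa(table(ENDS_111), 0, {3}, "0101111"))
    print(run_fa(table(AT_LEAST_3X), 0, {3}, "xyxyx"))
    print(run_fa(table(SODA), 0, {30}, "QN"))
    print(run_fa(main_delta, 1, {5}, "int main()"))
```

Notes on the sketch: the dict-based machines get stuck (and reject) on any symbol outside their alphabet, matching the "reject if no move is possible" rule. main_delta never gets stuck; its step 5 uses the self-loop trick from the Wrinkles note so that the whole input can be read before deciding.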