
A Practical Introduction to Computer Architecture

Daniel Page 〈[email protected]〉

git # ba293a0e @ 2019-11-14


© Daniel Page 〈[email protected]〉




CONTENTS

I Tools and Techniques

1 Mathematical preliminaries
  1.1 Propositional logic
    1.1.1 Connectives
    1.1.2 Quantifiers
  1.2 Collections: sequences
    1.2.1 Basic definition
    1.2.2 Operations
    1.2.3 Advanced definition and short-hands
  1.3 Collections: sets
    1.3.1 Basic definition
    1.3.2 Operations
    1.3.3 Products
    1.3.4 Advanced definition and short-hands
  1.4 Collections: some additional special-cases
    1.4.1 Tuples
    1.4.2 Strings
  1.5 Functions
    1.5.1 Composition
    1.5.2 Properties
    1.5.3 Relations
  1.6 Boolean algebra
    1.6.1 Manipulation
    1.6.2 Functions
    1.6.3 Normal (or standard) forms
  1.7 Signals
  1.8 Representations
    1.8.1 Bits, bytes and words
    1.8.2 Positional number systems
    1.8.3 Representing integer numbers, i.e., members of Z
    1.8.4 Representing real numbers, i.e., members of R
    1.8.5 Representing characters
  1.9 A conclusion: steps toward a digital logic

2 Basics of digital logic
  2.1 Switches and transistors
    2.1.1 A brief tour of fundamental principles
    2.1.2 Implementing transistors
  2.2 Combinatorial logic
    2.2.1 A suite of simplified logic gates




    2.2.2 Harnessing the universality of NAND and NOR
    2.2.3 Designing circuits for arbitrary combinatorial functions
    2.2.4 Physical properties of combinatorial logic
    2.2.5 Building block components
  2.3 Sequential logic
    2.3.1 Clocks
    2.3.2 Latches, flip-flops and registers
    2.3.3 Putting everything together: general clocking strategies
  2.4 Pipelined logic
    2.4.1 An analogy: car production lines
    2.4.2 Treating logic as a production line
    2.4.3 Some concrete examples
  2.5 Implementation and fabrication technologies
    2.5.1 Silicon fabrication
    2.5.2 (Re)programmable fabrics

3 Finite State Machines (FSMs)
  3.1 State machines: from simple to more complex control-paths
    3.1.1 A rough overview of FSM-related theory
    3.1.2 Practical implementation of FSMs in hardware

4 Basics of computer arithmetic
  4.1 Introduction
  4.2 High-level ALU architecture
  4.3 Components for addition and subtraction
    4.3.1 Addition
    4.3.2 Subtraction
    4.3.3 Carry and overflow detection
  4.4 Components for shift and rotation
    4.4.1 Introductory concepts and theory
    4.4.2 Iterative designs
    4.4.3 Combinatorial designs
  4.5 Components for multiplication
    4.5.1 Introductory concepts and theory
    4.5.2 Iterative, bit-serial designs
    4.5.3 Iterative, digit-serial designs
    4.5.4 Combinatorial designs
    4.5.5 Some multiplier case-studies
  4.6 Components for comparison
    4.6.1 Unsigned comparison
    4.6.2 Signed comparison
    4.6.3 Beyond equality and less than

5 Hardware design using Verilog
  5.1 Introduction
    5.1.1 The problem of design complexity
    5.1.2 Design automation as a solution
  5.2 Structural modelling
    5.2.1 Comments
    5.2.2 Wires
    5.2.3 Values and literals
    5.2.4 Simple operators on wires and wire vectors
    5.2.5 Modules
    5.2.6 User-Defined Primitives
    5.2.7 RTL-based constructs
  5.3 Behavioural modelling
    5.3.1 Registers
    5.3.2 Processes
    5.3.3 Statement blocks (or groups)
    5.3.4 Statements
    5.3.5 Tasks and functions
  5.4 Effective development




    5.4.1 A rough guide to simulation
    5.4.2 System tasks and functions
    5.4.3 Named port lists
    5.4.4 The Verilog pre-processor
    5.4.5 The timescale directive
    5.4.6 Module parameters
    5.4.7 Generate statements
    5.4.8 Developing test stimuli

II Appendices

A Example exam-style questions
  A.1 Chapter 1
  A.2 Chapter 2
  A.3 Chapter 3
  A.4 Chapter 4

B Example exam-style solutions
  B.1 Chapter 1
  B.2 Chapter 2
  B.3 Chapter 3
  B.4 Chapter 4




LIST OF FIGURES

1.1 A collection of Venn diagrams for standard set operations.
1.2 An example Venn diagram showing membership of two sets.
1.3 Number lines illustrating the mapping of 8-bit sequences to integer values using three different representations.
1.4 A visualisation of the impact of increasing q, the number of fractional digits, in a fixed-point representation; the result is increased detail within the rendering of a Mandelbrot fractal.
1.5 Single- and double-precision IEEE-754 floating-point formats described graphically as bit-sequences and concretely as C structures.
1.6 A short C program that performs direct manipulation of IEEE floating-point numbers.
1.7 A teletype machine being used by UK-based Royal Air Force (RAF) operators during WW2 (public domain image, source: http://en.wikipedia.org/wiki/File:WACsOperateTeletype.jpg).
1.8 A table describing the printable ASCII character set.

2.1 The sub-atomic structure of a lithium atom.
2.2 A simple circuit conditionally connecting a capacitor (or battery) to a lamp depending on the state of a switch.
2.3 Some simple examples of Boolean-style control of a lamp by combinations of switches.
2.4 A 6P1P (i.e., a 100W to 200W, photo-sensitive type) vacuum tube (public domain image, source: http://en.wikipedia.org/wiki/File:6P1P.jpg).
2.5 A moth found by operators of the Harvard Mark 2; the “bug” was trapped within the computer and caused it to malfunction (public domain image, source: http://en.wikipedia.org/wiki/File:H96566k.jpg).
2.6 A replica of the first point-contact transistor, a precursor of designs such as the MOSFET, constructed at Bell Labs (public domain image, source: http://en.wikipedia.org/wiki/File:Replica-of-first-transistor.jpg).
2.7 A high-level diagram of a MOSFET transistor, showing the terminal and body materials.
2.8 A pair of N-MOSFET and P-MOSFET transistors, arranged to form a CMOS cell.
2.9 Symbolic descriptions of N-MOSFET and P-MOSFET transistors.
2.10 MOSFET-based implementations of NOT, NAND and NOR logic gates.
2.11 A voltage-oriented truth table for NOT, NAND and NOR logic gates.
2.12 Representation of standard logic gates in English, Boolean algebra, C and symbolic notations.
2.13 Truth tables for standard logic gates.
2.14 Identities for standard logic gates in terms of NAND and NOR.
2.15 4- and 3-input example Boolean functions respectively.
2.16 Quine-McCluskey simplification, step #1: extraction of prime implicants.
2.17 Quine-McCluskey simplification, step #2: covering the prime implicants table.
2.18 An illustration of idealised and realistic switching activity wrt. a MOSFET-based NOT gate.
2.19 A behavioural waveform demonstrating the effects of propagation delay on an XOR implementation.




2.20 A simple design, involving just a NOT and an AND gate, that exhibits glitch-like behaviour.
2.21 A contrived circuit illustrating the idea of fan-out, whereby one source gate may need to drive n target gates.
2.22 An overview of 2-input (resp. 2-output), 1-bit multiplexer (resp. demultiplexer) cells.
2.23 Application of the isolated and cascaded replication design patterns.
2.24 An overview of equality and less than comparators.
2.25 An overview of half- and full-adder cells.
2.26 Gate universality used to implement a NAND- and NOR-based half-adder. Note that the dashed boxes in the NAND and NOR implementations (middle and bottom) are translations of the primitive gates within the more natural description (top).
2.27 An example encoder/decoder pair.
2.28 An incorrect counter design, using naive “looped” feedback.
2.29 An illustration of standard features in 1- and 2-phase clocks.
2.30 Symbolic descriptions of D-type latch and flip-flop components (note the triangle annotation around en).
2.31 A collection of NOR- and NAND-based SR type latches, with simpler (top) to more complicated (middle and bottom) control features.
2.32 A case-by-case overview of NOR-based SR latch behaviour; notice that there are two sane cases for S = 0 and R = 0, and no sane cases for when S = 1 and R = 1.
2.33 An annotated SR latch, decomposed into two NOR gates and then into transistors; r0, the output of the top NOR gate, is used as input by the bottom NOR gate and r1, the output from the bottom NOR gate, is used as input by the top NOR gate (although the physical connections are not drawn).
2.34 A NOR-based D-type flip-flop created using a glitch generator.
2.35 A NOR-based D-type flip-flop created using a primary-secondary organisation of latches.
2.36 An n-bit register, with n replicated 1-bit components synchronised using the same enable signal.
2.37 A correct counter design, using sequential logic components.
2.38 Two illustrative waveforms, outlining stages of computation within the associated counter design.
2.39 Two different high-level clocking strategies.
2.40 Production line #1, staffed with pre-Ford workers.
2.41 Production line #2, staffed with post-Ford workers.
2.42 Four different ways to split a (hypothetical) component X into stages.
2.43 A problematic pipeline, and a solution involving the use of pipeline registers and a control signal to indicate when each stage should advance.
2.44 An illustrative waveform, outlining the stages of computation as a pipeline is driven by a clock.
2.45 An unpipelined, abstract combinatorial circuit and a 3-stage pipelined alternative.
2.46 An unpipelined, 8-bit Multiply-ACcumulate (MAC) circuit and a 3-stage pipelined alternative.
2.47 An unpipelined, 8-bit logarithmic shift circuit and a 3-stage pipelined alternative.
2.48 A high-level illustration of a lithography-based fabrication process.
2.49 Bonding wires connected to a high quality gold pad (public domain image, source: http://en.wikipedia.org/wiki/Image:Wirebond-ballbond.jpg).
2.50 A heatsink ready to be attached, via the z-clip, to a circuit in order to dissipate heat (public domain image, source: http://en.wikipedia.org/wiki/File:Pin_fin_heat_sink_with_a_z-clip.png).
2.51 A timeline of Intel processor innovation demonstrating Moore’s Law (data from http://www.intel.com/technology/mooreslaw/).
2.52 Conceptual diagrams of a PLA fabric.
2.53 Conceptual diagrams of an FPGA fabric.

3.1 An example FSM to decide whether there is an odd number of 1 elements in some sequence X.
3.2 An example FSM modelling a simple vending machine.
3.3 Two generic FSM frameworks (for different clocking strategies) into which one can place implementations of the state, δ (the transition function) and ω (the output function).
3.4 Two illustrative waveforms (for different clocking strategies), outlining stages of computation within the associated FSM framework.
3.5 An example FSM modelling an ascending modulo 6 counter.
3.6 An example FSM modelling an ascending or descending modulo 6 counter.
3.7 An example FSM modelling a traffic light controller.

4.1 Two high-level ALU architectures: each combines a number of sub-components, but does so using a different strategy.
4.2 An n-bit, ripple-carry adder described using a circuit diagram.
4.3 An n-bit, ripple-carry subtractor described using a circuit diagram.




4.4 An n-bit, ripple-carry adder/subtractor described using a circuit diagram.
4.5 An n-bit, carry look-ahead adder described using a circuit diagram.
4.6 An illustration depicting the structure of carry look-ahead logic, which is formed by an upper- and lower-tree of OR and AND gates respectively (with leaf nodes representing g_i and p_i terms for example).
4.7 An overview of half- and full-subtractor cells.
4.8 An iterative design for n-bit (left-)shift described using a circuit diagram.
4.9 A combinatorial design for n-bit (left-)shift described using a circuit diagram.
4.10 Two examples demonstrating different strategies for accumulation of base-b partial products resulting from two 3-digit operands.
4.11 An iterative, bit-serial design for (n × n)-bit multiplication described using a circuit diagram.
4.12 A tabular description of stages in example (4 × 4)-bit Wallace and Dadda tree multiplier designs.
4.13 An (n × n)-bit tree multiplier design, described using a circuit diagram.
4.14 An example (4 × 4)-bit Wallace-based tree multiplier design, described using a circuit diagram.
4.15 An n-bit, unsigned equality comparison described using a circuit diagram.
4.16 An n-bit, unsigned less than comparison described using a circuit diagram.

5.1 A waterfall-style hardware development cycle using Verilog.
5.2 Several different examples of wire definition.
5.3 Verilog literal values and their strength.
5.4 Diagrammatic descriptions of simple Verilog operators on wires and wire vectors.
5.5 Two styles of module interface for a 1-bit, 2-way multiplexer.
5.6 Gate-level implementations, and their diagrammatic analogues, of several building block components using primitive modules.
5.7 Gate behaviours where one or more inputs are the unknown value.
5.8 Gate-level implementations, and their diagrammatic analogues, of two components using user-defined modules.
5.9 A 1-bit, 2-way multiplexer described using a UDP.
5.10 A list of computationally-oriented Verilog operators.
5.11 Two (re)implementations of user-defined multiplexer modules using RTL-based continuous assignments.
5.12 A full-adder cell described using a Verilog continuous assignment.
5.13 Several different examples of register definition.
5.14 Connection rules for internal and external registers and wires.
5.15 (Incomplete) examples of Verilog process types.
5.16 (Incomplete) examples of Verilog always processes with associated sensitivity lists.
5.17 (Incomplete) examples of Verilog block types.
5.18 (Incomplete) examples of Verilog procedural assignments.
5.19 A D-type flip-flop, implemented using behavioural Verilog.
5.20 A logarithmic shifter, implemented using behavioural Verilog.
5.21 An implementation of the traffic light controller from Chapter 2.
5.22 Matching (or equality comparison) where one or more inputs are the unknown or high impedance values.
5.23 Some example Verilog functions.
5.24 Some example Verilog tasks.
5.25 Using named port lists to instantiate a full-adder cell.
5.26 Implementation of an N-bit, 2-way multiplexer where the pre-processor defines a symbol N.
5.27 Implementation of a 1-bit, 2-way multiplexer in either gate-level or RTL Verilog depending on whether the symbol GATE is defined or not.
5.28 Implementation of an N-bit, 2-way multiplexer using module parameters, and instantiation as 4-bit, 2-way and 8-bit, 2-way variants.
5.29 Implementation of an 8-bit, 2-way multiplexer using a generate statement to instantiate four 1-bit, 2-way multiplexer instances.
5.30 Two styles of test stimulus for a full-adder cell.
5.31 A test stimulus for fa with integrated oracle (using the built-in Verilog plus operator).
5.32 A (partial) test stimulus for the traffic light FSM (which requires a clock signal).




Part I

Tools and Techniques




CHAPTER 1

MATHEMATICAL PRELIMINARIES

In Mathematics you don’t understand things. You just get used to them.

– von Neumann

The goal of this Chapter is to provide a fairly comprehensive overview of the theory that underpins the rest of the book. At first glance the content may seem a little dry, and it is often excluded from other similar books. It seems clear, however, that without a solid understanding of said theory, using the constituent topics to solve practical problems will be much harder.

The topics covered all relate to the field of discrete Mathematics; they include propositional logic, sets and functions, Boolean algebra and number systems. These four topics combine to produce a basis for formal methods to describe, manipulate and implement digital systems. Readers with a background in Mathematics or Computer Science might skip this Chapter and use it simply for reference; those approaching it from some other background would be advised to read the material in more detail.

1.1 Propositional logic

Definition 1.1. A proposition is a statement whose meaning, termed the truth value, is either true or false (less formally, we say the statement is true if it has a truth value of true and false if it has a truth value of false). A given proposition can involve one or more variables; only when concrete values are assigned to the variables can the meaning of a proposition be evaluated.

In part because we use them naturally in language, it almost seems too formal to define what a proposition is. However, by doing so we can start to use propositions as building blocks to describe what propositional logic is and how it works. This is best explained step-by-step by example:

Example 1.1. The statement

“the temperature is 90°C”

is a proposition since it is definitely either true or false. When we take a proposition and decide whether it is true or false, we say we have evaluated it. However, there are clearly a lot of statements that are not propositions because they do not state any proposal. For example,

“turn off the heat”

is a command or request of some kind; it does not evaluate to a truth value. Propositions must also be well defined in the sense that they are definitely either true or false, i.e., there are no “grey areas” in between. The statement

“90°C is too hot”

is not a proposition, because it could be true or false depending on the context: 90◦C is probably too hot for body temperature, but not for a cup of coffee. Finally, some statements seem to be propositions but cannot be evaluated because they are paradoxical: a famous example is the so-called liar paradox, usually attributed to the Greek philosopher Eubulides, who stated it as

“a man says that he is lying, is what he says true or false?”

although a clearer version is the more commonly referenced

“this statement is false” .

If the man is telling the truth, everything he says must be true which means he is lying and hence everything he says is false. Conversely, if the man is lying everything he says is false so he cannot be lying (because he said that he was). In terms of the statement, we cannot be sure of the truth value so this cannot be classified as a proposition.

Example 1.2. When a proposition contains one or more variables, we can only evaluate it having first assigned each a concrete value. For example, consider

“x◦C equals 90◦C”

where x is a variable. By assigning x a value we get a proposition; setting x = 10, for example, gives

“10◦C equals 90◦C”

which clearly evaluates to false. Setting x = 90 gives

“90◦C equals 90◦C”

which evaluates to true.

Definition 1.2. Informally, a propositional function is just a short-hand way of writing a proposition; we give the function a name and a list of free variables. So, for example, the function

f(x, y) : x = y

is called f and has two variables named x and y. If we use the function as f(10, 20), performing the binding x = 10 and y = 20, it has the same meaning as 10 = 20.

Example 1.3. We might write

g : “the temperature is 90◦C”

and hence use g (the left-hand side) as a short-hand for the longer proposition (the right-hand side): it works the same way, in the sense that g tells us the truth value of said proposition. Here, g has no free variables; imagine we extend our example to write

h(x) : “x◦C equals 90◦C”.

Now, h is representing a longer proposition. When we bind x to a value via h(10), we find

h(10) = “10◦C equals 90◦C”

which can be evaluated to false.
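As a rough illustration, a propositional function can be modelled as an ordinary function that returns a Boolean. The sketch below mirrors f and h from the text; treating the bound values as Python integers is an assumption made purely for illustration.

```python
def f(x: int, y: int) -> bool:
    """Propositional function f(x, y) : x = y."""
    return x == y


def h(x: int) -> bool:
    """Propositional function h(x) : "x degrees C equals 90 degrees C"."""
    return x == 90


# Binding the free variables to concrete values yields a proposition,
# which can then be evaluated to a truth value.
print(f(10, 20))  # the proposition 10 = 20
print(h(10))      # the proposition "10 degrees C equals 90 degrees C"
print(h(90))      # the proposition "90 degrees C equals 90 degrees C"
```

Calling the function plays the role of binding: only once every parameter has a value can the result be evaluated.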

1.1.1 Connectives

Definition 1.3. A connective binds together a number of propositional terms into a single, compound proposition called an expression. For brevity, we use symbols to denote common connectives:

• “not x” is denoted ¬x, and often termed logical complement (or negation).

• “x and y” is denoted x ∧ y, and often termed logical conjunction.

• “x or y” is denoted x ∨ y, often called an inclusive-or, and termed logical (inclusive) disjunction.

• “x or y but not x and y” is denoted x ⊕ y, often called an exclusive-or, and termed logical (exclusive) disjunction.

• “x implies y” is denoted x ⇒ y, sometimes written as “if x then y”, and termed logical implication, and finally

• “x is equivalent to y” is denoted x ≡ y, sometimes written as “x if and only if y” or even “x iff. y”, and termed logical equivalence.

Note that we group statements using parentheses when there could be some confusion about the order in which they are applied. As such (x ∧ y) is the same as x ∧ y, and (x ∧ y) ∨ z simply means we apply the ∧ connective to x and y first, then ∨ to the result and z.

Definition 1.4. Provided we include parentheses in a compound proposition, there will be no ambiguity wrt. the order in which connectives are applied. For instance, if we write

(x ∧ y) ∨ z

it is clear that we first resolve the conjunction of x and y, then the disjunction of that result and z.

If parentheses are not included, however, we rely on precedence rules to determine the order for us. In short, the following list

1. ¬,

2. ∧,

3. ∨,

4. ⇒,

5. ≡

assigns a precedence level to each connective. Using the same example as above, if we omit the parentheses and instead write

x ∧ y ∨ z

we still get the same result: ∧ has a higher precedence level than ∨ (sometimes we say ∧ “binds more tightly” to operands than ∨), so we resolve the former before the latter.

Example 1.4. For example, the expression

“the temperature is less than 90◦C ∧ the temperature is greater than 10◦C”

contains two terms that propose

“the temperature is less than 90◦C”

and

“the temperature is greater than 10◦C” .

These terms are joined together using the ∧ connective so that the whole expression evaluates to true if both of the terms are true, otherwise it evaluates to false. In a similar way we might write a compound proposition

“the temperature is less than x◦C ∧ the temperature is greater than y◦C”

which can only be evaluated when we assign values to the variables x and y.

Definition 1.5. The meaning of connectives is usually described in a tabular form which enumerates the possible values each term can take and what the resulting truth value is; we call this a truth table.

x      y      ¬x     x ∧ y  x ∨ y  x ⊕ y  x ⇒ y  x ≡ y
false  false  true   false  false  false  true   true
false  true   true   false  true   true   true   false
true   false  false  false  true   true   false  false
true   true   false  true   true   false  true   true
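The truth table can also be generated mechanically by enumerating every assignment to x and y. The following is a minimal sketch: the dictionary name CONNECTIVES and the encodings of ⊕, ⇒ and ≡ as Python expressions are choices made for this illustration, not notation from the text.

```python
from itertools import product

# Each connective is modelled as a function of two Boolean inputs
# (negation simply ignores y).
CONNECTIVES = {
    "not x": lambda x, y: not x,
    "x and y": lambda x, y: x and y,
    "x or y": lambda x, y: x or y,
    "x xor y": lambda x, y: x != y,
    "x implies y": lambda x, y: (not x) or y,
    "x equiv y": lambda x, y: x == y,
}


def truth_table():
    """Return one row per assignment: (x, y, then each connective's value)."""
    rows = []
    for x, y in product([False, True], repeat=2):
        rows.append((x, y) + tuple(op(x, y) for op in CONNECTIVES.values()))
    return rows


for row in truth_table():
    print(row)
```

The rows come out in the same order as the table: both false, then y true only, then x true only, then both true.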

Example 1.5. The ¬ connective complements (or negates) the truth value of a given expression. Consideringthe expression

¬(x > 10),

we find that the expression ¬(x > 10) is true if the term x > 10 is false and the expression is false if x > 10 is true. If we assign x = 9, x > 10 is false and hence the expression ¬(x > 10) is true. If we assign x = 91, x > 10 is true and hence the expression ¬(x > 10) is false.

Example 1.6. The meaning of the ∧ connective is also as one would expect; the expression

(x > 10) ∧ (x < 90)

is true if both the expressions x > 10 and x < 90 are true, otherwise it is false. So if x = 20, the expression is true. But if x = 9 or x = 91, then it is false: even though one or other of the terms is true, they are not both true.

Example 1.7. The inclusive-or and exclusive-or connectives are fairly similar. The expression

(x > 10) ∨ (x < 90)

is true if either x > 10 or x < 90 is true or both of them are true. Here we find that all the assignments x = 20, x = 9 and x = 91 mean the expression is true; in fact no number x makes it evaluate to false, since any x that is not greater than 10 is certainly less than 90. Conversely, the expression

(x > 10) ⊕ (x < 90)

is only true if exactly one of x > 10 or x < 90 is true; if they are both true then the expression is false. We now find that setting x = 20 means the expression is false while both x = 9 and x = 91 mean it is true.

Example 1.8. Implication is more tricky. If we write x ⇒ y, we typically call x the hypothesis and y the conclusion. In order to justify the truth table for implication, consider the example

(x is prime) ∧ (x ≠ 2) ⇒ (x ≡ 1 (mod 2))

i.e., if x is a prime other than 2, it follows that it is odd. Therefore, if x is a prime other than 2 then the expression is true if x ≡ 1 (mod 2) and false otherwise (since the implication is invalid). If x is not prime, then the expression does not really say anything about the expected outcome: we only know what to expect if x was prime. Since it could still be that x ≡ 1 (mod 2) even when x is not prime, based on what we know from the example, we assume it is true when this case occurs.

Put in a less formal way, the idea is that anything can follow from a false hypothesis. If the hypothesis is false, we cannot be sure whether or not the conclusion is false: we therefore assume it is possibly true, which is sort of an “optimistic default”. Consider a less formal example to support this. The statement “if I am unhealthy then I will die” means x = “I am unhealthy” and y = “I will die”, and that r = x ⇒ y has four possible cases:

1. I am healthy and do not die, so x = false, y = false and r = true,

2. I am healthy and die, so x = false, y = true and r = true,

3. I am unhealthy and do not die, so x = true, y = false and r = false, and

4. I am unhealthy and die, so x = true, y = true and r = true.

The first two cases do not contradict the original statement (since in them I am healthy, so it doesn’t apply): only the third case does, in that I do not die (maybe I had a good doctor for instance).

Example 1.9. In contrast, equivalence is fairly simple. The expression x ≡ y is only true if x and y evaluate to the same value. This matches the concept of equality in other contexts, such as between numbers. As an example, consider

(x is odd) ≡ (x ≡ 1 (mod 2)).

This expression is true since if the left side is true, the right side must also be true and vice versa. If we change it to

(x is odd ) ≡ (x is prime ),

then the expression is false. To see this, note that only some odd numbers are prime: just because a number is odd does not mean it is always prime, although if it is prime it must be odd (apart from the corner case of x = 2). So the equivalence works in one direction but not the other and hence the expression is false.

Definition 1.6. An expression which is equivalent to true, no matter what values are assigned to any variables, is called a tautology; an expression which is equivalent to false is called a contradiction.

Definition 1.7. We call two expressions logically equivalent if they are composed of the same variables and have the same truth value for every possible assignment to those variables. More formally, two expressions x and y are equivalent iff. x ≡ y can be proved a tautology.

Various subtleties emerge when trying to prove two expressions are logically equivalent, but for our purposes it suffices to adopt a brute-force approach by a) enumerating the values each variable can take, then b) checking whether or not the expressions produce identical truth values in all cases. Note that, in practice, this can clearly become difficult wrt. the amount of work required: with n variables there will be 2^n possible assignments, which grows (very) quickly as n grows.
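The brute-force approach just described can be sketched in a few lines. The helper name logically_equivalent, and the two pairs of expressions compared, are assumptions chosen for illustration; they are not examples taken from the text.

```python
from itertools import product


def logically_equivalent(p, q, n):
    """Compare p and q on all 2**n assignments to n Boolean variables."""
    return all(p(*v) == q(*v) for v in product([False, True], repeat=n))


# Illustrative pair 1: not (x and y) versus (not x) or (not y).
lhs = lambda x, y: not (x and y)
rhs = lambda x, y: (not x) or (not y)
print(logically_equivalent(lhs, rhs, 2))

# Illustrative pair 2: inclusive-or versus exclusive-or, which differ
# on the assignment x = true, y = true.
print(logically_equivalent(lambda x, y: x or y, lambda x, y: x != y, 2))
```

Because the loop visits every assignment, the running time doubles with each extra variable, which is the growth the text warns about.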

1.1.2 Quantifiers

Definition 1.8. A free variable in a given expression is one which has not yet been assigned a value. Roughly speaking, a quantifier allows a free variable to take one of many values:

• the universal quantifier “for all x, y is true” is denoted ∀ x [y], while

• the existential quantifier “there exists an x such that y is true” is denoted ∃ x [y].

We say that binding a quantifier to a variable quantifies it; after it has been quantified we say it is bound (rather than free).

As an aside, quantifiers can be roughly viewed as moving from propositional logic into predicate (or first-order) logic (with second-order logic then a further extension, e.g., allowing quantification of relations). Put more simply, however, when we encounter an expression such as

∃ x [y]

we are essentially assigning x all possible values; to make the expression true, just one of these values needs to make the expression y true. Likewise, when we encounter

∀ x [y]

we are again assigning x all possible values. This time however, to make the expression true, all of them need to make the expression y true.

Example 1.10. Consider the following

“there exists an x such that x ≡ 0 (mod 2)”

which we can rewrite symbolically as

∃ x [x ≡ 0 (mod 2)].

In this case, x is bound by an ∃ quantifier; we are asserting that for some value of x, it is true that x ≡ 0 (mod 2). Restating the same thing another way, if just one x means x ≡ 0 (mod 2) is true then the whole (quantified) expression is true. Clearly x = 2 satisfies this condition, so the expression is true.

Example 1.11. Consider the following

“for all x, x ≡ 0 (mod 2)”

which we can rewrite symbolically as

∀ x [x ≡ 0 (mod 2)].

This is a more general assertion about x, demanding that for all x it is true that x ≡ 0 (mod 2). Taking the opposite approach to the above, to conclude the whole (quantified) expression is false we need an x such that x ≢ 0 (mod 2). This is easy, because any odd value of x is good enough, so the expression is false.
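Over a finite collection of candidate values, the two quantifiers correspond closely to Python's built-in any and all. The sketch below assumes a small finite domain as a stand-in for “all possible values” of x; a true quantifier over the integers cannot be checked by enumeration.

```python
# Assumed finite stand-in for the values x may take.
domain = range(0, 10)


def is_even(x):
    """The predicate x ≡ 0 (mod 2)."""
    return x % 2 == 0


# Existential: one witness is enough to make the expression true.
exists_even = any(is_even(x) for x in domain)

# Universal: every value must satisfy the predicate.
forall_even = all(is_even(x) for x in domain)

# A single counterexample is enough to make the universal claim false.
counterexample = next(x for x in domain if not is_even(x))

print(exists_even, forall_even, counterexample)
```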

1.2 Collections: sequences

Definition 1.9. A sequence is an ordered collection of elements, which can be of any (but normally homogeneous) type. The size or length of a sequence, denoted |X| (or, alternatively, #X elsewhere), is the number of elements it contains.

The order of elements is important, with an index used to refer to each one: the i-th element of a sequence X is denoted X_i, st. 0 ≤ i < |X| and X_j = ⊥ for j < 0 or j ≥ |X|.

1.2.1 Basic definition

Example 1.12. Consider a sequence of elements

A = 〈0, 3, 1, 2〉

which one can think of as like a list, read from left-to-right. In this case, we conclude, for example, that |A| = 4, A_0 = 0, A_1 = 3, A_2 = 1, and A_3 = 2; A_4 = ⊥, because that element does not exist (i.e., the index 4 is too large, and so deemed out-of-bounds).

Example 1.13. Each element in the sequence A is a number, but we might equally define a sequence of characters such as

B = 〈‘a’, ‘b’, ‘c’, ‘d’, ‘e’〉.

However, since the order of elements is important, if we define

C = 〈2, 1, 3, 0〉

then clearly A ≠ C because, for example, A_0 ≠ C_0. Both A and B are sequences of homogeneous type: their elements are all numbers and characters respectively.

1.2.2 Operations

The concatenate operator can be used to join together two sequences. Although most often used on the right-hand side of an equality (or an assignment), it is also allowed on the left-hand side: in such a case, it performs “deconcatenation” by splitting apart a sequence.

Example 1.14. Imagine we start with two 4-element sequences

F = 〈0, 1, 2, 3〉
G = 〈4, 5, 6, 7〉

Their concatenation is denoted

H = F ‖ G = 〈0, 1, 2, 3〉 ‖ 〈4, 5, 6, 7〉 = 〈0, 1, 2, 3, 4, 5, 6, 7〉

noting that the result H is an 8-element sequence whose first (resp. last) four elements match F (resp. G). Likewise, we might write

I ‖ J = H

where now the concatenation operator appears on the left-hand side: this works basically the same way but in reverse, meaning

I ‖ J = H = 〈0, 1, 2, 3, 4, 5, 6, 7〉 = 〈0, 1, 2, 3〉 ‖ 〈4, 5, 6, 7〉

and so I = F = 〈0, 1, 2, 3〉 and J = G = 〈4, 5, 6, 7〉. Note that this approach demands the left- and right-hand sides have the same length, so elements can be organised appropriately.
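Using Python lists as a stand-in for sequences, concatenation and “deconcatenation” might be sketched as follows. Splitting at index 4 is an assumption: the lengths of the two parts have to be known for the split to be unambiguous.

```python
F = [0, 1, 2, 3]
G = [4, 5, 6, 7]

# Concatenation on the right-hand side: H = F || G.
H = F + G

# Concatenation on the left-hand side: I || J = H, assuming |I| = |J| = 4.
I, J = H[:4], H[4:]

print(H)
print(I, J)
```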

1.2.3 Advanced definition and short-hands

It can make sense to avoid enumerating a sequence completely, which is the approach used above: explicitly including each element can become laborious, error prone, or simply inconvenient. The examples below show various short-hands to address this problem:

1. Where there is no ambiguity, ellipses (or continuation dots) are allowed to replace one or more elements. For example, again consider the sequence

B = 〈‘a’, ‘b’, ‘c’, ‘d’, ‘e’〉

which we could rewrite as

B = 〈‘a’, ‘b’, . . . , ‘e’〉

with the ellipsis representing the sub-sequence 〈‘c’, ‘d’〉. In fact, this approach is sometimes required. In B there was a well defined start and end to the sequence, but in

E = 〈1, 2, 3, 4, . . .〉

the ellipsis represents elements we either do not know, or which do not matter: because there is no end to the sequence, we cannot necessarily fill in the ellipsis as before. Note that this also means |E| might be infinite or simply unknown.

2. It can be convenient to apply similar reasoning to the indices used to specify elements. For example,

B_{0,1,...,3} = B_{0,1,2,3} = B_0 ‖ B_1 ‖ B_2 ‖ B_3 = 〈‘a’, ‘b’, ‘c’, ‘d’〉

3. The so-called comprehension (or builder) notation allows generation of a sequence using a rule. Consider

F = 〈x | 4 ≤ x < 8〉 = 〈4, 5, 6, 7〉

for example: the comprehension includes a) an output expression (i.e., x) and b) a rule (or predicate, i.e., 4 ≤ x < 8) that limits the instances of variables considered when forming the output. Informally, you might read this example as “all x such that x is between 4 and 7”.
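These short-hands have close analogues in Python, shown here as a hedged sketch. The helper element, and the use of None to stand in for ⊥, are assumptions made for illustration; drawing x from range(10) is likewise an arbitrary finite pool of candidates.

```python
B = ["a", "b", "c", "d", "e"]

# Comprehension: output expression x, rule 4 <= x < 8.
F = [x for x in range(10) if 4 <= x < 8]

# Index range B_{0,1,...,3}: a slice selects the sub-sequence.
prefix = B[0:4]


def element(X, i):
    """Return X_i, or None (standing in for ⊥) when i is out of bounds."""
    return X[i] if 0 <= i < len(X) else None


print(F)
print(prefix)
print(element(B, 0), element(B, 5))
```

Note that plain Python indexing treats negative indices as counting from the end, which is why the helper checks the bounds explicitly.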

1.3 Collections: sets

Definition 1.10. A set is an unordered collection of elements; the elements may only occur once (otherwise we have a bag or multi-set), and can normally be of any (but homogeneous) type.

The size or cardinality of a set, denoted |X| (or, alternatively, #X elsewhere), is the number of elements it contains. If the element x is in (resp. not in) the set X, we say x is a member of X (resp. not a member) or write x ∈ X (resp. x ∉ X).

As an aside, this suggests the elements can potentially be other sets. Russell’s paradox, a discovery by Bertrand Russell in 1901, describes an issue with formal set theory that stems from this fact. In a sense, the paradox is a rephrasing of the liar paradox seen earlier. Consider A, the set of all sets which do not contain themselves: the question is, does A contain itself? If it does, it should not be in A by definition but it is; if it does not, it should be in the set A by definition but it is not.

1.3.1 Basic definition

Example 1.15. Consider the set of integers between two and eight (inclusive), which we can define as

A = {2, 3, 4, 5, 6, 7, 8}.

In this case, we conclude, for example, that |A| = 7, 2 ∈ A, and 9 ∉ A (i.e., 2 is a member, but 9 is not a member, of A). Notice that, unlike a sequence, because the order of elements is irrelevant, it makes no sense to refer to them via an index: A_i implies there is some specific i-th element, but, without a specific order, which element it refers to is unclear. However, also note the same fact means if we define

B = {8, 7, 6, 5, 4, 3, 2}

then we can conclude A = B.

1.3.2 Operations

Definition 1.11. A sub-set, say Y, of a set X is such that for every y ∈ Y we have that y ∈ X. This is denoted Y ⊆ X. Conversely, we can say X is a super-set of Y and write X ⊇ Y.

From this definition, it follows that every set is a valid sub-set and super-set of itself and, therefore, that X = Y iff. X ⊆ Y and Y ⊆ X. If X ≠ Y we use the terms proper sub-set and proper super-set, and so write Y ⊂ X and X ⊃ Y respectively.

Definition 1.12. For sets X and Y, we have that

• the union of X and Y is X ∪ Y = {x | x ∈ X ∨ x ∈ Y},

• the intersection of X and Y is X ∩ Y = {x | x ∈ X ∧ x ∈ Y},

• the difference of X and Y is X − Y = {x | x ∈ X ∧ x ∉ Y}, and

• the complement of X is X̄ = {x | x ∈ U ∧ x ∉ X}.

We say X and Y are disjoint (or mutually exclusive) if X ∩ Y = ∅. Note also that the difference operation can be rewritten X − Y = X ∩ Ȳ.

Definition 1.13. The union and intersection operations preserve a law of cardinality called the principle of inclusion, which states

|A ∪ B| = |A| + |B| − |A ∩ B|.

This property has a simple intuition, in that elements in both A and B will be counted twice by |A| and |B|; this is corrected via the last term (i.e., via |A ∩ B|).

Definition 1.14. The power set of a set X, denoted P(X), is the set of every possible sub-set of X. Note that ∅ is a member of all power sets.
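For a small finite set, the power set can be enumerated directly. The sketch below is one possible construction, assuming sub-sets are represented as frozenset values (Python sets cannot contain ordinary, mutable sets); the function name power_set is hypothetical.

```python
from itertools import combinations


def power_set(X):
    """Return the set of all sub-sets of X, each as a frozenset."""
    items = list(X)
    return {
        frozenset(c)
        for r in range(len(items) + 1)   # sub-set sizes 0, 1, ..., |X|
        for c in combinations(items, r)  # every sub-set of that size
    }


P = power_set({1, 2, 3})
print(len(P))            # number of sub-sets of a 3-element set
print(frozenset() in P)  # the empty set is always a member
```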

On first reading, these definitions can seem quite abstract. However, we have another tool at our disposal which describes what they mean in a more concrete, visual way. This tool is a Venn diagram, named after mathematician John Venn who invented the concept in 1881. The idea is that sets are represented by regions drawn inside a frame that implicitly represents the universal set U. By placing the regions inside each other and overlapping their boundaries, we can describe most set-related concepts very easily.

Figure 1.1: A collection of Venn diagrams for standard set operations. Panels: (a) A ∪ B, (b) A ∩ B, (c) A − B, (d) Ā.

Figure 1.2: An example Venn diagram showing membership of two sets (regions A and B, with the elements 1 to 10).

Example 1.16. Figure 1.1 includes four Venn diagrams which describe the union, intersection, difference, and complement operations: there is a shaded region representing members of each resulting set. For example, in the diagram for A ∪ B the shaded region covers all of the sets A and B: the result contains all elements in either A or B or both.

Example 1.17. Consider the sets

A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

where the universal set is

U = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}.

Recalling that elements within a given region are members of that set, Figure 1.2 describes several cases. Notice that

1. the union of A and B is A ∪ B = {1, 2, 3, 4, 5, 6}, i.e., elements which are either members of A or B or both; note that the elements 3 and 4 do not appear twice because said result is a set,

2. the intersection of A and B is A ∩ B = {3, 4}, i.e., elements that are members of both A and B,

3. the difference between A and B is A − B = {1, 2}, i.e., elements that are members of A but not also members of B, and

4. the complement of A is Ā = {5, 6, 7, 8, 9, 10}, i.e., elements that are not members of A.

We can also use this example to verify that the principle of inclusion holds: given |A| = 4 and |B| = 4, checking the above shows |A ∪ B| = 6 and |A ∩ B| = 2 so by the principle of inclusion we have 6 = 4 + 4 − 2.
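The same example can be checked with Python's built-in set type, shown here as a minimal sketch. The variable names are illustrative; the complement is computed as a difference from U, since Python sets have no implicit universal set.

```python
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
U = set(range(1, 11))  # the universal set {1, ..., 10}

union = A | B          # elements in A or B or both
intersection = A & B   # elements in both A and B
difference = A - B     # elements in A but not in B
complement_A = U - A   # elements of U that are not in A

print(union, intersection, difference, complement_A)

# Principle of inclusion: |A ∪ B| = |A| + |B| − |A ∩ B|.
print(len(union) == len(A) + len(B) - len(intersection))
```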

1.3.3 Products

Definition 1.15. The Cartesian product (or cross product) of n sets, say X_0, X_1, . . . , X_{n−1}, is defined as

X_0 × X_1 × · · · × X_{n−1} = {〈x_0, x_1, . . . , x_{n−1}〉 | x_0 ∈ X_0 ∧ x_1 ∈ X_1 ∧ · · · ∧ x_{n−1} ∈ X_{n−1}}.

In the most simple case of n = 2, the Cartesian product X_0 × X_1 is the set of all possible pairs where the first item in the pair is a member of X_0 and the second item is a member of X_1.

Definition 1.16. The Cartesian product of a set X with itself n times is denoted X^n; for completeness, we define X^0 = ∅ and X^1 = X. A special-case of this notation is X^∗, which applies the Kleene star operator: this captures the Cartesian product of X with itself a finite number of times (i.e., zero or more); a more precise definition is therefore

X^∗ = {〈x_0, x_1, . . . , x_{n−1}〉 | n ≥ 0, x_i ∈ X},

which is sometimes extended to include a so-called Kleene plus st.

X^+ = {〈x_0, x_1, . . . , x_{n−1}〉 | n ≥ 1, x_i ∈ X}.

Example 1.18. Imagine we have the set A = {0, 1}. The Cartesian product of A with itself is

A × A = A^2 = {〈0, 0〉, 〈0, 1〉, 〈1, 0〉, 〈1, 1〉}.

That is, the pairs in A × A (or A^2, if you prefer) represent all possible sequences a) whose length is two, and b) whose elements are members of A.
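A Cartesian product of finite sets can be enumerated with itertools.product from the Python standard library. In this sketch Python tuples stand in for the pairs, and the names A2 and A3 are illustrative.

```python
from itertools import product

A = [0, 1]

# A × A: every pair whose elements are both members of A.
A2 = list(product(A, repeat=2))
print(A2)

# The same idea extends to A^n; for instance, n = 3 gives |A|^3 triples.
A3 = list(product(A, repeat=3))
print(len(A3))
```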

1.3.4 Advanced definition and short-hands

Definition 1.17. Some sets are hard (or impossible) to define using the notation used so far, and therefore need some special treatment:

• The set ∅, called the null set or empty set, contains no elements: it is empty, meaning |∅| = 0. Note that ∅ is a set not an element: one cannot write the empty set as {∅} since this is the set with one element, that element being the empty set itself.

• The contents of the set U, called the universal set, depends on the context. Roughly speaking, it contains every element from the problem being considered.

It can make sense to avoid enumerating a set completely, which is the approach used above: explicitly including each element can become laborious, error prone, or simply inconvenient. The examples below show various short-hands to address this problem:

1. Where there is no ambiguity, ellipses (or continuation dots) are allowed to replace one or more elements. For example, we might rewrite the set A as

A = {2, 3, . . . , 7, 8}

with the ellipsis representing the sub-set {4, 5, 6}. In fact, this approach is sometimes required. Imagine we want to define a set of even integers which are greater than or equal to two: this set has an infinite size, so we need to define it as

C = {2, 4, 6, 8, . . .}.

2. The so-called comprehension (or builder) notation allows generation of a set using a rule. Consider

D = {x | f (x)}.

for example: the comprehension includes a) an output expression (i.e., x) and b) a rule (or predicate, i.e., f(x)) that limits the instances of variables considered when forming the output. Informally, you might read this example as “all x such that f(x) = true”. Using the same idea, we could rewrite previous examples as

A = {x | 2 ≤ x ≤ 8},

and

C = {x | x > 0 ∧ x ≡ 0 (mod 2)}

and so define the same sets we defined explicitly.
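Python's set comprehensions follow the same shape, with one caveat: the candidate values must come from somewhere finite. The sketch below assumes a window of integers from −20 to 20 (the name WINDOW is hypothetical), so C is only the part of the infinite set that falls inside that window.

```python
# Assumed finite pool of candidate values for x.
WINDOW = range(-20, 21)

# A = {x | 2 <= x <= 8}
A = {x for x in WINDOW if 2 <= x <= 8}

# C = {x | x > 0 and x ≡ 0 (mod 2)}, restricted to the window.
C = {x for x in WINDOW if x > 0 and x % 2 == 0}

print(sorted(A))
print(sorted(C))
```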

Definition 1.18. Several useful sets that relate to numbers can be defined:

• The integers are whole numbers which can be positive or negative and also include zero; this set is denoted by

Z = {. . . ,−3,−2,−1, 0,+1,+2,+3, . . .}

or alternatively

Z = {0,±1,±2,±3, . . .}.

• The natural numbers are whole numbers which are non-negative (i.e., positive or zero); they are denoted by the set

N = {0, 1, 2, 3, . . .}.

and represent a sub-set of Z.

• The binary numbers are simply one and zero, i.e.,

B = {0, 1},

and represent a sub-set of N.

• The rational numbers are those which can be expressed in the form x/y, where x and y are both integers and termed the numerator and denominator. This set is denoted

Q = {x/y | x ∈ Z ∧ y ∈ Z ∧ y ≠ 0}

where we disallow a value of y = 0 to avoid problems. Clearly the set of rational numbers is a super-set of Z, N, and B, since, for example, we can write x/1 to convert any x ∈ Z into a member of Q. However, not all numbers are rational: some are irrational in the sense that it is impossible to find an x and y such that they exactly represent the required result. Examples include the value of π which is approximated by, but not exactly equal to, 22/7.

1.4 Collections: some additional special-cases

1.4.1 Tuples

Definition 1.19. It is common to use the term tuple as a synonym for sequence: a sequence of n elements is an n-tuple, or simply a tuple if the number of elements is irrelevant. Note that the special cases

n = 2 ⇝ 2-tuple ⇝ pair
n = 3 ⇝ 3-tuple ⇝ triple

have intuitive names.

Note that, from here on, we use the terms sequence and tuple as an informal way to distinguish between cases where elements are, respectively, a) of (potentially) homogeneous and heterogeneous type, and/or b) mutable (i.e., can be altered) and immutable (i.e., cannot be altered).

Example 1.19. Noting the bracketing style used to differentiate it from a sequence, we can define an example 2-tuple or pair as

A = (4, ‘f’)

In this case, the elements A_0 = 4 and A_1 = ‘f’ clearly have different types: the first is a number and the second is a character.

1.4.2 Strings

Definition 1.20. An alphabet is a non-empty set of symbols.

Definition 1.21. A string X wrt. some alphabet Σ is a sequence, of finite length, whose elements are members of Σ, i.e.,

X = 〈X_0, X_1, . . . , X_{n−1}〉

for some n st. X_i ∈ Σ for 0 ≤ i < n; if n is zero, we term X the empty string and denote it ε. It can be useful, and is common, to write elements in a human-readable form termed a string literal: this basically just means writing them from right-to-left without any associated notation (e.g., brackets or commas).

Definition 1.22. A language is a set of strings.

Example 1.20. If

Σ = {0, 1}

then the strings of length n = 2 (left), and the corresponding literal (right), are as follows:

〈0, 0〉 ≡ 00
〈1, 0〉 ≡ 01
〈0, 1〉 ≡ 10
〈1, 1〉 ≡ 11

Example 1.21. If

Σ = {‘a’, ‘b’, . . . , ‘z’}

then the strings of length n = 2 (left), and the corresponding literal (right), are as follows:

〈‘a’, ‘a’〉 ≡ aa
〈‘b’, ‘a’〉 ≡ ab
...
〈‘a’, ‘b’〉 ≡ ba
〈‘b’, ‘b’〉 ≡ bb
...
〈‘z’, ‘z’〉 ≡ zz
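The right-to-left convention means element 0 of the sequence ends up as the rightmost symbol of the literal. A small sketch makes this explicit; the helper name literal and the use of Python tuples for strings are assumptions made for illustration.

```python
from itertools import product


def literal(X):
    """Write the elements of X from right to left, so X[0] is rightmost."""
    return "".join(str(x) for x in reversed(X))


# All strings of length n = 2 over the alphabet {0, 1}.
SIGMA = [0, 1]
for X in product(SIGMA, repeat=2):
    print(X, "->", literal(X))
```

With this helper, the empty sequence maps to the empty Python string, matching the role of ε.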

1.5 Functions

Definition 1.23. If X and Y are sets, a function f from X to Y is a process that maps each element of X to an element of Y. We write this as

f : X → Y

where X is termed the domain of f and Y is the codomain of f. For an element x ∈ X, which we term the pre-image, there is only one y = f(x) ∈ Y which is termed the image of x. Finally, the set

{y | y = f (x) ∧ x ∈ X ∧ y ∈ Y}

which is all possible results, is termed the range of f and is always a sub-set of the codomain.

From this definition it might seem as though we can only have functions with one input and one output. However, we are perfectly entitled to use sets of sets; this means we can use a Cartesian product as the domain. For example, we can define a function

f : A × A → B

which takes elements from the Cartesian product A × A as input, and produces an element of B as output. So since the inputs are of the form 〈x, y〉 ∈ A × A, f takes two input values “packaged up” as a single pair.

Example 1.22. Consider a function Inv which takes an integer x as input, and produces the rational number1/x as output:

Inv : Z → Q
      x ↦ 1/x

Note that here we write the function signature, which defines the domain and codomain of Inv, inline with the definition of the function behaviour. This is simply a short-hand for writing the function signature

Inv : Z → Q

and function behaviour

Inv(x) ↦ 1/x

separately. In either case the domain of Inv is Z, because it accepts an integer as input; the codomain is Q, because it produces a rational number as output. If we take an integer and apply the function to get something like Inv(2) = 1/2, we have that 1/2 is the image of 2 or conversely 2 is the pre-image of 1/2 under Inv.

Example 1.23. Consider the function

Max : Z × Z → Z
      〈x, y〉 ↦ x if x > y, and y otherwise

This is the maximum function on integers; it takes two integers as input and produces an integer, the maximum of the inputs, as output. So if we take the pair of integers 〈2, 4〉 say, and then apply the function, we get Max(2, 4) = 4. In this case, the domain of Max is Z × Z and the codomain is Z; the integer 4 is the image of the pair 〈2, 4〉 under Max.

1.5.1 Composition

Definition 1.24. Given two functions f : X→ Y and g : Y→ Z, the composition of f and g is denoted

g ◦ f : X→ Z.

The notation g ◦ f should be read as “apply g to the result of applying f”. That is, given some input x ∈ X, this composition is equivalent to applying y = f(x) and then z = g(y) to get the result z ∈ Z. More formally, we have

(g ◦ f )(x) = g( f (x)).
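Composition can be sketched using the Inv and Max examples from earlier, since the codomain of Max (the integers) matches the domain of Inv. The function names inv, max_pair and compose are illustrative; Fraction from the standard library stands in for a rational number, and the input x = 0 is deliberately left unhandled (Fraction raises ZeroDivisionError for it).

```python
from fractions import Fraction


def inv(x: int) -> Fraction:
    """Inv : x -> 1/x, returned as an exact rational (undefined for x = 0)."""
    return Fraction(1, x)


def max_pair(pair):
    """Max : <x, y> -> x if x > y, and y otherwise."""
    x, y = pair
    return x if x > y else y


def compose(g, f):
    """Return g ∘ f, i.e., the function x -> g(f(x))."""
    return lambda x: g(f(x))


# (Inv ∘ Max) maps a pair of integers to a rational number.
inv_of_max = compose(inv, max_pair)
print(inv_of_max((2, 4)))
```

Note the order: compose(inv, max_pair) applies max_pair first and inv second, matching the reading of g ◦ f.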

1.5.2 Properties

Definition 1.25. For a given function f , we say that f is

• surjective if the range equals the codomain, i.e., there are no elements in the codomain which do not have a pre-image in the domain,

• injective if no two elements in the domain have the same image in the range, and

• bijective if the function is both surjective and injective, i.e., every element in the codomain is the image of exactly one element in the domain.

Using the examples above, we clearly have that Inv is not surjective but Max is. This follows because we can construct a rational 2/3 which does not have an integer pre-image under Inv, so the function cannot be surjective. Equally, for any integer x in the codomain of Max there is always a pair 〈x, y〉 in the domain such that x > y, so Max is surjective; in fact there are lots of them since Z is infinite in size! In the same way, we have that Inv is injective but Max is not. Only one pre-image x maps to the value 1/x in the range under Inv, but there are multiple pairs 〈x, y〉 which map to the same image under Max: for example, 4 is the image of both 〈1, 4〉 and 〈2, 4〉 under Max.
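On finite sets, these properties can be tested by direct enumeration. The sketch below restricts Max to pairs drawn from {0, 1, 2, 3}, which is an assumption made so that the domain and codomain can be listed; it says nothing by itself about the functions on all of Z. The helper names are hypothetical.

```python
from itertools import product


def is_injective(f, domain):
    """True if no two elements of the domain share an image."""
    images = [f(x) for x in domain]
    return len(set(images)) == len(images)


def is_surjective(f, domain, codomain):
    """True if the range (set of images) equals the codomain."""
    return {f(x) for x in domain} == set(codomain)


def max_pair(pair):
    x, y = pair
    return x if x > y else y


S = range(0, 4)                    # assumed finite stand-in for Z
pairs = list(product(S, repeat=2))

print(is_surjective(max_pair, pairs, S))  # every element of S is some maximum
print(is_injective(max_pair, pairs))      # e.g. <1, 3> and <2, 3> collide
```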

Definition 1.26. The identity function I on a set X is defined by

I : X → X
    x ↦ x

so that it maps all elements to themselves. Given two functions f and g defined by f : X → Y and g : Y → X, if g ◦ f is the identity function on set X and f ◦ g is the identity on set Y, then f is the inverse of g and g is the inverse of f. We denote this by f = g^−1 and g = f^−1. If a function f has an inverse, we hence have f^−1 ◦ f = I.

The inverse of a function maps elements from the codomain back into the domain, reversing the original function. It is easy to see that not all functions have an inverse. In particular, if a function is not injective there will be more than one potential pre-image for some image; this suggests we cannot sensibly map from the codomain back into the domain. The Inv function is another, more concrete example: some values such as 1/x can be mapped back, namely to x/1, yet others such as 2/3 cannot. For example, 3/2 is not an integer, i.e., not a member of the domain Z, so we cannot map 2/3 from the codomain back into the domain. Put another way, Inv, as we have defined it at least, has no inverse.

Example 1.24. Consider the successor function on integers

Succ : Z → Z
       x ↦ x + 1

which takes an integer x as input and produces the successor (or next) integer x + 1 as output. This function is bijective, since the codomain and range are the same and no two integers have the same successor. As a result, the inverse is easy to describe as

Pred : Z → Z
       x ↦ x − 1

which is the predecessor function: it takes an integer x as input and produces x − 1 as output. To see that Succ^−1 = Pred and Pred^−1 = Succ note that

(Pred ◦ Succ)(x) = (x + 1) − 1 = x

which is the identity function, and conversely that

(Succ ◦ Pred)(x) = (x − 1) + 1 = x

which is also the identity function.


1.5.3 Relations

Definition 1.27. Informally, a binary relation f on a set X is like a propositional function which takes members of the set as input and “filters” them to produce an output. As a result, for a set X the relation f forms a sub-set of X × X. For a given set X and a binary relation f, we say f is

• reflexive if f (x, x) = true for all x ∈ X,

• symmetric if f (x, y) = true implies f (y, x) = true for all x, y ∈ X, and

• transitive if f (x, y) = true and f (y, z) = true implies f (x, z) = true for all x, y, z ∈ X.

If f is reflexive, symmetric and transitive, then we call it an equivalence relation.

Example 1.25. Consider a set A = {1, 2, 3, 4}, whose Cartesian product with itself is

A × A = { 〈1, 1〉, 〈1, 2〉, 〈1, 3〉, 〈1, 4〉,
          〈2, 1〉, 〈2, 2〉, 〈2, 3〉, 〈2, 4〉,
          〈3, 1〉, 〈3, 2〉, 〈3, 3〉, 〈3, 4〉,
          〈4, 1〉, 〈4, 2〉, 〈4, 3〉, 〈4, 4〉 }.

Imagine we define a function

Equ : Z × Z → {false, true}
      〈x, y〉 ↦ true  if x = y
               false otherwise

which tests whether two inputs are equal. Using the function we can form a sub-set of A × A called AEqu, for example, by “filtering out” the pairs (x, y) st. Equ(x, y) = true to get

AEqu = {〈1, 1〉, 〈2, 2〉, 〈3, 3〉, 〈4, 4〉}.

For members of A, say x, y, z ∈ A,

1. Equ(x, x) = true, so the relation is reflexive,

2. if Equ(x, y) = true then Equ(y, x) = true, so the relation is symmetric, and

3. if Equ(x, y) = true and Equ(y, z) = true then Equ(x, z) = true, so the relation is transitive

and hence an equivalence relation. Now imagine we define another function

Lth : Z × Z → {false, true}
      〈x, y〉 ↦ true  if x < y
               false otherwise

which tests whether one input is less than another. Taking the same approach as above, we can form

ALth = {〈1, 2〉, 〈1, 3〉, 〈1, 4〉, 〈2, 3〉, 〈2, 4〉, 〈3, 4〉}

of all pairs (x, y) with x, y ∈ A st. Lth(x, y) = true. Now, for members of A, say x, y, z ∈ A,

1. Lth(x, x) = false, so the relation is not reflexive (it is irreflexive),

2. if Lth(x, y) = true then Lth(y, x) = false, so the relation is not symmetric (it is anti-symmetric), but

3. if Lth(x, y) = true and Lth(y, z) = true then Lth(x, z) = true, so the relation is transitive.
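The three properties can also be checked mechanically, since A is finite. The following is a minimal sketch rather than anything taken from the text: the helper names (as_subset, is_reflexive and so on) are assumptions introduced for illustration, and the relations are modelled as ordinary Python predicates.

```python
from itertools import product

A = {1, 2, 3, 4}

def equ(x, y):
    return x == y

def lth(x, y):
    return x < y

def as_subset(rel, s):
    """Keep the pairs of s x s for which the relation is true."""
    return {(x, y) for x, y in product(s, repeat=2) if rel(x, y)}

def is_reflexive(rel, s):
    return all(rel(x, x) for x in s)

def is_symmetric(rel, s):
    return all(rel(y, x) for x, y in product(s, repeat=2) if rel(x, y))

def is_transitive(rel, s):
    return all(rel(x, z)
               for x, y, z in product(s, repeat=3)
               if rel(x, y) and rel(y, z))

def is_equivalence(rel, s):
    return is_reflexive(rel, s) and is_symmetric(rel, s) and is_transitive(rel, s)

print(sorted(as_subset(equ, A)))   # the pairs in AEqu
print(sorted(as_subset(lth, A)))   # the pairs in ALth
print(is_equivalence(equ, A), is_equivalence(lth, A))
```

Because every property is stated "for all" members of A, exhaustive checking over A × A (or A × A × A for transitivity) is enough here; the same approach would not work directly for an infinite set such as Z.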


1.6 Boolean algebra

Most people encounter elementary algebra fairly early on at school. Even if the name is unfamiliar, the basic idea should be: one has

• a set of values, e.g., Z,

• a set of operators, e.g., +,

• a set of relations, e.g., =, and

• a set of axioms which dictate what the operators and relations mean and how they work.

Again, you may not know what these axioms are called, but you probably do know how they work. For example, given x, y, z ∈ Z, you might know a) we can write x + (y + z) = (x + y) + z, i.e., say that addition is associative, or b) we can write x · 1 = x, i.e., say that the multiplicative identity of x is 1. In reality, we can be much more general than this: when we discuss “an” algebra, all we really mean is a set of values for which there is a well defined set of operators, relations and axioms; abstract algebra is basically concerned with sets of values that are, potentially, not numbers.

Definition 1.28. An abstract algebra includes

• a set of values, say X,

• a set of binary operators ⊙ : X × X → X,

• a set of unary operators ⊖ : X → X,

• a set of binary relations ∼ : X × X → {false, true},

and

• a set of axioms which dictate what the operators and relations mean and how they work.

In the early 1840s, mathematician George Boole put this generality to good use by combining (or, in fact, unifying) concepts in logic and set theory: the result forms Boolean algebra [1]. Put (rather too) simply, Boole saw that working with a logic expression is much the same as working with an arithmetic expression, and reasoned that the axioms of the latter should apply to the former as well. Based on what we already know, for example, 0 and false and ∅ are all sort of equivalent, as are 1 and true and U; likewise, x ∧ y and x ∩ y are sort of equivalent, as are x ∨ y and x ∪ y, and ¬x and x̄. More formally, we can see that the identity axiom applies in the same way:

x ∨ false = x    x ∧ true = x
x ∪ ∅ = x        x ∩ U = x
x + 0 = x        x · 1 = x

Ironically, this was viewed as somewhat obscure; Boole himself did not necessarily regard logic directly as a mathematical concept. It was not until 1937 that Claude Shannon, then a student of Electrical Engineering and Mathematics, saw the potential of using Boolean algebra to represent and manipulate digital information [7]. This insight is fundamentally important, essentially allowing a “link” between theory (i.e., Mathematics) and practice (i.e., physical circuits that we can build).

Definition 1.29. Putting everything together produces the following definition for Boolean algebra. Consider the set B = {0, 1} on which there are two binary operators

∧ : B × B → B
    〈x, y〉 ↦ 0 if x = 0 and y = 0
             0 if x = 0 and y = 1
             0 if x = 1 and y = 0
             1 if x = 1 and y = 1

and

∨ : B × B → B
    〈x, y〉 ↦ 0 if x = 0 and y = 0
             1 if x = 0 and y = 1
             1 if x = 1 and y = 0
             1 if x = 1 and y = 1

and a unary operator

¬ : B → B
    x ↦ 1 if x = 0
        0 if x = 1

termed AND, OR and NOT respectively; they are governed by the following axioms

axiom           AND form                            OR form
commutativity   x ∧ y ≡ y ∧ x                       x ∨ y ≡ y ∨ x
association     (x ∧ y) ∧ z ≡ x ∧ (y ∧ z)           (x ∨ y) ∨ z ≡ x ∨ (y ∨ z)
distribution    x ∧ (y ∨ z) ≡ (x ∧ y) ∨ (x ∧ z)     x ∨ (y ∧ z) ≡ (x ∨ y) ∧ (x ∨ z)
identity        x ∧ 1 ≡ x                           x ∨ 0 ≡ x
null            x ∧ 0 ≡ 0                           x ∨ 1 ≡ 1
idempotency     x ∧ x ≡ x                           x ∨ x ≡ x
inverse         x ∧ ¬x ≡ 0                          x ∨ ¬x ≡ 1
absorption      x ∧ (x ∨ y) ≡ x                     x ∨ (x ∧ y) ≡ x
de Morgan       ¬(x ∧ y) ≡ ¬x ∨ ¬y                  ¬(x ∨ y) ≡ ¬x ∧ ¬y

equivalence     x ≡ y ≡ (x ⇒ y) ∧ (y ⇒ x)
implication     x ⇒ y ≡ ¬x ∨ y
involution      ¬¬x ≡ x

Note that the ∧ and ∨ operations in Boolean algebra behave in a similar way to · and + in elementary algebra: as such, they are sometimes referred to as “product” and “sum” operations (and denoted · and + as a result).
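Because B has only two members, any of the axioms can be confirmed by trying every assignment to the variables involved. The sketch below does so for a handful of them; the function names AND, OR, NOT and holds, and the choice of which axioms to list, are assumptions made for illustration (bits are modelled as the Python integers 0 and 1).

```python
from itertools import product

def AND(x, y):
    return x & y

def OR(x, y):
    return x | y

def NOT(x):
    return 1 - x

def holds(lhs, rhs, arity):
    """True if lhs and rhs agree on every assignment of 0/1 to their inputs."""
    return all(lhs(*v) == rhs(*v) for v in product((0, 1), repeat=arity))

# Each entry: (left-hand side, right-hand side, number of variables).
axioms = {
    "distribution (AND form)": (lambda x, y, z: AND(x, OR(y, z)),
                                lambda x, y, z: OR(AND(x, y), AND(x, z)), 3),
    "distribution (OR form)":  (lambda x, y, z: OR(x, AND(y, z)),
                                lambda x, y, z: AND(OR(x, y), OR(x, z)), 3),
    "absorption (AND form)":   (lambda x, y: AND(x, OR(x, y)),
                                lambda x, y: x, 2),
    "de Morgan (AND form)":    (lambda x, y: NOT(AND(x, y)),
                                lambda x, y: OR(NOT(x), NOT(y)), 2),
    "inverse (OR form)":       (lambda x: OR(x, NOT(x)),
                                lambda x: 1, 1),
    "involution":              (lambda x: NOT(NOT(x)),
                                lambda x: x, 1),
}

for name, (lhs, rhs, arity) in axioms.items():
    print(name, holds(lhs, rhs, arity))
```

Note the OR form of distribution holds here even though the corresponding statement for + and · in elementary algebra does not, which is one place where the “sum” and “product” analogy breaks down.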

Definition 1.30. In line with propositional logic, it is common to add a third binary operator called XOR:

⊕ : B × B → B
    〈x, y〉 ↦ 0 if x = 0 and y = 0
             1 if x = 0 and y = 1
             1 if x = 1 and y = 0
             0 if x = 1 and y = 1

More generally, XOR is an example of a derived operator, a name which hints at the fact it is a short-hand derived from operators we already have. Put another way, because

x ⊕ y ≡ (¬x ∧ y) ∨ (x ∧ ¬y),

XOR can be defined in terms of AND, OR and NOT. Two other examples, which will be useful later, are

• “NOT-AND” or NAND, which is denoted and defined as

x ⊼ y ≡ ¬(x ∧ y),

and

• “NOT-OR” or NOR, which is denoted and defined as

x ⊽ y ≡ ¬(x ∨ y).

Definition 1.31. A functionally complete (or universal) set of Boolean operators is st. every possible truth table can be described by combining the constituent members into a Boolean expression. For example, the sets {¬, ∧} and {¬, ∨} are functionally complete.

In 1921, Emil Post developed [5] a set of necessary and sufficient conditions for such a description to be valid (i.e., a method to prove whether a given set is or is not functionally complete); where such a set is a singleton, i.e., contains one operator only, that operator is termed a Sheffer function [8] (after Henry Sheffer, who, during 1912, independently rediscovered work of 1880 by Charles Sanders Peirce). For example, the singleton sets {⊼} and {⊽}, i.e., NAND alone and NOR alone, are functionally complete, meaning NAND and NOR can both be described as Sheffer functions.
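One way to make the claim about NAND plausible is to build NOT, AND and OR from it: since {¬, ∧} is functionally complete, being able to express ¬ and ∧ using NAND alone is enough. The constructions below are standard, but the function names are assumptions chosen for this sketch.

```python
from itertools import product

def nand(x, y):
    return 1 - (x & y)

# Each operator below uses nand and nothing else.
def not_(x):
    return nand(x, x)

def and_(x, y):
    return nand(nand(x, y), nand(x, y))

def or_(x, y):
    return nand(nand(x, x), nand(y, y))

for x, y in product((0, 1), repeat=2):
    print(x, y, not_(x), and_(x, y), or_(x, y))
```

The OR construction is just the de Morgan axiom read backwards: x ∨ y ≡ ¬(¬x ∧ ¬y). An analogous set of constructions exists for NOR.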

Definition 1.32. Certain operators (and hence axioms) are termed monotone: this means changing an operand either leaves the result unchanged, or that it always changes the same way as the operand. Conversely, other operators are termed non-monotone when these conditions do not hold.

Example 1.26. We can describe

x ∧ 0

as monotone, because changing x does not change the result (which is always 0); the same description applies to

x ∧ 1.

In this case, notice that if x = 0 then the result is 0 whereas if x = 1 then the result is 1: this suggests changing x from 0 to 1 (resp. from 1 to 0) changes the result in the same way.

Definition 1.33. The fact there are AND and OR forms of most axioms hints at a more general underlying principle. Consider a Boolean expression e: the principle of duality states that the dual expression e^D is formed by

1. leaving each variable as is,

2. swapping each ∧ with ∨ and vice versa, and

3. swapping each 0 with 1 and vice versa.

Of course e and e^D are different expressions, and clearly not equivalent; if we start with some e ≡ f however, then we do still get e^D ≡ f^D. As an example, consider axioms for

1. distribution, e.g., if

e = x ∧ (y ∨ z) ≡ (x ∧ y) ∨ (x ∧ z)

then

e^D = x ∨ (y ∧ z) ≡ (x ∨ y) ∧ (x ∨ z)

and

2. identity, e.g., if

e = x ∧ 1 ≡ x

then

e^D = x ∨ 0 ≡ x.

Definition 1.34. The de Morgan axiom can be turned into a more general principle. Consider a Boolean expression e: the principle of complements states that the complement expression ¬e is formed by

1. swapping each variable x with the complement ¬x,

2. swapping each ∧ with ∨ and vice versa, and

3. swapping each 0 with 1 and vice versa.

As an example, consider that if

e = x ∧ y ∧ z,

then by the above we should find

f = ¬e = (¬x) ∨ (¬y) ∨ (¬z).

Proof:

x y z | ¬x ¬y ¬z | e f
0 0 0 |  1  1  1 | 0 1
0 0 1 |  1  1  0 | 0 1
0 1 0 |  1  0  1 | 0 1
0 1 1 |  1  0  0 | 0 1
1 0 0 |  0  1  1 | 0 1
1 0 1 |  0  1  0 | 0 1
1 1 0 |  0  0  1 | 0 1
1 1 1 |  0  0  0 | 1 0
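Both the principle of duality and the principle of complements are purely syntactic rewrites, so they can be sketched as short recursive functions over an expression tree. The tuple encoding of expressions used below, and the names evaluate, dual, complement and equivalent, are assumptions made for this illustration only.

```python
from itertools import product

# Hypothetical encoding: ("var", name), ("const", 0 or 1), ("not", e),
# ("and", e1, e2) and ("or", e1, e2).

def evaluate(e, env):
    op = e[0]
    if op == "var":
        return env[e[1]]
    if op == "const":
        return e[1]
    if op == "not":
        return 1 - evaluate(e[1], env)
    if op == "and":
        return evaluate(e[1], env) & evaluate(e[2], env)
    if op == "or":
        return evaluate(e[1], env) | evaluate(e[2], env)
    raise ValueError(f"unknown operator {op!r}")

SWAP = {"and": "or", "or": "and"}

def dual(e):
    """Leave variables alone; swap AND with OR and 0 with 1."""
    op = e[0]
    if op == "var":
        return e
    if op == "const":
        return ("const", 1 - e[1])
    if op == "not":
        return ("not", dual(e[1]))
    return (SWAP[op], dual(e[1]), dual(e[2]))

def complement(e):
    """Complement variables; swap AND with OR and 0 with 1."""
    op = e[0]
    if op == "var":
        return ("not", e)
    if op == "const":
        return ("const", 1 - e[1])
    if op == "not":
        return e[1]  # by involution, the complement of (NOT g) is g
    return (SWAP[op], complement(e[1]), complement(e[2]))

def equivalent(e, f, names):
    """Compare two expressions on every assignment to the named variables."""
    return all(evaluate(e, dict(zip(names, v))) == evaluate(f, dict(zip(names, v)))
               for v in product((0, 1), repeat=len(names)))

x, y, z = ("var", "x"), ("var", "y"), ("var", "z")

e = ("and", ("and", x, y), z)            # x AND y AND z
f = complement(e)                        # (NOT x) OR (NOT y) OR (NOT z)
print(equivalent(("not", e), f, "xyz"))

lhs = ("and", x, ("or", y, z))           # distribution, AND form
rhs = ("or", ("and", x, y), ("and", x, z))
print(equivalent(dual(lhs), dual(rhs), "xyz"))
```

The first check mirrors the truth-table proof above; the second illustrates that taking the dual of both sides of an equivalence yields another equivalence.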

1.6.1 Manipulation

Saying we have manipulated an expression just means we have transformed it from one form to another; when done correctly, this should imply the original and alternative, transformed forms are equivalent. Often this is presented as a derivation, or sequence of steps each of which relates to an axiom or assumption (so is assumed valid by definition).

Example 1.27. Consider the (supposed) equality

(a ∧ b ∧ c) ∨ (¬a ∧ b) ∨ (a ∧ b ∧ ¬c) = b,

for example, which we can prove is valid via the derivation

  (a ∧ b ∧ c) ∨ (¬a ∧ b) ∨ (a ∧ b ∧ ¬c)
= (a ∧ b ∧ c) ∨ (a ∧ b ∧ ¬c) ∨ (¬a ∧ b)    (commutativity)
= (a ∧ b) ∧ (c ∨ ¬c) ∨ (¬a ∧ b)            (distribution)
= (a ∧ b) ∧ 1 ∨ (¬a ∧ b)                   (inverse)
= (a ∧ b) ∨ (¬a ∧ b)                       (identity)
= b ∧ (a ∨ ¬a)                             (distribution)
= b ∧ 1                                    (inverse)
= b                                        (identity)

Of course we might employ a brute-force approach instead. If we write a truth table for the left- and right-hand sides, this allows us to compare them: if the outputs match in all rows, we can conclude the left- and right-hand sides are equivalent. For example,

a b c | t0 = a ∧ b ∧ c | t1 = ¬a ∧ b | t2 = a ∧ b ∧ ¬c | t0 ∨ t1 ∨ t2
0 0 0 |       0        |      0      |       0         |      0
0 0 1 |       0        |      0      |       0         |      0
0 1 0 |       0        |      1      |       0         |      1
0 1 1 |       0        |      1      |       0         |      1
1 0 0 |       0        |      0      |       0         |      0
1 0 1 |       0        |      0      |       0         |      0
1 1 0 |       0        |      0      |       1         |      1
1 1 1 |       1        |      0      |       0         |      1

shows the left- and right-hand sides are equivalent, as expected. Of course if there were more variables, we would need to enumerate all possible values of each one. Our truth table would grow, and, at some point, the derivation-type approach starts to become more attractive: we achieve the same outcome, but without brute-force enumeration.
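The brute-force approach is easy to automate. The sketch below regenerates the truth table for this example and compares the two sides row by row; the names lhs, rhs and equivalent are assumptions introduced here, not notation from the text.

```python
from itertools import product

def lhs(a, b, c):
    t0 = a & b & c
    t1 = (1 - a) & b
    t2 = a & b & (1 - c)
    return t0 | t1 | t2

def rhs(a, b, c):
    return b

def equivalent(f, g, n):
    """Compare f and g on all 2**n assignments to their n inputs."""
    return all(f(*v) == g(*v) for v in product((0, 1), repeat=n))

# One line per truth-table row: inputs, left-hand side, right-hand side.
for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, lhs(a, b, c), rhs(a, b, c))

print(equivalent(lhs, rhs, 3))
```

The loop inside equivalent visits 2**n rows, which is exactly the growth that makes a derivation more attractive once n becomes large.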

An aside: how many n-input Boolean functions are there?

To be more concrete, imagine we are interested in the function

f : B^n → B.

Note that each of the n inputs can obviously be assigned one of two values, namely 0 or 1, so there are 2^n possible assignments to n inputs. For example, if f were to have n = 1 input, say x, there would be 2^1 = 2 possible assignments because x can either be 0 or 1. In the same way, for n = 2 inputs, say x and y, there are 2^2 = 4 possible assignments: we can have

x = 0  y = 0
x = 0  y = 1
x = 1  y = 0
x = 1  y = 1

This is why a truth table for n inputs will have 2^n rows: each row details one assignment to the inputs, and the associated output.

So, how many functions are there? A function with n inputs is specified by a truth table with 2^n rows; each row includes an output that is assigned 0 or 1, depending on exactly which function the truth table describes. So to count how many functions there are, we can just count how many possible assignments there are to the 2^n outputs. The correct answer is 2^(2^n).

Example 1.28. Another motivation for manipulating a given expression is to produce an alternative with some goal or metric in mind; a common metric to use is the number of operators each expression uses, i.e., how simple it is (with the task then termed simplification), which is one way to judge evaluation cost. Consider the exclusive-or operator, i.e., an expression x ⊕ y, which we can write as the more complicated expression

(y ∧ ¬x) ∨ (x ∧ ¬y)

or the less complicated (i.e., simpler) expression

(x ∨ y) ∧ ¬(x ∧ y).

One can prove these are equivalent by writing truth tables for them, as we did above. To do so, however, we need the expressions in the first place: how did we get the alternative from the original one?

The answer is we start with one expression, and (somehow intelligently) apply axioms to move step-by-step toward the other. For example, to do so more easily, notice that we can manipulate each term in the first expression whose form is p ∧ ¬q as follows:

  (p ∧ ¬q)
= (p ∧ ¬q) ∨ 0           (identity)
= (p ∧ ¬q) ∨ (p ∧ ¬p)    (inverse)
= p ∧ (¬p ∨ ¬q)          (distribution)

This introduces a new rule that we can make use of; since it was derived from axioms we assume are valid, we can assume it is valid as well. Using it, we can rewrite the original expression as

  (y ∧ ¬x) ∨ (x ∧ ¬y)
= (x ∧ (¬x ∨ ¬y)) ∨ (y ∧ (¬x ∨ ¬y))    (p ∧ ¬q rule above)
= (x ∨ y) ∧ (¬x ∨ ¬y)                  (distribution)
= (x ∨ y) ∧ ¬(x ∧ y)                   (de Morgan)

which gives us the alternative we are looking for, noting it requires 4 operators rather than 5.

1.6.2 Functions

Definition 1.35. Given the definition of Boolean algebra, it is perhaps not surprising that a generic n-input, 1-output Boolean function f can be described as

f : B^n → B.

It is possible to extend this definition so it caters for m outputs; we write the function signature as

g : B^n → B^m.

This can be thought of as m separate n-input, 1-output Boolean functions, i.e.,

g_0     : B^n → B
g_1     : B^n → B
  ⋮
g_{m−1} : B^n → B

where the output of g is described by

g(x) ↦ g_0(x) ‖ g_1(x) ‖ . . . ‖ g_{m−1}(x).

That is, the output of g is just the m individual 1-bit outputs g_i(x) concatenated together. This is often termed a vectorial Boolean function: the inputs and outputs are vectors (or sequences) over the set B rather than single elements of it.

An aside: an enumeration of 2-input Boolean functions.

We know there are 2^(2^n) Boolean functions with n inputs; this represents a lot of functions as n grows. However, for a small number of inputs, say n = 2, 2^(2^n) = 2^(2^2) = 2^4 = 16 functions is fairly manageable. In fact, we can easily write them all down: if f_i denotes the i-th such function, we find

x y | f_0 f_1 f_2 f_3 f_4 f_5 f_6 f_7 f_8 f_9 f_10 f_11 f_12 f_13 f_14 f_15
0 0 |  0   1   0   1   0   1   0   1   0   1   0    1    0    1    0    1
0 1 |  0   0   1   1   0   0   1   1   0   0   1    1    0    0    1    1
1 0 |  0   0   0   0   1   1   1   1   0   0   0    0    1    1    1    1
1 1 |  0   0   0   0   0   0   0   0   1   1   1    1    1    1    1    1

This hints that one way to see why 2^(2^n) is correct is to view each column for f_i as filled by i represented in binary. How many (unsigned) integers can be represented in m bits? The answer is 2^m, suggesting 2^(2^n) (unsigned) integers can be represented in 2^n bits and hence there are 2^(2^n) functions.

Some of the functions should look familiar, and, either way, we can try to describe them in English language terms vs. their truth table. Note, for example, that

• f_0 is the constant 0 function (i.e., f_0(x, y) = 0, ignoring x and y),

• f_1 is disjunction composed with complement (i.e., f_1(x, y) = ¬(x ∨ y)),

• f_2 is inhibition (i.e., f_2(x, y) = y ∧ ¬x, which is like x < y),

• f_3 is complement (i.e., f_3(x, y) = ¬x, ignoring y),

• f_4 is inhibition (i.e., f_4(x, y) = x ∧ ¬y, which is like y < x),

• f_5 is complement (i.e., f_5(x, y) = ¬y, ignoring x),

• f_6 is non-equivalence (i.e., f_6(x, y) = x ⊕ y, which is like x ≠ y),

• f_7 is conjunction composed with complement (i.e., f_7(x, y) = ¬(x ∧ y)),

• f_8 is conjunction (i.e., f_8(x, y) = x ∧ y),

• f_9 is equivalence (i.e., f_9(x, y) = ¬(x ⊕ y), which is like x = y),

• f_10 is identity (i.e., f_10(x, y) = y, ignoring x),

• f_11 is implication (i.e., f_11(x, y) = x ⇒ y),

• f_12 is identity (i.e., f_12(x, y) = x, ignoring y),

• f_13 is implication (i.e., f_13(x, y) = y ⇒ x),

• f_14 is disjunction (i.e., f_14(x, y) = x ∨ y), and

• f_15 is the constant 1 function (i.e., f_15(x, y) = 1, ignoring x and y).
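The observation that the column for the i-th 2-input function is just i written in binary gives a direct way to generate every function. The following sketch assumes rows are numbered in the usual truth-table order, with the output for row r taken from bit r of i; the name truth_table is an assumption made for illustration.

```python
from itertools import product

def truth_table(i, n):
    """Truth table of the i-th n-input function: row r's output is bit r of i."""
    rows = product((0, 1), repeat=n)
    return {row: (i >> r) & 1 for r, row in enumerate(rows)}

n = 2
functions = [truth_table(i, n) for i in range(2 ** (2 ** n))]

print(len(functions))   # number of 2-input functions
print(functions[6])     # non-equivalence, i.e., XOR
print(functions[8])     # conjunction, i.e., AND
```

Changing n to 3 produces 2 ** (2 ** 3) = 256 tables, which illustrates how quickly the count grows.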

Definition 1.36. A Boolean-valued function (or predicate function)

f : X → {0, 1}

is a function whose output is a Boolean value: note the contrast with a Boolean function, in so far as it places no restriction on what the input (i.e., the set X) must be.

Example 1.29. Consider a 2-input, 1-output Boolean function, whose signature we can write as

f : B^2 → B

st. for r, x, y ∈ B, the input is a pair 〈x, y〉 and the output for a given x and y is written r = f(x, y). The function itself can be specified in two ways. First, as previously, we could enumerate all possible input combinations, and specify corresponding outputs. This can be written equivalently in the form of an inline function behaviour, or as a truth table:

f(x, y) ↦ 0 if x = 0 and y = 0
          1 if x = 0 and y = 1
          1 if x = 1 and y = 0
          0 if x = 1 and y = 1

≡

x y | f(x, y)
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 0

However, with a large number of inputs, this becomes difficult. As a short-hand, we can therefore specify f as a Boolean expression instead, e.g.,

f : 〈x, y〉 ↦ (¬x ∧ y) ∨ (x ∧ ¬y).

This basically tells us how to compute outputs, rather than listing those outputs explicitly.

Example 1.30. Consider a 2-input, 2-output Boolean function

h : B^2 → B^2
    〈x, y〉 ↦ 〈0, 0〉 if x = 0 and y = 0
             〈1, 0〉 if x = 0 and y = 1
             〈1, 0〉 if x = 1 and y = 0
             〈0, 1〉 if x = 1 and y = 1

which we might write more compactly as the truth table

x y | h(x, y)
0 0 | 〈0, 0〉
0 1 | 〈1, 0〉
1 0 | 〈1, 0〉
1 1 | 〈0, 1〉

Clearly we can decompose h into

h_0 : B^2 → B
      〈x, y〉 ↦ 0 if x = 0 and y = 0
               1 if x = 0 and y = 1
               1 if x = 1 and y = 0
               0 if x = 1 and y = 1

and

h_1 : B^2 → B
      〈x, y〉 ↦ 0 if x = 0 and y = 0
               0 if x = 0 and y = 1
               0 if x = 1 and y = 0
               1 if x = 1 and y = 1

meaning that

h(x, y) ≡ h_0(x, y) ‖ h_1(x, y).
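The decomposition can be written down directly: each component is an ordinary 1-output function, and h simply collects their outputs into a tuple. In this sketch the components are expressed using XOR and AND, which is an observation about the truth table above rather than something the example states, and Python tuples stand in for concatenation.

```python
from itertools import product

def h0(x, y):
    return x ^ y   # matches the first output column of h

def h1(x, y):
    return x & y   # matches the second output column of h

def h(x, y):
    """The vectorial function: the two 1-bit outputs side by side."""
    return (h0(x, y), h1(x, y))

for x, y in product((0, 1), repeat=2):
    print(x, y, h(x, y))
```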


1.6.3 Normal (or standard) forms

Definition 1.37. Consider a Boolean expression:

1. When the expression is written as a sum (i.e., OR) of terms which each comprise the product (i.e., AND) of variables, e.g.,

(a ∧ b ∧ c) ∨ (d ∧ e ∧ f),

it is said to be in disjunctive normal form or Sum of Products (SoP) form; the terms, e.g., (a ∧ b ∧ c), are called the minterms. Note that each variable can exist as-is or complemented using NOT, meaning

(¬a ∧ b ∧ c) ∨ (d ∧ ¬e ∧ f),

is also a valid SoP expression.

2. When the expression is written as a product (i.e., AND) of terms which each comprise the sum (i.e., OR) of variables, e.g.,

(a ∨ b ∨ c) ∧ (d ∨ e ∨ f),

it is said to be in conjunctive normal form or Product of Sums (PoS) form; the terms, e.g., (a ∨ b ∨ c), are called the maxterms. As above, each variable can exist as-is or complemented using NOT.

Example 1.31. Consider a 2-input, 1-output Boolean function

g : B^2 → B
    〈x, y〉 ↦ 0 if x = 0 and y = 0
             1 if x = 0 and y = 1
             1 if x = 1 and y = 0
             0 if x = 1 and y = 1

Writing this as a truth table, i.e.,

x y | g(x, y)
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 0

the minterms are the second and third rows, while the maxterms are the first and fourth rows. An expression for g in SoP form is

g_SoP(x, y) = (¬x ∧ y) ∨ (x ∧ ¬y),

where terms ¬x ∧ y and x ∧ ¬y represent minterms of g: when either term is 1 the output is 1, and when both are 0 the output is 0. It is usually crucial that all the variables appear in all the minterms so that the function is exactly described. To see why this is so, consider writing an incorrect SoP expression by removing the reference to y from the first minterm so as to get

(¬x) ∨ (x ∧ ¬y).

Now ¬x is 1 for the first and second rows, rather than only the second (as was the case with ¬x ∧ y), so we have described another function h ≠ g given by

x y | h(x, y)
0 0 | 1
0 1 | 1
1 0 | 1
1 1 | 0

In a similar way, we can construct a PoS expression for g as

g_PoS(x, y) = (x ∨ y) ∧ (¬x ∨ ¬y),

where x ∨ y and ¬x ∨ ¬y are the maxterms of g. By manipulating the expressions, we can prove that g_SoP and g_PoS are just two different ways to write the same function, i.e., g. Recall that for p and q

  (p ∧ ¬q)
= (p ∧ ¬q) ∨ 0           (identity)
= (p ∧ ¬q) ∨ (p ∧ ¬p)    (inverse)
= p ∧ (¬p ∨ ¬q)          (distribution)

Using this rule, we can show

  g_SoP(x, y)
= (¬x ∧ y) ∨ (x ∧ ¬y)
= (y ∧ ¬x) ∨ (x ∧ ¬y)                  (commutativity)
= (x ∧ (¬x ∨ ¬y)) ∨ (y ∧ (¬x ∨ ¬y))    (p ∧ ¬q rule above)
= (x ∨ y) ∧ (¬x ∨ ¬y)                  (distribution)
= g_PoS(x, y)
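Reading an SoP or PoS expression off a truth table is mechanical: each row whose output is 1 contributes a minterm, and each row whose output is 0 contributes a maxterm. The sketch below shows one way this might be done; the function names sop and pos, and the dictionary representation of the truth table, are assumptions made for illustration.

```python
def sop(table, names):
    """Build an SoP expression: one minterm per row whose output is 1."""
    terms = []
    for inputs, out in table.items():
        if out == 1:
            # A variable appears as-is if it is 1 in this row, complemented if 0.
            lits = [n if v else "¬" + n for n, v in zip(names, inputs)]
            terms.append("(" + " ∧ ".join(lits) + ")")
    return " ∨ ".join(terms)

def pos(table, names):
    """Build a PoS expression: one maxterm per row whose output is 0."""
    terms = []
    for inputs, out in table.items():
        if out == 0:
            # A variable appears complemented if it is 1 in this row, as-is if 0.
            lits = ["¬" + n if v else n for n, v in zip(names, inputs)]
            terms.append("(" + " ∨ ".join(lits) + ")")
    return " ∧ ".join(terms)

# Truth table for g from the example above.
g = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

print(sop(g, "xy"))
print(pos(g, "xy"))
```

Every variable appears in every term by construction, which is the property the example above identifies as crucial for describing the function exactly.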

1.7 Signals

Definition 1.38. In general, a signal can be described as a descriptive function (abstractly), or a physical quantity (concretely), that varies in time or space so as to represent and/or communicate (i.e., convey) information. We say that

• a discrete-time signal is valid for a discrete (so countable) range of time indices, e.g., t ∈ Z,

• a continuous-time signal is valid for a continuous (so uncountable) range of time indices, e.g., t ∈ R,

• a discrete-value signal has a value from a discrete (so countable) range, e.g., f(t) ∈ Z, and

• a continuous-value signal has a value from a continuous (so uncountable) range, e.g., f(t) ∈ R.

Definition 1.39. The term analogue signal is a synonym of continuous-value signal: a physical quantity that varies in time is typically used to represent (that is, it is analogous to) some abstract variable.

Definition 1.40. Strictly speaking, digital signal is a synonym of discrete-value signal: it will have a digital (i.e., discrete or exact) value. This terminology is often overloaded, however, and taken to mean a signal whose value is either 0 or 1 (cf. logic signal).

The transition of a digital signal from 0 to 1 (resp. 1 to 0) is called a positive (resp. negative) edge; we often say it has toggled from 0 to 1 (resp. 1 to 0). During any time the signal has a value of 1 (resp. 0), we say it is at a positive (resp. negative) level (and use the term pulse as a synonym for positive level, i.e., the period between a positive and negative edge).

Definition 1.41. It is common to describe a signal by plotting it as a waveform: the y-axis represents the value of the signal as it varies over time, which is represented by the x-axis.

Note that it is common, though incorrect, to describe discrete-time signals by using a continuous plot; connecting discrete points implies a formally incorrect description (i.e., it gives the impression of a continuous-time signal). Doing so typically stems from either a) the fact said discrete-time signal is derived from an associated continuous-time signal (e.g., the latter has been quantised wrt. time, by sampling it at discrete time indices), or b) aesthetics, in the sense it is easier to see when printed.

1.8 Representations

God made the integers; all the rest is the work of man.

– Kronecker

1.8.1 Bits, bytes and words

Definition 1.42. Used as a term by Claude Shannon in 1948 [6] (but attributed to John Tukey), a bit is a binary digit. As a result, a given bit is a member of the set B = {0, 1}; it can be used to represent a truth value, i.e., false or true, and hence a Boolean value within the context of Boolean algebra.

Definition 1.43. An n-bit bit-sequence (or binary sequence) is a member of the set B^n, i.e., it is an n-tuple of bits. Much like other sequences, we use X_i to denote the i-th bit of a binary sequence X and |X| = n to denote the number of bits in X.


Definition 1.44. Instead of writing out X ∈ B^n symbolically, i.e., writing 〈X_0, X_1, . . . , X_{n−1}〉, we sometimes prefer to list the bits within a bit-literal (or bit-string, wrt. an implicit alphabet Σ = {0, 1}). For example, consider the following bit-sequence

X = 〈1, 1, 0, 1, 1, 1, 1〉

st. |X| = 7, which can be written as the bit-literal

X = 1111011.

The question is however, what does a bit-sequence mean: what does it represent, other than just an (unstructured) sequence of bits? The answer is they can represent anything we decide they do; there is just one key concept, namely

X̂ ↦ X

where the left-hand side is the representation of X, the arrow reads “maps to”, and the right-hand side is the value of X. That is, all we need is a) a representation and mapping specified concretely (i.e., written down, vs. reasoned about abstractly), and b) a mapping that means the right thing wrt. values, plus is ideally consistent in both directions (e.g., does not change based on the context, and is injective st. a single representation cannot be interpreted ambiguously). Notice the (subtle) annotation on the left-hand side of the mapping: X̂ is intended to highlight this is a representation of some X, whose value therefore depends on the mapping used. Put another way, this suggests different mappings may legitimately map the same X̂ to different values by interpreting the bit-sequence differently. This is, essentially, what means we can represent such a rich set of data (e.g., the pixels in an image) using only a bit (or sequence thereof) as a starting point.

1.8.1.1 Properties

Definition 1.45. Following the idea of a vectorial Boolean function, given an n-element bit-sequence X, and an m-element bit-sequence Y we can clearly

1. overload ⊖ ∈ {¬}, i.e., write

R = ⊖X,

to mean

R_i = ⊖X_i

for 0 ≤ i < n,

2. overload ⊙ ∈ {∧, ∨, ⊕}, i.e., write

R = X ⊙ Y,

to mean

R_i = X_i ⊙ Y_i

for 0 ≤ i < n = m, where if n ≠ m, we pad either X or Y with 0 until n = m.

Definition 1.46. Given two n-bit sequences X and Y, we can define some important properties named after Richard Hamming, a researcher at Bell Labs:

• The Hamming weight of X is the number of bits in X that are equal to 1, i.e., the number of times X_i = 1. This can be expressed as

H(X) = ∑_{i=0}^{n−1} X_i.

• The Hamming distance between X and Y is the number of bits in X that differ from the corresponding bit in Y, i.e., the number of times X_i ≠ Y_i. This can be expressed as

D(X, Y) = ∑_{i=0}^{n−1} X_i ⊕ Y_i.

Note that both quantities naturally generalise to non-binary sequences.

An aside: the origins and impact of endianness.

The term endianness stems from a technical article [2], written in the 1980s by Danny Cohen, using Gulliver’s Travels as an inspiration/analogy: an argument over whether cracking the big- or small-end of a soft-boiled egg is proper in the former, inspired terminology wrt. arguments over byte ordering in the latter. It does a brilliant job of surveying the significant impact of what is, at face value, a fairly trivial choice.

Example 1.32. For example, given A = 〈1, 0, 0, 1〉 and B = 〈0, 1, 1, 1〉 we find that

H(A) = ∑_{i=0}^{n−1} A_i = 1 + 0 + 0 + 1 = 2

and

D(A, B) = ∑_{i=0}^{n−1} A_i ⊕ B_i = (1 ⊕ 0) + (0 ⊕ 1) + (0 ⊕ 1) + (1 ⊕ 1) = 1 + 1 + 1 + 0 = 3

st. two bits in A equal 1, and three bits differ between A and B.
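Both quantities translate directly into code when a bit-sequence is modelled as a list of 0s and 1s. This is a minimal sketch under that assumption; the function names are chosen here for illustration.

```python
def hamming_weight(x):
    """Number of elements of the bit-sequence x that equal 1."""
    return sum(x)

def hamming_distance(x, y):
    """Number of positions at which the bit-sequences x and y differ."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(xi ^ yi for xi, yi in zip(x, y))

A = [1, 0, 0, 1]
B = [0, 1, 1, 1]

print(hamming_weight(A))
print(hamming_distance(A, B))
```

Note the Hamming distance between X and Y is the Hamming weight of their element-wise XOR, which is why the second function is a thin variation on the first.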

1.8.1.2 Ordering

There is, by design, no “structure” to a bit-literal. This can be problematic if, for example, we need a way to make sure the order of bits in the bit-literal is clear wrt. the corresponding bit-sequence. The same issues appear whenever describing a large(r) quantity in terms of small(er) parts, but, focusing on bits, we can describe endianness as follows:

Definition 1.47. A given literal, say

X = 1111011,

can be interpreted in two ways:

1. A little-endian ordering is where we read bits in a literal from right-to-left, i.e.,

XLE = 〈X0,X1,X2,X3,X4,X5,X6〉 = 〈1, 1, 0, 1, 1, 1, 1〉,

where

• the Least-Significant Bit (LSB) is the right-most in the literal (i.e., X0), and

• the Most-Significant Bit (MSB) is the left-most in the literal (i.e., Xn−1 = X6).

2. A big-endian ordering is where we read bits in a literal from left-to-right, i.e.,

XBE = 〈X6,X5,X4,X3,X2,X1,X0〉 = 〈1, 1, 1, 1, 0, 1, 1〉,

where

• the Least-Significant Bit (LSB) is the left-most in the literal (i.e., Xn−1 = X6), and

• the Most-Significant Bit (MSB) is the right-most in the literal (i.e., X0).

Unless specified, from here on it is (fairly) safe to assume that a little-endian convention is used. Keep in mind that having selected an endianness convention, which acts as a rule for conversion, there is no real distinction between a bit-sequence and a bit-literal: we can convert between them in either the little-endian or big-endian case.
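The conversion rule can be sketched in a few lines. Here a bit-literal is modelled as a Python string and a bit-sequence as a list, with element 0 of the list being the first element of the sequence; these modelling choices, and the function names, are assumptions made for illustration.

```python
def literal_to_sequence(literal, little_endian=True):
    """Read a bit-literal right-to-left (little-endian) or left-to-right (big-endian)."""
    bits = [int(c) for c in literal]
    return bits[::-1] if little_endian else bits

def sequence_to_literal(seq, little_endian=True):
    """Inverse of literal_to_sequence, for the same endianness convention."""
    bits = seq[::-1] if little_endian else seq
    return "".join(str(b) for b in bits)

X = "1111011"
print(literal_to_sequence(X, little_endian=True))
print(literal_to_sequence(X, little_endian=False))
```

As long as the same convention is used in both directions, converting a literal to a sequence and back again recovers the original literal.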


1.8.1.3 Grouping

Definition 1.48. Some bit-sequences are given special names depending on their length. Given a word size w (e.g., the natural size as dictated by a given processor), we can define

bit         ≡ 1-bit
nybble      ≡ 4-bit
byte        ≡ 8-bit
half-word   ≡ (w/2)-bit
word        ≡ w-bit
double-word ≡ (w · 2)-bit
quad-word   ≡ (w · 4)-bit

but note that standards in particular often use the term octet as a synonym for byte (st. an octet string is therefore a byte-sequence): although less natural, we follow this terminology where it seems of value to match associated literature.

Example 1.33. Given a bit-sequence

B = 〈1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0〉

it can be attractive to group the bits into short(er) sub-sequences. For example, we could rewrite the sequence as either

C = 〈〈1, 1, 0, 0〉, 〈0, 0, 0, 0〉, 〈1, 0, 0, 0〉, 〈1, 0, 1, 0〉〉
  = 〈1, 1, 0, 0〉 ‖ 〈0, 0, 0, 0〉 ‖ 〈1, 0, 0, 0〉 ‖ 〈1, 0, 1, 0〉
  = 〈1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0〉
  = B

or

D = 〈〈1, 1, 0, 0, 0, 0, 0, 0〉, 〈1, 0, 0, 0, 1, 0, 1, 0〉〉
  = 〈1, 1, 0, 0, 0, 0, 0, 0〉 ‖ 〈1, 0, 0, 0, 1, 0, 1, 0〉
  = 〈1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0〉
  = B

st. C has four elements (each of which is a sub-sequence of four bits from B), while D has two elements (each of which is a sub-sequence of eight bits from B). It is important to see that we have not altered the bits themselves, just how they are grouped together: we can easily “flatten out” the sub-sequences and reconstruct the original sequence B.

Example 1.34. Consider four nybbles in C, i.e., the four 4-bit sub-sequences

C0 = 〈1, 1, 0, 0〉
C1 = 〈0, 0, 0, 0〉
C2 = 〈1, 0, 0, 0〉
C3 = 〈1, 0, 1, 0〉

If we want to reconstruct C itself, we need to know which order to put the sub-sequences in: via a little-endian convention we get

CLE = 〈C0,C1,C2,C3〉 = 〈〈1, 1, 0, 0〉, 〈0, 0, 0, 0〉, 〈1, 0, 0, 0〉, 〈1, 0, 1, 0〉〉

whereas via a big-endian convention we get

CBE = 〈C3,C2,C1,C0〉 = 〈〈1, 0, 1, 0〉, 〈1, 0, 0, 0〉, 〈0, 0, 0, 0〉, 〈1, 1, 0, 0〉〉.
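Grouping, flattening and reordering are all simple list manipulations. The sketch below reproduces C, D and the big-endian ordering of the nybbles; the function names group and flatten are assumptions made for illustration, and lists stand in for sequences.

```python
def group(bits, size):
    """Split a bit-sequence into consecutive sub-sequences of the given size."""
    if len(bits) % size != 0:
        raise ValueError("length must be a multiple of the group size")
    return [bits[i:i + size] for i in range(0, len(bits), size)]

def flatten(groups):
    """Concatenate sub-sequences back into a single bit-sequence."""
    return [b for g in groups for b in g]

B = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0]

C = group(B, 4)    # four nybbles
D = group(B, 8)    # two bytes
C_BE = C[::-1]     # the same nybbles in the opposite order

print(C)
print(D)
print(C_BE)
print(flatten(C) == B and flatten(D) == B)
```

Reversing the order of the groups, as for C_BE, leaves the bits within each group untouched: only the order in which the sub-sequences are listed changes.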

1.8.1.4 Units

There is a standard notation for measuring multiplicities of bits and bytes: a suffix specifies the quantity (‘b’ or “bit” for bits, ‘B’ for bytes), and a prefix specifies a multiplier. Although the notation remains consistent, some ambiguities about how to interpret prefixes complicate matters.

The International System of Units (SI) works with decimal, base-10 prefixes so, for example, a kilobit means 10^3 = 1000 bits. As a result, we find that


An aside: the shift-and-mask paradigm, part #1.

Given some w-bit word, the shift-and-mask paradigm allows us to extract (or isolate) individual or contiguous sequences of bits. Understanding this is crucial in many areas, and it is often used in lower-level C programs; this, and related techniques, are often termed “bit twiddling” or “bit bashing”.

• Imagine we want to set the i-th bit of some x, i.e., x_i, to 1. This can be achieved by computing

x ∨ (1 ≪ i)

For example, if x = 0011(2) and i = 2 then we compute

  x       ∨ ( 0001(2) ≪ i )
= 0011(2) ∨ ( 0001(2) ≪ 2 )
= 0011(2) ∨ 0100(2)
= 0111(2)

meaning initially x_2 = 0, then we changed it so x_2 = 1.

• Imagine we want to set the i-th bit of some x, i.e., x_i, to 0. This can be achieved by computing

x ∧ ¬(1 ≪ i)

For example, if x = 0111(2) and i = 2 then we compute

  x       ∧ ¬ ( 0001(2) ≪ i )
= 0111(2) ∧ ¬ ( 0001(2) ≪ 2 )
= 0111(2) ∧ ¬ ( 0100(2) )
= 0111(2) ∧ 1011(2)
= 0011(2)

meaning initially x_2 = 1, then we changed it so x_2 = 0.

In both cases, the idea is to first create an appropriate mask then combine it with x to get x′; in both cases we do no actual arithmetic, only Boolean-style operations.
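Both operations map onto the bit-wise operators found in most languages. The sketch below uses Python, where |, & and ~ play the role of ∨, ∧ and ¬ applied to every bit, and << is a left-shift; the function names set_bit and clear_bit are assumptions made for illustration.

```python
def set_bit(x, i):
    """Return x with the i-th bit set to 1."""
    return x | (1 << i)

def clear_bit(x, i):
    """Return x with the i-th bit set to 0."""
    return x & ~(1 << i)

print(format(set_bit(0b0011, 2), "04b"))
print(format(clear_bit(0b0111, 2), "04b"))
```

Python integers are not limited to w bits, so ~(1 << i) is a negative number rather than a w-bit mask; combining it with a non-negative x using & still clears only the i-th bit. In a language with fixed-width words, such as C, the same expressions apply but the width and signedness of the operands need more care.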


Imagine we want to extract an m-bit sub-word (i.e., m contiguous bits) starting at the i-th bit of some x. This can be achieved by computing

(x ≫ i) ∧ ((1 ≪ m) − 1)

The computation is a little more complicated, but basically the same principles apply: first we create an appropriate mask (the right-hand term) and combine it with x (the left-hand term). For example, if x = 1011(2) and m = 2:

• If i = 0 then we want to extract the sub-word 〈x_1, x_0〉

  ( x       ≫ i ) ∧ ( ( 1 ≪ m ) − 1 )
= ( 1011(2) ≫ 0 ) ∧ ( ( 1 ≪ 2 ) − 1 )
= ( 1011(2) )     ∧ ( ( 0100(2) ) − 1 )
= ( 1011(2) )     ∧ ( 0011(2) )
= 0011(2)

meaning 〈x_1, x_0〉 = 〈1, 1〉 as expected.

• If i = 1 then we want to extract the sub-word 〈x_2, x_1〉

  ( x       ≫ i ) ∧ ( ( 1 ≪ m ) − 1 )
= ( 1011(2) ≫ 1 ) ∧ ( ( 1 ≪ 2 ) − 1 )
= ( 0101(2) )     ∧ ( ( 0100(2) ) − 1 )
= ( 0101(2) )     ∧ ( 0011(2) )
= 0001(2)

meaning 〈x_2, x_1〉 = 〈0, 1〉 as expected.

• If i = 2 then we want to extract the sub-word 〈x_3, x_2〉

  ( x       ≫ i ) ∧ ( ( 1 ≪ m ) − 1 )
= ( 1011(2) ≫ 2 ) ∧ ( ( 1 ≪ 2 ) − 1 )
= ( 0010(2) )     ∧ ( ( 0100(2) ) − 1 )
= ( 0010(2) )     ∧ ( 0011(2) )
= 0010(2)

meaning 〈x_3, x_2〉 = 〈1, 0〉 as expected.

Notice that the (0001(2) ≪ m) − 1 term is basically giving us a way to create a value y where y_{m−1...0} = 1, i.e., whose 0-th through to (m − 1)-th bits are 1. If we know m ahead of time, we can clearly simplify this by providing y directly rather than computing it.





As a special case of extracting an m-element sub-sequence, when we set m = 1 we extract the i-th bit of x alone. This is a useful and common operation: following the above, it is achieved by computing

$$(x \gg i) \land 1,$$

i.e., replacing the general-purpose mask with the special-purpose constant $(1 \ll 1) - 1 = 2 - 1 = 1$. For example:

• If $x = 0011_{(2)}$ and $i = 2$ then we compute

$$
\begin{array}{cl}
  & (x \gg i) \land 1 \\
= & (0011_{(2)} \gg 2) \land 1 \\
= & (0000_{(2)}) \land 1 \\
= & 0000_{(2)}
\end{array}
$$

meaning $x_2 = 0$.

• If $x = 0011_{(2)}$ and $i = 0$ then we compute

$$
\begin{array}{cl}
  & (x \gg i) \land 1 \\
= & (0011_{(2)} \gg 0) \land 1 \\
= & (0011_{(2)}) \land 1 \\
= & 0001_{(2)}
\end{array}
$$

meaning $x_0 = 1$.
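The operations in this aside translate almost directly into C. The following is a minimal sketch rather than a definitive implementation: the function names are hypothetical, a 32-bit word (i.e., w = 32) is assumed, and the assertions simply rule out shift distances that C leaves undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Set the i-th bit of x to 1, i.e., compute x OR (1 << i). */
uint32_t set_bit(uint32_t x, unsigned i) {
  assert(i < 32);
  return x | (UINT32_C(1) << i);
}

/* Set the i-th bit of x to 0, i.e., compute x AND NOT (1 << i). */
uint32_t clear_bit(uint32_t x, unsigned i) {
  assert(i < 32);
  return x & ~(UINT32_C(1) << i);
}

/* Extract the m-bit sub-word starting at the i-th bit of x,
 * i.e., compute (x >> i) AND ((1 << m) - 1). */
uint32_t extract_bits(uint32_t x, unsigned i, unsigned m) {
  assert(i < 32 && m < 32);
  return (x >> i) & ((UINT32_C(1) << m) - 1);
}

/* Special case m = 1: extract the i-th bit alone. */
uint32_t get_bit(uint32_t x, unsigned i) {
  return extract_bits(x, i, 1);
}
```

For instance, `set_bit(0x3, 2)` mirrors the first worked example ($0011_{(2)}$ becomes $0111_{(2)}$), and `extract_bits(0xB, 1, 2)` mirrors the case $x = 1011_{(2)}$, $i = 1$, $m = 2$.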

1 kilobit (kbit) = $10^{3}$ bits = 1000 bits
1 megabit (Mbit) = $10^{6}$ bits = 1 000 000 bits
1 gigabit (Gbit) = $10^{9}$ bits = 1 000 000 000 bits
1 terabit (Tbit) = $10^{12}$ bits = 1 000 000 000 000 bits

1 kilobyte (kB) = $10^{3}$ bytes = 1000 bytes
1 megabyte (MB) = $10^{6}$ bytes = 1 000 000 bytes
1 gigabyte (GB) = $10^{9}$ bytes = 1 000 000 000 bytes
1 terabyte (TB) = $10^{12}$ bytes = 1 000 000 000 000 bytes

However, in the context of Computer Science the same English prefixes are commonly (ab)used to specify a binary, base-2 multiplier. For example, kilo will be read to mean $2^{10} = 1024 \approx 1000$: RAM or hard disk capacity is commonly measured in this way, for example. To eliminate the resulting ambiguity, the International Electrotechnical Commission (IEC) added some more prefixes, intended specifically for binary multipliers; the result is that

1 kibibit (Kibit) = $2^{10}$ bits = 1024 bits
1 mebibit (Mibit) = $2^{20}$ bits = 1 048 576 bits
1 gibibit (Gibit) = $2^{30}$ bits = 1 073 741 824 bits
1 tebibit (Tibit) = $2^{40}$ bits = 1 099 511 627 776 bits

1 kibibyte (KiB) = $2^{10}$ bytes = 1024 bytes
1 mebibyte (MiB) = $2^{20}$ bytes = 1 048 576 bytes
1 gibibyte (GiB) = $2^{30}$ bytes = 1 073 741 824 bytes
1 tebibyte (TiB) = $2^{40}$ bytes = 1 099 511 627 776 bytes

The question is, which should we use? Does it really matter? Clearly, yes: if we buy a hard disk which says it holds 1 terabyte of data, we hope they are talking, in traditional terms, about a tebibyte, i.e., 1 099 511 627 776 bytes rather than 1 000 000 000 000 bytes, because then we get more storage capacity! In the same way, imagine we are comparing two hard disks: we need to make sure their quoted storage capacities use the same units, or the comparison will be unfair.

From here on, we try to make consistent use of the newer prefixes: when we say kilobyte or kB we mean $10^3$ bytes, and when we say kibibyte or KiB we mean $2^{10}$ bytes. On one hand, this might not be popular from a historical point of view; on the other hand, it should mean we are clear and consistent.
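To make the size of the discrepancy concrete, the following sketch computes both kinds of multiplier. The function names are hypothetical, and the argument k simply counts prefix steps (k = 1 for kilo/kibi, k = 4 for tera/tebi).

```c
#include <assert.h>
#include <stdint.h>

/* Decimal (SI) multiplier: 1000^k, e.g., k = 4 gives tera. */
uint64_t si_multiplier(unsigned k) {
  assert(k <= 6); /* 1000^6 still fits in 64 bits */
  uint64_t r = 1;
  for (unsigned i = 0; i < k; i++) {
    r *= 1000;
  }
  return r;
}

/* Binary (IEC) multiplier: 1024^k = 2^(10k), e.g., k = 4 gives tebi. */
uint64_t iec_multiplier(unsigned k) {
  assert(k <= 6); /* 2^60 still fits in 64 bits */
  return UINT64_C(1) << (10 * k);
}
```

Under these assumptions, `iec_multiplier(4) - si_multiplier(4)` is 99 511 627 776, i.e., a tebibyte exceeds a terabyte by roughly 10%.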




1.8.2 Positional number systems

As humans, and because (mostly) we have ten fingers and toes, we are used to working with numbers written down using digits from the set $\{0, 1, \ldots, 9\}$. Imagine we write down such a number, say 123. It may not be as common, but hopefully you can believe this is roughly the same as writing the sequence

$$A = \langle A_0, A_1, A_2 \rangle = \langle 3, 2, 1 \rangle$$

given that 3 is the first digit of 123, 2 is the second digit and so on; we are reading digits in the sequence from left-to-right vs. right-to-left in the literal, but otherwise they capture the same meaning.

But how do we know what either 123 or A means? Informally at least, writing 123 intuitively means the value “one hundred and twenty three” which might be rephrased as “one hundred, two tens and three units”. The latter case suggests how to add formalism to this intuition: we are just weighting each digit 1, 2 and 3 by some amount then adding everything up. For example, per the above we are computing the value via

$$A \mapsto 123 = 1 \cdot 100 + 2 \cdot 10 + 3 \cdot 1.$$

We could also write the same thing as

$$A \mapsto 123 = 1 \cdot 10^2 + 2 \cdot 10^1 + 3 \cdot 10^0$$

given $10^0 = 1$ and $10^1 = 10$, or more formally still as

$$A \mapsto 123 = \sum_{i=0}^{|A|-1} A_i \cdot 10^i,$$

meaning we add up the terms

$$
\begin{array}{lclclcl}
A_0 \cdot 10^0 &=& 3 \cdot 10^0 &=& 3 \cdot 1   &=& 3 \\
A_1 \cdot 10^1 &=& 2 \cdot 10^1 &=& 2 \cdot 10  &=& 20 \\
A_2 \cdot 10^2 &=& 1 \cdot 10^2 &=& 1 \cdot 100 &=& 100
\end{array}
$$

to make a total of 123 as expected. Put another way, the sequence A represents the value “one hundred and twenty three”. Two facts start to emerge, namely

1. each digit is being weighted by a power of some base (or radix), which in this case is 10, and

2. the exponent in said weight is related to the position of the corresponding digit: the i-th digit is weighted by $10^i$.

A neat outcome of identifying the base as some sort of parameter is that we can consider choices other than b = 10. Generalising the example somewhat provides the following definition:

Definition 1.49. A base-b (or radix-b) positional number system uses digits from a digit set $X = \{0, 1, \ldots, b - 1\}$. A number x is represented using n digits in total, m of which form the fractional part, i.e.,

$$x = \langle x_0, x_1, \ldots, x_{n-1} \rangle \mapsto \pm \sum_{i=-m}^{n-m-1} x_i \cdot b^i$$

where $x_i \in X$; we term x the base-b expansion of X.

Definition 1.50. The following common choices correspond to

b = 2: binary
b = 8: octal
b = 10: decimal
b = 16: hexadecimal

numbers.




Example 1.35. Reconsider the example above: imagine we select b = 2, then make a claim that “one hundred and twenty three” is represented by

$$B = \langle B_0, B_1, B_2, B_3, B_4, B_5, B_6, B_7 \rangle = \langle 1, 1, 0, 1, 1, 1, 1, 0 \rangle$$

where, per the definition, now the digit set used is st. $B_i \in \{0, 1\}$ for $0 \le i < 8$ (and thus implicitly setting n = 8 and m = 0). The value represented by B is given using exactly the same approach, i.e.,

$$\sum_{i=0}^{|B|-1} B_i \cdot 2^i,$$

noting that where we previously had 10 we now have 2, st. we add up the terms

$$
\begin{array}{lclclcl}
B_0 \cdot 2^0 &=& 1 \cdot 2^0 &=& 1 \cdot 1   &=& 1 \\
B_1 \cdot 2^1 &=& 1 \cdot 2^1 &=& 1 \cdot 2   &=& 2 \\
B_2 \cdot 2^2 &=& 0 \cdot 2^2 &=& 0 \cdot 4   &=& 0 \\
B_3 \cdot 2^3 &=& 1 \cdot 2^3 &=& 1 \cdot 8   &=& 8 \\
B_4 \cdot 2^4 &=& 1 \cdot 2^4 &=& 1 \cdot 16  &=& 16 \\
B_5 \cdot 2^5 &=& 1 \cdot 2^5 &=& 1 \cdot 32  &=& 32 \\
B_6 \cdot 2^6 &=& 1 \cdot 2^6 &=& 1 \cdot 64  &=& 64 \\
B_7 \cdot 2^7 &=& 0 \cdot 2^7 &=& 0 \cdot 128 &=& 0
\end{array}
$$

to obtain a total of 123 as before.
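The summation used in these examples can be sketched in C as follows. This is an illustration under stated assumptions: the function name is hypothetical, digits are stored least-significant first (matching the sequences A and B), m = 0 so there is no fractional part, and no attempt is made to detect overflow of the result.

```c
#include <assert.h>
#include <stddef.h>

/* Evaluate a base-b digit sequence, where digits[i] is the i-th digit
 * and is therefore weighted by b^i. */
unsigned long digits_to_value(const unsigned *digits, size_t n, unsigned b) {
  unsigned long value = 0;
  unsigned long weight = 1; /* b^0 */
  for (size_t i = 0; i < n; i++) {
    assert(digits[i] < b); /* each digit must be in {0, ..., b - 1} */
    value += digits[i] * weight;
    weight *= b; /* move from b^i to b^(i+1) */
  }
  return value;
}
```

Evaluating ⟨3, 2, 1⟩ with b = 10 and ⟨1, 1, 0, 1, 1, 1, 1, 0⟩ with b = 2 both produce 123, as in the text.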

1.8.2.1 Digits

Describing elements in the digit set $\{0, 1, \ldots, b - 1\}$, for whatever b, using a single digit can be fairly important; using multiple digits, for example, can start to introduce some ambiguity wrt. how we interpret a literal. In particular, once we select a b > 10 we hit a problem: we run out of single Arabic-style digits that we can write down.

Example 1.36. Consider the same example as above where we have the literal 123: we know that if b = 10 and A = ⟨3, 2, 1⟩ then

$$A \mapsto 123 = 1 \cdot 10^2 + 2 \cdot 10^1 + 3 \cdot 10^0.$$

However, if b = 16, although we know

$$123 = 7 \cdot 16^1 + 11 \cdot 16^0$$

we have no single-digit way to write 11. To solve this problem, we use the symbols (or in fact characters) A . . . F to represent 10 . . . 15. Otherwise everything works the same way, meaning for example, that if B = ⟨B, 7⟩ then

$$B \mapsto 123 = 7 \cdot 16^1 + \mathrm{B} \cdot 16^0 = 7 \cdot 16^1 + 11 \cdot 16^0$$
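Going in the other direction, i.e., from a value to a literal, amounts to repeated division by the base. The sketch below assumes 2 ≤ b ≤ 16 and uses the symbols A . . . F for the digits 10 . . . 15; the function name and buffer-handling convention are hypothetical choices made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Write x as a base-b literal (most-significant digit first) into out,
 * NUL-terminated. Returns the number of digits written, or 0 if the
 * buffer of capacity cap is too small. */
size_t value_to_literal(unsigned long x, unsigned b, char *out, size_t cap) {
  static const char symbols[] = "0123456789ABCDEF";
  char tmp[sizeof(unsigned long) * CHAR_BIT]; /* enough even for b = 2 */
  size_t n = 0;
  assert(b >= 2 && b <= 16);
  do {
    tmp[n++] = symbols[x % b]; /* least-significant digit first */
    x /= b;
  } while (x != 0);
  if (n + 1 > cap) {
    return 0;
  }
  for (size_t i = 0; i < n; i++) {
    out[i] = tmp[n - 1 - i]; /* reverse into literal order */
  }
  out[n] = '\0';
  return n;
}
```

With x = 123 and b = 16, the remainders are 11 then 7, so the literal written is `7B`.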

1.8.2.2 Notation

Amazingly there are not many jokes about Computer Science, but here are two (bad, comically speaking) examples:

1. There are only 10 types of people in the world: those who understand binary, and those who do not.

2. Why did the Computer Scientist always confuse Halloween and Christmas? Because 31 Oct equals 25 Dec.

Whether or not you laughed at them, both jokes stem from ambiguity in the representation of numbers: there is an ambiguity between “ten” written in decimal and binary in the former, and “twenty five” written in octal and decimal in the latter.

Look at the first joke: it is basically saying that the literal 10 can be interpreted as binary or decimal, i.e., as $1 \cdot 2 + 0 \cdot 1 = 2$ in binary and $1 \cdot 10 + 0 \cdot 1 = 10$ in decimal. So the two types of people are those who understand that 2 can be represented by 10, and those that do not. Now look at the second joke: this is a play on words in that “Oct” can mean “October” but also “octal” or base-8 and “Dec” can mean “December” but also “decimal” or base-10. With this in mind, we see that

$$3 \cdot 8 + 1 \cdot 1 = 25 = 2 \cdot 10 + 5 \cdot 1.$$




An aside: octal and hexadecimal as a short-hand for binary.

It is useful to remember that octal and hexadecimal can be viewed as just a short-hand for binary: each octal or hexadecimal digit represents exactly three or four binary digits respectively. This can make it much easier to write and remember long sequences of binary digits. As an example, consider hexadecimal. Each hexadecimal digit $x_i \in \{0, 1, \ldots, 15\}$ can be represented using four bits (since there are $2^4 = 16$ possible combinations), so can be viewed instead as those four binary digits.

Using a concrete example, the following translation steps

$$
\begin{array}{rcl}
2223 &=& 1 \cdot 1 + 1 \cdot 2 + 1 \cdot 4 + 1 \cdot 8 + 0 \cdot 16 + 1 \cdot 32 + 0 \cdot 64 + 1 \cdot 128 + 0 \cdot 256 + 0 \cdot 512 + 0 \cdot 1024 + 1 \cdot 2048 \\
     &=& 1 \cdot 2^0 + 1 \cdot 2^1 + 1 \cdot 2^2 + 1 \cdot 2^3 + 0 \cdot 2^4 + 1 \cdot 2^5 + 0 \cdot 2^6 + 1 \cdot 2^7 + 0 \cdot 2^8 + 0 \cdot 2^9 + 0 \cdot 2^{10} + 1 \cdot 2^{11} \\
     &=& \langle 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1 \rangle_{(2)} \\
     &=& \langle \langle 1, 1, 1, 1 \rangle_{(2)}, \langle 0, 1, 0, 1 \rangle_{(2)}, \langle 0, 0, 0, 1 \rangle_{(2)} \rangle_{(16)} \\
     &=& \langle 15_{(10)}, 10_{(10)}, 8_{(10)} \rangle_{(16)} \\
     &=& \langle \mathrm{F}_{(16)}, \mathrm{A}_{(16)}, 8_{(16)} \rangle_{(16)} \\
     &=& \langle \mathrm{F}, \mathrm{A}, 8 \rangle_{(16)} \\
     &=& 15 \cdot 16^0 + 10 \cdot 16^1 + 8 \cdot 16^2 \\
     &=& 15 \cdot 1 + 10 \cdot 16 + 8 \cdot 256 \\
     &=& 2223
\end{array}
$$

are clearly valid.

In C you can in fact write decimal literals (which is the default), and hexadecimal literals using the prefix 0x. However, beware of literals starting with 0: this will be interpreted as octal! For example, 012 has the same value as 10 because

$$
\begin{array}{rcl}
012 &\mapsto& 1 \cdot 8^1 + 2 \cdot 8^0 \\
    &=& 10_{(10)} \\
    &=& 1 \cdot 10^1 + 0 \cdot 10^0 \\
    &\mapsto& 10
\end{array}
$$

Standard C prior to C23 does not let you express binary literals directly (C23 adds a 0b prefix, which some compilers already accepted as an extension), although doing so has long been possible in other languages (e.g., Python, via a 0b prefix).
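The behaviour of C literals described in this aside can be checked at compile time. The sketch below assumes a C11 (or later) compiler, so that `static_assert` is available via `<assert.h>`; the enumerator names are hypothetical, and 04257 is simply 2223 rewritten in octal (three bits per digit).

```c
#include <assert.h>

enum {
  DECIMAL_LITERAL = 2223,   /* no prefix: decimal */
  HEX_LITERAL     = 0x8AF,  /* 0x prefix: hexadecimal */
  OCTAL_LITERAL   = 04257,  /* leading 0: octal */
  LEADING_ZERO    = 012     /* also octal, so this is ten, not twelve */
};

/* All three spellings denote the same value. */
static_assert(HEX_LITERAL == DECIMAL_LITERAL, "0x8AF is 2223");
static_assert(OCTAL_LITERAL == DECIMAL_LITERAL, "04257 is 2223");

/* The pitfall from the text: 012 has the same value as 10. */
static_assert(LEADING_ZERO == 10, "012 is octal");
```

If any of the claims were false, compilation would fail, which makes this a cheap way to double-check a literal whose base is in doubt.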




(a) A number line for sign-magnitude representation (reversed copy, non-contiguous number line).

(b) A number line for one’s-complement representation (direct copy, non-contiguous number line).

(c) A number line for two’s-complement representation (direct copy, contiguous number line).

Figure 1.3: Number lines illustrating the mapping of 8-bit sequences to integer values using three different representations.




i.e., 31 Oct equals 25 Dec in the sense that 31 in base-8 equals 25 in base-10.

Put in context, we saw above that the decimal sequence A and decimal number 123 are basically the same iff. we interpret A in the right way. The “if” in that statement is a problem, in the sense there is ambiguity: if we follow the same reasoning as in the jokes, how do we know what base the literal 01111011 is written down in? It could mean the decimal number 123 (i.e., “one hundred and twenty three”) if we interpret it using b = 2, or the decimal number 01111011 (i.e., “one million, one hundred and eleven thousand and eleven”) if we interpret it using b = 10; clearly that is quite a difference!

To clear up this ambiguity, where necessary we write literal numbers and representations with the base appended to them. For example, we write $123_{(10)}$ to show that 123 should be interpreted in base-10, or $01111011_{(2)}$ to show that 01111011 should be interpreted in base-2. We can now be clear, for example, that $123_{(10)} = 01111011_{(2)}$; using this notation, the two jokes become even less amusing when written simply as $10_{(2)} = 2_{(10)}$ and $31_{(8)} = 25_{(10)}$.

Example 1.37. Consider a case where $m \neq 0$, which allows negative values of i and therefore negative powers of the base: whereas m = 0 implies no fractional part to the resulting value, because $10^{-1} = 1/10 = 0.1$ and $10^{-2} = 1/100 = 0.01$, for example, when $m \neq 0$ we can write down numbers which do have fractional parts. Consider that

$$123.3125_{(10)} = 1 \cdot 10^2 + 2 \cdot 10^1 + 3 \cdot 10^0 + 3 \cdot 10^{-1} + 1 \cdot 10^{-2} + 2 \cdot 10^{-3} + 5 \cdot 10^{-4}$$

given we have n = 7 digits, m = 4 of which capture the fractional part. Of course since the definition is the same, we can do the same thing using a different base, e.g.,

$$
\begin{array}{rcl}
123.3125_{(10)} &=& 1 \cdot 2^6 + 1 \cdot 2^5 + 1 \cdot 2^4 + 1 \cdot 2^3 + 0 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 + 0 \cdot 2^{-1} + 1 \cdot 2^{-2} + 0 \cdot 2^{-3} + 1 \cdot 2^{-4} \\
                &=& 1111011.0101_{(2)}.
\end{array}
$$

The decimal point in the former has the same meaning (i.e., as a separator between fractional and non-fractional parts) when translated into a binary point in the latter; more generally we call this a fractional point where the base is irrelevant.

Example 1.38. We mentioned previously that certain numbers are irrational: the definition of Q suggested that, in such cases, we could not find an x and y such that x/y provided the result required.

A related, but different, issue is that the base a rational number is written in determines whether or not its expansion terminates. Informally, we already know that when we write 1/3 as a decimal number we have 0.3333 . . .; the ellipsis means the sequence recurs infinitely. 1/10, however, has the terminating expansion 0.1 in decimal but a recurring (i.e., non-terminating) expansion in binary: it remains rational, but any finite number of binary digits yields only an approximation, e.g., 0.000110011 . . ..
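Whether an expansion terminates can be explored with a short sketch that generates fractional digits one at a time (multiply the remainder by b, take the integer part as the next digit). The function name and interface are hypothetical; it assumes 0 ≤ num < den and that intermediate products do not overflow.

```c
#include <assert.h>
#include <stddef.h>

/* Write the first k fractional digits of num/den in base b into out.
 * Returns 1 if the expansion has terminated within those k digits
 * (i.e., the remainder reached zero), and 0 otherwise. */
int fraction_digits(unsigned long num, unsigned long den, unsigned b,
                    unsigned *out, size_t k) {
  assert(den != 0 && num < den && b >= 2);
  unsigned long r = num; /* running remainder */
  for (size_t i = 0; i < k; i++) {
    r *= b;
    out[i] = (unsigned)(r / den); /* digit weighted by b^-(i+1) */
    r %= den;
  }
  return r == 0;
}
```

For 1/10 with b = 10 a single digit suffices, whereas with b = 2 the digits 0, 0, 0, 1, 1, 0, 0, 1, 1, . . . never reach a zero remainder; 5/16 (i.e., the 0.3125 from the previous example) terminates after the four binary digits 0, 1, 0, 1.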

1.8.3 Representing integer numbers, i.e., members of Z

The positional number systems explored above afford a flexible way to represent numbers in theory, e.g., if we just want to write down some examples. However, some (implicit) challenges exist when using them in practice. Considering the set of integers Z, for example, we have no way to cater for a) the infinite size of this set, given the finite concrete resources available to us (which bound the number of digits we can have in a given base-b expansion), and b) the fact members of the set can be positive or negative. In even more concrete terms, consider some integer data types available in C:

$$
\begin{array}{lcl}
\texttt{unsigned char} &\mapsto& \mathbb{Z}_{\texttt{unsigned char}} = \{ 0, \ldots, +2^{8} - 1 \} \\
\texttt{unsigned int}  &\mapsto& \mathbb{Z}_{\texttt{unsigned int}} = \{ 0, \ldots, +2^{32} - 1 \} \\
\texttt{char}          &\mapsto& \mathbb{Z}_{\texttt{char}} = \{ -2^{7}, \ldots, 0, \ldots, +2^{7} - 1 \} \\
\texttt{int}           &\mapsto& \mathbb{Z}_{\texttt{int}} = \{ -2^{31}, \ldots, 0, \ldots, +2^{31} - 1 \}
\end{array}
$$

This is meant to illustrate, for example, that the int data type, which one might read as “an integer”, is in fact an approximation of the integers (i.e., of Z): the range of values is finite. That said, however, why use this particular approximation?

We can answer this question by investigating concrete representations used in C, basing our discussion on positional number systems via use of bit-sequences (of fixed length n) to encode members of Z. Note that where appropriate, we use colour to highlight parts of each representation that determine the sign and magnitude (or size) of the associated value; since we are representing integers, we implicitly set m = 0 within the general definition of a positional number system (since there is, by definition, no fractional part in an integer).




An aside: the actual range of C integer data types.

In describing the C data type int as implying an associated set (or range)

$$\mathbb{Z}_{\texttt{int}} = \{ -2^{31}, \ldots, 0, \ldots, +2^{31} - 1 \}$$

of values, we simplified what is, in reality, a somewhat complicated issue. In short, the C language specifies the above much more abstractly; the C compiler and platform (i.e., processor) make the details concrete, allowing us to reason as we did above.

It is worth looking at this issue in more detail: on one hand it is not often covered elsewhere, but, on the other hand, it will help avoid making assumptions that may be (subtly, and infrequently) incorrect. Other descriptions exist, but we follow that in [9] due to the clarity of presentation. Considering integer data types only, i.e., for each type

$$T \in \{ \texttt{char}, \texttt{short}, \texttt{int}, \texttt{long}, \texttt{long long} \},$$

the C language defines two abstract properties:

1. the signed’ness of a type T is denoted

$$
S(T) = \begin{cases}
0 & \text{if } T \text{ is unsigned} \\
1 & \text{if } T \text{ is signed}
\end{cases}
$$

and allows us to distinguish between unsigned int and int, for example, and

2. the rank of a type T, denoted R(T), is an abstract measure of size (and hence range); rather than a numerical value, types are simply ordered st.

$$R(\texttt{char}) < R(\texttt{short}) < R(\texttt{int}) < R(\texttt{long}) < R(\texttt{long long}).$$

The platform provides concrete detail, in particular assigning a width (or size) of

$$W(T) \in \{ 1, 2, 4, 8 \}$$

bytes to each type; this is termed the data model. Based on use of two’s-complement, we can derive the range of each type as

$$
I(T) = \begin{cases}
\{ 0, \ldots, +2^{8 \cdot W(T)} - 1 \} & \text{if } S(T) = 0 \text{, st. } T \text{ is unsigned} \\
\{ -2^{8 \cdot W(T) - 1}, \ldots, 0, \ldots, +2^{8 \cdot W(T) - 1} - 1 \} & \text{if } S(T) = 1 \text{, st. } T \text{ is signed}
\end{cases}
$$

which matches our own definitions. Although the platform can select W(T) for each T, a crucial restriction applies: for any types $T_1$ and $T_2$ where $R(T_1) < R(T_2)$, the property $W(T_1) \le W(T_2)$ must hold. Put another way, we can be sure the width of int is less than or equal to that of long, even if those widths are not known; it cannot be the other way around, for example, st. int is wider than long.

So we assumed W(int) = 4 in our description, but this is not the only possibility. [9, Table 1] surveys various data models, noting, for example, that

               LP32   ILP32
W(char)          1      1
W(short)         2      2
W(int)           2      4
W(long)          4      4
W(long long)     8      8

are valid possibilities: if we assume W(int) = 4 in a program compiled and executed on a platform associated with the left-hand data model problems may well occur, whereas the right-hand data model matches.
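The range I(T) can be compared against what a given platform actually provides via `<limits.h>`. This is a sketch under stated assumptions: the function names are hypothetical, bytes are assumed to be 8 bits wide, signed types are assumed to use two’s-complement, and types are assumed to have no padding bits (all of which hold on mainstream platforms, but are not guaranteed everywhere).

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

static_assert(CHAR_BIT == 8, "the formula 8 * W(T) assumes 8-bit bytes");

/* Upper bound of I(T) for an unsigned type of width w bytes: 2^(8w) - 1. */
uint64_t unsigned_max(unsigned w) {
  assert(w >= 1 && w <= 8);
  unsigned bits = 8 * w;
  /* Shifting by 64 would be undefined, so treat w = 8 separately. */
  return bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/* Upper bound of I(T) for a signed type of width w bytes: 2^(8w-1) - 1. */
int64_t signed_max(unsigned w) {
  return (int64_t)(unsigned_max(w) >> 1);
}

/* Lower bound of I(T) for a signed type of width w bytes: -2^(8w-1). */
int64_t signed_min(unsigned w) {
  return -signed_max(w) - 1;
}
```

Rather than assuming W(int) = 4, a program can pass `sizeof(int)` and compare the results with `INT_MIN`, `INT_MAX` and `UINT_MAX`; on a platform satisfying the assumptions above they should agree whichever data model is in use.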




1.8.3.1 Unsigned integers

Natural binary expansion

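Definition 1.51. An unsigned integer can be represented in n bits by using the natural binary expansion. That is, we have

$$x = \langle x_0, x_1, \ldots, x_{n-1} \rangle \mapsto \sum_{i=0}^{n-1} x_i \cdot 2^i$$

for $x_i \in \{0, 1\}$, and

$$0 \le x \le 2^n - 1.$$

Example 1.39. If n = 8 for example, we can represent values in the range +0 . . . + 255; selected cases are as follows:

$$
\begin{array}{ccl}
11111111 &\mapsto& 1 \cdot 2^7 + 1 \cdot 2^6 + 1 \cdot 2^5 + 1 \cdot 2^4 + 1 \cdot 2^3 + 1 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 = +255_{(10)} \\
\vdots & & \vdots \\
10000101 &\mapsto& 1 \cdot 2^7 + 0 \cdot 2^6 + 0 \cdot 2^5 + 0 \cdot 2^4 + 0 \cdot 2^3 + 1 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0 = +133_{(10)} \\
\vdots & & \vdots \\
10000000 &\mapsto& 1 \cdot 2^7 + 0 \cdot 2^6 + 0 \cdot 2^5 + 0 \cdot 2^4 + 0 \cdot 2^3 + 0 \cdot 2^2 + 0 \cdot 2^1 + 0 \cdot 2^0 = +128_{(10)} \\
01111111 &\mapsto& 0 \cdot 2^7 + 1 \cdot 2^6 + 1 \cdot 2^5 + 1 \cdot 2^4 + 1 \cdot 2^3 + 1 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 = +127_{(10)} \\
\vdots & & \vdots \\
01111011 &\mapsto& 0 \cdot 2^7 + 1 \cdot 2^6 + 1 \cdot 2^5 + 1 \cdot 2^4 + 1 \cdot 2^3 + 0 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 = +123_{(10)} \\
\vdots & & \vdots \\
00000001 &\mapsto& 0 \cdot 2^7 + 0 \cdot 2^6 + 0 \cdot 2^5 + 0 \cdot 2^4 + 0 \cdot 2^3 + 0 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0 = +1_{(10)} \\
00000000 &\mapsto& 0 \cdot 2^7 + 0 \cdot 2^6 + 0 \cdot 2^5 + 0 \cdot 2^4 + 0 \cdot 2^3 + 0 \cdot 2^2 + 0 \cdot 2^1 + 0 \cdot 2^0 = +0_{(10)}
\end{array}
$$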

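Binary Coded Decimal (BCD)

BCD is an alternative method of representing unsigned integers: rather than representing the number itself as a bit-sequence, the idea is to write it in decimal and encode each decimal digit independently. The overall representation is the concatenation of bit-sequences which result from encoding the decimal digits:

Definition 1.52. Consider the function

$$
f : \{0, 1, \ldots, 9\} \to \mathbb{B}^4, \qquad
d \mapsto \begin{cases}
\langle 0, 0, 0, 0 \rangle & \text{if } d = 0 \\
\langle 1, 0, 0, 0 \rangle & \text{if } d = 1 \\
\langle 0, 1, 0, 0 \rangle & \text{if } d = 2 \\
\langle 1, 1, 0, 0 \rangle & \text{if } d = 3 \\
\langle 0, 0, 1, 0 \rangle & \text{if } d = 4 \\
\langle 1, 0, 1, 0 \rangle & \text{if } d = 5 \\
\langle 0, 1, 1, 0 \rangle & \text{if } d = 6 \\
\langle 1, 1, 1, 0 \rangle & \text{if } d = 7 \\
\langle 0, 0, 0, 1 \rangle & \text{if } d = 8 \\
\langle 1, 0, 0, 1 \rangle & \text{if } d = 9
\end{cases}
$$

which encodes a decimal digit d into a corresponding 4-bit sequence; this function corresponds to the Simple Binary Coded Decimal (SBCD), or BCD 8421, standard. Given the decimal number

$$x = \langle x_0, x_1, \ldots, x_{n-1} \rangle_{(10)},$$

the BCD representation is

$$x = \langle f(x_0), f(x_1), \ldots, f(x_{n-1}) \rangle.$$

Example 1.40. If n = 8 for example, we can represent values in the range +0 . . . + 99999999; selected cases are as follows:

$$
\begin{array}{rcl}
10011001100110011001100110011001 &\mapsto& \langle \langle 1,0,0,1 \rangle, \langle 1,0,0,1 \rangle, \langle 1,0,0,1 \rangle, \langle 1,0,0,1 \rangle, \langle 1,0,0,1 \rangle, \langle 1,0,0,1 \rangle, \langle 1,0,0,1 \rangle, \langle 1,0,0,1 \rangle \rangle \\
 &\mapsto& \langle 9, 9, 9, 9, 9, 9, 9, 9 \rangle_{(10)} = +99999999_{(10)} \\
 &\vdots& \\
00000000000000000000000100100011 &\mapsto& \langle \langle 1,1,0,0 \rangle, \langle 0,1,0,0 \rangle, \langle 1,0,0,0 \rangle, \langle 0,0,0,0 \rangle, \langle 0,0,0,0 \rangle, \langle 0,0,0,0 \rangle, \langle 0,0,0,0 \rangle, \langle 0,0,0,0 \rangle \rangle \\
 &\mapsto& \langle 3, 2, 1, 0, 0, 0, 0, 0 \rangle_{(10)} = +123_{(10)} \\
 &\vdots& \\
00000000000000000000000000000001 &\mapsto& \langle \langle 1,0,0,0 \rangle, \langle 0,0,0,0 \rangle, \langle 0,0,0,0 \rangle, \langle 0,0,0,0 \rangle, \langle 0,0,0,0 \rangle, \langle 0,0,0,0 \rangle, \langle 0,0,0,0 \rangle, \langle 0,0,0,0 \rangle \rangle \\
 &\mapsto& \langle 1, 0, 0, 0, 0, 0, 0, 0 \rangle_{(10)} = +1_{(10)} \\
00000000000000000000000000000000 &\mapsto& \langle \langle 0,0,0,0 \rangle, \langle 0,0,0,0 \rangle, \langle 0,0,0,0 \rangle, \langle 0,0,0,0 \rangle, \langle 0,0,0,0 \rangle, \langle 0,0,0,0 \rangle, \langle 0,0,0,0 \rangle, \langle 0,0,0,0 \rangle \rangle \\
 &\mapsto& \langle 0, 0, 0, 0, 0, 0, 0, 0 \rangle_{(10)} = +0_{(10)}
\end{array}
$$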
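A minimal sketch of BCD encoding and decoding in C, assuming n = 8 decimal digits packed into a 32-bit word with the least-significant decimal digit in the least-significant 4 bits (as in the example); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Encode x (at most 8 decimal digits) into BCD: the i-th decimal digit
 * occupies bits 4i to 4i+3 of the result. */
uint32_t bcd_encode(uint32_t x) {
  assert(x <= 99999999u);
  uint32_t r = 0;
  for (unsigned i = 0; i < 8; i++) {
    r |= (x % 10) << (4 * i); /* f(x_i), placed in the i-th 4-bit field */
    x /= 10;
  }
  return r;
}

/* Decode a BCD word back into the value it represents. */
uint32_t bcd_decode(uint32_t r) {
  uint32_t x = 0;
  uint32_t weight = 1; /* 10^i */
  for (unsigned i = 0; i < 8; i++) {
    uint32_t d = (r >> (4 * i)) & 0xF; /* shift-and-mask the i-th field */
    assert(d <= 9);                    /* 1010 to 1111 are not valid digits */
    x += d * weight;
    weight *= 10;
  }
  return x;
}
```

A convenient property of this layout is that the hexadecimal literal of the encoding reads like the decimal value: `bcd_encode(123)` is `0x00000123`.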

1.8.3.2 Signed integers

Sign-magnitude

Definition 1.53. A signed integer can be represented in n bits by using the sign-magnitude approach; 1 bit is reserved for the sign (0 means positive, 1 means negative) and n − 1 for the magnitude. That is, we have

$$x = \langle x_0, x_1, \ldots, x_{n-1} \rangle \mapsto (-1)^{x_{n-1}} \cdot \sum_{i=0}^{n-2} x_i \cdot 2^i$$

for $x_i \in \{0, 1\}$, and

$$-2^{n-1} + 1 \le x \le +2^{n-1} - 1.$$

Note that there are two representations of zero (i.e., +0 and −0).

Example 1.41. If n = 8, for example, we can represent values in the range −127 . . . + 127; selected cases are as follows:

$$
\begin{array}{ccl}
01111111 &\mapsto& (-1)^0 \cdot ( 1 \cdot 2^6 + 1 \cdot 2^5 + 1 \cdot 2^4 + 1 \cdot 2^3 + 1 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 ) = +127_{(10)} \\
\vdots & & \vdots \\
01111011 &\mapsto& (-1)^0 \cdot ( 1 \cdot 2^6 + 1 \cdot 2^5 + 1 \cdot 2^4 + 1 \cdot 2^3 + 0 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 ) = +123_{(10)} \\
\vdots & & \vdots \\
00000001 &\mapsto& (-1)^0 \cdot ( 0 \cdot 2^6 + 0 \cdot 2^5 + 0 \cdot 2^4 + 0 \cdot 2^3 + 0 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0 ) = +1_{(10)} \\
00000000 &\mapsto& (-1)^0 \cdot ( 0 \cdot 2^6 + 0 \cdot 2^5 + 0 \cdot 2^4 + 0 \cdot 2^3 + 0 \cdot 2^2 + 0 \cdot 2^1 + 0 \cdot 2^0 ) = +0_{(10)} \\
10000000 &\mapsto& (-1)^1 \cdot ( 0 \cdot 2^6 + 0 \cdot 2^5 + 0 \cdot 2^4 + 0 \cdot 2^3 + 0 \cdot 2^2 + 0 \cdot 2^1 + 0 \cdot 2^0 ) = -0_{(10)} \\
10000001 &\mapsto& (-1)^1 \cdot ( 0 \cdot 2^6 + 0 \cdot 2^5 + 0 \cdot 2^4 + 0 \cdot 2^3 + 0 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0 ) = -1_{(10)} \\
\vdots & & \vdots \\
11111011 &\mapsto& (-1)^1 \cdot ( 1 \cdot 2^6 + 1 \cdot 2^5 + 1 \cdot 2^4 + 1 \cdot 2^3 + 0 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 ) = -123_{(10)} \\
\vdots & & \vdots \\
11111111 &\mapsto& (-1)^1 \cdot ( 1 \cdot 2^6 + 1 \cdot 2^5 + 1 \cdot 2^4 + 1 \cdot 2^3 + 1 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 ) = -127_{(10)}
\end{array}
$$




One’s-complement

Definition 1.54. The one’s-complement method represents a signed integer in n bits by assigning the complement of x (i.e., ¬x) the value −x. That is, given

$$x = \langle x_0, x_1, \ldots, x_{n-1} \rangle \mapsto \sum_{i=0}^{n-2} x_i \cdot 2^i$$

for $x_i \in \{0, 1\}$, then the encoding of ¬x is assumed to represent −x. This means we have

$$-2^{n-1} + 1 \le x \le +2^{n-1} - 1.$$

Note that there are two representations of zero (i.e., +0 and −0).

Example 1.42. If n = 8 for example, we can represent values in the range −127 . . . + 127; selected cases are as follows:

$$
\begin{array}{ccl}
01111111 &\mapsto& 0 \cdot 2^7 + 1 \cdot 2^6 + 1 \cdot 2^5 + 1 \cdot 2^4 + 1 \cdot 2^3 + 1 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 = +127_{(10)} \\
\vdots & & \vdots \\
01111011 &\mapsto& 0 \cdot 2^7 + 1 \cdot 2^6 + 1 \cdot 2^5 + 1 \cdot 2^4 + 1 \cdot 2^3 + 0 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 = +123_{(10)} \\
\vdots & & \vdots \\
00000001 &\mapsto& 0 \cdot 2^7 + 0 \cdot 2^6 + 0 \cdot 2^5 + 0 \cdot 2^4 + 0 \cdot 2^3 + 0 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0 = +1_{(10)} \\
00000000 &\mapsto& 0 \cdot 2^7 + 0 \cdot 2^6 + 0 \cdot 2^5 + 0 \cdot 2^4 + 0 \cdot 2^3 + 0 \cdot 2^2 + 0 \cdot 2^1 + 0 \cdot 2^0 = +0_{(10)} \\
11111111 &\mapsto& -0_{(10)} \\
11111110 &\mapsto& -1_{(10)} \\
\vdots & & \vdots \\
10000100 &\mapsto& -123_{(10)} \\
\vdots & & \vdots \\
10000000 &\mapsto& -127_{(10)}
\end{array}
$$
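The sign-magnitude and one’s-complement definitions can both be sketched as decoding functions for n = 8. The function names are hypothetical, and the 8-bit pattern is passed as an unsigned value in the range 0 to 255; note that C integers cannot distinguish −0 from +0, so both zero patterns decode to 0.

```c
#include <assert.h>

/* Sign-magnitude: bit 7 is the sign, bits 6..0 are the magnitude. */
int sign_magnitude_value(unsigned bits) {
  assert(bits <= 0xFFu);
  int magnitude = (int)(bits & 0x7Fu);
  return (bits & 0x80u) ? -magnitude : magnitude;
}

/* One's-complement: if bit 7 is set, the pattern is the complement of
 * the representation of some non-negative x, and represents -x. */
int ones_complement_value(unsigned bits) {
  assert(bits <= 0xFFu);
  return (bits & 0x80u) ? -(int)(~bits & 0xFFu) : (int)bits;
}
```

For example, −123 is encoded as 11111011 (`0xFB`) in sign-magnitude but as 10000100 (`0x84`) in one’s-complement, matching the selected cases above.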

Two’s-complement

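Definition 1.55. A signed integer can be represented in n bits by using the two’s-complement approach. The basic idea is to weight the (n − 1)-th bit using $-2^{n-1}$ rather than $+2^{n-1}$, and all other bits as normal. That is, we have

$$x = \langle x_0, x_1, \ldots, x_{n-1} \rangle \mapsto x_{n-1} \cdot (-2^{n-1}) + \sum_{i=0}^{n-2} x_i \cdot 2^i$$

for $x_i \in \{0, 1\}$, and

$$-2^{n-1} \le x \le +2^{n-1} - 1.$$

Example 1.43. If n = 8 for example, we can represent values in the range −128 . . . + 127; selected cases are as follows:

$$
\begin{array}{ccl}
01111111 &\mapsto& 0 \cdot (-2^7) + 1 \cdot 2^6 + 1 \cdot 2^5 + 1 \cdot 2^4 + 1 \cdot 2^3 + 1 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 = +127_{(10)} \\
\vdots & & \vdots \\
01111011 &\mapsto& 0 \cdot (-2^7) + 1 \cdot 2^6 + 1 \cdot 2^5 + 1 \cdot 2^4 + 1 \cdot 2^3 + 0 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 = +123_{(10)} \\
\vdots & & \vdots \\
00000001 &\mapsto& 0 \cdot (-2^7) + 0 \cdot 2^6 + 0 \cdot 2^5 + 0 \cdot 2^4 + 0 \cdot 2^3 + 0 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0 = +1_{(10)} \\
00000000 &\mapsto& 0 \cdot (-2^7) + 0 \cdot 2^6 + 0 \cdot 2^5 + 0 \cdot 2^4 + 0 \cdot 2^3 + 0 \cdot 2^2 + 0 \cdot 2^1 + 0 \cdot 2^0 = +0_{(10)} \\
11111111 &\mapsto& 1 \cdot (-2^7) + 1 \cdot 2^6 + 1 \cdot 2^5 + 1 \cdot 2^4 + 1 \cdot 2^3 + 1 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 = -1_{(10)} \\
\vdots & & \vdots \\
10000101 &\mapsto& 1 \cdot (-2^7) + 0 \cdot 2^6 + 0 \cdot 2^5 + 0 \cdot 2^4 + 0 \cdot 2^3 + 1 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0 = -123_{(10)} \\
\vdots & & \vdots \\
10000000 &\mapsto& 1 \cdot (-2^7) + 0 \cdot 2^6 + 0 \cdot 2^5 + 0 \cdot 2^4 + 0 \cdot 2^3 + 0 \cdot 2^2 + 0 \cdot 2^1 + 0 \cdot 2^0 = -128_{(10)}
\end{array}
$$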
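The two’s-complement definition can be sketched directly for n = 8: the low seven bits are weighted as normal, and the 7-th bit contributes $-2^7$. The function name is hypothetical, and the 8-bit pattern is passed as an unsigned value in the range 0 to 255.

```c
#include <assert.h>

/* Decode an 8-bit two's-complement pattern: bits 6..0 carry their usual
 * positive weights, whereas bit 7 is weighted by -128 rather than +128. */
int twos_complement_value(unsigned bits) {
  assert(bits <= 0xFFu);
  return (int)(bits & 0x7Fu) - (int)(bits & 0x80u);
}
```

For example, 10000101 (`0x85`) decodes as 5 − 128 = −123, and there is only one pattern, 00000000, that decodes to zero.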




Given that two’s-complement is the de facto choice for signed integer representation, it warrants some further explanation: it is important to grasp how the representation works.

One approach is to consider Figure 1.3c, which is a number line of values in two’s-complement representation. Offset a little to the left, it shows that 0 (bottom) is represented by the literal 00000000 (which is, of course, equivalent to a bit-sequence ⟨0, 0, 0, 0, 0, 0, 0, 0⟩); reading from this point toward the right shows that unsigned integers up to 255 could be represented using their natural binary representation. Sometimes you see a number line wrapped into a circle, to emphasise the fact that the values it captures will wrap around: when we reach 255 (or 11111111), and given we have n = 8 bits here, the next value is 0 (or 00000000) because the representation wraps around. Toward the left of 0, it starts to be clear that two’s-complement is basically “moving” the upper or right-hand range of what would be 128 to 255: by using a large, negative weight for the (n − 1)-th bit it moves the (positive) range 128 to 255 into the (negative) range −128 to −1. This movement is direct, in the sense the order of the range is preserved; this contrasts with sign-magnitude, for example, which, per the same idea in Figure 1.3a, reverses the range as it is moved. This difference stems from the fact that two’s-complement fits the concept of a positional number system naturally, whereas the same cannot be said of sign-magnitude where the sign bit is sort of a special case (i.e., weighted abnormally). However subtle this point is, it is important. More specifically, the fact that

1. there is one representation of the value zero, and

2. as we step left or right through representations, they remain in-order wrt. the values they represent

means we can apply the same approach to arithmetic using signed integers represented using two’s-complement as with the simpler case of unsigned integers; this is not true of sign-magnitude, for example, arguably making it less attractive as a result.

Another approach is via an appeal to intuition: if we have x and add −x, i.e., compute x + (−x), then we intuitively expect to produce 0 as a result. The two’s-complement representation satisfies this: we can see from the above that

$$
\begin{array}{rclcl}
x &=& 2_{(10)}  &\mapsto& \phantom{0\ } 0\ 0\ 0\ 0\ 0\ 0\ 1\ 0 \\
y &=& -2_{(10)} &\mapsto& \phantom{0\ } 1\ 1\ 1\ 1\ 1\ 1\ 1\ 0 \quad + \\
c &=&           &       & 1\ 1\ 1\ 1\ 1\ 1\ 1\ 0\ 0 \\
r &=& 0_{(10)}  &\mapsto& \phantom{0\ } 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0
\end{array}
$$

meaning that if we ignore the carry-out (which cannot be captured: we have too few bits), we get the result expected. As a by-product, this yields a useful fact:

Definition 1.56. The term two’s-complement can be used as a noun (i.e., to describe the representation) or a verb (i.e., to describe an operation): the latter case defines “taking the two’s-complement of x” to mean negating x and thus computing the representation of −x. To do so, we compute $-x \mapsto \lnot x + 1$.

To see why this is true, first note that for an x represented in two’s-complement, adding x to ¬x produces −1 as a result. For example:

$$
\begin{array}{rclcl}
x &=& 2_{(10)}       &\mapsto& \phantom{0\ } 0\ 0\ 0\ 0\ 0\ 0\ 1\ 0 \\
y &=& \lnot 2_{(10)} &\mapsto& \phantom{0\ } 1\ 1\ 1\ 1\ 1\ 1\ 0\ 1 \quad + \\
c &=&                &       & 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0 \\
r &=& -1_{(10)}      &\mapsto& \phantom{0\ } 1\ 1\ 1\ 1\ 1\ 1\ 1\ 1
\end{array}
$$

This should make sense, in that each corresponding i-th bit in x and ¬x will be the opposite of each other: either one will be 0 and the other is 1 or vice versa, st. their sum will always be 1. The result is off-by-one, however, in the sense we produce −1 rather than the expected 0. So, if we compute

$$
\begin{array}{rclcl}
x &=& 2_{(10)}           &\mapsto& \phantom{0\ } 0\ 0\ 0\ 0\ 0\ 0\ 1\ 0 \\
y &=& \lnot 2_{(10)} + 1 &\mapsto& \phantom{0\ } 1\ 1\ 1\ 1\ 1\ 1\ 1\ 0 \quad + \\
c &=&                    &       & 1\ 1\ 1\ 1\ 1\ 1\ 1\ 0\ 0 \\
r &=& 0_{(10)}           &\mapsto& \phantom{0\ } 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0
\end{array}
$$

instead then we are back to the same example as above: the result is 0, so $-x \mapsto \lnot x + 1$.
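The identity $-x \mapsto \lnot x + 1$ can be checked exhaustively for n = 8. The sketch below works on raw 8-bit patterns held in `uint8_t`, discarding the carry-out exactly as described above; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Take the two's-complement of an 8-bit pattern: NOT x, plus 1. */
uint8_t twos_negate(uint8_t x) {
  return (uint8_t)(~(unsigned)x + 1u);
}

/* 8-bit addition that ignores the carry-out (i.e., wraps around). */
uint8_t add8(uint8_t x, uint8_t y) {
  return (uint8_t)((unsigned)x + (unsigned)y);
}

/* For every 8-bit pattern x, adding x to its two's-complement gives 0. */
void check_all_negations(void) {
  for (unsigned x = 0; x <= 0xFFu; x++) {
    assert(add8((uint8_t)x, twos_negate((uint8_t)x)) == 0);
  }
}
```

One case deserves care: the two’s-complement of 10000000 (i.e., −128) is 10000000 again, because +128 is outside the representable range; the sum still wraps around to 0.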

1.8.4 Representing real numbers, i.e., members of R

In the above, we considered concrete representations for elements in Z. Each case used positional number systems as an underlying idea, but in different ways (so with different properties); each one coped with the infinite size of Z by approximating it with a bit-sequence of fixed length (i.e., using n bits). Using the same motivation, namely the observation that C yields the approximations

$$
\begin{array}{lcl}
\texttt{float}  &\mapsto& \mathbb{R}_{\texttt{float}} \approx \{ -3.40 \cdot 10^{38}, \ldots, +3.40 \cdot 10^{38} \} \cup \{ \pm\infty, \mathrm{NaN} \} \\
\texttt{double} &\mapsto& \mathbb{R}_{\texttt{double}} \approx \{ -1.79 \cdot 10^{308}, \ldots, +1.79 \cdot 10^{308} \} \cup \{ \pm\infty, \mathrm{NaN} \}
\end{array}
$$

we can apply roughly the same approach to represent R, the set of real numbers. Since we know a positional number system can accommodate numbers with a fractional part (via an m > 0), the fact we can consider representations for R should not be surprising. However, the approach we use does differ somewhat: it makes sense to ignore the previous notation etc. and start afresh with another underlying idea. That is, we will approximate some x by taking a base-b integer m (signed or otherwise) and scaling it, i.e., have

$$x \mapsto m \cdot b^e \approx x$$

for some e. Two more concrete representations based on this idea can be described as follows:

1. if e is fixed (i.e., does not vary between different x and hence m) we have a fixed-point representation, whereas

2. if e is not fixed (i.e., can vary between different x and hence m) we have a floating-point representation.

1.8.4.1 Fixed-point

Definition 1.57. The goal of a fixed-point representation is to allow expression of real numbers whose form is

$$x = m \cdot b^{-q}$$

or, equivalently,

$$x = m \cdot \frac{1}{b^q},$$

where

• $m \in \mathbb{Z}$ is the mantissa, and

• $q \in \mathbb{N}$ is the exponent.

Informally, the magnitude of such a number is given by applying a scaling factor to the mantissa. Since the exponent is fixed, this essentially means interpreting m, and hence x, as two components, i.e.,

1. a q-digit fractional component taken from the least-significant digits, and

2. a p-digit integral component taken from the most-significant digits

where n = p + q; we use the notation $Q_{p,q}$ to denote this. Abusing notation a little, we have that

$$
\begin{array}{rcl}
x &=& \langle x_0, x_1, \ldots, x_{n-1} \rangle \\
  &\mapsto& \langle \underbrace{m_0, \ldots, m_{q-2}, m_{q-1}}_{q \text{ digits}}, \underbrace{m_{n-p}, \ldots, m_{n-2}, m_{n-1}}_{p \text{ digits}} \rangle_{Q_{p,q}} \\
  &\mapsto& m \cdot \frac{1}{b^q} \\
  &\mapsto& \left( \sum_{i=0}^{n-1} m_i \cdot b^i \right) \cdot \frac{1}{b^q} \\
  &\mapsto& \sum_{i=0}^{n-1} m_i \cdot b^{i-q}
\end{array}
$$

Definition 1.58. There are some important quantities relating to a fixed-point representation $Q_{p,q}$:

• The resolution is the smallest difference between representable values, i.e., the value $\frac{1}{b^q}$.

• The precision is essentially n, the number of digits in the representation; in a sense this (in combination with the resolution) governs the range of values that can be represented.

Example 1.44. This might seem confusing, but the basic idea is as above. That is, given an integer x, we just shift the fractional point by a fixed amount to determine the associated value. Imagine we set b = 10, n = 7, q = 4 and write the literal

$$x = 1233125.$$

Figure 1.4: A visualisation of the impact of increasing q, the number of fractional digits, in a fixed-point representation; the result is increased detail within the rendering of a Mandelbrot fractal. Panels: (a) $Q^S_{32,0}$, (b) $Q^S_{31,1}$, (c) $Q^S_{30,2}$, (d) $Q^S_{29,3}$, (e) $Q^S_{28,4}$, (f) $Q^S_{27,5}$, (g) $Q^S_{26,6}$, (h) $Q^S_{25,7}$, (i) $Q^S_{16,16}$.


Interpreting x in the fixed-point representation specified by n and q means there are q = 4 fractional digits, i.e., 3125, and p = n − q = 7 − 4 = 3 integral digits, i.e., 123. Therefore

$$x \mapsto x \cdot \frac{1}{b^q} = 1233125 \cdot \frac{1}{10^4} = 123.3125_{(10)},$$

meaning we have simply taken x and shifted the fractional point by q = 4 digits. Put yet another way, we again alter the weight associated with each digit: taken as an integer the i-th digit will be weighted by $b^i$, but interpreting the same digit as above means weighting it by $b^{i-q}$.
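The scaling step can be sketched in C as follows. This is an illustration only: the function name is hypothetical, and the result is returned as a `double`, which is itself an approximation (the two values used below happen to be exactly representable).

```c
#include <assert.h>

/* Interpret the integer m as a base-b fixed-point value with q
 * fractional digits, i.e., compute m * (1 / b^q). */
double fixed_point_value(long m, unsigned b, unsigned q) {
  assert(b >= 2);
  double scale = 1.0;
  for (unsigned i = 0; i < q; i++) {
    scale *= b; /* accumulate b^q */
  }
  return (double)m / scale;
}
```

With m = 1233125, b = 10 and q = 4 this yields 123.3125; the same function with b = 2 and q = 3 interprets the integer 15 as 1.875, and the integer 1 as the resolution 0.125.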

Example 1.45. There is a neat way to visualise the intuitive effect of adding more precision to (i.e., increasing the number of fractional digits in) a fixed-point representation. Figure 1.4 includes different renderings of the Mandelbrot fractal, named after mathematician Benoît Mandelbrot. Each rendering uses a 32-bit integer, i.e., n = 32, to specify a fixed-point representation but with different values of q, i.e., different numbers of fractional digits. Quite clearly, as we increase q there is more detail. Without expanding on the detail, the fractal is rendered by sampling points on a circle of radius 2 centred at the point (0, 0). With no fractional digits, we can sample points (x, y) with x, y ∈ ℤ st. x, y ∈ {−2, −1, 0, +1, +2}; this is restrictive in the sense that it allows only a few points. However, by adding more fractional digits we can sample many more intermediate points, e.g., (0.5, 0.5) and so on, meaning more detail in the rendering.

Example 1.46. Although the definition is general enough to accommodate any choice of b, it may not be surprising that b = 2 is attractive: this allows us to reuse what we know about representing integers using bit-sequences, and apply it to representing real numbers using a fixed-point representation.

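• We can describe an unsigned fixed-point representation based on an unsigned integer; imagine we select n = 8 with p = 5 and q = 3, denoted $Q^U_{5,3}$. This means

$$
x = \langle x_0, x_1, \ldots, x_7 \rangle \mapsto \left( \sum_{i=0}^{p+q-1} x_i \cdot 2^i \right) \cdot \frac{1}{2^q}
$$

which produces a value in the range

$$
0 \le x \le 2^p - \frac{1}{2^q}
$$

or rather 0 ≤ x ≤ 31.875 with a resolution of 0.125. For example

$$
\begin{array}{rcl}
x & = & 15_{(10)} \\
  & = & 00001111_{(2)} \\
  & \mapsto & 00001111_{(Q^U_{5,3})} \\
  & \mapsto & 0 \cdot 2^4 + 0 \cdot 2^3 + 0 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0 + 1 \cdot 2^{-1} + 1 \cdot 2^{-2} + 1 \cdot 2^{-3} \\
  & \mapsto & 1.875_{(10)}
\end{array}
$$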

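• We can describe a signed fixed-point representation based on a two’s-complement signed integer; imagine we select n = 8 with p = 5 and q = 3, denoted $Q^S_{5,3}$. This means

$$
x = \langle x_0, x_1, \ldots, x_7 \rangle \mapsto \left( -x_{p+q-1} \cdot 2^{p+q-1} + \sum_{i=0}^{p+q-2} x_i \cdot 2^i \right) \cdot \frac{1}{2^q}
$$

which produces a value in the range

$$
-2^{p-1} \le x \le 2^{p-1} - \frac{1}{2^q}
$$

or rather −16 ≤ x ≤ 15.875 with a resolution of 0.125. For example

$$
\begin{array}{rcl}
x & = & 143_{(10)} \\
  & = & 10001111_{(2)} \\
  & \mapsto & 10001111_{(Q^S_{5,3})} \\
  & \mapsto & 1 \cdot (-2^4) + 0 \cdot 2^3 + 0 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0 + 1 \cdot 2^{-1} + 1 \cdot 2^{-2} + 1 \cdot 2^{-3} \\
  & \mapsto & -14.125_{(10)}
\end{array}
$$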
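Both interpretations in Example 1.46 can be checked with a short C sketch. The function names are hypothetical, and the code assumes the 8-bit pattern is supplied as a uint8_t; it simply evaluates the two formulas above for p = 5 and q = 3.

```c
#include <assert.h>
#include <stdint.h>

#define Q 3  // number of fractional bits, as in Q5,3

// Unsigned interpretation: the value is x / 2^q.
double qu53_to_real(uint8_t x) {
  return (double)x / (double)(1u << Q);
}

// Signed interpretation: take the two's-complement value of x, then
// divide by 2^q.
double qs53_to_real(uint8_t x) {
  // subtract 2^8 when the most-significant bit is set
  int v = (x & 0x80u) ? (int)x - 256 : (int)x;
  return (double)v / (double)(1u << Q);
}

// Replay the two worked examples: 00001111 and 10001111.
void check_q53_examples(void) {
  assert(qu53_to_real(0x0Fu) == 1.875);
  assert(qs53_to_real(0x8Fu) == -14.125);
}
```

The same bit pattern 10001111 yields 17.875 under the unsigned interpretation, which shows that the representation alone does not determine the value.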




(a) 32-bit, single-precision format as a bit-sequence:

    bit 31: s | bits 30 to 23: e | bits 22 to 0: m

(b) 32-bit, single-precision format as a C structure:

    typedef struct __ieee32_t {
      uint32_t m : 23, // mantissa
               e :  8, // exponent
               s :  1; // sign
    } ieee32_t;

(c) 64-bit, double-precision format as a bit-sequence:

    bit 63: s | bits 62 to 52: e | bits 51 to 0: m

(d) 64-bit, double-precision format as a C structure:

    typedef struct __ieee64_t {
      uint64_t m : 52, // mantissa
               e : 11, // exponent
               s :  1; // sign
    } ieee64_t;

Figure 1.5: Single- and double-precision IEEE-754 floating-point formats described graphically as bit-sequences and concretely as C structures.

1.8.4.2 Floating-point

Definition 1.59. The goal of a floating-point representation is to allow expression of real numbers whose form is

$$
x = (-1)^s \cdot m \cdot b^e
$$

where

• s ∈ {0, 1} is the sign bit,

• m ∈ ℕ is the mantissa, and

• e ∈ ℤ is the exponent.

Informally, the magnitude of such a number is given by applying a scaling factor to the mantissa; since the exponent can vary, it acts to “float” the fractional point, denoted ◦, into the correct position.

We say that a number of the form

$$
x = (-1)^s \cdot ( \underbrace{m_{n-1} \circ m_{n-2} \ldots m_1 m_0}_{n \text{ digits}} ) \cdot b^e,
$$

is normalised: the fractional point is initially (i.e., before it is moved via the scaling factor) assumed to be after the first non-zero digit of the mantissa. Note that n determines the precision.

Definition 1.60. IEEE-754 specifies two floating-point representations (or formats); each format represents a floating-point number as a bit-sequence by concatenating together three components, i.e., the mantissa, the exponent and the sign bit. There are two features to keep in mind:

• Imagine x is normalised as above: since b = 2, we know $m_{n-1} = 1$ because the leading digit of the mantissa must be non-zero. This means we do not need to include $m_{n-1}$ explicitly in the representation of x, the now implicit value being termed a (or the) hidden digit.

• The exponent needs a signed integer representation; one might imagine that two’s-complement is suitable, but instead an approach called biasing is used. Essentially this means that the representation of x adds a constant β to the real value of e so that the stored value is never negative; interpreting the stored exponent therefore means subtracting β again, i.e.,

$$
e \mapsto e - \beta.
$$

The formats are as follows:




• The single-precision, 32-bit floating-point format allocates the least-significant 23 bits to the mantissa, the next-significant 8 bits to the exponent and the most-significant bit to the sign:

$$
\begin{array}{rcl}
x & = & \langle x_0, x_1, \ldots, x_{31} \rangle \\
  & = & \langle \underbrace{m_0, m_1, \ldots, m_{22}}_{23 \text{ bits}}, \underbrace{e_0, e_1, \ldots, e_7}_{8 \text{ bits}}, s \rangle_{Q_{\text{IEEE-32-bit}}} \\
  & \mapsto & (-1)^s \cdot m \cdot 2^{e-127}
\end{array}
$$

Note that here, β = 127.

• The double-precision, 64-bit floating-point format allocates the least-significant 52 bits to the mantissa, the next-significant 11 bits to the exponent and the most-significant bit to the sign:

$$
\begin{array}{rcl}
x & = & \langle x_0, x_1, \ldots, x_{63} \rangle \\
  & = & \langle \underbrace{m_0, m_1, \ldots, m_{51}}_{52 \text{ bits}}, \underbrace{e_0, e_1, \ldots, e_{10}}_{11 \text{ bits}}, s \rangle_{Q_{\text{IEEE-64-bit}}} \\
  & \mapsto & (-1)^s \cdot m \cdot 2^{e-1023}
\end{array}
$$

Note that here, β = 1023.

Definition 1.61. The IEEE floating-point representations reserve some values in order to represent special quantities. For example, reserved values are used to represent +∞, −∞ and NaN, or not-a-number: +∞ and −∞ can occur when a result overflows beyond the limits of what can be represented, whereas NaN occurs, for example, as a result of dividing zero by zero. For the single-precision, 32-bit format these special values are

    00000000000000000000000000000000 ↦ +0
    10000000000000000000000000000000 ↦ −0
    01111111100000000000000000000000 ↦ +∞
    11111111100000000000000000000000 ↦ −∞
    01111111100000100000000000000000 ↦ NaN
    11111111100100010001001010101010 ↦ NaN

with similar forms for the double-precision, 64-bit format.
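The reserved bit patterns listed above can be checked directly in C. The following is a sketch under the assumption that float is the 32-bit IEEE-754 single-precision format on the target platform; bits_to_float and check_special_values are hypothetical helper names, and the hexadecimal constants are simply the six bit-sequences above rewritten in base 16.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

// Reinterpret a 32-bit pattern as a float; memcpy copies the bytes
// without converting the value.
float bits_to_float(uint32_t bits) {
  float f;
  assert(sizeof f == sizeof bits);
  memcpy(&f, &bits, sizeof f);
  return f;
}

void check_special_values(void) {
  assert(bits_to_float(0x00000000u) == 0.0f);   // +0
  assert(signbit(bits_to_float(0x80000000u)));  // -0 has the sign bit set
  assert(bits_to_float(0x80000000u) == 0.0f);   // yet compares equal to +0
  assert(isinf(bits_to_float(0x7F800000u)));    // +infinity
  assert(bits_to_float(0x7F800000u) > 0.0f);
  assert(isinf(bits_to_float(0xFF800000u)));    // -infinity
  assert(bits_to_float(0xFF800000u) < 0.0f);
  assert(isnan(bits_to_float(0x7F820000u)));    // NaN
  assert(isnan(bits_to_float(0xFF9112AAu)));    // NaN
}
```

Note that both NaN patterns share an all-ones exponent and a non-zero mantissa, whereas the infinities have an all-ones exponent and a zero mantissa.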

Example 1.47. Imagine we want to represent $x = 123.3125_{(10)}$ in the single-precision, 32-bit IEEE floating-point format. First we write the number in binary

$$
x = 1111011.0101_{(2)}
$$

before normalising it, meaning we shift it so that there is only one digit to the left of the binary point, to get

$$
x = 1.1110110101_{(2)} \cdot 2^6.
$$

Recalling that we do not store the implicit hidden digit (i.e., the digit to the left of the binary point), our mantissa, exponent and sign become

    m = 11101101010000000000000(2)
    e = 00000110(2)
    s = 0(2)

noting we pad m with less-significant zeros and e with more-significant zeros to ensure each is of the correct length. Finally, we can convert each component into a literal using their associated representations, i.e.,

    m = 11101101010000000000000
    e = 10000101
    s = 0

noting that we bias e (i.e., add 127 to e = 6) to get the result, and concatenate the components into the single literal

    x = 01000010111101101010000000000000.
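The hand conversion in Example 1.47 can be cross-checked by asking the machine for the bit pattern of 123.3125. This sketch assumes float is the 32-bit IEEE-754 single-precision format; the helper names are hypothetical, and 0x42F6A000 is the literal above written in hexadecimal.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Copy the bytes of a float into a 32-bit unsigned integer.
uint32_t float_to_bits(float f) {
  uint32_t bits;
  assert(sizeof f == sizeof bits);
  memcpy(&bits, &f, sizeof bits);
  return bits;
}

// Extract the three components using shifts and masks.
uint32_t sign_field(uint32_t bits)     { return bits >> 31; }
uint32_t exponent_field(uint32_t bits) { return (bits >> 23) & 0xFFu; }
uint32_t mantissa_field(uint32_t bits) { return bits & 0x7FFFFFu; }

void check_example_1_47(void) {
  uint32_t bits = float_to_bits(123.3125f);
  assert(bits == 0x42F6A000u);                // the concatenated literal
  assert(sign_field(bits) == 0u);             // s = 0
  assert(exponent_field(bits) == 133u);       // e = 6 + 127 = 10000101(2)
  assert(mantissa_field(bits) == 0x76A000u);  // 11101101010000000000000(2)
}
```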




Definition 1.62. Consider a case where the result of some arithmetic operation (or conversion) requires more digits of precision than are available. That is, it cannot be represented exactly within the n digits of mantissa available. To combat this problem, we can use the concept of rounding. For example, you probably already know that if we only have two digits of precision available then

• 1.24(10) is rounded to 1.2(10) because the last digit is less than five, while

• 1.27(10) is rounded to 1.3(10) because the last digit is greater than or equal to five.

Such a rounding mode is essentially a rule that takes the ideal result, i.e., the result if one could use infinite precision, to the most suitable representable result.

The IEEE-754 specification mandates the availability of four rounding modes. In each case, the idea is to imagine the ideal result x is written using an l > n digit mantissa m, i.e.,

$$
x = (-1)^s \cdot ( \underbrace{m_{l-1} \circ m_{l-2} \ldots m_1 m_0}_{l \text{ digits}} ) \cdot b^e.
$$

To round x, we copy the most-significant n digits of m to get

$$
x' = (-1)^s \cdot ( \underbrace{m'_{n-1} \circ m'_{n-2} \ldots m'_1 m'_0}_{n \text{ digits}} ) \cdot b^e
$$

where $m'_i = m_{i+l-n}$, then “patch” $m'_0 = m_{l-n}$ according to rules given by the rounding mode. Within the following, we offer some decimal examples for clarity (minor alterations apply in binary), rounding for n = 2 digits of precision in each example. Note that the C standard library offers access to these features, using constant values FE_TONEAREST, FE_UPWARD, FE_DOWNWARD, and FE_TOWARDZERO respectively to refer to the rounding modes themselves. For example, the rint function rounds a floating-point value using the currently selected IEEE-754 rounding mode; this can be inspected and set using the fegetround and fesetround functions.

Definition 1.63. Sometimes termed Banker’s Rounding, the round to nearest mode alters basic rounding to provide more specific treatment when the ideal result is exactly half way between representable results, i.e., when $m_{l-n-1} = 5$ and every digit after it is zero. It can be described via the following rules:

• If $m_{l-n-1} \le 4$, then do not alter $m'_0$.

• If $m_{l-n-1} \ge 6$, then alter $m'_0$ by adding one.

• If $m_{l-n-1} = 5$ and at least one of the trailing digits from $m_{l-n-2}$ onward is non-zero, then alter $m'_0$ by adding one.

• If $m_{l-n-1} = 5$ and all of the trailing digits from $m_{l-n-2}$ onward are zero, then alter $m'_0$ to the nearest even digit. That is:

– if $m'_0 \equiv 0 \pmod 2$ then do not alter it, but

– if $m'_0 \equiv 1 \pmod 2$ then alter it by adding one.

Example 1.48. Using the round to nearest mode, we find that

• 1.24(10) rounds to 1.2(10),

• 1.27(10) rounds to 1.3(10),

• 1.251(10) rounds to 1.3(10),

• 1.250(10) rounds to 1.2(10), and

• 1.350(10) rounds to 1.4(10).
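The rules for round to nearest can be sketched using integer arithmetic alone. This is an illustrative model rather than how floating-point hardware works: the hypothetical function round_half_even treats the ideal result as the non-negative fraction num / den and returns the nearest integer, with ties going to the even neighbour. For instance, 1.250 is modelled as 1250 / 100, i.e., as a count of tenths.

```c
#include <assert.h>
#include <stdint.h>

// Round the non-negative value num / den to the nearest integer; when the
// value is exactly half way between two integers, pick the even one.
uint64_t round_half_even(uint64_t num, uint64_t den) {
  assert(den > 0);
  uint64_t quot = num / den;  // the digits we keep
  uint64_t rem  = num % den;  // the digits we discard
  // Compare the discarded part with one half without using fractions:
  // rem > den - rem means "more than half", equality means "exactly half".
  if (rem > den - rem) {
    quot += 1;
  } else if (rem == den - rem && (quot & 1u) == 1u) {
    quot += 1;  // tie and odd: move to the even neighbour
  }
  return quot;
}

// Replay Example 1.48, with results expressed in tenths.
void check_example_1_48(void) {
  assert(round_half_even(124, 10) == 12);    // 1.24  -> 1.2
  assert(round_half_even(127, 10) == 13);    // 1.27  -> 1.3
  assert(round_half_even(1251, 100) == 13);  // 1.251 -> 1.3
  assert(round_half_even(1250, 100) == 12);  // 1.250 -> 1.2
  assert(round_half_even(1350, 100) == 14);  // 1.350 -> 1.4
}
```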

Definition 1.64. Sometimes termed ceiling, the round toward +∞ mode can be described via the following rules:

• If x is positive (i.e., s = 0) and any of the trailing digits from $m_{l-n-1}$ onward is non-zero, then alter $m'_0$ by adding one.

• If x is negative (i.e., s = 1), the trailing digits from $m_{l-n-1}$ onward are discarded.

Example 1.49. Under the round toward +∞ mode, we find that




• 1.24(10) rounds to 1.3(10),

• 1.27(10) rounds to 1.3(10),

• 1.20(10) rounds to 1.2(10),

• −1.24(10) rounds to −1.2(10),

• −1.27(10) rounds to −1.2(10), and

• −1.20(10) rounds to −1.2(10).

Definition 1.65. Sometimes termed floor, the round toward −∞ mode can be described via the following rules:

• If x is positive (i.e., s = 0), the trailing digits from $m_{l-n-1}$ onward are discarded.

• If x is negative (i.e., s = 1) and any of the trailing digits from $m_{l-n-1}$ onward is non-zero, then alter $m'_0$ by adding one.

Example 1.50. Under the round toward −∞ mode, we find that

• −1.24(10) rounds to −1.3(10),

• −1.27(10) rounds to −1.3(10),

• −1.20(10) rounds to −1.2(10),

• 1.24(10) rounds to 1.2(10),

• 1.27(10) rounds to 1.2(10), and

• 1.20(10) rounds to 1.2(10).

Definition 1.66. The round toward zero mode operates as round toward −∞ for positive numbers and as round toward +∞ for negative numbers.

Example 1.51. Under the round toward zero mode, we find that

• 1.27(10) rounds to 1.2(10),

• 1.24(10) rounds to 1.2(10),

• 1.20(10) rounds to 1.2(10),

• −1.27(10) rounds to −1.2(10),

• −1.24(10) rounds to −1.2(10), and

• −1.20(10) rounds to −1.2(10).
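The three directed modes can be modelled in the same sign-and-magnitude style used by the definitions. In this sketch the type round_mode_t and the function round_directed are hypothetical names: the ideal result is $(-1)^s \cdot num / den$, and the function returns the magnitude of the rounded result (the sign s is unchanged). It is a model of the rules, not a replacement for fesetround.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { TOWARD_POS_INF, TOWARD_NEG_INF, TOWARD_ZERO } round_mode_t;

// Round (-1)^s * num / den to an integer and return its magnitude.
uint64_t round_directed(unsigned s, uint64_t num, uint64_t den,
                        round_mode_t mode) {
  assert(den > 0 && s <= 1);
  uint64_t quot = num / den;        // keep the leading digits
  int inexact = (num % den) != 0;   // is any discarded digit non-zero?
  int grow = 0;                     // should the magnitude increase?
  switch (mode) {
    case TOWARD_POS_INF: grow = (s == 0); break;  // only positive values grow
    case TOWARD_NEG_INF: grow = (s == 1); break;  // only negative values grow
    case TOWARD_ZERO:    grow = 0;        break;  // always discard
  }
  return (inexact && grow) ? quot + 1 : quot;
}

// Replay selected cases from Examples 1.49 to 1.51, in tenths.
void check_directed_examples(void) {
  assert(round_directed(0, 124, 10, TOWARD_POS_INF) == 13);  //  1.24 ->  1.3
  assert(round_directed(1, 127, 10, TOWARD_POS_INF) == 12);  // -1.27 -> -1.2
  assert(round_directed(1, 124, 10, TOWARD_NEG_INF) == 13);  // -1.24 -> -1.3
  assert(round_directed(0, 127, 10, TOWARD_NEG_INF) == 12);  //  1.27 ->  1.2
  assert(round_directed(0, 127, 10, TOWARD_ZERO) == 12);     //  1.27 ->  1.2
  assert(round_directed(1, 127, 10, TOWARD_ZERO) == 12);     // -1.27 -> -1.2
}
```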

Example 1.52. The (slightly cryptic) C program in Figure 1.6 offers a practical demonstration that floating-point works as expected. The idea is to “overlap” a single-precision, 32-bit floating-point value called x with an instance of the ieee32_t structure called y; main creates an instance of this union, calling it t. Since we can access individual fields within t.y (e.g., the sign bit t.y.s, or the mantissa t.y.m), we can observe the effect altering them has on the value of t.x. Compiling and executing the program gives the following output:

    +2.800000 0 80 333333
    -2.800000 1 80 333333
    -5.600000 1 81 333333
         +nan 0 FF 400000
         +inf 0 FF 000000

The question is, what on earth does this mean? We can answer this by looking at each part of the program (each concluding with a call to printf that produces the lines of output):

• t.x is set to 2.8(10), and then t.x and each component of t.y is printed. The output shows that

    t.y.s = 0(16)      ↦ 0
    t.y.e = 80(16)     ↦ 10000000
    t.y.m = 333333(16) ↦ 01100110011001100110011

Accounting for the bias and including the hidden bit, this of course represents the value

$$
(-1)^0 \cdot 1.01100110011001100110011_{(2)} \cdot 2^1
$$

or (approximately) 2.8(10) as expected.




    #include <stdint.h> // uint32_t and uint64_t, used in Figure 1.5
    #include <stdio.h>  // printf

    typedef union __view32_t {
      float    x;
      ieee32_t y;
    } view32_t;

    typedef union __view64_t {
      double   x;
      ieee64_t y;
    } view64_t;

(a) Two unions which “overlap” the representations of an actual floating-point field x with an instance y of the structure(s) defined in Figure 1.5.

    int main( int argc, char* argv[] ) {
      view32_t t;

      // start from 2.8 and print the value along with each field
      t.x = 2.8;
      printf( "%+9f %01X %02X %06X\n", t.x, t.y.s, t.y.e, t.y.m );

      // set the sign bit
      t.y.s = 0x01;
      printf( "%+9f %01X %02X %06X\n", t.x, t.y.s, t.y.e, t.y.m );

      // increment the (biased) exponent
      t.y.e = 0x81;
      printf( "%+9f %01X %02X %06X\n", t.x, t.y.s, t.y.e, t.y.m );

      // reserved value: NaN
      t.y.s = 0x00;
      t.y.e = 0xFF;
      t.y.m = 0x400000;
      printf( "%+9f %01X %02X %06X\n", t.x, t.y.s, t.y.e, t.y.m );

      // reserved value: +infinity
      t.y.s = 0x00;
      t.y.e = 0xFF;
      t.y.m = 0x000000;
      printf( "%+9f %01X %02X %06X\n", t.x, t.y.s, t.y.e, t.y.m );

      return 0;
    }

(b) A driver function main that uses an instance t of view32_t to demonstrate how manipulating fields in t.y impacts on the value of t.x.

Figure 1.6: A short C program that performs direct manipulation of IEEE floating-point numbers.

• t.y.s is set to 01(16) = 1(10), and then t.x and each component of t.y is printed. We expect that setting the sign bit to 1 rather than 0 will change t.x from being positive to negative; this is confirmed by the output, which shows t.x is equal to −2.8(10) as expected.

• t.y.e is set to 81(16) = 129(10), and then t.x and each component of t.y is printed. We expect that setting the exponent to 129 rather than 128 will double t.x (the unbiased value of the exponent is now 129 − 127 = 2 st. the mantissa is scaled by $2^2 = 4$ rather than $2^1 = 2$); this is confirmed by the output, which shows t.x is equal to −5.6(10) as expected.

• t.y.s, t.y.e and t.y.m are set, in turn, to reserved values corresponding to NaN and +∞; the final two lines of output confirm this.
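The union in Figure 1.6 relies on how the compiler lays out bit-fields, which the C standard leaves implementation-defined. As a hedged alternative, the same experiment can be sketched with explicit shifts and masks; fields_to_float is a hypothetical helper, and float is assumed to be the 32-bit IEEE-754 single-precision format.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

// Assemble a float from a 1-bit sign, an 8-bit biased exponent and a
// 23-bit mantissa, placing each field at its position in the 32-bit word.
float fields_to_float(uint32_t s, uint32_t e, uint32_t m) {
  uint32_t bits = ((s & 0x1u) << 31) | ((e & 0xFFu) << 23) | (m & 0x7FFFFFu);
  float f;
  assert(sizeof f == sizeof bits);
  memcpy(&f, &bits, sizeof f);  // copy the bytes without converting the value
  return f;
}

// Replay the five steps of Example 1.52.
void check_example_1_52(void) {
  assert(fields_to_float(0, 0x80, 0x333333) == 2.8f);
  assert(fields_to_float(1, 0x80, 0x333333) == -2.8f);
  assert(fields_to_float(1, 0x81, 0x333333) == -5.6f);
  assert(isnan(fields_to_float(0, 0xFF, 0x400000)));
  assert(isinf(fields_to_float(0, 0xFF, 0x000000)));
}
```

The design choice here is to trade the convenience of named fields for behaviour that does not depend on the compiler's bit-field ordering.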

1.8.5 Representing characters

So far we have examined techniques to represent numbers, but clearly we might want to work with other types of data; a computer can process all manner of data such as emails, images, music and so on. The approach used to represent characters (or letters) is a good example: basically we just need a way to translate from what we want into a numerical representation (which we already know how to deal with) and back again. More specifically, we need two functions: Ord(x) which takes a character x and gives us back the corresponding numerical representation, and Chr(y) which takes a numerical representation y and gives back the corresponding character. But how should the functions work? Fortunately, people have thought about this problem for us and provided standards we can use. One of the oldest and simplest is the American Standard Code for Information Interchange (ASCII), pronounced “ass key”.

ASCII has a rich history, but was developed to permit communication between early teleprinter devices. These were like a combination of a typewriter and a telephone, and were able to communicate text to each other before innovations such as the fax machine. Later, but long before monitors and graphics cards existed, similar devices allowed users to send input to early computers and receive output from them. Figure 1.8 shows the 128-entry ASCII table which tells us how characters are represented as numbers. Of the entries, 95 are printable characters we can instantly recognise (including SPC which is short for “space”). There are also 33 others which represent non-printable control characters: originally, these would have been used to control the teleprinter rather than to have it print something. For example, the CR and LF characters (short for “carriage return” and “line feed”) would combine to move the print head onto the next line; we still use these characters to mark the end of lines in text files. Other control characters also play a role in modern computers. For example, the BEL (short for “bell”) character plays a “ding” sound when printed to most UNIX terminals, we have keyboards with keys that relate to DEL and ESC (short for “delete” and “escape”) and so on.

Figure 1.7: A teletype machine being used by UK-based Royal Air Force (RAF) operators during WW2 (public domain image, source: http://en.wikipedia.org/wiki/File:WACsOperateTeletype.jpg).

    y = Ord(x)  Chr(y) = x    y = Ord(x)  Chr(y) = x    y = Ord(x)  Chr(y) = x    y = Ord(x)  Chr(y) = x
      0         NUL             1         SOH             2         STX             3         ETX
      4         EOT             5         ENQ             6         ACK             7         BEL
      8         BS              9         HT             10         LF             11         VT
     12         FF             13         CR             14         SO             15         SI
     16         DLE            17         DC1            18         DC2            19         DC3
     20         DC4            21         NAK            22         SYN            23         ETB
     24         CAN            25         EM             26         SUB            27         ESC
     28         FS             29         GS             30         RS             31         US
     32         SPC            33         !              34         "              35         #
     36         $              37         %              38         &              39         '
     40         (              41         )              42         *              43         +
     44         ,              45         -              46         .              47         /
     48         0              49         1              50         2              51         3
     52         4              53         5              54         6              55         7
     56         8              57         9              58         :              59         ;
     60         <              61         =              62         >              63         ?
     64         @              65         A              66         B              67         C
     68         D              69         E              70         F              71         G
     72         H              73         I              74         J              75         K
     76         L              77         M              78         N              79         O
     80         P              81         Q              82         R              83         S
     84         T              85         U              86         V              87         W
     88         X              89         Y              90         Z              91         [
     92         \              93         ]              94         ^              95         _
     96         `              97         a              98         b              99         c
    100         d             101         e             102         f             103         g
    104         h             105         i             106         j             107         k
    108         l             109         m             110         n             111         o
    112         p             113         q             114         r             115         s
    116         t             117         u             118         v             119         w
    120         x             121         y             122         z             123         {
    124         |             125         }             126         ~             127         DEL

Figure 1.8: A table describing the printable ASCII character set.

Since there are 128 entries in the table, ASCII characters can be and are represented by 8-bit bytes. However, notice that $2^7 = 128$ and $2^8 = 256$ so in fact we could represent 256 characters: essentially one of the bits is not used by the ASCII encoding. Specific computer systems sometimes use the unused bit to permit use of an “extended” ASCII table with 256 entries; the extra characters in this table can be used for special purposes. For example, foreign language characters are often defined in this range (e.g., é or ø), and “block” characters are included for use by artists who form text-based pictures. However, the original use of the unused bit was as an error detection mechanism.

Given the table, we can see for example that Chr(104) = ‘h’, i.e., if we see the number 104 then this represents the character ‘h’. Conversely we have that Ord(‘h’) = 104. Although in a sense any consistent translation between characters and numbers like this would do, ASCII has some useful properties. Look specifically at the alphabetic characters:

• Imagine we want to test if one character x is alphabetically before some other y. The way the ASCII translation is specified, we can simply compare their numeric representation. If we find Ord(x) < Ord(y) then the character x is before the character y in the alphabet. For example ‘a’ is before ‘c’ because

Ord(‘a’) = 97 < 99 = Ord(‘c’).

• Imagine we want to convert a character x from lower-case into upper-case. The lower-case characters are represented numerically as the contiguous range 97 . . . 122; the upper-case characters as the contiguous range 65 . . . 90. So we can convert from lower-case into upper-case simply by subtracting 32. For example

Chr(Ord(‘a’) − 32) = ‘A’.
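Both properties can be sketched in C, where a char already holds the numeric representation, so Ord and Chr amount to casts. The function names are hypothetical, and the code assumes the execution character set is ASCII (or a superset of it).

```c
#include <assert.h>

// Ord and Chr are just conversions between char and int.
int  ord(char x) { return (int)(unsigned char)x; }
char chr(int y)  { return (char)y; }

// x is alphabetically before y when its numeric representation is smaller
// (meaningful when both are letters of the same case).
int ascii_before(char x, char y) {
  return ord(x) < ord(y);
}

// Convert lower-case to upper-case by subtracting 32; any character outside
// the range 97 ... 122 is returned unchanged.
char ascii_to_upper(char x) {
  if (ord(x) >= 97 && ord(x) <= 122) {
    return chr(ord(x) - 32);
  }
  return x;
}

void check_ascii_examples(void) {
  assert(chr(104) == 'h' && ord('h') == 104);
  assert(ascii_before('a', 'c'));
  assert(ascii_to_upper('a') == 'A');
}
```

In practice the standard library function toupper from ctype.h does the same job; the range check above is what keeps digits and punctuation from being altered.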

1.9 A conclusion: steps toward a digital logic

If we were to summarise all the pieces of theory accumulated above, the list would be roughly as follows:

1. We know that we can define Boolean algebra, which gives us a) a set of values (i.e., $\mathbb{B} = \{0, 1\}$), b) a set of (unary and binary) operators (i.e., NOT, AND, and OR), and c) a set of axioms. This means we can construct Boolean expressions and manipulate them while preserving their meaning.

2. We can describe Boolean functions of the form

$$
\mathbb{B}^n \to \mathbb{B}
$$

and hence also construct more general functions of the form

$$
\mathbb{B}^n \to \mathbb{B}^m
$$

using m separate functions whose outputs are concatenated together. It therefore makes sense that NOT, AND, and OR are well-defined for $\mathbb{B}^n$ as well as $\mathbb{B}$: we overload AND, for example, and write r = x ∧ y as a short-hand for $r_i = x_i \wedge y_i$ where 0 ≤ i < n.

3. We can represent various objects, such as numbers, using sequences of bits. Since we can describe Boolean functions of the form

$$
\mathbb{B}^n \to \mathbb{B}^m,
$$

we can construct such functions that perform arithmetic with numbers we are representing. Imagine, for example, we have two integers x and y represented by the n-bit sequences $\hat{x}$ and $\hat{y}$. To compute the (integer) sum of x and y, all we need is a Boolean function

$$
f : \mathbb{B}^n \times \mathbb{B}^n \to \mathbb{B}^n
$$

st.

$$
\hat{r} = f(\hat{x}, \hat{y}) \mapsto x + y,
$$

i.e., whose output $\hat{r}$ represents r = x + y.
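To make the third point concrete, the following C sketch builds such a function f for n = 8 using only NOT, AND and OR on individual bits. It is one possible construction (a bit-by-bit addition that propagates a carry), not necessarily the one developed later in the text; all function names are hypothetical, and the result is the sum modulo $2^8$ because the output has only 8 bits.

```c
#include <assert.h>
#include <stdint.h>

// One-bit operators: the only primitives the adder is allowed to use.
static unsigned bit_not(unsigned x)             { return (~x) & 1u; }
static unsigned bit_and(unsigned x, unsigned y) { return x & y; }
static unsigned bit_or(unsigned x, unsigned y)  { return x | y; }

// XOR expressed via the primitives: (x OR y) AND NOT (x AND y).
static unsigned bit_xor(unsigned x, unsigned y) {
  return bit_and(bit_or(x, y), bit_not(bit_and(x, y)));
}

// f : B^8 x B^8 -> B^8, computing r = x + y one bit at a time.
uint8_t add8(uint8_t x, uint8_t y) {
  uint8_t r = 0;
  unsigned carry = 0;
  for (unsigned i = 0; i < 8; i++) {
    unsigned xi = (x >> i) & 1u;  // i-th bit of x
    unsigned yi = (y >> i) & 1u;  // i-th bit of y
    unsigned t  = bit_xor(xi, yi);
    unsigned ri = bit_xor(t, carry);                      // i-th sum bit
    carry = bit_or(bit_and(xi, yi), bit_and(t, carry));   // carry into bit i+1
    r = (uint8_t)(r | (ri << i));
  }
  return r;
}
```

For example, add8(1, 2) yields 3, while add8(200, 100) yields 44 because 300 does not fit in 8 bits.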

What we end up with is the ability to perform meaningful computation; fairly simple computation, granted, but computation nonetheless. Fundamentally, this is what computers are: they are just devices that perform computation. So if you follow through all the theory, we have developed a “blueprint” for how to build a real computer. That is, we have a link (however tenuous) between a theoretical model of computation based on Mathematics and the first steps toward a practical realisation of that model.




References

[1] G. Boole. An investigation of the laws of thought. Walton & Maberly, 1854 (see p. 27).

[2] D. Cohen. “On Holy Wars and a Plea for Peace”. In: IEEE Computer 14.10 (1981), pp. 48–54 (see p. 37).

[3] D. Goldberg. “What Every Computer Scientist Should Know About Floating-Point Arithmetic”. In: ACM Computing Surveys 23.1 (1991), pp. 5–48.

[4] C. Petzold. Code: Hidden Language of Computer Hardware and Software. Microsoft Press, 2000.

[5] E.L. Post. “Introduction to a General Theory of Elementary Propositions”. In: American Journal of Mathematics 43.3 (1921), pp. 163–185 (see p. 29).

[6] C.E. Shannon. “A mathematical theory of communication”. In: Bell System Technical Journal 27.3 (1948), pp. 379–423 (see p. 35).

[7] C.E. Shannon. “A Symbolic Analysis of Relay and Switching Circuits”. In: Transactions of the American Institute of Electrical Engineers (AIEE) 57.12 (1938), pp. 713–723 (see p. 27).

[8] H.M. Sheffer. “A set of five independent postulates for Boolean algebras, with applications to logical constants”. In: Transactions of the American Mathematical Society 14.4 (1913), pp. 481–488 (see p. 29).

[9] C. Wressnegger et al. “Twice the Bits, Twice the Trouble: Vulnerabilities Induced by Migrating to 64-Bit Platforms”. In: Computer and Communications Security (CCS). 2016, pp. 541–552 (see p. 47).




CHAPTER 2

BASICS OF DIGITAL LOGIC

Scientists build to learn; Engineers learn to build.

– Brooks

In the previous Chapter, we made some statements regarding various features of digital logic without backing them up with any evidence or explanation. Adopting a “from atoms upwards” approach in order to support material in subsequent Chapters, this Chapter has two central aims that, in combination, describe digital logic. First it expands on previous statements, such as the above, demonstrating how they can be satisfied using introductory Physics. Note that a detailed, in-depth treatment of such material could fill another book, and, arguably, is not strictly required given the remit of this book; the focus is therefore at a high level, offering an overview of only pertinent details at the right level of abstraction. For example, to connect theory such as Boolean algebra to practice, it is important to understand how we can design and manufacture implementations of Boolean operators that can physically provide the same functionality. Then, second, it explains why doing so is useful and important: the bulk of the Chapter demonstrates, step-by-step, how successively higher-level components, capable of successively more complex and so useful computation, can be designed and implemented.

2.1 Switches and transistors

Even complex use of digital logic is, at the lowest level of detail, based on remarkably simple building blocks: fundamentally, all we really need is a way to manufacture a switch.

In the subsequent Sections we focus exclusively on transistors, whose design and behaviour depend on sub-atomic properties of the materials they are created from. There is a good reason for this focus: transistors are (currently) the dominant way to realise digital logic, and can be found in most if not all devices we routinely use. However, it is crucial to remember that transistors are not the only option. Put another way, provided correct switch-like behaviour is possible, we might legitimately select another implementation technology. Since new materials and manufacturing processes, applications and quality metrics will all appear due to advances in technology, understanding the underlying principles is as important as any specific example (such as the transistor), because it, like anything, could be superseded over time.

2.1.1 A brief tour of fundamental principles

2.1.1.1 Atoms and sub-atomic particles

Everything around us is formed from building blocks called atoms; in turn, each atom is formed from sub-atomic particles including a) a group of nucleons, either protons or neutrons, in a central core or nucleus, and b) a cloud of electrons orbiting said nucleus. The number of such sub-atomic particles yields information about the associated atom. More specifically, the number of protons dictates the atomic number (or family: this is what we mean by the term element) whereas the number of neutrons dictates the isotope (or instance, within that family). Likewise, the electrons can orbit the nucleus in one of several levels (or shells) in what is termed the electron configuration.




An aside: describing basic physics using the hydraulic analogy.

For some, the electrical properties of atoms and sub-atomic particles may be an unfamiliar topic. As a result, it is common, and potentially quite useful, to align them with more familiar concepts via the so-called hydraulic analogy.

Imagine a water tower (resp. battery), connected via pipes (resp. wire) which eventually powers a water wheel (resp. lamp):

• the water pressure (resp. electrical potential) is dictated by how much water (resp. electrical charge) is held in the water tower,

• water flows along the pipes; a wider pipe (resp. a wire with lower resistivity) allows water to flow more easily, and hence quicker, than a narrower pipe (resp. wire with higher resistivity),

• when the water reaches the water wheel, it causes it to turn as a result of two properties: the pressure (resp. voltage), and the flow rate (resp. current) of the water.

Figure 2.1: The sub-atomic structure of a lithium atom.

Example 2.1. By consulting a suitable periodic table, consider that

• silicon has atomic number fourteen; it has three shells containing two, eight and four electrons respectively, whereas

• lithium has atomic number three; it has two shells containing two and one electrons respectively.

Definition 2.1. Each type of sub-atomic particle carries a specific electrical charge: electrons carry a negative charge, protons carry a positive charge, and neutrons carry no (or a neutral) charge; the unit of measurement is the coulomb (after Charles-Augustin de Coulomb). This suggests any atom with an imbalance of electrons and protons will have a non-neutral charge overall; we term such cases an ion, st. negatively (resp. positively) charged ions will have more (resp. fewer) electrons than protons.

2.1.1.2 Electrical charge, current, and voltage

The sub-atomic particles in an atom are bound together by forces that make sure they remain a cohesive whole. More specifically, nucleons are bound together by a “strong” nuclear force, whereas electrons are bound to the nucleus by a “weak” electromagnetic attraction to the protons; electrons in more inner shells are bound more tightly. That said, the force binding electrons can be overcome in a process called ionisation: an atom can be turned into an ion by displacing electrons, thereby producing imbalance between the number of electrons and protons, using some energy. The exact amount of energy required relates to how tightly the electrons are bound to the nucleus, and so by the type of atom. Electrons also exhibit a property whereby they repel each other, but are attracted by holes (or “gaps”) in a given electron cloud; this implies they can move.

Figure 2.2: A simple circuit conditionally connecting a capacitor (or battery) to a lamp depending on the state of a switch.

Figure 2.3: Some simple examples of Boolean-style control of a lamp by combinations of switches. Panels: (a) a battery-and-lamp AND-style computation, (b) a battery-and-lamp OR-style computation, (c) a battery-and-lamp XOR-style computation.

Definition 2.2. Electrical current refers to a (net) flow of electric charge; the unit of measurement is the ampere (or amp, after André-Marie Ampère).

Definition 2.3. Electrical potential difference (or, more often, voltage) refers to the difference in electrical potential energy between two points per unit electric charge; the unit of measurement is the volt (after Alessandro Volta). Informally, you can think of voltage as the electrical work (or the effort) needed to move (or drive) the electrons and hence cause a flow of current.

Definition 2.4. Electrical power refers to the rate of electrical work, i.e., the amount of charge driven, per unit of time, by a given voltage; the unit of measurement is the watt (after James Watt). We say electrical power is dissipated (or “consumed”) when electrical potential energy associated with some charge is converted into another form (e.g., heat or light) by a component (or load).

An electron can move between atoms, doing so from a point of more negative charge toward a point of more positive charge, i.e., from lower to higher voltage, or driven by a potential difference. This movement or flow of valence electrons from one point to another suggests a (net) flow of charge and hence a current between the two points.

This is formally termed electron current, in part to distinguish it from conventional current: when we use the term current, we almost universally mean the latter. Although electron current describes the flow of negative charge, conventional current means we actually focus on what would be the flow of positive charge if that were possible (i.e., the opposite of electron current). Put another way, some electron moving from a more negative point X to a more positive point Y will make Y more negative and X more positive; the electron current is from X to Y, whereas the conventional current is from Y to X. This is why you might traditionally think of charge moving from a terminal labelled +ve to that labelled −ve on a battery. Set in the context of what we now know to be true, this is confusing¹. However, it also has a clear historical lineage we are now stuck with: Benjamin Franklin adopted this convention in the mid 1700s, also labelling charge using the positive or negative terminology, during his pioneering study of electricity. Either way, from here on, you should read current as a synonym for conventional current.

2.1.1.3 Conductors, insulators and semi-conductors

Definition 2.5. Two materials with different sub-atomic composition may exhibit different properties wrt. their conductivity and resistivity; these terms (which are antithetical) state whether a material allows or prevents the movement of electrons.

Definition 2.6. A conductor, e.g., a metal, has high-conductivity (resp. low-resistivity) and allows electrons to move easily, whereas an insulator, e.g., a vacuum, has low-conductivity (resp. high-resistivity) and does not allow electrons to move easily.

When we describe a material as conductive or resistive, we typically mean it is on a spectrum between the two: although unlikely to represent a perfect conductor or insulator, we mean it is closer to one end of the spectrum or other (e.g., is more conductive or more resistive). Although such properties are inherent in the material, it is possible to explicitly manipulate the sub-atomic composition of semi-conductor materials using a process called doping. For example, imagine we need a material for some task; any non-ideal material will have non-ideal properties wrt. the task. The idea is instead to take some non-ideal material as a starting point, then dope (or combine) it with a dopant material: their combination should be similar to the starting point, but more ideal wrt. the properties required.

Example 2.2. Consider pure silicon, which has an electron cloud of four electrons (only about half full); it is more or less an insulator. Doping with a boron or aluminium acceptor creates extra holes, while doping with a phosphorus or arsenic donor creates extra electrons.

An important use-case for doping is the production of semi-conductor materials. Although various materials might exhibit the properties of a semi-conductor, doping allows careful control over the ratio of electrons vs. holes and hence conductivity (resp. resistivity) of the result. Rather than rely on the perfect material being naturally available, we therefore produce a material with exactly the properties required for a given task.

Definition 2.7. A doped semi-conductor material falls into one of two classes, namely

1. an N-type semi-conductor has an abundance of electrons produced by doping with a donor material, or

2. a P-type semi-conductor has an abundance of holes produced by doping with an acceptor material.

2.1.1.4 Using switches for computation

Example 2.3. Consider Figure 2.2, which includes a capacitor (top), a lamp (bottom-right), and a push-button switch (bottom-left). The capacitor is constructed using two conductive plates separated by an insulator (called a dielectric); it stores electrical energy (meaning it is similar to a battery²), and, since this one has already been charged, one of the plates has many electrons and the other many holes. Electrons cannot move through the insulator, so the only way for them to move from the negatively charged onto the positively charged plate (i.e., from low to high potential) is via the (conductive) wire. This is only possible when the switch is closed, at which point electrons flow through the lamp and cause it to light up.

Following from this example, it may be worth convincing ourselves that a switch is useful for something beyond controlling a lamp as above. To provide an answer, we just need to generalise the example: we a) use multiple switches, and b) treat each switch as an input and the lamp as an output. Put another way, imagine we have two switches labelled x and y; we are interested in how their combination controls the lamp labelled r, so, in effect, what the function f described by r = f(x, y) is.

Example 2.4. Consider Figure 2.3:

1. Figure 2.3a controls the lamp via two switches, and models an AND operator: r = f(x, y) = x ∧ y. Only when both of the switches are closed will the lamp be on: if either is open, there is no connection with the battery.

2. Figure 2.3b controls the lamp via two switches, and models an OR operator: r = f(x, y) = x ∨ y. When one or the other, or both, of the switches are closed, the lamp will be on: there is a connection with the battery unless both of the switches are open.

3. Figure 2.3c controls the lamp via two switches, and models an XOR operator: r = f(x, y) = x ⊕ y. This time, the switches sort of operate in the opposite way to each other; to make a connection between the lamp and battery along the top (resp. bottom) wire, the left-hand switch needs to be closed (resp. open) while the right-hand switch needs to be open (resp. closed): there is a connection with the battery if one or the other, but not both, of the switches are closed. You often find this sort of arrangement in homes where a single light on some stairs is controlled by switches located at the top and bottom.

¹ See, e.g., http://xkcd.com/567/.

² Although the analogy is reasonable, keep in mind that a battery differs from a capacitor: behaviour of the former is due to a chemical process, which converts chemical energy to electrical energy and thus delivers a flow of electrons (i.e., a current).

On one hand, the examples above should be encouraging: they show we can mirror the behaviour of Boolean operators, using a careful organisation of multiple switches. On the other hand, however, push-button switches are mechanically operated: we want an electrically operated switch, which is actuated (i.e., pressed or released) via an electrical property (e.g., a flow of electrons) rather than by hand. Crucially, this will allow the output of one such operator to be used as the input to another, and therefore the implementation of larger expressions.
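To make the switch arrangements of Example 2.4 concrete, the following is a minimal sketch in Python that treats a closed switch as True and a lit lamp as True. The function names are hypothetical, and the wiring is inferred from the textual description of Figure 2.3 rather than from the figure itself.

```python
from itertools import product

def lamp_and(x: bool, y: bool) -> bool:
    """Figure 2.3a: the lamp is lit only when both switches are closed."""
    return x and y

def lamp_or(x: bool, y: bool) -> bool:
    """Figure 2.3b: the lamp is lit unless both switches are open."""
    return x or y

def lamp_xor(x: bool, y: bool) -> bool:
    """Figure 2.3c: a connection exists along the top wire (x closed, y open)
    or along the bottom wire (x open, y closed)."""
    top_wire = x and not y
    bottom_wire = (not x) and y
    return top_wire or bottom_wire

# Enumerate every combination of switch positions.
for x, y in product([False, True], repeat=2):
    print(x, y, lamp_and(x, y), lamp_or(x, y), lamp_xor(x, y))
```

Enumerating all four switch positions in this way reproduces the truth tables of the AND, OR and XOR operators.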

2.1.2 Implementing transistors

2.1.2.1 Switches in a pre-transistor era: vacuum tubes

From a historical perspective, numerous different electrically operated switch designs have been conceived and used; it is both interesting and useful to examine them in some detail, because their properties act as motivation for modern alternatives. In particular, the vacuum tube (or thermionic valve) is a compelling example because it was used extensively by early generations of computing equipment; it still often plays a role in high-end audio equipment. The idea is to use a glass or ceramic envelope to maintain a vacuum surrounding an electron-producing filament (or cathode) and a metal plate (or anode). When the filament is heated, electrons are emitted into the vacuum and attracted by the plate, resulting in a current between the two. In simple terms, this implements a switch: when the filament is heated the switch is on, when it is cooled the switch is off.

The design outlined above, plus an example in Figure 2.4, hint at some potential disadvantages: vacuum tubes offer the functionality we require, but, in relative terms, are physically large, operate slowly, and are unreliable. The latter two properties both stem from the need to (repeatedly) heat and cool the filament. This takes some time, and stresses the filament up to a point where it fails (much like a light bulb failing when turned on or off). As an aside, the terms bug and debug (allegedly) stem from failure of this sort. In 1947, programmers using the Harvard Mark II computer, developed by Howard Aiken, discovered a moth inside one of the components; the unfortunate insect had shorted the component, which had resulted in the malfunction. Although the terms had been used previously in various other contexts, Grace Hopper and her team are now often cited as introducing them to Computer Science. Certainly this real-life bug, shown in Figure 2.5, is so famous it is still on display in the Smithsonian Museum of American History!

2.1.2.2 Design of MOSFET transistors

A transistor can in fact be used for various tasks; for example, it can act as an amplifier. However, when used as switches they

1. allow charge to flow between two terminals (i.e., act as a conductor) when we turn the switch on, and

2. prevent charge flowing between two terminals (i.e., act as a resistor) when we turn the switch off.

The word transistor is a portmanteau of “transfer resistor”, offering a hint as to the underlying principle: a transistor is a resistor, but one we can control by altering how resistive it is. Put more simply, we can control it st. it is conductive when we want to turn the switch on and resistive when we want to turn it off.

The question is then how such behaviour is realised. Improvement and different trade-offs have given us numerous transistor designs, but we focus on just one: the Field Effect Transistor (FET), initially designed and patented by Julius Lilienfeld in 1925. However, at that point in time the general understanding of sub-atomic behaviour was less than now, meaning use of his design was limited. This changed in 1952, when a team of Engineers at Bell Labs, led by William Shockley, invented what is now termed a junction gate FET (or JFET, due to some legal wranglings wrt. the Lilienfeld patent). In turn, this gave rise to the Metal Oxide Semi-Conductor Field-Effect Transistor (MOSFET), invented in 1960 by Dawon Kahng and Martin Atalla, also at Bell Labs. These designs deliver the properties we require to avoid limiting the complexity of modern digital logic components; in particular, they a) have the switch-like functionality described as useful thus far, while also b) being physically small, operating quickly, being reliable, and being easy to manufacture.

Figure 2.4: A 6P1P (i.e., a 100W to 200W, photo-sensitive type) vacuum tube (public domain image, source: http://en.wikipedia.org/wiki/File:6P1P.jpg).

Figure 2.5: A moth found by operators of the Harvard Mark II; the “bug” was trapped within the computer and caused it to malfunction (public domain image, source: http://en.wikipedia.org/wiki/File:H96566k.jpg).

Figure 2.6: A replica of the first point-contact transistor, a precursor of designs such as the MOSFET, constructed at Bell Labs (public domain image, source: http://en.wikipedia.org/wiki/File:Replica-of-first-transistor.jpg).

[Diagram labels: source, gate, drain, body.]

Figure 2.7: A high-level diagram of a MOSFET transistor, showing the terminal and body materials.

[Diagram labels: one transistor with N-type, N-type and P-type regions; one transistor with P-type, P-type and N-type regions.]

Figure 2.8: A pair of N-MOSFET and P-MOSFET transistors, arranged to form a CMOS cell.

Figure 2.7 offers a high-level description of a MOSFET, in which atomic-scale layers of semi-conductor material are combined with metal or poly-silicon layers for the terminals; although a lower-level, more detailed description would require deeper understanding of related Physics (see, e.g., [8]), we already have enough background to explain the basic concept at this high level. In short, the switch-like behaviour is realised by using the gate terminal to control a conductive channel between the source and drain terminals. Unlike a JFET, where an explicit semi-conductor layer is constructed for use as the channel, in a MOSFET transistor the channel is induced. Specifically, applying a small potential difference to the gate terminal repels holes in the P-type body; doing so forms a depletion layer in which the number of holes is depleted. As the potential difference applied grows, an inversion layer is formed at the surface: the abundance of electrons relative to the number of (repelled) holes inverts the properties of the P-type body, turning it into N-type and so forming a conductive channel between N-type source and drain terminals.

Realising this behaviour in practice depends on the careful selection of semi-conductor materials; Figure 2.9 illustrates the symbols used for two MOSFET variants. These symbols abstract away the implementation detail (retaining only the terminals, with d, s and g denoting the drain, source and gate), which is as follows:

Definition 2.8. An N-MOSFET (or N-type MOSFET, or N-channel MOSFET, or NPN MOSFET) is constructed from N-type semi-conductor terminals and a P-type body:

• applying a potential difference to the gate widens the conductive channel, meaning source and drain are connected (i.e., act like a conductor); the transistor is activated.

• removing the potential difference from the gate narrows the conductive channel, meaning source and drain are disconnected (i.e., act like an insulator); the transistor is deactivated.

Definition 2.9. A P-MOSFET (or P-type MOSFET, or P-channel MOSFET, or PNP MOSFET) is constructed from P-type semi-conductor terminals and an N-type body:

• applying a potential difference to the gate narrows the conductive channel, meaning source and drain are disconnected (i.e., act like an insulator); the transistor is deactivated.

• removing the potential difference from the gate widens the conductive channel, meaning source and drain are connected (i.e., act like a conductor); the transistor is activated.

Put another way, for an N-MOSFET, applying a large potential difference to the gate terminal produces a wider conductive channel, and so allows electrons (i.e., current) to flow between source and drain. Conversely, a small potential difference (or at least smaller than some threshold) means a narrower conductive channel, which prevents said flow. The gate terminal therefore offers functionality much like a switch: controlling the potential difference applied controls conductivity between source and drain, and hence regulates the current.

2.1.2.3 Physical properties of MOSFET transistors

Various physical properties stem from the design of MOSFET transistors; since they are related, we define these step-by-step in what follows.

Definition 2.10. One or more power rails supply voltage levels to each transistor, connecting to the gate or source terminal.

[Symbol (a), with terminals labelled d, s and g.]

(a) An N-MOSFET transistor.

[Symbol (b), with terminals labelled d, s and g.]

(b) A P-MOSFET transistor.

Figure 2.9: Symbolic descriptions of N-MOSFET and P-MOSFET transistors.

Definition 2.11. The threshold voltage of a given MOSFET (i.e., either N- or P-MOSFET) is the minimum voltage level (i.e., potential difference between gate and source) required to activate the transistor and thus connect the source and drain; below the threshold voltage, the source and drain remain disconnected.

Definition 2.12. The concept of sub-threshold leakage (or just leakage) relates to a non-ideal property of the conductive channel: below the threshold voltage the source and drain are not perfectly disconnected, st. a small flow of electrons (i.e., the leakage current) is possible.
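As a rough illustration of Definition 2.11, the sketch below models an N-MOSFET as an ideal switch controlled by the potential difference between gate and source. The function name and the threshold value of 0.7V are hypothetical, chosen only for illustration; the model deliberately ignores the leakage of Definition 2.12, treating the transistor as perfectly disconnected below the threshold.

```python
def nmos_connected(v_gate: float, v_source: float, v_threshold: float) -> bool:
    """Ideal N-MOSFET: source and drain are connected iff the gate-to-source
    potential difference reaches the threshold voltage (leakage is ignored)."""
    return (v_gate - v_source) >= v_threshold

V_TH = 0.7  # hypothetical threshold voltage, in volts

# Sweep a few gate voltages with the source held at 0V.
for v_gate in (0.0, 0.5, 0.7, 3.3):
    state = "connected" if nmos_connected(v_gate, 0.0, V_TH) else "disconnected"
    print(f"gate = {v_gate}V: source and drain {state}")
```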

2.1.2.4 Organisation of MOSFET transistors into CMOS-based logic gates

Rather than use MOSFET transistors in isolation, it is common to organise them into larger combinations; by offering a higher level of abstraction, such combinations are usually easier to reason about from both functional and behavioural perspectives.

Ultimately the aim is to reproduce the approach of Section 2.1.1, where we outlined Boolean-like functionality using mechanical switches, but now by using transistors. A popular³ first step relates to organisation of two transistors (pairing an N-MOSFET with a P-MOSFET) to form one Complementary Metal-Oxide Semi-Conductor (CMOS) component we term a cell. This approach, as illustrated at a high-level by Figure 2.8, was first conceived in 1963 by Frank Wanlass at Fairchild Semi-conductor. The idea is to organise the transistors so they operate in a complementary manner:

Definition 2.13. CMOS-based design strategies typically use two distinct parts to form a given component: there will be

1. a pull-up network of P-MOSFET transistors between the Vdd power rail and the output, and

2. a pull-down network of N-MOSFET transistors between the Vss power rail and the output.

A consequence of this logic style is that only one of the pull-up or pull-down networks can be active (i.e., connected) at a time.

Definition 2.14. The power dissipation of a CMOS cell, and hence a CMOS-based design more generally, can be described in terms of

1. a static component, where the transistors remain in a given state (so are “idle” in some sense), and

2. a dynamic component, where the transistors switch state, i.e., the gate changes from being driven by Vdd to Vss, or vice versa.

CMOS exhibits a marginal amount of sub-threshold leakage, so the majority of power dissipation occurs due to switching activity.

This has some obvious advantages, which make CMOS an attractive choice vs. alternatives. In particular, when organising lots of transistors in close proximity, CMOS will have lower overall power consumption and heat dissipation, and, in turn, better reliability.

The next step is to package CMOS cells into small, useful building-blocks that act as the next-level component above transistors themselves. As an example, consider building a component which inverts the input st. if the input x is Vdd the output is Vss and vice versa.

Example 2.5. Consider Figure 2.10a, where

1. connecting x to Vss means the top P-MOSFET will be connected, the bottom N-MOSFET will be disconnected, so r will be connected to Vdd, while

³ It is important to stress that CMOS is not the only possible logic style: although it represents a first step here, it may not be necessary if an alternative is used instead.

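[Circuit diagram (a), with labels x, r, Vdd and Vss.]

(a) A CMOS-based NOT gate.

[Circuit diagram (b), with labels x, y, r, Vss and Vdd.]

(b) A CMOS-based, 2-input NAND gate.

[Circuit diagram (c), with labels x, y, r, Vdd and Vss.]

(c) A CMOS-based, 2-input NOR gate.

Figure 2.10: MOSFET-based implementations of NOT, NAND and NOR logic gates.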

x    y    NOT  NAND  NOR
Vss  Vss  Vdd  Vdd   Vdd
Vss  Vdd  Vdd  Vdd   Vss
Vdd  Vss  Vss  Vdd   Vss
Vdd  Vdd  Vss  Vss   Vss

Figure 2.11: A voltage-oriented truth table for NOT, NAND and NOR logic gates (the NOT column relates to input x alone).

An aside: naming conventions for voltage levels.

In a CMOS-based design strategy, we normally refer to the power rails as Vdd and Vss. The ‘d’ stands for drain: Vdd could be read as “voltage level at the drain” st. it also makes sense to have Vss read as “voltage level at the source”. This naming convention seems to stem from earlier bipolar-based transistors, where Vcc and Vee are sort of the same thing but for collector and emitter terminals.

This all starts to become a little involved however, and beyond the scope of what we want to discuss. All we really care about is that Vdd and Vss make our transistors work correctly, and we can tell them apart. Although it might be too informal for some tastes, it is therefore enough to keep the following in mind:

• Vdd is the high or positive voltage level, e.g., 3.3V or 5V, and

• Vss is the low or negative voltage level, e.g., 0V ≃ GND.

Note that GND refers to ground: this can be thought of as a) a reference point other voltages are measured relative to (note that voltage is a synonym for potential difference, meaning we need such a reference), or b) a (or the) return path, i.e., the point to which electrons will move due to their preference to move from high to low potential difference.

2. connecting x to Vdd means the top P-MOSFET will be disconnected, the bottom N-MOSFET will be connected, so r will be connected to Vss.

Note that even with this simple organisation, we can identify the pull-up and pull-down networks; although there is just one transistor in each, it is true that the P-MOSFET connects Vdd to the output iff. x = Vss and the N-MOSFET connects Vss to the output iff. x = Vdd. We can of course consider more complex organisations under the same design strategy, by increasing the number of transistors.

Example 2.6. Consider Figure 2.10b, where

1. connecting both x and y to Vss means both top P-MOSFETs will be connected, both bottom N-MOSFETs will be disconnected, so r will be connected to Vdd,

2. connecting x to Vdd and y to Vss means the right-most P-MOSFET will be connected, the upper-most N-MOSFET will be disconnected, so r will be connected to Vdd,

3. connecting x to Vss and y to Vdd means the left-most P-MOSFET will be connected, the lower-most N-MOSFET will be disconnected, so r will be connected to Vdd, while

4. connecting both x and y to Vdd means both top P-MOSFETs will be disconnected, both bottom N-MOSFETs will be connected, so r will be connected to Vss.

Example 2.7. Consider Figure 2.10c, where

1. connecting both x and y to Vss means both top P-MOSFETs will be connected, both bottom N-MOSFETs will be disconnected, so r will be connected to Vdd,

2. connecting x to Vdd and y to Vss means the upper-most P-MOSFET will be disconnected, the left-most N-MOSFET will be connected, so r will be connected to Vss,

3. connecting x to Vss and y to Vdd means the lower-most P-MOSFET will be disconnected, the right-most N-MOSFET will be connected, so r will be connected to Vss, while

4. connecting both x and y to Vdd means both top P-MOSFETs will be disconnected, both bottom N-MOSFETs will be connected, so r will be connected to Vss.

A second aspect of the design strategy is made evident by increasing the number of transistors. Specifically, the two examples include P-MOSFETs organised in parallel (st. either can be activated to connect Vdd to the output) and N-MOSFETs organised in series (st. both must be activated to connect Vss to the output), or vice versa.

Hopefully it is obvious that the three examples model the NOT, NAND (or NOT AND) and NOR (or NOT OR) Boolean operators respectively; this fact is reinforced by Figure 2.11. Either way, the fact is that from a starting point involving atomic-level concepts we have developed components that we can reason about wrt. both theory and practice. That is, we have used electrical switches to implement Boolean algebra; instead of reasoning about computation involving the latter in theory, we can now actually build components that do that computation in practice.
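The behaviour of Examples 2.5 to 2.7 can be sketched in Python by modelling each transistor as an ideal switch and each gate as a pull-up plus pull-down network. This is only an illustrative model under stated assumptions: the function names are hypothetical, voltage levels are represented by the strings "Vdd" and "Vss", and threshold and leakage effects are ignored.

```python
from itertools import product

VDD, VSS = "Vdd", "Vss"

def nmos(gate: str) -> bool:
    """Ideal N-MOSFET: source and drain connected when the gate is at Vdd."""
    return gate == VDD

def pmos(gate: str) -> bool:
    """Ideal P-MOSFET: source and drain connected when the gate is at Vss."""
    return gate == VSS

def cmos(pull_up: bool, pull_down: bool) -> str:
    """Drive the output from whichever network is connected (Definition 2.13)."""
    assert pull_up != pull_down, "exactly one network should be connected"
    return VDD if pull_up else VSS

def not_gate(x: str) -> str:
    return cmos(pmos(x), nmos(x))

def nand_gate(x: str, y: str) -> str:
    # P-MOSFETs in parallel (either connects Vdd), N-MOSFETs in series.
    return cmos(pmos(x) or pmos(y), nmos(x) and nmos(y))

def nor_gate(x: str, y: str) -> str:
    # P-MOSFETs in series, N-MOSFETs in parallel (either connects Vss).
    return cmos(pmos(x) and pmos(y), nmos(x) or nmos(y))

# Print one row per input combination, in the same order as Figure 2.11.
print("x    y    NOT  NAND NOR")
for x, y in product([VSS, VDD], repeat=2):
    print(x, " ", y, " ", not_gate(x), " ", nand_gate(x, y), "", nor_gate(x, y))
```

The assertion in cmos captures the complementary property: for every input combination, exactly one of the two networks is connected, so the output is never left disconnected nor connected to both rails.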

2.1.2.5 Some common terminology in CMOS-based logic design

Definition 2.15. The process used to manufacture organisations of transistors, plus their associated properties and constraints, is a logic style (or logic family): examples include CMOS and TTL.

Definition 2.16. A given logic style will suggest an associated standard cell, i.e., an organisation of transistors that realises a higher-level building block, namely either a) a computational component (e.g., a Boolean AND operator), or b) a storage component (e.g., a latch); the former is more naturally described as a logic gate. Each such cell will have an associated functional specification (i.e., a truth table or excitation table), and behavioural specification (e.g., detailing propagation delay).

Definition 2.17. A standard cell library is a collection of standard cells, used as building-blocks in a design.

Definition 2.18. The standard cell methodology permits design abstraction, in the sense a design can be specified at a high- vs. low-level (i.e., in terms of standard cells, vs. transistors).

Definition 2.19. A Gate Equivalent (GE) is a unit of measurement used to assess the (area) complexity of a digital logic design independently from the manufacturing process technology. It is common (e.g., for CMOS) to consider a 2-input NAND gate as 1 GE: you can think about it as a normalisation factor for manufacturing processes, st. designs specified using different processes can be compared fairly.
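To illustrate the normalisation described by Definition 2.19, the sketch below divides the area of a design by the area of a 2-input NAND gate in the same process. The function name and all area figures are hypothetical, invented purely for illustration; they do not describe any real manufacturing process.

```python
def gate_equivalents(design_area: float, nand2_area: float) -> float:
    """Area of a design expressed in GE, i.e., relative to one 2-input NAND gate."""
    return design_area / nand2_area

# Hypothetical figures (in square micrometres) for the same design
# manufactured using two different, also hypothetical, processes.
ge_process_a = gate_equivalents(design_area=5000.0, nand2_area=2.0)
ge_process_b = gate_equivalents(design_area=1250.0, nand2_area=0.5)

print(ge_process_a, ge_process_b)  # 2500.0 2500.0
```

Although the absolute areas differ by a factor of four, the design has the same complexity in GE for both processes, which is what permits a fair comparison.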

2.2 Combinatorial logic

2.2.1 A suite of simplified logic gates

It should already be clear that designing functionality, even as simple as single Boolean operators, is hard at the transistor-level: transistors are too low a level of abstraction, st. the amount of detail is prohibitive at a larger scale or higher level. To address this problem, we usually adopt a more abstract view of logic gates by taking two steps: we 1) forget about the voltage levels Vss and Vdd, abstractly labelling them 0 and 1, then 2) forget about the power rails, and just draw a symbol to represent each gate (with suitable inputs and outputs).

Figure 2.12 highlights several different notations for the resulting logic gates, including each of the NOT, NAND and NOR gates from above and also AND, OR and XOR from Chapter 1; corresponding truth tables are shown in Figure 2.13. Keep the following in mind:

• We are assuming the voltage levels used to represent values on each wire are perfect in some sense. In short, we assume the associated signals have a “square” waveform and so are digital signals (i.e., only ever have a value of 0 or 1). In reality this can be dubious, because physical phenomena that underpin those voltage levels mean the edges of said signals might be “rounded” and so imperfect (e.g., have a value of 0.5 say); we basically ignore this issue, at least until later.

• An inversion bubble on the output of a gate is used to denote the fact that the output is inverted. As such, a buffer (or BUF) is simply a gate that connects the input directly to the output; a NOT gate is then a buffer that inverts the input to form the output.

• For completeness we have included the NXOR (sometimes written XNOR) gate, which has the obvious meaning but is seldom used in practice; per Chapter 1, we use overlined forms of ∧, ∨ and ⊕ (i.e., ⊼, ⊽ and an overlined ⊕) as a short-hand to denote NAND, NOR and NXOR respectively. Clearly, for example, we have

x ⊼ y ≡ ¬(x ∧ y).

• Given 2-input gates such as AND, OR, and XOR, we use a short-hand and draw the gates with more inputs; this is equivalent to making a tree of 2-input gates since, for example, we have

(w ∧ x ∧ y ∧ z) ≡ (w ∧ x) ∧ (y ∧ z).
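The abstraction described above, in which Vss and Vdd become 0 and 1, can be sketched by writing each gate as a Python function on the integers 0 and 1. The function names are hypothetical; the sketch prints the 2-input truth tables and checks, by enumeration, both the NAND short-hand and the 4-input AND identity from the last bullet point.

```python
from itertools import product

def NOT(x):     return 1 - x
def AND(x, y):  return x & y
def OR(x, y):   return x | y
def XOR(x, y):  return x ^ y
def NAND(x, y): return NOT(AND(x, y))   # x NAND y is NOT (x AND y)
def NOR(x, y):  return NOT(OR(x, y))
def NXOR(x, y): return NOT(XOR(x, y))

# Truth tables for all 2-input gates, one row per input combination.
print("x y AND NAND OR NOR XOR NXOR")
for x, y in product((0, 1), repeat=2):
    print(x, y, AND(x, y), NAND(x, y), OR(x, y), NOR(x, y), XOR(x, y), NXOR(x, y))

# A 4-input AND is equivalent to a tree of 2-input AND gates.
for w, x, y, z in product((0, 1), repeat=4):
    assert (w & x & y & z) == AND(AND(w, x), AND(y, z))
```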

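r is x         ≡  r = x      ≡  [BUF symbol: input x, output r]
r is NOT x     ≡  r = ¬x     ≡  [NOT symbol: input x, output r]
r is x NAND y  ≡  r = x ⊼ y  ≡  [NAND symbol: inputs x and y, output r]
r is x NOR y   ≡  r = x ⊽ y  ≡  [NOR symbol: inputs x and y, output r]
r is x AND y   ≡  r = x ∧ y  ≡  [AND symbol: inputs x and y, output r]
r is x OR y    ≡  r = x ∨ y  ≡  [OR symbol: inputs x and y, output r]
r is x XOR y   ≡  r = x ⊕ y  ≡  [XOR symbol: inputs x and y, output r]

Figure 2.12: Representation of standard logic gates in English, Boolean algebra, C and symbolic notations.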

BUF
x  r
0  0
1  1

(a) A 1-input, 1-output buffer.

NOT
x  r
0  1
1  0

(b) A 1-input, 1-output NOT gate.

AND
x  y  r
0  0  0
0  1  0
1  0  0
1  1  1

(c) A 2-input, 1-output AND gate.

NAND
x  y  r
0  0  1
0  1  1
1  0  1
1  1  0

(d) A 2-input, 1-output NAND gate.

OR
x  y  r
0  0  0
0  1  1
1  0  1
1  1  1

(e) A 2-input, 1-output OR gate.

NOR
x  y  r
0  0  1
0  1  0
1  0  0
1  1  0

(f) A 2-input, 1-output NOR gate.

XOR
x  y  r
0  0  0
0  1  1
1  0  1
1  1  0

(g) A 2-input, 1-output XOR gate.

NXOR
x  y  r
0  0  1
0  1  0
1  0  0
1  1  1

(h) A 2-input, 1-output NXOR gate.

Figure 2.13: Truth tables for standard logic gates.

[Gate diagrams: three rows of equivalent symbols, the first with input x and output r, the second and third with inputs x and y and output r.]

Figure 2.14: Identities for standard logic gates in terms of NAND and NOR.

Now, by treating the gates as operators per Boolean algebra we can combine them together and design components that fall into a category often termed combinatorial logic; the gate behaviours combine to compute a result continuously, with their output updated whenever an input changes.

2.2.2 Harnessing the universality of NAND and NOR

Following from the above, (at least) two questions should be immediately apparent:

1. Chapter 1 suggests NOT, AND, and OR are the operators to focus on, so why design NAND and NOR from transistors? and

2. given the design of NAND and NOR from transistors was an involved, detailed process, is there a way to avoid repeating this for AND and OR?

The answer to both questions stems from the functional completeness of NAND and NOR: they are universal, in the sense we can implement every other logic gate using one or other of them alone (as already discussed in Chapter 1). The identities

¬x    ≡ x ⊼ x
x ∧ y ≡ (x ⊼ y) ⊼ (x ⊼ y)
x ∨ y ≡ ¬x ⊼ ¬y ≡ (x ⊼ x) ⊼ (y ⊼ y)

and

¬x    ≡ x ⊽ x
x ∧ y ≡ ¬x ⊽ ¬y ≡ (x ⊽ x) ⊽ (y ⊽ y)
x ∨ y ≡ (x ⊽ y) ⊽ (x ⊽ y)

replicated diagrammatically in Figure 2.14, demonstrate why; one can easily verify them via enumeration, e.g., in

x  y  x ⊼ y  x ⊼ x  y ⊼ y  (x ⊼ y) ⊼ (x ⊼ y)  (x ⊼ x) ⊼ (y ⊼ y)
0  0  1      1      1      0                  0
0  1  1      1      0      0                  1
1  0  1      0      1      0                  1
1  1  0      0      0      1                  1

and

x  y  x ⊽ y  x ⊽ x  y ⊽ y  (x ⊽ x) ⊽ (y ⊽ y)  (x ⊽ y) ⊽ (x ⊽ y)
0  0  1      1      1      0                  0
0  1  0      1      0      0                  1
1  0  0      0      1      0                  1
1  1  0      0      0      1                  1

This is enormously important: it explains why designing NAND and NOR from transistors made sense in the first place, but, moreover, it allows us to implement any Boolean expression, and so any Boolean function, from NAND and NOR gates alone. The manufacture of such implementations, which we cover in Section 2.5, will be vastly easier as a result. At the transistor-level, we only need deal with some (large) number of one building block (i.e., NAND or NOR) vs. the added complexity and effort associated with many such building blocks (i.e., AND, OR, XOR, and so on): everything at a low level is expressed in terms of NAND or NOR, so implemented by exactly the organisations of N- and P-MOSFETs we have already seen.
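The verification by enumeration mentioned above is easy to mechanise. The sketch below, in Python with hypothetical function names, builds NOT, AND and OR from NAND alone and from NOR alone, then checks each construction against the usual operators for every input combination.

```python
from itertools import product

def nand(x, y): return 1 - (x & y)
def nor(x, y):  return 1 - (x | y)

# NOT, AND and OR constructed using NAND gates alone.
def not_from_nand(x):    return nand(x, x)
def and_from_nand(x, y): return nand(nand(x, y), nand(x, y))
def or_from_nand(x, y):  return nand(nand(x, x), nand(y, y))

# NOT, AND and OR constructed using NOR gates alone.
def not_from_nor(x):    return nor(x, x)
def and_from_nor(x, y): return nor(nor(x, x), nor(y, y))
def or_from_nor(x, y):  return nor(nor(x, y), nor(x, y))

# Check every identity for every input combination.
for x, y in product((0, 1), repeat=2):
    assert not_from_nand(x) == not_from_nor(x) == 1 - x
    assert and_from_nand(x, y) == and_from_nor(x, y) == (x & y)
    assert or_from_nand(x, y) == or_from_nor(x, y) == (x | y)

print("all identities hold")
```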

An aside: why NAND not AND?!

Arguments based on universality of NAND and NOR motivate these building blocks via a preference for minimalism: using a single building block to implement every other component will offer manufacturing advantages, for example, vs. a more diverse set.

That said, it is reasonable to question what other motivations exist. Put another way, what would happen if we wanted an AND design in similar, transistor-based terms? A common starting point for such questions is the following

[Circuit diagram, with labels x, y, Vdd, Vss and r.]

in which we only have a pull-up network. The reasoning is often that if x = Vdd and y = Vdd then r = Vdd as required, whereas if x = Vss or y = Vss then r is disconnected; in defining what 0 and 1 mean, if we just define disconnected as 0 then maybe this design is valid? A counterargument (among many) is to think about what happens if we use r elsewhere as an input, e.g.,

[Circuit diagram, with labels x, y, z, Vdd, Vss and r.]

Now, if r is disconnected, the top-most transistor in the second layer simply does not work: the gate terminal is disconnected from either Vdd or Vss so the transistor cannot function.

It turns out there is a solution to this sort of issue, which is to opt for a pull-down resistor rather than a network of transistors, i.e., something like

[Circuit diagram, with labels x, y, Vdd, Vss and r.]

which you could think of as providing a “default” value to any disconnected wire. The problem is, now we have to reason about and manufacture another component (i.e., the resistor): both of these are out of scope, so, at least here, this approach is not viable.
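The argument above can be sketched in Python by adding a third, "disconnected" output state to the two voltage levels. This is an abstract model rather than a circuit simulation: the names are hypothetical, and the disconnected state is represented by None.

```python
VDD, VSS = "Vdd", "Vss"
DISCONNECTED = None  # the output is connected to neither power rail

def and_pull_up_only(x, y):
    """Pull-up network only: r is connected to Vdd when both inputs are Vdd,
    and is otherwise left disconnected (rather than connected to Vss)."""
    return VDD if (x == VDD and y == VDD) else DISCONNECTED

def and_with_pull_down_resistor(x, y):
    """As above, but a pull-down resistor gives a disconnected r the
    'default' value Vss."""
    r = and_pull_up_only(x, y)
    return VSS if r is DISCONNECTED else r

for x in (VSS, VDD):
    for y in (VSS, VDD):
        print(x, y, and_pull_up_only(x, y), and_with_pull_down_resistor(x, y))
```

In three of the four rows the first design yields neither Vdd nor Vss, which is exactly why a subsequent transistor driven by r cannot function; the second design always yields a valid voltage level.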

2.2.3 Designing circuits for arbitrary combinatorial functions

Now we have logic gates that act as physical implementations of each Boolean operator, the next challenge is how to produce Boolean expressions for some (arbitrary) Boolean function. Put another way, the challenge is to take a specification of a function f, e.g., a truth table, and derive a Boolean expression e which computes it.

Chapter 1 provides a complete enough background that we can attempt to address this challenge in a mechanical, algorithmic manner; doing so contrasts with deriving or manipulating expressions by hand using the Boolean axioms. Several viable approaches and thus algorithms exist, which we investigate in the following Sections: each has advantages and disadvantages, and can be described as taking the description of f as input, and producing e in SoP form as output.

2.2.3.1 Some design patterns

Before dealing with arbitrary Boolean functions, it is useful to start with some specific examples that can be solved by using a design pattern (or template): although they may or may not apply to a particular problem, whenever they do apply they represent a pre-designed solution we can use as is without further effort.

We use a specific example to introduce each design pattern below: in each case, a 2-input, 1-bit AND gate is used to solve some sort of problem. It is crucial to remember that the example illustrates a more general pattern: we will see cases where this is true later.

1. If, within some larger design, we use an AND gate to compute

r = x ∧ y

and then, somewhere else, compute

r′ = x ∧ y

we can replace the two AND gates with one: it is obvious that r = r′ = x ∧ y, so the output of a single AND gate can be shared between the two usage points. This simplification is possible, but harder to capture within a single Boolean expression: using

r = (w ∧ x ∧ y) ∨ (x ∧ y ∧ z)

as an example, it is usual to first define some intermediate, say

t = x ∧ y

then rewrite the expression as

r = (w ∧ t) ∨ (t ∧ z).

Doing so acts as a direct analogue to sub-expression elimination, an optimisation commonly applied by C compilers to expressions in C programs.

2. A 2-input, m-bit AND gate can be realised using isolated replication of 2-input, 1-bit AND gates. That is, if x and y are m-bit values then

r = x ∧ y

is computed via

r_i = x_i ∧ y_i

for 0 ≤ i < m, i.e., m separate 2-input, 1-bit gates, each i-th instance of which uses x_i and y_i to produce the output r_i.

3. An n-input, 1-bit AND gate can be realised using cascaded replication of 2-input, 1-bit AND gates. That is,

r = ⋀_{i=0}^{n−1} x_i

for n = 4 is the same as

r = (x_0 ∧ x_1) ∧ (x_2 ∧ x_3).

This expression forms a tree of AND gates, which, in this case, is balanced; it is more attractive than equivalents such as

r = x_0 ∧ (x_1 ∧ (x_2 ∧ x_3))

because although they use the same number of gates, the critical path of the former is shorter (i.e., representing 2 rather than 3 such gates).
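The three design patterns above can be sketched in Python as follows. The function names are hypothetical; for the third pattern, each function returns both the computed value and the number of gates on the longest input-to-output path, so the two organisations can be compared.

```python
from itertools import product

def AND(x, y): return x & y
def OR(x, y):  return x | y

# Pattern 1: share the intermediate t = x AND y between two usage points.
def r_unshared(w, x, y, z):
    return OR(AND(AND(w, x), y), AND(AND(x, y), z))

def r_shared(w, x, y, z):
    t = AND(x, y)
    return OR(AND(w, t), AND(t, z))

# Pattern 2: an m-bit AND is m separate 1-bit AND gates, one per bit position.
def and_m_bit(xs, ys):
    return [AND(x_i, y_i) for x_i, y_i in zip(xs, ys)]

# Pattern 3: an n-input AND as a balanced tree, e.g., (x0 AND x1) AND (x2 AND x3).
def and_tree(xs):
    if len(xs) == 1:
        return xs[0], 0
    mid = len(xs) // 2
    left, depth_left = and_tree(xs[:mid])
    right, depth_right = and_tree(xs[mid:])
    return AND(left, right), 1 + max(depth_left, depth_right)

# ... and as a chain, e.g., x0 AND (x1 AND (x2 AND x3)).
def and_chain(xs):
    if len(xs) == 1:
        return xs[0], 0
    rest, depth = and_chain(xs[1:])
    return AND(xs[0], rest), 1 + depth

for w, x, y, z in product((0, 1), repeat=4):
    assert r_unshared(w, x, y, z) == r_shared(w, x, y, z)
    assert and_tree([w, x, y, z])[0] == and_chain([w, x, y, z])[0]

print(and_m_bit([1, 1, 0, 1], [1, 0, 0, 1]))  # [1, 0, 0, 1]
print(and_tree([1, 1, 1, 1]))                 # (1, 2)
print(and_chain([1, 1, 1, 1]))                # (1, 3)
```

For n = 4 both organisations use three AND gates, but the tree has a critical path of 2 gates whereas the chain has a critical path of 3.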

2.2.3.2 Mechanical derivation method #1

Imagine we are tasked with deriving a Boolean expression that implements some Boolean function f . Thefunction has n inputs I0,I1, . . . ,In−1, and one output O; we are given a truth table that describes it. The ideais to follow a (fairly) simple algorithm:

1. Find a set T such that i ∈ T iff. O = 1 in the i-th row of the truth table.

2. For each i ∈ T, form a term t_i by AND’ing together all the variables while following two rules:

(a) if I_j = 1 in the i-th row, then we use I_j as is, but

(b) if I_j = 0 in the i-th row, then we use ¬I_j.

3. An expression implementing the function is then formed by OR’ing together all the terms, i.e.,

e = ⋁_{i∈T} t_i,

which is in SoP form.

Intuitively, each i ∈ T will produce a minterm t_i in the SoP form: each term t_i ANDs inputs together (to form their product), whereas e ORs together the terms (to form their sum). Each minterm fully specifies an input assignment (i.e., a value for each input) for a row of the truth table where the output is 1; in a sense, we are “covering” (or dealing with) each such row by doing so.
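The algorithm above is mechanical enough to sketch directly. The following Python is a minimal illustration under some assumptions: the function is supplied as a Python callable rather than a table, rows are numbered in the usual binary order (so row 1 of a 2-input table is the assignment (0, 1)), and the names minterm_rows and sop, plus the ASCII operators ~, & and |, are choices made here for convenience.

```python
from itertools import product

def minterm_rows(f, n):
    """Step 1: collect (i, row) for every row i of the truth table where the output is 1."""
    rows = list(product((0, 1), repeat=n))
    return [(i, row) for i, row in enumerate(rows) if f(*row)]

def sop(f, names):
    """Steps 2 and 3: AND the (possibly negated) inputs per row, then OR the terms."""
    terms = []
    for _, row in minterm_rows(f, len(names)):
        # Use the input as is when it is 1 in this row, negated when it is 0.
        literals = [name if bit else "~" + name for name, bit in zip(names, row)]
        terms.append("(" + " & ".join(literals) + ")")
    # Note: a function that is never 1 yields an empty string here.
    return " | ".join(terms)
```

Calling sop(lambda x, y: x ^ y, ["x", "y"]) produces the string "(~x & y) | (x & ~y)", i.e., one minterm per row in T = {1, 2}.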

Example 2.8. Consider the task of implementing an expression for XOR, i.e., an e in SoP form which implements f(x, y) = x ⊕ y, a truth table for which is reproduced (cf. Figure 2.13) here for clarity:

XOR
x y | r
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 0

1. Looking at the truth table, it is clear there are

• n = 2 inputs that we denote I_0 = x and I_1 = y, and

• one output that we denote O = r.

Likewise, it is clear that T = {1, 2} because O = 1 in rows 1 and 2, whereas O = 0 in rows 0 and 3.

2. Each term ti for i ∈ T = {1, 2} is formed as follows:

• For i = 1, we find

– I_0 = x = 0 and so we use ¬x,

– I_1 = y = 1 and so we use y

and hence form the term t_1 = ¬x ∧ y.

• For i = 2, we find

– I_0 = x = 1 and so we use x,

– I_1 = y = 0 and so we use ¬y

and hence form the term t_2 = x ∧ ¬y.

3. The expression implementing the function is therefore

e = ⋁_{i∈T} t_i
  = ⋁_{i∈{1,2}} t_i
  = (¬x ∧ y) ∨ (x ∧ ¬y)

which is in SoP form.




For example, notice that the row for i = 1 produces the minterm t_1 = ¬x ∧ y meaning “the row where x = 0 and y = 1”, whereas the row for i = 2 produces the minterm t_2 = x ∧ ¬y meaning “the row where x = 1 and y = 0”; combining the minterms together, we get an SoP expression that specifies rows where the output should be 1 as “either x = 0 and y = 1, or x = 1 and y = 0”.

2.2.3.3 Mechanical derivation method #2: Karnaugh maps

Example 2.9. Consider the truth table in Figure 2.15a which describes a 4-input Boolean function, and the SoP expression

r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ y ∧ z )

resulting from application of the method above.

Although it only becomes apparent when you do so, deriving such an expression is tedious and error prone; although the algorithm is simple, it could be described as machine-friendly (in the sense it is best executed by a computer). The complexity of the expression, in the sense it contains many operators, is more obvious. Although we could simplify it by applying Boolean axioms, for example, this is again quite tedious. It is obvious to ask, therefore, whether (and if so, how) we can improve the original method wrt. these problems?

The Karnaugh map, invented in 1953 by Maurice Karnaugh while working at Bell Labs [2], is an alternative method which offers (at least) two advantages over the original: it a) offers a more visual and so arguably human-friendly way to derive the resulting expression, and b) automatically applies various optimisations while doing so, st. we no longer need to apply (as much) post-derivation optimisation by hand. Although an example more usefully illustrates how to use a Karnaugh map, and so the advantages above, the method itself is best summarised using another algorithm. Again imagine we are tasked with deriving a Boolean expression that implements some Boolean function f with n inputs and one output:

1. Draw a rectangular (p × q)-element grid, st.

(a) p ≡ q ≡ 0 (mod 2), and

(b) p · q = 2^n

and each row and column represents one input combination; order rows and columns according to a Gray code.

2. Fill the grid elements with the output corresponding to inputs for that row and column.

3. Cover rectangular groups of adjacent 1 elements which are of total size 2^m for some m; groups can “wrap around” edges of the grid and overlap.

4. Translate each group into one term of an SoP form Boolean expression e where

(a) bigger groups, and

(b) fewer groups

mean a simpler expression.
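Steps 1 and 2 of this algorithm (drawing and filling the grid) can be sketched as follows for the 4-input function of Figure 2.15a. The sketch makes several assumptions: the function is encoded by the set ONES of row indices whose output is 1 (reading w, x, y, z as a 4-bit number with w most significant), columns are indexed by (w, x) and rows by (y, z), and the names ONES, GRAY2, spec and kmap are hypothetical.

```python
# Rows of Figure 2.15a whose output is 1, indexed as 8w + 4x + 2y + z.
ONES = {0, 1, 2, 4, 5, 8, 10, 11, 15}

# A 2-bit Gray code ordering, used for both the rows and the columns.
GRAY2 = [(0, 0), (0, 1), (1, 1), (1, 0)]

def spec(w, x, y, z):
    """The 4-input function, as specified by its truth table."""
    return int(8 * w + 4 * x + 2 * y + z in ONES)

def kmap(f):
    """Fill a 4 x 4 grid: rows follow (y, z), columns follow (w, x), both in Gray code order."""
    return [[f(w, x, y, z) for (w, x) in GRAY2] for (y, z) in GRAY2]

for row in kmap(spec):
    print(row)
```

Covering and translating groups (steps 3 and 4) are left to the reader here, since choosing a good set of groups is exactly the part a human does by eye.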

Based on this description, the underlying reason it delivers the claimed (or in fact any) advantages is far from intuitive. However, there is a way to explain it: the central observation is that if we find two minterms st. their input assignment differs in exactly one input, we can simplify the resulting expression by eliminating that input. If you (re-)consider Figure 2.15a, the minterms associated with

(w, x, y, z) = (1, 0, 1, 0)

and

(w, x, y, z) = (1, 0, 1, 1),




An aside: binary versus Gray code.

Consider a sequence of unsigned, n-bit integers; selecting n = 4, for example, and starting from zero, such a sequence would be

〈0, 0, 0, 0〉 ↦ 0(10)
〈1, 0, 0, 0〉 ↦ 1(10)
〈0, 1, 0, 0〉 ↦ 2(10)
〈1, 1, 0, 0〉 ↦ 3(10)
〈0, 0, 1, 0〉 ↦ 4(10)
〈1, 0, 1, 0〉 ↦ 5(10)
〈0, 1, 1, 0〉 ↦ 6(10)
〈1, 1, 1, 0〉 ↦ 7(10)
...

where the RHS describes a (decimal) value, and the LHS describes the (binary) representation of that value. Notice that moving from 〈1, 1, 0, 0〉 to the next entry 〈0, 0, 1, 0〉 means changing 3 bits: the 0-th and 1-st bits toggle from 1 to 0, and the 2-nd bit from 0 to 1. Now consider an alternative ordering of the same integers:

〈0, 0, 0, 0〉 ↦ 0(10)
〈1, 0, 0, 0〉 ↦ 1(10)
〈1, 1, 0, 0〉 ↦ 3(10)
〈0, 1, 0, 0〉 ↦ 2(10)
〈0, 1, 1, 0〉 ↦ 6(10)
〈0, 0, 1, 0〉 ↦ 4(10)
〈1, 0, 1, 0〉 ↦ 5(10)
〈1, 1, 1, 0〉 ↦ 7(10)
...

Now, moving from any entry to the next or the previous one will always toggle one bit: such an ordering is termed a Gray code after Frank Gray who made reference to it in a 1953 patent application (such orderings had been known and used for quite some time before that). Crucially,

1. we can produce an ordering that satisfies the same property for any n, and

2. the alternative ordering is just a permutation of the original: we keep the same values (and the same representations), but just rearrange them within the sequence.
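Both properties can be checked with a short Python sketch. Note the assumption: the gray function below uses the well-known binary-reflected construction (i XOR (i >> 1)), which is one ordering with the required property for any n; it is not claimed to be the exact ordering listed above, which is a different but equally valid choice. The names gray, hamming and is_gray are hypothetical.

```python
def gray(n):
    """The n-bit binary-reflected Gray code, as a list of integers."""
    return [i ^ (i >> 1) for i in range(2 ** n)]

def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

def is_gray(seq):
    """True if every pair of neighbouring entries differs in exactly one bit."""
    return all(hamming(a, b) == 1 for a, b in zip(seq, seq[1:]))

print(gray(2))                                # [0, 1, 3, 2]
print(is_gray(list(range(8))))                # False: plain binary counting
print(is_gray([0, 1, 3, 2, 6, 4, 5, 7]))      # True: the alternative ordering above
print(sorted(gray(4)) == list(range(16)))     # True: a permutation of the same values
```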




w x y z | r
0 0 0 0 | 1
0 0 0 1 | 1
0 0 1 0 | 1
0 0 1 1 | 0
0 1 0 0 | 1
0 1 0 1 | 1
0 1 1 0 | 0
0 1 1 1 | 0
1 0 0 0 | 1
1 0 0 1 | 0
1 0 1 0 | 1
1 0 1 1 | 1
1 1 0 0 | 0
1 1 0 1 | 0
1 1 1 0 | 0
1 1 1 1 | 1

(a) A 4-input example.

x y z | r
0 0 0 | 0
0 0 1 | 0
0 1 0 | 1
0 1 1 | 1
1 0 0 | 0
1 0 1 | ?
1 1 0 | 1
1 1 1 | ?

(b) A 3-input example.

Figure 2.15: 4- and 3-input example Boolean functions respectively.

i.e., rows of the truth table for i = 10 and i = 11, satisfy exactly this condition: they differ wrt. z, which is 0 in the first case and 1 in the second case. In the original method, we would implement them using the two minterms

w ∧ ¬x ∧ y ∧ ¬z

and

w ∧ ¬x ∧ y ∧ z.

However, this is overly pessimistic and so sub-optimal: the value of z is irrelevant provided w = 1, x = 0, and y = 1, because the output is 1 either way. As such, we eliminate z and use the LHS of

w ∧ ¬x ∧ y ≡ (w ∧ ¬x ∧ y ∧ ¬z) ∨ (w ∧ ¬x ∧ y ∧ z)

which is equivalent to the RHS and therefore covers both cases via a single, simpler expression.

Example 2.10. The best way to illustrate this in practice is to fully examine the truth table in Figure 2.15a:

1. Essentially, the first two steps just translate information from the truth table into the map (or grid); keep in mind that we are representing the same information, i.e., the specification of f, in both cases.

Since f has n = 4 inputs, the associated truth table has 2^4 = 16 rows; by selecting p = q = 4, we can draw the following square grid with enough elements to capture those rows

[Empty 4 × 4 Karnaugh map: columns labelled by w and x, rows labelled by y and z.]

Correctly interpreting the grid layout is crucial, since we need to translate rows of the truth table into the correct elements. Note that w and x relate to the columns (or horizontal axis), whereas y and z relate to the rows (or vertical axis). The left-most column, for example, relates to cases where w and x both have the value 0, i.e., where (w, x) = (0, 0); reading that column top-to-bottom, the rows within it relate to cases where (y, z) = (0, 0), (0, 1), (1, 1) and (1, 0). The other columns, read from left-to-right, are similar for y and z, but for the remaining cases where (w, x) = (0, 1), (1, 1) and (1, 0). As such, we can now fill each element in the grid with an output listed in the corresponding truth table row to get




(y, z) \ (w, x) | (0,0) | (0,1) | (1,1) | (1,0)
(0,0)           |   1   |   1   |   0   |   1
(0,1)           |   1   |   1   |   0   |   0
(1,1)           |   0   |   0   |   1   |   1
(1,0)           |   1   |   0   |   0   |   1

Karnaugh maps are usually drawn with bars above and to the left of the grid to denote cases where the associated input is 1: the 1-st and 2-nd (or middle) columns are where x = 1, for example, whereas the 0-th and 3-rd (or outer) columns are where x = 0. Elsewhere you might also see numbers to the left of each row, or above each column, to make the values more explicit (as in the grid above): for (w, x) they show (0, 1) and (1, 1) (or just 01(2) and 11(2)) for the 1-st and 2-nd (or middle) columns, and (0, 0) and (1, 0) (or just 00(2) and 10(2)) for the 0-th and 3-rd (or outer) columns. Either way, the ordering might, reasonably, seem odd: note that in row- and column-wise directions, a Gray code is used. From top-to-bottom, elements in a column are for (y, z) = (0, 0), (0, 1), (1, 1) and (1, 0), not (y, z) = (0, 0), (0, 1), (1, 0) and (1, 1) which might seem more natural. The reason for this choice will be made apparent later, but, for now, keep in mind that it is what allows the Karnaugh map to deliver the advantages outlined above.

2. The next step is to cover 1 elements in the grid. In a sense, this is analogous to what we did in the original method when we identified each row in the truth table where the output was 1: there we would have a group for each 1 element, but here we can form larger groups and cover multiple 1 elements.

The rules state we can form rectangular, potentially overlapping groups whose size is a power-of-two (i.e., 2^m for some m): provided we follow them, each group formed will represent a term we then need to implement as part of the SoP expression. The larger the group, the fewer inputs will be included in each of the terms; the fewer groups, the fewer terms there are. An example grouping in this case is as follows:

[The same Karnaugh map, with four groups marked as listed below.]

Here we have four groups:

• a group of four elements in the top left-hand corner spanning the 0-th and 1-st rows and columns,

• a group of one element in the top right-hand corner,

• a group of two elements in the 2-nd row spanning the 2-nd and 3-rd columns, and

• a group of two elements which wrap around the bottom-left and bottom-right corners.

3. Finally, we need to translate each group into a term in the SoP expression. As an example, consider the first group (i.e., of four elements in the top left-hand corner) and the values each input is assigned within it. It should become clear that the value of x is irrelevant provided that w = 0. Put another way, fixing w = 0 means we include the two left-most columns only (excluding the two right-most columns because they relate to cases where w = 1). In the same way, the value of z is irrelevant provided that y = 0.

By specifying values for each relevant input and ignoring the irrelevant inputs, we can implement this term as

¬w ∧ ¬y

to cover all four cells in that group; we are specifying “the columns where w = 0 and rows where y = 0”, which restricts us precisely to elements within the group. By applying similar reasoning to the other three groups, we find that

r = ( ¬w ∧ ¬y ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬x ∧ y ∧ ¬z ) ∨

( w ∧ y ∧ z )

which is equivalent to but clearly simpler than the result we derived originally: there are a) fewer terms, and b) each term is the combination of fewer inputs.




Example 2.11. The result above is simpler than the original, but it turns out we can do better still by more careful formation of the groups. More specifically, we could consider the following alternative

[The same Karnaugh map, with three groups marked as listed below.]

where there are now three groups, namely

• a group of four elements in the top left-hand corner spanning the 0-th and 1-st rows and columns,

• a group of two elements in the 2-nd row spanning the 2-nd and 3-rd columns, and

• a group of four elements which wrap around the top-left, bottom-left, top-right, and bottom-right corners.

The end result is a simpler expression, including one less term:

r = ( ¬w ∧ ¬y ) ∨

( ¬x ∧ ¬z ) ∨

( w ∧ y ∧ z ) .
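Since a 4-input function has only 16 input assignments, the claim that both Karnaugh-map results are equivalent to the original specification can be checked exhaustively. The sketch below assumes the truth table of Figure 2.15a is encoded as the set ONES of row indices with output 1 (reading w, x, y, z as a 4-bit number, w most significant); the function names are hypothetical.

```python
from itertools import product

# Rows of Figure 2.15a whose output is 1, indexed as 8w + 4x + 2y + z.
ONES = {0, 1, 2, 4, 5, 8, 10, 11, 15}

def spec(w, x, y, z):
    return int(8 * w + 4 * x + 2 * y + z in ONES)

def four_groups(w, x, y, z):
    """The expression from Example 2.10 (four groups)."""
    nw, nx, ny, nz = 1 - w, 1 - x, 1 - y, 1 - z
    return (nw & ny) | (w & nx & ny & nz) | (nx & y & nz) | (w & y & z)

def three_groups(w, x, y, z):
    """The expression from Example 2.11 (three groups)."""
    nw, nx, ny, nz = 1 - w, 1 - x, 1 - y, 1 - z
    return (nw & ny) | (nx & nz) | (w & y & z)

def equivalent(f, g, n=4):
    """Compare two n-input functions on every possible input assignment."""
    return all(f(*v) == g(*v) for v in product((0, 1), repeat=n))

print(equivalent(spec, four_groups), equivalent(spec, three_groups))
```

Exhaustive comparison like this scales as 2^n, so it is only practical for small n; it is used here as a sanity check rather than as a general verification technique.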

Constructive use of don’t care entries An important feature or extension of truth tables, as defined so far, is the potential to include so-called don’t care entries: rather than 0 or 1, we use ? to denote we do not care what the value is (vs. we do not know what the value is, for example). When used in the context of an output, it can be rationalised by considering a component whose output simply does not matter given some combination of inputs: maybe this input is invalid, so the output is never used due to the resulting error.

Example 2.12. Consider the truth table

x y | r
0 0 | 0
0 1 | ?
? 0 | 1

which describes some 2-input Boolean function, where don’t care entries are used in two roles:

1. On the LHS, wrt. the input x. In this case, the ? represents a short-hand, because by saying we don’t care what the value of x is we expand that one row into two: one for x = 0 and one for x = 1, which is like saying “irrespective of x (so if x = 0 or x = 1), provided y = 0 then r = 1”.

2. On the RHS, wrt. the output r. In this case, the ? represents a choice, because by saying we don’t care what the value of r is we can select whatever suits us: it could be thought of like a “wildcard” of some sort.

This concept has various applications, but is immediately useful during the derivation of an expression from the specification (including don’t care entries) of some function. In short, both the original method and Karnaugh map alternative can, at a high level, be described as covering 1 entries in the truth table (either individually, or in a group); in both cases, fewer 1 entries implies a simpler SoP expression. As such, it makes sense to deal with don’t care entries (in the output) in a way that helps: we are free to treat them as 0 or 1, so a) treating them as 0 means we do not need to cover them with a group, whereas b) treating them as 1 means we can potentially form larger groups.

Example 2.13. Consider the truth table in Figure 2.15b which describes a 3-input Boolean function and thus has 2^3 = 8 rows; selecting p = 2 and q = 4 yields the (empty) map

[Empty 2 × 4 Karnaugh map: columns labelled by x and y, rows labelled by z.]

Consider the following two groupings of the filled map

z \ (x, y) | (0,0) | (0,1) | (1,1) | (1,0)
0          |   0   |   1   |   1   |   0
1          |   0   |   1   |   ?   |   ?

where the left-hand option uses two groups (the two elements of the 1-st column, plus the elements of the 0-th row in the 1-st and 2-nd columns), and the right-hand option uses a single group of four elements spanning the 1-st and 2-nd columns.

The left-hand option treats the element associated with x = 1 and z = 1 in the 1-st row, 2-nd column as 0: as such it is not covered by a group, and we are forced to form two rectangular groups as a result st. the resulting expression is

r = (¬x ∧ y) ∨ (y ∧ ¬z).

In contrast, the right-hand option treats the element as a 1, meaning it can be included in a single, larger group. This produces the (much) simpler expression r = y.
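A short Python sketch can confirm that both options are valid implementations of the specification in Figure 2.15b, i.e., that they agree with it on every row where the output is actually specified. The encoding is an assumption made here: don't care outputs are represented by None, and the names SPEC, satisfies, left and right are hypothetical.

```python
# Figure 2.15b as a mapping (x, y, z) -> r, with None standing for a don't care output.
SPEC = {
    (0, 0, 0): 0, (0, 0, 1): 0, (0, 1, 0): 1, (0, 1, 1): 1,
    (1, 0, 0): 0, (1, 0, 1): None, (1, 1, 0): 1, (1, 1, 1): None,
}

def satisfies(f):
    """True if f matches the specification wherever the output is not a don't care."""
    return all(r is None or f(*inputs) == r for inputs, r in SPEC.items())

def left(x, y, z):
    """Left-hand option: r = (NOT x AND y) OR (y AND NOT z)."""
    return ((1 - x) & y) | (y & (1 - z))

def right(x, y, z):
    """Right-hand option: r = y."""
    return y

print(satisfies(left), satisfies(right))   # both satisfy the specification
print(left(1, 1, 1), right(1, 1, 1))       # yet they differ on a don't care row
```

The two expressions are therefore not equivalent as Boolean functions; they are simply both acceptable, because they only disagree where the specification leaves the output open.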

Why Gray code?! In the example above, we informally cited the use of Gray code ordering for rows and columns in a Karnaugh map as important wrt. the advantages it then offers. The easiest way to see why this is true is via another example where we do not use this approach.

Example 2.14. Consider the truth table

w x y z | r
0 0 0 0 | 0
0 0 0 1 | 0
0 0 1 0 | 0
0 0 1 1 | 0
0 1 0 0 | 1
0 1 0 1 | 0
0 1 1 0 | 0
0 1 1 1 | 0
1 0 0 0 | 0
1 0 0 1 | 0
1 0 1 0 | 0
1 0 1 1 | 0
1 1 0 0 | 1
1 1 0 1 | 0
1 1 1 0 | 0
1 1 1 1 | 0

which describes some 4-input function f. By using a Gray code ordering, we translate it into the following Karnaugh map

(y, z) \ (w, x) | (0,0) | (0,1) | (1,1) | (1,0)
(0,0)           |   0   |   1   |   1   |   0
(0,1)           |   0   |   0   |   0   |   0
(1,1)           |   0   |   0   |   0   |   0
(1,0)           |   0   |   0   |   0   |   0

that allows formation of a single group that covers the two adjacent 1 elements in the 0-th row; this group produces the SoP expression

r = x ∧ ¬y ∧ ¬z,

noting that the value of w is irrelevant in this case (i.e., provided x = 1, y = 0 and z = 0, that alone is enough to cover the group). Now consider a similar Karnaugh map without a Gray code ordering

(y, z) \ (w, x) | (0,0) | (0,1) | (1,0) | (1,1)
(0,0)           |   0   |   1   |   0   |   1
(0,1)           |   0   |   0   |   0   |   0
(1,0)           |   0   |   0   |   0   |   0
(1,1)           |   0   |   0   |   0   |   0

which is more like a Veitch diagram [10], a precursor to the Karnaugh map. Note, for example, that the 2-nd column now represents cases where w = 1 and x = 0, and the 3-rd column now represents cases where w = 1 and x = 1: the 2-nd and 3-rd columns are swapped versus the original Karnaugh map (and likewise for the rows). The problem is, now we cannot make a single group that covers the same two elements: we now need




two groups, each covering one element. These groups obviously produce a more complicated SoP expression, namely

r = ( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z )

where we now include w even though we know it is not required; to get the same result as before, we would now have to manipulate the expression by hand using suitable axiomatic steps.

This basically demonstrates that by using a Gray code ordering, where exactly one bit toggles in the input assignment when moving between adjacent rows or columns, we support precisely the observation outlined at the start of the Section. Put another way, we wanted to identify input assignments that differed wrt. one input only so as to eliminate that input; by ensuring that two adjacent (including wrap-around) rows or columns satisfy this property, a group that spans them will naturally translate into a term that eliminates the single, different input that distinguishes them.

2.2.3.4 Mechanical derivation method #3: Quine-McCluskey

Although Karnaugh maps can represent functions with any number of inputs, they become unwieldy to draw and use for larger n; the reason for this scalability problem stems fundamentally from the emphasis on a human-friendly vs. machine-friendly algorithm. However, we can address the problem by investigating Quine-McCluskey minimisation: this is a method developed independently by Willard Quine [7] and Edward McCluskey [4] in the mid 1950s. It is reasonable to think of Quine-McCluskey as offering the advantages of both the previous methods: it a) can be automated easily, while also b) automatically applying various optimisations, and so avoids the need for (as much) post-derivation optimisation by hand. Unlike the previous methods where we could write a concise description of the algorithm, it is easiest to explain this one inline with an example. The following Sections do so by (re)considering the truth table in Figure 2.15a.

Step #1: extraction of prime implicants The first step is to produce a table, Table 2.16 for this example, that we extend step-by-step: we

1. initialise the 0-th section by extracting each minterm from the truth table (i.e., each input assignment st. the output is 1), then

2. process the i-th section to construct the (i + 1)-th section, iterating until no progress can be made.

Based on an input assignment represented as a tuple, in this case (z, y, x, w), we identify each minterm using an integer: you can see the nine minterms extracted from Figure 2.15a at the top of Table 2.16. In the table, each entry (i.e., each row) is called an implicant; they are assigned a group based on the number of elements in the associated tuple that equal 1. Consider section 0 for example. Implicant 0 represented by (0, 0, 0, 0) (st. w = 0, x = 0, y = 0 and z = 0) is assigned group 0 because zero elements of the representation equal 1. In contrast, implicant 5 represented by (1, 0, 1, 0) (st. w = 0, x = 1, y = 0 and z = 1) and implicant 10 represented by (0, 1, 0, 1) (st. w = 1, x = 0, y = 1 and z = 0) are both assigned group 2 because two elements of their representations equal 1.

Recall from our simplification using Karnaugh maps that we were able to apply a rule to implement both minterms w ∧ ¬x ∧ y ∧ ¬z and w ∧ ¬x ∧ y ∧ z with a single, simpler expression w ∧ ¬x ∧ y because the value of z is irrelevant. We use a similar approach here, using the i-th section to construct the (i + 1)-th section by comparing members of the j-th and (j + 1)-th groups in the former; our goal is to find pairs of implicants whose representations differ in one element, and combine them together. We skip comparison of the j-th group and groups other than the (j + 1)-th, because by definition they cannot satisfy the criterion. As an example, consider construction of the 1-st section from the 0-th section: we

• compare implicant 0 from group 0 with implicants 1, 2, 4 and 8 from group 1,

• compare implicants 1, 2, 4 and 8 from group 1 with implicants 5 and 10 from group 2,

• compare implicants 5 and 10 from group 2 with implicant 11 from group 3, and

• compare implicant 11 from group 3 with implicant 15 from group 4.

In the new section, we replace the differing element of paired implicants with ? to highlight the fact we don’t care about that input: combining implicants 0 and 1 represented by the tuples (0, 0, 0, 0) and (1, 0, 0, 0), for example, produces an implicant represented by (?, 0, 0, 0). Furthermore, each implicant from the i-th section which is used to form an implicant in the (i + 1)-th section is marked with an X next to it; implicants 0, 1, 2, 4 and 8 are thus marked due to the comparison between groups 0 and 1 and their use in forming implicants 0 + 1, 0 + 2, 0 + 4 and 0 + 8.




Section | Group | Implicant       | Tuple        | Used
0       | 0     | 0               | (0, 0, 0, 0) | X
0       | 1     | 1               | (1, 0, 0, 0) | X
0       | 1     | 2               | (0, 1, 0, 0) | X
0       | 1     | 4               | (0, 0, 1, 0) | X
0       | 1     | 8               | (0, 0, 0, 1) | X
0       | 2     | 5               | (1, 0, 1, 0) | X
0       | 2     | 10              | (0, 1, 0, 1) | X
0       | 3     | 11              | (1, 1, 0, 1) | X
0       | 4     | 15              | (1, 1, 1, 1) | X
1       | 0     | 0 + 1           | (?, 0, 0, 0) | X
1       | 0     | 0 + 2           | (0, ?, 0, 0) | X
1       | 0     | 0 + 4           | (0, 0, ?, 0) | X
1       | 0     | 0 + 8           | (0, 0, 0, ?) | X
1       | 1     | 1 + 5           | (1, 0, ?, 0) | X
1       | 1     | 4 + 5           | (?, 0, 1, 0) | X
1       | 1     | 2 + 10          | (0, 1, 0, ?) | X
1       | 1     | 8 + 10          | (0, ?, 0, 1) | X
1       | 2     | 10 + 11         | (?, 1, 0, 1) |
1       | 3     | 11 + 15         | (1, 1, ?, 1) |
2       | 0     | 0 + 1 + 4 + 5   | (?, 0, ?, 0) |
2       | 0     | 0 + 2 + 8 + 10  | (0, ?, 0, ?) |
2       | 0     | 0 + 4 + 1 + 5   | (?, 0, ?, 0) | duplicate
2       | 0     | 0 + 8 + 2 + 10  | (0, ?, 0, ?) | duplicate

Figure 2.16: Quine-McCluskey simplification, step #1: extraction of prime implicants.

               | 0 | 1 | 2 | 4 | 8 | 5 | 10 | 11 | 15
0 + 1 + 4 + 5  | X | X |   | X |   | X |    |    |
0 + 2 + 8 + 10 | X |   | X |   | X |   | X  |    |
10 + 11        |   |   |   |   |   |   | X  | X  |
11 + 15        |   |   |   |   |   |   |    | X  | X

Figure 2.17: Quine-McCluskey simplification, step #2: covering the prime implicants table.




The process is iterated, constructing subsequent sections until we can no longer make progress, i.e., there are no implicants that can be combined. Table 2.16 includes three sections, noting that section 2 has no implicants that can be combined and so is the last constructed. In addition, it illustrates the fact that combination of implicants in the i-th section may produce duplicates in the (i + 1)-th section: here, we can see (0, ?, 0, ?) and (?, 0, ?, 0) are duplicated. Whenever this occurs, we ignore the duplicates and omit them from further comparisons.
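The first step can be sketched compactly in Python. This is a simplified illustration rather than a faithful transcription of the tabular method: implicants are strings over 0, 1 and ? written most-significant bit first (so in (w, x, y, z) order, the reverse of the tuples in the table), every pair within a section is compared instead of only adjacent groups (which yields the same combinations, just less efficiently), and the names combine and prime_implicants are hypothetical.

```python
from itertools import combinations

def combine(a, b):
    """Merge two implicants that differ in exactly one 0/1 position, else return None."""
    diff = [i for i, (p, q) in enumerate(zip(a, b)) if p != q]
    if len(diff) == 1 and "?" not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + "?" + a[i + 1:]
    return None

def prime_implicants(minterms, n):
    """Return a dict mapping each prime implicant to the set of minterms it covers."""
    # Section 0: one implicant per minterm.
    current = {format(m, "0{}b".format(n)): frozenset([m]) for m in minterms}
    primes = {}
    while current:
        used, following = set(), {}
        for a, b in combinations(current, 2):
            merged = combine(a, b)
            if merged is not None:
                used.update((a, b))                        # mark both as used
                following[merged] = current[a] | current[b]  # dict keys drop duplicates
        # Anything left unmarked in this section is a prime implicant.
        for implicant, covered in current.items():
            if implicant not in used:
                primes[implicant] = covered
        current = following
    return primes

print(prime_implicants([0, 1, 2, 4, 5, 8, 10, 11, 15], 4))
```

For the minterms of Figure 2.15a, this yields four prime implicants covering {0, 1, 4, 5}, {0, 2, 8, 10}, {10, 11} and {11, 15}, matching the unmarked entries in the table.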

Step #2: covering the prime implicants table Any unmarked implicants are termed prime implicants: these form the focus of a second step whose task is to produce the SoP expression. The content of Table 2.16 includes four prime implicants, namely

0 + 1 + 4 + 5 ↦ (?, 0, ?, 0)
0 + 2 + 8 + 10 ↦ (0, ?, 0, ?)
10 + 11 ↦ (?, 1, 0, 1)
11 + 15 ↦ (1, 1, ?, 1)

These are used to form a prime implicant table, as in Table 2.17: it lists the prime implicants along the left-hand side and the original minterms along the top, and includes an X character in every element where a given prime implicant includes a given minterm.

The goal now is to select a combination of the prime implicants which covers all of the original minterms. For example, the prime implicant 0 + 1 + 4 + 5 covers the minterms 0, 1, 4 and 5; selecting this as well as prime implicant 10 + 11 will cover 0, 1, 4, 5, 10 and 11. Before doing so, we can make our task easier by identifying the set of essential prime implicants, i.e., those which are the only cover for a given minterm. We can see the prime implicant 11 + 15 is such a case in Table 2.17, because it is the only way to cover minterm 15; as a result, we must include it in our expression.

The process for coverage is fairly intuitive: we start with essential prime implicants, and then draw a line through the associated row in the prime implicants table; when a line goes through an X, we also draw a line through that column. The resulting lines show which minterms are currently covered by prime implicants we have selected for inclusion in our SoP expression. For our example we

• draw a line through the row for implicant 11 + 15, and hence through the columns for minterms 11 and 15,

• draw a line through the row for implicant 0 + 1 + 4 + 5, and hence through the columns for minterms 0, 1, 4 and 5, and finally

• draw a line through the row for implicant 0 + 2 + 8 + 10, and hence through the columns for minterms 0, 2, 8 and 10.

The end result shows that by using prime implicants 0 + 1 + 4 + 5, 0 + 2 + 8 + 10, and 11 + 15, we can cover all the original minterms; we need not include prime implicant 10 + 11, for example, since minterms 10 and 11 are both covered elsewhere. Looking at the associated tuples, again written in the order (z, y, x, w), we have

0 + 1 + 4 + 5 ↦ (?, 0, ?, 0)
0 + 2 + 8 + 10 ↦ (0, ?, 0, ?)
11 + 15 ↦ (1, 1, ?, 1)

Following the rule that for some input t

• if t = 0 then use ¬t,
• if t = 1 then use t, and
• if t = ? then ignore t,

we form a term for each prime implicant listed and thus implement the SoP expression as

r = ( ¬w ∧ ¬y ) ∨

( ¬x ∧ ¬z ) ∨

( w ∧ y ∧ z )

as per our original attempt using Karnaugh maps.
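The covering step can also be sketched in Python. The sketch below starts from the four prime implicants found in step #1, written here as strings in (w, x, y, z) order (an encoding chosen for this illustration); it selects the essential prime implicants first and then, as an assumption not taken from the text, falls back to a simple greedy choice for any minterms still uncovered, which is a heuristic and not guaranteed to be minimal in general. The names PRIMES, MINTERMS, cover and to_term are hypothetical.

```python
# Prime implicants from step #1, mapped to the minterms they cover.
PRIMES = {
    "0?0?": {0, 1, 4, 5},
    "?0?0": {0, 2, 8, 10},
    "101?": {10, 11},
    "1?11": {11, 15},
}
MINTERMS = [0, 1, 2, 4, 5, 8, 10, 11, 15]

def cover(primes, minterms):
    """Select prime implicants that together cover every minterm."""
    chosen = []
    # Essential prime implicants: the only cover for some minterm.
    for m in minterms:
        owners = [p for p, covered in primes.items() if m in covered]
        if len(owners) == 1 and owners[0] not in chosen:
            chosen.append(owners[0])
    remaining = set(minterms)
    for p in chosen:
        remaining -= primes[p]
    # Greedy fallback (heuristic): pick whichever implicant covers most of what is left.
    while remaining:
        best = max(primes, key=lambda p: len(primes[p] & remaining))
        chosen.append(best)
        remaining -= primes[best]
    return chosen

def to_term(implicant, names="wxyz"):
    """Apply the rule: 0 gives a negated input, 1 gives the input, ? is ignored."""
    literals = [n if c == "1" else "~" + n for c, n in zip(implicant, names) if c != "?"]
    return " & ".join(literals)

print(" | ".join("(" + to_term(p) + ")" for p in cover(PRIMES, MINTERMS)))
```

For this example the essential prime implicants already cover everything, so the greedy fallback is never exercised and the printed expression corresponds to the three-term result above.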

2.2.4 Physical properties of combinatorial logic

2.2.4.1 Delay: from static to dynamic (i.e., including time) evaluation

Definition 2.20. Within some combinatorial logic, two classes of delay (which is often described as propagation delay, with a hint toward delay of signals more generally) dictate the time between change to some input and corresponding change (if any) in an output: these are




[Two plots of output voltage against input voltage, both axes running from 0 to 5, with the threshold voltage and the 0 and 1 levels marked.]

(a) Idealised, square switching activity.

(b) Realistic, curved switching activity.

Figure 2.18: An illustration of idealised and realistic switching activity wrt. a MOSFET-based NOT gate.

[Circuit with inputs x and y, intermediate wires t0, t1, t2 and t3, and output r.]

(a) An annotated implementation of an XOR gate, using NOT, AND and OR gates.

[Waveform showing x, y, t0, t1, t2, t3 and r from 0ns to 50ns.]

(b) A waveform tracking intermediate results that occur when x is changed from 0 to 1.

Figure 2.19: A behavioural waveform demonstrating the effects of propagation delay on an XOR implementation.

[Circuit with input x, intermediate wire t, and output r.]

Figure 2.20: A simple design, involving just a NOT and an AND gate, that exhibits glitch-like behaviour.

[Circuit with inputs x and y, and m target gates.]

Figure 2.21: A contrived circuit illustrating the idea of fan-out, whereby one source gate may need to drive n target gates.




• wire delay, which relates to the time taken for current to move through the conductive wire from one point to another, and

• gate delay, which relates to the time taken for transistors in each gate to switch between connected and unconnected states.

The latter is typically larger than the former, and both relate to the associated implementations: the latter relates to properties of the transistors used, the former to properties of the wire (e.g., conductivity, length, and so on).

Definition 2.21. The critical path through some combinatorial logic is the longest sequential sequence of delays (so wire and/or gate delays) between the inputs and outputs.

Although such wire and gate delays are typically very small, when many gates are placed in series or when wires are very long, the delays add up; the problem of managing the result is multiplied as the complexity of combinatorial logic increases. The concept of wire delay is perhaps more intuitive than gate delay, so it makes sense to expand a little on the latter; the example below attempts to explain the cause.

Example 2.15. Consider Figure 2.18, which includes an idealised (left-hand side, in Figure 2.18a) and (more) realistic (right-hand side, in Figure 2.18b) illustration of what happens when the input to a MOSFET-based NOT gate, i.e., Figure 2.10a, switches.

The idea is to stress the fact that in the idealised case, there is an instantaneous change in the output voltage: the plot representing the output is square-edged, changing (or swinging) from 5V (i.e., 1) to 0V (i.e., 0) the instant that the input voltage changes from 0V (i.e., 0) to 5V (i.e., 1), or, more precisely, when it reaches the threshold voltage. Note that the illustration includes output voltage levels above 0V and below 5V that represent the threshold at which said output is interpreted as a 0 or 1, but since the change is instantaneous these are irrelevant.

In contrast, the realistic case suggests a non-instantaneous change in the output voltage, i.e., it takes some time. The characteristics of the now curved plot relate to properties of the transistors. However, the important thing to realise is that the input voltage will take some time to change between 0V (i.e., 0) and 5V (i.e., 1), so there is some delay in the output voltage changing from 5V (i.e., 1) to 0V (i.e., 0); this also suggests there is a (short) period of time where the output voltage cannot be interpreted as either 0 or 1.

Although this property is often abstracted away when illustrating the value of a wire in a waveform, meaning transitions from 0 to 1, or vice versa, are square-edged, it can be captured with sloped edges as shown in

[Waveform showing two signals, x and y.]

Notice that x and y toggle between 0 and 1 in the same way, but transitions in the former (resp. latter) are instantaneous (resp. take some time). Whether implicit or explicit, the gate delay property still exists, and has an impact on evaluation of larger combinatorial designs:

Example 2.16. Consider Figure 2.19a, which shows the implementation of an XOR gate (using, so derived from, NOT, AND and OR gates). If we take a static approach to evaluating the output using the inputs, it is reasonable that by setting x = 0 and y = 1 we get

x = 0
y = 1
t0 = ¬x = 1
t1 = ¬y = 0
t2 = t0 ∧ y = 1
t3 = t1 ∧ x = 0
r = t2 ∨ t3 = 1

However, this ignores the impact of delay on the evaluation process; if we take a dynamic approach and imagine the delay of

1. a NOT gate is 10ns,

2. an AND gate is 20ns, and

3. an OR gate is 20ns,

this changes matters. Imagine we toggle the inputs from x = 0, y = 1 to x = 1, y = 1; immediately we introduce time, in the sense we have introduced previous values of x and y rather than just current values. An illustration of the gate behaviour is given in Figure 2.19b, however simplistic. The waveform starts when the gate is in the




correct state given the inputs x = 0, y = 1, after which the inputs are toggled to x = 1, y = 1 (at 0ns). Notice that the result is not valid immediately. In particular, we can examine points in the waveform and show that the final and intermediate results are actually incorrect. For example, it takes 10ns before either NOT gate produces the correct output on t0 and t1; the result r remains incorrect until 50ns; gate delay has caused a gap between the inputs being toggled, and the output being valid.

To conclude, it is important to stress the central role a critical path has: it is a limiting factor or bound on how quickly some combinatorial logic computes outputs, i.e., it dictates the associated latency. That may not seem important, but obviously we prefer an optimised design that has lower latency; this implies a design challenge, in that we almost always want to minimise the critical path.

Example 2.17. Following the example above, consider Figure 2.19a: this XOR design has a critical path that goes through a NOT gate, then an AND gate, and then an OR gate: the path has a total delay of 50ns. In a way, this formalises what we found above: it took 50ns to get the correct output r from inputs x and y. However, examining the critical path delivers this information with no evaluation; it basically tells us the design can never compute outputs in less time, which of course might imply the system said design is placed in is further limited as a result.
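The critical path calculation in this example can be sketched as a short recursive computation over the XOR design. The sketch is an illustration under stated assumptions: it uses the gate delays imagined in Example 2.16, ignores wire delay entirely, and encodes the design of Figure 2.19a as a hypothetical NETLIST dictionary mapping each wire to the gate that drives it.

```python
# Gate delays in ns, as imagined in Example 2.16 (wire delay is ignored here).
DELAY = {"NOT": 10, "AND": 20, "OR": 20}

# The XOR design: each wire maps to (driving gate type, list of input wires).
NETLIST = {
    "t0": ("NOT", ["x"]),
    "t1": ("NOT", ["y"]),
    "t2": ("AND", ["t0", "y"]),
    "t3": ("AND", ["t1", "x"]),
    "r":  ("OR",  ["t2", "t3"]),
}

def arrival(wire):
    """Worst-case time (in ns) after an input change at which the wire is valid."""
    if wire not in NETLIST:
        return 0                      # a primary input, valid at time 0
    gate, inputs = NETLIST[wire]
    # A gate output is valid one gate delay after its slowest input is valid.
    return DELAY[gate] + max(arrival(i) for i in inputs)

print(arrival("r"))   # delay of the critical path through the design
```

The value for r is 10 + 20 + 20 = 50, i.e., the NOT, then AND, then OR path identified above; no evaluation of actual input values is needed to obtain it.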

2.2.4.2 Glitches as a by-product of delay

Definition 2.22. A glitch is normally defined to describe a (momentary) change wrt. some wire, which may cause a (momentarily) invalid or incorrect output if used as an input to some gate; the cause is typically delay of some sort, e.g., a mismatch in when two gate inputs become valid.

Example 2.18. Consider Figure 2.20, wherein the two AND gate inputs are forced to be valid at different times due to imbalanced delay: it clearly takes longer for the value of x to propagate through the NOT gate than directly along the wire. The net result is that if we toggle x = 0 to x = 1 then back again, we produce a short glitch, i.e.,

[Waveform: signals x, t, and r over the interval 0ns to 60ns]

matching the NOT gate delay.

2.2.4.3 On the sanity of buffer gates

Figure 2.13 included a so-called buffer gate, whose function can be described as r = x: no computation is performed per se, because the output matches the input. As such, it is reasonable to question the purpose of such a gate; we could eliminate it (or just replace it with a wire) and produce an equivalent result. It turns out the buffer can be used in two somewhat subtle roles:

1. Although the functionality of a buffer is r = x, there is still some associated gate delay (roughly equivalent to a NOT gate); it can thus be used to equalise the delay through different paths in some combinatorial logic, and thus help solve the glitch problem outlined above. Within Figure 2.19a, for example, one can imagine adding a buffer between y and the second input to the top AND gate; this would ensure that ¬x and the buffered version of y arrive at the inputs to said gate at the same time.

2. Recall that the output of each MOSFET-based gate was formed by conditionally connecting Vdd or Vss to r; the inputs, e.g., x and y, simply control which connection was made. This is important, because it implies that even if the inputs are in some way “weak” then the output will be amplified, so equal to the “strong” levels Vdd or Vss. A buffer can therefore be viewed as a way to get r, an identical but amplified version of x.

Neither of these facts is particularly important within the remit of what we cover, but it is nonetheless important to keep them in mind if you see buffer gates in designs elsewhere.

2.2.4.4 Fan-in and fan-out

The terms fan-in and fan-out refer to properties of logic gates associated with their inputs and outputs:

Definition 2.23. Consider a given logic gate:


• The term fan-in is used to describe the number of inputs to a given gate.

• The term fan-out is used to describe the number of inputs (so in a rough sense the number of other gates) the output of a given gate is connected to.

The former is easier to explain: it is just a way to formalise the fact that, wlog. a 2-input AND gate that computes r = x ∧ y has fan-in of 2, whereas a 3-input AND gate that computes r = x ∧ y ∧ z has fan-in of 3. A gate with higher fan-in will typically switch more slowly than a gate with lower fan-in; this stems from the fact the larger number of inputs are processed using a more complex internal organisation of transistors.

The latter is still easy to explain, but harder to justify as important. The idea is that, ideally, we are free to connect the output of a given source gate to the inputs of say m other target gates; in practice, however, there is a limit on m. It stems from increased load on the source gate, and so longer propagation delay: it basically takes longer for the driving voltage to meet the required threshold. In addition, a transistor is limited wrt. the current driven through it before it will malfunction in some way; if the fan-out requires this to be exceeded, then the under-supplied source gate will fail somehow. So, in a sense, fan-out is an intrinsic versus extrinsic implication of propagation delay (where the latter simply delays computation in some sense, the former disrupts it). For example, consider the contrived design in Figure 2.21: the source AND gate on the left-hand side is used to drive m other target AND gates to the right. Unless the source gate drives enough current onto its output, it may malfunction because the target gates will not receive enough of a share to operate correctly. The implementation of each gate will be rated wrt. fan-out, which essentially says how many is too many, i.e., the number of target gates which can be safely connected to a source gate; CMOS-based gates have quite a high fan-out rating, perhaps 100 target gates or more can be connected to a single source.

2.2.4.5 3-state logic

In a sense, fan-out constrains m, the number of target gates we might connect to n = 1 source gate. But what about n, and, in particular, what happens if we drive any number of target gates with the output from n = 0 source gates (i.e., none), or n > 1 source gates (e.g., by two rather than one)?

Suspend disbelief for a moment and assume these cases could be of use in some way; hopefully it is obvious that neither is likely to yield the outcome we want, or indeed can reason precisely about. In the first case, the input is neither 0 nor 1 so it is unclear what the output will be. Perhaps the only caveat to this is where one input alone can dictate the output; reconsider Figure 2.10b for example, which implements a NAND gate and so computes r = ¬(x ∧ y). The truth table for NAND suggests if y = 0 then r = 1 irrespective of x: this reasoning is validated by the implementation, since if y = 0 one P-MOSFET will always connect Vdd to r irrespective of the other. This aside, however, in general if an input is not a Boolean value then it remains unlikely we get the Boolean-like behaviour intended. In the same way, in the second case we basically join n outputs together: this is more dangerous, because both drive current along the wire. The outcome depends on a number of factors, but is, again, normally not a positive one wrt. the behaviour we want.

We can mitigate this issue by extending the idea of 2-state, Boolean logic values into 3-state logic. There are two main ideas:

1. We introduce a new logic value, hence the name 3-state, called Z or high impedance; the easiest way to think about this value is as representing a null, or disconnected value that can be safely “overpowered” by any other value (i.e., 0 or 1).

2. We introduce a new logic gate, a so-called enable gate, which is essentially just a switch implemented using a single transistor, i.e.,

[Symbol: an enable gate with input x, control input en, and output r]

The associated truth table accommodates the high impedance value as follows:

Enable
x   en  r
0   0   Z
1   0   Z
Z   0   Z
0   1   0
1   1   1
Z   1   Z
0   Z   Z
1   Z   Z
Z   Z   Z

In combination, these steps allow us to cope with both cases above. The first case is now less of an issue: we still might not get the behaviour we wanted, but at least we can reason about it. In the second case, we can use the enable gate to allow conditional access to a shared wire: if en = 0 the output is Z so not driven, meaning another driver could be safely connected to and use the same wire. However, when en = 1 the output is x; nothing else should be driving a value along this wire or we are back to the situation which caused the original problem.
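The enable gate and its use on a shared wire can be sketched in C. This is a minimal sketch under stated assumptions: the type tri_t and the functions enable and share are illustrative names, and share is a hypothetical model of two drivers on one wire (it is not described in the text beyond the idea that Z can be “overpowered”).

```c
#include <assert.h>

/* A 3-state logic value: 0, 1, or Z (high impedance). */
typedef enum { L0, L1, LZ } tri_t;

/* Enable gate, following the truth table: r = x if en = 1, otherwise Z. */
static tri_t enable(tri_t x, tri_t en) {
  return (en == L1) ? x : LZ;
}

/* Hypothetical model of two drivers sharing a wire: Z is overpowered by
   any other value; two non-Z drivers are flagged via *conflict, in which
   case the returned value should not be trusted. */
static tri_t share(tri_t a, tri_t b, int *conflict) {
  *conflict = (a != LZ && b != LZ);
  return (a == LZ) ? b : a;
}

static void check_enable(void) {
  int conflict;
  tri_t r;

  assert(enable(L1, L0) == LZ);
  assert(enable(L1, L1) == L1);
  assert(enable(L0, LZ) == LZ);

  /* One driver enabled, the other disabled: the enabled value wins. */
  r = share(enable(L1, L1), enable(L0, L0), &conflict);
  assert(r == L1);
  assert(!conflict);

  /* Both drivers enabled: the situation the enable gate should prevent. */
  r = share(enable(L1, L1), enable(L0, L1), &conflict);
  (void)r;
  assert(conflict);
}
```

The conflict flag stands in for the physical problem of two gates driving current along the same wire; a real design avoids it by ensuring at most one en is 1 at a time.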

2.2.4.6 Stable, unstable, and meta-stable states

Definition 2.24. Consider a component with a given output: the output (or component) can be said to be in

• a stable state if the output is predictable, i.e., is either 0 or 1, whereas

• an unstable state if the output is unpredictable, e.g., may be 0, 1, a voltage level between the thresholds for either, or oscillate between the two somehow.

Definition 2.25. A meta-stable state is an unstable state, which, after some period of time, will resolve to some stable state: the output eventually settles to either a 0 or 1 (i.e., becomes stable), but we cannot predict which or when.

Instances of instability typically stem from some form of logical inconsistency in a design, and, in the case of meta-stability, are only ever resolved due to physical characteristics of the implementation (e.g., strength of transistors).

Example 2.19. Consider the following example

[Circuit: a NOT gate whose output r is connected back to its input x]

which could be captured using the (logically inconsistent) expression x = r = ¬x. Clearly there is a problem, because if x = 0 it should be 1 due to the NOT gate, and if x = 1 it should be 0; as a result, the output r will be unstable and oscillate somehow (potentially at a rate that is related to the gate delay involved).

2.2.5 Building block components

We have already seen it is convenient to design combinatorial logic using logic gates rather than transistors; in short, this allows a higher level of abstraction. In the same way, it may be convenient to design larger, more complex combinatorial logic components using smaller, less complex combinatorial logic components. The latter are, in a sense, just standard building blocks that are useful when designing the former. Where appropriate, they allow us to decompose a larger component into smaller components; this is often attractive, in that designing the larger component within one, monolithic task is often a lot more difficult.

Without a context, it is easy to look at the building blocks we cover in the following and deem them odd or even useless. Keep in mind that each one is covered specifically because it is useful; think of them as a way to practice the techniques developed so far, and believe we will make use of them later (e.g., in Chapter 4).

Figure 2.22: An overview of 2-input (resp. 2-output), 1-bit multiplexer (resp. demultiplexer) cells.

(a) The multiplexer as a symbol, with inputs x and y, control signal c, and output r.

(b) The multiplexer as a truth table.

MUX2
c   x   y   r
0   0   ?   0
0   1   ?   1
1   ?   0   0
1   ?   1   1

(c) The multiplexer as a circuit.

(d) The demultiplexer as a symbol, with input x, control signal c, and outputs r0 and r1.

(e) The demultiplexer as a truth table.

DEMUX2
c   x   r1  r0
0   0   ?   0
0   1   ?   1
1   0   0   ?
1   1   1   ?

(f) The demultiplexer as a circuit.

2.2.5.1 Components for choosing between options

The idea of choice is crucial in constructing larger components: often we want to control the component, for example making it operate differently depending on some input. The idea is that

1. a multiplexer

• has m inputs,

• has 1 output,

• uses a ⌈log2(m)⌉-bit control signal input to choose which input is connected to the output,

while

2. a demultiplexer

• has 1 input,

• has m outputs,

• uses a ⌈log2(m)⌉-bit control signal input to choose which output is connected to the input,

noting that each input and output is n-bit. We can describe how the components behave using C as an analogy. For example, ignoring the number of bits in each input, output and control signal, the statement

  switch( c ) {
    case 0 : r = w; break;
    case 1 : r = x; break;
    case 2 : r = y; break;
    case 3 : r = z; break;
  }

acts similarly to a 4-input multiplexer: depending on the control signal c, one of the inputs (i.e., w, x, y, or z) is assigned to the output (i.e., r). Likewise,

  switch( c ) {
    case 0 : r0 = x; break;
    case 1 : r1 = x; break;
    case 2 : r2 = x; break;
    case 3 : r3 = x; break;
  }

acts similarly to a 4-output demultiplexer: depending on the control signal c, one of the outputs (i.e., r0, r1, r2, or r3) is assigned from the input (i.e., x). Although attractive, using such an analogy needs care. In particular, keep in mind the C fragments include an implicit, discrete order wrt. the assignments. In contrast, the component design means an analogous connection is evaluated in a continuous manner: whenever either the control signal or any input changes, the output may change to match.

Figure 2.23: Application of the isolated and cascaded replication design patterns. (a) A 2-input, 4-bit multiplexer. (b) A 4-input, 1-bit multiplexer.

This behaviour stems from a design based on combinatorial logic, which is easy to develop for both components; in a similar way to before, we write down a truth table that describes the behaviour we require, then derive a Boolean expression to implement that behaviour:

Example 2.20. Consider the case of a 2-input, 1-bit multiplexer, a truth table for which is outlined in Figure 2.22b. The idea is we have two 1-bit inputs x and y, and one 1-bit control signal c; we want to drive r with either x or y depending on whether c = 0 or c = 1. The truth table should make sense in that when c = 0 the output r matches x, and when c = 1 the output r matches y; the don’t care entries, and so truth table as a whole, can be read as “if c = 0 then r = x irrespective of y, whereas if c = 1 then r = y irrespective of x”. From the truth table, we can arrive at the expression

r = ( ¬c ∧ x ) ∨ ( c ∧ y )

which is shown diagrammatically in Figure 2.22c.
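The expression in Example 2.20 can be checked against the truth table with a short C sketch. The function names mux2 and check_mux2 are illustrative; each argument is assumed to be 0 or 1.

```c
#include <assert.h>

/* 2-input, 1-bit multiplexer: r = (NOT c AND x) OR (c AND y).
   Each argument is expected to be 0 or 1. */
static int mux2(int c, int x, int y) {
  return ((!c) & x) | (c & y);
}

/* Exhaustive check against the MUX2 truth table: with c = 0 the output
   matches x irrespective of y; with c = 1 it matches y irrespective of x. */
static void check_mux2(void) {
  for (int x = 0; x <= 1; x++) {
    for (int y = 0; y <= 1; y++) {
      assert(mux2(0, x, y) == x);
      assert(mux2(1, x, y) == y);
    }
  }
}
```

Unlike the switch statement used as an analogy earlier, this function mirrors the gate-level structure directly: one NOT, two AND, and one OR operation.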

Example 2.21. Consider the case of a 2-output, 1-bit demultiplexer, a truth table for which is outlined in Figure 2.22e. The idea is we have two 1-bit outputs r0 and r1, and one 1-bit control signal c; we want to drive either r0 or r1 with x depending on whether c = 0 or c = 1. The truth table should make sense in that when c = 0 the output r0 matches x, and when c = 1 the output r1 matches x; the don’t care entries, and so truth table as a whole, can be read as “if c = 0 then r0 = x and r1 is irrelevant, whereas if c = 1 then r1 = x and r0 is irrelevant”. From the truth table, we can derive the expressions

r0 = ¬c ∧ x
r1 = c ∧ x

shown diagrammatically in Figure 2.22f.
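A matching C sketch of the demultiplexer expressions follows. The names demux2 and check_demux2 are illustrative; note that this particular pair of expressions resolves each don’t care entry to 0, which is one valid choice among several.

```c
#include <assert.h>

/* 2-output, 1-bit demultiplexer: r0 = NOT c AND x, r1 = c AND x.
   Each input is expected to be 0 or 1. */
static void demux2(int c, int x, int *r0, int *r1) {
  *r0 = (!c) & x;
  *r1 = c & x;
}

/* Exhaustive check against the DEMUX2 truth table. */
static void check_demux2(void) {
  for (int x = 0; x <= 1; x++) {
    int r0, r1;

    demux2(0, x, &r0, &r1);
    assert(r0 == x); /* selected output follows x          */
    assert(r1 == 0); /* don't care entry, resolved to 0    */

    demux2(1, x, &r0, &r1);
    assert(r1 == x);
    assert(r0 == 0);
  }
}
```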

For more general m-input (resp. m-output), n-bit alternatives, we employ the design patterns outlined earlier using the 2-input (resp. 2-output), 1-bit components as a starting point.

Example 2.22. Consider the task of designing a 2-input, n-bit multiplexer, wlog. taking n = 4 as an example. Note that with m = 2 inputs, we need ⌈log2(m)⌉ = 1 control signal: one of 2^1 = 2 possible control signal assignments is used to select each input.

Figure 2.23a illustrates the design, which uses replication. The idea is simple: we use n separate 2-input, 1-bit multiplexers where the i-th instance accepts the i-th bit of each input x and y and produces the i-th bit of the output r. Or, put another way, since each instance is controlled by the same c, they are all either selecting some bit of x or of y to produce r.
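The replication pattern can be sketched in C as a loop over independent 1-bit instances. This is a minimal sketch: the constant N, the array-of-bits representation, and the names mux2 and mux2_nbit are assumptions made for illustration.

```c
#include <assert.h>

enum { N = 4 }; /* illustrative width, matching n = 4 in the example */

/* 2-input, 1-bit multiplexer: r = (NOT c AND x) OR (c AND y). */
static int mux2(int c, int x, int y) {
  return ((!c) & x) | (c & y);
}

/* 2-input, N-bit multiplexer: instance i handles bit i of x and y,
   and every instance shares the same control signal c. */
static void mux2_nbit(int c, const int x[N], const int y[N], int r[N]) {
  for (int i = 0; i < N; i++) {
    r[i] = mux2(c, x[i], y[i]);
  }
}

static void check_mux2_nbit(void) {
  const int x[N] = {1, 0, 1, 0};
  const int y[N] = {0, 0, 1, 1};
  int r[N];

  mux2_nbit(0, x, y, r);
  for (int i = 0; i < N; i++) assert(r[i] == x[i]);

  mux2_nbit(1, x, y, r);
  for (int i = 0; i < N; i++) assert(r[i] == y[i]);
}
```

The loop is only a way of writing down N instances; in the component itself they all operate at the same time rather than one after another.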


Example 2.23. Consider the task of designing an m-input, 1-bit multiplexer, wlog. taking m = 4 as an example. Note that with m = 4 inputs, we need ⌈log2(m)⌉ = 2 control signals: one of 2^2 = 4 possible control signal assignments is used to select each input.

One strategy would be to simply write down a larger truth table, i.e.,

MUX4
c1  c0  w   x   y   z   r
0   0   0   ?   ?   ?   0
0   0   1   ?   ?   ?   1
0   1   ?   0   ?   ?   0
0   1   ?   1   ?   ?   1
1   0   ?   ?   0   ?   0
1   0   ?   ?   1   ?   1
1   1   ?   ?   ?   0   0
1   1   ?   ?   ?   1   1

and then derive a larger Boolean expression

r = ( ¬c0 ∧ ¬c1 ∧ w ) ∨
    (  c0 ∧ ¬c1 ∧ x ) ∨
    ( ¬c0 ∧  c1 ∧ y ) ∨
    (  c0 ∧  c1 ∧ z )

This yields a reasonable result, but as the number of inputs grows the task becomes more difficult. An alternative is to divide-and-conquer, using 2-input, 1-bit multiplexers to decompose the larger decision task into smaller steps. Figure 2.23b illustrates the design, which uses a cascade. The first, left-most layer of multiplexers is controlled by c0: the top-most instance produces w if c0 = 0, or x if c0 = 1, whereas the bottom-most instance produces y if c0 = 0, or z if c0 = 1. These outputs are fed into a second, right-most layer that uses c1 to select appropriately: if c1 = 0 the output of the top-most multiplexer in the first layer is selected, whereas if c1 = 1 the output of the bottom-most multiplexer in the first layer is selected. The overall result r is the same as our dedicated design above, but hopefully it is clear the cascaded design is conceptually a lot simpler.
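The claim that the dedicated and cascaded designs agree can be checked exhaustively. The following C sketch uses illustrative names (mux2, mux4_direct, mux4_cascade, check_mux4) and assumes all signals are 0 or 1.

```c
#include <assert.h>

/* 2-input, 1-bit multiplexer: r = (NOT c AND x) OR (c AND y). */
static int mux2(int c, int x, int y) {
  return ((!c) & x) | (c & y);
}

/* Dedicated design: the larger Boolean expression from the truth table. */
static int mux4_direct(int c1, int c0, int w, int x, int y, int z) {
  return ((!c0) & (!c1) & w) | (c0 & (!c1) & x) |
         ((!c0) & c1 & y) | (c0 & c1 & z);
}

/* Cascaded design: a first layer controlled by c0, a second by c1. */
static int mux4_cascade(int c1, int c0, int w, int x, int y, int z) {
  int top = mux2(c0, w, x);    /* w if c0 = 0, x if c0 = 1 */
  int bottom = mux2(c0, y, z); /* y if c0 = 0, z if c0 = 1 */
  return mux2(c1, top, bottom);
}

/* Try all 2^6 assignments to (c1, c0, w, x, y, z). */
static void check_mux4(void) {
  for (int v = 0; v < 64; v++) {
    int c1 = (v >> 5) & 1, c0 = (v >> 4) & 1;
    int w = (v >> 3) & 1, x = (v >> 2) & 1, y = (v >> 1) & 1, z = v & 1;
    int in[4] = {w, x, y, z};
    int expected = in[2 * c1 + c0];

    assert(mux4_direct(c1, c0, w, x, y, z) == expected);
    assert(mux4_cascade(c1, c0, w, x, y, z) == expected);
  }
}
```

The cascade reuses one small, already-verified component three times, which is why it scales more comfortably than writing out ever larger truth tables.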

2.2.5.2 Components for doing basic arithmetic

Chapter 1 addressed the challenge of representing numbers, integers for example, as n-bit binary sequences; a question left open was how we might do arithmetic with those numbers, or, more precisely, how we might do computation with the associated representations. Since we are now able to design arbitrary Boolean functionality, we can start to investigate this question.

The general, high-level task is to design a large, more complex combinatorial logic component that implements some arithmetic operation (e.g., integer addition): it might accept n-bit inputs that represent x and y, and produce an n-bit result r st.

r = f(x, y) ↦ x + y,

i.e., an r that represents the sum of x and y. The content of Chapter 4 does exactly this. As a means of support, however, a more specific, lower-level first step considers a set of less complex 1-bit building block components: although not so useful alone, they will act as building blocks within the more general alternatives.

Comparators. In contrast to arithmetic proper, where we expect both inputs and output to be numbers, a comparison takes numerical inputs and produces a Boolean output. Various types of comparison are useful, but it is enough to consider two in particular: the others can be derived from these comparators, which deal with 1-bit inputs.

Example 2.24. Given 1-bit inputs x and y, an equality comparator computes

r = { 1 if x = y
    { 0 otherwise

From the associated truth table shown in Figure 2.24b, we can derive the expression

r = ¬(x ⊕ y).

Example 2.25. Given 1-bit inputs x and y, a less than comparator computes

r = { 1 if x < y
    { 0 otherwise

From the associated truth table shown in Figure 2.24e, we can derive the expression

r = ¬x ∧ y.

Figure 2.24: An overview of equality and less than comparators.

(a) The equality comparator as a symbol, with inputs x and y and output r.

(b) The equality comparator as a truth table.

Equal
x   y   r
0   0   1
0   1   0
1   0   0
1   1   1

(c) The equality comparator as a circuit.

(d) The less than comparator as a symbol, with inputs x and y and output r.

(e) The less than comparator as a truth table.

Less-Than
x   y   r
0   0   0
0   1   1
1   0   0
1   1   0

(f) The less than comparator as a circuit.

While fairly self explanatory, the truth tables may seem a little odd as a result of their dealing with 1-bit inputs. However, reading through them row-wise should demonstrate their content is sane: using less than as an example, consider that the truth table mirrors your intuition wrt. this comparison by stating that 0 is not less than 0, 0 is less than 1, 1 is not less than 0, and, finally, 1 is not less than 1. Note that the equality comparator design hints that an inequality comparator can be simpler still: inverting the expression, we find r = x ⊕ y provides an inequality comparison

r = { 1 if x ≠ y
    { 0 otherwise

because, by definition, when x = y (i.e., x = 0 and y = 0 or x = 1 and y = 1) x ⊕ y = 0 and when x ≠ y (i.e., x = 0 and y = 1 or x = 1 and y = 0) x ⊕ y = 1.
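The three comparator expressions can be checked against C's own comparison operators. This is a minimal sketch; the names eq1, lt1, neq1 and check_comparators are illustrative, and inputs are assumed to be 0 or 1.

```c
#include <assert.h>

/* 1-bit equality comparator: r = NOT (x XOR y). */
static int eq1(int x, int y) { return !(x ^ y); }

/* 1-bit less than comparator: r = NOT x AND y. */
static int lt1(int x, int y) { return (!x) & y; }

/* 1-bit inequality comparator: r = x XOR y. */
static int neq1(int x, int y) { return x ^ y; }

/* Compare each expression with the corresponding C operator
   for all four input assignments. */
static void check_comparators(void) {
  for (int x = 0; x <= 1; x++) {
    for (int y = 0; y <= 1; y++) {
      assert(eq1(x, y) == (x == y));
      assert(lt1(x, y) == (x < y));
      assert(neq1(x, y) == (x != y));
    }
  }
}
```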

Adders. The simplest arithmetic operation, conceptually at least, is addition. There are two variants of a 1-bit adder, instances of which will be sufficient to construct larger, n-bit adders later:

Example 2.26. Given 1-bit inputs x and y, a half-adder component computes a 1-bit sum s and carry-out co (i.e., the LSB and MSB of the 2-bit sum x + y), as output. The corresponding truth table shown in Figure 2.25b can be used to derive associated Boolean expressions

co = x ∧ y
s  = x ⊕ y

illustrated in Figure 2.25c.

Example 2.27. Given 1-bit inputs x and y and a carry-in ci, a full-adder component computes a 1-bit sum s and carry-out co (i.e., the LSB and MSB of the 2-bit sum x + y + ci), as output. The corresponding truth table shown in Figure 2.25e can be used to derive associated Boolean expressions

co = (x ∧ y) ∨ (x ∧ ci) ∨ (y ∧ ci)
   = (x ∧ y) ∨ ((x ⊕ y) ∧ ci)
s  = x ⊕ y ⊕ ci

illustrated in Figure 2.25f.

Figure 2.25: An overview of half- and full-adder cells.

(a) The half-adder as a symbol, with inputs x and y and outputs co and s.

(b) The half-adder as a truth table.

Half-Adder
x   y   co  s
0   0   0   0
0   1   0   1
1   0   0   1
1   1   1   0

(c) The half-adder as a circuit.

(d) The full-adder as a symbol, with inputs ci, x and y and outputs co and s.

(e) The full-adder as a truth table.

Full-Adder
ci  x   y   co  s
0   0   0   0   0
0   0   1   0   1
0   1   0   0   1
0   1   1   1   0
1   0   0   0   1
1   0   1   1   0
1   1   0   1   0
1   1   1   1   1

(f) The full-adder as a circuit.

As was the case with comparators, the truth tables may seem a little odd as a result of their dealing with 1-bit inputs; again, reading through them row-wise should demonstrate their content is sane. The half-adder, for example, is st.

• if we add x = 0 to y = 0, the sum is 0 and there is no carry-out,

• if we add x = 0 to y = 1, the sum is 1 and there is no carry-out,

• if we add x = 1 to y = 0, the sum is 1 and there is no carry-out, and finally

• if we add x = 1 to y = 1, the sum is 2: since we cannot represent 2 using a single bit, we set s = 0 and co = 1 to indicate there is a carry-out.

Note that the full-adder design is essentially two half-adders joined in a cascade: to accommodate the extra carry-in the first instance computes t = x + y with the second one then computing s = t + ci. Also, note that the Boolean expressions listed for a full-adder effectively include two (equivalent) options for co. One reason to prefer the second is that given we need to compute both co and s, it contains the shared term x ⊕ y which can be capitalised on during optimisation.
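Both adders, and the cascade of two half-adders, can be sketched in C. The names half_adder, full_adder and check_adders are illustrative. Combining the two half-adder carry-outs with an OR is an assumption on our part; it corresponds to the second expression for co, i.e., (x ∧ y) ∨ ((x ⊕ y) ∧ ci).

```c
#include <assert.h>

/* Half-adder: co = x AND y, s = x XOR y. */
static void half_adder(int x, int y, int *co, int *s) {
  *co = x & y;
  *s = x ^ y;
}

/* Full-adder as two half-adders in a cascade: the first computes
   t = x + y, the second computes s = t + ci; the two carry-outs
   are then combined with an OR gate (assumed, see lead-in). */
static void full_adder(int ci, int x, int y, int *co, int *s) {
  int c_first, c_second, t;
  half_adder(x, y, &c_first, &t);
  half_adder(t, ci, &c_second, s);
  *co = c_first | c_second;
}

/* Check all 8 input assignments: (co, s) must be the 2-bit sum
   x + y + ci, and co must match the first (majority-style) expression. */
static void check_adders(void) {
  for (int v = 0; v < 8; v++) {
    int ci = (v >> 2) & 1, x = (v >> 1) & 1, y = v & 1;
    int co, s;
    full_adder(ci, x, y, &co, &s);
    assert(2 * co + s == x + y + ci);
    assert(co == ((x & y) | (x & ci) | (y & ci)));
  }
}
```

The shared term x ⊕ y appears here as t, which is computed once and used for both s and co.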

As an aside, the half-adder represents a simple enough design to explore the idea of gate universality in (a little) more detail:

Example 2.28. Given the natural half-adder implementation in Figure 2.26a as a starting point, Figure 2.26b describes an equivalent using NAND gates only.

Example 2.29. Given the natural half-adder implementation in Figure 2.26a as a starting point, Figure 2.26c describes an equivalent using NOR gates only.

Focusing on the NAND-based variant, for example, naive translation using identities (annotated using dashed boxes, wrt. the original gates in the natural implementation) yields an implementation with 11 NAND gates. As you might expect, we can improve this with some careful optimisation: writing a ⊼ b for the NAND of a and b, i.e., ¬(a ∧ b), and capitalising on the equivalence

x ⊼ (y ⊼ y) ≡ x ⊼ (x ⊼ y),

we can write

s = x ⊕ y
  = (¬x ∧ y) ∨ (x ∧ ¬y)               (XOR identity)
  = ¬(¬x ∧ y) ⊼ ¬(x ∧ ¬y)             (OR into NAND identity)
  = (¬¬x ∨ ¬y) ⊼ (¬x ∨ ¬¬y)           (de Morgan's)
  = (x ∨ ¬y) ⊼ (¬x ∨ y)               (involution)
  = (¬x ⊼ ¬¬y) ⊼ (¬¬x ⊼ ¬y)           (OR into NAND identity)
  = (¬x ⊼ y) ⊼ (x ⊼ ¬y)               (involution)
  = ((x ⊼ x) ⊼ y) ⊼ (x ⊼ (y ⊼ y))     (NOT into NAND identity)
  = ((x ⊼ y) ⊼ y) ⊼ (x ⊼ (x ⊼ y))

which uses 4 NAND gates due to the shared term x ⊼ y, which is also shared with

co = x ∧ y
   = (x ⊼ y) ⊼ (x ⊼ y)

meaning 5 NAND gates for the whole half-adder (which is roughly the same as the number of non-NAND gates within the natural implementation). There are more direct ways to manipulate the expression for s, but notice that in the above a) steps 1 to 5 yield a result equivalent to Figure 2.26b, b) steps 6 to 7 eliminate any (obviously) redundant NOT gates, and c) step 8 reorganises the gates to maximise shared logic (rather than eliminating any gates outright). Although this is a specific example, these steps demonstrate a general strategy that often has a counter-intuitive impact on any given design: correctly optimised, using NAND (or NOR) often yields a lower (if any) increase in gate count vs. your expectation or an initial translation. Put another way, although they can be harder to work with, they do not imply a less efficient result wrt. area (while also retaining advantages such as regularity).

Figure 2.26: Gate universality used to implement a NAND- and NOR-based half-adder. (a) An expanded half-adder, with XOR in terms of NOT, AND and OR. (b) A half-adder based on NAND gates only. (c) A half-adder based on NOR gates only. Note that the dashed boxes in the NAND and NOR implementations (middle and bottom) are translations of the primitive gates within the more natural description (top).

Figure 2.27: An example encoder/decoder pair.

(a) The encoder as a truth table.

Enc-4-to-2
x3  x2  x1  x0  x′1 x′0
0   0   0   1   0   0
0   0   1   0   0   1
0   1   0   0   1   0
1   0   0   0   1   1

(b) The encoder as a circuit.

(c) The decoder as a truth table.

Dec-2-to-4
x′1 x′0 x3  x2  x1  x0
0   0   0   0   0   1
0   1   0   0   1   0
1   0   0   1   0   0
1   1   1   0   0   0

(d) The decoder as a circuit.
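The optimised, 5-gate NAND-only half-adder can be checked with a short C sketch. The names nand, half_adder_nand and check_half_adder_nand are illustrative; each call to nand stands for one NAND gate, and inputs are assumed to be 0 or 1.

```c
#include <assert.h>

/* One NAND gate: r = NOT (x AND y). */
static int nand(int x, int y) { return !(x & y); }

/* Half-adder using 5 NAND gates in total. */
static void half_adder_nand(int x, int y, int *co, int *s) {
  int t = nand(x, y);                /* gate 1: the shared term      */
  int a = nand(t, y);                /* gate 2: (x NAND y) NAND y    */
  int b = nand(x, t);                /* gate 3: x NAND (x NAND y)    */
  *s = nand(a, b);                   /* gate 4: the sum, x XOR y     */
  *co = nand(t, t);                  /* gate 5: the carry, x AND y   */
}

/* Compare with the natural expressions for all four input assignments. */
static void check_half_adder_nand(void) {
  for (int x = 0; x <= 1; x++) {
    for (int y = 0; y <= 1; y++) {
      int co, s;
      half_adder_nand(x, y, &co, &s);
      assert(s == (x ^ y));
      assert(co == (x & y));
    }
  }
}
```

Computing t once and reusing it three times is the software counterpart of sharing the output of a single gate between several inputs.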

2.2.5.3 Components that translate between representations

Informally at least, encoder and decoder components can be viewed as translators. Consider the communication between two parties (or components) as an example

x (n-bit) → Encoder → x′ (m-bit) → Decoder → x (n-bit)

where the encoder accepts the input x, and encodes it into an x′ then transmitted; the decoder receives x′, and decodes it so as to recover the original x. Phrased as such, both encoder and decoder are basically translating between representations because x′ could be thought of as a different representation of x we normally term a code word.

Definition 2.26. Modelling an encoder and decoder as two functions

Encoder : {0, 1}^n → {0, 1}^m
Decoder : {0, 1}^m → {0, 1}^n

we use the term n-to-m to describe either component where it has n inputs and m outputs:

1. an n-to-m encoder translates an n-bit input value into an m-bit code word, and

2. an m-to-n decoder translates an m-bit code word into an n-bit output value.

Definition 2.27. If for every code word x we have H(x) = 1, i.e., every possible code word has exactly one bit set to 1, we call the associated encoder (resp. decoder) one-hot (or one-of-many).

Definition 2.28. A priority encoder is st. priority (or preference) is given to one input over another. This concept is most obviously useful in a one-hot encoder, allowing it to cope gracefully with erroneous situations where H(x) > 1: the idea is that if xi = 1 and xj = 1, then priority is given to xi say (meaning the fact xj = 1 is ignored).

These formalisms hide various subtleties, most notably the fact that it only makes sense to discuss encoders and decoders in context: both a) the encoding (resp. decoding) scheme and so structure of code words, and b) parameterisation of said scheme (e.g., n and m), are totally domain-specific, meaning we cannot describe a general encoder (resp. decoder) in a sensible manner.

• The n-to-m terminology suggests inputs (resp. outputs) drawn from sets of 2^n (resp. 2^m) values. However, it is clearly possible, and often useful for some code words to remain unused; as such, it can be useful to relax the terminology and think of n-value and m-value sets instead. Note there is no strict requirement that m > n, or vice versa.

• Normally we need to consider the encoder and decoder together, as their behaviour is related: we normally expect

(Decode ∘ Encode)(x) = x,

i.e., Decode = Encode^(−1). This fact implies that it is not always possible to describe a valid decoder (resp. encoder) for a given encoder (resp. decoder): some functions have no inverse. That said, however, some contexts do not need a decoder: the problem at hand may be st. the code word is useful as is, and the corresponding x need not be recovered.

Example 2.30. Consider the task of taking n inputs, say xi for 0 ≤ i < n, and producing an unsigned integer x′ that determines which xi = 1 given that for all j ≠ i, xj = 0. In other words, we want an encoder that takes x and produces some x′ as a result; the task of taking x′ and recovering each xk for 0 ≤ k < n demands a corresponding decoder. This problem might be motivated by a need to control components: if we have n such components in a system, the decoder could, for instance, be used to enable one of them at a time.

By setting n = 4 for example, the encoder (resp. decoder) will have four inputs (resp. outputs) x0, x1, x2, and x3; this implies x′ ∈ {0, 1, 2, 3} and hence m = 2, meaning two outputs (resp. inputs) x′0 and x′1. Figure 2.27a and Figure 2.27c show truth tables for the two components. For the encoder, we derive the Boolean expressions

x′0 = x1 ∨ x3
x′1 = x2 ∨ x3

and for the decoder

x0 = ¬x′0 ∧ ¬x′1
x1 =  x′0 ∧ ¬x′1
x2 = ¬x′0 ∧  x′1
x3 =  x′0 ∧  x′1
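The expectation that decoding an encoded value recovers the original can be checked for this pair with a C sketch. The names enc4to2, dec2to4 and check_codec are illustrative, as is the use of x1p and x0p to stand for x′1 and x′0; inputs are assumed to be one-hot.

```c
#include <assert.h>

/* 4-to-2 encoder: x'0 = x1 OR x3, x'1 = x2 OR x3.
   The input x is assumed to be one-hot. */
static void enc4to2(const int x[4], int *x1p, int *x0p) {
  *x0p = x[1] | x[3];
  *x1p = x[2] | x[3];
}

/* 2-to-4 decoder: exactly one of x0..x3 is set to 1. */
static void dec2to4(int x1p, int x0p, int x[4]) {
  x[0] = (!x0p) & (!x1p);
  x[1] = x0p & (!x1p);
  x[2] = (!x0p) & x1p;
  x[3] = x0p & x1p;
}

/* For each one-hot input: the code word, read as an unsigned integer,
   is the index of the bit that was set, and decoding recovers the input. */
static void check_codec(void) {
  for (int i = 0; i < 4; i++) {
    int x[4] = {0, 0, 0, 0}, y[4];
    int x1p, x0p;
    x[i] = 1;
    enc4to2(x, &x1p, &x0p);
    assert(2 * x1p + x0p == i);
    dec2to4(x1p, x0p, y);
    for (int k = 0; k < 4; k++) assert(y[k] == x[k]);
  }
}
```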

Figure 2.28: An incorrect counter design, using naive “looped” feedback.

Example 2.31. Using the previous example for motivation, imagine we break the rules and set both x1 = 1 and x2 = 1: the encoder fails, producing

x′0 = x1 ∨ x3 = 1
x′1 = x2 ∨ x3 = 1

as the code word (incorrectly suggesting that x3 = 1). To address problems of this sort, we can employ a priority encoder, giving x2 priority over x1 for example (or, more generally, every xi has priority over xj for i > j). To capture this requirement, we rewrite the truth table as follows:

PriorityEnc-4-to-2
x3  x2  x1  x0  x′1 x′0
0   0   0   1   0   0
0   0   1   ?   0   1
0   1   ?   ?   1   0
1   ?   ?   ?   1   1

Take the 2-nd row for example (counting from zero): although potentially x0 = 1 or x1 = 1, the output gives priority to x2. That is, provided that x2 = 1 and x3 = 0 (i.e., irrespective of x0 and x1) the output will be st. x′0 = 0 and x′1 = 1. The associated Boolean expressions are updated accordingly to

x′0 = (x1 ∧ ¬x2 ∧ ¬x3) ∨ x3
x′1 = (x2 ∧ ¬x3) ∨ x3
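The priority encoder expressions can be checked against the intended behaviour, namely that the code word identifies the highest-indexed input that is set. In this C sketch the names prienc4to2 and check_prienc are illustrative, and x1p and x0p stand for x′1 and x′0; the all-zero input is excluded because the truth table does not cover it.

```c
#include <assert.h>

/* 4-to-2 priority encoder:
   x'0 = (x1 AND NOT x2 AND NOT x3) OR x3,
   x'1 = (x2 AND NOT x3) OR x3. */
static void prienc4to2(const int x[4], int *x1p, int *x0p) {
  *x0p = (x[1] & (!x[2]) & (!x[3])) | x[3];
  *x1p = (x[2] & (!x[3])) | x[3];
}

/* For every non-zero input, including those that are not one-hot,
   the code word should be the index of the highest input set to 1. */
static void check_prienc(void) {
  for (int v = 1; v < 16; v++) {
    int x[4] = {v & 1, (v >> 1) & 1, (v >> 2) & 1, (v >> 3) & 1};
    int highest = 3, x1p, x0p;
    while (!x[highest]) highest--;
    prienc4to2(x, &x1p, &x0p);
    assert(2 * x1p + x0p == highest);
  }
}
```

For instance, with x1 = 1 and x2 = 1 the code word is now 2 rather than the misleading 3 produced by the original encoder.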

2.3 Sequential logic

Imagine we are tasked with designing an n-bit counter, i.e., a component whose n-bit output r steps through values 0, 1, . . . , 2^n − 1 and then cycles (i.e., starts from 0 again). Recall that we have a 1-bit full-adder component; Chapter 4 later explains how to extend this into an adder that can compute x + y for n-bit x and y, the basic idea being to organise n full-adder instances in a cascade where the carry-out of each i-th instance connects to the carry-in of the next, (i + 1)-th instance. A natural attempt at the counter design might therefore use such an adder, computing r ← r + 1 as illustrated in Figure 2.28: we essentially set one input of the adder to y = 1 and the other to x = r, suggesting the adder will compute r + 1. This might sound like a reasonable approach in theory, but has (at least) two practical flaws:

1. we cannot initialise the value, and

2. we do not let the output of each full-adder settle before it is used again as an input: they are computed continuously (because there is a loop, from x through the full-adder to r and so back to x).

So despite the fact it intuitively functions as required, this design is far from ideal and, in fact, invalid. Perhaps the only use it has is to illustrate some fundamental limitations of combinatorial logic. More specifically, we cannot control when a component computes an output (since it does so continuously), nor have it remember said output once produced. We need a different approach, which along with components used to support it, is termed sequential logic: we need

• some way to control (e.g., synchronise) components,

• one or more components that remember what state they are in, and

• a mechanism to perform computation as a sequence of steps, rather than continuously,

which are addressed step-by-step in the following Sections.

Figure 2.29: An illustration of standard features in 1- and 2-phase clocks. (a) A 1-phase clock, annotated with the positive level, negative level, positive edge, negative edge, and clock cycle. (b) A 2-phase clock, with signals Φ1 and Φ2 and intervals δ1, δ2, δ3 and δ4.

2.3.1 Clocks

If we want to perform computation as a sequence of steps, we need to exert control over the componentsinvolved: for example, it could be important to synchronise each component st. they all start (or stop)computation at the same time. We use a special control signal to do this:

Definition 2.29. A clock signal is simply a digital signal that oscillates between 0 and 1 in a regular fashion.

Note that despite the terminology, in the context of digital logic a clock is somewhat analogous to a metronome:rather than tracking the (wall-clock) time, for example, it simply produces a regular series of “ticks” (or features)that are used for synchronise associated actions.

Clock features Since a clock signal is a digital signal, it shares features such as positive and negative edgesand levels as previously outlined within Chapter 1 and now by Figure 2.29a. That said, however, severalspecific features are also important:

Definition 2.30. The interval between a given positive (resp. negative) edge and the next positive (resp. negative) edge is termed a clock cycle. Additional terms you commonly encounter stem from this definition: for example, the clock period is the time taken for a clock cycle to complete, while the clock frequency (or clock rate) is the number of clock cycles completed in a unit of time (typically each second, and hence the inverse of the clock period).

Definition 2.31. The time the clock signal spends at positive and negative levels need not be equal; the term duty cycle is used to describe the fraction of the clock period spent at a positive level. A clock will typically have a duty cycle of a half, meaning the signal is at a positive level (literally “doing its duty”) for the same time it is at a negative level, but clearly other ratios are valid.

These features are harnessed by a clocking strategy, which is a formal way of saying “a way of using the clock signal”. For example, we might use a clock edge to trigger the start of some computation, or a clock level to enable or disable (e.g., reset) some computation.

Clock generation In a sense, any signal could be deemed a clock signal provided it satisfies the definition(s) above. However, in practice there is a set of distinguished clock signals generated by a) an external or b) an internal clock source (or clock generator) component.

Example 2.32. An external clock source is commonly provided using a piezoelectric crystal. When a voltage is applied across such a component, it will oscillate according to a natural frequency (related to the physical characteristics of the crystal); roughly speaking, one can use the resulting electrical field generated by this oscillation as a clock signal.

Definition 2.32. It can be useful to manipulate a given clock signal, in order to alter it wrt. features such as frequency; this is common whenever the clock signal is provided as an input to a design, but the design has specific internal requirements. In this context, the original and manipulated cases are sometimes termed the reference clock signal and derived clock signal.




Increasing the frequency of, i.e., multiplying, a reference clock is possible but somewhat beyond our scope; dedicated designs exist to solve this problem, but we omit a detailed overview. Decreasing the frequency of, i.e., dividing, a reference clock is much easier. Imagine that each positive edge of the reference clock clk causes a counter c to be incremented: assuming c = 0 initially, the individual bits of c can be visualised as

[Timing diagram: clk and the counter bits c0, c1 and c2, as c steps through the values 0, 1, 2, 3, 4 and 5.]

Notice that each successive bit of c models clk, but with a period that is twice as long: formally, the (i − 1)-th bit of the counter c acts like clk divided by 2^i. This means, for example, that if i = 1 we can extract a clock signal with 1/2^i = 1/2^1 = 1/2 times (i.e., half) the frequency via the 0-th bit of c.
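The division described above can be sketched in Python. This is a behavioural model rather than a circuit: the names divide_clock and period are hypothetical, the reference clock is represented as a list of sampled levels, and we assume clk was at 0 before the first sample.

```python
def divide_clock(clk, n_bits):
    """Return one list of samples per counter bit, given samples of clk.

    The counter c starts at 0 and is incremented on each positive edge
    (a 0 to 1 transition) of clk.
    """
    c = 0
    prev = 0  # assumption: clk was 0 before the first sample
    bits = [[] for _ in range(n_bits)]
    for level in clk:
        if prev == 0 and level == 1:  # positive edge on clk
            c = (c + 1) % (1 << n_bits)
        prev = level
        for i in range(n_bits):
            bits[i].append((c >> i) & 1)
    return bits


def period(samples):
    """Number of samples between the first two positive edges."""
    edges = [t for t in range(1, len(samples))
             if samples[t - 1] == 0 and samples[t] == 1]
    return edges[1] - edges[0]


# A reference clock with a period of 2 samples, observed for 16 cycles.
clk = [0, 1] * 16
c0, c1, c2 = divide_clock(clk, 3)

# Bit (i - 1) of c has a period 2^i times that of clk.
ratios = [period(b) // period(clk) for b in (c0, c1, c2)]
```

Here ratios lists how many reference clock periods fit into one period of c0, c1 and c2 respectively.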

Clock distribution

Definition 2.33. As with the power rails, a given clock signal must be distributed (or supplied) to each component that makes use of it; a clock distribution network is tasked with doing so.

Definition 2.34. The term clock skew describes a phenomenon whereby a clock signal arrives at one component along a different path to another, and so at a different time; this suggests the two components are unsynchronised.

Example 2.33. Example clock distribution network topologies include the H-tree, which is a form of space-filling curve. The advantage of an H-tree is that the wire delay from the clock generator to each target component is uniform: this helps minimise the potential for clock skew.

Definition 2.35. The term clock domain defines the influence of control exerted by a specific clock signal; every component in a given clock domain is controlled by the same clock signal.

It is hard(er) to reason about the relationship between the features in different clock signals that imply different clock domains. This means, for example, that a) synchronising, and/or b) communicating values between components in two, different clock domains is harder than if the same components are in the one, single clock domain: intuitively, for example, it is hard to tell when positive edges on said clocks may occur at the same time (and so synchronise the components, say). As a result, points of interaction between (i.e., at the boundary of) clock domains (e.g., so-called clock domain crossings) demand careful attention.

From 1-phase to n-phase clocks Although it is easiest to think of a single clock signal, as illustrated in Figure 2.29a, more complicated arrangements are both possible and useful. A central example is the concept of an n-phase clock, which sees the clock distributed as n separate signals along n separate wires.

A common (see footnote 4) instance of this general concept is a 2-phase clock: the idea is that the clock is represented by two signals, often labelled Φ1 and Φ2. Figure 2.29b shows how the signals behave relative to each other. Note that features within a 1-phase clock, e.g., the clock period, levels and edges, translate naturally to both Φ1 and Φ2. However, notice the additional guarantee that their positive levels are non-overlapping: while Φ1 is at a positive level, Φ2 is always at a negative level and vice versa. This behaviour is controlled by four parameters

• δ1 is the period between a negative edge on Φ2 and a positive edge on Φ1,

• δ2 is the period between a positive edge on Φ1 and a negative edge on Φ1,

• δ3 is the period between a negative edge on Φ1 and a positive edge on Φ2, and

• δ4 is the period between a positive edge on Φ2 and a negative edge on Φ2.

Adjusting these parameters will shorten or elongate the period of Φ1 and/or Φ2, or the “gaps” between them, but the central principle of their being non-overlapping is maintained.
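The role of the four parameters can be illustrated with a short Python sketch. It is only a model under stated assumptions: the function name two_phase_clock is hypothetical, time is divided into integer units, and each signal is sampled once per unit.

```python
def two_phase_clock(d1, d2, d3, d4, cycles):
    """Sample Φ1 and Φ2 once per time unit, for a number of clock cycles.

    Each cycle consists of d1 units with both signals at 0, d2 units
    with Φ1 at 1, d3 units with both at 0, then d4 units with Φ2 at 1.
    """
    phi1, phi2 = [], []
    for _ in range(cycles):
        phi1 += [0] * d1 + [1] * d2 + [0] * d3 + [0] * d4
        phi2 += [0] * d1 + [0] * d2 + [0] * d3 + [1] * d4
    return phi1, phi2


phi1, phi2 = two_phase_clock(d1=1, d2=3, d3=1, d4=3, cycles=2)

# The positive levels never overlap, whatever values are chosen.
non_overlapping = all(not (a and b) for a, b in zip(phi1, phi2))
```

Changing d2 or d4 stretches the positive level of Φ1 or Φ2, while d1 and d3 set the gaps between them; non_overlapping holds by construction in every case.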

4 Based admittedly on limited experience, it seems that relatively few textbooks cover both 1- and 2-phase clocking strategies: in some ways this is a pity, since the use of 2-phase clocks is certainly simpler given the requirement for latches rather than flip-flops. If you want an alternative overview, then [11, Section 5] offers an example.




2.3.2 Latches, flip-flops and registers

Our second requirement is a component which remembers what state it is in, which is to say it stores a value (state and value are used synonymously). Put more formally, it should retain some current state Q (which can also be read as an output), and allow update to some next state Q′ (which is provided as an input, meaning we basically store the input value).

Definition 2.36. A stateful component can be classified as being

1. an astable, where the component is not stable in either state and flips uncontrolled between states,

2. a monostable, where the component is stable in one state and flips uncontrolled but periodically between states, or

3. a bistable, where the component is stable in two states and flips between states under control of an input.

The third class, of bistables, is often the most useful, and our focus here, since a bistable changes state only under control of an input.

Definition 2.37. Given a suitable bistable component controlled using an enable signal en that determines when updates happen, we say it can be

1. level-triggered, i.e., updated by a given level on en, or

2. edge-triggered, i.e., updated by a given edge on en.

The former type is typically termed a latch, with the latter termed a flip-flop.

Latches are sometimes described as transparent: this term refers to the fact that while enabled, their input and output will match since the state (which matches the output) is being updated with the input. This is not the case with flip-flops, because their state is only updated at the exact instant of an edge.

Definition 2.38. Whether a positive or negative level (resp. edge) of some signal controls the component depends on whether it is active high or active low; a signal en is often written ¬en to denote the latter case.

Definition 2.39. It is common for a given latch or flip-flop design to include additional control signals; an important example is a reset signal rst, that is often included to allow (re)initialisation of a design.

Definition 2.40. When used as a verb rather than a noun (cf. logic gate), gate means to conditionally turn off some component or feature.

Example 2.34. Consider a component whose 1-bit input x is gated by AND’ing it with a control signal g: the input provided to the component is

x′ = g ∧ x.

We say g gates x because if g = 0 then x′ = g ∧ x = 0 ∧ x = 0: whatever the value of x, the component gets x′ = 0 as input if g = 0, hence x has been “turned off” by g. In contrast, if g = 1 then x′ = g ∧ x = 1 ∧ x = x: the component gets x′ = x as normal if g = 1.
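The same argument can be checked exhaustively with a few lines of Python; the function name gate is simply an illustrative choice, and 1-bit values are modelled as the integers 0 and 1.

```python
def gate(g, x):
    """Return x' = g AND x for 1-bit values g and x."""
    return g & x


# Enumerate all four combinations of g and x.
table = {(g, x): gate(g, x) for g in (0, 1) for x in (0, 1)}

turned_off = all(table[(0, x)] == 0 for x in (0, 1))      # g = 0 forces x' = 0
passed_through = all(table[(1, x)] == x for x in (0, 1))  # g = 1 gives x' = x
```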

Our description of such components has so far been very abstract; the goal in what follows is to remedy this situation. First, we describe the high-level design and behaviour of some latch and flip-flop components. Then we show how this behaviour can be realised, using a lower-level design expressed in terms of logic gates. In combination, we focus specifically on the goal of developing an n-bit register based on D-type latches and/or flip-flops.

2.3.2.1 Common latch and flip-flop types

High-level descriptions of behaviour There are four common, concrete instantiations of the somewhat abstract components described above. That is, we usually rely on four common latch and flip-flop types:

1. An SR-type latch (resp. SR-type flip-flop) component has two inputs S (or set) and R (or reset):

• when enabled and

– S = 0, R = 0 the component retains Q,
– S = 1, R = 0 the component updates to Q = 1,
– S = 0, R = 1 the component updates to Q = 0,
– S = 1, R = 1 the component is meta-stable,

but




• when not enabled, the component is in storage mode and retains Q.

The corresponding behaviour is described as follows:

SR-Latch/SR-FlipFlop
Current | Next
S R Q ¬Q | Q′ ¬Q′
0 0 0 1 | 0 1
0 0 1 0 | 1 0
0 1 ? ? | 0 1
1 0 ? ? | 1 0
1 1 ? ? | ? ?

2. A D-type latch or “data latch” (resp. D-type flip-flop) component has one input D:


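• when enabled and

– D = 1 the component updates to Q = 1,
– D = 0 the component updates to Q = 0,

but

• when not enabled, the component is in storage mode and retains Q.

The corresponding behaviour is described as follows: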



D-Latch/D-FlipFlop
Current | Next
D Q ¬Q | Q′ ¬Q′
0 ? ? | 0 1
1 ? ? | 1 0

3. A JK-type latch (resp. JK-type flip-flop) component has two inputs J (or set) and K (or reset):


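• when enabled and

– J = 0, K = 0 the component retains Q,
– J = 1, K = 0 the component updates to Q = 1,
– J = 0, K = 1 the component updates to Q = 0,
– J = 1, K = 1 the component toggles Q,

but

• when not enabled, the component is in storage mode and retains Q.

The corresponding behaviour is described as follows: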



JK-Latch/JK-FlipFlop
Current | Next
J K Q ¬Q | Q′ ¬Q′
0 0 0 1 | 0 1
0 0 1 0 | 1 0
0 1 ? ? | 0 1
1 0 ? ? | 1 0
1 1 0 1 | 1 0
1 1 1 0 | 0 1

4. A T-type latch or “toggle latch” (resp. T-type flip-flop) component has one input T:


• when enabled and

– T = 0 the component retains Q,
– T = 1 the component toggles Q,

but

• when not enabled, the component is in storage mode and retains Q.





Figure 2.30: Symbolic descriptions of D-type latch and flip-flop components (note the triangle annotation around en): (a) a level-triggered, D-type latch; (b) an edge-triggered, D-type flip-flop. Each symbol has inputs en and D, and outputs Q and ¬Q.


The corresponding behaviour of the T-type component is described as follows:

T-Latch/T-FlipFlop
Current | Next
T Q ¬Q | Q′ ¬Q′
0 0 1 | 0 1
0 1 0 | 1 0
1 0 1 | 1 0
1 1 0 | 0 1

It is useful to look in more detail at the D-type component, since this will help explain the basic concepts. The component has

• in addition to the enable signal en, one input called D, and

• two outputs, Q and ¬Q; we can ignore ¬Q usually, but note that it should always be the inverse of Q.

The truth table that describes the behaviour is split into two halves, which is unlike what we have seen previously: the left-hand half is a description of the current state, the right-hand a description of the next state, i.e., after we perform an update. Sometimes this is termed an excitation table to distinguish it from a standard truth table. So, for example, the first row can be read as “if D = 0, then no matter what the current state is, the next state should be Q = 0”, and the second row can be read as “if D = 1, then no matter what the current state is, the next state should be Q = 1”. Put another way, this component works as required: we can either update it to store D when enabled, or operate it in storage mode to retain Q otherwise.

Armed with this knowledge, we can already think about using such components in our designs: we expand on their internal design in the following Sections, but can already use more abstract symbols shown in Figure 2.30 to differentiate between the latch and flip-flop versions. Similar symbols describe components other than the D-type one we have focused on. They typically retain the triangle annotation (or absence thereof) on en, and commonly omit any unused outputs (e.g., ¬Q).
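The four excitation tables can be summarised as next-state functions. The following Python sketch is one way to do so; all of the function names are hypothetical, 1-bit values are modelled as the integers 0 and 1, and the unpredictable SR case (shown as ? in the table) is represented by None.

```python
def sr_next(s, r, q):
    """SR-type: retain, set or reset; S = R = 1 has no defined next state."""
    if s == 1 and r == 1:
        return None  # the '?' entries: the outcome is not predictable
    if s == 1:
        return 1
    if r == 1:
        return 0
    return q


def d_next(d, q):
    """D-type: the next state is D, whatever the current state."""
    return d


def jk_next(j, k, q):
    """JK-type: as SR-type, except J = K = 1 toggles the state."""
    if j == 1 and k == 1:
        return 1 - q
    if j == 1:
        return 1
    if k == 1:
        return 0
    return q


def t_next(t, q):
    """T-type: toggle the state if T = 1, retain it otherwise."""
    return q ^ t


def update(next_state, en, q, *inputs):
    """Apply next_state when enabled; otherwise stay in storage mode."""
    return next_state(*inputs, q) if en == 1 else q
```

For example, update(d_next, 1, 0, 1) models an enabled D-type component with Q = 0 and D = 1, whereas update(d_next, 0, 0, 1) models the same component in storage mode.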

Low(er)-level descriptions of behaviour Still focusing on the D-type component, lower-level use can be illustrated using a timing diagram, which shows the behaviour of the enable signal en (which we assume is active high), the input D and the output Q. For a D-type latch we have something like the following:

[Timing diagram: en, D and Q for a D-type latch, with the points in time t0 to t5 marked.]

The vertical dashed lines highlight important points in time; between t1 and t2, and t3 and t4 for instance, en is at a positive level so the latch state is updated to match D. Otherwise, for example between t0 and t1, en is at a negative level so changes to D do not affect the latch state: the latch is in storage mode, meaning it retains the current state. Swapping to a D-type flip-flop, the behaviour changes:

[Timing diagram: en, D and Q for a D-type flip-flop, with the points in time t0, t1 and t2 marked.]




Now the flip-flop state will be updated to match D only at the points in time where en transitions from 0 to 1; this happens at t0, t1 and t2, meaning interim changes to D have no effect on the flip-flop state.
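The difference between the two behaviours can be reproduced with a small Python sketch. The waveforms below are made up for illustration (they are not those in the diagrams), both functions are hypothetical names, and we assume Q = 0 initially and that en was at 0 before the first sample.

```python
def d_latch_trace(en, d, q0=0):
    """Level-triggered: Q follows D for as long as en is at 1."""
    q, out = q0, []
    for e, x in zip(en, d):
        if e == 1:
            q = x
        out.append(q)
    return out


def d_flip_flop_trace(en, d, q0=0):
    """Edge-triggered: Q is updated with D only when en goes from 0 to 1."""
    q, prev, out = q0, 0, []
    for e, x in zip(en, d):
        if prev == 0 and e == 1:
            q = x
        prev = e
        out.append(q)
    return out


en = [0, 1, 1, 1, 0, 0, 1, 1]
d = [1, 1, 0, 1, 0, 0, 0, 1]

latch_q = d_latch_trace(en, d)          # tracks every change of D while en = 1
flip_flop_q = d_flip_flop_trace(en, d)  # changes only at the two positive edges
```

Comparing latch_q with flip_flop_q shows the latch being transparent during each positive level, while the flip-flop ignores interim changes to D.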

Definition 2.41. Using a component of this type is more difficult in practice than alluded to by these examples. Although we largely ignore them from here on, the following are important:

1. The setup time (resp. hold time) is the minimum period of time that D must be stable before (resp. after) use to update the component.

Think of the clock feature (either level or edge) as triggering the act of sampling from D in order to update the state. As such, the two timing restrictions mentioned make sure the sample is reliable: they specify a window, around the change to en, where D has to be stable for some period of time.

2. The clock-to-output time is an artefact of propagation delay: a delay will exist between the update event being triggered by the associated clock feature, and the output Q changing to match.

These time periods or delays will be determined by the implementation of the component; ideally they will be minimised, which makes the component easier to use (i.e., more tolerant).

Example 2.35. The concepts of setup, hold, and clock-to-output time are illustrated in the following (intentionally exaggerated) waveform relating to a D-type (edge triggered) flip-flop:

[Waveform: en, D and Q for a D-type flip-flop, with the setup time, hold time and clock-to-output time marked.]
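The window described by the setup and hold times can be expressed as a simple check. The Python sketch below is an illustration only: the function names and all of the timing figures are hypothetical, and time is measured in arbitrary integer units.

```python
def violates_setup_or_hold(d_changes, edge, t_setup, t_hold):
    """Return True if D changes strictly inside the sampling window.

    D must be stable from edge - t_setup until edge + t_hold.
    """
    return any(edge - t_setup < t < edge + t_hold for t in d_changes)


def output_valid_at(edge, t_clk_to_q):
    """Earliest time at which Q reflects the value sampled at the edge."""
    return edge + t_clk_to_q


# Hypothetical figures, in arbitrary integer time units.
EDGE, T_SETUP, T_HOLD, T_CLK_TO_Q = 100, 20, 10, 15

stable = violates_setup_or_hold([50, 80], EDGE, T_SETUP, T_HOLD)
setup_violation = violates_setup_or_hold([95], EDGE, T_SETUP, T_HOLD)
hold_violation = violates_setup_or_hold([105], EDGE, T_SETUP, T_HOLD)
q_ready = output_valid_at(EDGE, T_CLK_TO_Q)
```

A change to D at time 80 is acceptable because D is then stable for the whole setup time, whereas changes at 95 or 105 fall inside the window.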

2.3.2.2 Implementation step #1: a basic SR latch

The first step is somewhat counter-intuitive. We start by looking at the Set-Reset or SR latch: the circuit shown in Figure 2.31a has two inputs called S and R which are the set and reset signals, and two outputs Q and ¬Q. Internally, the arrangement will probably seem odd in comparison to other designs we have seen so far: the output of each NOR gate is wired to the input of the other, an arrangement we say is cross-coupled.

Understanding the behaviour of the design as a whole depends on a property of the NOR gates. Recall (e.g., from Figure 2.13) that we can describe NOR using a truth table as follows:

NOR
x y | r
0 0 | 1
0 1 | 0
1 0 | 0
1 1 | 0

In particular, this illustrates the fact that if either x = 1 or y = 1 then the result must be r = ¬(x ∨ y) = 0. Put another way, we can write two axioms

¬(x ∨ 1) = 0
¬(1 ∨ y) = 0

These are important, because they allow us to resolve the loop introduced by the cross-coupled nature of NOR gates in this design. We can see how, on a case-by-case basis, by observing output for each possible assignment: this is shown in Figure 2.32.

• if S = 1, R = 0 (Figure 2.32a) then we force Q = 1, ¬Q = 0 (irrespective of what they were previously) because the top NOR gate must output 0, given that ¬(1 ∨ y) = 0,

• if S = 0, R = 1 (Figure 2.32b) then we force Q = 0, ¬Q = 1 (irrespective of what they were previously) because the bottom NOR gate must output 0, given that ¬(x ∨ 1) = 0,




Figure 2.31: A collection of NOR- and NAND-based SR type latches, with simpler (top) to more complicated (middle and bottom) control features: (a) a NOR-based SR type latch; (b) a NAND-based SR type latch; (c) a NOR-based SR type latch with enable signal; (d) a NAND-based SR type latch with enable signal; (e) a NOR-based SR type latch with enable signal and R = ¬S; (f) a NAND-based SR type latch with enable signal and R = ¬S.

Figure 2.32: A case-by-case overview of NOR-based SR latch behaviour; notice that there are two sane cases for S = 0 and R = 0, and no sane cases for when S = 1 and R = 1: (a) a case for S = 1, R = 0, with Q = 1 and ¬Q = 0; (b) a case for S = 0, R = 1, with Q = 0 and ¬Q = 1; (c) a case for S = 0, R = 0, with Q = 0 and ¬Q = 1; (d) a case for S = 0, R = 0, with Q = 1 and ¬Q = 0; (e) a case for S = 1, R = 1, with both outputs marked ?.




An aside: NAND- rather than NOR-based latches.

As an aside, we can construct (more or less) the same component using NAND rather than NOR gates; the NAND-based versions are shown alongside each of the associated NOR-based Figures. This change implies a subtle difference in behaviour however. Essentially, the storage and meta-stable states are swapped over: when enabled and

• S = 1, R = 1 (rather than S = 0 and R = 0) the component retains Q, and

• S = 0, R = 0 (rather than S = 1 and R = 1) the component is meta-stable.

In addition, the Q and ¬Q outputs from the component swap over as well. In short, the NAND-based version still achieves the same goal, but we need to carefully translate the behaviour when using it within a larger design. It is often termed a ¬S¬R latch rather than an SR latch to highlight this fact, a convention we adopt to avoid confusion about which type of component is meant.

• if S = 0, R = 0 then the outputs are not uniquely defined by the inputs: there are in fact two logically consistent possibilities (Figure 2.32c and Figure 2.32d), namely Q = 1, ¬Q = 0 or Q = 0, ¬Q = 1,

• if S = 1, R = 1 (Figure 2.32e) then we force Q = 0, ¬Q = 0: in a sense this is contradictory, because we expect each to be the inverse of the other, but hints at another problem.

In the final case, the latch could be (and we have) described as being in a meta-stable state because the eventual output is not predictable. An intuitive reading is that it makes no sense to both set and reset the value, so some form of unexpected behaviour for S = R = 1 is therefore not unreasonable. More specifically though, once we return to S = 0, R = 0 the latch must settle in one or other of the two possibilities outlined above: we cannot predict which one, however, so the subsequent state of the latch is essentially random.
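The case analysis above can be reproduced by iterating the two cross-coupled NOR gates until their outputs stop changing. The Python sketch below does this under a strong simplifying assumption, namely that both gates have exactly the same delay and so update simultaneously; the function names and the max_steps limit are hypothetical.

```python
def nor(x, y):
    """1-bit NOR: the output is 1 only if both inputs are 0."""
    return 1 - (x | y)


def sr_latch(s, r, q, nq, max_steps=8):
    """Iterate the cross-coupled NOR gates until the outputs settle.

    Top gate: nq = NOR(s, q); bottom gate: q = NOR(r, nq).  Both gates
    are updated simultaneously (an idealised, equal-delay assumption).
    """
    for _ in range(max_steps):
        new_nq = nor(s, q)
        new_q = nor(r, nq)
        if (new_q, new_nq) == (q, nq):
            return q, nq
        q, nq = new_q, new_nq
    raise RuntimeError("outputs did not settle")


set_state = sr_latch(1, 0, q=0, nq=1)  # S = 1, R = 0 forces Q = 1
held = sr_latch(0, 0, *set_state)      # S = R = 0 retains the state
reset_state = sr_latch(0, 1, *held)    # S = 0, R = 1 forces Q = 0
both = sr_latch(1, 1, *reset_state)    # S = R = 1 forces Q = 0 and ¬Q = 0
```

In this idealised model, returning to S = R = 0 from the Q = ¬Q = 0 state never settles (the outputs flip together forever, so sr_latch raises an error); a physical latch does settle, but into a state that cannot be predicted, which is one way to picture the unpredictability described above.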

Note that in terms of the specified behaviour, the design does what we want. For example, we can set or reset the current state (per Figure 2.32a and Figure 2.32b) or retain the current state (per Figure 2.32c and Figure 2.32d) as need be. However, this high-level description avoids two perfectly reasonable questions, namely

1. how does the latch settle into any state, particularly given the case where S = R = 0 seems to imply there are two options, and

2. how does it stay in one of those states when S = R = 0.

Up to a point, it is reasonable to consider that if the latch settles into one of the two logically consistent states, there is just no motivation for it to subsequently change into the other; therefore, it will retain the same state. To provide greater detail, however, we rely on Figure 2.33. The idea is it decomposes the SR latch design into eight individual transistors (labelled t0 through to t7) which implement the two NOR gates; this annotation is important because it allows a clear explanation of their behaviour.

Question #1: how does the latch settle into a state? You can use a similar reasoning for all four cases, but focus on S = 0 and R = 1 which means

• t0 is a P-MOSFET, so is connected since S = 0,

• t2 is an N-MOSFET, so is disconnected since S = 0,

• t4 is a P-MOSFET, so is disconnected since R = 1, and

• t6 is an N-MOSFET, so is connected since R = 1.

This means r1 = 0 because t6 is connected and t4 is disconnected. Now we can see that

• t1 is a P-MOSFET, so is connected since r1 = 0,

• t3 is an N-MOSFET, so is disconnected since r1 = 0.

This means r0 = 1 because t0 and t1 are connected, while t2 and t3 are disconnected. Finally, we can check for consistency, noting




Figure 2.33: An annotated SR latch, decomposed into two NOR gates and then into transistors (t0 to t3 for the top NOR gate, with inputs S and r1; t4 to t7 for the bottom NOR gate, with inputs R and r0); r0, the output of the top NOR gate, is used as input by the bottom NOR gate and r1, the output from the bottom NOR gate, is used as input by the top NOR gate (although the physical connections are not drawn).

• t5 is a P-MOSFET, so is disconnected since r0 = 1, and

• t7 is an N-MOSFET, so is connected since r0 = 1.

This means r1 = 0 because t6 and t7 are connected, while t4 and t5 are disconnected: we knew that anyway. So, in short, the circuit settles into a stable state even though it might seem the “loop” would prevent it doing so, and is valid in the sense that r0 and r1 (i.e., ¬Q and Q) are each other's inverse as expected.

Question #2: how does the latch remain in a state? Now imagine we flip to S = R = 0, meaning we would like to retain the state fixed above, i.e., keep Q = 0 until we want to update it again. Two transistors change as a result of R changing

• t4 is a P-MOSFET, so is now connected since R = 0,

• t6 is an N-MOSFET, so is now disconnected since R = 0.

However, everything else stays the same, i.e.,

• t5 is a P-MOSFET, so is still disconnected since r0 = 1, meaning that t4 being connected does not connect r1 to Vdd, and

• t7 is an N-MOSFET, so is still connected since r0 = 1, meaning that t6 being disconnected does not disconnect r1 from Vss.

That is, there is no motivation (or physical stimulus) for the transistors to flip into the other stable state (i.e., where S = R = 0 and Q = 1) and so the current state is therefore retained.
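The transistor-level reasoning can be mirrored by modelling each NOR gate as a pull-up and a pull-down network. The Python sketch below assumes, as the discussion of t4 to t7 implies, that the two P-MOSFETs are in series and the two N-MOSFETs are in parallel; the function name nor_cmos is hypothetical.

```python
def nor_cmos(a, b):
    """NOR gate modelled as pull-up and pull-down networks.

    Two P-MOSFETs in series connect the output to Vdd only if a = b = 0;
    two N-MOSFETs in parallel connect it to Vss if a = 1 or b = 1.
    """
    pull_up = (a == 0) and (b == 0)
    pull_down = (a == 1) or (b == 1)
    assert pull_up != pull_down  # exactly one network conducts for 0/1 inputs
    return 1 if pull_up else 0


# Question #1, with S = 0 and R = 1: r1 is forced to 0 whatever r0 is.
s, r = 0, 1
r1_forced = nor_cmos(r, 0) == 0 and nor_cmos(r, 1) == 0
r1 = 0
r0 = nor_cmos(s, r1)                # t0 and t1 connected, so r0 = 1
consistent = nor_cmos(r, r0) == r1  # the bottom gate agrees with r1 = 0

# Question #2, flipping to S = R = 0: the state is retained.
s, r = 0, 0
r1_after = nor_cmos(r, r0)          # r0 = 1 keeps r1 connected to Vss
r0_after = nor_cmos(s, r1_after)
```

After the flip to S = R = 0, r0_after and r1_after match r0 and r1, i.e., the latch still holds Q = 0.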




Figure 2.34: A NOR-based D-type flip-flop created using a glitch generator.

Figure 2.35: A NOR-based D-type flip-flop created using a primary-secondary organisation of latches.

2.3.2.3 Implementation step #2: controlling latch updates

The initial SR latch design is arguably too simple, in the sense it is hard to use. We have little or no control over when an update happens for instance, because any change to S or R might provoke this; it is also unattractive that we can produce unpredictable behaviour by (perhaps unintentionally) driving it with inputs that would cause meta-stability. Fortunately, both of these problems can be solved with only simple alterations to the original design:

1. To control when an update happens, we gate S and R by adding an extra input en and two AND gates: the internal latch inputs become

S′ = S ∧ en
R′ = R ∧ en

When en = 0, S and R are irrelevant: S′ and R′ will always be 0 because, for example, S′ = S ∧ 0 = 0. This means when en = 0 the latch can never be updated. When en = 1 however, S and R are passed through into the latch as input because S′ = S ∧ 1 = S.

Put another way, the result shown in Figure 2.31c is now clearly level-triggered because S and R only matter during a positive level of en. Note that although en can be considered a generic enable signal, we can use a clock signal to provoke regular, synchronised updates.

2. To avoid the situation where S = R = 1, we simply force R = ¬S by inserting a NOT gate between them to disallow the case where S = R; Figure 2.31e shows the result, where the single input is now labelled D. By following the above, the latch inputs become

S′ = D ∧ en
R′ = ¬D ∧ en

This might seem to imply that we cannot put the latch into storage mode any longer. However, remember that when en = 0 we always have S′ = R′ = 0 irrespective of D, so en basically decides if we retain Q (if en = 0) or update it with D (if en = 1).

The result now represents the D-type latch component discussed originally. Reiterating, when enabled and

• D = 1 the component updates to Q = 1,

• D = 0 the component updates to Q = 0,

but when not enabled, the component is in storage mode and retains Q.
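The two alterations can be combined in a gate-level Python sketch of the D-type latch. As before, this is a simplified model: both NOR gates are assumed to update simultaneously, a fixed number of iterations is assumed to be enough for the outputs to settle, and the function names are hypothetical.

```python
def nor(x, y):
    """1-bit NOR: the output is 1 only if both inputs are 0."""
    return 1 - (x | y)


def d_latch(d, en, q):
    """One update of a NOR-based D-type latch currently holding state q."""
    s = d & en        # S' = D AND en
    r = (1 - d) & en  # R' = (NOT D) AND en
    nq = 1 - q
    for _ in range(4):  # let the cross-coupled NOR gates settle
        q, nq = nor(r, nq), nor(s, q)
    return q


# Exhaustive check: Q is updated with D if en = 1, and retained if en = 0.
behaves_as_specified = all(
    d_latch(d, en, q) == (d if en == 1 else q)
    for d in (0, 1) for en in (0, 1) for q in (0, 1)
)
```

Note that S′ = R′ = 1 can never occur here, since D and ¬D are never both 1.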




2.3.2.4 Implementation step #3: from latch to flip-flop

Our next problem is that although the level-triggered D-type latch gives some control, it is not very fine-grained. Put simply, although we restrict updates to a positive (resp. negative, for active low components) level where en = 1 (resp. en = 0), this is potentially a lengthy period of time; the input D may potentially change several times during this period for instance. To give more precise control over updates, we might try to convert the latch into a flip-flop: this means restricting updates to the precise, and hence much smaller, instant in time that an edge on en occurs.

There are various ways to realise this alteration; flip-flop design as a topic is broad enough that it starts to go outside our scope wrt. level of detail. In the following Sections we therefore cover two approaches, both at a somewhat high level.

Using a glitch generator One approach is to construct a circuit that will intentionally generate a glitch (or pulse), i.e., an output whose value will be 1 for a short period of time, namely when en transitions from 0 to 1. The glitch then approximates an edge, even though we are still actually using a level; doing so can be rationalised by noting that as long as the glitch period is short, it will give us finer grained control than the original latch.

Example 2.36. Reconsider Figure 2.20, whereby a glitch is generated (in that case unintentionally) for a (short) period of time when en = 1 and t = 1. We can drive the original D-type latch using such a design: Figure 2.34 illustrates the result, which now approximates a flip-flop due to the approximation of edge-based triggering.

Using a primary-secondary organisation An alternative approach is a primary-secondary (see footnote 5) organisation of two latches, which yields the edge-triggered behaviour we want. Figure 2.35 shows the result, which is basically one latch (the primary, on the left) in series with a second latch (the secondary, on the right). The idea is to split a clock cycle into two half-cycles such that

1. while en = 1, i.e., during the first half-cycle, the primary latch is enabled,

2. while en = 0, i.e., during the second half-cycle, the secondary latch is enabled.

In practical terms, this means while en = 1, i.e., during a positive level on en, the primary latch stores the input. Then, the instant en = 0, i.e., at a negative edge on en, the secondary latch stores the output of the primary latch: you can think of it as triggering a transfer from the primary to the secondary latch, or as the secondary latch only being sensitive to the output of the primary latch rather than the input. The fact that the transfer is instantaneous, in the sense it occurs as the result of an (in this case a negative) edge when en flips from 1 to 0, means we get what we want, i.e., an edge-triggered flip-flop.
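The primary-secondary organisation can be modelled behaviourally with two level-triggered latches. In the Python sketch below the class names are hypothetical, both latches are assumed to hold 0 initially, and the waveforms are made up for illustration.

```python
class DLatch:
    """Level-triggered D-type latch: transparent while en is at 1."""

    def __init__(self):
        self.q = 0  # assumption: the initial state is 0

    def step(self, d, en):
        if en == 1:
            self.q = d
        return self.q


class PrimarySecondaryFlipFlop:
    """Two latches in series, enabled on opposite levels of en."""

    def __init__(self):
        self.primary = DLatch()
        self.secondary = DLatch()

    def step(self, d, en):
        p = self.primary.step(d, en)           # enabled while en = 1
        return self.secondary.step(p, 1 - en)  # enabled while en = 0


ff = PrimarySecondaryFlipFlop()
en = [1, 1, 1, 0, 0, 1, 1, 0]
d = [1, 0, 1, 0, 0, 0, 0, 1]
q = [ff.step(x, e) for x, e in zip(d, en)]
```

In this trace q changes only at the two samples where en has just flipped from 1 to 0, and takes the value D had immediately before that negative edge.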

2.3.2.5 Implementation step #4: an n-bit register

The D-type component we have, either the latch or flip-flop version, holds a 1-bit state; to store a larger, n-bit state we simply group together n such components into a register. This just means replicating the relevant component type, and synchronising updates to them all using the same enable signal.

Figure 2.36 shows the general structure. We can read the current value of the register from the Q outputs: Qi is the current state of the i-th bit held by the register. We can latch (or store) a new value Q′ into the register by driving each Di with Q′i then waiting for an update to be triggered (which depending on the component type, means waiting for an appropriate level or edge on en).
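A behavioural Python sketch of the edge-triggered variant follows. The class and helper names are hypothetical, bits are stored least-significant first, and the register is assumed to hold zero initially (a real register would power on with an undefined value).

```python
class Register:
    """n edge-triggered D-type components sharing one enable signal."""

    def __init__(self, n):
        self.q = [0] * n  # Q_0 ... Q_{n-1}; initial value assumed zero
        self.prev_en = 0  # assumption: en was 0 before the first step

    def step(self, d, en):
        if self.prev_en == 0 and en == 1:  # positive edge on en
            self.q = list(d)               # every bit updates together
        self.prev_en = en
        return list(self.q)


def to_bits(x, n):
    """n-bit representation of x, least-significant bit first."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(b << i for i, b in enumerate(bits))


reg = Register(4)
reg.step(to_bits(11, 4), en=0)
value_before = from_bits(reg.q)  # no edge yet, so the old value remains
reg.step(to_bits(11, 4), en=1)
value_after = from_bits(reg.q)   # positive edge: the new value is stored
reg.step(to_bits(5, 4), en=1)
value_held = from_bits(reg.q)    # en stays at 1, so there is no new edge
```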

2.3.3 Putting everything together: general clocking strategies

2.3.3.1 A robust n-bit counter design

So now think back to our original problem outlined at the beginning of the Section: given everything accumulated so far, how do we solve it? We can use one or other of two designs; both attempt to “break” the loop evident in the original design by inserting storage components, and therefore differ as a result of opting for either flip-flops or latches.

5 Historically, the terms master and slave have often been used in place of primary and secondary. Per [3, Section 1.1], however, and despite some debate, the former are typically viewed as inappropriate now. We deliberately use the latter, therefore, noting that doing so may imply a need to translate the former when aligning with other literature.

Figure 2.36: An n-bit register, with n replicated 1-bit components synchronised using the same enable signal: (a) a level-triggered register based on D-type latches; (b) an edge-triggered register based on D-type flip-flops. In both cases the D inputs are driven by Q′0 to Q′n−1, the Q outputs provide Q0 to Qn−1, and every component shares en.

Example 2.37. Figure 2.37a represents a solution based on use of flip-flops, which implies a 1-phase clocking strategy. The top-half of the Figure shows an n-bit ripple-carry adder; the idea is that it computes r′ ← r + 1. This part is roughly the same as the initial, faulty solution. The bottom-half of the Figure shows an n-bit, edge-triggered register; the idea is that it stores the current value of r. Beyond this, two features of the design are vitally important:

1. Notice that the 1-bit sum produced as output by each full-adder is AND’ed with ¬rst. This acts as a limited reset mechanism, in the sense rst gates the register input (resp. adder output): if rst = 1 (so ¬rst = 0) then the register input will always be zero, whereas if rst = 0 (so ¬rst = 1) then the register input will match the adder output. Put another way, if rst = 1 then the value subsequently latched by the flip-flops is forced to be zero: this is important, because when powered-on the current value will be undefined and hence unusable.

2. Notice that each D-type flip-flop in the register is synchronised by clk (which we assume is a clock): positive edges on clk provoke them to update the stored value r with r′ ← r + 1. The original loop is broken, because the update is instantaneous not continuous: there is a “gap” between computing and storing values, in the sense that the adder has an entire clock cycle to compute the result r + 1 given r is stored in the flip-flops. Provided that the propagation delay associated with the adder is less than the clock period (i.e., we do not update r faster than r′ is computed) the problem is solved and r cycles through the required values in discrete steps controlled by the clock.
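The flip-flop based design can be simulated with a short Python sketch. It is a behavioural model under stated assumptions: the function names are hypothetical, the adder is assumed to settle within one sample (i.e., its propagation delay is shorter than the clock period), and the power-on value, which is really undefined, is arbitrarily modelled as all ones.

```python
def full_adder(x, y, ci):
    """1-bit full-adder: returns the sum and carry-out."""
    s = x ^ y ^ ci
    co = (x & y) | (x & ci) | (y & ci)
    return s, co


def increment(r):
    """Ripple-carry adder computing r + 1 on a list of bits (LSB first)."""
    y = [1] + [0] * (len(r) - 1)  # the constant 1
    out, c = [], 0                # carry-in of 0; final carry-out discarded
    for xi, yi in zip(r, y):
        s, c = full_adder(xi, yi, c)
        out.append(s)
    return out


def run_counter(n, clk, rst):
    """Return the value held by the register at each sample of clk."""
    r = [1] * n  # arbitrary stand-in for an undefined power-on value
    prev, trace = 0, []
    for level, reset in zip(clk, rst):
        d = [s & (1 - reset) for s in increment(r)]  # sum AND (NOT rst)
        if prev == 0 and level == 1:                 # positive edge on clk
            r = d
        prev = level
        trace.append(sum(b << i for i, b in enumerate(r)))
    return trace


clk = [0, 1] * 6
rst = [1, 1] + [0] * 10
trace = run_counter(3, clk, rst)
```

With rst held at 1 over the first positive edge, the register is forced to zero; thereafter each positive edge advances the stored value by one.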

Example 2.38. Figure 2.37b represents a solution based on use of latches, which implies a 2-phase clocking strategy. A reasonable question to ask is why we cannot just replace the flip-flops with latches. Imagine we did this: since the latches are level-triggered, they will be updated whenever clk = 1. So on one hand we have broken the original loop, but on the other hand the loop is still there when clk = 1 because the latches are essentially transparent.

To resolve this the design uses two sets of latches, one to store the adder input and one to store the adder output. Only one set is enabled at a time, because we use a 2-phase clock to control them; when Φ2 = 1 the output latches store the adder output, then when Φ1 = 1 the input latches store whatever the output latches stored and subsequently provide a new input to the adder. Clearly we need more storage components to support this approach, but you can think of this as a trade-off wrt. reduced complexity of latches versus flip-flops. Put another way, the design might be less efficient in terms of area but is much easier to reason about.
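The alternation between the two sets of latches can be sketched as follows. This is a behavioural model only: the function name, the stand-in power-on values, and the choice to apply the reset at the input latches during the first Φ1 phase are assumptions made for illustration.

```python
def two_phase_counter(cycles, n=4, power_on=(0b0110, 0b1001)):
    """Simulate the latch-based counter for a number of 2-phase clock cycles.

    power_on holds arbitrary stand-in values for the undefined contents of
    the input and output latches at power-on (an assumption).
    """
    r, r_next = power_on  # input latches (r), output latches (r')
    trace = []
    for cycle in range(cycles):
        rst = 1 if cycle == 0 else 0  # assumed: reset held for the first cycle only
        # Phi1 = 1, Phi2 = 0: input latches are transparent, output latches hold.
        r = 0 if rst else r_next
        # Phi1 = 0, Phi2 = 1: output latches are transparent, input latches hold.
        r_next = (r + 1) % (1 << n)
        trace.append(r)
    return trace


print(two_phase_counter(5))  # [0, 1, 2, 3, 4]
```

Because the two assignments never happen in the same phase, there is no point at which a value can race around the loop, even though each individual latch is transparent while enabled.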

2.3.3.2 Generalising the two design strategies

Figure 2.39 generalises the two counter solutions in the previous Section; you can think of both as general frameworks, or architectures, that can be filled-in with concrete details to realise the solution to a specific problem. These can be generalised a little further by noting the following:

Definition 2.42. A typical circuit based on sequential logic will be comprised of

1. a data-path, of computational and/or storage components, and

2. a control-path, that tells components in the data-path what to do and when to do it.

For example, within the two counter solutions we clearly have computational (i.e., the adder) and storage components (i.e., the register), and also mechanisms to control them (i.e., the reset AND gates).

(a) Using a 1-phase clock and flip-flop based register(s).

(b) Using a 2-phase clock and latch based register(s).

Figure 2.37: A correct counter design, using sequential logic components.



(a) Using a 1-phase clock and flip-flop based register(s). Stages shown: the flip-flops reset (r ← 0), then the adder computes r + 1 and the flip-flops update (r ← r + 1), and so on.

(b) Using a 2-phase clock and latch based register(s). Stages shown, under Φ1 and Φ2: the input latches reset (r ← 0), the adder computes r + 1, the output latches store (r′ ← r + 1), the input latches store (r ← r′), and so on.

Figure 2.38: Two illustrative waveforms, outlining stages of computation within the associated counter design.



(a) Using a 1-phase clock: combinatorial logic and flip-flop based register(s), driven by a clock.

(b) Using a 2-phase clock: combinatorial logic and latch based register(s), driven by Φ1 and Φ2.

Figure 2.39: Two different high-level clocking strategies.

Worker #1 Worker #2 Worker #3 Worker #4

Step #1Car #1

Step #2Car #1

Step #3Car #1

Step #4Car #1

Figure 2.40: Production line #1, staffed with pre-Ford workers.

2.4 Pipelined logic

Consider some combinatorial logic component called X. In terms of efficiency, the critical path of the component presents a major hurdle: it is what limits how quickly a result can be produced. To cope we might attempt one of at least two approaches, namely

1. try to apply various low-level optimisations with the goal of reducing the critical path of X, or

2. apply the higher-level technique of pipelining, restructuring X as investigated by the rest of this Section.

2.4.1 An analogy: car production lines

Production (or assembly) lines in the context of manufacturing offer a great analogy for the concept of pipelined logic, which is simpler than you might expect. The basic idea of a production line is for the result to be produced as the combination of a number of stages.

Though probably not the first to employ such a process, the manufacture of cars within the Ford Motor Company is a good example. Ford, under direction of the owner Henry Ford, used a system of continuous production lines to build cars. While one person was assembling the engine of car number one, another could be attending to the body work on car number two, while yet another could be fitting the wheels to car number three. By around 1913, Ford had its production line down to such a fine art that it was able to double the output of all its competitors, selling half of all cars purchased in the USA. Although assigning each worker a dedicated task reduced accidents and the time wasted through their wandering around the factory, the fact that they stood in the same place for long periods performing repetitive tasks meant that RSI-type injuries were common. Ford combated the resulting high turnover of staff by increasing wages to $5 a day, cutting shift lengths to eight hours a day and installing a dedicated medical department. Productivity soared and the cost of producing each vehicle decreased as a result.




Worker #1 Worker #2 Worker #3 Worker #4

Step #1Car #1 Car #2 Car #3 Car #4

Step #2Car #1 Car #2 Car #3

Step #3Car #1 Car #2

Step #4Car #1

Figure 2.41: Production line #2, staffed with post-Ford workers.

Figure 2.40 and Figure 2.41 show two production lines: imagine #1 is pre-Ford and #2 is post-Ford if you want. Notice that the production of a given car is still sequential: it moves through the stages of production in order, one at a time in both cases. However, production line #2 benefits by overlapping production of different cars with each other, i.e., producing more than one at a time, in parallel. We can measure the efficiency of the production lines #1 and #2 using two metrics, the first of which probably seems more natural:

Definition 2.43. The latency is the total time elapsed before a given input is operated on to produce an output; this is simply the sum of the latencies of each stage.

Definition 2.44. The throughput (or bandwidth) is the rate at which new inputs can be supplied (resp. outputs collected).

The point is that although the latency associated with one car is not changed (it takes 4 time units to produce a car in both production lines), the throughput is: in production line #2 we produce a new car every time unit (once the production line is full), whereas we only produce one every 4 time units in #1. In a sense this is an obvious byproduct of the fact that in production line #1 most of the stages are idle at any given time, but in #2 they are all eventually active.

If we generalise, an n-stage production line will ideally give us an n-fold improvement in throughput. However, there are some caveats:

• The maximum improvement comes only when we can keep the production line full of work: if the first stage does not start because there is a lack of demand, the production line as a whole is utilised less efficiently.

• If we cannot advance the production line for some reason (perhaps one stage is delayed by a lack of parts), we say it has stalled; this also reduces utilisation.

• The speed at which the production line can be advanced is limited by the slowest stage; to minimise idle time, balance is needed between the workload of stages. That is, if there is one stage that takes significantly longer than the rest (e.g., it involves some relatively time consuming task), it will hold up the rest.

• Usually a production line will not be perfect: moving the result of one stage to the next will take some time, so there is some (perhaps small) overhead associated with all stages. This overhead typically reduces efficiency; minimising it means we can get closer to the ideal n-fold improvement.
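To make the latency and throughput comparison concrete, the sketch below computes when each car leaves a production line. It is a simplified model under stated assumptions: the function name and parameters are hypothetical, the line is always kept full, and a pipelined line advances all stages together at the pace of the slowest stage (plus any per-stage overhead).

```python
def completion_times(stage_times, items, pipelined, overhead=0):
    """Return the time at which each item leaves the production line."""
    n = len(stage_times)
    if pipelined:
        # Every stage advances together, so one step lasts as long as the
        # slowest stage plus the overhead of moving work between stages.
        step = max(stage_times) + overhead
        return [(n + k) * step for k in range(items)]
    # Without overlap, each item passes through every stage before the
    # next item is started.
    total = sum(stage_times) + overhead * n
    return [(k + 1) * total for k in range(items)]


print(completion_times([1, 1, 1, 1], 4, pipelined=False))  # [4, 8, 12, 16]
print(completion_times([1, 1, 1, 1], 4, pipelined=True))   # [4, 5, 6, 7]
print(completion_times([1, 1, 2, 1], 3, pipelined=True))   # [8, 10, 12]
```

The first two calls mirror production lines #1 and #2: the first car takes 4 time units either way, but thereafter the pipelined line delivers one car per time unit. The third call illustrates the balance caveat: a single slow stage dictates the pace of the whole line.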

2.4.2 Treating logic as a production line

Fortunately, pipelined logic does not suffer from the human-related problems that the Ford production line did: our logic gates never tire, get RSI, or complain about wages for example! Other than this, the principles are almost exactly the same. That is, we aim to

1. split some combinatorial logic X into a pipeline of n pipeline stages, say Xi for 0 ≤ i < n, arranged in sequence,

2. have each stage perform one step of the overall computation, with in-flight (or active) partial computation advancing through the pipeline stage-by-stage, and

3. supply inputs into the first stage X0, and collect outputs from the last stage Xn−1.

(a) Option #1: 1-stage, unpipelined: X (300ns).

(b) Option #2: 2-stage, pipelined but unbalanced: X0 (200ns), X1 (100ns).

(c) Option #3: 2-stage, pipelined but unbalanced: X0 (100ns), X1 (200ns).

(d) Option #4: 3-stage, pipelined and balanced: X0 (100ns), X1 (100ns), X2 (100ns).

Figure 2.42: Four different ways to split a (hypothetical) component X into stages.

2.4.2.1 Problem #1: how to structure the pipeline

Given X, our first problem is in two parts: first where can we split it to produce the Xi (which depends heavily on what X is), and second where should we split it?

A generic answer to the first question is hard, since it depends on the component itself. About the most general approach we can start with is to identify natural splitting points, i.e., look at X and see where there are steps in the overall computation that can be grouped together. The second question is, however, easier: once we have an idea where we can split X, we can look at all the options and select the one that produces the best result. More specifically, we know the slowest stage dictates how fast we can advance the pipeline; our goal is therefore to balance the stages (as far as possible) so idleness is minimised (i.e., we avoid one stage waiting for another). This is illustrated by Figure 2.42 wherein four options for splitting some component X into stages Xi are given:

1. a 1-stage unpipelined design, basically representing the original component X,

2. a 2-stage pipelined design where X0 has a larger latency than X1,

3. a 2-stage pipelined design where X1 has a larger latency than X0, and

4. a 3-stage pipelined design where all stages have equal latency.

Focusing only on the idea of balancing the stages, the last option is most attractive: since all stages take the same time to compute their part of the overall result, selecting this option will minimise potential idleness.
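The comparison between the four options can be sketched numerically, using the stage latencies given in Figure 2.42. The helper names are hypothetical, and the model assumes the pipeline advances at the pace of its slowest stage while ignoring any other overhead.

```python
# Stage latencies (in ns) for the four options in Figure 2.42.
OPTIONS = {
    "1-stage, unpipelined": [300],
    "2-stage, X0 slower": [200, 100],
    "2-stage, X1 slower": [100, 200],
    "3-stage, balanced": [100, 100, 100],
}


def cycle_time_ns(stages):
    """The pipeline can only advance as fast as its slowest stage."""
    return max(stages)


def idle_ns_per_cycle(stages):
    """Total time, per cycle, that stages spend waiting for the slowest one."""
    return sum(cycle_time_ns(stages) - s for s in stages)


for name, stages in OPTIONS.items():
    print(name, cycle_time_ns(stages), idle_ns_per_cycle(stages))
```

Under these assumptions both 2-stage options advance every 200ns and leave one stage idle for 100ns per cycle, whereas the balanced 3-stage option advances every 100ns with no idle time.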

2.4.2.2 Problem #2: how to control the pipeline

The next problem is how we control the pipeline so it does what we want, at the right time, to produce the right outputs from the right inputs. Consider Figure 2.43a, which outlines some generic pipeline stages (whose behaviour is irrelevant). There are two key problems:

1. Fundamentally, the stages cannot operate on different inputs if there is nowhere to store those inputs: if we supply a new input to X0 at each step, where does the first input go once the first step is finished? It should be processed by X1, but instead it will vanish when replaced with the input for the second step.

2. Imagine in the j-th clock cycle the i-th stage Xi computes a partial result ti required by the (i + 1)-th stage. If the stages are connected by a wire, as soon as the i-th stage changes ti this potentially disrupts what the (i + 1)-th stage is doing.




An aside: synchronous versus asynchronous pipelines.

A synchronous pipeline is a term used to describe a pipeline structure where all stages are globally synchronised, controlled using a single global signal adv which you can think of as a clock; to reinforce this fact, the period between advances is often termed a pipeline cycle.

In an asynchronous pipeline the aim is to remove the need for global control over when the pipeline advances, and hence remove the need for a global clock. Roughly speaking, control is devolved into the pipeline stages themselves: for one stage to advance, it must engage in a simple handshake with the preceding and subsequent stages to agree when to advance. More formally each Xi controls advi, the local signal that determines when it advances, by communicating with Xi−1 and Xi+1.

This is advantageous in that stages can operate as fast or slow as their workload, rather than a global clock, dictates: the asynchronous pipeline can advance whenever the result is ready rather than being pessimistic and forcing advancement at the rate of the slowest stage. However, although the global clock is removed, one potential disadvantage of this approach is the overhead in provision of the handshake mechanism that has to exist between stages; clearly this can become quite complex depending on the pipeline structure.

To resolve this, instead of connecting the stages directly by wires, we connect them via pipeline registers, say Ri. This means the (i + 1)-th stage can have a separate, stable input which only changes when the register latches a new value, i.e., when the pipeline advances. However, each pipeline register takes time to operate, and so adds to the total latency.

Figure 2.43b outlines the new structure, which resolves both problems above. The structure is controlled by adv, shown here as a single global signal that advances all stages at the same time by having the output of each Xi stored in a pipeline register and hence used subsequently as input to Xi+1. Figure 2.44 gives a high-level overview of progression through the pipeline, controlled by positive edges on adv.
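A behavioural sketch of this structure is given below. It is an illustration under stated assumptions rather than a circuit description: the function names are hypothetical, register i is taken to hold the input of stage i (with the final register holding the output), None marks a register with no valid content, and the three example stages are arbitrary.

```python
def advance(registers, stages, new_input):
    """Model one positive edge on adv.

    Every stage output is computed from the current register contents
    before any register changes, so all registers latch "at once".
    """
    computed = [
        None if registers[i] is None else stage(registers[i])
        for i, stage in enumerate(stages)
    ]
    return [new_input] + computed


def run_pipeline(stages, inputs):
    """Feed inputs one per advance, then keep advancing to drain the pipeline."""
    registers = [None] * (len(stages) + 1)
    outputs = []
    for x in list(inputs) + [None] * len(stages):
        registers = advance(registers, stages, x)
        if registers[-1] is not None:
            outputs.append(registers[-1])
    return outputs


# Three arbitrary stages, composing to (x + 1) * 2 - 3.
stages = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
print(run_pipeline(stages, [1, 2, 3]))  # [1, 3, 5]
```

Because each stage reads only from a register, supplying a new input to the first stage no longer destroys the partial results that later stages are still working on.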

The implication of this structure is that we need to take more care wrt. how we split X into stages. Specifically, more pipeline registers means larger overall latency; as a result, we cannot simply split X into as many stages as we need to have them balanced. Rather, we must make a trade-off between increased latency (as the result of some pipeline registers) and increased throughput (as the result of the pipelined design overall).

2.4.3 Some concrete examples

So far, our discussion has been necessarily abstract: many details of a concrete pipeline depend on the component under consideration.

Example 2.39. Consider Figure 2.45, wherein an abstract component X is shown in both unpipelined and pipelined forms. In the unpipelined case we find that the latency is

300ns + 20ns = 320ns,

while the throughput is

1/320ns = 3.125 × 10^6 operations/s

if we measure the latency of computing and storing the result. However, for a 3-stage pipeline (using the same measure) the latency is

(100ns + 20ns) + (100ns + 20ns) + (100ns + 20ns) = 360ns,

while the throughput is

1/120ns ≈ 8.33 × 10^6 operations/s.

That is, we have improved the throughput by a factor of 320/120 ≈ 2.7, i.e., roughly a factor of three: we now get an output from the pipeline (resp. can provide new input) every 120ns rather than 320ns. The drawback is that the overall latency of a given operation is slightly more, i.e., 360ns rather than 320ns.
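The arithmetic in this example can be checked, and extended to other splits, with a short sketch. The helper names are hypothetical; the model assumes every stage is followed by a register with the same 20ns latency, and that 300ns of logic can be divided evenly between any number of stages (which a real component may not allow).

```python
def latency_ns(stage_ns, register_ns):
    """Total latency: every stage is followed by one pipeline register."""
    return sum(s + register_ns for s in stage_ns)


def throughput_ops_per_s(stage_ns, register_ns):
    """One result per pipeline cycle, set by the slowest stage plus a register."""
    return 1e9 / (max(stage_ns) + register_ns)


unpipelined = [300]
pipelined = [100, 100, 100]
print(latency_ns(unpipelined, 20), latency_ns(pipelined, 20))  # 320 360
print(throughput_ops_per_s(pipelined, 20) / throughput_ops_per_s(unpipelined, 20))

# Splitting the same 300ns of logic into more, evenly balanced stages.
for n in (1, 2, 3, 6):
    stages = [300 / n] * n
    print(n, latency_ns(stages, 20), throughput_ops_per_s(stages, 20))
```

The loop illustrates the trade-off: each extra stage adds another register to the latency, while the gain in throughput shrinks as the register latency starts to dominate the pipeline cycle.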

Great. But what use is this? The point is, we can relate this abstract example to a concrete component which acts as motivation for why such an improvement is worthwhile.

Example 2.40. Consider a component that performs the logical left-shift of some 8-bit vector x by a distance of y ∈ {0, 1, . . . , 7} bits. There are a variety of approaches to designing a circuit with the required behaviour, but one of the simplest is a combinatorial, logarithmic shifter. We will look at the design in detail in Chapter 4, but the idea is illustrated by Figure 2.47a. In short, the result is computed using three steps: each step produces an intermediate result by either shifting an intermediate input by some fixed distance (the i-th stage shifts by 2^i bits), or simply passing it through unaltered. For example, if we select y = 6_(10) = 110_(2) then

(a) Option #1: without pipeline registers.

(b) Option #2: with pipeline registers and a global control signal.

(c) Option #3: with pipeline registers and multiple, per-stage control signals.

Figure 2.43: A problematic pipeline, and a solution involving the use of pipeline registers and a control signal to indicate when each stage should advance.

1. since y0 = 0, the 0-th stage passes the input x through unaltered to form the intermediate result x′, then

2. since y1 = 1, the 1-st stage shifts the intermediate input x′ by a distance of 2^1 = 2 bits to form the intermediate result x′′, then

3. since y2 = 1, the 2-nd stage shifts the intermediate input x′′ by a distance of 2^2 = 4 bits to form the result r

meaning overall, x is shifted by 2 + 4 = 6 bits as required.

Applying the same reasoning as above, Figure 2.47b splits the design into a 3-stage pipeline; this decision is natural given that the computation is trivially split into three stages of equal latency. The critical path is now determined by just one stage rather than all three, since each stage works independently; the 1-st and 2-nd stages, for example, compute results using an input in the 1-st and 2-nd pipeline registers while the 0-th stage computes a result using the input x. As such, we get a similar benefit to the abstract example: basically we improve the throughput by nearly a factor of three, with a slight increase in overall latency as a result of the extra registers.
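The behaviour of the unpipelined logarithmic shifter can be sketched in a few lines. This models what the circuit computes rather than its gates; the function name and constants are assumptions made for illustration.

```python
WIDTH = 8    # x is an 8-bit vector
STAGES = 3   # enough stages to cover every distance y in 0..7


def log_shift_left(x: int, y: int) -> int:
    """Logical left-shift of 8-bit x by y bits, one conditional stage per bit of y."""
    assert 0 <= y < WIDTH
    mask = (1 << WIDTH) - 1
    t = x & mask
    for i in range(STAGES):
        if (y >> i) & 1:
            # The i-th stage shifts by the fixed distance 2^i ...
            t = (t << (1 << i)) & mask
        # ... or else passes its input through unaltered.
    return t


print(bin(log_shift_left(0b00000011, 6)))  # 0b11000000
```

With y = 6, the 0-th stage passes x through, and the 1-st and 2-nd stages shift by 2 and 4 bits respectively, matching the walk-through above. In a pipelined version, each iteration of the loop would be separated from the next by a pipeline register.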

2.5 Implementation and fabrication technologies

When we write software (i.e., a program), we usually intend to use it somehow (i.e., execute it on a computer). The program (or description of behaviour) is often compiled into a form we can use; depending on the processor we want to use, our program might be compiled in different ways and produce different executable forms.

Stages shown, alternating as the clock advances: each Xi computes ti+1 from ti stored in Ri, then each Ri+1 stores ti+1 computed by Xi, and so on.

Figure 2.44: An illustrative waveform, outlining the stages of computation as a pipeline is driven by a clock.

(a) Option #1: an unpipelined design: X (300ns) followed by R (20ns).

(b) Option #2: a 3-stage pipelined design: X0 (100ns), R1 (20ns), X1 (100ns), R2 (20ns), X2 (100ns), R3 (20ns).

Figure 2.45: An unpipelined, abstract combinatorial circuit and a 3-stage pipelined alternative.

In a rough sense, the same process applies to circuits: once we have a description of behaviour, we need to actually realise the corresponding components (i.e., logic gates or transistors) so that we can use them. There are various ways to achieve this, which depend on the underlying technology used: using semi-conductors to construct transistors is not the only option. Although the topic is somewhat beyond the scope of this book, it is useful to understand some approaches and technologies involved: at very least, it acts to connect theoretical concepts with their practical realisation.

2.5.1 Silicon fabrication

2.5.1.1 Lithography

The construction of semi-conductor-based circuits is very similar to how pictures are printed, or at least were printed before the era of digital photography and laser printers! The act of printing pictures onto a surface is termed lithography and has been used for a couple of centuries to produce posters, maps and so on; the process involves controlled use of chemical processes within a controlled environment, often termed a dark room. The basic idea is to coat a surface, which we usually call the substrate, with a photosensitive chemical. We then expose the substrate to light projected through a negative, or mask, of the required image; the end result is a representation of said image left on the substrate where light reacts with the chemical. After washing the substrate, one can treat it with further chemicals so that the treated areas representing the original image are able to accept inks while the rest of the substrate cannot.

For semi-conductors the analogous process is photolithography, and involves very similar steps which are illustrated by Figure 2.48. We again start (Figure 2.48a) with a substrate, which is usually a wafer of silicon; this is often circular by virtue of machining it from a synthetic ingot, or boule, of very pure silicon. After being cut into shape, the wafer is polished to produce a surface suitable for the next stage. We can now coat it with a layer of base material we wish to work with (Figure 2.48b), for example a doped silicon or metal. Then we coat the whole thing with a photosensitive chemical, usually called a photo-resist (Figure 2.48c). Two types exist, a positive one which hardens when hidden from light and a negative one which hardens when exposed

Figure 2.46: An unpipelined, 8-bit Multiply-ACcumulate (MAC) circuit and a 3-stage pipelined alternative.

Figure 2.47: An unpipelined, 8-bit logarithmic shift circuit and a 3-stage pipelined alternative.



(a) The substrate. (b) Coating of base material.

(c) Coating of photosensitive chemical. (d) Exposure to a (simple) mask.

(e) Application of etching. (f) Application of etching.

(g) Final result.

Figure 2.48: A high-level illustration of a lithography-based fabrication process.




Figure 2.49: Bonding wires connected to a high quality gold pad (public domain image, source: http://en.wikipedia.org/wiki/Image:Wirebond-ballbond.jpg).

Figure 2.50: A heatsink ready to be attached, via the z-clip, to a circuit in order to dissipate heat (public domain image, source: http://en.wikipedia.org/wiki/File:Pin_fin_heat_sink_with_a_z-clip.png).

to light. By projecting a mask of the circuit onto the result (Figure 2.48d), one can harden the photo-resist so that only the required areas are covered with a hardened covering. After baking the result to fix the hardened photo-resist, and etching to remove the surplus base material, one is left with a layer of the base material only where dictated by the mask (Figure 2.48e to Figure 2.48g).

The process iterates to produce many layers of potentially different materials, i.e., the result is 3D not 2D. We might need layers of N-type and P-type semi-conductor and a metal layer to produce transistors, for example. The feature size (e.g., 90nm CMOS) relates to the resolution of this process; for example, accuracy of the photolithographic process dictates the width of wires or density of transistors. Regularity of such features is a major advantage: we can manufacture many similar components in a layer using one photolithographic process. For example, if we aim to manufacture many transistors they will all be composed of the same layers albeit in different locations on the substrate.

2.5.1.2 Packaging

Before we can use the “raw” output from the photolithography, a process of packaging is typically applied. At very least, the first step is to cut out individual components from the resulting wafer: remember that we can produce many identical components using the same process, so this step gives us a single component we can use. Before we can use it, however, each component is typically mounted on a plastic base and connected to externally accessible pins (or pads) with bonding wires. This makes the inputs to and outputs from the component (which may be physically tiny and delicate) easier to access. A protective, often plastic, package is also applied to prevent physical damage; large or power-hungry components might also mandate use of a heat sink (and fan) to dissipate heat.

The final result is a self-contained component, which we can describe as a microchip (or simply a chip) and start to integrate with other components to construct a larger system.

2.5.1.3 Moore’s Law

Gordon Moore, co-founder of Intel, is credited with identification of an important and influential trend associated with development of transistor-based technology. The so-called Moore’s Law was originally an

Figure 2.51: A timeline of Intel processor innovation demonstrating Moore’s Law (data from http://www.intel.com/technology/mooreslaw/). The plot shows the number of transistors (from 1000 to 1000000000, in powers of ten) against year (1970 to 2005) for the Intel 4004, 8008, 8080, 8086, 8088, 80286, 80386, 80486, Pentium, Pentium Pro, Pentium 2, Pentium 3, Pentium 4, Pentium M and Pentium D.

observation [5] in 1965

The complexity for minimum component costs has increased at a rate of roughly a factor of two per year. Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years. That means by 1975, the number of components per integrated circuit for minimum cost will be 65,000.

– Moore

and later updated: in short “the number of transistors that can be fabricated in unit area doubles roughly every two years”. In a sense, this has become a form of a self-fulfilling prophecy in that the “law” is now an accepted truth: industry is forced to deliver improvements, and is in part driven by the law rather than the other way around!
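The difference between the two doubling rates is easy to underestimate, so a short sketch of the arithmetic may help. The function name is hypothetical, and the starting point of 64 components in 1965 is an assumption chosen so that ten yearly doublings land close to the figure in the quotation; it is not taken from the text above.

```python
def components(year, base_year=1965, base_count=64, doubling_period_years=1):
    """Project a component count that doubles once every doubling period.

    base_count=64 is an assumed starting point, for illustration only.
    """
    doublings = (year - base_year) / doubling_period_years
    return base_count * 2 ** doublings


print(components(1975))                           # 65536.0
print(components(1975, doubling_period_years=2))  # 2048.0
```

Ten years of doubling every year gives a factor of 2^10 = 1024, whereas doubling every two years over the same period gives only 2^5 = 32.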

Figure 2.51 demonstrates the manifestation of Moore’s Law on the development of Intel processors. The implications for design of such processors, and circuits more generally, can be viewed in (at least) two ways:

1. If one can fit more transistors in unit area, the transistors are getting smaller and hence working faster due to their physical characteristics. As a result one can take a fixed design and, over time, it will get faster or use less power as a result of Moore’s Law.

2. If one can fit more transistors in unit area, then one can design and implement more complex structures in the same fixed area. As a result, over time, one can use the extra transistors to improve the design yet keep it roughly the same size.

There is no “free lunch” however; Moore notes that as feature size decreases (i.e., transistors get smaller) two problems become more and more important. First, power consumption and heat dissipation become an issue: it is harder to distribute power to the more densely packed transistors and keep them within operational temperature limits. Second, process variation, which may imply defects and reduce yield, starts to increase, meaning a higher chance that a manufactured chip malfunctions.

2.5.2 (Re)programmable fabrics

Among many alternatives to manufacture of circuits using silicon-based transistors, two in particular are interesting. You can think of them as making two steps from silicon (implying a fixed circuit once manufactured), toward a fabric that can be reprogrammed again and again (more like software) to form any circuit required. The resulting performance and flexibility characteristics blur traditional boundaries between hardware and software, and such fabrics are therefore increasingly important components with a broad range of applications.






2.5.2.1 Programmable Logic Arrays (PLAs)

A Programmable Logic Array (PLA) is a general-purpose fabric that can be configured to implement specific SoP or PoS expressions as combinatorial circuits. The fabric itself accepts n inputs, say xi for 0 ≤ i < n, and produces m outputs, say rj for 0 ≤ j < m, via logic gates arranged in two planes. Using an AND-OR type PLA as an example, the first plane computes a set of minterms using AND gates; those minterms are fed as input to a second plane of OR gates whose output is the required SoP expression. An OR-AND type PLA simply reverses the ordering of the planes, and thus allows implementation of PoS expressions.

This does not hint at a PLA being particularly remarkable: why is it any different to the combinatorial circuits we have seen already? The crucial difference is how we end up with the required circuit. The starting point is a generic, clean fabric as shown in Figure 2.52a. At this point you can think of all of the gates being connected to all corresponding gate inputs via existing connection points at wire junctions (filled circles), and fuses at the gate inputs (filled boxes). This is transformed into a specific circuit using a process roughly analogous to programming: we selectively blow fuses, guided by a configuration that is derived from the circuit design. Normally a fuse acts as a conductive material, somewhat like a wire; when the fuse is blown using some directed energy, however, it becomes a resistive material. Therefore, to form the required connections we simply blow all the fuses⁶ where no connection is required. Figure 2.52b shows an example, where fuses have been blown (now shown as unfilled boxes) to form various connections (shown as thick lines). As a result, this PLA computes

r0 = (x0 ∧ ¬x1) ∨ (¬x0 ∧ x1) = x0 ⊕ x1

and

r1 = x0 ∧ x1,

i.e., it is a half-adder.

We say that a PLA fabric is one-time programmable. Put simply, once a fuse (or antifuse) is blown, it cannot be unblown. Since a PLA can only be configured once, it is not unreasonable to think of a PLA as like a ROM (in the sense that once programmed, the content is fixed) but with the advantage of being able to optimise for don’t care states. However, the fixed structure means that, versus conventional combinatorial logic, it has the disadvantage of being less (easily) able to capitalise on optimisations such as sharing logic for common sub-expressions.
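The half-adder configuration can be sketched as data plus a generic evaluator, which mirrors the idea that the fabric is fixed and only the pattern of surviving connections changes. The representation is an assumption made for illustration: each AND-plane entry records which literals remain connected to that gate (1 for xi, 0 for ¬xi), and each OR-plane entry lists the product terms that remain connected to that output.

```python
# AND plane: one dictionary per AND gate, mapping input index -> required value.
AND_PLANE = [
    {0: 1, 1: 0},  # x0 AND NOT x1
    {0: 0, 1: 1},  # NOT x0 AND x1
    {0: 1, 1: 1},  # x0 AND x1
]

# OR plane: one list per output, naming the AND gates still connected to it.
OR_PLANE = [
    [0, 1],  # r0 = (x0 AND NOT x1) OR (NOT x0 AND x1)
    [2],     # r1 = x0 AND x1
]


def pla(x, and_plane, or_plane):
    """Evaluate an AND-OR type PLA on the list of input bits x."""
    terms = [all(x[i] == want for i, want in term.items()) for term in and_plane]
    return [int(any(terms[t] for t in output)) for output in or_plane]


for x0 in (0, 1):
    for x1 in (0, 1):
        print(x0, x1, pla([x0, x1], AND_PLANE, OR_PLANE))
```

Supplying different plane descriptions to the same evaluator yields a different circuit, which is analogous to blowing a different set of fuses in a clean fabric.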

2.5.2.2 Field Programmable Gate Arrays (FPGAs)

Although a PLA might be useful for some tasks, two clear limitations are evident: such a fabric

1. is special-purpose in so much as it implements only SoP- or PoS-type designs, as a result of the wiring and gate structure, and is constrained by parameters such as n and m, and

2. is only one-time programmable, since once the fuses are irreversibly blown it then implements a fixed circuit.

As such, one could consider generalising the underlying idea by a) allowing the wiring and gate structure to be configured freely, and then b) allowing this configuration to be performed multiple times, using some type of memory instead of fuses for each element of configuration data. A Field Programmable Gate Array (FPGA) fabric is the result, whose goal is basically to offer a general-purpose, many-time programmable fabric: the FPGA can be configured with one circuit design and then re-configured with another design at a later point in time.

Figure 2.53a is a conceptual representation of an FPGA fabric, which is basically a collection of logic resources (or blocks) organised in a two-dimensional mesh; the logic blocks are connected using routing resources placed between them. Both the logic and routing resources are controlled by a configuration termed a bit-stream. For instance, the routing resources are conceptually similar to fuses in the sense they determine connectivity; unlike fuses, they are re-configurable switches that can be turned on and off as required rather than blown in a one-off act. In a similar way the logic resources are analogous to logic gates, but now their functionality can be changed to suit as part of the configuration process: a specific logic resource might be configured to act as an AND gate in one circuit, then as an XOR gate in another at some later point in time. This produces a much more flexible structure than a PLA, and removes the limit of one-time programmability.
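One way to picture a logic resource whose function is set by configuration is as a small stored truth table. The sketch below takes that view; the class and method names are hypothetical, and it models only the idea of re-configuration rather than any particular device.

```python
class LogicResource:
    """A logic resource modelled as a stored truth table (illustrative only)."""

    def __init__(self, inputs=2):
        self.inputs = inputs
        self.table = [0] * (1 << inputs)  # one stored bit per input combination

    def configure(self, bits):
        """Load new configuration data; this can be repeated any number of times."""
        assert len(bits) == len(self.table)
        self.table = list(bits)

    def evaluate(self, *x):
        """Use the input bits as an index into the stored table."""
        assert len(x) == self.inputs
        index = sum(bit << i for i, bit in enumerate(x))
        return self.table[index]


gate = LogicResource()
gate.configure([0, 0, 0, 1])  # behaves as an AND gate
print(gate.evaluate(1, 1), gate.evaluate(1, 0))  # 1 0
gate.configure([0, 1, 1, 0])  # later re-configured to behave as an XOR gate
print(gate.evaluate(1, 1), gate.evaluate(1, 0))  # 0 1
```

The same object acts as two different gates depending only on the stored bits, which is the sense in which configuration data, rather than wiring, determines behaviour.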

This alone is a fairly big step forward, but the logic resources offer even more features internally: theyare not just reconfigurable logic gates. Although the architecture of different brands and families within abrand differ, we focus on Xilinx Virtex-4 devices as an example. The central Vertex-4 logic resource is calleda Configurable Logic Block (CLB): each CLB is connected to (and hence can communicate with) immediateneighbours, and contains four slices. Figure 2.53b is a block diagram of a Vertex-4 slice, which contains

6 Alternatively, one can consider an antifuse, which acts in the opposite way to a fuse (normally it is a resistor, but when blown it is a conductor). Using antifuses at each junction means the configuration process blows the antifuse at each junction where a connection is required.

Figure 2.52: Conceptual diagrams of a PLA fabric. (a) A “clean” PLA fabric, with fuses (filled boxes) acting as potential connections between the AND and OR planes. (b) The PLA fabric with blown fuses (empty boxes) to implement a half-adder.

Figure 2.53: Conceptual diagrams of an FPGA fabric. (a) The mesh of configurable logic (large boxes) and communication resources (small boxes). (b) An example Virtex-4 slice, including two LUTs, two D-type flip-flops and a suite of arithmetic cells.



• two 4-input, 1-output Look-Up Tables (LUTs),

• two D-type flip-flops,

• a suite of arithmetic cells, including two 1-bit full-adders, and

• several interconnected multiplexers.

The important thing to grasp is that although this looks like a fixed circuit design, various aspects of it are reconfigurable. A good example is the LUT content. Each LUT is basically a 16-cell SRAM memory: given a 4-bit input i, it reads the i-th SRAM cell and uses this as the 1-bit output. So by storing appropriate values in the SRAM during the device configuration phase, the LUT can be used to compute any 4-input, 1-output Boolean function. Likewise the 4-input multiplexers acting as input to the two flip-flops are controlled by the device configuration, not control-signals generated by another part of the circuit. In addition to standard CLBs, Virtex-4 FPGAs also offer various other special-purpose logic resources. Figure 2.53a attempts to show this fact by including

• a Digital Clock Manager (DCM) block, which allows a fixed input clock to be manipulated in a way that suits the device configuration,

• a Block RAM (BRAM) block, instances of which act like memory devices, and are often realised using SRAM or similar, and

• an Input/Output (I/O) block, which allows off-fabric communication.

Other possibilities include common arithmetic building blocks, multipliers for instance, which would be relatively costly to construct using the CLB resources yet are often required.
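The description of a LUT as a 16-cell SRAM can be made concrete with a small software model. The following is only a sketch: the names make_lut and lut_read, and the ordering of the four inputs within the address i, are assumptions made for illustration and do not reflect how any particular device orders its configuration data.

```python
def make_lut(func, inputs=4):
    """Return the SRAM content (a list of 2**inputs bits) that realises func.

    This plays the role of device configuration: the function is evaluated
    once for every possible input, and the results are stored.
    """
    cells = []
    for i in range(2 ** inputs):
        bits = [(i >> k) & 1 for k in range(inputs)]  # bit k of the address i
        cells.append(func(*bits) & 1)
    return cells


def lut_read(cells, w, x, y, z):
    """Form the 4-bit address i from the inputs, then read the i-th cell."""
    i = w | (x << 1) | (y << 2) | (z << 3)  # assumed input ordering
    return cells[i]


# The same "hardware" (lut_read) computes different functions, depending
# only on the stored content.
and4 = make_lut(lambda w, x, y, z: w & x & y & z)
xor4 = make_lut(lambda w, x, y, z: w ^ x ^ y ^ z)
```

Here lut_read never changes; only the 16 stored bits do, which is the sense in which the LUT is reconfigurable.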

The added complexity of supporting such flexibility typically means FPGAs have a lower maximum clock frequency, and will consume more power than a comparable implementation directly in silicon. As such, they are often used as a prototyping device for designs which will eventually be fabricated using a more high-performance technology. Other applications include those where debugging and updating hardware is important, meaning an FPGA-based solution is as flexible as software while also improving performance. Consider space exploration for example: it turns out to be exceptionally useful to be able to remotely fix bugs in hardware rather than write off a multi-million pound satellite which is orbiting Mars (and hence out of the reach of any local repair men).

References

[1] D. Harris and S. Harris. Digital Design and Computer Architecture: From Gates to Processors. Morgan-Kaufmann, 2007. isbn: 0-123-70497-9.

[2] M. Karnaugh. “The map method for synthesis of combinatorial logic circuits”. In: Transactions of American Institute of Electrical Engineers 72.9 (1953), pp. 593–599 (see p. 79).

[3] M. Knodel and N. ten Oever. Terminology, Power and Oppressive Language. Internet Engineering Task Force (IETF) Internet Draft. 2018. url: https://tools.ietf.org/id/draft-knodel-terminology-00.html (see p. 112).

[4] E.J. McCluskey. “Minimization of Boolean function”. In: Bell System Technical Journal 35.5 (1956), pp. 1417–1444 (see p. 85).

[5] G.E. Moore. “Cramming more components onto integrated circuits”. In: Electronics Magazine 38.8 (1965), pp. 114–117 (see p. 126).

[6] C. Petzold. Code: Hidden Language of Computer Hardware and Software. Microsoft Press, 2000.

[7] W.V. Quine. “The problem of simplifying truth functions”. In: The American Mathematical Monthly 59.8 (1952), pp. 521–531 (see p. 85).

[8] R.J. Smith and R.C. Dorf. “Chapter 12: Transistors and Integrated Circuits”. In: Circuits, Devices and Systems. 5th ed. John Wiley, 1992 (see p. 69).

[9] A.S. Tanenbaum and T. Austin. Structured Computer Organisation. 6th ed. Prentice-Hall, 2012.

[10] E.W. Veitch. “A Chart Method for Simplifying Truth Functions”. In: ACM National Meeting. 1952, pp. 127–133 (see p. 84).

[11] N.H.E. Weste and K. Eshraghian. Principles of CMOS VLSI Design: A Systems Perspective. 2nd ed. Addison-Wesley, 1993 (see p. 103).



CHAPTER

3

FINITE STATE MACHINES (FSMS)

3.1 State machines: from simple to more complex control-paths

The topic of automata, specifically Finite State Machines (FSMs), has a very formal basis; basically they are models of computation, not too far from topics such as Turing Machines (TMs). Put another way, you can think of an FSM as a computer, albeit a simple one.

The control-path in Figure 2.37 is very simple: this is partly an artefact of the problem at hand of course, but masks the difficulty of dealing with more complicated problems. FSMs represent an attractive solution however, allowing us to reason about and implement more complicated, general-purpose control-paths.

3.1.1 A rough overview of FSM-related theory

Definition 3.1. An FSM is a (theoretical) machine that can be in a finite set of states. The machine consumes input symbols from an alphabet (which defines which symbols are valid and so on) one at a time; symbols make the machine transition from one state to another according to a transition function. When the input is exhausted, the machine halts; depending on the state it halts in, the machine is said to accept or reject the input. The set of inputs accepted by the machine is termed the language accepted; this can be used to classify the machine itself.

Definition 3.2. Based on the fact that

1. entry actions happen when entering a given state,

2. exit actions happen when exiting a given state,

3. input actions happen based on the state and any input received, and

4. transition actions happen when a given transition between states is performed

we can categorise an FSM based on output behaviour:

1. a Moore-style FSM only uses entry actions, i.e., the output depends on the state only, while

2. a Mealy-style FSM only uses input actions, i.e., the output depends on the state and the input.

An alternative classification relates to transition behaviour, where an FSM is deemed

1. deterministic if for each state there is always one transition for each possible input (i.e., we always know what the next state should be), or

2. non-deterministic if for each state there might be zero, one or more transitions for each possible input (i.e., we only know what the next state could be).

Definition 3.3. A given FSM can be defined via the following:

Q | Q′ if X_i = 0 | Q′ if X_i = 1
S_even | S_even | S_odd
S_odd | S_odd | S_even

(a) A tabular description of δ.

(b) A diagrammatic description, with S_even marked as the start state and one edge per entry in the table.

Figure 3.1: An example FSM to decide whether there is an odd number of 1 elements in some sequence X.

Q | Q′ if X_i = 10 | Q′ if X_i = 20
S_0 | S_10 | S_20
S_10 | S_20 | S_30
S_20 | S_30 | S_⊥
S_30 | S_⊥ | S_⊥

Figure 3.2: An example FSM modelling a simple vending machine. The table describes δ; the accompanying diagram marks S_0 as the start state and also includes transitions labelled ε.

1. S, a finite set of states and a distinguished start state s ∈ S.

2. A ⊆ S, a finite set of accepting states.

3. An input alphabet Σ and output alphabet Γ.

4. A transition function

δ : S × Σ→ S.

5. An output function

ω : S→ Γ

in the case of a Moore FSM, or

ω : S × Σ→ Γ

in the case of a Mealy FSM.

Note that:

• The FSM itself might be enough to solve a given problem, but it is common to control an associated data-path using the outputs.

• A special “empty” (or null) input denoted ε allows a transition which can always occur.

• It is common to allow δ to be a partial function, i.e., a function which is not defined for all inputs.

• If the FSM is non-deterministic, then δ might instead give a set of possibilities that is sampled from.

More simply, you can think of an FSM as a directed graph where moving between nodes (which represent each state) means consuming the input on the corresponding edge. Some examples should show that the fairly formal description above translates into a much more manageable reality.




3.1.1.1 Example #1: even or odd number of 1 elements

Imagine we are tasked with designing an FSM that decides whether a binary sequence X has an odd number of 1 elements in it (i.e., it computes the parity of X). The input alphabet in this case is

Σ = {0, 1}

since each X_i can either be 0 or 1. The FSM can clearly be in two states: having consumed the input so far, it can either have seen an even or odd number of 1 elements. Therefore we can say

S = {S_even, S_odd},

have s = S_even as the starting state, and let A = {S_odd} be the (singleton) set of accepting states. There is no output as such, so in this case both the output alphabet Γ and the output function ω are irrelevant.

Our final task is to define the transition function. Figure 3.1 includes a tabular and a diagrammatic description of the same thing. The tabular, truth table style description is easier to discuss. The idea is that it lists the current state (left-hand side), alongside the next state for each possible input (right-hand side). In words, the rows read as follows:

• if we are in state S_even and the input X_i = 0 then we stay in state S_even,

• if we are in state S_even and the input X_i = 1 then we move to state S_odd,

• if we are in state S_odd and the input X_i = 0 then we stay in state S_odd, and

• if we are in state S_odd and the input X_i = 1 then we move to state S_even.

The intuition is as follows (a similar argument is possible for the state S_odd): if we are in state S_even (i.e., have seen an even number of 1 elements so far) and the next input is 1, then we have now seen an odd number of 1 elements so move to state S_odd. Conversely, if we are in state S_even (i.e., have seen an even number of 1 elements so far) and the next input is 0, then we have still seen an even number of 1 elements so stay in state S_even.

Consider some examples of the FSM in operation:

1. For the input X = 〈1, 0, 1, 1〉 the transitions are

\[
S_{\text{even}} \xrightarrow{X_0 = 1} S_{\text{odd}} \xrightarrow{X_1 = 0} S_{\text{odd}} \xrightarrow{X_2 = 1} S_{\text{even}} \xrightarrow{X_3 = 1} S_{\text{odd}}
\]

meaning we start in state S_even then

(a) move to S_odd since X_0 = 1,

(b) stay in S_odd since X_1 = 0,

(c) move to S_even since X_2 = 1, and finally

(d) move to S_odd since X_3 = 1.

Since we finish in state S_odd, the input is accepted and hence we conclude it has an odd number of 1 elements.

2. For the input X = 〈0, 1, 1, 0〉 the transitions are

\[
S_{\text{even}} \xrightarrow{X_0 = 0} S_{\text{even}} \xrightarrow{X_1 = 1} S_{\text{odd}} \xrightarrow{X_2 = 1} S_{\text{even}} \xrightarrow{X_3 = 0} S_{\text{even}}
\]

meaning we start in state S_even then

(a) stay in S_even since X_0 = 0,

(b) move to S_odd since X_1 = 1,

(c) move to S_even since X_2 = 1, and finally

(d) stay in S_even since X_3 = 0.

Since we finish in state S_even, the input is rejected and hence we conclude it has an even number of 1 elements.
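The parity FSM can also be written down directly as data and simulated. The following is a minimal sketch: the dictionary-based representation of δ and the names DELTA, START, ACCEPTING and run are choices made for illustration, not part of the formal definition.

```python
S_EVEN, S_ODD = "S_even", "S_odd"

# Transition function delta, one entry per cell of the tabular description.
DELTA = {
    (S_EVEN, 0): S_EVEN, (S_EVEN, 1): S_ODD,
    (S_ODD, 0): S_ODD,   (S_ODD, 1): S_EVEN,
}
START = S_EVEN
ACCEPTING = {S_ODD}


def run(xs):
    """Consume the input one symbol at a time; return (accepted, trace)."""
    state = START
    trace = [state]
    for x in xs:
        state = DELTA[(state, x)]
        trace.append(state)
    return state in ACCEPTING, trace


accepted_1, trace_1 = run([1, 0, 1, 1])   # first worked example
accepted_2, trace_2 = run([0, 1, 1, 0])   # second worked example
```

The two traces reproduce the transition sequences listed in the worked examples: the first input is accepted, the second is rejected.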




3.1.1.2 Example #2: a vending machine

Imagine we are tasked with designing an FSM that controls a vending machine. The machine accepts tokens worth 10 or 20 units: when the total value of tokens entered reaches 30 units it delivers a chocolate bar, but it does not give change. That is, the exact amount must be entered otherwise an error occurs, all tokens are ejected and we start afresh.

The design is clearly a little more complex this time. The input alphabet is basically just the tokens that the machine can accept, so we have

Σ = {10, 20}.

The set of states the machine can be in is easy to enumerate: it can either have accepted tokens totalling 0, 10, 20 or 30 units, or be in the error state which we denote by ⊥. Thus, we can say

S = {S_⊥, S_0, S_10, S_20, S_30}

and clearly set s = S_0 since initially the machine has accepted no tokens. There is one accepting state, which is when tokens totalling 30 units have been accepted, so A = {S_30}. Since there is again no output, our final task is again to define the transition function. As before, Figure 3.2 outlines a tabular and diagrammatic description.

1. For the input X = 〈10, 20〉 the transitions are

\[
S_0 \xrightarrow{X_0 = 10} S_{10} \xrightarrow{X_1 = 20} S_{30}
\]

meaning we start in state S_0 then

(a) move to S_10 since X_0 = 10, and finally

(b) move to S_30 since X_1 = 20.

Since we finish in state S_30, the input is accepted and we get a chocolate bar as output!

2. For the input X = 〈20, 20〉 the transitions are

\[
S_0 \xrightarrow{X_0 = 20} S_{20} \xrightarrow{X_1 = 20} S_{\perp}
\]

meaning we start in state S_0 then

(a) move to S_20 since X_0 = 20, and finally

(b) move to S_⊥ since X_1 = 20.

Since we finish in state S_⊥, the error state, the input is rejected and the tokens are returned.

Note that the input marked ε is the empty input; that is, with no input we can move from the accepting or error states back into the start state, thus resetting the machine. So for example, once we accept or reject the input we might assume the machine returns to state S_0.
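The vending machine can be simulated in the same table-driven style. This is a sketch under stated assumptions: the tabular description gives no row for S_⊥, so the code assumes any token consumed in the error state leaves the machine there, and it models the ε-transition as a separate step applied once the input is exhausted. The names ERR, DELTA and vend are hypothetical.

```python
ERR = "S_err"  # stands for the error state, written S with a bottom subscript

# Transition function delta, copied from the tabular description.
DELTA = {
    ("S0", 10): "S10",  ("S0", 20): "S20",
    ("S10", 10): "S20", ("S10", 20): "S30",
    ("S20", 10): "S30", ("S20", 20): ERR,
    ("S30", 10): ERR,   ("S30", 20): ERR,
}


def vend(tokens):
    """Return (accepted, halting state, state after any epsilon-transition)."""
    state = "S0"
    for t in tokens:
        # Assumption: inputs with no table entry (i.e., in ERR) stay in ERR.
        state = DELTA.get((state, t), ERR)
    accepted = state == "S30"
    # Epsilon-transition: accepting and error states return to the start state.
    after = "S0" if state in ("S30", ERR) else state
    return accepted, state, after
```

For example, vend([10, 20]) halts in S30 and is accepted, whereas vend([20, 20]) halts in the error state; in both cases the machine then returns to S0.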

3.1.2 Practical implementation of FSMs in hardware

Based on the formal definition above, Figure 3.3 illustrates a general framework into which we can place concrete implementations of the component parts in a specific FSM. It is crucial to notice that when drawn as a diagram like this, we can have

1. the state implemented by a register (i.e., a group of latches or flip-flops), and

2. the δ and ω functions implemented using combinatorial logic only: they are functions of the current state and any input.

The behaviour of the framework is illustrated by Figure 3.4. The idea is that within a given current clock cycle

1. ω computes the output from the current state and input, and

2. δ computes the next state from the current state and input

Figure 3.3: Two generic FSM frameworks (for different clocking strategies) into which one can place implementations of the state, δ (the transition function) and ω (the output function). One framework uses flip-flop based register(s) and a single clock; the other uses a two-phase clock (Φ1 and Φ2). In both, δ maps the state Q and the input to the next state Q′, and ω produces the output.

such that the next state is latched by the positive clock edge marking the next clock cycle. So we have a period of computation in which ω and δ operate, then an update triggered by a positive clock edge which steps the FSM from the current state into the next state. What results is a series of steps, under control of the clock, each performing some computation. As such, it should be clear that the clock frequency determines how quickly computation occurs; it has to be fast enough to satisfy the design goals, yet slow enough to cope with the critical path of a given step of computation. That is, the faster the clock oscillates the faster we step through the computation, but if it is too fast we cannot finish one step before the next one starts.

To summarise, this is a framework for a computer we can build: we know how each of the components functions, and can reason about their behaviour from the transistor-level upward. To solve a concrete problem using the framework, we follow a (fairly) standard sequence of steps:

1. Count the number of states required, and give each state an abstract label.

2. Describe the state transition and output functions using a tabular or diagrammatic approach.

3. Decide how the states will be represented, i.e., assign concrete values to the abstract labels, and allocate a large enough register to hold the state.

4. Express the functions δ and ω as (optimised) Boolean expressions, i.e., combinatorial logic.

5. Place the registers and combinatorial logic into the framework.

Versus a theoretical alternative, it is less common for a hardware-based FSM to have accepting states since we cannot usually halt the circuit (without turning it off); we might include idle or error states to cope. In addition, and although the framework does not show it, it is common to have a reset input that (re)initialises the FSM into the start state. For one thing, this avoids the need to turn the FSM off then on again to reset it!
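The single-clock framework can be modelled in software as a state register plus two pure functions. The sketch below is illustrative only: the class name ClockedFSM, the method tick, and the choice of a synchronous reset (one that takes effect at the clock edge) are assumptions, since the text does not fix how the reset input behaves.

```python
class ClockedFSM:
    """Software model of the single-clock FSM framework (illustrative only)."""

    def __init__(self, delta, omega, start):
        self.delta = delta    # transition function: (state, input) -> next state
        self.omega = omega    # output function: (state, input) -> output
        self.start = start
        self.q = start        # the state register, initialised to the start state

    def tick(self, x, rst=0):
        """Model one clock cycle: compute during the cycle, latch at the edge."""
        y = self.omega(self.q, x)                     # output for this cycle
        q_next = self.start if rst else self.delta(self.q, x)
        self.q = q_next                               # positive clock edge
        return y


# Parity FSM with a binary state: 0 means even, 1 means odd.  The output
# function ignores x, so this instance is Moore-style.
fsm = ClockedFSM(delta=lambda q, x: q ^ x, omega=lambda q, x: q, start=0)
outputs = [fsm.tick(x) for x in (1, 0, 1, 1)]
```

Each call to tick corresponds to one period of computation followed by one update of the register, which is the rhythm the framework imposes.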

Example #1: an ascending modulo 6 counter. Imagine we are tasked with designing an FSM that acts as a cyclic counter modulo n (rather than 2^n as before). If n = 6 for example, we want a component whose output r steps through values

0, 1, 2, 3, 4, 5, 0, 1, . . . ,

with the modular reduction representing control behaviour (versus the uncontrolled counter that was cyclic by default). In this case it is clear the FSM can be in one of 6 states (since the counter value is one of 0, 1, . . . , 5), which we label S_0, S_1, . . . , S_5. Figure 3.5 includes tabular and diagrammatic descriptions of the transition function, both of which are a little dull: they simply move from one state to the next (with the ε meaning no input is required), cycling from S_5 back to S_0.

Figure 3.4: Two illustrative waveforms (for different clocking strategies), outlining stages of computation within the associated FSM framework. In the single-clock case, the flip-flops first reset Q to the start state; each cycle then computes Q′ = δ(Q, X_i) and Y_i = ω(Q, X_i), after which the flip-flops store Q ← Q′. In the two-phase case (Φ1 and Φ2), the input latches reset Q to the start state and thereafter store Q ← Q′, each compute step is the same, and the output latches store Q′.



An aside: binary versus one-hot encodings.

The fact that state assignment occurs quite late in the design of a given FSM is intentional: it allows us to optimise the representation based on what we do with it. So far, we have used a natural, binary encoding to represent the i-th of n states as a ⌈log_2(n)⌉-bit unsigned integer i. For example, if n = 6 we use

S_0 ↦ 〈0, 0, 0〉
S_1 ↦ 〈1, 0, 0〉
S_2 ↦ 〈0, 1, 0〉
S_3 ↦ 〈1, 1, 0〉
S_4 ↦ 〈0, 0, 1〉
S_5 ↦ 〈1, 0, 1〉

This is not the only option, however. A one-hot encoding represents the i-th of n states as a sequence X such that X_i = 1 and X_j = 0 for j ≠ i. For example, if n = 6 again, then we use

S_0 ↦ 〈1, 0, 0, 0, 0, 0〉
S_1 ↦ 〈0, 1, 0, 0, 0, 0〉
S_2 ↦ 〈0, 0, 1, 0, 0, 0〉
S_3 ↦ 〈0, 0, 0, 1, 0, 0〉
S_4 ↦ 〈0, 0, 0, 0, 1, 0〉
S_5 ↦ 〈0, 0, 0, 0, 0, 1〉

meaning that for S_0, the 0-th bit is 1 and all others are 0. On one hand, and depending on n, this might mean we need more flip-flops to store the state (i.e., n instead of ⌈log_2(n)⌉). On the other hand, we potentially get two advantages, namely

1. transition between states is easier (we simply rotate any given encoding by the right distance to get another), and

2. switching behaviour (and hence power consumption) is reduced since only two bits toggle for any change (one from 1 to 0, and one from 0 to 1).
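The two encodings, and the two claimed advantages of the one-hot option, can be checked with a short sketch. The function names below are hypothetical; sequences are written least-significant element first, matching the 〈Q_0, Q_1, Q_2〉 convention used in the text.

```python
from math import ceil, log2


def binary_encode(i, n):
    """Encode state i of n as a ceil(log2(n))-element sequence, Q_0 first."""
    width = ceil(log2(n))
    return [(i >> k) & 1 for k in range(width)]


def one_hot_encode(i, n):
    """Encode state i of n as an n-element sequence with only element i set."""
    return [1 if k == i else 0 for k in range(n)]


def rotate(xs):
    """Rotate a sequence by one position, moving the last element to the front."""
    return xs[-1:] + xs[:-1]


def toggles(a, b):
    """Count the positions in which two equal-length encodings differ."""
    return sum(x != y for x, y in zip(a, b))


n = 6
binary_cost = toggles(binary_encode(3, n), binary_encode(4, n))    # S_3 -> S_4
one_hot_cost = toggles(one_hot_encode(3, n), one_hot_encode(4, n))
```

With the binary encoding the step from S_3 to S_4 toggles all three bits, whereas any one-hot transition toggles exactly two; the price is a 6-element rather than a 3-element state.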

Clearly 2^3 = 8 > 6, so we can represent the current state using a 3-bit integer Q = 〈Q_0, Q_1, Q_2〉. That is,

S_0 ↦ 〈0, 0, 0〉 ≡ 000_(2)
S_1 ↦ 〈1, 0, 0〉 ≡ 001_(2)
S_2 ↦ 〈0, 1, 0〉 ≡ 010_(2)
S_3 ↦ 〈1, 1, 0〉 ≡ 011_(2)
S_4 ↦ 〈0, 0, 1〉 ≡ 100_(2)
S_5 ↦ 〈1, 0, 1〉 ≡ 101_(2)

To implement the FSM, all we need to do is derive Boolean equations for the transition function δ so it can compute the next state Q′ from Q; with this FSM there is no input, so δ is a function of the current state. To do so, we first rewrite the tabular description of δ by replacing the abstract labels with concrete values. The result is a truth table, i.e.,

Q_2 Q_1 Q_0 | δ: Q′_2 Q′_1 Q′_0 | ω: r_2 r_1 r_0
0 0 0 | 0 0 1 | 0 0 0
0 0 1 | 0 1 0 | 0 0 1
0 1 0 | 0 1 1 | 0 1 0
0 1 1 | 1 0 0 | 0 1 1
1 0 0 | 1 0 1 | 1 0 0
1 0 1 | 0 0 0 | 1 0 1
1 1 0 | ? ? ? | ? ? ?
1 1 1 | ? ? ? | ? ? ?

which encodes the same information. For example, if the current state is Q = 〈0, 0, 0〉 (i.e., we are in state S_0) then the next state should be Q′ = 〈1, 0, 0〉 (i.e., state S_1). Note that there are 2 unused states, namely 〈0, 1, 1〉



An aside: Moore vs. Mealy style FSMs.

When written symbolically, the motivation for using either a Moore or Mealy style FSM may be unclear. When the framework for implementing FSMs is taken into account, however, the issue should become more concrete:

• In a Moore FSM the output depends on the current state only, implying changes to any input are only relevant when the state is updated; you can think of this as meaning the inputs are only relevant in relation to the clock signal that triggers said update (i.e., they are only taken into account periodically, rather than continuously).

• In contrast, a Mealy FSM allows the output to depend on the current state and any input. ω is a combinatorial function, so this implies the output can change a) in relation to the clock signal as a result of an update to the state, and/or b) at any time as a result of changes to the input. You could think of this as meaning the FSM is more responsive, in the sense that although the state is updated at the same frequency (i.e., in relation to the same features of the clock) the output can continuously, and instantaneously, change if/when the input changes.

Both are viable options, so it is not true that one is correct or incorrect. However, it is clearly important to understand the (subtle) difference so an informed choice can be made within some specific context.
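The difference can be seen by holding the state fixed, as it is within one clock cycle, and varying only the input. The sketch below reuses the parity example with a binary state (0 for even, 1 for odd); the two output functions are hypothetical examples chosen for illustration, not functions defined in the text.

```python
def moore_omega(q):
    """Moore-style: the output depends on the current state only."""
    return q


def mealy_omega(q, x):
    """Mealy-style: the output depends on the current state and the input."""
    return q ^ x


# Within one clock cycle the state register holds a fixed value ...
q = 1
# ... while the input may change several times before the next clock edge.
inputs_within_cycle = (0, 1, 0)

moore_outputs = [moore_omega(q) for _ in inputs_within_cycle]
mealy_outputs = [mealy_omega(q, x) for x in inputs_within_cycle]
```

The Moore output stays constant until the next state update, whereas the Mealy output follows the input immediately.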

Q | δ: Q′ | ω: r
S_0 | S_1 | 0
S_1 | S_2 | 1
S_2 | S_3 | 2
S_3 | S_4 | 3
S_4 | S_5 | 4
S_5 | S_0 | 5

Figure 3.5: An example FSM modelling an ascending modulo 6 counter. The accompanying diagram marks S_0 as the start state and labels each transition ε.

and 〈1, 1, 1〉, which we include in the table: the next state in either of these cases does not matter since they are invalid, so the entries are don’t care.

To summarise, we need to derive Boolean expressions for each of Q′_2, Q′_1 and Q′_0 in terms of Q_2, Q_1 and Q_0. This can be achieved by applying the Karnaugh map technique to get

[Karnaugh maps for Q′_2, Q′_1 and Q′_0, each over Q_2, Q_1 and Q_0]

which produce

\[
\begin{aligned}
Q'_2 &= ( Q_1 \wedge Q_0 ) \vee ( Q_2 \wedge \neg Q_0 ) \\
Q'_1 &= ( \neg Q_2 \wedge \neg Q_1 \wedge Q_0 ) \vee ( Q_1 \wedge \neg Q_0 ) \\
Q'_0 &= ( \neg Q_0 )
\end{aligned}
\]

Now we have enough to fill in the FSM framework: the state is simply a 3-bit register, δ is represented by

Q | δ: Q′ if d = 0 | δ: Q′ if d = 1 | ω: r | ω: f if d = 0 | ω: f if d = 1
S_0 | S_1 | S_5 | 0 | 0 | 1
S_1 | S_2 | S_0 | 1 | 0 | 0
S_2 | S_3 | S_1 | 2 | 0 | 0
S_3 | S_4 | S_2 | 3 | 0 | 0
S_4 | S_5 | S_3 | 4 | 0 | 0
S_5 | S_0 | S_4 | 5 | 1 | 0

Figure 3.6: An example FSM modelling an ascending or descending modulo 6 counter. The accompanying diagram marks S_0 as the start state and labels each transition with d = 0 or d = 1.

Q | δ: Q′ if rst = 0 | δ: Q′ if rst = 1 | M_g M_a M_r | A_g A_a A_r
S_0 | S_1 | S_6 | 1 0 0 | 0 0 1
S_1 | S_2 | S_6 | 0 1 0 | 0 0 1
S_2 | S_3 | S_6 | 0 0 1 | 0 1 0
S_3 | S_4 | S_6 | 0 0 1 | 1 0 0
S_4 | S_5 | S_6 | 0 0 1 | 0 1 0
S_5 | S_0 | S_6 | 0 1 0 | 0 0 1
S_6 | S_0 | S_6 | 0 0 1 | 0 0 1

Figure 3.7: An example FSM modelling a traffic light controller. The accompanying diagram marks S_0 as the start state and labels each transition with rst = 0 or rst = 1.



circuit analogues of the expressions above. Note that in this case, the output function ω is trivial: the counter output r = Q due to our state assignment, so in a sense ω is just the identity function.
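One way to gain confidence in the derived expressions is to evaluate them in software and check that they step through the intended sequence. This is a minimal sketch: the function names delta and count are hypothetical, and bits are modelled as the Python integers 0 and 1.

```python
def delta(q2, q1, q0):
    """Next-state logic for the modulo 6 counter, from the derived expressions."""
    n2 = (q1 & q0) | (q2 & (1 - q0))
    n1 = ((1 - q2) & (1 - q1) & q0) | (q1 & (1 - q0))
    n0 = 1 - q0
    return n2, n1, n0


def count(steps):
    """Start in S_0 and record the counter output r = Q for each clock cycle."""
    q = (0, 0, 0)
    outputs = []
    for _ in range(steps):
        q2, q1, q0 = q
        outputs.append(4 * q2 + 2 * q1 + q0)  # omega is the identity function
        q = delta(*q)                         # state update at the clock edge
    return outputs


sequence = count(8)
```

Here sequence is 0, 1, 2, 3, 4, 5, 0, 1, which is the behaviour required of the counter.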

Example #2: an ascending or descending modulo 6 counter. Now imagine we need to upgrade the previous example: we are tasked with designing an FSM that again acts as a cyclic counter modulo n, but whose direction can also be controlled. If n = 6 for example, we want a component whose output r steps through values

0, 1, 2, 3, 4, 5, 0, 1, . . .

or

0, 5, 4, 3, 2, 1, 0, 5, . . .

depending on some input d, plus has an output f to signal when the cycle occurs (i.e., when the current value is last or first in the sequence, depending on d).

The possible states are the same as before: we still have 6 states, labelled S_0, S_1, . . . , S_5. The difference is how transitions between states occur; this is illustrated by Figure 3.6, in which the new tabular and diagrammatic descriptions of the transition function are shown. Although it looks more complicated, we take exactly the same approach as before: we start by rewriting the tabular description of δ by replacing the abstract labels with concrete values to yield:

d Q_2 Q_1 Q_0 | δ: Q′_2 Q′_1 Q′_0 | ω: r_2 r_1 r_0 f
0 0 0 0 | 0 0 1 | 0 0 0 0
0 0 0 1 | 0 1 0 | 0 0 1 0
0 0 1 0 | 0 1 1 | 0 1 0 0
0 0 1 1 | 1 0 0 | 0 1 1 0
0 1 0 0 | 1 0 1 | 1 0 0 0
0 1 0 1 | 0 0 0 | 1 0 1 1
0 1 1 0 | ? ? ? | ? ? ? ?
0 1 1 1 | ? ? ? | ? ? ? ?
1 0 0 0 | 1 0 1 | 0 0 0 1
1 0 0 1 | 0 0 0 | 0 0 1 0
1 0 1 0 | 0 0 1 | 0 1 0 0
1 0 1 1 | 0 1 0 | 0 1 1 0
1 1 0 0 | 0 1 1 | 1 0 0 0
1 1 0 1 | 1 0 0 | 1 0 1 0
1 1 1 0 | ? ? ? | ? ? ? ?
1 1 1 1 | ? ? ? | ? ? ? ?

The table is larger since we need to consider d as input as well as Q, but the process is the same: to compute δ, we just need a set of appropriate Boolean expressions. So next we translate the truth table into a set of Karnaugh maps

[Karnaugh maps for Q′_2, Q′_1 and Q′_0, each over d, Q_2, Q_1 and Q_0]

and finally produce

\[
\begin{aligned}
Q'_2 &= ( \neg d \wedge Q_1 \wedge Q_0 ) \vee ( \neg d \wedge Q_2 \wedge \neg Q_0 ) \vee ( d \wedge Q_2 \wedge Q_0 ) \vee ( d \wedge \neg Q_2 \wedge \neg Q_1 \wedge \neg Q_0 ) \\
Q'_1 &= ( \neg d \wedge \neg Q_2 \wedge \neg Q_1 \wedge Q_0 ) \vee ( \neg d \wedge Q_1 \wedge \neg Q_0 ) \vee ( d \wedge Q_2 \wedge \neg Q_0 ) \vee ( d \wedge Q_1 \wedge Q_0 ) \\
Q'_0 &= ( \neg Q_0 )
\end{aligned}
\]

This time however, we need to deal with ω more carefully: we can still generate the counter output trivially as r = Q, but also need to compute f somehow. This is straight-forward of course, because using the truth table we can write

[Karnaugh map for f over d, Q_2, Q_1 and Q_0]

and finally produce

\[
f = ( \neg d \wedge Q_2 \wedge Q_0 ) \vee ( d \wedge \neg Q_2 \wedge \neg Q_1 \wedge \neg Q_0 )
\]

which completes the design in the sense we have now specified all components of the framework.
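As a sanity check, the expressions for the ascending or descending counter can be evaluated for both values of d. This is only a sketch: the function names step and run are hypothetical, and d is held constant for the whole run even though a real input could change between cycles.

```python
def step(d, q2, q1, q0):
    """Evaluate the derived expressions; return (next state, f)."""
    nd, nq2, nq1, nq0 = 1 - d, 1 - q2, 1 - q1, 1 - q0
    n2 = (nd & q1 & q0) | (nd & q2 & nq0) | (d & q2 & q0) | (d & nq2 & nq1 & nq0)
    n1 = (nd & nq2 & nq1 & q0) | (nd & q1 & nq0) | (d & q2 & nq0) | (d & q1 & q0)
    n0 = nq0
    f = (nd & q2 & q0) | (d & nq2 & nq1 & nq0)
    return (n2, n1, n0), f


def run(d, steps):
    """Start in S_0; record the outputs r and f seen in each clock cycle."""
    q = (0, 0, 0)
    rs, fs = [], []
    for _ in range(steps):
        q_next, f = step(d, *q)
        rs.append(4 * q[0] + 2 * q[1] + q[2])  # r = Q
        fs.append(f)
        q = q_next
    return rs, fs


up_r, up_f = run(0, 7)
down_r, down_f = run(1, 7)
```

With d = 0 the output r ascends and f is 1 only when r = 5; with d = 1 it descends and f is 1 only when r = 0, matching the truth table.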

Example #3: a traffic light controller. Imagine we are tasked with designing a traffic light controller for two roads (a main road and an access road) that intersect. The requirements are to

1. stop cars crashing into each other, so the behaviour should see

(a) green on main road and red on access road, then

(b) amber on main road and red on access road, then

(c) red on main road and amber on access road, then

(d) red on main road and green on access road, then

(e) red on main road and amber on access road, then

(f) amber on main road and red on access road,

and then cycle, and

2. allow an emergency stop button to force red on both main and access roads while pushed, then reset the system into an initial start state when released.

First we need to take stock of the problem itself: there is basically one input (the emergency stop button, denoted rst) and six outputs (namely the traffic light values, denoted M_g, M_a and M_r for the main road and A_g, A_a and A_r for the access road). Next we try to develop a precise description of the FSM behaviour. We need 7 states in total: S_0, S_1, . . . , S_5 represent steps in the normal traffic light sequence, and S_6 is an extra emergency stop state. Figure 3.7 shows both tabular and diagrammatic descriptions of the transition function; in essence, it is similar to the counter example (in the sense that it cycles from S_0 through to S_5 and back again) provided rst = 0, but if rst = 1 in any state then we move to S_6.

As an aside however, it is important to see this description represents one solution among several derived from what is (by design) an imprecise question. Put another way, we have already made several choices. One example is the decision to use a separate emergency stop state, and have the FSM enter this as the next state of any current state provided rst = 1; the red lights are both forced on by virtue of being in the emergency stop state, rather than by rst per se. Another valid approach might be to have ω depend on rst as well (rather than just Q, so it turns from a Moore-based into a Mealy-based FSM), forcing the red lights on as soon as rst = 1 and irrespective of what state the FSM is in. In some ways this is arguably more attractive, in the sense that the emergency stop is instant: we no longer need to wait for the next clock cycle when the next state is latched. Likewise, we have opted to make the first state listed in the question (i.e., green on the main road and red on the access road) the initial state; since the sequence is cyclic this choice seems a little arbitrary, so other choices (plus what state the FSM restarts in after an emergency stop) might also seem reasonable.

Given our various choices however, we next follow standard practice by translating the description into an implementation. Since 2^3 = 8 > 7 we can represent the current and next states via 3-bit integers Q = 〈Q_0, Q_1, Q_2〉 and Q′ = 〈Q′_0, Q′_1, Q′_2〉, where

S_0 ↦ 〈0, 0, 0〉 ≡ 000_(2)
S_1 ↦ 〈1, 0, 0〉 ≡ 001_(2)
S_2 ↦ 〈0, 1, 0〉 ≡ 010_(2)
S_3 ↦ 〈1, 1, 0〉 ≡ 011_(2)
S_4 ↦ 〈0, 0, 1〉 ≡ 100_(2)
S_5 ↦ 〈1, 0, 1〉 ≡ 101_(2)
S_6 ↦ 〈0, 1, 1〉 ≡ 110_(2)

and we have one unused state (namely 〈1, 1, 1〉). As such, both input and output registers will be comprised of three 1-bit storage components, in this case D-type latches. Now we have a concrete value for each abstract state label, we can expand the tabular description of the FSM into a (lengthy) truth table:

rst Q_2 Q_1 Q_0 | δ: Q′_2 Q′_1 Q′_0 | ω: M_g M_a M_r A_g A_a A_r
0 0 0 0 | 0 0 1 | 1 0 0 0 0 1
0 0 0 1 | 0 1 0 | 0 1 0 0 0 1
0 0 1 0 | 0 1 1 | 0 0 1 0 1 0
0 0 1 1 | 1 0 0 | 0 0 1 1 0 0
0 1 0 0 | 1 0 1 | 0 0 1 0 1 0
0 1 0 1 | 0 0 0 | 0 1 0 0 0 1
0 1 1 0 | 0 0 0 | 0 0 1 0 0 1
0 1 1 1 | ? ? ? | ? ? ? ? ? ?
1 0 0 0 | 1 1 0 | 1 0 0 0 0 1
1 0 0 1 | 1 1 0 | 0 1 0 0 0 1
1 0 1 0 | 1 1 0 | 0 0 1 0 1 0
1 0 1 1 | 1 1 0 | 0 0 1 1 0 0
1 1 0 0 | 1 1 0 | 0 0 1 0 1 0
1 1 0 1 | 1 1 0 | 0 1 0 0 0 1
1 1 1 0 | 1 1 0 | 0 0 1 0 0 1
1 1 1 1 | ? ? ? | ? ? ? ? ? ?

Although this looks intimidating, the point is that

• the transition function δ is just three Boolean expressions, one for each Q′_i, using rst, Q_2, Q_1 and Q_0 as input, and

• the output function ω is just six Boolean expressions, one for each M_i and A_j, using only Q_2, Q_1 and Q_0 as input (the outputs in the table are the same for rst = 0 and rst = 1, as expected of a Moore-style FSM).

So we just need to derive each expression. For δ, the Karnaugh maps

[Karnaugh maps for Q′_2, Q′_1 and Q′_0, each over rst, Q_2, Q_1 and Q_0]

can be used to produce

\[
\begin{aligned}
Q'_2 &= ( rst ) \vee ( Q_2 \wedge \neg Q_1 \wedge \neg Q_0 ) \vee ( Q_1 \wedge Q_0 ) \\
Q'_1 &= ( rst ) \vee ( \neg Q_2 \wedge \neg Q_1 \wedge Q_0 ) \vee ( \neg Q_2 \wedge Q_1 \wedge \neg Q_0 ) \\
Q'_0 &= ( \neg rst \wedge \neg Q_1 \wedge \neg Q_0 ) \vee ( \neg rst \wedge \neg Q_2 \wedge \neg Q_0 )
\end{aligned}
\]

Likewise for ω, we find

[Karnaugh maps for M_g, M_a, M_r, A_g, A_a and A_r, each over Q_2, Q_1 and Q_0]

can be used to produce

\[
\begin{aligned}
M_g &= ( \neg Q_2 \wedge \neg Q_1 \wedge \neg Q_0 ) \\
M_a &= ( \neg Q_1 \wedge Q_0 ) \\
M_r &= ( Q_1 ) \vee ( Q_2 \wedge \neg Q_0 ) \\
A_g &= ( Q_1 \wedge Q_0 ) \\
A_a &= ( \neg Q_2 \wedge Q_1 \wedge \neg Q_0 ) \vee ( Q_2 \wedge \neg Q_1 \wedge \neg Q_0 ) \\
A_r &= ( \neg Q_2 \wedge \neg Q_1 ) \vee ( \neg Q_1 \wedge Q_0 ) \vee ( Q_2 \wedge Q_1 )
\end{aligned}
\]



As before, these expressions can be used to fill in the FSM framework to yield a resulting design for the controller.
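The expressions for δ and ω can be placed into a small software model of the framework to check the resulting light sequence. This is a sketch under stated assumptions: the helper names inv, delta, omega, lights and simulate are hypothetical, the emergency stop is sampled once per clock cycle, and a state is a tuple (Q_2, Q_1, Q_0).

```python
def inv(b):
    """Logical NOT of a bit modelled as the integer 0 or 1."""
    return 1 - b


def delta(rst, q2, q1, q0):
    """Next-state logic, from the expressions derived for Q'_2, Q'_1, Q'_0."""
    n2 = rst | (q2 & inv(q1) & inv(q0)) | (q1 & q0)
    n1 = rst | (inv(q2) & inv(q1) & q0) | (inv(q2) & q1 & inv(q0))
    n0 = (inv(rst) & inv(q1) & inv(q0)) | (inv(rst) & inv(q2) & inv(q0))
    return n2, n1, n0


def omega(q2, q1, q0):
    """Output logic (Moore-style: a function of the state only)."""
    return {
        "Mg": inv(q2) & inv(q1) & inv(q0),
        "Ma": inv(q1) & q0,
        "Mr": q1 | (q2 & inv(q0)),
        "Ag": q1 & q0,
        "Aa": (inv(q2) & q1 & inv(q0)) | (q2 & inv(q1) & inv(q0)),
        "Ar": (inv(q2) & inv(q1)) | (inv(q1) & q0) | (q2 & q1),
    }


def lights(q):
    """Name the lit colours on the (main, access) roads in state q."""
    out = omega(*q)
    colours = (("g", "green"), ("a", "amber"), ("r", "red"))
    main = "+".join(name for key, name in colours if out["M" + key])
    access = "+".join(name for key, name in colours if out["A" + key])
    return main, access


def simulate(rst_inputs):
    """Start in S_0; return the lights seen each cycle and the final state."""
    q = (0, 0, 0)
    seen = []
    for rst in rst_inputs:
        seen.append(lights(q))
        q = delta(rst, *q)
    return seen, q


normal, end_state = simulate([0] * 6)
stopped, after_stop = simulate([0, 0, 1, 0])
```

In the normal run the lights follow requirements (a) to (f) and the FSM returns to S_0. In the second run the button is pushed during the third cycle, so the fourth cycle shows red on both roads, after which the FSM restarts in S_0.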




CHAPTER

4

BASICS OF COMPUTER ARITHMETIC

The whole of arithmetic now appeared within the grasp of mechanism.

– Babbage

In Chapter 1, we saw how numbers could be represented using bit-sequences. More specifically, we demonstrated various techniques to represent both unsigned and signed integers using n-bit sequences. In Chapter 2, we then investigated how logic gates capable of computing Boolean operations (such as NOT, AND, and OR) and higher-level building block components could be designed and manufactured.

One way to view this content is as a set of generic techniques. We have the ability to design and implement components that compute any Boolean function, for example, and reason about their behaviour in terms of Physics. A natural next step is to be more specific: what function would be useful? Among many possible options, the field of computer arithmetic provides some good choices. In short, arithmetic is something most people would class as computation; something as simple as a desktop calculator could still be classed as a basic computer. As such, the goal of this Chapter is to combine the previous material, producing a range of high-level building blocks that perform computation involving integers: although this is useful and interesting in itself, it is important to keep in mind that it also represents a starting point for study of more general computation.

4.1 Introduction

In general terms, an Arithmetic and Logic Unit (ALU) is a component (or collection thereof) tasked with computation. The concept stems from the design of EDVAC by John von Neumann [10]: he foresaw that a general-purpose computer would need to perform basic Mathematical operations on numbers, so it is “reasonable that [the computer] should contain specialised organs for these operations”. In short, the modern ALU is an example of such an organ: as part of a micro-processor, an ALU supports execution of instructions by computing results associated with arithmetic expressions such as x + y in a given C program.

One can view a concrete ALU at two levels, namely 1) at a low(er) level, in terms of how the constituent components themselves are designed, or 2) at a high(er) level, in terms of how said components are organised. In this Chapter we focus primarily on the former, which implies a focus on computer arithmetic. The challenge is roughly as follows: given one or more n-bit sequences that represent numbers, say x and y, how can we design a component, i.e., a Boolean function f we can then implement as a circuit, whose output represents an arithmetic operation? For example, if we want to compute

r = f(x, y) ↦ x + y,

i.e., an r that represents the sum of x and y, how can we design a suitable function

f : B^n × B^n → B^n

that realises the operation correctly while also satisfying additional design metrics once implemented as a circuit?


(a) An unintegrated architecture: each i-th sub-component Ci deals with all of a different operation; an m-input, n-bit multiplexer controlled by op selects the output r.

(b) An integrated architecture: each i-th sub-component Ci deals with a different part (e.g., i-th bit of the output) of all operations.

Figure 4.1: Two high-level ALU architectures: each combines a number of sub-components, but does so using a different strategy.

Often you will have already encountered long-hand, “school-book” techniques for arithmetic operations such as addition and multiplication. These allow you to perform the associated computation manually, which can be leveraged to address the challenge of designing such an f. That is, we can use an approach whereby we a) recap on your intuition about what the arithmetic operation means and how it works at a fundamental level, b) formalise this as an algorithm, then, finally, c) design a circuit to implement the algorithm (often by starting with a set of 1-bit building blocks, then later extending them to cope with n-bit inputs and outputs). Although effective, the challenge of applying this approach is magnified by what is typically a large design space of options and trade-offs. For example, we might implement f using combinatorial components alone, or widen this remit by considering sequential components to support state and so on: with any approach involving a trade-off, the sanity of opting for one option over another requires careful analysis of the context.

After first surveying higher-level, architectural options for an abstract ALU, this Chapter deals more concretely with a set of low-level components: each Section basically applies the approach above to a different arithmetic operation. From here on, keep in mind that the scope is constrained by several simplifications:

1. The large design space of options for any given operation dictates we take a somewhat selective approach. A major example of this is our focus on integer arithmetic only: arithmetic with fixed- and floating-point numbers is an important topic, but we ignore it entirely and instead refer to [11, Part V] for a comprehensive treatment.

2. We use sequences of n = 8 bits to represent integers, assuming use of two’s-complement representation where said integers are signed; any algorithms (eventually) operate in base-2 (i.e., set b = 2) as a result. Even so, most algorithms are developed and presented in a manner amenable to generalisation. For example, they often support larger n or different b with minimal alterations.

3. Having fixed the representation of integers, explicitly annotating x to mark it as a representation is somewhat redundant: we relax this notation and simply write x as a result. However, we introduce extra notation to clarify whether a given operation is signed or unsigned: for an operation ⊙, we use ⊙s and ⊙u to denote signed and unsigned versions respectively. With no annotation of this type, you can assume the signedness of the operator is irrelevant.

4.2 High-level ALU architecture

As the name suggests, a typical ALU will perform roughly three classes of operation: arithmetic, logical (typically focused on operations involving individual bits, contrasting with arithmetic operations on representations of integers using multiple bits), and comparison. Although a given ALU is often viewed as a single unit, having a separate ALU for each class can have advantages. For example, this allows different classes of operation (e.g., an addition and comparison) to be performed at the same time. To prevent a single unit becoming too complex, it can also be advantageous to have separate ALUs for different classes of input; use of a dedicated (so separate from the ALU) Floating-Point Unit (FPU) for floating-point computation is a common example.

These possibilities aside, at a high-level an ALU is simply a collection of sub-components; we provide one or more inputs (wlog. say x and y), and control it using op to select the operation required. Of course, some operations will produce a different sized output than others: an (n × n)-bit multiplication produces a 2n-bit output, but any comparison will only ever produce a 1-bit output for example. One can therefore view the ALU as conceptually producing a single output r, but in reality it might have multiple outputs that are used as and when appropriate. To be concrete, imagine we want an ALU which performs say m = 11 different operations

⊙ ∈ {+, −, ·, ∧, ∨, ⊕, ⊽, ≪, ≫, =, <}

meaning it can perform addition, subtraction, multiplication, a range of bit-wise Boolean operations (AND, OR, XOR and NOR), left- and right-shift, and two comparisons (equality and less than): it computes r = x ⊙ y for an ⊙ selected by op. Figure 4.1 shows two strategies for realising the ALU, each using sub-components (the i-th of which is denoted Ci) of a different form and in a different way:

1. Figure 4.1a illustrates an architecture where each sub-component implements all of a different operation. For example, C0 and C1 might compute all n bits of x + y and x − y respectively; the ALU output is selected, from the m sub-component outputs, using op to control a suitable multiplexer.

Although, as shown, each sub-component is always active, in reality it might be advantageous to power-down a sub-component which is not being used. This could, for example, reduce power consumption or heat dissipation.

2. Figure 4.1b illustrates an architecture where each sub-component implements all operations, but does so wrt. a single bit only. For example, C0 and C1 might compute the 0-th and 1-st bits respectively of either x + y or x − y (depending on op).
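
As a rough software analogy for the first (unintegrated) strategy, the following Python sketch evaluates every sub-component and then selects one output using op. It is illustrative only: the names `SUB_COMPONENTS` and `alu`, the choice of five operations, and the encoding of op as a list index are assumptions made for this sketch rather than part of the design in Figure 4.1a.

```python
# Hypothetical model of the unintegrated architecture in Figure 4.1a.
N = 8                    # word size in bits (n = 8, as assumed in this Chapter)
MASK = (1 << N) - 1      # keeps results within n bits

# Each sub-component C_i computes all n bits of one operation.
SUB_COMPONENTS = [
    lambda x, y: (x + y) & MASK,  # C_0: addition
    lambda x, y: (x - y) & MASK,  # C_1: subtraction
    lambda x, y: x & y,           # C_2: bit-wise AND
    lambda x, y: x | y,           # C_3: bit-wise OR
    lambda x, y: x ^ y,           # C_4: bit-wise XOR
]

def alu(op, x, y):
    """Evaluate every sub-component, then select one output using op."""
    outputs = [c(x, y) for c in SUB_COMPONENTS]  # all C_i are always active
    return outputs[op]                           # plays the role of the multiplexer
```

For example, `alu(0, 107, 14)` selects the adder output, while `alu(1, 107, 14)` selects the subtractor output; both were computed regardless of op, which mirrors the remark above about every sub-component being active.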

Tanenbaum and Austin [12, Chapter 3, Figures 3-18/3-19] focus on the second strategy, discussing a 1-bit ALU slice before dealing with their combination. Such 1-bit ALUs are often available as standard building blocks, so this focus makes a lot of sense on one hand. On the other hand, an arguable disadvantage is that such a focus complicates the overarching study of computer arithmetic. Put another way, focusing at a low-level on 1-bit ALU slices arguably makes it hard(er) to see how some higher-level arithmetic works. As a result, we focus instead on the first strategy in what follows: we consider designs for each i-th sub-component, realising each operation (a Ci for addition, for example) in isolation.

Essentially this means we ignore high-level organisation and optimisation of the ALU from here on, but of course both strategies have merits. For example, as we will see in the following, overlap exists between different arithmetic circuit designs: intuitively, the computation of addition and subtraction is similar for example. The second strategy is advantageous therefore, since said overlap can more easily be capitalised upon to reduce overall gate count. However, arithmetic circuits that require multiple steps to compute an output (using an FSM for example) are hard(er) to realise using the second strategy than the first. As a result, a mix of both strategies as and when appropriate is often a reasonable compromise.

4.3 Components for addition and subtraction

Perhaps more so than other aspects of computer arithmetic, the meaning and use of addition and subtraction should be familiar. (Very) formally, an addition operation computes the sum r = x + y using an x and y which are both termed an addend in this context; likewise, subtraction computes the difference r = x − y using a minuend x and a subtrahend y. This terminology hints at the fact that addition is commutative but subtraction is not: x + y = y + x but x − y ≠ y − x.

The challenge of course is how we compute these results. The goal in each case is to first describe the computation algorithmically, then translate this into a design (or set of designs) for a circuit we can construct from logic gates.

4.3.1 Addition

Example 4.1. Consider the following unsigned, base-10 addition of x = 107(10) to y = 14(10):

x = 107(10) ↦   1 0 7
y =  14(10) ↦   0 1 4 +
c =           0 0 1 0
r = 121(10) ↦   1 2 1


An aside: sign extension.

Although not an arithmetic operation per se, the issue of type conversion is an important related concept nonetheless. Where such a conversion is performed explicitly (e.g., by the programmer) it is formally termed a cast, and where performed implicitly (or automatically, e.g., by the compiler) it is termed a coercion; either conversion, depending on the types involved, may or may not retain the same value due to the range of representable values involved.

As an example, imagine you write a C program that includes a cast of an n-bit integer x into an n′-bit integer r. Four cases can occur:

1. If x and r are unsigned and n ≤ n′, r is formed by padding x with n′ − n bits equal to 0, at the most-significant end.

2. If x and r are signed and n ≤ n′, r is formed by padding x with n′ − n bits equal to the sign bit (i.e., the MSB or (n − 1)-th bit of x) at the most-significant end.

3. If x and r are unsigned and n > n′, r is formed by truncating x, i.e., removing (and discarding) n − n′ bits from the most-significant end.

4. If x and r are signed and n > n′, r is formed by truncating x, i.e., removing (and discarding) n − n′ bits from the most-significant end.

The second case above is often termed sign extension, and is required (vs. the first case) because simply padding x with 0 may turn it from a negative to positive value. For example, imagine n = 16 (i.e., the short type) and n′ = 32 (i.e., the int type): if x = −1(10), the two options yield

x = 1111111111111111(2) = −1(10)
ext^32_0(x) = 00000000000000001111111111111111(2) = 65535(10)
ext^32_±(x) = 11111111111111111111111111111111(2) = −1(10)

where the latter retains the value of x, whereas the former does not.
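
The difference between the two options can be checked with a short Python sketch. The function names `zero_extend`, `sign_extend` and `as_signed` are hypothetical, chosen for this illustration; bit patterns are modelled as non-negative Python integers, and `as_signed` interprets a pattern using two's-complement.

```python
def zero_extend(x, n, n_new):
    """Pad the n-bit pattern x with 0 bits, giving an n_new-bit pattern."""
    assert 0 <= x < (1 << n) and n <= n_new
    return x  # the low n bits are unchanged; the new upper bits are all 0

def sign_extend(x, n, n_new):
    """Pad the n-bit pattern x with copies of its sign bit."""
    assert 0 <= x < (1 << n) and n <= n_new
    sign = (x >> (n - 1)) & 1
    if sign:
        # Set the n_new - n new most-significant bits to 1.
        x |= ((1 << (n_new - n)) - 1) << n
    return x

def as_signed(x, n):
    """Interpret the n-bit pattern x as a two's-complement integer."""
    return x - (1 << n) if (x >> (n - 1)) & 1 else x

x = 0xFFFF  # the 16-bit pattern representing -1
print(as_signed(zero_extend(x, 16, 32), 32))  # value changes to 65535
print(as_signed(sign_extend(x, 16, 32), 32))  # value is retained as -1
```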

If we write them naturally, it is clear that |107(10)| = 3 and |14(10)| = 2. However, the resulting mismatch will become inconvenient: in this example and from here on, we force x and y to have the same length by padding them with more-significant zero digits. Although this may look odd, keep in mind this padding can be ignored without altering the associated value (i.e., we are confident 14(10) = 014(10), however odd the latter looks when written down).

Most people will have at least seen something similar to this, but, to ensure the underlying concept is clear, r is being computed by working from the least-significant, right-most digits (i.e., x0 and y0) towards the most-significant, left-most digits (i.e., xn−1 and yn−1) of the operands x and y. In English, in each i-th step (or column, as read from right to left) we sum the i-th digits xi and yi and a carry-in ci (produced by the previous, (i − 1)-th step); since this sum is potentially larger than a single base-b digit is allowed to be, we produce the i-th digit of the result ri and a carry-out ci+1 (for use by the next, (i + 1)-th step). We call c a (or the) carry chain, and say carries propagate from one step to the next.

This description can be written more formally: Algorithm 1 offers one way to do so. Notice the loop in lines #2 to #5 iterates through values of i from 0 to n − 1, with the body in lines #3 and #4 computing ri and ci+1 respectively. You can read the latter as “if the sum of xi, yi and ci is less than b, and so fits in a single base-b digit, there is no carry into the next step; otherwise there is a carry”. Notice that the algorithm sets c0 = ci to allow a carry-in to the overall operation (in the example we assumed ci = 0), and co = cn to allow a carry-out; the sum of two n-digit integers is an (n + 1)-digit result, but the algorithm produces an n-digit result r and a separate 1-digit carry-out co (which you could, of course, think of as two parts of a single, larger result).

A reasonable question is why a larger carry (i.e., a ci+1 > 1) is not possible. To answer this, we should first note that although line #4 is written as a conditional statement, it could be rewritten st.

ri ← (xi + yi + ci) mod b
ci+1 ← (xi + yi + ci) div b

where mod and div are integer modulo and division: this makes more sense in a way, because the latter assignment can be read as “the number of whole multiples of b carried into the next, (i + 1)-th column”. By


Figure 4.2: An n-bit, ripple-carry adder described using a circuit diagram.

Figure 4.3: An n-bit, ripple-carry subtractor described using a circuit diagram.

Figure 4.4: An n-bit, ripple-carry adder/subtractor described using a circuit diagram.

Figure 4.5: An n-bit, carry look-ahead adder described using a circuit diagram.

Figure 4.6: An illustration depicting the structure of carry look-ahead logic, which is formed by an upper- and lower-tree of OR and AND gates respectively (with leaf nodes representing gi and pi terms for example); each tree has depth O(log n).


Input: Two unsigned, n-digit, base-b integers x and y, and a 1-digit carry-in ci ∈ {0, 1}
Output: An unsigned, n-digit, base-b integer r = x + y, and a 1-digit carry-out co ∈ {0, 1}

1 r ← 0, c0 ← ci
2 for i = 0 upto n − 1 step +1 do
3     ri ← (xi + yi + ci) mod b
4     if (xi + yi + ci) < b then ci+1 ← 0 else ci+1 ← 1
5 end
6 co ← cn
7 return r, co

Algorithm 1: An algorithm for addition of base-b integers.
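
For readers who prefer executable code, Algorithm 1 can be transcribed into Python more or less line for line. This is a sketch under stated assumptions: the name `add_digits` is hypothetical, and each integer is assumed to be a list of base-b digits stored least-significant digit first (so index i matches the digit index in the algorithm).

```python
def add_digits(x, y, ci, b):
    """Add two n-digit, base-b integers given as digit lists (index 0 is the
    least-significant digit); returns the n-digit sum and a carry-out."""
    n = len(x)
    r = [0] * n
    c = ci                        # c_0 <- ci
    for i in range(n):            # i = 0 upto n - 1
        s = x[i] + y[i] + c
        r[i] = s % b              # line #3
        c = 0 if s < b else 1     # line #4: carry into step i + 1
    return r, c                   # c now holds c_n, i.e., co

# 107 + 14 in base 10, digits listed least-significant first.
print(add_digits([7, 0, 1], [4, 1, 0], 0, 10))  # ([1, 2, 1], 0)
```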

Input: Two unsigned, n-digit, base-b integers x and y, and a 1-digit borrow-in bi ∈ {0, 1}
Output: An unsigned, n-digit, base-b integer r = x − y, and a 1-digit borrow-out bo ∈ {0, 1}

1 r ← 0, c0 ← bi
2 for i = 0 upto n − 1 step +1 do
3     ri ← (xi − yi − ci) mod b
4     if (xi − yi − ci) ≥ 0 then ci+1 ← 0 else ci+1 ← 1
5 end
6 bo ← cn
7 return r, bo

Algorithm 2: An algorithm for subtraction of base-b integers.
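
Algorithm 2 can be transcribed in the same way. Again this is only a sketch: the name `sub_digits` is hypothetical and digit lists are assumed to be least-significant digit first. Note that it relies on Python's `%` operator returning a non-negative result for a positive modulus, which matches the mod used in line #3.

```python
def sub_digits(x, y, bi, b):
    """Subtract y from x, both n-digit, base-b integers given as digit lists
    (index 0 is the least-significant digit); returns the n-digit difference
    and a borrow-out."""
    n = len(x)
    r = [0] * n
    c = bi                        # c_0 <- bi
    for i in range(n):            # i = 0 upto n - 1
        d = x[i] - y[i] - c
        r[i] = d % b              # line #3: always in 0..b-1 in Python
        c = 0 if d >= 0 else 1    # line #4: borrow into step i + 1
    return r, c                   # c now holds c_n, i.e., bo

# 107 - 14 in base 10, digits listed least-significant first.
print(sub_digits([7, 0, 1], [4, 1, 0], 0, 10))  # ([3, 9, 0], 0)
```

If x < y, the borrow-out is 1 and r holds the difference modulo b^n: for example, 14 − 107 in base 10 with n = 3 yields the digits of 907 = 1000 − 93 together with bo = 1.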

considering bounds on (i.e., the maximum values of) each of the inputs, we can show ci ≤ 1 for 0 ≤ i ≤ n. That is,

1. in the 0-th step we compute x0 + y0 + c0, which can be at most (b − 1) + (b − 1) + 1 = 2 · b − 1 (given we know x0, y0 ∈ {0, 1, ..., b − 1}, and set c0 = ci ∈ {0, 1} in line #1); as a result, c1 can be at most (2b − 1) div b = 1,

2. in the i-th step we compute xi + yi + ci, which can be at most (b − 1) + (b − 1) + 1 = 2 · b − 1 (given we know xi, yi ∈ {0, 1, ..., b − 1}, and set ci ∈ {0, 1} per the above); as a result, ci+1 can be at most (2b − 1) div b = 1,

so we know (inductively) that the carry out of the i-th step into the next, (i + 1)-th step is at most 1 (and so either 0 or 1); this is true no matter what value of b is selected.
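
The bound can also be checked by brute force for small bases. The sketch below is not a proof, just a sanity check of the argument above; the name `max_carry` is hypothetical.

```python
def max_carry(b):
    """Largest possible carry-out of one step in base b, found by trying
    every digit pair and both possible carry-in values."""
    return max((x + y + c) // b
               for x in range(b)
               for y in range(b)
               for c in (0, 1))

# The largest carry is 1 for every base tried here.
print(all(max_carry(b) == 1 for b in range(2, 17)))
```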

Example 4.2. Consider the following trace of Algorithm 1, for x = 107(10) and y = 14(10):

i  xi  yi  ci  r          xi + yi + ci  ci+1  ri  r′
               〈0, 0, 0〉                          〈0, 0, 0〉
0  7   4   0   〈0, 0, 0〉  11            1     1   〈1, 0, 0〉
1  0   1   1   〈1, 0, 0〉  2             0     2   〈1, 2, 0〉
2  1   0   0   〈1, 2, 0〉  1             0     1   〈1, 2, 1〉
           0   〈1, 2, 1〉

Throughout this Chapter, a similar style is used to describe step-by-step behaviour of an algorithm for specific inputs (particularly those which include one or more loops). Read from left-to-right, there is typically a section of loop counters, such as i and j, a section of variables as they are at the start of each iteration, a section of variables computed during an iteration, and a section of variables as they are at the end of each iteration. If variable t in the left-hand section is updated during an iteration, we write it as t′ (read as “the new value of t”) in the right-hand section.

An important feature in the presentation of Algorithm 1 is use of a general b: when invoking it, we can select any concrete value of b we want. When discussing representation of integers, b = 2 was a natural selection because it aligned with concepts in Boolean algebra; the same is true here, within a discussion of computation involving such integers.

Example 4.3. Consider the following unsigned, base-2 addition of x = 107(10) = 01101011(2) to y = 14(10) = 00001110(2)

x = 107(10) ↦   0 1 1 0 1 0 1 1
y =  14(10) ↦   0 0 0 0 1 1 1 0 +
c =           0 0 0 0 1 1 1 0 0
r = 121(10) ↦   0 1 1 1 1 0 0 1


and the corresponding trace of Algorithm 1

i  xi  yi  ci  r                         xi + yi + ci  ci+1  ri  r′
               〈0, 0, 0, 0, 0, 0, 0, 0〉                          〈0, 0, 0, 0, 0, 0, 0, 0〉
0  1   0   0   〈0, 0, 0, 0, 0, 0, 0, 0〉  1             0     1   〈1, 0, 0, 0, 0, 0, 0, 0〉
1  1   1   0   〈1, 0, 0, 0, 0, 0, 0, 0〉  2             1     0   〈1, 0, 0, 0, 0, 0, 0, 0〉
2  0   1   1   〈1, 0, 0, 0, 0, 0, 0, 0〉  2             1     0   〈1, 0, 0, 0, 0, 0, 0, 0〉
3  1   1   1   〈1, 0, 0, 0, 0, 0, 0, 0〉  3             1     1   〈1, 0, 0, 1, 0, 0, 0, 0〉
4  0   0   1   〈1, 0, 0, 1, 0, 0, 0, 0〉  1             0     1   〈1, 0, 0, 1, 1, 0, 0, 0〉
5  1   0   0   〈1, 0, 0, 1, 1, 0, 0, 0〉  1             0     1   〈1, 0, 0, 1, 1, 1, 0, 0〉
6  1   0   0   〈1, 0, 0, 1, 1, 1, 0, 0〉  1             0     1   〈1, 0, 0, 1, 1, 1, 1, 0〉
7  0   0   0   〈1, 0, 0, 1, 1, 1, 1, 0〉  0             0     0   〈1, 0, 0, 1, 1, 1, 1, 0〉
           0   〈1, 0, 0, 1, 1, 1, 1, 0〉

which produces r = 01111001(2) = 121(10) as expected.

Better still, if we assume use of two’s-complement (which we reasoned in Chapter 1 was sane), then the algorithm can compute the sum of signed x and y without change:

Example 4.4. Consider the following signed, base-2 addition of x = 107(10) ↦ 01101011(2) to y = −14(10) ↦ 11110010(2) (both represented using two’s-complement)

x = 107(10) ↦   0 1 1 0 1 0 1 1
y = −14(10) ↦   1 1 1 1 0 0 1 0 +
c =           1 1 1 0 0 0 1 0 0
r =  93(10) ↦   0 1 0 1 1 1 0 1

i  xi  yi  ci  r                         xi + yi + ci  ci+1  ri  r′
               〈0, 0, 0, 0, 0, 0, 0, 0〉                          〈0, 0, 0, 0, 0, 0, 0, 0〉
0  1   0   0   〈0, 0, 0, 0, 0, 0, 0, 0〉  1             0     1   〈1, 0, 0, 0, 0, 0, 0, 0〉
1  1   1   0   〈1, 0, 0, 0, 0, 0, 0, 0〉  2             1     0   〈1, 0, 0, 0, 0, 0, 0, 0〉
2  0   0   1   〈1, 0, 0, 0, 0, 0, 0, 0〉  1             0     1   〈1, 0, 1, 0, 0, 0, 0, 0〉
3  1   0   0   〈1, 0, 1, 0, 0, 0, 0, 0〉  1             0     1   〈1, 0, 1, 1, 0, 0, 0, 0〉
4  0   1   0   〈1, 0, 1, 1, 0, 0, 0, 0〉  1             0     1   〈1, 0, 1, 1, 1, 0, 0, 0〉
5  1   1   0   〈1, 0, 1, 1, 1, 0, 0, 0〉  2             1     0   〈1, 0, 1, 1, 1, 0, 0, 0〉
6  1   1   1   〈1, 0, 1, 1, 1, 0, 0, 0〉  3             1     1   〈1, 0, 1, 1, 1, 0, 1, 0〉
7  0   1   1   〈1, 0, 1, 1, 1, 0, 1, 0〉  2             1     0   〈1, 0, 1, 1, 1, 0, 1, 0〉
           1   〈1, 0, 1, 1, 1, 0, 1, 0〉

which produces r = 01011101(2) ↦ 93(10) as expected.

Intuitively, the reason no change is required is because both the unsigned and signed, two’s-complement representations perfectly fit the definition of a positional number system: they express the value as a summation of weighted terms, whereas sign-magnitude, for example, needs a special case (namely the factor (−1)^(x_{n−1})) to capture the sign. As a result of this feature the carry chain still functions in the same way, for example.
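
To see this concretely, the following Python sketch applies the same bit-by-bit addition to two's-complement encodings of signed values. The names `encode`, `decode_signed` and `add_bits` are hypothetical, and n = 8 is assumed as elsewhere in this Chapter; only the interpretation of the input and output bit patterns differs between the unsigned and signed cases.

```python
N = 8  # word size in bits

def encode(v):
    """Two's-complement encoding of v as an N-bit pattern."""
    return v & ((1 << N) - 1)

def decode_signed(p):
    """Interpret the N-bit pattern p as a two's-complement integer."""
    return p - (1 << N) if (p >> (N - 1)) & 1 else p

def add_bits(px, py, ci=0):
    """Bit-serial addition of two N-bit patterns, as in Algorithm 1 with b = 2."""
    r, c = 0, ci
    for i in range(N):
        s = ((px >> i) & 1) + ((py >> i) & 1) + c
        r |= (s & 1) << i   # r_i = s mod 2
        c = s >> 1          # c_{i+1} = s div 2
    return r, c

r, co = add_bits(encode(107), encode(-14))
print(decode_signed(r), co)  # 93 1: the carry-out is discarded for signed use
```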

4.3.1.1 Ripple-carry adders

Having developed and reasoned about an algorithm for addition, the next challenge is to translate it into a concrete design we can implement as a circuit. At first glance this may seem difficult, not least because the algorithm contains a loop. Crucially however, we can unroll this loop once n is fixed: this means we copy and paste the loop body (i.e., lines #3 and #4) n times, replacing i with the correct value in each i-th copy.

Example 4.5. Given n = 4, the loop in Algorithm 1 can be unrolled into the straight-line alternative

1 c0 ← ci
2 r0 ← (x0 + y0 + c0) mod b
3 if (x0 + y0 + c0) < b then c1 ← 0 else c1 ← 1
4 r1 ← (x1 + y1 + c1) mod b
5 if (x1 + y1 + c1) < b then c2 ← 0 else c2 ← 1
6 r2 ← (x2 + y2 + c2) mod b
7 if (x2 + y2 + c2) < b then c3 ← 0 else c3 ← 1
8 r3 ← (x3 + y3 + c3) mod b
9 if (x3 + y3 + c3) < b then c4 ← 0 else c4 ← 1
10 co ← c4


Notice that if we select b = 2, the body of the loop and therefore each replicated step in the unrolled alternative computes the 1-bit addition

ri = xi ⊕ yi ⊕ ci
ci+1 = (xi ∧ yi) ∨ (xi ∧ ci) ∨ (yi ∧ ci)

Put another way, it matches the full-adder cell we produced a design for in Chapter 2. Substituting one for the other, we simply have n full-adder instances connected via respective carry-in and carry-out: each i-th instance computes the sum of xi and yi and a carry-in ci, and produces the sum ri and a carry-out ci+1. The design, which is termed a ripple-carry adder since the carry “ripples” or propagates through the carry chain, is shown in Figure 4.2.
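
The structure of the ripple-carry adder can be mimicked in software by chaining a model of the full-adder cell. The sketch below is illustrative rather than a hardware description: the names `full_adder`, `ripple_carry_add`, `to_bits` and `from_bits` are hypothetical, and bits are held in lists with index 0 as the least-significant bit.

```python
def full_adder(x, y, ci):
    """1-bit full-adder cell: returns (sum, carry-out)."""
    s = x ^ y ^ ci
    co = (x & y) | (x & ci) | (y & ci)
    return s, co

def ripple_carry_add(x, y, ci=0):
    """Chain one full-adder per bit; the carry ripples from bit 0 upwards."""
    r, c = [], ci
    for xi, yi in zip(x, y):
        s, c = full_adder(xi, yi, c)  # carry-out feeds the next cell's carry-in
        r.append(s)
    return r, c

def to_bits(v, n):
    """n-bit pattern of v, least-significant bit first."""
    return [(v >> i) & 1 for i in range(n)]

def from_bits(bits):
    """Unsigned value of a bit list, least-significant bit first."""
    return sum(bit << i for i, bit in enumerate(bits))

r, co = ripple_carry_add(to_bits(107, 8), to_bits(14, 8))
print(from_bits(r), co)  # 121 0
```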

The algorithm and associated design satisfy the required functionality: they can compute the sum of n-bit addends x and y. As such, one might question whether exploring other designs is necessary. Any metric applied to the design may provide some motivation, but the concept of critical path is particularly important here. Recall from Chapter 2 that the critical path of a circuit is defined as the longest sequential sequence of gates; here, the critical path runs through the entire circuit from the 0-th to the (n − 1)-th full-adder instance. Put another way, the carry chain represents an order on the computation of digits in r: ri cannot be computed until ci is known, so the i-th step cannot be computed until every j-th step for j < i is computed, due to the carry chain. This implies the critical path can be approximated by O(n) gate delays; our motivation for exploring other designs is therefore the possibility of improving on this, and thus computing r with lower latency (i.e., less delay).

4.3.1.2 Carry look-ahead adders

One approach to removing the constraint imposed by a carry chain might be to separate computation of the carry and sum. At first glance this probably seems impossible, or at least difficult: we argued above that the latter depends on the former! However, notice that we can say at least something about how each i-th step of the ripple-carry adder works independently of the others. We know for instance that

1. if xi + yi > b − 1 it generates a carry, i.e., sets ci+1 = 1 irrespective of ci,

2. if xi + yi = b − 1 it propagates a carry, i.e., sets ci+1 = 1 iff. ci = 1, and

3. if xi + yi < b − 1 it absorbs a carry, i.e., sets ci+1 = 0 irrespective of ci.

Example 4.6. Consider the following unsigned, base-10 addition of x = 456(10) to y = 444(10)

x = 456(10) ↦   4 5 6
y = 444(10) ↦   4 4 4 +
c =           0 1 1 0
r = 900(10) ↦   9 0 0

where the three rules above apply as follows:

1. In the 0-th column, xi + yi = x0 + y0 = 6 + 4 = 10 which is greater than b − 1 = 10 − 1 = 9. Put another way, this is already too large to represent using a single base-b digit and will hence always generate a carry into the next, (i + 1)-th step irrespective of whether there is a carry-in or not.

2. In the 1-st column, xi + yi = x1 + y1 = 5 + 4 = 9 which is equal to b − 1 = 10 − 1 = 9. Put another way, this is at the limit of what we can represent using a single base-b digit: iff. there is a carry-in, then there will be a carry-out.

3. In the 2-nd column, xi + yi = x2 + y2 = 4 + 4 = 8 which is less than b − 1 = 10 − 1 = 9. Put another way, even if there is a carry into the i-th stage there will never be a carry-out because 8 + 1 can be accommodated within the single base-b digit r2 of the sum.

A Carry Look-Ahead (CLA) adder takes advantage of the fact that using base-2 makes application of the rules simple. In particular, imagine we use gi and pi to indicate whether the i-th step will generate or propagate a carry respectively. We can express these as

gi = xi ∧ yi
pi = xi ⊕ yi

which can be explained in words:

• we generate a carry-out if both xi = 1 and yi = 1 since no matter what the carry-in is, their sum cannot be represented in a single base-b digit, and


• we propagate a carry-out if either xi = 1 or yi = 1 since this plus any carry-in will also produce a sum which cannot be represented in a single base-b digit.

Given gi and pi we have that

ci+1 = gi ∨ (ci ∧ pi)

where, again, c0 = ci and we produce a carry-out cn = co. Again this can be explained in words: at the i-th stage “there is a carry-out if either the i-th stage generates a carry itself, or there is a carry-in and the i-th stage will propagate it”. As an aside, note that it is common to see gi and pi written as

gi = xi ∧ yi
pi = xi ∨ yi

Of course, when used in the above both expressions have the same meaning: if xi = 1 and yi = 1, then gi = 1 so it does not matter what the corresponding pi is (given the OR will yield 1, since the left-hand term gi = 1, irrespective of the right-hand term). As such, use of an OR gate rather than an XOR is preferred because the former requires fewer transistors.
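
The claim that both forms of pi yield the same carry can be confirmed exhaustively, since there are only eight input combinations. The following Python sketch does so; the name `carry_out` and its `use_or` parameter are hypothetical, introduced only for this check.

```python
from itertools import product

def carry_out(x, y, ci, use_or):
    """Compute c_{i+1} = g_i OR (c_i AND p_i) with either form of p_i."""
    g = x & y
    p = (x | y) if use_or else (x ^ y)
    return g | (ci & p)

# Compare both forms against the carry of an ordinary 1-bit addition.
for x, y, ci in product((0, 1), repeat=3):
    expected = (x + y + ci) >> 1
    assert carry_out(x, y, ci, use_or=False) == expected
    assert carry_out(x, y, ci, use_or=True) == expected
```

Note the equivalence holds for the carry only: the sum bit still needs the XOR of xi and yi.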

Like the ripple-carry adder, once we fix n we can unwind the recursion to get an expression for the carry into each i-th full-adder cell:

c0 = ci
c1 = g0 ∨ (ci ∧ p0)
c2 = g1 ∨ (g0 ∧ p1) ∨ (ci ∧ p0 ∧ p1)
c3 = g2 ∨ (g1 ∧ p2) ∨ (g0 ∧ p1 ∧ p2) ∨ (ci ∧ p0 ∧ p1 ∧ p2)
...

This looks horrendous, but notice that the general structure is of the form shown in Figure 4.6: both the bottom- and top-half are balanced binary trees (st. leaf nodes are gi and pi terms, and internal nodes are AND and OR gates respectively) that implement the SoP expression for a given ci. We are able to use this organisation as a result of having decoupled computation of ci from the corresponding ri, which is, essentially, what yields an advantage: the critical path (i.e., the depth of the structure, or longest path from the root to some leaf) is shorter than for a ripple-carry adder. Stated in a formal way, the former is described by O(log n) gate delays due to the tree structure, and the latter by O(n) as a result of the linear structure.

The resulting design is shown in Figure 4.5. In contrast with the ripple-carry adder design in Figure 4.2, all the full-adder instances are independent: the carry chain previously linking them has now been eliminated. Instead, the i-th such instance produces gi and pi; these are used as inputs by the carry look-ahead logic to produce ci. The design hides an important trade-off, namely the associated gate count. Although we have reduced the critical path, the gate count is now much higher: a rough estimate would be O(n) and O(n²) gates for ripple-carry and carry look-ahead adders respectively. It can therefore be attractive to combine several, small(er) carry look-ahead adders (e.g., 8-bit adders) in a large(r) ripple-carry configuration (e.g., to form a larger, 32-bit, adder).
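
A functional model of this design can be sketched in Python by evaluating the unwound SoP expression for each ci directly from the gi and pi terms, with no dependence on any other carry. This models behaviour only, not gate depth or gate count; the names `cla_add`, `all_and`, `any_or`, `to_bits` and `from_bits` are hypothetical, and bit lists are least-significant bit first.

```python
def all_and(bits):
    """AND of a list of bits (one product term)."""
    return int(all(bits))

def any_or(bits):
    """OR of a list of bits (the sum of product terms)."""
    return int(any(bits))

def cla_add(x, y, ci=0):
    """Add bit lists x and y (least-significant bit first) without a carry chain."""
    n = len(x)
    g = [xi & yi for xi, yi in zip(x, y)]  # generate terms
    p = [xi ^ yi for xi, yi in zip(x, y)]  # propagate terms (XOR form)
    c = []
    for i in range(n + 1):
        # Term for the external carry-in: ci AND p_0 AND ... AND p_{i-1}.
        terms = [all_and([ci] + p[:i])]
        # One term per g_j: g_j AND p_{j+1} AND ... AND p_{i-1}.
        terms += [all_and([g[j]] + p[j + 1:i]) for j in range(i)]
        c.append(any_or(terms))
    r = [p[i] ^ c[i] for i in range(n)]    # r_i = x_i XOR y_i XOR c_i
    return r, c[n]

def to_bits(v, n):
    return [(v >> i) & 1 for i in range(n)]

def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))

r, co = cla_add(to_bits(107, 8), to_bits(14, 8))
print(from_bits(r), co)  # 121 0
```

The XOR form of pi is used here because the same term is reused to form the sum bit; the number of product terms per carry grows with i, which is the source of the roughly quadratic gate count noted above.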

4.3.1.3 Carry-save adders

A second approach to eliminating the carry chain is to bend the rules a little, and look at a slightly different problem. The ripple-carry and the carry look-ahead adder compute the sum of two addends x and y; what happens if we consider three addends x, y, and z, and thus compute x + y + z rather than x + y?

A carry-save adder offers a solution for this alternative problem. It is often termed a 3 : 2 compressor because it compresses three n-bit inputs x, y and z into two n-bit outputs r′ and c′ (termed the partial sum and shifted carry). Put another way, a carry-save “adder” computes the actual sum r = x + y + z in two steps: 1) a compression step produces a partial sum and shifted carry, then 2) an addition step combines them into the actual sum.

The first step amounts to replacing c with z in the ripple-carry adder design, meaning that for the i-th full-adder instance we have

r′i = xi ⊕ yi ⊕ zi
c′i = (xi ∧ yi) ∨ (xi ∧ zi) ∨ (yi ∧ zi)

Unlike the ripple-carry adder, where the instances are connected via a carry chain, the expressions for r′i and c′i only use the i-th digits of x, y, and z: computation of each i-th digit of r′ and c′ is independent. Crucially, this means each r′i and c′i can be computed at the same time; the critical path runs through just one full-adder instance, rather than all n instances as in a ripple-carry adder.


Example 4.7. Consider computation of the partial sum and shifted carry from x = 96(10) = 01100000(2), y = 14(10) = 00001110(2) and z = 11(10) = 00001011(2):

x = 96(10) ↦   0 1 1 0 0 0 0 0
y = 14(10) ↦   0 0 0 0 1 1 1 0
z = 11(10) ↦   0 0 0 0 1 0 1 1

r′ =           0 1 1 0 0 1 0 1
c′ =           0 0 0 0 1 0 1 0

After computing r′ and c′, we combine them via the second step by computing r = r′ + 2 · c′ using a standard (e.g., ripple-carry) adder. You could think of this step as propagating the carries, now represented separately (from the sum) by c′.

Example 4.8. Consider computation of the actual sum from r′ = 01100101 and c′ = 00001010:

r′ =               0 1 1 0 0 1 0 1
2 · c′ =         0 0 0 0 1 0 1 0 0 +
c =            0 0 0 0 0 0 1 0 0 0
r = 121(10) ↦    0 0 1 1 1 1 0 0 1


Given we need this step to produce r, it is reasonable to ask why we bother with this approach at all: it seems as if we are limited in the same way as if we used a ripple-carry adder in the first place. With m = 1 compression step, the answer is that we have a critical path of O(1) + O(n) gate delays vs. O(n) + O(n) if we used two ripple-carry adders (one to compute t = x + y, then another to compute r = t + z). The more general idea, however, is we compute many compression steps (i.e., m > 1) and then a single, addition step: if we do this, the cost associated with the addition step becomes less significant (i.e., is amortised) as m grows larger. Later, in Section 4.5 when we look at designs for multiplication, the utility of this approach should become clear.
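
Both steps, and the idea of many compression steps followed by one addition, can be sketched in Python using bit-wise operators on whole integers. The names `carry_save` and `sum_many` are hypothetical; Python integers are unbounded, so the sketch ignores the fixed width n of a real design.

```python
def carry_save(x, y, z):
    """3:2 compression: every bit position is computed independently."""
    partial = x ^ y ^ z                    # r'_i = x_i XOR y_i XOR z_i
    carry = (x & y) | (x & z) | (y & z)    # c'_i = majority of x_i, y_i, z_i
    return partial, carry

def sum_many(values):
    """m compression steps, then one carry-propagating addition."""
    partial, shifted = 0, 0   # invariant: partial + shifted == sum so far
    for v in values:
        partial, carry = carry_save(partial, shifted, v)
        shifted = carry << 1  # the carry has twice the weight, i.e., 2 * c'
    return partial + shifted  # the single addition step

print(carry_save(96, 14, 11))  # (101, 10), i.e., r' = 01100101, c' = 00001010
print(sum_many([96, 14, 11]))  # 121
```

Only the final `partial + shifted` corresponds to a carry-propagating adder; every iteration of the loop corresponds to a constant-depth layer of full-adder cells.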

4.3.2 Subtraction

4.3.2.1 Redesigning a ripple-carry adder

Subtraction is conceptually, and so computationally, similar to addition. In essence, the same steps are evident: we again work from the least-significant, right-most digits (i.e., x0 and y0) towards the most-significant, left-most digits (i.e., xn−1 and yn−1). At each i-th step (or column), we now compute the difference of the i-th digits xi and yi and a borrow-in produced by the previous, (i − 1)-th step; this difference is potentially smaller than zero, so we produce the i-th digit of the result and a borrow-out into the next, (i + 1)-th step. This description is formalised in a similar way by Algorithm 2. Note that although the name c is slightly counter-intuitive (it now represents a borrow- rather than carry-chain), we stick to the same notation as an adder to stress the similar use.

Example 4.9. Consider the following unsigned, base-10 subtraction of y = 14(10) from x = 107(10)

    x = 107(10) ↦     1 0 7
    y =  14(10) ↦     0 1 4  −
    c =             0 1 0 0
    r =  93(10) ↦     0 9 3

    i   xi  yi  ci  r           xi − yi − ci  ci+1  ri  r′
                                                        〈0, 0, 0〉
    0   7   4   0   〈0, 0, 0〉    3            0     3   〈3, 0, 0〉
    1   0   1   0   〈3, 0, 0〉   −1            1     9   〈3, 9, 0〉
    2   1   0   1   〈3, 9, 0〉    0            0     0   〈3, 9, 0〉
                0   〈3, 9, 0〉

which produces r = 93(10) as expected.

Example 4.10. Consider the following unsigned, base-2 subtraction of y = 14(10) = 00001110(2) from x = 107(10) = 01101011(2)

    x = 107(10) ↦     0 1 1 0 1 0 1 1
    y =  14(10) ↦     0 0 0 0 1 1 1 0  −
    c =             0 0 0 1 1 1 0 0 0
    r =  93(10) ↦     0 1 0 1 1 1 0 1

    i   xi  yi  ci  r                           xi − yi − ci  ci+1  ri  r′
                                                                        〈0, 0, 0, 0, 0, 0, 0, 0〉
    0   1   0   0   〈0, 0, 0, 0, 0, 0, 0, 0〉     1            0     1   〈1, 0, 0, 0, 0, 0, 0, 0〉
    1   1   1   0   〈1, 0, 0, 0, 0, 0, 0, 0〉     0            0     0   〈1, 0, 0, 0, 0, 0, 0, 0〉
    2   0   1   0   〈1, 0, 0, 0, 0, 0, 0, 0〉    −1            1     1   〈1, 0, 1, 0, 0, 0, 0, 0〉
    3   1   1   1   〈1, 0, 1, 0, 0, 0, 0, 0〉    −1            1     1   〈1, 0, 1, 1, 0, 0, 0, 0〉
    4   0   0   1   〈1, 0, 1, 1, 0, 0, 0, 0〉    −1            1     1   〈1, 0, 1, 1, 1, 0, 0, 0〉
    5   1   0   1   〈1, 0, 1, 1, 1, 0, 0, 0〉     0            0     0   〈1, 0, 1, 1, 1, 0, 0, 0〉
    6   1   0   0   〈1, 0, 1, 1, 1, 0, 0, 0〉     1            0     1   〈1, 0, 1, 1, 1, 0, 1, 0〉
    7   0   0   0   〈1, 0, 1, 1, 1, 0, 1, 0〉     0            0     0   〈1, 0, 1, 1, 1, 0, 1, 0〉
                0   〈1, 0, 1, 1, 1, 0, 1, 0〉

    Half-Subtractor
    x   y   bo  d
    0   0   0   0
    0   1   1   1
    1   0   0   1
    1   1   0   0

(a) The half-subtractor as a truth table.

[Circuit diagram: inputs x and y, outputs d and bo.]

(b) The half-subtractor as a circuit.

    Full-Subtractor
    bi  x   y   bo  d
    0   0   0   0   0
    0   0   1   1   1
    0   1   0   0   1
    0   1   1   0   0
    1   0   0   1   1
    1   0   1   1   0
    1   1   0   0   0
    1   1   1   1   1

(c) The full-subtractor as a truth table.

[Circuit diagram: inputs x, y and bi, outputs d and bo.]

(d) The full-subtractor as a circuit.

Figure 4.7: An overview of half- and full-subtractor cells.


Since the algorithm is more or less the same, it follows that a circuit to implement it would also be the same: Figure 2 illustrates this. The only difference, of course, is in the loop body, where we need the subtraction equivalent to half- and full-adder cells. More specifically, we need 1) a half-subtractor that takes two 1-bit values, say x and y, and subtracts one from the other to produce a difference and a borrow-out, say d and bo, and 2) a full-subtractor that extends a half-subtractor by including a borrow-in bi as an additional input. Unsurprisingly, Figure 4.7 demonstrates the components themselves are simple to design and write as the Boolean expressions

    bo = ¬x ∧ y
    d  = x ⊕ y

and

    bo = (¬x ∧ y) ∨ (¬(x ⊕ y) ∧ bi)
    d  = x ⊕ y ⊕ bi

respectively. Keep in mind that bi and bo perform the same role as ci and co previously: the subtraction analogue of the ripple-carry adder, an n-bit ripple-borrow subtractor perhaps, is identical except for the borrow chain through all n full-subtractor instances.
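As a sanity check on these expressions, the cells and the borrow chain can be modelled directly. This is an illustrative Python sketch only: the helper names (half_subtractor, full_subtractor, ripple_borrow_sub, to_bits, from_bits) are hypothetical, bits are modelled as the integers 0 and 1, and sequences are stored least-significant bit first to match the traces above.

```python
def half_subtractor(x, y):
    """Return (d, bo) for 1-bit x - y."""
    d = x ^ y
    bo = (1 - x) & y            # NOT x AND y
    return d, bo


def full_subtractor(x, y, bi):
    """Return (d, bo) for 1-bit x - y - bi."""
    d = x ^ y ^ bi
    bo = ((1 - x) & y) | ((1 - (x ^ y)) & bi)
    return d, bo


def ripple_borrow_sub(x_bits, y_bits, bi=0):
    """Chain full-subtractors through two equal-length, LSB-first bit lists."""
    r, b = [], bi
    for xi, yi in zip(x_bits, y_bits):
        d, b = full_subtractor(xi, yi, b)
        r.append(d)
    return r, b                 # result bits and the final borrow-out


def to_bits(v, n):
    """LSB-first list of the n low-order bits of v."""
    return [(v >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(b << i for i, b in enumerate(bits))
```

Calling ripple_borrow_sub(to_bits(107, 8), to_bits(14, 8)) reproduces the result sequence of Example 4.10, with a final borrow-out of 0.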




4.3.2.2 Reusing a ripple-carry adder

As we have seen, subtraction is similar to addition. This is even more obvious still if we write x − y ≡ x + (−y): the subtraction required (on the LHS) could be computed by adding x to the negation of y (on the RHS). Given we compute an addition in both cases, we might opt for a second approach by designing a single component that allows selection of either addition or subtraction: given a control signal op, we might have

    r = x + y + ci   if op = 0
        x − y − ci   if op = 1

for example. Notice that as well as controlling computation of the sum or difference of x and y, op will control use of ci as a carry- or borrow-in depending on whether an addition or subtraction is computed. The advantage is that, at a high level, the design

[Diagram: an outer component with inputs op, ci, x and y and outputs co and r, wrapping a chain of full-adder cells (an n-bit ripple-carry adder) with internal inputs ci′, x′ and y′ and internal outputs co′ and r′.]

includes one internal adder. Versus two separate, similar components (i.e., an adder and a subtractor), this is already a useful optimisation outright; in designs for multiplication this will be amplified further.

The question is, how should we control the internal inputs to the adder (namely x′, y′ and ci′) st. given all the external inputs (namely op, x, y and ci) the correct output r is produced? By using two’s-complement representation, we saw in Chapter 1 that

    −y ↦ ¬y + 1

for any given y. The idea is to use this identity, translating from what we want to compute into what we already can compute:

    op  ci  r
    0   0   x + y + ci
    0   1   x + y + ci
    1   0   x − y − ci
    1   1   x − y − ci

≡

    op  ci  r
    0   0   x + y + 0
    0   1   x + y + 1
    1   0   x − y − 0
    1   1   x − y − 1

≡

    op  ci  r
    0   0   x + y + 0
    0   1   x + y + 1
    1   0   x + (¬y + 1) − 0
    1   1   x + (¬y + 1) − 1

≡

    op  ci  r
    0   0   x + y + 0
    0   1   x + y + 1
    1   0   x + (¬y) + 1
    1   1   x + (¬y) + 0

The left-most table just captures what we said above: if op = 0 (in the top two rows) we want to compute x + y + ci, but if op = 1 (in the bottom two rows) we want to compute x − y − ci. Moving from left-to-right, we substitute in values of ci then apply the identity for −y in the bottom rows; the right-most table simply folds the constants together. In the right-most table, all the cases (for addition and subtraction, so where op = 0 or op = 1) are of the same form, which we can cope with using the internal adder: we have op, x, y and ci, so can just translate via

    op  ci  xi  yi  ci′  x′i  y′i
    0   0   0   0   0    0    0
    0   1   1   1   1    1    1
    1   0   0   0   1    0    1
    1   1   1   1   0    1    0

i.e., ci′ = ci ⊕ op, x′i = xi and y′i = yi ⊕ op. That is, x is unchanged whereas yi and ci are XOR’ed with op to conditionally invert them (in the bottom two rows, where we need ¬yi rather than yi). Figure 4.4 illustrates the result, where it is important to see that the overhead, versus in this case a ripple-carry adder, is simply an extra n + 1 XOR gates.
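The translation can be checked with a small software model. The sketch below is an assumption-laden illustration rather than the design in the figure: ripple_carry_add and add_sub are hypothetical names, n-bit values are held in Python integers, and the internal adder is modelled bit by bit using the usual full-adder equations.

```python
def ripple_carry_add(x, y, ci, n):
    """n-bit addition via a chain of full-adder equations; returns (r, co)."""
    r, c = 0, ci
    for i in range(n):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        r |= (xi ^ yi ^ c) << i                 # sum bit
        c = (xi & yi) | (xi & c) | (yi & c)     # carry-out of this cell
    return r, c


def add_sub(x, y, ci, op, n):
    """Compute x + y + ci (op = 0) or x - y - ci (op = 1), modulo 2**n."""
    mask = (1 << n) - 1
    y_internal = y ^ (mask if op else 0)        # y'_i = y_i XOR op, for every i
    ci_internal = ci ^ op                       # ci'  = ci  XOR op
    return ripple_carry_add(x, y_internal, ci_internal, n)
```

For example, with n = 8 the call add_sub(107, 14, 0, 1, 8) returns the result 93, and add_sub(107, 14, 1, 1, 8) returns 92; the only additions to the adder are the n + 1 XOR operations on y and ci.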

4.3.3 Carry and overflow detection

Consider the addition of some n-bit inputs x and y: the magnitude of r will be too large to represent in n bits, st. it is incorrect, if either

1. x and y are (and hence the addition is) unsigned and there is a carry-out, or

2. x and y are (and hence the addition is) signed but the sign of r makes no sense

which are termed carry and overflow errors respectively. The two cases can be illustrated using some (specific) examples:

Example 4.11. Consider the following unsigned, base-2 addition of x = 15(10) ↦ 1111(2) to y = 1(10) ↦ 0001(2)

    x = 15(10) ↦     1 1 1 1
    y =  1(10) ↦     0 0 0 1  +
    c =            1 1 1 1 0
    r =  0(10) ↦     0 0 0 0

which produces an incorrect result r = 0000(2) ↦ 0(10) due to a carry error.

Example 4.12. Consider the following signed, base-2 addition of x = −1(10) ↦ 1111(2) to y = 1(10) ↦ 0001(2) (both represented using two’s-complement)

    x = −1(10) ↦     1 1 1 1
    y =  1(10) ↦     0 0 0 1  +
    c =            1 1 1 1 0
    r =  0(10) ↦     0 0 0 0

which produces a correct result r = 0000(2) ↦ 0(10).

Example 4.13. Consider the following signed, base-2 addition of x = 7(10) ↦ 0111(2) to y = 1(10) ↦ 0001(2) (both represented using two’s-complement)

    x =  7(10) ↦     0 1 1 1
    y =  1(10) ↦     0 0 0 1  +
    c =            0 1 1 1 0
    r = −8(10) ↦     1 0 0 0

which produces an incorrect result r = 1000(2) ↦ −8(10) due to an overflow error.

To deal with such errors in a sensible manner, we really need two steps: 1) detect that the error has occurred, then 2) apply some mechanism, e.g., to communicate or correct the error.

Detecting the carry error is simple: as suggested by the first example above, we need to inspect the carry-out. In this example, that assumes n = 4, the correct result r = 16 has a magnitude which cannot be accommodated in the number of bits available; an incorrect result r = 0 is therefore produced, with the carry-out (i.e., the fact that if the result is computed by Algorithm 1, it produces co = 1) signalling an error. However, notice that if we have signed x and y, as in the second example, any carry-out is irrelevant: in this case, the result r = 0 is correct and the carry-out should be discarded.

This suggests detecting the overflow error requires more thought, with the third example suggesting a starting point. In this case, x is the largest positive integer we can represent using n = 4 bits; adding y = 1 means the value wraps-around (as discussed in Chapter 1) to form a negative result r = −8. Clearly this is impossible, in the sense that for positive x and y we can never end up with a negative sum: this mismatch allows us to conclude that an overflow error occurred. More specifically, in the case of addition, we apply the following set of rules (with a similar set possible for subtraction):

    x +ve   y -ve           ⇒ no overflow
    x -ve   y +ve           ⇒ no overflow
    x +ve   y +ve   r +ve   ⇒ no overflow
    x +ve   y +ve   r -ve   ⇒ overflow
    x -ve   y -ve   r +ve   ⇒ overflow
    x -ve   y -ve   r -ve   ⇒ no overflow

Note that testing the sign of x or y is trivial, because it will be determined by their MSBs as a result of how two’s-complement is defined: x is positive, for example, iff. xn−1 = 0 and negative otherwise. Based on this, detection of an overflow error is computed as

    of = ( xn−1 ∧ yn−1 ∧ ¬rn−1 ) ∨ ( ¬xn−1 ∧ ¬yn−1 ∧ rn−1 )

or in words: “there is an overflow if either x is positive and y is positive and r is negative, or if x is negative and y is negative and r is positive”. This can be further simplified to

    of = cn ⊕ cn−1

where c is the carry chain during addition of x and y: basically this XORs the carry-in cn−1 and the carry-out cn of the (n − 1)-th full-adder. As such, an overflow is signalled, i.e., of = 1, in two cases: either

1. cn = 0 and cn−1 = 1, which can only occur if xn−1 = 0 and yn−1 = 0 (i.e., x and y are both positive but r is negative), or

2. cn = 1 and cn−1 = 0, which can only occur if xn−1 = 1 and yn−1 = 1 (i.e., x and y are both negative but r is positive).

An aside: shift operators in C and Java.

The fact there are two different classes of shift operation demands some care when writing programs; put simply, in a given programming language you need to make sure you select the correct operator. In C, both left- and right-shifts use the operators << and >> irrespective of whether they are arithmetic or logical; the type of the operand being shifted dictates the class of shift. For example

1. if x is of type int (i.e., x is a signed integer) then the expression x >> 2 typically implies an arithmetic right-shift (strictly, the C standard leaves the result implementation-defined when x is negative), whereas

2. if x is of type unsigned int (i.e., x is an unsigned integer) then the expression x >> 2 implies a logical right-shift.

In contrast, Java has no unsigned integer data types so needs to take a different approach: arithmetic and logical right-shifts are specified by two different operators, meaning

1. the expression x >> 2 implies an arithmetic right-shift, whereas

2. the expression x >>> 2 implies a logical right-shift.

Once an error condition is detected (during a relevant operation by the ALU, for example), the next question is what to do about it: clearly the error needs to be managed somehow, or the incorrect result will be used as normal. There are numerous options, but two in particular illustrate the general approach:

1. provide the incorrect result as normal (e.g., truncate the result to n bits by discarding bits we cannot accommodate), but signal the error condition somehow (e.g., via a status register or some form of exception), or

2. fix the incorrect result somehow, according to pre-planned rules (e.g., saturate or clamp the result to the largest integer we can represent in n bits).

In short, the choice is between delegating responsibility to whatever is using the ALU (in the former) and making the ALU itself responsible (in the latter); both have advantages and disadvantages, and may therefore be appropriate in different situations.
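Both detection and the two management options can be illustrated in a few lines. The following Python sketch makes several assumptions: add_with_flags and saturating_signed_add are hypothetical names, x and y are supplied as n-bit patterns (integers in the range 0 to 2^n − 1), and the overflow flag uses the sign-based expression rather than the carry-based one.

```python
def add_with_flags(x, y, n):
    """Add two n-bit patterns; return (r, cf, of).

    r is the truncated n-bit result, cf the carry-out (relevant when the
    operands are unsigned) and of the overflow flag (relevant when they
    are two's-complement signed values).
    """
    mask = (1 << n) - 1
    total = x + y
    r = total & mask                 # truncate to n bits
    cf = total >> n                  # carry-out of the addition
    sx, sy, sr = x >> (n - 1), y >> (n - 1), r >> (n - 1)
    of = (sx & sy & (1 - sr)) | ((1 - sx) & (1 - sy) & sr)
    return r, cf, of


def saturating_signed_add(x, y, n):
    """Second option: clamp the result instead of signalling an error."""
    r, _, of = add_with_flags(x, y, n)
    if not of:
        return r
    if x >> (n - 1):                 # both operands negative
        return 1 << (n - 1)          # most negative pattern, 100...0
    return (1 << (n - 1)) - 1        # most positive pattern, 011...1
```

The first function corresponds to the first option (truncate, but expose flags to whatever uses the result); the second corresponds to the second option. With n = 4, add_with_flags(0b0111, 0b0001, 4) reports an overflow as in Example 4.13, whereas add_with_flags(0b1111, 0b0001, 4) reports only a carry-out, which matters for Example 4.11 but not Example 4.12.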

4.4 Components for shift and rotation

4.4.1 Introductory concepts and theory

4.4.1.1 Abstract shift operations, described as arithmetic

Although one would not normally think of doing a long-hand shift, as with addition or subtraction, it is possible to consider such an operation in arithmetic terms: a shift of some base-b integer x by a distance of y digits has the same effect as multiplying x by b^y. That is,

$$
r = x \cdot b^{y}
  = \left( \sum_{i=0}^{n-1} x_i \cdot b^{i} \right) \cdot b^{y}
  = \sum_{i=0}^{n-1} \left( x_i \cdot b^{i} \cdot b^{y} \right)
  = \sum_{i=0}^{n-1} x_i \cdot b^{i+y}
$$

Notice that if y is positive it increases the weight associated with a given digit xi, hence “shifting” said digit to the left in the sense it assumes a more-significant position. If y is negative, on the other hand, it decreases the weight associated with xi and the digit “shifts” to the right; in this case, the operation acts as a division instead, because clearly

$$
x \cdot b^{-y} = x \cdot \frac{1}{b^{y}} = \frac{x}{b^{y}}.
$$

This argument applies for any b, and, as you might expect, we will ultimately be interested in b = 2 since this aligns with our approach for representing integers.

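Example 4.14. Consider a base-10 shift of x = 123(10) by y = 2

$$
\begin{aligned}
r = x \cdot b^{y} &= \sum_{i=0}^{n-1} x_i \cdot b^{i+y} \\
                  &= x_0 \cdot b^{0+2} + x_1 \cdot b^{1+2} + x_2 \cdot b^{2+2} \\
                  &= 3 \cdot 10^{2} + 2 \cdot 10^{3} + 1 \cdot 10^{4} \\
                  &= 300 + 2000 + 10000 \\
                  &= 12300_{(10)}
\end{aligned}
$$

which produces r = x · b^y = 123(10) · 10^2 = 12300(10) as expected.

Example 4.15. Consider a base-10 shift of x = 123(10) by y = −2

$$
\begin{aligned}
r = x \cdot b^{y} &= \sum_{i=0}^{n-1} x_i \cdot b^{i+y} \\
                  &= x_0 \cdot b^{0-2} + x_1 \cdot b^{1-2} + x_2 \cdot b^{2-2} \\
                  &= 3 \cdot 10^{-2} + 2 \cdot 10^{-1} + 1 \cdot 10^{0} \\
                  &= 0.03 + 0.2 + 1 \\
                  &= 1.23_{(10)}
\end{aligned}
$$

which produces r = x · b^y = 123(10) · 10^−2 = 1.23(10) as expected.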

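Example 4.16. Consider a base-2 shift of x = 51(10) = 110011(2) by y = 2

$$
\begin{aligned}
r = x \cdot b^{y} &= \sum_{i=0}^{n-1} x_i \cdot b^{i+y} \\
                  &= x_0 \cdot b^{0+2} + x_1 \cdot b^{1+2} + x_2 \cdot b^{2+2} + x_3 \cdot b^{3+2} + x_4 \cdot b^{4+2} + x_5 \cdot b^{5+2} \\
                  &= 1 \cdot 2^{2} + 1 \cdot 2^{3} + 0 \cdot 2^{4} + 0 \cdot 2^{5} + 1 \cdot 2^{6} + 1 \cdot 2^{7} \\
                  &= 4 + 8 + 0 + 0 + 64 + 128 \\
                  &= 11001100_{(2)} \\
                  &= 204_{(10)}
\end{aligned}
$$

which produces r = x · b^y = 51(10) · 2^2 = 204(10) as expected.

Example 4.17. Consider a base-2 shift of x = 51(10) = 110011(2) by y = −2

$$
\begin{aligned}
r = x \cdot b^{y} &= \sum_{i=0}^{n-1} x_i \cdot b^{i+y} \\
                  &= x_0 \cdot b^{0-2} + x_1 \cdot b^{1-2} + x_2 \cdot b^{2-2} + x_3 \cdot b^{3-2} + x_4 \cdot b^{4-2} + x_5 \cdot b^{5-2} \\
                  &= 1 \cdot 2^{-2} + 1 \cdot 2^{-1} + 0 \cdot 2^{0} + 0 \cdot 2^{1} + 1 \cdot 2^{2} + 1 \cdot 2^{3} \\
                  &= 0.25 + 0.5 + 0 + 0 + 4 + 8 \\
                  &= 1100.11_{(2)} \\
                  &= 12.75_{(10)}
\end{aligned}
$$

which produces r = x · b^y = 51(10) · 2^−2 = 12.75(10) as expected.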
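The four examples can be reproduced mechanically by evaluating the reweighted sum. The Python sketch below is illustrative only: abstract_shift is a hypothetical name, digits are supplied least-significant first, and exact rational arithmetic (via the standard fractions module) is used so that negative distances give exact results rather than floating-point approximations.

```python
from fractions import Fraction


def abstract_shift(digits, b, y):
    """Evaluate sum(x_i * b**(i + y)) exactly.

    digits[i] holds x_i (least-significant digit first); y may be negative,
    in which case the result may have a fractional part.
    """
    return sum(Fraction(d) * Fraction(b) ** (i + y)
               for i, d in enumerate(digits))
```

For instance, abstract_shift([3, 2, 1], 10, -2) evaluates to 123/100, matching Example 4.15, and abstract_shift([1, 1, 0, 0, 1, 1], 2, 2) evaluates to 204, matching Example 4.16.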

4.4.1.2 Concrete shift (and rotate) operations of n-bit sequences

Recall from Chapter 1 that we represent signed or unsigned integers using an n-bit sequence or an equivalent literal. For example, wrt. an unsigned representation using n = 8 bits, each of the following

    x = 218(10)
      = 11011010(2)
      ↦ 〈0, 1, 0, 1, 1, 0, 1, 1〉
      ↦ 11011010

describes the same value: using the literal notation in what follows is more natural, but keep in mind that the equivalence above allows us to translate the same reasoning to any of the alternatives.

Based on our description in the previous section, we need to consider what a shift operation means when applied to an integer represented by an n-bit sequence.



Definition 4.1. Two types of shift can be applied to an n-bit sequence x:

1. a left-shift, where y > 0, can be defined as

    r = x << y = xn−1−abs(y) ‖ · · · ‖ x1 ‖ x0 ‖ ??. . .?     (n bits in total)

and

2. a right-shift, where y < 0, can be defined as

    r = x >> y = ??. . .? ‖ xn−1 ‖ · · · ‖ xabs(y)+1 ‖ xabs(y)     (n bits in total)

where y is termed the distance, and each ? represents a “gap” bit that must be filled to ensure r has n bits.

Definition 4.2. When computing a shift, any gap is filled in according to some rules:

1. logical shift, where left-shift discards MSBs and fills the gap in LSBs with zeros, and right-shift discards LSBs and fills the gap in MSBs with zeros, and

2. arithmetic shift, where left-shift discards MSBs and fills the gap in LSBs with zeros, and right-shift discards LSBs and fills the gap in MSBs with a sign bit.

Phrased in this way, a rotate operation (of some x by a distance of y) is the same as a logical shift except that any gap is filled by the other end of x rather than zero: that is,

1. a left-rotate (for which we use the operator ≪, vs. << for the corresponding shift) yields a gap in the LSBs which is filled by the MSBs that would be discarded by a left-shift, and

2. a right-rotate (for which we use the operator ≫, vs. >> for the corresponding shift) yields a gap in the MSBs which is filled by the LSBs that would be discarded by a right-shift.

Example 4.18. Consider the base-2 shift and rotation of an n = 8 bit x = 218(10) = 11011010(2) ≡ 11011010 by a distance of y = 2:

1. logical left- and right-shift produce

    x <<u y = 11011010 <<u 2 = 01101000
    x >>u y = 11011010 >>u 2 = 00110110

2. arithmetic left- and right-shift produce

    x <<s y = 11011010 <<s 2 = 01101000
    x >>s y = 11011010 >>s 2 = 11110110

and

3. logical left- and right-rotate produce

    x ≪ y = 11011010 ≪ 2 = 01101011
    x ≫ y = 11011010 ≫ 2 = 10110110

These examples hopefully illustrate the somewhat convoluted definitions more clearly: in reality, the underlying concepts are reasonably simple. Consider the logical left-shift: looking step-by-step at

    x <<u y = 11011010 <<u 2
            = 011010??
            = 01101000

the idea is that

1. we discard two bits from the left-hand, most-significant end because they cannot be accommodated, plus

2. at the right-hand, less-significant end we need to fill the resulting gap: this is a logical shift, so the gap bits are replaced with 0.



On the other hand, some more subtle points are important. First, note the importance of knowing n when performing these operations. If we did not know n, or did not fix it say, a left-shift would just elongate the literal: instead of discarding MSBs, the literal grows to form an n + y bit result. Likewise, if we do not know n then rotate cannot be defined in a sane way; a left-rotate cannot fill the gap in the LSBs with discarded MSBs, because they are not discarded! Second, use of terminology including “left” and “right” explains why it is easier to reason about these operations by using literals. In short, doing so means the operations both have the intuitive effect by moving elements left or right: using a sequence, under our notation at least, the effect is counter-intuitive (i.e., the wrong way around). Third, and finally, if y is a known, fixed constant then shift and rotate operations require no actual arithmetic: we are simply moving bits in x left or right. As a result, a circuit that left- or right-shifts x by a fixed distance y simply connects wires from each xi (or zero say, to fill LSBs or MSBs) to ri+y. We can use this fact as a building block in more general circuits that can cater for scenarios where y is not fixed. Even then, however, we typically assume a) the sign of y is always positive (which is captured in the above via use of abs(y)), which is sane because we have specific left- and right-shift (or rotate) operations vs. one generic operation, and b) the magnitude of y is restricted to 0 ≤ y < n meaning y has m = ⌈log2(n)⌉ bits. We then have a choice of how to deal with a y outside this range. Typically, we either let r be undefined for any y ≥ n or y < 0, or consider y modulo n.

Although logical and arithmetic left-shift are equivalent (i.e., a gap is zero-filled in both cases), this is not so for right-shift; as such, it is fair to question why arithmetic right-shift is included as a special case. Recall the original description above, where a shift of x by y was equated to a multiplication of x by b^y. If x uses a signed representation, a multiplication, and therefore also a shift, ideally preserves the sign: if x is positive (resp. negative) then we expect x · b^y to be positive (resp. negative). This is essentially the purpose of an arithmetic right-shift, in the sense it preserves the sign of x and hence has the correct arithmetic meaning. Both the underlying issue and the impact of this special case are clarified by an example:

Example 4.19. Assuming n = 8, consider that

    x   = −38(10)                 ↦ 11011010
    x/2 = −38(10)/2 = −19(10)     ↦ 11101101

when represented using two’s-complement. We know a shift using y = −1 should mean

$$
x \cdot b^{y} = x \cdot 2^{-1} = x \cdot \frac{1}{2} = \frac{x}{2}.
$$

However, using logical right-shift, we get

    r = x >>u y = 11011010 >>u 1
                = 01101101
                ↦ 109(10)

whereas if we use arithmetic right-shift we get

    r = x >>s y = 11011010 >>s 1
                = 11101101
                ↦ −19(10)

as expected. In the former we fill the MSBs with zero, which turns x into a positive r; in the latter we fill the MSBs with the sign bit of x to preserve the sign in r. This highlights the reason there is no need for a special case for arithmetic left-shift. With right-shift we fill MSBs of, so dictate the sign bit of, the result; in contrast, a left-shift means filling LSBs in the result, so the sign bit remains as is (i.e., is preserved by default).
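The concrete operations are easy to model once n is fixed. The following Python sketch is illustrative: the function names (shl, shr_logical, shr_arith, rol, ror, to_signed) are hypothetical, x is assumed to be an n-bit pattern held in a non-negative integer, and the distance is assumed to satisfy 0 ≤ y < n.

```python
def shl(x, y, n):
    """Left-shift (logical and arithmetic coincide): discard MSBs, zero-fill LSBs."""
    return (x << y) & ((1 << n) - 1)


def shr_logical(x, y, n):
    """Logical right-shift: discard LSBs, zero-fill MSBs."""
    return x >> y


def shr_arith(x, y, n):
    """Arithmetic right-shift: discard LSBs, fill MSBs with the sign bit."""
    sign = x >> (n - 1)
    fill = (((1 << y) - 1) << (n - y)) if sign else 0
    return (x >> y) | fill


def rol(x, y, n):
    """Left-rotate: MSBs that a left-shift would discard re-enter as LSBs."""
    return ((x << y) | (x >> (n - y))) & ((1 << n) - 1)


def ror(x, y, n):
    """Right-rotate: LSBs that a right-shift would discard re-enter as MSBs."""
    return ((x >> y) | (x << (n - y))) & ((1 << n) - 1)


def to_signed(x, n):
    """Interpret an n-bit pattern as a two's-complement integer."""
    return x - (1 << n) if x >> (n - 1) else x
```

Applying these to 0b11011010 with n = 8 reproduces Example 4.18, and to_signed(shr_arith(0b11011010, 1, 8), 8) gives −19 as in Example 4.19. Note that every function needs n as an argument, which reflects the first of the subtle points above.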

4.4.2 Iterative designs

Imagine we want to shift some x (wlog. left or right) by a distance of y, then shift that result again by a distance y′; a subtle but important fact is that we can combine the two shifts into one, i.e., we know that

    (x << y) << y′ ≡ x << (y + y′).

Example 4.20. Consider the base-2, logical left-shift of an n = 8 bit x = 218(10) = 11011010(2) ≡ 11011010, first by a distance of y = 2 then by a distance of y′ = 4:

    ( x << y ) << y′ = ( 11011010 << 2 ) << 4
                     = ( 01101000 ) << 4
                     = 10000000

    x << (y + y′)    = 11011010 << (2 + 4)
                     = 11011010 << 6
                     = 10000000



Input: An n-bit sequence x, and an unsigned integer distance 0 ≤ y < n
Output: The n-bit sequence x << y

    1 r ← x
    2 for i = 0 upto y − 1 step +1 do
    3     r ← r << 1
    4 end
    5 return r

Algorithm 3: An algorithm for n-bit (left-)shift.

Input: An n-bit sequence x, and an unsigned integer distance 0 ≤ y < n
Output: The n-bit sequence x << y

    1 r ← x, m ← ⌈log2(n)⌉
    2 for i = 0 upto m − 1 step +1 do
    3     if yi = 1 then
    4         r ← r << 2^i
    5     end
    6 end
    7 return r

Algorithm 4: An algorithm for n-bit (left-)shift.

[Circuit diagram: a register holding r feeds a fixed << 1 component, whose output r′ is fed back into the register.]

Figure 4.8: An iterative design for n-bit (left-)shift described using a circuit diagram.

[Circuit diagram: a cascade of m multiplexer stages between x and r; the i-th stage is controlled by yi and selects between its input and that input shifted by << 2^i, for i = 0, 1, . . . , m − 1.]

Figure 4.9: A combinatorial design for n-bit (left-)shift described using a circuit diagram.



Using the same reasoning, it should be obvious that if y = 6 then

    r = x << y = x << 6
               = (((((x << 1) << 1) << 1) << 1) << 1) << 1

Put another way, we can decompose one large shift on the LHS into several smaller shifts on the RHS: six repeated shifts each by a distance of 1 bit produce the same result as one shift by a distance of 6 bits. This approach is formalised by Algorithm 3.

Example 4.21. Consider the following trace of Algorithm 3, for y = 6(10):

    i   r         r′
                  x
    0   x         x << 1    r′ ← r << 1
    1   x << 1    x << 2    r′ ← r << 1
    2   x << 2    x << 3    r′ ← r << 1
    3   x << 3    x << 4    r′ ← r << 1
    4   x << 4    x << 5    r′ ← r << 1
    5   x << 5    x << 6    r′ ← r << 1
                  x << 6

which produces r = x << 6 as expected.

Figure 4.8 captures the components required to implement this algorithm; the design highlights a trade-off between area and latency in which smaller area is favoured. Specifically, we only need a) a register to store r (left-hand side), and b) a component to perform a 1-bit left-shift (center), which realises line #3 of Algorithm 3 and so needs no actual logic (since the shift distance is constant). However, this data-path demands an associated control-path that realises the loop. We can do so using an FSM of course: in each i-th step, the FSM latches r′ (representing the combinatorial result r << 1) into r ready for the (i + 1)-th step; implementation of such an FSM clearly demands a register to hold i and suitable control logic, both of which add somewhat to the area (and design complexity). Even so, the trade-off is essentially that we have a simple computational step but, as a result, need to iterate through y such steps to compute the (eventual) result.

So far so good, but what about right-shift? Or, rotate?! Crucially, we can support the entire suite of shift-like operations via a fairly simple alteration to line #3 of Algorithm 3: we simply need to alter the component at the center of our design. For example, if we replace r ← r << 1 with r ← r >> 1 we get a design that performs a right- vs. left-shift. Even better, if we replace r ← r << 1 with something more involved, namely

    r ← r << 1   if op = 0
        r >> 1   if op = 1
        r ≪ 1    if op = 2
        r ≫ 1    if op = 3

then provided we supply op as an extra input, the resulting design can perform left- and right-shift, and left- and right-rotate: a multiplexer, controlled by op, decides which result (of those produced by each different, individual operation) to update r with. We still iterate through y steps, meaning the end result is now a left- or right-shift, or left- or right-rotate by a distance of y. One can view this as an application of the unintegrated ALU architecture in Figure 4.1a, but at a lower (or internal, component) level vs. higher, ALU level.
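A software model makes the structure of the iterative design explicit. In this Python sketch, step and iterative_shift are hypothetical names: step plays the role of the multiplexed 1-bit component, the loop plays the role of the FSM, and logical (rather than arithmetic) right-shift is assumed for op = 1.

```python
def step(r, op, n):
    """One fixed-distance (1-bit) operation on an n-bit pattern, selected by op."""
    mask = (1 << n) - 1
    if op == 0:                                   # left-shift
        return (r << 1) & mask
    if op == 1:                                   # (logical) right-shift
        return r >> 1
    if op == 2:                                   # left-rotate
        return ((r << 1) | (r >> (n - 1))) & mask
    if op == 3:                                   # right-rotate
        return (r >> 1) | ((r & 1) << (n - 1))
    raise ValueError("op must be 0, 1, 2 or 3")


def iterative_shift(x, y, op, n):
    """Apply the selected 1-bit operation y times, as the FSM-driven loop would."""
    r = x
    for _ in range(y):
        r = step(r, op, n)                        # latch r' into r
    return r
```

The number of loop iterations, and hence the latency, grows with y; the per-step work stays trivial because every shift distance inside step is the constant 1.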

4.4.3 Combinatorial designs

Again following the same reasoning as above, it should be clear that if y = 6 then

    r = x << y = x << 6
               = (x << 2) << 4
               = (x << 2^1) << 2^2

Although the example is the same, the underlying strategy is to express y st. each smaller shift is by a power-of-two distance (i.e., by 2^i for some i). As such, if we write y in base-2 then each bit yi tells us whether or not to shift by a distance derived from i: we can compute the result via application of a simple rule “if yi = 1 then shift the accumulated result by a distance of 2^i, otherwise leave it as it is” which is formalised by Algorithm 4.

Example 4.22. Consider the following trace of Algorithm 4, for y = 6(10) st. m = ⌈log2(n)⌉ = 3:

    i   r         yi   r′
                       x
    0   x         0    x         r′ ← r
    1   x         1    x << 2    r′ ← r << 2^1
    2   x << 2    1    x << 6    r′ ← r << 2^2
                       x << 6

which produces r = x << 6 as expected.

Translating the algorithm into a corresponding design harnesses the same idea as the ripple-carry adder: once n is known, we unroll the loop by copy and pasting the loop body (i.e., lines #3 to #5) m times, replacing i with the correct value in each i-th copy. Doing so given n = 8, and so m = 3, for example, produces the straight-line alternative

    1 r ← x
    2 if y0 = 1 then r ← r << 2^0
    3 if y1 = 1 then r ← r << 2^1
    4 if y2 = 1 then r ← r << 2^2

which makes it (more) clear that we are essentially performing a series of choices: if the (i − 1)-th stage produces t as output, the i-th uses yi to choose between producing t or t << 2^i for use by the (i + 1)-th stage. All the shifts themselves are by fixed constants (which we already argued are trivial), so these stages are really just a cascade of multiplexers.

Figure 4.9 translates this idea into a concrete circuit. The trade-off between latency and area is swapped vs. that for the previous, iterative design. On one hand, the component is combinatorial: it takes 1 step to perform each operation (vs. the y steps of the iterative design), whose latency is dictated by the critical path, and can do so without the need for an FSM. On the other hand, however, it is likely to use significantly more area (relating to the logic gates required for each multiplexer).
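The cascade of multiplexers can be modelled as follows. This Python sketch uses a hypothetical name (log_shift_left) and assumes an n-bit pattern x with 0 ≤ y < n; each loop iteration corresponds to one multiplexer stage, so in hardware the loop would be unrolled into m stages rather than executed step by step.

```python
def log_shift_left(x, y, n):
    """Left-shift an n-bit pattern x by y using m = ceil(log2(n)) stages."""
    mask = (1 << n) - 1
    m = (n - 1).bit_length()                # equals ceil(log2(n)) for n >= 1
    r = x
    for i in range(m):
        shifted = (r << (1 << i)) & mask    # fixed-distance shift by 2**i
        # The i-th multiplexer: y_i selects the shifted or unshifted value.
        r = shifted if (y >> i) & 1 else r
    return r
```

For n = 8 only three stages are needed whatever the value of y, in contrast to the y iterations of the iterative model; the price is that all three fixed-distance shifts and multiplexers exist at once.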

4.5 Components for multiplication

Formally, a multiplication operation¹ computes the product r = y · x based on the multiplier y and multiplicand x. Despite a focus on integer values of x and y here, the techniques covered sit within a more general case often described as scalar multiplication: abstractly, x could be any object from a suitable structure (wlog. an integer, meaning x ∈ Z) that is multiplied, while y is an integer scalar that does the multiplying.

In the case of addition, we covered several possible strategies with some associated trade-offs. This is exacerbated with multiplication, where a much larger design space exists. Even so, the same approach² is adopted: we again start by investigating the computation above from an algorithmic perspective, then somehow translate this into a design for a circuit we can construct from logic gates.

4.5.1 Introductory concepts and theory

4.5.1.1 Options for long-hand multiplication

Example 4.23. Consider the following unsigned, base-10 multiplication of x = 623(10) by y = 567(10):

    x  =                 623(10)    ↦       6 2 3
    y  =                 567(10)    ↦       5 6 7  ×
    p0 = 7 · 3 · 10^0 =      21(10) ↦         2 1
    p1 = 7 · 2 · 10^1 =     140(10) ↦       1 4
    p2 = 7 · 6 · 10^2 =    4200(10) ↦     4 2
    p3 = 6 · 3 · 10^1 =     180(10) ↦       1 8
    p4 = 6 · 2 · 10^2 =    1200(10) ↦     1 2
    p5 = 6 · 6 · 10^3 =   36000(10) ↦   3 6
    p6 = 5 · 3 · 10^2 =    1500(10) ↦     1 5
    p7 = 5 · 2 · 10^3 =   10000(10) ↦   1 0
    p8 = 5 · 6 · 10^4 =  300000(10) ↦ 3 0
    r  =                 353241(10) ↦ 3 5 3 2 4 1

The idea of long-hand multiplication is that to compute r = y · x (at the bottom) from x and y (at the top), we generate and then sum a set of partial products (in the middle): each pi is generated by multiplying a digit from y with a digit from x, which we term a digit-multiplication. Within this context, and multiplication in general, we use the following definition:

Definition 4.3. The result of a digit-multiplication between xj and yi is said to be reweighted by the combined weight of the digits being multiplied: if xj has weight j and yi has weight i, the result will have weight j + i.

¹ Why write y · x rather than x · y, which would match addition for example?! Since multiplication is commutative, we could legitimately use the operands either way around: it makes no difference to the result. Given the choice, we opt for y · x basically because it matches the notation [y]x often used for more general scalar multiplication.

² Note that we ignore various optimisations for squaring operations, i.e., a multiplication r = y · x where we know x = y so in fact r = x^2. See, for example, [11, Chapter 12.5].



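[Diagram: long-hand multiplication of x = x2 x1 x0 by y = y2 y1 y0 giving r5 r4 r3 r2 r1 r0, with the partial products listed in the order y0 · x0, y1 · x0, y2 · x0, y0 · x1, y1 · x1, y2 · x1, y0 · x2, y1 · x2, y2 · x2.]

(a) Using operand scanning.

[Diagram: the same multiplication, with the partial products grouped by the result digit they contribute to: y0 · x0; then y1 · x0, y0 · x1; then y2 · x0, y1 · x1, y0 · x2; then y2 · x1, y1 · x2; then y2 · x2.]

(b) Using product scanning.

Figure 4.10: Two examples demonstrating different strategies for accumulation of base-b partial products resulting from two 3-digit operands.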

Informally at least, this explains why each partial product is offset by some distance from the right-hand edge. In the example above, note that

• y0 and x0 have weight 0, so p0 = y0 · x0 = 21(10) has weight 0 + 0 = 0,

• y1 and x1 have weight 1, so p4 = y1 · x1 = 12(10) has weight 1 + 1 = 2, and

• y2 and x1 have weight 2 and 1 respectively, so p7 = y2 · x1 = 10(10) has weight 2 + 1 = 3

st. p7 is offset (or left-shifted) by 3 digits and so weighted by 10^3: during summation of the partial products, it is representing y2 · x1 · 10^3 = 10000(10).

The question then is how to generate and sum the partial products. It turns out there are (at least) two strategies for doing so. These are described in Figure 4.10, which highlights a difference wrt. how the digit-multiplications are managed. More specifically:

• The left-hand strategy is termed operand scanning, and is formalised by Algorithm 5. The idea is to loop through digits of x and y, accumulating the associated digit-multiplications into whatever the relevant digit of the result r is.

• The right-hand strategy is termed product scanning, and is formalised by Algorithm 6. The idea is to loop through digits of the result r, so that when computing the i-th such digit ri we accumulate all relevant digit-multiplications stemming from x and y.

Example 4.24. Consider the following trace of Algorithm 5, which computes a base-10 operand scanning multiplication for x = 623(10) and y = 567(10)

    j   i   r                     c   yj  xi  t = yj · xi + rj+i + c   r′                    c′
                                                                       〈0, 0, 0, 0, 0, 0〉
    0   0   〈0, 0, 0, 0, 0, 0〉    0   7   3   21                       〈1, 0, 0, 0, 0, 0〉    2
    0   1   〈1, 0, 0, 0, 0, 0〉    2   7   2   16                       〈1, 6, 0, 0, 0, 0〉    1
    0   2   〈1, 6, 0, 0, 0, 0〉    1   7   6   43                       〈1, 6, 3, 0, 0, 0〉    4
    0       〈1, 6, 3, 0, 0, 0〉    4                                    〈1, 6, 3, 4, 0, 0〉
    1   0   〈1, 6, 3, 4, 0, 0〉    0   6   3   24                       〈1, 4, 3, 4, 0, 0〉    2
    1   1   〈1, 4, 3, 4, 0, 0〉    2   6   2   17                       〈1, 4, 7, 4, 0, 0〉    1
    1   2   〈1, 4, 7, 4, 0, 0〉    1   6   6   41                       〈1, 4, 7, 1, 0, 0〉    4
    1       〈1, 4, 7, 1, 0, 0〉    4                                    〈1, 4, 7, 1, 4, 0〉
    2   0   〈1, 4, 7, 1, 4, 0〉    0   5   3   22                       〈1, 4, 2, 1, 4, 0〉    2
    2   1   〈1, 4, 2, 1, 4, 0〉    2   5   2   13                       〈1, 4, 2, 3, 4, 0〉    1
    2   2   〈1, 4, 2, 3, 4, 0〉    1   5   6   35                       〈1, 4, 2, 3, 5, 0〉    3
    2       〈1, 4, 2, 3, 5, 0〉    3                                    〈1, 4, 2, 3, 5, 3〉
                                                                       〈1, 4, 2, 3, 5, 3〉

producing r = 353241(10) as expected.
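The trace can be reproduced with a direct transcription of Algorithm 5. In this Python sketch, operand_scanning is a hypothetical name; x and y are assumed to be equal-length lists of base-b digits, least-significant digit first, which matches the sequence notation used in the trace.

```python
def operand_scanning(x, y, b):
    """Multiply two n-digit, base-b integers given as LSB-first digit lists.

    Returns the 2n-digit product, again as an LSB-first digit list.
    """
    n = len(x)
    r = [0] * (2 * n)
    for j in range(n):                      # scan digits of y
        c = 0
        for i in range(n):                  # scan digits of x
            t = y[j] * x[i] + r[j + i] + c  # digit-multiplication plus accumulation
            u, v = divmod(t, b)             # t = u * b + v
            r[j + i] = v
            c = u
        r[j + n] = c                        # final carry of this row
    return r
```

Calling operand_scanning([3, 2, 6], [7, 6, 5], 10) returns [1, 4, 2, 3, 5, 3], i.e., the digits of 353241 least-significant first.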

Example 4.25. Consider the following trace of Algorithm 6, which computes a base-10 product scanning

git # ba293a0e @ 2019-11-14 165



Input: Two unsigned, n-digit, base-b integers x and yOutput: An unsigned, 2n-digit, base-b integer r = y · x

1 r← 02 for j = 0 upto n − 1 step +1 do3 c← 04 for i = 0 upto n − 1 step +1 do5 u · b + v = t← y j · xi + r j+i + c6 r j+i ← v7 c← u8 end9 r j+n ← c

10 end11 return r

Algorithm 5: An algorithm for multiplication of base-b integers using on operand scanning.


1  r ← 0, c0 ← 0, c1 ← 0, c2 ← 0
2  for k = 0 upto n + n − 2 step +1 do
3    for j = 0 upto n − 1 step +1 do
4      for i = 0 upto n − 1 step +1 do
5        if (j + i) = k then
6          u · b + v = t ← y_j · x_i
7          c · b + c0 = t ← c0 + v
8          c · b + c1 = t ← c1 + u + c
9          c2 ← c2 + c
10       end
11     end
12   end
13   r_k ← c0, c0 ← c1, c1 ← c2, c2 ← 0
14 end
15 r_{n+n−1} ← c0
16 return r

Algorithm 6: An algorithm for multiplication of base-b integers using product scanning.





1 r ← 0
2 for i = 0 upto y − 1 step +1 do
3   r ← r + x
4 end
5 return r

Algorithm 7: An algorithm for multiplication, using repeated addition (with y treated as any integer).


1 r ← 0
2 for i = 0 upto n − 1 step +1 do
3   r ← r + y_i · x · b^i
4 end
5 return r

Algorithm 8: An algorithm for multiplication, using repeated addition (with y written in base-b).

multiplication for x = 623(10) and y = 567(10)

k  j  i  r                    c2 c1 c0  y_j  x_i  t = y_j · x_i  r′                   c′2 c′1 c′0
                                                                 〈0, 0, 0, 0, 0, 0〉  0   0   0
0  0  0  〈0, 0, 0, 0, 0, 0〉  0  0  0   7    3    21             〈0, 0, 0, 0, 0, 0〉  0   2   1
0        〈0, 0, 0, 0, 0, 0〉  0  2  1                            〈1, 0, 0, 0, 0, 0〉  0   0   2
1  0  1  〈1, 0, 0, 0, 0, 0〉  0  0  2   7    2    14             〈1, 0, 0, 0, 0, 0〉  0   1   6
1  1  0  〈1, 0, 0, 0, 0, 0〉  0  1  6   6    3    18             〈1, 0, 0, 0, 0, 0〉  0   3   4
1        〈1, 0, 0, 0, 0, 0〉  0  3  4                            〈1, 4, 0, 0, 0, 0〉  0   0   3
2  0  2  〈1, 4, 0, 0, 0, 0〉  0  0  3   7    6    42             〈1, 4, 0, 0, 0, 0〉  0   4   5
2  1  1  〈1, 4, 0, 0, 0, 0〉  0  4  5   6    2    12             〈1, 4, 0, 0, 0, 0〉  0   5   7
2  2  0  〈1, 4, 0, 0, 0, 0〉  0  5  7   5    3    15             〈1, 4, 0, 0, 0, 0〉  0   7   2
2        〈1, 4, 0, 0, 0, 0〉  0  7  2                            〈1, 4, 2, 0, 0, 0〉  0   0   7
3  1  2  〈1, 4, 2, 0, 0, 0〉  0  0  7   6    6    36             〈1, 4, 2, 0, 0, 0〉  0   4   3
3  2  1  〈1, 4, 2, 0, 0, 0〉  0  4  3   5    2    10             〈1, 4, 2, 0, 0, 0〉  0   5   3
3        〈1, 4, 2, 0, 0, 0〉  0  5  3                            〈1, 4, 2, 3, 0, 0〉  0   0   5
4  2  2  〈1, 4, 2, 3, 0, 0〉  0  0  5   5    6    30             〈1, 4, 2, 3, 0, 0〉  0   3   5
4        〈1, 4, 2, 3, 0, 0〉  0  3  5                            〈1, 4, 2, 3, 5, 0〉  0   0   3
         〈1, 4, 2, 3, 5, 0〉  0  0  3                            〈1, 4, 2, 3, 5, 3〉  0   0   3
                                                                 〈1, 4, 2, 3, 5, 3〉

producing r = 353241(10) as expected.
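Product scanning can be sketched in Python in the same style. The sketch below is an illustration under stated assumptions: digits are little-endian lists, the name `mul_product_scanning` is hypothetical, the two inner loops plus the test (j + i) = k are collapsed into a direct computation of i = k − j, and the outer loop runs over k = 0, ..., 2n − 2 (as in the trace) so that the carry left in c0 becomes the most-significant digit.

```python
def mul_product_scanning(x, y, b=10):
    """Multiply two n-digit, base-b integers given as little-endian digit lists."""
    n = len(x)
    r = [0] * (2 * n)
    c0 = c1 = c2 = 0                      # three-digit column accumulator
    for k in range(2 * n - 1):            # one result digit per iteration
        for j in range(n):
            i = k - j                     # only pairs with j + i = k contribute
            if 0 <= i < n:
                u, v = divmod(y[j] * x[i], b)     # y_j * x_i = u * b + v
                c, c0 = divmod(c0 + v, b)
                c, c1 = divmod(c1 + u + c, b)
                c2 += c
        r[k] = c0                         # emit digit k, then shift the accumulator
        c0, c1, c2 = c1, c2, 0
    r[2 * n - 1] = c0                     # remaining carry is the top digit
    return r


print(mul_product_scanning([3, 2, 6], [7, 6, 5]))   # [1, 4, 2, 3, 5, 3]
```

Compared with operand scanning, each result digit is written exactly once, at the cost of a wider accumulator (c2, c1, c0).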

Notice that given n-digit x and y, we produce a larger 2n-digit product r = y · x; this can be rewritten as

\[ y \cdot x = r_1 \cdot b^n + r_0 \]

to show the 2n-digit r can be considered as two n-digit halves of the same size as x and y. The reason for doing so is to stress the fact that although we typically want to compute r, sometimes it is enough to compute r_0: this so-called truncated multiplication basically just discards r_1, the n most-significant digits of r (or does not compute them in the first place), and retains r_0.

4.5.1.2 Multiplication as repeated addition

In Section 4.3.1, the study of long-hand addition led naturally to an implementable design: the ripple-carry adder in Figure 4.2 is a very direct translation of Algorithm 1. It is harder to make the same claim here, in the sense there is no (or at least a lot less of an) obvious route from one to the other. This suggests taking a step back to rethink what multiplication actually means.

At a more fundamental level than the long-hand approaches described, one can view multiplication as just repeated addition. Put another way,

\[ r = y \cdot x = \underbrace{x + x + \cdots + x + x}_{y \text{ terms}}, \]




st. if we select y = 14(10), then we obviously have

r = 14 · x = x + x + x + x + x + x + x + x + x + x + x + x + x + x.

This is important because we already covered how to compute an addition, plus how to design associated circuits. So to compute a multiplication, we essentially just need to reuse our addition circuit in the right way: Algorithm 7 states the obvious, in the sense it captures this idea by simply adding x to r (which is initialised to 0) in a loop that iterates y times. Directly using repeated addition is unattractive, however, since the number of operations performed relates to the magnitude of y. That is, we need y − 1 operations³ in total, so for some n-bit y we perform O(2^n) operations: this grows quickly, even for modest values of n (say n = 32).

Fortunately, improvements are easy to identify. Another way to look at the multiplication of x by y is as inclusion of an extra weight to the digits that describe y. That is, writing y in base-b, here with b = 2, yields

\[ r = y \cdot x = \left( \sum_{i=0}^{n-1} y_i \cdot 2^i \right) \cdot x = \sum_{i=0}^{n-1} y_i \cdot x \cdot 2^i \]

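Example 4.26. Consider a base-2 multiplication of x by y = 14(10) ↦ 1110(2), which we can expand into a sum of n = 4 terms as follows:

\[
\begin{aligned}
y \cdot x &= y_0 \cdot x \cdot 2^0 + y_1 \cdot x \cdot 2^1 + y_2 \cdot x \cdot 2^2 + y_3 \cdot x \cdot 2^3 \\
          &= 0 \cdot x \cdot 2^0 + 1 \cdot x \cdot 2^1 + 1 \cdot x \cdot 2^2 + 1 \cdot x \cdot 2^3 \\
          &= 0 \cdot x + 2 \cdot x + 4 \cdot x + 8 \cdot x \\
          &= 14 \cdot x
\end{aligned}
\]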

Intuitively, this should already seem more attractive: there are only n terms (relating to the n digits in y) in the summation so we only need n − 1, or O(n), operations to compute their sum. Using a similar format to Algorithm 7, this is formalised by Algorithm 8. However, a problem lurks: line #3 of the algorithm reads

r ← r + y_i · x · b^i

or, put another way, our goal is to compute a multiplication but each step in doing so needs a further two multiplications itself! This is a chicken-and-egg style problem, but can be resolved by our selecting b = 2:

1. Multiplying x by y_i can be achieved without a multiplication: given we know y_i ∈ {0, 1, ..., b − 1} = {0, 1}, if y_i = 0 then y_i · x = 0, and if y_i = 1 then y_i · x = x. Put simply, we make a choice between 0 and x using y_i rather than multiply x by y_i.

2. Multiplying r by 2 can be achieved without a multiplication: clearly 2 · r = r + r = r ≪ 1, so we use a shift (or, if you prefer, an addition) instead.

So, in short, these facts mean the two multiplications in line #3 are pseudo-multiplications (or “fake” multiplications) because we can replace them with a non-multiplication equivalent whenever b = 2.
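To make the contrast concrete, the following Python sketch places the repeated-addition strategy next to the weighted sum for b = 2, where both pseudo-multiplications are replaced by a choice and a shift. The function names and the explicit bit-width parameter `n` are assumptions made for illustration; Python integers stand in for the fixed-width values a circuit would use.

```python
def mul_repeated_addition(x, y):
    """Add x to an accumulator y times: the loop count grows with the magnitude of y."""
    r = 0
    for _ in range(y):
        r = r + x
    return r


def mul_weighted_sum(x, y, n):
    """Sum the terms y_i * x * 2^i for an n-bit y, using only choice, shift and add."""
    r = 0
    for i in range(n):
        y_i = (y >> i) & 1            # extract the i-th bit of y
        term = x if y_i == 1 else 0   # y_i * x is a choice between 0 and x
        r = r + (term << i)           # multiplying by 2^i is a left-shift by i
    return r


print(mul_repeated_addition(5, 14))   # 70, after 14 loop iterations
print(mul_weighted_sum(5, 14, 4))     # 70, after 4 loop iterations
```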

4.5.1.3 A high-level overview of the design space

Although the reformulations above might not seem useful yet, they represent an important starting point from which we can later construct various concrete algorithms and associated designs and implementations. Within the large design space of all possible options, we will focus on a selection summarised as follows:

[Figure: the design space, drawn as a spectrum from "more time, less space" to "less time, more space": iterative, bit-serial (Section 4.5.2); iterative, digit-serial (Section 4.5.3); combinatorial, digit-parallel; combinatorial, bit-parallel (Section 4.5.4).]

3 Why y − 1 and not y: Algorithm 7 certainly performs y iterations of the loop! If you think about it, although doing so would make it more complicated, the algorithm could be improved given we know the first addition (i.e., when i = 0) will add x to 0. Put another way, we could avoid this initial addition and simply initialise r ← x and perform one less iteration (i.e., y − 1 vs. y). Although the difference is minor, and so a distraction from the main argument here, you can see this more easily by counting the number of + operators in the expansion above: for y = 14 we have 13 such additions.




You can think of options within this design space in a similar way to Section 4.4, where, for example, we overviewed options for the shift operation. Iterative options for multiplication typically deal with one (or at least few) partial products in each step; many (i.e., more than 1) steps and hence more time will be required to compute the result, but less space is required to do so (essentially because less computation is performed per step). Combinatorial options make the opposite trade-off, requiring just a single step to compute the result. However, the a) critical path, and so time said step takes due to the associated delay, and b) the space required, are typically both large(r). Unlike shift operations, a clear separation between iterative and combinatorial options is harder to make; trade-offs that blur the boundaries between some options are attractive, and explored where relevant.

4.5.1.4 Decomposition (or divide-and-conquer) techniques

Consider two designs which compute r = y · x. Irrespective of how they compute r, they differ wrt. the limits placed on x and y: the first design can deal with n-bit y and x, whereas the second can only deal with smaller, m-bit values (st. m < n).

Within this context, consider the specific case of m = n/2 (in which we assume n is even). As such, we can split x and y into two parts, i.e., write

\[
\begin{aligned}
x &= x_1 \cdot 2^{n/2} + x_0 \\
y &= y_1 \cdot 2^{n/2} + y_0
\end{aligned}
\]

where each x_i and y_i are n/2-bit integers. Likewise, we can write

\[ r = r_2 \cdot 2^n + r_1 \cdot 2^{n/2} + r_0 \]

where

\[
\begin{aligned}
r_2 &= y_1 \cdot x_1 \\
r_1 &= y_1 \cdot x_0 + y_0 \cdot x_1 \\
r_0 &= y_0 \cdot x_0
\end{aligned}
\]

and st. working through the multiplication as follows

\[
\begin{aligned}
r = y \cdot x &= (y_1 \cdot 2^{n/2} + y_0) \cdot (x_1 \cdot 2^{n/2} + x_0) \\
&= (y_1 \cdot 2^{n/2} \cdot x_1 \cdot 2^{n/2}) + (y_1 \cdot 2^{n/2} \cdot x_0) + (y_0 \cdot x_1 \cdot 2^{n/2}) + (y_0 \cdot x_0) \\
&= (y_1 \cdot x_1 \cdot 2^n) + (y_1 \cdot x_0 \cdot 2^{n/2}) + (y_0 \cdot x_1 \cdot 2^{n/2}) + (y_0 \cdot x_0) \\
&= (y_1 \cdot x_1) \cdot 2^n + (y_1 \cdot x_0 + y_0 \cdot x_1) \cdot 2^{n/2} + (y_0 \cdot x_0) \\
&= r_2 \cdot 2^n + r_1 \cdot 2^{n/2} + r_0
\end{aligned}
\]

demonstrates the result is correct. The more general, underlying idea is we decompose the single, larger n-bit multiplication into several, smaller n/2-bit multiplications: in this case, we compute the larger n-bit product r using four n/2-bit multiplications (plus several auxiliary additions). In a sense, this is an instance of divide-and-conquer, a strategy often used in the design of algorithms: sorting algorithms such as merge-sort and quick-sort, for example, will decompose the problem of sorting a larger sequence into that of sorting several smaller sequences. The Karatsuba-Ofman [8] (re)formulation⁴ offers further improvement, by first computing

\[
\begin{aligned}
t_2 &= y_1 \cdot x_1 \\
t_1 &= (y_0 + y_1) \cdot (x_0 + x_1) \\
t_0 &= y_0 \cdot x_0
\end{aligned}
\]

then rewriting the terms of r as

\[
\begin{aligned}
r_2 &= t_2 \\
r_1 &= t_1 - t_0 - t_2 \\
r_0 &= t_0
\end{aligned}
\]

Doing so requires three n/2-bit multiplications (although now there are more auxiliary additions and/or subtractions). This suggests a general trade-off: we could consider performing fewer, larger n-bit multiplications or more, smaller n/2-bit multiplications. If we accept the premise that designs for n/2-bit multiplication will be inherently less complex than an n-bit equivalent, this leads us to adopt one of (at least) two approaches:

1. instantiate and operate several smaller multipliers in parallel (e.g., compute y_1 · x_1 at the same time as y_0 · x_0) in an attempt to reduce the overall latency,

2. instantiate and reuse one smaller multiplier (e.g., first compute y_1 · x_1 then y_0 · x_0) in an attempt to reduce the overall area.

Although this can be useful in the sense it widens the design space of options, making a decision whether the original monolithic approach or the decomposed approach is better wrt. some metric can be quite subtle (and depend delicately on the concrete value of n).
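Both decompositions can be checked numerically with a short Python sketch. It is an illustration only: the helper `split` and both function names are hypothetical, n is assumed even, and Python's built-in `*` stands in for the smaller multiplier (note that in the Karatsuba-Ofman form the operands of t_1 can be one bit wider than n/2).

```python
def split(v, n):
    """Split an n-bit value v into (high, low) halves of n/2 bits each."""
    h = n // 2
    return v >> h, v & ((1 << h) - 1)


def mul_four_products(x, y, n):
    """n-bit product from four half-size products."""
    x1, x0 = split(x, n)
    y1, y0 = split(y, n)
    r2 = y1 * x1
    r1 = y1 * x0 + y0 * x1
    r0 = y0 * x0
    return (r2 << n) + (r1 << (n // 2)) + r0


def mul_three_products(x, y, n):
    """n-bit product from three products, following the Karatsuba-Ofman rewrite."""
    x1, x0 = split(x, n)
    y1, y0 = split(y, n)
    t2 = y1 * x1
    t1 = (y0 + y1) * (x0 + x1)
    t0 = y0 * x0
    return (t2 << n) + ((t1 - t0 - t2) << (n // 2)) + t0


print(mul_four_products(0xAB, 0xCD, 8) == 0xAB * 0xCD)    # True
print(mul_three_products(0xAB, 0xCD, 8) == 0xAB * 0xCD)   # True
```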

4 Other extensions and generalisations also exist. For example, we could apply the strategy recursively (i.e., decomposing the n/2-bit multiplications in a similar way), or attempt other forms of split (st. x and y are split into say 3 parts, rather than 2 as above).




4.5.1.5 Multiplier recoding techniques

The use of multiplier recoding provides a broad set of strategies for improving both iterative and combinatorial designs; we focus on examples of the former, but keep in mind the general principles can be applied to both. The underlying idea is to

1. spend some effort before multiplication to recode (or transform) y into some equivalent y′, then

2. be more efficient during multiplication by using y′ as the multiplier rather than y.

This is a rough description, however, because simple (enough) recoding may be possible during multiplication rather than strictly beforehand.

In simple terms, recoding y means using a different representation: we represent the same value, but in a way that allows some sort of advantage during multiplication.

Example 4.27. Consider y = 30(10), here written in base-10. Among many options, some alternative representations for this value are

y = 30(10)
  ↦ 〈0, 1, 1, 1, 1, 0, 0, 0〉(2)
  ↦ 〈2, 3, 1, 0〉(4)
  ↦ 〈0, −1, 0, 0, 0, +1, 0, 0〉(2)
  ↦ 〈−2, 0, 2, 0〉(4)

The first case is y represented in base-2, which is somewhat obvious given what we already know; after this, the cases less obviously represent the same value. However, it is important to see why they are equivalent. The second case requires a larger digit set. Each y_i ∈ {0, 1, 2, 3} as a result of it using a base-4 representation, but, even so, we still have

\[
\begin{aligned}
\langle 2, 3, 1, 0 \rangle_{(4)} &\mapsto 2 \cdot 4^0 + 3 \cdot 4^1 + 1 \cdot 4^2 + 0 \cdot 4^3 \\
&= 2 + 12 + 16 \\
&= 30
\end{aligned}
\]

The third and fourth cases use signed digit sets; for example, in the third case each y_i ∈ {−1, 0, +1}. Again, we still have

\[
\begin{aligned}
\langle 0, -1, 0, 0, 0, +1, 0, 0 \rangle_{(2)} &\mapsto 0 \cdot 2^0 - 1 \cdot 2^1 + 0 \cdot 2^2 + 0 \cdot 2^3 + 0 \cdot 2^4 + 1 \cdot 2^5 + 0 \cdot 2^6 + 0 \cdot 2^7 \\
&= -2 + 32 \\
&= 30
\end{aligned}
\]

It might not be immediately clear why these representations offer any advantage. Intuitively, however, note two things:

1. the first case requires eight base-2 digits to represent a given y, but the fourth case can do the same with only four, and

2. the first case requires four non-zero base-2 digits to represent this y, but the fourth case can do the same with only two.

In short, features such as these, when generalised, allow more efficient strategies (in time and/or space) for multiplication using y′ than y.

Whatever representation we select, however, it is crucial that any overhead related to producing and using the recoded y′ is always less than the associated improvement during multiplication. Put another way, if the improvement is small and the overhead is large, then overall we are worse off: we may as well not use recoding at all! This requires careful analysis, for example to judge the relative merits of a specific recoding strategy given a specific n.
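The equivalence of the four representations in Example 4.27, and the two properties just noted, can be checked with a small Python sketch. The helper names `value` and `weight` are assumptions made here; digit sequences are written least-significant digit first, as in the example.

```python
def value(digits, b):
    """Value of a little-endian digit sequence in base b (digits may be negative)."""
    return sum(d * b**i for i, d in enumerate(digits))


def weight(digits):
    """Number of non-zero digits in a representation."""
    return sum(1 for d in digits if d != 0)


representations = [
    ([0, 1, 1, 1, 1, 0, 0, 0], 2),     # unsigned base-2
    ([2, 3, 1, 0], 4),                 # unsigned base-4
    ([0, -1, 0, 0, 0, +1, 0, 0], 2),   # signed base-2
    ([-2, 0, 2, 0], 4),                # signed base-4
]
for digits, b in representations:
    print(value(digits, b), len(digits), weight(digits))
# 30 8 4
# 30 4 3
# 30 8 2
# 30 4 2
```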

4.5.2 Iterative, bit-serial designs

Definition 4.4. As originally written, Horner’s Rule [7] is a method for evaluating polynomials: it states that a polynomial a(x) can be written as

\[ a_0 + a_1 \cdot x + \cdots + a_{n-1} \cdot x^{n-1} \equiv a_0 + x \cdot (a_1 + x \cdot (\cdots + x \cdot (a_{n-1}))) \]

where the RHS factors out powers of the indeterminate x from the LHS.
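As a brief illustration, Horner's Rule can be sketched in Python as a loop that works from the inner-most bracket outwards. The function name `horner` and the coefficient list ordering (a[0] is the constant term) are assumptions made for this sketch.

```python
def horner(a, x):
    """Evaluate a[0] + a[1]*x + ... + a[n-1]*x^(n-1) using n - 1 multiplications by x."""
    r = 0
    for coeff in reversed(a):   # start with the inner-most coefficient a[n-1]
        r = coeff + x * r
    return r


print(horner([1, 2, 3], 2))   # 1 + 2*2 + 3*4 = 17
```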




Input: Two unsigned, n-bit, base-2 integers x and y
Output: An unsigned, 2n-bit, base-2 integer r = y · x

1 r ← 0
2 for i = n − 1 downto 0 step −1 do
3   r ← 2 · r
4   if y_i = 1 then
5     r ← r + x
6   end
7 end
8 return r

Algorithm 9: An algorithm for multiplication of base-2 integers using an iterative, left-to-right, bit-serial strategy.

Input: Two unsigned, n-bit, base-2 integers x and y
Output: An unsigned, 2n-bit, base-2 integer r = y · x

1 r ← 0
2 for i = 0 upto n − 1 step +1 do
3   if y_i = 1 then
4     r ← r + x
5   end
6   x ← 2 · x
7 end
8 return r

Algorithm 10: An algorithm for multiplication of base-2 integers using an iterative, right-to-left, bit-serial strategy.

This fact provides a starting point for an iterative multiplier design. Consider the similarity between a polynomial

\[ a(x) = \sum_{i=0}^{n-1} a_i \cdot x^i \]

and an integer y represented using a positional number system

\[ y = \sum_{i=0}^{n-1} y_i \cdot b^i. \]

Put simply, there is no difference wrt. the form: only the names of variables are changed, plus b represents an implicit parameter in the latter whereas x is an explicit indeterminate in the former. As a result, we can consider a similar way of evaluating

\[ y \cdot x = \sum_{i=0}^{n-1} y_i \cdot x \cdot b^i \]

which again has the same form.

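Example 4.28. Consider a base-2 multiplication of x by y = 14(10) ↦ 1110(2). As previously stated we would write

\[
\begin{aligned}
y \cdot x &= \sum_{i=0}^{n-1} y_i \cdot x \cdot b^i \\
          &= y_0 \cdot x \cdot 2^0 + y_1 \cdot x \cdot 2^1 + y_2 \cdot x \cdot 2^2 + y_3 \cdot x \cdot 2^3
\end{aligned}
\]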

Figure 4.11: An iterative, bit-serial design for (n × n)-bit multiplication described using a circuit diagram.

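but now this can be rewritten using Horner’s Rule as

\[
\begin{aligned}
y \cdot x &= y_0 \cdot x + 2 \cdot ( y_1 \cdot x + 2 \cdot ( y_2 \cdot x + 2 \cdot ( y_3 \cdot x + 2 \cdot ( 0 ) ) ) ) \\
&= 0 \cdot x + 2 \cdot ( 1 \cdot x + 2 \cdot ( 1 \cdot x + 2 \cdot ( 1 \cdot x + 2 \cdot ( 0 ) ) ) ) \\
&= 0 \cdot x + 2 \cdot ( 1 \cdot x + 2 \cdot ( 1 \cdot x + 2 \cdot ( 1 \cdot x + 0 ) ) ) \\
&= 0 \cdot x + 2 \cdot ( 1 \cdot x + 2 \cdot ( 1 \cdot x + 2 \cdot ( 1 \cdot x ) ) ) \\
&= 0 \cdot x + 2 \cdot ( 1 \cdot x + 2 \cdot ( 1 \cdot x + 2 \cdot x ) ) \\
&= 0 \cdot x + 2 \cdot ( 1 \cdot x + 2 \cdot ( 3 \cdot x ) ) \\
&= 0 \cdot x + 2 \cdot ( 1 \cdot x + 6 \cdot x ) \\
&= 0 \cdot x + 2 \cdot ( 7 \cdot x ) \\
&= 0 \cdot x + 14 \cdot x \\
&= 14 \cdot x
\end{aligned}
\]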

There are two sane approaches to evaluate the bracketed expression: we either

1. work inside-out, starting with the inner-most sub-expression and processing y from most- to least-significant bit (i.e., from yn−1 to y0), meaning they are read left-to-right, or

2. work outside-in, starting with the outer-most sub-expression and processing y from least- to most-significant bit (i.e., from y0 to yn−1), meaning they are read right-to-left.

Either way, note that each successive multiplication by 2 eventually accumulates to produce each 2^i. Using y_3 · x as an example, we see it multiplied by 2 a total of three times: this means we end up with

\[ 2 \cdot (2 \cdot (2 \cdot (y_3 \cdot x))) = y_3 \cdot x \cdot 2^3, \]

and hence the original term required. Putting everything together, to compute the result we maintain an accumulator r that holds the current (or partial) result during evaluation; using r, the computation could be described as

• start with the inner sub-expression, initially setting r = 0, then

• to realise each step of evaluation, apply a simple 2-part rule: first double the accumulated result (i.e., set r to 2 · r), then add y_i · x to the accumulated result (i.e., set r to r + y_i · x).

Slightly more formally this implies iterative application of the rule

r← yi · x + 2 · r

which is further formalised by Algorithm 9: notice that line #1 realises the first point above while lines #3 to #6 realise the second point, with a loop spanning lines #2 to #7 iterating over them to realise each step. Although we will continue to focus on this approach, it is interesting to note, as an aside, that Algorithm 10 will yield the same result.

Example 4.29. Consider the following trace of Algorithm 9, for y = 14(10) ↦ 1110(2):

i  r      y_i  r′
               0
3  0      1    x       r′ ← 2 · r + x
2  x      1    3 · x   r′ ← 2 · r + x
1  3 · x  1    7 · x   r′ ← 2 · r + x
0  7 · x  0    14 · x  r′ ← 2 · r
               14 · x

Algorithm 9 is termed the left-to-right variant, since it processes y from the most- down to the least-significant bit (i.e., starting with y_{n−1}, on the left-hand end of y when written as a literal).




Input: Two unsigned, n-bit, base-2 integers x and y, an integer digit size d
Output: An unsigned, 2n-bit, base-2 integer r = y · x

1 r ← 0
2 for i = n − 1 downto 0 step −d do
3   r ← 2^d · r
4   if y_{i...i−d+1} ≠ 0 then
5     r ← r + y_{i...i−d+1} · x
6   end
7 end
8 return r

Algorithm 11: An algorithm for multiplication of base-2 integers using an iterative, left-to-right, digit-serial strategy.


i  r      x      y_i  r′      x′
                      0       x
0  0      x      0    0       2 · x   x′ ← 2 · x
1  0      2 · x  1    2 · x   4 · x   r′ ← r + x, x′ ← 2 · x
2  2 · x  4 · x  1    6 · x   8 · x   r′ ← r + x, x′ ← 2 · x
3  6 · x  8 · x  1    14 · x  16 · x  r′ ← r + x, x′ ← 2 · x
                      14 · x

Algorithm 10 is termed the right-to-left variant, since it processes y from the least- up to the most-significant bit (i.e., starting with y_0, on the right-hand end of y when written as a literal).
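The two bit-serial variants can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, x and y are ordinary Python integers with y assumed to fit in n bits, and the shifts `<< 1` stand in for the doubling steps 2 · r and 2 · x.

```python
def mul_left_to_right(x, y, n):
    """Left-to-right bit-serial multiplication: process y from bit n-1 down to bit 0."""
    r = 0
    for i in range(n - 1, -1, -1):
        r = r << 1                 # r <- 2 * r
        if (y >> i) & 1:           # y_i = 1 ?
            r = r + x
    return r


def mul_right_to_left(x, y, n):
    """Right-to-left bit-serial multiplication: process y from bit 0 up to bit n-1."""
    r = 0
    for i in range(n):
        if (y >> i) & 1:           # y_i = 1 ?
            r = r + x
        x = x << 1                 # x <- 2 * x
    return r


print(mul_left_to_right(6, 14, 4))   # 84
print(mul_right_to_left(6, 14, 4))   # 84
```

Note how the first function only ever updates r, whereas the second updates both r and its local copy of x.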

Whereas the left-to-right variant only updates r, the right-to-left alternative updates r and x; this may be deemed an advantage for the former, since we only need one register (at least one that is updated in any way, vs. simply a fixed input) rather than two. Beyond this, however, how does either strategy compare to the approach based on repeated addition which took O(2^n) operations in the worst case? In both algorithms, the number of operations performed is dictated by the number of loop iterations: using Algorithm 9 as an example, in each iteration we a) always perform a shift to compute r ← 2 · r, then b) conditionally perform an addition to compute r ← r + x (which will be required in half the iterations on average assuming a random y). In other words we perform O(n) operations, which is now dictated by the size of y (say n = 8 or n = 32) rather than the magnitude of y (say 2^n = 2^8 = 256 or 2^n = 2^32 = 4294967296) as it was before.

Whether we use Algorithm 9 or Algorithm 10, the general strategy is termed bit-serial multiplication because we use the 1-bit value y_i in each iteration; the remaining challenge is to translate this strategy into a concrete design we can implement. We did something similar by translating Algorithm 3 into an iterative design for left-shift in Figure 4.8, so can adopt the same idea here: Figure 4.11 outlines a (partial) design that implements the loop body (in lines #3 to #6) of Algorithm 9. Notice that, as before,

• the left-hand side shows a register to store r (i.e., the current value of r at the start of the loop body),

• the right-hand side shows a register to store r′ (i.e., the next value of r at the end of the loop body), and

• the middle shows some combinatorial logic that computes r′ from r: this is more complex than the left-shift case, but the idea is that a) the 1-bit left-shift component computes r ≪ 1 = 2 · r, then b) the multiplexer component selects between 2 · r and 2 · r + x (the latter of which is computed by an adder) depending on y_i.

To control this data-path, we again need an FSM: in each i-th step it will take r′ (representing y_i · x + 2 · r, per the above) and latch it back into r ready for the (i + 1)-th step. A similar trade-off is again evident, in the sense that although we only need an adder and multiplexer (plus a register for r and the FSM), the result will be computed after n steps.

4.5.3 Iterative, digit-serial designs

4.5.3.1 Improvements via standard digit-serial multiplication: using an unsigned digit set

By definition, a bit-serial multiplier processes a 1-bit digit of y in each of n steps. However, this actually represents a special case of more general digit-serial multiplication: a digit size d is selected, and used to recode y by splitting it into d-bit digits then processed in each of n/d steps (noting that d = 1 is the special case referred to above). Provided d divides n, extracting each d-bit digit from y is easy: by writing y in binary, recoding it to form y′ means splitting the sequence of bits into d-element sub-sequences.





1 r ← 0
2 for i = 0 upto n − 1 step +d do
3   if y_{i+d−1...i} ≠ 0 then
4     r ← r + y_{i+d−1...i} · x
5   end
6   x ← 2^d · x
7 end
8 return r

Algorithm 12: An algorithm for multiplication of base-2 integers using an iterative, right-to-left, digit-serial strategy.

Example 4.31. Consider d = 2, and some y st. n = 4: this implies we process n/d = 4/2 = 2 digits in y, each of 2 bits. Based on what we covered originally, we already know, for example, that in base-2

\[
\begin{aligned}
r = y \cdot x &= \left( \sum_{i=0}^{n-1} y_i \cdot 2^i \right) \cdot x \\
&= \sum_{i=0}^{n-1} y_i \cdot x \cdot 2^i \\
&= y_0 \cdot x \cdot 2^0 + y_1 \cdot x \cdot 2^1 + y_2 \cdot x \cdot 2^2 + y_3 \cdot x \cdot 2^3 \\
&= y_0 \cdot x + 2 \cdot (y_1 \cdot x + 2 \cdot (y_2 \cdot x + 2 \cdot (y_3 \cdot x + 2 \cdot (0))))
\end{aligned}
\]

The only change is to combine y_0 and y_1 into a single digit whose value is y_0 + 2 · y_1; this is basically just treating the two 1-bit digits in y as one 2-bit digit. By doing so, we can rewrite the expression as follows:

\[
\begin{aligned}
r = y \cdot x &= (y_0 + 2 \cdot y_1) \cdot x \cdot 2^0 + (y_2 + 2 \cdot y_3) \cdot x \cdot 2^2 \\
&= y_{1 \ldots 0} \cdot x \cdot 2^0 + y_{3 \ldots 2} \cdot x \cdot 2^2 \\
&= y_{1 \ldots 0} \cdot x + 2^2 \cdot (y_{3 \ldots 2} \cdot x + 2^2 \cdot (0))
\end{aligned}
\]

The term y_{1...0} should be read as “the bits of y from 1 down to 0 inclusive”, so clearly y_{1...0} ∈ {0, 1, 2, 3}. As such, consider a base-2 multiplication of x by y = 14(10) ↦ 1110(2):

\[
\begin{aligned}
r = y \cdot x &= y_{1 \ldots 0} \cdot x + 2^2 \cdot (y_{3 \ldots 2} \cdot x + 2^2 \cdot (0)) \\
&= 10_{(2)} \cdot x + 2^2 \cdot (11_{(2)} \cdot x + 2^2 \cdot (0)) \\
&= 2 \cdot x + 2^2 \cdot (3 \cdot x + 2^2 \cdot (0)) \\
&= 2 \cdot x + 12 \cdot x \\
&= 14 \cdot x
\end{aligned}
\]

To implement this new strategy, however, Algorithm 9 needs to be generalised for any d. Recall that for the special case of d = 1, we already saw and used a rule

r ← y_i · x + 2 · r.

Looking at the example above, a similar form

r ← y_{i...i−d+1} · x + 2^d · r

can be identified, which differs slightly in both left- and right-hand terms of the addition.

• The right-hand term is simple to accommodate. Rather than multiply r by 2 as before, we now multiply it by 2^d; we already know this can be realised by left-shifting r by a distance of d, i.e., computing 2^d · r ≡ r ≪ d.

• The left-hand term is more tricky. For d = 1, we needed to compute y_i · x but argued doing so was essentially a choice: because y_i ∈ {0, 1}, the result is either 0 or x. Now, each d-bit digit

  y_{i...i−d+1} ∈ {0, 1, ..., 2^d − 1}

  could be any one of 2^d values rather than 2^1 = 2, so either a) the choice is more involved, i.e., includes more cases, or b) we abandon the idea of it being a choice at all, instead using a combinatorial (d × n)-bit multiplier to compute y_{i...i−d+1} · x directly (related designs are covered in Section 4.5.4); you can view this component as replacing the multiplexer shown in Figure 4.11, which, by analogy, realised the (1 × n)-bit multiplication y_i · x.




Making these changes yields Algorithm 11. Note that line #4 could be implemented via either option above, and that, as outlined above, extracting the digit y_{i...i−d+1} from y is simple enough that we view it as happening during multiplication, ignoring the need to formally recode y into y′ beforehand. Either way, a clear advantage is already evident: we now require n/d steps to compute the result.


i  r      y_{i...i−d+1}   r′
                          0
3  0      11(2) = 3(10)   3 · x   r′ ← 2^2 · r + 3 · x
1  3 · x  10(2) = 2(10)   14 · x  r′ ← 2^2 · r + 2 · x
                          14 · x

Assuming use of a combinatorial (d × n)-bit multiplier, one way to think about a digit-serial multiplier is as a hybrid combination of iterative and combinatorial designs: it is iterative, in that it performs n/d steps, but now each i-th such step utilises a (d × n)-bit combinatorial multiplier component. Given we can select d, the hybrid can be configured to make a trade-off between time and space: larger d implies fewer steps of computation but also a larger combinatorial multiplier, and vice versa.
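A left-to-right, digit-serial multiplier in the style of Algorithm 11 can be sketched in Python as below. The sketch makes some assumptions that differ cosmetically from the pseudocode: the function name is hypothetical, the loop variable i indexes the low bit of each digit (rather than the high bit), d is assumed to divide n, and Python's `*` stands in for the (d × n)-bit combinatorial multiplier.

```python
def mul_digit_serial(x, y, n, d):
    """Multiply x by an n-bit y, processing one d-bit digit of y per step."""
    assert n % d == 0, "this sketch assumes d divides n"
    mask = (1 << d) - 1
    r = 0
    for i in range(n - d, -1, -d):    # i is the index of the low bit of the digit
        digit = (y >> i) & mask       # bits i+d-1 down to i of y
        r = r << d                    # r <- 2^d * r
        if digit != 0:
            r = r + digit * x         # stand-in for the (d x n)-bit multiplier
    return r


print(mul_digit_serial(6, 14, 4, 2))   # 84, after n/d = 2 steps
print(mul_digit_serial(6, 14, 4, 1))   # 84, after n/d = 4 steps (the bit-serial case)
```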

4.5.3.2 Improvements via Booth multiplication: using a signed digit set

A question: what is the most efficient way to compute r = 15 · x, i.e., r = y · x for the fixed multiplier y = 15? We already know that left-shifting x by a fixed distance requires no computation, so a reasonable first answer might be to compute

\[
\begin{aligned}
r = 15 \cdot x &= 8 \cdot x + 4 \cdot x + 2 \cdot x + 1 \cdot x \\
&= 2^3 \cdot x + 2^2 \cdot x + 2^1 \cdot x + 2^0 \cdot x \\
&= (x \ll 3) + (x \ll 2) + (x \ll 1) + (x \ll 0)
\end{aligned}
\]

A better strategy exists however: remember that computing a subtraction is (more or less) as easy as an addition, so we might instead opt for

\[
\begin{aligned}
r = 15 \cdot x &= 16 \cdot x - 1 \cdot x \\
&= 2^4 \cdot x - 2^0 \cdot x \\
&= (x \ll 4) - (x \ll 0)
\end{aligned}
\]

Intuitively, this latter strategy should seem preferable given we only sum two terms rather than four. Booth recoding [2] is a standard recoding-based strategy for multiplication which generalises this example. Although various versions of the approach are considered in what follows, the advantages they all offer stem from use of a signed representation of y and hence use of addition and subtraction operations.

Original, base-2 Booth recoding

Definition 4.5. Given a binary sequence y, a run of 1 (resp. 0) bits between i and j means y_k = 1 (resp. y_k = 0) for i ≤ k ≤ j; in simple terms, this means there is a sub-sequence of consecutive bits in y whose value is 1 (resp. 0).

Example 4.33. Consider y = 30(10) 7→ 00011110(2): we can clearly identify

• a run of one 0 bit between i = 0 and j = 0, i.e., yk = 0 for 0 ≤ k ≤ 0,

• a run of four 1 bits between i = 1 and j = 4, i.e., each yk = 1 for 1 ≤ k ≤ 4, and

• a run of three 0 bits between i = 5 and j = 7 i.e., each yk = 0 for 5 ≤ k ≤ 7.

As a starting point we consider base-2 Booth recoding; the idea is to identify a run of 1 bits in y between i and j, and then replace it with a single digit whose weight is 2^{j+1} − 2^i.

Example 4.34. Consider y = 30(10) ↦ 00011110(2): since there is a run of four 1 bits between i = 1 and j = 4, and the fact that

\[ 2^{j+1} - 2^i = 2^{4+1} - 2^1 = 2^5 - 2^1 = 30, \]

we can recode

y = 30(10)
  ↦ 2^4 + 2^3 + 2^2 + 2^1
  ↦ 〈0, +1, +1, +1, +1, 0, 0, 0〉(2)




into

y′ = 〈0, −1, 0, 0, 0, +1, 0, 0〉(2)
   ↦ −2^1 + 2^5
   ↦ 30(10)

which clearly still represents the same value (albeit now via a signed digit set, st. y′_i ∈ {0, ±1} vs. y_i ∈ {0, 1}). Using the same intuition as previously, the recoded y′ is preferable to y because it has a lower weight (i.e., number of non-zero digits). We can see the impact this feature has by illustrating how such a y′ might be used during multiplication. Given x = 6(10) ↦ 00000110(2), for example, we would normally compute r = y · x as

x                 =    6(10) ↦                 0 0 0 0 0 1 1 0
y                 =   30(10) ↦                 0 0 0 1 1 1 1 0 ×
p0 =  0 · x · 2^0 =    0(10) ↦ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
p1 = +1 · x · 2^1 =  +12(10) ↦ 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0
p2 = +1 · x · 2^2 =  +24(10) ↦ 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0
p3 = +1 · x · 2^3 =  +48(10) ↦ 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0
p4 = +1 · x · 2^4 =  +96(10) ↦ 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0
p5 =  0 · x · 2^5 =    0(10) ↦ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
p6 =  0 · x · 2^6 =    0(10) ↦ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
p7 =  0 · x · 2^7 =    0(10) ↦ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
r                 =  180(10) ↦ 0 0 0 0 0 0 0 0 1 0 1 1 0 1 0 0

and thus accumulate four non-zero partial products. However, by first recoding y into y′ we find

x                 =    6(10) ↦                 0 0 0 0 0 1 1 0
y                 =   30(10) ↦                 0 0 0 1 1 1 1 0 ×
y′(2)             =   30(10) ↦                 0 0 +1 0 0 0 −1 0
p0 =  0 · x · 2^0 =    0(10) ↦ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
p1 = −1 · x · 2^1 =  −12(10) ↦ 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0
p2 =  0 · x · 2^2 =    0(10) ↦ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
p3 =  0 · x · 2^3 =    0(10) ↦ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
p4 =  0 · x · 2^4 =    0(10) ↦ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
p5 = +1 · x · 2^5 = +192(10) ↦ 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
p6 =  0 · x · 2^6 =    0(10) ↦ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
p7 =  0 · x · 2^7 =    0(10) ↦ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
r                 =  180(10) ↦ 0 0 0 0 0 0 0 0 1 0 1 1 0 1 0 0

which requires accumulation of two non-zero partial products.
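The effect of replacing runs can be sketched in Python. The sketch assumes one convenient way to realise the recoding, namely computing each signed digit as y′_i = y_{i−1} − y_i with y_{−1} = 0, which places −1 at the start of each run of 1 bits and +1 just past its end; this formulation, the function names, and the extra digit at index n (needed when a run reaches the most-significant bit) are assumptions of the sketch rather than something stated so far in the text.

```python
def booth_recode_base2(y, n):
    """Signed base-2 digits (little-endian, n + 1 of them) representing an n-bit y."""
    digits = []
    prev = 0                              # y_{-1} = 0
    for i in range(n + 1):
        cur = (y >> i) & 1 if i < n else 0    # pad with y_n = 0
        digits.append(prev - cur)         # -1 starts a run, +1 ends it
        prev = cur
    return digits


def mul_signed_digits(x, digits):
    """Accumulate +/- (x << i) for each non-zero signed base-2 digit."""
    r = 0
    for i, d in enumerate(digits):
        if d == +1:
            r = r + (x << i)
        elif d == -1:
            r = r - (x << i)
    return r


y_prime = booth_recode_base2(30, 8)
print(y_prime)                          # [0, -1, 0, 0, 0, 1, 0, 0, 0]
print(mul_signed_digits(6, y_prime))    # 180, using two non-zero partial products
```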

Modified, base-4 Booth recoding

A base-2 Booth recoding already seems to produce what we want. However, there is a subtle problem: using y′ does not always yield an improvement over y itself. This can be demonstrated by example:

Example 4.35. Consider x = 6(10) ↦ 00000110(2) and y = 5(10) ↦ 00000101(2), which, based on recoding y, we would compute r = y · x as

x                 =    6(10) ↦                 0 0 0 0 0 1 1 0
y                 =    5(10) ↦                 0 0 0 0 0 1 0 1 ×
y′(2)             =    5(10) ↦                 0 0 0 0 +1 −1 +1 −1
p0 = −1 · x · 2^0 =   −6(10) ↦ 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0
p1 = +1 · x · 2^1 =  +12(10) ↦ 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0
p2 = −1 · x · 2^2 =  −24(10) ↦ 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 0
p3 = +1 · x · 2^3 =  +48(10) ↦ 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0
p4 =  0 · x · 2^4 =    0(10) ↦ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
p5 =  0 · x · 2^5 =    0(10) ↦ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
p6 =  0 · x · 2^6 =    0(10) ↦ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
p7 =  0 · x · 2^7 =    0(10) ↦ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
r                 =   30(10) ↦ 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0

This requires accumulation of four non-zero partial products, as would be the case if using y as is.

Based on the original Booth recoding as a first step, to resolve this problem we employ a second recoding step based on the y′(2) we already have:




1. reading y′ right-to-left, group the recoded digits into pairs of the form (y′_i, y′_{i+1}), then

2. treat each pair as a single digit whose value is y′_i + 2 · y′_{i+1} per

   y′_i   y′_{i+1}   digit
    0      0         0
   +1      0         +1
   −1      0         −1
    0     +1         +2
   +1     +1         not possible
   −1     +1         +1
    0     −1         −2
   +1     −1         −1
   −1     −1         not possible

   meaning that y′_{i+1} has twice the weight of y′_i.

Given we originally had a signed base-2 recoding of y, we now have a signed base-4 recoding of the same y (termed the modified Booth recoding): each pair represents a digit in {0, ±1, ±2}. Note that the two invalid (or impossible) pairs exist because of the original Booth recoding: we cannot encounter them, because the first recoding step will have already eliminated the associated run.

Example 4.36. Consider x = 6(10) ↦ 00000110(2) and y = 5(10) ↦ 00000101(2); based on the modified recoding, we would compute r = y · x as

x                 =    6(10) ↦                 0 0 0 0 0 1 1 0
y                 =    5(10) ↦                 0 0 0 0 0 1 0 1 ×
y′(2)             =    5(10) ↦                 0 0 0 0 +1 −1 +1 −1
y′(4)             =    5(10) ↦                 0 0 +1 +1
p0 = +1 · x · 2^0 =   +6(10) ↦ 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0
p2 = +1 · x · 2^2 =  +24(10) ↦ 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0
p4 =  0 · x · 2^4 =    0(10) ↦ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
p6 =  0 · x · 2^6 =    0(10) ↦ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
r                 =   30(10) ↦ 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0

and thus accumulate two non-zero partial products rather than four.

An algorithm for Booth-based recoding Both the first and second recoding steps above are still presented ina somewhat informal manner, because the goal was to demonstrate the idea; to make use of them in practice,we obviously need an algorithm. Fortunately, such an algorithm is simple to construct: notice that in a base-2Booth recoding

• y′i depends on yi−1 and yi, while

• y′i+1 depends on yi and yi+1

and since these digits are paired to form the base-4 Booth recoding, each digit in that depends on yi−1, yi, andyi+1. Thanks to this observation, the recoding process is easier than it may appear: assuming suitable paddingof y (i.e., y j = 0 for j < 0 and j ≥ n), we can produce digits of y′ from a 2- or 3-bit sub-sequence (or window) ofbits in y via

    Unsigned base-2            Signed base-2       Signed base-4
    y_{i+1}  y_i  y_{i−1}      y′_{i+1}  y′_i      y′_{i/2}
       0      0      0            0       0           0
       0      0      1            0      +1          +1
       0      1      0           +1      −1          +1
       0      1      1           +1       0          +2
       1      0      0           −1       0          −2
       1      0      1           −1      +1          −1
       1      1      0            0      −1          −1
       1      1      1            0       0           0




Input: An unsigned, n-bit, base-2 integer y
Output: A base-2 Booth recoding y′ of y

1 y′ ← ∅
2 for i = 0 upto n step 2 do
3     if (i − 1) < 0 then t_0 ← 0 else t_0 ← y_{i−1}
4     if i ≥ n then t_1 ← 0 else t_1 ← y_i
5     if (i + 1) ≥ n then t_2 ← 0 else t_2 ← y_{i+1}
6     set y′_i and y′_{i+1} per the window t = t_2 t_1 t_0, i.e.,

          t           y′_i    y′_{i+1}
          000(2)       0        0
          001(2)      +1        0
          010(2)      −1       +1
          011(2)       0       +1
          100(2)       0       −1
          101(2)      +1       −1
          110(2)      −1        0
          111(2)       0        0

7 end
8 return y′

Algorithm 13: An algorithm for base-2 Booth recoding.

Input: An unsigned, n-bit, base-2 integer y
Output: A base-4 Booth recoding y′ of y

1 y′ ← ∅
2 for i = 0 upto n step 2 do
3     if (i − 1) < 0 then t_0 ← 0 else t_0 ← y_{i−1}
4     if i ≥ n then t_1 ← 0 else t_1 ← y_i
5     if (i + 1) ≥ n then t_2 ← 0 else t_2 ← y_{i+1}
6     set y′_{i/2} per the window t = t_2 t_1 t_0, i.e.,

          t           y′_{i/2}
          000(2)       0
          001(2)      +1
          010(2)      +1
          011(2)      +2
          100(2)      −2
          101(2)      −1
          110(2)      −1
          111(2)       0

7 end
8 return y′

Algorithm 14: An algorithm for base-4 Booth recoding.

Algorithm 13 and Algorithm 14 capture these rules in algorithms that produce base-2 and base-4 recodings of a given y respectively. Crucially, one can unroll the loop to produce a combinatorial circuit. For Algorithm 14 say, one would replicate a single recoding cell: each instance of the cell would accept three bits of y as input (namely y_{i+1}, y_i, and y_{i−1}) and produce a digit of the recoding as output. This implies that the recoding could be performed during rather than before the subsequent multiplication; the only significant overhead relates to increased area.
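To make the windowing concrete, the following is a minimal software sketch of Algorithm 14 in Python. The names `BOOTH4`, `bit` and `booth_recode_base4` are illustrative only (they do not appear in the text), and the recoding is returned as a list whose k-th entry is the digit of weight 4^k; this models the behaviour of the recoding, not the hardware cell itself.

```python
# Lookup mirroring line 6 of Algorithm 14: window t = (y_{i+1}, y_i, y_{i-1}).
BOOTH4 = {0b000: 0, 0b001: +1, 0b010: +1, 0b011: +2,
          0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}

def bit(y, j, n):
    """Bit y_j of an n-bit y, padded with 0 for j < 0 and j >= n."""
    return (y >> j) & 1 if 0 <= j < n else 0

def booth_recode_base4(y, n):
    """Return base-4 Booth digits of y, least-significant digit first."""
    digits = []
    for i in range(0, n + 1, 2):          # i = 0 upto n step 2
        t = (bit(y, i + 1, n) << 2) | (bit(y, i, n) << 1) | bit(y, i - 1, n)
        digits.append(BOOTH4[t])
    return digits

def value(digits):
    """Value represented by a list of base-4 digits."""
    return sum(d * 4 ** k for k, d in enumerate(digits))

print(booth_recode_base4(14, 4))          # digits for y = 14
print(value(booth_recode_base4(14, 4)))   # should reproduce y
```

Because every digit is drawn from {0, ±1, ±2}, checking that `value` reproduces y for all inputs is a quick sanity test of the table.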

An algorithm for Booth-based multiplication Finally, we can address the problem of using the recoded multiplier to actually perform the multiplication above: ideally this should be more efficient than the bit-serial starting point. Algorithm 15 captures the result, which one can think of as a form of digit-serial multiplier: each iteration of the loop processes a digit of a recoding formed from multiple bits in y.

1. In Algorithm 9, |y| = n dictates the number of loop iterations; Algorithm 11 improves this to n/d for appropriate choices of d. In comparison, Algorithm 15 requires fewer, i.e.,

   \[ |y'| \simeq \frac{|y|}{2} \simeq \frac{n}{2}, \]

   iterations. As with the digit-serial strategy, to allow this to work, we need to compute 2^2 · r in line #3 (rather than 2 · r), but this can be realised by left-shifting r by a distance of 2, i.e., computing 2^2 · r ≡ r ≪ 2.

2. In Algorithm 9 we had y_i ∈ {0, 1} and in Algorithm 11 we had y_{i…i−d+1} ∈ {0, 1, …, 2^d − 1}. In Algorithm 15, however, we have y′_i ∈ {0, ±1, ±2}. This basically means we have to test each non-zero y′_i against more cases than before: line #4 captures them in one rather than use a more lengthy set of conditions. In short, dealing with y′_i = −1 vs. y′_i = +1 is easy: we simply subtract x from r rather than adding x to r. In the same way, dealing with y′_i = −2 and y′_i = +2 means subtracting (resp. adding) 2 · x from (resp. to) r; since 2 · x can be computed via a shift of x (vs. an extra addition), there is no real overhead vs. subtracting (resp. adding) x itself.

Input: An unsigned, n-bit, base-2 integer x, and a base-4 Booth recoding y′ of some integer y
Output: An unsigned, 2n-bit, base-2 integer r = y · x

1 r ← 0
2 for i = |y′| − 1 downto 0 step −1 do
3     r ← 2^2 · r
4     r ← r − 2 · x if y′_i = −2
          r − 1 · x if y′_i = −1
          r + 1 · x if y′_i = +1
          r + 2 · x if y′_i = +2
5 end
6 return r

Algorithm 15: An algorithm for multiplication of base-2 integers using an iterative, left-to-right, digit-serial strategy with base-4 Booth recoding.

1 r ← 0
2 for i = 0 upto n − 1 step +d do
3     if y_{d−1…0} ≠ 0 then
4         r ← r + y_{d−1…0} · x
5     end
6     x ← x · 2^d
7     y ← y / 2^d
8     if y = 0 then
9         return r
10    end
11 end
12 return r

Algorithm 16: An algorithm for multiplication of base-2 integers using an iterative, right-to-left, digit-serial strategy with early termination.
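As a rough software model of Algorithm 15, the Python sketch below takes the recoded digits directly (least-significant first) so that it stands alone; the function name `booth_multiply` and the list representation are assumptions made for illustration. The shifts mirror the observation that 2^2 · r and 2 · x need no extra additions.

```python
def booth_multiply(x, digits):
    """Multiply x by the integer whose base-4 Booth digits are `digits`.

    digits[k] is the digit of weight 4**k, each in {0, +1, -1, +2, -2}.
    """
    r = 0
    for d in reversed(digits):     # i = |y'| - 1 downto 0
        r = r << 2                 # line 3: r <- 2^2 * r
        if d == +1:                # line 4: one case per non-zero digit
            r += x
        elif d == -1:
            r -= x
        elif d == +2:
            r += x << 1            # 2 * x via a shift
        elif d == -2:
            r -= x << 1
    return r

# Digits taken from the worked examples: y = 14 and y = 5.
print(booth_multiply(6, [-2, 0, +1]))    # 14 * 6
print(booth_multiply(6, [+1, +1, 0, 0])) # 5 * 6
```

Python integers are unbounded, so the sketch sidesteps the 2n-bit width of r; a hardware design would need to size r accordingly.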

Example 4.37. Consider y = 14(10) ↦ 1110(2): we first use Algorithm 14 as follows

    i   y_{i+1}  y_i  y_{i−1}   t_2  t_1  t_0   t          y′
                                                           ∅
    0      1      0      ⊥       1    0    0    100(2)     〈−2〉
    2      1      1      1       1    1    1    111(2)     〈−2, 0〉
    4      ⊥      ⊥      1       0    0    1    001(2)     〈−2, 0, +1〉
                                                           〈−2, 0, +1〉

to recode y into y′, then use Algorithm 15 as follows

    i   y′_i   r        r′
                        0
    2   +1     0        x         r′ ← 2^2 · r + 1 · x
    1    0     x        4 · x     r′ ← 2^2 · r
    0   −2     4 · x    14 · x    r′ ← 2^2 · r − 2 · x
                        14 · x

to produce the result expected in three rather than six steps.

4.5.3.3 Improvements via early termination: avoiding unnecessary iterations

Example 4.38. Consider digit-serial multiplication using d = 2 where y = 30(10) ↦ 00011110(2): as a first step we recode y into y′ = 〈10(2), 11(2), 01(2), 00(2)〉 by splitting the former sequence of bits into 2-bit sub-sequences.




We then process y′ either left-to-right or right-to-left, as reflected by traces of

Algorithm 11

    i   r        y_{i…i−d+1}        r′
                                    0
    7   0        00(2) = 0(10)      0          r′ ← 2^2 · r
    5   0        01(2) = 1(10)      1 · x      r′ ← 2^2 · r + 1 · x
    3   1 · x    11(2) = 3(10)      7 · x      r′ ← 2^2 · r + 3 · x
    1   7 · x    10(2) = 2(10)      30 · x     r′ ← 2^2 · r + 2 · x
                                    30 · x

Algorithm 12

    i   r         x          y_{i+d−1…i}        r′        x′
                                                0         x
    0   0         x          10(2) = 2(10)      2 · x     2^2 · x    r′ ← r + 2 · x, x′ ← 2^2 · x
    2   2 · x     2^2 · x    11(2) = 3(10)      14 · x    2^4 · x    r′ ← r + 3 · x, x′ ← 2^2 · x
    4   14 · x    2^4 · x    01(2) = 1(10)      30 · x    2^6 · x    r′ ← r + 1 · x, x′ ← 2^2 · x
    6   30 · x    2^6 · x    00(2) = 0(10)      30 · x    2^8 · x    r′ ← r + 0 · x, x′ ← 2^2 · x
                                                30 · x

respectively; both produce r = 30 · x as expected.
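The two traces can be reproduced with a short Python sketch. Algorithm 11 and Algorithm 12 are not restated in this part of the text, so the functions below are reconstructions from the traces, under the assumption that n is a multiple of d; the names `digit_serial_ltr` and `digit_serial_rtl` are illustrative.

```python
def digit_serial_ltr(x, y, n, d):
    """Left-to-right: most-significant d-bit digit of y first."""
    r = 0
    mask = (1 << d) - 1
    for i in range(n - 1, -1, -d):            # i = n-1 downto 0 step -d
        digit = (y >> (i - d + 1)) & mask     # y_{i ... i-d+1}
        r = (r << d) + digit * x              # r' <- 2^d * r + digit * x
    return r

def digit_serial_rtl(x, y, n, d):
    """Right-to-left: least-significant d-bit digit of y first."""
    r = 0
    mask = (1 << d) - 1
    for i in range(0, n, d):                  # i = 0 upto n-1 step +d
        digit = (y >> i) & mask               # y_{i+d-1 ... i}
        r = r + digit * x                     # r' <- r + digit * x
        x = x << d                            # x' <- 2^d * x
    return r

print(digit_serial_ltr(3, 30, 8, 2), digit_serial_rtl(3, 30, 8, 2))
```

Both directions perform n/d iterations; they differ in whether the accumulator r or the multiplicand x is shifted.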

Notice that the 2 MSBs of y are both 0, i.e., y_7 = y_6 = 0 st. y_{7…6} = 0 and hence y′_3 = 00(2). This fact can be harnessed to optimise both algorithms. Algorithm 11 processes y′ left-to-right so y′_3 is the first digit: the iteration where i = 7 extracts the digit

\[ y_{i \ldots i-d+1} = y_{7 \ldots 6} = y'_3 = 0 . \]

As such, we know that

\[ r \leftarrow r + y_{i \ldots i-d+1} \cdot x \]

leaves r unchanged: the condition y_{i…i−d+1} ≠ 0 allows us to skip said update for i = 7, and thus be more efficient. Algorithm 12 processes y′ right-to-left so y′_3 is the last digit: the iteration where i = 6 extracts the digit

\[ y_{i+d-1 \ldots i} = y_{7 \ldots 6} = y'_3 = 0 . \]

The same argument applies here, in the sense we can skip the associated update of r. In fact, we can be more aggressive by skipping multiple such updates. If in some i-th iteration the digits processed by all j-th iterations for j > i are zero, then we may as well stop: none of them will update r, meaning the algorithm can return it early as is (rather than perform extra iterations). This strategy is normally termed early termination; using Algorithm 12 as a starting point, it is realised by Algorithm 16.

Example 4.39. Consider the following trace of Algorithm 16 for y = 30(10) ↦ 00011110(2):

    i   r         x          y              y_{d−1…0}          r′        x′         y′
                                                               0         x          00011110(2)
    0   0         x          00011110(2)    10(2) = 2(10)      2 · x     2^2 · x    00000111(2)    r′ ← r + 2 · x, x′ ← x · 2^2, y′ ← y/2^2
    2   2 · x     2^2 · x    00000111(2)    11(2) = 3(10)      14 · x    2^4 · x    00000001(2)    r′ ← r + 3 · x, x′ ← x · 2^2, y′ ← y/2^2
    4   14 · x    2^4 · x    00000001(2)    01(2) = 1(10)      30 · x    2^6 · x    00000000(2)    r′ ← r + 1 · x, x′ ← x · 2^2, y′ ← y/2^2
                                                               30 · x

Once r, x, and y have been updated within the iteration for i = 4, we find y′ = 0: this triggers the conditional statement, meaning r is returned early after three (via line #9) vs. four (via line #12) iterations: the correct result r = 30 · x is produced as expected.
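The behaviour traced above can be checked with a small Python model of Algorithm 16. As a convenience that is not part of the algorithm, the sketch also returns the number of loop iterations executed, so the saving from early termination is visible; the function name is illustrative.

```python
def multiply_early_termination(x, y, n, d):
    """Right-to-left digit-serial multiply; returns (r, iterations used)."""
    r = 0
    mask = (1 << d) - 1
    iterations = 0
    for _ in range(0, n, d):       # i = 0 upto n-1 step +d
        iterations += 1
        digit = y & mask           # y_{d-1 ... 0}
        if digit != 0:             # skip the update for zero digits
            r += digit * x
        x <<= d                    # x <- x * 2^d
        y >>= d                    # y <- y / 2^d
        if y == 0:                 # nothing left that could change r
            return r, iterations
    return r, iterations

print(multiply_early_termination(7, 30, 8, 2))    # small y: stops early
print(multiply_early_termination(7, 192, 8, 2))   # MSBs set: all iterations
```

The second call illustrates the caveat discussed next: when the most-significant digit of y is non-zero, no iterations are saved.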

Although this should seem attractive, some trade-offs and caveats apply. First, the loop body, spanning lines #3 to #10 of Algorithm 16, is obviously more complex than the equivalent in Algorithm 12. Specifically, r, x, and y all need to be updated, and the FSM controlling iteration needs to test y and conditionally return r: this makes it more complex as well. Second, this added complexity, which typically means an increased area, only potentially (rather than definitively) reduces the latency of multiplication. Put simply, the number of iterations now depends on the value of y (i.e., whether y′ contains more-significant digits that are 0 st. the algorithm can skip them), which we cannot know a priori: if this property does not hold, the algorithm will be no better than standard digit-serial multiplication.




4.5.4 Combinatorial designs

4.5.4.1 A vanilla tree multiplier

In Section 4.5.2, we made use of Horner’s Rule as our starting point; the iterative nature by which the associated expression

\[ y \cdot x = y_0 \cdot x + 2 \cdot ( y_1 \cdot x + 2 \cdot ( \cdots y_{n-1} \cdot x + 2 \cdot ( 0 ) ) ) \]

was evaluated translated naturally into an iterative algorithm. For a combinatorial alternative, however, we adopt a different starting point and (re)consider

\[ y \cdot x = \sum_{i=0}^{i<n} y_i \cdot x \cdot b^i . \]

Developing a design directly from this expression is surprisingly easy: we just need to generate each term, which represents a partial product, then add them up. Figure 4.13 is a (combinatorial) tree multiplier whose design stems from this idea. It can be viewed, from top-to-bottom, as three layers:

1. The top layer is comprised of n groups of n AND gates: the i-th group computes x_j ∧ y_i for 0 ≤ j < n, meaning it outputs either 0 if y_i = 0 or x if y_i = 1. You can think of the AND gates as performing all n^2 possible (1 × 1)-bit multiplications of some x_j and y_i, or a less general form of multiplexer that selects between 0 and x based on y_i.

2. The middle layer is comprised of n left-shift components. The i-th component shifts by a fixed distance of i bits, meaning the output is either 0 if y_i = 0, or x · 2^i if y_i = 1. Put another way, the output of the i-th component in the middle layer is

   \[ y_i \cdot x \cdot 2^i \]

   i.e., some i-th partial product in the long-hand description of y · x.

3. The bottom layer is a balanced, binary tree of adder components: these accumulate the partial products resulting from the middle layer, meaning the output is

   \[ r = \sum_{i=0}^{n-1} y_i \cdot x \cdot 2^i = y \cdot x \]

   as required.

In Section 4.5.2, both iterative multiplier designs we produced made a trade-off: they required O(n) time and O(1) space, thus representing high(er) latency but low(er) area. Here we have more or less the exact opposite trade-off. The design is combinatorial, so takes O(1) time where the constant involved basically represents the critical path. However, it clearly takes a lot more space; it is difficult to state formally how much, but the fact the design includes a tree of several adders vs. one adder hints this could be significant.

Beyond this comparison, it is important to consider various subtleties that emerge if the block diagram is implemented as a concrete circuit. First, notice that the critical path looks like O(log2(n)) gate delays, because this describes the depth of the (balanced) tree as used to form the bottom layer. However, because each node in said tree is itself an adder, the actual critical path is more like O(n log2(n)). Even this turns out to be optimistic: notice, second, that those adders lower-down in the tree (i.e., closer to the root) must be larger (and hence more complex) than those higher-up. This is simply because the intermediate results get larger; the first level adds two n-bit partial products to produce a (n + 1)-bit intermediate result, whereas the last level adds two (2n − 1)-bit intermediate values to produce the 2n-bit result.
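The three layers can be modelled in software as follows. This Python sketch is only a behavioural model, in which each `+` stands in for one adder component; the function name `tree_multiply` and the returned tree depth are illustrative additions, not part of the design in Figure 4.13.

```python
def tree_multiply(x, y, n):
    """Model of the tree multiplier; returns (product, adder-tree depth)."""
    # Top and middle layers: AND x with y_i, then left-shift by i.
    terms = [(x if (y >> i) & 1 else 0) << i for i in range(n)]
    depth = 0
    # Bottom layer: balanced binary tree of adders, one level per pass.
    while len(terms) > 1:
        nxt = [terms[k] + terms[k + 1] for k in range(0, len(terms) - 1, 2)]
        if len(terms) % 2:         # an unpaired term passes through
            nxt.append(terms[-1])
        terms = nxt
        depth += 1
    return terms[0], depth

print(tree_multiply(6, 5, 8))      # product and number of adder levels
```

For n = 8 the loop runs three times, matching the log2(n) depth of the adder tree discussed above; the cost of each individual adder is not modelled.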

4.5.4.2 Wallace and Dadda tree multipliers

An obvious next question is whether and how we can improve the vanilla tree multiplier design. There are various possibilities, but one would be to focus on reducing the critical path and latency. One idea is to reimplement the tree using carry-save rather than ripple-carry adders; this is a natural replacement given the former was specifically introduced for use in contexts where we accumulate multiple inputs (or partial products). Another idea is to examine Figure 4.13 in detail, and identify features that can be optimised at a low(er)-level. Candidates might include where we can use half- vs. full-adder cells, which are, of course, less complex. The Wallace multiplier [14], and Dadda multiplier [5] designs employ a combination of both approaches, with the goal of reducing the critical path: they still represent combinatorial designs, but aim to have a smaller critical path and hence lower latency.

As with the vanilla tree multiplier, Wallace and Dadda multipliers are comprised of a number of layers. More specifically, you should think of both as comprising




1. In the initial layer, multiply (i.e., AND) together each x_j with each y_i to produce a total of n^2 intermediate wires. Recall each wire (essentially the result of a 1-bit digit-multiplication) has a weight stemming from the digits in x and y, e.g., x_0 · y_0 has weight 0, x_1 · y_2 has weight 3 and so on.

2. Reduce the number of intermediate wires using layers composed of full- and half-adders:

   • Combine any three wires with the same weight using a full-adder; the result in the next layer is one wire of the same weight (i.e., the sum) and one wire of a higher weight (i.e., the carry).

   • Combine any two wires with the same weight using a half-adder; the result in the next layer is one wire of the same weight (i.e., the sum) and one wire of a higher weight (i.e., the carry).

   • If there is only one wire with a given weight, just pass it through to the next layer.

3. In the final layer, after enough reduction layers, there will be at most two wires of any given weight: merge the wires to form two 2n-bit values (padding as required), then add them together with an adder component.

Algorithm 17: An algorithm to generate a Wallace tree multiplier design.

1. In the initial layer, multiply (i.e., AND) together each x_j with each y_i to produce a total of n^2 intermediate wires. Recall each wire (essentially the result of a 1-bit digit-multiplication) has a weight stemming from the digits in x and y, e.g., x_0 · y_0 has weight 0, x_1 · y_2 has weight 3 and so on.

2. Reduce the number of intermediate wires using layers composed of full- and half-adders:

   • Combine any three wires with the same weight using a full-adder; the result in the next layer is one wire of the same weight (i.e., the sum) and one wire of a higher weight (i.e., the carry).

   • If there are two wires with the same weight left, let w be that weight then:

     – If w ≡ 2 mod 3 then combine the wires using a half-adder; the result in the next layer is one wire of the same weight (i.e., the sum) and one wire of a higher weight (i.e., the carry).

     – Otherwise, just pass them through to the next layer.

   • If there is only one wire with a given weight, just pass it through to the next layer.

3. In the final layer, after enough reduction layers, there will be at most two wires of any given weight: merge the wires to form two 2n-bit values (padding as required), then add them together with an adder component.

Algorithm 18: An algorithm to generate a Dadda tree multiplier design.




             Layer 1                               Layer 2
    Weight   Input wires  Operation  Output wires  Input wires  Operation  Output wires
      0           1          PT           1             1          PT           1
      1           2          HA           1             1          PT           1
      2           3          FA           2             2          HA           1
      3           4          FA           3             3          FA           2
      4           3          FA           2             2          HA           2
      5           2          HA           2             2          HA           2
      6           1          PT           2             2          HA           2
      7           0          -            0             0          -            1

(a) Using a Wallace-based multiplier.

             Layer 1                               Layer 2
    Weight   Input wires  Operation  Output wires  Input wires  Operation  Output wires
      0           1          PT           1             1          PT           1
      1           2          PT           2             2          PT           2
      2           3          FA           1             1          PT           1
      3           4          FA           3             3          FA           1
      4           3          FA           2             2          HA           2
      5           2          PT           3             3          FA           2
      6           1          PT           1             1          PT           2
      7           0          -            0             0          -            0

(b) Using a Dadda-based multiplier.

Figure 4.12: A tabular description of stages in example (4 × 4)-bit Wallace and Dadda tree multiplier designs.

1. an initial layer,

2. O(log n) layers of reduction, and

3. a final layer

where the difference is, basically, how those layers are designed. The initial layer generates the partial products, then the second and third layers accumulate them; this is somewhat similar to the tree multiplier. However, rather than perform the latter using a tree of general-purpose adders, a carefully designed, special-purpose tree is employed. Producing a design for Wallace and Dadda multipliers follows a different process than we have used before. Rather than develop an algorithm then translate it into a design, the multipliers are generated directly by an algorithm. Given a value of n as input, Algorithm 17 and Algorithm 18 generate Wallace and Dadda multipliers respectively; both are described in three steps that mirror the layers above.

Example 4.40. Consider n = 4, where we want to produce a Wallace multiplier design that computes the product r = y · x for 4-bit x and y; to do so, we use Algorithm 17.

An initial layer multiplies x_j with y_i for 0 ≤ i, j < 4, st. we produce

• one weight-0 wire, i.e., x0 · y0,

• two weight-1 wires, i.e., x0 · y1 and x1 · y0,

• three weight-2 wires, i.e., x0 · y2, x2 · y0, and x1 · y1,

• four weight-3 wires, i.e., x0 · y3, x3 · y0, x1 · y2, and x2 · y1,

• three weight-4 wires, i.e., x1 · y3, x3 · y1, and x2 · y2,

• two weight-5 wires, i.e., x2 · y3, and x3 · y2,

• one weight-6 wire, i.e., x3 · y3, and, finally,

• zero weight-7 wires.

Figure 4.12a details the subsequent two reduction layers. For example, in the first reduction layer




Figure 4.13: An (n × n)-bit tree multiplier design, described using a circuit diagram.

Figure 4.14: An example, (4 × 4)-bit Wallace-based tree multiplier design, described using a circuit diagram.




• there is one input wire with weight-0, so we use a pass-through operation (denoted PT) which results in one weight-0 wire as output,

• there are two input wires with weight-1, so we use a half-adder operation (denoted HA) which results in one weight-1 wire (the sum) and one weight-2 wire (the carry) as output, and

• there are three input wires with weight-2, so we use a full-adder operation (denoted FA) which results in one weight-2 wire (the sum) and one weight-3 wire (the carry) as output.

The resulting design, including the final layer, is illustrated by Figure 4.14.

Notice that n is the only input to the algorithm(s), so although the example is specific to n = 4 the general structure will remain similar. In fact, the example highlights some important general points:

• we have 1 initial layer, log2(n) = log2(4) = 2 reduction layers, and 1 final layer,

• the reduction layers yield at most two wires with a given weight; we form then sum two 2n-bit values (e.g., using a ripple-carry adder) to produce the result, and, crucially,

• there are no intra-layer carries in the reduction layer(s): the only carry chains that appear are inter-layer, during reduction, or in the final layer.

Phrased as such, it should be clear why the concept of carry-save addition is relevant: the reduction and final layers employ essentially the same concept, by compressing many inputs into few(er) outputs until the point they can be summed to produce the result. If you look again at Algorithm 17 and Algorithm 18, the difference between the two is within the second step: in the Dadda design, the number of wires of a given weight remains, by-design, close to a multiple of three, which facilitates use of 3 : 2 compressors as a means of reduction. As hinted at, a crucial feature of both designs is that each adder cell within the reduction layer(s) operates in parallel so has an O(1) critical path; this suggests the overall critical path will be O(1 + log2(n) + n) = O(n) gate delays in both cases. Comparing Figure 4.12a with Figure 4.12b, we see the main difference is wrt. space not time. More specifically, for n = 4 the Wallace multiplier uses 6 half-adders and 4 full-adders, and the Dadda multiplier would use 1 half-adder and 5 full-adders; this is a trend that holds for larger n.
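The Wallace column counts in Figure 4.12a can be reproduced by simulating the reduction step of Algorithm 17 on wire counts alone. The Python sketch below tracks only how many wires exist per weight (not their values); the function name `wallace_reduce` and its return format are assumptions made for illustration, and the Dadda variant is deliberately not modelled.

```python
def wallace_reduce(n):
    """Return (layers, half_adders, full_adders, final wires per weight)."""
    # Initial layer: wire x_j AND y_i has weight i + j.
    cols = [0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            cols[i + j] += 1
    layers = half_adders = full_adders = 0
    while max(cols) > 2:                   # reduce until <= 2 wires per weight
        nxt = [0] * (len(cols) + 1)        # room for a carry out of the top
        for w, k in enumerate(cols):
            full, rest = divmod(k, 3)      # groups of three use full-adders
            full_adders += full
            nxt[w] += full                 # sums keep the same weight
            nxt[w + 1] += full             # carries move up one weight
            if rest == 2:                  # two wires left: half-adder
                half_adders += 1
                nxt[w] += 1
                nxt[w + 1] += 1
            elif rest == 1:                # one wire left: pass-through
                nxt[w] += 1
        while nxt[-1] == 0:                # drop unused top weights
            nxt.pop()
        cols = nxt
        layers += 1
    return layers, half_adders, full_adders, cols

print(wallace_reduce(4))
```

For n = 4 the simulation uses two reduction layers, 6 half-adders and 4 full-adders, and ends with the Layer 2 output column of Figure 4.12a.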

4.5.5 Some multiplier case-studies

One of the challenges outlined at the start of this Section related to the large design space wrt. multiplication; although we have only covered a sub-set of that design space, hopefully the challenge is already clear! As a result of the possible trade-offs, it is hard to identify a single “correct” design. This means different micro-processors, for example, legitimately opt for different designs so as to match their design constraints. In this Section, we attempt to survey such choices in a set of real micro-processors: the survey is by no means exhaustive, but will, none the less, offer better understanding of the associated constraints and designs used to address them.

Example 4.41. The ARM Cortex-M0 [4] processor supports multiplication via the muls instruction: it yields a truncated result, st. r = x · y is the least-significant 32 bits of the actual product given 32-bit x and y. The instruction can be supported in two ways by a given implementation: the design can be combinatorial, requiring 1 cycle, or iterative, requiring 32 cycles. The Cortex-M0 is typically deployed in an embedded context, where area and power consumption are paramount. The latter, iterative multiplier design may therefore be an attractive choice: assuming increased latency (or time) can be tolerated, it satisfies the goal of minimising area (or space) associated with this component of the associated ALU.

Example 4.42. The ARM7TDMI [1] processor houses a (8 × 32)-bit combinatorial multiplier; it supports digit-serial multiplication (for the fixed case where d = 8) with early termination, as invoked by a range of instructions including umull. One must assume ARM selected this design based on careful analysis. For instance, it seems fair to claim that

• using a digit-serial multiplier makes a good trade-off between time and space (due to the hybrid, combinatorial and iterative nature), which is particularly important for embedded processors, plus

• although early termination adds some overhead, it often produces a reduction in latency because of the types of y used: quite a significant proportion will relate to address arithmetic, where y is (relatively) small (e.g., as used to compute offsets from a fixed base address).

Example 4.43. Thomas and Balatoni [13] describe a (12 × 12)-bit multiplier design, intended for use in the PDP-8 computer: the design is based on an iterative strategy that makes use of Booth recoding.




Example 4.44. The MIPS R4000 [6] processor takes a somewhat similar, somewhat dissimilar approach to that described here: it houses a Booth-based multiplier using exactly the recoding strategy described, but within a (64 × 64)-bit combinatorial rather than an iterative design.

Mirapuri et al. [9, Page 13] detail said design, which splits the multiplication into Booth recoding, multiplicand selection, partial product generation and product accumulation steps. A series of carry-save adders accumulate the partial products, which produces a result r that is stored into two 64-bit registers called hi and lo (meaning more- and less-significant 64-bit halves).

Example 4.45. An iterative, bit-serial multiplier requires n steps to compute the product; with no further optimisation, this constraint is inherent in the design. Although the data-path required is minimal, the need for iterative use of that data-path demands a control-path (i.e., an FSM) of some sort. When placed in a micro-processor, a resulting question is why we bother having a dedicated multiplier at all: why not just have an instruction that performs one step of multiplication, and let the program make iterative use of it?

The MIPS-X processor [3] provides a concrete example of this approach: using a slightly rephrased notation to match what has been described here, [3, Section 4.4.4] basically defines

    mstep GPR[x], GPR[y], GPR[r] ↦
        if GPR[y]_31 = 1 then
            GPR[r] ← GPR[r] + GPR[x]
            GPR[y] ← GPR[y] ≪ 1
        else
            GPR[r] ← GPR[r]
            GPR[y] ← GPR[y] ≪ 1
        end

i.e., a multiply-step instruction essentially matching lines #3 to #6 in Algorithm 9. As such, the idea is to implement a loop that iterates over mstep as described in [3, Appendix IV]. The reason y is left-shifted is so that one can test GPR[y]_31 rather than GPR[y]_i; the former is updated by the shift, st. in each i-th iteration it does contain GPR[y]_i as required (given iteration is left-to-right, so starts with i = 31 and ends with i = 0).
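A loop over such a step can be sketched in Python as below. Two assumptions are made and should be read as such: first, r is doubled at the start of each step, as a left-to-right scheme requires (the definition quoted above does not show this doubling); second, registers are modelled as 32-bit values, so the result is truncated to 32 bits. The names `mstep` and `multiply` are illustrative and this is not a statement of the exact MIPS-X semantics.

```python
MASK32 = 0xFFFFFFFF

def mstep(r, x, y):
    """One multiply step on 32-bit values; returns the updated (r, y)."""
    r = (r << 1) & MASK32          # assumption: double r before the update
    if (y >> 31) & 1:              # test the current most-significant bit of y
        r = (r + x) & MASK32
    y = (y << 1) & MASK32          # expose the next bit of y at position 31
    return r, y

def multiply(x, y):
    """Software loop making iterative use of mstep, 32 times."""
    r = 0
    for _ in range(32):
        r, y = mstep(r, x, y)
    return r

print(multiply(6, 5))
```

The loop in `multiply` plays the role of the FSM in a dedicated multiplier, which is the trade-off the following points examine.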

There are advantages and disadvantages of either approach, i.e., use of a dedicated multiplier vs. an mstep instruction, with some examples including:

• The mstep instruction removes the need for an FSM to control the dedicated multiplier, essentially harnessing the existing processor as a control-path. As such, the overhead to support multiplication within the processor is further reduced.

• On one hand, the 1-step nature of mstep suggests single-cycle execution; in contrast, the n-step nature of the dedicated multiplier suggests multi-cycle execution. However, this is phrased in terms of processor cycles: it could be reasonable for a dedicated multiplier and processor to make use of different clock frequencies. Iff. the former is higher than the latter, n multiplier cycles can take less time than n processor cycles, and hence less time than execution of n mstep instructions.

• By including the mstep instruction, the MIPS-X ISA exposes details of the implementation and so fixes how a given program should compute r = y · x. If, in contrast, it had a mul instruction with obvious semantics, then any given implementation of the processor could opt for an iterative or combinatorial multiplier while maintaining compatibility.

• At least for simple processors, one instruction is executed at a time: for n-bit x and y, this means the processor will be kept busy for n cycles while executing n mstep instructions. With a dedicated multiplier, however, one could at least imagine the processor doing something else in the n cycles while the multiplier is kept busy.

4.6 Components for comparison

As with the 1-bit building blocks for addition (namely the half- and full-adder), we already covered designs for 1-bit equality and less than comparators in Chapter 2; these require a handful of logic gates to implement. Again as with addition, the challenge is essentially how we extend these somehow. The idea of this Section is to tackle this step-by-step: first we consider designs for comparison of unsigned yet larger, n-bit x and y, then we extend these designs to cope with signed x and y, and finally consider how to support a suite of comparison operations beyond just equality and less than.




An aside: comparison using arithmetic.

It is tempting to avoid designing dedicated circuits for general-purpose comparison, by instead using arithmetic to make the task easier (or more special-purpose at least). Glossing over the issue of signed’ness, we know for example that

• x = y is the same as x − y = 0, and

• x < y is the same as x − y < 0

so we could re-purpose a circuit for subtraction to perform both tasks: we just compute t = x − y and then claim

• x = y iff. each t_i = 0, and

• x < y iff. t < 0, or rather t_{n−1} = 1 given we are using two’s-complement.

The idea here is that the general-purpose comparison of x and y is translated into a special-purpose comparison of t and 0.

This sleight of hand seems attractive, but turns out to have some arguable disadvantages. Primarily, we need to cope with signed x and y, and hence deal with cases where x − y overflows for example. In addition, one could argue a dedicated circuit for comparison can be more efficient than subtraction: even if we reuse one circuit for subtraction for both operations, cases might occur when this is not possible (e.g., in a micro-processor, where often we need to do both at the same time).
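The overflow issue can be seen in a few lines of Python. The sketch below models an n-bit subtraction by masking, then applies the two claims above; the function name `sub_compare` is illustrative, and inputs are given as n-bit two's-complement encodings.

```python
def sub_compare(x, y, n):
    """Compare via t = x - y in n bits; returns (t == 0, sign bit of t)."""
    mask = (1 << n) - 1
    t = (x - y) & mask                     # n-bit subtraction, carry-out dropped
    equal = (t == 0)                       # x = y iff each t_i = 0
    less = bool((t >> (n - 1)) & 1)        # claimed x < y iff t_{n-1} = 1
    return equal, less

print(sub_compare(2, 3, 4))                # 2 vs 3
print(sub_compare(3, 3, 4))                # 3 vs 3
# Signed 4-bit 7 (0111) vs -8 (1000): 7 - (-8) = 15 overflows, so the
# sign bit of t wrongly suggests 7 < -8.
print(sub_compare(0b0111, 0b1000, 4))
```

The last call shows the sign bit alone giving the wrong answer for signed operands once x − y overflows, which is why overflow needs separate handling.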

(a) An AND plus equality comparator based design.

(b) An OR plus non-equality comparator based design.

Figure 4.15: An n-bit, unsigned equality comparison described using a circuit diagram.

4.6.1 Unsigned comparison

4.6.1.1 Unsigned equality

Example 4.46. Consider two cases of comparison between unsigned x and y expressed in base-10

    x = 123(10)        x = 121(10)
    y = 123(10)        y = 123(10)

where, obviously, x = y in the left-hand case, and x ≠ y in the right-hand case.

More formally, x and y are equal iff. each digit of x is equal to the corresponding digit of y, so x_i = y_i for 0 ≤ i < n. As such, in the left-hand case x = y because x_i = y_i for 0 ≤ i < 3; in the right-hand case x ≠ y because x_i ≠ y_i for i = 0. This fact is true in any base, and in base-2 we have a component that can perform the 1-bit comparison x_i = y_i: to cope with larger x and y, we just combine instances of it together.

Read out loud, “if x_0 equals y_0 and x_1 equals y_1 and ... x_{n−1} equals y_{n−1} then x equals y, otherwise x does not equal y” highlights the basic strategy: each i-th of n instances of a 1-bit equality comparator will compare x_i and y_i, then we AND together the results. However, we need to take care wrt. the gate count: by looking at the truth table

    x_i   y_i   x_i ≠ y_i   x_i = y_i
     0     0        0           1
     0     1        1           0
     1     0        1           0
     1     1        0           1




Figure 4.16: An n-bit, unsigned less than comparison described using a circuit diagram.

Input: Two unsigned, n-digit, base-b integers x and y
Output: If x = y then true, otherwise false

1 for i = n − 1 downto 0 step −1 do
2     if x_i ≠ y_i then
3         return false
4     end
5 end
6 return true

Algorithm 19: An algorithm for equality comparison between base-b integers.

it should be clear that the former (inequality) is simply an XOR gate, whereas the latter (equality) needs an XOR and a NOT gate to implement directly. So we could either

1. use a dedicated XNOR gate whose cost is roughly the same as XOR given that

   \[ x \oplus y \equiv (x \land \lnot y) \lor (\lnot x \land y) \]

   and

   \[ \lnot (x \oplus y) \equiv (\lnot x \land \lnot y) \lor (x \land y), \]

   or

2. compute x =_u y ≡ ¬(x ≠_u y) instead, i.e., test whether x is not equal to y, then invert the result.

Both designs are illustrated in Figure 4.15: it is important to see that both compute the same result, but use a different internal design motivated loosely by the standard cell library available (i.e., what gate types we can use and their relative efficiency in time and space).
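The equivalence of the two options can be checked exhaustively for small n with a bit-level Python model. The function names below are illustrative; each loop iteration stands in for one 1-bit comparator, and the running `&` or `|` for the AND or OR that combines the results.

```python
def equal_and_of_xnor(x, y, n):
    """Option 1: AND together n 1-bit equality (XNOR) results."""
    r = 1
    for i in range(n):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        r &= 1 ^ (xi ^ yi)         # XNOR: 1 iff x_i = y_i
    return r

def equal_not_or_of_xor(x, y, n):
    """Option 2: OR together n 1-bit non-equality (XOR) results, then invert."""
    t = 0
    for i in range(n):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        t |= xi ^ yi               # XOR: 1 iff x_i differs from y_i
    return 1 ^ t                   # NOT of "x is not equal to y"

print(equal_and_of_xnor(0b1011, 0b1011, 4), equal_not_or_of_xor(0b1011, 0b1001, 4))
```

The second form is an instance of De Morgan's law applied to the first, which is why both produce the same output for every input.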

4.6.1.2 Unsigned less than

Example 4.47. Consider three cases of comparison between unsigned x and y expressed in base-10

    x = 121(10)        x = 323(10)        x = 123(10)
    y = 123(10)        y = 123(10)        y = 123(10)

where, obviously, x < y in the left-hand case, x > y in the middle case, and x = y in the right-hand case.

Although the examples offer intuitively obvious results, determining why, in a formal sense, x is less than y (or not) is more involved than the case of equality. A somewhat algorithmic strategy is as follows: work from the most-significant, left-most digits (i.e., x_{n−1} and y_{n−1}) towards the least-significant, right-most digits (i.e., x_0 and y_0) and at each i-th step, apply a set of rules that say

1. if xi < yi then x < y,

2. if xi > yi then x > y, but

3. if xi = yi then we need to check the rest of x and y, i.e., move on to look at xi−1 and yi−1.

This can be used to explain the example:

• in the left-hand case we find xi = yi for i = 2 and i = 1 but x0 = 1 < 3 = y0 and conclude x < y,




Input: Two unsigned, n-digit, base-b integers x and y
Output: If x < y then true, otherwise false

1 for i = n − 1 downto 0 step −1 do
2     if x_i < y_i then
3         return true
4     end
5     else if x_i > y_i then
6         return false
7     end
8 end
9 return false

Algorithm 20: An algorithm for less than comparison between base-b integers.

• in the middle case, when i = 2, we find x2 = 3 > 1 = y2 and conclude x > y, while

• in the left-hand case, we find xi = yi for all i and conclude x = y.

Figure 20 captures this more formally: we described, a loop iterates from the most- to least-significant digits ofx and y, and at each i-th step applies the rules above. That is, if xi < yi then x < y and if xi > yi then x ≮ y; ifxi = yi then the loop continues iterating, dealing with the next (i− 1)-th step until it has processed all the digits.Notice that if the loop actually concludes, then we know that xi = yi for all i and so x ≮ y.

Of course when x and y are written in base-2, our task is easier still because each x_i, y_i ∈ {0, 1}; this means we can use our existing 1-bit comparators. As such, translating the algorithm into a concrete design means reformulating it more directly wrt. said comparators. The idea is to recursively compute

\[
\begin{aligned}
t_0 &= (x_0 < y_0) \\
t_i &= (x_i < y_i) \lor ((x_i = y_i) \land t_{i-1})
\end{aligned}
\]

which matches our less formal rules above: at each i-th step, “x is less than y if x_i < y_i, or if x_i = y_i and the rest of x is less than the rest of y”. Each step simply requires one of each comparator plus an extra AND and an extra OR gate; if we have n-bit x and y, we have n such steps as illustrated in Figure 4.16.

Example 4.48. Consider less than comparison for n = 4 bit x and y, st. the unwound recursion

\[
\begin{aligned}
t_0 &= (x_0 < y_0) \\
t_1 &= (x_1 < y_1) \lor ((x_1 = y_1) \land t_0) \\
t_2 &= (x_2 < y_2) \lor ((x_2 = y_2) \land t_1) \\
t_3 &= (x_3 < y_3) \lor ((x_3 = y_3) \land t_2)
\end{aligned}
\]

yields a result t_3. For x = 5(10) ↦ 0101(2) and y = 7(10) ↦ 0111(2) we can see that

    x_0 < y_0 = false    x_0 = y_0 = true
    x_1 < y_1 = true     x_1 = y_1 = false
    x_2 < y_2 = false    x_2 = y_2 = true
    x_3 < y_3 = false    x_3 = y_3 = true

so

\[
\begin{aligned}
t_0 &= (x_0 < y_0) \\
    &= \text{false} \\
t_1 &= (x_1 < y_1) \lor ((x_1 = y_1) \land t_0) \\
    &= \text{true} \lor \text{false} \\
    &= \text{true} \\
t_2 &= (x_2 < y_2) \lor ((x_2 = y_2) \land t_1) \\
    &= \text{false} \lor \text{true} \\
    &= \text{true} \\
t_3 &= (x_3 < y_3) \lor ((x_3 = y_3) \land t_2) \\
    &= \text{false} \lor \text{true} \\
    &= \text{true}
\end{aligned}
\]

and, since t_3 = true, conclude that x <_u y as expected.
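The same recurrence can be checked in software. The sketch below is a behavioural model only, assuming x and y are held as non-negative Python integers of which the low n bits are used; the name lt_unsigned is hypothetical, and the two Boolean expressions stand in for the 1-bit less than and equality comparators.

```python
def lt_unsigned(x, y, n):
    """Evaluate t_{n-1} of the recurrence t_i = (x_i < y_i) or ((x_i = y_i) and t_{i-1})."""
    t = False  # acts as t_{-1}, so that t_0 reduces to (x_0 < y_0)
    for i in range(n):
        xi = (x >> i) & 1
        yi = (y >> i) & 1
        bit_lt = (xi == 0) and (yi == 1)  # 1-bit less than comparator
        bit_eq = (xi == yi)               # 1-bit equality comparator
        t = bit_lt or (bit_eq and t)      # one extra OR and one extra AND per step
    return t


print(lt_unsigned(0b0101, 0b0111, 4))  # the case x = 5, y = 7 from Example 4.48
```

Because each t_i depends on t_{i−1}, the steps form a chain from the least- to the most-significant bit, much like the carry chain in a ripple-carry adder.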




4.6.2 Signed comparison

Signed and unsigned equality comparison are equivalent, meaning we can use the unsigned comparison above in both cases. To see why, note the unsigned comparison we formulated tests whether each x_i is the same as y_i for 0 ≤ i < n. For signed x and y we do exactly the same thing: if x_i differs from y_i, then the value x represents will differ from the value y represents irrespective of whether the representation is signed or unsigned.

However, signed less than comparison is not as simple. To produce the behaviour required, we use unsigned less than as a sub-component within a design for signed less than: for x <_s y the rules

    x +ve, y -ve  ↦  x ≮_s y
    x -ve, y +ve  ↦  x <_s y
    x +ve, y +ve  ↦  x <_s y if abs(x) <_u abs(y)
    x -ve, y -ve  ↦  x <_s y if abs(y) <_u abs(x)

produce the result we want. The first two cases are obvious: if x is positive and y is negative it cannot ever be true that x < y, while if x is negative and y is positive it is always true that x < y. The other two cases need more explanation, but basically the idea is to consider the magnitudes of x and y only by computing then comparing abs(x) and abs(y), the absolute values of x and y. Note that in the case where x and y are both negative the order of comparison is flipped. This is because a larger negative x will be less than a smaller negative y (and vice versa); when considering their absolute values, the comparison is therefore reversed.

Example 4.49. Set n = 4:

1. if x = +4(10) ↦ 〈0, 0, 1, 0〉(2) and y = −6(10) ↦ 〈0, 1, 0, 1〉(2), then x ≮_s y since x is +ve and y is -ve,

2. if x = +6(10) ↦ 〈0, 1, 1, 0〉(2) and y = −4(10) ↦ 〈0, 0, 1, 1〉(2), then x ≮_s y since x is +ve and y is -ve,

3. if x = −4(10) ↦ 〈0, 0, 1, 1〉(2) and y = +6(10) ↦ 〈0, 1, 1, 0〉(2), then x <_s y since x is -ve and y is +ve, and

4. if x = −6(10) ↦ 〈0, 1, 0, 1〉(2) and y = +4(10) ↦ 〈0, 0, 1, 0〉(2), then x <_s y since x is -ve and y is +ve.

Example 4.50.

1. if x = +4(10) ↦ 〈0, 0, 1, 0〉(2) and y = +6(10) ↦ 〈0, 1, 1, 0〉(2), then x <_s y since x is +ve and y is +ve and abs(x) = 4 <_u 6 = abs(y),

2. if x = +6(10) ↦ 〈0, 1, 1, 0〉(2) and y = +4(10) ↦ 〈0, 0, 1, 0〉(2), then x ≮_s y since x is +ve and y is +ve and abs(x) = 6 ≮_u 4 = abs(y),

3. if x = −4(10) ↦ 〈0, 0, 1, 1〉(2) and y = −6(10) ↦ 〈0, 1, 0, 1〉(2), then x ≮_s y since x is -ve and y is -ve and abs(y) = 6 ≮_u 4 = abs(x), and

4. if x = −6(10) ↦ 〈0, 1, 0, 1〉(2) and y = −4(10) ↦ 〈0, 0, 1, 1〉(2), then x <_s y since x is -ve and y is -ve and abs(y) = 4 <_u 6 = abs(x).

Since x and y are represented using two’s-complement, we can make a slight improvement by rewriting the rules more simply as

    x +ve, y -ve  ↦  x ≮_s y
    x -ve, y +ve  ↦  x <_s y
    x +ve, y +ve  ↦  x <_s y if chop(x) <_u chop(y)
    x -ve, y -ve  ↦  x <_s y if chop(x) <_u chop(y)

where chop(x) = x_{n−2...0}, meaning chop(x) is x with the MSB (which determines the sign of x) removed; this is valid because a negative integer of small magnitude becomes a large positive integer (and vice versa) when the MSB is removed. Doing so is much simpler than computing abs(x), because we just truncate or ignore the MSB.

Example 4.51.

1. if x = +4(10) ↦ 〈0, 0, 1, 0〉(2), y = +6(10) ↦ 〈0, 1, 1, 0〉(2), x <_s y since x is +ve and y is +ve and chop(x) = 4 <_u 6 = chop(y),

2. if x = +6(10) ↦ 〈0, 1, 1, 0〉(2), y = +4(10) ↦ 〈0, 0, 1, 0〉(2), x ≮_s y since x is +ve and y is +ve and chop(x) = 6 ≮_u 4 = chop(y),

3. if x = −4(10) ↦ 〈0, 0, 1, 1〉(2), y = −6(10) ↦ 〈0, 1, 0, 1〉(2), x ≮_s y since x is -ve and y is -ve and chop(x) = 4 ≮_u 2 = chop(y), and

4. if x = −6(10) ↦ 〈0, 1, 0, 1〉(2), y = −4(10) ↦ 〈0, 0, 1, 1〉(2), x <_s y since x is -ve and y is -ve and chop(x) = 2 <_u 4 = chop(y).




The question is, finally, how do we implement these rules as a design? As in the case of overflow detection, we use the fact that testing the sign of x or y is trivial. As a result, we can write

\[
x <_s y =
\begin{cases}
\text{false} & \text{if } \lnot x_{n-1} \land y_{n-1} \\
\text{true} & \text{if } x_{n-1} \land \lnot y_{n-1} \\
\text{chop}(x) <_u \text{chop}(y) & \text{otherwise}
\end{cases}
\]

which can be realised by a multiplexer: producing the LHS just amounts to selecting an option from the RHS using x_{n−1} and y_{n−1}, i.e., the sign of x and y, as control signals.
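As a rough software model of this selection, the sketch below treats x and y as n-bit two’s-complement bit patterns held in non-negative Python integers. The names lt_signed and decode are hypothetical; Python’s built-in < on the chopped values stands in for the unsigned less than component, and decode exists only to check the result against ordinary signed arithmetic.

```python
def lt_signed(x, y, n):
    """Signed less than on n-bit two's-complement bit patterns x and y."""
    x_neg = (x >> (n - 1)) & 1   # x_{n-1}: 1 means x is negative
    y_neg = (y >> (n - 1)) & 1   # y_{n-1}: 1 means y is negative
    low = (1 << (n - 1)) - 1     # mask keeping bits n-2 .. 0, i.e., chop
    if not x_neg and y_neg:
        return False             # x +ve, y -ve
    if x_neg and not y_neg:
        return True              # x -ve, y +ve
    return (x & low) < (y & low) # same sign: unsigned compare of chop(x), chop(y)


def decode(x, n):
    """Value represented by the n-bit two's-complement bit pattern x."""
    return x - (1 << n) if (x >> (n - 1)) & 1 else x


# Exhaustive check for n = 4 against ordinary signed comparison.
assert all(lt_signed(x, y, 4) == (decode(x, 4) < decode(y, 4))
           for x in range(16) for y in range(16))
```

The two early returns correspond to the first two multiplexer options; the final line is the “otherwise” option.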

4.6.3 Beyond equality and less than

Once we have components for equality and less than comparison, whether they are signed or unsigned, all other comparisons can be derived using a set of identities. For example, one can easily verify that

\[
\begin{array}{lcl}
x \neq y & \equiv & \lnot (x = y) \\
x \leq y & \equiv & (x < y) \lor (x = y) \\
x \geq y & \equiv & \lnot (x < y) \\
x > y    & \equiv & \lnot (x < y) \land \lnot (x = y)
\end{array}
\]

meaning the result of all six comparisons between x and y on the LHS can easily be realised using just

• one component for x = y,

• one component for x < y, and

• four (two NOT, an OR and an AND) extra logic gates

rather than instantiating additional, dedicated components.
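The identities can be checked exhaustively in software. In the sketch below the function name derive is hypothetical; it takes only the outputs of the equality and less than components and forms the remaining four results using two NOT, one OR and one AND operation.

```python
def derive(eq, lt):
    """Derive all six comparison results from the outputs of = and < alone."""
    not_eq = not eq   # NOT gate 1, shared by != and >
    not_lt = not lt   # NOT gate 2, shared by >= and >
    return {
        "==": eq,
        "<":  lt,
        "!=": not_eq,
        "<=": lt or eq,           # the OR gate
        ">=": not_lt,
        ">":  not_lt and not_eq,  # the AND gate
    }


# Check against Python's own comparisons for all 4-bit unsigned pairs.
for x in range(16):
    for y in range(16):
        r = derive(x == y, x < y)
        assert r["!="] == (x != y) and r["<="] == (x <= y)
        assert r[">="] == (x >= y) and r[">"] == (x > y)
```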

References

[1] ARM7TDMI Technical Reference Manual. Tech. rep. DDI-0210C. ARM Ltd., 2004. url: http://infocenter.arm.com/help/topic/com.arm.doc.ddi0210c/index.html (see p. 185).

[2] A.D. Booth. “A Signed Binary Multiplication Technique”. In: Quarterly Journal of Mechanics and Applied Mathematics 4.2 (1951), pp. 236–240 (see p. 175).

[3] P. Chow. MIPS-X Instruction Set And Programmer’s Manual. Tech. rep. CSL-86-289. Computer Systems Laboratory, Stanford University, 1998 (see p. 186).

[4] Cortex-M0 Technical Reference Manual. Tech. rep. DDI-0432C. ARM Ltd., 2009. url: http://infocenter.arm.com/help/topic/com.arm.doc.ddi0432c/index.html (see p. 185).

[5] L. Dadda. “Some Schemes for Parallel Multipliers”. In: Alta Frequenza 34 (1965), pp. 349–356 (see p. 181).

[6] J. Heinrich. MIPS R4000 Microprocessor User’s Manual. 2nd. 1994 (see p. 186).

[7] W.G. Horner. “A new method of solving numerical equations of all orders, by continuous approximation”. In: Philosophical Transactions (1819), pp. 308–335 (see p. 170).

[8] A. Karatsuba and Y. Ofman. “Multiplication of Many-Digital Numbers by Automatic Computers”. In: Physics-Doklady 7 (1963), pp. 595–596 (see p. 169).

[9] S. Mirapuri, M. Woodacre, and N. Vasseghi. “The MIPS R4000 processor”. In: IEEE Micro 12.2 (1992), pp. 10–22 (see p. 186).

[10] J. von Neumann. First Draft of a Report on the EDVAC. Tech. rep. 1945 (see p. 145).

[11] B. Parhami. Computer Arithmetic: Algorithms and Hardware Designs. 1st ed. Oxford University Press, 2000 (see pp. 146, 164).

[12] A.S. Tanenbaum and T. Austin. Structured Computer Organisation. 6th ed. Prentice-Hall, 2012 (see p. 147).

[13] P.A.V. Thomas and N. Balatoni. “A hardware multiplier/divider for the PDP 8S computer”. In: Behavior Research Methods & Instrumentation 3.2 (1971), pp. 89–91 (see p. 185).

[14] C.S. Wallace. “A Suggestion for Fast Multipliers”. In: IEEE Transactions on Computers 13.1 (1964), pp. 14–17 (see p. 181).




CHAPTER 5

HARDWARE DESIGN USING VERILOG

If I designed a computer with 200 chips, I tried to design it with 150. And then I would try to design it with 100. I just tried to find every trick I could in life to design things real tiny.

– Wozniak

When reading the content in Chapter 2, at least two features may have been frustrating: first, the concepts and techniques were presented in a fairly theoretical way meaning that, second, any examples were limited in their scope. This Chapter attempts to resolve such issues by allowing a more practical, accessible way to experiment with hardware design. Specifically, it introduces Verilog, an industry standard Hardware Description Language (HDL). By simulating a Verilog description of some (potentially large) hardware component, one can for example quickly and definitively examine behaviour for given inputs (perhaps identifying an error in the design).

The aim is to introduce just enough of the Verilog language that a reader can understand and implement concepts from Chapter 2 and elsewhere; the strategy throughout is to contrast hardware design using Verilog with the more familiar setting of software development using C. This limited remit contrasts, by design, with dedicated guides to Verilog which can (and do) fill books in their own right. In particular, we omit the study of concepts such as transistor-level design, auxiliary and related topics such as formal verification, and advanced topics such as the Programming Language Interface (PLI).

5.1 Introduction

5.1.1 The problem of design complexity

As recently as fifty years ago the components used to construct a given circuit were limited in number (in the range of 10 to 100) and in diversity (i.e., were selected from a relatively small set of choices). Operating at this scale, it was not unrealistic for a single engineer to design an entire circuit on paper and then sit with a soldering iron to construct it by hand. Malone documents one extreme example of what could be achieved [2, Pages 148–152]. Development of a floppy disk controller for the Apple II was, in early 1977, viewed as a vital task so the computer could remain competitive in a blossoming market. Steve Wozniak, a brilliant technical mind and co-founder of Apple, developed and improved on existing designs and within two weeks had a working circuit. Not content with eclipsing the man-hour and financial effort required by other companies to reach the same point, Wozniak redesigned and optimised the entire circuit over a twelve-hour period after spotting a potential bug before it was sent into production!

As circuit complexity has increased over the years however, and lacking a Wozniak to deploy in every company, this model fails to scale; a modern computer processor, for example, will typically consist of many millions of components (e.g., transistors). In common with the analogous task in the context of software, producing such circuits is difficult. The first issues to bite are related to design. For example, the sheer number of inputs, outputs and states demands careful decomposition into a large number of smaller subsystems, the interface between which must be carefully managed. Even then, coping with constraints such as propagation delay becomes a major issue: the optimisation, layout and connection of components quickly becomes too difficult to perform (without error) by hand. Next, physically constructing the circuit from the design presents further challenges. For example, the size of components rules out doing this by hand: one would need an exceptionally tiny soldering iron to wire connections together! Finally, one is faced with the challenge of verifying the system as a whole functions correctly and satisfies behavioural requirements such as power consumption and heat dissipation; as circuit complexity increases, the impact of both issues is magnified.

Several key advances have helped tame the problems in this gloomy picture. Certainly improvements in implementation and fabrication technologies that allow miniaturisation of components have been an enabling factor, but the advent of design automation (i.e., software tools to help manage the complexity of hardware design) has arguably been just as important.

5.1.2 Design automation as a solution

Verilog is a Hardware Description Language (HDL). A language like C is used to describe software programs that execute on some hardware device; a language like Verilog can describe the hardware device itself. Using an HDL, usually within some form of Electronic Design Automation (EDA) software suite, offers the engineer similar benefits to a programmer using a high-level language like C over a low-level machine language.

The Verilog language was initially developed in 1984, at a company called Gateway Design Automation, as a hardware modelling tool. Some of the language design goals were simplicity and familiarity; as a result, Verilog resembles both C and Pascal with many similar constructs. Coupled with the provision of a familiar and flexible development environment, Verilog promotes many of the same advantages that software developers are used to. That is, the language makes it easy to abstract away from implementation and concentrate on design; the language makes design modularisation and reuse easier, while associated tools such as simulators make test and verification easier and more efficient. There are now several iterations of the standard language: for example, an initial version termed Verilog-95 was improved and resubmitted to the IEEE for standardisation as Verilog-2001 or IEEE 1364-2001. Standardisation, along with the production of software to produce hardware from Verilog descriptions at a variety of abstraction levels, has made the language a popular choice for engineers.

Verilog models, so-called because they are somewhat abstract models of concrete circuits, can be described at several different levels. Each level is more abstract than the last, leaving less work for the engineer and allowing them to be more expressive:

Switch Level At the lowest level, Verilog allows one to describe a circuit in terms of transistors. Although this gives very fine-grained control over the circuit composition, developing large models is still a significant challenge since the building blocks are so small.

Gate Level Above the transistor level, and clearly more attractive for actually getting anything useful done, is the concept of gate-level description. At this level, the designer describes the functionality of the circuit in terms of logic gates and the connections between them; this is somewhat equivalent to the level of detail in Chapter 2.

Register Transfer Level (RTL) RTL allows the designer to largely abstract away the concept of logic gates as an implementation technology, and simply describe the flow of data around the circuit and the operations performed on it.

Behavioural Level The most abstract and hence most expressive form of Verilog is so-called behavioural-level design. At this level, the designer uses high-level constructs, similar to those in programming languages like C, to specify the required behaviour itself rather than how that behaviour is implemented.

Roughly speaking, production of physical hardware from a Verilog model is a three-phase process managed by a suite of software tools called the Verilog tool-chain:

Simulate During the first phase, the Verilog model is fed into a software tool which simulates behaviour. On one hand, the simulation is highly accurate; it may track how individual gates and signals change as time progresses for example. On the other hand, it is typically idealised in the sense that some physical constraints (e.g., the impact of propagation delay) may be ignored.

Use of simulation typically accelerates the design process versus working “on paper” alone. For example, evaluating the model in given situations (e.g., for particular inputs) can be automated, and each development cycle can therefore be relatively quick. As a result, the simulation phase is typically used to ensure a level of functional correctness before the model is processed further: if the model simulates correctly, then one can start to translate it into concrete hardware with a greater confidence of success.

Synthesise The second phase of development is similar to the compilation step in which a C program is translated into low-level machine language. The Verilog model is fed into synthesis software which converts the model into a list of hardware components which together implement the required behaviour; this is often called a netlist.

[Figure 5.1 diagram. Stages: Design, Functional Verification, Synthesis, Place and Route, Behavioural Verification, Manufacture, Testing. Artefact labels: Verilog, Verilog, Netlist, Netlist + Annotation, Netlist + Annotation, Hardware, Hardware.]

Figure 5.1: A waterfall-style hardware development cycle using Verilog.

Optimisation steps may be applied automatically by the synthesiser to improve the time or space characteristics of the result. Such optimisations are usually conceptually simple (e.g., eliminating redundant parts of a circuit) but repetitive: ideal fodder for a computer, but not so for a human engineer.

Implement The synthesis phase produces a list of components and a description of how they are connected together to form an implementation of the model. However, two further steps are required to translate this into concrete hardware.

First, we need to select components from the implementation technology with which to implement the components in the netlist. For example, on an FPGA it might be advantageous to implement an AND gate one way whereas on an ASIC a different method might be appropriate. Second, we need to work out where the components should be placed and how the wiring between them should be organised or routed. Long wires are undesirable since they increase propagation delay; the place and route process attempts to minimise this through intelligent organisation of the components. Again, both steps are well suited to solution by a computer rather than a human engineer.

In reality, these steps form part of a waterfall style development cycle (or similar) which iterates each stage to remove errors after some form of verification. The goal of the repeated backward-facing paths is to avoid getting to the end with a defective result: the manufacture step in particular is very expensive, so fixing errors early by revisiting initial design and verification steps is vital.

5.2 Structural modelling

5.2.1 Comments

Like most other programming languages, Verilog allows comments in the source code description of a model. Comments have the same syntax as C and Java so that single-line comments

// comment

and multi-line comments

/*
comment
comment
...
*/

are both possible. It goes without saying that since Verilog models can be equally as complicated and hard to read as a C program, comments are a vital way to convey meaning which might not be present in the source code itself.

5.2.2 Wires

In Verilog, the term net is (roughly) used to describe a connection between components; we will cover the definition of components later but for now focus on a specific type of net. A wire is a specific type of net, and the exact analogy of the same thing in real life: it acts as a conducting medium that allows a signal (or value) to be carried between the two end-points. We say that a value is driven onto one end before propagating along the wire so it can be read at the other end. We define Verilog wires using a similar syntax to variable declaration in C, as shown in Figure 5.2: each case includes the keyword wire, a type and an identifier. A case-by-case description follows.

An aside: describing Verilog wires using C variables as a rough analogy.

C

• The definition char u means

1. there are 8 separate 1-bit elements in u, but

2. u is typically used as 1 single 8-bit object rather than 8 separate 1-bit elements.

• The definition char v[ 32 ] means

1. there are 32 separate 8-bit elements in v, and

2. v is typically used as 32 separate 8-bit elements rather than 1 single 256-bit object.

Verilog

• The definition wire x means there is 1 single 1-bit wire in x.

• The definition wire [3:0] y means there are 4 separate 1-bit wires in y and

1. y can be used as 1 single 4-bit object, or

2. y can be used as 4 separate 1-bit elements.

Wire definition               Meaning
wire w0;                      A 1-bit unsigned wire called w0.
wire [ 3 : 0 ] w1;            A 4-bit unsigned wire vector called w1.
wire [ 0 : 3 ] w2;            A 4-bit unsigned wire vector called w2.
wire [ 4 : 1 ] w3;            A 4-bit unsigned wire vector called w3.
wire signed [ 3 : 0 ] w4;     A 4-bit signed wire vector called w4.

Figure 5.2: Several different examples of wire definition.

The first case defines a wire called w0 capable of carrying 1-bit values. Although this is already enough for some situations, we will often want to communicate larger values. So-called wire vectors allow us to do exactly this by “bundling” several 1-bit wires into larger groups. The second case demonstrates this by defining a wire vector called w1 that can carry 4-bit values. The part of the definition which reads [ 3 : 0 ] specifies the size of the wire vector as well as the order and labels of wires within it. More generally, [ i : j ] can be interpreted as “the elements in this vector are labelled i to j”. Since i > j this implies there are i − j + 1 elements in the wire vector; in this case, w1 has 3 − 0 + 1 = 4 elements labelled 3, 2, 1, 0. Since we can select the individual elements (i.e., individual wires) from a given wire vector, we can think of the vector as either an array of four 1-bit wires, or a single compound wire that can carry 4-bit values.

The third case of Figure 5.2 defines another 4-bit wire vector but the order of the elements is reversed: the left-most, most-significant element of w2 is 0 while the right-most, least-significant is 3. In the previous, second case the left-most, most-significant element was 3 while the right-most, least-significant was 0; here, since j > i, there are j − i + 1 elements in the wire vector meaning w2 has 3 − 0 + 1 = 4 elements labelled 0, 1, 2, 3. The fourth case uses another variation whereby the order of wires in the wire vector is the same as w1, but their index is changed. This time the left-most element of w3 is numbered 4 rather than 3, while the right-most is numbered 1 rather than 0. Again, since i > j, there are i − j + 1 elements in the wire vector meaning w3 has 4 − 1 + 1 = 4 elements labelled 4, 3, 2, 1. These variations might seem confusing but it is crucial to see that they simply give different labels to the constituent elements. Typically the original approach (i.e., that for w1) is enough: the variations are useful mainly in niche situations such as compliance with the endianness requirements of some specification or third-party component.
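The labelling rule for a range [ i : j ] can be modelled in a few lines of software. This is a sketch for illustration only: the function name vector_labels is hypothetical, and it simply lists the element labels from left-most (most-significant) to right-most (least-significant).

```python
def vector_labels(i, j):
    """Labels of the elements in a wire vector declared with range [ i : j ].

    The result is ordered left-most (most-significant) first; it always has
    abs(i - j) + 1 entries, matching i - j + 1 when i > j and j - i + 1 when j > i.
    """
    step = -1 if i >= j else 1
    return list(range(i, j + step, step))


print(vector_labels(3, 0))  # w1: [3, 2, 1, 0]
print(vector_labels(0, 3))  # w2: [0, 1, 2, 3]
print(vector_labels(4, 1))  # w3: [4, 3, 2, 1]
```

In all three cases the vector has four elements; only the labels attached to them differ.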

It is worth noting that more recent Verilog language specifications include the facility to declare signed wire vectors, i.e., wire vectors that carry signed, two’s-complement values rather than unsigned values; this is the direct analogy of so-called type modifiers in C. For example the next, fifth case in Figure 5.2 declares a 4-bit signed wire vector called w4; it can carry values between −8 and 7 inclusive. This differs from the second case, where w1 is declared as a 4-bit unsigned wire vector capable of carrying values between 0 and 15 inclusive.




Value    Meaning
1'b0     0 (or logical low, or false).
1'b1     1 (or logical high, or true).
1'bX     Undefined (or “unknown”).
1'bZ     High impedance (or “disconnected”).

(a) Verilog literal values.

Strength    Meaning
supply      Strongest.
strong         ↓
pull
large
weak
medium
small
highz       Weakest.

(b) Verilog literal strengths.

Figure 5.3: Verilog literal values and their strength.

Clearly this is a matter of interpretation to some extent, since one can view a given bit pattern as representing a signed or unsigned value (per Chapter 1) regardless of the wire vector type. However, adding the signed keyword “hints” to the Verilog tool-chain that, for example, if the value on signed wire w4 is inspected or displayed somehow it should be viewed as signed.
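To make the point about interpretation concrete, the following sketch reads the same bit pattern in both ways. The function names as_unsigned and as_signed are hypothetical, and the pattern is assumed to be written as a string with the most-significant bit first.

```python
def as_unsigned(bits):
    """Interpret a bit string (most-significant bit first) as an unsigned value."""
    return int(bits, 2)


def as_signed(bits):
    """Interpret the same bit string as a two's-complement signed value."""
    value = int(bits, 2)
    # If the MSB is set, the pattern represents value - 2^n for an n-bit string.
    return value - (1 << len(bits)) if bits[0] == "1" else value


pattern = "1010"
print(as_unsigned(pattern), as_signed(pattern))  # same bits, two readings
```

For 4-bit patterns the unsigned reading covers 0 to 15 and the signed reading covers −8 to 7, which matches the ranges given for w1 and w4.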

5.2.3 Values and literals

In idealised hardware, a wire should only ever take the values 0 or 1 (i.e., low and high, or false and true). These values are written as the Verilog literals 1'b0 or 1'b1. However, physical reality dictates that wires are not ideal; as a result Verilog supports four different values as detailed by Figure 5.3a.

The undefined value 1'bX is required to model what happens when some form of conflict occurs. It is not that we do not care what the value is, we genuinely do not know what the value should be. As a simple example, consider driving the value 1'b1 onto one end of a wire and the value 1'b0 onto the other end: what value does the wire take? The answer is generally undefined: it roughly depends on how strong the two signals are. Verilog offers a way to model how strong a value of 1'b0 or 1'b1 is; one prefixes the value with one of the strength types shown in Figure 5.3b. However, this sort of assumption should be used with care: generally, if you rely on an undefined value being resolved to something known, there is probably a better way to describe the design and avoid the problem. The high impedance value 1'bZ is a kind of pass-through described previously by Chapter 2 in the context of 3-state logic. If high impedance conflicts with any other value, it instantly resolves to the other value. This is useful for modelling components which must produce some continuous output but might not want to interfere with another component using the same wire.
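A minimal sketch of this resolution behaviour, for two drivers on one wire, is shown below. It is a simplification under stated assumptions: the function name resolve is hypothetical, the four values are written as the characters "0", "1", "x" and "z", and both drivers are assumed to have equal strength (so the strength types in Figure 5.3b are ignored).

```python
def resolve(a, b):
    """Value taken by a wire driven with a and b, assuming equal strengths."""
    if a == "z":
        return b      # high impedance gives way to the other value
    if b == "z":
        return a
    if a == b:
        return a      # both drivers agree, so there is no conflict
    return "x"        # genuine conflict (e.g., 0 against 1): undefined


print(resolve("1", "0"))  # conflicting drivers
print(resolve("z", "1"))  # one driver disconnected
```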

So far we have not described why the syntax for Verilog literals is this way. For example, what are the 1 and b characters in the literal 1'b0? The general syntax can be informally described as the combination of four parts, namely

1. an optional sign, i.e., -, which denotes that the literal is negative (or positive if omitted),

2. an optional size field which specifies the number of bits required to represent the literal,

3. an optional base field which specifies the radix the literal is expressed in (for example H or h for hexadecimal, D or d for decimal and B or b for binary), and

4. the literal itself

where an apostrophe character is used to separate the size and base fields. So, for example we have

• 2'b10 is a 2-bit binary value 10(2), 2(10) or 2(16).

• 4'hF is a 4-bit hexadecimal value 1111(2), 15(10) or F(16).

• 8'd17 is an 8-bit decimal value 17(10), 00010001(2) or 11(16).

• -8'd10 is an 8-bit decimal value −10(10), represented in two’s-complement.

• 3'bXXX is a 3-bit undefined binary value (all three bits are undefined).

• 8'hZZ is an 8-bit high impedance hexadecimal value (all eight bits are high impedance).

• 4'b10XZ is a 4-bit binary value where bit zero is the high impedance value, bit one is the unknown value, bit two is 0 and bit three is 1. This value does not really translate into another base; although some of the bits are defined, because some are not the whole value is undefined.




Where size is not important, one can omit the size field and accept a default (undefined bits, whether specified or not, are assumed to be zero); likewise one can omit the base field to accept the default of decimal. These features mean, for example, that the literal 10 is the same as 32'd10, i.e., the 32-bit decimal value ten.
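The literal syntax can be explored with a small parser. The sketch below is deliberately limited and makes several assumptions: the name parse_literal is hypothetical, only the binary, decimal and hexadecimal bases mentioned above are handled, missing upper bits are always padded with zero, and a sign is only supported when no X or Z digits are present. It returns the literal as a string of bits, most-significant first.

```python
import re

# Optional sign, optional (size, apostrophe, base) group, then the digits.
_LITERAL = re.compile(r"(-?)(?:(\d*)'([bBdDhH]))?([0-9a-fA-FxXzZ]+)")


def parse_literal(text, default_size=32):
    """Return a Verilog-style literal as a string over 0, 1, x, z (MSB first)."""
    m = _LITERAL.fullmatch(text)
    if m is None:
        raise ValueError("unsupported literal: " + text)
    sign, size, base, digits = m.groups()
    size = int(size) if size else default_size  # default size is an assumption
    base = (base or "d").lower()                # default base is decimal
    digits = digits.lower()
    if base == "d":
        bits = format(int(digits), "b")         # decimal: no x or z digits
    else:
        width = 1 if base == "b" else 4         # bits per digit
        if base == "b" and set(digits) - set("01xz"):
            raise ValueError("invalid binary digit in: " + text)
        bits = "".join(d * width if d in "xz" else format(int(d, 16), "0%db" % width)
                       for d in digits)
    bits = bits[-size:].rjust(size, "0")        # truncate or zero-pad to size
    if sign:                                    # negate using two's-complement
        bits = format(-int(bits, 2) % (1 << size), "0%db" % size)
    return bits


for literal in ["2'b10", "4'hF", "8'd17", "-8'd10", "3'bXXX", "8'hZZ", "4'b10XZ"]:
    print(literal, "->", parse_literal(literal))
```

With these assumptions, parse_literal("10") and parse_literal("32'd10") give the same 32-bit result, mirroring the equivalence described in the text.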

5.2.4 Simple operators on wires and wire vectors

Having defined some wire or wire vector, using it is fairly straightforward: we simply refer to it via the identifier we gave it, in a similar way to use of a variable in a C program. As well as this direct means of using the wire or wire vector, we can apply several operators which allow slightly more complicated uses. Crucially however, such operators simply relabel wires rather than implying any computation: the aim is simply to split a larger wire vector into smaller wires or wire vectors, or group smaller wires or wire vectors into a larger wire vector. Examples of these two cases are shown in Figure 5.4a and Figure 5.4b respectively, with a third example in Figure 5.4c.

Notice that the Verilog subscript (or “grab”) operator resembles the C array subscript operator: the aim is to specify an element within a wire vector. Verilog allows ranges in the subscript itself, so in fact we can specify more than one element. Either way, the example in Figure 5.4a shows that we can split the large wire vector x into parts, accessing the individual wires within it. If x has the value 8'b11110000 then we have that

• x[ 7 ], x[ 6 ], x[ 5 ] and x[ 4 ], are all 1-bit wires with value 1'b1,

• x[ 3 ], x[ 2 ], x[ 1 ] and x[ 0 ], are all 1-bit wires with value 1'b0,

• x[ 7 : 4 ] is a 4-bit wire vector with value 4'b1111, and

• x[ 3 : 0 ] is a 4-bit wire vector with value 4'b0000.

In contrast, the Verilog concatenate operator performs roughly the opposite role by grouping together wires or wire vectors; there is no clear analogy for this in C. In Figure 5.4b, if x, y and z have the values 2'b10, 1'b1 and 1'b0 then we have that

• { x, y, z } is a 4-bit wire vector with value 4'b1010,

• r[ 3 ] is a 1-bit wire with value 1'b1 (matching x[ 1 ]),

• r[ 2 ] is a 1-bit wire with value 1'b0 (matching x[ 0 ]),

• r[ 1 ] is a 1-bit wire with value 1'b1 (matching y), and

• r[ 0 ] is a 1-bit wire with value 1'b0 (matching z).

Finally, Figure 5.4c illustrates the replicate operator, which makes a specified number of “copies” of wires or wire vectors. If we say x has the value 1'b1, then { 4{ x } }, which is a 4-bit wire vector, has the value 4'b1111: the number of copies is the first thing within the braces, and the thing being copied is the second. This is basically the same as if we had written { x, x, x, x }, i.e., used the concatenation operator to group four copies of x together.

To reiterate, and as the diagrams illustrate, all any operator does is relabel elements in things we have defined into more convenient forms: in each case we can relate the value of the things we produce directly to the things we use.
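The three operators can be mimicked on bit strings to confirm the values quoted above. This is a software analogy only: the function names subscript, concatenate and replicate are hypothetical, and each wire vector is assumed to be declared with a range of the form [ n − 1 : 0 ] and written as a string with the most-significant bit first.

```python
def subscript(x, hi, lo=None):
    """Model x[ hi ] or x[ hi : lo ] for a vector declared as [ n-1 : 0 ]."""
    lo = hi if lo is None else lo
    n = len(x)
    # Bit k sits at string position n - 1 - k, since the MSB is written first.
    return x[n - 1 - hi : n - lo]


def concatenate(*parts):
    """Model { a, b, ... }: the left-most part supplies the most-significant bits."""
    return "".join(parts)


def replicate(count, x):
    """Model { count{ x } }: count copies of x grouped together."""
    return x * count


x = "11110000"
print(subscript(x, 7), subscript(x, 7, 4), subscript(x, 3, 0))
r = concatenate("10", "1", "0")
print(r, subscript(r, 3), subscript(r, 0))
print(replicate(4, "1"))
```

None of the functions computes anything new: each output character is a copy of some input character, which is the sense in which the operators only relabel wires.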

5.2.5 Modules

Programs written in C can be thought of as collections of functions that use each other in order to perform computation: the exact computation is determined by how the functions call each other and by what goes on inside each function. The analogous construct in Verilog is a module. In common with a function, a module provides us with an abstraction mechanism: we can view it as a black-box, with inputs and outputs, without worrying how the internals work. For example, the multiplexer device introduced previously selects between several values to produce a result:

[Diagram: a multiplexer with inputs x and y, control signal c, and output r.]

git # ba293a0e @ 2019-11-14 198



wire [ 7 : 0 ] x;

x[ 0 ]x[ 1 ]x[ 2 ]x[ 3 ]x[ 4 ]x[ 5 ]x[ 6 ]x[ 7 ]

x[ 7 : 4 ] x[ 3 : 0 ]

(a) The Verilog subscript (or “grab”) operator: given an 8-bit wire vector x we can access each individual 1-bit wire using a subscriptoperator (middle). Note that ranges are allowed, so we can also access contiguous groups of bits; for example, we might access the two4-bit halves of x (bottom).

wire ywire [ 1 : 0 ] x wire z

{ x, y, z }

r

r[ 3 ] r[ 2 ] r[ 1 ] r[ 0 ]

(b) The Verilog concatenate operator: we can group wires (top), or even wire vectors, into larger wire vectors (middle). Note that theeffect is simply to “merge” all the wires together into one result. For example (bottom) the 3-rd or most-significant bit of the result r,which is a 4-bit wire vector, is simply the 1-st bit of x, and the 0-th or least-significant bit of r is z.

wire x

{ 4{ x } }

r

r[ 3 ] r[ 2 ] r[ 1 ] r[ 0 ]

(c) The Verilog replicate operator: as a short-hand for specific uses of concatenation, we can make n “copies” of an input wire or wirevector. For example (top) the 1-bit wire x is replicated four times (middle) to form the 4-bit wire vector r: each bit of r is equal to x.

Figure 5.4: Diagrammatic descriptions of simple Verilog operators on wires and wire vectors.




An aside: describing Verilog modules using C functions as a rough analogy.

C

• A program is described using static function definitions.

• Each function has a body that describes how it behaves, i.e., how it computes outputs from inputs.

• The functions reference each other via calls; a function call implies an active use.

• Values are carried by variables, on which computation is performed by functions.

Verilog

• A model is described using static module definitions.

• Each module has a body that describes how it behaves, i.e., how it computes outputs from inputs.

• The modules reference each other via instantiations; a module instantiation implies an active use.

• Values are carried by nets, on which computation is performed by modules.

• Each module instance represents some physical hardware component, and the model describes a hardware system as a whole.

module mux2_1bit( output wire r,
                  input  wire c,
                  input  wire x,
                  input  wire y );

  ...

endmodule

(a) The “inline” module definition style.

module mux2_1bit( r, c, x, y );

  output wire r;
  input  wire c;
  input  wire x;
  input  wire y;

  ...

endmodule

(b) The “inside” module definition style.

Figure 5.5: Two styles of module interface for a 1-bit, 2-way multiplexer.

From a functional point of view, what the component does (e.g., how we interact with it via the inputs and outputs) is more important than how it does it (e.g., what is inside the component). The interface to the component focuses on the former: it describes only the inputs (i.e., x, y and c) and outputs (i.e., r). A module definition is a static description, or model, of some component that we can later use; definitions consist of three parts, namely

1. a name or identifier which we can refer to the module by,

2. an interface, meaning the module inputs and outputs, which we call ports and which form the module port list, and

3. the body which determines the module behaviour, i.e., how the component does computation to produce the outputs from the inputs and any stored state.

Using a Verilog module, we can describe the interface of the multiplexer discussed above. In fact, two different styles of definition are possible (in both cases we replace the module body with continuation dots to be filled in later):

1. Figure 5.5a describes the port list “inline” (roughly corresponding to newer ANSI-style C function definitions), whereas

2. Figure 5.5b describes the port list “inside” the module (roughly corresponding to older K&R-style C function definitions).

Although the two styles are largely equivalent, for simple modules the first method is more compact and so is preferred; there are some advantages to the second style however, which we discuss later.




5.2.5.1 Module instantiation

Continuing the analogy between C functions and Verilog modules, a function clearly does not do anything until it is called: the function is just a static description. Equally, a module does not do anything until it is instantiated. If you think of a module definition as a template, each time we instantiate the module we take the template and reproduce the component it describes. In a sense we are placing a copy of it in our model; if we instantiate a multiplexer module, this is analogous to taking a physical multiplexer component from a box and placing it on a circuit. This implies a crucial difference between function calls and module instances: whereas a function call is transient (we call the function and once we get the result, the call is “gone”), a module instance is permanent (the component it implies always exists).

Put like this, module instantiation is how we create an active instance of some component; instantiations consist of three parts, namely

1. a type which specifies the module we want to instantiate,

2. an identifier that names the module instance, and

3. a list of external wires, i.e., connections to the internal port list of the module instance.

Again using the multiplexer example, the Verilog module instantiation on the left-hand side does basically the same thing as the diagrammatic description on the right-hand side:

mux2_1bit t( s, k, p, q );   ↦   [Diagram: an instance t of the multiplexer whose ports r, c, x and y are connected to the external wires s, k, p and q respectively.]

That is, we create an instance of the mux2_1bit module and call it t; the internal ports of the multiplexer, i.e., r, c, x and y, are connected to the external wires s, k, p and q. Having performed the instantiation, any input we drive onto k can be “seen” by t on c, and any output t drives onto r can be “seen” by us on s.
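As a minimal sketch of the instantiation just described, the following places two multiplexer instances inside an enclosing module. The enclosing module top, the second instance u and the wire s_named are hypothetical names introduced purely for illustration; the body of mux2_1bit is the gate-level one from Figure 5.6b. The second instance uses named port connections, which Verilog also permits as an alternative to the positional form.

```verilog
// Definition: the 1-bit, 2-way multiplexer (body as in Figure 5.6b).
module mux2_1bit( output wire r,
                  input  wire c,
                  input  wire x,
                  input  wire y );

  wire w0, w1, w2;

  not t0( w0, c );
  and t1( w1, x, w0 );
  and t2( w2, y, c );
  or  t3( r, w1, w2 );

endmodule

// Hypothetical enclosing module, used only to host the two instances.
module top( output wire s,
            output wire s_named,
            input  wire k,
            input  wire p,
            input  wire q );

  // Positional connection: external wires are matched to ports in
  // port-list order, i.e., r <-> s, c <-> k, x <-> p, y <-> q.
  mux2_1bit t( s, k, p, q );

  // Named connection: each port is named explicitly, so the order in
  // which the connections are written no longer matters.
  mux2_1bit u( .y( q ), .x( p ), .c( k ), .r( s_named ) );

endmodule
```

Both instances are connected identically, so s and s_named should always carry the same value; the named form is simply harder to get wrong when a module has many ports.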

5.2.5.2 Primitive, or built-in modules

Verilog offers a number of primitive modules that represent logic gates. In a sense you can think of primitive modules as similar to the C standard library which makes basic operations available to all C programs. We can instantiate and connect together primitive modules to describe the behaviour of more complex components (e.g., the body of our multiplexer module). The syntax is the same as modules we define, meaning the following correspondences hold:

buf  t0( r, x );      ↦   r = x
not  t1( r, x );      ↦   r = ¬x
nand t2( r, x, y );   ↦   r = ¬(x ∧ y)
nor  t3( r, x, y );   ↦   r = ¬(x ∨ y)
and  t4( r, x, y );   ↦   r = x ∧ y
or   t5( r, x, y );   ↦   r = x ∨ y
xor  t6( r, x, y );   ↦   r = x ⊕ y

For example in the last case we instantiate a 2-input XOR gate, whose type is xor, and name it t6; the inputs to the gate are connected to the external wires x and y, and the output from the gate to r. This means we are effectively computing r = x ⊕ y.

In fact, Verilog is nicer to us than simply providing a 2-input version of XOR. It provides multi-input versions of the primitive modules as well, allowing the inclusion of more inputs to the same primitive gate types as follows:

xor t8( r, w, x, y );      ↦   r = w ⊕ x ⊕ y
xor t9( r, w, x, y, z );   ↦   r = w ⊕ x ⊕ y ⊕ z

Again in the last case, we instantiate a 4-input XOR gate: the inputs are connected to the external wires w, x, y and z, and the output to r. This means we are effectively computing r = w ⊕ x ⊕ y ⊕ z.
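To make the correspondences concrete, the sketch below instantiates three of the primitives and prints their outputs for one choice of inputs. The module name gates_tb and the output wire names are hypothetical; the inputs are declared using reg and driven from an initial block, both of which are behavioural features (registers are introduced in Section 5.3.1) used here only to provide stimulus.

```verilog
module gates_tb;

  reg  w, x, y, z;               // stimulus values
  wire r_nand, r_xor2, r_xor4;   // one output wire per gate

  nand t2( r_nand, x, y );         // r_nand = NOT( x AND y )
  xor  t6( r_xor2, x, y );         // r_xor2 = x XOR y
  xor  t9( r_xor4, w, x, y, z );   // r_xor4 = w XOR x XOR y XOR z

  initial begin
    w = 1'b1; x = 1'b1; y = 1'b0; z = 1'b1;

    // Wait one time unit so the gate outputs have settled.
    #1 $display( "nand=%b xor2=%b xor4=%b", r_nand, r_xor2, r_xor4 );

    $finish;
  end

endmodule
```

For these inputs we would expect all three outputs to be 1: x ∧ y is 0 so the NAND gives 1, x ⊕ y is 1, and an odd number of the four XOR inputs are 1.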

Using primitive gates we can start filling in, i.e., implementing the behaviour of, the 1-bit, 2-way multiplexer module whose interface is described by Figure 5.5. We encountered the truth table and SoP-based Boolean expression in Chapter 2; the corresponding circuit is reproduced in Figure 5.6a, which is annotated to make some features clear by naming them. There is a one-to-one correspondence between these features and lines in Figure 5.6b which describes the Verilog implementation. For example,




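[Figure 5.6a diagram: the multiplexer circuit with inputs c, x and y and output r, annotated with the gate instances t0 to t3 and the internal wires w0 to w2.]

(a) A 1-bit, 2-way multiplexer described using a circuit diagram.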


module mux2_1bit( output wire r,
                  input  wire c,
                  input  wire x,
                  input  wire y );

  wire w0, w1, w2;

  not t0( w0, c );

  and t1( w1, x, w0 );
  and t2( w2, y, c );

  or  t3( r, w1, w2 );

endmodule

(b) A 1-bit, 2-way multiplexer described using gate-level Verilog.

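[Figure 5.6c diagram: the full-adder circuit with inputs x, y and ci and outputs s and co, annotated with the gate instances t0 to t4 and the internal wires w0 to w2.]

(c) A full-adder cell described using a circuit diagram.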

module fa( output wire co,
           output wire s,
           input  wire ci,
           input  wire x,
           input  wire y );

  wire w0, w1, w2;

  xor t0( w0, x, y );
  and t1( w1, x, y );

  xor t2( s, w0, ci );
  and t3( w2, w0, ci );

  or  t4( co, w1, w2 );

endmodule

(d) A full-adder cell described using gate-level Verilog.

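[Figure 5.6e diagram: the D-type flip-flop circuit with inputs en and D and outputs Q and ¬Q, annotated with the gate instances and the internal wires w0 to w7.]

(e) A D-type flip-flop described using a circuit diagram.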

module dff( input  wire en,
            input  wire D,
            output wire Q );

  wire w0, w1, w2, w3, w4, w5, w6, w7;

  not t0( w0, en );
  and t1( w1, w0, en );

  buf t2( w2, D );
  not t3( w3, D );

  and t4( w4, w2, w1 );
  and t5( w5, w3, w1 );

  nor t6( w6, w4, w7 );
  nor t7( w7, w5, w6 );

  buf t8( Q, w7 );

endmodule

(f) A D-type flip-flop described using gate-level Verilog.

Figure 5.6: Gate-level implementations, and their diagrammatic analogues, of several building block components using primitive modules.




1. the first line of the module body declares three internal wires called w0, w1 and w2,

2. the next line instantiates a NOT gate called t0, whose input is c and output is w0,

3. the next line instantiates an AND gate called t1, whose inputs are x and w0 and output is w1,

4. the next line instantiates an AND gate called t2, whose inputs are y and c and output is w2,

5. the last line instantiates an OR gate called t3, whose inputs are w1 and w2 and output is r.

Basically all we have done is instantiate modules to represent the gates, and then connect them up per the circuit with internal wires (as well as the module inputs and outputs). Figure 5.6c and Figure 5.6d offer another example, this time for a full-adder cell. Exactly the same principles apply for the implementation in Figure 5.6d as with the multiplexer.

Stated in this way, Verilog module instantiation appears very similar to a C function call. For example, it looks a little like we call the not module with c as input to produce the output w0. However, each primitive gate instance executes in parallel with every other; they can be described as continuous in the sense that they do not wait for their inputs to be ready before computing their output. That is, they are always computing an output. Put another way, there are some crucial differences to keep in mind which make this analogy inaccurate:

• Typically, a compiler translates each function declared in a C program into one implementation in the executable. That is, two separate calls to the same function actually use the same instructions. With module instantiation, rather than being shared, each instance is a separate physical component.

For example, the instances t1 and t2 in Figure 5.6b are both AND gates, i.e., described by the same module and, but represent separate physical components.

• Within a typical C program, each function call in the program is executed in sequence so that at any given time we know where in the program execution is: it cannot be in two functions at the same time for example. With a Verilog model, every module instance executes in parallel with every other instance so that one module can be doing things at the exact same time as another.

For example, the instances t0 and t1 in Figure 5.6b are working continuously and at the same time (even though their order in the source code may imply t0 is before t1 somehow). We could swap the two lines over and still get a valid result: in C w0 would be incorrectly used by t1 before it is assigned a value by t0, but in Verilog the connection between t0 and t1 is still valid and so the wire still connects them correctly.
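The claim about ordering can be checked with a small experiment. The sketch below is the multiplexer from Figure 5.6b with its four gate instances deliberately written in reverse order; the module names mux2_1bit_swapped and swapped_tb are hypothetical, and the test harness again borrows reg and initial from behavioural Verilog purely to drive the inputs.

```verilog
// The gates of Figure 5.6b, instantiated in reverse order.
module mux2_1bit_swapped( output wire r,
                          input  wire c,
                          input  wire x,
                          input  wire y );

  wire w0, w1, w2;

  or  t3( r, w1, w2 );    // reads w1 and w2 "before" anything drives them
  and t2( w2, y, c );
  and t1( w1, x, w0 );    // reads w0 "before" t0 appears in the source
  not t0( w0, c );

endmodule

module swapped_tb;

  reg  c, x, y;
  wire r;

  mux2_1bit_swapped dut( r, c, x, y );

  initial begin
    x = 1'b1; y = 1'b0;

    c = 1'b0;
    #1 $display( "c=%b r=%b (expect r = x = %b)", c, r, x );

    c = 1'b1;
    #1 $display( "c=%b r=%b (expect r = y = %b)", c, r, y );

    $finish;
  end

endmodule
```

Since the instances describe a set of connected components rather than a sequence of steps, the reordered module should behave exactly as the original does.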

The continuous operation of Verilog module instances can be confusing if you are more used to the sequential nature of C. For example, what happens if some inputs to an instance have values driven onto them while others do not? The instance is always generating an output, so the output must be something, but it will typically be resolved as the undefined value. This also implies a challenge in terms of timing. Specifically, we already saw that propagation delay (within the gates that implement the instance) can impact on when the output becomes valid; as the size and complexity of the components we look at increases, these small delays can accumulate and present more significant delays. So even when we do drive values onto all the inputs, we need to take care the output is only used once it is valid.

5.2.5.3 Gate behaviours for unknown values

A somewhat subtle wrinkle exists when directly equating primitive modules to logic gates. That is, since Verilog allows wires to carry the unknown and high impedance values, i.e., 1'bX and 1'bZ, said modules need to act accordingly.

Fortunately, the implications of this fact are easy to reason about. First, if an input is 1'bZ, it is treated as 1'bX. Then, the rules in Figure 5.7 are applied to describe the output based on the inputs. Note the additional entries versus traditional truth tables for NOT, AND, OR and XOR; they show, for example, that if we AND together 1'b0 and 1'bX then the result would be 1'b0. Intuitively, and indeed in physical terms, this might seem an odd claim: how can we decide on a result if one of the inputs is unknown? In theory at least, the properties of AND mean this is sane because when one input is 0, it does not matter what the other input is: the output will always be 0. Similar shortcuts exist for the other operators. For example, if either input of an OR operator is 1 it does not matter what the other one is (even if it is unknown) because the output will always be 1.




NOT1
x  r
0  1
1  0
X  X

(a) NOT.

AND2
x  y  r
0  0  0
0  1  0
1  0  0
1  1  1
0  X  0
1  X  X
X  0  0
X  1  X
X  X  X

(b) AND.

OR2
x  y  r
0  0  0
0  1  1
1  0  1
1  1  1
0  X  X
1  X  1
X  0  X
X  1  1
X  X  X

(c) OR.

XOR2
x  y  r
0  0  0
0  1  1
1  0  1
1  1  0
0  X  X
1  X  X
X  0  X
X  1  X
X  X  X

(d) XOR.

Figure 5.7: Gate behaviours where one or more inputs are the unknown value.
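A few rows of Figure 5.7 can be reproduced in simulation. In the sketch below the module name unknown_tb and the wire names are hypothetical, and the inputs are driven from an initial block (a behavioural feature) so that we can place 1'bX and 1'bZ on them directly.

```verilog
module unknown_tb;

  reg  x, y;
  wire r_and, r_or, r_xor;

  and t0( r_and, x, y );
  or  t1( r_or,  x, y );
  xor t2( r_xor, x, y );

  initial begin
    // 0 AND X is 0, whereas OR and XOR cannot decide.
    x = 1'b0; y = 1'bX;
    #1 $display( "x=%b y=%b and=%b or=%b xor=%b", x, y, r_and, r_or, r_xor );

    // 1 OR X is 1, whereas AND and XOR cannot decide.
    x = 1'b1; y = 1'bX;
    #1 $display( "x=%b y=%b and=%b or=%b xor=%b", x, y, r_and, r_or, r_xor );

    // A high impedance input is treated as unknown by the gates.
    x = 1'b1; y = 1'bZ;
    #1 $display( "x=%b y=%b and=%b or=%b xor=%b", x, y, r_and, r_or, r_xor );

    $finish;
  end

endmodule
```

Per the tables, the first line should report and=0 with the other two outputs unknown, and the last two lines should report or=1 with the other two outputs unknown; simulators print the unknown value as a lower-case x.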

5.2.5.4 User-defined modules

Primitive modules are just like user-defined modules (except the fact they are provided by Verilog) so the syntax for instantiation is the same as well. As a result, we can consider implementing the behaviour of modules using our own user-defined modules in the same way we did above using primitive modules.

Consider for example the 1-bit, 2-way multiplexer implemented using primitive gates in Figure 5.6b. We can extend this in (at least) two ways by altering either the size or number of inputs: Figure 5.8b describes the implementation of a 4-bit, 2-way multiplexer while Figure 5.8d describes a 1-bit, 4-way multiplexer. In both cases, the module behaviour is implemented using instances of mux2_1bit: exactly the same rules apply when instantiating this user-defined module as primitive modules such as or. That is, we specify a type, an identifier and a port list; each instance of mux2_1bit operates continuously and in parallel with every other. Of course we could have implemented both modules directly using primitive modules. Ignoring which is the most efficient, reuse of the mux2_1bit module clearly leans on one of the advantages of Verilog: we can easily build more complex behaviour from more simple behaviour, capitalising in this case on the design strategies of replication and cascading.

One of the goals of this example is to highlight the fact that instantiation creates a hierarchy of instances; each instance contains instances of modules it in turn instantiates. To keep track of this hierarchy, Verilog outlines a simple naming scheme. For example, the 4-bit, 2-way multiplexer implemented in Figure 5.8b contains four instances of mux2_1bit (the 1-bit, 2-way multiplexer) named t0, t1, t2 and t3. Looking back at Figure 5.6b, each one of these instances contains an instance of not called t0, two instances of and called t1 and t2, and an instance of or called t3. Hierarchical naming means each primitive gate has a unique name: the not instance within the mux2_1bit instance t1 is called t1.t0 while the one within instance t2 is called t2.t0. This naming scheme should be somewhat familiar: we use the same sort of scheme when referring to fields within C structure instances for example. In Verilog it is useful because we can embed debugging mechanisms within a module and determine exactly where the messages are coming from; without the naming scheme, all debugging from the module mux2_1bit would be mixed together and become useless with respect to locating errors for a specific instance.
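As a hedged illustration of hierarchical naming, the sketch below adds a debugging message to mux2_1bit using the %m format specifier, which a simulator expands to the hierarchical name of the enclosing instance. The message itself, the test harness hier_tb and the instance name dut are assumptions made for this example; the initial block and $display are simulation features rather than part of the gate-level description.

```verilog
module mux2_1bit( output wire r,
                  input  wire c,
                  input  wire x,
                  input  wire y );

  wire w0, w1, w2;

  not t0( w0, c );
  and t1( w1, x, w0 );
  and t2( w2, y, c );
  or  t3( r, w1, w2 );

  // Debugging hook: %m identifies which instance is reporting.
  initial $display( "instance %m exists" );

endmodule

module mux2_4bit( output wire [ 3 : 0 ] r,
                  input  wire           c,
                  input  wire [ 3 : 0 ] x,
                  input  wire [ 3 : 0 ] y );

  mux2_1bit t0( r[ 0 ], c, x[ 0 ], y[ 0 ] );
  mux2_1bit t1( r[ 1 ], c, x[ 1 ], y[ 1 ] );
  mux2_1bit t2( r[ 2 ], c, x[ 2 ], y[ 2 ] );
  mux2_1bit t3( r[ 3 ], c, x[ 3 ], y[ 3 ] );

endmodule

module hier_tb;

  reg           c;
  reg  [ 3 : 0 ] x, y;
  wire [ 3 : 0 ] r;

  mux2_4bit dut( r, c, x, y );

  initial begin
    c = 1'b0; x = 4'b1010; y = 4'b0101;

    // A hierarchical name can also be used to inspect an internal
    // wire: here, w0 inside the mux2_1bit instance t1 inside dut.
    #1 $display( "dut.t1.w0 = %b", dut.t1.w0 );

    $finish;
  end

endmodule
```

Each of the four mux2_1bit instances should print its own message, distinguished only by the hierarchical name; the order in which they appear is left to the simulator.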

5.2.6 User-Defined Primitives

In some cases, describing the behaviour of a module using primitive gates can feel like hard work. In particular, this approach contradicts the goal of reducing our work by handing off as much work to the Verilog tool-chain as we can. The User-Defined Primitive (UDP) offers a more convenient way to implement (some) modules: we specify the truth table and leave the implementation (i.e., how the behaviour is realised using primitive gates) to the tool-chain.

UDPs have a similar looking interface to modules. For example, we might reimplement our 2-way multiplexer as shown in Figure 5.9b. The body of the UDP is the corresponding truth table (for reference shown in Figure 5.9a) stating, given some combination of input, what output should be produced. Each line lists the values of the inputs (in the same order as the port list) followed by a colon and then the value required on the output. We are free to include the unknown value 1'bX, but the high impedance value 1'bZ is treated as 1'bX. We are also free to include don’t care states in the table as we might with a Karnaugh map; such states are written using a question mark. Thus, the first line above can be read as “when both x and c are 0 and y is




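[Figure 5.8a diagram: four instances of the 1-bit, 2-way multiplexer, named t0 to t3; the i-th instance takes the inputs x_i and y_i plus the shared control signal c, and produces the output r_i.]

(a) A 4-bit, 2-way multiplexer described using a circuit diagram.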

module mux2_4bit( output wire [ 3 : 0 ] r,
                  input  wire           c,
                  input  wire [ 3 : 0 ] x,
                  input  wire [ 3 : 0 ] y );

  mux2_1bit t0( r[ 0 ], c, x[ 0 ], y[ 0 ] );
  mux2_1bit t1( r[ 1 ], c, x[ 1 ], y[ 1 ] );
  mux2_1bit t2( r[ 2 ], c, x[ 2 ], y[ 2 ] );
  mux2_1bit t3( r[ 3 ], c, x[ 3 ], y[ 3 ] );

endmodule

(b) A 4-bit, 2-way multiplexer described using gate-level Verilog.

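[Figure 5.8c diagram: three instances of the 1-bit, 2-way multiplexer; t0 selects between w and x, and t1 between y and z, both under control of c0, producing w0 and w1; t2 then selects between w0 and w1 under control of c1 to produce r.]

(c) A 1-bit, 4-way multiplexer described using a circuit diagram.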

module mux4_1bit( output wire r,
                  input  wire c0,
                  input  wire c1,
                  input  wire w,
                  input  wire x,
                  input  wire y,
                  input  wire z );

  wire w0, w1;

  mux2_1bit t0( w0, c0, w, x );
  mux2_1bit t1( w1, c0, y, z );
  mux2_1bit t2( r, c1, w0, w1 );

endmodule

(d) A 1-bit, 4-way multiplexer described using gate-level Verilog.

Figure 5.8: Gate-level implementations, and their diagrammatic analogues, of two components using user-defined modules.




MUX2
c  x  y  r
0  0  ?  0
0  1  ?  1
1  ?  0  0
1  ?  1  1

(a) A 1-bit, 2-way multiplexer described using a truth table.

primitive mux2_1bit( output r,
                     input  c,
                     input  x,
                     input  y );

  table
    0 0 ? : 0;
    0 1 ? : 1;
    1 ? 0 : 0;
    1 ? 1 : 1;
  endtable

endprimitive

(b) A 1-bit, 2-way multiplexer described using a Verilog UDP.

Figure 5.9: A 1-bit, 2-way multiplexer described using a UDP.

any value, set the output r to 0”.

Compared to a description of the same behaviour in terms of primitive gates, this seems a much nicer way to do things. However, there are some rules to consider when defining a UDP:

1. There can only be one output port, although there can be as many inputs as required; the output port must be specified first in the port list.

2. Both input and output ports can only be single wires; wire vectors are not allowed.
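To see a UDP in use, the following sketch repeats the truth table from Figure 5.9b under the hypothetical name mux2_udp (so that it cannot clash with the gate-level mux2_1bit) and instantiates it exactly as one would a primitive gate. The test harness udp_tb is likewise an assumption; it uses reg and initial only to drive the inputs.

```verilog
primitive mux2_udp( output r,
                    input  c,
                    input  x,
                    input  y );

  table
  // c x y : r
     0 0 ? : 0;
     0 1 ? : 1;
     1 ? 0 : 0;
     1 ? 1 : 1;
  endtable

endprimitive

module udp_tb;

  reg  c, x, y;
  wire r;

  // A UDP is instantiated just like a primitive gate.
  mux2_udp t0( r, c, x, y );

  initial begin
    c = 1'b0; x = 1'b1; y = 1'b0;
    #1 $display( "c=%b x=%b y=%b r=%b", c, x, y, r );   // selects x

    c = 1'b1;
    #1 $display( "c=%b x=%b y=%b r=%b", c, x, y, r );   // selects y

    // No row of the table matches an unknown c, so the output is
    // unknown as well, even though x and y agree.
    c = 1'bX; x = 1'b1; y = 1'b1;
    #1 $display( "c=%b x=%b y=%b r=%b", c, x, y, r );

    $finish;
  end

endmodule
```

The last case highlights a design choice: any input combination the table leaves uncovered yields the unknown value. If that is too pessimistic, extra rows such as x 0 0 : 0; and x 1 1 : 1; could be added to state that the output is known whenever both data inputs agree.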

5.2.7 RTL-based constructs

Expressing a Verilog model purely in terms of logic gates is analogous to writing software directly in a low-level machine language: it permits a lot of control over the result, but requires a significant amount of effort by the programmer. To combat this we can use RTL, which allows us to work at a higher-level of abstraction. Clearly there is some overlap between RTL and gate-level design; a rough way to think of RTL is simply as a short-hand for combinatorial logic we have previously described long-hand using gates.

5.2.7.1 Continuous assignment

In Section 5.2 we described the operation of gates, realised using primitive modules, as continuous in the sense that they continuously compute outputs from their inputs rather than waiting for the inputs to be available. We can use an RTL continuous assignment to do the same thing. Such a statement looks similar to an assignment in C in the sense that it has Left-Hand Side (LHS) and Right-Hand Side (RHS) expressions: the LHS is driven with (or assigned) a value produced by evaluating the RHS.

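assign r = ( x | y ) & z;   ↦   or  t0( w, x, y );
                                and t1( r, w, z );

↦   [Diagram: an OR gate t0 with inputs x and y drives the wire w; an AND gate t1 with inputs w and z drives the output r.]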

Similarly to C where the LHS would be a variable, in a Verilog continuous assignment the LHS must be a wire or wire vector. Furthermore, but now unlike C, a Verilog continuous assignment is continuously being evaluated rather than only being executed when control-flow reaches the assignment. This means a Verilog continuous assignment is perhaps better described as (permanently) connecting the LHS to the output of a circuit described by the RHS.

There are two key benefits relating to the use of a continuous assignment over, for example, an equivalent gate-level description:

1. The RHS can be constructed using a rich set of high-level, C-style logical and arithmetic operators; Figure 5.10 includes a comprehensive list.

The goal is to describe the RHS expression in a natural form which focuses on the required behaviour. Of course, there is still an underlying requirement for gate instances and connections, but implementation of the expression in these low-level terms is deferred to (i.e., automated by) the Verilog tool-chain.




Type        Symbol  Meaning                   Operands
Arithmetic  *       multiply                  2
            /       divide                    2
            +       add                       2
            -       subtract                  2
            %       modular reduction         2
            **      exponentiation            2
Logical     !       logical NOT               1
            &&      logical AND               2
            ||      logical OR                2
Relational  >       greater than              2
            <       less than                 2
            >=      greater than or equal to  2
            <=      less than or equal to     2
            ==      equal to                  2
            !=      not equal to              2
            ===     case equal to             2
            !==     case not equal to         2
Bitwise     ~       bitwise NOT               1
            &       bitwise AND               2
            |       bitwise OR                2
            ^       bitwise XOR               2
Reduction   &       reduce AND                1
            ~&      reduce NAND               1
            |       reduce OR                 1
            ~|      reduce NOR                1
            ^       reduce XOR                1
            ~^      reduce XNOR               1
Shift       <<      logical left-shift        2
            <<<     arithmetic left-shift     2
            >>      logical right-shift       2
            >>>     arithmetic right-shift    2

Figure 5.10: A list of computationally-oriented Verilog operators.





module mux2_1bit( output wire r,
                  input  wire c,
                  input  wire x,
                  input  wire y );

  assign r = c ? y : x;

endmodule

(a) A 1-bit, 2-way multiplexer described using a Verilog continuous assignment.



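module mux2_4bit( output wire [ 3 : 0 ] r,
                  input  wire           c,
                  input  wire [ 3 : 0 ] x,
                  input  wire [ 3 : 0 ] y );

  assign r = c ? y : x;

endmodule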

(b) A 4-bit, 2-way multiplexer described using a Verilog continuous assignment.

Figure 5.11: Two (re)implementations of user-defined multiplexer modules using RTL-based continuous assignments.

2. Consider an example where the goal is to OR corresponding bits of two 4-bit wire vectors called x and y. Using a gate-level description, we might instantiate four gates as follows

or t0( r[ 0 ], x[ 0 ], y[ 0 ] );
or t1( r[ 1 ], x[ 1 ], y[ 1 ] );
or t2( r[ 2 ], x[ 2 ], y[ 2 ] );
or t3( r[ 3 ], x[ 3 ], y[ 3 ] );

with the i-th gate OR’ing the i-th bits of x and y together to form the i-th bit of r. One could easily argue this is already quite verbose, and for larger r, x and y increasingly so.

In contrast, using a continuous assignment we could instead simply write

assign r = x | y;

Since the Verilog tool-chain knows the types of x and y, the result is what we might intuitively expect. That is, it matches the above with respect to element-wise application of OR. Furthermore, if we change the types of x and y, say to 1-bit wire vectors, the same continuous assignment still gives the result expected. In fact, it even works if the types are mismatched: if x is a 4-bit wire vector and y is a 2-bit wire vector, y is padded (or extended) with two more-significant 0 bits.

Hopefully the implication is clear: using RTL, we as the Verilog developer do less work and focus on more creative tasks while avoiding tasks which are mechanical and easy to automate. In certain cases we sacrifice some control over the result, but we almost always get a trade-off in terms of increased productivity.
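The mismatched-width case can be checked with a short experiment. In the sketch below the module name or_width_tb is hypothetical, and x and y are declared as (unsigned) registers rather than wires so an initial block can drive them; the continuous assignment itself is the one from the text.

```verilog
module or_width_tb;

  reg  [ 3 : 0 ] x;   // 4-bit operand
  reg  [ 1 : 0 ] y;   // 2-bit operand, zero-extended to 4 bits
  wire [ 3 : 0 ] r;

  assign r = x | y;

  initial begin
    x = 4'b1010; y = 2'b01;

    // y is treated as 4'b0001, so we expect 1010 | 0001 = 1011.
    #1 $display( "x=%b y=%b r=%b", x, y, r );

    $finish;
  end

endmodule
```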

5.2.7.2 Expressions with conditional behaviour

Verilog includes a so-called ternary (or “choice”) operator, which has a direct analogue in C. The operator is used with three operands (say c, x and y) with the first selecting between the second and third, i.e.,

assign r = c ? y : x;   ↦   r = { x  if c = 0
                                { y  if c = 1

This should look familiar: the operator is essentially describing the same behaviour as a multiplexer, with c acting as the control signal. That is, used as above the RHS will be continuously evaluated meaning any change in c, x or y will update the LHS r in exactly the same way as a multiplexer we described previously using gate-level Verilog.

We can directly reimplement the 1-bit, 2-way multiplexer previously described using gates in Figure 5.6b and using a UDP in Figure 5.9b: Figure 5.11a shows a third, RTL-based implementation. In terms of the claimed benefit of an RTL-based approach, Figure 5.11b offers some neat evidence: it reimplements the 4-bit, 2-way multiplexer originally described using gates in Figure 5.8b. Notice that the two implementations are the same: they both use a single line continuous assignment to describe the required behaviour, and the Verilog tool-chain understands how to interpret this based on the types of, in this case, x, y and r.
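As a sketch of the ternary operator at work on wire vectors, the harness below (the name ternary_tb is hypothetical) drives a 4-bit multiplexer with c equal to 0, 1 and the unknown value. The last case relies on a rule of the Verilog language not covered in the text: when the condition is unknown, both alternatives are evaluated and merged bit-by-bit, so any bit on which they agree is still known.

```verilog
module ternary_tb;

  reg            c;
  reg  [ 3 : 0 ] x, y;
  wire [ 3 : 0 ] r;

  assign r = c ? y : x;

  initial begin
    x = 4'b1100; y = 4'b1010;

    c = 1'b0;
    #1 $display( "c=%b r=%b", c, r );   // r follows x, i.e., 1100

    c = 1'b1;
    #1 $display( "c=%b r=%b", c, r );   // r follows y, i.e., 1010

    // Unknown control: bits 3 and 0 agree in x and y, so they stay
    // known; bits 2 and 1 differ, so they become unknown.
    c = 1'bX;
    #1 $display( "c=%b r=%b", c, r );   // r is 1xx0

    $finish;
  end

endmodule
```

This is less pessimistic than a gate-level or UDP-based multiplexer might be for the same inputs, which is worth keeping in mind when comparing simulation results across the different styles.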





module fa( output wire co,
           output wire s,
           input  wire ci,
           input  wire x,
           input  wire y );

  wire [ 1 : 0 ] t;

  assign t = ci + x + y;

  assign s  = t[ 0 ];
  assign co = t[ 1 ];

endmodule

(a) The full-adder described long-hand, with an extra intermediate wire t.


module fa( output wire co,
           output wire s,
           input  wire ci,
           input  wire x,
           input  wire y );

  assign { co, s } = ci + x + y;

endmodule

(b) The full-adder described naturally using an LHS concatenation operator.

Figure 5.12: A full-adder cell described using a Verilog continuous assignment.

5.2.7.3 Creative use of concatenation

Figure 5.12 describes a reimplementation of the gate-level full-adder in Figure 5.6d; the two different versions are intended to make it easier to understand.

1. Figure 5.12a makes use of a single continuous assignment to describe the required behaviour. The RHS has a fairly intuitive meaning: via the built-in Verilog plus operator, it adds together the 1-bit inputs ci, x and y. The result is driven onto the LHS, namely the 2-bit wire t.

The assignment is valid since the type of the RHS is a 2-bit wire vector (since we are computing the sum of three 1-bit wires) and this matches the type of the LHS which is a 2-bit wire vector by definition.

Finally, the 1-bit outputs co and s are driven with the most- and least-significant bits of t respectively.

2. Figure 5.12b simplifies things further. Essentially the same approach is taken, but now the continuous assignment uses a more cryptic LHS formed from the concatenation of co and s.

The types of the RHS and LHS still match: the only change is the LHS, which is the concatenation of two 1-bit wires meaning it is a 2-bit wire vector.

By connecting the wires of the LHS and RHS together, which is all a continuous assignment does, we again get the result we wanted but without the need of t: the most-significant bit of the RHS is connected to the most-significant bit of the LHS (i.e., to co); the least-significant bit of the RHS is connected to the least-significant bit of the LHS (i.e., to s).

This simple example demonstrates a few points worth making. First, there is a clear conceptual difference between Verilog and C, where the LHS of an assignment must be a single variable: here we are allowed to include operators from Section 5.2.4 on the LHS, since they simply relabel wires rather than perform computation. Second, it lends evidence to the claimed advantages of an RTL-based approach. In this case, the RHS expression is literally the most natural description of the behaviour we want.
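One way to convince ourselves that the concatenated LHS behaves as described is to try all eight input combinations. The sketch below pairs the full-adder of Figure 5.12b with a hypothetical harness fa_tb; the loop, the integer i and the initial block are behavioural features used only to generate stimulus.

```verilog
module fa( output wire co,
           output wire s,
           input  wire ci,
           input  wire x,
           input  wire y );

  // The 2-bit sum is split across co (most-significant bit) and
  // s (least-significant bit) by the concatenation on the LHS.
  assign { co, s } = ci + x + y;

endmodule

module fa_tb;

  reg     ci, x, y;
  wire    co, s;
  integer i;

  fa dut( co, s, ci, x, y );

  initial begin
    for( i = 0; i < 8; i = i + 1 ) begin
      // Concatenation also works as a procedural assignment target:
      // the three low bits of i are spread across ci, x and y.
      { ci, x, y } = i[ 2 : 0 ];

      #1 $display( "ci=%b x=%b y=%b -> co=%b s=%b", ci, x, y, co, s );
    end

    $finish;
  end

endmodule
```

In every row, the 2-bit value formed by co and s should equal the number of inputs that are 1.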

As an aside, there is a further benefit worthy of note. If the technology we use to implement this module supports particularly efficient structures for adder cells, the Verilog tool-chain can spot the plus operator and capitalise on this fact; the same is not necessarily true if we obfuscate the behaviour we want by implementing it via a gate-level description. Such a scenario might occur, for example, if we use an FPGA (which often house dedicated blocks of logic for operations such as addition and multiplication) rather than a transistor-based standard cell library. Of course in this case the benefit is perhaps marginal, but for more complex modules it can make a real difference to the end result.

5.2.7.4 Reduction operators

One class of Verilog operator which will be unfamiliar to most C programmers are those which perform reduction. Each such Verilog operator takes an operand and produces a 1-bit result; essentially it applies this logical operator to each bit in the single operand, bit-by-bit from right-to-left. Those familiar with functional programming languages (Haskell for example) may recognise this style of operation as similar to foldr and foldl.

Consider a 4-bit wire vector called x which has the value 4'b1010. The reduction expression

|x




takes each wire in x and places an OR operator between them. The result is exactly equivalent to the long-hand expression

x[ 0 ] | x[ 1 ] | x[ 2 ] | x[ 3 ]

and evaluates to 1'b1. Another example could be

&x

which is exactly equivalent to the expression

x[ 0 ] & x[ 1 ] & x[ 2 ] & x[ 3 ]

and evaluates to 1'b0. In part, this sort of operation does not exist nor make as much sense in C because the programmer less often combines bits from the same word: once some integer x is defined, it is typically used as a single object rather than inspecting individual bits. With Verilog however, a wire vector can be viewed simply as a collection of wires (per Section 5.2.2, and rather than a single object) so it is useful to extract and operate on each wire individually.
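The two reductions above, plus a third using XOR, can be evaluated directly. In this sketch the module name reduce_tb and the wire names are hypothetical, and x is declared as a register so that an initial block can give it the value 4'b1010 from the text.

```verilog
module reduce_tb;

  reg  [ 3 : 0 ] x;
  wire           r_or, r_and, r_xor;

  assign r_or  = |x;   // x[ 0 ] | x[ 1 ] | x[ 2 ] | x[ 3 ]
  assign r_and = &x;   // x[ 0 ] & x[ 1 ] & x[ 2 ] & x[ 3 ]
  assign r_xor = ^x;   // x[ 0 ] ^ x[ 1 ] ^ x[ 2 ] ^ x[ 3 ]

  initial begin
    x = 4'b1010;

    // At least one bit is 1, not all bits are 1, and the number of
    // 1 bits is even, so we expect 1, 0 and 0 respectively.
    #1 $display( "|x=%b &x=%b ^x=%b", r_or, r_and, r_xor );

    $finish;
  end

endmodule
```

The XOR reduction is a common way to compute the parity of a wire vector.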

5.2.7.5 Timing and delays

To model the issue of timing, Verilog offers several ways to specify delay. For example, in the context of continuous assignment we can use a regular delay to control the time between changes to the RHS expression and the assignment to (or update of) the LHS. One can think of this as a way of modelling propagation delay, where it takes some time for values to propagate through the circuit represented by the RHS before provoking a change in the LHS. In contrast to the example at the start of this Section, the continuous assignment

assign #10 r = ( x | y ) & z;

uses a regular delay of 10 time units: a value is assigned to the LHS r 10 time units after any change to x, y or z in the RHS. One caveat to this simple rule is that any change to the RHS must persist for at least as long as the delay for it to be “seen” on the LHS. For example if x changes from 1'b1 to 1'b0 and then back again in less than 10 time units, r will not change correspondingly.
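The caveat about short-lived changes can be observed in simulation. The sketch below is a hypothetical harness (delay_tb) around the delayed assignment from the text; the specific times are chosen for illustration, and $monitor is a simulation feature that prints a line whenever one of the listed values changes.

```verilog
module delay_tb;

  reg  x, y, z;
  wire r;

  assign #10 r = ( x | y ) & z;

  initial begin
    $monitor( "t=%0t x=%b r=%b", $time, x, r );

    // t=0: the RHS evaluates to 1, so r should become 1 at t=10.
    x = 1'b1; y = 1'b0; z = 1'b1;

    // t=20 to t=25: x drops for only 5 time units, which is less
    // than the delay, so r should not follow it.
    #20 x = 1'b0;
    #5  x = 1'b1;

    // t=45: x drops and stays low, so r should become 0 at t=55.
    #20 x = 1'b0;

    #20 $finish;
  end

endmodule
```

Until t=10 the wire r has not yet been assigned a value, so it should be reported as unknown.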

5.3 Behavioural modelling

The basic premise behind behavioural-level Verilog is description of a module body purely in terms of behaviour, effectively taking another step forward from RTL. Three concepts are important:

1. The behaviour of each module is at least partly described using processes, each of which

(a) can be triggered to perform computation, and

(b) operates in parallel with the others (much like a module instance does).

2. Each process is composed of blocks of statements (and only valid behavioural statements).

3. The statements in a block are “executed” when the containing process is triggered, i.e., they represent steps in some larger computation specified by the process.

The final concept has an implicit dependence on time and state. That is, if statements are executed in a sequence of steps, we need to maintain state between those steps: behavioural Verilog allows declaration of registers to satisfy this requirement.

Crucially, it is possible to “mix and match” styles of Verilog in the same description of module behaviour. For example, we might describe the behaviour of some module partly using a behavioural process and then partly using some module instances or RTL continuous assignments. Per the goals of using Verilog at all, this is attractive: it enables us to use the right level of abstraction for whatever we are doing. However, this approach must follow the rules above: since a behavioural process (or block) must be composed of behavioural statements only, it is not legal to mix other styles inside them. Any module instantiation, for example, must exist outside a behavioural process since this is not a behavioural statement. Performing an instantiation inside a process makes no sense: this treats module instantiation like a C function call. The instance always exists, rather than existing only when the process is triggered.
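As a rough sketch of mixing styles, the module below combines a primitive gate instance, an RTL continuous assignment and a behavioural process in one body. The module name mixed, the clock input clk and the choice of an always block triggered by a positive clock edge are all assumptions made for illustration, and the syntax of the process runs ahead of its introduction in the text.

```verilog
module mixed( output reg  q,
              input  wire clk,
              input  wire x,
              input  wire y,
              input  wire z );

  wire w0, w1;

  // Gate-level style: a primitive module instance.
  or t0( w0, x, y );

  // RTL style: a continuous assignment.
  assign w1 = w0 & z;

  // Behavioural style: a process, triggered on each positive edge
  // of clk, whose statements update the register q.
  always @( posedge clk ) begin
    q <= w1;

    // An instantiation such as "and t1( ... );" would not be legal
    // here, because it is not a behavioural statement.
  end

endmodule
```

The gate instance and the continuous assignment operate continuously, whereas q only changes when the process is triggered; between triggers it retains its value.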




Register definition          Meaning
reg r0;                      A 1-bit unsigned register called r0.
reg [ 3 : 0 ] r1;            A 4-bit unsigned register vector called r1.
reg [ 0 : 3 ] r2;            A 4-bit unsigned register vector called r2.
reg [ 4 : 1 ] r3;            A 4-bit unsigned register vector called r3.
reg signed [ 3 : 0 ] r4;     A 4-bit signed register vector called r4.
reg r5 [ 7 : 0 ];            A 1-bit, 8-entry register memory called r5.
reg [ 3 : 0 ] r6 [ 7 : 0 ];  A 4-bit, 8-entry register vector memory called r6.

Figure 5.13: Several different examples of register definition.

Port, as declared internally     Permitted external connection
input wire                       wire or reg
output wire or output reg        wire
inout wire                       wire

Figure 5.14: Connection rules for internal and external registers and wires.

5.3.1 Registers

Verilog allows the definition of registers which, in contrast to wires, retain their value even when not being continuously driven; while a wire is continuously updated with whatever value drives it, a register is only updated when a new value is assigned to it. In this sense, registers are more directly analogous to variables in C than wires are. Definition of registers is essentially the same as definition of wires: one simply replaces the wire keyword with reg.

Figure 5.13 describes several examples, with the first five acting as register versions of those previously explained in relation to wires in Figure 5.2.

5.3.1.1 Interfacing rules

The introduction of registers adds some complication in terms of inputs and outputs to and from modules. In short, only certain types can be used in each circumstance. Figure 5.14 describes these rules, which one can think of as analogous to rules for casting and coercion in C. The diagram tries to capture two rules, which in more human-readable terms are as follows:

1. Any input port must be a wire internally, even though externally it can be a wire or register.

Put another way, the module must be pessimistic about how it is used: the module cannot rely on whatever instantiates it using a register as input, which means there can never be a problem with it falsely assuming the input will maintain state.

2. Any output port must be a wire externally, even though internally it can be a wire or register.

Put another way, whatever instantiates the module must be pessimistic about how it is implemented. It cannot rely on use of a register in said implementation, which means there can never be a problem with it falsely assuming the output will maintain state.

Within this description, we have introduced a new type of port, marked with the inout keyword, which can act as both an input and output; use of this feature is not discussed further.
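As a minimal sketch of the two rules, consider the following pair of modules. The module names toggle and toggle_test are hypothetical, and the $display and $finish system tasks are assumed to be available (they are standard Verilog, but not part of the discussion above). The output port q is a register inside toggle, yet the instantiating module must connect it to a wire; conversely the input port clk is a wire inside toggle, yet it can be driven by a register from outside.

```verilog
// Hypothetical module: q is a register internally (rule 2), whereas
// clk must be a wire internally (rule 1).
module toggle( input  wire clk,
               output reg  q );

  initial begin
    q = 1'b0;
  end

  always @ ( posedge clk ) begin
    q = ~q;
  end

endmodule

// Hypothetical test module which instantiates toggle.
module toggle_test;

  reg  clk; // externally, an input port can be driven by a register
  wire q;   // externally, an output port must be connected to a wire

  toggle t0( .clk( clk ), .q( q ) );

  initial begin
       clk = 1'b0;
    #1 clk = 1'b1;               // positive edge on clk, so q is updated
    #1 $display( "q = %b", q );  // q should now be 1
       $finish;
  end

endmodule
```

Declaring q as a reg within toggle_test instead would be rejected, since the instantiating module cannot assume anything about how the output is implemented.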

5.3.1.2 Register arrays, or memories

Although one might describe a register vector (i.e., the register version of a wire vector) as an array of registers, this is not a perfect analogy. In particular, this is because to access elements in a register (or wire) vector via the subscript operator, we need to supply a static index. For example, given a register vector r we might access it via r[ 0 ] where the index 0 is static (i.e., is constant, so does not change). Notice the same is not true in C array subscripts for example: we might define an array A and then access it via A[ i ] with i perhaps representing a loop counter.

The reason Verilog is restrictive in this case is down to efficiency: using static indexes we only ever relabel individual registers (or wires) via the subscript, concatenation and replication operators. However, it also supports the use of components which act like more traditional arrays; it terms them memories. The reason for this nomenclature is that their concrete implementation typically requires an actual RAM as described in Chapter ??: this lifts the restriction wrt. static indexes, but implies a larger, heavy-weight component.

The number of elements in (resp. range of indexes used to access) the memory is declared after the memory identifier: this is illustrated by the final two cases in Figure 5.13 which define the memories r5 and r6. The former, for example, has eight entries each of which is a 1-bit register; the latter also has eight entries but each is now a 4-bit register vector. There are some constraints on where and when one can reference elements in such memories, but roughly speaking the syntax follows that of register vectors so that the expression

r5[ 2 ]

selects element 2, a 1-bit register, from memory r5 while the expression

r6[ 2 ]

selects element 2, a 4-bit register, from memory r6. If we have some x whose value is 3'd2 then we are now allowed to write

r6[ x ]

and get the same result as in the previous case: we again select element 2, a 4-bit register, from memory r6 due to the value provided by x. Finally, we can also combine the two forms to select individual bits from the result. For example, the expression

r6[ 2 ][ 1 : 0 ]

selects element 2 from r6 as above, then extracts bits 1 and 0 from this element to finally form a 2-bit register vector as the result.
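The expressions above can be collected into a short, self-contained sketch. The module name memory_example and the temporary register t are hypothetical, and we assume a simulator that supports Verilog-2001 (which permits selecting bits from a memory element directly) plus the standard $display system task.

```verilog
module memory_example;

  reg [ 3 : 0 ] r6 [ 7 : 0 ]; // a 4-bit, 8-entry memory, as in Figure 5.13
  reg [ 2 : 0 ] x;            // a register used as a non-static index
  reg [ 3 : 0 ] t;            // hypothetical temporary, holds one element

  initial begin
    r6[ 2 ] = 4'b1010;        // static index: assign to element 2
    x       = 3'd2;
    t       = r6[ x ];        // non-static index: select element 2 via x

    $display( "r6[ x ]          = %b", t                  ); // 1010
    $display( "r6[ 2 ][ 1 : 0 ] = %b", r6[ 2 ][ 1 : 0 ]   ); // 10
  end

endmodule
```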

5.3.2 Processes

Within a module definition, behavioural statements must exist within a Verilog process: they cannot be used outside a process, and other forms of Verilog cannot be used inside a process. A process can be thought of as a parallel thread of execution. To rationalise this, consider using an RTL-style approach to drive values onto two wires via two continuous assignments. These assignments are always being evaluated, and are also executing at the same time. How would one cope with this using a behavioural approach? In short, we split our behavioural statements into two separate groups, or processes, that each deal with updating one of the values. The processes will execute in parallel with each other just like the continuous assignments; they simply use behavioural statements rather than RTL-style constructs to implement the required functionality.

Figure 5.15 illustrates two types of Verilog process marked with the keywords always and initial; in both cases the process is named with the identifier id, but this is optional (for reference, it allows the process to exist within the naming hierarchy outlined in Section 5.2.5.4). An always process is like an infinite loop: execution of the content starts at the top, continues through the body and then restarts again at the top. An initial process is executed just once; the goal is to initialise the module when powered on.

However, this behaviour is limited in the sense that we cannot control when a process executes. That is, these processes either execute just once or continually; we might ideally wish for more control, having them execute as the result of some event for example. In particular, having studied sequential logic and state machines in Chapter 2, it seems sane to trigger execution of a process using a positive or negative clock edge or level; the idea is that the process waits until such an event occurs, then executes.

This is achieved via specification of a sensitivity list, which can be added to an always process; the concept makes no sense for an initial process since it only ever executes once.

1. Figure 5.16a has no sensitivity list; in a sense it is untriggered because it simply executes without waiting for an event. There are situations where this can be useful, but usually it is considered bad design practice: we expand on this issue later when discussing timing and delay.

2. Figure 5.16b has a sensitivity list that includes the unannotated signal x. Whenever x changes, from anything to anything, the process is triggered and subsequently executes.

Note that this is the only mechanism which makes sense if x is not a 1-bit wire or register: positive and negative edges only make sense for single wires or registers, so if x is a wire or register vector we can only reason about overall (rather than per-element) change.

3. Figure 5.16c has a sensitivity list that includes the signal x annotated with the posedge keyword. Whenever there is a positive edge on x (i.e., it changes from 0 to 1), the process is triggered and subsequently executes.

4. Figure 5.16d has a sensitivity list that includes the signal x annotated with the negedge keyword. Whenever there is a negative edge on x (i.e., it changes from 1 to 0), the process is triggered and subsequently executes.




always begin:id
  ...
end

(a) An always process.

initial begin:id
  ...
end

(b) An initial process.

Figure 5.15: (Incomplete) examples of Verilog process types.

always begin:id
  ...
end

(a) An “untriggered” always process: normally bad practice!

always @ ( x ) begin
  ...
end

(b) An always process triggered by any change in x.

always @ ( posedge x ) begin
  ...
end

(c) An always process triggered by a positive edge (i.e., change from 0 to 1) on x.

always @ ( negedge x ) begin
  ...
end

(d) An always process triggered by a negative edge (i.e., change from 1 to 0) on x.

Figure 5.16: (Incomplete) examples of Verilog always processes with associated sensitivity lists.

begin:id
  ...
end

(a) A sequential block.

fork:id
  ...
join

(b) A parallel block.

Figure 5.17: (Incomplete) examples of Verilog block types.

begin
  x = 10;
  y = 20 + x;
end

(a) A blocking assignment.

begin
  x <= 10;
  y <= 20 + x;
end

(b) A non-blocking assignment.

Figure 5.18: (Incomplete) examples of Verilog procedural assignments.




module dff( input  wire en,
            input  wire D,
            output wire Q );

  reg t;

  assign Q = t;

  initial begin
    t = 0;
  end

  always @ ( posedge en ) begin
    t = D;
  end

endmodule

Figure 5.19: A D-type flip-flop, implemented using behavioural Verilog.

Note that these examples include only one signal in the sensitivity list (i.e., x), but more generally can include a comma separated list of signals; the process is triggered by associated changes in any signal in the list. For example, we might specify that a process is triggered when either some x or y changes.
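As a minimal sketch of a sensitivity list with more than one signal, the following hypothetical module (the name and_gate and its ports are assumptions made for illustration) uses a process that is triggered whenever either x or y changes, and so recomputes r each time an input is updated.

```verilog
// Hypothetical example: a 2-input AND gate described behaviourally.
module and_gate( input  wire x,
                 input  wire y,
                 output reg  r );

  // triggered by any change in x or in y
  always @ ( x, y ) begin
    r = x & y;
  end

endmodule
```

If y were omitted from the list, a change in y alone would not trigger the process and r would keep a stale value until x next changed.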

5.3.3 Statement blocks (or groups)

Within a process, we specify blocks of statements delineated using keywords that act in a similar way to the { and } braces in C. Figure 5.17 illustrates two types of block:

• Figure 5.17a describes a sequential block of statements using the begin and end keywords. Each statement in the block is executed in sequence, and any specification of delay is relative to the previous statement; execution of the block is complete when the last statement in the block completes execution.

• Figure 5.17b describes a parallel block of statements using the fork and join keywords. Each statement in the block is executed in parallel, and any specification of delay is relative to the start of the block; execution of the block is complete when the statement that takes the longest to execute completes execution.

As with the processes in Figure 5.15, in both cases the block is named with the identifier id, but this is optional. Also note that like C, where we can omit { and } when they surround single-line statements, we can omit the start and end of block keywords in certain circumstances. From here on we try to avoid this where possible, even though the result may be less aesthetically pleasing.
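To contrast the two block types, the following sketch places the same pair of delayed statements in a sequential block and in a parallel block. The module name block_example and the block names seq and par are hypothetical, and we assume the standard $display system task and $time system function are available to report when each statement executes.

```verilog
module block_example;

  // sequential block: each delay is relative to the previous statement,
  // so the statements execute at times 10 and 20
  initial begin : seq
    #10 $display( "seq: 1st statement at time %0d", $time );
    #10 $display( "seq: 2nd statement at time %0d", $time );
  end

  // parallel block: each delay is relative to the start of the block,
  // so both statements execute at time 10
  initial fork : par
    #10 $display( "par: 1st statement at time %0d", $time );
    #10 $display( "par: 2nd statement at time %0d", $time );
  join

endmodule
```

The order in which lines with the same time stamp are printed is not something this sketch relies on; only the reported times matter.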

5.3.4 Statements

Within a block, which must in turn exist within a process, we can write a sequence of statements: the statements are executed, in sequence, when the encompassing process is triggered.

The syntax and semantics of behavioural Verilog statements start to get more complicated; specific examples are not really enough to explain their general form and function. As such, we first try to explain each statement generally, using syntax placeholders where necessary, and then give some specific examples. Each placeholder shows where a particular syntactic construct should go; we are then free to fill them in with any construct of that particular type. Specifically,

• regi is the i-th register, for example x or y,

• wirei is the i-th wire, for example x or y,

• literali is the i-th literal value, for example 1'b0 or 2'd3,

• expressioni is the i-th expression composed from wire or register operands (and vectors thereof) and operators per Figure 5.10, for example x | y or x + y,

• statementi is the i-th statement, for example x = y + z;,

• the continuation dots . . . are intended to save space by indicating repetition of a particular construct.

Note the alternate font used to typeset placeholders versus actual Verilog source code, and that we omit the subscript i when only one instance of the associated type exists.




module slb_8bit( output wire [ 7 : 0 ] r,
                 input  wire [ 7 : 0 ] x,
                 input  wire [ 2 : 0 ] y );

  wire [ 7 : 0 ] x0 = x;

  wire [ 7 : 0 ] x1 = y[ 0 ] ? { x0[ 6 : 0 ], 1'b0 } : x0;
  wire [ 7 : 0 ] x2 = y[ 1 ] ? { x1[ 5 : 0 ], 2'b0 } : x1;
  wire [ 7 : 0 ] x3 = y[ 2 ] ? { x2[ 3 : 0 ], 4'b0 } : x2;

  assign r = x3;

endmodule

(a) A combinatorial logarithmic shifter.

module slb_8bit( input  wire           adv,
                 output wire [ 7 : 0 ] r,
                 input  wire [ 7 : 0 ] x,
                 input  wire [ 2 : 0 ] y );

  reg  [ 7 : 0 ] r1a, r2a, r3a;
  reg  [ 2 : 0 ] r1b, r2b, r3b;

  wire [ 7 : 0 ] x0 = x;
  wire [ 2 : 0 ] y0 = y;

  wire [ 7 : 0 ] t1 =  y0[ 0 ] ? {  x0[ 6 : 0 ], 1'b0 } :  x0;
  wire [ 7 : 0 ] t2 = r1b[ 1 ] ? { r1a[ 5 : 0 ], 2'b0 } : r1a;
  wire [ 7 : 0 ] t3 = r2b[ 2 ] ? { r2a[ 3 : 0 ], 4'b0 } : r2a;

  assign r = r3a;

  always @ ( posedge adv ) begin
    r3a = t3;
    r2a = t2; r2b = r1b;
    r1a = t1; r1b = y0;
  end

endmodule

(b) A pipelined logarithmic shifter with 3 stages.

Figure 5.20: A logarithmic shifter, implemented using behavioural Verilog.

5.3.4.1 Procedural assignment

The simplest, but perhaps most useful statement is the procedural assignment: as the name suggests, it evaluates some RHS expression and assigns the value to an LHS expression. However, and despite their name and syntactic appearance, it is crucial to keep in mind that procedural assignments are fundamentally different from continuous assignments. A simple example will illustrate this fact:

assign wire = expression;        always @ ( posedge clk ) begin
                                   register = expression;
                                 end

The (left-hand) RTL continuous assignment is being continuously executed; any change in the RHS provokes an update to the LHS. Although the (right-hand) procedural assignment looks similar, there are some key differences. First, notice that the assignment is within an encompassing block which is, in turn, within an encompassing process. This means the assignment is only executed when the process is triggered (i.e., there is a positive edge on clk) and control-flow reaches the assignment. The implication is that the RHS might change at various points in time, but the LHS is only ever updated when the assignment is executed. Second, while continuous assignments must use a wire or wire vector as their LHS, the LHS of a procedural assignment must be a register or register vector. Set within the description above, this should make sense: if statements are executed in sequence, the LHS must retain the value assigned to it so that value can be used later.

Even this brief introduction to procedural assignment goes quite a way toward allowing useful examples of behavioural Verilog. Recall that a D-type flip-flop is a component that can store 1-bit values; it has an output Q which represents the currently stored value, and an input D which is used to set the value when a positive edge is driven onto the enable signal en. Figure 5.19 replicates this interface and implements the required behaviour using two processes. First, we define a 1-bit register called t which is used to represent the value stored within the flip-flop; a continuous assignment connects this to the output Q, meaning whatever value is assigned to t can be “seen” externally on Q. The value of t is initialised using an initial process: this executes just once, setting t to zero. The behaviour of the flip-flop is modelled by an always process triggered by en, the enable signal: whenever there is a positive edge on en, statements in the process execute in sequence. In fact, there is only one statement, which updates t with the input D.

Further, Figure 5.20 shows two Verilog implementations of the logarithmic shifter design introduced in Chapter 2, more specifically in Figure 2.46. Recall that the goal is to left-shift some input x by a distance of y bits; Chapter 2 used this as an example of how one can form a pipeline, by describing a design that had three stages (one for each pair of shift and multiplexer). While Figure 5.20a shows an unpipelined implementation as a combinatorial module, Figure 5.20b uses the behavioural Verilog covered so far to pipeline the design as described previously. The implementation can be viewed roughly in three parts:

• The first part is a series of register and wire definitions; the registers r1a, r2a and r3a (and likewise r1b, r2b and r3b) act as the pipeline registers. For example, r1a holds the intermediate value being shifted, while r1b holds the associated shift distance.

• The second part represents the pipeline stages, in the sense that each one of the continuous assignments takes input from one of the (sets of) previous pipeline register(s), and computes an output.

• The third part is an always process: when a positive edge is detected on adv, this triggers the block of procedural assignments which act to advance the pipeline. This is achieved by taking the output of a given stage and storing it into the subsequent pipeline register(s).

Moving on from these examples, more generally there are two flavours of procedural assignment termed blocking and non-blocking. A blocking assignment is said to block subsequent statements in the same block: they only execute once it has. In contrast, a non-blocking assignment allows subsequent statements to execute at the same time as it. For example, consider Figure 5.18.

• Figure 5.18a uses two blocking assignments; the second assignment to y only executes once the assignment to x is complete. This means that when the assignment to y executes, x will definitely have been assigned the value 10 (in decimal).

• Figure 5.18b uses two non-blocking assignments; the assignments to x and y execute at the same time.

Their use requires some care. In Figure 5.18b for example, we can no longer guarantee that x will definitely have been assigned the value 10 (in decimal) before the assignment to y is executed; this means y potentially gets assigned a different (or in this case wrong) value.
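The difference can be observed with a small simulation. The following is a sketch only: the module name assignment_example and the registers u and v are hypothetical, and the standard $display system task is assumed. In a standard simulator, the RHS of each non-blocking assignment is evaluated when the statement executes but the LHS is updated afterwards, so the second non-blocking assignment reads the old value of u.

```verilog
module assignment_example;

  reg [ 7 : 0 ] x, y; // updated using blocking assignments
  reg [ 7 : 0 ] u, v; // updated using non-blocking assignments

  initial begin
    x = 0;
    u = 0;

    x = 10;
    y = 20 + x;       // x has already been updated, so y gets 30

    u <= 10;
    v <= 20 + u;      // u has not yet been updated, so v gets 20

    #1 $display( "y = %0d, v = %0d", y, v ); // y = 30, v = 20
  end

endmodule
```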

5.3.4.2 Conditional statements

Execution of a statement only occurs if control-flow reaches it; we can alter control-flow, for example skipping a statement, by utilising a similar set of conditional statements as exist in C.

if statements The if statement is the most obvious example. It takes the form of a sequence of clauses, each including a condition expression and an associated statement. For example, in

if ( expression ) begin
  statement;
end

a single clause associates expression with statement. Note that for single-line statements there is no need to use the block keywords begin and end. To decide whether the statement in a clause is executed, we need to evaluate the associated condition. Verilog interprets any non-zero result to be true, and zero to be false; although not formally zero or non-zero, the unknown and high impedance values are treated as zero. This means, for example, that

• 1'b0 and 4'b0000 are interpreted as false since both are zero, i.e., all bits are equal to zero,

• 1'b1 and 4'b0010 are interpreted as true since both are non-zero, i.e., at least one bit is not equal to zero,

• 1'bX and 4'b00X0 are interpreted as false since the unknown bit is interpreted as zero,

• 4'b01X0 is interpreted as true since although the unknown bit is interpreted as zero, there is still at least one bit not equal to zero,

• 1'bZ and 4'b00Z0 are interpreted as false since the high impedance bit is interpreted as zero,

• 4'b01Z0 is interpreted as true since although the high impedance bit is interpreted as zero, there is still at least one bit not equal to zero.




Armed with this, execution of the if statement itself simply means considering each clause in turn. If evaluation of the condition produces true as a result, the associated statement is executed and the if statement as a whole is complete; otherwise, the statement is not executed (or skipped) and we move onto the next clause. This means that although more than one condition may evaluate to true, the fact we stop considering clauses once one is satisfied means at most one statement is executed. In the example above therefore, since there is only one clause, we evaluate expression and if the result is non-zero we execute statement, otherwise we skip it.

We can add at most one default clause (or none at all, as above), as per the following example:

if ( expression ) begin
  statement0;
end else begin
  statement1;
end

Now we have two clauses, i.e.,

1. an explicit condition expression associated with statement0, and

2. a default condition associated with statement1

but essentially the same approach applies in terms of execution. That is, expression is first evaluated: if the result is non-zero we execute statement0. Otherwise, i.e., if evaluation of expression produced a zero result, we continue with the next clause: since the default condition is the same as having an explicit condition that always evaluates to true, statement1 is executed.

Clearly one of the clause statements can, in turn, be an if statement; this is sometimes called a nested if. For example we might write

if ( expression0 ) begin
  if ( expression1 ) begin
    statement0;
  end
end else begin
  statement1;
end

to specify an “outer” if statement with two clauses, and an “inner” if statement with one. Likewise, we can extend the number of clauses in a single if statement; this is sometimes called a cascaded if:

if ( expression0 ) begin
  statement0;
end else if ( expression1 ) begin
  statement1;
end else begin
  statement2;
end

There are three clauses:

1. an explicit condition expression0 associated with statement0,

2. an explicit condition expression1 associated with statement1, and

3. a default condition associated with statement2.

As above, expression0 is evaluated first: if the result is non-zero, statement0 is executed. Otherwise we continue with the next clause by evaluating expression1: if the result is non-zero, statement1 is executed. Finally, if neither expression0 nor expression1 evaluates to a non-zero result then we reach the default condition, which is always satisfied, and therefore statement2 is executed.
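As a concrete instance of a cascaded if, the following sketch fills in the placeholders to report the position of the most-significant set bit in a 2-bit input. The module name if_example and its ports are hypothetical, chosen only for illustration.

```verilog
// Hypothetical example: r is 2 if x[ 1 ] is set, otherwise 1 if x[ 0 ]
// is set, otherwise 0.
module if_example( input  wire [ 1 : 0 ] x,
                   output reg  [ 1 : 0 ] r );

  always @ ( x ) begin
    if      ( x[ 1 ] ) begin
      r = 2'd2;
    end else if ( x[ 0 ] ) begin
      r = 2'd1;
    end else begin
      r = 2'd0;
    end
  end

endmodule
```

When x is 2'b11 both explicit conditions are non-zero, but only the first associated statement is executed since clauses are considered in turn: r is assigned 2'd2.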

case statements if statements can become hard to understand and maintain when the number and complexity of clauses grows. To combat this, Verilog includes a case statement which is similar in form and function to a C switch statement.

A single expression selects between one of several clauses. The expression is first evaluated, and then the result is matched against literal values provided in each clause; the matching process is applied to the clauses in turn, and performed element-wise on the result and values. For example,

• 1'b0 matches 1'b0 but not 1'b1,

• 1'b1 matches 1'b1 but not 1'b0,

• 2'b01 matches 2'b01 because all (i.e., both) the elements match, but not 2'b00, 2'b10, or 2'b11 because there is a mismatch in at least one element for each case.




module traffic( input  wire clk,
                input  wire rst,

                output reg  Mr,
                output reg  Ma,
                output reg  Mg,
                output reg  Ar,
                output reg  Aa,
                output reg  Ag );

  reg [ 2 : 0 ] Q;

  initial begin
    Q = 0;
  end

  always @ ( posedge clk ) begin
    case( Q )
      3'd0    : Q = rst ? 3'd6 : 3'd1;
      3'd1    : Q = rst ? 3'd6 : 3'd2;
      3'd2    : Q = rst ? 3'd6 : 3'd3;
      3'd3    : Q = rst ? 3'd6 : 3'd4;
      3'd4    : Q = rst ? 3'd6 : 3'd5;
      3'd5    : Q = rst ? 3'd6 : 3'd0;
      3'd6    : Q = rst ? 3'd6 : 3'd0;
      default : Q = 3'd0;
    endcase
  end

  always @ ( Q ) begin
    case( Q )
      3'd0    : begin Mr = 0; Ma = 0; Mg = 1; Ar = 1; Aa = 0; Ag = 0; end
      3'd1    : begin Mr = 0; Ma = 1; Mg = 0; Ar = 1; Aa = 0; Ag = 0; end
      3'd2    : begin Mr = 1; Ma = 0; Mg = 0; Ar = 0; Aa = 1; Ag = 0; end
      3'd3    : begin Mr = 1; Ma = 0; Mg = 0; Ar = 0; Aa = 0; Ag = 1; end
      3'd4    : begin Mr = 1; Ma = 0; Mg = 0; Ar = 0; Aa = 1; Ag = 0; end
      3'd5    : begin Mr = 0; Ma = 1; Mg = 0; Ar = 1; Aa = 0; Ag = 0; end
      3'd6    : begin Mr = 1; Ma = 0; Mg = 0; Ar = 1; Aa = 0; Ag = 0; end
      default : begin Mr = 0; Ma = 0; Mg = 0; Ar = 0; Aa = 0; Ag = 0; end
    endcase
  end

endmodule

Figure 5.21: An implementation of the traffic light controller from Chapter 2.




Equal
x  y  r
0  0  1
0  1  0
1  0  0
1  1  1
X  0  0
X  1  0
X  X  1
X  Z  0
Z  0  0
Z  1  0
Z  X  0
Z  Z  1

(a) A case statement.


(b) A casex statement.


(c) A casez statement.

Figure 5.22: Matching (or equality comparison) where one or more inputs are the unknown or high impedance values.

If there is a match between result and literal, the associated statement is executed, and the case statement as a whole is complete. It is invalid to include the same value in different clauses of the same case statement; this implies that at most one statement is executed, because only one clause can be satisfied.

For example, in the following

case( expression )
  literal0 : begin
    statement0;
  end
  literal1 : begin
    statement1;
  end
endcase

two clauses are selected between by expression:

1. an explicit value literal0 associated with statement0, and

2. an explicit value literal1 associated with statement1.

Note that for single-line statements there is no need to use the block keywords begin and end, and that unlike C there is no need to “break” out of a clause. expression is first evaluated: if the result matches literal0 then statement0 is executed, if it matches literal1 then statement1 is executed, otherwise no statement is executed.

We can add at most one default clause (or none at all, as above), as per the following example:

case( expression )
  literal0 : begin
    statement0;
  end
  literal1 : begin
    statement1;
  end
  default : begin
    statement2;
  end
endcase

Here, three clauses are selected between by expression:

1. an explicit value literal0 associated with statement0,

2. an explicit value literal1 associated with statement1, and

3. a default value associated with statement2.

Again, expression is first evaluated: if the result matches literal0 then statement0 is executed, if it matches literal1 then statement1 is executed, otherwise statement2 is executed because the default value is like a “wildcard” that matches anything.

As with if, the unknown and high impedance values need to be considered when matching. In short these values match themselves only, but variants of case alter this behaviour. In particular, casex treats the unknown and high impedance values as don’t care, meaning a match with them is true; casez does the same thing but for high impedance only. Figure 5.22 expands on this in detail; the rules are still applied element-wise, so for example

• using case, 4'b0110 matches 4'b0110 but not 4'b0100, 4'b01X0 or 4'b01Z0,

• using casex, 4'b0110 matches 4'b0110, 4'b01X0 and 4'b01Z0 but not 4'b0100,

• using casez, 4'b0110 matches 4'b0110 and 4'b01Z0 but not 4'b0100 or 4'b01X0.
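One common use of the don't care behaviour is to match on only the leading bits of a value. The following sketch (the module name casez_example and its ports are hypothetical) uses casez with high impedance elements in each literal, so that only the elements written as 0 or 1 take part in the match.

```verilog
// Hypothetical example: r is the index of the most-significant set bit
// in x, or 0 if no bit is set.
module casez_example( input  wire [ 3 : 0 ] x,
                      output reg  [ 1 : 0 ] r );

  always @ ( x ) begin
    casez( x )
      4'b1ZZZ : r = 2'd3; // matches any x with bit 3 set
      4'b01ZZ : r = 2'd2; // matches any x with bit 3 clear, bit 2 set
      4'b001Z : r = 2'd1; // matches 4'b0010 and 4'b0011
      default : r = 2'd0; // matches anything else
    endcase
  end

endmodule
```

With x equal to 4'b0110, for example, the first literal mismatches in the left-most element whereas the second matches, so r is assigned 2'd2.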

As an example of the case statement in action, recall the traffic light controller developed previously in Chapter 2. Figure 5.21 sketches an implementation in roughly three parts:

1. a 3-bit register vector called Q which represents the state of the FSM and which is initialised to zero using an initial process,

2. an always process which models δ, the transition function, and

3. an always process which models ω, the output function.

The clk signal is used to update the FSM: every positive edge of clk triggers the first always process. It updates the state, using a case statement, by assigning a new value to Q based on the current value. For example, if the current value of Q matches 3'd1 (and rst is 0) then the new value will be 3'd2. Every time Q changes, the second always process is triggered, and the outputs (i.e., the registers representing the traffic lights) are set accordingly; again a case statement is used to do this.

5.3.4.3 Iteration statements

A loop statement is associated with a body statement; the aim is basically to repeatedly execute (or iterate over) the body, with the rule which determines how many repetitions (or iterations) depending on the loop type. In a sense, an always process offers some form of iteration but it is unsuitable for the same task in that it is too coarse-grained: we want a set of loop statements similar to the do, while and for loops available in C.

forever statements The most basic Verilog iteration statement is an infinite loop which simply iterates forever. In software, this behaviour might be considered undesirable; usually when a loop fails to terminate, a bug is the likely cause. In hardware on the other hand, components often behave like this: for example a keyboard operates in an infinite loop, processing key presses until it is eventually powered-off.

The statement is written using the forever keyword followed by a single-line statement or block of statements as follows:

forever begin
  statement;
end

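In common with all iteration statements in Verilog, there is no direct way to exit from the loop prematurely; in C one might “break” out of the loop, but there is no direct equivalent in Verilog (although the disable statement, applied to a named block, can be used to achieve a similar effect).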
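A typical use of forever, sketched below, is to generate a clock signal during simulation. The module name clock_example and the period of 10 time units are assumptions made for illustration; the standard $finish system task is used by a second process to end the simulation, since the loop itself never terminates.

```verilog
module clock_example;

  reg clk;

  initial begin
    clk = 1'b0;

    forever begin
      #5 clk = ~clk; // invert clk every 5 time units
    end
  end

  initial begin
    #100 $finish;    // stop the simulation after 100 time units
  end

endmodule
```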

repeat statements The next step, represented by the repeat statement, allows a bounded, fixed number of iterations. For example, the loop

repeat ( literal ) begin
  statement;
end

is controlled by literal: this is the fixed number of iterations performed, so if we write

repeat ( 32 ) begin
  statement;
end

then the loop body, i.e., statement, is executed thirty two times.
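As a small sketch with a concrete body, the following hypothetical module (the name repeat_example is an assumption, as is use of the standard $display system task) doubles a register four times.

```verilog
module repeat_example;

  reg [ 7 : 0 ] x;

  initial begin
    x = 8'd1;

    repeat ( 4 ) begin
      x = x + x;               // double x: 1, 2, 4, 8, 16
    end

    $display( "x = %0d", x );  // x = 16
  end

endmodule
```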




while statements In a sense, the iteration count used by a repeat statement is an expression: we simply constrain what form the expression can take, forcing it to be a literal value. If we relax this constraint and allow the expression to be more general, we get a loop which can iterate an unbounded number of times.

For example, execution of the while loop

while ( expression ) begin
  statement;
end

proceeds as follows:

• at the start of each iteration, expression is evaluated; if the result is zero then the loop terminates, i.e., execution of the loop is complete,

• otherwise the loop body, in this case statement, is executed and we start again for another iteration.
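The steps above can be sketched using a loop whose iteration count depends on data rather than a literal. The module name while_example and the registers x and n are hypothetical, and the standard $display system task is assumed; the loop right-shifts x until it becomes zero, counting the iterations required.

```verilog
module while_example;

  reg [ 7 : 0 ] x;
  reg [ 3 : 0 ] n;

  initial begin
    x = 8'b00010110;           // 22 in decimal
    n = 0;

    while ( x ) begin          // terminates once x is zero
      x = x >> 1;              // 22, 11, 5, 2, 1, 0
      n = n + 1;
    end

    $display( "n = %0d", n );  // n = 5
  end

endmodule
```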

for statements Finally, the most complicated but also most expressive loop mirrors the C for loop. In addition to the loop body, use of a Verilog for loop requires an initialiser statement (e.g., to set a loop counter to zero), a condition expression (e.g., to test whether the counter is below some limit), and an update statement (e.g., to increment the counter). These are illustrated by

for ( statement0; expression; statement1 ) begin
  statement2;
end

which, in a sense, is just a short-hand for

begin
  statement0;

  while ( expression ) begin
    statement2;
    statement1;
  end
end

As such, execution of the loop proceeds as follows:

• before execution, statement0 (the initialiser statement) is executed,

• at the start of each iteration, expression (the condition expression) is evaluated; if the result is zero then the loop terminates, i.e., execution of the loop is complete,

• otherwise the loop body, in this case statement2, is executed followed by statement1 (the update statement) and we start again for another iteration.

Returning to a previous example, the following

for ( i = 0; i < 32; i = i + 1 ) begin
  statement;
end

executes statement thirty two times. We start by initialising a counter i (a previously defined register) to zero; at the start of each iteration we check whether i is less than thirty two and execute statement if so. At the end of each iteration, we update i by adding one to it. This means i ranges from zero to thirty one, making thirty two iterations in total.

Note that unlike the equivalent statement in C, we cannot use the pre- or post-increment or decrement operators in Verilog: we are forced to update i via the assignment i = i + 1 instead.
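A for loop combines naturally with the memories described in Section 5.3.1.2, since the loop counter can be used as a non-static index. The following is a sketch under that assumption: the module name for_example, the memory m and the registers s and i are all hypothetical, and the standard $display system task is used to print the result.

```verilog
module for_example;

  reg [ 3 : 0 ] m [ 7 : 0 ]; // a 4-bit, 8-entry memory
  reg [ 6 : 0 ] s;           // accumulates the sum of all entries
  reg [ 3 : 0 ] i;           // loop counter, ranges from 0 to 8

  initial begin
    for ( i = 0; i < 8; i = i + 1 ) begin
      m[ i ] = i;            // initialise: element i holds the value i
    end

    s = 0;

    for ( i = 0; i < 8; i = i + 1 ) begin
      s = s + m[ i ];        // accumulate element i
    end

    $display( "s = %0d", s ); // 0 + 1 + ... + 7, i.e., s = 28
  end

endmodule
```

Note that i needs to be wide enough to represent the limit itself (here 8), otherwise the condition i < 8 would never evaluate to zero and the loop would not terminate.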

5.3.4.4 Timing and delays

Timing control We saw previously how attaching regular delays to continuous assignments modelled prop-agation delay; the basic idea was to specify the length of time a change in the RHS took to affect a change inthe LHS. A similar concept can be applied to behavioural statements; the syntax is similar, but the semanticsare slightly different. Consider the following two cases:

• Within a sequential block such as

begin#10 statement0;#20 statement1;#30 statement2;

end

git # ba293a0e @ 2019-11-14 221



associating a regular delay associated with a statement specifies it should execute a prescribed numberof time units after the previous one finishes. Here for example,

– statement0 executes 10 time units after the start of the block,

– statement1 executes 20 time units after statement0 finishes (i.e., 30 time units after the start of the block),

– statement2 executes 30 time units after statement1 finishes (i.e., 60 time units after the start of the block).

• Within a parallel block such as

fork
  #10 statement0;
  #20 statement1;
  #30 statement2;
join

associating a regular delay with a statement specifies that it should execute a prescribed number of time units after the start of the block. Here for example,

– statement0 executes 10 time units after the start of the block,

– statement1 executes 20 time units after the start of the block (i.e., 10 time units after statement0 finishes),

– statement2 executes 30 time units after the start of the block (i.e., 10 time units after statement1 finishes).

In common with the semantics of regular delay of continuous assignments, this essentially defers execution of the entire assignment statement for some period of time. Trying to work out how timing delays interact with non-blocking forms of procedural assignment is difficult, and generally it is better not to mix the two forms (without good reason).

In contrast to regular delays, intra-assignment delay defers update of the LHS. Taking the previous assignment and moving the timing control to the RHS, we get

begin
  reg0 = #10 expression0;
  reg1 = #20 expression1;
  reg2 = #30 expression2;
end

Each assignment executes immediately after the previous statement completes, and its RHS is evaluated straight away. However, the corresponding LHS is only updated after the corresponding delay; focusing on the first assignment for example, the LHS reg0 is updated 10 time units after the statement is executed (and the RHS is evaluated).
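The difference between the two forms of delay can be seen in a minimal sketch such as the following. The module name delay_test and the registers a and b are hypothetical; a second initial process changes a while the intra-assignment delay is still pending.

```verilog
module delay_test();

  reg [ 7 : 0 ] a, b;

  initial begin
    a = 8'd1;

    // regular delay: wait 10 time units, then evaluate a + 1 and assign
    #10 b = a + 1;
    $display( "t=%0d b=%0d", $time, b );

    // intra-assignment delay: evaluate a + 2 now (at time 10), but only
    // update b 10 time units later (at time 20)
    b = #10 a + 2;
    $display( "t=%0d b=%0d", $time, b );
  end

  initial begin
    // change a at time 15, i.e., while the second assignment is waiting
    #15 a = 8'd100;
  end

endmodule
```

Under these assumptions we would expect b to be 2 at time 10, and 3 (rather than 102) at time 20: the RHS of the second assignment was evaluated before a changed. Had the delay been written in the regular form, i.e., #10 b = a + 2, the updated value of a would have been used instead.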

wait statements In Section 5.3.2 we introduced the notion of a sensitivity list whereby execution of a process could be triggered based on some event. However, this is a coarse-grained mechanism: it allows us to trigger entire processes only, rather than individual statements for example. Verilog does include a more fine-grained analogue however; a wait statement

wait ( expression ) begin
  statement;
end

basically blocks (or pauses) execution until expression evaluates to a non-zero result, at which point statement is executed. In a sense therefore, a wait statement triggers execution of statement based on an event captured by expression.
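A minimal sketch of this behaviour follows, assuming a hypothetical module wait_test with a 1-bit register called ready which is set to one by a separate process.

```verilog
module wait_test();

  reg ready;

  initial begin
        ready = 1'b0;
    #25 ready = 1'b1;
  end

  initial begin
    // execution of this process pauses here until ready is non-zero
    wait ( ready ) begin
      $display( "ready at t=%0d", $time );
    end
  end

endmodule
```

The second process should print its message at time 25, i.e., at the point where the first process sets ready to one.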

Preventing uncontrolled iteration In Section 5.3.2, we described an always process as an infinite loop: the associated block of statements was described as being executed (or iterated over) again and again. Now we have introduced enough to be more specific about the claim that an “untriggered” always process is normally bad practice: if there is no trigger and also no delay within the associated block, we are in effect saying that every iteration happens at the same time (since the process itself does not wait). The same is true of all the iteration statements (infinite loop or otherwise) described in Section 5.3.4.3: if there is no delay within the loop body, we are in effect saying that every iteration happens at the same time.

In both cases, this behaviour is usually undesirable. Fortunately, the solution can be implemented by following some simple rules:


• for an always process, make sure there is either an associated sensitivity list (so the process waits until triggered rather than iterating immediately) or some form of delay within the associated block (so execution does not complete instantaneously), and

• for an iteration statement, make sure there is some form of delay within the associated loop body (so execution does not complete instantaneously).
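The rules above can be sketched as follows; the module name iterate_test and the registers clk and n are assumptions made for illustration only.

```verilog
module iterate_test();

  reg           clk;
  reg [ 3 : 0 ] n;

  initial begin
        clk = 1'b0;
        n   = 4'd0;
    #20 $finish;
  end

  // a delay within the block: each iteration takes 1 time unit
  always begin
    #1 clk = ~clk;
  end

  // a sensitivity list: each iteration waits for a positive edge on clk
  always @ ( posedge clk ) begin
    n = n + 1;
  end

  // neither a trigger nor a delay: every iteration would happen at the
  // same time, so simulation time could never advance
  //
  // always begin
  //   n = n + 1;
  // end

endmodule
```

The third process is deliberately commented out: if enabled, a simulator would typically appear to hang at time zero.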

5.3.5 Tasks and functions

We have already drawn analogies between modules in Verilog and functions in C, but also tried to show how they are fundamentally different. Even so, it is attractive to have something like a function or sub-routine so that we can define and reuse common or shared functionality rather than replicating it wherever required; in Verilog, tasks and functions fill this role.

Tasks and functions are both defined using a similar syntax, which in turn is similar to a module definition (if one replaces module and endmodule with the appropriate keywords) and includes

1. an identifier by which we can refer to it,

2. an interface, meaning the inputs and outputs, and

3. the body which determines the behaviour.

Both are local to (i.e., can only be used by) the module they are defined in, both can access all wires and registers defined within said module and can define their own wires and registers (and vectors thereof). Despite these similarities, some major differences then constrain when and how each construct can be used:

Functions are pure in the sense that they cannot have any side effects; they are typically used to represent combinatorial circuits. Specifically, they

• are defined using the function and endfunction keywords,

• can only invoke other functions,

• can be invoked as part of an expression,

• must have at least one input and no (explicit) outputs because there is always one implicit output,

• cannot contain delays.

Tasks represent more arbitrary functionality; they are typically used to represent combinatorial or sequential circuits, in a sense acting as local modules that capture some behaviour. Specifically, they

• are defined using the task and endtask keywords,

• can invoke other tasks and other functions,

• cannot be invoked as part of an expression (rather invocation is more like a statement),

• can have any number of inputs and outputs,

• can contain fairly arbitrary statements including delays.

5.3.5.1 Functions

Figure 5.23 includes three example Verilog functions; keep in mind that really, these would need to be defined within some module.

A good example is Figure 5.23c: for each i-th bit for 0 ≤ i < 8, the function computes the majority of inputs x, y and z. That is, in an element-wise manner it decides whether two or more of x, y and z have their i-th bit set to one (for each i). Notice that the function interface explicitly names the inputs x, y and z and their type: these are each 8-bit inputs. However the output is implicitly called majority (to match the function identifier); the type of this output is specified inline with the identifier. Contrast this, where majority produces an 8-bit output, with Figure 5.23a where there is no type, meaning that parity produces a 1-bit output. The function body in this case includes just one statement: an assignment to majority, the output.

After definition, a function can be invoked much like one would in C, e.g., as the RHS of a continuous assignment

assign r = parity( x );

or to form part of an expression as per Figure 5.24a; in either case, the function represents a combinatorial circuit.


function parity;
  input [ 7 : 0 ] x;

  begin
    parity = x[ 0 ] ^ x[ 1 ] ^ x[ 2 ] ^ x[ 3 ] ^
             x[ 4 ] ^ x[ 5 ] ^ x[ 6 ] ^ x[ 7 ];
  end
endfunction

(a) An implementation of the parity function which computes whether the number of bits in x set to one is odd.

function [ 31 : 0 ] byteswap;
  input [ 31 : 0 ] x;

  begin
    byteswap = { x[  7 :  0 ], x[ 15 :  8 ],
                 x[ 23 : 16 ], x[ 31 : 24 ] };
  end
endfunction

(b) An implementation of the byteswap function which swaps the order of the four bytes in x.

function [ 7 : 0 ] majority;
  input [ 7 : 0 ] x;
  input [ 7 : 0 ] y;
  input [ 7 : 0 ] z;

  begin
    majority = ( x & y ) | ( y & z ) | ( x & z );
  end
endfunction

(c) An implementation of the majority function which computes, in an element-wise manner, whether two or more bits of x, y and z are set to one.

Figure 5.23: Some example Verilog functions.

task parity_check;
  output         r;
  input  [ 7 : 0 ] x;
  input          y;

  begin
    if ( parity( x ) == y ) begin
      r = 1;
    end else begin
      r = 0;
    end
  end
endtask

(a) An implementation of the parity_check task which uses the parity function to test whether the parity of x is y.

task apply_test;
  input ci;
  input x;
  input y;

  begin
    t_ci = ci;
    t_x  = x;
    t_y  = y;

    #10 $display( "%b %b %b %b %b", t_co, t_s, t_ci, t_x, t_y );
  end
endtask

(b) An implementation of the apply_test task which takes several inputs, copies their value into some global registers and then displays said global registers.

Figure 5.24: Some example Verilog tasks.


5.3.5.2 Tasks

Figure 5.24 includes two example Verilog tasks; keep in mind that really, these would need to be defined within some module.

In contrast to the functions above, when there is an output (e.g., the parity_check task in Figure 5.24a) this is named explicitly in the interface. Unlike a function, this means a given task must be invoked as a statement rather than form part of an expression. For example, a (contrived) use of parity_check within some block might read as follows (assuming definitions of r0 through to r3):

begin
  parity_check( r0, 8'b00000000, 1'b0 );
  parity_check( r1, 8'b00000001, 1'b0 );
  parity_check( r2, 8'b00000011, 1'b0 );
  parity_check( r3, 8'b00000111, 1'b0 );
end

In each line, the shared description of behaviour is reused: first to check whether the parity of 8'b00000000 is 1'b0, then whether the parity of 8'b00000001 is 1'b0 and so on.
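To show how the pieces fit together, the following is a minimal sketch of a complete module which defines and then uses both a function and a task. The module name parity_test is hypothetical, and parity is written here using the reduction form of the XOR operator (an alternative to the bit-by-bit expression in Figure 5.23a, chosen only for brevity).

```verilog
module parity_test();

  reg r0, r1;

  // 1-bit result: the XOR of all eight bits in x
  function parity;
    input [ 7 : 0 ] x;

    begin
      parity = ^x;
    end
  endfunction

  // r is set to 1 if the parity of x matches y, and to 0 otherwise
  task parity_check;
    output           r;
    input  [ 7 : 0 ] x;
    input            y;

    begin
      r = ( parity( x ) == y );
    end
  endtask

  initial begin
    parity_check( r0, 8'b00000000, 1'b0 ); // no  bits set: parity is 0
    parity_check( r1, 8'b00000001, 1'b0 ); // one bit  set: parity is 1

    $display( "r0=%b r1=%b", r0, r1 );
  end

endmodule
```

Since only the first input has parity zero, we would expect r0 to be one and r1 to be zero.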

5.4 Effective development

Like most forms of programming, effective hardware design using Verilog demands more than just knowledge of the language syntax and semantics. An often quoted maxim is that to learn programming one has to do programming rather than have it taught; the implication is that without practical experience, many pitfalls and intricacies will be masked.

With this in mind the following Section attempts to cover useful features in the Verilog language, and more general techniques, that can help bridge the gap towards writing practical and maintainable models.

5.4.1 A rough guide to simulation

A Verilog simulator is simply a software tool; it “executes” an abstract HDL model for us before the costly process of producing a concrete implementation. That is, the simulator takes an HDL model and, given some input, tries to work out what happens in the corresponding circuit at each point in time. Using it, we can inspect the outputs (and also any intermediate values within the circuit) computed, and work out whether our model is in fact correct, i.e., whether it has functional “bugs” that would prevent the concrete hardware working as expected.

In a very rough sense, simulators can be classified in terms of how they achieve their goal:

Cycle-based simulators do not work out behaviour in a detailed way, but rather just try to work out what the state is at regular intervals in time. For example, if a value changes numerous times in one interval this is not visible: only the value at the end of the interval is worked out.

Event-based simulators maintain an internal clock that keeps track of time, but work out the state each time an event is triggered. Each time a value is changed somewhere (e.g., an input), the simulator works out what impact this has; this means that every intermediate change is visible.

Both classes can be subdivided into so-called compiler-based simulators that first translate the HDL model into a platform dependent internal format, or interpreter-based simulators that look at the HDL model line-by-line; usually the former is more efficient, i.e., completes a given simulation faster, than the latter.

5.4.2 System tasks and functions

Since a Verilog simulator is simply some software, it executes on some host platform, e.g., your workstation. It is often attractive to allow access to resources on said platform by the Verilog model being simulated. For example it might be useful to allow a model to display messages, accept input from the user, or access the file system to load or store data used for testing. Of course the resource will not necessarily be present when the model is implemented in concrete hardware, but until then the facility can be of massive benefit to the underlying goals of development and functional verification.

The concept of a system task (resp. system function) acts as an interface between the model and the simulator. You can think of a system task (resp. function) as being like a user defined task (resp. function) in terms of how it is invoked within the model, but the implementation is provided “behind the scenes” by the simulator rather than the model itself.

On one hand, their availability and function depends somewhat on the exact simulator used; a reasonable analogy is that of an operating system that manages access to some resource, via a system call, on behalf of a user process. On the other hand, the Verilog standards include a number of system tasks and functions which should always be available. A useful selection of these standard system tasks is discussed below; note that the identifier of each system task or function is prefixed by a dollar character to distinguish it from user defined identifiers.

5.4.2.1 The display task

The display system task is similar to printf in C; the idea is that a programmer invokes the task by passing it a list of arguments and a “format string”. Inside the format string are a number of format specifiers which are filled, in order, by the arguments; each specifier details how the corresponding argument should be formatted (e.g., in decimal, with leading zeros and so on). The format string is therefore translated by display into the result that is written to the simulator console.

Although there are several format specifiers, four are most useful:

• %s, translates an argument into a string,

• %b, translates an argument into a binary string,

• %d, translates an argument into a decimal string, and

• %h, translates an argument into a hexadecimal string.

For example, the following (contrived) block of statements

begin
  x = 10;
  y = 20;

  $display( "x is %b in binary, y is %d in decimal", x, y );
end

uses the %b format specifier to translate the argument x into a binary string, and %d to translate y into a decimal string; the result “x is 1010 in binary, y is 20 in decimal” is printed to the simulator console (up to padding, which depends on the declared sizes of x and y), meaning we can verify x and y have the expected values.

5.4.2.2 The monitor task

Use of display can quickly get cumbersome; often we want to just write the value whenever it changes rather than have to manually embed invocations of display where we think it might have changed. The monitor task offers a simple way to achieve this. The syntax is similar to a display task, but rather than being executed just once it is executed whenever one of the arguments changes.

Thus, using the block

begin
  $monitor( "x is %b in binary, y is %d in decimal", x, y );

  $monitoron;
  x = 10;
  y = 20;
  $monitoroff;

  x = 30;
  y = 40;
end

we might expect two messages to be printed to the console: one each after the first assignments to x and y. The second assignments to x and y do not trigger anything to be printed because the monitoroff system task disables the mechanism (having initially been enabled by monitoron). Note, however, that monitor prints at most once per time step (at the end of that time step), so for each change to be reported separately the assignments need to be separated by delays.
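A variant in which each assignment happens in a different time step can be sketched as follows; the module name monitor_test, the register sizes and the specific delays are assumptions. The current time is included as an argument so that each message shows when it was produced.

```verilog
module monitor_test();

  reg [ 7 : 0 ] x, y;

  initial begin
    $monitor( "t=%0d x=%0d y=%0d", $time, x, y );

    #10 x = 10; // reported at time 10
    #10 y = 20; // reported at time 20

    #10 $monitoroff;

    #10 x = 30; // not reported: monitoring is disabled
    #10 y = 40; // not reported: monitoring is disabled
  end

endmodule
```

Under these assumptions we would expect one message each at times 10 and 20, in addition to an initial message at time 0 in which x and y are still undefined.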

5.4.2.3 The random function

The random system function provides a source of (pseudo-) random numbers by returning 32 bits of random output each time it is invoked; it takes an optional argument (a “seed”) that initialises this process. For example, we might set some register x to a random value as follows:

begin
  x = $random;
end
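The optional seed, and a common way of restricting the range of the result, can be sketched as follows. The module name random_test, the seed value and the range are assumptions made for illustration.

```verilog
module random_test();

  integer       seed; // must be a variable: it is updated by each invocation
  integer       i;
  reg [ 3 : 0 ] x;

  initial begin
    seed = 1234;

    for ( i = 0; i < 4; i = i + 1 ) begin
      // the concatenation means the 32-bit result is treated as unsigned,
      // so the remainder is in the range 0 to 15
      x = {$random( seed )} % 16;

      $display( "x=%d", x );
    end
  end

endmodule
```

Using a fixed seed means the same sequence of values is produced each time the simulation is run, which can be useful when trying to reproduce a failing test.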


Instantiation                                                   Meaning
fa t0( w0, w1, w2, w3, w4 );                                    Unnamed ports.
fa t1( .co( w0 ), .s( w1 ), .ci( w2 ), .x( w3 ), .y( w4 ) );    Named ports, normal order.
fa t2( .co( w0 ), .s( w1 ), .ci( w2 ), .y( w4 ), .x( w3 ) );    Named ports, alternate order.
fa t3( .s( w1 ), .ci( w2 ), .x( w3 ), .y( w4 ) );               Named ports, co omitted.

Figure 5.25: Using named port lists to instantiate a full-adder cell.

5.4.2.4 The stop and finish tasks

It can be useful for the model being simulated to control the simulation. The finish and stop system tasks do this in two similar but complementary ways:

• stop instructs the simulator to pause in order to accept input from the user. For example, if an error condition occurs during simulation this is useful since it allows the user to manually inspect (and potentially restart) the simulation.

• finish terminates the simulation (and probably the simulator itself).

5.4.3 Named port lists

As modules get more complex, the number of input and output ports will typically grow; this can present problems that extend beyond finding a meaningful name for each port. First, it will start to become difficult to reason about which port is which. This means connecting an external wire or register to the wrong port is more likely, and changes to the module (e.g., adding an extra port) are harder. Second, it is quite common to want to omit specific connections to a module because the functionality they provide is somehow irrelevant. For example, if we have a 1-bit, 4-way multiplexer but only use three of the four inputs then we might want to omit one of them (i.e., simply leave it unconnected).

The concept of a named port list can solve both of these problems at the same time. Consider having to instantiate the full-adder module named fa, as described in Section 5.2. Figure 5.25 shows four different instantiations:

• In the first case, we adopt the standard approach of specifying connections via their order: each external wire is connected to the corresponding entry in the port list defined by the module being instantiated.

As such, w0 is connected to the first entry in the module port list (which is co), w1 to the second entry (which is s), and so on.

• The second case is equivalent to the first, although the syntax now differs. The “dot” notation now specifies connections by name: each external wire (in the parentheses) is connected to a named entry (after the dot) in the port list.

For example, .co( w0 ) specifies that the external wire w0 should be connected to the port named co.

• The third case demonstrates that by specifying connections by name, we can reorder the connections but still specify the same result.

Specifically, entries for the ports named x and y are in a different order than the second case. The instantiation is still equivalent however: each external wire is still connected to the same port as before.

• The fourth case demonstrates that by specifying connections by name, we can omit one (or more) without impact on the rest.

In this case, a connection to the co, or carry-out, port is omitted; you can think of this as simply ignoring the output since the result is not used. We could not do the same thing using the ordering approach however: there must be a first entry in our list of external wires, so this will always be connected to co by default.
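The fourth case can be sketched as a complete example as follows. The definition of fa given here is a stand-in (the actual definition is the one in Section 5.2), assumed only to have ports in the order co, s, ci, x and y; the module name fa_named is likewise hypothetical.

```verilog
// a stand-in full-adder, with the port order co, s, ci, x, y
module fa( output wire co,
           output wire s,
           input  wire ci,
           input  wire x,
           input  wire y );

  assign { co, s } = ci + x + y;

endmodule

module fa_named();

  wire w1;
  reg  w2, w3, w4;

  // named ports, with co omitted (i.e., left unconnected)
  fa t3( .s( w1 ), .ci( w2 ), .x( w3 ), .y( w4 ) );

  initial begin
        w2 = 1'b1; w3 = 1'b1; w4 = 1'b0;
    #10 $display( "s=%b", w1 );
  end

endmodule
```

Here 1 + 1 + 0 = 2, so we would expect the sum bit w1 to be zero; the carry-out is still computed inside the instance, but simply goes unused. Some tools may issue a warning about the unconnected port.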

5.4.4 The Verilog pre-processor

Compilation of C programs typically includes some pre-processing; this may be invoked implicitly by the compiler, e.g., gcc, rather than explicitly, e.g., as cpp. The idea is that directives in the source code control the pre-processor; in a sense the pre-processor “executes” the directives in order to manipulate or translate the source code before it is fed to the compiler.

Although less sophisticated than the C pre-processor, Verilog supports a similar idea through directives; note that unlike C where each directive is prefixed by the hash character, in Verilog a backtick (or back-quote) character is used for the same purpose.


`define N 8

module mux2_nbit( output wire [ `N - 1 : 0 ] r,
                  input  wire                c,
                  input  wire [ `N - 1 : 0 ] x,
                  input  wire [ `N - 1 : 0 ] y );


endmodule

Figure 5.26: Implementation of an N-bit, 2-way multiplexer where the pre-processor defines a symbol N.


`ifdef GATES
  wire w0, w1, w2;

  not t0( w0, c );

  and t1( w1, x, w0 );
  and t2( w2, y, c );

  or  t3( r, w1, w2 );
`else
  assign r = c ? y : x;
`endif

endmodule

Figure 5.27: Implementation of a 1-bit, 2-way multiplexer in either gate-level or RTL Verilog depending on whether the symbol GATES is defined or not.

5.4.4.1 The include directive

Perhaps the simplest example is the include directive which allows inclusion of one Verilog source code file in another. Usage is simple: we simply name the file which should be included, and the pre-processor injects the content at that point before feeding the overall result to the simulator. For example we might use

`include "constants.v"

to take the contents of the source code file constants.v, some globally defined constant values say, and inject it at this point in the current file. Note that this approach should be used with care: cyclic inclusion, which forms an infinite loop, needs to be avoided for example.

5.4.4.2 The define directive

Literal values without an obvious meaning are sometimes termed “magic”: when used in some source code, it can be unclear why their specific value is chosen or what they actually mean. The define directive (partly) solves this problem by associating a symbolic name with some literal value; this works nearly the same way as in C in the sense that every use of the symbol is instead translated into the literal. The result can potentially be much easier to understand; for example the symbol PI is arguably more meaningful than the value 3.142.

Figure 5.26 presents an example: the symbol N is associated, in the first line, with the decimal literal 8. Note that one can omit the literal value entirely, and simply specify that the symbol is defined: this can be useful later when considering ifdef.

Unlike C, where we could use the symbol simply as N, in Verilog each use must be prefixed by a backtick character, i.e., written as `N. In this case, the symbol is used within the module interface to dictate the width of ports r, x and y: for example, wire [ `N - 1 : 0 ] r describes an 8-bit wire called r because `N is translated into 8. With this approach, we more or less implement an N-bit, 2-way multiplexer in the sense that the number of bits can be changed by simply changing the definition (rather than the module itself).

Note that there is a convention in C by which pre-processor symbols are given upper-case names to avoid clashes with other variables; in Verilog it can make sense to follow the same convention, as above, so that people can understand the source code more easily.


5.4.4.3 The ifdef directive (and friends)

Conditional inclusion or exclusion of Verilog source code can be achieved using the ifdef and endif directives (and a range of relations such as ifndef, else and elsif). These work much like their counterparts in C, allowing sections of source code to be conditionally included for or excluded from use; a single source code description can be manipulated automatically into the exact source code used by the simulator, rather than force the developer to (un)comment sections by hand.

Figure 5.27 offers an example of this concept. In this case, the implementation style used for a 1-bit, 2-way multiplexer module is controlled using GATES: if the symbol is defined then gate-level Verilog is used, otherwise RTL Verilog is used. That is, if the symbol is defined (resp. not defined), then the pre-processor discards the bottom (resp. top) section of source code and retains the top (resp. bottom) section before feeding the result to the simulator.
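For reference, a complete version of this idea might look as follows. This is a sketch rather than a reproduction of the figure: the module name mux2_1bit and its port list are assumptions (based on how the module is instantiated elsewhere), as is the decision to define GATES within the same file.

```verilog
// remove (or comment out) the next line to select the RTL description
`define GATES

module mux2_1bit( output wire r,
                  input  wire c,
                  input  wire x,
                  input  wire y );

`ifdef GATES
  // gate-level description: r = ( x AND NOT c ) OR ( y AND c )
  wire w0, w1, w2;

  not t0( w0, c );

  and t1( w1, x, w0 );
  and t2( w2, y, c  );

  or  t3( r, w1, w2 );
`else
  // RTL description: select y if c is 1, otherwise x
  assign r = c ? y : x;
`endif

endmodule
```

Either way the module interface and behaviour are the same; only the description retained by the pre-processor differs.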

5.4.5 The timescale directive

Until now, any mention of time has been carefully described in terms of abstract time units. The timescale directive is a means of specifying the concrete units of time used during simulation; this impacts on any specification of delay. For example,

`timescale 10ns / 1ns

sets the reference time unit to 10ns, and the precision to 1ns. This means that 1 time unit now has the concrete meaning 10ns, and 1ns is the smallest specifiable quantity of time, to which any smaller quantities are rounded. Thus, after the directive above, the previous example

begin
  #10 statement0;
  #20 statement1;
  #30 statement2;
end

means

• statement0 executes 10 · 10 = 100ns after the start of the block,

• statement1 executes 20 · 10 = 200ns after statement0 finishes,

• statement2 executes 30 · 10 = 300ns after statement1 finishes.
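As a sketch of how this plays out, the following (hypothetical) module timescale_test replaces each statement with an invocation of display that reports the current time; note that the time system function reports time in units of the reference time unit, so the concrete time is ten times larger.

```verilog
`timescale 10ns / 1ns

module timescale_test();

  initial begin
    #10 $display( "t=%0d time units", $time ); // 10 time units, i.e., 100ns
    #20 $display( "t=%0d time units", $time ); // 30 time units, i.e., 300ns
    #30 $display( "t=%0d time units", $time ); // 60 time units, i.e., 600ns
  end

endmodule
```

Since the delays within a sequential block accumulate, the three messages are produced 100ns, 100 + 200 = 300ns and 300 + 300 = 600ns after the start of the block.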

5.4.6 Module parameters

Although pre-processor directives are sufficient in many situations, in others they are not powerful enough to achieve the desired result. For example, reconsider Figure 5.26 where we controlled the size of inputs to a 2-way multiplexer using the symbol N: by redefining N via the pre-processor, we could resize the inputs without altering the module definition itself. Although this is convenient up to a point, the disadvantage is that the definition is global: we cannot have one multiplexer with 4-bit inputs and one with 8-bit inputs because there is only one N, i.e., all multiplexers of this type have the same sized inputs.

The concept of module parameters solves this problem by allowing the declaration of a local parameter which can be redefined independently for each instance. Inside the module definition, the parameter keyword is used to declare a parameter with a default value; the parameter symbol can then be used freely within the module itself (although clearly it cannot be assigned to). This is highlighted by Figure 5.28, which defines a module called mux2_nbit using a parameter called N. Notice that the “inside” module definition style is used here so that the value of N can be used to control the size of r, x and y.

The major difference between using a parameter and using a pre-processor symbol is that each instance can set the parameter to a different value. This is again shown in Figure 5.28 where two modules called mux2_4bit and mux2_8bit are defined: each case instantiates mux2_nbit, naming the instance t, then sets the parameter N within t using the defparam keyword. In the former case N is set to four meaning that r, x and y have a 4-bit size; in the latter case N is set to eight meaning that r, x and y have an 8-bit size.
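As an aside, Verilog also allows a parameter to be set at the point of instantiation rather than via defparam. The following sketch uses that alternative syntax; note that it is not the approach used in Figure 5.28, and that the RTL body of mux2_nbit, the module name mux2_test and all wire and register names are assumptions made for illustration.

```verilog
module mux2_nbit( r, c, x, y );

  parameter N = 1;

  output wire [ N - 1 : 0 ] r;
  input  wire               c;
  input  wire [ N - 1 : 0 ] x;
  input  wire [ N - 1 : 0 ] y;

  assign r = c ? y : x;

endmodule

module mux2_test();

  reg            c;
  reg  [ 3 : 0 ] x4, y4;
  reg  [ 7 : 0 ] x8, y8;
  wire [ 3 : 0 ] r4;
  wire [ 7 : 0 ] r8;

  // two instances of the same module, each with a different value of N
  mux2_nbit #( .N( 4 ) ) t4( r4, c, x4, y4 );
  mux2_nbit #( .N( 8 ) ) t8( r8, c, x8, y8 );

  initial begin
        c = 1'b1; x4 = 4'h3; y4 = 4'hC; x8 = 8'h0F; y8 = 8'hF0;
    #10 $display( "r4=%h r8=%h", r4, r8 );
  end

endmodule
```

Since c is one, each instance selects its y input: we would expect r4 and r8 to match y4 and y8 respectively, demonstrating that the 4-bit and 8-bit variants coexist.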

5.4.7 Generate statements

Consider (yet again) the task of writing a 4-bit, 2-way multiplexer. Having increased the size of r, x and y (either directly or using a parameter as above), the next task is to implement the required module behaviour. In this case, one approach would be to use RTL Verilog as in Figure 5.11b. Another approach would be to simply use gate-level Verilog and instantiate four instances of a 1-bit, 2-way multiplexer in a similar way to Figure 5.8b. Of course the clear disadvantage is that as the size of inputs and output grows, this becomes increasingly laborious and prone to mistakes.

module mux2_nbit( r, c, x, y );

  parameter N = 1;

  output wire [ N - 1 : 0 ] r;
  input  wire               c;
  input  wire [ N - 1 : 0 ] x;
  input  wire [ N - 1 : 0 ] y;


endmodule


  mux2_nbit t( r, c, x, y );

  defparam t.N = 4;

endmodule


  mux2_nbit t( r, c, x, y );

  defparam t.N = 8;

endmodule

Figure 5.28: Implementation of an N-bit, 2-way multiplexer using module parameters, and instantiation as 4-bit, 2-way and 8-bit, 2-way variants.


  genvar i;

  generate
    for( i = 0; i < 4; i = i + 1 ) begin:id
      mux2_1bit t( r[ i ], c, x[ i ], y[ i ] );
    end
  endgenerate

endmodule

Figure 5.29: Implementation of a 4-bit, 2-way multiplexer using a generate statement to instantiate four 1-bit, 2-way multiplexer instances.

A Verilog generate statement is designed to automate this sort of task, programmatically generating hardware instances for us. In a conceptually similar way to the pre-processor, the idea is that the generate statement is “executed” before the model is simulated: it expands into the components which are simulated, rather than implying components itself.

Here, the idea is that instead of writing four instantiations of the 1-bit, 2-way multiplexer by hand we let the Verilog tool-chain do the work for us. Figure 5.29 provides an example where the generate and endgenerate keywords enclose a for statement. The tool-chain executes the for statement before simulation: the loop body is replicated with i, the loop counter defined using the genvar keyword, replaced by the associated value for each iteration. So since i ranges from zero through to three, we get four copies of

mux2_1bit t( r[ i ], c, x[ i ], y[ i ] );

where i is replaced by zero through to three. The end result is that we get what we want: four instances of the 1-bit, 2-way multiplexer where the i-th instance takes the i-th bit of x and y as input and produces the i-th bit of r as output. Put another way, this generate statement is equivalent to writing

mux2_1bit t( r[ 0 ], c, x[ 0 ], y[ 0 ] );
mux2_1bit t( r[ 1 ], c, x[ 1 ], y[ 1 ] );
mux2_1bit t( r[ 2 ], c, x[ 2 ], y[ 2 ] );
mux2_1bit t( r[ 3 ], c, x[ 3 ], y[ 3 ] );

One caveat to this equivalence is that in the above, we have named all the instances t: this would not be allowed were we to write out the instantiations by hand, but the generate statement does allow it by automatically constructing unique identifiers for each instance. More specifically, if as here the generate loop is labelled id, the instances are identified by id[ 0 ].t, id[ 1 ].t and so on. That is, the instances generated can be indexed, which is useful when constructing more complicated generated structures.

This is perhaps better explained by example: consider the 1-bit, 2-way multiplexer example again. In isolation, matching the fragment above, one way to utilise the generate loop is as follows:

wire c;

wire [ 3 : 0 ] r;
wire [ 3 : 0 ] x;
wire [ 3 : 0 ] y;

genvar j;

generate
  for( j = 0; j < 4; j = j + 1 ) begin:g0
    mux2_1bit t( r[ j ], c, x[ j ], y[ j ] );
  end
endgenerate

That is, we first define connecting wire vectors and use j to index individual wires. An alternative would be as follows:

wire c;

genvar i, j;

generate
  for( i = 0; i < 4; i = i + 1 ) begin:g0
    wire r, x, y;
  end
endgenerate

generate
  for( j = 0; j < 4; j = j + 1 ) begin:g1
    mux2_1bit t( g0[ j ].r, c, g0[ j ].x, g0[ j ].y );
  end
endgenerate

This time, we use an initial generate loop to define wires r, x and y; the difference is that each definition has a unique identifier g0[ 0 ].x, g0[ 1 ].x and so on. This means in a second generate loop we can index the connecting wires via this identifier using j, rather than as an index into what was previously a wire vector. Of course the utility of one approach over the other depends on the context, but the key point is that either way, we are simply statically indexing named wires and hence specifying connections between components: once the pre-processed output is fed to the simulator, there is no fundamental difference between either of these approaches or the hand-written alternative.
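Putting the first approach into a complete, self-contained form gives something like the following sketch. The module header for mux2_4bit and the RTL definition of mux2_1bit are assumptions; the generate loop itself mirrors Figure 5.29.

```verilog
// a stand-in 1-bit, 2-way multiplexer: select y if c is 1, otherwise x
module mux2_1bit( output wire r,
                  input  wire c,
                  input  wire x,
                  input  wire y );

  assign r = c ? y : x;

endmodule

module mux2_4bit( output wire [ 3 : 0 ] r,
                  input  wire           c,
                  input  wire [ 3 : 0 ] x,
                  input  wire [ 3 : 0 ] y );

  genvar i;

  // expands into four instances, identified by id[ 0 ].t through id[ 3 ].t,
  // where the i-th instance deals with the i-th bit of r, x and y
  generate
    for ( i = 0; i < 4; i = i + 1 ) begin : id
      mux2_1bit t( r[ i ], c, x[ i ], y[ i ] );
    end
  endgenerate

endmodule
```

Changing the loop bound (and the associated vector sizes) is then all that is required to produce a wider multiplexer.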

5.4.8 Developing test stimuli

None of the Verilog modules presented thus far have been self-contained: they all include inputs and outputs,with the implication that they will be used by some external component. In a sense, any useful module will

git # ba293a0e @ 2019-11-14 231



module fa_test();

wire t_co, t_s;reg t_ci; t_x, t_y;

fa t( .co( t_co ), .s( t_s ), .ci( t_ci ), .x( t_x ), .y( t_y ) );

initial begin#10 t_ci = 1'b0; t_x = 1'b0; t_y = 1'b0;#10 $display( "co=%b s=%b ci=%b x=%b y=%b", t_co, t_s, t_ci, t_x, t_y );#10 t_ci = 1'b0; t_x = 1'b0; t_y = 1'b1;#10 $display( "co=%b s=%b ci=%b x=%b y=%b", t_co, t_s, t_ci, t_x, t_y );#10 t_ci = 1'b0; t_x = 1'b1; t_y = 1'b0;#10 $display( "co=%b s=%b ci=%b x=%b y=%b", t_co, t_s, t_ci, t_x, t_y );#10 t_ci = 1'b0; t_x = 1'b1; t_y = 1'b1;#10 $display( "co=%b s=%b ci=%b x=%b y=%b", t_co, t_s, t_ci, t_x, t_y );

#10 $finish;end

endmodule

(a) A test stimulus for fa that uses the display system task to print results explicitly.

module fa_test();



initial begin$monitor( "co=%b s=%b ci=%b x=%b y=%b", t_co, t_s, t_ci, t_x, t_y );

$monitoron;

#10 t_ci = 1'b0; t_x = 1'b0; t_y = 1'b0;#10 t_ci = 1'b0; t_x = 1'b0; t_y = 1'b1;#10 t_ci = 1'b0; t_x = 1'b1; t_y = 1'b0;#10 t_ci = 1'b0; t_x = 1'b1; t_y = 1'b1;

#10 $monitoroff;$finish;

end

endmodule

(b) A test stimulus for fa that uses the monitor system task to print results implicitly.

Figure 5.30: Two styles of test stimulus for a full-adder cell.

git # ba293a0e @ 2019-11-14 232



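module fa_test();

  // ... (declarations of t_co, t_s, t_ci, t_x and t_y, and the instance t of fa, as in Figure 5.30a)

  // oracle outputs, computed with the built-in addition operator
  wire o_co, o_s;

  assign { o_co, o_s } = t_ci + t_x + t_y;

  // f is 1 when the DUT outputs match the oracle outputs
  wire f = ( o_co == t_co ) & ( o_s == t_s );

  initial begin
    $monitor( "f=%s ci=%b x=%b y=%b", f ? "pass" : "fail", t_ci, t_x, t_y );

    $monitoron;

    #10 t_ci = 1'b0; t_x = 1'b0; t_y = 1'b0;
    #10 t_ci = 1'b0; t_x = 1'b0; t_y = 1'b1;
    #10 t_ci = 1'b0; t_x = 1'b1; t_y = 1'b0;
    #10 t_ci = 1'b0; t_x = 1'b1; t_y = 1'b1;

    #10 $monitoroff;
        $finish;
  end

endmodule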

Figure 5.31: A test stimulus for fa with integrated oracle (using the built-in Verilog plus operator).

module traffic_test();

  reg t_clk;
  reg t_rst;

  wire t_Mr, t_Ma, t_Mg;
  wire t_Ar, t_Aa, t_Ag;

  traffic t( .clk( t_clk ), .rst( t_rst ),
             .Mr( t_Mr ), .Ar( t_Ar ),
             .Ma( t_Ma ), .Aa( t_Aa ),
             .Mg( t_Mg ), .Ag( t_Ag ) );

  // the clock starts at zero
  initial begin
    t_clk = 1'b0;
  end

  // the clock is inverted after each delay of 1 time unit
  always begin
    #1 t_clk = ~t_clk;
  end

endmodule

Figure 5.32: A (partial) test stimulus for the traffic light FSM (which requires a clock signal)

look like this: if it has no inputs and produces no output as a side effect, we can replace it with an empty module and get the same result!

However modules without inputs or outputs, which are often termed top level modules in reference to their location in a hierarchy of instances, are useful. Specifically, in order to describe a test stimulus as outlined in Chapter ?? we need to write a module which acts as a container for an instance of whatever we are testing. In short, the role of the stimulus module is as a surrogate for the external component mentioned above: it provides inputs to the DUT, and inspects the outputs to check they are as expected (e.g., match an oracle). Note that the remit of the stimulus is usually limited to testing our Verilog model; we typically do not manufacture it for example, so there is usually some flexibility in terms of how we implement it. For example even if we use a very detailed, gate-level description of the DUT, it usually makes sense to use a high-level approach to describing the stimulus.

5.4.8.1 Providing input to and inspecting output from a DUT

Figure 5.30 includes two approaches to writing a Verilog test stimulus for a full-adder. Both take a similar approach in the sense that they comprise roughly four parts:

1. definition of a register (or vector thereof) for each input to the DUT,

2. definition of a wire (or vector thereof) for each output from the DUT,




3. an instantiation of the DUT, connecting the input and output ports to the registers and wires above, and

4. a set of behavioural processes that stimulate the inputs to the DUT and inspect the outputs.

In Figure 5.30a for example, we define

1. registers t_ci, t_x and t_y,

2. wires t_co and t_s,

3. an instance of fa called t whose inputs ci, x and y and outputs co and s are connected to t_ci, t_x and t_y and t_co and t_s respectively, and

4. an initial process comprised of procedural assignments to registers t_ci, t_x and t_y, and invocations of the display system task to inspect wires t_co and t_s.

Of course this is a reasonable starting point (for combinatorial circuits at least), but one can easily point at ways to improve it. For example, Figure 5.30b removes the need for explicit calls to display by using the monitor system task: now, every time one of t_co, t_s, t_ci, t_x or t_y changes, their values are displayed.

5.4.8.2 Using an oracle to automatically check for correct DUT behaviour

Another way to improve Figure 5.30a is to consider the use of an oracle as introduced in Chapter ??. The idea is to include a mechanism that automatically checks whether the output from a given test is correct or incorrect rather than forcing the user to do this by inspection (and potentially make mistakes). In the case of our full-adder module, this is reasonably straight-forward: we can use the built-in Verilog addition operator as an oracle.

Figure 5.31 demonstrates one way to achieve this. Notice that o_co and o_s are driven with the result of adding t_ci, t_x and t_y using the built-in Verilog plus operator. These are compared with t_co and t_s (the outputs from the full-adder instance), to form f; the value of this wire therefore indicates whether the full-adder is producing the output we expect for each test, since

$$
f = \begin{cases}
1 & \text{if } t\_co = o\_co \text{ and } t\_s = o\_s \text{, i.e., the test passed} \\
0 & \text{if } t\_co \neq o\_co \text{ or } t\_s \neq o\_s \text{, i.e., the test failed}
\end{cases}
$$

As such, the monitor output now highlights whether or not a given test passed or failed, rather than the outputs t_co and t_s (although clearly these could be included as well, for example to assist debugging if a given test fails).
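
The stimulus in Figure 5.31 exercises only the four input combinations where t_ci is 0. A minimal sketch of one way to cover all eight combinations with the same oracle follows. It assumes a behavioural loop is acceptable within the stimulus; the module name fa_test_all, the variables i and fails, and the stand-in definition of fa are hypothetical, and are included only so the example is self-contained.

```verilog
// Hypothetical stand-in for the fa module under test: the real module is
// defined elsewhere, this copy exists only to make the sketch self-contained.
module fa( output wire co, output wire s,
           input  wire ci, input  wire x, input wire y );
  assign s  = x ^ y ^ ci;
  assign co = ( x & y ) | ( x & ci ) | ( y & ci );
endmodule

module fa_test_all();

  wire t_co, t_s;            // outputs from the DUT
  reg  t_ci, t_x, t_y;       // inputs  to   the DUT

  wire o_co, o_s;            // outputs from the oracle

  integer i, fails;          // loop counter and failure count (assumed names)

  fa t( .co( t_co ), .s( t_s ), .ci( t_ci ), .x( t_x ), .y( t_y ) );

  // oracle: the built-in addition operator, as in Figure 5.31
  assign { o_co, o_s } = t_ci + t_x + t_y;

  initial begin
    fails = 0;

    // the low 3 bits of i enumerate every combination of ci, x and y
    for( i = 0; i < 8; i = i + 1 ) begin
      { t_ci, t_x, t_y } = i[ 2 : 0 ];

      #10; // allow the DUT and oracle outputs to settle

      // !== also flags unknown (x) or high-impedance (z) outputs as a mismatch
      if( { t_co, t_s } !== { o_co, o_s } ) begin
        fails = fails + 1;
        $display( "fail: ci=%b x=%b y=%b co=%b s=%b", t_ci, t_x, t_y, t_co, t_s );
      end
    end

    $display( "%0d of 8 tests failed", fails );
    $finish;
  end

endmodule
```

Counting failures and printing only the mismatches keeps the output short when the DUT is correct; the same structure scales to wider inputs by changing the loop bound and the width of the slice taken from i.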

5.4.8.3 Generating a clock signal for use by a DUT

As with all other inputs, the stimulus must provide a clock signal for any module that requires one. Some simulators can achieve this automatically by “marking” a wire as a clock signal somehow: the simulator then manages generation of the clock signal with a given period. However, it is reasonably simple to generate one using vanilla Verilog; one approach is demonstrated in Figure 5.32. The idea is for the stimulus to maintain a register called t_clk: this is set to zero by the initial process. The (untriggered) always process loops, updating t_clk. To avoid timing issues with such untriggered processes as outlined earlier, the procedural assignment that updates t_clk includes a delay. This essentially means, in this case, t_clk is updated (by inverting the value) every 1 time unit, matching the delay of #1 in Figure 5.32: it toggles from 0 to 1 and back to 0 with a total clock period of 2 time units.
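
Figure 5.32 is partial: it never drives t_rst and never stops. A minimal sketch of how the same clock generator might sit alongside a reset pulse and a bounded simulation follows. Everything beyond t_clk and t_rst is an assumption made for illustration: the parameter HALF (the delay between toggles, so the clock period is 2 · HALF), the module name clk_test, and the 4-bit counter count, which is a hypothetical stand-in for a clocked DUT such as the traffic light FSM.

```verilog
module clk_test();

  parameter HALF = 1;        // assumed name: delay between clock toggles

  reg t_clk;
  reg t_rst;

  reg [ 3 : 0 ] count;       // hypothetical stand-in for a clocked DUT

  initial begin
    t_clk = 1'b0;                   // the clock starts at zero
    t_rst = 1'b1;                   // assert reset from the start

    #( 4 * HALF ) t_rst = 1'b0;     // release reset after two clock periods,
                                    // on a negative edge of t_clk

    #( 20 * HALF ) $finish;         // bound the simulation
  end

  // untriggered process: the delay prevents a zero-time infinite loop
  always begin
    #HALF t_clk = ~t_clk;
  end

  // stand-in DUT behaviour: cleared under reset, otherwise counts
  always @ ( posedge t_clk ) begin
    if( t_rst ) count <= 4'd0;
    else        count <= count + 4'd1;
  end

  initial begin
    $monitor( "t=%0t rst=%b count=%d", $time, t_rst, count );
  end

endmodule
```

Releasing reset on a negative clock edge is a deliberate choice here: it avoids changing t_rst at the same simulation time as the positive edge that samples it.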

References

[1] D. Harris and S. Harris. Digital Design and Computer Architecture: From Gates to Processors. Morgan-Kaufmann, 2007. isbn: 0-123-70497-9.

[2] M.S. Malone. Infinite Loop: How Apple, the World’s Most Insanely Great Computer Company, Went Insane. 1999 (see p. 193).

[3] S. Palnitkar. Verilog HDL: A Guide in Digital Design and Synthesis. 2nd ed. Prentice-Hall, 2003.




Part II

Appendices




APPENDIX A: EXAMPLE EXAM-STYLE QUESTIONS

A.1 Chapter 1

Q1. a For the sets A = {1, 2, 3}, B = {3, 4, 5} and U = {1, 2, 3, 4, 5, 6, 7, 8}, compute the following:

i |A|.

ii A ∪ B.

iii A ∩ B.

iv A − B.

v $\overline{A}$.

vi {x | 2 · x ∈ U}.

b For each of the following decimal integers, write down the 8-bit binary representation in sign-magnitude and two’s-complement:

i +0.

ii −0.

iii +72.

iv −34.

v −8.

vi 240.

Q2. For some 32-bit integer x, explain what is meant by the Hamming weight of x; write a short C function to compute the Hamming weight of a given 32-bit input.

Q3. a Write out a truth table for the Boolean function

f (a, b, c) = (a ∧ b ∧ ¬c) ∨ (a ∧ ¬b ∧ c) ∨ (¬a ∧ ¬b ∧ c),

then decide how many

i input combinations, and

ii outputs where f (a, b, c) = 1

exist in it.




b Consider the Boolean function

f (a, b, c, d) = ¬a ∧ b ∧ ¬c ∧ d.

Which of the following assignments

i a = 0, b = 0, c = 0 and d = 1,

ii a = 0, b = 1, c = 0 and d = 1,

iii a = 1, b = 1, c = 1 and d = 1,

iv a = 0, b = 0, c = 1 and d = 0.

produces the output f (a, b, c, d) = 1?

c Which of the following Boolean expressions

i (a ∨ b ∨ d) ∧ (¬c ∨ d),

ii (a ∧ b ∧ d) ∨ (¬c ∧ d),

iii (a ∨ b ∨ d) ∨ (¬c ∨ d).

is in Sum-of-Products (SoP) standard form?

d Identify each equivalence that is correct:

i a ∨ 1 ≡ a.

ii a ⊕ 1 ≡ ¬a.

iii a ∧ 1 ≡ a.

iv ¬(a ∧ b) ≡ ¬a ∨ ¬b.

e Identify each equivalence that is correct:

i ¬¬a ≡ a.

ii ¬(a ∧ b) ≡ ¬a ∨ ¬b.

iii ¬a ∧ b ≡ a ∧ ¬b.

iv ¬a ≡ a ⊕ a.

Q4. a The OR form of the null axiom is x ∨ 1 ≡ 1. Which of the following options

i x ∧ 1 ≡ 1,

ii x ∧ 0 ≡ 0,

iii x ∨ 0 ≡ 0,

iv x ∧ x ≡ x,

is the dual of this axiom?

b Given the Boolean equation

f = ¬a ∧ ¬b ∨ ¬c ∨ ¬d ∨ ¬e,

which of the following

i ¬ f = a ∨ b ∨ c ∨ d ∨ e,

ii ¬ f = a ∧ b ∧ c ∧ d ∧ e,

iii ¬ f = a ∧ b ∧ (c ∨ d ∨ e),

iv ¬ f = a ∧ b ∨ ¬c ∨ ¬d ∨ ¬e,

v ¬ f = (a ∨ b) ∧ c ∧ d ∧ e

is correct?

c If we write the de Morgan axiom in English, which of the following

i NOR is equivalent to AND if each input to AND is complemented,

ii NAND is equivalent to OR if each input to OR is complemented,




iii AND is equivalent to NOR if each input to NOR is complemented, or

iv NOR is equivalent to NAND if each input to NAND is complemented.

describes the correct equivalence?

Q5. a Identify which one of these Boolean expressions

i c ∨ d ∨ e

ii ¬c ∧ ¬d ∧ ¬e

iii ¬a ∧ ¬b

iv ¬a ∧ ¬b ∧ ¬c ∧ ¬d ∧ ¬e

is the correct result of simplifying

(¬(a ∨ b) ∧ ¬(c ∨ d ∨ e)) ∨ ¬(a ∨ b).

b If you simplify the Boolean expression

(a ∨ b ∨ c) ∧ ¬(d ∨ e) ∨ (a ∨ b ∨ c) ∧ (d ∨ e)

into a form that contains the fewest operators possible, which of the following options

i a ∨ b ∨ c,

ii ¬a ∧ ¬b ∧ ¬c,

iii d ∨ e,

iv ¬d ∧ ¬e,

v none of the above

do you end up with and why?

c If you simplify the Boolean expression

a ∧ c ∨ c ∧ (¬a ∨ a ∧ b)

into a form that contains the fewest operators possible, which of the following options

i (b ∧ c) ∨ c,

ii c ∨ (a ∧ b ∧ c),

iii a ∧ c,

iv a ∨ (b ∧ c),

v none of the above

do you end up with and why?

d Consider the Boolean expression

a ∧ b ∨ a ∧ b ∧ c ∨ a ∧ b ∧ c ∧ d ∨ a ∧ b ∧ c ∧ d ∧ e ∨ a ∧ b ∧ c ∧ d ∧ e ∧ f .

Which of the following simplifications

i a ∧ b ∧ c ∧ d ∧ e ∧ f ,

ii a ∧ b ∨ c ∧ d ∨ e ∧ f ,

iii a ∨ b ∨ c ∨ d ∨ e ∨ f ,

iv a ∧ b,

v c ∧ d,

vi e ∧ f ,

vii a ∨ b ∧ (c ∨ d ∧ (e ∨ f ))

viii ((a ∨ b) ∧ c) ∨ d ∧ e ∨ f




is correct?

e Given the options

i 1,

ii 2,

iii 3,

iv 4,

decide which is the least number of operators required to compute the same result as

f (a, b, c) = (a ∧ b) ∨ a ∧ (a ∨ c) ∨ b ∧ (a ∨ c).

Show how you arrived at your decision.

f Prove that

(¬x ∧ y) ∨ (¬y ∧ x) ∨ (¬x ∧ ¬y) ≡ ¬x ∨ ¬y.

g Prove that

(x ∧ y) ∨ (y ∧ z ∧ (y ∨ z)) ≡ y ∧ (x ∨ z).

A.2 Chapter 2

Q6. Write the simplest (i.e., with fewest operators) possible Boolean expression that implements the Boolean function

r = f (x, y, z)

described by

x y z r
0 0 0 0
0 0 1 1
0 1 0 1
0 1 1 ?
1 0 0 1
1 0 1 0
1 1 0 ?
1 1 1 1

where ? denotes don’t care.

Q7. Take the Boolean expression

¬(x ∨ y)

and draw a gate-level circuit diagram that computes an equivalent result using only 2-input NAND gates.

Q8. Recall that an SR latch has two inputs S (or set) and R (or reset); if S = R = 1, the two outputs Q and ¬Q are undefined. This issue can be resolved by using a reset-dominate latch: the alternative design has the same inputs and outputs, but resets the latch (i.e., has Q = 0 and ¬Q = 1) whenever S = R = 1.

Using a gate-level circuit diagram, describe how a reset-dominate latch can be implemented using only NOR gates and at most one AND gate.

Q9. The quality of the design for some hardware component is often judged by measuring efficiency, for example how quickly it can produce output on average. Name two other metrics that might be considered.

Q10. a Describe how N-type and P-type MOSFET transistors are constructed using silicon and how they operate as switches.

b Draw a diagram to show how N-type and P-type MOSFET transistors can be used to implement a NAND gate. Show your design works by describing the transistor states for each input combination.




Q11. The following diagram

[Figure: transistor-level circuit diagram with inputs x and y, output r, and the rails Vdd and Vss.]

details a 2-input NAND gate comprised of two P-MOSFET transistors (top) and two N-MOSFET transistors (bottom). Draw a similar diagram for a 3-input NAND gate.

Q12. Moore’s Law predicts the number of CMOS-based transistors we can manufacture within a fixed sized area will double roughly every two years; this is often interpreted as doubling computational efficiency over the same period. Briefly explain two limits which mean this trend cannot be sustained indefinitely.

Q13. Given that ? is the don’t care state, consider the following truth table which describes a function p with four inputs (a, b, c and d) and two outputs (e and f ):

a b c d e f
0 0 0 0 0 0
0 0 0 1 0 1
0 0 1 0 1 0
0 0 1 1 ? ?
0 1 0 0 0 1
0 1 0 1 1 0
0 1 1 0 0 0
0 1 1 1 ? ?
1 0 0 0 1 0
1 0 0 1 0 0
1 0 1 0 0 1
1 0 1 1 ? ?
1 1 0 0 ? ?
1 1 0 1 ? ?
1 1 1 0 ? ?
1 1 1 1 ? ?

a From the truth table above, write down the corresponding Sum of Products (SoP) equations for e and f .

b Simplify the two SoP equations so that they use the minimum number of logic gates possible. You can assume the two equations can share logic.

Q14. Using a Karnaugh map, derive a Boolean expression for the function

r = f (x, y, z)

described by the truth table

x y z r
0 0 0 1
0 0 1 1
0 1 0 1
0 1 1 0
1 0 0 0
1 0 1 1
1 1 0 0
1 1 1 ?

where ? denotes don’t care.

Q15. NAND is a universal logic gate in the sense that the behaviour of NOT, AND and OR gates can be implemented using only NAND. Show how this is possible using a truth table to demonstrate your solution.

Q16. Both NAND and NOR gates are described as universal because any other Boolean gate (i.e., AND, OR, NOT) can be constructed using them. Imagine your friend suggests a 4-input, 1-bit multiplexer (that selects between four 1-bit inputs using two 1-bit control signals to produce a 1-bit output) is also universal: state whether or not you believe them, and explain why.

Q17. Consider the following circuit where the propagation delay of logic gates in the circuit are 10ns for NOT, 20ns for AND, 20ns for OR and 60ns for XOR:

[Figure: gate-level circuit diagram; the labelled signals are a, b, c, d and e.]

a Draw a Karnaugh map for this circuit and derive a Sum of Products (SoP) expression for the result.

b Describe advantages and disadvantages of your SoP expression and the dynamic behaviour it produces.

c If the circuit is used as combinatorial logic within a clocked system, what is the maximum clock speed of the system?

Q18. A game uses nine LEDs to display the result of rolling a six-sided dice; the i-th LED, say L_i for 0 ≤ i < 9, is driven with 1 or 0 to turn it on or off respectively. A 3-bit register D represents the dice as an unsigned integer.

a The LEDs are arranged as follows,




L0 L3 L6

L1 L4 L7

L2 L5 L8

and the required mapping between dice and LEDs, given a filled dot means an LED is on, is

D = 1   D = 2   D = 3   D = 4   D = 5   D = 6

[Figure: for each value of D, a diagram of the 3 × 3 LED grid showing which LEDs are on.]

Using Karnaugh maps as appropriate, write a simplified Boolean expression for each LED (i.e., for each L_i in terms of D).

b The 2-input XOR, AND, OR and NOT gates used to implement your expressions have propagation delays of 40, 20, 20 and 10 nanoseconds respectively. Calculate how many times per-second the dice can be rolled, i.e., D can be updated, if the LEDs are to provide the correct output.

c The results of individual dice throws will be summed using a ripple-carry adder circuit, to give a total; each 3-bit output D will be added to and stored in an n-bit accumulator register A.

i Using a high-level block diagram, show how an n-bit ripple-carry adder circuit is constructed from full-adder cells.

ii If m = 8 throws of the dice are to be summed, what value for n should be selected?

iii Imagine that instead of D, we want to add 2 · D to A. Doubling D can be achieved by computing either D + D or D ≪ 1 (i.e., a left-shift of D by 1 bit). Carefully state which method is preferable, and why.

Q19. Consider a simple component called C that compares two inputs x and y (both are unsigned 8-bit integers) in order to produce their maximum and minimum as two outputs:

[Figure: the component C, with inputs x and y and outputs min(x, y) and max(x, y).]

Instances of C can be connected in a mesh to sort integers: the input is fed into the top and left-hand edges of the mesh, the sorted output appears on the bottom and right-hand edges. An example is given below:

[Figure: an example 4 × 4 mesh of C components, annotated with input, intermediate and output values.]

a Using standard building blocks (e.g., adder, multiplexer etc.) rather than individual logic gates, draw ablock diagram that implements the component C.

b Imagine that an n × n mesh of components is created. Based on your design for C and clearly stating any assumptions you need to make, write down an expression for the critical path of such a mesh.

c Algorithms for sorting integers can clearly be implemented on a general-purpose processor. Explain two advantages and two disadvantages of using such a processor versus using a mesh like that above.

Q20. Imagine you are working for a company developing the “Pee”, a portable games console. The user interface is a fancy controller that has

• three fire buttons represented by the 1-bit inputs F0, F1 and F2, and

• an 8-direction D-pad represented by the 3-bit input D

and you are charged with designing some aspects of it.

a The fire button inputs are described as level triggered and active high; explain what this means (in comparison to the alternatives in each case).

b Some customers want an “autofire” feature that will automatically and repeatedly press the F0 fire button for them. The autofire can operate in four modes, selected by a switch called M: off (where the fire button F0 works as normal), slow, fast or very fast (where the fire button F0 is turned on and off repeatedly at the selected speed). Stating any assumptions and showing your working where appropriate, design a circuit that implements such a feature.

c In an attempt to prevent counterfeiting, each controller can only be used with the console it was sold with. This protocol is used:

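$$
\begin{array}{lcl}
P & & C \\
c \xleftarrow{\$} \{0, 1\}^3 & & \\
 & \xrightarrow{\;c\;} & \\
 & & r = T(c) \\
 & \xleftarrow{\;r\;} & \\
r \stackrel{?}{=} T(c) & &
\end{array}
$$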

which, in words, means that

• the console generates a random 3-bit number c and sends it to the controller,

• the controller computes a 3-bit result r = T(c) and sends it to the console,

• the console checks that r matches T(c) and assumes the controller is valid if so.




i There is some debate as to whether the protocol should be synchronous or asynchronous; explain what your recommendation would be and why.

ii The function T is simply a look-up table. For example

$$
T(x) = \begin{cases}
2 & \text{if } x = 0 \\
6 & \text{if } x = 1 \\
7 & \text{if } x = 2 \\
1 & \text{if } x = 3 \\
4 & \text{if } x = 4 \\
0 & \text{if } x = 5 \\
5 & \text{if } x = 6 \\
3 & \text{if } x = 7
\end{cases}
$$

Each pair of console and controller has such a T fixed inside them during the manufacturing process. Stating any assumptions and showing your working where appropriate, explain how this T might be implemented as a circuit.

Q21. Imagine you have three Boolean values x, y, and z. Given access to as many AND and OR gates as you want but only two NOT gates, write a set of Boolean expressions to compute all three results ¬x, ¬y and ¬z.

Q22. SAT is the problem of finding an assignment to n Boolean variables which means a given Boolean expression is satisfied, i.e., evaluates to 1. For example, given n = 3 and the expression

(x ∧ y) ∨ ¬z,

x = 1, y = 1, z = 0 is one assignment (amongst several) which solves the associated SAT problem.

The ability to solve SAT can be used to test whether or not two n-input, 1-output combinatorial circuits C1 and C2 are equivalent. Show how this is possible.

Q23. Consider the following combinatorial circuit, which is the composition of four parts (labelled A, B, C and REG): each part is annotated with a name and an associated critical path. The circuit computes an output r = f (x) from the corresponding input x.

[Figure: circuit diagram with parts A (10ns), B (30ns), C (20ns) and REG (10ns), input x and output r = f (x).]

With respect to this circuit,

a first define the terms latency and throughput, then

b explain how and why you would expect use of pipelining to influence both metrics.

Q24. The figure below shows a block of combinatorial logic built from seven parts; the name and latency of each part is displayed inside it. Note that the last part is a register which stores the result:

[Figure: circuit diagram with parts A (40ns), B (10ns), C (30ns), D (10ns), E (50ns), F (10ns) and REG (10ns), input x and output r = f (x).]

It is proposed to pipeline the block of logic using two stages such that there is a pipeline register in between parts D and E:

[Figure: circuit diagram with parts A (40ns), B (10ns), C (30ns), D (10ns), REG (10ns), E (50ns), F (10ns) and REG (10ns), input x and output r = f (x).]

a Explain the terms latency and throughput in relation to the idea of pipelining.

b Calculate the overall latency and throughput of the initial circuit described above.

c Calculate the overall latency and throughput of the circuit after the proposed change.

d Calculate the number of extra pipeline registers required to maximise the circuit throughput; state this new throughput and the associated latency. Explain the advantages and disadvantages of this change.

This is a (large) set of example Boolean minimisation questions: each asks you to transform some truth table describing an n-input Boolean function into a Boolean expression. Each solution includes




1. a reference implementation (produced by forming a SoP expression with a full term for each minterm, i.e., row where r = 1), and

2. a Karnaugh map annotated with sensible groups, and an optimised implementation based on those groups.

The goal is to focus on producing the latter, since the former is somewhat easier. Keep in mind and take care wrt. the following:

• There are $2^{2^n}$ Boolean functions with n inputs (or $3^{2^n}$ if you include don’t-care as a valid output); for small n a complete set of functions is included, whereas for large n there is only a random sub-set.

• No real effort is made to order the questions, and only minor effort to avoid duplicates. That said, there should be no trivial (in the sense r = 1 or r = 0 for all inputs, e.g., tautological) cases.

• The questions and solutions are generated automatically, meaning a small but real chance of bugs in the associated implementation!
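
As a quick check of the counts in the first point above, the case n = 2 can be worked through directly (the arithmetic below is an added illustration rather than part of the question set):

```latex
\left. 2^{2^n} \right|_{n=2} = 2^{4} = 16,
\qquad
\left. 3^{2^n} \right|_{n=2} = 3^{4} = 81.
```

Removing the two constant functions (r = 0 and r = 1 for all inputs) from the 16 leaves 14, which matches the fourteen 2-input problems without don’t-care outputs, Q25 to Q38.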

2-input problems, without don’t-care outputs

Q25.
y z r
0 0 1
0 1 0
1 0 1
1 1 1

Q26.
y z r
0 0 1
0 1 1
1 0 0
1 1 1

Q27.
y z r
0 0 1
0 1 0
1 0 0
1 1 0

Q28.
y z r
0 0 0
0 1 1
1 0 1
1 1 0

Q29.
y z r
0 0 1
0 1 0
1 0 1
1 1 0

Q30.
y z r
0 0 0
0 1 0
1 0 0
1 1 1

Q31.
y z r
0 0 0
0 1 0
1 0 1
1 1 1

Q32.
y z r
0 0 1
0 1 0
1 0 0
1 1 1

Q33.
y z r
0 0 0
0 1 1
1 0 0
1 1 0

Q34.
y z r
0 0 0
0 1 0
1 0 1
1 1 0

Q35.
y z r
0 0 0
0 1 1
1 0 0
1 1 1

Q36.
y z r
0 0 1
0 1 1
1 0 1
1 1 0

Q37.
y z r
0 0 0
0 1 1
1 0 1
1 1 1

Q38.
y z r
0 0 1
0 1 1
1 0 0
1 1 0



2-input problems, with don’t-care outputs

Q39.
y z r
0 0 0
0 1 ?
1 0 0
1 1 1

Q40.
y z r
0 0 0
0 1 1
1 0 1
1 1 ?

Q41.
y z r
0 0 1
0 1 ?
1 0 ?
1 1 0

Q42.
y z r
0 0 0
0 1 ?
1 0 ?
1 1 1

Q43.
y z r
0 0 0
0 1 1
1 0 ?
1 1 1

Q44.
y z r
0 0 1
0 1 ?
1 0 1
1 1 0

Q45.
y z r
0 0 1
0 1 0
1 0 0
1 1 ?

Q46.
y z r
0 0 ?
0 1 1
1 0 0
1 1 1

Q47.
y z r
0 0 0
0 1 0
1 0 1
1 1 ?

Q48.
y z r
0 0 0
0 1 ?
1 0 1
1 1 0

Q49.
y z r
0 0 0
0 1 1
1 0 1
1 1 1

Q50.
y z r
0 0 1
0 1 1
1 0 ?
1 1 0

Q51.
y z r
0 0 1
0 1 ?
1 0 0
1 1 1

Q52.
y z r
0 0 1
0 1 ?
1 0 0
1 1 ?

Q53.
y z r
0 0 0
0 1 0
1 0 0
1 1 1

Q54.
y z r
0 0 0
0 1 0
1 0 ?
1 1 1

Q55.
y z r
0 0 ?
0 1 1
1 0 0
1 1 ?

Q56.
y z r
0 0 0
0 1 1
1 0 ?
1 1 0

Q57.
y z r
0 0 ?
0 1 ?
1 0 0
1 1 1

Q58.
y z r
0 0 1
0 1 0
1 0 1
1 1 0

Q59.
y z r
0 0 0
0 1 1
1 0 0
1 1 0

Q60.
y z r
0 0 0
0 1 1
1 0 1
1 1 0

Q61.
y z r
0 0 ?
0 1 ?
1 0 1
1 1 0

Q62.
y z r
0 0 ?
0 1 0
1 0 1
1 1 ?

Q63.
y z r
0 0 1
0 1 1
1 0 1
1 1 0

Q64.
y z r
0 0 1
0 1 1
1 0 0
1 1 0

Q65.
y z r
0 0 1
0 1 0
1 0 0
1 1 1

Q66.
y z r
0 0 0
0 1 0
1 0 1
1 1 0

Q67.
y z r
0 0 0
0 1 ?
1 0 1
1 1 ?

3-input problems, without don’t-care outputs


Q68.x y z r0 0 0 10 0 1 00 1 0 00 1 1 01 0 0 11 0 1 01 1 0 01 1 1 0

Q69.x y z r0 0 0 10 0 1 10 1 0 00 1 1 01 0 0 01 0 1 11 1 0 01 1 1 1

git # ba293a0e @ 2019-11-14 251



Q70.x y z r0 0 0 00 0 1 00 1 0 10 1 1 11 0 0 01 0 1 11 1 0 01 1 1 0

Q71.x y z r0 0 0 10 0 1 00 1 0 10 1 1 01 0 0 01 0 1 01 1 0 01 1 1 0

Q72.x y z r0 0 0 10 0 1 00 1 0 10 1 1 11 0 0 01 0 1 01 1 0 11 1 1 1

Q73.x y z r0 0 0 00 0 1 00 1 0 00 1 1 01 0 0 11 0 1 01 1 0 11 1 1 0

Q74.x y z r0 0 0 00 0 1 10 1 0 10 1 1 01 0 0 01 0 1 11 1 0 01 1 1 0

git # ba293a0e @ 2019-11-14 252



Q75.x y z r0 0 0 00 0 1 10 1 0 00 1 1 11 0 0 11 0 1 01 1 0 01 1 1 1

Q76.x y z r0 0 0 00 0 1 10 1 0 10 1 1 01 0 0 01 0 1 11 1 0 11 1 1 0

Q77.x y z r0 0 0 00 0 1 00 1 0 10 1 1 11 0 0 01 0 1 11 1 0 11 1 1 1

Q78.x y z r0 0 0 00 0 1 00 1 0 00 1 1 11 0 0 01 0 1 11 1 0 01 1 1 1

Q79.x y z r0 0 0 10 0 1 00 1 0 00 1 1 11 0 0 01 0 1 01 1 0 01 1 1 1

git # ba293a0e @ 2019-11-14 253



Q80.x y z r0 0 0 10 0 1 10 1 0 10 1 1 01 0 0 01 0 1 01 1 0 01 1 1 0

Q81.x y z r0 0 0 10 0 1 10 1 0 00 1 1 01 0 0 11 0 1 11 1 0 01 1 1 1

Q82.x y z r0 0 0 00 0 1 10 1 0 00 1 1 01 0 0 01 0 1 11 1 0 01 1 1 0

Q83.x y z r0 0 0 10 0 1 10 1 0 10 1 1 11 0 0 01 0 1 01 1 0 01 1 1 1

Q84.x y z r0 0 0 00 0 1 00 1 0 00 1 1 01 0 0 11 0 1 01 1 0 01 1 1 1

git # ba293a0e @ 2019-11-14 254



Q85.x y z r0 0 0 10 0 1 00 1 0 00 1 1 11 0 0 11 0 1 11 1 0 11 1 1 0

Q86.x y z r0 0 0 00 0 1 10 1 0 00 1 1 01 0 0 01 0 1 01 1 0 11 1 1 1

Q87.x y z r0 0 0 10 0 1 00 1 0 00 1 1 11 0 0 11 0 1 11 1 0 01 1 1 0

Q88.x y z r0 0 0 10 0 1 10 1 0 10 1 1 01 0 0 11 0 1 11 1 0 01 1 1 0

Q89.x y z r0 0 0 10 0 1 00 1 0 10 1 1 01 0 0 11 0 1 11 1 0 01 1 1 0

git # ba293a0e @ 2019-11-14 255



Q90.x y z r0 0 0 00 0 1 10 1 0 00 1 1 01 0 0 11 0 1 11 1 0 11 1 1 1

Q91.x y z r0 0 0 00 0 1 10 1 0 00 1 1 01 0 0 01 0 1 11 1 0 11 1 1 1

Q92.x y z r0 0 0 00 0 1 00 1 0 10 1 1 01 0 0 11 0 1 11 1 0 11 1 1 1

Q93.x y z r0 0 0 10 0 1 10 1 0 10 1 1 11 0 0 01 0 1 01 1 0 01 1 1 0

Q94.x y z r0 0 0 00 0 1 10 1 0 00 1 1 01 0 0 01 0 1 01 1 0 01 1 1 0

git # ba293a0e @ 2019-11-14 256



Q95.x y z r0 0 0 10 0 1 00 1 0 00 1 1 01 0 0 11 0 1 11 1 0 01 1 1 1

Q96.x y z r0 0 0 00 0 1 00 1 0 10 1 1 11 0 0 11 0 1 01 1 0 11 1 1 1

Q97.x y z r0 0 0 00 0 1 10 1 0 00 1 1 01 0 0 01 0 1 01 1 0 11 1 1 0

Q98.x y z r0 0 0 10 0 1 00 1 0 10 1 1 01 0 0 01 0 1 11 1 0 11 1 1 1

Q99.x y z r0 0 0 00 0 1 00 1 0 10 1 1 01 0 0 01 0 1 11 1 0 01 1 1 1

git # ba293a0e @ 2019-11-14 257



Q100.x y z r0 0 0 10 0 1 00 1 0 10 1 1 11 0 0 11 0 1 11 1 0 01 1 1 1

Q101.x y z r0 0 0 00 0 1 10 1 0 10 1 1 01 0 0 01 0 1 01 1 0 01 1 1 1

Q102.x y z r0 0 0 10 0 1 10 1 0 00 1 1 11 0 0 11 0 1 11 1 0 01 1 1 1

Q103.x y z r0 0 0 00 0 1 00 1 0 10 1 1 01 0 0 11 0 1 11 1 0 01 1 1 0

Q104.x y z r0 0 0 00 0 1 10 1 0 10 1 1 11 0 0 11 0 1 11 1 0 01 1 1 0

git # ba293a0e @ 2019-11-14 258



Q105.x y z r0 0 0 00 0 1 00 1 0 10 1 1 11 0 0 11 0 1 11 1 0 01 1 1 0

Q106.x y z r0 0 0 10 0 1 10 1 0 10 1 1 11 0 0 01 0 1 11 1 0 01 1 1 0

Q107.x y z r0 0 0 10 0 1 10 1 0 00 1 1 01 0 0 11 0 1 01 1 0 11 1 1 0

Q108.x y z r0 0 0 00 0 1 00 1 0 00 1 1 11 0 0 11 0 1 11 1 0 01 1 1 0

Q109.x y z r0 0 0 10 0 1 00 1 0 00 1 1 01 0 0 11 0 1 11 1 0 11 1 1 1

git # ba293a0e @ 2019-11-14 259



Q110.x y z r0 0 0 00 0 1 00 1 0 10 1 1 01 0 0 01 0 1 01 1 0 01 1 1 1

Q111.x y z r0 0 0 00 0 1 10 1 0 10 1 1 01 0 0 11 0 1 01 1 0 11 1 1 1

Q112.x y z r0 0 0 00 0 1 10 1 0 10 1 1 11 0 0 01 0 1 11 1 0 11 1 1 0

Q113.x y z r0 0 0 10 0 1 00 1 0 00 1 1 11 0 0 01 0 1 11 1 0 11 1 1 1

Q114.x y z r0 0 0 10 0 1 10 1 0 00 1 1 11 0 0 01 0 1 11 1 0 01 1 1 1

git # ba293a0e @ 2019-11-14 260



Q115.
x y z r
0 0 0 1
0 0 1 1
0 1 0 1
0 1 1 0
1 0 0 1
1 0 1 0
1 1 0 0
1 1 1 1

Q116.
x y z r
0 0 0 0
0 0 1 1
0 1 0 1
0 1 1 1
1 0 0 1
1 0 1 1
1 1 0 0
1 1 1 1

Q117.
x y z r
0 0 0 1
0 0 1 1
0 1 0 1
0 1 1 0
1 0 0 0
1 0 1 1
1 1 0 0
1 1 1 0

3-input problems, with don’t-care outputs


Q118.
x y z r
0 0 0 ?
0 0 1 ?
0 1 0 0
0 1 1 ?
1 0 0 0
1 0 1 1
1 1 0 1
1 1 1 0

Q119.
x y z r
0 0 0 0
0 0 1 1
0 1 0 0
0 1 1 ?
1 0 0 ?
1 0 1 ?
1 1 0 ?
1 1 1 1

Q120.
x y z r
0 0 0 1
0 0 1 0
0 1 0 1
0 1 1 1
1 0 0 1
1 0 1 0
1 1 0 ?
1 1 1 0

Q121.
x y z r
0 0 0 ?
0 0 1 0
0 1 0 ?
0 1 1 ?
1 0 0 1
1 0 1 ?
1 1 0 ?
1 1 1 ?

Q122.
x y z r
0 0 0 0
0 0 1 ?
0 1 0 ?
0 1 1 0
1 0 0 ?
1 0 1 1
1 1 0 1
1 1 1 1

Q123.
x y z r
0 0 0 ?
0 0 1 0
0 1 0 ?
0 1 1 1
1 0 0 ?
1 0 1 1
1 1 0 ?
1 1 1 1

Q124.
x y z r
0 0 0 1
0 0 1 0
0 1 0 ?
0 1 1 ?
1 0 0 1
1 0 1 0
1 1 0 0
1 1 1 0

Q125.
x y z r
0 0 0 0
0 0 1 0
0 1 0 ?
0 1 1 ?
1 0 0 ?
1 0 1 ?
1 1 0 0
1 1 1 1

Q126.
x y z r
0 0 0 0
0 0 1 1
0 1 0 ?
0 1 1 0
1 0 0 ?
1 0 1 ?
1 1 0 1
1 1 1 1

Q127.
x y z r
0 0 0 ?
0 0 1 1
0 1 0 0
0 1 1 1
1 0 0 ?
1 0 1 0
1 1 0 ?
1 1 1 ?

Q128.
x y z r
0 0 0 0
0 0 1 0
0 1 0 0
0 1 1 1
1 0 0 ?
1 0 1 0
1 1 0 0
1 1 1 0

Q129.
x y z r
0 0 0 0
0 0 1 1
0 1 0 1
0 1 1 ?
1 0 0 ?
1 0 1 ?
1 1 0 ?
1 1 1 1

Q130.
x y z r
0 0 0 ?
0 0 1 1
0 1 0 ?
0 1 1 1
1 0 0 0
1 0 1 ?
1 1 0 0
1 1 1 ?

Q131.
x y z r
0 0 0 0
0 0 1 ?
0 1 0 1
0 1 1 0
1 0 0 1
1 0 1 1
1 1 0 1
1 1 1 ?

Q132.
x y z r
0 0 0 0
0 0 1 ?
0 1 0 ?
0 1 1 1
1 0 0 0
1 0 1 ?
1 1 0 0
1 1 1 ?

Q133.
x y z r
0 0 0 0
0 0 1 0
0 1 0 1
0 1 1 0
1 0 0 0
1 0 1 0
1 1 0 ?
1 1 1 0

Q134.
x y z r
0 0 0 ?
0 0 1 0
0 1 0 0
0 1 1 ?
1 0 0 ?
1 0 1 1
1 1 0 1
1 1 1 0

Q135.
x y z r
0 0 0 0
0 0 1 1
0 1 0 1
0 1 1 ?
1 0 0 1
1 0 1 0
1 1 0 1
1 1 1 0

Q136.
x y z r
0 0 0 ?
0 0 1 ?
0 1 0 ?
0 1 1 ?
1 0 0 ?
1 0 1 0
1 1 0 1
1 1 1 1

Q137.
x y z r
0 0 0 0
0 0 1 0
0 1 0 1
0 1 1 1
1 0 0 ?
1 0 1 1
1 1 0 0
1 1 1 1

Q138.
x y z r
0 0 0 1
0 0 1 ?
0 1 0 ?
0 1 1 0
1 0 0 ?
1 0 1 0
1 1 0 0
1 1 1 0

Q139.
x y z r
0 0 0 1
0 0 1 1
0 1 0 ?
0 1 1 ?
1 0 0 0
1 0 1 0
1 1 0 ?
1 1 1 1



Q140.
x y z r
0 0 0 1
0 0 1 1
0 1 0 1
0 1 1 0
1 0 0 ?
1 0 1 1
1 1 0 ?
1 1 1 ?

Q141.
x y z r
0 0 0 0
0 0 1 ?
0 1 0 ?
0 1 1 0
1 0 0 1
1 0 1 ?
1 1 0 0
1 1 1 1

Q142.
x y z r
0 0 0 1
0 0 1 0
0 1 0 1
0 1 1 1
1 0 0 ?
1 0 1 0
1 1 0 ?
1 1 1 1

Q143.
x y z r
0 0 0 1
0 0 1 ?
0 1 0 ?
0 1 1 0
1 0 0 ?
1 0 1 1
1 1 0 ?
1 1 1 ?

Q144.
x y z r
0 0 0 1
0 0 1 1
0 1 0 0
0 1 1 1
1 0 0 0
1 0 1 1
1 1 0 ?
1 1 1 ?

Q145.
x y z r
0 0 0 0
0 0 1 1
0 1 0 1
0 1 1 0
1 0 0 1
1 0 1 ?
1 1 0 0
1 1 1 1

Q146.
x y z r
0 0 0 ?
0 0 1 ?
0 1 0 ?
0 1 1 ?
1 0 0 1
1 0 1 1
1 1 0 0
1 1 1 1

Q147.
x y z r
0 0 0 ?
0 0 1 0
0 1 0 1
0 1 1 0
1 0 0 0
1 0 1 ?
1 1 0 ?
1 1 1 1

Q148.
x y z r
0 0 0 ?
0 0 1 ?
0 1 0 1
0 1 1 1
1 0 0 ?
1 0 1 1
1 1 0 ?
1 1 1 0

Q149.
x y z r
0 0 0 1
0 0 1 0
0 1 0 1
0 1 1 1
1 0 0 ?
1 0 1 0
1 1 0 0
1 1 1 1

Q150.
x y z r
0 0 0 0
0 0 1 0
0 1 0 0
0 1 1 ?
1 0 0 0
1 0 1 1
1 1 0 1
1 1 1 0

Q151.
x y z r
0 0 0 ?
0 0 1 1
0 1 0 0
0 1 1 0
1 0 0 ?
1 0 1 ?
1 1 0 1
1 1 1 0

Q152.
x y z r
0 0 0 1
0 0 1 1
0 1 0 ?
0 1 1 1
1 0 0 ?
1 0 1 0
1 1 0 ?
1 1 1 0

Q153.
x y z r
0 0 0 1
0 0 1 0
0 1 0 1
0 1 1 ?
1 0 0 ?
1 0 1 1
1 1 0 ?
1 1 1 1

Q154.
x y z r
0 0 0 ?
0 0 1 ?
0 1 0 ?
0 1 1 1
1 0 0 1
1 0 1 0
1 1 0 ?
1 1 1 1

Q155.
x y z r
0 0 0 ?
0 0 1 0
0 1 0 ?
0 1 1 1
1 0 0 0
1 0 1 0
1 1 0 0
1 1 1 0

Q156.
x y z r
0 0 0 ?
0 0 1 0
0 1 0 1
0 1 1 1
1 0 0 0
1 0 1 1
1 1 0 ?
1 1 1 1

Q157.
x y z r
0 0 0 ?
0 0 1 1
0 1 0 0
0 1 1 1
1 0 0 1
1 0 1 1
1 1 0 0
1 1 1 1

Q158.
x y z r
0 0 0 ?
0 0 1 ?
0 1 0 0
0 1 1 1
1 0 0 1
1 0 1 0
1 1 0 ?
1 1 1 0

Q159.
x y z r
0 0 0 0
0 0 1 ?
0 1 0 1
0 1 1 0
1 0 0 1
1 0 1 0
1 1 0 ?
1 1 1 ?

Q160.
x y z r
0 0 0 1
0 0 1 1
0 1 0 1
0 1 1 0
1 0 0 ?
1 0 1 ?
1 1 0 0
1 1 1 0

Q161.
x y z r
0 0 0 1
0 0 1 0
0 1 0 ?
0 1 1 ?
1 0 0 ?
1 0 1 0
1 1 0 ?
1 1 1 ?

4-input problems, without don’t-care outputs


Q162.w x y z r0 0 0 0 10 0 0 1 00 0 1 0 00 0 1 1 00 1 0 0 00 1 0 1 00 1 1 0 00 1 1 1 01 0 0 0 11 0 0 1 01 0 1 0 01 0 1 1 11 1 0 0 11 1 0 1 11 1 1 0 11 1 1 1 1

Q163.w x y z r0 0 0 0 10 0 0 1 00 0 1 0 10 0 1 1 00 1 0 0 00 1 0 1 00 1 1 0 10 1 1 1 01 0 0 0 11 0 0 1 01 0 1 0 01 0 1 1 01 1 0 0 11 1 0 1 11 1 1 0 01 1 1 1 1

git # ba293a0e @ 2019-11-14 270



Q164.

w x y z r0 0 0 0 10 0 0 1 10 0 1 0 10 0 1 1 00 1 0 0 00 1 0 1 00 1 1 0 10 1 1 1 11 0 0 0 01 0 0 1 11 0 1 0 01 0 1 1 01 1 0 0 11 1 0 1 11 1 1 0 01 1 1 1 0

Q165.

w x y z r0 0 0 0 10 0 0 1 10 0 1 0 10 0 1 1 00 1 0 0 00 1 0 1 00 1 1 0 10 1 1 1 11 0 0 0 01 0 0 1 11 0 1 0 01 0 1 1 01 1 0 0 11 1 0 1 11 1 1 0 01 1 1 1 1

Q166.

w x y z r0 0 0 0 10 0 0 1 00 0 1 0 10 0 1 1 00 1 0 0 10 1 0 1 00 1 1 0 00 1 1 1 11 0 0 0 11 0 0 1 01 0 1 0 11 0 1 1 01 1 0 0 11 1 0 1 11 1 1 0 11 1 1 1 1

git # ba293a0e @ 2019-11-14 271



Q167.

w x y z r0 0 0 0 10 0 0 1 10 0 1 0 10 0 1 1 10 1 0 0 10 1 0 1 00 1 1 0 10 1 1 1 11 0 0 0 11 0 0 1 11 0 1 0 01 0 1 1 11 1 0 0 01 1 0 1 01 1 1 0 01 1 1 1 1

Q168.

w x y z r0 0 0 0 10 0 0 1 10 0 1 0 10 0 1 1 10 1 0 0 10 1 0 1 10 1 1 0 00 1 1 1 01 0 0 0 11 0 0 1 01 0 1 0 01 0 1 1 01 1 0 0 11 1 0 1 11 1 1 0 11 1 1 1 0

Q169.

w x y z r0 0 0 0 00 0 0 1 10 0 1 0 00 0 1 1 00 1 0 0 10 1 0 1 00 1 1 0 10 1 1 1 01 0 0 0 11 0 0 1 01 0 1 0 11 0 1 1 11 1 0 0 01 1 0 1 11 1 1 0 11 1 1 1 0

git # ba293a0e @ 2019-11-14 272



Q170.

w x y z r0 0 0 0 10 0 0 1 10 0 1 0 10 0 1 1 00 1 0 0 00 1 0 1 10 1 1 0 10 1 1 1 01 0 0 0 01 0 0 1 11 0 1 0 01 0 1 1 01 1 0 0 11 1 0 1 11 1 1 0 01 1 1 1 0

Q171.

w x y z r0 0 0 0 00 0 0 1 10 0 1 0 10 0 1 1 10 1 0 0 10 1 0 1 00 1 1 0 10 1 1 1 01 0 0 0 01 0 0 1 11 0 1 0 01 0 1 1 11 1 0 0 11 1 0 1 01 1 1 0 11 1 1 1 0

Q172.

w x y z r0 0 0 0 00 0 0 1 10 0 1 0 00 0 1 1 00 1 0 0 10 1 0 1 10 1 1 0 10 1 1 1 01 0 0 0 01 0 0 1 01 0 1 0 11 0 1 1 11 1 0 0 11 1 0 1 01 1 1 0 01 1 1 1 1




Q173.

w x y z r0 0 0 0 10 0 0 1 00 0 1 0 00 0 1 1 00 1 0 0 10 1 0 1 00 1 1 0 00 1 1 1 11 0 0 0 01 0 0 1 01 0 1 0 11 0 1 1 01 1 0 0 11 1 0 1 01 1 1 0 11 1 1 1 1

Q174.

w x y z r0 0 0 0 00 0 0 1 00 0 1 0 10 0 1 1 00 1 0 0 00 1 0 1 10 1 1 0 10 1 1 1 11 0 0 0 01 0 0 1 01 0 1 0 11 0 1 1 11 1 0 0 01 1 0 1 11 1 1 0 11 1 1 1 0

Q175.

w x y z r0 0 0 0 00 0 0 1 00 0 1 0 10 0 1 1 10 1 0 0 00 1 0 1 10 1 1 0 10 1 1 1 11 0 0 0 11 0 0 1 11 0 1 0 01 0 1 1 01 1 0 0 01 1 0 1 11 1 1 0 01 1 1 1 0




Q176.

w x y z r0 0 0 0 10 0 0 1 00 0 1 0 00 0 1 1 00 1 0 0 00 1 0 1 10 1 1 0 10 1 1 1 01 0 0 0 01 0 0 1 01 0 1 0 01 0 1 1 11 1 0 0 11 1 0 1 01 1 1 0 11 1 1 1 1

Q177.

w x y z r0 0 0 0 00 0 0 1 10 0 1 0 00 0 1 1 10 1 0 0 00 1 0 1 00 1 1 0 10 1 1 1 01 0 0 0 01 0 0 1 01 0 1 0 11 0 1 1 01 1 0 0 11 1 0 1 01 1 1 0 01 1 1 1 1

Q178.

w x y z r0 0 0 0 10 0 0 1 00 0 1 0 00 0 1 1 10 1 0 0 10 1 0 1 10 1 1 0 10 1 1 1 11 0 0 0 11 0 0 1 11 0 1 0 11 0 1 1 11 1 0 0 11 1 0 1 11 1 1 0 11 1 1 1 0




Q179.

w x y z r0 0 0 0 10 0 0 1 00 0 1 0 00 0 1 1 10 1 0 0 10 1 0 1 00 1 1 0 00 1 1 1 01 0 0 0 11 0 0 1 01 0 1 0 01 0 1 1 11 1 0 0 01 1 0 1 11 1 1 0 01 1 1 1 1

Q180.

w x y z r0 0 0 0 10 0 0 1 10 0 1 0 00 0 1 1 10 1 0 0 00 1 0 1 00 1 1 0 00 1 1 1 01 0 0 0 01 0 0 1 01 0 1 0 11 0 1 1 11 1 0 0 11 1 0 1 01 1 1 0 11 1 1 1 0

Q181.

w x y z r0 0 0 0 10 0 0 1 00 0 1 0 00 0 1 1 10 1 0 0 10 1 0 1 10 1 1 0 00 1 1 1 01 0 0 0 01 0 0 1 01 0 1 0 01 0 1 1 11 1 0 0 01 1 0 1 11 1 1 0 01 1 1 1 0




Q182.

w x y z r0 0 0 0 10 0 0 1 10 0 1 0 00 0 1 1 00 1 0 0 00 1 0 1 10 1 1 0 00 1 1 1 01 0 0 0 01 0 0 1 01 0 1 0 01 0 1 1 11 1 0 0 01 1 0 1 01 1 1 0 11 1 1 1 1

Q183.

w x y z r0 0 0 0 10 0 0 1 00 0 1 0 00 0 1 1 10 1 0 0 10 1 0 1 00 1 1 0 00 1 1 1 01 0 0 0 01 0 0 1 01 0 1 0 01 0 1 1 01 1 0 0 01 1 0 1 11 1 1 0 01 1 1 1 1

Q184.

w x y z r0 0 0 0 00 0 0 1 10 0 1 0 10 0 1 1 10 1 0 0 00 1 0 1 00 1 1 0 10 1 1 1 11 0 0 0 01 0 0 1 01 0 1 0 01 0 1 1 01 1 0 0 01 1 0 1 11 1 1 0 01 1 1 1 0




Q185.

w x y z r0 0 0 0 00 0 0 1 10 0 1 0 10 0 1 1 10 1 0 0 00 1 0 1 00 1 1 0 10 1 1 1 11 0 0 0 01 0 0 1 01 0 1 0 01 0 1 1 11 1 0 0 11 1 0 1 01 1 1 0 11 1 1 1 0

Q186.

w x y z r0 0 0 0 10 0 0 1 10 0 1 0 10 0 1 1 00 1 0 0 10 1 0 1 10 1 1 0 10 1 1 1 01 0 0 0 11 0 0 1 11 0 1 0 11 0 1 1 11 1 0 0 01 1 0 1 01 1 1 0 11 1 1 1 0

Q187.

w x y z r0 0 0 0 00 0 0 1 10 0 1 0 00 0 1 1 00 1 0 0 00 1 0 1 10 1 1 0 10 1 1 1 11 0 0 0 01 0 0 1 11 0 1 0 11 0 1 1 01 1 0 0 01 1 0 1 11 1 1 0 01 1 1 1 0




Q188.

w x y z r0 0 0 0 10 0 0 1 10 0 1 0 10 0 1 1 10 1 0 0 10 1 0 1 10 1 1 0 00 1 1 1 01 0 0 0 01 0 0 1 11 0 1 0 01 0 1 1 01 1 0 0 11 1 0 1 11 1 1 0 01 1 1 1 0

Q189.

w x y z r0 0 0 0 00 0 0 1 10 0 1 0 10 0 1 1 10 1 0 0 10 1 0 1 00 1 1 0 10 1 1 1 01 0 0 0 01 0 0 1 11 0 1 0 11 0 1 1 01 1 0 0 11 1 0 1 11 1 1 0 11 1 1 1 1

Q190.

w x y z r0 0 0 0 10 0 0 1 00 0 1 0 00 0 1 1 10 1 0 0 00 1 0 1 10 1 1 0 00 1 1 1 11 0 0 0 11 0 0 1 01 0 1 0 11 0 1 1 11 1 0 0 01 1 0 1 01 1 1 0 01 1 1 1 1




Q191.

w x y z r0 0 0 0 10 0 0 1 00 0 1 0 10 0 1 1 00 1 0 0 00 1 0 1 00 1 1 0 10 1 1 1 11 0 0 0 01 0 0 1 11 0 1 0 01 0 1 1 01 1 0 0 01 1 0 1 01 1 1 0 01 1 1 1 0

Q192.

w x y z r0 0 0 0 10 0 0 1 10 0 1 0 00 0 1 1 10 1 0 0 00 1 0 1 10 1 1 0 10 1 1 1 11 0 0 0 11 0 0 1 01 0 1 0 01 0 1 1 01 1 0 0 11 1 0 1 01 1 1 0 11 1 1 1 1

Q193.

w x y z r0 0 0 0 00 0 0 1 10 0 1 0 00 0 1 1 10 1 0 0 00 1 0 1 10 1 1 0 00 1 1 1 01 0 0 0 01 0 0 1 11 0 1 0 11 0 1 1 01 1 0 0 01 1 0 1 11 1 1 0 01 1 1 1 1




Q194.

w x y z r0 0 0 0 10 0 0 1 10 0 1 0 10 0 1 1 00 1 0 0 00 1 0 1 00 1 1 0 10 1 1 1 01 0 0 0 11 0 0 1 11 0 1 0 01 0 1 1 01 1 0 0 11 1 0 1 11 1 1 0 11 1 1 1 1

Q195.

w x y z r0 0 0 0 10 0 0 1 10 0 1 0 10 0 1 1 10 1 0 0 00 1 0 1 10 1 1 0 00 1 1 1 11 0 0 0 01 0 0 1 01 0 1 0 01 0 1 1 11 1 0 0 11 1 0 1 01 1 1 0 01 1 1 1 1

Q196.

w x y z r0 0 0 0 00 0 0 1 00 0 1 0 10 0 1 1 10 1 0 0 00 1 0 1 00 1 1 0 10 1 1 1 01 0 0 0 11 0 0 1 11 0 1 0 01 0 1 1 01 1 0 0 11 1 0 1 01 1 1 0 11 1 1 1 1




Q197.

w x y z r0 0 0 0 10 0 0 1 10 0 1 0 00 0 1 1 00 1 0 0 00 1 0 1 10 1 1 0 10 1 1 1 11 0 0 0 01 0 0 1 11 0 1 0 01 0 1 1 01 1 0 0 01 1 0 1 11 1 1 0 11 1 1 1 1

Q198.

w x y z r0 0 0 0 00 0 0 1 10 0 1 0 00 0 1 1 00 1 0 0 00 1 0 1 10 1 1 0 00 1 1 1 01 0 0 0 11 0 0 1 11 0 1 0 11 0 1 1 11 1 0 0 11 1 0 1 01 1 1 0 11 1 1 1 1

Q199.

w x y z r0 0 0 0 10 0 0 1 00 0 1 0 10 0 1 1 10 1 0 0 10 1 0 1 10 1 1 0 00 1 1 1 11 0 0 0 01 0 0 1 11 0 1 0 11 0 1 1 01 1 0 0 01 1 0 1 01 1 1 0 11 1 1 1 1




Q200.

w x y z r0 0 0 0 10 0 0 1 00 0 1 0 10 0 1 1 00 1 0 0 10 1 0 1 10 1 1 0 00 1 1 1 11 0 0 0 11 0 0 1 01 0 1 0 11 0 1 1 11 1 0 0 01 1 0 1 11 1 1 0 01 1 1 1 1

Q201.

w x y z r0 0 0 0 00 0 0 1 10 0 1 0 00 0 1 1 10 1 0 0 10 1 0 1 10 1 1 0 00 1 1 1 11 0 0 0 01 0 0 1 01 0 1 0 01 0 1 1 01 1 0 0 11 1 0 1 01 1 1 0 01 1 1 1 0

Q202.

w x y z r0 0 0 0 00 0 0 1 10 0 1 0 00 0 1 1 00 1 0 0 10 1 0 1 10 1 1 0 00 1 1 1 11 0 0 0 11 0 0 1 11 0 1 0 01 0 1 1 11 1 0 0 01 1 0 1 01 1 1 0 01 1 1 1 1




Q203.

w x y z r0 0 0 0 00 0 0 1 10 0 1 0 10 0 1 1 10 1 0 0 00 1 0 1 10 1 1 0 10 1 1 1 11 0 0 0 01 0 0 1 01 0 1 0 11 0 1 1 11 1 0 0 11 1 0 1 11 1 1 0 11 1 1 1 0

Q204.

w x y z r0 0 0 0 10 0 0 1 00 0 1 0 00 0 1 1 00 1 0 0 00 1 0 1 10 1 1 0 10 1 1 1 11 0 0 0 01 0 0 1 11 0 1 0 11 0 1 1 01 1 0 0 01 1 0 1 01 1 1 0 01 1 1 1 1

Q205.

w x y z r0 0 0 0 00 0 0 1 10 0 1 0 00 0 1 1 00 1 0 0 10 1 0 1 10 1 1 0 00 1 1 1 11 0 0 0 01 0 0 1 11 0 1 0 11 0 1 1 11 1 0 0 11 1 0 1 11 1 1 0 11 1 1 1 0




Q206.

w x y z r0 0 0 0 10 0 0 1 10 0 1 0 00 0 1 1 00 1 0 0 10 1 0 1 10 1 1 0 00 1 1 1 01 0 0 0 01 0 0 1 01 0 1 0 11 0 1 1 01 1 0 0 01 1 0 1 01 1 1 0 11 1 1 1 1

Q207.

w x y z r0 0 0 0 10 0 0 1 00 0 1 0 10 0 1 1 00 1 0 0 10 1 0 1 00 1 1 0 10 1 1 1 01 0 0 0 11 0 0 1 01 0 1 0 11 0 1 1 11 1 0 0 01 1 0 1 11 1 1 0 01 1 1 1 1

Q208.

w x y z r0 0 0 0 10 0 0 1 10 0 1 0 10 0 1 1 00 1 0 0 00 1 0 1 10 1 1 0 00 1 1 1 11 0 0 0 11 0 0 1 01 0 1 0 11 0 1 1 01 1 0 0 01 1 0 1 01 1 1 0 01 1 1 1 1




Q209.

w x y z r0 0 0 0 10 0 0 1 10 0 1 0 10 0 1 1 00 1 0 0 00 1 0 1 10 1 1 0 00 1 1 1 01 0 0 0 01 0 0 1 11 0 1 0 11 0 1 1 01 1 0 0 01 1 0 1 11 1 1 0 11 1 1 1 0

Q210.

w x y z r0 0 0 0 10 0 0 1 00 0 1 0 10 0 1 1 10 1 0 0 00 1 0 1 10 1 1 0 00 1 1 1 11 0 0 0 11 0 0 1 11 0 1 0 01 0 1 1 11 1 0 0 01 1 0 1 11 1 1 0 11 1 1 1 0

Q211.

w x y z r0 0 0 0 10 0 0 1 00 0 1 0 00 0 1 1 10 1 0 0 00 1 0 1 10 1 1 0 00 1 1 1 11 0 0 0 11 0 0 1 01 0 1 0 01 0 1 1 01 1 0 0 11 1 0 1 01 1 1 0 11 1 1 1 0





Q212.w x y z r0 0 0 0 ?0 0 0 1 ?0 0 1 0 ?0 0 1 1 ?0 1 0 0 00 1 0 1 ?0 1 1 0 ?0 1 1 1 11 0 0 0 11 0 0 1 01 0 1 0 11 0 1 1 01 1 0 0 ?1 1 0 1 11 1 1 0 ?1 1 1 1 1

Q213.w x y z r0 0 0 0 00 0 0 1 00 0 1 0 00 0 1 1 10 1 0 0 10 1 0 1 00 1 1 0 10 1 1 1 01 0 0 0 11 0 0 1 01 0 1 0 11 0 1 1 01 1 0 0 ?1 1 0 1 11 1 1 0 01 1 1 1 1

Q214.w x y z r0 0 0 0 ?0 0 0 1 00 0 1 0 10 0 1 1 ?0 1 0 0 00 1 0 1 10 1 1 0 ?0 1 1 1 01 0 0 0 11 0 0 1 11 0 1 0 11 0 1 1 01 1 0 0 ?1 1 0 1 11 1 1 0 01 1 1 1 1




Q215.

w x y z r0 0 0 0 ?0 0 0 1 ?0 0 1 0 10 0 1 1 10 1 0 0 10 1 0 1 ?0 1 1 0 00 1 1 1 01 0 0 0 ?1 0 0 1 ?1 0 1 0 11 0 1 1 01 1 0 0 01 1 0 1 11 1 1 0 ?1 1 1 1 1

Q216.

w x y z r0 0 0 0 00 0 0 1 10 0 1 0 ?0 0 1 1 10 1 0 0 10 1 0 1 00 1 1 0 10 1 1 1 11 0 0 0 01 0 0 1 11 0 1 0 01 0 1 1 ?1 1 0 0 ?1 1 0 1 11 1 1 0 11 1 1 1 1

Q217.

w x y z r0 0 0 0 10 0 0 1 ?0 0 1 0 ?0 0 1 1 00 1 0 0 10 1 0 1 ?0 1 1 0 10 1 1 1 11 0 0 0 01 0 0 1 01 0 1 0 11 0 1 1 11 1 0 0 01 1 0 1 11 1 1 0 01 1 1 1 1




Q218.

w x y z r0 0 0 0 10 0 0 1 ?0 0 1 0 10 0 1 1 ?0 1 0 0 ?0 1 0 1 10 1 1 0 10 1 1 1 11 0 0 0 01 0 0 1 11 0 1 0 ?1 0 1 1 ?1 1 0 0 11 1 0 1 01 1 1 0 ?1 1 1 1 ?

Q219.

w x y z r0 0 0 0 10 0 0 1 10 0 1 0 00 0 1 1 ?0 1 0 0 00 1 0 1 ?0 1 1 0 ?0 1 1 1 01 0 0 0 11 0 0 1 11 0 1 0 11 0 1 1 11 1 0 0 ?1 1 0 1 ?1 1 1 0 ?1 1 1 1 ?

Q220.

w x y z r0 0 0 0 10 0 0 1 ?0 0 1 0 00 0 1 1 10 1 0 0 10 1 0 1 00 1 1 0 00 1 1 1 01 0 0 0 01 0 0 1 11 0 1 0 ?1 0 1 1 01 1 0 0 ?1 1 0 1 11 1 1 0 11 1 1 1 1




Q221.

w x y z r0 0 0 0 00 0 0 1 10 0 1 0 ?0 0 1 1 ?0 1 0 0 10 1 0 1 00 1 1 0 00 1 1 1 01 0 0 0 01 0 0 1 ?1 0 1 0 ?1 0 1 1 01 1 0 0 01 1 0 1 01 1 1 0 01 1 1 1 ?

Q222.

w x y z r0 0 0 0 00 0 0 1 00 0 1 0 10 0 1 1 10 1 0 0 10 1 0 1 00 1 1 0 10 1 1 1 11 0 0 0 ?1 0 0 1 01 0 1 0 01 0 1 1 ?1 1 0 0 11 1 0 1 11 1 1 0 ?1 1 1 1 1

Q223.

w x y z r0 0 0 0 ?0 0 0 1 10 0 1 0 00 0 1 1 10 1 0 0 ?0 1 0 1 10 1 1 0 00 1 1 1 11 0 0 0 ?1 0 0 1 11 0 1 0 11 0 1 1 ?1 1 0 0 ?1 1 0 1 ?1 1 1 0 01 1 1 1 0




Q224.

w x y z r0 0 0 0 00 0 0 1 10 0 1 0 00 0 1 1 10 1 0 0 00 1 0 1 ?0 1 1 0 00 1 1 1 ?1 0 0 0 01 0 0 1 ?1 0 1 0 01 0 1 1 11 1 0 0 01 1 0 1 11 1 1 0 01 1 1 1 ?

Q225.

w x y z r0 0 0 0 ?0 0 0 1 10 0 1 0 00 0 1 1 00 1 0 0 00 1 0 1 ?0 1 1 0 00 1 1 1 ?1 0 0 0 11 0 0 1 ?1 0 1 0 11 0 1 1 ?1 1 0 0 11 1 0 1 11 1 1 0 01 1 1 1 0

Q226.

w x y z r0 0 0 0 10 0 0 1 00 0 1 0 10 0 1 1 00 1 0 0 10 1 0 1 00 1 1 0 10 1 1 1 11 0 0 0 ?1 0 0 1 11 0 1 0 01 0 1 1 11 1 0 0 ?1 1 0 1 01 1 1 0 ?1 1 1 1 1




Q227.

w x y z r0 0 0 0 00 0 0 1 00 0 1 0 ?0 0 1 1 00 1 0 0 10 1 0 1 10 1 1 0 ?0 1 1 1 11 0 0 0 11 0 0 1 11 0 1 0 11 0 1 1 01 1 0 0 11 1 0 1 ?1 1 1 0 01 1 1 1 1

Q228.

w x y z r0 0 0 0 10 0 0 1 00 0 1 0 00 0 1 1 10 1 0 0 ?0 1 0 1 00 1 1 0 10 1 1 1 ?1 0 0 0 ?1 0 0 1 11 0 1 0 ?1 0 1 1 ?1 1 0 0 ?1 1 0 1 ?1 1 1 0 01 1 1 1 0

Q229.

w x y z r0 0 0 0 ?0 0 0 1 00 0 1 0 00 0 1 1 10 1 0 0 00 1 0 1 ?0 1 1 0 00 1 1 1 01 0 0 0 01 0 0 1 ?1 0 1 0 01 0 1 1 11 1 0 0 11 1 0 1 11 1 1 0 11 1 1 1 ?




Q230.

w x y z r0 0 0 0 00 0 0 1 00 0 1 0 10 0 1 1 ?0 1 0 0 10 1 0 1 10 1 1 0 10 1 1 1 01 0 0 0 11 0 0 1 11 0 1 0 ?1 0 1 1 ?1 1 0 0 01 1 0 1 11 1 1 0 ?1 1 1 1 ?

Q231.

w x y z r0 0 0 0 10 0 0 1 ?0 0 1 0 ?0 0 1 1 10 1 0 0 00 1 0 1 ?0 1 1 0 00 1 1 1 01 0 0 0 01 0 0 1 11 0 1 0 ?1 0 1 1 ?1 1 0 0 ?1 1 0 1 11 1 1 0 ?1 1 1 1 ?

Q232.

w x y z r0 0 0 0 10 0 0 1 ?0 0 1 0 10 0 1 1 10 1 0 0 ?0 1 0 1 00 1 1 0 00 1 1 1 01 0 0 0 01 0 0 1 ?1 0 1 0 11 0 1 1 01 1 0 0 ?1 1 0 1 01 1 1 0 01 1 1 1 0




Q233.

w x y z r0 0 0 0 ?0 0 0 1 00 0 1 0 10 0 1 1 00 1 0 0 10 1 0 1 00 1 1 0 10 1 1 1 01 0 0 0 01 0 0 1 11 0 1 0 11 0 1 1 11 1 0 0 ?1 1 0 1 ?1 1 1 0 01 1 1 1 ?

Q234.

w x y z r0 0 0 0 10 0 0 1 ?0 0 1 0 10 0 1 1 00 1 0 0 00 1 0 1 10 1 1 0 ?0 1 1 1 01 0 0 0 11 0 0 1 11 0 1 0 11 0 1 1 01 1 0 0 01 1 0 1 ?1 1 1 0 ?1 1 1 1 0

Q235.

w x y z r0 0 0 0 ?0 0 0 1 ?0 0 1 0 ?0 0 1 1 ?0 1 0 0 00 1 0 1 10 1 1 0 ?0 1 1 1 ?1 0 0 0 01 0 0 1 ?1 0 1 0 11 0 1 1 ?1 1 0 0 11 1 0 1 11 1 1 0 ?1 1 1 1 1




Q236.

w x y z r0 0 0 0 10 0 0 1 00 0 1 0 10 0 1 1 00 1 0 0 00 1 0 1 00 1 1 0 00 1 1 1 01 0 0 0 11 0 0 1 01 0 1 0 01 0 1 1 01 1 0 0 01 1 0 1 01 1 1 0 11 1 1 1 0

Q237.

w x y z r0 0 0 0 00 0 0 1 00 0 1 0 ?0 0 1 1 00 1 0 0 ?0 1 0 1 00 1 1 0 00 1 1 1 11 0 0 0 11 0 0 1 11 0 1 0 ?1 0 1 1 11 1 0 0 01 1 0 1 ?1 1 1 0 01 1 1 1 0

Q238.

w x y z r0 0 0 0 10 0 0 1 00 0 1 0 00 0 1 1 10 1 0 0 ?0 1 0 1 10 1 1 0 10 1 1 1 11 0 0 0 ?1 0 0 1 11 0 1 0 11 0 1 1 ?1 1 0 0 01 1 0 1 11 1 1 0 01 1 1 1 1




Q239.

w x y z r0 0 0 0 10 0 0 1 10 0 1 0 10 0 1 1 10 1 0 0 10 1 0 1 00 1 1 0 10 1 1 1 01 0 0 0 11 0 0 1 01 0 1 0 01 0 1 1 ?1 1 0 0 ?1 1 0 1 11 1 1 0 01 1 1 1 0

Q240.

w x y z r0 0 0 0 10 0 0 1 00 0 1 0 ?0 0 1 1 00 1 0 0 00 1 0 1 ?0 1 1 0 00 1 1 1 11 0 0 0 01 0 0 1 ?1 0 1 0 01 0 1 1 01 1 0 0 ?1 1 0 1 01 1 1 0 11 1 1 1 0

Q241.

w x y z r0 0 0 0 10 0 0 1 10 0 1 0 10 0 1 1 00 1 0 0 00 1 0 1 00 1 1 0 00 1 1 1 01 0 0 0 ?1 0 0 1 01 0 1 0 01 0 1 1 ?1 1 0 0 11 1 0 1 11 1 1 0 01 1 1 1 0




Q242.

w x y z r0 0 0 0 ?0 0 0 1 ?0 0 1 0 ?0 0 1 1 00 1 0 0 10 1 0 1 ?0 1 1 0 ?0 1 1 1 11 0 0 0 11 0 0 1 11 0 1 0 ?1 0 1 1 11 1 0 0 01 1 0 1 11 1 1 0 11 1 1 1 1

Q243.

w x y z r0 0 0 0 00 0 0 1 10 0 1 0 10 0 1 1 10 1 0 0 10 1 0 1 10 1 1 0 00 1 1 1 01 0 0 0 ?1 0 0 1 ?1 0 1 0 11 0 1 1 11 1 0 0 ?1 1 0 1 11 1 1 0 01 1 1 1 ?

Q244.

w x y z r0 0 0 0 00 0 0 1 ?0 0 1 0 00 0 1 1 10 1 0 0 ?0 1 0 1 00 1 1 0 00 1 1 1 01 0 0 0 ?1 0 0 1 11 0 1 0 01 0 1 1 ?1 1 0 0 11 1 0 1 ?1 1 1 0 ?1 1 1 1 ?




Q245.

w x y z r0 0 0 0 10 0 0 1 ?0 0 1 0 00 0 1 1 ?0 1 0 0 10 1 0 1 10 1 1 0 ?0 1 1 1 11 0 0 0 ?1 0 0 1 ?1 0 1 0 ?1 0 1 1 11 1 0 0 01 1 0 1 11 1 1 0 01 1 1 1 0

Q246.

w x y z r0 0 0 0 ?0 0 0 1 ?0 0 1 0 00 0 1 1 ?0 1 0 0 10 1 0 1 10 1 1 0 10 1 1 1 11 0 0 0 ?1 0 0 1 11 0 1 0 11 0 1 1 ?1 1 0 0 ?1 1 0 1 11 1 1 0 ?1 1 1 1 1

Q247.

w x y z r0 0 0 0 10 0 0 1 ?0 0 1 0 00 0 1 1 10 1 0 0 ?0 1 0 1 00 1 1 0 ?0 1 1 1 01 0 0 0 11 0 0 1 01 0 1 0 ?1 0 1 1 ?1 1 0 0 11 1 0 1 11 1 1 0 11 1 1 1 0




Q248.

w x y z r0 0 0 0 ?0 0 0 1 ?0 0 1 0 00 0 1 1 10 1 0 0 10 1 0 1 00 1 1 0 10 1 1 1 01 0 0 0 ?1 0 0 1 01 0 1 0 ?1 0 1 1 11 1 0 0 ?1 1 0 1 ?1 1 1 0 01 1 1 1 1

Q249.

w x y z r0 0 0 0 ?0 0 0 1 ?0 0 1 0 00 0 1 1 ?0 1 0 0 ?0 1 0 1 10 1 1 0 00 1 1 1 01 0 0 0 ?1 0 0 1 01 0 1 0 ?1 0 1 1 ?1 1 0 0 11 1 0 1 01 1 1 0 ?1 1 1 1 0

Q250.

w x y z r0 0 0 0 00 0 0 1 10 0 1 0 10 0 1 1 10 1 0 0 10 1 0 1 ?0 1 1 0 00 1 1 1 ?1 0 0 0 01 0 0 1 01 0 1 0 ?1 0 1 1 ?1 1 0 0 01 1 0 1 01 1 1 0 01 1 1 1 1




Q251.

w x y z r0 0 0 0 10 0 0 1 00 0 1 0 00 0 1 1 ?0 1 0 0 00 1 0 1 ?0 1 1 0 10 1 1 1 01 0 0 0 11 0 0 1 11 0 1 0 ?1 0 1 1 11 1 0 0 ?1 1 0 1 11 1 1 0 ?1 1 1 1 0

Q252.

w x y z r0 0 0 0 00 0 0 1 00 0 1 0 10 0 1 1 10 1 0 0 00 1 0 1 10 1 1 0 10 1 1 1 11 0 0 0 ?1 0 0 1 ?1 0 1 0 01 0 1 1 11 1 0 0 01 1 0 1 11 1 1 0 01 1 1 1 ?

Q253.

w x y z r0 0 0 0 00 0 0 1 ?0 0 1 0 00 0 1 1 ?0 1 0 0 00 1 0 1 00 1 1 0 10 1 1 1 11 0 0 0 01 0 0 1 ?1 0 1 0 ?1 0 1 1 01 1 0 0 ?1 1 0 1 ?1 1 1 0 ?1 1 1 1 0




Q254.

w x y z r0 0 0 0 00 0 0 1 10 0 1 0 00 0 1 1 ?0 1 0 0 10 1 0 1 00 1 1 0 00 1 1 1 ?1 0 0 0 11 0 0 1 01 0 1 0 01 0 1 1 01 1 0 0 11 1 0 1 11 1 1 0 ?1 1 1 1 ?

Q255.

w x y z r0 0 0 0 10 0 0 1 ?0 0 1 0 10 0 1 1 00 1 0 0 00 1 0 1 10 1 1 0 ?0 1 1 1 11 0 0 0 01 0 0 1 ?1 0 1 0 11 0 1 1 11 1 0 0 01 1 0 1 01 1 1 0 ?1 1 1 1 ?

Q256.

w x y z r0 0 0 0 00 0 0 1 00 0 1 0 10 0 1 1 ?0 1 0 0 ?0 1 0 1 00 1 1 0 00 1 1 1 01 0 0 0 11 0 0 1 01 0 1 0 11 0 1 1 ?1 1 0 0 ?1 1 0 1 11 1 1 0 ?1 1 1 1 0




Q257.

w x y z r0 0 0 0 00 0 0 1 00 0 1 0 ?0 0 1 1 ?0 1 0 0 ?0 1 0 1 10 1 1 0 10 1 1 1 ?1 0 0 0 01 0 0 1 01 0 1 0 ?1 0 1 1 ?1 1 0 0 ?1 1 0 1 ?1 1 1 0 ?1 1 1 1 ?

Q258.

w x y z r0 0 0 0 00 0 0 1 00 0 1 0 10 0 1 1 10 1 0 0 00 1 0 1 00 1 1 0 ?0 1 1 1 01 0 0 0 11 0 0 1 ?1 0 1 0 11 0 1 1 01 1 0 0 01 1 0 1 01 1 1 0 ?1 1 1 1 0

Q259.

w x y z r0 0 0 0 10 0 0 1 00 0 1 0 00 0 1 1 ?0 1 0 0 10 1 0 1 ?0 1 1 0 00 1 1 1 ?1 0 0 0 11 0 0 1 01 0 1 0 ?1 0 1 1 01 1 0 0 11 1 0 1 11 1 1 0 11 1 1 1 ?




Q260.
w x y z r
0 0 0 0 ?
0 0 0 1 1
0 0 1 0 1
0 0 1 1 0
0 1 0 0 1
0 1 0 1 ?
0 1 1 0 ?
0 1 1 1 ?
1 0 0 0 1
1 0 0 1 0
1 0 1 0 0
1 0 1 1 0
1 1 0 0 ?
1 1 0 1 ?
1 1 1 0 ?
1 1 1 1 0

Q261.
w x y z r
0 0 0 0 0
0 0 0 1 0
0 0 1 0 0
0 0 1 1 1
0 1 0 0 ?
0 1 0 1 0
0 1 1 0 1
0 1 1 1 1
1 0 0 0 1
1 0 0 1 1
1 0 1 0 1
1 0 1 1 1
1 1 0 0 1
1 1 0 1 ?
1 1 1 0 1
1 1 1 1 1

A.3 Chapter 3

Q262. The parity function f accepts an n-bit sequence X as input, and yields f(X) = 1 iff X has an odd number of elements equal to 1. If f(X) = 1 (resp. f(X) = 0), we say the parity of X is odd (resp. even). Using a combinatorial circuit, one can compute this as

f(X) = X_0 ⊕ X_1 ⊕ ··· ⊕ X_{n−1}

since XOR can be thought of as addition modulo two. However, how could we design a Finite State Machine (FSM) to compute f(X) when supplied with X one element at a time? Explain step-by-step how you would solve this challenge: start with a high-level design for any FSM, then fill in the detail required for this FSM. Are there any features or requirements you can add to this basic description so the FSM is deemed “better” somehow?

Q263. Imagine you are asked to build a simple DNA matching hardware circuit as part of a research project. The circuit will be given DNA strings, which are sequences of tokens that represent chemical building blocks. The goal is to search a large input sequence of DNA tokens for a small sequence indicative of some feature.

The circuit will receive one token per clock cycle as input; the possible tokens are adenine (A), cytosine (C), guanine (G) and thymine (T). The circuit should, given the input sequence, set an output flag to 1 when the matching sequence ACT is found somewhere in the input, or 0 otherwise. You can assume the inputs are infinitely long, i.e., the circuit should just keep searching forever and set the flag when the match is a success.

a Design a circuit to perform the required task; show all your working and explain any design decisions you make.




b Now imagine you are asked to build two new matching circuits which should detect the sequences CAG and TTT respectively. It is proposed that instead of having three separate circuits, they are combined into a single circuit that matches the input sequence against one matching sequence selected with an additional input. Describe one advantage and one disadvantage you can think of for the two implementation options.

Q264. A revolutionary, ecologically sound washing machine is under development by your company. When turned on, the machine starts in the idle state awaiting input. The washing cycle consists of three stages: fill (when it fills with water), wash (when the wash occurs), and spin (when spin drying occurs); the machine then returns to idle when it is finished. Two buttons control the machine: pressing B0 starts the washing cycle; pressing B1 cancels the washing cycle at any stage and returns the machine to idle; if both buttons are pressed at the same time, the machine continues as normal, as if neither were pressed.

a You are asked to design a circuit to control the washing machine. Draw a diagram illustrating the states the washing machine can be in, and the valid transitions between them.

b Translate your diagram from above into a corresponding, tabular description of the transition function.

c Using an appropriate technique, derive Boolean expressions which allow computation of the transition function; note that because the washing machine is ecologically sound, minimising the overall gate count is important.

Q265. Recall that an n-bit Gray code is a cyclic, 2^n-element sequence S where each i-th element S_i is itself an n-element binary sequence, and the Hamming distance between adjacent elements is one, i.e.,

D(S_i, S_{i−1 (mod 2^n)}) = D(S_i, S_{i+1 (mod 2^n)}) = 1.

a Using an expression (rather than words), define

i H(X), the Hamming weight of a binary sequence X, and

ii D(X,Y), the Hamming distance between binary sequences X and Y.

b Consider a D-type flip-flop, capable of storing a 1-bit value, realised using CMOS-based transistors arranged into logic gates. Using a gate-level circuit diagram, describe the design of such a component (clearly explaining the purpose of each part).

c Imagine successive elements of a 3-bit Gray code sequence are stored, one after another, in a register realised using flip-flops of the type described above. The fact that only one bit changes each time the register is updated could be viewed as advantageous: explain why.

d Using a block diagram, draw a generic Finite State Machine (FSM) framework, including for example δ, ω and any input and output; clearly explain the purpose of each component in the framework.

e Using the framework outlined above, design a concrete FSM which has

• two 1-bit inputs rst and clk, and

• one 3-bit output r.

and whose behaviour is as follows: at each positive edge of the clock signal clk, if rst = 0 then r should be updated with the next element of a 3-bit Gray code, otherwise r should be reset to the first element.

Note that your answer should provide enough detail to fully specify each component in the framework (e.g., Boolean expressions for δ).

Q266. An electronic security system, designed to prevent unauthorised use of a door, is attached to a mains electricity supply. The system has the following components:

• Three buttons, say B_i for 0 ≤ i < 3, whose value is initially 0; when pressed, a button remains pressed and the value changes to 1.

• A door handle modelled by

H = { 1  when the handle is turned
    { 0  when the handle is unturned




• A lock mechanism modelled by

L = { 1  when the door is locked
    { 0  when the door is unlocked

If the door handle is turned after the order of button presses matches a 3-element password sequence P, the door should be unlocked; if there is a mismatch, it should remain locked. The mechanism is reset (and all buttons released) whenever the handle is turned (whether or not the door is unlocked). If P = 〈B1, B0, B2〉, then for example

• B1 then B0 then B2 is pressed, then the handle is turned, the door is unlocked, i.e., L is set to 0, and the mechanism is reset,

• B0 then B1 then B2 is pressed, then the handle is turned, the door remains locked, i.e., L is set to 1, and the mechanism is reset,

• B1 then B0 is pressed, then the handle is turned, the door remains locked, i.e., L is set to 1, and the mechanism is reset.

a Using a block diagram, draw a generic Finite State Machine (FSM) framework, including for example the transition and output functions (i.e., δ and ω) and any input and output; clearly explain the purpose of each component in the framework.

b Imagine the password is fixed to P = 〈B2, B0, B1〉. Using the framework outlined above, design a concrete FSM which can be used to control the security system as required.

Note that your answer should provide enough detail to fully specify each component in the framework (e.g., Boolean expressions for the transition function).

c After inspecting your design, someone claims they can avoid the need for a clock signal: explain how this is possible.

d The same person suggests an alternative approach whereby P is not fixed, but rather stored in an SRAM memory device. Although this approach could be more useful, explain one reason it could be viewed as disadvantageous.

e Before being sold, each physical system needs to be tested to ensure it functions as advertised. Explain a suitable testing strategy for your design, and any alterations required to facilitate it.

Q267. Imagine you are John Connor in the film Terminator II: your aim is to design a device that guesses ATM (or cash machine) Personal Identification Numbers (PINs) using brute-force search. The ATM uses 4-digit decimal PINs, examples being 1234 and 9876. The device stores a current PIN denoted P: it performs each guess in sequence by first checking whether P is correct, then incrementing P ready for the next step. The process concludes when P is deemed correct.

a Two potential representations for the PIN are suggested:

a decimal representation in which the PIN is stored as a sequence of four unsigned integers, i.e., P = 〈P_0, P_1, P_2, P_3〉, with each 0 ≤ P_i < 10, or

a binary representation in which the PIN is stored as a single unsigned integer, i.e., P, with 0 ≤ P < 10000.

State one advantage of each option, and explain which you think is more appropriate.

b A combinatorial component within the device should take the current PIN P as input, and produce two outputs:

• the guess sent to the ATM, i.e., G = 〈G_0, G_1, G_2, G_3〉, where each 0 ≤ G_i < 10 is the i-th decimal digit of the current PIN, and

• the incremented PIN P′ ready for the next guess.

Produce a design for this component; include a block diagram and enough detail to fully specify how a gate-level implementation could be performed.

c The device is controlled by a simple Finite State Machine (FSM) which can be described diagrammatically:




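[State diagram: states S0 (start), S1, S2, S3 and S4. S0 loops to itself when b = 0 and moves to S1 when b = 1; S1 moves to S2 and S2 moves to S3 on ε; S3 returns to S1 when r = 0 and moves to S4 when r = 1; S4 loops to itself on ε.]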

In a more explanatory form, the idea is as follows:

• The device starts in state S0, in which P is initialised; once the start button b is pressed, it moves into state S1.

• In state S1, P is driven as input into the combinatorial component and the device moves into state S2.

• In state S2, G is sent to the ATM and P′ is latched to form the new value of P; the device moves into state S3.

• In state S3 the device checks the ATM response r. If r = 1 then G was the correct guess and the device moves into state S4 where it halts (i.e., remains in S4); otherwise, the device moves into state S1 and the process repeats.

Focusing on the diagram above only, produce a design for the FSM; include a block diagram, and enough detail to fully specify how a gate-level implementation could be performed.
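As a behavioural reference only (not the gate-level design the question asks for), the transitions listed above can be sketched in C. The names state_t and next_state are assumptions made for this illustration; the inputs b and r are the start button and ATM response described in the question.

```c
/* Hypothetical software model of the controller's transition function;
 * state_t and next_state are illustrative names, not from the question. */
typedef enum { S0, S1, S2, S3, S4 } state_t;

/* b is the start button, r is the ATM response (1 means the guess was correct). */
state_t next_state(state_t s, int b, int r) {
  switch (s) {
    case S0: return b ? S1 : S0; /* wait for the start button     */
    case S1: return S2;          /* drive P into the component    */
    case S2: return S3;          /* send G, latch P' as the new P */
    case S3: return r ? S4 : S1; /* halt if correct, else repeat  */
    case S4: return S4;          /* halted                        */
  }
  return S0; /* not reached for valid states */
}
```

A model like this could serve as a check against a tabular transition function before deriving Boolean expressions from it.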

A.4 Chapter 4

Q268. An n-bit ripple-carry adder has a critical path that can be described as O(n) gate delays. Explain intuitively why this is the case, and name an alternative whose critical path is shorter.

Q269. Give a single-line C expression to test if a non-zero integer x is an exact power-of-two; i.e., if x = 2^n for some n then the expression should evaluate to a non-zero value, otherwise it evaluates to zero.

Q270. Imagine you are writing a C program that includes a variable called x. If x has the type char and a current value of 127, what is the new value after

a decrementing (i.e., subtracting 1 from it), or

b incrementing (i.e., adding 1 to it)

the variable?

Q271. Imagine x represents a two’s-complement, signed integer using 4 bits; x_i denotes the i-th bit of x. Write a human-readable description (i.e., the meaning) of what the Boolean function

f(x) = ¬x_3 ∧ (x_2 ∨ x_1 ∨ x_0)

computes arithmetically.

Q272. Given an n-bit input x, draw a block diagram of an efficient (i.e., with a short critical path) combinatorial circuit that can compute r = 7 · x (i.e., multiply x by the constant 7). Take care to label each component, and the size (in bits) of each input and output.

Q273. a Comparison operations for a given processor take two 16-bit operands and return zero if the comparison is false or non-zero if it is true. By constructing some of the comparisons using combinations of other operations, show that implementing all of =, ≠, <, ≤, > and ≥ is wasteful. State the smallest set of comparisons that need dedicated hardware such that all the standard comparisons can be executed.

b The ALU in the same processor design does not include a multiply instruction. So that programmers can still multiply numbers, write an efficient C function to multiply two 16-bit inputs together and return the 16-bit lower half of the result. You can assume the inputs are always positive.




c The population count or Hamming weight of x, denoted by H(x) say, is the number of bits in the binary expansion of x that equal one. Some processors have a dedicated instruction to do this but the proposed one does not; write an efficient C function to compute the population count of 16-bit inputs.

Q274. Imagine we want to compute the result of multiplying two n-bit numbers x and y together, i.e., r = x · y, where n is even. One can adopt a divide-and-conquer approach to this computation by splitting x and y into two parts, each of size n/2 bits,

x = x_1 · 2^{n/2} + x_0
y = y_1 · 2^{n/2} + y_0

and then computing the full result

r = r_2 · 2^n + r_1 · 2^{n/2} + r_0

via the parts

r_2 = x_1 · y_1
r_1 = x_1 · y_0 + x_0 · y_1
r_0 = x_0 · y_0.

The naive approach above uses four multiplications of (n/2)-bit values. The Karatsuba-Ofman method reduces this to three multiplications (and some extra low-cost operations); show how this is achieved.
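To make the naive decomposition concrete, the following is a minimal C sketch for the case n = 16: it computes the three parts with four 8-bit by 8-bit multiplications and recombines them. The function name mul_split is an assumption made for this illustration; the sketch shows only the naive split, not the Karatsuba-Ofman refinement the question asks for.

```c
#include <stdint.h>

/* Naive divide-and-conquer multiplication for n = 16 (illustrative name).
 * Splits each operand into 8-bit halves and uses four multiplications. */
uint32_t mul_split(uint16_t x, uint16_t y) {
  uint32_t x1 = x >> 8, x0 = x & 0xFF; /* x = x1 * 2^8 + x0 */
  uint32_t y1 = y >> 8, y0 = y & 0xFF; /* y = y1 * 2^8 + y0 */

  uint32_t r2 = x1 * y1;
  uint32_t r1 = x1 * y0 + x0 * y1;
  uint32_t r0 = x0 * y0;

  /* r = r2 * 2^16 + r1 * 2^8 + r0; the sum equals x * y, so it fits in 32 bits */
  return (r2 << 16) + (r1 << 8) + r0;
}
```

Comparing mul_split(x, y) against the built-in product of x and y over a range of inputs is one way to convince oneself that the recombination is right.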

Q275. Assume that unsigned integers are represented in 4 bits.

a What is the result of using a normal 4-bit adder circuit to compute the sum 10 + 12?

b A saturating (or clamped) adder is such that if an overflow occurs, i.e., the result does not fit into 4 bits, the highest possible result is returned instead. With a clamped 4-bit addition denoted by ⊎, we have that 10 ⊎ 12 = 15 for example. In general, for an n-bit clamped adder

x ⊎ y = { x + y     if x + y < 2^n
        { 2^n − 1   otherwise

Design a circuit that implements a 4-bit adder of this type.
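The definition of clamped addition can be captured as a small software reference model, which may help when checking a circuit design against expected outputs. This is a sketch under stated assumptions: the function name clamped_add and its parameter n are illustrative, and n is assumed to satisfy 1 ≤ n ≤ 16 with both operands already fitting in n bits.

```c
#include <stdint.h>

/* Reference model of an n-bit clamped (saturating) unsigned adder.
 * Illustrative only: assumes 1 <= n <= 16 and x, y < 2^n. */
uint32_t clamped_add(uint32_t x, uint32_t y, unsigned n) {
  uint32_t max = (UINT32_C(1) << n) - 1; /* 2^n - 1, the largest n-bit value */
  uint32_t sum = x + y;                  /* at most 17 bits, so it cannot wrap */
  return (sum <= max) ? sum : max;
}
```

With n = 4, the example from the question corresponds to clamped_add(10, 12, 4), which yields 15.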

Q276. A software application needs 8-bit, unsigned modular multiplication, i.e., it needs to compute

x · y (mod N)

which is the same as

t − (N · ⌊t/N⌋)

where t = x · y. You have been asked to extend an existing ALU to support this operation. The high cost of a dedicated circuit for division rules out that option; using standard building blocks (e.g., adder, multiplexer) rather than individual logic gates, draw a block diagram of an alternative solution.
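The identity itself (the remainder equals t minus N times the integer quotient of t by N) can be checked in software. The sketch below is only a check of that arithmetic: it uses the C division operator, which is exactly what the hardware design must avoid, so it is not an answer to the question. The function name mul_mod is an assumption, and N is assumed to be non-zero.

```c
#include <stdint.h>

/* Computes x * y mod N as t - N * floor(t / N), with t = x * y.
 * Illustrative only: assumes N != 0; a 16-bit t holds any 8-bit product. */
uint8_t mul_mod(uint8_t x, uint8_t y, uint8_t N) {
  uint16_t t = (uint16_t)(x * y); /* full 16-bit product            */
  uint16_t q = (uint16_t)(t / N); /* floor(t / N)                   */
  return (uint8_t)(t - N * q);    /* remainder, in the range 0..N-1 */
}
```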







APPENDIX B

EXAMPLE EXAM-STYLE SOLUTIONS

B.1 Chapter 1

S1. a i |A| = 3.

ii A ∪ B = {1, 2, 3, 4, 5}.

iii A ∩ B = {3}.

iv A − B = {1, 2}.

v Ā = {4, 5, 6, 7, 8}, i.e., the complement of A.

vi {x | 2 · x ∈ U} = {1, 2, 3, 4}.

b i +0 in sign-magnitude is 00000000, in two’s-complement is 00000000.

ii −0 in sign-magnitude is 10000000, in two’s-complement is 00000000.

iii +72 in sign-magnitude is 01001000, in two’s-complement is 01001000.

iv −34 in sign-magnitude is 10100010, in two’s-complement is 11011110.

v −8 in sign-magnitude is 10001000, in two’s-complement is 11111000.

vi This is a trick question: one cannot represent 240 in 8-bit sign-magnitude or two’s-complement; the incorrect guess of 11111000 in two’s-complement, for example, is actually −8.
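The encodings listed above can be reproduced with a short C sketch. The function names sign_magnitude8 and twos_complement8 are assumptions made for this illustration; the sign-magnitude helper assumes −127 ≤ v ≤ 127 and cannot produce the −0 pattern (it returns 00000000 for zero), while the two’s-complement helper assumes −128 ≤ v ≤ 127.

```c
#include <stdint.h>

/* 8-bit sign-magnitude encoding (illustrative; assumes -127 <= v <= 127).
 * Bit 7 holds the sign, bits 6..0 hold the magnitude. */
uint8_t sign_magnitude8(int v) {
  if (v < 0) {
    return (uint8_t)(0x80 | (unsigned)(-v));
  }
  return (uint8_t)v;
}

/* 8-bit two's-complement encoding (illustrative; assumes -128 <= v <= 127).
 * Conversion to uint8_t reduces the value modulo 2^8. */
uint8_t twos_complement8(int v) {
  return (uint8_t)v;
}
```

For example, sign_magnitude8(-34) gives 0xA2 (10100010) and twos_complement8(-34) gives 0xDE (11011110), matching part iv.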

S2. The population count or Hamming weight of x, denoted by H(x) say, is the number of bits in the binary expansion of x that equal one. Using an unsigned 32-bit integer x for example, an implementation might be written as follows:

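  #include <stdint.h>

  /* Population count: the number of bits in x that equal one. */
  int H( uint32_t x ) {
    int t = 0;

    /* Test each of the 32 bit positions in turn. */
    for( int i = 0; i < 32; i++ ) {
      if( ( x >> i ) & 1 ) {
        t = t + 1;
      }
    }

    return t;
  }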
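As a possible alternative (not part of the solution above), a well-known variant repeatedly clears the least-significant set bit, so the loop body executes once per 1 bit rather than once per bit position. The function name H_clear_lowest is an assumption made for this sketch.

```c
#include <stdint.h>

/* Alternative population count (illustrative name): each iteration clears
 * the least-significant set bit of x, so the loop runs once per 1 bit. */
int H_clear_lowest( uint32_t x ) {
  int t = 0;

  while( x != 0 ) {
    x &= x - 1; /* clear the least-significant 1 bit */
    t = t + 1;
  }

  return t;
}
```

Whether this is faster in practice depends on the inputs: it helps most when few bits are set.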




S3. a The truth table for this function is as follows

a b c   (a ∧ b ∧ ¬c)   (a ∧ ¬b ∧ c)   (¬a ∧ ¬b ∧ c)   f(a, b, c)
0 0 0        0              0               0              0
0 0 1        0              0               1              1
0 1 0        0              0               0              0
0 1 1        0              0               0              0
1 0 0        0              0               0              0
1 0 1        0              1               0              1
1 1 0        1              0               0              1
1 1 1        0              0               0              0

Since there are n = 3 input variables, there are clearly 2^n = 2^3 = 8 input combinations; three of these produce 1 as the output from the function.
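The count of three can be confirmed by enumerating all eight input combinations. The following C sketch does so; the names f3 and count_ones_f3 are assumptions made for this illustration, and the function body is read off the column headings of the truth table.

```c
/* f(a, b, c) = (a AND b AND NOT c) OR (a AND NOT b AND c) OR (NOT a AND NOT b AND c);
 * each input is assumed to be 0 or 1. Names are illustrative. */
int f3(int a, int b, int c) {
  return (a & b & !c) | (a & !b & c) | (!a & !b & c);
}

/* Counts the input combinations for which f3 yields 1. */
int count_ones_f3(void) {
  int count = 0;
  for (int i = 0; i < 8; i++) {
    /* bits 2, 1 and 0 of i play the roles of a, b and c */
    count += f3((i >> 2) & 1, (i >> 1) & 1, i & 1);
  }
  return count;
}
```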

b The truth table for this function is as follows

a b c d   f(a, b, c, d)
0 0 0 0        0
0 0 0 1        0
0 0 1 0        0
0 0 1 1        0
0 1 0 0        0
0 1 0 1        1
0 1 1 0        0
0 1 1 1        0
1 0 0 0        0
1 0 0 1        0
1 0 1 0        0
1 0 1 1        0
1 1 0 0        0
1 1 0 1        0
1 1 1 0        0
1 1 1 1        0

so since there is only one case where f(a, b, c, d) = 1, the only assignment given which matches the criteria is a = 0, b = 1, c = 0 and d = 1.

This hints at a general principle: when we have an expression like this, a term such as ¬x can be read as “x should be 0” and x as “x should be 1”. So the expression as a whole is read as “a should be 0 and b should be 1 and c should be 0 and d should be 1”. Since we have basically fixed all four inputs, only one entry of the truth table matches. On the other hand, if we instead had

f(a, b, c, d) = ¬a ∧ b ∧ ¬c

for example, we would be saying “a should be 0 and b should be 1 and c should be 0, and d can be anything” which gives two possible assignments (i.e., a = 0, b = 1, c = 0 and either d = 0 or d = 1).
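The effect of fixing or freeing d can be checked by counting matching assignments. In the C sketch below, the function name count_matches and its flag fix_d are assumptions made for this illustration: with fix_d non-zero the term is ¬a ∧ b ∧ ¬c ∧ d, otherwise it is ¬a ∧ b ∧ ¬c.

```c
/* Counts the assignments of (a, b, c, d) that satisfy NOT a AND b AND NOT c,
 * additionally requiring d = 1 when fix_d is non-zero. Illustrative names. */
int count_matches(int fix_d) {
  int count = 0;
  for (int i = 0; i < 16; i++) {
    int a = (i >> 3) & 1, b = (i >> 2) & 1, c = (i >> 1) & 1, d = i & 1;
    int term = !a & b & !c;
    if (fix_d) {
      term &= d;
    }
    count += term;
  }
  return count;
}
```

Each variable left out of the term doubles the number of matching truth table entries.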

c Informally, SoP form means there are say n terms in the expression: each term is the conjunction of some variables (or their complements), and the expression is the disjunction of the terms. As conjunction and disjunction basically mean the AND and OR operators, and AND and OR act sort of like multiplication and addition, the SoP name should make some sense: the expression is sort of like the sum of terms which are themselves each a product of variables. The second option is correct as a result; the first and last violate the form described above somehow (e.g., the first case is in the opposite, PoS form).

d One can easily make a comparison using a truth table such as

a b   a ∨ 1   a ⊕ 1   ¬a   a ∧ 1   ¬(a ∧ b)   ¬a ∨ ¬b
0 0     1       1      1     0        1          1
0 1     1       1      1     0        1          1
1 0     1       0      0     1        1          1
1 1     1       0      0     1        0          0

from which it should be clear that all the equations are correct except for the first one. That is, a ∨ 1 ≠ a but rather a ∨ 1 = 1.




e i Inspecting the following truth table

a   ¬a   ¬¬a
0    1     0
0    1     0
1    0     1
1    0     1

shows this equivalence is correct (this is the involution axiom).

ii Inspecting the following truth table

a b   ¬a   ¬b   a ∧ b   ¬(a ∧ b)   ¬a ∨ ¬b
0 0    1    1     0        1          1
0 1    1    0     0        1          1
1 0    0    1     0        1          1
1 1    0    0     1        0          0

shows this equivalence is correct (this is the de Morgan axiom).

iii Inspecting the following truth table

a b   ¬a   ¬b   ¬a ∧ b   a ∧ ¬b
0 0    1    1      0        0
0 1    1    0      1        0
1 0    0    1      0        1
1 1    0    0      0        0

shows this equivalence is incorrect.

iv Inspecting the following truth table

a   ¬a   a ⊕ a
0    1     0
1    0     0

shows this equivalence is incorrect.

S4. a The dual of any expression is constructed by using the principle of duality, which informally means swapping each AND with OR (and vice versa) and each 0 with 1 (and vice versa); this means, for example, we can take the OR form of each axiom and produce the AND form (and vice versa).

So in this case, we start with an OR form: this means the dual will be the corresponding AND form. Making the swaps required means we end up with

x ∧ 0 ≡ 0

so the second option is correct.

b This question is basically asking for the complement of f, since the options each have ¬f on the left-hand side: this means using the principle of complements, a generalisation of the de Morgan axiom, by swapping each variable with the complement (and vice versa), each AND with OR (and vice versa), and each 0 with 1 (and vice versa). If we apply these rules (taking care with the parentheses) to

f = ¬a ∧ ¬b ∨ ¬c ∨ ¬d ∨ ¬e,

we end up with

¬f = (a ∨ b) ∧ c ∧ d ∧ e

which matches the last option.
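As a sanity check on the principle of complements, one can compare the derived expression with the negation of f over all 32 assignments. This is only a sketch: the function names are illustrative, and it assumes AND binds more tightly than OR in f.

```python
from itertools import product

def f(a, b, c, d, e):
    # f = (¬a ∧ ¬b) ∨ ¬c ∨ ¬d ∨ ¬e
    return ((not a) and (not b)) or (not c) or (not d) or (not e)

def not_f(a, b, c, d, e):
    # Candidate complement: ¬f = (a ∨ b) ∧ c ∧ d ∧ e
    return (a or b) and c and d and e

# Collect every assignment where the candidate disagrees with ¬f.
mismatches = [v for v in product((False, True), repeat=5)
              if bool(not_f(*v)) != (not f(*v))]
print(mismatches)
```

An empty list of mismatches means the candidate agrees with ¬f on every row of the truth table.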

c The de Morgan axiom, which can be generalised by the principle of complements, says that

¬(x ∧ y) ≡ ¬x ∨ ¬y

or conversely that

¬(x ∨ y) ≡ ¬x ∧ ¬y

You can think of either form as “pushing” the NOT operator on the left-hand side into the parentheses: this acts to complement each variable, and swap the AND to an OR (or vice versa). Writing ⊼ for NAND and ⊽ for NOR, we know that

x ⊼ y ≡ ¬(x ∧ y)
x ⊽ y ≡ ¬(x ∨ y)

So pattern matching against the options, it is clear the first one is correct, for example, because

x ⊽ y ≡ ¬(x ∨ y) ≡ ¬x ∧ ¬y

where the right-hand side matches the description of an AND whose two inputs are complemented. Likewise, the second one is correct because

x ⊼ y ≡ ¬(x ∧ y) ≡ ¬x ∨ ¬y.

S5. a The third option, i.e., ¬a ∧ ¬b is the correct one; the three simplification steps, via two axioms, are as follows:

  ¬(a ∨ b) ∧ ¬(c ∨ d ∨ e) ∨ ¬(a ∨ b)
= (¬a ∧ ¬b) ∧ ¬(c ∨ d ∨ e) ∨ (¬a ∧ ¬b)        (de Morgan)
= (¬a ∧ ¬b) ∧ (¬c ∧ ¬d ∧ ¬e) ∨ (¬a ∧ ¬b)      (de Morgan)
= ¬a ∧ ¬b                                      (absorption)

b We can clearly see that

  (a ∨ b ∨ c) ∧ ¬(d ∨ e) ∨ (a ∨ b ∨ c) ∧ (d ∨ e)
= (a ∨ b ∨ c) ∧ (¬(d ∨ e) ∨ (d ∨ e))          (distribution)
= (a ∨ b ∨ c) ∧ ((d ∨ e) ∨ ¬(d ∨ e))          (commutativity)
= (a ∨ b ∨ c) ∧ 1                              (inverse)
= a ∨ b ∨ c                                    (identity)

meaning the first option is the correct one.

c We can clearly see that

  a ∧ c ∨ c ∧ (¬a ∨ a ∧ b)
= (a ∧ c) ∨ (c ∧ (¬a ∨ (a ∧ b)))              (precedence)
= (c ∧ a) ∨ (c ∧ (¬a ∨ (a ∧ b)))              (commutativity)
= c ∧ (a ∨ ¬a ∨ (a ∧ b))                      (distribution)
= c ∧ (1 ∨ (a ∧ b))                            (inverse)
= c ∧ ((a ∧ b) ∨ 1)                            (commutativity)
= c ∧ 1                                        (null)
= c                                            (identity)

meaning the last option is the correct one: none of the above is correct, since the correct simplification is actually just c.

d The fourth option, i.e., a ∧ b is correct. This basically stems from repeated application of the absorption axiom, the AND form of which states

x ∨ (x ∧ y) ≡ x.

Applying it from left-to-right, we find that

  a ∧ b ∨ a ∧ b ∧ c ∨ a ∧ b ∧ c ∧ d ∨ a ∧ b ∧ c ∧ d ∧ e ∨ a ∧ b ∧ c ∧ d ∧ e ∧ f
= (a ∧ b) ∨ (a ∧ b) ∧ (c) ∨ (a ∧ b) ∧ (c ∧ d) ∨ (a ∧ b) ∧ (c ∧ d ∧ e) ∨ (a ∧ b) ∧ (c ∧ d ∧ e ∧ f)    (precedence)
= (a ∧ b) ∨ (a ∧ b) ∧ (c ∧ d) ∨ (a ∧ b) ∧ (c ∧ d ∧ e) ∨ (a ∧ b) ∧ (c ∧ d ∧ e ∧ f)                    (absorption)
= (a ∧ b) ∨ (a ∧ b) ∧ (c ∧ d ∧ e) ∨ (a ∧ b) ∧ (c ∧ d ∧ e ∧ f)                                        (absorption)
= (a ∧ b) ∨ (a ∧ b) ∧ (c ∧ d ∧ e ∧ f)                                                                (absorption)
= (a ∧ b)                                                                                            (absorption)

e We can simplify this function as follows

f(a, b, c) = (a ∧ b) ∨ a ∧ (a ∨ c) ∨ b ∧ (a ∨ c)
           = (a ∧ b) ∨ a ∨ b ∧ (a ∨ c)            (absorption)
           = a ∨ (a ∧ b) ∨ b ∧ (a ∨ c)            (commutativity)
           = a ∨ b ∧ (a ∨ c)                      (absorption)
           = a ∨ (b ∧ a) ∨ (b ∧ c)                (distribution)
           = a ∨ (a ∧ b) ∨ (b ∧ c)                (commutativity)
           = a ∨ (b ∧ c)                          (absorption)

at which point there is nothing else that can be done: we end up with 2 operators (an AND and an OR), so the second option is correct.




f Working from the right-hand side toward the left, we have that

  ¬x ∨ ¬y
= (¬x ∧ 1) ∨ ¬y                                        (identity)
= (¬x ∧ 1) ∨ (¬y ∧ 1)                                  (identity)
= (¬x ∧ (y ∨ ¬y)) ∨ (¬y ∧ 1)                          (inverse)
= (¬x ∧ (y ∨ ¬y)) ∨ (¬y ∧ (x ∨ ¬x))                  (inverse)
= (¬x ∧ y) ∨ (¬x ∧ ¬y) ∨ (¬y ∧ (x ∨ ¬x))            (distribution)
= (¬x ∧ y) ∨ (¬x ∧ ¬y) ∨ (¬y ∧ x) ∨ (¬y ∧ ¬x)      (distribution)
= (¬x ∧ y) ∨ (¬y ∧ x) ∨ (¬x ∧ ¬y) ∨ (¬x ∧ ¬y)      (commutativity)
= (¬x ∧ y) ∨ (¬y ∧ x) ∨ (¬x ∧ ¬y)                    (idempotency)

g By writing

t0 = x ∧ y
t1 = y ∧ z
t2 = y ∨ z
t3 = x ∨ z
t4 = t1 ∧ t2

we can shorten the LHS and RHS to

f = t0 ∨ t4
g = y ∧ t3

and then perform a brute-force enumeration

x y z   t0 t1 t2 t3 t4   f g
0 0 0    0  0  0  0  0   0 0
0 0 1    0  0  1  1  0   0 0
0 1 0    0  0  1  0  0   0 0
0 1 1    0  1  1  1  1   1 1
1 0 0    0  0  0  1  0   0 0
1 0 1    0  0  1  1  0   0 0
1 1 0    1  0  1  1  0   1 1
1 1 1    1  1  1  1  1   1 1

to demonstrate that f = g, i.e., the equivalence holds. Note that this approach is not as robust if the intermediate steps are not shown; simply including f and g in the truth table does not give much more confidence than simply writing the equivalence!

To prove the equivalence using an axiomatic approach, the following steps can be applied:

  (x ∧ y) ∨ (y ∧ z ∧ (y ∨ z))
= (x ∧ y) ∨ (y ∧ z ∧ y) ∨ (y ∧ z ∧ z)        (distribution)
= (x ∧ y) ∨ (y ∧ y ∧ z) ∨ (y ∧ z ∧ z)        (commutativity)
= (x ∧ y) ∨ (y ∧ z) ∨ (y ∧ z)                (idempotency)
= (x ∧ y) ∨ (y ∧ z)                          (idempotency)
= (y ∧ x) ∨ (y ∧ z)                          (commutativity)
= y ∧ (x ∨ z)                                (distribution)
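The brute-force enumeration used for this equivalence is easy to mechanise. The sketch below, with illustrative function names `lhs` and `rhs`, evaluates both sides on all eight inputs and compares the resulting columns:

```python
from itertools import product

def lhs(x, y, z):
    # f = (x ∧ y) ∨ (y ∧ z ∧ (y ∨ z))
    return (x & y) | (y & z & (y | z))

def rhs(x, y, z):
    # g = y ∧ (x ∨ z)
    return y & (x | z)

# One row per input combination: (x, y, z, f, g)
table = [(x, y, z, lhs(x, y, z), rhs(x, y, z))
         for x, y, z in product((0, 1), repeat=3)]
equivalent = all(f == g for *_, f, g in table)
print(equivalent)
```

Enumeration of this kind confirms the equivalence but, unlike the axiomatic derivation, says nothing about why it holds.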

B.2 Chapter 2

S6. Using a Karnaugh map, for example, one can produce the result

r = f (x, y, z) = y ∨ (z ∧ ¬x) ∨ (x ∧ ¬z)

which, by inspection, gives

x y z   y   z ∧ ¬x   x ∧ ¬z   r
0 0 0   0      0        0     0
0 0 1   0      1        0     1
0 1 0   1      0        0     1
0 1 1   1      1        0     1
1 0 0   0      0        1     1
1 0 1   0      0        0     0
1 1 0   1      0        1     1
1 1 1   1      0        0     1

However, notice that the sub-expression

(z ∧ ¬x) ∨ (x ∧ ¬z)

can be simplified to

z ⊕ x

so per the question, the simplest implementation of f is

r = f(x, y, z) = y ∨ (z ⊕ x).

S7. First, note that the following identities can be applied (writing ⊼ for NAND):

¬x    ≡ x ⊼ x
x ∨ y ≡ (x ⊼ x) ⊼ (y ⊼ y)
x ∧ y ≡ (x ⊼ y) ⊼ (x ⊼ y)

As such, we can write

¬(x ∨ y) ≡ ((x ⊼ x) ⊼ (y ⊼ y)) ⊼ ((x ⊼ x) ⊼ (y ⊼ y)).

S8. The excitation table of a standard SR latch is

        Current    Next
S R     Q  ¬Q     Q′ ¬Q′
0 0     0   1     0   1
0 0     1   0     1   0
0 1     ?   ?     0   1
1 0     ?   ?     1   0
1 1     ?   ?     ?   ?

meaning the reset-dominate alteration gives

        Current    Next
S R     Q  ¬Q     Q′ ¬Q′
0 0     0   1     0   1
0 0     1   0     1   0
0 1     ?   ?     0   1
1 0     ?   ?     1   0
1 1     ?   ?     0   1

Given the inputs S and R, the circuit can be constructed by using a standard SR latch with two cross-coupled NOR gates whose inputs are S′ and R′. We simply set R′ = R and S′ = ¬R ∧ S so S′ is only 1 when R = 0 and S = 1: if R = 1 (including the case when both R = 1 and S = 1), the latch is reset since S′ = 0. The circuit is as follows:

[Circuit diagram: R is inverted using a NOR gate and combined with S by an AND gate to form S′; S′ drives the NOR gate producing ~Q and R′ drives the NOR gate producing Q, with the two NOR gates cross-coupled.]
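The reset-dominate behaviour can be explored with a small gate-level simulation. This is a sketch under stated assumptions: `latch_step` is a hypothetical helper, the feedback loop is modelled by re-evaluating the two NOR gates until the outputs stop changing, and real propagation delays are ignored.

```python
def nor(a, b):
    return 1 - (a | b)

def latch_step(s, r, q, nq):
    """Settle a cross-coupled NOR latch driven by S' = ¬R ∧ S and R' = R."""
    s_prime = (1 - r) & s
    r_prime = r
    for _ in range(4):  # a few passes suffice for the feedback loop to settle
        q_new = nor(r_prime, nq)
        nq_new = nor(s_prime, q_new)
        if (q_new, nq_new) == (q, nq):
            break
        q, nq = q_new, nq_new
    return q, nq

# Starting from Q = 1, apply S = R = 1: reset should win.
print(latch_step(1, 1, 1, 0))
```

Starting from either stored value, the inputs S = R = 1 drive the outputs to Q = 0 and ¬Q = 1, matching the last row of the altered excitation table.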

S9. A wide range of answers are clearly possible. Obvious examples include physical size, and power consumption or heat dissipation. Other variants include worst-case versus average-case versions of each metric, for example in the case of efficiency.

S10. a MOSFET transistors work by sandwiching together N-type and P-type semiconductor layers. The different types of layer are doped with different substances to create more holes or more electrons. For example, in an N-type MOSFET the layers are constructed as follows

               gate
            +-------+
            | metal |
==== source ========= drain ==== silicon oxide layer
+--+--------+---------+--------+--+
|  | N-type |         | N-type |  |
|  +--------+         +--------+  |
|             P-type              |
+---------------------------------+

with additional layers of silicon oxide and metal. There are three terminals on the transistor. Roughly speaking, applying a voltage to the gate creates a channel between the source and drain through which charge can flow. Thus the device acts like a switch: when the gate voltage is high, there is a flow of charge but when it is low there is little flow of charge. A P-type MOSFET swaps the roles of N-type and P-type semiconductor and hence implements the opposite switching behaviour.

b One can construct a NAND gate, computing r = ¬(x ∧ y), from such transistors as follows:

                 V_dd
                  |
          +-------+-------+
          |               |
          v               v
      +--------+      +--------+
 x -->| P-type | y -->| P-type |
      +--------+      +--------+
          |               |
          +---------------+---> r
          |
      +--------+
 x -->| N-type |
      +--------+
          |
      +--------+
 y -->| N-type |
      +--------+
          ^
          |
          +-------+
                  |
                 VSS

If x and y are connected to Vss then both top P-type transistors will be connected, and both bottom N-type transistors will be disconnected; r will be connected to Vdd. If x and y are connected to Vdd and Vss respectively then the right-most P-type transistor will be connected, and the lower-most N-type transistor will be disconnected; r will be connected to Vdd. If x and y are connected to Vss and Vdd respectively then the left-most P-type transistor will be connected, and the upper-most N-type transistor will be disconnected; r will be connected to Vdd. If x and y are connected to Vdd then both top P-type transistors will be disconnected, and both bottom N-type transistors will be connected; r will be connected to Vss. In short, the behaviour we get is described by

x     y     r
Vss   Vss   Vdd
Vss   Vdd   Vdd
Vdd   Vss   Vdd
Vdd   Vdd   Vss

which, if we substitute 0 and 1 for Vss and Vdd, matches that of the NAND operation.
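The case analysis above amounts to a switch-level model of the gate, which the following sketch captures. It assumes idealised transistors (a P-type conducts when its gate is at Vss, an N-type when its gate is at Vdd); the names `cmos_nand`, `pull_up` and `pull_down` are illustrative only.

```python
VSS, VDD = 0, 1

def cmos_nand(x, y):
    """Switch-level model of the 2-input CMOS NAND gate."""
    pull_up = (x == VSS) or (y == VSS)     # two P-type transistors in parallel
    pull_down = (x == VDD) and (y == VDD)  # two N-type transistors in series
    assert pull_up != pull_down            # exactly one network conducts
    return VDD if pull_up else VSS

table = {(x, y): cmos_nand(x, y) for x in (VSS, VDD) for y in (VSS, VDD)}
print(table)
```

The internal assertion reflects the complementary design: for every input exactly one of the two networks conducts, so r is never left floating or shorted.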

S11. This question is a lot easier than it sounds; basically we just add two extra transistors (one P-MOSFET and one N-MOSFET) to implement a similar high-level approach. That is, we want r connected to Vss only when each of x, y and z are connected to Vdd; this means the bottom, N-MOSFETs are in series. If any of x, y or z are connected to Vss, we want r connected to Vdd; this means the top, P-MOSFETs are in parallel. Diagrammatically, the result is as follows:

[Figure: 3-input CMOS NAND gate with inputs x, y and z, output r, and supply rails Vdd and Vss.]

S12. This is quite an open-ended question, but basically it asks for high-level explanations only. As such, some example answers include the following:

a CMOS transistors are constructed from atomic-level understanding and manipulation; the immutable size of atoms therefore acts as a fundamental limit on the size of any CMOS-based transistor.

b Feature scaling improves the operational efficiency of transistors, simply because smaller features reduce delay. Beyond this however, one must utilise the extra transistors to achieve some useful task if computational efficiency is to scale as well: improvements to an architecture or design are often required, for instance, to exploit parallelism and so on.

c Even assuming the transistors available can be harnessed to improve computational efficiency, this has implications: more transistors within a fixed size area will increase power consumption and also heat dissipation for example, both of which act as limits even if managed (e.g., via aggressive forms of cooling).

d On one hand, smaller transistors mean less cost per-transistor: with a fixed number of transistors, their area and manufacturing cost will decrease. With a fixed sized area and hence more transistors in it however, this probably means an increased defect rate during manufacture. The resulting cost implication could act as an economic limit to transistor size.

S13. a The most basic interpretation (i.e., not really doing any grouping using Karnaugh maps but just picking out each cell with a 1 in it) generates the following SoP equations

e = (¬a ∧ ¬b ∧ c ∧ ¬d) ∨ (a ∧ ¬b ∧ ¬c ∧ ¬d) ∨ (¬a ∧ b ∧ ¬c ∧ d)
f = (¬a ∧ ¬b ∧ ¬c ∧ d) ∨ (¬a ∧ b ∧ ¬c ∧ ¬d) ∨ (a ∧ ¬b ∧ c ∧ ¬d)

b From the basic SoP equations, we can use the don’t care states to eliminate some of the terms to get

e = (¬a ∧ ¬b ∧ c) ∨ (a ∧ ¬c ∧ ¬d) ∨ (b ∧ d)
f = (¬a ∧ ¬b ∧ d) ∨ (b ∧ ¬c ∧ ¬d) ∨ (a ∧ c)

then, we can share both the terms ¬a ∧ ¬b and ¬c ∧ ¬d since they occur in e and f.




S14. Simply transcribing the truth table into a suitable Karnaugh map gives

        y z = 00  01  11  10
x = 0          1   1   0   1
x = 1          0   1   ?   0

from which we can derive the SoP expression

r = (¬y ∧ z) ∨ (¬x ∧ ¬z).

S15. Define ⊼ as the NAND operation with the truth table:

x y   x ⊼ y
0 0     1
0 1     1
1 0     1
1 1     0

Using NAND, we can implement NOT, AND and OR as follows:

¬x    = x ⊼ x
x ∧ y = (x ⊼ y) ⊼ (x ⊼ y)
x ∨ y = (x ⊼ x) ⊼ (y ⊼ y)

To prove this works, we can construct truth tables for the expressions and compare the results with what we would expect; for NOT we have:

x   x ⊼ x   ¬x
0     1      1
1     0      0

while for AND we have:

x y   x ⊼ y   (x ⊼ y) ⊼ (x ⊼ y)   x ∧ y
0 0     1             0             0
0 1     1             0             0
1 0     1             0             0
1 1     0             1             1

and finally for OR we have:

x y   x ⊼ x   y ⊼ y   (x ⊼ x) ⊼ (y ⊼ y)   x ∨ y
0 0     1       1             0             0
0 1     1       0             1             1
1 0     0       1             1             1
1 1     0       0             1             1

such that it should be clear all three are correct.
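The three truth-table comparisons can also be run mechanically. The sketch below builds NOT, AND and OR from a single `nand` function (the trailing underscores in the names merely avoid clashing with Python keywords) and checks each against the built-in operator:

```python
from itertools import product

def nand(x, y):
    return 1 - (x & y)

def not_(x):
    return nand(x, x)

def and_(x, y):
    return nand(nand(x, y), nand(x, y))

def or_(x, y):
    return nand(nand(x, x), nand(y, y))

pairs = list(product((0, 1), repeat=2))
checks = [
    all(not_(x) == 1 - x for x in (0, 1)),
    all(and_(x, y) == (x & y) for x, y in pairs),
    all(or_(x, y) == (x | y) for x, y in pairs),
]
print(checks)
```

Since every check passes using only `nand`, any circuit built from NOT, AND and OR can be rebuilt from NAND gates alone.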

S16. Conventionally a 4-input, 1-bit multiplexer might be described using a truth table such as the following:

c1 c0   w x y z   r
0  0    0 ? ? ?   0
0  0    1 ? ? ?   1
0  1    ? 0 ? ?   0
0  1    ? 1 ? ?   1
1  0    ? ? 0 ?   0
1  0    ? ? 1 ?   1
1  1    ? ? ? 0   0
1  1    ? ? ? 1   1




This assumes that there are four inputs, namely w, x, y and z, with two further control signals c1 and c0 deciding which of them provides the output r. However, another valid way to write the same thing would be

c1 c0   r
0  0    w
0  1    x
1  0    y
1  1    z

This reformulation describes a 2-input, 1-output Boolean function whose behaviour is selected by fixing w, x, y and z, i.e., connecting each of them directly to either 0 or 1. For instance, if w = x = y = 0 and z = 1 then the truth table becomes

c1 c0   r
0  0    w = 0
0  1    x = 0
1  0    y = 0
1  1    z = 1

which is of course the same as AND. So depending on how w, x, y and z are fixed (on a per-instance basis) we can form any 2-input, 1-output Boolean function; this includes NAND and NOR, which we know are universal, meaning the multiplexer is also universal.
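To make the argument concrete, the sketch below models the multiplexer as a table look-up and fixes its data inputs to obtain AND, NAND and NOR. The helper names (`mux4`, `gate_from_mux`) are hypothetical, and the selection order follows the second truth table above.

```python
def mux4(c1, c0, w, x, y, z):
    """4-input, 1-bit multiplexer: (c1, c0) = 00, 01, 10, 11 selects w, x, y, z."""
    return (w, x, y, z)[2 * c1 + c0]

def gate_from_mux(w, x, y, z):
    # Fixing the data inputs leaves a 2-input function of the control signals.
    return lambda c1, c0: mux4(c1, c0, w, x, y, z)

and_gate = gate_from_mux(0, 0, 0, 1)
nand_gate = gate_from_mux(1, 1, 1, 0)
nor_gate = gate_from_mux(1, 0, 0, 0)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
print([nand_gate(a, b) for a, b in inputs])
```

The four fixed data inputs are simply the output column of the desired truth table, so all 16 possible 2-input functions can be obtained this way.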

S17. a The expression for this circuit can be written as

e = (¬c ∧ ¬b) ∨ (b ∧ d) ∨ (¬a ∧ c ∧ ¬d) ∨ (a ∧ c ∧ ¬d)

which yields the Karnaugh map

           c d = 00  01  11  10
a b = 00         1   1   0   1
a b = 01         0   1   1   1
a b = 11         0   1   1   1
a b = 10         1   1   0   1

and from which we can derive a simplified SoP form for e, namely

e = (b ∧ d) ∨ (¬b ∧ ¬c) ∨ (c ∧ ¬d)

b The advantages of this expression over the original are that it is simpler, i.e., contains fewer terms and hence needs fewer gates for implementation, and shows that the input a is essentially redundant. We have probably also reduced the critical path through the circuit since it is more shallow. The disadvantages are that we still potentially have some glitching due to the differing delays through paths in the circuit, although these existed before as well, and the large propagation delay.

c The longest sequential path through the circuit goes through a NOT gate, two AND gates and two OR gates; the critical path is thus 90ns long. This time bounds how fast we can use it in a clocked system since the clock period must be at least 90ns. So the shortest clock period would be 90ns, meaning the clock ticks about 11111111 times a second (i.e., at about 11MHz).
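The conversion from critical path to clock frequency is a one-line calculation; the sketch below, with illustrative variable names, reproduces the figure quoted above:

```python
critical_path_ns = 90                       # critical path from the answer above

# One second is 10**9 ns, so the number of clock periods per second is:
max_frequency_hz = 10**9 / critical_path_ns
print(int(max_frequency_hz))
print(round(max_frequency_hz / 1e6, 1), "MHz")
```

Any clock faster than this would sample the output before the slowest path through the circuit had settled.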

S18. a Examining the behaviour required, we can construct the following truth table:

D2 D1 D0   L8 L7 L6 L5 L4 L3 L2 L1 L0
0  0  0     ?  ?  ?  ?  ?  ?  ?  ?  ?
0  0  1     0  0  0  0  1  0  0  0  0
0  1  0     0  1  0  0  0  0  0  1  0
0  1  1     1  0  0  0  1  0  0  0  1
1  0  0     1  0  1  0  0  0  1  0  1
1  0  1     1  0  1  0  1  0  1  0  1
1  1  0     1  1  1  0  0  0  1  1  1
1  1  1     ?  ?  ?  ?  ?  ?  ?  ?  ?




Note that

L3 = 0
L5 = 0
L6 = L2
L7 = L1
L8 = L0

so actually we only need expressions for L0...2 and L4, and that don’t care states are used to capture the idea that D = 0 and D = 7 never occur. The resulting four Karnaugh maps (for L0, L1, L2 and L4 respectively)

              D1 D0 = 00  01  11  10
L0:   D2 = 0           ?   0   1   0
      D2 = 1           1   1   ?   1

L1:   D2 = 0           ?   0   0   1
      D2 = 1           0   0   ?   1

L2:   D2 = 0           ?   0   0   0
      D2 = 1           1   1   ?   1

L4:   D2 = 0           ?   1   1   0
      D2 = 1           0   1   ?   0

can be translated into the expressions:

L0 = D2 ∨ (D1 ∧ D0)
L1 = (D1 ∧ ¬D0)
L2 = D2
L4 = D0

b All the LEDs can be driven in parallel, i.e., the critical path relates to the single expression whose critical path is the longest. L2...6 have no logic involved, so we can discount them immediately. Of the two remaining LEDs, we find

L0 ↦ 20ns + 20ns
L1 ↦ 10ns + 20ns

hence L0 represents the critical path of 40ns. Thus if one throw takes 40ns, we can perform

1s / 40ns = (1 · 10^9 ns) / 40ns = 25000000

throws per-second. Which is quite a lot, and certainly too many to actually see with the human eye!

c i A rough block diagram would resemble

              +-----+            +-----+            +-----+
co <----------|co ci|<- - - - - -|co ci|<-----------|co ci|<---------- ci = 0
           +--|s   y|<----+   +--|s   y|<----+   +--|s   y|<----+
           |  |    x|<-+  |   |  |    x|<-+  |   |  |    x|<-+  |
           |  +-----+  |  |   |  +-----+  |  |   |  +-----+  |  |
           |           |  |   |           |  |   |           |  |
           |     x_{n-1}  |   |         x_1  |   |         x_0  |
           |              |   |              |   |              |
           |        y_{n-1}   |            y_1   |            y_0
           v                  v                  v
        r_{n-1}              r_1                r_0

ii If we sum 8 values 1 ≤ x_i ≤ 6, where x_i is the i-th throw (or i-th value of D supplied), then the maximum total is 8 · 6 = 48. We can represent this in 6 bits, hence n = 6.

iii Using the left-shift method, we compute D′ = 2 · D by simply relabelling the bits in D. That is, D′_0 = 0 and D′_{i+1} = D_i for 0 ≤ i < 3. For example, given D = 6(10) = 110(2) we have

D′_0 = 0
D′_1 = D_0 = 0
D′_2 = D_1 = 1
D′_3 = D_2 = 1

and hence D′ = 1100(2) = 12(10). Since there is no need for any logic gates to implement this method, the critical path is essentially nil: the only propagation delay relates to (small) wire delays. In comparison to the larger critical path of a suitable n-bit adder, this clearly means the left-shift approach is preferable.
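The relabelling can be mimicked in software by treating D as a list of bits. This is a sketch only: it assumes a little-endian list (index i holds bit i), and the helper names are illustrative.

```python
def double_by_relabelling(d_bits):
    """d_bits[i] is bit i of D; returns the bits of D' = 2 * D."""
    return [0] + list(d_bits)      # bit 0 of D' is 0, bit i+1 of D' is bit i of D

def to_int(bits):
    return sum(bit << i for i, bit in enumerate(bits))

d = [0, 1, 1]                      # D = 6 = 110 in binary, least-significant bit first
d_prime = double_by_relabelling(d)
print(d_prime, to_int(d_prime))
```

No arithmetic is performed on the bits themselves, which mirrors the observation that the hardware version needs wires but no gates.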




S19. a A basic design would use two building blocks:

• lth_8bit compares two 8-bit inputs a and b and produces a 1-bit result r, where r = 1 if a < b and r = 0 if a ≥ b:

   a     b
   |     |
   v     v
+-----------+
| lth_8bit  |
+-----------+
      |
      v
      r

• mux2_8bit selects between two 8-bit inputs; if the inputs are a and b, the output r = a if the control signal s = 0, or r = b if s = 1:

   a     b
   |     |
   v     v
+-----------+
| mux2_8bit |<-- s
+-----------+
      |
      v
      r

Based on these building blocks, one can describe the component C as follows:

                     x     y
                     |     |
                     v     v
                  +-----------+
                  | lth_8bit  |
                  +-----------+
   y     x              |            x     y
   |     |              |            |     |
   v     v              |            v     v
+-----------+           |         +-----------+
| mux2_8bit |<----------+-------->| mux2_8bit |
+-----------+           r         +-----------+
      |                                 |
      v                                 v
  min(x,y)                          max(x,y)

From a functional perspective, C compares x and y using an instance of the lth_8bit building block, and then uses the result r as a control signal for two instances of mux2_8bit. The left-hand instance selects y as the output if r = 0 and x if r = 1; that is, if x < y then the output is x = min(x, y) otherwise the output is y = min(x, y). The right-hand instance swaps the inputs so it selects x as the output if r = 0 and y if r = 1; that is, if x < y then the output is y = max(x, y) otherwise the output is x = max(x, y).
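A behavioural model of C follows directly from this description. The sketch below reuses the building-block names lth_8bit and mux2_8bit as Python functions; `component_c` is an illustrative name, and the inputs are assumed to be integers in the range 0 to 255.

```python
def lth_8bit(a, b):
    """r = 1 if a < b, else r = 0."""
    return 1 if a < b else 0

def mux2_8bit(a, b, s):
    """r = a if s = 0, else r = b."""
    return a if s == 0 else b

def component_c(x, y):
    r = lth_8bit(x, y)
    smaller = mux2_8bit(y, x, r)   # left-hand instance: y if r = 0, x if r = 1
    larger = mux2_8bit(x, y, r)    # right-hand instance: x if r = 0, y if r = 1
    return smaller, larger

print(component_c(200, 17))
```

Note that both multiplexers share the single comparison result, so the comparator is instantiated once per component rather than once per output.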

b The short answer (which gets about half the marks) is that the longest path through the mesh will go through 2n − 1 of the C components: this is the path from the top-left corner down along one edge to the bottom-left and then along another edge to the bottom-right. So in a sense, if we write the propagation delay associated with each instance of C as TC then the overall critical path is

(2n − 1) · TC.

In a bit more detail, the critical path through C is through one instance of lth_8bit and one instance of mux2_8bit. So we could write the overall critical path as

(2n − 1) · (Tlth_8bit + Tmux2_8bit).

To be more detailed than this, we need to think about individual logic gates. Imagine we assume TXOR = 50ns, TAND = 20ns, TOR = 20ns and TNOT = 10ns.

• mux2_8bit is simply eight mux2_1bit instances placed in parallel with each other; that is, the i-th such instance produces the i-th bit of the output based on the i-th bit of the inputs (but all using the same control signal). Assuming that the propagation delay of AND and OR gates dominates that of a NOT gate, the critical path through mux2_1bit will be TAND + TOR.




• lth_8bit is a combination of eight sub-components:

   x_i   y_i       x_i   y_i
    |     |         |     |
    v     v         v     v
 +----------+    +----------+
 | lth_1bit |    | equ_1bit |
 +----------+    +----------+
      |               |
      |               |  t_{i-1}
      |               |     |
      |               v     v
      |            +----------+
      |       +----|   AND    |
      |       |    +----------+
      v       v
    +----------+
    |    OR    |
    +----------+
         |
         v
        t_i

Each of these sub-components is placed in series so that t_{i−1} is an input from the previous sub-component and t_i is an output provided to the next.

Based on simple circuits derived from their truth tables, the critical paths for lth_1bit and equ_1bit are TAND + TNOT and TXOR + TNOT respectively. Thus the critical path of the whole sub-component is TXOR + TNOT + TAND + TOR (since the critical path of equ_1bit is longer). Overall, the critical path of lth_8bit is

8 · (TXOR + TNOT + TAND + TOR),

or more exactly

7 · (TXOR + TNOT + TAND + TOR) + TAND + TNOT

because the 0-th sub-component is “special”: there is no input from the previous sub-component.

Using this we can write the overall critical path for the mesh as

(2n − 1) · (7 · TXOR + 8 · TNOT + 9 · TAND + 8 · TOR)

or roughly (2n − 1) · 770ns if we plug in the assumed delays.
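The 770ns figure can be reproduced by plugging the assumed gate delays into the expressions above. In this sketch the variable and function names are illustrative, and the 50ns delay is taken to be that of the XOR gate, which is the reading consistent with the quoted total.

```python
T_XOR, T_AND, T_OR, T_NOT = 50, 20, 20, 10     # assumed gate delays in ns

t_mux2_8bit = T_AND + T_OR
t_sub = T_XOR + T_NOT + T_AND + T_OR           # one chained lth_8bit sub-component
t_lth_8bit = 7 * t_sub + (T_AND + T_NOT)       # the 0-th sub-component has no chain input
t_c = t_lth_8bit + t_mux2_8bit                 # critical path through one C component

def mesh_critical_path_ns(n):
    """Critical path through the mesh: (2n - 1) instances of C in series."""
    return (2 * n - 1) * t_c

print(t_c, mesh_critical_path_ns(4))
```

For example, a mesh with n = 4 has a critical path of 7 · 770ns = 5390ns under these assumptions.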

c One problem is that the mesh does not always give the right result! If you were to build a 4 × 4 mesh and feed 5, 6, 7, 8 into the top and 4, 3, 2, 1 into the left-hand side, the bottom reads 5, 6, 7, 8 and the right-hand side reads 4, 3, 2, 1: numbers cannot move from bottom to top or from right to left, so there are some inputs a mesh cannot sort.

Beyond this trick question, the main idea is that the mesh is special-purpose, the processor is general-purpose: this implies a number of trade-offs in either direction that could be viewed as advantages or disadvantages in certain cases. For example, depending on n, one might argue that the processor will require more logic to realise it (since it will include features extraneous to the task of sorting). Since it operates a fetch-decode-execute cycle to complete each instruction, there is an overhead (i.e., the fetch and decode at least) which means it potentially performs the task of sorting less quickly. On the other hand, once constructed the mesh is specialised to one task: it cannot be used to sort strings for example, and the size of input (i.e., n) is fixed. The processor makes the opposite trade-off; it should be clear that while it might be slower and potentially larger, it is vastly more flexible.

S20. a Imagine a component which is enabled (i.e., “turned on”) using the input en:

• The idea of the component being level triggered is that the value of en is important, not a change in en: the component is enabled when en has a particular value, rather than at an edge when the value changes.

• The fact en is active high means that the component is enabled when en = 1 (rather than en = 0 which would make it active low). Though active high might seem the more logical choice, this is just part of the component specification: as long as everything is consistent, i.e., uses the right semantics to “turn on” the component, there is sometimes no major benefit of one approach over the other.




b Assume that M is a 4-state switch represented by a 2-bit value M = 〈M0, M1〉: 〈0, 0〉 means off, 〈1, 0〉 means slow, 〈0, 1〉 means fast and 〈1, 1〉 means very fast. Also assume there is a clock signal called clk available, for example supplied by an oscillator of some form.

One approach would basically be to take clk and divide it to create two new clock signals c0 and c1 which have a longer period: each of the clock signals could then satisfy the criteria of toggling the fire button on and off at various speeds. A clock divider is fairly simple: the idea is to have a counter c clocked by clk and to sample the (i − 1)-th bit of the counter: this behaves like clk divided by 2^i. For example the 0-th bit acts like clk but with twice the period.

A circuit to do this is fairly simple: we need some D-type flip-flops to hold the counter state, and some full-adders to increment the counter:

      +-----+            +-----+
      |co ci|<-----------|co ci|<-- 0
   +--|s   y|<-- 0    +--|s   y|<-- 1
   |  |    x|<-+      |  |    x|<-+
   |  +-----+  |      |  +-----+  |
   |           |      |           |
   |    c_1 ---+      |    c_0 ---+
   |           |      |           |
   |  +-----+  |      |  +-----+  |
   +->|D   Q|--+      +->|D   Q|--+
      |    <|<-+         |    <|<-+
      |     |  |         |     |  |
      +-----+  |         +-----+  |
               |                  |
               +------------------+-- clk

Given such a component which runs freely as long as it is driven by clk, we want to feed the original fire button F0 through to form the new fire button input F′0 when M = 0, and c1, c0 or clk through when M = 1, M = 2 or M = 3 (meaning a slow, fast or very fast toggling behaviour). We can describe this as the following truth table:

M1 M0   F′0
0  0    F0
0  1    c1
1  0    c0
1  1    clk

This is essentially a multiplexer controlled by M, and permits us to write

F′0 = ( ¬M0 ∧ ¬M1 ∧ F0  ) ∨
      (  M0 ∧ ¬M1 ∧ c1  ) ∨
      ( ¬M0 ∧  M1 ∧ c0  ) ∨
      (  M0 ∧  M1 ∧ clk )
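Both parts of this design can be modelled behaviourally. The sketch below is illustrative only: `fire_output` evaluates the multiplexer as a disjunction of its four terms, and `counter_bits` abstracts the flip-flops and full-adders as a free-running 2-bit counter indexed by the number of clock edges seen so far.

```python
def fire_output(m0, m1, f0, c1, c0, clk):
    """F'0 as a 4-way multiplexer controlled by M = <M0, M1>."""
    n0, n1 = 1 - m0, 1 - m1
    return (n0 & n1 & f0) | (m0 & n1 & c1) | (n0 & m1 & c0) | (m0 & m1 & clk)

def counter_bits(ticks):
    """Bits (c0, c1) of a free-running 2-bit counter after `ticks` clock edges."""
    value = ticks % 4
    return value & 1, (value >> 1) & 1

c0_trace = [counter_bits(t)[0] for t in range(8)]
c1_trace = [counter_bits(t)[1] for t in range(8)]
print(c0_trace)
print(c1_trace)
```

The traces show c0 toggling on every clock edge and c1 on every second edge, so c1 gives the slowest toggling and is the one selected for the slow setting.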

c i A synchronous protocol demands that the console and controller share a clock signal which acts to synchronise their activity, e.g., ensures each one sends and receives data at the right time. The problem with this is ensuring that the clock is not skewed for either component: since they are physically separate, this might be hard and hence this is not such a good option.

An asynchronous protocol relies on extra connections between the components, e.g., “request” and “acknowledge”, that allow them to engage in a form of transaction: the extra connections essentially signal when data has been sent or received on the associated bus. This is more suitable given the scenario: the extra connections could potentially be shared with those that already exist (e.g., F0, F1, F2 and D) thereby reducing the overhead, plus performance is not a big issue here (the protocol will presumably only be executed once when the components are turned on or plugged in).

Both approaches have an issue in that

• once the protocol is run someone could just plug in another, fake controller, or
• simply intercept c and T(c) pairs until they recover the whole look-up table and then “imitate” it using a fake controller

so neither is particularly robust from a security point of view!

ii The temptation here is to say that the use of a 3-bit memory (or register) is the right way to go. Although this allows some degree of flexibility which is not required since the function is fixed, the main disadvantage is retention of the content when the controller or console is turned off: some form of non-volatile memory is therefore needed.




However, we can easily construct some dedicated logic to do the same thing. If we say that y = T(x), then we can describe the behaviour of T using the following truth table:

x2 x1 x0   y2 y1 y0
0  0  0    0  1  0
0  0  1    1  1  0
0  1  0    1  1  1
0  1  1    0  0  1
1  0  0    1  0  0
1  0  1    0  0  0
1  1  0    1  0  1
1  1  1    0  1  1

This can be transformed into the following Karnaugh maps for y0, y1 and y2

              x1 x0 = 00  01  11  10
y0:   x2 = 0           0   0   1   1
      x2 = 1           0   0   1   1

y1:   x2 = 0           1   1   0   1
      x2 = 1           0   0   1   0

y2:   x2 = 0           0   1   0   1
      x2 = 1           1   0   0   1

which in turn can be transformed into the following equations

y0 = (  x1 )
y1 = ( ¬x2 ∧ ¬x1 ) ∨
     ( ¬x2 ∧ ¬x0 ) ∨
     (  x2 ∧  x1 ∧ x0 )
y2 = (  x1 ∧ ¬x0 ) ∨
     (  x2 ∧ ¬x0 ) ∨
     ( ¬x2 ∧ ¬x1 ∧ x0 )

which are enough to implement the look-up table: we pass x as input, and it produces the right y (for this fixed T) as output.
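The derived equations can be checked against the truth table by evaluating them for every 3-bit input. In this sketch the list `T` is simply the output column of the table above read as integers, and `t_logic` is an illustrative name for the dedicated logic.

```python
T = [2, 6, 7, 1, 4, 0, 5, 3]   # look-up table read off the truth table: y = T[x]

def t_logic(x):
    """Evaluate the dedicated logic for y = T(x) on a 3-bit input x."""
    x2, x1, x0 = (x >> 2) & 1, (x >> 1) & 1, x & 1
    n2, n1, n0 = 1 - x2, 1 - x1, 1 - x0
    y0 = x1
    y1 = (n2 & n1) | (n2 & n0) | (x2 & x1 & x0)
    y2 = (x1 & n0) | (x2 & n0) | (n2 & n1 & x0)
    return (y2 << 2) | (y1 << 1) | y0

print([t_logic(x) for x in range(8)])
```

The logic reproduces the table exactly, so no memory (volatile or otherwise) is needed to hold T.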

S21. This is a classic “puzzle” question in digital logic. There are a few ways to describe the strategy, but the one used here is based on counting the number of inputs which are 1. In short, we start by computing

t1 = ¬(x ∧ y ∨ y ∧ z ∨ x ∧ z)
t2 = ¬((x ∧ y ∧ z) ∨ t1 ∧ (x ∨ y ∨ z))

which use our quota of NOT gates. The idea is that t1 = 1 iff. one or zero of x, y and z are 1, and in the same way t2 = 1 iff. two or zero of x, y and z are 1. This can be hard to see, so consider a truth table

x y z   x ∧ y   y ∧ z   x ∧ z   x ∧ y ∧ z   x ∨ y ∨ z
0 0 0     0       0       0         0           0
0 0 1     0       0       0         0           1
0 1 0     0       0       0         0           1
0 1 1     0       1       0         0           1
1 0 0     0       0       0         0           1
1 0 1     0       0       1         0           1
1 1 0     1       0       0         0           1
1 1 1     1       1       1         1           1

meaning

x y z   x ∧ y ∨ y ∧ z ∨ x ∧ z   t1   (x ∧ y ∧ z) ∨ t1 ∧ (x ∨ y ∨ z)   t2
0 0 0             0              1                  0                   1
0 0 1             0              1                  1                   0
0 1 0             0              1                  1                   0
0 1 1             1              0                  0                   1
1 0 0             0              1                  1                   0
1 0 1             1              0                  0                   1
1 1 0             1              0                  0                   1
1 1 1             1              0                  1                   0

and hence t1 and t2 are as required. Now, we can generate the three results as

¬x = (t1 ∧ t2) ∨ (t1 ∧ (y ∨ z)) ∨ (t2 ∧ y ∧ z)
   = (t1 ∧ t2) ∨ (t1 ∧ y) ∨ (t1 ∧ z) ∨ (t2 ∧ y ∧ z)

¬y = (t1 ∧ t2) ∨ (t1 ∧ (x ∨ z)) ∨ (t2 ∧ x ∧ z)
   = (t1 ∧ t2) ∨ (t1 ∧ x) ∨ (t1 ∧ z) ∨ (t2 ∧ x ∧ z)

¬z = (t1 ∧ t2) ∨ (t1 ∧ (x ∨ y)) ∨ (t2 ∧ x ∧ y)
   = (t1 ∧ t2) ∨ (t1 ∧ x) ∨ (t1 ∧ y) ∨ (t2 ∧ x ∧ y)
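Since the construction is easy to get wrong by hand, an exhaustive check is reassuring. The sketch below (with the illustrative name `three_nots`) uses subtraction from 1 for exactly two inversions and only AND and OR elsewhere:

```python
from itertools import product

def three_nots(x, y, z):
    t1 = 1 - ((x & y) | (y & z) | (x & z))           # first NOT gate
    t2 = 1 - ((x & y & z) | (t1 & (x | y | z)))      # second NOT gate
    not_x = (t1 & t2) | (t1 & (y | z)) | (t2 & y & z)
    not_y = (t1 & t2) | (t1 & (x | z)) | (t2 & x & z)
    not_z = (t1 & t2) | (t1 & (x | y)) | (t2 & x & y)
    return not_x, not_y, not_z

ok = all(three_nots(x, y, z) == (1 - x, 1 - y, 1 - z)
         for x, y, z in product((0, 1), repeat=3))
print(ok)
```

Each output is selected by how many inputs are 1: with none set every complement is 1, with one set t1 picks out the inputs that are still 0, and with two set t2 does the same.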

S22. Imagine that for some n-bit input x, we let y_i = C_i(x) denote the evaluation of C_i to get an output y_i. As such, the equivalence of C1 and C2 can be stated as a test whether y1 = y2 for all values of x; another way to say the same thing is to test whether an x exists such that y1 ≠ y2 which will distinguish the circuits, i.e., imply they are not equivalent.

Using the second formulation, we can write the test as y1 ⊕ y2 since the XOR will produce 1 when y1 differs from y2 and 0 otherwise. As such, we have n Boolean variables (the bits of x) and want an assignment that implies the expression C1(x) ⊕ C2(x) will evaluate to 1. This is the same as described in the description of SAT, so if we can solve the SAT instance we prove the circuits are (not) equivalent.
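For small n, the search for a distinguishing input can be done by exhaustive enumeration, which stands in here for a real SAT solver. The sketch is illustrative: `distinguishing_input` and the three example circuits are assumptions, not taken from the question.

```python
from itertools import product

def distinguishing_input(c1, c2, n):
    """Return an x with C1(x) XOR C2(x) = 1, or None if the circuits agree everywhere."""
    for x in product((0, 1), repeat=n):
        if c1(*x) ^ c2(*x):
            return x
    return None

circuit_a = lambda x, y, z: (x & y) | (y & z & (y | z))
circuit_b = lambda x, y, z: y & (x | z)     # equivalent to circuit_a
circuit_c = lambda x, y, z: y | (x & z)     # not equivalent to circuit_a

print(distinguishing_input(circuit_a, circuit_b, 3))
print(distinguishing_input(circuit_a, circuit_c, 3))
```

Enumeration takes 2^n evaluations in the worst case, which is why the reduction to SAT matters once n is large.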

S23. a The latency of the circuit is the time taken to perform the computation, i.e., to compute some r given x. For this circuit, the latency is simply the sum of the critical paths.

b The throughput is the number of operations performed per unit time period. This is essentially the number of operations we can start (resp. that finish) within that time period.

By pipelining the circuit, using say 3 stages, one might expect the latency to increase slightly (by virtue of having to add pipeline registers between each stage) but the throughput to increase (by virtue of decreasing the overall critical path to the longest stage, and hence increasing the maximum clock frequency). The trade-off is strongly influenced by the number of and balance between stages, meaning careful analysis of the circuit before applying the optimisation is important.

S24. a The latency of a circuit is the time elapsed between when a given operation starts and when it finishes.The throughput of a circuit is the number of operations that can be started in each time period; that is,how long it takes between when two subsequent operations can be started.

b The latency of the circuit is the sum of all the latencies of the parts,i.e.,

40ns + 10ns + 30ns + 10ns + 50ns + 10ns + 10ns = 160ns.

The throughput relates to the length of the longest pipeline stage; the circuit is not pipelined, so morespecifically we can say it is 1

160·10−9 .

c The new latency is still the sum of all the parts, but now includes the extra pipeline register:

40ns + 10ns + 30ns + 10ns + 10ns + 50ns + 10ns + 10ns = 170ns.

However, the throughput is now more because the longest pipeline stage only has a latency of 100ns(including the extra register). Specifically, the throughput increases to 1

100·10−9 which essentially means wecan start new operations more often than before.

d To maximise the throughput we need to minimise the latency of the longest pipeline stage (i.e., the onewhose individual latency is the largest) since this will act as a limit. The latency of part E is largest (at50ns) and hence represents said limit: the longest pipeline stage cannot have a latency of less than 60ns(i.e., the latency of part E plus the latency of a pipeline register).

We can achieve this by creating a 4-stage pipeline: adding two more pipeline registers, between parts Band C and parts E and F, ensures the stages have latencies of

A + B + REG: 40ns + 10ns + 10ns = 60ns
C + D + REG: 30ns + 10ns + 10ns = 50ns
E + REG: 50ns + 10ns = 60ns
F + REG: 10ns + 10ns = 20ns

Overall, the latency is increased to 190ns but the throughput is $\frac{1}{60 \cdot 10^{-9}}$ operations per second, since the longest stage now has a latency of 60ns.
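The figures in parts b to d can be reproduced with a short calculation. What follows is a minimal Python sketch, not part of the original solution. It assumes the individual part latencies implied by the sums above (A = 40ns, B = 10ns, C = 30ns, D = 10ns, E = 50ns, F = 10ns), a 10ns register after every stage, and, for part c, a split between parts D and E (the only split that gives the stated 100ns stage). The name `pipeline_metrics` is illustrative.

```python
REG_NS = 10  # latency of one pipeline register, in nanoseconds

# Part latencies in nanoseconds, inferred from the sums in the solution (an assumption).
PARTS = {"A": 40, "B": 10, "C": 30, "D": 10, "E": 50, "F": 10}


def pipeline_metrics(stages):
    """Return (latency_ns, throughput_ops_per_s) for a list of stages.

    Each stage is a string of part names, e.g. "AB"; every stage is
    followed by exactly one pipeline register.
    """
    stage_ns = [sum(PARTS[p] for p in stage) + REG_NS for stage in stages]
    latency_ns = sum(stage_ns)
    # A new operation can be started once per longest-stage delay.
    throughput = 1.0 / (max(stage_ns) * 1e-9)
    return latency_ns, throughput


unpipelined = pipeline_metrics(["ABCDEF"])              # part b
two_stage = pipeline_metrics(["ABCD", "EF"])            # part c (assumed split)
four_stage = pipeline_metrics(["AB", "CD", "E", "F"])   # part d

for name, (lat, thr) in [("b", unpipelined), ("c", two_stage), ("d", four_stage)]:
    print(f"part {name}: latency = {lat}ns, throughput = {thr:.3g} operations/s")
```

Trying other stage lists, such as `["ABC", "DEF"]`, is a quick way to see how stage balance changes the trade-off between latency and throughput.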





S25. a Reference implementation:

r = ( ¬y ∧ ¬z ) ∨ ( y ∧ ¬z ) ∨ ( y ∧ z )

b Annotated Karnaugh map (cell values for ( y, z ) = 00, 01, 10, 11):

1 0 1 1

and associated, optimised implementation:

r = ( ¬z ) ∨ ( y )
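Because a 2-input function has only four input combinations, an optimised expression can be checked against the reference implementation by exhaustive evaluation. The following is a minimal Python sketch of that check for S25; the helper names `reference`, `optimised` and `equivalent` are illustrative rather than taken from the text.

```python
from itertools import product


def reference(y, z):
    # r = ( ¬y ∧ ¬z ) ∨ ( y ∧ ¬z ) ∨ ( y ∧ z )
    return (not y and not z) or (y and not z) or (y and z)


def optimised(y, z):
    # r = ( ¬z ) ∨ ( y )
    return (not z) or y


def equivalent(f, g, n):
    """True when f and g agree on all 2**n Boolean input combinations."""
    return all(bool(f(*v)) == bool(g(*v)) for v in product([False, True], repeat=n))


# Truth table of the reference, in the order ( y, z ) = 00, 01, 10, 11.
table = [int(bool(reference(y, z))) for y, z in product([False, True], repeat=2)]
print(table, equivalent(reference, optimised, 2))
```

The same `equivalent` helper works unchanged for the 3-input and 4-input exercises by passing `n = 3` or `n = 4`.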


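S26. a Reference implementation:

r = ( ¬y ∧ ¬z ) ∨ ( ¬y ∧ z ) ∨ ( y ∧ z )

b Annotated Karnaugh map (cell values for ( y, z ) = 00, 01, 10, 11):

1 1 0 1

and associated, optimised implementation:

r = ( ¬y ) ∨ ( z )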

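S27. a Reference implementation:

r = ( ¬y ∧ ¬z )

b Annotated Karnaugh map (cell values for ( y, z ) = 00, 01, 10, 11):

1 0 0 0

and associated, optimised implementation:

r = ( ¬y ∧ ¬z )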

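S28. a Reference implementation:

r = ( ¬y ∧ z ) ∨ ( y ∧ ¬z )

b Annotated Karnaugh map (cell values for ( y, z ) = 00, 01, 10, 11):

0 1 1 0

and associated, optimised implementation:

r = ( y ∧ ¬z ) ∨ ( ¬y ∧ z )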





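S29. a Reference implementation:

r = ( ¬y ∧ ¬z ) ∨ ( y ∧ ¬z )

b Annotated Karnaugh map (cell values for ( y, z ) = 00, 01, 10, 11):

1 0 1 0

and associated, optimised implementation:

r = ( ¬z )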

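S30. a Reference implementation:

r = ( y ∧ z )

b Annotated Karnaugh map (cell values for ( y, z ) = 00, 01, 10, 11):

0 0 0 1

and associated, optimised implementation:

r = ( y ∧ z )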

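S31. a Reference implementation:

r = ( y ∧ ¬z ) ∨ ( y ∧ z )

b Annotated Karnaugh map (cell values for ( y, z ) = 00, 01, 10, 11):

0 0 1 1

and associated, optimised implementation:

r = ( y )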


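S32. a Reference implementation:

r = ( ¬y ∧ ¬z ) ∨ ( y ∧ z )

b Annotated Karnaugh map (cell values for ( y, z ) = 00, 01, 10, 11):

1 0 0 1

and associated, optimised implementation:

r = ( y ∧ z ) ∨ ( ¬y ∧ ¬z )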

S33. a Reference implementation:

r = ( ¬y ∧ z )

b Annotated Karnaugh map (cell values for ( y, z ) = 00, 01, 10, 11):

0 1 0 0

and associated, optimised implementation:

r = ( ¬y ∧ z )

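S34. a Reference implementation:

r = ( y ∧ ¬z )

b Annotated Karnaugh map (cell values for ( y, z ) = 00, 01, 10, 11):

0 0 1 0

and associated, optimised implementation:

r = ( y ∧ ¬z )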


( y ∧ z )


��0 1

0 1y

z


r = ( z )


( ¬y ∧ z ) ∨

( y ∧ ¬z )


�� 1 1

1 0y

z


r = ( ¬y ) ∨

( ¬z )


( y ∧ ¬z ) ∨

( y ∧ z )


�� 0 1

1 1y

z





r = ( z ) ∨

( y )


( ¬y ∧ z )


�� 1 10 0y

z


r = ( ¬y )




��0 ?

0 1y

z


r = ( z )


( y ∧ ¬z )


�� 0 1

1 ?y

z


r = ( z ) ∨

( y )



�� 1 ?? 0y

z


r = ( ¬y )






��0 ?

? 1y

z


r = ( z )


( y ∧ z )


��0 1

? 1y

z


r = ( z )


( y ∧ ¬z )


��1 ?

1 0y

z


r = ( ¬z )



��1 00 ?y

z


r = ( ¬y ∧ ¬z )


( y ∧ z )





��? 1

0 1y

z


r = ( z )



�� 0 01 ?y

z


r = ( y )



��0 ?1 0y

z


r = ( y ∧ ¬z )


( y ∧ ¬z ) ∨

( y ∧ z )


�� 0 1

1 1y

z


r = ( z ) ∨

( y )


( ¬y ∧ z )


�� 1 1? 0y

z





r = ( ¬y )


( y ∧ z )


�� 1 ?

0 1y

z


r = ( ¬y ) ∨

( z )



�� 1 ?0 ?y

z


r = ( ¬y )



��0 00 1y

z


r = ( y ∧ z )



�� 0 0? 1y

z


r = ( y )






��? 1

0 ?y

z


r = ( z )



��0 1? 0y

z


r = ( ¬y ∧ z )



��? ?

0 1y

z


r = ( z )


( y ∧ ¬z )


��1 0

1 0y

z


r = ( ¬z )



��0 10 0y

z


r = ( ¬y ∧ z )





( y ∧ ¬z )


��0 1

1 0y

z


r = ( y ∧ ¬z ) ∨

( ¬y ∧ z )



��? ?

1 0y

z


r = ( ¬z )



�� ? 01 ?y

z


r = ( y )


( ¬y ∧ z ) ∨

( y ∧ ¬z )


�� 1 1

1 0y

z


r = ( ¬y ) ∨

( ¬z )


( ¬y ∧ z )





�� 1 10 0y

z


r = ( ¬y )


( y ∧ z )


��1 00 1y

z


r = ( y ∧ z ) ∨

( ¬y ∧ ¬z )



��0 01 0y

z


r = ( y ∧ ¬z )



�� 0 ?1 ?y

z


r = ( y )


S68. a Reference implementation:

r = ( ¬x ∧ ¬y ∧ ¬z ) ∨ ( x ∧ ¬y ∧ ¬z )

b Annotated Karnaugh map (cell values for ( x, y, z ) = 000, 001, ..., 111):

1 0 0 0 1 0 0 0

and associated, optimised implementation:

r = ( ¬y ∧ ¬z )


( ¬x ∧ ¬y ∧ z ) ∨

( x ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ z )


�� 1 1

0 00101

x

y

z


r = ( ¬x ∧ ¬y ) ∨

( x ∧ z )

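S70. a Reference implementation:

r = ( ¬x ∧ y ∧ ¬z ) ∨ ( ¬x ∧ y ∧ z ) ∨ ( x ∧ ¬y ∧ z )

b Annotated Karnaugh map (cell values for ( x, y, z ) = 000, 001, ..., 111):

0 0 1 1 0 1 0 0

and associated, optimised implementation:

r = ( ¬x ∧ y ) ∨ ( x ∧ ¬y ∧ z )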


( ¬x ∧ y ∧ ¬z )


��1 0

1 00000

x

y

z


r = ( ¬x ∧ ¬z )





( ¬x ∧ y ∧ ¬z ) ∨

( ¬x ∧ y ∧ z ) ∨

( x ∧ y ∧ ¬z ) ∨

( x ∧ y ∧ z )


�� 1 0

1 10011

x

y

z


r = ( y ) ∨

( ¬x ∧ ¬z )

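S73. a Reference implementation:

r = ( x ∧ ¬y ∧ ¬z ) ∨ ( x ∧ y ∧ ¬z )

b Annotated Karnaugh map (cell values for ( x, y, z ) = 000, 001, ..., 111):

0 0 0 0 1 0 1 0

and associated, optimised implementation:

r = ( x ∧ ¬z )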

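S74. a Reference implementation:

r = ( ¬x ∧ ¬y ∧ z ) ∨ ( ¬x ∧ y ∧ ¬z ) ∨ ( x ∧ ¬y ∧ z )

b Annotated Karnaugh map (cell values for ( x, y, z ) = 000, 001, ..., 111):

0 1 1 0 0 1 0 0

and associated, optimised implementation:

r = ( ¬x ∧ y ∧ ¬z ) ∨ ( ¬y ∧ z )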


( ¬x ∧ y ∧ z ) ∨

( x ∧ ¬y ∧ ¬z ) ∨

( x ∧ y ∧ z )


��

��0 10 1

1001

x

y

z





r = ( ¬x ∧ z ) ∨

( y ∧ z ) ∨

( x ∧ ¬y ∧ ¬z )


( ¬x ∧ y ∧ ¬z ) ∨

( x ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ ¬z )


�� 0 11 0

0110

x

y

z


r = ( y ∧ ¬z ) ∨

( ¬y ∧ z )


( ¬x ∧ y ∧ z ) ∨

( x ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ ¬z ) ∨

( x ∧ y ∧ z )


�� 0 0

1 10111

x

y

z


r = ( y ) ∨

( x ∧ z )

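S78. a Reference implementation:

r = ( ¬x ∧ y ∧ z ) ∨ ( x ∧ ¬y ∧ z ) ∨ ( x ∧ y ∧ z )

b Annotated Karnaugh map (cell values for ( x, y, z ) = 000, 001, ..., 111):

0 0 0 1 0 1 0 1

and associated, optimised implementation:

r = ( y ∧ z ) ∨ ( x ∧ z )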





( ¬x ∧ y ∧ z ) ∨

( x ∧ y ∧ z )


�� 1 00 1

0001

x

y

z


r = ( ¬x ∧ ¬y ∧ ¬z ) ∨

( y ∧ z )


( ¬x ∧ ¬y ∧ z ) ∨

( ¬x ∧ y ∧ ¬z )


�� 1 1

1 00000

x

y

z


r = ( ¬x ∧ ¬y ) ∨

( ¬x ∧ ¬z )


( ¬x ∧ ¬y ∧ z ) ∨

( x ∧ ¬y ∧ ¬z ) ∨

( x ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ z )


�� 1 1

0 01101

x

y

z


r = ( ¬y ) ∨

( x ∧ z )


( x ∧ ¬y ∧ z )


�� 0 10 0

0100

x

y

z





r = ( ¬y ∧ z )


( ¬x ∧ ¬y ∧ z ) ∨

( ¬x ∧ y ∧ ¬z ) ∨

( ¬x ∧ y ∧ z ) ∨

( x ∧ y ∧ z )


��

�� 1 1

1 10001

x

y

z


r = ( ¬x ) ∨

( y ∧ z )


( x ∧ y ∧ z )


��0 0

0 01001

x

y

z


r = ( x ∧ y ∧ z ) ∨

( x ∧ ¬y ∧ ¬z )


( ¬x ∧ y ∧ z ) ∨

( x ∧ ¬y ∧ ¬z ) ∨

( x ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ ¬z )


��

�� 1 00 1

1110

x

y

z


r = ( ¬x ∧ y ∧ z ) ∨

( x ∧ ¬z ) ∨

( x ∧ ¬y ) ∨

( ¬y ∧ ¬z )





( x ∧ y ∧ ¬z ) ∨

( x ∧ y ∧ z )


�� 0 10 0

0011

x

y

z


r = ( ¬x ∧ ¬y ∧ z ) ∨

( x ∧ y )


( ¬x ∧ y ∧ z ) ∨

( x ∧ ¬y ∧ ¬z ) ∨

( x ∧ ¬y ∧ z )


�� 1 0

0 11100

x

y

z


r = ( ¬x ∧ y ∧ z ) ∨

( x ∧ ¬y ) ∨

( ¬y ∧ ¬z )


( ¬x ∧ ¬y ∧ z ) ∨

( ¬x ∧ y ∧ ¬z ) ∨

( x ∧ ¬y ∧ ¬z ) ∨

( x ∧ ¬y ∧ z )


�� 1 1

1 01100

x

y

z


r = ( ¬y ) ∨

( ¬x ∧ ¬z )


( ¬x ∧ y ∧ ¬z ) ∨

( x ∧ ¬y ∧ ¬z ) ∨

( x ∧ ¬y ∧ z )





�� 1 0

1 01100

x

y

z


r = ( x ∧ ¬y ) ∨

( ¬x ∧ ¬z )


( x ∧ ¬y ∧ ¬z ) ∨

( x ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ ¬z ) ∨

( x ∧ y ∧ z )


��

��

�� 0 10 0

1111

x

y

z


r = ( x ) ∨

( ¬y ∧ z )


( x ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ ¬z ) ∨

( x ∧ y ∧ z )


�� 0 1

0 00111

x

y

z


r = ( x ∧ y ) ∨

( ¬y ∧ z )


( x ∧ ¬y ∧ ¬z ) ∨

( x ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ ¬z ) ∨

( x ∧ y ∧ z )


��

��0 0

1 01111

x

y

z





r = ( y ∧ ¬z ) ∨

( x )


( ¬x ∧ ¬y ∧ z ) ∨

( ¬x ∧ y ∧ ¬z ) ∨

( ¬x ∧ y ∧ z )


��

��1 1

1 10000

x

y

z


r = ( ¬x )

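S94. a Reference implementation:

r = ( ¬x ∧ ¬y ∧ z )

b Annotated Karnaugh map (cell values for ( x, y, z ) = 000, 001, ..., 111):

0 1 0 0 0 0 0 0

and associated, optimised implementation:

r = ( ¬x ∧ ¬y ∧ z )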


( x ∧ ¬y ∧ ¬z ) ∨

( x ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ z )


�� 1 0

0 01101

x

y

z


r = ( x ∧ z ) ∨

( ¬y ∧ ¬z )


( ¬x ∧ y ∧ z ) ∨

( x ∧ ¬y ∧ ¬z ) ∨

( x ∧ y ∧ ¬z ) ∨

( x ∧ y ∧ z )





�� 0 0

1 11011

x

y

z


r = ( x ∧ ¬z ) ∨

( y )


( x ∧ y ∧ ¬z )


�� 0 10 0

0010

x

y

z


r = ( ¬x ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ ¬z )


( ¬x ∧ y ∧ ¬z ) ∨

( x ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ ¬z ) ∨

( x ∧ y ∧ z )


��

��1 0

1 00111

x

y

z


r = ( x ∧ y ) ∨

( x ∧ z ) ∨

( ¬x ∧ ¬z )


( x ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ z )


��0 0

1 00101

x

y

z





r = ( ¬x ∧ y ∧ ¬z ) ∨

( x ∧ z )


( ¬x ∧ y ∧ ¬z ) ∨

( ¬x ∧ y ∧ z ) ∨

( x ∧ ¬y ∧ ¬z ) ∨

( x ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ z )


�� 1 0

1 11101

x

y

z


r = ( ¬y ∧ ¬z ) ∨

( x ∧ z ) ∨

( ¬x ∧ y )


( ¬x ∧ y ∧ ¬z ) ∨

( x ∧ y ∧ z )


��0 1

1 00001

x

y

z


r = ( ¬x ∧ y ∧ ¬z ) ∨

( ¬x ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ z )


( ¬x ∧ ¬y ∧ z ) ∨

( ¬x ∧ y ∧ z ) ∨

( x ∧ ¬y ∧ ¬z ) ∨

( x ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ z )


��

��

�� 1 10 1

1101

x

y

z


r = ( z ) ∨

( ¬y )





( x ∧ ¬y ∧ ¬z ) ∨

( x ∧ ¬y ∧ z )


�� 0 0

1 01100

x

y

z


r = ( ¬x ∧ y ∧ ¬z ) ∨

( x ∧ ¬y )


( ¬x ∧ y ∧ ¬z ) ∨

( ¬x ∧ y ∧ z ) ∨

( x ∧ ¬y ∧ ¬z ) ∨

( x ∧ ¬y ∧ z )


�� 0 1

1 11100

x

y

z


r = ( ¬x ∧ z ) ∨

( x ∧ ¬y ) ∨

( ¬x ∧ y )


( ¬x ∧ y ∧ z ) ∨

( x ∧ ¬y ∧ ¬z ) ∨

( x ∧ ¬y ∧ z )


�� 0 01 1

1100

x

y

z


r = ( x ∧ ¬y ) ∨

( ¬x ∧ y )


( ¬x ∧ ¬y ∧ z ) ∨

( ¬x ∧ y ∧ ¬z ) ∨

( ¬x ∧ y ∧ z ) ∨

( x ∧ ¬y ∧ z )





��

��

�� 1 11 1

0100

x

y

z


r = ( ¬x ) ∨

( ¬y ∧ z )


( ¬x ∧ ¬y ∧ z ) ∨

( x ∧ ¬y ∧ ¬z ) ∨

( x ∧ y ∧ ¬z )


�� 1 1

0 01010

x

y

z


r = ( ¬x ∧ ¬y ) ∨

( x ∧ ¬z )


( x ∧ ¬y ∧ ¬z ) ∨

( x ∧ ¬y ∧ z )


�� 0 0

0 11100

x

y

z


r = ( ¬x ∧ y ∧ z ) ∨

( x ∧ ¬y )


( x ∧ ¬y ∧ ¬z ) ∨

( x ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ ¬z ) ∨

( x ∧ y ∧ z )


��

�� 1 0

0 01111

x

y

z





r = ( x ) ∨

( ¬y ∧ ¬z )


( x ∧ y ∧ z )


�� 0 01 0

0001

x

y

z


r = ( ¬x ∧ y ∧ ¬z ) ∨

( x ∧ y ∧ z )


( ¬x ∧ y ∧ ¬z ) ∨

( x ∧ ¬y ∧ ¬z ) ∨

( x ∧ y ∧ ¬z ) ∨

( x ∧ y ∧ z )


��

�� 0 11 0

1011

x

y

z


r = ( x ∧ ¬z ) ∨

( ¬x ∧ ¬y ∧ z ) ∨

( x ∧ y ) ∨

( y ∧ ¬z )


( ¬x ∧ y ∧ ¬z ) ∨

( ¬x ∧ y ∧ z ) ∨

( x ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ ¬z )


�� 0 1

1 10110

x

y

z


r = ( ¬x ∧ z ) ∨

( y ∧ ¬z ) ∨

( ¬y ∧ z )





( ¬x ∧ y ∧ z ) ∨

( x ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ ¬z ) ∨

( x ∧ y ∧ z )


��

��1 0

0 10111

x

y

z


r = ( y ∧ z ) ∨

( x ∧ y ) ∨

( ¬x ∧ ¬y ∧ ¬z ) ∨

( x ∧ z )


( ¬x ∧ ¬y ∧ z ) ∨

( ¬x ∧ y ∧ z ) ∨

( x ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ z )


��

��1 1

0 10101

x

y

z


r = ( ¬x ∧ ¬y ) ∨

( z )


( ¬x ∧ ¬y ∧ z ) ∨

( ¬x ∧ y ∧ ¬z ) ∨

( x ∧ ¬y ∧ ¬z ) ∨

( x ∧ y ∧ z )


�� 1 1

1 01001

x

y

z


r = ( ¬x ∧ ¬y ) ∨

( x ∧ y ∧ z ) ∨

( ¬x ∧ ¬z ) ∨

( ¬y ∧ ¬z )





( ¬x ∧ y ∧ ¬z ) ∨

( ¬x ∧ y ∧ z ) ∨

( x ∧ ¬y ∧ ¬z ) ∨

( x ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ z )


��

��

�� 0 11 1

1101

x

y

z


r = ( z ) ∨

( x ∧ ¬y ) ∨

( ¬x ∧ y )


( ¬x ∧ ¬y ∧ z ) ∨

( ¬x ∧ y ∧ ¬z ) ∨

( x ∧ ¬y ∧ z )


�� 1 1

1 00100

x

y

z


r = ( ¬x ∧ ¬z ) ∨

( ¬y ∧ z )


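S118. a Reference implementation:

r = ( x ∧ ¬y ∧ z ) ∨ ( x ∧ y ∧ ¬z )

b Annotated Karnaugh map (cell values for ( x, y, z ) = 000, 001, ..., 111, where ? marks a don't-care):

? ? 0 ? 0 1 1 0

and associated, optimised implementation:

r = ( x ∧ y ∧ ¬z ) ∨ ( ¬y ∧ z )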
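Once don't-care cells are present, the optimised expression need not equal the reference implementation on every input; it only has to match the cells that hold 0 or 1. The following is a hedged Python sketch of that check for S118. It assumes the Karnaugh map cells are read in the order ( x, y, z ) = 000, 001, ..., 111 and that ? marks a don't-care; the name `agrees_with_map` is illustrative.

```python
from itertools import product

# Cell values for S118, read as ( x, y, z ) = 000, 001, ..., 111 (assumed order).
CELLS = "??0?0110"


def optimised(x, y, z):
    # r = ( x ∧ y ∧ ¬z ) ∨ ( ¬y ∧ z )
    return (x and y and not z) or (not y and z)


def agrees_with_map(f, cells):
    """Check f against every cell that is not a don't-care ('?')."""
    n = len(cells).bit_length() - 1  # number of input variables
    for cell, inputs in zip(cells, product([False, True], repeat=n)):
        if cell != "?" and bool(f(*inputs)) != (cell == "1"):
            return False
    return True


print(agrees_with_map(optimised, CELLS))
```

A candidate that ignores the 0 cells, for example r = ( x ), fails this check, which is a useful sanity test when choosing how to cover the don't-care cells.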


( x ∧ y ∧ z )





��

��0 1

0 ????1

x

y

z


r = ( z )


( ¬x ∧ y ∧ ¬z ) ∨

( ¬x ∧ y ∧ z ) ∨

( x ∧ ¬y ∧ ¬z )


�� 1 01 1

10?0

x

y

z


r = ( ¬z ) ∨

( ¬x ∧ y )

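S121. a Reference implementation:

r = ( x ∧ ¬y ∧ ¬z )

b Annotated Karnaugh map (cell values for ( x, y, z ) = 000, 001, ..., 111, where ? marks a don't-care):

? 0 ? ? 1 ? ? ?

and associated, optimised implementation:

r = ( x )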


( x ∧ y ∧ ¬z ) ∨

( x ∧ y ∧ z )


��

��0 ?

? 0?111

x

y

z


r = ( x )


( x ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ z )





��

��? 0

? 1?1?1

x

y

z


r = ( y ) ∨

( x )


( x ∧ ¬y ∧ ¬z )


�� 1 0? ?

1000

x

y

z


r = ( ¬y ∧ ¬z )

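S125. a Reference implementation:

r = ( x ∧ y ∧ z )

b Annotated Karnaugh map (cell values for ( x, y, z ) = 000, 001, ..., 111, where ? marks a don't-care):

0 0 ? ? ? ? 0 1

and associated, optimised implementation:

r = ( y ∧ z )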


( x ∧ y ∧ ¬z ) ∨

( x ∧ y ∧ z )


��

��

�� 0 1? 0

??11

x

y

z


r = ( x ) ∨

( ¬y ∧ z )


( ¬x ∧ y ∧ z )





��? 1

0 1?0??

x

y

z


r = ( ¬x ∧ z )

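S128. a Reference implementation:

r = ( ¬x ∧ y ∧ z )

b Annotated Karnaugh map (cell values for ( x, y, z ) = 000, 001, ..., 111, where ? marks a don't-care):

0 0 0 1 ? 0 0 0

and associated, optimised implementation:

r = ( ¬x ∧ y ∧ z )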


( ¬x ∧ y ∧ ¬z ) ∨

( x ∧ y ∧ z )


��

��0 1

1 ????1

x

y

z


r = ( y ) ∨

( z )


( ¬x ∧ y ∧ z )


��

��? 1

? 10?0?

x

y

z


r = ( z )


( x ∧ ¬y ∧ ¬z ) ∨

( x ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ ¬z )





��

�� 0 ?

1 0111?

x

y

z


r = ( x ) ∨

( y ∧ ¬z )



��

��0 ?

? 10?0?

x

y

z


r = ( z )

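S133. a Reference implementation:

r = ( ¬x ∧ y ∧ ¬z )

b Annotated Karnaugh map (cell values for ( x, y, z ) = 000, 001, ..., 111, where ? marks a don't-care):

0 0 1 0 0 0 ? 0

and associated, optimised implementation:

r = ( y ∧ ¬z )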


( x ∧ y ∧ ¬z )


��

�� ? 00 ?

?110

x

y

z


r = ( x ∧ ¬z ) ∨

( x ∧ ¬y )


( ¬x ∧ y ∧ ¬z ) ∨

( x ∧ ¬y ∧ ¬z ) ∨

( x ∧ y ∧ ¬z )





��

�� 0 1

1 ?1010

x

y

z


r = ( ¬x ∧ z ) ∨

( x ∧ ¬z ) ∨

( y ∧ ¬z )

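S136. a Reference implementation:

r = ( x ∧ y ∧ ¬z ) ∨ ( x ∧ y ∧ z )

b Annotated Karnaugh map (cell values for ( x, y, z ) = 000, 001, ..., 111, where ? marks a don't-care):

? ? ? ? ? 0 1 1

and associated, optimised implementation:

r = ( y )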


( ¬x ∧ y ∧ z ) ∨

( x ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ z )


�� 0 0

1 1?101

x

y

z


r = ( x ∧ z ) ∨

( ¬x ∧ y )

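S138. a Reference implementation:

r = ( ¬x ∧ ¬y ∧ ¬z )

b Annotated Karnaugh map (cell values for ( x, y, z ) = 000, 001, ..., 111, where ? marks a don't-care):

1 ? ? 0 ? 0 0 0

and associated, optimised implementation:

r = ( ¬x ∧ ¬y )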





( ¬x ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ z )


��

�� 1 1

? ?00?1

x

y

z


r = ( ¬x ) ∨

( y )


( ¬x ∧ ¬y ∧ z ) ∨

( ¬x ∧ y ∧ ¬z ) ∨

( x ∧ ¬y ∧ z )


�� 1 11 0

?1??

x

y

z


r = ( ¬z ) ∨

( ¬y )


( x ∧ y ∧ z )


�� 0 ?

? 01?01

x

y

z


r = ( x ∧ ¬y ) ∨

( x ∧ z )


( ¬x ∧ y ∧ ¬z ) ∨

( ¬x ∧ y ∧ z ) ∨

( x ∧ y ∧ z )


�� 1 01 1

?0?1

x

y

z





r = ( ¬z ) ∨

( y )


( x ∧ ¬y ∧ z )


�� 1 ?? 0

?1??

x

y

z


r = ( ¬y )


( ¬x ∧ ¬y ∧ z ) ∨

( ¬x ∧ y ∧ z ) ∨

( x ∧ ¬y ∧ z )


��

��1 1

0 101??

x

y

z


r = ( ¬x ∧ ¬y ) ∨

( z )


( ¬x ∧ y ∧ ¬z ) ∨

( x ∧ ¬y ∧ ¬z ) ∨

( x ∧ y ∧ z )


��

�� 0 11 0

1?01

x

y

z


r = ( ¬x ∧ y ∧ ¬z ) ∨

( x ∧ ¬y ) ∨

( x ∧ z ) ∨

( ¬y ∧ z )


( x ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ z )





��

��

�� ? ?? ?

1101

x

y

z


r = ( z ) ∨

( ¬y )


( x ∧ y ∧ z )


�� ? 01 0

0??1

x

y

z


r = ( x ∧ y ) ∨

( y ∧ ¬z )


( ¬x ∧ y ∧ z ) ∨

( x ∧ ¬y ∧ z )


��

��

�� ? ?1 1

?1?0

x

y

z


r = ( ¬x ) ∨

( ¬y )


( ¬x ∧ y ∧ ¬z ) ∨

( ¬x ∧ y ∧ z ) ∨

( x ∧ y ∧ z )


�� 1 0

1 1?001

x

y

z


r = ( y ∧ z ) ∨

( ¬x ∧ ¬z )





( x ∧ y ∧ ¬z )


��0 0

0 ?0110

x

y

z


r = ( x ∧ y ∧ ¬z ) ∨

( x ∧ ¬y ∧ z )


( x ∧ y ∧ ¬z )


��

�� ? 10 0

??10

x

y

z


r = ( x ∧ ¬z ) ∨

( ¬y )


( ¬x ∧ ¬y ∧ z ) ∨

( ¬x ∧ y ∧ z )


��

��1 1

? 1?0?0

x

y

z


r = ( ¬x )


( ¬x ∧ y ∧ ¬z ) ∨

( x ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ z )


��

��1 0

1 ??1?1

x

y

z





r = ( ¬z ) ∨

( x )


( x ∧ ¬y ∧ ¬z ) ∨

( x ∧ y ∧ z )


�� ? ?? 1

10?1

x

y

z


r = ( ¬z ) ∨

( y )



�� ? 0? 1

0000

x

y

z


r = ( ¬x ∧ y )


( ¬x ∧ y ∧ z ) ∨

( x ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ z )


�� ? 0

1 101?1

x

y

z


r = ( y ) ∨

( x ∧ z )


( ¬x ∧ y ∧ z ) ∨

( x ∧ ¬y ∧ ¬z ) ∨

( x ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ z )





��

��

�� ? 10 1

1101

x

y

z


r = ( z ) ∨

( ¬y )


( x ∧ ¬y ∧ ¬z )


��

��? ?

0 110?0

x

y

z


r = ( ¬x ∧ z ) ∨

( x ∧ ¬z )


( x ∧ ¬y ∧ ¬z )


�� 0 ?

1 010??

x

y

z


r = ( x ∧ ¬z ) ∨

( y ∧ ¬z )


( ¬x ∧ ¬y ∧ z ) ∨

( ¬x ∧ y ∧ ¬z )


�� 1 1

1 0??00

x

y

z


r = ( ¬y ) ∨

( ¬x ∧ ¬z )




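S161. a Reference implementation:

r = ( ¬x ∧ ¬y ∧ ¬z )

b Annotated Karnaugh map (cell values for ( x, y, z ) = 000, 001, ..., 111, where ? marks a don't-care):

1 0 ? ? ? 0 ? ?

and associated, optimised implementation:

r = ( ¬z )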


S162. a Reference implementation:

r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
    ( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨
    ( w ∧ ¬x ∧ y ∧ z ) ∨
    ( w ∧ x ∧ ¬y ∧ ¬z ) ∨
    ( w ∧ x ∧ ¬y ∧ z ) ∨
    ( w ∧ x ∧ y ∧ ¬z ) ∨
    ( w ∧ x ∧ y ∧ z )

b Annotated Karnaugh map (cell values for ( w, x, y, z ) = 0000, 0001, ..., 1111):

1 0 0 0   0 0 0 0   1 0 0 1   1 1 1 1

and associated, optimised implementation:

r = ( ¬x ∧ ¬y ∧ ¬z ) ∨ ( w ∧ y ∧ z ) ∨ ( w ∧ x )
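A reference implementation of this kind is simply one product term per Karnaugh map cell holding 1. The following is a minimal Python sketch that regenerates the terms for S162 from its cell values. It assumes the cells are listed in the order ( w, x, y, z ) = 0000, 0001, ..., 1111 with w most significant; the function name `minterms` is illustrative.

```python
from itertools import product


def minterms(cells, names):
    """Return one product term (as a string) for each cell equal to '1'."""
    terms = []
    for cell, bits in zip(cells, product([0, 1], repeat=len(names))):
        if cell == "1":
            # A variable appears negated when its input bit is 0.
            literals = [n if b else "¬" + n for n, b in zip(names, bits)]
            terms.append("( " + " ∧ ".join(literals) + " )")
    return terms


# Cell values for S162 (assumed truth-table order, w most significant).
CELLS = "1000" "0000" "1001" "1111"

terms = minterms(CELLS, "wxyz")
print("r = " + " ∨\n    ".join(terms))
```

Running the sketch lists seven terms, matching the seven lines of the reference implementation in S162; cells holding 0 or ? contribute no term.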


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ z )


��

� �� 1 01 0

0010

1 00 0

1101w

x

y

z





r = ( w ∧ x ∧ ¬y ) ∨

( w ∧ x ∧ z ) ∨

( ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ y ∧ ¬z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z )


� ��

��

��1 1

1 00011

0 10 0

1100w

x

y

z


r = ( ¬x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ ¬y ) ∨

( ¬w ∧ x ∧ y ) ∨

( ¬w ∧ ¬x ∧ ¬z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ z )


��

��

1 11 0

0011

0 10 0

1101w

x

y

z


r = ( ¬w ∧ ¬x ∧ ¬y ) ∨

( w ∧ x ∧ ¬y ) ∨

( ¬w ∧ y ∧ ¬z ) ∨

( w ∧ ¬y ∧ z ) ∨

( x ∧ y ∧ z )





r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ z )


��

��

��

��

1 01 0

1001

1 01 0

1111w

x

y

z


r = ( x ∧ y ∧ z ) ∨

( ¬x ∧ ¬z ) ∨

( w ∧ x ) ∨

( ¬y ∧ ¬z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ y ∧ z )


� ��

��

� 1 11 1

1011

1 10 1

0001w

x

y

z


r = ( ¬w ∧ ¬z ) ∨

( y ∧ z ) ∨

( ¬x ∧ ¬y )





r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ ¬z )


��

��

�� 1 1

1 11100

1 00 0

1110w

x

y

z


r = ( w ∧ x ∧ ¬z ) ∨

( ¬w ∧ ¬x ) ∨

( x ∧ ¬y ) ∨

( ¬y ∧ ¬z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ ¬z )


��

��

0 10 0

1010

1 01 1

0110w

x

y

z


r = ( w ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ) ∨

( ¬w ∧ x ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z )





r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z )


��

�� 1 1

1 00110

0 10 0

1100w

x

y

z


r = ( ¬w ∧ ¬x ∧ ¬y ) ∨

( w ∧ x ∧ ¬y ) ∨

( ¬w ∧ y ∧ ¬z ) ∨

( ¬y ∧ z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ ¬z )


��

��

�� 0 11 1

1010

0 10 1

1010w

x

y

z


r = ( ¬x ∧ z ) ∨

( x ∧ ¬z ) ∨

( ¬w ∧ y ∧ ¬z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ z )





��

�� 0 10 0

1110

0 01 1

1001w

x

y

z


r = ( w ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ y ) ∨

( x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬z ) ∨

( ¬w ∧ ¬y ∧ z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ z )


��

��

1 00 0

1001

0 01 0

1011w

x

y

z


r = ( w ∧ x ∧ ¬z ) ∨

( w ∧ y ∧ ¬z ) ∨

( ¬w ∧ ¬y ∧ ¬z ) ∨

( x ∧ y ∧ z )


r = ( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ ¬z )


��

� �� 0 01 0

0111

0 01 1

0110w

x

y

z





r = ( ¬w ∧ x ∧ z ) ∨

( w ∧ ¬x ∧ y ) ∨

( x ∧ ¬y ∧ z ) ∨

( y ∧ ¬z )


r = ( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ z )


��

� �0 01 1

0111

1 10 0

0100w

x

y

z


r = ( w ∧ ¬x ∧ ¬y ) ∨

( ¬w ∧ y ) ∨

( x ∧ ¬y ∧ z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ z )


��

��

��1 00 0

0110

0 00 1

1011w

x

y

z


r = ( w ∧ x ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ y ∧ z ) ∨

( x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z )





r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ z )


��

��0 10 1

0010

0 01 0

1001w

x

y

z


r = ( ¬w ∧ ¬x ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ ¬z )


��

��

1 00 1

1111

1 11 1

1110w

x

y

z


r = ( w ∧ ¬y ) ∨

( ¬y ∧ ¬z ) ∨

( ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ) ∨

( w ∧ ¬z )





r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ z )


��

� � ��

�� 1 00 1

1000

1 00 1

0101w

x

y

z


r = ( ¬x ∧ y ∧ z ) ∨

( ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ z ) ∨

( ¬w ∧ ¬y ∧ ¬z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ ¬z )


��

��

1 10 1

0000

0 01 1

1010w

x

y

z


r = ( ¬w ∧ ¬x ∧ ¬y ) ∨

( w ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ ¬z ) ∨

( ¬x ∧ y ∧ z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ z )





�� 1 0

0 11100

0 00 1

0100w

x

y

z


r = ( ¬x ∧ y ∧ z ) ∨

( x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬y ∧ ¬z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ z )


�� 1 1

0 00100

0 00 1

0011w

x

y

z


r = ( ¬w ∧ ¬x ∧ ¬y ) ∨

( w ∧ x ∧ y ) ∨

( w ∧ y ∧ z ) ∨

( ¬w ∧ ¬y ∧ z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ z )


��

�� 1 00 1

1000

0 00 0

0101w

x

y

z


r = ( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ z ) ∨

( ¬w ∧ ¬y ∧ ¬z )





r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ z )


��

0 11 1

0011

0 00 0

0100w

x

y

z


r = ( ¬w ∧ ¬x ∧ z ) ∨

( ¬w ∧ y ) ∨

( w ∧ x ∧ ¬y ∧ z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ ¬z )


��

��

��

0 11 1

0011

0 00 1

1010w

x

y

z


r = ( ¬w ∧ ¬x ∧ z ) ∨

( w ∧ x ∧ ¬z ) ∨

( ¬w ∧ y ) ∨

( ¬x ∧ y ∧ z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ y ∧ ¬z )





��

��

�� 1 11 0

1110

1 11 1

0010w

x

y

z


r = ( w ∧ ¬x ) ∨

( ¬w ∧ ¬y ) ∨

( y ∧ ¬z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z )


��

� 0 10 0

0111

0 11 0

0100w

x

y

z


r = ( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ) ∨

( ¬y ∧ z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z )


� ��

��

� 1 11 1

1100

0 10 0

1100w

x

y

z





r = ( x ∧ ¬y ) ∨

( ¬w ∧ ¬x ) ∨

( ¬y ∧ z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ z )


��

��

��

��

� �� 0 11 1

1010

0 11 0

1111w

x

y

z


r = ( ¬w ∧ ¬x ∧ z ) ∨

( x ∧ ¬z ) ∨

( w ∧ x ) ∨

( y ∧ ¬z ) ∨

( w ∧ ¬y ∧ z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ y ∧ z )


��

��

� � ��

1 00 1

0101

1 01 1

0001w

x

y

z


r = ( y ∧ z ) ∨

( ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ z ) ∨

( w ∧ ¬x ∧ y )





r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z )


��

��1 0

1 00011

0 10 0

0000w

x

y

z


r = ( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ) ∨

( ¬w ∧ ¬x ∧ ¬z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ z )


��

��

��

��

��

� �1 10 1

0111

1 00 0

1011w

x

y

z


r = ( w ∧ x ∧ ¬z ) ∨

( x ∧ y ) ∨

( ¬w ∧ z ) ∨

( ¬x ∧ ¬y ∧ ¬z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ z )





��

� 0 10 1

0100

0 11 0

0101w

x

y

z


r = ( ¬w ∧ ¬x ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ z ) ∨

( ¬y ∧ z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ z )


� ��

��

1 11 0

0010

1 10 0

1111w

x

y

z


r = ( ¬x ∧ ¬y ) ∨

( ¬w ∧ y ∧ ¬z ) ∨

( w ∧ x )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ z )


��

��

��

��

��

��

1 11 1

0101

0 00 1

1001w

x

y

z





r = ( y ∧ z ) ∨

( ¬w ∧ ¬x ) ∨

( ¬w ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z )


r = ( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ z )


��

��

�� 0 01 1

0010

1 10 0

1011w

x

y

z


r = ( w ∧ x ∧ ¬z ) ∨

( w ∧ x ∧ y ) ∨

( x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ) ∨

( ¬w ∧ ¬x ∧ y )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ z )


��

��

� 1 10 0

0111

0 10 0

0111w

x

y

z


r = ( ¬w ∧ ¬x ∧ ¬y ) ∨

( x ∧ y ) ∨

( ¬y ∧ z )





r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ z )


��

��

�� 0 10 0

0100

1 11 1

1011w

x

y

z


r = ( w ∧ y ) ∨

( w ∧ ¬z ) ∨

( w ∧ ¬x ) ∨

( ¬w ∧ ¬y ∧ z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ z )


��

�� 1 0

1 11101

0 11 0

0011w

x

y

z


r = ( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ z ) ∨

( ¬x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ y ) ∨

( ¬w ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ y ∧ z )





r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ z )


��

��

�� 1 01 0

1101

1 01 1

0101w

x

y

z


r = ( w ∧ y ∧ z ) ∨

( ¬x ∧ ¬z ) ∨

( x ∧ z ) ∨

( ¬w ∧ x ∧ ¬y )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z )


��

�� 0 1

0 11101

0 00 0

1000w

x

y

z


r = ( ¬w ∧ z ) ∨

( x ∧ ¬y ∧ ¬z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ y ∧ z )





��

�� 0 10 0

1101

1 10 1

0001w

x

y

z


r = ( x ∧ y ∧ z ) ∨

( w ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ) ∨

( ¬w ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ ¬z )


��

��

��

��0 1

1 10111

0 01 1

1110w

x

y

z


r = ( w ∧ x ∧ ¬y ) ∨

( y ∧ ¬z ) ∨

( ¬x ∧ y ) ∨

( ¬w ∧ z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ z )


��

�� 1 0

0 00111

0 11 0

0001w

x

y

z





r = ( x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ) ∨

( ¬w ∧ x ∧ z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ ¬z )


� ��

��

� 0 10 0

1101

0 11 1

1110w

x

y

z


r = ( x ∧ ¬y ) ∨

( w ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ z ) ∨

( ¬w ∧ x ∧ z ) ∨

( ¬y ∧ z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ z )


�� 1 10 0

1100

0 01 0

0011w

x

y

z


r = ( w ∧ x ∧ y ) ∨

( ¬w ∧ ¬y ) ∨

( w ∧ y ∧ ¬z )





r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ z )


��

�� 1 0

1 01010

1 01 1

0101w

x

y

z


r = ( w ∧ x ∧ z ) ∨

( w ∧ y ∧ z ) ∨

( ¬x ∧ ¬z ) ∨

( ¬w ∧ ¬z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ z )


��

�� 1 1

1 00101

1 01 0

0001w

x

y

z


r = ( x ∧ y ∧ z ) ∨

( ¬x ∧ ¬z ) ∨

( ¬w ∧ ¬y ∧ z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ ¬z )





��

��1 1

1 00100

0 11 0

0110w

x

y

z


r = ( w ∧ y ∧ ¬z ) ∨

( ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ ¬z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ ¬z )


��

��

��1 0

1 10101

1 10 1

0110w

x

y

z


r = ( ¬w ∧ ¬x ∧ y ) ∨

( w ∧ ¬x ∧ z ) ∨

( w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ ¬y ∧ z ) ∨

( ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ ¬z )


��

� �� 1 0

0 10101

1 00 0

1010w

x

y

z





r = ( w ∧ x ∧ ¬z ) ∨

( ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ z )



r = ( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ z )


��

? ?? ?

0??1

1 01 0

?1?1w

x

y

z


r = ( x ∧ z ) ∨

( w ∧ ¬z )


r = ( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ z )


��

��

��

0 00 1

1010

1 01 0

?101w

x

y

z


r = ( w ∧ ¬x ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬z ) ∨

( w ∧ x ∧ z )


r = ( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ z )





��

� ��

? 01 ?

01?0

1 11 0

?101w

x

y

z


r = ( w ∧ x ∧ z ) ∨

( x ∧ ¬y ∧ z ) ∨

( w ∧ ¬y ) ∨

( ¬x ∧ ¬z )


r = ( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ z )


��

��

��? ?

1 11?00

? ?1 0

01?1w

x

y

z


r = ( w ∧ x ∧ z ) ∨

( w ∧ y ∧ ¬z ) ∨

( ¬w ∧ ¬y ) ∨

( ¬w ∧ ¬x )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ z )


��

��

��

��

��

��

0 1? 1

1011

0 10 ?

?111w

x

y

z





r = ( x ∧ ¬z ) ∨

( x ∧ y ) ∨

( ¬x ∧ z ) ∨

( w ∧ z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ z )


� ��

�� 1 ?? 0

1?11

0 01 1

0101w

x

y

z


r = ( ¬w ∧ ¬z ) ∨

( x ∧ z ) ∨

( w ∧ ¬x ∧ y )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( ¬x ∧ z ) ∨

( x ∧ ¬z ) ∨

( ¬w )





r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( ¬x ∧ ¬y ) ∨

( w )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( ¬w ∧ ¬x ∧ z ) ∨

( w ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬y ∧ ¬z ) ∨

( w ∧ x )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z )


[Figure: Karnaugh map for r over inputs w, x, y and z]





r = ( ¬w ∧ ¬x ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z )


r = ( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( x ∧ ¬z ) ∨

( ¬w ∧ y ) ∨

( w ∧ x )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( w ∧ ¬x ) ∨

( ¬w ∧ z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ z )





[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( w ∧ ¬y ) ∨

( w ∧ ¬x ) ∨

( ¬y ∧ z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ y ∧ z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( w ∧ ¬x ∧ z ) ∨

( x ∧ y ) ∨

( ¬w ∧ ¬z )





r = ( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( x ∧ ¬y ) ∨

( w ∧ ¬x ∧ ¬z ) ∨

( w ∧ ¬y ) ∨

( x ∧ z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( ¬x ∧ y ∧ z ) ∨

( w ∧ ¬x ) ∨

( ¬w ∧ x ∧ y ) ∨

( ¬y ∧ ¬z )


r = ( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ ¬z )





[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( ¬x ∧ y ∧ z ) ∨

( w ∧ x )


r = ( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( w ∧ z ) ∨

( w ∧ ¬x ) ∨

( y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( ¬w ∧ ¬x ) ∨

( w ∧ z )





r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ ¬x )


r = ( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( ¬w ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ) ∨

( w ∧ z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z )


[Figure: Karnaugh map for r over inputs w, x, y and z]





r = ( ¬x ∧ ¬z ) ∨

( ¬y ∧ z )


r = ( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( z ) ∨

( y ) ∨

( w ∧ x )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ ¬z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ ¬z )


r = ( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ z )





[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( w ∧ ¬x ) ∨

( ¬w ∧ x ∧ y ∧ z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( w ∧ ¬x ) ∨

( y ∧ z ) ∨

( ¬w ∧ x ) ∨

( ¬w ∧ ¬y ∧ ¬z ) ∨

( x ∧ z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z )


[Figure: Karnaugh map for r over inputs w, x, y and z]





r = ( ¬w ∧ ¬z ) ∨

( ¬w ∧ ¬x ) ∨

( w ∧ x ∧ ¬y ) ∨

( ¬y ∧ ¬z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ x ∧ y ∧ ¬z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( w ∧ x ∧ ¬z ) ∨

( ¬w ∧ x ∧ z ) ∨

( ¬w ∧ ¬x ∧ ¬z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( ¬w ∧ ¬x ∧ ¬y ) ∨

( w ∧ x ∧ ¬y ) ∨

( ¬w ∧ ¬x ∧ ¬z )


r = ( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ z )





[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( w ∧ ¬x ) ∨

( x ∧ z ) ∨

( w ∧ y ) ∨

( ¬w ∧ x )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( x ∧ ¬y ) ∨

( ¬y ∧ z ) ∨

( ¬x ∧ y )


r = ( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( ¬x ∧ z ) ∨

( w ∧ x )





r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( ¬x ∧ z ) ∨

( ¬w ∧ ¬y ) ∨

( ¬w ∧ x ) ∨

( ¬y ∧ z )


r = ( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( x ) ∨

( w )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ ¬z )





[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( w ∧ x ∧ ¬y ) ∨

( x ∧ ¬z ) ∨

( ¬y ∧ ¬z ) ∨

( ¬x ∧ y ∧ z )


r = ( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ y ∧ z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( ¬w ∧ x ∧ ¬z ) ∨

( ¬x ∧ y ∧ z ) ∨

( w ∧ y ∧ z )


r = ( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( ¬w ∧ ¬y ) ∨

( w ∧ ¬z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ z )





[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( ¬w ∧ z ) ∨

( y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ) ∨

( ¬x ∧ y )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( w ∧ ¬x ) ∨

( x ∧ y ∧ ¬z ) ∨

( ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬y )


r = ( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ z )


[Figure: Karnaugh map for r over inputs w, x, y and z]





r = ( w ∧ z ) ∨

( ¬w ∧ y ) ∨

( x ∧ z )


r = ( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( ¬w ∧ x ∧ y )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( ¬w ∧ ¬x ∧ z ) ∨

( x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ) ∨

( w ∧ ¬y ∧ ¬z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ z )





[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( w ∧ y ) ∨

( ¬w ∧ x ∧ z ) ∨

( ¬w ∧ ¬x ∧ ¬z )


r = ( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( w ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ) ∨

( ¬x ∧ y )


r = ( ¬w ∧ x ∧ ¬y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( x )


r = ( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z )





[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( w ∧ ¬x ∧ ¬z ) ∨

( ¬w ∧ ¬x ∧ y )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ ¬y ∧ z ) ∨

( w ∧ x ∧ y ∧ ¬z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( w ∧ x ) ∨

( ¬y ∧ ¬z )


r = ( ¬w ∧ ¬x ∧ ¬y ∧ z ) ∨

( ¬w ∧ ¬x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( ¬w ∧ ¬z ) ∨

( ¬w ∧ ¬y ) ∨

( ¬y ∧ ¬z )





r = ( ¬w ∧ ¬x ∧ y ∧ z ) ∨

( ¬w ∧ x ∧ y ∧ ¬z ) ∨

( ¬w ∧ x ∧ y ∧ z ) ∨

( w ∧ ¬x ∧ ¬y ∧ ¬z ) ∨

( w ∧ ¬x ∧ ¬y ∧ z ) ∨

( w ∧ ¬x ∧ y ∧ ¬z ) ∨

( w ∧ ¬x ∧ y ∧ z ) ∨

( w ∧ x ∧ ¬y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ ¬z ) ∨

( w ∧ x ∧ y ∧ z )


[Figure: Karnaugh map for r over inputs w, x, y and z]


r = ( x ∧ y ) ∨

( y ∧ z ) ∨

( w )

B.3 Chapter 3

S262. A generic, block diagram style framework for FSMs is as follows:

                    +---------+
                +-->| \delta  |
                |   +---------+
                |     ^     |
                |   Q |     | Q'
                |     |     v
                |   +---------+
        input --+   |  state  |<-- clock
                |   +---------+
                |        |
                |      Q |
                |        v
                |   +---------+
                +-->| \omega  |--> output
                    +---------+

such that:

• An n-bit register (middle component) holds Q, the current state of the FSM.

• Within a given clock period, the current state is provided as input to δ, the transition function: based on Q and any input, this computes the next state Q′.

• At the same time that δ is computing the next state, the output function ω computes any output from the FSM; depending on the type of FSM, this might be based on Q only, or on Q and any input.

• A positive edge of the clock signal causes the state to be updated with the output from δ. That is, the FSM advances from the current to next state; computation by δ and ω is performed in the same way during the subsequent clock period, once Q has been updated with Q′.

Note that this framework is assumed in any of the following questions that ask for it.

This FSM can be in one of two states: the bits of X processed so far have either an even or an odd number of elements equal to 1. We give each state a label, in this case S_even and S_odd. Next we describe how the FSM transitions from some current state to a next state, i.e., how the transition function δ works: based on an input Xi provided at each step, we might draw




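[Figure: state diagram with start state S_even. Xi = 0 loops on S_even and on S_odd; Xi = 1 moves from S_even to S_odd and from S_odd back to S_even.]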

or equivalently say

    δ:   Q        Q′ if Xi = 0   Q′ if Xi = 1
         S_even   S_even         S_odd
         S_odd    S_odd          S_even

where Q is the current state and Q′ is the next state.

Given the FSM has two states only, we can store the current state using a 1-bit register. Based on a natural mapping of the abstract to concrete state labels (i.e., S_even ↦ 0 and S_odd ↦ 1), we can rewrite the transition function as a truth table:

    Xi  Q   Q′
    0   0   0
    0   1   1
    1   0   1
    1   1   0

and see clearly that Q′ = Q ⊕ Xi. Inspecting Q directly provides the output: if Q = 0 we have (so far) even parity, and in contrast if Q = 1 we have odd parity. So in a sense the output function ω is simply the identity function. In short, the low-level detail filled into the high-level design is very simple (in this case at least) once the question has been digested.

One obvious addition would be some form of mechanism to reset the FSM: as stated above, we assume it starts in the state S_even when powered-on but clearly this may not be true (the content in Q will essentially be random initially).
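As a rough behavioural sketch (not a gate-level design), the parity FSM can be simulated in a few lines of Python. The function name `parity_fsm` and the default initial state are illustrative assumptions; the only thing taken from the text is the transition Q′ = Q ⊕ Xi and the identity output function.

```python
from typing import Iterable


def parity_fsm(bits: Iterable[int], q: int = 0) -> int:
    """Run the parity FSM over a bit sequence and return the final state Q.

    q is the initial state; 0 models an explicit reset into the even state.
    """
    for x_i in bits:
        q = q ^ x_i  # transition function: Q' = Q xor Xi
    return q  # output function is the identity: 0 = even, 1 = odd


if __name__ == "__main__":
    print(parity_fsm([1, 0, 1, 1]))  # three bits equal to 1, so odd parity
```

Passing a different initial value for `q` mimics the un-reset register mentioned above: the reported parity is then inverted whenever Q happens to start at 1.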

S263. a There are several approaches to solving this problem. Possibly the easiest, but perhaps not the most obvious, is to simply build a shift-register: the register stores the last three inputs, and when a new input is available the register shifts the content along by one, which means the oldest input drops off one end and the new input is inserted into the other end. One can then build a simple circuit to test the current state of the shift-register to see if the last three inputs match what is required.

Alternatively, one can take a more heavy-weight approach and formulate the solution as a state machine. First we need to decide on an encoding for our state: when searching through the input we can have matched zero through three correct tokens, and we denote this by the integer S stored in two bits, using Q1 and Q0 as the most-significant and least-significant bits respectively. We also need an encoding of the actual input tokens I which are being passed to the matching circuit. Arbitrarily we might select A = 0, C = 1, G = 2 and T = 3, although other encodings are valid and might actually simplify things; we use I1 and I0 to denote the most- and least-significant bits of the input token I. From this we can now create a table describing the mapping from current state S and input I to next state S′, which can be roughly written as

    I1  I0  Q1  Q0   Q′1  Q′0
    0   0   0   0    0    1
    0   1   0   0    0    0
    1   0   0   0    0    0
    1   1   0   0    0    0
    0   0   0   1    0    1
    0   1   0   1    1    0
    1   0   0   1    0    0
    1   1   0   1    0    0
    0   0   1   0    0    1
    0   1   1   0    0    0
    1   0   1   0    0    0
    1   1   1   0    1    1
    0   0   1   1    0    1
    0   1   1   1    0    0
    1   0   1   1    0    0
    1   1   1   1    0    0




Thus, if we are in state (Q1, Q0) = (0, 0) = 0 and see an A input, we move to state (Q′1, Q′0) = (0, 1) = 1; otherwise we stay in state (Q′1, Q′0) = (0, 0) = 0. Now we can define the transition function from current state to next state as

    Q′0 = ( ¬Q1 ∧ ¬Q0 ∧ ¬I1 ∧ ¬I0 ) ∨
          ( ¬Q1 ∧  Q0 ∧ ¬I1 ∧ ¬I0 ) ∨
          (  Q1 ∧ ¬Q0 ∧ ¬I1 ∧ ¬I0 ) ∨
          (  Q1 ∧  Q0 ∧ ¬I1 ∧ ¬I0 ) ∨
          (  Q1 ∧ ¬Q0 ∧  I1 ∧  I0 )

    Q′1 = ( ¬Q1 ∧  Q0 ∧ ¬I1 ∧  I0 ) ∨
          (  Q1 ∧ ¬Q0 ∧  I1 ∧  I0 )

with simplifications as appropriate. Finally, the output flag F will be set only according to

    F = Q1 ∧ Q0

to signal when we have matched three characters. As such, we can realise the FSM framework described in S262 by filling each component with the associated implementation above.
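The expressions above can be exercised with a small Python sketch. The helper names (`ENCODING`, `step`, `match_flags`) are illustrative assumptions, and the remark that the table encodes the token sequence A, C, T is read off the table rather than stated in the text.

```python
# Token encoding chosen in the text: A = 0, C = 1, G = 2, T = 3.
ENCODING = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def step(q1, q0, i1, i0):
    """One application of the transition function, as sum-of-products terms."""
    n = lambda b: 1 - b  # NOT of a single bit
    q0_next = ((n(q1) & n(q0) & n(i1) & n(i0)) |
               (n(q1) & q0 & n(i1) & n(i0)) |
               (q1 & n(q0) & n(i1) & n(i0)) |
               (q1 & q0 & n(i1) & n(i0)) |
               (q1 & n(q0) & i1 & i0))
    q1_next = (n(q1) & q0 & n(i1) & i0) | (q1 & n(q0) & i1 & i0)
    return q1_next, q0_next


def match_flags(tokens):
    """Feed tokens through the FSM; return the flag F = Q1 AND Q0 after each."""
    q1 = q0 = 0
    flags = []
    for t in tokens:
        code = ENCODING[t]
        q1, q0 = step(q1, q0, (code >> 1) & 1, code & 1)
        flags.append(q1 & q0)
    return flags


if __name__ == "__main__":
    print(match_flags("GACTA"))  # the flag is raised once A, C, T has been seen
```

Because `step` is a literal transcription of the two expressions, comparing its results against the table row by row is a cheap way to catch transcription slips.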

b Making a general-purpose matching circuit will probably use less logic than having three separate circuits; this will reduce the space required. As an extension one might consider implementing the transition and output functions as a look-up table instead of hard-wiring them; this will mean the circuit could be used to match any sequence provided the tables were correctly initialised. Introducing a more complex circuit design could have the disadvantage of increasing the critical path (the longest sequential path through the entire circuit). If the critical path is longer, the design will have to be clocked slower and hence will not perform the matching function as quickly.

S264. a A basic diagram should show the four states and transitions between them, which relate the movement from one to the other as a result of the washing cycle, and movement as a result of input from the buttons; for example a (very) basic diagram would be:

        +------+   +------+   +------+   +------+
    +-->| idle |-->| fill |-->| wash |-->| spin |
    |   +------+   +------+   +------+   +------+
    |      |           |          |          |
    +------+-----------+----------+----------+

b Since there are four states, we can encode them using two bits; we assign the following encoding: idle = 00, fill = 01, wash = 10 and spin = 11. We use Q1 and Q0 to represent the current state, and Q′1 and Q′0 to represent the next state; B1 and B0 are the input buttons. Using this notation, we can construct the following state transition table which encodes the state machine diagram:

    B1  B0  Q1  Q0   Q′1  Q′0
    0   0   0   0    0    0
    0   1   0   0    0    1
    1   0   0   0    0    0
    1   1   0   0    0    0
    0   0   0   1    1    0
    0   1   0   1    1    0
    1   0   0   1    0    0
    1   1   0   1    1    0
    0   0   1   0    1    1
    0   1   1   0    1    1
    1   0   1   0    0    0
    1   1   1   0    1    1
    0   0   1   1    0    0
    0   1   1   1    0    0
    1   0   1   1    0    0
    1   1   1   1    0    0

so that if, for example, the machine is in the wash state (i.e., Q1 = 1 and Q0 = 0) and no buttons are pressed then the next state is spin (i.e., Q′1 = 1 and Q′0 = 1); however if button B1 is pressed to cancel the cycle, the next state is idle (i.e., Q′1 = 0 and Q′0 = 0).

c From the state transition table, we can easily extract the two Karnaugh maps:




[Figure: Karnaugh maps for Q′0 and Q′1, each over inputs B1, B0, Q1 and Q0]

Basic expressions can be extracted from the tables as follows:

    Q′0 = ( Q1 ∧ ¬Q0 ∧ B0 ) ∨ ( Q1 ∧ ¬Q0 ∧ ¬B1 ) ∨ ( ¬Q0 ∧ B0 ∧ ¬B1 )
    Q′1 = ( ¬Q1 ∧ Q0 ∧ B0 ) ∨ ( ¬Q1 ∧ Q0 ∧ ¬B1 ) ∨ ( Q1 ∧ ¬Q0 ∧ B0 ) ∨ ( Q1 ∧ ¬Q0 ∧ ¬B1 )

and through sharing, i.e., by computing

    t0 = ¬B1
    t1 = ¬B0
    t2 = ¬Q1
    t3 = ¬Q0
    t4 = t2 ∧ Q0
    t5 = t3 ∧ Q1

we can simplify these to

    Q′0 = ( t5 ∧ B0 ) ∨ ( t5 ∧ t0 ) ∨ ( t3 ∧ B0 ∧ t0 )
    Q′1 = ( t4 ∧ B0 ) ∨ ( t4 ∧ t0 ) ∨ ( t5 ∧ B0 ) ∨ ( t5 ∧ t0 )
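One way to gain confidence in the shared expressions is to check them exhaustively against the state transition table. The sketch below does so in Python; `ROWS` is simply the table from part b written as six-character strings (B1 B0 Q1 Q0 Q′1 Q′0), and the function name `next_state` is an illustrative assumption.

```python
# State transition table from part b, one string per row: B1 B0 Q1 Q0 Q'1 Q'0.
ROWS = ("000000 010001 100000 110000 000110 010110 100100 110110 "
        "001011 011011 101000 111011 001100 011100 101100 111100").split()


def next_state(b1, b0, q1, q0):
    """Evaluate the shared-term expressions for (Q'1, Q'0)."""
    t0, t2, t3 = 1 - b1, 1 - q1, 1 - q0  # t1 = NOT B0 is listed in the text but not needed here
    t4, t5 = t2 & q0, t3 & q1
    q0_next = (t5 & b0) | (t5 & t0) | (t3 & b0 & t0)
    q1_next = (t4 & b0) | (t4 & t0) | (t5 & b0) | (t5 & t0)
    return q1_next, q0_next


def mismatches():
    """Return every table row that the expressions fail to reproduce."""
    bad = []
    for row in ROWS:
        inputs = tuple(int(c) for c in row[:4])
        expected = tuple(int(c) for c in row[4:])
        if next_state(*inputs) != expected:
            bad.append(row)
    return bad


if __name__ == "__main__":
    print(mismatches())  # an empty list means expressions and table agree
```

With only four inputs there are sixteen cases, so brute force is the simplest check; the same pattern scales to any of the small FSMs in this chapter.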

S265. a The two properties are defined as follows:

i The Hamming weight of X is the number of bits in X that are equal to 1, i.e., the number of times Xi = 1. This can be computed as

    H(X) = \sum_{i=0}^{n-1} X_i.

ii The Hamming distance between X and Y is the number of bits in X that differ from the corresponding bit in Y, i.e., the number of times Xi ≠ Yi:

    D(X, Y) = \sum_{i=0}^{n-1} ( X_i \oplus Y_i ).
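Both definitions translate directly into code. The following Python sketch treats X and Y as non-negative integers whose binary digits are the bits Xi and Yi; the function names are illustrative.

```python
def hamming_weight(x: int) -> int:
    """Number of bits of x equal to 1, i.e. the sum of the Xi."""
    return bin(x).count("1")


def hamming_distance(x: int, y: int) -> int:
    """Number of bit positions where x and y differ, i.e. the weight of x XOR y."""
    return hamming_weight(x ^ y)


if __name__ == "__main__":
    print(hamming_weight(0b1011))            # 3
    print(hamming_distance(0b1011, 0b0010))  # 2
```

Note that D(X, Y) = H(X ⊕ Y), which is why the second function is a one-line wrapper around the first.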

b There are two main approaches to constructing a flip-flop of this type; since both start with an SR-latch, the difference is mainly in how the edge-triggered behaviour is realised. Use of a primary-secondary organisation is probably the more complete solution, but a simpler alternative would be to use a pulse generator. The overall design can be described roughly as follows:

                                     +---+       +---+
    D--+---------------------------->|   |---S-->|   |
       |                             |AND|       |NOR|-->r_0 = ~Q
       v                         +-->|   | r_1-->|   |
     +---+       +------------+  |   +---+       +---+
     |   |       |            |  |
     |NOT|  en-->| pulse gen. |--+
     |   |       |            |  |
     +---+       +------------+  |   +---+       +---+
       |                         +-->|   |---R-->|   |
       |                             |AND|       |NOR|-->r_1 = Q
       +---------------------------->|   | r_0-->|   |
                                     +---+       +---+

There are basically four features to note:

i An SR-latch has two inputs S and R, and two outputs Q and ¬Q. When

• S = 0, R = 0 the component retains Q,

• S = 1, R = 0 the component updates to Q = 1,




• S = 0, R = 1 the component updates to Q = 0,

• S = 1, R = 1 the component is meta-stable.

The component is level-triggered in the sense that Q is updated within the period of time when S = 1 or R = 1 (rather than when they transition to said values).

ii To provide more fine-grained control over the component, the two inputs are typically gated using (i.e., AND’ed with) an enable signal en: when en = 0, the latch inputs are always zero and hence it retains the same state; when en = 1 it can be updated as normal.

iii In order to change from the current level-triggered behaviour into an edge-triggered alternative, one approach is to use a pulse generator. The idea here is to intentionally create a mismatch in propagation delay into the inputs of an AND gate: each time en changes, the result is that we see a small pulse on the output of the AND gate. Provided this is small enough, one can argue it acts like an edge rather than a level.

iv Finally, the gated S and R inputs are tied together and controlled by one input D, meaning S = D and R = ¬D. This prevents the component being used erroneously: it can only retain or update the state.
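Features i, ii and iv can be illustrated with a small behavioural model in Python. This is a sketch under simplifying assumptions: the cross-coupled NOR gates are settled by iterating a fixed number of times, the pulse generator of feature iii is not modelled (so the result is a level-triggered gated D-latch), and the function names are illustrative.

```python
def nor(a: int, b: int) -> int:
    """2-input NOR on single bits."""
    return 1 - (a | b)


def sr_latch(s: int, r: int, q: int) -> int:
    """Settle a cross-coupled NOR latch holding q and return the new Q.

    Only meaningful when s and r are not both 1.
    """
    nq = 1 - q
    for _ in range(4):  # a few passes are enough for the feedback loop to settle
        q = nor(r, nq)
        nq = nor(s, q)
    return q


def gated_d_latch(d: int, en: int, q: int) -> int:
    """Gate S = D and R = NOT D with the enable signal, then update the latch."""
    return sr_latch(d & en, (1 - d) & en, q)


if __name__ == "__main__":
    q = 0
    for d, en in [(1, 0), (1, 1), (0, 0), (0, 1)]:
        q = gated_d_latch(d, en, q)
        print(d, en, q)
```

Since S = D ∧ en and R = ¬D ∧ en can never both be 1, the meta-stable input combination is unreachable in this model, matching feature iv.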

c The power consumed by CMOS transistors can be decomposed into two parts: the static part (which relates to leakage) and the dynamic part (which relates to power consumed when the transistor switches). In short, a value switching (i.e., changing from one value to another) consumes much more power than staying the same. In this case, clearly we have an advantage in that all but one of the n bits in the register will stay the same; hence in terms of power consumption, storing elements of the Gray code (versus some other sequence for example) is an advantage.

d See S262.

e As an aside, a potentially neat approach here is to use a Johnson counter. This is basically an n-bit register (initialised to zero) whose content is shifted by one place on each clock edge. The new incoming, 0-th bit is computed as the NOT of the outgoing, (n − 1)-th bit and every other bit is shifted up by one place (e.g., each i-th bit for 0 ≤ i < n − 1 becomes the (i + 1)-th bit). For n = 3, this produces the sequence

    〈0, 0, 0〉, 〈1, 0, 0〉, 〈1, 1, 0〉, 〈1, 1, 1〉, 〈0, 1, 1〉, 〈0, 0, 1〉, ...

which satisfies the Hamming distance property, but does not include all possible values: for example, 〈1, 0, 1〉 is not included. So this does not really answer the question, in the sense that we require a component that cycles through the full 2^n-element sequence, an example of which is

    〈0, 0, 0〉, 〈1, 0, 0〉, 〈1, 1, 0〉, 〈0, 1, 0〉, 〈0, 1, 1〉, 〈1, 1, 1〉, 〈1, 0, 1〉, 〈0, 0, 1〉
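The behaviour of the Johnson counter is easy to reproduce in a short Python sketch; the function name `johnson_sequence` is illustrative, and states are written as tuples with the 0-th bit first.

```python
from typing import List, Tuple


def johnson_sequence(n: int) -> List[Tuple[int, ...]]:
    """Return the states an n-bit Johnson counter visits before it repeats."""
    state = (0,) * n
    seen = []
    while state not in seen:
        seen.append(state)
        # new 0-th bit is NOT of the outgoing (n-1)-th bit; the rest shift up
        state = (1 - state[-1],) + state[:-1]
    return seen


if __name__ == "__main__":
    for s in johnson_sequence(3):
        print(s)
```

For n = 3 the counter visits 2n = 6 states rather than all 2^n = 8, which is why an FSM-based design is used instead.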

As a result, we can use an FSM-based approach based on the framework in the question above. For n = 3 there are 2^3 = 8 elements in the Gray code, and so a 3-bit state Q = 〈Q0, Q1, Q2〉 is enough to store the current element. The output function ω is basically free: we simply provide the current state Q as output, which is also the current element in the Gray code sequence. Based on the inputs Q and rst, the state




transition function δ can be described as follows:

    rst  Q2  Q1  Q0   Q′2  Q′1  Q′0
    0    0   0   0    0    0    1
    0    0   0   1    0    1    1
    0    0   1   1    0    1    0
    0    0   1   0    1    1    0
    0    1   1   0    1    1    1
    0    1   1   1    1    0    1
    0    1   0   1    1    0    0
    0    1   0   0    0    0    0
    1    0   0   0    0    0    0
    1    0   0   1    0    0    0
    1    0   1   1    0    0    0
    1    0   1   0    0    0    0
    1    1   1   0    0    0    0
    1    1   1   1    0    0    0
    1    1   0   1    0    0    0
    1    1   0   0    0    0    0

From this truth table we can (more easily than usual perhaps) extract Karnaugh maps for each bit of the next state Q′

[Figure: three Karnaugh maps, one per next-state bit, each over inputs rst, Q2, Q1 and Q0]

and hence Boolean expressions

    Q′0 = ( ¬rst ∧ ¬Q2 ∧ ¬Q1 ) ∨
          ( ¬rst ∧  Q2 ∧  Q1 )

    Q′1 = ( ¬rst ∧ ¬Q2 ∧  Q0 ) ∨
          ( ¬rst ∧  Q1 ∧ ¬Q0 )

    Q′2 = ( ¬rst ∧  Q2 ∧  Q0 ) ∨
          ( ¬rst ∧  Q1 ∧ ¬Q0 )

Placing the associated combinatorial logic and a 3-bit, D-type flip-flop based register to store Q into the generic framework, we end up with a component that cycles through our 3-bit Gray code sequence under control of a clock signal.
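A quick way to check the three expressions is to clock them in software and compare the result with the intended Gray code sequence. The Python sketch below does this; `delta` and `gray_cycle` are illustrative names, and states are reported as 〈Q0, Q1, Q2〉 to match the text.

```python
def delta(rst, q2, q1, q0):
    """Transition function: returns (Q'2, Q'1, Q'0) from the Boolean expressions."""
    n = lambda b: 1 - b  # NOT of a single bit
    q0_next = (n(rst) & n(q2) & n(q1)) | (n(rst) & q2 & q1)
    q1_next = (n(rst) & n(q2) & q0) | (n(rst) & q1 & n(q0))
    q2_next = (n(rst) & q2 & q0) | (n(rst) & q1 & n(q0))
    return q2_next, q1_next, q0_next


def gray_cycle(steps: int = 8):
    """Clock the FSM from the reset state and record <Q0, Q1, Q2> at each step."""
    q2 = q1 = q0 = 0
    out = []
    for _ in range(steps):
        out.append((q0, q1, q2))
        q2, q1, q0 = delta(0, q2, q1, q0)
    return out


if __name__ == "__main__":
    for state in gray_cycle():
        print(state)
```

Asserting rst = 1 forces the next state to 〈0, 0, 0〉 from anywhere, since every product term contains ¬rst.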

S266. a See S262.

b There are a few different ways to interpret some parts of the problem definition, but one reasonable approach is as follows:

[Figure: state diagram with start state S0. S0 moves to S1 when B2 = 1 and H = 0; S1 moves to S2 when B0 = 1 and H = 0; S2 moves to S3 when B1 = 1 and H = 0; from every state, H = 1 returns to S0.]




Essentially, the idea is that by pressing buttons we advance from the starting state S0 toward the final state S3 (as long as the handle is not turned, which means we go back to the start): when in S3 the door is unlocked, otherwise it remains locked. In particular, if the buttons are pressed in the wrong order we get “stuck” half way along the sequence and never reach S3. For example if B1 is pressed while in state S1, the FSM does not (and cannot ever) transition into S2 since the button stays pressed: the only way to “unstick” the FSM is to turn the handle, reset the mechanism and start again.

There are four states in total; since 2^2 = 4 we can represent the current state Q as a 2-bit integer, making the concrete assignment

    S0 ↦ 〈0, 0〉
    S1 ↦ 〈1, 0〉
    S2 ↦ 〈0, 1〉
    S3 ↦ 〈1, 1〉

The FSM diagram can be expressed as a truth table, particular to this P, which captures the various transitions:

    H  B2  B1  B0  Q1  Q0   Q′1  Q′0
    0  0   ?   ?   0   0    0    0
    0  1   ?   ?   0   0    0    1
    1  ?   ?   ?   0   0    0    0
    0  ?   ?   0   0   1    0    1
    0  ?   ?   1   0   1    1    0
    1  ?   ?   ?   0   1    0    0
    0  ?   0   ?   1   0    1    0
    0  ?   1   ?   1   0    1    1
    1  ?   ?   ?   1   0    0    0
    0  ?   ?   ?   1   1    1    1
    1  ?   ?   ?   1   1    0    0

Implementing this truth table via a 6-input Karnaugh map is a little more tricky than with fewer inputs; instead, we simply derive the expressions by inspection (i.e., by forming a term for each 1 entry in a given output) to yield

    Q′0 = ( ¬H ∧  B2 ∧ ¬Q1 ∧ ¬Q0 ) ∨
          ( ¬H ∧ ¬B0 ∧ ¬Q1 ∧  Q0 ) ∨
          ( ¬H ∧  B1 ∧  Q1 ∧ ¬Q0 ) ∨
          ( ¬H ∧  Q1 ∧  Q0 )

    Q′1 = ( ¬H ∧  B0 ∧ ¬Q1 ∧  Q0 ) ∨
          ( ¬H ∧ ¬B1 ∧  Q1 ∧ ¬Q0 ) ∨
          ( ¬H ∧  B1 ∧  Q1 ∧ ¬Q0 ) ∨
          ( ¬H ∧  Q1 ∧  Q0 )

with minor optimisation possible thereafter. Returning to the framework, the idea is then that we

i instantiate the middle box with a 2-bit register, using D-type flip-flops for example, to store Q,

ii instantiate the top box to implement δ using the equations above,

iii instantiate the bottom box to implement ω using the equation

L = ¬(Q1 ∧Q0)

so the door is locked unless the FSM is in state S3.
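To see the three instantiated boxes working together, the following Python sketch clocks the transition and output expressions over a sequence of input samples. The names `delta`, `locked` and `run`, and the idea of presenting one (H, B2, B1, B0) sample per clock edge, are illustrative assumptions.

```python
def delta(h, b2, b1, b0, q1, q0):
    """Transition function: returns (Q'1, Q'0) from the expressions by inspection."""
    n = lambda b: 1 - b  # NOT of a single bit
    q0_next = ((n(h) & b2 & n(q1) & n(q0)) |
               (n(h) & n(b0) & n(q1) & q0) |
               (n(h) & b1 & q1 & n(q0)) |
               (n(h) & q1 & q0))
    q1_next = ((n(h) & b0 & n(q1) & q0) |
               (n(h) & n(b1) & q1 & n(q0)) |
               (n(h) & b1 & q1 & n(q0)) |
               (n(h) & q1 & q0))
    return q1_next, q0_next


def locked(q1, q0):
    """Output function: L = NOT (Q1 AND Q0)."""
    return 1 - (q1 & q0)


def run(samples):
    """Clock the FSM from S0 over (H, B2, B1, B0) samples; return (Q1, Q0, L)."""
    q1 = q0 = 0
    for h, b2, b1, b0 in samples:
        q1, q0 = delta(h, b2, b1, b0, q1, q0)
    return q1, q0, locked(q1, q0)


if __name__ == "__main__":
    # Buttons stay pressed once pushed: B2 first, then B0, then B1.
    print(run([(0, 1, 0, 0), (0, 1, 0, 1), (0, 1, 1, 1)]))
```

Appending a sample with H = 1 returns the FSM to S0 and re-locks the door, which is the reset path shown in the diagram.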

c The purpose of a clock signal is to control the FSM, advancing it through steps (i.e., transitions) with all components synchronised. However, the only updates of state occur on positive transitions of Bi or H. That is, the FSM only changes state when one of the buttons is pressed, or the handle turned: in each case, this means the associated value transitions from 0 to 1. As a result, one could argue the expression

    H ∨ B0 ∨ B1 ∨ B2

can be used to advance the FSM (i.e., latch the next state produced by the transition function), rather than “polling” the buttons and handle at each clock edge to see if their value has changed.

d Among various valid answers, the following are clear:




i The content stored in an SRAM memory is lost if the power supply is removed: such devices depend on a power supply so transistors used to maintain the stored content can operate. In the context of the proposed approach, this means if a power cut occurs, for example, then the password will be “forgotten” by the lock.

ii When the power supply comes back online the password might be essentially random due to the way SRAMs work. If this is not true however, and the SRAM is initialised into a predictable value (e.g., all zero), this could offer an attractive way to bypass the security offered!

iii Given physical access to the lock, one might simply read the password out of the SRAM. With an FSM hard-wired to a single password, the analogue is arguably harder: one would need to (invasively) reverse engineer the gate layout and connectivity, then the FSM design.

Less attractive answers include degradation of performance (e.g., as a result of SRAM access latency) or increase in cost: given constraints of the application, neither seems particularly important. For example the access latency of SRAM memory is measured in small fractions of a second; although such latency arguably matters in general, from the perspective of a human user of the door lock the delay will be imperceptible.

e This is quite open-ended, but one reasonable approach would be as follows:

i This is a slightly loaded question in that it implies some alteration is needed; as such, marks might typically be given for identifying the underlying reason, and explaining each aspect of the proposed alteration.

The crucial point to realise is that testing implementations of δ and ω, for example, depends on being able to set (and possibly inspect) the state Q which acts as input to both. An example technology to allow this would be JTAG, which requires an additional interface (inc. TDI, TDO, TCK, TMS and TRST pins) and also injection of a scan chain to access all flip-flops. This allows the test process to scan a value into Q one bit at a time, run the system normally, then scan out Q to test it.

ii The idea would be to place each system under the control of a test stimulus that automates a series of tests: the test stimulus has access to all inputs (i.e., the JTAG interface, each button and the handle) and outputs (e.g., the JTAG interface, and the lock mechanism), and is tasked with making sure the overall behaviour matches some reference.

In this context, the number of states, inputs and outputs is small enough that a brute force approach is reasonable; this is also motivated by the fact there are no obvious boundary cases and so on. The strategy would therefore be: for each entry in the truth table

• put the device in test mode,

• scan-in the state Q and drive each Bi with the associated values,

• put the device in normal mode, and force an update of the FSM using the clock signal,

• put the device in test mode,

• check the value of L matches that expected,

• scan-out and check the value of Q matches that expected.

An alternative answer might focus on some form of BIST, but in essence this just places all the above inside the system rather than viewing it as something done externally.

S267. a At least three advantages (or disadvantages, depending on which way around you view the options) are evident:

• With option one, extracting each digit of the current PIN to form a guess is trivial; with option two this is much harder, in that we need to take the integer P and decompose it into a decimal representation (through repeated division and modular reduction).

• With option one, incrementing the current PIN is harder (since the addition is in decimal); with option two this is much easier, in that we can simply use a standard integer adder.

• With option one, the total storage requirement is 4 · 4 = 16 bits; with option two this is only 14 bits, since 2^14 = 16384 > 9999.

Based on this, and reading ahead to the next question, the decimal representation seems more attractive: designing a decimal adder is significantly easier than a binary divider.

b Given the choice, and although both options are viable, we focus on a design for the second, decimal representation: this is simpler by some way, and so is the expected answer. At a high-level, the component can be described as follows:




             P_3             P_2             P_1             P_0
              |               |               |               |
       G_3 <--+        G_2 <--+        G_1 <--+        G_0 <--+
              |               |               |               |
              v               v               v               v
        +-----------+   +-----------+   +-----------+   +-----------+
        |     x     |   |     x     |   |     x     |   |     x     |
        |           |   |           |   |           |   |           |
        |co  add  ci|<--|co  add  ci|<--|co  add  ci|<--|co  add  ci|<-- 1
        |           |   |           |   |           |   |           |
        |     r     |   |     r     |   |     r     |   |     r     |
        +-----------+   +-----------+   +-----------+   +-----------+
              |               |               |               |
              v               v               v               v
             P'_3            P'_2            P'_1            P'_0

Gi = Pi, so production of the guess is trivial; the other output is a little harder. The basic idea is to use something similar to a ripple-carry adder. Each i-th cell takes a decimal digit Pi and a carry-in from the previous, (i − 1)-th cell; it produces a decimal digit P′i and a carry-out into the next, (i + 1)-th cell. The difference from a binary ripple-carry adder then is that it only accepts one digit rather than two as input (since it increments P rather than computes a general-purpose addition), plus it obviously works with decimal rather than binary digits.

There are various ways to approach the design of each decimal adder cell, but perhaps the most straightforward uses two stages:

               x_3              x_2              x_1              x_0
                |                |                |                |
                v                v                v                v
          +------------+   +------------+   +------------+   +------------+
          |     x      |   |     x      |   |     x      |   |     x      |
          |            |   |            |   |            |   |            |
          | co  ha  ci |<--| co  ha  ci |<--| co  ha  ci |<--| co  ha  ci |<-- 0
          |            |   |            |   |            |   |            |
          |     r      |   |     r      |   |     r      |   |     r      |
          +------------+   +------------+   +------------+   +------------+
                |                |                |                |
               r'_3             r'_2             r'_1             r'_0
                |                |                |                |
                v                v                v                v
          +---------------------------------------------------------------+
          |                                                               |
    co <--|                       modular reduction                       |
          |                                                               |
          +---------------------------------------------------------------+
                |                |                |                |
               r_3              r_2              r_1              r_0
                |                |                |                |
                v                v                v                v

The first stage computes an integer sum r′ = x + ci. Although this could be realised using a standard ripple-carry adder, we can make a more problem-specific improvement: a ripple-carry adder normally uses full-adder cells that compute x + y + ci, but we lack the second input y. Thus we can use half-adder cells instead, which use half the number of gates; we assume such a half-adder is available as a standard component. The second stage takes r′ = x + ci as input, and produces the outputs r and co, implementing the modular reduction. The range of each input means 0 ≤ r′ < 11, or equivalently that cases where




r′ > 10 are impossible. We can describe the behaviour of the stage using the following truth table:

r′_3  r′_2  r′_1  r′_0  |  co  r_3  r_2  r_1  r_0
------------------------+------------------------
 0     0     0     0    |  0    0    0    0    0
 0     0     0     1    |  0    0    0    0    1
 0     0     1     0    |  0    0    0    1    0
 0     0     1     1    |  0    0    0    1    1
 0     1     0     0    |  0    0    1    0    0
 0     1     0     1    |  0    0    1    0    1
 0     1     1     0    |  0    0    1    1    0
 0     1     1     1    |  0    0    1    1    1
 1     0     0     0    |  0    1    0    0    0
 1     0     0     1    |  0    1    0    0    1
 1     0     1     0    |  1    0    0    0    0
 1     0     1     1    |  ?    ?    ?    ?    ?
 1     1     0     0    |  ?    ?    ?    ?    ?
 1     1     0     1    |  ?    ?    ?    ?    ?
 1     1     1     0    |  ?    ?    ?    ?    ?
 1     1     1     1    |  ?    ?    ?    ?    ?

As such, we can produce a set of Karnaugh maps

[Figure: four Karnaugh maps, one for each of r_0, r_1, r_2 and r_3, each over the inputs r′_3, r′_2, r′_1 and r′_0.]

which translate fairly easily into Boolean expressions

r_0 = r′_0
r_1 = ¬r′_3 ∧ r′_1
r_2 = ¬r′_3 ∧ r′_2
r_3 = r′_3 ∧ ¬r′_2 ∧ ¬r′_1

that allow implementation.
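As a rough check of the two-stage cell, the following C sketch models one decimal adder cell in software. The names dec_cell_t and dec_cell are hypothetical, and the expression used for co (namely r′_3 ∧ r′_1) is an assumption read off the co column of the truth table, since the text only gives expressions for the r outputs.

```c
#include <stdint.h>

/* Output of one decimal adder cell: a digit in 0..9 plus a carry-out. */
typedef struct {
  uint8_t r;   /* output digit, 4 bits */
  uint8_t co;  /* carry-out, 1 bit     */
} dec_cell_t;

/* Hypothetical model of one cell: x is a decimal digit 0..9, ci is 0 or 1. */
dec_cell_t dec_cell( uint8_t x, uint8_t ci ) {
  /* Stage 1: integer sum r' = x + ci, so 0 <= r' <= 10. */
  uint8_t rp  = ( uint8_t )( x + ci );

  uint8_t rp0 = ( rp >> 0 ) & 1, rp1 = ( rp >> 1 ) & 1;
  uint8_t rp2 = ( rp >> 2 ) & 1, rp3 = ( rp >> 3 ) & 1;

  /* Stage 2: modular reduction, using the expressions from the text. */
  uint8_t r0 = rp0;
  uint8_t r1 = ( rp3 ^ 1 ) & rp1;
  uint8_t r2 = ( rp3 ^ 1 ) & rp2;
  uint8_t r3 = rp3 & ( rp2 ^ 1 ) & ( rp1 ^ 1 );

  /* Assumption: co = r'_3 AND r'_1, derived from the truth table. */
  uint8_t co = rp3 & rp1;

  dec_cell_t out;
  out.r  = ( uint8_t )( ( r3 << 3 ) | ( r2 << 2 ) | ( r1 << 1 ) | r0 );
  out.co = co;
  return out;
}
```

For example, dec_cell( 9, 1 ) should give digit 0 with a carry-out of 1, whereas dec_cell( 4, 1 ) should give digit 5 with no carry-out.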

c The FSM maintains a current state Q. Given there are five states, we can represent the current state as

Q = 〈Q_0, Q_1, Q_2〉

i.e., three bits (since 2^3 = 8 > 5), so the device could store it in a register comprised of three D-type flip-flops; doing so accepts there are three unused state representations.

We can represent the states as follows

S_0 = 〈0, 0, 0〉
S_1 = 〈1, 0, 0〉
S_2 = 〈0, 1, 0〉
S_3 = 〈1, 1, 0〉
S_4 = 〈0, 0, 1〉




and therefore formulate a tabular transition function δ:

b  r  Q_2  Q_1  Q_0  |  Q′_2  Q′_1  Q′_0
---------------------+------------------
0  ?   0    0    0   |   0     0     0
1  ?   0    0    0   |   0     0     1
?  ?   0    0    1   |   0     1     0
?  ?   0    1    0   |   0     1     1
?  0   0    1    1   |   0     0     1
?  1   0    1    1   |   1     0     0
?  ?   1    0    0   |   1     0     0

Turning these into Karnaugh maps and then Boolean expressions is a little tricky due to the need for five inputs. To cope, we assume there are no transitions from S_0 and ignore b, then patch the equation for Q′_0 (the only bit of the next state influenced by moving out of S_0) appropriately. That is, we get the following

[Figure: three Karnaugh maps for the next-state bits, each over the inputs r, Q_2, Q_1 and Q_0.]

which then translate into

Q′_0 = ( b ∧ ¬Q_2 ∧ ¬Q_1 ∧ ¬Q_0 ) ∨ ( ¬r ∧ Q_1 ) ∨ ( Q_1 ∧ ¬Q_0 )
Q′_1 = ( ¬Q_1 ∧ Q_0 ) ∨ ( Q_1 ∧ ¬Q_0 )
Q′_2 = ( r ∧ Q_1 ∧ Q_0 ) ∨ ( Q_2 )
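One way to gain confidence in these expressions is to evaluate them in software and compare against the tabular transition function. The following C sketch does so; the function name fsm_next and the packing of the state into one byte (bit 2 holds Q_2, bit 1 holds Q_1, bit 0 holds Q_0) are assumptions made for illustration only.

```c
#include <stdint.h>

/* State encodings from the text, packed as the bits Q_2 Q_1 Q_0. */
enum { S0 = 0, S1 = 1, S2 = 2, S3 = 3, S4 = 4 };

/* Hypothetical model of the transition function: q is the current state,
 * and the inputs b and r must each be 0 or 1. */
uint8_t fsm_next( uint8_t q, uint8_t b, uint8_t r ) {
  uint8_t q0 = ( q >> 0 ) & 1, q1 = ( q >> 1 ) & 1, q2 = ( q >> 2 ) & 1;

  /* Next-state bits, transcribed directly from the Boolean expressions. */
  uint8_t n0 = ( b & ( q2 ^ 1 ) & ( q1 ^ 1 ) & ( q0 ^ 1 ) )
             | ( ( r ^ 1 ) & q1 )
             | ( q1 & ( q0 ^ 1 ) );
  uint8_t n1 = ( ( q1 ^ 1 ) & q0 ) | ( q1 & ( q0 ^ 1 ) );
  uint8_t n2 = ( r & q1 & q0 ) | q2;

  return ( uint8_t )( ( n2 << 2 ) | ( n1 << 1 ) | n0 );
}
```

Stepping through the table row by row, e.g., fsm_next( S3, 0, 1 ) versus fsm_next( S3, 0, 0 ), should reproduce the transitions into S_4 and S_1 respectively.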

B.4 Chapter 4

S268. O(n) implies the critical path is proportional to the number of bits (including some constant factor) required to represent each of the operands. The reason is the carry chain which runs through all n full-adders in the design: each i-th full-adder produces a carry-out used as a carry-in to the (i + 1)-th full-adder. This means each i-th bit of the result depends on, and cannot be computed before, all j-th bits for 0 ≤ j < i.

An alternative, carry look-ahead design separates computation of carries from the full-adder cells themselves; this allows an organisation whose critical path can be described as O(log n), although the number of logic gates required is less attractive.

S269. The relationship

x is a power-of-two ≡ (x ∧ (x − 1)) = 0

performs the test which can be written as the C expression

( x & ( x - 1 ) ) == 0

This works because if x is an exact power-of-two, with only the n-th bit set to one, then x − 1 has all bits less-significant than the n-th set to one (and the n-th bit itself set to zero); when this is AND'ed with x the result is zero. If x is not an exact power-of-two then more than one bit in x is set to one; in this case x − 1 only alters the least-significant bit set to one and the bits below it, hence the more-significant bits set to one are left over and, when AND'ed with x, give a non-zero result. Note that the test gives the wrong answer for x = 0 (the C expression evaluates to true, i.e., it is non-zero, even though 0 is not a power-of-two), but this is allowed since the question says x ≠ 0.
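A minimal sketch of the test as a C function follows. The name is_pow2 and the choice of a 32-bit unsigned argument are assumptions; the explicit x != 0 guard goes beyond what the question requires, and is included only to show how the x = 0 case could be excluded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if and only if x is an exact power-of-two.
 * The x != 0 guard handles the one case the bare expression gets wrong. */
bool is_pow2( uint32_t x ) {
  return ( x != 0 ) && ( ( x & ( x - 1 ) ) == 0 );
}
```

For instance, x = 12 = 1100(2) gives x − 1 = 1011(2) and so x ∧ (x − 1) = 1000(2), which is non-zero; x = 8 = 1000(2) gives x − 1 = 0111(2) and so x ∧ (x − 1) = 0.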

S270. x is of type char, so is therefore represented using two's-complement in 8 bits; values for such a representation range between 2^(n−1) − 1 = 2^(8−1) − 1 = 127 and −2^(n−1) = −2^(8−1) = −128 inclusive. This means that by

a decrementing x we get the value before 127, which is 126, or




b incrementing x we get the value after 127, which is −128: the reason for this is that the representation of 127 is 01111111(2), but the next value 10000000(2) represents the most negative value possible. That is, there has been an overflow with the result "wrapping around".

S271. The expression computes the comparison 0 < x. This is because if x < 0 then x_3 = 1, and if x = 0 then x_3 = x_2 = x_1 = x_0 = 0. Therefore, x > 0 if both x_3 = 0 and at least one of x_i ≠ 0 for i ∈ {2, 1, 0}. Strictly speaking, it tests whether 0 < x ≤ 7 but the upper bound is implied by the representation of x: it cannot take a value greater than 7 by definition.

S272.

The initial temptation is to use six adder components to compute

r = 7 · x = x + x + x + x + x + x + x

where the size of inputs and outputs increases as one progresses through the computation; a considered approach might utilise carry save adders to reduce the critical path associated with the multiple summands, but here we consider ripple-carry designs only.

A more efficient alternative would use two adders to sum the three terms in

r = 7 · x = 4 · x + 2 · x + 1 · x = 2^2 · x + 2^1 · x + 2^0 · x

noting that the multiplications by powers-of-two are "free" since they can be achieved by simply relabelling bits rather than computation. This approach can be further refined to compute

r = 7 · x = 8 · x − 1 · x = 2^3 · x − 2^0 · x

using just one adder (assuming addition and subtraction can be realised using the same component). Clearly this will produce the shortest critical path, and relates to the following diagram:

              +------------------+                +---+
x -- n-bit -->| 3-bit left-shift |-- (n+3)-bit -->|   |
              +------------------+                |sub|-- (n+4)-bit --> r
x -- n-bit -------------------------------------->|   |
                                                  +---+
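The same shift-and-subtract idea can be sketched in C as follows. The function name mul7 and the choice of a 16-bit input with a 32-bit result (so the intermediate 8 · x cannot overflow) are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: compute 7 * x as ( 8 * x ) - x, where the
 * multiplication by 8 is a 3-bit left-shift and so needs no adder. */
uint32_t mul7( uint16_t x ) {
  return ( ( uint32_t )( x ) << 3 ) - x;
}
```

For example, x = 5 gives ( 5 << 3 ) − 5 = 40 − 5 = 35.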

S273. a Clearly we can implement ≠ by negating the result of =, and likewise for < and ≥, and > and ≤. Furthermore, we can build ≥ from > and =, and ≤ from < and =. So essentially we only need two comparisons, say = and <, to be able to compute the rest so long as we have the logic operations as well. The choice of which three is simply a matter of which ones you want to go faster: the ones built from a combination of other comparison and logic instructions will take longer to execute. One might take the approach of looking at C programs and selecting the set most used. For example = and < are used a lot to program typical loops; one might select them for this reason.

b You can be as fancy as you want with any optimisations or special cases, for example checking for multiplication by zero, one or a power-of-two might be a good idea. But basically, the easiest way to do this is as follows:

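#include <stdint.h>

uint16_t mul( uint16_t x, uint16_t y ) {
  // special cases: x is zero, one or a small power-of-two
  switch( x ) {
    case  0 : return 0;
    case  1 : return y;
    case  2 : return y << 1;
    case  4 : return y << 2;
    case  8 : return y << 3;
    case 16 : return y << 4;
  }
  // special cases: y is zero, one or a small power-of-two
  switch( y ) {
    case  0 : return 0;
    case  1 : return x;
    case  2 : return x << 1;
    case  4 : return x << 2;
    case  8 : return x << 3;
    case 16 : return x << 4;
  }

  uint16_t t = 0;

  // general case: process y from most- to least-significant bit,
  // doubling t each step then adding x if the i-th bit of y is one
  for( int i = 15; i >= 0; i-- ) {
    t = t << 1;

    if( ( y >> i ) & 1 ) {
      t = t + x;
    }
  }

  return t;
}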

c A basic implementation might look like the following:

int H( uint16_t x ) {
  int t = 0;

  for( int i = 0; i < 16; i++ ) {
    if( ( x >> i ) & 1 ) {
      t = t + 1;
    }
  }

  return t;
}

but this has a number of drawbacks. First, the overhead of operating the loop is quite high in comparison to the content; for example the loop body needs only a few instructions, while it takes nearly as many again to test and increment i during each iteration. Second, the number of branches in the code means that pipelined processors might not execute them efficiently at all. An improvement is to use some form of divide-and-conquer approach where we split the problem into 2-bit then 4-bit chunks and so on. The result might look like:

int H( uint16_t x ) {
  x = ( x & 0x5555 ) + ( ( x >> 1 ) & 0x5555 );
  x = ( x & 0x3333 ) + ( ( x >> 2 ) & 0x3333 );
  x = ( x & 0x0F0F ) + ( ( x >> 4 ) & 0x0F0F );
  x = ( x & 0x00FF ) + ( ( x >> 8 ) & 0x00FF );

  return ( int )( x );
}
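To see that the two versions agree, one option is an exhaustive comparison over all 16-bit inputs. The sketch below restates both functions under the hypothetical names H_loop and H_tree (so they can coexist in one file), and adds a hypothetical checker H_agree; none of these names come from the original solution.

```c
#include <stdint.h>

/* Loop-based version: test each of the 16 bits in turn. */
int H_loop( uint16_t x ) {
  int t = 0;
  for( int i = 0; i < 16; i++ ) {
    t += ( x >> i ) & 1;
  }
  return t;
}

/* Divide-and-conquer version: sum adjacent 1-bit fields into 2-bit
 * fields, then 2-bit into 4-bit, 4-bit into 8-bit, 8-bit into 16-bit. */
int H_tree( uint16_t x ) {
  x = ( x & 0x5555 ) + ( ( x >> 1 ) & 0x5555 );
  x = ( x & 0x3333 ) + ( ( x >> 2 ) & 0x3333 );
  x = ( x & 0x0F0F ) + ( ( x >> 4 ) & 0x0F0F );
  x = ( x & 0x00FF ) + ( ( x >> 8 ) & 0x00FF );
  return ( int )( x );
}

/* Returns 1 if both versions agree on every 16-bit input, else 0. */
int H_agree( void ) {
  for( uint32_t x = 0; x <= 0xFFFF; x++ ) {
    if( H_loop( ( uint16_t )( x ) ) != H_tree( ( uint16_t )( x ) ) ) {
      return 0;
    }
  }
  return 1;
}
```

As a worked case, x = 0xFFFF passes through the intermediate values 0xAAAA, 0x4444, 0x0808 and finally 0x0010, i.e., a Hamming weight of 16.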

S274. First, note that the result via a naive method would be

r_2 = x_1 · y_1
r_1 = x_1 · y_0 + x_0 · y_1
r_0 = x_0 · y_0.

However, we can write down three intermediate values using only three multiplications as

t_2 = x_1 · y_1
t_1 = (x_0 + x_1) · (y_0 + y_1)
t_0 = x_0 · y_0.

The original result can then be expressed in terms of these intermediate values via

r_2 = t_2 = x_1 · y_1
r_1 = t_1 − t_0 − t_2 = x_0 · y_0 + x_0 · y_1 + x_1 · y_0 + x_1 · y_1 − x_0 · y_0 − x_1 · y_1
                      = x_1 · y_0 + x_0 · y_1
r_0 = t_0 = x_0 · y_0.

So roughly speaking, overall we use three (n/2)-bit multiplications and four (n/2)-bit additions.
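As an illustration, the following C sketch applies the three-multiplication approach with n = 16, so that each operand is split into two 8-bit halves. The function name mul_karatsuba is hypothetical, and the final recombination r = r_2 · 2^16 + r_1 · 2^8 + r_0 is an assumption based on writing x = x_1 · 2^8 + x_0 and y = y_1 · 2^8 + y_0; note the operands of t_1 are actually 9 bits, which is why the count above is only rough.

```c
#include <stdint.h>

/* Hypothetical helper: 16-bit by 16-bit multiplication using three
 * half-size multiplications rather than four. */
uint32_t mul_karatsuba( uint16_t x, uint16_t y ) {
  /* Split each operand into low and high 8-bit halves. */
  uint32_t x0 = x & 0xFF, x1 = x >> 8;
  uint32_t y0 = y & 0xFF, y1 = y >> 8;

  /* The three intermediate products. */
  uint32_t t2 = x1 * y1;
  uint32_t t1 = ( x0 + x1 ) * ( y0 + y1 );
  uint32_t t0 = x0 * y0;

  /* Recover the three partial results, then recombine by shifting. */
  uint32_t r2 = t2;
  uint32_t r1 = t1 - t0 - t2;
  uint32_t r0 = t0;

  return ( r2 << 16 ) + ( r1 << 8 ) + r0;
}
```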

S275. a In binary, the addition we are looking at is

10(10) =  1010(2)
12(10) =  1100(2) +
         --------
         10110(2)

where 10110(2) = 22(10). In 4 bits, however, this value is 0110(2) = 6(10), which is wrong.

b A design for the 4-bit clamped adder looks like this:




      +------+           +------+           +------+           +------+
+-----|co  ci|<----------|co  ci|<----------|co  ci|<----------|co  ci|<- 0
|     |     x|<- x_3     |     x|<- x_2     |     x|<- x_1     |     x|<- x_0
|  +--|s    y|<- y_3  +--|s    y|<- y_2  +--|s    y|<- y_1  +--|s    y|<- y_0
|  |  +------+        |  +------+        |  +------+        |  +------+
|  |                  |                  |                  |
|  |  +------+        |  +------+        |  +------+        |  +------+
|  +->|  OR  |-> r_3  +->|  OR  |-> r_2  +->|  OR  |-> r_1  +->|  OR  |-> r_0
|     +------+           +------+           +------+           +------+
|         |                  |                  |                  |
+---------+------------------+------------------+------------------+

Essentially the idea is that if a carry-out occurs from the most-significant adder, this turns all the output bits to 1 via the additional OR gates. That is, if the carry-out occurs then we get 1111(2) = 15(10) as the result, i.e., the largest 4-bit result possible.
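The behaviour of the clamped adder can be modelled in C along the following lines. The function name add_clamped4 is hypothetical; the inputs are assumed to be 4-bit values held in the low bits of a byte.

```c
#include <stdint.h>

/* Hypothetical model of the 4-bit clamped adder: add, then OR the
 * carry-out into every bit of the 4-bit sum. */
uint8_t add_clamped4( uint8_t x, uint8_t y ) {
  uint8_t t    = ( uint8_t )( ( x & 0xF ) + ( y & 0xF ) ); /* 5-bit sum    */
  uint8_t co   = ( t >> 4 ) & 1;                           /* carry-out    */
  uint8_t s    = t & 0xF;                                  /* 4-bit sum    */
  uint8_t mask = co ? 0xF : 0x0;                           /* co, 4 copies */

  return s | mask;
}
```

With the inputs from the first part, 10 + 12 now yields 15 rather than 6, while a sum that fits in 4 bits (e.g., 3 + 4 = 7) is unaffected.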

S276. Since we know nothing about N, there is no obvious short-cut to performing the modular reduction after the multiplication. Instead, the most simple way to approach the design is to recall that

x · y = x + x + · · · + x + x   (y copies of x).

So to compute x · y (mod N), we just have to make sure that each of the additions is modulo N; then we can use whatever method we want. A circuit for modular addition is actually quite simple:

    +-----+                  +-----+
x ->|     |---+------------->|     |
    | add |   |              | sub |---> r
y ->|     |   |          +-->|     |
    +-----+   |          |   +-----+
              v          |
           +-----+    +-----+
           |     |    |     |
       N ->| lth |--->| mux |
           |     |    |     |
           +-----+    +-----+
                       ^   ^
                       |   |
                       N   0

In short, we add x and y together, and then compare the result t with N: if t is smaller, we select 0 as the output from the multiplexer, otherwise we select N. Then, we subtract the value we selected from t. The end result is that we get x + y − 0 = x + y (mod N) if x + y < N, and x + y − N = x + y (mod N) if x + y ≥ N.

Recall that an 8-bit, bit-serial multiplier would compute the product x · y as follows:

Input: An 8-bit multiplicand x, and 8-bit multiplier y
Output: The product x · y

t ← 0
for i = 7 downto 0 do
    t ← t + t
    if y_i = 1 then
        t ← t + x
    end
end
return t

Armed with our modular adder circuit, we can rewrite this as

Input: An 8-bit modulus N, a multiplicand 0 ≤ x < N and multiplier 0 ≤ y < N
Output: The product x · y (mod N)

t ← 0
for i = 7 downto 0 do
    t ← t + t (mod N)
    if y_i = 1 then
        t ← t + x (mod N)
    end
end
return t

which then simply demands eight iterations, under control of a clock, over the circuit




    +---------+       +---------+        +---------+
N ->|         |   N ->|         |    N ->|         |
t ->| mod add |   x ->| mod add |------->|   mux   |---> t'
t ->|         |---+-->|         |   +--->|         |
    +---------+   |   +---------+   |    +---------+
                  |                 |         ^
                  +-----------------+         |
                                             y_i

Notice that we first perform the operation t + t (mod N), then use a multiplexer to decide if we take t + t (mod N) or t + t + x (mod N) as the next value of t. So each iterated use of the circuit represents an iteration of the algorithm loop. Of course, one could construct a combinatorial multiplier using the same approach, i.e., replacing any standard adder circuits with modular alternatives.
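A software model of the whole design might look like the C sketch below: mod_add mirrors the modular adder circuit (add, compare, select, subtract) and mod_mul mirrors the eight iterations of the loop, including the multiplexer controlled by y_i. Both function names are hypothetical, and the sketch assumes N > 0 with 0 ≤ x, y < N as the algorithm requires.

```c
#include <stdint.h>

/* Hypothetical model of the modular adder: assumes 0 <= x, y < N,
 * so the intermediate sum t = x + y satisfies t < 2N. */
uint8_t mod_add( uint8_t x, uint8_t y, uint8_t N ) {
  uint16_t t = ( uint16_t )( x + y );   /* add                          */
  uint16_t m = ( t < N ) ? 0 : N;       /* lth and mux: select 0 or N   */
  return ( uint8_t )( t - m );          /* sub                          */
}

/* Hypothetical model of the 8-bit, bit-serial modular multiplier. */
uint8_t mod_mul( uint8_t x, uint8_t y, uint8_t N ) {
  uint8_t t = 0;

  for( int i = 7; i >= 0; i-- ) {
    uint8_t d = mod_add( t, t, N );     /* t + t     (mod N)            */
    uint8_t s = mod_add( d, x, N );     /* t + t + x (mod N)            */
    t = ( ( y >> i ) & 1 ) ? s : d;     /* mux controlled by y_i        */
  }

  return t;
}
```

As in the circuit, both candidate values are computed in every iteration and the i-th bit of y simply selects between them.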
