Ordered Mesh Network Interconnect (OMNI) Design &...

Post on 07-Jul-2020

Ordered Mesh Network Interconnect (OMNI)

Design & Implementation of In-Network Coherence

Suvinay Subramanian, Advisor: Li-Shiuan Peh

Collaborators: Chia-Hsin Owen Chen, Bhavya Daya, Tushar Krishna, Woo-Cheol Kwon, Sunghyun Park

[Figure: Number of cores versus year (1980 to 2015; y-axis ticks 1 to 128) for processors including 286, 386, 486, Pentium, P2, Celeron, P3, P4, Itanium, Core, Nehalem, Westmere, Sandybridge, Larrabee, TeraFlops, SCC, Opteron, Barcelona, Bulldozer, PowerPC, Power3 to Power7, Cell, Niagara, TILE64, TILE-GX and Octeon III. Legend: Uniprocessor, Bus, Ring, Mesh.]

Compute + Communicate

• Moore’s Law: “The number of transistors incorporated in a chip will approximately double every 24 months”, giving increased compute density

• Shift to multi-core processors: better performance vis-à-vis power efficiency

• We need a scalable coherence mechanism to maintain a “consistent” view of “shared memory”

• We need a scalable communication fabric

Specification

• 36 Power-ISA cores

• 32 KB L1; 128 KB L2

• 6x6 Mesh network: OMNI

Why Snoopy Coherence?

• Efficient cache-to-cache transfers

• No directory overhead

• No indirection latency

Snoopy COherent Research Processor with Interconnect Ordering

Evaluations

Introduction

Walkthrough Example

Channel Width    Number of Virtual Networks    Number of Virtual Channels
16 bytes         3                             4+2+2 = 8

Area Breakdown

• Core: 50%

• L2 + Cache Controller: 40%

• Network Interface + Router: 10%

Average Power

Critical Path 1.2 ns

The SCORPIO Processor

OMNI Design

Traditional interconnects (e.g., buses) route all messages to a central ordering point, so all nodes see all messages in the same order.

OMNI: The network itself provides the mechanism for global ordering. There is no central ordering point.

Idea: Nodes agree on a global order at synchronized time intervals.
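The idea above can be illustrated with a minimal sketch. It assumes (the poster does not spell this out) that by the end of each time window every node knows which source nodes injected a request in that window, and that every node then applies the same deterministic rule to that set. The function names, the rotating start position, and the hold-back buffer are hypothetical choices made for illustration only.

```python
from typing import List, Set


def global_order(announcing_sources: Set[int], num_nodes: int, window: int) -> List[int]:
    """Order the sources that announced a request in this time window.

    Every node runs the same rule on the same set, so every node derives
    the same order without consulting a central ordering point.
    """
    # Assumption: rotate the starting node each window so no source is always first.
    start = window % num_nodes
    ring = [(start + i) % num_nodes for i in range(num_nodes)]
    return [src for src in ring if src in announcing_sources]


def deliver_in_order(arrivals: List[int], order: List[int]) -> List[int]:
    """Release requests to the cache controller strictly in the agreed order.

    `arrivals` is the order in which requests physically reach this node,
    which may differ from node to node. Early arrivals are held back.
    """
    pending: Set[int] = set()
    released: List[int] = []
    next_idx = 0
    for src in arrivals:
        pending.add(src)
        # Drain every request that is now next in the agreed order.
        while next_idx < len(order) and order[next_idx] in pending:
            released.append(order[next_idx])
            next_idx += 1
    return released


if __name__ == "__main__":
    order = global_order({5, 1, 3}, num_nodes=36, window=0)
    # Two nodes receive the same requests in different physical orders.
    print(deliver_in_order([5, 1, 3], order))
    print(deliver_in_order([3, 5, 1], order))
```

Both calls release the requests in the same sequence, which is the property a snoopy protocol needs from the interconnect.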

[Figure: Router Microarchitecture]

[Figure: Network Interface Block Diagram]

[Figure: Normalized Runtime of OMNI against directory protocol (y-axis: Normalized Runtime, 0 to 1)]

Technology Node: 45 nm commercial

Lookahead bypassing provides an effective single-cycle path per hop when possible
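To give a feel for what a single-cycle hop means on the 6x6 mesh, the sketch below counts hops between two nodes and compares a fully bypassed path with a non-bypassed one. Several things here are assumptions rather than figures from the poster: row-major node numbering, minimal (Manhattan-distance) routing, a hypothetical 3-cycle router pipeline when bypassing is not possible, and the 1.2 ns critical path being used as the clock period.

```python
MESH_DIM = 6      # 6x6 mesh, from the specification
CYCLE_NS = 1.2    # critical path from the poster, used here as the clock period (assumption)


def hops(src: int, dst: int, dim: int = MESH_DIM) -> int:
    """Hop count between two nodes, assuming row-major IDs and minimal routing."""
    sx, sy = src % dim, src // dim
    dx, dy = dst % dim, dst // dim
    return abs(sx - dx) + abs(sy - dy)


def latency_cycles(src: int, dst: int, bypassed_hops: int,
                   baseline_cycles_per_hop: int = 3) -> int:
    """Network latency in cycles when `bypassed_hops` hops take the 1-cycle path.

    `baseline_cycles_per_hop` is a hypothetical pipeline depth for hops
    where the lookahead cannot set up a bypass (for example under contention).
    """
    total = hops(src, dst)
    if not 0 <= bypassed_hops <= total:
        raise ValueError("bypassed_hops must be between 0 and the hop count")
    return bypassed_hops * 1 + (total - bypassed_hops) * baseline_cycles_per_hop


if __name__ == "__main__":
    corner_to_corner = hops(0, 35)
    best = latency_cycles(0, 35, bypassed_hops=corner_to_corner)
    worst = latency_cycles(0, 35, bypassed_hops=0)
    print(corner_to_corner, best, worst, best * CYCLE_NS, worst * CYCLE_NS)
```

The gap between the two cases grows with distance, which is why bypassing matters most for the long corner-to-corner routes of a broadcast.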

Reserved VC (rVC) provides an escape path, preventing deadlock
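One way a reserved VC can act as an escape path is sketched below. The allocation rule is an assumption made for illustration, not a description of the actual router: regular VCs go to any request, while the reserved VC is granted only to the request whose source is the next one expected in the global order. Under that rule the next-in-order request can always find a buffer, even when the regular VCs are full of requests that must wait behind it. The class and method names are hypothetical.

```python
from typing import Optional


class VcAllocator:
    """Toy virtual-channel allocator for one input port (illustrative only)."""

    def __init__(self, num_regular_vcs: int) -> None:
        self.free_regular = num_regular_vcs
        self.reserved_free = True

    def allocate(self, source_id: int, expected_source_id: int) -> Optional[str]:
        """Return which kind of VC the request gets, or None if it must wait."""
        if self.free_regular > 0:
            self.free_regular -= 1
            return "regular"
        # Assumed rule: only the next request in the global order may use the rVC.
        if source_id == expected_source_id and self.reserved_free:
            self.reserved_free = False
            return "reserved"
        return None


if __name__ == "__main__":
    alloc = VcAllocator(num_regular_vcs=2)
    print(alloc.allocate(source_id=7, expected_source_id=3))
    print(alloc.allocate(source_id=9, expected_source_id=3))
    print(alloc.allocate(source_id=8, expected_source_id=3))
    print(alloc.allocate(source_id=3, expected_source_id=3))
```

In the usage above the regular VCs fill with requests from sources 7 and 9, source 8 has to wait, and the request from source 3 (the one every node is waiting for) still gets through on the reserved VC.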

Filter redundant broadcasts to save bandwidth and power

Routers maintain & propagate sharing information