Jennifer’Dworak,Xi’ Shen,MicahThornton,Ted’ …btw.tttc-events.org › material › BTW12 ›...

Jennifer Dworak, Xi Shen, Micah Thornton, Ted Manikas, Mitch Thornton, (SMU), Al Crouch (ASSET), Chad Augisnash (FDLTCC), Kundan Nepal (University of St. Thomas), Iris Bahar (Brown University)

Page 1: Jennifer’Dworak,Xi’ Shen,MicahThornton,Ted’ …btw.tttc-events.org › material › BTW12 › Presentations › 02_BTW... · 2018-05-08 · Jennifer’Dworak,Xi’ Shen,MicahThornton,Ted’

Why 3D?
- Higher Performance:
  - Vertical distance from chip to chip is ~30 microns
  - Often smaller than a route across a chip
  - Much smaller than a connection to another chip on a board
- Smaller Form Factor:
  - Can combine multiple technologies into a single stack with different chips
- More Connections:
  - May have 10,000 through-silicon vias (TSVs) in a stack vs. a much smaller number of pins available on a board.

3D Architectures
- Homogeneous: Multiple copies of the same chip stacked on top of each other.
- Heterogeneous:
  - Single company for several, if not all, die
  - A single design can be partitioned across multiple layers for optimal performance
- Competitive Socket:
  - Each die in the stack serves a distinct purpose that could be fulfilled by die from multiple companies
  - E.g., get a DSP from Texas Instruments or Analog Devices…

Example Competitive Socket 3D Stack

[Figure: stack diagram with layers labeled Microprocessor, Memory, Memory, Interposer, ASIC, and Analog/RF]

What makes 3D Stacks Unreliable?
- The die could be defective before assembly
  - Test escape
  - Environmentally sensitive
- Die-to-die, die-to-wafer, wafer-to-wafer
- TSV could be manufactured defectively
- Grinding the die to expose the TSVs could damage the die or TSVs
- Probes testing the die can damage the TSVs
- Aging, wearout, extra heat, etc. may cause failures in the field

What is the effect of normal test escapes on the stack?

[Figure: defective stacks per million (y-axis, 0 to 9000) vs. number of die in the stack (x-axis, 0 to 10), with one curve each for 100, 300, 500, and 1000 dppm]
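The shape of these curves can be approximated with a simple model. The sketch below assumes that every die escapes test independently at the stated dppm rate, so a stack of n die is defect-free with probability (1 - dppm/10^6)^n. The slides do not state the model behind the plot, and the function name is illustrative.

```python
def defective_stacks_per_million(die_dppm: float, num_die: int) -> float:
    """Defective stacks per million, assuming independent die-level test escapes."""
    p_die_bad = die_dppm / 1_000_000
    p_stack_good = (1.0 - p_die_bad) ** num_die
    return 1_000_000 * (1.0 - p_stack_good)


if __name__ == "__main__":
    # One row per escape rate shown in the figure legend, for stacks of 1 to 9 die.
    for dppm in (100, 300, 500, 1000):
        row = [round(defective_stacks_per_million(dppm, n)) for n in range(1, 10)]
        print(f"{dppm:>5} dppm: {row}")
```

Under this assumption the stack-level defect rate grows almost linearly with the number of die while the per-die rate is small, which matches the general trend the figure describes.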

Costs of some potential die in the stack
- Blackfin 400 MHz 16-bit fixed-point DSP: $19
- Micron 128 MB Flash: $3.43
- Alliance Memory SRAM, 8 MB*4: $68
- Xilinx FPGA: $25-$50
- Intel Atom processor: $90

So…what does this mean?
Unlike on a board, we can't simply de-solder and replace a defective die. We need to throw out the whole stack.
The more die there are in the stack, the more expensive this is likely to be.

We need some way of either identifying a defect early or repairing it.
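To put a rough number on "more expensive", the sketch below combines the unit prices quoted in the talk with an independent-escape model. It is only an illustration: the stack composition is hypothetical, the FPGA is priced at the low end of its quoted range, the SRAM entry is treated as a single $68 line item, and interposer, assembly, and test costs are ignored.

```python
# Unit prices quoted in the talk (USD). The FPGA uses the low end of $25-$50.
DIE_COST_USD = {
    "Blackfin DSP": 19.00,
    "Micron Flash": 3.43,
    "Alliance SRAM": 68.00,
    "Xilinx FPGA": 25.00,
    "Intel Atom": 90.00,
}


def stack_cost(die_names):
    """Total die cost of a stack (ignores interposer, assembly, and test cost)."""
    return sum(DIE_COST_USD[name] for name in die_names)


def expected_scrap_cost(die_names, die_dppm):
    """Expected cost per assembled stack of discarding stacks with an escaped defect.

    Assumes independent escapes at the same dppm rate for every die.
    """
    p_stack_bad = 1.0 - (1.0 - die_dppm / 1_000_000) ** len(die_names)
    return p_stack_bad * stack_cost(die_names)


if __name__ == "__main__":
    stack = list(DIE_COST_USD)  # hypothetical five-die stack, one of each part
    print(f"die cost of stack: ${stack_cost(stack):.2f}")
    print(f"expected scrap cost at 1000 dppm: ${expected_scrap_cost(stack, 1000):.4f}")
```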

FPGAs to the rescue!

OK…they might not solve everything, but they can help!

But will FPGAs really be present in the stack?

Why put an FPGA in a 3D Stack?
- Good for anything that may need to be updated in the field
  - E.g., communication protocols
- Often cheaper than an ASIC
- Can be used to provide hardware acceleration
- Can be used to aid in test, self-repair, and fault tolerance
- Ultimately, you could make the entire stack from FPGAs...

Real FPGAs with TSVs
- Xilinx is making FPGAs with Stacked Silicon Interconnect: Virtex-7 7V1500T, 7V2000T, 7VH290T, 7VH580T, 7VH870T

Source: Kirk Saban, "Xilinx Stacked Silicon Interconnect Technology Delivers Breakthrough FPGA Capacity, Bandwidth, and Power Efficiency," WP380 (v1.1), October 21, 2011.

A passive silicon interposer interconnects multiple FPGA SLRs (super logic regions).

Advantages of Xilinx Virtex-7 SSI Solutions for multi-FPGA designs
- Overcomes the I/O pin limitations
  - 1200 I/O pins on a package vs. more than 10,000 die-to-die connections
  - TSV drivers do not have to deliver the same currents and handle the same voltages as chip-to-chip I/Os
- Pin-to-pin delays are much less
- Time division multiplexing is not needed
- Power penalty is much less than for standard I/Os

All of these advantages can be used to help harness FPGAs for reliability.

FPGA Controlled Test
- When should die be tested?
  - Before assembly
  - During stack assembly
  - After stack assembly
- The earlier I know that a die is bad, the better
- Testing the stack as each die is added is difficult
  - Expensive
  - Not all of the functionality is there yet, making functional test difficult
- Solution: FPGAs can be reprogrammed multiple times to serve as an embedded tester.

FPGA as an embedded tester
- If we add an FPGA to the stack first, it can serve to test the devices it has connections with…
  - Memory BIST
  - Functional patterns for the microprocessor, standing in for the board/ASIC/analog parts that are not yet present
  - LBIST/scan for ASIC
  - Pseudo-functional test for ASIC
  - Bus communication protocols for each layer

[Figure: stack diagram with layers labeled FPGA, Microprocessor, Memory, Memory, Interposer, ASIC, and Analog/RF]
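As an illustration of the memory BIST role, here is a behavioral sketch of a March test such as an FPGA-based tester might apply to a memory die. The slides only say "Memory BIST"; the choice of March C- is an assumption, and the `FaultyMemory` model and function names are hypothetical. A real tester would be written in an HDL and would drive the memory over its actual bus.

```python
class FaultyMemory:
    """Bit-wide memory model with optional stuck-at faults (address -> stuck value)."""

    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        self.stuck_at = dict(stuck_at or {})

    def write(self, addr, bit):
        # A stuck-at cell ignores the written value.
        self.cells[addr] = self.stuck_at.get(addr, bit)

    def read(self, addr):
        return self.stuck_at.get(addr, self.cells[addr])


def march_c_minus(mem, size):
    """Run March C- over `size` cells and return the sorted failing addresses."""
    up = range(size)
    down = range(size - 1, -1, -1)
    failing = set()

    def element(order, expect, write):
        # One march element: optional read-and-compare, then optional write.
        for addr in order:
            if expect is not None and mem.read(addr) != expect:
                failing.add(addr)
            if write is not None:
                mem.write(addr, write)

    element(up, None, 0)    # any order:  w0
    element(up, 0, 1)       # ascending:  r0, w1
    element(up, 1, 0)       # ascending:  r1, w0
    element(down, 0, 1)     # descending: r0, w1
    element(down, 1, 0)     # descending: r1, w0
    element(up, 0, None)    # any order:  r0
    return sorted(failing)


if __name__ == "__main__":
    print(march_c_minus(FaultyMemory(16), 16))
    print(march_c_minus(FaultyMemory(16, {5: 1}), 16))
```

Because the FPGA is reprogrammable, a tester like this could be swapped out for a different test engine once the memory has been checked.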

Some issues that must be addressed to do this in 3D…
- Appropriate test architecture: IEEE 1149.1, IEEE 1687, etc. for scan test and to access instruments in die
- Possibly different voltage levels on different die must be appropriately converted
- FPGA must have access to data/address buses to perform functional test
- FPGA placement will significantly impact what it can test
  - Pass-through FPGAs on upper die are likely to be needed.
- Tools are needed to efficiently operate the FPGA tester for multiple test sessions of different types.
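For readers unfamiliar with IEEE 1149.1, the sketch below models the standard 16-state TAP controller that an FPGA tester would have to step through (by driving TMS) to reach the scan registers of another die. The state names and transitions follow the standard's state diagram; the helper name `walk` is illustrative, and nothing here is specific to the authors' test architecture.

```python
# IEEE 1149.1 TAP controller: state -> (next state if TMS=0, next state if TMS=1)
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def walk(state, tms_bits):
    """Apply a sequence of TMS values (one per TCK rising edge) and return the final state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state


if __name__ == "__main__":
    # From reset, TMS = 0,1,0,0 reaches Shift-DR, where data register bits are scanned.
    print(walk("Test-Logic-Reset", [0, 1, 0, 0]))
    # Holding TMS high for five clocks returns to reset from any state.
    print({walk(s, [1] * 5) for s in TAP_NEXT})
```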

Test is good…but what do you do if you find a problem?

FPGAs may be able to help with this, too.

Throwing out an entire 3D stack is expensive…how do we repair die?
- ASIC solution: Provide multiple copies of your die, or multiple identical cores on a die, as spares.
  - Potentially quite expensive as well: need one or more spares for everything you might want to repair.
  - Some overhead/planning required to enable a switch to a spare
  - Spare will be an almost perfect replica of the original and should give the same or similar performance

Throwing out an entire 3D stack is expensive…how do we repair die?
- FPGA solution: Identify the portion of the original die that is defective and replace it with circuitry in the FPGA.
  - May significantly lower performance
  - Overhead/planning required to enable a switch to the FPGA: partitioning must be decided ahead of time.
  - Programming for the spare implemented in the FPGA must be stored somewhere
  - Diagnosis required to determine what to replace.

Conceptually, what might this look like?

[Figure: the die to be repaired, shown as several partitions with one marked as the defective partition, alongside an FPGA]

Implementation Notes
- A single set of TSVs may be used to potentially repair one of several partitions.
- Use a one-hot decoder to select the defective partition by driving the enable signals on the tristate buffers and the select inputs on the muxes.
- If the FPGA and the ASIC are not at the same voltage, this must be handled when passing the signals.
- We may want to shut off power to the defective partition.
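The selection scheme in these notes can be sketched behaviorally as follows. This is one reading of the slide, not the authors' implementation: a one-hot code picks the defective partition, its tristate enable puts that partition's inputs onto the shared TSVs toward the FPGA, and its output mux substitutes the FPGA's result. All function names are hypothetical.

```python
def one_hot(index, width):
    """One-hot decode: bit i is 1 only when i == index. index=None means no repair."""
    return [1 if i == index else 0 for i in range(width)]


def shared_tsv_value(partition_inputs, defective=None):
    """Value on the shared TSVs going to the FPGA.

    Only the partition whose tristate enable is set drives the TSVs;
    None models the undriven (high-impedance) case.
    """
    enables = one_hot(defective, len(partition_inputs))
    driven = [value for en, value in zip(enables, partition_inputs) if en]
    return driven[0] if driven else None


def repaired_outputs(partition_outputs, fpga_output, defective=None):
    """Outputs seen by the rest of the die after the 2:1 repair muxes.

    Mux select 0 keeps the original partition output; select 1 substitutes
    the replacement logic's output returned from the FPGA.
    """
    selects = one_hot(defective, len(partition_outputs))
    return [fpga_output if sel else original
            for sel, original in zip(selects, partition_outputs)]


if __name__ == "__main__":
    inputs = [0xA, 0xB, 0xC, 0xD]
    outputs = [1, 2, 3, 4]
    print(shared_tsv_value(inputs, defective=2))
    print(repaired_outputs(outputs, fpga_output=99, defective=2))
```

Because the code is one-hot, at most one partition ever drives the shared TSVs, which is what lets a single set of TSVs serve several partitions.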

TSV connections within the partition may be tricky
- If it is a bus that is already connected to the FPGA and other things, we just need to tristate the signals in the defective partition
- Otherwise, a routable interposer could possibly be used.

[Figure: two stack diagrams, each with layers labeled FPGA, ASIC, Interposer, and ASIC]

So…how do we decide on partitions?
- Space available in the FPGA
- Performance loss
  - FPGAs are often slower than ASICs
  - May not always be true if they are implemented in different technology nodes
- How to handle I/O. Should only flip-flops/registers be partitioning points?
- What functionality is worth protecting?
- Should we partition to minimize the number of TSVs?

Experiment on Performance Loss
- Timing comparison for FPGA vs. ASIC
- ISCAS 89 circuits
  - Different sizes
  - Consider them to be part of a larger circuit
- "ASIC" analysis:
  - Synopsys Design Compiler / Synopsys PrimeTime
  - 90 nm and 32 nm libraries
- "FPGA" analysis:
  - Xilinx ISE
  - Compiled for Artix-7 -3
  - "Balanced" optimization
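The results that follow are reported as a percentage increase in clock period. The slides do not give the formula, so the sketch below assumes the usual definition, relative to the ASIC period; the numbers in the example are made up and are not measurements from the experiment.

```python
def percent_increase(asic_period_ns: float, fpga_period_ns: float) -> float:
    """Percent increase in clock period when logic moves from the ASIC to the FPGA.

    A negative result means the FPGA implementation is faster.
    """
    return 100.0 * (fpga_period_ns - asic_period_ns) / asic_period_ns


if __name__ == "__main__":
    # Hypothetical periods, for illustration only.
    print(percent_increase(asic_period_ns=2.0, fpga_period_ns=3.0))
    print(percent_increase(asic_period_ns=4.0, fpga_period_ns=3.0))
```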

Xilinx Artix-7 FPGA
- 28 nm process
- High performance, low power, low cost
- 65% lower static power and 50% lower total power compared to 45 nm devices
- Suggested for 3D TV, automotive applications, handheld communication, digital SLR cameras, medical devices, industrial monitor and control

Comparison between 90 nm and FPGA delay for circuits with registered I/O

[Figure: bar chart of % increase in clock period (y-axis, 0 to 70) for ISCAS89 circuits with registered I/O: s420, s641, s713, s820, s832, s838, s953, s1196, s1238]

Comparison between 32 nm and FPGA delay for circuits with registered I/O

[Figure: bar chart of % increase in clock period (y-axis, 0 to 600) for ISCAS89 circuits with registered I/O: s420, s641, s713, s820, s832, s838, s953, s1196, s1238]

Performance Conclusions
- Some circuits are more amenable to replacement in FPGA at low performance loss than others
  - Partitioning should take this into account
- We can also devise ways to mitigate the impact of the performance loss in an individual piece of circuitry
  - Especially true for certain types of circuits and the functionality of certain partitions.

Error Detection with FPGAs
- Online detection of errors is also possible with an FPGA in a 3D stack
- Portions of the design may be instantiated in the FPGA in either the original or a simplified form to provide logic duplication.
- If complete coverage of all clock cycles is not needed, the speed differential between the two technologies (FPGA and ASIC) can be mitigated.
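One way to read the last point is that a slower FPGA copy can check only every k-th cycle of the ASIC logic. The sketch below models that idea behaviorally; `asic_logic` is a hypothetical stand-in for the protected partition, and the sampling scheme is an assumption rather than the authors' stated method.

```python
def asic_logic(x):
    """Stand-in for the protected ASIC partition (hypothetical function)."""
    return (3 * x + 1) & 0xFF


def detect_errors(inputs, asic_outputs, check_every):
    """Compare the ASIC output with the FPGA duplicate on every `check_every`-th cycle.

    Returns the cycle numbers at which a mismatch was observed. Errors that
    occur only in unchecked cycles go undetected, which is the coverage
    given up in exchange for tolerating the slower FPGA clock.
    """
    mismatches = []
    for cycle in range(0, len(inputs), check_every):
        fpga_output = asic_logic(inputs[cycle])  # duplicate computed in the FPGA
        if fpga_output != asic_outputs[cycle]:
            mismatches.append(cycle)
    return mismatches


if __name__ == "__main__":
    inputs = list(range(12))
    outputs = [asic_logic(x) for x in inputs]
    outputs[6] ^= 1  # inject an error in a cycle that is sampled when check_every=3
    outputs[7] ^= 1  # inject an error in a cycle that is not sampled
    print(detect_errors(inputs, outputs, check_every=3))
    print(detect_errors(inputs, outputs, check_every=1))
```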

Hardware Monitoring with FPGAs
- Just as in test, FPGAs in a 3D stack can be reprogrammed to provide different hardware monitoring capability for the signals they have access to:
  - the data and address buses,
  - selected signals routed through the through-silicon vias, etc.

Security Issues
- FPGAs in the stack can also be used for nefarious purposes.
- Need to protect the IP that will be programmed into the FPGA for repair
- Need to prevent someone from re-programming the FPGA to monitor the device
- Need to prevent someone from re-programming the FPGA to corrupt the circuit operation.
- Encryption can help, but we need to protect against power analysis attacks as well.

Conclusions
- There are multiple domains where defects and errors may enter a 3D stack.
- Problems in a 3D stack are often more expensive (throwing out the stack instead of a single die)
- FPGAs are likely to be present in the stack already for a variety of reasons.
- These FPGAs may be harnessed for monitoring, test, and repair.
- Research is needed regarding partitioning of the die for analysis and repair as well as securing the process from attackers.

Comparison between 90 nm and FPGA delay

[Figure: bar chart of % increase in time of FPGA over 90 nm (y-axis, -40 to 120) for ISCAS89 circuits: s27, s208, s298, s344, s349, s382, s386, s400, s420, s444, s510, s526, s641, s713, s820, s832, s838, s1196, s1238, s1423, s1488, s1494, s5378, s9234, s13207, s15850, s35932]

Comparison between 32 nm and FPGA delay

[Figure: bar chart of % increase in time of FPGA over 32 nm (y-axis, 0 to 900) for ISCAS89 circuits: s27, s298, s344, s349, s382, s386, s400, s420, s444, s510, s526, s641, s713, s820, s832, s1196, s1238, s1423, s5378, s9234, s13207, s15850, s35932]