### Low-Power CMOS Optical Interconnect Transceivers

### Samuel Palermo\*



### Computer Systems Laboratory Stanford University

\*Now at Intel Corp., Hillsboro, OR

# Outline

- Introduction
- Optical transmitters
- Optical receiver
- Clock and data recovery
- Optical link system performance
- Conclusion

# High Speed Links

- Increasing computation power and today's networked society require chip-to-chip I/O bandwidth to increase
  - Routers, Processor Memory Interface



\*2006 International Technology Roadmap for Semiconductors

### **Chip-to-Chip Electrical Interconnects**



### Electrical channel characteristics limit performance

\*V. Stojanovic and M. Horowitz, "Modeling and Analysis of High-Speed Links," CICC, 2003.

### **Chip-to-Chip Electrical Interconnects**



- Sophisticated equalization circuitry required
- Typical commercial electrical I/O xcvr
  - ~20mW/Gb/s at 10Gb/s

### Chip-to-Chip Optical Interconnects



- Optical interconnects remove many channel limitations
  - Reduced complexity and power consumption
  - Potential for high information density with wavelength-division multiplexing (WDM)

### **Optical Sources & Detectors**

### Sources



**VCSEL** 

### MQW Electroabsorption Modulator



### Detector

### p-i-n Detector



### Integration



# **CMOS Optical Link Issues**

- VCSEL bandwidth
  - Inherent device RC
  - Optical bandwidth requires high average current density
- Modulator voltage swing limited by CMOS reliability constraints
- Reduced gain/headroom in scaled technologies
  - Motivates use of integrating RX vs traditional TIA
- Dealing with mismatch
  - Offset compensation (voltage & timing)
- Power and area reduction

# 90nm CMOS 16Gb/s Optical Transceiver Architecture



- 1. **S. Palermo** *et al*, "A 90nm CMOS 16Gb/s Transceiver for Optical Interconnects," *ISSCC*, 2007.
- 2. J. Roth, **S. Palermo** *et al*, "1550nm Optical Interconnect Transceiver with Low Voltage Electroabsorption Modulators Flip-Chip Bonded to 90nm CMOS," *OFC*, 2007.

# Outline

- Introduction
- Optical transmitters
  - VCSEL TX
  - MQWM TX
- Optical receiver
- Clock and data recovery
- Optical link system performance
- Conclusion

# VCSEL Bandwidth vs Reliability

10Gb/s VCSEL Frequency Response [1]



- Mean Time to Failure (MTTF) is inversely proportional to the square of the current density

$$MTTF = \frac{A}{j^2} e^{\left(\frac{E_A}{k}\right)\left(\frac{1}{T_j} - \frac{1}{373}\right)}$$
[2]

- Steep trade-off between bandwidth and reliability

$$MTTF \propto \frac{1}{BW^4}$$
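The fourth-power dependence follows if the VCSEL bandwidth grows roughly as the square root of the current density above threshold, so that $j \propto BW^2$. That scaling is a common first-order laser model and is an assumption here, not something stated on the slide. A minimal Python sketch of both relations, where the function names and the inputs `a` and `e_a_ev` are hypothetical placeholders for device-specific fit parameters:

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def mttf(a, j, e_a_ev, t_j_kelvin):
    """MTTF = (A / j^2) * exp((E_A / k) * (1/T_j - 1/373)).

    a and e_a_ev are device-specific fit parameters (hypothetical inputs).
    j is the current density and t_j_kelvin the junction temperature.
    """
    exponent = (e_a_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_j_kelvin - 1.0 / 373.0)
    return a / j**2 * math.exp(exponent)


def relative_mttf(bandwidth_scale):
    """MTTF relative to the starting point when bandwidth is scaled.

    Assumes j is proportional to BW^2 at fixed temperature, which gives
    MTTF proportional to 1 / BW^4.
    """
    return bandwidth_scale ** -4


if __name__ == "__main__":
    # Effect of doubling the bandwidth by raising the current density.
    print(relative_mttf(2.0))
```

Under these assumptions, doubling the bandwidth through current density alone costs a factor of 16 in lifetime, which is why equalization is attractive as an alternative way to reach higher data rates.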

- 1. D. Bossert *et al*, "Production of high-speed oxide confined VCSEL arrays for datacom applications," *Proceedings of SPIE*, 2002.
- 2. M. Teitelbaum and K. Goossen, "Reliability of Direct Mesa Flip-Chip Bonded VCSEL's," LEOS, 2004.

# **VCSEL TX Equalization**



- 4-tap FIR filter (1 precursor, 1 main, and 2 postcursor taps) is a good compromise between power and performance

### Multiplexing FIR Circuit Implementation



**S. Palermo** and M. Horowitz, "High-Speed Transmitters in 90nm CMOS for High-Density Optical Interconnects," *ESSCIRC*, 2006.
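As a behavioral illustration of the 4-tap equalizer (1 precursor, 1 main, 2 postcursor), the following Python sketch applies a symbol-spaced FIR filter to a bit stream. The tap weights, the function name `fir_equalize`, and the zero padding at the sequence edges are assumptions chosen for illustration; they are not the values used on the chip.

```python
def fir_equalize(bits, taps=(-0.10, 0.70, -0.15, -0.05), n_pre=1):
    """Apply a symbol-spaced FIR filter to a bit sequence.

    taps[:n_pre] are precursor taps, taps[n_pre] is the main tap, and the
    remaining entries are postcursor taps. The default weights are
    illustrative only. Bits outside the sequence contribute nothing.
    """
    symbols = [1.0 if b else -1.0 for b in bits]
    out = []
    for n in range(len(symbols)):
        level = 0.0
        for k, weight in enumerate(taps):
            idx = n + n_pre - k  # k < n_pre looks ahead, k > n_pre looks back
            if 0 <= idx < len(symbols):
                level += weight * symbols[idx]
        out.append(level)
    return out


if __name__ == "__main__":
    levels = fir_equalize([0, 0, 0, 1, 1, 1, 1, 1])
    print([round(v, 2) for v in levels])
```

With these example weights the first bit after a transition is driven to twice the steady-state level, which is the pre-emphasis behavior that lets the VCSEL run at a lower average current for a given data rate. In the actual transmitter the filter output would be a modulation current added to a bias current, which this sketch does not model.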

# Tap Mux & Output Stage



- 5:1 multiplexing predriver uses 5 pairs of complementary clock phases spaced by a bit time
- Tunable delay predriver compensates for static phase offsets and duty cycle error
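The timing implied by a 5:1 multiplexer with phases spaced by one bit time can be worked out directly. The Python sketch below is a hedged illustration: it assumes the five phases span exactly one clock period and that lane `k` supplies bits `k`, `k+5`, and so on. The function names are hypothetical.

```python
def mux_clock_plan(data_rate_gbps, n_phases=5):
    """Return (bit time, clock period, phase offsets), all in picoseconds.

    Assumes the n_phases clock phases are spaced by one bit time and
    together span one clock period.
    """
    bit_time_ps = 1000.0 / data_rate_gbps
    clock_period_ps = n_phases * bit_time_ps
    offsets_ps = [k * bit_time_ps for k in range(n_phases)]
    return bit_time_ps, clock_period_ps, offsets_ps


def serialize(lanes):
    """Interleave parallel lanes into one serial stream (lane 0 first)."""
    return [bit for group in zip(*lanes) for bit in group]


if __name__ == "__main__":
    print(mux_clock_plan(16.0))
    print(serialize([[0, 5], [1, 6], [2, 7], [3, 8], [4, 9]]))
```

At 16Gb/s this gives a 62.5ps bit time and a 312.5ps clock period, so a static phase error of only a few picoseconds is already a noticeable fraction of the bit time, which is what the tunable delay predriver is there to trim.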

# VCSEL TX Optical Testing



### VCSEL 16Gb/s Optical Eye Diagrams



### External Modulation with MQWM



- Absorption edge shifts with changing bias voltage due to the "quantum-confined Stark effect" and modulation occurs
- Maximizing voltage swing allows for good contrast ratio over a wide wavelength range

### High-Voltage Output Stage Issues



- Cascode driver has potential for 2x Vdd drive at high speed
- Static-biased cascode suffers from V<sub>ds</sub> stress during transients

### Pulsed-Cascode Output Stage



- Preserves two-transistor stack configuration for maximum speed
- Cascode transistors' gates pulsed during transitions to prevent V<sub>ds</sub> overstress

**S. Palermo** and M. Horowitz, "High-Speed Transmitters in 90nm CMOS for High-Density Optical Interconnects," *ESSCIRC*, 2006.

### **Output Stage Waveforms**




# Modulator TX with Level-Shifting Multiplexer



- Level-shifter combined with multiplexer
- Active inductive shunt peaking compensates multiplexer self-loading (reduces transition times by 37%)
- Slightly lower fan-out ratio in "high" signal path to compensate for level-shifting delay
- Delay Tracking
  - "High" path inverter nMOS in separate p-well
  - Metal fringe coupling capacitors perform skew compensation

# MQWM TX Testing



# Modulator Driver Electrical Eye Diagram



- 16Gb/s data subsampled at modulator driver output node
- Experimental full optical link operation at 1.8Gb/s\*
  - Limited by excessively high contact resistance

\*J. Roth, **S. Palermo** *et al*, "An optical interconnect transceiver at 1550nm using low voltage electroabsorption modulators directly integrated to CMOS," *JLT*, 2007.

# Outline

- Introduction
- Optical transmitters
- Optical receiver
- Clock and data recovery
- Optical link system performance
- Conclusion

### **Optical RX Scaling Issues**



### **Integrating Receiver Block Diagram**



A. Emami-Neyestanak et al, "A 1.6Gb/s, 3mW CMOS receiver for optical communication," VLSI, 2002.

### **Demultiplexing Receiver**



- Demultiplexing with multiple clock phases allows higher data rate
  - Data Rate = #Clock Phases x Clock Frequency
  - Gives sense-amp time to resolve data
  - Allows continuous data resolution
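The data-rate relation above can be sketched in a few lines of Python. The choice of 5 phases is inferred from the 3.2GHz receiver clock quoted for 16Gb/s operation elsewhere in the deck; the function names are hypothetical, and how each sense amplifier divides its cycle between sampling, regeneration, and reset is not modeled.

```python
def demultiplex(bits, n_phases):
    """Split a serial stream into n_phases lanes, one per clock phase."""
    return [bits[k::n_phases] for k in range(n_phases)]


def data_rate_gbps(n_phases, clock_ghz):
    """Data rate = number of clock phases x clock frequency."""
    return n_phases * clock_ghz


def lane_cycle_time_ps(clock_ghz):
    """Time between successive samples taken by the same sense amplifier."""
    return 1000.0 / clock_ghz


if __name__ == "__main__":
    print(demultiplex(list(range(10)), 5))
    print(data_rate_gbps(5, 3.2))
    print(lane_cycle_time_ps(3.2))
```

Each sense amplifier therefore works on a full clock period rather than a single bit time, while the interleaved lanes together still resolve every incoming bit.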

### **Clocked Sense Amplifier**



Offset cancelled with digitally adjustable PMOS capacitors

Step=2.3mV, Range=±70mV
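For a rough sense of the control resolution these numbers imply, the Python sketch below sizes a uniform offset-trim code from the step and range. It assumes a single uniformly spaced code covering the whole range, which the slide does not state; the real capacitor arrays may be organized differently. The function name is hypothetical.

```python
import math


def offset_trim_size(step_mv, range_mv):
    """Return (steps, codes, bits) for a uniform trim covering +/- range_mv.

    Assumes one uniformly spaced code spans the full range; this is an
    illustrative assumption, not a description of the actual circuit.
    """
    steps = math.ceil(2.0 * range_mv / step_mv)
    codes = steps + 1
    bits = math.ceil(math.log2(codes))
    return steps, codes, bits


if __name__ == "__main__":
    print(offset_trim_size(2.3, 70.0))
```

Under that assumption the worst-case residual offset after calibration is about half a step, roughly 1.15mV.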

- Kickback charge can corrupt adjacent samples
- Need high common-mode input for adequate speed

# **1V Modified Integrating Receiver**



- Differential Buffer
  - Fixes sense-amp common-mode input for improved speed and offset performance
  - Reduces kickback charge
  - Cost of extra power and noise
- Input Range = 0.6 - 1.1V

### **Receiver Sensitivity Analysis**



# Outline

- Introduction
- Optical transmitters
- Optical receiver
- Clock and data recovery
- Optical link system performance
- Conclusion

### **Conventional Dual-Loop CDR**



- 2 degrees of freedom to filter VCO noise & erroneous phase updates
- Input demultiplexing receiver requires multiple phase muxes & interpolators

# Dual-Loop CDR w/ Feedback Interpolation



Extends [Larsson:99] to input demultiplexing receiver

- Only one phase mux/interpolator pair
- Filtering of interpolator switching
- Path from VCO to samplers
  - Minimal Delay
  - Static, which allows offset cancellation


### **Baud-Rate Phase Detection**



- For certain 4-bit patterns compare V<sub>in</sub>(n) with V<sub>in</sub>(n-2) [Emami:04]
- No "quadrature" phases required
- Reduces net update rate to 18.75%

### **Clock Recovery Performance**

### **CDR Disabled**


### **CDR Activated**



- RX clock frequency = 3.2GHz (16Gb/s)
- Jitter increases only marginally when CDR activated
  - 1.74ps<sub>rms</sub>, 12.9ps<sub>pp</sub> $\Rightarrow$ 1.90ps<sub>rms</sub>, 15.1ps<sub>pp</sub>
  - Sufficient filtering of input noise
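To put the jitter numbers in context, the Python sketch below estimates the jitter contributed by closing the loop and expresses jitter as a fraction of the bit time. It assumes the added jitter is independent of the free-running jitter, so the two combine as a root sum of squares; that independence is an assumption, and the function names are hypothetical.

```python
import math


def added_rms_jitter_ps(rms_before_ps, rms_after_ps):
    """RMS jitter added by the loop, assuming independent jitter sources."""
    return math.sqrt(rms_after_ps**2 - rms_before_ps**2)


def fraction_of_ui(jitter_ps, data_rate_gbps):
    """Express a jitter value as a fraction of one unit interval (bit time)."""
    return jitter_ps * data_rate_gbps / 1000.0


if __name__ == "__main__":
    print(round(added_rms_jitter_ps(1.74, 1.90), 3))
    print(round(fraction_of_ui(1.90, 16.0), 4))
    print(round(fraction_of_ui(15.1, 16.0), 4))
```

With the measured values, the loop adds well under 1ps<sub>rms</sub> under this assumption, and the total RMS jitter with the CDR active stays at a few percent of the 62.5ps bit time.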

### Phase Correction Circuitry

**Tunable Delay Clock Buffers** 

### **Phase Correction Performance**



- Static phase offset corrected with tunable delay clock buffers
  - Digitally-adjustable capacitive loads
  - Phase range at 16Gb/s is ±12% UI

# Outline

- Introduction
- Optical transmitters
- Optical receiver
- Clock and data recovery
- Optical link system performance
  - VCSEL link
- Conclusion

### **Optical Transceiver Testing**



### **Receiver Sensitivity**

### Test Conditions

- 8B/10B data patterns (variance of 6 bits)
- Long runlength data (variance of 10 bits)

- BER < 10<sup>-10</sup>



### **Transceiver Power Consumption**



Power at 16Gb/s = 129mW (8.1mW/Gb/s)

- Power scales with data rate
  - Mostly CMOS circuitry
  - Integrating RX sensitivity improves at lower data rates
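The efficiency figure is simply power divided by data rate, and 1mW/Gb/s equals 1pJ/bit. The Python sketch below reproduces the number and compares it with the ~20mW/Gb/s quoted earlier in the deck for a typical commercial electrical transceiver at 10Gb/s. The function name is hypothetical, and the comparison ignores differences in channel, process, and what each power number includes.

```python
def energy_per_bit_pj(power_mw, data_rate_gbps):
    """Energy per bit in pJ; numerically equal to mW per Gb/s."""
    return power_mw / data_rate_gbps


if __name__ == "__main__":
    optical = energy_per_bit_pj(129.0, 16.0)
    electrical = 20.0  # approximate mW/Gb/s at 10Gb/s, from the earlier slide
    print(optical)
    print(round(electrical / optical, 2))
```

On these figures the optical transceiver uses roughly 2.5 times less energy per bit than the electrical reference point.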

### **Transceiver Performance Summary**

| Parameter                                          | Value                             |
|----------------------------------------------------|-----------------------------------|
| Technology                                         | 90nm CMOS                         |
| Supply Voltages                                    | Vdd=1V, LVdd=2.8V,<br>PDBias=2.5V |
| Data Rate                                          | 5 - 16Gb/s                        |
| Extinction Ratio                                   | 3dB                               |
| Average Optical Launch Power                       | 3.1dBm                            |
| RX Input Capacitance                               | 440fF                             |
| RX Sensitivity at 10Gb/s (BER=10<sup>-10</sup>)    | 12.5mV (-9.6dBm)                  |
| RX Sensitivity at 16Gb/s (BER=10<sup>-10</sup>)    | 20.2mV (-5.4dBm)                  |
| Area                                               | 0.105mm<sup>2</sup>               |
| Power at 16Gb/s                                    | 129mW (8.1mW/Gb/s)                |

# Optical vs Electrical XCVR Performance Comparisons



- Compares favorably due to simple equalization circuitry
- Should scale well
  - Better VCSEL technology
  - Lower capacitance photodetectors
  - Higher data rates  $\Rightarrow$  More equalization for electrical channels

### Conclusion

- Optical interconnects provide a path to reducing the I/O bandwidth problem
- Proposed optical interconnect architecture is suitable for large scale integration in current/future CMOS technologies
  - VCSEL TX equalizer allows low current operation
  - Reliable MQWM TX capable of 2\*Vdd voltage drive
  - Low voltage integrating receiver
  - Baud-rate clock recovery

### Acknowledgments

- Profs Mark Horowitz, Azita Emami, and David Miller
- Jon Roth for optics/device design of modulator link
- CMP and STMicroelectronics for chip fabrication
- ULM Photonics for VCSELs
- Albis Optoelectronics for photodetectors
- MARCO-IFC for funding support
- Sh. Palermo for encouragement and support

### Backup Slides

### **Equalization Performance**



Maximum data rate vs average current

- Min 80% eye opening & <40% overshoot
- Equalization allows lower average current for a given data rate
- Linear equalizer limited by VCSEL nonlinearity

### 13Gb/s Power w/ different tap #



# VCSEL TX Power vs Data Rate & Tap #



# Modulator Driver Reliability Simulations

### Maximum nMOS Voltages



- Transient with random data
- Corner simulations show no output stage voltage exceeds nominal Vdd by more than 11%
- Monte Carlo simulations show tight distributions (σ < 15mV)



### Maximum pMOS Voltages

### Coupling Capacitor Skew Compensation Performance

