

#### A High-Performance Fault Diagnosis Approach for the AES SubBytes Utilizing Mixed Bases

#### Mehran Mozaffari Kermani and Arash Reyhani-Masoleh

Presented by Mehran Mozaffari Kermani

Department of Electrical and Computer Engineering

The University of Western Ontario

# Outline



- Introduction
- The Advanced Encryption Standard
- Presented Fault Detection Scheme
- Complexity Analysis
- ASIC Implementations and Comparison
- Conclusions

# Introduction



- The Advanced Encryption Standard (AES) is the current NIST standard used for secure communications
- Faults in the AES
  - Natural faults
  - Malicious faults injected by attackers
- Effective fault detection schemes
  - Acceptable error coverage
  - Low overhead in terms of area and delay



# Advanced Encryption Standard (AES)



- 128-bit plaintext/key, 10 rounds
- Four transformations: SubBytes, ShiftRows, MixColumns, and AddRoundKey
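The sketch below shows the order in which the four transformations run during one AES-128 encryption, as specified in FIPS-197. The helper `aes128_schedule` is hypothetical. It only lists step names and performs no cryptography.

```python
def aes128_schedule() -> list[str]:
    """Return the transformation sequence for one AES-128 encryption (FIPS-197)."""
    steps = ["AddRoundKey"]  # initial key addition before round 1
    for rnd in range(1, 11):  # 10 rounds for a 128-bit key
        steps += ["SubBytes", "ShiftRows"]
        if rnd != 10:  # the final round omits MixColumns
            steps.append("MixColumns")
        steps.append("AddRoundKey")
    return steps


if __name__ == "__main__":
    for i, step in enumerate(aes128_schedule()):
        print(i, step)
```

SubBytes applies the S-box to each of the 16 state bytes. One AES-128 block therefore requires 10 × 16 = 160 S-box evaluations.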





• The S-boxes form the only nonlinear, and the most complex, transformation in AES encryption.

• The S-box consists of a multiplicative inversion in GF($2^8$) followed by an affine transformation.

• In hardware, S-boxes are most commonly implemented using look-up tables or composite fields.
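The S-box can be computed directly from its definition. First invert in GF($2^8$) modulo $x^8 + x^4 + x^3 + x + 1$, with 0 mapped to 0. Then apply the FIPS-197 affine transformation with constant 0x63.

This Python sketch is a functional reference only, not a hardware architecture, and its helper names are hypothetical. A LUT stores exactly these 256 outputs. Composite-field designs compute the same inversion through subfield operations instead.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:  # reduce when the degree reaches 8
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse as a^254 (since a^255 = 1 for a != 0); maps 0 to 0."""
    result, base, exp = 1, a, 254
    while exp:  # square-and-multiply
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result


def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n positions."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox(x: int) -> int:
    """AES S-box: inversion followed by the affine transformation."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


if __name__ == "__main__":
    print(hex(sbox(0x53)))  # FIPS-197 example input
```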

## S-box in Hardware



### Look-up Tables (LUTs)

- Not preferred for high-performance applications
  - They have high area, and their delay is unbreakable, so it cannot be sub-pipelined

### Composite Fields

- Low area, and can be sub-pipelined
- Most commonly based on polynomial, normal, or mixed bases
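Composite fields replace one GF($2^8$) inversion with a few operations in the smaller field GF($2^4$). The Python sketch below illustrates the idea in polynomial basis only. It is not the mixed-basis structure of this work, and several choices in it are assumptions:

- GF($2^4$) uses the assumed polynomial $x^4 + x + 1$.
- GF($(2^4)^2$) uses $y^2 + y + \lambda$, where $\lambda$ is simply the first value found that makes the polynomial irreducible.
- The isomorphism mapping to and from the AES field is omitted.
- All function names are hypothetical.

```python
def gf16_mul(a: int, b: int) -> int:
    """Multiply in GF(2^4) with the field polynomial x^4 + x + 1 (polynomial basis)."""
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:  # reduce x^4 -> x + 1
            a ^= 0x13
    return result


def gf16_inv(a: int) -> int:
    """Inverse in GF(2^4) by search (a small circuit in hardware); maps 0 to 0."""
    if a == 0:
        return 0
    return next(b for b in range(1, 16) if gf16_mul(a, b) == 1)


# y^2 + y + LAMBDA is irreducible over GF(2^4) iff no t in GF(2^4) is a root.
LAMBDA = next(
    lam for lam in range(1, 16)
    if all((gf16_mul(t, t) ^ t ^ lam) != 0 for t in range(16))
)


def gf256_mul(a: int, b: int) -> int:
    """Multiply (ah*y + al)(bh*y + bl) modulo y^2 + y + LAMBDA."""
    ah, al, bh, bl = a >> 4, a & 0xF, b >> 4, b & 0xF
    hh = gf16_mul(ah, bh)
    high = hh ^ gf16_mul(ah, bl) ^ gf16_mul(al, bh)
    low = gf16_mul(hh, LAMBDA) ^ gf16_mul(al, bl)
    return (high << 4) | low


def gf256_inv(a: int) -> int:
    """Invert in GF((2^4)^2) using only GF(2^4) operations; maps 0 to 0."""
    ah, al = a >> 4, a & 0xF
    d = gf16_mul(gf16_mul(ah, ah), LAMBDA) ^ gf16_mul(ah, al) ^ gf16_mul(al, al)
    d_inv = gf16_inv(d)  # the only inversion happens in the subfield
    return (gf16_mul(ah, d_inv) << 4) | gf16_mul(ah ^ al, d_inv)
```

The single GF($2^8$) inversion thus reduces to a few GF($2^4$) multiplications, squarings, additions, and one GF($2^4$) inversion. These small blocks can be sub-pipelined.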



### S-box Using Composite Fields



# **Previous Works**



Parity-based

Mehran Mozaffari Kermani and Arash Reyhani-Masoleh, Fault Diagnosis and Tolerance in Cryptography 2011 (FDTC 2011)

### Proposed Fault Detection Scheme



S-box in mixed basis structure

- The operations are divided into three blocks.

- Five predicted parities (error flags) are obtained, covering all of the operations.


## Fault Detection Scheme (cont.)





Error indication for each block used in the fault detection scheme.
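A parity-based checker compares each predicted parity with the parity actually computed on the corresponding block outputs. A mismatch raises that error flag.

The figure defining which outputs feed each of the five flags is not reproduced here, so the inputs below are hypothetical. This sketch also assumes, for illustration only, that the individual flags are ORed into a single error indication.

```python
from typing import Sequence


def parity(x: int) -> int:
    """Parity (XOR of all bits) of a non-negative integer."""
    return bin(x).count("1") & 1


def error_flags(actual_outputs: Sequence[int], predicted: Sequence[int]) -> list[int]:
    """One flag per checked group: 1 if the actual parity disagrees with the prediction."""
    if len(actual_outputs) != len(predicted):
        raise ValueError("need one predicted parity per checked output group")
    return [parity(out) ^ pred for out, pred in zip(actual_outputs, predicted)]


def error_indication(flags: Sequence[int]) -> int:
    """Assumed combination: raise the alarm if any individual flag is set."""
    return int(any(flags))
```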

## **Parity Predictions**



**Theorem:** The parity predictions for the three blocks of the S-box using mixed bases in the presented fault detection scheme are as follows:

## Parity Predictions (Other Variants)



Depending on the reliability requirements and available resources, one may use a different number of predicted parities, e.g., by merging the predicted parities of the first and last blocks:

$$\hat{P}_{1+2} = \eta_7(\eta_3 + \eta_1) + \eta_6(\eta_2 + \eta_0) + \eta_5\eta_3 + \eta_4\eta_2 + \eta_7 + \eta_4 + \eta_3 + \eta_0,$$

$$\hat{P}_{4+5} = \theta_3(\eta_6 + \eta_5 + \eta_3 + \eta_2 + \eta_1) + \theta_2(\eta_7 + \eta_6 + \eta_4 + \eta_3 + \eta_0) + \theta_1(\eta_7 + \eta_5 + \eta_3 + \eta_1 + \eta_0) + \theta_0(\eta_6 + \eta_4 + \eta_2 + \eta_1).$$
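Both merged predictions are sums of products over GF(2), so addition is XOR and multiplication is AND. The Python sketch below transcribes them term by term.

It assumes the $\eta_i$ ($i = 0, \dots, 7$) and $\theta_j$ ($j = 0, \dots, 3$) are packed into integers, with bit $i$ holding $\eta_i$ and bit $j$ holding $\theta_j$. Their meaning comes from the mixed-basis S-box structure and is not reproduced here. The function names are hypothetical.

```python
def bit(x: int, i: int) -> int:
    """Return bit i of x."""
    return (x >> i) & 1


def predicted_parity_1_2(eta: int) -> int:
    """P_{1+2} = e7(e3+e1) + e6(e2+e0) + e5e3 + e4e2 + e7 + e4 + e3 + e0 over GF(2)."""
    e = [bit(eta, i) for i in range(8)]
    products = (e[7] & (e[3] ^ e[1])) ^ (e[6] & (e[2] ^ e[0])) ^ (e[5] & e[3]) ^ (e[4] & e[2])
    return products ^ e[7] ^ e[4] ^ e[3] ^ e[0]


def predicted_parity_4_5(eta: int, theta: int) -> int:
    """P_{4+5}: each theta_j gates a fixed XOR of eta bits, as in the equation above."""
    e = [bit(eta, i) for i in range(8)]
    t = [bit(theta, j) for j in range(4)]
    return ((t[3] & (e[6] ^ e[5] ^ e[3] ^ e[2] ^ e[1]))
            ^ (t[2] & (e[7] ^ e[6] ^ e[4] ^ e[3] ^ e[0]))
            ^ (t[1] & (e[7] ^ e[5] ^ e[3] ^ e[1] ^ e[0]))
            ^ (t[0] & (e[6] ^ e[4] ^ e[2] ^ e[1])))
```

In a checker, each predicted bit would be compared with the actual parity of the corresponding block outputs.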

# **Error Simulations**



#### Error Model:

• In this paper, we use the stuck-at error model. The goal is to cover both malicious and natural errors caused by bit flips.

• In fault attacks, injecting a single error is the ideal case for gaining maximum information. Due to technological constraints, however, injecting multiple errors is a more realistic model.

#### Our Scheme:

• All single stuck-at errors at the output of each S-box block are detected (100% coverage) by the proposed scheme. Any such error flips exactly one output bit, and therefore flips that block's parity.

• We have used linear feedback shift registers (LFSRs) for multiple random error injections.

• After injecting 200,000 multiple errors, an error coverage close to 100% is obtained.
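The simulation setup can be mimicked on a toy scale. This Python sketch injects stuck-at faults into an 8-bit block output protected by a single parity bit, and works in two parts:

- It exhaustively checks every single stuck-at fault.
- It uses a 16-bit Fibonacci LFSR to pick random multiple stuck-at faults. The taps 16, 14, 13, 11 and the seed are assumed choices.

Only faults that actually change the output are counted. One parity bit misses every even-weight error, so this toy's multiple-fault coverage stays well below 100%. The toy does not model the five flags across three blocks of the presented scheme, and its numbers are not comparable with the reported results. All names are hypothetical.

```python
def lfsr16(seed: int = 0xACE1):
    """Fibonacci LFSR with taps 16, 14, 13, 11; yields the 16-bit state after each clock."""
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("an all-zero seed locks the LFSR")
    while True:
        feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (feedback << 15)
        yield state


def next_byte(gen) -> int:
    """Clock the LFSR eight times and return the low byte of its state."""
    state = 0
    for _ in range(8):
        state = next(gen)
    return state & 0xFF


def parity(x: int) -> int:
    return bin(x).count("1") & 1


def inject_stuck_at(value: int, mask: int, stuck: int) -> int:
    """Force the bits selected by mask to the matching bits of stuck."""
    return (value & ~mask & 0xFF) | (stuck & mask)


def detected(value: int, faulty: int) -> bool:
    """The fault-free parity stands in for the predicted parity of the block."""
    return parity(faulty) != parity(value)


def single_fault_coverage() -> float:
    activated = caught = 0
    for value in range(256):
        for bit in range(8):
            for stuck_bit in (0, 1):
                faulty = inject_stuck_at(value, 1 << bit, stuck_bit << bit)
                if faulty != value:  # count only faults that change the output
                    activated += 1
                    caught += detected(value, faulty)
    return caught / activated


def multiple_fault_coverage(trials: int = 20_000, seed: int = 0xACE1) -> float:
    gen = lfsr16(seed)
    activated = caught = 0
    for _ in range(trials):
        # fault-free output, faulty-bit mask, and stuck-at values, all from the LFSR
        value, mask, stuck = next_byte(gen), next_byte(gen), next_byte(gen)
        faulty = inject_stuck_at(value, mask, stuck)
        if faulty != value:
            activated += 1
            caught += detected(value, faulty)
    return caught / activated if activated else 0.0


if __name__ == "__main__":
    print(single_fault_coverage(), multiple_fault_coverage())
```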



## ASIC Implementations

- We used the STM 65-nm CMOS standard technology.
- VHDL was used as the design entry for the different fault diagnosis approaches.
- Synopsys Design Compiler was used to specify the constraints and perform the synthesis.



## Performance Comparison on ASIC

| Scheme | Area ($\mu m^2$) | Area overhead | Frequency (MHz) | Frequency overhead | Throughput (Gbps) | Efficiency ($\frac{Mbps}{\mu m^2}$) | Error coverage |
|---|---|---|---|---|---|---|---|
| Redundancy [9], [15] | $52.3 \times 10^{3}$ (GE: $26.1 \times 10^3$) | 100% | 813 | 107% | 6.5 | 0.12 | 100% |
| Parity-based scheme in [14] ($256 \times 9$ LUT) | $29.5 \times 10^{3}$ (GE: $14.7 \times 10^3$) | 13% | 1,620 | 4% | 12.9 | 0.44 | 50% (SubBytes) |
| Parity-based scheme in [11] ($512 \times 9$ LUT) | $57.1 \times 10^{3}$ (GE: $28.5 \times 10^3$) | 119% | 1,470 | 15% | 11.7 | 0.20 | 50% |
| Multiplication approach in [13] (excluding affine) | 876 (GE: 421) | 25% | 532 | 22% | 4.3 | 4.91 | 75% |
| Parity-based scheme in [21] (polynomial basis) | 958 (GE: 461) | 37% | 555 | 17% | 4.4 | 4.63 | 97% |
| Parity-based proposed scheme (mixed bases) | 996 (GE: 479) | 33% | 625 | 16% | 5.0 | 5.02 | 97% |

#### GE: Gate equivalent in terms of 2-input NAND gates.

- [9] R. Karri, K. Wu, P. Mishra, and K. Yongkook, "Fault-based Side-Channel Cryptanalysis Tolerant Rijndael Symmetric Block Cipher Architecture," *Proc. DFT '01*, pp. 418-426, 2001.
- [11] G. Bertoni, L. Breveglieri, I. Koren, P. Maistri, and V. Piuri, "Error Analysis and Detection Procedures for a Hardware Implementation of the Advanced Encryption Standard," *IEEE Trans. Computers*, vol. 52, no. 4, pp. 492-505, 2003.
- [13] M. Karpovsky, K. J. Kulikowski, and A. Taubin, "Differential Fault Analysis Attack Resistant Architectures for the Advanced Encryption Standard," *Proc. CARDIS '04*, vol. 153, pp. 177-192, Aug. 2004.
- [14] K. Wu, R. Karri, G. Kuznetsov, and M. Goessel, "Low Cost Concurrent Error Detection for the Advanced Encryption Standard," *Proc. Int'l Test Conf. (ITC '04)*, pp. 1242-1248, Oct. 2004.
- [15] C. H. Yen and B. F. Wu, "Simple Error Detection Methods for Hardware Implementation of Advanced Encryption Standard," *IEEE Trans. Computers*, vol. 55, no. 6, pp. 720-731, June 2006.
- [21] M. Mozaffari Kermani and A. Reyhani-Masoleh, "A Low-Power High-Performance Concurrent Fault Detection Approach for the Composite Field S-box and Inverse S-box," *IEEE Trans. Computers*, to appear (preprint).
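The throughput and efficiency columns can be cross-checked from frequency and area. This sketch assumes one 8-bit S-box output per clock cycle, which matches the throughput column (e.g., 625 MHz × 8 bits = 5.0 Gbps). Efficiency is then throughput divided by area. The helper names are hypothetical. Some rows of the table may differ in the second decimal, depending on whether the rounded throughput was used.

```python
def throughput_gbps(freq_mhz: float, bits_per_cycle: int = 8) -> float:
    """Throughput assuming bits_per_cycle output bits per clock (assumed 8 for one S-box)."""
    return freq_mhz * bits_per_cycle / 1000.0


def efficiency_mbps_per_um2(freq_mhz: float, area_um2: float, bits_per_cycle: int = 8) -> float:
    """Throughput in Mbps divided by area in um^2."""
    return freq_mhz * bits_per_cycle / area_um2


def mbps_per_um2_to_gbps_per_mm2(eff: float) -> float:
    """1 mm^2 = 1e6 um^2 and 1 Gbps = 1e3 Mbps, so the factor is 1e6 / 1e3 = 1e3."""
    return eff * 1e3


if __name__ == "__main__":
    eff = efficiency_mbps_per_um2(625, 996)  # proposed scheme: 625 MHz, 996 um^2
    print(throughput_gbps(625), round(eff, 2), round(mbps_per_um2_to_gbps_per_mm2(eff)))
```

For the proposed scheme, this reproduces the 5.0 Gbps and 5.02 Mbps/µm² entries. It also gives the 5020 Gbps/mm² figure quoted in the conclusions.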

# Conclusions



- We have presented a lightweight concurrent fault detection scheme for the composite field realization of SubBytes using mixed bases.
- The presented fault detection scheme has low area overhead and small frequency degradation (33% and 16%, respectively). It reaches an efficiency of 5020 Gbps/mm² while maintaining a throughput of 5 Gbps.
- The presented scheme achieves an error coverage close to 100% for the entire SubBytes, making it suitable for secure environments.
- The presented scheme is also applicable to the inverse S-box and to merged S-box/inverse S-box structures.



## Thank you!