A Survey on Flip Flop Replacement to Latch on Various Design

S.P. Vasumathi, D. Murlidharan*
School of Computing, SASTRA Deemed University, Thanjavur
*Corresponding Author

ABSTRACT

This paper presents a survey on replacing flip-flops (FFs) with latches and on the advantages of latch-based sequential design. Flip-flops are sequential elements that make up a major part of a design, but they have disadvantages: they limit performance and increase area. Latches offer an alternative that increases performance and reduces area. Latches are therefore used instead of flip-flops at selected places in the design.

Keywords: Clock Skew, Clock Jitter, Time Borrowing, Flip-Flop, Latch.

1. INTRODUCTION

A flip-flop (FF) is a sequential element that stores one bit of data. It has two stable states and is therefore also called a bistable multivibrator. It is edge triggered, meaning that the output changes only on the positive or the negative edge of the clock. FFs can be built from latches. The common FF types are the SR FF, JK FF, T FF and D FF. Latches are the basic building blocks of sequential circuits and are built from gates. Latches are sensitive to the level of their inputs. The common latch types are the SR latch, T latch and JK latch. A latch is called transparent when data can pass through it and opaque when data cannot. Feedback paths in the latch help retain the stored data. A latch can therefore be used as a storage device for one bit of data [18-29]. Latches are also referred to as asynchronous devices. Latch-based design plays a major role in industry because it effectively improves performance and reduces area. Its major advantages are time borrowing and reduced sensitivity to clock skew and clock jitter.

This paper discusses the advantages gained when an FF is replaced by a latch, how an FF can be replaced by a latch, and where in a circuit such a replacement is possible.

2. POWER and AREA EFFICIENCY of LATCH AND REGISTER

The challenges in edge-triggered FF designs are clock skew, jitter, and process, voltage and temperature (PVT) variations. These reduce the performance of the system and require additional safety margins. In high-level synthesis, FFs are replaced by latches in a way that preserves functionality. The FFs on the critical paths of the design are the ones replaced by latches. In a synchronous design, latches can replace FFs when the design meets the following conditions:

1) Write and read do not take place at the same time.
2) There are three sequential FFs, so that the middle FF can be replaced by a latch.
3) Even after the dead time, the output of the FF must not change for a clock cycle.

2.1) TIME BORROWING:
Latches have the time-borrowing property: in a pipeline, if the first stage finishes its operation in less than the expected time, the remaining time can be passed to the next stage, provided the delays are carefully balanced [30-42]. FFs, in contrast, impose hard pipeline-stage boundaries, so the extra time cannot be passed to the next stage [2].
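The time-borrowing idea can be illustrated with a minimal sketch. It assumes an idealized model that is not taken from the surveyed papers: stage boundary k closes at k × period, a latch is transparent for `window` time units before that point, and setup time, clock-to-Q delay, skew and jitter are ignored. Setting `window` to 0 models an edge-triggered FF. The function name and all numbers are hypothetical.

```python
def pipeline_meets_timing(stage_delays, period, window):
    """Return True if every stage result arrives before its boundary closes.

    Assumed model: the element at boundary k closes at k * period and is
    transparent during the `window` time units before that instant.
    window == 0 behaves like an edge-triggered flip-flop.
    """
    departure = 0.0  # time at which data is launched into the current stage
    for k, delay in enumerate(stage_delays, start=1):
        arrival = departure + delay
        closes = k * period
        if arrival > closes:
            return False  # data missed the boundary element
        opens = closes - window
        # Early data waits until the element opens; late data passes
        # straight through, so the next stage inherits the unused slack.
        departure = max(arrival, opens)
    return True


# Hypothetical two-stage pipeline: a short stage followed by a long one.
delays = [6, 12]
print(pipeline_meets_timing(delays, period=10, window=0))  # FF boundaries
print(pipeline_meets_timing(delays, period=10, window=5))  # latch boundaries
```

With these hypothetical delays the FF pipeline fails at a period of 10 because the second stage needs 12 time units, while the latch pipeline passes because the second stage starts at time 6 instead of time 10.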

2.2) METHODS:
Motivation for using latches:
The minimum clock period of an FF-based design is given as

$$T_{ff} = t_{critical} + t_{sk} + t_{j} + t_{cq} + t_{SU}$$

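The minimum clock period of a latch-based design is given as

$$T_{l} = t_{critical} + t_{dq} + t_{j}$$

Here $t_{critical}$ is the critical-path delay, $t_{sk}$ the clock skew, $t_{j}$ the clock jitter, $t_{cq}$ the clock-to-Q delay, $t_{SU}$ the setup time and $t_{dq}$ the D-to-Q delay of the latch. Comparing the two equations, the latch-based design has the smaller minimum clock period because the setup time and clock skew terms do not appear.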
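As a worked example of the two clock-period expressions above, the following sketch evaluates both with hypothetical delay values in picoseconds. The numbers are illustrative assumptions, not measurements from the surveyed papers.

```python
def min_period_ff(t_critical, t_sk, t_j, t_cq, t_su):
    """Minimum clock period of the FF-based design (T_ff)."""
    return t_critical + t_sk + t_j + t_cq + t_su


def min_period_latch(t_critical, t_dq, t_j):
    """Minimum clock period of the latch-based design (T_l)."""
    return t_critical + t_dq + t_j


# Hypothetical delays in picoseconds (assumptions for illustration only).
T_CRITICAL, T_SK, T_J, T_CQ, T_SU, T_DQ = 1000, 50, 20, 60, 40, 70

t_ff = min_period_ff(T_CRITICAL, T_SK, T_J, T_CQ, T_SU)
t_l = min_period_latch(T_CRITICAL, T_DQ, T_J)
print(f"T_ff = {t_ff} ps, T_l = {t_l} ps, saving = {t_ff - t_l} ps")
```

For these assumed values the FF-based period is 1170 ps and the latch-based period is 1090 ps; the saving depends entirely on how the skew, setup and clock-to-Q terms compare with the latch D-to-Q delay.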

![Figure 1](image1.png)

**Figure 1** (A) Actual path (B) Modified path.

The FFs are replaced by latches in the netlist of the device. A netlist gives the connection information for nets and instances, together with some attributes.

Each instance entry contains the instance name, the nets it connects to, and the library cell to which it is mapped. The netlist is generated from the RTL by the design compiler. The netlist is then modified by remapping FF instances to latch library cells.

```verilog
// (A) Original flip-flop instance
v00ftn03nn0c0 \xprgr1/xpr_xph_busnum_reg[0]
(.d(xpr_xpi_maddr1[24]), .clk(mb_clk_xp_g),
.rn(mb_mrst_b), .o(xpr_xph_busnum[0]) );
```

```verilog
// (B) Same instance remapped to a latch cell
ah01cn90l0d0 \xprgr1/xpr_xph_busnum_reg[0]
(.d(xpr_xpi_maddr1[24]), .clkb(mb_clk_xp_g),
.o(xpr_xph_busnum[0]) );
```

**Figure 3** Example for replacement of the FF to latches in the netlist library cells.
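
The netlist edit shown in the figure can be sketched as a small text transformation. This is only an illustration under stated assumptions: the cell names and the pin changes (clock pin `clk` renamed to `clkb`, reset pin `rn` dropped) are inferred from the single example above, the function name `ff_to_latch` is hypothetical, and a real flow would use the synthesis tool's own cell-swap commands and library pin maps.

```python
import re

# Cell names copied from the example instance; the mapping is an assumption.
FF_CELL = "v00ftn03nn0c0"
LATCH_CELL = "ah01cn90l0d0"


def ff_to_latch(instance: str) -> str:
    """Rewrite one FF instance statement as a latch instance statement."""
    if not instance.lstrip().startswith(FF_CELL):
        return instance  # not the known FF cell: leave the line untouched
    out = instance.replace(FF_CELL, LATCH_CELL, 1)
    # Rename the clock pin (assumed pin map: clk -> clkb).
    out = re.sub(r"\.clk\s*\(", ".clkb(", out)
    # Drop the reset pin, which the latch cell in the example does not have.
    out = re.sub(r"\.rn\s*\([^()]*\)\s*,\s*", "", out)
    return out


ff_instance = (
    r"v00ftn03nn0c0 \xprgr1/xpr_xph_busnum_reg[0] "
    r"(.d(xpr_xpi_maddr1[24]), .clk(mb_clk_xp_g), "
    r".rn(mb_mrst_b), .o(xpr_xph_busnum[0]) );"
)
print(ff_to_latch(ff_instance))
```

A text-level rewrite like this does not check functionality; the functionality check mentioned in the experiment results is still required after the swap.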

2.3) **ADVANTAGES:**
1) When FFs are replaced by latches, performance increases by 17% [2, 17].
2) Power consumption is reduced when FFs are replaced by latches [8].
3) Area is reduced because latches are simpler than FFs [8].
4) Latches are less affected by clock skew and setup time, and they allow time borrowing and slack passing [2].

2.4) **DISADVANTAGES:**
1) The latch cannot perform the read and write operation at the same time.

2.5) **EXPERIMENT RESULTS:**
After the functionality check, power improved by 5.78%, area by 2.75% and overall timing by 25%. The experimental results show a substantial improvement when FFs are replaced by latches [1].

3) **ANALYSING TIMING OPTIMIZATION**

3.1) **SKEW:**
Skew is caused by the clock tree: different sections of the device receive the clock at different times. This difference in clock arrival between sections is the clock skew.

3.2) **JITTER:**
The clock received at a stage is expected to match the source clock exactly, but its edges may deviate from their expected positions. This deviation is the clock jitter [4].

With the effect of clock skew and jitter minimized, fixed-phase retiming is carried out to optimize timing. Clock skew and jitter play a major role in high-end microprocessors and high-speed circuits [5, 6]. Latches tolerate clock skew and jitter and also allow time borrowing and the passing of extra time to later stages.

3.3) **TIMING CONSTRAINT:**
Two timing constraints must be taken care of when an FF is replaced by latches.

Races through short paths are avoided by the hold-time constraint:

\[ D_{CL,\text{MIN}} = T_{SK} + T_{H} - T_{CQ} \]

The worst-case constraint checks whether the circuit still meets the original clock cycle:

\[ D_{CL,\text{MAX}} = T_{CYCLE} - T_{SETUP} - T_{SK} - T_{SJ} - T_{CQ} \]

If latches are placed outside the needed space, the clock skew and jitter terms can be cancelled:

\[ D_{CL,\text{MAX}} = T_{CYCLE} - 2 \times D_{TP} \]
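
The first two constraints can be evaluated numerically with a short sketch. It treats the hold-time expression as a lower bound and the worst-case expression as an upper bound on the combinational delay between sequential elements, which is an interpretation rather than a statement from the surveyed paper. The parameter `t_j` stands for the jitter term written as TSJ above, and all values are hypothetical.

```python
def comb_delay_bounds(t_cycle, t_setup, t_sk, t_j, t_cq, t_h):
    """Return (lower, upper) bounds on the combinational logic delay.

    lower: hold-time (short-path race) bound,  T_SK + T_H - T_CQ
    upper: worst-case (cycle-time) bound,
           T_CYCLE - T_SETUP - T_SK - T_J - T_CQ
    """
    lower = t_sk + t_h - t_cq
    upper = t_cycle - t_setup - t_sk - t_j - t_cq
    return lower, upper


def path_is_legal(delay, bounds):
    """Check one combinational path delay against both bounds."""
    lower, upper = bounds
    return lower <= delay <= upper


# Hypothetical values in picoseconds (assumptions for illustration only).
bounds = comb_delay_bounds(t_cycle=1000, t_setup=40, t_sk=50,
                           t_j=20, t_cq=60, t_h=30)
print(bounds)
print(path_is_legal(10, bounds), path_is_legal(500, bounds))
```

A path shorter than the lower bound risks a hold-time race, and a path longer than the upper bound cannot meet the original clock cycle.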

3.4) DISADVANTAGES:

Formal verification support is lacking: an FF-based design is easy to verify formally, whereas formal verification of a latch-based design is difficult [7, 8, 9].

Fewer optimization techniques are available for latches; only techniques such as buffer insertion and gate resizing apply.

The implemented method adds scan test support to the latches, so that the scan test patterns of the original FF-based circuit can still be used after the replacement by latches [10, 11].

3.5) RULES:

An FF of the original circuit is chosen such that it is free of clock skew and jitter, and the chosen FFs are then replaced by latches.

Case one: only a single latch may move across a logic node.

Case two: a pair of latches is fixed in the place of each FF.

3.6) RETIMING:

Retiming is an optimization algorithm that moves sequential elements across combinational elements without changing the operation of the circuit.

The retiming equation describes an operation that does not change the function of the circuit.

\[ A_r(e) = a(e) + R(x) - R(y) \] \hspace{1cm} (6)

where nodes \( x \) and \( y \) are connected by an edge \( e \), \( a(e) \) is the number of FFs on \( e \) before retiming, \( A_r(e) \) is the number of FFs on \( e \) after retiming, and \( R(x) \) is the number of FFs moved across node \( x \), out of it or into it.

\[ R(x) = -1, 0, \text{ or } 1 \]

If \( a(e) \neq 0 \) then \( A_r(e) \neq 0 \).

**Fixed phase-forward retiming case** \[ R(x) = -1, 0. \]

**Fixed phase-backward retiming case** \[ R(x) = 1, 0. \]
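
The retiming equation (6) can be applied edge by edge, as in the following minimal sketch. The graph, the node names and the retiming values are hypothetical; the code only applies the equation and checks the non-zero condition stated above, and it is not the optimization procedure of the surveyed paper.

```python
def retime(ff_count, r):
    """Apply A_r(e) = a(e) + R(x) - R(y) to every edge e = (x, y).

    ff_count: dict mapping edge (x, y) -> a(e), FFs on the edge before retiming
    r:        dict mapping node -> R(node); missing nodes default to 0
    """
    retimed = {}
    for (x, y), a in ff_count.items():
        a_r = a + r.get(x, 0) - r.get(y, 0)
        if a_r < 0:
            raise ValueError(f"edge {(x, y)} would hold {a_r} FFs")
        retimed[(x, y)] = a_r
    return retimed


def keeps_nonzero_edges(ff_count, retimed):
    """Check the condition: a(e) != 0 implies A_r(e) != 0."""
    return all(retimed[e] != 0 for e, a in ff_count.items() if a != 0)


# Hypothetical three-node loop g1 -> g2 -> g3 -> g1.
before = {("g1", "g2"): 2, ("g2", "g3"): 0, ("g3", "g1"): 1}
after = retime(before, {"g2": 1})
print(after)
print(sum(before.values()) == sum(after.values()))  # FFs on the loop are conserved
print(keeps_nonzero_edges(before, after))
```

Because each node value is added on one edge of a loop and subtracted on the next, the total number of FFs around any loop is unchanged, which is why the circuit function is preserved.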

**3.7) ALGORITHM:**

Timing optimization is carried out on the input circuit, and latches are placed only where there is no clock jitter and skew in the circuit.

Each FF is converted to a pair of latches, one active-low and one active-high.

3.8) EXPERIMENT RESULT:

The fixed-phase retiming developed for timing optimization, which replaces FFs with latches, improved the maximum timing constraint of the circuit by 17.2% compared with the original FF-based design [3].

4) PERFORMANCE OPTIMIZATION in FPGA

The latches considered here are transparent (pulsed) latches driven by a clock with a 50% duty cycle. Selected FFs in the circuit are iteratively replaced with latches to improve the performance of the circuit. Multiple skewed clocks are replaced with a single clock, which also helps improve performance. The impact of minimum delays, that is, short delay paths that can cause hold-time violations, is reduced by using latches in place of FFs.

Unfortunately, the clock network in an FPGA accounts for 19-40% of the dynamic power consumption [13, 14]. Intentional clock skew is needed to improve the circuit speed.

4.1) MINIMUM DELAY:

The minimum delay is the delay of the shortest path between two sequential elements, whereas the longest path between them determines the circuit speed.

4.2) METHODS:

Latches help to avoid the power cost of using multiple clocks and the netlist modification that retiming requires when FFs are replaced by latches.

Consider a logic path from FF j to transparent latch i. The maximum delay may extend beyond the clock period: if a signal launched from j does not settle at the input of i by the next rising edge of the clock, it may still settle after that edge while i is transparent. Both long and short path delays must therefore be considered, because short paths can create hold-time violations.

4.3) HOLD TIME VIOLATION:

The short and long paths are explained with an example in which the dark and dotted lines represent latch L2 and the FFs FF2 and FF3. If the signal launched from FF1 arrives at L2 while L2 is in transparent mode, a hold-time violation occurs. When a short-path signal arrives at L2 while it is transparent, the hold-time margin is reduced.

![Clock Diagram](image)

Figure 7: A pulsed clock can reduce the duty cycle by 33%

4.4) TIMING CONSTRAINTS:

The timing constraints of latches differ from those of FFs, and with multiple latches both long and short paths must be taken into account.

\[ a_{i} = \min_{j \to i} \left[ \max(T_{cq}, a_{j} + T_{d}) + c_{di} \right] \text{ for all } i \] \hspace{1cm} (1)

This equation gives the earliest arrival time \( a_{i} \) at latch \( i \) as a function of the arrival times at the elements \( j \) that drive \( i \) through a combinational path; the latest arrival time \( A_{i} \) is obtained in the same way with a maximum over \( j \to i \). Latch \( i \) is driven by a pulse of width \( W_{i} \), and \( P \) is the clock period:

\[ A_{i} \leq P + W_{i} - T_{su} \] \hspace{1cm} (2)

Combining the long-path arrival time with the pulse width \( W_{i} \) yields

\[ \max_{j \to i} \left[ \max (T_{cq}, A_{j} + T_{d}) + c_{di} \right] \leq P + W_{i} - T_{su} \text{ for all } i \] \hspace{1cm} (3)

The latch hold-time constraint combined with the short-path arrival time yields

\[ \min_{j \to i} \left[ \max (T_{cq}, a_{j} + T_{d}) + c_{di} \right] \geq W_{i} + T_{h} \text{ for all } i \] \hspace{1cm} (4)
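
The constraints above can be checked for a single latch with a minimal sketch. It reads the arrival-time expression as max(T_cq, arrival at j + latch delay) plus the combinational delay of the path into i, uses separate short-path and long-path delays per fan-in, and works in integer picoseconds. The function names, the tuple layout and all numbers are assumptions for illustration.

```python
def arrival_times(fanin, t_cq, t_d):
    """Earliest and latest arrival at latch i.

    fanin: list of (a_j, A_j, d_short, d_long), one tuple per driving
           element j, with its earliest/latest arrival and the short/long
           combinational delay of the path j -> i.
    """
    earliest = min(max(t_cq, a_j + t_d) + d_short
                   for a_j, _, d_short, _ in fanin)
    latest = max(max(t_cq, A_j + t_d) + d_long
                 for _, A_j, _, d_long in fanin)
    return earliest, latest


def meets_setup(latest, period, width, t_su):
    """Long-path constraint: A_i <= P + W_i - T_su."""
    return latest <= period + width - t_su


def meets_hold(earliest, width, t_h):
    """Short-path constraint: a_i >= W_i + T_h."""
    return earliest >= width + t_h


# Hypothetical fan-in of latch i (all times in picoseconds).
fanin = [(0, 50, 100, 200), (20, 40, 60, 300)]
a_i, A_i = arrival_times(fanin, t_cq=30, t_d=20)
print(a_i, A_i)
print(meets_setup(A_i, period=400, width=50, t_su=20),
      meets_hold(a_i, width=50, t_h=10))
print(meets_hold(a_i, width=100, t_h=10))  # a wider pulse breaks hold
```

The last line shows the trade-off that drives the rest of this section: a wider pulse relaxes the long-path constraint but tightens the short-path (hold) constraint.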

4.5) ALGORITHM:

Maximum Cycle Ratio (MCR)

\[ MCR (G) = \max_{c \in C} \left[ \sum_{(u, v) \in c} d(u, v) / |c| \right] \]  
(5)

Every FF can be viewed as a latch driven by a pulse of some width, and allowing a different width for each element would require multiple clocks and extra power. Here a single clock is used, and only one specific pulse width is allowed for the latches and the FFs. Short paths are prevented from limiting the pulse width by leaving the affected elements as FFs.
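
The maximum cycle ratio in (5) can be computed directly for a small graph by enumerating its simple cycles, as in the sketch below. This brute-force enumeration is an assumption made for illustration and grows exponentially with graph size; it is not the method of the surveyed paper, and practical tools use dedicated algorithms such as Howard's algorithm. The graph and its delays are hypothetical.

```python
def max_cycle_ratio(delay):
    """Return max over simple cycles c of (sum of d(u, v) on c) / |c|.

    delay: dict mapping directed edge (u, v) -> d(u, v).
    Node names must be mutually comparable (for example, all strings).
    Returns None if the graph has no cycle.
    """
    successors = {}
    for u, v in delay:
        successors.setdefault(u, []).append(v)
    nodes = sorted({n for edge in delay for n in edge})
    best = None

    def walk(start, node, total, length, visited):
        nonlocal best
        for nxt in successors.get(node, []):
            new_total = total + delay[(node, nxt)]
            if nxt == start:
                ratio = new_total / (length + 1)
                if best is None or ratio > best:
                    best = ratio
            elif nxt not in visited and nxt > start:
                # Only visit nodes larger than the start node so that each
                # cycle is enumerated once, from its smallest node.
                walk(start, nxt, new_total, length + 1, visited | {nxt})

    for s in nodes:
        walk(s, s, 0.0, 0, {s})
    return best


# Hypothetical graph with two loops: a<->b (delays 4 + 2) and b<->c (6 + 6).
graph = {("a", "b"): 4, ("b", "a"): 2, ("b", "c"): 6, ("c", "b"): 6}
print(max_cycle_ratio(graph))
```

In this example the loop through b and c has the larger average delay per sequential element, so it sets the achievable clock period.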

4.6) RULES:

Step 1: The best-case clock period \( P_{\text{init}} \) and its pulse widths \( W_{\text{init}} \) are calculated.

Step 2: \( W_{\text{init}} \) is used in conjunction with eq 11 to choose the sequential elements that are changed to latches.

Step 3: Some FFs remain FFs, and the latches use the same pulse width \( W_{\text{final}} \).

Step 4: Starting from the largest value, \( W_{\text{final}} \) is scaled back to the point of the shortest-path delay violation.

Step 5: Every sequential element is assigned to be either an FF or a latch, and \( P \) and \( W \) are re-solved for this assignment.

\[
\begin{array}{l}
\text{Input: } G(V, E),\ E_{\min} \\
\text{Output: } P_{\text{final}},\ W_{\text{final}},\ \omega_{\text{final}} \\
P_{\text{init}},\ \omega_{\text{init}} \leftarrow \text{forward}(G) \\
\text{sorted } E \leftarrow E \text{ in ascending order of short-path delay in } E_{\min} \\
W_{\text{final}} \leftarrow \max(\omega_{\text{init}}) \\
\text{for } e(u, v) \in \text{sorted } E \text{ do} \\
\quad d_{\min}(u, v) \leftarrow \text{short-path delay of } e(u, v) \text{ from } E_{\min} \\
\quad \text{if } T_{cq} + d_{\min}(u, v) < \omega_{\text{init}}(v) + T_{h} \text{ then} \\
\quad\quad W_{\text{final}} \leftarrow T_{cq} + d_{\min}(u, v) - T_{h} \\
\quad\quad \text{exit for} \\
\quad \text{end if} \\
\text{end for} \\
\text{for } e(u, v) \in E \text{ do} \\
\quad d_{\min}(u, v) \leftarrow \text{short-path delay of } e(u, v) \text{ from } E_{\min} \\
\quad \text{if } T_{cq} + d_{\min}(u, v) < W_{\text{final}} + T_{h} \text{ then} \\
\quad\quad \text{force Flip Flop}(v) \\
\quad \text{else} \\
\quad\quad \text{force Latch}(v) \\
\quad \text{end if} \\
\text{end for} \\
P_{\text{final}},\ \omega_{\text{final}} \leftarrow \text{forward}(G)
\end{array}
\]

The inputs are the graph G (V, E) and Emin, which holds the short-path delay between every pair of connected FFs.

First, Pinit and Winit are calculated, and the edges of E are sorted according to Emin.

Wfinal is initialized to the maximum pulse width in Winit. The hold-time constraint is then checked; if it is not satisfied, the pulse width is too large, and Wfinal is reduced to the largest width that removes the violation. Next, every edge is checked again: if its minimum delay is too small for Wfinal, the receiving element is forced to be an FF so that the hold-time violation does not occur again; otherwise it becomes a latch. Finally, Pfinal and Wfinal are recalculated for the newly created structure.
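
The selection steps described above can be sketched in a few lines. The sketch covers only the pulse-width scaling and the FF or latch assignment; the `forward (G)` computation of the clock period and pulse widths is left out, and the initial pulse widths are passed in as a plain dictionary. The rule that an element stays an FF once any incoming short path forces it is an added assumption, and the function name and example values are hypothetical.

```python
def choose_elements(edges, w_init, t_cq, t_h):
    """Pick a common pulse width and decide FF or latch per element.

    edges:  list of (u, v, d_min), short-path delay from element u to v
    w_init: dict mapping element v -> initial pulse width
    Returns (w_final, kind) where kind maps each receiving element to
    "ff" or "latch".
    """
    # Start from the widest initial pulse, then scale back at the first
    # hold violation found when scanning edges by increasing short-path delay.
    w_final = max(w_init.values())
    for u, v, d_min in sorted(edges, key=lambda e: e[2]):
        if t_cq + d_min < w_init[v] + t_h:
            w_final = t_cq + d_min - t_h
            break

    kind = {}
    for u, v, d_min in edges:
        if t_cq + d_min < w_final + t_h:
            kind[v] = "ff"  # short path too fast for the chosen pulse width
        elif kind.get(v) != "ff":  # assumption: a forced FF stays an FF
            kind[v] = "latch"
    return w_final, kind


# Hypothetical example (arbitrary time units).
edges = [("a", "b", 1.0), ("b", "c", 2.0)]
w_final, kind = choose_elements(edges, {"b": 0.5, "c": 4.0}, t_cq=0.5, t_h=0.5)
print(w_final, kind)
```

In the example, the edge into c violates hold against its initial width, so the common width is scaled back to 2.0; the faster edge into b then cannot tolerate that width, and b is kept as an FF while c becomes a latch.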

4.7) EXPERIMENT RESULTS:

Figure 8 shows the benefit of the pulsed-latch optimization compared with using clock slacks, with the optimized result normalized to 1.

Figure 9 shows that the minimum delay (the short path) is obtained by 70% and that the performance is increased by 5%.

Thus the performance optimization normally obtained with skewed clocks can be achieved with latches instead, applied at the place-and-route stage, with no area or power consumption drawback [12].

5) TLSSD OPTIMIZED DESIGN

Muxed-D scan cells are the most widely used sequential elements because they simplify timing analysis and have fewer race conditions. Latches can replace flip-flops, which helps with design challenges such as process, voltage and temperature (PVT) variation, low power and high performance.

5.1) OPERATION:

Figure 10 shows the LSSD block diagram, where * separates the input pins from the test inputs. Figure 10 C) shows the interconnection of the single-latch LSSD cell.

Figure 10 A) Level Sensitive Scan Design cell, B) Truth table, C) Interconnection of LSSD

In functional mode, when C is high the data is passed from D to latch L1. The test mode has two steps. In the first step, the test data is registered in latch 1, for which A must be high. In the second step, the test data is copied from latch 1 to latch 2, for which B must be high.

The LSSD schematic has 2 three-input NANDs, 2 inverters and 8 two-input NANDs, for a total of forty-eight transistors. The NANDs are cross-coupled, which requires careful transistor sizing. TLSSD is the optimized LSSD; it is built from tristate inverters and inverters and is therefore called Tristate LSSD. It uses six transistors for a tristate inverter, two transistors for each inverter, and four transistors for a tristate inverter. Hence the total number of transistors is reduced from 48 to 38. The TLSSD has a simpler electrical characterization, with simplified transistor transitions and easy measurement of the setup and hold times. Cross-coupled NANDs are avoided in the design of the TLSSD scan cell.
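
The forty-eight transistor figure for the LSSD cell can be checked with a short calculation. It assumes standard static CMOS gates, where an n-input NAND uses 2n transistors and an inverter uses 2; this assumption is not stated in the surveyed paper.

```python
# Transistors per gate in static CMOS (assumption: 2 per input for a NAND).
TRANSISTORS = {"nand3": 6, "nand2": 4, "inv": 2}

# Gate counts of the LSSD schematic as listed in the text.
LSSD_GATES = {"nand3": 2, "inv": 2, "nand2": 8}


def transistor_count(gates):
    """Total transistor count for a dict of gate type -> number of gates."""
    return sum(TRANSISTORS[name] * count for name, count in gates.items())


print(transistor_count(LSSD_GATES))
```

The result is 2 × 6 + 2 × 2 + 8 × 4 = 48, which matches the total given for the LSSD cell.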

**5.2) METHODS:**

The TLSSD was designed in SPICE using the FreePDK45 technology and the NanGate library.

![Tristate Level Sensitive Scan Design](image-url)

The Power Delay Product (PDP) is defined as the product of the leakage power and the delay of the cell.

\[ \text{Rel}_{\text{PDP}} = \left( \frac{t_{\text{LSSD}} \cdot P_{\text{LSSD}}}{t_{\text{TLSSD}} \cdot P_{\text{TLSSD}}} - 1 \right) \times 100 \]

\[ \text{Abs}_{\text{PDP}} = (t_{\text{LSSD}} \cdot P_{\text{LSSD}}) - (t_{\text{TLSSD}} \cdot P_{\text{TLSSD}}) \]

The Energy Per Transition (EPT) relates the energy to the propagation delay. Figure 13 shows the energy per transition difference (EPTd) between the EPT of the TLSSD and the EPT of the LSSD.

\[ \text{Rel}_{EPTd} = \left( \frac{t_{LSSD} \times E_{LSSD}}{t_{TLSSD} \times E_{TLSSD}} - 1 \right) \times 100 \]  \hspace{1cm} (13)

\[ \text{Abs}_{EPTd} = \left( t_{LSSD} \times E_{LSSD} - t_{TLSSD} \times E_{TLSSD} \right) \]  \hspace{1cm} (14)
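
The relative and absolute figures of merit above share one form, so a single pair of helper functions covers both the PDP and the EPTd comparisons. The sketch below uses hypothetical delay and power values; it does not reproduce any measurement from the surveyed paper.

```python
def relative_gain(t_lssd, x_lssd, t_tlssd, x_tlssd):
    """Relative improvement in percent: (t_L * x_L / (t_T * x_T) - 1) * 100.

    x is the leakage power for the PDP and the energy for the EPTd.
    """
    return ((t_lssd * x_lssd) / (t_tlssd * x_tlssd) - 1) * 100


def absolute_gain(t_lssd, x_lssd, t_tlssd, x_tlssd):
    """Absolute improvement: t_L * x_L - t_T * x_T."""
    return t_lssd * x_lssd - t_tlssd * x_tlssd


# Hypothetical values (arbitrary units) for illustration only.
print(relative_gain(2.0, 3.0, 1.0, 4.0))
print(absolute_gain(2.0, 3.0, 1.0, 4.0))
```

A positive value in either measure means that the TLSSD cell has the lower product and is therefore the better cell under that metric.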

6) CONCLUSION:

This paper compared several aspects of flip-flop to latch replacement: time borrowing, maximum and minimum delays, timing constraints, retiming, and the simulation and verification phases. Some flip-flops on critical paths cannot be replaced by latches, and retiming is applied in such places; the rules for replacing flip-flops with latches were also explained. The comparison shows that designs in which flip-flops are replaced by latches achieve higher performance and smaller area.

7) REFERENCES


[15] Ko Yoshikawa, Keisuke Kanamaru, Shigeto Hagihara, Yuichi Nakamura, Takeshi Yoshimura, "Timing Optimization by Replacing Flip-Flops to Latches", 0-780-8175-0/04/$17.00 (C) 2004 IEEE.


electroencephalography feedback. Biomedical Research, 28(13), 5646-5650.


