MULTI-CYCLE PATH AND FALSE PATHS
A multicycle path in a design is a register-to-register path through combinational logic where, once the source register changes, the result needs N clock cycles (N > 1) to propagate to the destination register. The design must ensure that the destination register does not use the value before those N cycles have elapsed. It is good practice for a designer to document these multicycle paths.
The figure shows path P1, which starts at flip-flop U1, goes through gates G1, G3, G5, and G6, and ends at flip-flop U5. The total propagation delay of this path is longer than the clock period of CLK1.
During synthesis, the designer should inform the synthesis tool of any multicycle paths. This lets the tool spend its optimization effort on the other paths that fail their setup requirements instead of trying to speed up a path that is allowed more than one cycle.
To specify this timing exception in STA, use the set_multicycle_path command. It takes the path multiplier N, the -setup and -hold options, and the -from, -to, and -through switches that select the path. For this example, it would look like this:
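# assuming P1 needs two cycles of CLK1
set_multicycle_path 2 -setup -from U1 -to U5
set_multicycle_path 1 -hold -from U1 -to U5
The -hold multiplier of 1 (N-1 in general) moves the hold check back to the launch edge; without it, the hold check follows the relaxed setup edge and becomes needlessly pessimistic.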
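To make the idea concrete, here is a minimal sketch of a design that legitimately contains a two-cycle path. The module and signal names (mcp2_example, en, a_reg, b_reg, p_out) are hypothetical and not taken from the figure. An enable that toggles every cycle makes the source registers launch and the destination register capture only on every second clock edge, so the multiplier between them has two clock periods to settle. This is the kind of structure for which a setup multiplier of 2 with a hold multiplier of 1 is appropriate.

```verilog
// Hypothetical example of a legitimate 2-cycle path (names are illustrative).
module mcp2_example (
  input  wire        clk,
  input  wire        rst,      // synchronous, active-high reset
  input  wire [15:0] a_in,
  input  wire [15:0] b_in,
  output reg  [31:0] p_out
);
  reg [15:0] a_reg, b_reg;
  reg        en;               // toggles every cycle: high on every second edge

  always @(posedge clk) begin
    if (rst)
      en <= 1'b0;
    else
      en <= ~en;
  end

  // Source registers load only when en is high ...
  always @(posedge clk) begin
    if (en) begin
      a_reg <= a_in;
      b_reg <= b_in;
    end
  end

  // ... and the destination captures only when en is high again, two edges
  // after the sources changed, so the multiply has two periods to settle.
  always @(posedge clk) begin
    if (en)
      p_out <= a_reg * b_reg;  // operands widen to 32 bits from the LHS context
  end
endmodule
```

The enable is what makes the exception safe: without it, the destination would sample every cycle and declaring the path multicycle would hide a real failure.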
FALSE PATH
A false path is a structural connection from one point to another that, because of the way the logic is designed, can never be exercised and therefore never determines the timing. For example, a small piece of a design might look like the one in the figure.
When both selects are 0, there is a path from FF1 to FF2 through both multiplexers. Because the two selects can never be 0 at the same time (for example, because they are one-hot signals), this circuit topology prevents the path from ever being exercised. As a result, the path does not need to be optimized to meet single-cycle timing from the first flip-flop to the second. The path is false because it can never occur. Even so, an STA tool, which does not analyze the logic function, still reports it as a path, and if its delay misses the target, flags it as a failing path. Placing a false-path constraint on this path allows the synthesis tool to skip optimizing it for speed, producing a smaller, lower-power implementation.
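The figure is not reproduced here, so the following is a hypothetical reconstruction under stated assumptions: two cascaded 2:1 multiplexers between FF1 and FF2, with FF1 entering mux A on its 0 input, mux A entering mux B on its 0 input, and the selects sel_a and sel_b assumed to be one-hot. All port names are illustrative.

```verilog
// Hypothetical reconstruction of the false-path example (names are illustrative).
module false_path_example (
  input  wire clk,
  input  wire d1,      // data into FF1
  input  wire x_a,     // other data input of mux A
  input  wire x_b,     // other data input of mux B
  input  wire sel_a,   // {sel_a, sel_b} is assumed one-hot:
  input  wire sel_b,   // exactly one of them is 1 at any time
  output reg  ff2
);
  reg  ff1;
  wire mux_a = sel_a ? x_a : ff1;    // FF1 enters on the 0 input of mux A
  wire mux_b = sel_b ? x_b : mux_a;  // mux A enters on the 0 input of mux B

  always @(posedge clk) begin
    ff1 <= d1;
    ff2 <= mux_b;  // FF1 reaches FF2 only if sel_a == 0 and sel_b == 0
  end
endmodule
```

Because STA sees the wires from ff1 through mux_a and mux_b, it times that path unless it is declared false, for example with the SDC set_false_path command using -through to pick out the relevant mux pins.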
CLOCK BUFFER
I read an article about clock buffers. Clock buffers are designed to have equal rise and fall times. For designs with global signals, use global clock buffers to take advantage of the low skew and high drive strength of the target device's dedicated global buffer tree. Your synthesis tool automatically inserts a clock buffer whenever an input signal drives a clock signal or whenever an internal clock signal reaches a certain fanout. You can instantiate clock buffers in your design if you want to specify how the clock buffer resources should be allocated.
Some synthesis tools require you to instantiate a global buffer in your code to use the dedicated routing resource if a clock is driven from a non-dedicated I/O pin. The following Verilog example instantiates a BUFG for an internal multiplexed clock circuit.
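module clock_mux
(
  data_in,
  sel_in,
  slow_clk,
  fast_clk,
  data_out
);
  input  data_in, sel_in;
  input  slow_clk, fast_clk;  // names must match the port list above
  output data_out;
  reg    clock;
  wire   clock_gbuff;
  reg    data_out;

  // Combinational clock multiplexer. Changing sel_in while both clocks
  // are running can create a glitch on clock.
  always @ (sel_in or fast_clk or slow_clk)
  begin
    if (sel_in == 1'b1)
      clock = fast_clk;
    else
      clock = slow_clk;
  end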
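  // Drive the multiplexed clock onto the global clock network.
  // BUFG is the Xilinx global clock buffer primitive (ports O and I);
  // other vendors name their global buffer primitives differently.
  BUFG gbuff_for_mux
  (
    .O(clock_gbuff),
    .I(clock)
  );

  // Register clocked by the buffered clock
  always @ (posedge clock_gbuff)
    data_out <= data_in;
endmodule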
An application note on this topic is available from the Actel website.
CLOCK TREE SYNTHESIS
Nowadays, designing clock-distribution networks for high-speed chips involves more than just meeting timing specifications. Meeting clock-latency and clock-skew targets is difficult when clock signals of 300 MHz or more traverse the chip. Because the clock network is one of the most power-hungry nets on a chip, you also need to design with power dissipation in mind.
The basic task of CTS is to build the interconnect that distributes the system clock to all the cells in the chip that use the clock. For CTS, your major concerns are:
- Minimizing clock skew
- Optimizing clock buffers to meet skew specifications
- Minimizing clock-tree power dissipation
The primary job of CTS tools is to vary the routing paths and the placement of clocked cells and clock buffers to meet the maximum-skew specification.
For a balanced tree without buffers (before CTS), the clock line's capacitance increases exponentially as you move from the clocked element to the primary clock input. The extra capacitance results from the wider metal needed to carry current to the branching segments. The extra metal also results in additional chip area to accommodate the extra clock-line width. Adding buffers at the branching points of the tree significantly lowers clock-interconnect capacitance, because you can reduce clock-line width toward the root.
When designing a clock tree, you need to consider performance specifications that are timing-related. Clock-tree timing specifications include clock latency, skew, and jitter. Non-timing specifications include power dissipation and signal integrity. Many clock-design issues affect multiple performance parameters; for example, adding clock buffers to balance clock lines and decrease skew may result in additional clock-tree power dissipation.
The biggest problem we face in designing clock trees is skew minimization. The factors that contribute to clock skew include loading mismatch at the clocked elements and mismatch in RC delay along the clock branches.
Clock skew adds to the cycle time, reducing the clock rate at which a chip can operate. Typically, skew should be 10% or less of a chip's clock cycle, meaning that for a 100-MHz clock (10-nsec period), skew must be 1 nsec or less. High-performance designs may require skew to be 5% of the clock cycle.
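A worked version of the skew budget follows, together with the usual single-cycle setup inequality that shows why skew eats into the cycle time. The setup inequality and its symbols t_cq (clock-to-Q delay), t_logic (combinational delay), and t_setup are introduced here for illustration, written assuming the capturing flip-flop's clock arrives t_skew earlier than the launching flip-flop's clock:

```latex
\begin{aligned}
T_{clk} &= \frac{1}{f_{clk}} = \frac{1}{100\ \text{MHz}} = 10\ \text{ns} \\
t_{skew} &\le 0.10\, T_{clk} = 1\ \text{ns} \quad \text{(typical budget)} \\
t_{skew} &\le 0.05\, T_{clk} = 0.5\ \text{ns} \quad \text{(high-performance budget)} \\
T_{clk} &\ge t_{cq} + t_{logic} + t_{setup} + t_{skew}
\end{aligned}
```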
Clock design methodology
Many chip companies have comprehensive clock-network design strategies that they use on their customers' chips. Motorola uses the Clock Generator tool along with Cadence place-and-route tools. This tool combination produces a tree with minimum insertion delay, a minimum number of buffers, and maximum fan-out. Typical skew is less than 300 psec. After generation of the clock tree, the output from the place-and-route tool is flat, meaning that the design hierarchy is lost.
Effect of CTS
- Many clock buffers are added
- Congestion may increase
- Non-clock-tree cells may be moved to non-ideal locations
- New timing violations can be introduced
Glossary
Balanced clock tree: A clock tree in which the delays from the root to all of the leaves are nearly the same.
Clock distribution: The main task of clock distribution is to distribute the clock signal across the chip while minimizing clock skew.
Clock buffer: A buffer designed to keep the rise and fall delays of the clock signal equal.
Global skew: The maximum difference in clock arrival time between any two FFs in the design within the same clock domain.
Local skew: Skew measured and balanced only between related FF pairs. Two FFs are related only when one FF launches data that is captured by the other.
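The two skew definitions can be written compactly. The notation is introduced here for illustration: t_i is the clock arrival time at flip-flop i, D is the set of flip-flops in one clock domain, and R is the set of related (launch/capture) flip-flop pairs:

```latex
\begin{aligned}
\text{global skew} &= \max_{i,\,j \in D} \lvert t_i - t_j \rvert = \max_{i \in D} t_i - \min_{i \in D} t_i \\
\text{local skew} &= \max_{(i,\,j) \in R} \lvert t_i - t_j \rvert
\end{aligned}
```

Since R contains only some of the pairs in D, local skew can never exceed global skew, which is why balancing only related pairs is the easier target.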
GUIDELINES FOR IMPROVING PERFORMANCE OF SYNTHESIS
Following are some important guidelines for improving the performance of synthesized logic and producing a clean design.
Clock and reset logic: The clock and reset generation logic for the modules should be kept in one module, which is synthesized only once and then marked do-not-touch. This keeps the clock constraint specification clean. Another advantage is that the modules using these clocks and resets can be constrained with an ideal clock specification.
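A minimal sketch of such a dedicated clock and reset module, assuming a single input clock and an active-low asynchronous external reset (the module and port names are hypothetical). It passes the clock through unchanged and synchronizes reset release with a two-flop synchronizer, so every functional module receives a reset that asserts asynchronously and deasserts synchronously to its clock. The reset synchronizer is a common choice, not something the guideline itself prescribes.

```verilog
// Hypothetical clock/reset module: all clock and reset generation lives here.
module clk_rst_gen (
  input  wire clk_in,     // from the clock pin or PLL
  input  wire rst_n_in,   // asynchronous, active-low external reset
  output wire clk_out,    // clock delivered to the functional modules
  output wire rst_n_out   // asserts asynchronously, releases synchronously
);
  reg [1:0] rst_sync;

  // No clock manipulation in this sketch; dividers or gating would go here.
  assign clk_out = clk_in;

  // Two-stage reset synchronizer: a 1 shifts in after reset is released.
  always @(posedge clk_in or negedge rst_n_in) begin
    if (!rst_n_in)
      rst_sync <= 2'b00;
    else
      rst_sync <= {rst_sync[0], 1'b1};
  end

  assign rst_n_out = rst_sync[1];
endmodule
```

Keeping this logic in one module means only clk_out and rst_n_out cross into the rest of the design, so the functional modules can be constrained against an ideal clock.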
No glue logic at the top: The top module should be used only for connecting various components (modules) together. It should not contain any glue logic.
Module name: The module name should be the same as the file name, and one should avoid describing more than one module or entity in a single file. This avoids confusion while compiling the files and during synthesis.
FSM Coding
- While coding FSMs, describe the state names using enumerated types.
- Put the combinational logic that computes the next state in its own process (always block in Verilog), separate from the state registers.
- Implement the next-state combinational logic with a case statement. This helps the tool optimize the logic and results in a clean design.
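The three FSM guidelines can be illustrated with a small sketch: a hypothetical Moore machine (ones_detector) that raises found after two consecutive 1s on din. Plain Verilog has no enumerated type, so localparam constants stand in for the enumerated state names (SystemVerilog's typedef enum or a VHDL enumerated type would be used in those languages). The state register and the case-based next-state logic live in separate always blocks.

```verilog
// Hypothetical FSM: raises 'found' after two consecutive 1s on 'din'.
module ones_detector (
  input  wire clk,
  input  wire rst,    // synchronous, active-high
  input  wire din,
  output wire found
);
  // Named state constants in place of an enumerated type
  localparam [1:0] S_IDLE = 2'd0,  // last bit was 0, or just reset
                   S_ONE  = 2'd1,  // seen one 1
                   S_TWO  = 2'd2;  // seen two or more 1s in a row

  reg [1:0] state, next_state;

  // State register: sequential logic only
  always @(posedge clk) begin
    if (rst)
      state <= S_IDLE;
    else
      state <= next_state;
  end

  // Next-state logic: combinational, in its own block, written as a case
  always @(*) begin
    next_state = S_IDLE;  // default assignment avoids inferred latches
    case (state)
      S_IDLE:  next_state = din ? S_ONE : S_IDLE;
      S_ONE:   next_state = din ? S_TWO : S_IDLE;
      S_TWO:   next_state = din ? S_TWO : S_IDLE;
      default: next_state = S_IDLE;
    endcase
  end

  // Moore output decoded from the current state
  assign found = (state == S_TWO);
endmodule
```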
Tri-state buffers: A tri-state buffer is inferred when a high impedance (Z) is assigned to an output. Tri-state logic is generally not recommended because it reduces testability and is difficult to optimize, since it cannot be buffered.
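A minimal sketch of how assigning Z infers a tri-state buffer, using hypothetical names. Where tri-states are used at all, they are usually confined to top-level I/O pads; for an internal shared bus, a multiplexer controlled by the same enable is the usual alternative.

```verilog
// Hypothetical example: assigning 'z' to an output infers a tri-state buffer.
module tristate_example (
  input  wire       oe,    // output enable
  input  wire [7:0] data,
  output wire [7:0] bus
);
  // When oe is 0 the output floats (all bits high impedance).
  assign bus = oe ? data : 8'bz;
endmodule
```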
From: http://www.asic-planet.com/Synthesis.html