Wednesday, November 26, 2008

MULTI-CYCLE PATH AND FALSE PATHS

A multi-cycle path in a design is a register-to-register path through some combinational logic where, if the source register changes, the result needs N clock cycles (where N > 1) to propagate to the destination register. It is good practice for a designer to document these multi-cycle paths.




The figure shows path P1, which starts at flip-flop U1, goes through gates G1, G3, G5, and G6, and ends at flip-flop U5. This path has a total propagation delay longer than the clock period of CLK1.

In synthesis, the designer is encouraged to inform the synthesis tool of any multi-cycle paths. This lets the tool concentrate on optimizing the other logic paths that are not meeting their setup requirements rather than attempting to optimize the multi-cycle path.

To specify this timing exception in STA, use the set_multicycle_path command, which takes a path multiplier along with the -from, -to, and -through switches. For this example, assuming path P1 is allowed two clock cycles, it would look like this:


set_multicycle_path 2 -from U1 -to U5
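
To make the idea concrete, here is a minimal Verilog sketch of logic that is a genuine two-cycle path. It is not the circuit from the figure: the module, the ports, and the phase signal are all hypothetical names chosen for illustration. The source and destination registers are enabled by the same signal, which is high only every second clock, so the multiplier result always has two clock periods to settle.

```verilog
// Hypothetical two-cycle path; none of these names come from the figure.
module multicycle_example (
    input  wire        clk,
    input  wire        rst_n,
    input  wire [15:0] a,
    input  wire [15:0] b,
    output reg  [31:0] product
);
    reg        phase;         // toggles every clock, high on every 2nd edge
    reg [15:0] a_reg, b_reg;  // source registers (play the role of U1)

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            phase   <= 1'b0;
            a_reg   <= 16'd0;
            b_reg   <= 16'd0;
            product <= 32'd0;
        end else begin
            phase <= ~phase;
            if (phase) begin
                // Launch and capture share one enable, so the value
                // captured here was launched two clock edges earlier.
                a_reg   <= a;
                b_reg   <= b;
                product <= a_reg * b_reg;  // destination (role of U5)
            end
        end
    end
endmodule
```

In a design like this, the multi-cycle constraint would name the a_reg and b_reg registers as the start points and product as the end point. The enable logic is what makes the exception safe: without it, the destination could capture a value that has not yet settled.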


FALSE PATH

In a false path, there is a logical connection from one point to another, but because of the way the logic is designed, this path can never control the timing. For example, a small piece of a design might look like the one in the figure.


There is a path from FF1 to FF2 that passes through both multiplexers, but only when the select of each multiplexer is 0. Because both selects can never be 0 at the same time (perhaps they are one-hot signals), the circuit topology prevents the path from ever being exercised. As a result, this path doesn't need to be optimized to meet the clock cycle timing from the first to the second flip-flop. The path is false because it can never occur. Even though it is false, an STA tool would still report it as a path, and if its delay misses the target, the tool would flag it as failing. Placing a false-path constraint on this path lets the synthesis tool forgo optimizing it for speed, which yields a smaller, lower-power implementation.
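
The structure described above can be sketched in Verilog as follows. This is an illustration under assumptions, not the circuit from the figure: the module, port, and signal names are hypothetical, and the one-hot selects are modeled simply as a signal and its complement.

```verilog
// Hypothetical false-path structure; all names are illustrative.
module false_path_example (
    input  wire clk,
    input  wire d,
    input  wire other_a,
    input  wire other_b,
    input  wire sel,
    output reg  ff2
);
    reg ff1;

    // One-hot select pair: exactly one of the two is 1 at any time.
    wire sel_a = sel;
    wire sel_b = ~sel;

    // mux1 passes ff1 only when sel_a is 0.
    wire m1 = sel_a ? other_a : ff1;
    // mux2 passes m1 only when sel_b is 0.
    wire m2 = sel_b ? other_b : m1;

    always @(posedge clk) begin
        ff1 <= d;
        ff2 <= m2;  // ff1 reaches ff2 only if sel_a and sel_b are both 0
    end
endmodule
```

The ff1-to-ff2 path needs both selects at 0, which the one-hot relationship rules out, so it is a candidate for a false-path constraint (in SDC-based tools, the set_false_path command). In a real design the relationship between the selects is usually less obvious to the tool than it is here, which is why the designer has to declare it.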


CLOCK BUFFER

I read an article about clock buffers. Clock buffers are designed to have equal rise and fall times. For designs with global signals, use global clock buffers to take advantage of the low skew and high drive strength of the dedicated global buffer tree of the target device. Your synthesis tool automatically inserts a clock buffer whenever an input signal drives a clock signal or whenever an internal clock signal reaches a certain fanout. You can instantiate the clock buffers in your design if you want to specify how the clock buffer resources should be allocated.

Some synthesis tools require you to instantiate a global buffer in your code to use the dedicated routing resource if a clock is driven from a non-dedicated I/O pin. The following Verilog example instantiates a BUFG for an internal multiplexed clock circuit.


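// Internally multiplexed clock driven onto a global clock buffer.
module clock_mux
(
    data_in,
    sel_in,
    slow_clk,
    fast_clk,
    data_out
);

input  data_in, sel_in;
input  slow_clk, fast_clk;
output data_out;

reg  clock;        // multiplexed clock (combinational)
wire clock_gbuff;  // clock after the global buffer
reg  data_out;

// Select between the fast and slow clocks.
always @ (sel_in or fast_clk or slow_clk)
begin
    if (sel_in == 1'b1)
        clock = fast_clk;
    else
        clock = slow_clk;
end

// Global clock buffer. BUFG here is the Xilinx-style primitive with
// ports I and O; other vendors' libraries use different names.
BUFG gbuff_for_mux
(
    .O(clock_gbuff),
    .I(clock)
);

// Register the data on the buffered clock.
always @ (posedge clock_gbuff)
    data_out <= data_in;

endmodule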


An application note on this topic is available for download from the Actel website.

CLOCK TREE SYNTHESIS

Nowadays, designing clock-distribution networks for high-speed chips is more complex than just meeting timing specifications. Meeting clock latency and clock skew targets is difficult when you have clock signals of 300 MHz or more traversing the chip. Because the clock network is one of the most power-hungry nets on a chip, you also need to design with power dissipation in mind.

The basic job of CTS is to develop the interconnect that connects the system clock to all the cells in the chip that use the clock. For CTS, your major concerns are:
  • Minimizing the clock skew
  • Optimizing clock buffers to meet skew specifications
  • Minimizing clock-tree power dissipation
The primary job of CTS tools is to vary the routing paths and the placement of the clocked cells and clock buffers to meet maximum skew specifications.

For a balanced tree without buffers (before CTS), the clock line's capacitance increases exponentially as you move from the clocked element to the primary clock input. The extra capacitance results from the wider metal needed to carry current to the branching segments. The extra metal also results in additional chip area to accommodate the extra clock-line width. Adding buffers at the branching points of the tree significantly lowers clock-interconnect capacitance, because you can reduce clock-line width toward the root.

When designing a clock tree, you need to consider both timing-related and non-timing performance specifications. Clock-tree timing specifications include clock latency, skew, and jitter. Non-timing specifications include power dissipation and signal integrity. Many clock-design issues affect multiple performance parameters; for example, adding clock buffers to balance clock lines and decrease skew may result in additional clock-tree power dissipation.

The biggest problem we face in designing clock trees is skew minimization. The factors that contribute to clock skew include loading mismatch at the clocked elements and mismatch in RC delay.

Clock skew adds to cycle times, reducing the clock rate at which a chip can operate. Typically, skew should be 10% or less of a chip's clock cycle, meaning that for a 100-MHz clock, skew must be 1 nsec or less. High-performance designs may require skew to be 5% of the clock cycle.
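
The skew budget quoted above follows directly from the clock period. As a short worked calculation (the symbols are introduced here for illustration only):

```latex
T_{\mathrm{clk}} = \frac{1}{f_{\mathrm{clk}}} = \frac{1}{100\ \mathrm{MHz}} = 10\ \mathrm{ns}
\qquad
t_{\mathrm{skew}} \le 0.10 \times T_{\mathrm{clk}} = 1\ \mathrm{ns}
\qquad
t_{\mathrm{skew}} \le 0.05 \times T_{\mathrm{clk}} = 0.5\ \mathrm{ns}
```

So the same 100-MHz clock under the tighter 5% rule leaves a skew budget of only 0.5 nsec.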

Clock design methodology

Many chip companies have comprehensive clock-network-design strategies that they use on their customers' chips. Motorola uses the Clock Generator tool along with Cadence place-and-route tools. This tool combination produces a tree with minimum insertion delay, a minimum number of buffers, and maximum fan-out. Typical skew is less than 300 psec. After generation of the clock tree, the output from the place-and-route tool is flat, meaning that the design hierarchy is lost.

Effect of CTS
  1. Lots of clock buffers are added
  2. Congestion may increase
  3. Non-clock tree cells may have been moved to non-ideal locations
  4. Can introduce new timing violations

Glossary

Balanced clock tree: A clock tree in which the delays from the root to the leaves are almost the same.

Clock distribution: The main task of clock distribution is to distribute the clock signal across the chip while minimizing the clock skew.

Clock buffer: A buffer designed to keep the rise and fall delays of the clock signal equal.

Global skew: The difference in clock arrival time between any two FFs in the design within the same clock domain.

Local skew: The skew between related FF pairs only. FFs are related only when one FF launches data that is captured by the other.


GUIDELINES FOR IMPROVING PERFORMANCE OF SYNTHESIS

The following are some important guidelines for improving the performance of synthesized logic and producing a clean design.

Clock and Reset logic
Clock and reset generation logic should be kept in one module that is synthesized only once and then marked as do-not-touch. This helps keep the clock constraint specifications clean. Another advantage is that the modules using these clocks and resets can be constrained with an ideal clock specification.

No glue logic at the top
The top module should be used only for connecting the various components (modules) together. It should not contain any glue logic.

Module name
The module name should be the same as the file name, and you should avoid describing more than one module or entity in a single file. This avoids confusion while compiling the files and during synthesis.

FSM Coding
  • While coding FSMs, the state names should be described using enumerated types.
  • The combinational logic for computing the next state should be in its own process, separate from the state registers.
  • Implement the next-state combinational logic with a case statement. This lets the tool optimize the logic much better and results in a clean design.
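
The three FSM guidelines above can be sketched together as follows. This is an illustrative example, not code from the source: the module, port, and state names are hypothetical. Plain Verilog has no enumerated type, so named localparam constants stand in for one here; VHDL or SystemVerilog would use a true enumerated type.

```verilog
// Illustrative three-state FSM; all names are hypothetical.
module fsm_example (
    input  wire clk,
    input  wire rst_n,
    input  wire start,
    input  wire done,
    output reg  busy
);
    // Named state encodings, standing in for an enumerated type.
    localparam [1:0] IDLE = 2'd0,
                     RUN  = 2'd1,
                     STOP = 2'd2;

    reg [1:0] state, next_state;

    // State register: the only sequential process.
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            state <= IDLE;
        else
            state <= next_state;
    end

    // Next-state and output logic: a separate combinational process
    // built around a case statement.
    always @(*) begin
        next_state = state;  // defaults keep the logic latch-free
        busy       = 1'b0;
        case (state)
            IDLE: if (start) next_state = RUN;
            RUN: begin
                busy = 1'b1;
                if (done) next_state = STOP;
            end
            STOP:    next_state = IDLE;
            default: next_state = IDLE;
        endcase
    end
endmodule
```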
Multiplexer Inference
A case statement is used for implementing multiplexers. To prevent latch inference in case statements, the default branch should always be specified. An if statement, on the other hand, is used for writing priority encoders: a chain of if statements with multiple branches results in a priority encoder structure.
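
As a minimal sketch of the two coding styles (the module and signal names are hypothetical), the first process below is written to infer a multiplexer and the second a priority encoder:

```verilog
// Illustrative comparison; all names are hypothetical.
module mux_vs_priority (
    input  wire [1:0] sel,
    input  wire [3:0] d,
    input  wire [3:0] req,
    output reg        mux_out,
    output reg  [1:0] grant
);
    // case statement: 4-to-1 multiplexer. The default branch covers
    // every remaining select value, so no latch is inferred.
    always @(*) begin
        case (sel)
            2'b00:   mux_out = d[0];
            2'b01:   mux_out = d[1];
            2'b10:   mux_out = d[2];
            default: mux_out = d[3];
        endcase
    end

    // if / else-if chain: priority encoder, req[3] has top priority.
    always @(*) begin
        if      (req[3]) grant = 2'd3;
        else if (req[2]) grant = 2'd2;
        else if (req[1]) grant = 2'd1;
        else             grant = 2'd0;
    end
endmodule
```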

Tri-state buffers
A tri-state buffer is inferred when a high impedance (Z) is assigned to an output. Tri-state logic is generally not recommended because it reduces testability and is difficult to optimize, since it cannot be buffered.
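
For reference, this is the kind of code that leads to tri-state inference. It is a minimal sketch with hypothetical module and port names:

```verilog
// Illustrative tri-state driver; all names are hypothetical.
module tristate_example (
    input  wire       oe,        // output enable
    input  wire [7:0] data_out,  // value driven onto the bus
    inout  wire [7:0] bus,
    output wire [7:0] data_in    // value read back from the bus
);
    // Assigning Z when oe is low is what infers the tri-state buffers.
    assign bus     = oe ? data_out : 8'bz;
    assign data_in = bus;
endmodule
```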

From: http://www.asic-planet.com/Synthesis.html