Wednesday, February 3, 2010

State Encoding techniques that we have ...?

ONE HOT ENCODING
In one-hot encoding, only one bit of the state vector is asserted for any given state. All other state bits are zero. So if there are n states, then n state flip-flops are required. State decode is simplified, since the state bits themselves can be used directly to indicate whether the machine is in a particular state. No additional logic is required.

* A highly encoded machine may slow dramatically as more states are added.
* With a highly encoded machine, the encoding that suits one design may no longer be best if you add a few states and change some others. One-hot is equally “optimal” for all machines.
* One-hot state machines are typically faster. Speed is independent of the number of
states, and instead depends only on the number of transitions into a particular
state.
* One-hot machines are easy to design. HDL code can be written directly from the
state diagram without coding a state table.
* Modifications are straightforward. Adding and deleting states, or changing
excitation equations, can be implemented easily without affecting the rest of the
machine.
* There is typically not much area penalty over highly encoded machines.
* Critical paths are easy to find using static timing analysis.
* It is easy to debug.
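
The decode claim above can be illustrated with a small behavioral model. This is only a sketch in Python, not HDL; the state names and helper functions are hypothetical and chosen for illustration.

```python
def one_hot_codes(states):
    """Give each state a code with exactly one bit set (one flip-flop per state)."""
    return {name: 1 << i for i, name in enumerate(states)}


def binary_codes(states):
    """Give each state a densely packed binary code, for comparison."""
    return {name: i for i, name in enumerate(states)}


def in_state_one_hot(state_vector, codes, name):
    """One-hot decode is a single bit test; the rest of the vector is ignored."""
    return bool(state_vector & codes[name])


STATES = ["IDLE", "LOAD", "RUN", "DONE"]  # hypothetical state names
one_hot = one_hot_codes(STATES)
binary = binary_codes(STATES)

for name in STATES:
    print(f"{name:5} one-hot={one_hot[name]:04b} binary={binary[name]:02b}")
```

With four states the one-hot vector needs four bits, while the binary code needs two. In exchange, checking for a state in the one-hot vector only looks at one bit.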

BINARY ENCODING

Tuesday, February 2, 2010

Blocking and Non-Blocking

There is very little difference in speed between blocking and non-blocking assignments, and neither causes errors if used correctly.
Blocking and non-blocking are both procedural assignments. Blocking assignments execute sequentially, in the order written. Non-blocking assignments take effect in parallel: every right-hand side is evaluated first, and the targets are updated together afterwards.
If a variable is given a value by a blocking assignment statement, then this new value is used in evaluating all subsequent statements in the block.
 
Synthesized blocking statements:
always @(posedge clk)
begin
  Q1 = D;   // Q1 takes the new value of D immediately.
  Q2 = Q1;  // Q2 sees the updated Q1, so Q2 also gets D.
end
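
The difference between the two assignment styles at one clock edge can be modeled in a few lines. This is a Python sketch of the semantics only, not synthesizable code; the function names are made up for illustration, and the registers Q1 and Q2 follow the snippet above.

```python
def clock_edge_blocking(regs, d):
    """Model of: Q1 = D; Q2 = Q1;  each statement sees the previous result."""
    regs = dict(regs)
    regs["Q1"] = d
    regs["Q2"] = regs["Q1"]  # reads the value just written, so Q2 ends up equal to D
    return regs


def clock_edge_nonblocking(regs, d):
    """Model of: Q1 <= D; Q2 <= Q1;  right-hand sides are sampled before any update."""
    new_q1 = d
    new_q2 = regs["Q1"]  # old Q1, so the pair behaves like a two-stage shift register
    return {"Q1": new_q1, "Q2": new_q2}


start = {"Q1": 0, "Q2": 0}
print(clock_edge_blocking(start, 1))     # {'Q1': 1, 'Q2': 1}
print(clock_edge_nonblocking(start, 1))  # {'Q1': 1, 'Q2': 0}
```

In the blocking case both registers capture D on the same edge, while in the non-blocking case D needs two edges to reach Q2.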

Reset

Synchronous Reset:
Synchronous reset logic will synthesize to smaller flip-flops, particularly if the reset is gated with the logic generating the flip-flop input. But in such a case the combinational logic gate count grows, so the overall gate count savings may not be that significant.

The clock works as a filter for small reset glitches. However, if these glitches occur near the active clock edge, the flip-flop could go metastable.
Disadvantages:
The problem with synchronous reset is that the synthesis tool cannot easily distinguish the reset signal from any other data signal.
Designs that are pushing the limit for datapath timing cannot afford the added gates and additional net delays in the datapath due to logic inserted to handle synchronous resets.

Asynchronous Reset:
Using an asynchronous reset, the designer is guaranteed not to have the reset added to the datapath.
The circuit can be reset with or without a clock present.
Disadvantages:
The biggest problem with asynchronous resets is the release, also called reset removal. The release of the reset can occur at any point within a clock period. If it occurs on or near the active clock edge, the flip-flop could go metastable.

Gating the clock:
Clocks are gated to reduce power dissipation. A part of a circuit that functions only on receiving a certain enable signal need not be clocked all the time. In such a case, clock gating is used to prevent the sequential elements from being clocked when the enable signal is not active.

Metastability:
To reduce metastability, designers most commonly use a multiple-stage synchronizer, in which two or more flip-flops are cascaded to form a synchronizing circuit.
Example code:
always @(posedge clk or negedge rst_n)
  if (~rst_n)
  begin  // Clear both synchronizer stages on reset.
    signal_d1 <= 1'b0;
    signal_d2 <= 1'b0;
  end
  else
  begin  // Synchronization of signal.
    signal_d1 <= signal;
    signal_d2 <= signal_d1;
  end

Conclusion:
Every design has its own merits and demerits. There is no such thing as a perfect design.
Synchronous reset needs more gates to implement, and can therefore be slower because of the added combinational delay.
Asynchronous reset doesn't require more gates to implement and can be faster, but it can suffer from metastability problems when the reset is released.


DC Balance

A requirement of high-speed transmission (such as optical fiber links) is that the serial data stream be DC balanced. That is, it must have an equal number of 1's and 0's.
Any long-term DC component in the data stream (a lot more 1's than 0's, or the reverse) creates a bias at the receiver that reduces its ability to distinguish reliably between 1's and 0's.
Generally, NRZ, NRZI or RZ data has no guarantee of DC balance. However, balance can still be achieved by using a few extra bits to code the user data in a balanced code, in which each code word has an equal number of 1's and 0's, and then sending these code words in NRZ format.
The 8B/10B code solves this problem by associating with each 8-bit value to be encoded a pair of unbalanced code words, one 4 out of 10 ("light") and the other 6 out of 10 ("heavy"). The coder keeps track of the running disparity, a single bit of information indicating whether the last unbalanced code word that it transmitted was heavy or light. When it comes time to transmit another unbalanced code word, the coder selects the one of the pair with the opposite weight.
The big advantage of BPRZ over RZ is that it is DC balanced.
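
The running-disparity rule described above can be sketched as follows. This is a minimal Python illustration, assuming every value maps to a light/heavy pair as in the description. The 10-bit patterns in `TABLE` are made up for the example and are not real 8B/10B code words, and the function names are hypothetical.

```python
LIGHT, HEAVY = "light", "heavy"


def encode(values, table, last=HEAVY):
    """Pick, for each value, the code word whose weight is opposite to the last one sent.

    `table` maps a value to a (light_word, heavy_word) pair of bit strings.
    `last` is the running disparity: the weight of the last unbalanced word sent.
    """
    stream = []
    for value in values:
        light_word, heavy_word = table[value]
        if last == HEAVY:
            stream.append(light_word)
            last = LIGHT
        else:
            stream.append(heavy_word)
            last = HEAVY
    return stream


# Hypothetical code word pairs: four 1's (light) and six 1's (heavy) out of 10 bits.
TABLE = {
    0: ("0011000011", "1100111100"),
    1: ("0101000101", "1010111010"),
}

stream = encode([0, 1, 1, 0], TABLE)
bits = "".join(stream)
print(stream)
print("ones:", bits.count("1"), "zeros:", bits.count("0"))
```

Because light and heavy words alternate, the surplus of 1's or 0's never grows beyond one code word's worth, whatever data is sent.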

Ethernet


Ethernet: 
Ethernet refers to the family of local-area network (LAN) protocols covered by IEEE 802.3. The simplest form of Ethernet uses a passive bus operated at 10 Mbps, which is a shared-medium model. Switched Ethernet came into the picture after that.

In the Ethernet standard, there are two modes of operation: half-duplex and full-duplex. In half-duplex mode, data are transmitted using the popular Carrier-Sense Multiple Access/Collision Detection (CSMA/CD) protocol on a shared medium. The main disadvantages of half-duplex mode are its efficiency and distance limitations: the link distance is limited by the minimum MAC frame size. This restriction reduces the efficiency drastically for high-rate transmission. Therefore, the carrier extension technique is used to ensure a minimum frame size of 512 bytes in Gigabit Ethernet, to achieve a reasonable link distance.
Four data rates are currently defined for operation over optical fiber and twisted-pair cables:
  • 10 Mbps - 10Base-T Ethernet (IEEE 802.3)
  • 100 Mbps - Fast Ethernet (IEEE 802.3u)
  • 1000 Mbps - Gigabit Ethernet (IEEE 802.3z)
  • 10 Gbps - 10-Gigabit Ethernet (IEEE 802.3ae)
  • 100 Gbps - ...

Elements of Ethernet :
  • Ethernet Frame 
  • Medium Access Control (MAC)
  • Physical Medium (PHY)

Ethernet Frame:
Ethernet traffic moves in units called frames. The maximum amount of data a frame can carry is called the Maximum Transmission Unit (MTU). When a network device gets a frame larger than its MTU, the data is fragmented (broken into smaller frames) or dropped. Historically, Ethernet has a maximum payload size of 1500 bytes, so most devices use 1500 as their default MTU. A standard Ethernet frame consists of the payload produced at Layer 4 and above, an IP header produced at Layer 3, and a frame header produced at Layer 2. The payload at Layer 4 is bounded by the MSS (Maximum Segment Size), which is typically 1460 bytes. Add the TCP and IP headers of 40 bytes and we have the Layer 3 MTU of 1500 bytes.

At Layer 2, a frame header and trailer are added around the MTU-sized payload: the source and destination MAC addresses (6 + 6 = 12 bytes), the Ethernet type (2 bytes) and the CRC information (4 bytes), totaling 18 bytes. Many refer to an Ethernet frame as 1518 bytes, which is simply the 1500-byte MTU plus these 18 bytes. The 4-byte CRC information is sometimes not counted, leading to the 1514-byte size. If 802.1q VLAN tagging is in use, an additional 4 bytes are added, bringing the total to 1522 bytes.
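
The sizes quoted above follow from simple addition, which the short Python sketch below spells out. The constant names are illustrative, and the split of the 40-byte TCP/IP overhead into 20 + 20 bytes assumes headers without options.

```python
MSS = 1460          # typical TCP payload per segment
TCP_HEADER = 20     # assumes no TCP options
IP_HEADER = 20      # assumes an IPv4 header without options
MAC_ADDRESSES = 6 + 6
ETHERTYPE = 2
FCS = 4
VLAN_TAG = 4        # 802.1q tag

mtu = MSS + TCP_HEADER + IP_HEADER
frame_without_fcs = mtu + MAC_ADDRESSES + ETHERTYPE
frame = frame_without_fcs + FCS
tagged_frame = frame + VLAN_TAG

print(mtu, frame_without_fcs, frame, tagged_frame)  # 1500 1514 1518 1522
```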

Field   Pre   SFD   DA   SA   Length/Type   Data      FCS
Bytes   7     1     6    6    2             46-1500   4
  • Preamble (PRE)- 7 bytes. The PRE is an alternating pattern of ones and zeros that tells receiving stations that a frame is coming, and that provides a means to synchronize the frame-reception portions of receiving physical layers with the incoming bit stream.
  • Start-of-frame delimiter (SFD)- 1 byte. The SFD is an alternating pattern of ones and zeros, ending with two consecutive 1-bits indicating that the next bit is the left-most bit in the left-most byte of the destination address.
  • Destination address (DA)- 6 bytes. The DA field identifies which station(s) should receive the frame.
  • Source address (SA)- 6 bytes. The SA field identifies the sending station.
  • Length/Type- 2 bytes. This field indicates either the number of MAC-client data bytes that are contained in the data field of the frame, or the frame type ID if the frame is assembled using an optional format.
  • Data- A sequence of n bytes (46 <= n <= 1500) of any value. (The total frame minimum is 64 bytes.)
  • Frame check sequence (FCS)- 4 bytes. This sequence contains a 32-bit cyclic redundancy check (CRC) value, which is created by the sending MAC and is recalculated by the receiving MAC to check for damaged frames.
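
The field layout above can be exercised with a small Python sketch that packs and unpacks the DA, SA and Length/Type fields and pads short data to the 46-byte minimum. The helper names and example addresses are hypothetical. The preamble, SFD and FCS are left out on purpose, and the 1536 (0x0600) threshold for treating the field as a type is the conventional IEEE 802.3 rule rather than something stated in the text above.

```python
import struct

MIN_DATA = 46
MAX_DATA = 1500
HEADER = struct.Struct("!6s6sH")  # DA, SA, Length/Type in network byte order


def build_frame(dst, src, length_or_type, data):
    """Return DA + SA + Length/Type + data, padding the data to 46 bytes."""
    if len(data) > MAX_DATA:
        raise ValueError("data field is limited to 1500 bytes")
    padded = data.ljust(MIN_DATA, b"\x00")
    return HEADER.pack(dst, src, length_or_type) + padded


def parse_header(frame):
    """Split the 14-byte header and say how the Length/Type field should be read."""
    dst, src, field = HEADER.unpack(frame[:HEADER.size])
    if field <= MAX_DATA:
        kind = "length"
    elif field >= 0x0600:
        kind = "type"
    else:
        kind = "undefined"
    return dst, src, field, kind


broadcast = b"\xff" * 6
station = bytes.fromhex("020000000001")  # made-up locally administered address
frame = build_frame(broadcast, station, 0x0800, b"hi")
print(len(frame), parse_header(frame)[3])
```

The 60 bytes built here plus the 4-byte FCS give the 64-byte minimum frame mentioned in the list.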




    Jumbo frame



    Normal standards-compliant IEEE-defined ethernet frames have a maximum MTU of 1500 bytes (plus 18 additional bytes of header/trailer for srcaddr, dstaddr, length/type, and checksum).

    Do they exist ..?
    Juniper talks about 1514 rather than 1518 (excluding just the 4-byte FCS of Ethernet frames when specifying MTUs).
    The Cisco Inter-Switch Link (ISL) frame format takes the maximum encapsulated Ethernet frame size out to 1548 bytes.

    Jumbo Frames
    A jumbo frame is basically anything bigger than 1522 bytes, with a common size of 9000 bytes, which is exactly six times the 1500-byte payload of a standard Ethernet frame. With Ethernet headers, a 9k-byte jumbo frame would be 9014-9022 bytes.

    How large should an Ethernet frame be?
    Ethernet's 32-bit cyclic redundancy check is effective for detecting bit errors at frame sizes under 12,000 bytes, thereby drawing a logical upper limit. Within that, the optimum large frame size can be determined by an application's block size. For example, Network File System (NFS) transfers data in 8,192-byte blocks. So adding room for headers, an attractive maximum Ethernet frame size for NFS applications is 9,000 bytes.


    Why Jumbo Frames?
    Every data unit on a network has to be assembled by the sender, and its headers have to be read by the network components between the sender and the receiver. The receiver then reads the frame and TCP/IP headers before processing the data. This activity, plus the headers added to frames and packets to get them from sender to receiver, consumes CPU cycles and bandwidth.
    A single 9k jumbo frame replaces six 1.5k standard frames, producing a net reduction of five frames, with fewer CPU cycles consumed end to end. Further, only one TCP/IP header and Ethernet header is required instead of six, resulting in 290 (5*(40+18)) fewer bytes transmitted over the network.
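
The savings quoted above can be reproduced with a few lines of Python. This is only a restatement of the arithmetic in the paragraph, using its simplification that one 9000-byte frame replaces six 1500-byte frames; the constant names are illustrative.

```python
STANDARD_MTU = 1500
JUMBO_MTU = 9000
TCP_IP_HEADERS = 40      # per frame
ETHERNET_OVERHEAD = 18   # addresses, type and FCS, per frame

standard_frames = JUMBO_MTU // STANDARD_MTU
frames_saved = standard_frames - 1
bytes_saved = frames_saved * (TCP_IP_HEADERS + ETHERNET_OVERHEAD)

print(standard_frames, frames_saved, bytes_saved)  # 6 5 290
```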

    • There are other jumbo frame sizes used, but larger sizes don't always lead to better performance.
    • Jumbo frames generally require Gigabit Ethernet or faster.
    • Gigabit Ethernet Layer 2 switches forward or drop jumbo frames; they don't fragment.
    • Fragmentation is a Layer 3 (routing) function.

    Monday, February 1, 2010

    VLAN

    Virtual Local Area Network
    802.1q is a standard to support Virtual LANs (VLANs) in a network of interconnected switches. The frames carry an additional identifier called the VLAN tag or VLAN Identifier. It is not necessary that all the switches in the network support the 802.1q standard. The switches that support 802.1q can work with those that do not.

    There can be two kinds of frames in the network:
    • untagged frames.
    • VLAN tagged frames.

    The untagged frames are considered to be tagged with the default VLAN Identifier that is defined to be 1. Within the switch, the untagged frames are treated as if they are associated with a VLAN tag of 1.

    Frames having the same VLAN tag belong to an equivalence class and are treated similarly. VLAN tagging can happen at the ingress of a switch. Each port is associated with a Port VLAN Identifier (PVID). The untagged frames that arrive at this port will then be tagged with a VLAN Identifier equal to the PVID. When a computer is directly connected to a switch, it may be desirable to assign a VLAN Identifier to the frames coming from and to the computer. Typically host computers do not want to bother about VLAN tags, so it is a better solution to let the switch assign VLAN tags based on the port to which the computer is connected. Similarly in the return path, for frames going from the switch to the computer, the switch can strip the VLAN tags at the egress.

    A VLAN consists of a VLAN Identifier associated with a set of ports on the switch called the port set. Since the virtual network will span multiple switches, in each switch the same VLAN Identifier has to be associated with the set of ports required. This configuration will have to be done on all the switches from a management entity. Thus a VLAN consists of a set of ports across many switches, mapped to a VLAN Identifier.

    A port can belong to any number of VLANs. Each VLAN also has an untagged port list. This is the list of ports on which the frames belonging to this VLAN will be untagged before being sent out of the switch.

    At a port level one can associate the following:
    ● Port VLAN Identifier : VLAN Identifier to be used for VLAN tag insertion for untagged frames coming into the switch on this port
    ● Accept Only VLAN Tagged frames : If set all untagged frames coming into the switch on this port shall be dropped
    ● Ingress VLAN Filtering : If set, the frame shall be accepted only if this port belongs to the VLAN port set of the VLAN Identifier associated on the frame.
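
The three per-port settings above can be combined into one ingress decision. The Python sketch below is a simplified model under stated assumptions: the `Port` class, the `ingress` function and the port names are hypothetical, an untagged frame is represented by a VLAN Identifier of `None`, and ingress filtering is applied to the Identifier the frame ends up with.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

DEFAULT_VLAN = 1  # untagged frames are treated as VLAN 1 unless the PVID says otherwise


@dataclass
class Port:
    pvid: int = DEFAULT_VLAN
    accept_only_tagged: bool = False
    ingress_filtering: bool = False


def ingress(port_name: str, port: Port, frame_vid: Optional[int],
            vlan_port_sets: Dict[int, Set[str]]) -> Optional[int]:
    """Return the VLAN Identifier assigned to the frame, or None if it is dropped."""
    if frame_vid is None:
        if port.accept_only_tagged:
            return None          # untagged frames are not accepted on this port
        vid = port.pvid          # tag insertion using the Port VLAN Identifier
    else:
        vid = frame_vid
    if port.ingress_filtering and port_name not in vlan_port_sets.get(vid, set()):
        return None              # port is not in the port set of this VLAN
    return vid


vlan_port_sets = {10: {"p1", "p2"}, 20: {"p2"}}
host_port = Port(pvid=10)
trunk_port = Port(accept_only_tagged=True, ingress_filtering=True)

print(ingress("p1", host_port, None, vlan_port_sets))   # 10
print(ingress("p2", trunk_port, None, vlan_port_sets))  # None
print(ingress("p2", trunk_port, 20, vlan_port_sets))    # 20
```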

    Friday, January 29, 2010

    synchronous Designing

    There are two generally accepted architectures for Synchronous State Machines. The first type considered is a state machine in which the outputs depend only on the current state. This is commonly known as a Moore machine. In the second type, the outputs depend on both the current state and the input variables. This is known as a Mealy machine.

    Which one is better ..?
    In a Mealy machine, as soon as the input bit is one, the output becomes logic high. The machine need not go into another state first.
    Now consider the Moore machine. If the incoming input bit is one, the machine goes to another state on the clock tick, and the output is taken from that state, since it depends only on the present state.

    Mealy is fast, but its output is asynchronous with respect to the clock: the output changes as soon as the inputs change, according to the logic, and any circuit that behaves asynchronously in this way can produce glitches.

    In a Moore machine, the output always depends only on the state, in other words on registered values, and not directly on the input.

    A Mealy machine typically needs fewer states than the equivalent Moore machine; after all, a state is nothing but the condition of the flip-flops.
    Most designers go for the Moore state machine.
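
The timing difference described above can be seen in a cycle-by-cycle model. This is a Python sketch, not HDL, for a machine whose output should go high when the input bit is one; the state names S0 and S1 and the function names are made up for illustration.

```python
def mealy(inputs):
    """Single-state Mealy machine: the output follows the input in the same cycle."""
    outputs = []
    for bit in inputs:
        outputs.append(1 if bit == 1 else 0)  # output = f(state, input)
    return outputs


def moore(inputs):
    """Two-state Moore machine: the output is read from the state only."""
    state = "S0"
    outputs = []
    for bit in inputs:
        outputs.append(1 if state == "S1" else 0)  # output = f(state)
        state = "S1" if bit == 1 else "S0"         # state changes on the clock tick
    return outputs


inputs = [0, 1, 0, 1, 1, 0]
print(mealy(inputs))  # [0, 1, 0, 1, 1, 0]
print(moore(inputs))  # [0, 0, 1, 0, 1, 1]
```

The Moore output is the Mealy output delayed by one clock, and this particular Moore machine needs one more state than the Mealy version.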


    When it comes to Logic design .. ? 
    Well, some designers feel that "This is an academic distinction and in general something you will pretty much never pay any attention to during logic design. As for speed, again, it is impossible to tell which one is 'faster' just by that distinction (Mealy vs. Moore). The remark about glitches is also not true: if the FSM is a part of a larger synchronous design, there is no danger from glitches. In case the FSM outputs are going to another clock domain, glitches are still a very real problem even if you use a Moore machine. You pick what is right for the design".

    Transport and Inertial Delay

    Delays are usually used in behavioral models to better represent signaling at their external interface. The main difference is that transport delay simply adds the propagation delay to the signal, whereas inertial delay also suppresses pulses shorter than that delay, so they never propagate to change the output.

    Inertial Delay
    Inertial delay is the delay that a gate (component) has: when a gate is modeled, inertial delay is used to represent the delay it has in a real circuit. For example, if you model an inertial delay of, say, 20 ns and then put a pulse of, say, 10 ns through the model, the pulse will be "swallowed" and will not appear at the output, because input pulses that do not exceed the propagation delay of the gate do not propagate to the output.
    Inertial delay is the time it takes for a signal to change its value.
    This is usually representative of capacitance. A continuous assignment creates an inertial delay.
    By default, delay is inertial.

    Transport delay
    It is the time taken by a signal to propagate through a net, i.e. through a wire. It is also known as time of flight.

    Transport delay is the delay of a wire. If you model a transport delay of 20 ns and then put a pulse of 10 ns through it, the pulse will appear at the output delayed by 20 ns. It is simply wire delay, and it grows as the wire length increases, so it can vary.
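
The 10 ns pulse example used in both sections can be run through a simplified model. This Python sketch represents a signal as a list of (time in ns, new value) transitions; the function names are hypothetical, and the inertial model simply discards any value that is not held for at least the delay, which is a simplification of how a simulator schedules events.

```python
def transport(events, delay):
    """Transport delay: every transition is passed on, shifted by the delay."""
    return [(time + delay, value) for time, value in events]


def inertial(events, delay, initial=0):
    """Inertial delay: a new value is passed on only if it is held for at least the delay."""
    output = []
    current = initial
    for index, (time, value) in enumerate(events):
        is_last = index + 1 == len(events)
        held_long_enough = is_last or events[index + 1][0] - time >= delay
        if held_long_enough and value != current:
            output.append((time + delay, value))
            current = value
    return output


short_pulse = [(0, 1), (10, 0)]  # 10 ns pulse
long_pulse = [(0, 1), (30, 0)]   # 30 ns pulse

print(transport(short_pulse, 20))  # [(20, 1), (30, 0)]
print(inertial(short_pulse, 20))   # []
print(inertial(long_pulse, 20))    # [(20, 1), (50, 0)]
```

The short pulse survives the transport delay unchanged in width but is swallowed by the inertial delay, while the longer pulse passes through both.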


    Nice explanation given by some of the authors

    Verilog Example

    VHDL example