Re: [myhdl-list] MyHDL logic synthesis


On Mon, Mar 23, 2009 at 4:58 AM, Jan Decaluwe <ja...@ja...> wrote:

> Newell Jensen wrote:
> > Jan,
> >
> > Have you ever thought of giving MyHDL synthesis capabilities?
>
> No.
>
> Synthesis, the way I define it, would be a formidable task, certainly
> beyond my capabilities as an open-source developer.
>
> In addition to powerful HDL inference capabilities and logic minimization,
> my definition of synthesis includes timing-driven optimization with an
> integrated timing analyzer and powerful technology mapping, ideally
> including placement info from an integrated P&R tool.
>
> Moreover, what would be the value proposition? Today I get these tools
> basically for free from Xilinx and Altera for their architecture.
>
> > Icarus Verilog can synthesize designs and personally, I would find it a
> > huge advantage to not have to go the MyHDL --> Verilog route as there are
> > many things that are not convertible such as delays etc.
>
> I respect Stephen Williams very much and the value of Icarus the simulator
> is very clear to me, including its tremendous value to the MyHDL project.
> However, I don't see this for Icarus the synthesis tool.
> I don't believe it matches my definition of a synthesis tool. It's kind of
> hard to judge, as there is virtually no documentation that I can find,
> but I'm pretty sure we would have heard about it otherwise.
>
> So I suspect that after Icarus "synthesis" some other tool still has
> to perform some tasks (e.g. timing optimization) that I consider part
> of synthesis. In other words, it probably gives you an entry point
> at a somewhat lower level than Verilog RTL, in a tool flow that you
> have to run anyway, and which is basically free anyway.
> Again, what's the point?

There are a couple of points.

As an example, consider the following piece of code that would most likely
be used in a software implementation for finding the third power of X. Note
that the term “software” here refers to code that is targeted at a set of
procedural instructions that will be executed on a microprocessor.
     XPower = 1;
     for (i=0;i < 3; i++)
        XPower = X * XPower;
Note that the above code is an iterative algorithm. The same variables and
addresses are accessed until the computation is complete. There is no use
for parallelism because a microprocessor only executes one instruction at a
time (for the purpose of argument, just consider a single core processor). A
similar implementation can be created in hardware. Consider the following
Verilog implementation of the same algorithm (output scaling not considered):
     module power3(
        output reg [7:0]  XPower,
        output            finished,
        input [7:0]       X,
        input             clk, start); // the duration of start is a single clock
        reg      [7:0] ncount;
        assign finished = (ncount == 0);
        always@(posedge clk)
           if(start) begin
              XPower <= X;
              ncount <= 2;
           end

           else if(!finished) begin
              ncount <= ncount - 1;
              XPower <= XPower * X;
           end
     endmodule
In the above example, the same register and computational resources are
reused until the computation is finished as shown in Figure 1.1.
     With this type of iterative implementation, no new computations can
begin until the previous computation has completed. This iterative scheme is
very similar to a software implementation. Also note that certain
handshaking signals are required to indicate the beginning and completion of
a computation. An external module must also use the handshaking to pass new
data to the module and receive a completed calculation. The performance of
this implementation is
     Throughput = 8/3, or 2.7 bits/clock
     Latency = 3 clocks
     Timing = One multiplier delay in the critical path
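
To make the 3-clock figure concrete, here is a minimal Python sketch that
steps through the iterative module one clock edge at a time. It is only a
behavioural model written for this post, not MyHDL and not generated from
the Verilog; the function name iterative_power3 and the MASK constant are
made up for illustration, and the 8-bit truncation is an assumption based on
the [7:0] register widths.

```python
MASK = 0xFF  # assumed 8-bit registers, matching the [7:0] declarations


def iterative_power3(x):
    """Model the iterative power3 module edge by edge.

    Returns (result, clocks), where clocks counts the rising edges from
    the one that samples start until finished (ncount == 0) goes high.
    """
    # Edge 1: start is high, so load the operand and the loop counter.
    xpower, ncount = x & MASK, 2
    clocks = 1
    # Later edges: start is low; keep multiplying until ncount reaches zero.
    while ncount != 0:  # finished = (ncount == 0)
        ncount -= 1
        xpower = (xpower * x) & MASK  # one multiplier, reused every clock
        clocks += 1
    return xpower, clocks


result, clocks = iterative_power3(3)
print(result, clocks)                      # 27 3
print("%.1f bits/clock" % (8.0 / clocks))  # 2.7 bits/clock
```

The loop body is the single multiplier being reused, which is why a new
operand cannot be accepted until the previous one has finished.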
Contrast this with a pipelined version of the same algorithm:
     module power3(
        output reg [7:0] XPower,
        input                  clk,
        input         [7:0] X
        );
        reg           [7:0] XPower1, XPower2;
        reg           [7:0] X1, X2;
        always @(posedge clk) begin
           // Pipeline stage 1
           X1        <= X;
           XPower1 <= X;
           // Pipeline stage 2
           X2        <= X1;
           XPower2 <= XPower1 * X1;
           // Pipeline stage 3
           XPower <= XPower2 * X2;
        end
     endmodule

In the above implementation, the value of X is passed to both pipeline
stages where independent resources compute the corresponding multiply
operation. Note that while X is being used to calculate the final power of 3
in the second pipeline stage, the next value of X can be sent to the first
pipeline stage as shown in Figure 1.2.
     Both the final calculation of X^3 (XPower3 resources) and the first
calculation of the next value of X (XPower2 resources) occur simultaneously.
The performance of this design is
     Throughput = 8/1, or 8 bits/clock
     Latency = 3 clocks
     Timing = One multiplier delay in the critical path
The throughput performance increased by a factor of 3 over the iterative
implementation. In general, if an algorithm requiring n iterative loops is
“unrolled,” the pipelined implementation will exhibit a throughput
performance increase of a factor of n. There was no penalty in terms of
latency as the pipelined implementation still required 3 clocks to propagate
the final computation. Likewise, there was no timing penalty as the critical
path still contained only one multiplier.
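
The same kind of check can be done for the pipelined version. The sketch
below is again a plain Python model (the class name Power3Pipeline and its
clock() method are invented for this illustration, and the 8-bit mask is an
assumption from the register widths). Every next-state value is computed
from the current register contents before any register is updated, which is
how the non-blocking assignments in the Verilog behave.

```python
MASK = 0xFF  # assumed 8-bit registers, matching the [7:0] declarations


class Power3Pipeline:
    """Behavioural model of the three-stage pipelined power3 module."""

    def __init__(self):
        self.x1 = self.xpower1 = 0
        self.x2 = self.xpower2 = 0
        self.xpower = 0

    def clock(self, x):
        """Apply one rising clock edge with input x; return XPower after it."""
        # Compute all next values from the current (old) register contents...
        nxt = (
            x & MASK,                          # X1      <= X
            x & MASK,                          # XPower1 <= X
            self.x1,                           # X2      <= X1
            (self.xpower1 * self.x1) & MASK,   # XPower2 <= XPower1 * X1
            (self.xpower2 * self.x2) & MASK,   # XPower  <= XPower2 * X2
        )
        # ...and only then commit them, all at once.
        self.x1, self.xpower1, self.x2, self.xpower2, self.xpower = nxt
        return self.xpower


pipe = Power3Pipeline()
print([pipe.clock(x) for x in [1, 2, 3, 4, 5, 0, 0]])
# [0, 0, 1, 8, 27, 64, 125]
```

A new operand goes in on every edge, the first cube appears after the third
edge, and one result comes out per clock after that, which matches the
8 bits/clock and 3-clock latency figures quoted above.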

So in the end, if I am going to make optimisations and go through the
trouble of fine-tuning things, hopefully there is a direct mapping in the
conversion.  Just to see if this was so, I tried to code up the second
example here in MyHDL.  It converts fine, but the result fails during
synthesis in Xilinx ISE 10.1 (with the latest service pack).  This is my
MyHDL module, followed by the conversion:

from myhdl import *

def power3(
    XPower,
    X,
    clk):

    @always(clk.posedge)
    def logic():
        XPower1 = intbv(min=0, max=256)
        XPower2 = intbv(min=0, max=256)
        X1 = intbv(min=0, max=256)
        X2 = intbv(min=0, max=256)
        # Pipeline stage one
        X1.next = X
        XPower1.next = X
        # Pipeline stage two
        X2.next = X1
        XPower2.next = XPower1 * X1
        # Pipeline stage three
        XPower.next = XPower2 * X2

    return logic

def convert():

    XPower, X = [Signal(intbv(0)[8:]) for i in range(2)]
    clk = Signal(bool(0))

    toVerilog(power3, XPower, X, clk)

convert()

#############################################
// File: power3.v
// Generated by MyHDL 0.6
// Date: Mon Mar 23 20:15:45 2009

`timescale 1ns/10ps

module power3 (
    XPower,
    X,
    clk
);

output [7:0] XPower;
reg [7:0] XPower;
input [7:0] X;
input clk;

always @(posedge clk) begin: POWER3_LOGIC
    reg [8-1:0] X2;
    reg [8-1:0] X1;
    reg [8-1:0] XPower2;
    reg [8-1:0] XPower1;
    XPower1 = 0;
    XPower2 = 0;
    X1 = 0;
    X2 = 0;
    X1 <= X;
    XPower1 <= X;
    X2 <= X1;
    XPower2 <= (XPower1 * X1);
    XPower <= (XPower2 * X2);
end

endmodule

The error that I am getting is --> Cannot mix blocking and non blocking
assignments on signal
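
Looking at the generated block, X1, X2, XPower1 and XPower2 are each
assigned once with a blocking "=" (the "= 0" lines) and once with a
non-blocking "<=", which seems to be what the message refers to. As far as
I understand it, the two kinds of assignment update at different times, so
the same five statements describe different hardware depending on which one
is meant. The following is a rough Python model of that difference only (the
function names are invented, bit widths are ignored, and it is not a
description of what ISE or MyHDL do internally):

```python
NAMES = ["X1", "XPower1", "X2", "XPower2", "XPower"]


def edge_deferred(regs, x):
    """Non-blocking style: every right-hand side reads the old values."""
    old = dict(regs)
    regs["X1"] = x
    regs["XPower1"] = x
    regs["X2"] = old["X1"]
    regs["XPower2"] = old["XPower1"] * old["X1"]
    regs["XPower"] = old["XPower2"] * old["X2"]


def edge_immediate(regs, x):
    """Blocking style: each assignment is visible to the next statement."""
    regs["X1"] = x
    regs["XPower1"] = x
    regs["X2"] = regs["X1"]
    regs["XPower2"] = regs["XPower1"] * regs["X1"]
    regs["XPower"] = regs["XPower2"] * regs["X2"]


deferred = dict.fromkeys(NAMES, 0)
immediate = dict.fromkeys(NAMES, 0)
edge_deferred(deferred, 3)
edge_immediate(immediate, 3)
print(deferred["XPower"], immediate["XPower"])  # 0 27
```

With deferred updates the cube needs three edges to reach XPower, so the
intermediate values act as pipeline registers. With immediate updates it is
there after a single edge, which would mean both multiplies sit in one
combinational path.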

So, maybe there is a way to define the intermediate registers in the MyHDL
file outside of the generator blocks? I tried using an @instance block as
well but got the same results.

Any ideas??

This is one of the headaches of switching between one HDL and another....

>
>
> If I'm wrong, we can always wrap a synthesize() function around
> the Icarus engine :-)
>
>  From your question I infer that you assume that a direct synthesis
> flow from MyHDL would somehow remove some synthesis-related restrictions.
> But that is not true. The restrictions would be just the same as
> today. They are there for Icarus synthesis also, believe me.
>
> However, those "synthesis restrictions" are in fact badly explained in
> text books. So what could be meaningful is to write a guide for
> "Efficient synthesis with MyHDL". (If I would do that, it would be
> totally different from what you read today. I would basically start
> with synchronous processes and flip-flop inferencing from variables.)
>
> > I haven't looked into what would need to happen to make this happen but
> > I wanted to ask you to see what you thought about this.
> >
> > Personally, writing everything in Python would be a dream come true.
>
> For all practical purposes, for me this dream is true today. After a
> project is properly set up, conversion is hidden somewhere in a Makefile
> right before synthesis. Verilog is just one of the many back-end formats
> used by the back-end tools needed to go from MyHDL to an implementation.
> That's how I see it, and it works fine.
>
> Jan
>
> --
> Jan Decaluwe - Resources bvba - http://www.jandecaluwe.com
>     From Python to silicon:
>     http://www.myhdl.org

-- 
Newell

http://www.gempillar.com
Before enlightenment: chop wood, carry water
After enlightenment: code, build circuits