Thread: [myhdl-list] GSoC'16 | JPEG Encoder
From: Akshit K. <aks...@gm...> - 2016-03-02 09:26:56
Hi,

My name is Akshit Kumar. I am a second-year undergraduate student in the Department of Electrical Engineering at the Indian Institute of Technology, Madras. I wish to do a project under MyHDL in GSoC'16. After going through the projects list, I found the JPEG Encoder interesting, mainly because I know a bit of Verilog and basic scripting in Python. The description of the idea mentions familiarity with digital circuits; I wanted to know to what extent. Could someone give an example? I am new to this, so could someone please help me get started? Could someone tell me what specific prerequisites I need to get started with this project?

--
Regards,
Akshit Kumar
Second Year Undergraduate Student
Department of Electrical Engineering
Indian Institute of Technology, Madras
http://akshitk.com
From: Martin S. <ha...@se...> - 2016-03-02 11:12:58
Hi,

> After going through the projects list, I found the JPEG Encoder interesting, mainly because I know a bit of Verilog and basic scripting in Python. The description of the idea mentions familiarity with digital circuits; I wanted to know to what extent. Could someone give an example? I am new to this, so could someone please help me get started? Could someone tell me what specific prerequisites I need to get started with this project?

This is quite ambitious, really. I've gone through the fun of designing a JPEG encoder IP; you might want to focus on a small partition of the entire project, like an efficient way to pack the Huffman-encoded bit stream at high pixel clocks (~150 MHz for Full HD) in MyHDL.

Unfortunately, I'm a VHDL guy, so I've taken the other road. For sure, you'll need a full understanding of the JPEG encoding basics (this can cost you a few months), and it definitely helps if you're firm with the cosimulation techniques of your simulator, unless you're using MyHDL completely. You might find some pointers or inspiration here: http://www.section5.ch/vkit The docs are slightly outdated though; most components are now MyHDL instead of VHDL. It is kinda tricky to get the arithmetic right with MyHDL, but when not touching the DCT, you'll save yourself some hassle :-)

Cheers,

- Strubi
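To make "packing the Huffman-encoded bit stream" concrete, here is a minimal behavioral sketch in plain Python (not MyHDL, and not Martin's implementation). It assumes the encoder has already produced (code, length) pairs, packs them MSB-first into bytes, stuffs a 0x00 after every 0xFF byte as JPEG entropy-coded data requires, and pads the final byte with 1 bits. The name `pack_bits` is hypothetical. A hardware packer at ~150 MHz must absorb a variable number of bits every clock, which is where the real difficulty lies; a model like this is mainly useful as a golden reference in tests.

```python
def pack_bits(codes):
    """Pack (value, length) variable-length codes MSB-first into bytes.

    A 0x00 is stuffed after every emitted 0xFF byte, and the last
    partial byte is padded with 1 bits.
    """
    out = bytearray()
    acc = 0    # bit accumulator (only the low `nbits` bits are pending)
    nbits = 0  # number of pending bits in acc
    for value, length in codes:
        acc = (acc << length) | (value & ((1 << length) - 1))
        nbits += length
        while nbits >= 8:
            nbits -= 8
            byte = (acc >> nbits) & 0xFF
            out.append(byte)
            if byte == 0xFF:
                out.append(0x00)  # byte stuffing
        acc &= (1 << nbits) - 1  # drop the bits already emitted
    if nbits:
        pad = 8 - nbits
        byte = ((acc << pad) | ((1 << pad) - 1)) & 0xFF  # pad with 1 bits
        out.append(byte)
        if byte == 0xFF:
            out.append(0x00)
    return bytes(out)


# Three 4-bit codes: 1010 1100 | 0011 + 1111 padding
print(pack_bits([(0b1010, 4), (0b1100, 4), (0b0011, 4)]).hex())
```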
From: Henry G. <he...@ma...> - 2016-03-02 15:39:17
On 02/03/16 15:16, Martin Strubel wrote:
>> I wrote an inline complex multiplier based around a single DSP which really gets into the guts of the DSP core. It's hard to see how one would do this in plain VHDL with any hope that it would be inferred correctly (the difficulty is in things like flicking control registers mid-pipeline from multiply-add to multiply-accumulate to multiply-decumulate).
>
> I never had troubles getting the right thing instantiated when staying below the 18 bits of the classic multiplier primitive. Above that, it can get funky on some toolchains WRT timing, but the nice thing about MyHDL is that it allows you to swap out the primitives in a much more configurable/reusable way than at the VHDL level.
>
> For the pipeline control, I typically use VLIW microcode that can be adapted easily if one of the MAC primitives needs to use a higher delay within the pipeline. So in the high-level 'synthesis', you spell out the ops done in the pipeline in Python, and the architecture-specific (FPGA vendor) translator rolls out the rest. So a DCT is just a "hardware applet". If you spell it out in pure (vendor-independent) VHDL, the synth tools have always done it right so far; it just wasn't always optimal for their architecture, and this is where the manual optimizations get nasty and way less reusable than in MyHDL.

I don't fully understand what you're getting at, but I think you're suggesting much the same as me :)

What do you mean by VLIW microcode? Why does it need to be VL?

Cheers,

Henry
From: Christopher F. <chr...@gm...> - 2016-03-02 11:50:39
On 3/2/2016 3:26 AM, Akshit Kumar wrote:
> Hi,
>
> My name is Akshit Kumar. I am a second-year undergraduate student in the Department of Electrical Engineering at the Indian Institute of Technology, Madras. I wish to do a project under MyHDL in GSoC'16.

Hello Akshit,

Thanks for your interest in our project. We have had quite a few students inquire about MyHDL this year, and we currently have as many students as we can handle. You are still welcome to create a proposal as a backup; it does happen that some of the initial students do not complete a proposal.

Regards,
Chris
From: Christopher F. <chr...@gm...> - 2016-03-02 12:07:20
On 3/2/2016 4:45 AM, Martin Strubel wrote:
> Hi,
>
>> After going through the projects list, I found the JPEG Encoder interesting, mainly because I know a bit of Verilog and basic scripting in Python.
> <snip>
>
> This is quite ambitious, really. I've gone through the fun of designing a JPEG encoder IP; you might want to focus on a small partition of the entire project, like an efficient way to pack the Huffman-encoded bit stream at high pixel clocks (~150 MHz for Full HD) in MyHDL.

The students will not be starting from scratch; they will be using existing open-source encoders [1] to "port". But this will not be a simple port: they will be creating a design that is more modular, scalable, and reusable than the existing version, with a more exhaustive set of tests.

> Unfortunately, I'm a VHDL guy, so I've taken the other road. For sure, you'll need a full understanding of the JPEG encoding basics (this can cost you a few months), and it definitely helps if you're firm with the cosimulation techniques of your simulator, unless you're using MyHDL completely. You might find some pointers or inspiration here: http://www.section5.ch/vkit The docs are slightly outdated though; most components are now MyHDL instead of VHDL. It is kinda tricky to get the arithmetic right with MyHDL, but when not touching the DCT, you'll save yourself some hassle :-)

Since the reference designs exist, I don't imagine the students will need to know the specific details of the various algorithms, but they will need to show that the MyHDL version is functionally the same as the reference designs. We also plan to have two students on the JPEGEnc project; the JPEGEnc blocks will be divided between them.

Thanks for the comments! It is all good information and things the students need to be aware of. Now it looks like they will have access to a third reference design :)

[1] https://github.com/cfelton/test_jpeg
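Showing that a port is "functionally the same" usually means comparing it against a golden model with an explicit tolerance, which is also where Martin's warning about DCT arithmetic bites. The sketch below, in plain Python with hypothetical names, compares an integer-only 8-point DCT (coefficients scaled by `2**frac_bits`, rounded by an arithmetic right shift) against a floating-point orthonormal DCT-II. The input range (-128..127, i.e. level-shifted 8-bit samples) and the choice of 12 fractional bits are assumptions for illustration, not parameters of the referenced encoders. With 12 fractional bits, coefficient quantization contributes at most 8 * 128 * 2**-13 = 0.125 and the final rounding at most 0.5, so the error stays below 1.

```python
import math
import random

N = 8  # 8-point 1-D DCT, as used along JPEG block rows/columns


def _scale(k):
    # Orthonormal DCT-II scale factors
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def dct_ref(x):
    """Floating-point golden model."""
    return [_scale(k) * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                            for n in range(N))
            for k in range(N)]


def make_coeffs(frac_bits):
    """Integer coefficient table scaled by 2**frac_bits."""
    return [[round(_scale(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N)) * (1 << frac_bits))
             for n in range(N)]
            for k in range(N)]


def dct_fixed(x, coeffs, frac_bits):
    """Integer-only DCT: multiply-accumulate, then round-half-up by shifting."""
    half = 1 << (frac_bits - 1)
    return [(sum(c * v for c, v in zip(row, x)) + half) >> frac_bits for row in coeffs]


def max_error(frac_bits, trials=1000, seed=1):
    """Worst absolute difference between fixed-point and golden outputs."""
    rng = random.Random(seed)
    coeffs = make_coeffs(frac_bits)
    worst = 0.0
    for _ in range(trials):
        x = [rng.randint(-128, 127) for _ in range(N)]
        ref = dct_ref(x)
        got = dct_fixed(x, coeffs, frac_bits)
        worst = max(worst, max(abs(r - g) for r, g in zip(ref, got)))
    return worst


print(max_error(12), max_error(4))
```

In a real port, the same comparison would be run between the MyHDL simulation and the reference encoders' outputs, with the tolerance chosen per block.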
From: Henry G. <he...@ma...> - 2016-03-02 13:24:40
On 02/03/16 12:06, Christopher Felton wrote:
> The students will not be starting from scratch; they will be using existing open-source encoders [1] to "port". But this will not be a simple port: they will be creating a design that is more modular, scalable, and reusable than the existing version, with a more exhaustive set of tests.

Something just came to mind in light of this. I do wonder if it would be useful to have some mechanism by which inner primitive blocks can be switched.

Specifically, most FPGAs have various mutually incompatible primitives, things like DSPs and RAM blocks. It would be great, for example, to have a MyHDL DSP structure that can be _just used_, and then switched to support whatever hardware.

Clearly, this sort of goal fits within something like rhea, but I'm not sure if there is an explicit drive towards it.

In many instances, the primitives can be inferred from the V*, but more complicated designs (e.g. a JPEG encoder) can be made more resource-efficient by time slicing primitives, something the synthesizers are not good at [1].

FYI, I've done some work on a Xilinx DSP and RAM block.

Henry

[1] As an aside, Xilinx have a beautifully designed FIR block in Vivado which does exactly this: it will time slice the DSP blocks for you based on throughput and clock speed, using the fewest DSP primitives it can get away with.
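The time-slicing trade-off Henry describes can be estimated with a back-of-the-envelope calculation: if each multiplier performs one multiply per clock, it can be shared by `f_clk / f_sample` taps per output sample. A rough sketch under that assumption (single channel, no coefficient symmetry, whole clock cycles per sample); `dsp_count` is a hypothetical helper, not how the Vivado FIR block actually computes its configuration:

```python
import math


def dsp_count(n_taps, f_sample_hz, f_clk_hz):
    """Minimum multipliers for a time-sliced n_taps FIR filter.

    Assumes one multiply per multiplier per clock, so each multiplier
    can serve as many taps as there are clock cycles per input sample.
    """
    if f_sample_hz > f_clk_hz:
        raise ValueError("this simple model handles at most one sample per clock")
    cycles_per_sample = f_clk_hz // f_sample_hz
    return math.ceil(n_taps / cycles_per_sample)


# 64 taps, 10 MS/s, 200 MHz clock: 20 cycles per sample -> 4 multipliers
print(dsp_count(64, 10_000_000, 200_000_000))
```

At full rate (one sample per clock) nothing can be shared and every tap needs its own multiplier; the savings grow as the clock-to-sample ratio increases.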
From: Christopher F. <chr...@gm...> - 2016-03-02 13:53:26
On 3/2/2016 7:24 AM, Henry Gomersall wrote:
> On 02/03/16 12:06, Christopher Felton wrote:
>> The students will not be starting from scratch; they will be using existing open-source encoders [1] to "port". But this will not be a simple port: they will be creating a design that is more modular, scalable, and reusable than the existing version, with a more exhaustive set of tests.
>
> Something just came to mind in light of this. I do wonder if it would be useful to have some mechanism by which inner primitive blocks can be switched.

In my opinion this is all doable; one has to decide how they want to manage this in their design. How does the information permeate to submodules (sub-sub-sub)?

> Specifically, most FPGAs have various mutually incompatible primitives, things like DSPs and RAM blocks. It would be great, for example, to have a MyHDL DSP structure that can be _just used_, and then switched to support whatever hardware.

"Can be just used" on whatever hardware is best supported (most portable) when you have generic HDL without specific primitives. You can guide the HDL so the synthesizer infers the correct primitives, e.g. DSP blocks can safely be inferred when the widths, delay slots, etc. are correct. This could be controlled with a couple of parameters, and the HDL could be modular to fit various structures - maybe?

> Clearly, this sort of goal fits within something like rhea, but I'm not sure if there is an explicit drive towards it.
>
> In many instances, the primitives can be inferred from the V*, but more complicated designs (e.g. a JPEG encoder) can be made more resource-efficient by time slicing primitives, something the synthesizers are not good at [1].

This is all good, but I doubt the students will get to this level of optimization. They will be striving for functional correctness and reasonable performance (there is no performance requirement). If done correctly, with a complete set of tests in place, exploring optimizations and refactoring for performance should be straightforward.

Regards,
Chris
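As an illustration of steering the structure with a single parameter, the sketch below builds a wide unsigned multiply out of primitive-sized products and reports how many primitives it needs. It assumes unsigned operands and a signed 18x18 primitive, so each sub-word (limb) holds 17 unsigned bits; `split` and `tiled_multiply` are hypothetical helpers that model only the arithmetic, not DSP cascade paths or timing.

```python
def split(value, width, limb_bits):
    """Split an unsigned `width`-bit integer into limb_bits-wide pieces, LSB first."""
    mask = (1 << limb_bits) - 1
    n_limbs = -(-width // limb_bits)  # ceiling division
    return [(value >> (i * limb_bits)) & mask for i in range(n_limbs)]


def tiled_multiply(a, b, width_a, width_b, limb_bits=17):
    """Unsigned multiply built only from limb_bits x limb_bits products.

    Returns (product, number_of_primitive_multiplies). The default of 17
    assumes a signed 18x18 primitive used for unsigned sub-words.
    """
    product = 0
    count = 0
    for i, a_limb in enumerate(split(a, width_a, limb_bits)):
        for j, b_limb in enumerate(split(b, width_b, limb_bits)):
            # Each partial product maps onto one primitive multiplier
            product += (a_limb * b_limb) << (limb_bits * (i + j))
            count += 1
    return product, count


print(tiled_multiply(0xDEADBEEF, 0x12345678, 32, 32))
```

With `limb_bits=17`, a 16x16 multiply fits one primitive, 32x32 needs four, and 35x35 needs nine, which is one reason wide multiplies are harder to map efficiently than narrow ones.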
From: Henry G. <he...@ma...> - 2016-03-02 13:59:20
On 02/03/16 13:53, Christopher Felton wrote:
>> Specifically, most FPGAs have various mutually incompatible primitives, things like DSPs and RAM blocks. It would be great, for example, to have a MyHDL DSP structure that can be _just used_, and then switched to support whatever hardware.
>
> "Can be just used" on whatever hardware is best supported (most portable) when you have generic HDL without specific primitives. You can guide the HDL so the synthesizer infers the correct primitives, e.g. DSP blocks can safely be inferred when the widths, delay slots, etc. are correct. This could be controlled with a couple of parameters, and the HDL could be modular to fit various structures - maybe?

Yeah, absolutely. The problem comes when really pushing the bounds. E.g. when the DSP has to be pipelined to maximize throughput, it's no longer just a multiplier and the code has to reflect that. You could create a multiplier block with pipeline stages incorporated, but then you're more or less doing as I suggest (and still with no guarantees the synthesizer will do the right thing).

It was a broader point than GSoC - more me thinking out loud.

Cheers,

Henry
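Henry's point that a pipelined DSP "is no longer just a multiplier" shows up as soon as the design is modelled cycle by cycle: every signal that travels with the product must be delayed by the same number of stages. A minimal cycle-level sketch, assuming a hypothetical `PipelinedMultiplier` class with a configurable number of register stages that carries a `valid` flag alongside the product:

```python
from collections import deque


class PipelinedMultiplier:
    """Cycle-level model of a multiplier with `stages` pipeline registers."""

    def __init__(self, stages=3):
        if stages < 1:
            raise ValueError("need at least one register stage")
        # Oldest entry on the left; each entry is (product, valid)
        self.regs = deque([(0, False)] * stages, maxlen=stages)

    def clock(self, a, b, valid=True):
        """Advance one clock: accept new operands and return the
        (product, valid) pair issued `stages` clocks earlier."""
        out = self.regs[0]
        self.regs.append((a * b, valid))  # maxlen discards the entry just read
        return out


m = PipelinedMultiplier(stages=3)
for a, b in [(1, 2), (3, 4), (5, 6), (0, 0), (0, 0), (0, 0)]:
    print(m.clock(a, b, valid=(a != 0)))
```

Changing `stages` shifts every result by the same amount, so anything downstream that pairs products with other data has to be parameterized on the same latency.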
From: Henry G. <he...@ma...> - 2016-03-02 14:09:34
On 02/03/16 13:59, Henry Gomersall wrote:
> E.g. when the DSP has to be pipelined to maximize throughput, it's no longer just a multiplier and the code has to reflect that. You could create a multiplier block with pipeline stages incorporated, but then you're more or less doing as I suggest (and still with no guarantees the synthesizer will do the right thing).

I wrote an inline complex multiplier based around a single DSP which really gets into the guts of the DSP core. It's hard to see how one would do this in plain VHDL with any hope that it would be inferred correctly (the difficulty is in things like flicking control registers mid-pipeline from multiply-add to multiply-accumulate to multiply-decumulate).

Cheers,

Henry
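One way to picture a complex multiplier on a single DSP is as a fixed schedule of multiply, multiply-subtract ("decumulate"), and multiply-accumulate operations on one P register. The sketch below is a plain-Python model of that idea, not Henry's design: the mode names are hypothetical (they are not Xilinx OPMODE/ALUMODE values), and pipeline latency is ignored so each schedule entry takes effect immediately.

```python
def mac(p, a, b, mode):
    """One DSP-like operation; `mode` selects how the P register is updated."""
    if mode == "mul":   # P = A*B
        return a * b
    if mode == "macc":  # P = P + A*B
        return p + a * b
    if mode == "mdec":  # P = P - A*B
        return p - a * b
    raise ValueError(f"unknown mode {mode!r}")


def complex_multiply(x, y):
    """(xr + j*xi) * (yr + j*yi) using one multiplier over four clocks."""
    xr, xi = x
    yr, yi = y
    schedule = [
        (xr, yr, "mul"),   # clock 0: P = xr*yr
        (xi, yi, "mdec"),  # clock 1: P = xr*yr - xi*yi  -> real part
        (xr, yi, "mul"),   # clock 2: P = xr*yi
        (xi, yr, "macc"),  # clock 3: P = xr*yi + xi*yr  -> imaginary part
    ]
    p = 0
    results = []
    for cycle, (a, b, mode) in enumerate(schedule):
        p = mac(p, a, b, mode)
        if cycle in (1, 3):  # real part ready after clock 1, imaginary after clock 3
            results.append(p)
    return tuple(results)


print(complex_multiply((3, 4), (5, -2)))
```

The control "flick" Henry mentions corresponds to `mode` changing from one clock to the next while the operands stream through the same multiplier.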
From: Martin S. <ha...@se...> - 2016-03-02 15:16:21
Hi all,

> I wrote an inline complex multiplier based around a single DSP which really gets into the guts of the DSP core. It's hard to see how one would do this in plain VHDL with any hope that it would be inferred correctly (the difficulty is in things like flicking control registers mid-pipeline from multiply-add to multiply-accumulate to multiply-decumulate).

I never had troubles getting the right thing instantiated when staying below the 18 bits of the classic multiplier primitive. Above that, it can get funky on some toolchains WRT timing, but the nice thing about MyHDL is that it allows you to swap out the primitives in a much more configurable/reusable way than at the VHDL level.

For the pipeline control, I typically use VLIW microcode that can be adapted easily if one of the MAC primitives needs to use a higher delay within the pipeline. So in the high-level 'synthesis', you spell out the ops done in the pipeline in Python, and the architecture-specific (FPGA vendor) translator rolls out the rest. So a DCT is just a "hardware applet". If you spell it out in pure (vendor-independent) VHDL, the synth tools have always done it right so far; it just wasn't always optimal for their architecture, and this is where the manual optimizations get nasty and way less reusable than in MyHDL.

Greetings,

- Strubi
From: Martin S. <ha...@se...> - 2016-03-02 21:37:29
Hi,

> What do you mean by VLIW microcode? Why does it need to be VL?

It's like the parallel instructions of some DSPs, like the Intel Micro Signal Architecture (Blackfin). To avoid extra decoding stages, the opcodes just turned out simplest as "VLIW". Snippet:

    #  LD  SELB  VADD  PERM  MODE       ST
    (  0,  1,    1,    3,    MODE_ASAA, 0, ),  #0:
    (  3,  0,    1,    3,    MODE_ASAA, 2, ),  #1:
    ...

For example, portions of a long opcode control the different stage switches of the pipeline (whether you do just a mul, a mac or a de-mac).

Greetings,

- Martin
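A rough idea of how a microcode table can drive such a datapath, written in plain Python. The field layout (A address, B address, MODE, store address) and the program are hypothetical and do not reproduce the meaning of Martin's LD/SELB/VADD/PERM/MODE/ST fields; the sketch also ignores pipeline latency. The point is that the computation lives in the table, so adapting the schedule means editing rows rather than the datapath.

```python
MUL, MACC, MDEC = range(3)  # hypothetical mode encodings

# One row per clock: (A address, B address, mode, store address or None).
# Hypothetical program for a scaled butterfly: y0 = c*(x0 + x1), y1 = c*(x0 - x1)
PROGRAM = [
    # A  B  MODE  ST
    (0, 0, MUL,  None),  # P = x0*c
    (1, 0, MACC, 0),     # P = x0*c + x1*c -> y0
    (0, 0, MUL,  None),  # P = x0*c
    (1, 0, MDEC, 1),     # P = x0*c - x1*c -> y1
]


def run(program, data, coeffs, n_out):
    """Interpret the microcode: one MAC operation per row."""
    out = [0] * n_out
    p = 0
    for a_addr, b_addr, mode, st in program:
        a, b = data[a_addr], coeffs[b_addr]
        if mode == MUL:
            p = a * b
        elif mode == MACC:
            p += a * b
        elif mode == MDEC:
            p -= a * b
        else:
            raise ValueError(f"unknown mode {mode}")
        if st is not None:
            out[st] = p  # store field: write P to the output buffer
    return out


print(run(PROGRAM, data=[10, 4], coeffs=[3], n_out=2))
```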
From: Henry G. <he...@ma...> - 2016-03-02 22:01:48
On 02/03/16 21:37, Martin Strubel wrote:
> Hi,
>
>> What do you mean by VLIW microcode? Why does it need to be VL?
>
> It's like the parallel instructions of some DSPs, like the Intel Micro Signal Architecture (Blackfin). To avoid extra decoding stages, the opcodes just turned out simplest as "VLIW". Snippet:
>
>     #  LD  SELB  VADD  PERM  MODE       ST
>     (  0,  1,    1,    3,    MODE_ASAA, 0, ),  #0:
>     (  3,  0,    1,    3,    MODE_ASAA, 2, ),  #1:
>     ...
>
> For example, portions of a long opcode control the different stage switches of the pipeline (whether you do just a mul, a mac or a de-mac).

Ah, that makes sense: it's essentially a concatenation of the control signals.

Henry
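Henry's summary, that a VLIW word is essentially a concatenation of control signals, can be shown directly: each field gets a fixed bit slice of one wide word. The field names and widths below are hypothetical; in hardware, "unpacking" costs nothing, since each field is just a slice of the microcode ROM output.

```python
# Hypothetical field layout, MSB first: (name, width in bits)
FIELDS = [("a_addr", 4), ("b_addr", 4), ("mode", 2), ("st_en", 1), ("st_addr", 4)]
WORD_BITS = sum(width for _, width in FIELDS)


def pack(**values):
    """Concatenate named control fields into one integer control word."""
    unknown = set(values) - {name for name, _ in FIELDS}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    word = 0
    for name, width in FIELDS:
        v = values.get(name, 0)
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        word = (word << width) | v
    return word


def unpack(word):
    """Slice a control word back into its fields (LSB field first)."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


w = pack(a_addr=1, b_addr=0, mode=1, st_en=1, st_addr=0)
print(WORD_BITS, format(w, f"0{WORD_BITS}b"), unpack(w))
```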
From: Nicolas P. <nic...@aa...> - 2016-03-03 07:37:58
Hi,

On 02/03/2016 15:09, Henry Gomersall wrote:
> On 02/03/16 13:59, Henry Gomersall wrote:
>> E.g. when the DSP has to be pipelined to maximize throughput, it's no longer just a multiplier and the code has to reflect that. You could create a multiplier block with pipeline stages incorporated, but then you're more or less doing as I suggest (and still with no guarantees the synthesizer will do the right thing).
> I wrote an inline complex multiplier based around a single DSP which really gets into the guts of the DSP core. It's hard to see how one would do this in plain VHDL with any hope that it would be inferred correctly (the difficulty is in things like flicking control registers mid-pipeline from multiply-add to multiply-accumulate to multiply-decumulate).

I discovered MyHDL recently and am considering using it. I have followed your discussion and have some questions:
- How is it possible to "switch" the underlying resources ("inner primitive blocks can be switched")?
- Why do you say you can do things with MyHDL but not with VHDL (your complex multiplier)?

Nicolas

--
Nicolas PINAULT
R&D electronics engineer
ni...@aa...
AATON-Digital
38000 Grenoble - France
Tel +33 4 7642 9550
http://www.aaton.com
http://www.transvideo.eu
French Technologies for Film and Digital Cinematography
Follow us on Twitter @Aaton_Digital @Transvideo_HD
Like us on Facebook https://www.facebook.com/AatonDigital
From: Christopher F. <chr...@gm...> - 2016-03-04 12:25:25
On 3/3/2016 1:37 AM, Nicolas Pinault wrote:
> Hi,
>
> On 02/03/2016 15:09, Henry Gomersall wrote:
>> On 02/03/16 13:59, Henry Gomersall wrote:
>>> E.g. when the DSP has to be pipelined to maximize throughput, it's no longer just a multiplier and the code has to reflect that. You could create a multiplier block with pipeline stages incorporated, but then you're more or less doing as I suggest (and still with no guarantees the synthesizer will do the right thing).
>> I wrote an inline complex multiplier based around a single DSP which really gets into the guts of the DSP core. It's hard to see how one would do this in plain VHDL with any hope that it would be inferred correctly (the difficulty is in things like flicking control registers mid-pipeline from multiply-add to multiply-accumulate to multiply-decumulate).
> I discovered MyHDL recently and am considering using it. I have followed your discussion and have some questions:
> - How is it possible to "switch" the underlying resources ("inner primitive blocks can be switched")?
> - Why do you say you can do things with MyHDL but not with VHDL (your complex multiplier)?
>
> Nicolas

In Python (and MyHDL) it is easier to manage all this complex information. If you want to write a module that is as portable across technologies as possible, but in most cases requires a specific primitive, in Python/MyHDL you could write something like:

    def my_module(portmap, techinfo):
        # modprim (defined elsewhere): {vendor: {device: primitive_wrapper}}
        ven, dev = techinfo.vendor, techinfo.device
        if ven in modprim and dev in modprim[ven]:
            # a wrapper exists for this vendor/device: use the primitive
            prim_inst = modprim[ven][dev](portmap)
        else:
            # otherwise fall back to the portable behavioral model
            prim_inst = prim_beh(portmap)
        return prim_inst

You might be able to do this in VHDL, but it would be more cumbersome, with fewer tools and features to help manage it. The difficult part is coming up with a generic interface that can map to all the various primitives; if that is not possible, you need a specific module that uses each specific primitive, and you select on the modules, not the primitives. That assumes you want to explicitly instantiate a primitive that you have wrapped with user-defined code. As the other comments have discussed, using similar approaches you can drive the organization based on the parameters.

Regards,
Chris
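A self-contained, runnable variant of the selection pattern in Chris's `my_module` sketch. To run without MyHDL, the wrappers are hypothetical stand-ins that return a tag instead of a MyHDL instance; `xilinx_dsp48e1` and the `"artix7"` device key are illustrative names, not an existing library API. The chained `dict.get` calls perform the same vendor/device lookup with a behavioral fallback.

```python
from types import SimpleNamespace


def prim_beh(portmap):
    """Portable behavioral fallback (stand-in for a generic MyHDL block)."""
    return ("behavioral", portmap)


def xilinx_dsp48e1(portmap):
    """Hypothetical wrapper around a vendor-specific DSP primitive."""
    return ("xilinx_dsp48e1", portmap)


# vendor -> device -> primitive wrapper (illustrative entries only)
modprim = {
    "xilinx": {"artix7": xilinx_dsp48e1},
}


def my_module(portmap, techinfo):
    """Use a vendor/device-specific wrapper if one exists, else the fallback."""
    prim = modprim.get(techinfo.vendor, {}).get(techinfo.device, prim_beh)
    return prim(portmap)


tech = SimpleNamespace(vendor="xilinx", device="artix7")
print(my_module("ports", tech))
print(my_module("ports", SimpleNamespace(vendor="altera", device="cyclone5")))
```

Because the table is ordinary Python data, the technology information can be passed down through any depth of submodules, which is one answer to Chris's earlier question about how it permeates the hierarchy.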