freetel-oslec Mailing List for Free Telephony Project
Free software and hardware for telephony
Brought to you by:
drowe67
From: f. <221...@qq...> - 2017-03-15 23:46:12
|
Thanks a lot. Speex has a Win32 version, which is very convenient. It seems that I have to set up a Linux environment first. |
From: David R. <da...@ro...> - 2017-03-15 23:27:23
|
Speex is an acoustic echo canceler, Oslec is a line echo canceler. |
From: f. <221...@qq...> - 2017-03-15 23:02:37
|
David, thanks. Speex also uses NLMS with the double path method; what is the difference between Speex and Oslec? |
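For readers new to the term, NLMS (normalized least mean squares) adapts a FIR filter so that its output tracks the echo, scaling each update by the energy of the recent far-end signal. The following is a minimal floating-point sketch under stated assumptions: the tap count, the step size `mu`, the synthetic echo path and every function name are illustrative, and this is neither Oslec's fixed-point code nor Speex's implementation.

```c
#define NLMS_TAPS 8   /* illustrative filter length */

/* Run one NLMS step. x_hist[0] is the newest far-end sample.
   Returns the residual (near-end sample minus the echo estimate). */
static double nlms_step(double w[NLMS_TAPS], const double x_hist[NLMS_TAPS],
                        double near_sample, double mu)
{
    double estimate = 0.0;
    double energy = 1e-6;   /* small constant avoids division by zero on silence */
    double residual;
    int i;

    for (i = 0; i < NLMS_TAPS; i++) {
        estimate += w[i] * x_hist[i];
        energy += x_hist[i] * x_hist[i];
    }
    residual = near_sample - estimate;

    /* Normalized update: step size divided by the far-end energy. */
    for (i = 0; i < NLMS_TAPS; i++)
        w[i] += mu * residual * x_hist[i] / energy;

    return residual;
}

/* Train on a synthetic, noise-free echo (hypothetical echo path) and
   return the largest absolute tap error after the given iterations. */
static double nlms_demo_misalignment(int iterations)
{
    static const double echo_path[NLMS_TAPS] =
        { 0.0, 0.5, -0.3, 0.1, 0.0, 0.05, 0.0, 0.0 };
    double w[NLMS_TAPS] = { 0.0 };
    double x_hist[NLMS_TAPS] = { 0.0 };
    unsigned int seed = 1u;
    double worst = 0.0;
    int n, i;

    for (n = 0; n < iterations; n++) {
        double x, near_sample = 0.0;

        /* Simple pseudo-random far-end signal in [-1, 1). */
        seed = seed * 1103515245u + 12345u;
        x = (double)((seed >> 16) & 0x7fffu) / 16384.0 - 1.0;

        for (i = NLMS_TAPS - 1; i > 0; i--)
            x_hist[i] = x_hist[i - 1];
        x_hist[0] = x;

        for (i = 0; i < NLMS_TAPS; i++)
            near_sample += echo_path[i] * x_hist[i];

        nlms_step(w, x_hist, near_sample, 0.5);
    }

    for (i = 0; i < NLMS_TAPS; i++) {
        double d = w[i] - echo_path[i];
        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

Calling `nlms_demo_misalignment()` with a few thousand iterations should return a value close to zero, because there is no near-end speech in this synthetic setup. With near-end speech present the update would be disturbed, which is the double-talk problem discussed in this thread.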
From: David R. <da...@ro...> - 2017-03-15 19:36:47
|
> Is the two path algorithm much better than the tap rotation/Geigel
> algorithm ?

That was my experience.

> What's the underlying theory to deal with the double talk ?

A reliable way to stop adaption when the near end speaker is talking.

- David |
From: f. <221...@qq...> - 2017-03-15 11:28:06
|
Is the two path algorithm much better than the tap rotation/Geigel algorithm? What is the underlying theory for dealing with double talk? Thanks. |
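As background to the question, the classic Geigel detector declares double talk when the near-end sample is larger than half of the largest recent far-end sample, and then freezes adaption for a hangover period. The sketch below is a minimal illustration only: the window length, the hangover length, the 0.5 threshold and all names are assumptions chosen for the example, and it is not taken from Oslec.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GEIGEL_WINDOW   128   /* far-end samples searched (illustrative) */
#define GEIGEL_HANGOVER 240   /* samples to keep adaption frozen (illustrative) */

struct geigel {
    int16_t far_hist[GEIGEL_WINDOW];  /* circular buffer of far-end samples */
    int pos;                          /* next write position */
    int hangover;                     /* remaining frozen samples */
};

static void geigel_init(struct geigel *g)
{
    memset(g, 0, sizeof(*g));
}

/* Feed one far-end and one near-end sample.
   Returns 1 while adaption should stay frozen, 0 otherwise. */
static int geigel_update(struct geigel *g, int16_t far_sample, int16_t near_sample)
{
    int i;
    int max_far = 0;

    g->far_hist[g->pos] = far_sample;
    g->pos = (g->pos + 1) % GEIGEL_WINDOW;

    for (i = 0; i < GEIGEL_WINDOW; i++) {
        int a = abs((int)g->far_hist[i]);
        if (a > max_far)
            max_far = a;
    }

    /* Geigel test: |near| > 0.5 * max|far| suggests near-end speech. */
    if (2 * abs((int)near_sample) > max_far)
        g->hangover = GEIGEL_HANGOVER;
    else if (g->hangover > 0)
        g->hangover--;

    return g->hangover > 0;
}
```

The caller would skip the tap update whenever `geigel_update()` returns 1. The threshold of one half assumes the hybrid attenuates the echo by at least 6 dB, which is the usual textbook assumption for this detector.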
From: Машкин С В <ma...@ya...> - 2015-05-21 09:48:10
|
Hello, developers!

I have optimized OSLEC for the Blackfin(R) processor. Main changes:

1. lms_adapt_bg() has been heavily optimized. It has been fully rewritten in Blackfin assembler (low latencies, parallel instructions, and so on).
2. top_bit() version for Blackfin (uses SIGNBITS instead of ifs and bit checks).
3. fir16() has been optimized for Blackfin (dual-MAC vector instructions).
4. BALANCE_CPU compiler option included. This is a dirty way to reduce CPU usage further (not only for Blackfin!).
5. Some compiler preprocessor options:
   SUPP_ECHO_CAN_USE_CNG
   SUPP_ECHO_CAN_DISABLE
   SUPP_ECHO_CAN_USE_TX_HPF
   SUPP_OSLEC_SNAPSHOT
   SUPP_FLS_FUNC
   that may be useful for additional tiny optimizations (not only for Blackfin!).

My optimization results on a Blackfin 531 processor (http://svn.astfin.org/software/oslec/trunk/user/speedtest.c):

1. Exactly the same binary results in out.txt.
2. With the -O3 gcc compilation option I get:
   original = 628 ms for 10s of speech
   optimized = 370 ms for 10s of speech

So the speed-up is 628/370 = 1.7 times!

================================
original:

Testing OSLEC with 128 taps (16 ms tail)
CPU executes 0.25 MIPS
-------------------------

Method 1: gettimeofday() at start and end
  628 ms for 10s of speech
  0.02 MIPS
  15.92 instances possible at 100% CPU load
Method 2: samples clock cycles at start and end
  0.02 MIPS
  15.92 instances possible at 100% CPU load
Method 3: samples clock cycles for each call, IIR average
  cycles_worst 1 cycles_last 1 cycles_av: 0
  0.00 MIPS
  inf instances possible at 100% CPU load

================================
optimized:

Testing OSLEC with 128 taps (16 ms tail)
CPU executes 0.43 MIPS
-------------------------

Method 1: gettimeofday() at start and end
  370 ms for 10s of speech
  0.02 MIPS
  27.03 instances possible at 100% CPU load
Method 2: samples clock cycles at start and end
  0.02 MIPS
  27.03 instances possible at 100% CPU load
Method 3: samples clock cycles for each call, IIR average
  cycles_worst 1 cycles_last 1 cycles_av: 0
  0.00 MIPS
  inf instances possible at 100% CPU load

================================ |
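The claim of identical out.txt results can be illustrated in portable C. The sketch below is an assumption-laden model, not the Blackfin assembler from this thread: it compares the straightforward tap update loop with a software-pipelined variant in which the store of tap i is delayed until tap i+1 has been computed, which is one way to overlap a store with the next computation. All function names and the test data are hypothetical, and the code relies on arithmetic right shift of negative values, as the original C loop does.

```c
#include <stdint.h>
#include <string.h>

#define MAX_TAPS 256   /* upper bound for the demo buffers (illustrative) */

/* Rounded tap update, following the C expression quoted in this thread. */
static int16_t updated_tap(int16_t tap, int16_t hist, int factor)
{
    int exp = hist * factor;
    return (int16_t)(tap + (int16_t)((exp + (1 << 14)) >> 15));
}

/* Reference form: compute and store each tap in the same iteration. */
static void update_reference(int16_t *taps, const int16_t *hist, int n, int factor)
{
    int i;
    for (i = 0; i < n; i++)
        taps[i] = updated_tap(taps[i], hist[i], factor);
}

/* Pipelined form: the result for tap i-1 is stored while tap i is computed. */
static void update_pipelined(int16_t *taps, const int16_t *hist, int n, int factor)
{
    int16_t pending;
    int i;

    if (n <= 0)
        return;
    pending = updated_tap(taps[0], hist[0], factor);   /* prologue */
    for (i = 1; i < n; i++) {
        int16_t next = updated_tap(taps[i], hist[i], factor);
        taps[i - 1] = pending;                         /* delayed store */
        pending = next;
    }
    taps[n - 1] = pending;                             /* epilogue */
}

/* Returns 1 if both forms produce byte-identical taps on made-up data. */
static int updates_match(int n, int factor)
{
    int16_t hist[MAX_TAPS], a[MAX_TAPS], b[MAX_TAPS];
    int i;

    if (n < 0 || n > MAX_TAPS)
        return 0;
    for (i = 0; i < n; i++) {
        hist[i] = (int16_t)((i * 7919) % 20001 - 10000);
        a[i] = b[i] = (int16_t)(i * 13 - 800);
    }
    update_reference(a, hist, n, factor);
    update_pipelined(b, hist, n, factor);
    return memcmp(a, b, (size_t)n * sizeof(a[0])) == 0;
}
```

A byte-for-byte comparison like `updates_match()` plays the same role as diffing the out.txt files of two builds: it only shows equivalence on the data that was actually run, so it complements rather than replaces a review of the restructured loop.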
From: Tzafrir C. <tza...@xo...> - 2014-05-11 17:36:27
|
Bruce, do you need any help with the homepage of codec2? -- Tzafrir Cohen icq#16849755 jabber:tza...@xo... +972-50-7952406 mailto:tza...@xo... http://www.xorcom.com |
From: Машкин С В <ma...@ya...> - 2014-04-22 07:44:49
|
"But note the strange very strong increase of speed."

Please ignore this sentence. I wrote it after comparing the results for the O0 flags, so it makes no sense, as is now obvious to me. |
From: Машкин С В <ma...@ya...> - 2014-04-22 07:38:54
|
Good day!

I tested my optimized and non-optimized versions with
http://svn.astfin.org/software/oslec/trunk/user/speedtest.c

Note: I compile test.c and echo.c and link them together as a stand-alone uClinux application. For that I replaced all kernel functions with user-space ones and commented out every __attribute__((l1_text)).

While running the test I found that my code contained an error: I did not include "R4" in the list of clobbered registers in the __asm__ part. With the O0 and O1 gcc flags I get no errors, but with O2 and O3 I do. After adding "R4" to the list of clobbered registers I get no errors with any On gcc flag.

I also tidied up my code (after re-reading
http://blackfin.uclinux.org/doku.php?id=toolchain:inline_assembly
http://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html
)

My results on my Blackfin platform are:

======= non-optimized (O0) ==================

Testing OSLEC with 128 taps (16 ms tail)
CPU executes 0.06 MIPS
-------------------------

Method 1: gettimeofday() at start and end
  2568 ms for 10s of speech
  0.02 MIPS
  3.89 instances possible at 100% CPU load
Method 2: samples clock cycles at start and end
  0.02 MIPS
  3.89 instances possible at 100% CPU load
Method 3: samples clock cycles for each call, IIR average
  cycles_worst 1 cycles_last 1 cycles_av: 0
  0.00 MIPS
  inf instances possible at 100% CPU load

======= non-optimized (O2 the best speed for non-optim ver) ===

Testing OSLEC with 128 taps (16 ms tail)
CPU executes 0.27 MIPS
-------------------------

Method 1: gettimeofday() at start and end
  592 ms for 10s of speech
  0.02 MIPS
  16.89 instances possible at 100% CPU load
Method 2: samples clock cycles at start and end
  0.02 MIPS
  16.89 instances possible at 100% CPU load
Method 3: samples clock cycles for each call, IIR average
  cycles_worst 1 cycles_last 1 cycles_av: 0
  0.00 MIPS
  inf instances possible at 100% CPU load

======= non-optimized (O3) ==================

Testing OSLEC with 128 taps (16 ms tail)
CPU executes 0.06 MIPS
-------------------------

Method 1: gettimeofday() at start and end
  2532 ms for 10s of speech
  0.02 MIPS
  3.95 instances possible at 100% CPU load
Method 2: samples clock cycles at start and end
  0.02 MIPS
  3.95 instances possible at 100% CPU load
Method 3: samples clock cycles for each call, IIR average
  cycles_worst 1 cycles_last 1 cycles_av: 0
  0.00 MIPS
  inf instances possible at 100% CPU load

======= optimized (O0) ===================

Testing OSLEC with 128 taps (16 ms tail)
CPU executes 0.23 MIPS
-------------------------

Method 1: gettimeofday() at start and end
  696 ms for 10s of speech
  0.02 MIPS
  14.37 instances possible at 100% CPU load
Method 2: samples clock cycles at start and end
  0.02 MIPS
  14.37 instances possible at 100% CPU load
Method 3: samples clock cycles for each call, IIR average
  cycles_worst 1 cycles_last 1 cycles_av: 0
  0.00 MIPS
  inf instances possible at 100% CPU load

======= optimized (O2 the best speed for optim ver) ===

Testing OSLEC with 128 taps (16 ms tail)
CPU executes 0.33 MIPS
-------------------------

Method 1: gettimeofday() at start and end
  484 ms for 10s of speech
  0.02 MIPS
  20.66 instances possible at 100% CPU load
Method 2: samples clock cycles at start and end
  0.02 MIPS
  20.66 instances possible at 100% CPU load
Method 3: samples clock cycles for each call, IIR average
  cycles_worst 1 cycles_last 1 cycles_av: 0
  0.00 MIPS
  inf instances possible at 100% CPU load

======= optimized (O3) =====================

Testing OSLEC with 128 taps (16 ms tail)
CPU executes 0.33 MIPS
-------------------------

Method 1: gettimeofday() at start and end
  492 ms for 10s of speech
  0.02 MIPS
  20.33 instances possible at 100% CPU load
Method 2: samples clock cycles at start and end
  0.02 MIPS
  20.33 instances possible at 100% CPU load
Method 3: samples clock cycles for each call, IIR average
  cycles_worst 1 cycles_last 1 cycles_av: 0
  0.00 MIPS
  inf instances possible at 100% CPU load

========================================

All produced versions of the out.txt file are byte-for-byte identical for every On flag and for both the optimized and non-optimized versions.

So I see that the test passed successfully!
But note the strange very strong increase of speed.

The run-time ratio for taps=128 (16 ms tail) is approximately:

484 ms (the best speed for optim ver)
/
592 ms (the best speed for non-optim ver)
=
0.82

that is, about 18% less run time (a speed-up of roughly 1.22 times).

David, thank you for your advice to use speedtest.c.
It seems that without this test I would have shipped a delayed-action bug!

The corrected version of my optimized lms_adapt_bg() function is here:

==========code without error=================

#ifdef __bfin__
static inline void lms_adapt_bg(struct oslec_state *ec, int clean, int shift)
{
#if 0 /* original */
    int i, j;
    int offset1;
    int offset2;
    int factor;
    int exp;
    int16_t *phist;
    int n;

    if (shift > 0)
        factor = clean << shift;
    else
        factor = clean >> -shift;

    /* Update the FIR taps */

    offset2 = ec->curr_pos;
    offset1 = ec->taps - offset2;
    phist = &ec->fir_state_bg.history[offset2];

    /* st: and en: help us locate the assembler in echo.s */

    /* asm("st:"); */
    n = ec->taps;
    for (i = 0, j = offset2; i < n; i++, j++) {
        exp = *phist++ * factor;
        ec->fir_taps16[1][i] += (int16_t) ((exp + (1 << 14)) >> 15);
    }
    /* asm("en:"); */

    /* Note the asm for the inner loop above generated by Blackfin gcc
       4.1.1 is pretty good (note even parallel instructions used):

           R0 = W [P0++] (X);
           R0 *= R2;
           R0 = R0 + R3 (NS) ||
           R1 = W [P1] (X) ||
           nop;
           R0 >>>= 15;
           R0 = R0 + R1;
           W [P1++] = R0;

       A block based update algorithm would be much faster but the
       above can't be improved on much. Every instruction saved in
       the loop above is 2 MIPs/ch! The for loop above is where the
       Blackfin spends most of its time - about 17 MIPs/ch measured
       with speedtest.c with 256 taps (32ms). Write-back and
       Write-through cache gave about the same performance.
    */
#else /* optimized by Serg Ma */
    int offset1;
    int offset2;
    int factor;
    int16_t *phist;

    if (shift > 0)
        factor = clean << shift;
    else
        factor = clean >> -shift;

    /* Update the FIR taps */

    offset2 = ec->curr_pos;
    offset1 = ec->taps - offset2;
    phist = &ec->fir_state_bg.history[offset2];

    __asm__ __volatile__ (
        "R3 = (1<<14);"
        "R0 = W [%0++] (X);"
        "R0 *= %3;"
        "R0 = R0 + R3 (NS) ||"
        "R1 = W [%1++] (X) ||"
        "nop;"
        "R0 >>>= 15;"
        "R4 = R0 + R1 (NS) ||"
        "R0 = W [%0++] (X) ||"
        "nop;"
        "LOOP m%= LC0 = %4;"
        "LOOP_BEGIN m%=;"
        "R0 *= %3;"
        "R0 = R0 + R3 (NS) ||"
        "R1 = W [%1++] (X) ||"
        "W [%2++] = R4.L;"
        "R0 >>>= 15;"
        "R4 = R0 + R1 (NS) ||"
        "R0 = W [%0++] (X) ||"
        "nop;"
        "LOOP_END m%=;"
        : : "a" (phist), "a" (ec->fir_taps16[1]), "b" (ec->fir_taps16[1]), "D" (factor), "a" (ec->taps)
        : "R0", "R1", "R3", "R4"
    );
#endif
}

/*
  IDEAS for further optimisation of lms_adapt_bg():

  1/ The rounding is quite costly. Could we keep as 32 bit coeffs
  then make filter pluck the MS 16-bits of the coeffs when filtering?
  However this would lower potential optimisation of filter, as I
  think the dual-MAC architecture requires packed 16 bit coeffs.

  2/ Block based update would be more efficient, as per comments above,
  could use dual MAC architecture.

  3/ Look for same sample Blackfin LMS code, see if we can get dual-MAC
  packing.

  4/ Execute the whole e/c in a block of say 20ms rather than sample
  by sample. Processing a few samples every ms is inefficient.
*/

#else
static inline void lms_adapt_bg(struct oslec_state *ec, int clean, int shift)
{
    ...
}
#endif /* __bfin__ */

========================================

Sergei

22.04.2014, 07:40, "Машкин С В" <ma...@ya...>:
> Hello, David!
>
> I did not verify my optimised algorithm by speedtest.c,
> I made only real-world call test on my hardware, and notice,
> that there were no difference between echo-suppression results
> of non-optimized and optimized OSLEC versions.
>
> I'll try to make test with
> http://svn.astfin.org/software/oslec/trunk/user/speedtest.c
> and write here about results.
> > I try to use OSLEC inside Blackfin based hardware, which works > like PSTN-VOIP gateway. Software is based on uClinux, eXosip, > oRTP, DAHDI (with OSLEC linked) and some free VOIP vocoders. > > My Blackfin processor (BF531) does not have sufficient capacity > to process required number of VoIP channels (approximately 12-16), > so I need to do speed up everything I can. OSLEC is the one of > candidates. > > Sergei. > > 22.04.2014, 01:38, "David Rowe" <da...@ro...>: > >> Hi Sergei, >> >> Thanks for that work. >> >> Did you verify that your optimised algorithm has identical results to >> the original, for example using some sort of unit test? This program: >> >> http://svn.astfin.org/software/oslec/trunk/user/speedtest.c >> >> Can be used to test if your optimised version gives exactly the same >> results as the original code, e.g. diff the out.txt files from the two >> versions. >> >> BTW I was wondering what your application is? >> >> Thanks, >> >> David >> >> On Mon, 2014-04-21 at 07:25 +0400, Машкин С В wrote: >>> Hello! >>> >>> Sorry for previous message. It was formatted, but I see, that mailing list does >>> not support HTML format. So, try again: >>> >>> It seems, I have made approximately 15-20 % (depends on ec->taps parameter) >>> speed optimization of OSLEC. >>> >>> My optimization is in Blackfin version of lms_adapt_bg() function. >>> I have only decreased number of operations inside loop from 6 to 4. >>> >>> Because of hard parallelization I am not 100% sure that there are no >>> errors in code, but I made real tests with this optimized versionof OSLEC and it seems working fine. >>> >>> Thanks for great work. >>> >>> I have not enough expirience in diff/patch making, so sorry for the format of message. 
>>> >>> File: echo.c >>> ========================================================= >>> >>> #ifdef __bfin__ >>> static inline void lms_adapt_bg(struct oslec_state *ec, int clean, int shift) >>> { >>> #if 0 /* original */ >>> int i, j; >>> int offset1; >>> int offset2; >>> int factor; >>> int exp; >>> int16_t *phist; >>> int n; >>> >>> if (shift > 0) >>> factor = clean << shift; >>> else >>> factor = clean >> -shift; >>> >>> /* Update the FIR taps */ >>> >>> offset2 = ec->curr_pos; >>> offset1 = ec->taps - offset2; >>> phist = &ec->fir_state_bg.history[offset2]; >>> >>> /* st: and en: help us locate the assembler in echo.s */ >>> >>> /* asm("st:"); */ >>> n = ec->taps; >>> for (i = 0, j = offset2; i < n; i++, j++) { >>> exp = *phist++ * factor; >>> ec->fir_taps16[1][i] += (int16_t) ((exp + (1 << 14)) >> 15); >>> } >>> /* asm("en:"); */ >>> >>> /* Note the asm for the inner loop above generated by Blackfin gcc >>> 4.1.1 is pretty good (note even parallel instructions used): >>> >>> R0 = W [P0++] (X); >>> R0 *= R2; >>> R0 = R0 + R3 (NS) || >>> R1 = W [P1] (X) || >>> nop; >>> R0 >>>= 15; >>> R0 = R0 + R1; >>> W [P1++] = R0; >>> >>> A block based update algorithm would be much faster but the >>> above can't be improved on much. Every instruction saved in >>> the loop above is 2 MIPs/ch! The for loop above is where the >>> Blackfin spends most of it's time - about 17 MIPs/ch measured >>> with speedtest.c with 256 taps (32ms). Write-back and >>> Write-through cache gave about the same performance. 
>>> */ >>> #else /* optimized by Sergei Mashkin */ >>> int offset1; >>> int offset2; >>> int factor; >>> int16_t *phist; >>> >>> if (shift > 0) >>> factor = clean << shift; >>> else >>> factor = clean >> -shift; >>> >>> /* Update the FIR taps */ >>> >>> offset2 = ec->curr_pos; >>> offset1 = ec->taps - offset2; >>> phist = &ec->fir_state_bg.history[offset2]; >>> >>> __asm__ __volatile__ ( >>> "P0 = %0;" /* P0 = phist */ >>> "P1 = %1;" /* P1 = ec->fir_taps16[1] */ >>> "I1 = P1;" >>> "R2 = %2;" /* R2 = factor */ >>> "R3 = (1<<14);" /* R3 = (1<<14) */ >>> "R0 = W [P0++] (X);" /* R0 = *phist */ >>> "R0 *= R2;" >>> "R0 = R0 + R3 (NS) ||" >>> "R1 = W [P1++] (X) ||" >>> "nop;" >>> "R0 >>>= 15;" >>> "R4 = R0 + R1 (NS) ||" >>> "R0 = W [P0++] (X) ||" >>> "nop;" >>> "LOOP m%= LC0 = %3;" /* ec->taps */ >>> "LOOP_BEGIN m%=;" >>> "R0 *= R2;" /* R0 = *phist++ * factor(R2) */ >>> "R0 = R0 + R3 (NS) ||" /* R0 = *phist++ * factor(R2) + (1<<14), */ >>> "R1 = W [P1++] (X) ||" /* R1 = ec->fir_taps16[1][i] */ >>> "W [I1++] = R4.L;" >>> "R0 >>>= 15;" /* R0 = (exp + (1 << 14)) >> 15 */ >>> "R4 = R0 + R1 (NS) ||" /* R0 = ((exp + (1 << 14)) >> 15) + ec->fir_taps16[1][i] */ >>> "R0 = W [P0++] (X) ||" >>> "nop;" >>> "LOOP_END m%=;" >>> : : "a" (phist), "a" (ec->fir_taps16[1]), "a" (factor), "a" (ec->taps) >>> : "I1", "P0", "P1", "R0", "R1", "R2", "R3" >>> ); >>> #endif >>> } >>> >>> /* >>> IDEAS for further optimisation of lms_adapt_bg(): >>> >>> 1/ The rounding is quite costly. Could we keep as 32 bit coeffs >>> then make filter pluck the MS 16-bits of the coeffs when filtering? >>> However this would lower potential optimisation of filter, as I >>> think the dual-MAC architecture requires packed 16 bit coeffs. >>> >>> 2/ Block based update would be more efficient, as per comments above, >>> could use dual MAC architecture. >>> >>> 3/ Look for same sample Blackfin LMS code, see if we can get dual-MAC >>> packing. 
>>> >>> 4/ Execute the whole e/c in a block of say 20ms rather than sample >>> by sample. Processing a few samples every ms is inefficient. >>> */ >>> >>> #else /* #ifdef __bfin__ */ >>> ... >>> #endif /* #ifdef __bfin__ */ >>> >>> ========================================================= >>> >>> ------------------------------------------------------------------------------ >>> Start Your Social Network Today - Download eXo Platform >>> Build your Enterprise Intranet with eXo Platform Software >>> Java Based Open Source Intranet - Social, Extensible, Cloud Ready >>> Get Started Now And Turn Your Intranet Into A Collaboration Platform >>> http://p.sf.net/sfu/ExoPlatform >>> _______________________________________________ >>> freetel-oslec mailing list >>> fre...@li... >>> https://lists.sourceforge.net/lists/listinfo/freetel-oslec >> ------------------------------------------------------------------------------ >> Start Your Social Network Today - Download eXo Platform >> Build your Enterprise Intranet with eXo Platform Software >> Java Based Open Source Intranet - Social, Extensible, Cloud Ready >> Get Started Now And Turn Your Intranet Into A Collaboration Platform >> http://p.sf.net/sfu/ExoPlatform >> _______________________________________________ >> freetel-oslec mailing list >> fre...@li... >> https://lists.sourceforge.net/lists/listinfo/freetel-oslec > > ------------------------------------------------------------------------------ > Start Your Social Network Today - Download eXo Platform > Build your Enterprise Intranet with eXo Platform Software > Java Based Open Source Intranet - Social, Extensible, Cloud Ready > Get Started Now And Turn Your Intranet Into A Collaboration Platform > http://p.sf.net/sfu/ExoPlatform > _______________________________________________ > freetel-oslec mailing list > fre...@li... > https://lists.sourceforge.net/lists/listinfo/freetel-oslec |
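The "instances possible" figures in the speedtest output above appear to follow directly from the Method 1 timing: 10 s of speech processed in t ms leaves room for 10000/t real-time instances. A minimal sketch of that arithmetic is below. The helper names are hypothetical and are not part of speedtest.c; the 592 ms and 484 ms figures are the ones quoted in the message.

```c
#include <assert.h>

/* Hypothetical helper (not part of speedtest.c): number of real-time
   instances that fit into 100% CPU when `speech_ms` of audio took
   `elapsed_ms` of wall-clock time to process. */
static double instances_possible(double speech_ms, double elapsed_ms)
{
    assert(elapsed_ms > 0.0);
    return speech_ms / elapsed_ms;
}

/* Ratio of run times (optimized / reference); below 1.0 means the
   optimized build is faster. The speed-up factor is the reciprocal. */
static double time_ratio(double opt_ms, double ref_ms)
{
    assert(ref_ms > 0.0);
    return opt_ms / ref_ms;
}
```

With the numbers from the log, instances_possible(10000.0, 484.0) gives about 20.66 and time_ratio(484.0, 592.0) gives about 0.82, which corresponds to a speed-up of roughly 1.22x.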
From: Машкин С В <ma...@ya...> - 2014-04-22 03:39:48
|
Hello, David!

I did not verify my optimised algorithm with speedtest.c.
I only made real-world call tests on my hardware, and noticed
that there was no difference between the echo-suppression results
of the non-optimized and optimized OSLEC versions.

I'll try to run a test with
http://svn.astfin.org/software/oslec/trunk/user/speedtest.c
and report the results here.

I am trying to use OSLEC inside Blackfin-based hardware that works
as a PSTN-VoIP gateway. The software is based on uClinux, eXosip,
oRTP, DAHDI (with OSLEC linked in) and some free VoIP vocoders.

My Blackfin processor (BF531) does not have sufficient capacity
to process the required number of VoIP channels (approximately 12-16),
so I need to speed up everything I can. OSLEC is one of the
candidates.

Sergei.

22.04.2014, 01:38, "David Rowe" <da...@ro...>:
> Hi Sergei,
>
> Thanks for that work.
>
> Did you verify that your optimised algorithm has identical results to
> the original, for example using some sort of unit test? This program:
>
> http://svn.astfin.org/software/oslec/trunk/user/speedtest.c
>
> can be used to test if your optimised version gives exactly the same
> results as the original code, e.g. diff the out.txt files from the two
> versions.
>
> BTW I was wondering what your application is?
>
> Thanks,
>
> David
>
> On Mon, 2014-04-21 at 07:25 +0400, Машкин С В wrote:
>
>> Hello!
>>
>> Sorry for the previous message. It was formatted, but I see that the mailing list does
>> not support HTML format. So, trying again:
>>
>> It seems I have made an approximately 15-20% (depending on the ec->taps parameter)
>> speed optimization of OSLEC.
>>
>> My optimization is in the Blackfin version of the lms_adapt_bg() function.
>> I have only decreased the number of operations inside the loop from 6 to 4.
>>
>> Because of the heavy parallelization I am not 100% sure that there are no
>> errors in the code, but I ran real tests with this optimized version of OSLEC and it seems to work fine.
>>
>> Thanks for the great work.
>>
>> I do not have enough experience in making diffs/patches, so sorry for the format of this message.
>>
>> File: echo.c
>> =========================================================
>>
>> #ifdef __bfin__
>> static inline void lms_adapt_bg(struct oslec_state *ec, int clean, int shift)
>> {
>> #if 0 /* original */
>>     int i, j;
>>     int offset1;
>>     int offset2;
>>     int factor;
>>     int exp;
>>     int16_t *phist;
>>     int n;
>>
>>     if (shift > 0)
>>         factor = clean << shift;
>>     else
>>         factor = clean >> -shift;
>>
>>     /* Update the FIR taps */
>>
>>     offset2 = ec->curr_pos;
>>     offset1 = ec->taps - offset2;
>>     phist = &ec->fir_state_bg.history[offset2];
>>
>>     /* st: and en: help us locate the assembler in echo.s */
>>
>>     /* asm("st:"); */
>>     n = ec->taps;
>>     for (i = 0, j = offset2; i < n; i++, j++) {
>>         exp = *phist++ * factor;
>>         ec->fir_taps16[1][i] += (int16_t) ((exp + (1 << 14)) >> 15);
>>     }
>>     /* asm("en:"); */
>>
>>     /* Note the asm for the inner loop above generated by Blackfin gcc
>>        4.1.1 is pretty good (note even parallel instructions used):
>>
>>         R0 = W [P0++] (X);
>>         R0 *= R2;
>>         R0 = R0 + R3 (NS) ||
>>         R1 = W [P1] (X) ||
>>         nop;
>>         R0 >>>= 15;
>>         R0 = R0 + R1;
>>         W [P1++] = R0;
>>
>>        A block based update algorithm would be much faster but the
>>        above can't be improved on much. Every instruction saved in
>>        the loop above is 2 MIPs/ch! The for loop above is where the
>>        Blackfin spends most of its time - about 17 MIPs/ch measured
>>        with speedtest.c with 256 taps (32ms). Write-back and
>>        Write-through cache gave about the same performance.
>>     */
>> #else /* optimized by Sergei Mashkin */
>>     int offset1;
>>     int offset2;
>>     int factor;
>>     int16_t *phist;
>>
>>     if (shift > 0)
>>         factor = clean << shift;
>>     else
>>         factor = clean >> -shift;
>>
>>     /* Update the FIR taps */
>>
>>     offset2 = ec->curr_pos;
>>     offset1 = ec->taps - offset2;
>>     phist = &ec->fir_state_bg.history[offset2];
>>
>>     __asm__ __volatile__ (
>>         "P0 = %0;" /* P0 = phist */
>>         "P1 = %1;" /* P1 = ec->fir_taps16[1] */
>>         "I1 = P1;"
>>         "R2 = %2;" /* R2 = factor */
>>         "R3 = (1<<14);" /* R3 = (1<<14) */
>>         "R0 = W [P0++] (X);" /* R0 = *phist */
>>         "R0 *= R2;"
>>         "R0 = R0 + R3 (NS) ||"
>>         "R1 = W [P1++] (X) ||"
>>         "nop;"
>>         "R0 >>>= 15;"
>>         "R4 = R0 + R1 (NS) ||"
>>         "R0 = W [P0++] (X) ||"
>>         "nop;"
>>         "LOOP m%= LC0 = %3;" /* ec->taps */
>>         "LOOP_BEGIN m%=;"
>>         "R0 *= R2;" /* R0 = *phist++ * factor(R2) */
>>         "R0 = R0 + R3 (NS) ||" /* R0 = *phist++ * factor(R2) + (1<<14), */
>>         "R1 = W [P1++] (X) ||" /* R1 = ec->fir_taps16[1][i] */
>>         "W [I1++] = R4.L;"
>>         "R0 >>>= 15;" /* R0 = (exp + (1 << 14)) >> 15 */
>>         "R4 = R0 + R1 (NS) ||" /* R4 = ((exp + (1 << 14)) >> 15) + ec->fir_taps16[1][i] */
>>         "R0 = W [P0++] (X) ||"
>>         "nop;"
>>         "LOOP_END m%=;"
>>         : : "a" (phist), "a" (ec->fir_taps16[1]), "a" (factor), "a" (ec->taps)
>>         : "I1", "P0", "P1", "R0", "R1", "R2", "R3"
>>     );
>> #endif
>> }
>>
>> /*
>>    IDEAS for further optimisation of lms_adapt_bg():
>>
>>    1/ The rounding is quite costly. Could we keep as 32 bit coeffs
>>    then make filter pluck the MS 16-bits of the coeffs when filtering?
>>    However this would lower potential optimisation of filter, as I
>>    think the dual-MAC architecture requires packed 16 bit coeffs.
>>
>>    2/ Block based update would be more efficient, as per comments above,
>>    could use dual MAC architecture.
>>
>>    3/ Look for same sample Blackfin LMS code, see if we can get dual-MAC
>>    packing.
>>
>>    4/ Execute the whole e/c in a block of say 20ms rather than sample
>>    by sample. Processing a few samples every ms is inefficient.
>> */
>>
>> #else /* #ifdef __bfin__ */
>> ...
>> #endif /* #ifdef __bfin__ */
>>
>> ========================================================= |
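The capacity concern above ties in with the remark in the quoted code comment that every instruction saved in the inner loop is worth about 2 MIPs/ch. That figure can be reproduced with simple arithmetic, assuming an 8 kHz sample rate (implied by "256 taps (32ms)") and one loop iteration per tap per sample. The function below is an illustrative sketch only; its name is hypothetical and it ignores parallel issue and loop overhead.

```c
#include <assert.h>

/* Cost, in MIPS per channel, of executing `instr_per_tap` instructions
   for every tap on every sample. Assumes one inner-loop iteration per
   tap per sample; ignores parallel issue and loop overhead. */
static double loop_mips_per_channel(int taps, int sample_rate_hz,
                                    int instr_per_tap)
{
    assert(taps > 0 && sample_rate_hz > 0 && instr_per_tap >= 0);
    return (double) taps * sample_rate_hz * instr_per_tap / 1e6;
}
```

For 256 taps at 8000 Hz, one instruction per tap costs 256 * 8000 / 1e6 = 2.048 MIPS per channel, which matches the "2 MIPs/ch" figure in the comment; at 128 taps the same instruction costs about 1.02 MIPS per channel.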
From: David R. <da...@ro...> - 2014-04-21 21:37:28
|
Hi Sergei,

Thanks for that work.

Did you verify that your optimised algorithm has identical results to
the original, for example using some sort of unit test? This program:

http://svn.astfin.org/software/oslec/trunk/user/speedtest.c

can be used to test if your optimised version gives exactly the same
results as the original code, e.g. diff the out.txt files from the two
versions.

BTW I was wondering what your application is?

Thanks,

David

On Mon, 2014-04-21 at 07:25 +0400, Машкин С В wrote:
> Hello!
>
> Sorry for the previous message. It was formatted, but I see that the mailing list does
> not support HTML format. So, trying again:
>
> It seems I have made an approximately 15-20% (depending on the ec->taps parameter)
> speed optimization of OSLEC.
>
> My optimization is in the Blackfin version of the lms_adapt_bg() function.
> I have only decreased the number of operations inside the loop from 6 to 4.
>
> Because of the heavy parallelization I am not 100% sure that there are no
> errors in the code, but I ran real tests with this optimized version of OSLEC and it seems to work fine.
>
> Thanks for the great work.
>
> I do not have enough experience in making diffs/patches, so sorry for the format of this message.
>
> File: echo.c
> =========================================================
>
> #ifdef __bfin__
> static inline void lms_adapt_bg(struct oslec_state *ec, int clean, int shift)
> {
> #if 0 /* original */
>     int i, j;
>     int offset1;
>     int offset2;
>     int factor;
>     int exp;
>     int16_t *phist;
>     int n;
>
>     if (shift > 0)
>         factor = clean << shift;
>     else
>         factor = clean >> -shift;
>
>     /* Update the FIR taps */
>
>     offset2 = ec->curr_pos;
>     offset1 = ec->taps - offset2;
>     phist = &ec->fir_state_bg.history[offset2];
>
>     /* st: and en: help us locate the assembler in echo.s */
>
>     /* asm("st:"); */
>     n = ec->taps;
>     for (i = 0, j = offset2; i < n; i++, j++) {
>         exp = *phist++ * factor;
>         ec->fir_taps16[1][i] += (int16_t) ((exp + (1 << 14)) >> 15);
>     }
>     /* asm("en:"); */
>
>     /* Note the asm for the inner loop above generated by Blackfin gcc
>        4.1.1 is pretty good (note even parallel instructions used):
>
>         R0 = W [P0++] (X);
>         R0 *= R2;
>         R0 = R0 + R3 (NS) ||
>         R1 = W [P1] (X) ||
>         nop;
>         R0 >>>= 15;
>         R0 = R0 + R1;
>         W [P1++] = R0;
>
>        A block based update algorithm would be much faster but the
>        above can't be improved on much. Every instruction saved in
>        the loop above is 2 MIPs/ch! The for loop above is where the
>        Blackfin spends most of its time - about 17 MIPs/ch measured
>        with speedtest.c with 256 taps (32ms). Write-back and
>        Write-through cache gave about the same performance.
>     */
> #else /* optimized by Sergei Mashkin */
>     int offset1;
>     int offset2;
>     int factor;
>     int16_t *phist;
>
>     if (shift > 0)
>         factor = clean << shift;
>     else
>         factor = clean >> -shift;
>
>     /* Update the FIR taps */
>
>     offset2 = ec->curr_pos;
>     offset1 = ec->taps - offset2;
>     phist = &ec->fir_state_bg.history[offset2];
>
>     __asm__ __volatile__ (
>         "P0 = %0;" /* P0 = phist */
>         "P1 = %1;" /* P1 = ec->fir_taps16[1] */
>         "I1 = P1;"
>         "R2 = %2;" /* R2 = factor */
>         "R3 = (1<<14);" /* R3 = (1<<14) */
>         "R0 = W [P0++] (X);" /* R0 = *phist */
>         "R0 *= R2;"
>         "R0 = R0 + R3 (NS) ||"
>         "R1 = W [P1++] (X) ||"
>         "nop;"
>         "R0 >>>= 15;"
>         "R4 = R0 + R1 (NS) ||"
>         "R0 = W [P0++] (X) ||"
>         "nop;"
>         "LOOP m%= LC0 = %3;" /* ec->taps */
>         "LOOP_BEGIN m%=;"
>         "R0 *= R2;" /* R0 = *phist++ * factor(R2) */
>         "R0 = R0 + R3 (NS) ||" /* R0 = *phist++ * factor(R2) + (1<<14), */
>         "R1 = W [P1++] (X) ||" /* R1 = ec->fir_taps16[1][i] */
>         "W [I1++] = R4.L;"
>         "R0 >>>= 15;" /* R0 = (exp + (1 << 14)) >> 15 */
>         "R4 = R0 + R1 (NS) ||" /* R4 = ((exp + (1 << 14)) >> 15) + ec->fir_taps16[1][i] */
>         "R0 = W [P0++] (X) ||"
>         "nop;"
>         "LOOP_END m%=;"
>         : : "a" (phist), "a" (ec->fir_taps16[1]), "a" (factor), "a" (ec->taps)
>         : "I1", "P0", "P1", "R0", "R1", "R2", "R3"
>     );
> #endif
> }
>
> /*
>    IDEAS for further optimisation of lms_adapt_bg():
>
>    1/ The rounding is quite costly. Could we keep as 32 bit coeffs
>    then make filter pluck the MS 16-bits of the coeffs when filtering?
>    However this would lower potential optimisation of filter, as I
>    think the dual-MAC architecture requires packed 16 bit coeffs.
>
>    2/ Block based update would be more efficient, as per comments above,
>    could use dual MAC architecture.
>
>    3/ Look for same sample Blackfin LMS code, see if we can get dual-MAC
>    packing.
>
>    4/ Execute the whole e/c in a block of say 20ms rather than sample
>    by sample. Processing a few samples every ms is inefficient.
> */
>
> #else /* #ifdef __bfin__ */
> ...
> #endif /* #ifdef __bfin__ */
>
> ========================================================= |
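The check suggested above (diffing the out.txt files from the two builds) amounts to a byte-for-byte comparison. Where diff or cmp is not available on the target, a small C routine along the following lines could do the same job. This is only a sketch; the function name is hypothetical and it is not part of OSLEC or speedtest.c.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if the two files are byte-for-byte identical, 0 if they
   differ (including in length), and -1 if either cannot be opened. */
static int files_identical(const char *path_a, const char *path_b)
{
    FILE *fa, *fb;
    int result = 1;

    assert(path_a != NULL && path_b != NULL);

    fa = fopen(path_a, "rb");
    fb = fopen(path_b, "rb");

    if (fa == NULL || fb == NULL) {
        result = -1;
    } else {
        int ca, cb;
        do {
            ca = fgetc(fa);
            cb = fgetc(fb);
            if (ca != cb) {      /* mismatch, or one file ended early */
                result = 0;
                break;
            }
        } while (ca != EOF);     /* both reached EOF together */
    }

    if (fa != NULL)
        fclose(fa);
    if (fb != NULL)
        fclose(fb);
    return result;
}
```

A call such as files_identical("out_ref.txt", "out_opt.txt") would then return 1 only when the optimized build reproduces the reference output exactly; the two file names here are placeholders.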
From: Машкин С В <ma...@ya...> - 2014-04-21 03:25:41
|
Hello!

Sorry for the previous message. It was formatted, but I see that the mailing list does
not support HTML format. So, trying again:

It seems I have made an approximately 15-20% (depending on the ec->taps parameter)
speed optimization of OSLEC.

My optimization is in the Blackfin version of the lms_adapt_bg() function.
I have only decreased the number of operations inside the loop from 6 to 4.

Because of the heavy parallelization I am not 100% sure that there are no
errors in the code, but I ran real tests with this optimized version of OSLEC and it seems to work fine.

Thanks for the great work.

I do not have enough experience in making diffs/patches, so sorry for the format of this message.

File: echo.c
=========================================================

#ifdef __bfin__
static inline void lms_adapt_bg(struct oslec_state *ec, int clean, int shift)
{
#if 0 /* original */
    int i, j;
    int offset1;
    int offset2;
    int factor;
    int exp;
    int16_t *phist;
    int n;

    if (shift > 0)
        factor = clean << shift;
    else
        factor = clean >> -shift;

    /* Update the FIR taps */

    offset2 = ec->curr_pos;
    offset1 = ec->taps - offset2;
    phist = &ec->fir_state_bg.history[offset2];

    /* st: and en: help us locate the assembler in echo.s */

    /* asm("st:"); */
    n = ec->taps;
    for (i = 0, j = offset2; i < n; i++, j++) {
        exp = *phist++ * factor;
        ec->fir_taps16[1][i] += (int16_t) ((exp + (1 << 14)) >> 15);
    }
    /* asm("en:"); */

    /* Note the asm for the inner loop above generated by Blackfin gcc
       4.1.1 is pretty good (note even parallel instructions used):

        R0 = W [P0++] (X);
        R0 *= R2;
        R0 = R0 + R3 (NS) ||
        R1 = W [P1] (X) ||
        nop;
        R0 >>>= 15;
        R0 = R0 + R1;
        W [P1++] = R0;

       A block based update algorithm would be much faster but the
       above can't be improved on much. Every instruction saved in
       the loop above is 2 MIPs/ch! The for loop above is where the
       Blackfin spends most of its time - about 17 MIPs/ch measured
       with speedtest.c with 256 taps (32ms). Write-back and
       Write-through cache gave about the same performance.
    */
#else /* optimized by Sergei Mashkin */
    int offset1;
    int offset2;
    int factor;
    int16_t *phist;

    if (shift > 0)
        factor = clean << shift;
    else
        factor = clean >> -shift;

    /* Update the FIR taps */

    offset2 = ec->curr_pos;
    offset1 = ec->taps - offset2;
    phist = &ec->fir_state_bg.history[offset2];

    __asm__ __volatile__ (
        "P0 = %0;" /* P0 = phist */
        "P1 = %1;" /* P1 = ec->fir_taps16[1] */
        "I1 = P1;"
        "R2 = %2;" /* R2 = factor */
        "R3 = (1<<14);" /* R3 = (1<<14) */
        "R0 = W [P0++] (X);" /* R0 = *phist */
        "R0 *= R2;"
        "R0 = R0 + R3 (NS) ||"
        "R1 = W [P1++] (X) ||"
        "nop;"
        "R0 >>>= 15;"
        "R4 = R0 + R1 (NS) ||"
        "R0 = W [P0++] (X) ||"
        "nop;"
        "LOOP m%= LC0 = %3;" /* ec->taps */
        "LOOP_BEGIN m%=;"
        "R0 *= R2;" /* R0 = *phist++ * factor(R2) */
        "R0 = R0 + R3 (NS) ||" /* R0 = *phist++ * factor(R2) + (1<<14), */
        "R1 = W [P1++] (X) ||" /* R1 = ec->fir_taps16[1][i] */
        "W [I1++] = R4.L;"
        "R0 >>>= 15;" /* R0 = (exp + (1 << 14)) >> 15 */
        "R4 = R0 + R1 (NS) ||" /* R4 = ((exp + (1 << 14)) >> 15) + ec->fir_taps16[1][i] */
        "R0 = W [P0++] (X) ||"
        "nop;"
        "LOOP_END m%=;"
        : : "a" (phist), "a" (ec->fir_taps16[1]), "a" (factor), "a" (ec->taps)
        : "I1", "P0", "P1", "R0", "R1", "R2", "R3"
    );
#endif
}

/*
   IDEAS for further optimisation of lms_adapt_bg():

   1/ The rounding is quite costly. Could we keep as 32 bit coeffs
   then make filter pluck the MS 16-bits of the coeffs when filtering?
   However this would lower potential optimisation of filter, as I
   think the dual-MAC architecture requires packed 16 bit coeffs.

   2/ Block based update would be more efficient, as per comments above,
   could use dual MAC architecture.

   3/ Look for same sample Blackfin LMS code, see if we can get dual-MAC
   packing.

   4/ Execute the whole e/c in a block of say 20ms rather than sample
   by sample. Processing a few samples every ms is inefficient.
*/

#else /* #ifdef __bfin__ */
...
#endif /* #ifdef __bfin__ */

========================================================= |
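The restructuring in the assembler above looks like a form of software pipelining: the store of one updated tap is deferred so that it can share an instruction slot with the work on the next tap. The portable C sketch below models that idea next to the reference loop so the two can be compared on a host machine. It is an illustration under stated assumptions, not the Blackfin code itself: both function names are hypothetical, plain arrays stand in for ec->fir_taps16[1] and the history buffer, and an explicit epilogue is used so that only elements 0..n-1 are touched.

```c
#include <assert.h>
#include <stdint.h>

/* Reference update, following the C loop in the message above:
   taps[i] += (hist[i] * factor + 2^14) >> 15. */
static void lms_update_ref(int16_t *taps, const int16_t *hist,
                           int n, int factor)
{
    int i;
    for (i = 0; i < n; i++) {
        int exp = hist[i] * factor;
        taps[i] += (int16_t) ((exp + (1 << 14)) >> 15);
    }
}

/* Software-pipelined model: the store of tap i-1 is deferred into the
   iteration that computes tap i. */
static void lms_update_pipelined(int16_t *taps, const int16_t *hist,
                                 int n, int factor)
{
    int i;
    int16_t pending;

    assert(n > 0);

    /* Prologue: compute tap 0 but do not store it yet. */
    pending = (int16_t) (taps[0] +
              (int16_t) ((hist[0] * factor + (1 << 14)) >> 15));

    for (i = 1; i < n; i++) {
        int exp = hist[i] * factor;
        int16_t next = (int16_t) (taps[i] +
                       (int16_t) ((exp + (1 << 14)) >> 15));
        taps[i - 1] = pending;   /* deferred store of the previous tap */
        pending = next;
    }

    /* Epilogue: store the last tap. */
    taps[n - 1] = pending;
}
```

Running both functions on copies of the same tap and history arrays and comparing the results element by element gives a quick host-side check of the reordering, separate from the on-target speedtest.c run.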
From: Машкин С В <ma...@ya...> - 2014-04-21 03:05:07
|
<div>Hello!</div><div> </div><div>It seems, I have made approximately 15-20 % (depends on ec->taps parameter)</div><div>speed optimization of OSLEC.</div><div> </div><div>My optimization is in Blackfin version of lms_adapt_bg() function.</div><div>I have only decreased number of operations inside loop from 6 to 4.</div><div> </div><div>Because of hard parallelization I am not 100% sure that there are no</div><div>errors in code, but I made real tests with this optimized version<div>of OSLEC and it seems working fine.</div></div><div> </div><div>Thanks for great work.</div><div> </div><div><div>I have not enough expirience in diff/patch making, so sorry for the format of message.</div></div><div> </div><div>File: echo.c</div><div>=========================================================</div><div> </div><div><span style="font-family:courier new,courier;">#ifdef __bfin__</span></div><div><span style="font-family:courier new,courier;">static inline void lms_adapt_bg(struct oslec_state *ec, int clean, int shift)</span></div><div><span style="font-family:courier new,courier;">{</span></div><div><span style="font-family:courier new,courier;">#if 0 /* original */</span></div><div><span style="font-family:courier new,courier;">int i, j;</span></div><div><span style="font-family:courier new,courier;">int offset1;</span></div><div><span style="font-family:courier new,courier;">int offset2;</span></div><div><span style="font-family:courier new,courier;">int factor;</span></div><div><span style="font-family:courier new,courier;">int exp;</span></div><div><span style="font-family:courier new,courier;">int16_t *phist;</span></div><div><span style="font-family:courier new,courier;">int n;</span></div><div> </div><div><span style="font-family:courier new,courier;">if (shift > 0)</span></div><div><span style="font-family:courier new,courier;">factor = clean << shift;</span></div><div><span style="font-family:courier new,courier;">else</span></div><div><span 
style="font-family:courier new,courier;">factor = clean >> -shift;</span></div><div> </div><div><span style="font-family:courier new,courier;">/* Update the FIR taps */</span></div><div> </div><div><span style="font-family:courier new,courier;">offset2 = ec->curr_pos;</span></div><div><span style="font-family:courier new,courier;">offset1 = ec->taps - offset2;</span></div><div><span style="font-family:courier new,courier;">phist = &ec->fir_state_bg.history[offset2];</span></div><div> </div><div><span style="font-family:courier new,courier;">/* st: and en: help us locate the assembler in echo.s */</span></div><div> </div><div><span style="font-family:courier new,courier;">/* asm("st:"); */</span></div><div><span style="font-family:courier new,courier;">n = ec->taps;</span></div><div><span style="font-family:courier new,courier;">for (i = 0, j = offset2; i < n; i++, j++) {</span></div><div><span style="font-family:courier new,courier;">exp = *phist++ * factor;</span></div><div><span style="font-family:courier new,courier;">ec->fir_taps16[1][i] += (int16_t) ((exp + (1 << 14)) >> 15);</span></div><div><span style="font-family:courier new,courier;">}</span></div><div><span style="font-family:courier new,courier;">/* asm("en:"); */</span></div><div> </div><div><span style="font-family:courier new,courier;">/* Note the asm for the inner loop above generated by Blackfin gcc</span></div><div><span style="font-family:courier new,courier;"> 4.1.1 is pretty good (note even parallel instructions used):</span></div><div> </div><div><span style="font-family:courier new,courier;"> R0 = W [P0++] (X);</span></div><div><span style="font-family:courier new,courier;"> R0 *= R2;</span></div><div><span style="font-family:courier new,courier;"> R0 = R0 + R3 (NS) ||</span></div><div><span style="font-family:courier new,courier;"> R1 = W [P1] (X) ||</span></div><div><span style="font-family:courier new,courier;"> nop;</span></div><div><span style="font-family:courier new,courier;"> R0 >>>= 
15;</span></div><div><span style="font-family:courier new,courier;"> R0 = R0 + R1;</span></div><div><span style="font-family:courier new,courier;"> W [P1++] = R0;</span></div><div> </div><div><span style="font-family:courier new,courier;"> A block based update algorithm would be much faster but the</span></div><div><span style="font-family:courier new,courier;"> above can't be improved on much. Every instruction saved in</span></div><div><span style="font-family:courier new,courier;"> the loop above is 2 MIPs/ch! The for loop above is where the</span></div><div><span style="font-family:courier new,courier;"> Blackfin spends most of it's time - about 17 MIPs/ch measured</span></div><div><span style="font-family:courier new,courier;"> with speedtest.c with 256 taps (32ms). Write-back and</span></div><div><span style="font-family:courier new,courier;"> Write-through cache gave about the same performance.</span></div><div><span style="font-family:courier new,courier;">*/</span></div><div><span style="font-family:courier new,courier;">#else /* optimized by Sergei Mashkin */</span></div><div><span style="font-family:courier new,courier;">int offset1;</span></div><div><span style="font-family:courier new,courier;">int offset2;</span></div><div><span style="font-family:courier new,courier;">int factor;</span></div><div><span style="font-family:courier new,courier;">int16_t *phist;</span></div><div> </div><div><span style="font-family:courier new,courier;">if (shift > 0)</span></div><div><span style="font-family:courier new,courier;">factor = clean << shift;</span></div><div><span style="font-family:courier new,courier;">else</span></div><div><span style="font-family:courier new,courier;">factor = clean >> -shift;</span></div><div> </div><div><span style="font-family:courier new,courier;">/* Update the FIR taps */</span></div><div> </div><div><span style="font-family:courier new,courier;">offset2 = ec->curr_pos;</span></div><div><span style="font-family:courier 
new,courier;">offset1 = ec->taps - offset2;</span></div><div><span style="font-family:courier new,courier;">phist = &ec->fir_state_bg.history[offset2];</span></div><div> </div><div><span style="font-family:courier new,courier;"> __asm__ __volatile__ (</span></div><div><span style="font-family:courier new,courier;"> "P0 = %0;" /* P0 = phist */</span></div><div><span style="font-family:courier new,courier;"> "P1 = %1;" /* P1 = ec->fir_taps16[1] */</span></div><div><span style="font-family:courier new,courier;"> "I1 = P1;"</span></div><div><span style="font-family:courier new,courier;"> "R2 = %2;" /* R2 = factor */</span></div><div><span style="font-family:courier new,courier;"> "R3 = (1<<14);" /* R3 = (1<<14) */</span></div><div><span style="font-family:courier new,courier;"> "R0 = W [P0++] (X);" /* R0 = *phist */</span></div><div><span style="font-family:courier new,courier;"> "R0 *= R2;"</span></div><div><span style="font-family:courier new,courier;"> "R0 = R0 + R3 (NS) ||"</span></div><div><span style="font-family:courier new,courier;"> "R1 = W [P1++] (X) ||"</span></div><div><span style="font-family:courier new,courier;"> "nop;"</span></div><div><span style="font-family:courier new,courier;"> "R0 >>>= 15;"</span></div><div><span style="font-family:courier new,courier;"> "R4 = R0 + R1 (NS) ||"</span></div><div><span style="font-family:courier new,courier;"> "R0 = W [P0++] (X) ||"</span></div><div><span style="font-family:courier new,courier;"> "nop;"</span></div><div><span style="font-family:courier new,courier;"> "LOOP m%= LC0 = %3;" /* ec->taps */</span></div><div><span style="font-family:courier new,courier;"> "LOOP_BEGIN m%=;"</span></div><div><span style="font-family:courier new,courier;"> "R0 *= R2;" /* R0 = *phist++ * factor(R2) */</span></div><div><span style="font-family:courier new,courier;"> "R0 = R0 + R3 (NS) ||" /* R0 = *phist++ * factor(R2) + (1<<14), */</span></div><div><span style="font-family:courier new,courier;"> "R1 = W [P1++] (X) ||" /* R1 = 
ec->fir_taps16[1][i] */</span></div><div><span style="font-family:courier new,courier;"> "W [I1++] = R4.L;"</span></div><div><span style="font-family:courier new,courier;"> "R0 >>>= 15;" /* R0 = (exp + (1 << 14)) >> 15 */</span></div><div><span style="font-family:courier new,courier;"> "R4 = R0 + R1 (NS) ||" /* R0 = ((exp + (1 << 14)) >> 15) + ec->fir_taps16[1][i] */</span></div><div><span style="font-family:courier new,courier;"> "R0 = W [P0++] (X) ||"</span></div><div><span style="font-family:courier new,courier;"> "nop;"</span></div><div><span style="font-family:courier new,courier;"> "LOOP_END m%=;"</span></div><div><span style="font-family:courier new,courier;"> : : "a" (phist), "a" (ec->fir_taps16[1]), "a" (factor), "a" (ec->taps)</span></div><div><span style="font-family:courier new,courier;"> : "I1", "P0", "P1", "R0", "R1", "R2", "R3"</span></div><div><span style="font-family:courier new,courier;"> );</span></div><div><span style="font-family:courier new,courier;">#endif</span></div><div><span style="font-family:courier new,courier;">}</span></div><div> </div><div><span style="font-family:courier new,courier;">/*</span></div><div><span style="font-family:courier new,courier;"> IDEAS for further optimisation of lms_adapt_bg():</span></div><div> </div><div><span style="font-family:courier new,courier;"> 1/ The rounding is quite costly. 
Could we keep as 32 bit coeffs</span></div><div><span style="font-family:courier new,courier;"> then make filter pluck the MS 16-bits of the coeffs when filtering?</span></div><div><span style="font-family:courier new,courier;"> However this would lower potential optimisation of filter, as I</span></div><div><span style="font-family:courier new,courier;"> think the dual-MAC architecture requires packed 16 bit coeffs.</span></div><div> </div><div><span style="font-family:courier new,courier;"> 2/ Block based update would be more efficient, as per comments above,</span></div><div><span style="font-family:courier new,courier;"> could use dual MAC architecture.</span></div><div> </div><div><span style="font-family:courier new,courier;"> 3/ Look for same sample Blackfin LMS code, see if we can get dual-MAC</span></div><div><span style="font-family:courier new,courier;"> packing.</span></div><div> </div><div><span style="font-family:courier new,courier;"> 4/ Execute the whole e/c in a block of say 20ms rather than sample</span></div><div><span style="font-family:courier new,courier;"> by sample. Processing a few samples every ms is inefficient.</span></div><div><span style="font-family:courier new,courier;">*/</span></div><div> </div><div><span style="font-family:courier new,courier;">#else /* #ifdef __bfin__ */</span></div><div><span style="font-family:courier new,courier;">...</span></div><div><span style="font-family:courier new,courier;">#endif /* #ifdef __bfin__ */</span></div><div> </div><div><div>=========================================================</div><div> </div></div> |
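Note on the listing above: the Blackfin assembly is a hand-scheduled version of a simple per-tap update, in which each background filter tap is nudged by the history sample times `factor`, rounded by adding `1 << 14` and shifting right by 15. A minimal C sketch of that arithmetic follows. The function name `lms_update_sketch` and its flat argument list are hypothetical, and it walks a linear history buffer rather than the circular one (`curr_pos`, `offset1`, `offset2`) used in the posted code, so treat it as an illustration of the rounding, not as the Oslec source.

```c
#include <stdint.h>

/*
 * Hypothetical, simplified form of the tap update discussed above.
 * Assumptions: linear history buffer, and arithmetic right shift of
 * negative ints (what the Blackfin ">>>=" instruction does, and what
 * gcc/clang do on common targets).
 */
void lms_update_sketch(int16_t *taps, const int16_t *hist, int ntaps,
                       int clean, int shift)
{
    int factor;
    int i;

    /* Scale the residual ("clean") signal by 2^shift. */
    if (shift > 0)
        factor = clean * (1 << shift);
    else
        factor = clean >> -shift;

    for (i = 0; i < ntaps; i++) {
        int exp = hist[i] * factor;

        /* Round to nearest by adding half an LSB, then drop 15 bits. */
        taps[i] += (int16_t)((exp + (1 << 14)) >> 15);
    }
}
```

The loop body is four operations per tap (load, multiply, round and shift, accumulate), which is why the comment in the posting counts every saved instruction per channel.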
From: Dennis S. <de...@op...> - 2013-02-22 02:32:43
|
Hi,

I'm using DAHDI.

Kind regards,

Dennis Spaan
www.opendial.nl
Tel. 015-3010405

2013/2/20 Tzafrir Cohen <tza...@xo...>

> On Mon, Oct 22, 2012 at 09:19:13AM -0500, Juan Manuel Coronado Zúñiga wrote:
> > Hello Dennis,
> >
> > Do you also have an E1/T1 Rhino interface, which is the one connected to the
> > Channelbank, or is it a TDM card from another provider (Digium, Sangoma,
> > etc.)?
> >
> > In case you do have a Rhino Equipment TDM interface, take a look at the
> > Rhino PRI cards section on David's blog (in case you haven't already):
> >
> > http://www.rowetel.com/blog/?page_id=454
> >
> > Please note that the rhino-2.2.6 driver had commented out the relevant EC
> > code portion and that you would have to recompile the driver to solve that.
> > Bob Conklin also has another solution for this mentioned on the same entry.
> >
> > In case you don't have a Rhino card, and you see Oslec properly configured
> > and running (cat /proc/oslec/info), check if the Channelbank includes echo
> > cancellation modules and if maybe your problem is on that end and not on
> > the E1/T1 circuit connecting to the channelbank.
>
> Do you use Zaptel or DAHDI?
>
> In Zaptel there was a single echo canceller module and thus support for
> oslec required much more overriding of existing code. The OSLEC driver
> then also provided /proc/oslec .
>
> The DAHDI version has a modular echo canceller, and thus no need to
> override code of the Rhino driver. However there is no /proc/oslec .
> Recent DAHDI versions should show the echo canceller used in each
> channel on /proc/dahdi/* .
>
> --
> Tzafrir Cohen
> icq#16849755 jabber:tza...@xo...
> +972-50-7952406 mailto:tza...@xo...
> http://www.xorcom.com iax:gu...@lo.../tzafrir
|
From: Tzafrir C. <tza...@xo...> - 2013-02-20 20:51:22
|
On Mon, Oct 22, 2012 at 09:19:13AM -0500, Juan Manuel Coronado Zúñiga wrote:
> Hello Dennis,
>
> Do you also have an E1/T1 Rhino interface, which is the one connected to the
> Channelbank, or is it a TDM card from another provider (Digium, Sangoma,
> etc.)?
>
> In case you do have a Rhino Equipment TDM interface, take a look at the
> Rhino PRI cards section on David's blog (in case you haven't already):
>
> http://www.rowetel.com/blog/?page_id=454
>
> Please note that the rhino-2.2.6 driver had commented out the relevant EC
> code portion and that you would have to recompile the driver to solve that.
> Bob Conklin also has another solution for this mentioned on the same entry.
>
> In case you don't have a Rhino card, and you see Oslec properly configured
> and running (cat /proc/oslec/info), check if the Channelbank includes echo
> cancellation modules and if maybe your problem is on that end and not on
> the E1/T1 circuit connecting to the channelbank.

Do you use Zaptel or DAHDI?

In Zaptel there was a single echo canceller module and thus support for
oslec required much more overriding of existing code. The OSLEC driver
then also provided /proc/oslec .

The DAHDI version has a modular echo canceller, and thus no need to
override code of the Rhino driver. However there is no /proc/oslec .
Recent DAHDI versions should show the echo canceller used in each
channel on /proc/dahdi/* .

--
Tzafrir Cohen
icq#16849755 jabber:tza...@xo...
+972-50-7952406 mailto:tza...@xo...
http://www.xorcom.com iax:gu...@lo.../tzafrir
|
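Note: to check which echo canceller a channel uses, as suggested above, one option is to scan the lines of a /proc/dahdi file for the canceller name. The sketch below assumes each channel line carries a marker of the form `(EC: NAME ...)`. That marker format is an assumption, not something stated in this thread, so compare it against the real output on your system first. The function name `dahdi_line_echocan` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the echo canceller name from one /proc/dahdi channel line into out.
 * ASSUMPTION: the line contains "(EC: NAME", e.g. "(EC: OSLEC - INACTIVE)".
 * Returns 1 if a name was found, 0 otherwise.
 */
int dahdi_line_echocan(const char *line, char *out, size_t out_size)
{
    static const char marker[] = "(EC: ";
    const char *p = strstr(line, marker);
    size_t n = 0;

    if (p == NULL || out_size == 0)
        return 0;

    p += sizeof(marker) - 1;    /* skip past the marker itself */

    /* The name ends at the first space, ')' or end of line. */
    while (p[n] != '\0' && p[n] != ' ' && p[n] != ')' && n + 1 < out_size)
        n++;

    memcpy(out, p, n);
    out[n] = '\0';
    return n > 0;
}
```

Feeding it each line read with `fgets()` from a file under /proc/dahdi would then report the canceller per channel. A line with no marker simply returns 0.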
From: Dennis S. <de...@op...> - 2013-02-20 17:07:21
|
Hi Juan,

Sorry for the 4-month-late reply (lol), and my thanks for responding to my
question. Our callcenter has been offline for a few months and I was abroad,
so this went to the bottom of my to-do list.

To answer your question: I'm using the Rhino E1/T1 card that's supposed to be
used with the channelbank. However, I believe I have an older version than the
one mentioned here:

http://rhinoequipment.com/products/59

Mine doesn't have an onboard echo canceller, and that's why they point me to
oslec. Thanks for that page, I will read it carefully. If I fail at this,
would you be able to provide remote support?

Kind regards,

Dennis Spaan
www.opendial.nl
Tel. 015-3010405

2012/10/22 Juan Manuel Coronado Zúñiga <jua...@gm...>

> Hello Dennis,
>
> Do you also have an E1/T1 Rhino interface, which is the one connected to the
> Channelbank, or is it a TDM card from another provider (Digium, Sangoma,
> etc.)?
>
> In case you do have a Rhino Equipment TDM interface, take a look at the
> Rhino PRI cards section on David's blog (in case you haven't already):
>
> http://www.rowetel.com/blog/?page_id=454
>
> Please note that the rhino-2.2.6 driver had commented out the relevant EC
> code portion and that you would have to recompile the driver to solve that.
> Bob Conklin also has another solution for this mentioned on the same entry.
>
> In case you don't have a Rhino card, and you see Oslec properly configured
> and running (cat /proc/oslec/info), check if the Channelbank includes echo
> cancellation modules and if maybe your problem is on that end and not on
> the E1/T1 circuit connecting to the channelbank.
>
> Hope this helps.
>
> Regards,
>
> --
> Juan M. Coronado Z.
>
> On Mon, Oct 22, 2012 at 8:05 AM, Dennis Spaan <de...@op...> wrote:
>
>> Hello,
>>
>> A few years ago I bought a Rhino Channelbank for our callcenter. Due to
>> various reasons we never were able to install the channelbank until
>> recently. We've used Elastix to create DAHDI extensions and everything
>> works fine except for a very distinct echo. The people of Rhino have given
>> us great support but in the end they referred us to this mailing list
>> because they believe the echo can only be solved with different oslec
>> settings. Is there anyone that can provide paid support and has experience
>> with Rhino channel banks?
>>
>> Kind regards,
>>
>> Dennis Spaan
>> www.opendial.nl
>> Tel. 015-3010405
|
From: Juan M. C. Z. <jua...@gm...> - 2012-10-22 14:19:39
|
Hello Dennis,

Do you also have an E1/T1 Rhino interface, which is the one connected to the
Channelbank, or is it a TDM card from another provider (Digium, Sangoma,
etc.)?

In case you do have a Rhino Equipment TDM interface, take a look at the
Rhino PRI cards section on David's blog (in case you haven't already):

http://www.rowetel.com/blog/?page_id=454

Please note that the rhino-2.2.6 driver had commented out the relevant EC
code portion and that you would have to recompile the driver to solve that.
Bob Conklin also has another solution for this mentioned on the same entry.

In case you don't have a Rhino card, and you see Oslec properly configured
and running (cat /proc/oslec/info), check if the Channelbank includes echo
cancellation modules and if maybe your problem is on that end and not on
the E1/T1 circuit connecting to the channelbank.

Hope this helps.

Regards,

--
Juan M. Coronado Z.

On Mon, Oct 22, 2012 at 8:05 AM, Dennis Spaan <de...@op...> wrote:

> Hello,
>
> A few years ago I bought a Rhino Channelbank for our callcenter. Due to
> various reasons we never were able to install the channelbank until
> recently. We've used Elastix to create DAHDI extensions and everything
> works fine except for a very distinct echo. The people of Rhino have given
> us great support but in the end they referred us to this mailing list
> because they believe the echo can only be solved with different oslec
> settings. Is there anyone that can provide paid support and has experience
> with Rhino channel banks?
>
> Kind regards,
>
> Dennis Spaan
> www.opendial.nl
> Tel. 015-3010405
|
From: Dennis S. <de...@op...> - 2012-10-22 13:05:59
|
Hello,

A few years ago I bought a Rhino Channelbank for our callcenter. Due to
various reasons we never were able to install the channelbank until
recently. We've used Elastix to create DAHDI extensions and everything
works fine except for a very distinct echo. The people of Rhino have given
us great support but in the end they referred us to this mailing list
because they believe the echo can only be solved with different oslec
settings. Is there anyone that can provide paid support and has experience
with Rhino channel banks?

Kind regards,

Dennis Spaan
www.opendial.nl
Tel. 015-3010405
|
From: Stelios K. <sko...@di...> - 2010-08-24 20:22:19
|
I had OSLEC running on a 233 MHz PowerPC and could get 2 channels going
with pretty good results. The only time I had trouble was with the HFC-based
ISDN cards, which generate 8000 interrupts a second, pretty much hogging the
whole system.

Do you have any other peripherals generating interrupts in large numbers on
your board? Task switching and cache handling on this type of CPU can't be
compared to the x86 architecture.

On Tue, 2010-07-13 at 11:48 -0400, Ye Liu wrote:
> Hi David and Tzafrir, I'd tested oslec on a clean & new asterisk
> 1.6.2.9 and dahdi 2.3.0.1 with minimal configuration, but the echo was
> still there, sounded very bad.
>
> Here is the result of running speedtest on my platform:
>
> Testing OSLEC with 128 taps (16 ms tail)
> CPU executes 0.21 MIPS
> -------------------------
>
> Method 1: gettimeofday() at start and end
>   748 ms for 10s of speech
>   0.02 MIPS
>   13.37 instances possible at 100% CPU load
> Method 2: samples clock cycles at start and end
>   0.02 MIPS
>   13.37 instances possible at 100% CPU load
> Method 3: samples clock cycles for each call, IIR average
>   cycles_worst 1 cycles_last 1 cycles_av: 0
>   0.00 MIPS
>   inf instances possible at 100% CPU load
>
> Compare this to the result on my x86 virtual machine, I feel my ARM11
> 533MHz is not fast enough to handle the echo canceller. Am I right?
>
> On Thu, Jul 1, 2010 at 1:08 AM, David Rowe <da...@ro...> wrote:
> > Hello Ye,
> >
> > I haven't heard any other reports of this sort of problem, so perhaps
> > it's a configuration or ARM 11 specific issue. Is Oslec being compiled
> > with the MMX option on the command line? This is an x86 specific
> > option.
> >
> > Another possibility is endian issue, it could be that the byte or word
> > ordering is opposite for ARM.
> >
> > - David
> >
> > On Wed, 2010-06-30 at 10:52 -0400, Ye Liu wrote:
> >> (I just changed my email address for this mailing list, sorry for the duplicate)
> >>
> >> Hi there,
> >>
> >> I want to use oslec with my dahdi compatible analog hardware, but
> >> whenever asterisk connects the call to pstn line, I can hear a very
> >> loud harsh non-stop noise (white noise?) and my own voice instantly
> >> when I speak something into the handset. oslec seems not cancel the
> >> echo but generate noise back to me.
> >>
> >> I've been with mg2 for a long time, that echo canceler works fine most
> >> of the time, but the voice quality is not acceptable during
> >> double-talk. That's why I'm looking into oslec. I experimented
> >> different settings in dahdi/system.conf and asterisk/chan_dahdi.conf,
> >> but nothing helped...
> >>
> >> It might be my cpu, it's ARM 11, but I don't know whether it's the
> >> actual cause. Could anyone here please help me?
> >>
> >> Here is my system:
> >>
> >> Linux 2.6.19.2
> >> Asterisk 1.6.1.1
> >> DAHDI 2.3.0.1
> >> OSLEC is grabbed from linux-2.6.34 source tree
> >>
> >> --
> >> Ye Liu (AKA @jaux)
> >>
> >> http://jaux.net

--
Stelios S. Koroneos
Digital OPSiS - Embedded Intelligence
Tel +30 210 9858296 Ext 100
Fax +30 210 9858298
http://www.digital-opsis.com
|
From: David R. <da...@ro...> - 2010-08-07 03:01:03
|
Oslec is not designed for AEC (Acoustic Echo Cancellation), so it probably
won't work. The Speex project has an AEC. A big problem is often the sound
drivers.

On Fri, 2010-08-06 at 19:03 -0700, Ming-Ching Tiew wrote:
> I wonder if oslec will be useful for the job of performing acoustic echo
> cancellation on a typical PC with built-in microphone and open speaker,
> where the speaker could actually acoustically feedback into the microphone
> and hence the remote will hear himself talking ( very loudly, if the local
> speaker volume is very high ).
>
> We have tested a typical case of a PC softphone calling a remote asterisk
> box with oslec installed. The acoustic feedback heard by the remote is
> pretty bad. Will it be a right place for having oslec put at the PC
> softphone end to fix this problem ?
>
> Best regards.
|
From: Ming-Ching T. <mc...@ya...> - 2010-08-07 02:03:12
|
I wonder if oslec would be useful for acoustic echo cancellation on a typical
PC with a built-in microphone and open speaker, where the speaker can feed
back acoustically into the microphone, so that the remote party hears himself
talking (very loudly, if the local speaker volume is high).

We have tested a typical case of a PC softphone calling a remote asterisk box
with oslec installed. The acoustic feedback heard by the remote party is
pretty bad. Would the PC softphone end be the right place to put oslec to fix
this problem?

Best regards.
|
From: Ye L. <ja...@gm...> - 2010-07-13 15:49:17
|
Hi David and Tzafrir, I'd tested oslec on a clean & new asterisk
1.6.2.9 and dahdi 2.3.0.1 with minimal configuration, but the echo was
still there, sounded very bad.

Here is the result of running speedtest on my platform:

Testing OSLEC with 128 taps (16 ms tail)
CPU executes 0.21 MIPS
-------------------------

Method 1: gettimeofday() at start and end
  748 ms for 10s of speech
  0.02 MIPS
  13.37 instances possible at 100% CPU load
Method 2: samples clock cycles at start and end
  0.02 MIPS
  13.37 instances possible at 100% CPU load
Method 3: samples clock cycles for each call, IIR average
  cycles_worst 1 cycles_last 1 cycles_av: 0
  0.00 MIPS
  inf instances possible at 100% CPU load

Compare this to the result on my x86 virtual machine, I feel my ARM11
533MHz is not fast enough to handle the echo canceller. Am I right?

On Thu, Jul 1, 2010 at 1:08 AM, David Rowe <da...@ro...> wrote:
> Hello Ye,
>
> I haven't heard any other reports of this sort of problem, so perhaps
> it's a configuration or ARM 11 specific issue. Is Oslec being compiled
> with the MMX option on the command line? This is an x86 specific
> option.
>
> Another possibility is endian issue, it could be that the byte or word
> ordering is opposite for ARM.
>
> - David
>
> On Wed, 2010-06-30 at 10:52 -0400, Ye Liu wrote:
>> (I just changed my email address for this mailing list, sorry for the duplicate)
>>
>> Hi there,
>>
>> I want to use oslec with my dahdi compatible analog hardware, but
>> whenever asterisk connects the call to pstn line, I can hear a very
>> loud harsh non-stop noise (white noise?) and my own voice instantly
>> when I speak something into the handset. oslec seems not cancel the
>> echo but generate noise back to me.
>>
>> I've been with mg2 for a long time, that echo canceler works fine most
>> of the time, but the voice quality is not acceptable during
>> double-talk. That's why I'm looking into oslec. I experimented
>> different settings in dahdi/system.conf and asterisk/chan_dahdi.conf,
>> but nothing helped...
>>
>> It might be my cpu, it's ARM 11, but I don't know whether it's the
>> actual cause. Could anyone here please help me?
>>
>> Here is my system:
>>
>> Linux 2.6.19.2
>> Asterisk 1.6.1.1
>> DAHDI 2.3.0.1
>> OSLEC is grabbed from linux-2.6.34 source tree
>>
>> --
>> Ye Liu (AKA @jaux)
>>
>> http://jaux.net

--
Ye Liu (AKA @jaux)

http://jaux.net
|
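Note on the figures above: the Method 1 numbers can be checked by hand, and they suggest the timing result on its own does not show the CPU to be too slow. Processing 10 s of speech in 748 ms means one instance uses about 7.5% of the CPU, which is where "13.37 instances" comes from. The small C sketch below reproduces that arithmetic. The function names are hypothetical, and the 8000 Hz sample rate is an assumption implied by "128 taps (16 ms tail)".

```c
/* Number of echo canceller instances that fit in real time. */
double instances_possible(double speech_ms, double elapsed_ms)
{
    return speech_ms / elapsed_ms;
}

/* Percentage of one CPU consumed by the given number of channels. */
double cpu_load_percent(double speech_ms, double elapsed_ms, int channels)
{
    return 100.0 * channels * elapsed_ms / speech_ms;
}

/* Echo tail length covered by a filter, in milliseconds. */
double tail_ms(int taps, int sample_rate_hz)
{
    return 1000.0 * taps / sample_rate_hz;
}
```

With the posted values, `instances_possible(10000.0, 748.0)` is about 13.37, and two PSTN ports come to roughly 15% load by `cpu_load_percent(10000.0, 748.0, 2)`. This covers only the filter arithmetic measured by speedtest, not interrupt or driver overhead.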
From: Ye L. <ja...@gm...> - 2010-07-06 16:52:37
|
Thank you, David! I executed speedtest on both x86 and ARM, and two
out.txt are identical. Does this mean the issue is actually caused by
configuration?

On Mon, Jul 5, 2010 at 7:22 PM, David Rowe <da...@ro...> wrote:
> You could try one of the off-line unit tests, for example in Oslec SVN
> user/speedtest dumps the echo canceller output to a text file. Try
> running this on an x86 box then your ARM. The two text files should be
> identical. This will tell you if Oslec is working OK on your CPU.
>
> Cheers,
>
> David
>
> On Mon, 2010-07-05 at 18:02 -0400, Ye Liu wrote:
>> I did more experiment this afternoon, I found that once I removed the
>> ECHO_CAN_USE_RX_HPF option when calling oslec_create(), the noise went
>> away.
>>
>> However, echo was still there, looked like the echo cancellation
>> algorithm never worked. I had printk here and there in oslec_update(),
>> this function had been called for sure, but I don't know what to look
>> at...
>>
>> On Mon, Jul 5, 2010 at 11:40 AM, Ye Liu <ja...@gm...> wrote:
>> > One additional question: my hardware has 2 pstn ports and the arm cpu
>> > is running on 532 MHz, is the cpu speed fast enough for oslec?
>> >
>> > On Mon, Jul 5, 2010 at 11:06 AM, Tzafrir Cohen <tza...@xo...> wrote:
>> >> On Mon, Jul 05, 2010 at 10:52:24AM -0400, Ye Liu wrote:
>> >>> Hi David,
>> >>>
>> >>> I checked the endianness of echo.ko and dahdi_echocan_oslec, they were
>> >>> both elf32-littlearm, so same as x86. I compiled oslec without MMX or
>> >>> any other options. I'm going to attach my dahdi/system.conf and
>> >>> asterisk/chan_dahdi.conf to the end of this email.
>> >>>
>> >>> Have you heard any successful examples of running oslec on ARM 11? I
>> >>> can barely find information about oslec on arm via google...
>> >>
>> >> I used it with a SheevaPlug (Marvell Kirkwood. ARMV5?). Worked fine,
>> >> IIRC.
>> >>
>> >> --
>> >> Tzafrir Cohen
>> >> icq#16849755 jabber:tza...@xo...
>> >> +972-50-7952406 mailto:tza...@xo...
>> >> http://www.xorcom.com iax:gu...@lo.../tzafrir
>> >
>> > --
>> > Ye Liu (AKA @jaux)
>> >
>> > http://jaux.net

--
Ye Liu (AKA @jaux)

http://jaux.net
|
From: David R. <da...@ro...> - 2010-07-05 23:22:07
|
You could try one of the off-line unit tests, for example in Oslec SVN
user/speedtest dumps the echo canceller output to a text file. Try
running this on an x86 box then your ARM. The two text files should be
identical. This will tell you if Oslec is working OK on your CPU.

Cheers,

David

On Mon, 2010-07-05 at 18:02 -0400, Ye Liu wrote:
> I did more experiment this afternoon, I found that once I removed the
> ECHO_CAN_USE_RX_HPF option when calling oslec_create(), the noise went
> away.
>
> However, echo was still there, looked like the echo cancellation
> algorithm never worked. I had printk here and there in oslec_update(),
> this function had been called for sure, but I don't know what to look
> at...
>
> On Mon, Jul 5, 2010 at 11:40 AM, Ye Liu <ja...@gm...> wrote:
> > One additional question: my hardware has 2 pstn ports and the arm cpu
> > is running on 532 MHz, is the cpu speed fast enough for oslec?
> >
> > On Mon, Jul 5, 2010 at 11:06 AM, Tzafrir Cohen <tza...@xo...> wrote:
> >> On Mon, Jul 05, 2010 at 10:52:24AM -0400, Ye Liu wrote:
> >>> Hi David,
> >>>
> >>> I checked the endianness of echo.ko and dahdi_echocan_oslec, they were
> >>> both elf32-littlearm, so same as x86. I compiled oslec without MMX or
> >>> any other options. I'm going to attach my dahdi/system.conf and
> >>> asterisk/chan_dahdi.conf to the end of this email.
> >>>
> >>> Have you heard any successful examples of running oslec on ARM 11? I
> >>> can barely find information about oslec on arm via google...
> >>
> >> I used it with a SheevaPlug (Marvell Kirkwood. ARMV5?). Worked fine,
> >> IIRC.
> >>
> >> --
> >> Tzafrir Cohen
> >> icq#16849755 jabber:tza...@xo...
> >> +972-50-7952406 mailto:tza...@xo...
> >> http://www.xorcom.com iax:gu...@lo.../tzafrir
> >
> > --
> > Ye Liu (AKA @jaux)
> >
> > http://jaux.net
|
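Note: the cross-platform check suggested above amounts to a byte-for-byte comparison of the two speedtest output files (the same job `cmp` does on a Linux box). A minimal C sketch follows, in case it is handy to build into a test harness. The function name `files_identical` is hypothetical, and the file names in the usage note are placeholders.

```c
#include <stdio.h>

/*
 * Compare two files byte by byte.
 * Returns 1 if identical, 0 if they differ, -1 if either cannot be opened.
 */
int files_identical(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    int result = -1;

    if (a != NULL && b != NULL) {
        int ca, cb;

        /* Read in lockstep until a mismatch or end of file. */
        do {
            ca = fgetc(a);
            cb = fgetc(b);
        } while (ca == cb && ca != EOF);

        /* Identical only if both streams ended at the same point. */
        result = (ca == cb) ? 1 : 0;
    }

    if (a != NULL)
        fclose(a);
    if (b != NULL)
        fclose(b);
    return result;
}
```

For example, `files_identical("out_x86.txt", "out_arm.txt")` returning 1 would indicate that the canceller arithmetic matches on both CPUs, which points away from an endianness or compiler problem and towards configuration or the driver path.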