From: David S. M. <da...@da...> - 2006-01-30 23:03:55

From: Stuart Brady <sd...@nt...>
Date: Mon, 30 Jan 2006 19:50:04 +0000

> Shame about popc on SPARC. However, ffz is cheese, regardless of pops.
> (On sparc64, ffs is too.) I'll wait for the generic bitops patches to
> be dealt with (or not) and then submit a patch fixing this if needed.

I'm happy with any improvement you might make here, for sure.

The sparc64 ffz() implementation was done so dog stupid like that so
that the code would be small, since this gets inlined all over the
place. So if you can keep it small and improve it, or make it a bit
larger and uninline it, that's great.
From: Stuart B. <sd...@nt...> - 2006-01-30 19:49:19

On Mon, Jan 30, 2006 at 05:06:47PM +0000, Ralf Baechle wrote:
> On Sun, Jan 29, 2006 at 07:12:42AM +0000, Stuart Brady wrote:
>
> > On MIPS, fls() and flz() should probably use CLO.
>
> It actually uses clz.

I know. flz(x) is basically __ilog2(~x), and I still say clo would be
better. Removing flz() sounds reasonable, though.

> > Curiously, MIPS is the only arch with a flz() function.
>
> No longer. The fls implementation was based on flz and fls was the only
> user of flz. So I cleaned that, once I commit flz will be gone. Not
> only a cleanup but also a minor optimization.

I'd got that slightly wrong. Yeah, fls(x) returned flz(~x) + 1, which is
__ilog2(~~x) + 1. So obviously clz was fine for that, but it needed
cleaning up.

Shame about popc on SPARC. However, ffz is cheese, regardless of pops.
(On sparc64, ffs is too.) I'll wait for the generic bitops patches to
be dealt with (or not) and then submit a patch fixing this if needed.

Thanks,
-- 
Stuart Brady

By the way, I really hope nobody gets ten copies of this, as happened
with my last post. It does not seem to be my fault, AFAICS.
From: Ralf B. <ra...@li...> - 2006-01-30 17:12:29

On Sun, Jan 29, 2006 at 07:12:42AM +0000, Stuart Brady wrote:
> On MIPS, fls() and flz() should probably use CLO.

It actually uses clz.

> Curiously, MIPS is the only arch with a flz() function.

No longer. The fls implementation was based on flz and fls was the only
user of flz. So I cleaned that; once I commit, flz will be gone. Not
only a cleanup but also a minor optimization.

  Ralf
From: David S. M. <da...@da...> - 2006-01-30 04:03:44

From: Stuart Brady <sd...@nt...>
Date: Sun, 29 Jan 2006 07:12:42 +0000

> There are versions of hweight*() for sparc64 which use POPC when
> ULTRA_HAS_POPULATION_COUNT is defined, but AFAICS, it's never defined.

That's right. The problem here is that no chips actually implement the
instruction in hardware; it is software emulated, ie. useless :-)
From: <mi...@mi...> - 2006-01-30 03:29:38

On Fri, Jan 27, 2006 at 09:51:47PM +0900, Hirokazu Takata wrote:
> Could you tell me more about the new generic {set,clear,test}_bit()
> routines?
>
> Why did you copy these routines from parisc and employ them
> as generic ones?
> I'm not sure whether these generic {set,clear,test}_bit() routines
> are really generic or not.

I think it is the most portable implementation. And I'm trying not to
write my own code in this patch set.

> > +/* Can't use raw_spin_lock_irq because of #include problems, so
> > + * this is the substitute */
> > +#define _atomic_spin_lock_irqsave(l,f) do {	\
> > +	raw_spinlock_t *s = ATOMIC_HASH(l);	\
> > +	local_irq_save(f);			\
> > +	__raw_spin_lock(s);			\
> > +} while(0)
> > +
> > +#define _atomic_spin_unlock_irqrestore(l,f) do {	\
> > +	raw_spinlock_t *s = ATOMIC_HASH(l);		\
> > +	__raw_spin_unlock(s);				\
> > +	local_irq_restore(f);				\
> > +} while(0)
>
> Is there a possibility that these routines affect archs
> with no HAVE_ARCH_ATOMIC_BITOPS for SMP?

Currently there is no architecture using these atomic *_bit() routines
on SMP. But they may be of benefit to those who are trying to port
Linux. (See the comment by Theodore Ts'o in
include/asm-generic/bitops.h)

> I think __raw_spin_lock() is sufficient and local_irq_save() is
> not necessary in general atomic routines.

If the interrupt handler also wants to do bit manipulation, then you
can get a deadlock between the original caller of *_bit() and the
interrupt handler.
From: <mi...@mi...> - 2006-01-30 03:15:32

On Fri, Jan 27, 2006 at 10:04:01PM +0900, Hirokazu Takata wrote:
> compile and boot test on m32r: OK

Thanks.

> Code size became a little bigger... ;-)
>
> $ size linux-2.6.16-rc1*/vmlinux
>    text    data     bss      dec     hex filename
> 1768030  124412  721632  2614074  27e33a linux-2.6.16-rc1.bitops/vmlinux
> 1755010  124412  721632  2601054  27b05e linux-2.6.16-rc1.org/vmlinux

The only difference I can find is __ffs()/ffz(). As Russell King
clearly pointed out, it will generate larger code than before.
From: Stuart B. <sd...@nt...> - 2006-01-29 07:11:56

On Thu, Jan 26, 2006 at 11:03:54PM +0000, Russell King wrote:
> Me too - already solved this problem once. However, I'd rather not
> needlessly take a step backwards in the name of generic bitops.

Indeed. However, I think we can actually improve bitops for some
architectures. Here's what I've found so far:

Versions of Alpha, ARM, MIPS, PowerPC and SPARC have bit counting
instructions which we're using in most cases. I may have missed some:

Alpha may have:      ctlz, CounT Leading Zeros
                     cttz, CounT Trailing Zeros
ARM (since v5) has:  clz, Count Leading Zeros
MIPS may have:       clz, Count Leading Zeros
                     clo, Count Leading Ones
PowerPC has:         cntlz[wd], CouNT Leading Zeros (for Word/Double-word)
SPARC v9 has:        popc, POPulation Count

PA-RISC has none. I've not checked any others.

The Alpha, ARM and PowerPC functions look fine to me. On MIPS, fls()
and flz() should probably use CLO. Curiously, MIPS is the only arch
with a flz() function.

On SPARC, the implementation of ffz() appears to be "cheese", and the
proposed generic versions would be better. ffs() looks quite generic,
and fls() uses the linux/bitops.h implementation. There are versions
of hweight*() for sparc64 which use POPC when
ULTRA_HAS_POPULATION_COUNT is defined, but AFAICS, it's never defined.

The SPARC v9 arch manual recommends using popc(x ^ ~-x) for functions
like ffs(). ffz() would return ffs(~x). I've had an idea for fls():

static inline int fls(unsigned long x)
{
	x |= x >> 1;
	x |= x >> 2;
	x |= x >> 4;
	x |= x >> 8;
	x |= x >> 16;
	return popc(x);
}

I'm not sure how that compares to the generic fls(), but I suspect
it's quite a bit faster. Unfortunately, I don't have any MIPS or
SPARC v9 hardware to test this on.

I'm not sure if this is of any use:

static inline int __ffs(unsigned long x)
{
	return (int)hweight_long(x ^ ~-x) - 1;
}

The idea being that the generic hweight_long has no branches.
-- 
Stuart Brady
From: Hirokazu T. <ta...@li...> - 2006-01-27 13:04:17

From: mi...@mi... (Akinobu Mita)
Subject: Re: [PATCH 4/6] use include/asm-generic/bitops for each architecture
Date: Thu, 26 Jan 2006 10:49:34 +0900

> On Wed, Jan 25, 2006 at 08:33:37PM +0900, mita wrote:
> > compile test on i386, x86_64, ppc, sparc, sparc64, alpha
> > boot test on i386, x86_64, ppc
...
> o m32r
>   - remove __{,test_and_}{set,clear,change}_bit() and test_bit()
>   - remove ffz()
>   - remove find_{next,first}{,_zero}_bit()
>   - remove __ffs()
>   - remove fls()
>   - remove fls64()
>   - remove sched_find_first_bit()
>   - remove ffs()
>   - remove hweight()
>   - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
>   - remove ext2_{set,clear}_bit_atomic()
>   - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit()
>   - define HAVE_ARCH_ATOMIC_BITOPS

compile and boot test on m32r: OK

Code size became a little bigger... ;-)

$ size linux-2.6.16-rc1*/vmlinux
   text    data     bss      dec     hex filename
1768030  124412  721632  2614074  27e33a linux-2.6.16-rc1.bitops/vmlinux
1755010  124412  721632  2601054  27b05e linux-2.6.16-rc1.org/vmlinux

-- Takata
From: Hirokazu T. <ta...@li...> - 2006-01-27 12:52:09

Hello Mita-san, and folks,

From: mi...@mi... (Akinobu Mita)
Subject: [PATCH 3/6] C-language equivalents of include/asm-*/bitops.h
Date: Wed, 25 Jan 2006 20:32:06 +0900

> o generic {,test_and_}{set,clear,change}_bit() (atomic bitops)
>
> This patch introduces the C-language equivalents of the functions below:
> void set_bit(int nr, volatile unsigned long *addr);
> void clear_bit(int nr, volatile unsigned long *addr);
...
> int test_and_change_bit(int nr, volatile unsigned long *addr);
>
> HAVE_ARCH_ATOMIC_BITOPS is defined when the architecture has its own
> version of these functions.
>
> This code largely copied from:
> include/asm-powerpc/bitops.h
> include/asm-parisc/bitops.h
> include/asm-parisc/atomic.h

Could you tell me more about the new generic {set,clear,test}_bit()
routines?

Why did you copy these routines from parisc and employ them as generic
ones? I'm not sure whether these generic {set,clear,test}_bit()
routines are really generic or not.

> +/* Can't use raw_spin_lock_irq because of #include problems, so
> + * this is the substitute */
> +#define _atomic_spin_lock_irqsave(l,f) do {	\
> +	raw_spinlock_t *s = ATOMIC_HASH(l);	\
> +	local_irq_save(f);			\
> +	__raw_spin_lock(s);			\
> +} while(0)
> +
> +#define _atomic_spin_unlock_irqrestore(l,f) do {	\
> +	raw_spinlock_t *s = ATOMIC_HASH(l);		\
> +	__raw_spin_unlock(s);				\
> +	local_irq_restore(f);				\
> +} while(0)

Is there a possibility that these routines affect archs with no
HAVE_ARCH_ATOMIC_BITOPS for SMP?

I think __raw_spin_lock() is sufficient and local_irq_save() is not
necessary in general atomic routines. If the parisc LDCW instruction
required disabling interrupts, it would be parisc-specific and not the
generic case, I think, although I'm not familiar with the parisc
architecture...

-- Takata
From: John D. A. <da...@hi...> - 2006-01-27 00:29:43

> Yeah, about the same for parisc.
>
> > Clearly the smallest of the lot with the smallest register pressure,
> > being the best candidate out of the lot, whether we inline it or not.
>
> Agreed. But I expect parisc will have to continue using its asm
> sequence and ignore the generic version. AFAIK, the compiler isn't that
> good with instruction nullification and I have other issues I'd
> rather work on.

I looked at the assembler code generated on parisc with 4.1.0
(prerelease). The toernig code is definitely inferior. The mita
sequence is four instructions longer than the arm sequence, but it
didn't have any branches. The arm sequence has four branches. Thus,
it's not clear to me which would perform better in the real world.
There were no nullified instructions generated for any of the
sequences.

However, neither is as good as the handcrafted asm sequence currently
being used.

Dave
-- 
J. David Anglin                                  dav...@nr...
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)
From: Russell K. <rmk...@ar...> - 2006-01-26 23:04:52

On Thu, Jan 26, 2006 at 04:04:43PM -0700, Grant Grundler wrote:
> On Thu, Jan 26, 2006 at 04:40:21PM +0000, Russell King wrote:
> > Ok, I can see I'm going to lose this, but what the hell.
>
> Well, we agree. As Richard Henderson just pointed out, parisc
> is among those that can't load large immediate values either.
>
> > Let's compare the implementations, which are:
> ...
> > int arm_ffs(unsigned long word)
> > {
> > 	int k = 31;
> > 	if (word & 0x0000ffff) { k -= 16; word <<= 16; }
> > 	if (word & 0x00ff0000) { k -= 8;  word <<= 8;  }
> > 	if (word & 0x0f000000) { k -= 4;  word <<= 4;  }
> > 	if (word & 0x30000000) { k -= 2;  word <<= 2;  }
> > 	if (word & 0x40000000) { k -= 1; }
> > 	return k;
> > }
>
> Of those suggested, arm_ffs() is closest to what parisc
> currently has in assembly (see include/asm-parisc/bitops.h:__ffs()).
> But given how unobvious the parisc instruction nullification works,
> the rough equivalent in "C" (untested!) would look something like:
>
> 	unsigned int k = 31;
> 	if (word & 0x0000ffff) { k -= 16; } else { word >>= 16; }
> 	if (word & 0x000000ff) { k -= 8;  } else { word >>= 8;  }
> 	if (word & 0x0000000f) { k -= 4;  } else { word >>= 4;  }
> 	if (word & 0x00000003) { k -= 2;  } else { word >>= 2;  }
> 	if (word & 0x00000001) { k -= 1; }
> 	return k;
>
> I doubt that's better for arm but am curious how it compares.
> You have time to try it?

This is essentially the same as arm_ffs():

grundler_ffs:
	mov	r3, r0, asl #16
	mov	r3, r3, lsr #16
	cmp	r3, #0
	moveq	r0, r0, lsr #16
	mov	r3, #31
	movne	r3, #15
	tst	r0, #255
	moveq	r0, r0, lsr #8
	subne	r3, r3, #8
	tst	r0, #15
	moveq	r0, r0, lsr #4
	subne	r3, r3, #4
	tst	r0, #3
	moveq	r0, r0, lsr #2
	subne	r3, r3, #2
	tst	r0, #1
	subne	r3, r3, #1
	mov	r0, r3
	mov	pc, lr

only that the shifts, immediate values and the sense of some of the
conditional instructions have changed. Therefore, the parisc rough
equivalent looks like it would be suitable for ARM as well.

> > Clearly the smallest of the lot with the smallest register pressure,
> > being the best candidate out of the lot, whether we inline it or not.
>
> Agreed. But I expect parisc will have to continue using its asm
> sequence and ignore the generic version. AFAIK, the compiler isn't that
> good with instruction nullification and I have other issues I'd
> rather work on.

Me too - already solved this problem once. However, I'd rather not
needlessly take a step backwards in the name of generic bitops.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core
From: Grant G. <gru...@pa...> - 2006-01-26 22:55:19

On Thu, Jan 26, 2006 at 04:40:21PM +0000, Russell King wrote:
> Ok, I can see I'm going to lose this, but what the hell.

Well, we agree. As Richard Henderson just pointed out, parisc
is among those that can't load large immediate values either.

> Let's compare the implementations, which are:
...
> int arm_ffs(unsigned long word)
> {
> 	int k = 31;
> 	if (word & 0x0000ffff) { k -= 16; word <<= 16; }
> 	if (word & 0x00ff0000) { k -= 8;  word <<= 8;  }
> 	if (word & 0x0f000000) { k -= 4;  word <<= 4;  }
> 	if (word & 0x30000000) { k -= 2;  word <<= 2;  }
> 	if (word & 0x40000000) { k -= 1; }
> 	return k;
> }

Of those suggested, arm_ffs() is closest to what parisc currently has
in assembly (see include/asm-parisc/bitops.h:__ffs()). But given how
unobvious parisc instruction nullification is, the rough equivalent in
"C" (untested!) would look something like:

	unsigned int k = 31;
	if (word & 0x0000ffff) { k -= 16; } else { word >>= 16; }
	if (word & 0x000000ff) { k -= 8;  } else { word >>= 8;  }
	if (word & 0x0000000f) { k -= 4;  } else { word >>= 4;  }
	if (word & 0x00000003) { k -= 2;  } else { word >>= 2;  }
	if (word & 0x00000001) { k -= 1; }
	return k;

I doubt that's better for arm but am curious how it compares.
You have time to try it? If not, no worries.

> 19 instructions. 2 registers. 0 register based shifts. More reasonable
> for inlining.

Yeah, about the same for parisc.

> Clearly the smallest of the lot with the smallest register pressure,
> being the best candidate out of the lot, whether we inline it or not.

Agreed. But I expect parisc will have to continue using its asm
sequence and ignore the generic version. AFAIK, the compiler isn't that
good with instruction nullification and I have other issues I'd
rather work on.

cheers,
grant
From: Paul J. <pj...@sg...> - 2006-01-26 19:15:23

Pavel wrote:
> cpu_set sounds *very* ambiguous. We have thing called cpusets,

Hmmm ... you're right. I've worked for quite some time on both of
these, and hadn't noticed this similarity before. Oh well. Such is the
nature of naming things. Sometimes nice names resemble other nice
names in unexpected ways.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj...@sg...> 1.925.600.0401
From: Richard H. <rt...@tw...> - 2006-01-26 17:31:15

On Thu, Jan 26, 2006 at 05:34:12AM +0100, Edgar Toernig wrote:
> Why shift at all?

Because that *is* a valid architecture tuning knob. Most risc machines
can't AND with arbitrary constants like that, and loading the constant
might bulk things up more than just using the shift.

r~
From: Russell K. <rmk...@ar...> - 2006-01-26 16:48:23

On Thu, Jan 26, 2006 at 05:14:27PM +0100, Pavel Machek wrote:
> > Index: 2.6-git/include/asm-x86_64/mmu_context.h
> > ===================================================================
> > --- 2.6-git.orig/include/asm-x86_64/mmu_context.h	2006-01-25 19:07:15.000000000 +0900
> > +++ 2.6-git/include/asm-x86_64/mmu_context.h	2006-01-25 19:13:59.000000000 +0900
> > @@ -34,12 +34,12 @@
> >  	unsigned cpu = smp_processor_id();
> >  	if (likely(prev != next)) {
> >  		/* stop flush ipis for the previous mm */
> > -		clear_bit(cpu, &prev->cpu_vm_mask);
> > +		cpu_clear(cpu, prev->cpu_vm_mask);
> >  #ifdef CONFIG_SMP
> >  		write_pda(mmu_state, TLBSTATE_OK);
> >  		write_pda(active_mm, next);
> >  #endif
> > -		set_bit(cpu, &next->cpu_vm_mask);
> > +		cpu_set(cpu, next->cpu_vm_mask);
> >  		load_cr3(next->pgd);
> >
> >  		if (unlikely(next->context.ldt != prev->context.ldt))
>
> cpu_set sounds *very* ambiguous. We have thing called cpusets, for
> example. I'd not guess that is set_bit in cpu endianity (is it?).

That's a problem for the cpusets folk - cpu_set predates them by a
fair time - it's part of the cpumask API. See include/linux/cpumask.h

Also, since cpu_vm_mask is a cpumask_t, the above change to me looks
like a bug fix in its own right.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core
From: Russell K. <rmk...@ar...> - 2006-01-26 16:41:00

On Thu, Jan 26, 2006 at 09:18:49AM -0700, Grant Grundler wrote:
> On Thu, Jan 26, 2006 at 08:55:41AM +0000, Russell King wrote:
> > Unfortunately that's not correct. You do not appear to have checked
> > the compiler output like I did - this code does _not_ generate
> > constant shifts.
>
> Russell,
> By "written stupidly", I thought Richard meant they could have
> used constants instead of "s". e.g.:
> 	if (word << 16 == 0) { b += 16; word >>= 16; }
> 	if (word << 24 == 0) { b += 8; word >>= 8; }
> 	if (word << 28 == 0) { b += 4; word >>= 4; }
>
> But I prefer what Edgar Toernig suggested.

Ok, I can see I'm going to lose this, but what the hell.

Firstly though, an out of line function call on ARM clobbers six out
of 11 CPU registers.

Let's compare the implementations, which are:

int toernig_ffs(unsigned long word)
{
	int bit = 0;
	word &= -word;			// only keep the lsb.
	if (word & 0xffff0000) bit |= 16;
	if (word & 0xff00ff00) bit |= 8;
	if (word & 0xf0f0f0f0) bit |= 4;
	if (word & 0xcccccccc) bit |= 2;
	if (word & 0xaaaaaaaa) bit |= 1;
	return bit;
}

toernig_ffs:
	rsb	r3, r0, #0
	and	r0, r0, r3
	mov	r3, r0, lsr #16
	bic	r2, r0, #16711680
	str	lr, [sp, #-4]!
	mov	r3, r3, asl #16
	ldr	lr, .L7
	ldr	r1, .L7+4
	ldr	ip, .L7+8
	cmp	r3, #0
	bic	r2, r2, #255
	and	lr, r0, lr
	and	r1, r0, r1
	and	ip, r0, ip
	movne	r0, #16
	moveq	r0, #0
	cmp	r2, #0
	orrne	r0, r0, #8
	cmp	r1, #0
	orrne	r0, r0, #4
	cmp	ip, #0
	orrne	r0, r0, #2
	cmp	lr, #0
	orrne	r0, r0, #1
	ldr	pc, [sp], #4
.L8:
	.align	2
.L7:
	.word	-1431655766
	.word	-252645136
	.word	-858993460

25 instructions. 3 words of additional data. 5 registers. 0 register
based shifts. I feel that this is far too expensive to sanely inline -
at least three words of additional data for a use in a function, and
it has a register usage comparable to that of an out of line function.

int mita_ffs(unsigned long word)
{
	int b = 0, s;

	s = 16; if (word << 16 != 0) s = 0; b += s; word >>= s;
	s =  8; if (word << 24 != 0) s = 0; b += s; word >>= s;
	s =  4; if (word << 28 != 0) s = 0; b += s; word >>= s;
	s =  2; if (word << 30 != 0) s = 0; b += s; word >>= s;
	s =  1; if (word << 31 != 0) s = 0; b += s;
	return b;
}

mita_ffs:
	movs	r1, r0, asl #16
	moveq	r2, #16
	movne	r2, #0
	mov	r0, r0, lsr r2		@ register-based shift
	mov	r3, r2
	movs	r2, r0, asl #24
	moveq	r2, #8
	movne	r2, #0
	mov	r0, r0, lsr r2		@ register-based shift
	movs	r1, r0, asl #28
	add	r3, r3, r2
	moveq	r2, #4
	movne	r2, #0
	mov	r0, r0, lsr r2		@ register-based shift
	movs	r1, r0, asl #30
	add	r3, r3, r2
	moveq	r2, #2
	movne	r2, #0
	mov	r0, r0, lsr r2		@ register-based shift
	tst	r0, #1
	add	r3, r3, r2
	moveq	r2, #1
	movne	r2, #0
	add	r3, r3, r2
	mov	r0, r3
	mov	pc, lr

26 instructions. 4 registers used. 4 unconditional register-based
shifts (expensive). Better, but uses inefficient register based shifts
(which can take twice as many cycles as non-register based shifts,
depending on the CPU). Still has a high usage of CPU registers though.
Could possibly be a candidate for inlining.

int arm_ffs(unsigned long word)
{
	int k = 31;
	if (word & 0x0000ffff) { k -= 16; word <<= 16; }
	if (word & 0x00ff0000) { k -= 8;  word <<= 8;  }
	if (word & 0x0f000000) { k -= 4;  word <<= 4;  }
	if (word & 0x30000000) { k -= 2;  word <<= 2;  }
	if (word & 0x40000000) { k -= 1; }
	return k;
}

arm_ffs:
	mov	r3, r0, asl #16
	mov	r3, r3, lsr #16
	cmp	r3, #0
	movne	r0, r0, asl #16
	mov	r3, #31
	movne	r3, #15
	tst	r0, #16711680
	movne	r0, r0, asl #8
	subne	r3, r3, #8
	tst	r0, #251658240
	movne	r0, r0, asl #4
	subne	r3, r3, #4
	tst	r0, #805306368
	movne	r0, r0, asl #2
	subne	r3, r3, #2
	tst	r0, #1073741824
	subne	r3, r3, #1
	mov	r0, r3
	mov	pc, lr

19 instructions. 2 registers. 0 register based shifts. More reasonable
for inlining.

Clearly the smallest of the lot with the smallest register pressure,
being the best candidate out of the lot, whether we inline it or not.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core
From: Nicolas P. <ni...@ca...> - 2006-01-26 16:30:58

On Thu, 26 Jan 2006, Grant Grundler wrote:
> On Thu, Jan 26, 2006 at 08:55:41AM +0000, Russell King wrote:
> > Unfortunately that's not correct. You do not appear to have checked
> > the compiler output like I did - this code does _not_ generate
> > constant shifts.
>
> Russell,
> By "written stupidly", I thought Richard meant they could have
> used constants instead of "s". e.g.:
> 	if (word << 16 == 0) { b += 16; word >>= 16; }
> 	if (word << 24 == 0) { b += 8; word >>= 8; }
> 	if (word << 28 == 0) { b += 4; word >>= 4; }
>
> But I prefer what Edgar Toernig suggested.

It is just as bad on ARM since it requires large constants that cannot
be expressed with immediate literal values. The constant shift
approach is really the best on ARM.

Nicolas
From: Pavel M. <pa...@su...> - 2006-01-26 16:15:25

Hi!

> While working on this patch set, I found several possible cleanups
> on x86-64 and ia64.

It is probably not your fault, but...

> Index: 2.6-git/include/asm-x86_64/mmu_context.h
> ===================================================================
> --- 2.6-git.orig/include/asm-x86_64/mmu_context.h	2006-01-25 19:07:15.000000000 +0900
> +++ 2.6-git/include/asm-x86_64/mmu_context.h	2006-01-25 19:13:59.000000000 +0900
> @@ -34,12 +34,12 @@
>  	unsigned cpu = smp_processor_id();
>  	if (likely(prev != next)) {
>  		/* stop flush ipis for the previous mm */
> -		clear_bit(cpu, &prev->cpu_vm_mask);
> +		cpu_clear(cpu, prev->cpu_vm_mask);
>  #ifdef CONFIG_SMP
>  		write_pda(mmu_state, TLBSTATE_OK);
>  		write_pda(active_mm, next);
>  #endif
> -		set_bit(cpu, &next->cpu_vm_mask);
> +		cpu_set(cpu, next->cpu_vm_mask);
>  		load_cr3(next->pgd);
>
>  		if (unlikely(next->context.ldt != prev->context.ldt))

cpu_set sounds *very* ambiguous. We have a thing called cpusets, for
example. I'd not guess that it is set_bit in cpu endianness (is it?).

Pavel
-- 
Thanks, Sharp!
From: Grant G. <gru...@pa...> - 2006-01-26 16:09:28

On Thu, Jan 26, 2006 at 08:55:41AM +0000, Russell King wrote:
> Unfortunately that's not correct. You do not appear to have checked
> the compiler output like I did - this code does _not_ generate
> constant shifts.

Russell,
By "written stupidly", I thought Richard meant they could have
used constants instead of "s". e.g.:

	if (word << 16 == 0) { b += 16; word >>= 16; }
	if (word << 24 == 0) { b += 8; word >>= 8; }
	if (word << 28 == 0) { b += 4; word >>= 4; }

But I prefer what Edgar Toernig suggested.

grant
From: Paul M. <le...@li...> - 2006-01-26 15:50:51

gcc 3.2 for sh64 doesn't appear to be able to build the kernel anymore:

fs/xattr.c: In function `listxattr':
fs/xattr.c:367: unrecognizable insn:
(insn 700 697 462 (set (subreg:DI (reg:SI 236) 0)
        (reg:DI 239)) -1 (nil)
    (nil))
fs/xattr.c:367: Internal compiler error in get_attr_highpart, at insn-attrtab.c:6211
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://www.gnu.org/software/gcc/bugs.html> for instructions.
make[1]: *** [fs/xattr.o] Error 1
make: *** [fs] Error 2

What other toolchains are people using now? It looks like binutils
isn't in too bad shape now, and a C-targetted version of GCC was
buildable not too long ago.
From: Russell K. <rmk...@ar...> - 2006-01-26 08:56:17

On Wed, Jan 25, 2006 at 04:06:18PM -0800, Richard Henderson wrote:
> On Wed, Jan 25, 2006 at 08:02:50PM +0000, Russell King wrote:
> > > +	s = 16; if (word << 16 != 0) s = 0; b += s; word >>= s;
> > > +	s =  8; if (word << 24 != 0) s = 0; b += s; word >>= s;
> > > +	s =  4; if (word << 28 != 0) s = 0; b += s; word >>= s;
> ...
> > Basically, shifts which depend on a variable are more expensive than
> > constant-based shifts.
>
> Actually, they're all constant shifts. Just written stupidly.

Unfortunately that's not correct. You do not appear to have checked
the compiler output like I did - this code does _not_ generate
constant shifts.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core
From: Edgar T. <fr...@gm...> - 2006-01-26 04:34:37

Richard Henderson wrote:
>
> On Wed, Jan 25, 2006 at 08:02:50PM +0000, Russell King wrote:
> > > +	s = 16; if (word << 16 != 0) s = 0; b += s; word >>= s;
> > > +	s =  8; if (word << 24 != 0) s = 0; b += s; word >>= s;
> > > +	s =  4; if (word << 28 != 0) s = 0; b += s; word >>= s;
> ...
> > Basically, shifts which depend on a variable are more expensive than
> > constant-based shifts.
>
> Actually, they're all constant shifts. Just written stupidly.

Why shift at all?

int ffs(u32 word)
{
	int bit = 0;

	word &= -word;	// only keep the lsb.

	if (word & 0xffff0000) bit |= 16;
	if (word & 0xff00ff00) bit |= 8;
	if (word & 0xf0f0f0f0) bit |= 4;
	if (word & 0xcccccccc) bit |= 2;
	if (word & 0xaaaaaaaa) bit |= 1;

	return bit;
}

Ciao, ET.
From: <mi...@mi...> - 2006-01-26 02:13:16

On Wed, Jan 25, 2006 at 10:54:43PM +1100, Keith Owens wrote:
> Be very, very careful about using these generic *_bit() routines if the
> architecture supports non-maskable interrupts.
>
> NMI events can occur at any time, including when interrupts have been
> disabled by *_irqsave(). So you can get NMI events occurring while a
> *_bit function is holding a spin lock. If the NMI handler also wants
> to do bit manipulation (and they do) then you can get a deadlock
> between the original caller of *_bit() and the NMI handler.
>
> Doing any work that requires spinlocks in an NMI handler is just asking
> for deadlock problems. The generic *_bit() routines add a hidden
> spinlock behind what was previously a safe operation. I would even say
> that any arch that supports any type of NMI event _must_ define its own
> bit routines that do not rely on your _atomic_spin_lock_irqsave() and
> its hash of spinlocks.

At least cris and parisc are using similar *_bit() functions on SMP.
I will add your advice as a comment:

--- ./include/asm-generic/bitops.h.orig	2006-01-26 10:56:00.000000000 +0900
+++ ./include/asm-generic/bitops.h	2006-01-26 11:01:28.000000000 +0900
@@ -50,6 +50,16 @@ extern raw_spinlock_t __atomic_hash[ATOM
  * C language equivalents written by Theodore Ts'o, 9/26/92
  */
 
+/*
+ * NMI events can occur at any time, including when interrupts have been
+ * disabled by *_irqsave().  So you can get NMI events occurring while a
+ * *_bit function is holding a spin lock.  If the NMI handler also wants
+ * to do bit manipulation (and they do) then you can get a deadlock
+ * between the original caller of *_bit() and the NMI handler.
+ *
+ * by Keith Owens
+ */
+
 static __inline__ void set_bit(int nr, volatile unsigned long *addr)
 {
 	unsigned long mask = BITOP_MASK(nr);
From: <mi...@mi...> - 2006-01-26 01:49:33

On Wed, Jan 25, 2006 at 08:33:37PM +0900, mita wrote:
> compile test on i386, x86_64, ppc, sparc, sparc64, alpha
> boot test on i386, x86_64, ppc

I had forgotten to attach the changes for each architecture.

o alpha
  - remove __{,test_and_}{set,clear,change}_bit() and test_bit()
  - define HAVE_ARCH_ATOMIC_BITOPS
  - define HAVE_ARCH_FFZ_BITOPS
  - define HAVE_ARCH___FFS_BITOPS
  - define HAVE_ARCH_FFS_BITOPS
  - if defined(__alpha_cix__) and defined(__alpha_fix__)
    - define HAVE_ARCH_FLS_BITOPS
    - define HAVE_ARCH_HWEIGHT_BITOPS
    - define HAVE_ARCH_HWEIGHT64_BITOPS
  - else
    - remove fls()
    - remove hweight64()
    - remove hweight{32,16,8}()
  - remove fls64()
  - remove find_{next,first}{,_zero}_bit()
  - define HAVE_ARCH_SCHED_BITOPS
  - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
  - define HAVE_ARCH_EXT2_ATOMIC_BITOPS
  - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit()

o arm
  - remove __{,test_and_}{set,clear,change}_bit() and test_bit()
  - define HAVE_ARCH_ATOMIC_BITOPS
  - define HAVE_ARCH_FIND_BITOPS
  - if __LINUX_ARM_ARCH__ < 5
    - remove ffz()
    - remove __ffs()
    - remove fls()
    - remove ffs()
  - else (__LINUX_ARM_ARCH__ >= 5)
    - define HAVE_ARCH_FLS_BITOPS
    - define HAVE_ARCH_FFS_BITOPS
    - define HAVE_ARCH___FFS_BITOPS
    - define HAVE_ARCH_FFZ_BITOPS
  - remove fls64()
  - remove hweight64()
  - remove hweight{32,16,8}()
  - remove sched_find_first_bit()
  - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
  - define HAVE_ARCH_EXT2_ATOMIC_BITOPS
  - remove HAVE_ARCH_MINIX_BITOPS

o arm26
  - remove __{,test_and_}{set,clear,change}_bit() and test_bit()
  - define HAVE_ARCH_ATOMIC_BITOPS
  - define HAVE_ARCH_FIND_BITOPS
  - remove ffz()
  - remove __ffs()
  - remove fls()
  - remove fls64()
  - remove ffs()
  - remove sched_find_first_bit()
  - remove hweight{32,16,8}()
  - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
  - define HAVE_ARCH_EXT2_ATOMIC_BITOPS
  - define HAVE_MINIX_BITOPS

o cris
  - remove __{,test_and_}{set,clear,change}_bit() and test_bit()
  - define HAVE_ARCH_ATOMIC_BITOPS
  - remove fls()
  - remove fls64()
  - remove hweight{32,16,8}()
  - remove find_{next,first}{,_zero}_bit()
  - define HAVE_ARCH_FFS_BITOPS
  - remove sched_find_first_bit()
  - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
  - define HAVE_ARCH_EXT2_ATOMIC_BITOPS
  - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit()

o frv
  - remove ffz()
  - define HAVE_ARCH_ATOMIC_BITOPS
  - define HAVE_ARCH_NON_ATOMIC_BITOPS
  - remove find_{next,first}{,_zero}_bit()
  - remove ffs()
  - remove __ffs()
  - remove fls64()
  - remove sched_find_first_bit()
  - remove hweight{32,16,8}()
  - define HAVE_ARCH_FLS_BITOPS
  - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
  - define HAVE_ARCH_EXT2_ATOMIC_BITOPS
  - define HAVE_ARCH_MINIX_BITOPS

o h8300
  - define HAVE_ARCH_FFZ_BITOPS
  - define HAVE_ARCH_ATOMIC_BITOPS
  - define HAVE_ARCH_NON_ATOMIC_BITOPS
  - remove ffs()
  - remove find_{next,first}{,_zero}_bit()
  - remove sched_find_first_bit()
  - remove hweight{32,16,8}()
  - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
  - remove ext2_{set,clear}_bit_atomic()
  - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit()
  - define HAVE_ARCH___FFS_BITOPS
  - remove fls()
  - remove fls64()

o i386
  - define HAVE_ARCH_ATOMIC_BITOPS
  - define HAVE_ARCH_NON_ATOMIC_BITOPS
  - define HAVE_ARCH_FIND_BITOPS
  - define HAVE_ARCH___FFS_BITOPS
  - define HAVE_ARCH_FFZ_BITOPS
  - remove fls64()
  - remove sched_find_first_bit()
  - define HAVE_ARCH_FFS_BITOPS
  - remove hweight{32,16,8}()
  - define HAVE_ARCH_FLS_BITOPS
  - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
  - define HAVE_ARCH_EXT2_ATOMIC_BITOPS
  - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit()

o ia64
  - remove __{,test_and_}{set,clear,change}_bit() and test_bit()
  - define HAVE_ARCH_ATOMIC_BITOPS
  - define HAVE_ARCH_FFZ_BITOPS
  - define HAVE_ARCH___FFS_BITOPS
  - remove fls64()
  - define HAVE_ARCH_FLS_BITOPS
  - define HAVE_ARCH_FFS_BITOPS
  - define HAVE_ARCH_HWEIGHT_BITOPS
  - define HAVE_ARCH_HWEIGHT64_BITOPS
  - define HAVE_ARCH_FIND_BITOPS
  - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
  - define HAVE_ARCH_EXT2_ATOMIC_BITOPS
  - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit()
  - remove sched_find_first_bit()

o m32r
  - remove __{,test_and_}{set,clear,change}_bit() and test_bit()
  - remove ffz()
  - remove find_{next,first}{,_zero}_bit()
  - remove __ffs()
  - remove fls()
  - remove fls64()
  - remove sched_find_first_bit()
  - remove ffs()
  - remove hweight()
  - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
  - remove ext2_{set,clear}_bit_atomic()
  - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit()
  - define HAVE_ARCH_ATOMIC_BITOPS

o m68k
  - define HAVE_ARCH_ATOMIC_BITOPS
  - define HAVE_ARCH_NON_ATOMIC_BITOPS
  - define HAVE_ARCH_FIND_BITOPS
  - define HAVE_ARCH_FFZ_BITOPS
  - define HAVE_ARCH_FFS_BITOPS
  - define HAVE_ARCH___FFS_BITOPS
  - remove fls64()
  - remove sched_find_first_bit()
  - remove ffs()
  - remove hweight()
  - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
  - define HAVE_ARCH_MINIX_BITOPS
  - define HAVE_ARCH_EXT2_ATOMIC_BITOPS

o m68knommu
  - remove ffs()
  - remove __ffs()
  - remove sched_find_first_bit()
  - remove ffz()
  - remove find_{next,first}{,_zero}_bit()
  - remove hweight()
  - define HAVE_ARCH_ATOMIC_BITOPS
  - define HAVE_ARCH_NON_ATOMIC_BITOPS
  - define HAVE_ARCH_EXT2_NON_ATOMIC_BITOPS
  - define HAVE_ARCH_EXT2_ATOMIC_BITOPS
  - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit()
  - remove fls()
  - remove fls64()

o mips
  - remove __{,test_and_}{set,clear,change}_bit() and test_bit()
  - define HAVE_ARCH_ATOMIC_BITOPS
  - if defined(CONFIG_CPU_MIPS32) or defined(CONFIG_CPU_MIPS64)
    - define HAVE_ARCH___FFS_BITOPS
    - define HAVE_ARCH_FFS_BITOPS
    - define HAVE_ARCH_FFZ_BITOPS
    - define HAVE_ARCH_FLS_BITOPS
  - else
    - remove __ffs()
    - remove ffs()
    - remove ffz()
    - remove fls()
  - remove fls64()
  - remove find_{next,first}{,_zero}_bit()
  - remove sched_find_first_bit()
  - remove hweight()
- remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() - remove ext2_{set,clear}_bit_atomic() - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit() o s390 - define HAVE_ARCH_ATOMIC_BITOPS - define HAVE_ARCH_NON_ATOMIC_BITOPS - define HAVE_ARCH_FFZ_BITOPS - define HAVE_ARCH___FFS_BITOPS - define HAVE_ARCH_FIND_BITOPS - remove ffs() - remove fls() - remove fls64() - remove hweight() - remove hweight64() - define HAVE_ARCH_SCHED_BITOPS - define HAVE_ARCH_EXT2_NON_ATOMIC_BITOPS - define HAVE_ARCH_EXT2_ATOMIC_BITOPS - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit() o sh - remove __{,test_and_}{set,clear,change}_bit() and test_bit() - define HAVE_ARCH_ATOMIC_BITOPS - define HAVE_ARCH_FFZ_BITOPS - define HAVE_ARCH___FFS_BITOPS - remove find_{next,first}{,_zero}_bit() - remove ffs() - remove hweight() - remove sched_find_first_bit() - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() - remove ext2_{set,clear}_bit_atomic() - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit() - remove fls() - remove fls64() o sh64 - remove __{,test_and_}{set,clear,change}_bit() and test_bit() - define HAVE_ARCH_ATOMIC_BITOPS - define HAVE_ARCH_FFZ_BITOPS - remove __ffs() - remove find_{next,first}{,_zero}_bit() - remove hweight() - remove sched_find_first_bit() - remove ffs() - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() - remove ext2_{set,clear}_bit_atomic() - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit() - remove fls() - remove fls64() o sparc - remove __{,test_and_}{set,clear,change}_bit() and test_bit() - define HAVE_ARCH_ATOMIC_BITOPS - remove ffz() - remove __ffs() - remove sched_find_first_bit() - remove ffs() - remove fls() - remove fls64() - remove hweight{32,16,8}() - remove find_{next,first}{,_zero}_bit() - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() - remove ext2_{set,clear}_bit_atomic() - remove 
minix_{test,set,test_and_clear,test,find_first_zero}_bit() o sparc64 - remove __{,test_and_}{set,clear,change}_bit() and test_bit() - define HAVE_ARCH_ATOMIC_BITOPS - remove ffz() - remove __ffs() - remove fls() - remove fls64() - remove sched_find_first_bit() - remove ffs() - if defined(ULTRA_HAS_POPULATION_COUNT) - define HAVE_ARCH_HWEIGHT64_BITOPS - define HAVE_ARCH_HWEIGHT_BITOPS - else - remove hweight64() - remove hweight{32,16,8}() - define HAVE_ARCH_FIND_BITOPS - define HAVE_ARCH_EXT2_ATOMIC_BITOPS - define HAVE_ARCH_EXT2_NON_ATOMIC_BITOPS - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit() o v850 - remove ffz() - define HAVE_ARCH_ATOMIC_BITOPS - define HAVE_ARCH_NON_ATOMIC_BITOPS - remove find_{next,first}{,_zero}_bit() - remove ffs() - remove fls() - remove fls64() - remove __ffs() - remove sched_find_first_bit() - remove hweight{32,16,8}() - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() - define HAVE_ARCH_EXT2_ATOMIC_BITOPS - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit() o x86_64 - define HAVE_ARCH_ATOMIC_BITOPS - define HAVE_ARCH_NON_ATOMIC_BITOPS - define HAVE_ARCH_FIND_BITOPS - define HAVE_ARCH_FFZ_BITOPS - define HAVE_ARCH___FFS_BITOPS - define HAVE_ARCH_FLS_BITOPS - remove sched_find_first_bit() - define HAVE_ARCH_FFS_BITOPS - define HAVE_ARCH_FLS64_BITOPS - remove hweight{32,16,8}() - define HAVE_ARCH_FLS_BITOPS - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() - define HAVE_ARCH_EXT2_ATOMIC_BITOPS - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit() o xtensa - remove {,test_and_}{set,clear,change}_bit() - remove __{,test_and_}{set,clear,change}_bit() and test_bit() - define HAVE_ARCH_FFZ_BITOPS - define HAVE_ARCH___FFS_BITOPS - define HAVE_ARCH_FFS_BITOPS - define HAVE_ARCH_FLS_BITOPS - remove fls64() - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() - define HAVE_ARCH_EXT2_ATOMIC_BITOPS - remove hweight{32,16,8}() - remove 
sched_find_first_bit() - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit() o remove unused generic bitops |
From: Richard H. <rt...@tw...> - 2006-01-26 00:07:00
|
On Wed, Jan 25, 2006 at 08:02:50PM +0000, Russell King wrote:
> > +	s = 16; if (word << 16 != 0) s = 0; b += s; word >>= s;
> > +	s = 8; if (word << 24 != 0) s = 0; b += s; word >>= s;
> > +	s = 4; if (word << 28 != 0) s = 0; b += s; word >>= s;
...
> Basically, shifts which depend on a variable are more expensive than
> constant-based shifts.

Actually, they're all constant shifts. Just written stupidly.


r~
|