Thread: Re: [Cbe-oss-dev] [RFC, PATCH] CELL Oprofile SPU profiling =?iso-8859-1?q?updated=09patch?=

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

On Thursday 15 February 2007 00:52, Carl Love wrote:

> --- linux-2.6.20-rc1.orig/arch/powerpc/oprofile/Kconfig	2007-01-18 16:43:14.000000000 -0600
> +++ linux-2.6.20-rc1/arch/powerpc/oprofile/Kconfig	2007-02-13 19:04:46.271028904 -0600
> @@ -7,7 +7,8 @@
>  
>  config OPROFILE
>  	tristate "OProfile system profiling (EXPERIMENTAL)"
> -	depends on PROFILING
> +	default m
> +	depends on SPU_FS && PROFILING
>  	help
>  	  OProfile is a profiling system capable of profiling the
>  	  whole system, include the kernel, kernel modules, libraries,

Milton already commented on this being wrong. I think what you want
is
	depends on PROFILING && (SPU_FS = n || SPU_FS)

that should make sure that when SPU_FS=y that OPROFILE can not be 'm'.

> @@ -15,3 +16,10 @@
>  
>  	  If unsure, say N.
>  
> +config OPROFILE_CELL
> +	bool "OProfile for Cell Broadband Engine"
> +	depends on SPU_FS && OPROFILE
> +	default y
> +	help
> +	  OProfile for Cell BE requires special support enabled
> +	  by this option.

You should at least mention that this allows profiling the spus.

> +#define EFWCALL  ENOSYS         /* Use an existing error number that is as
> +				 * close as possible for a FW call that failed.
> +				 * The probability of the call failing is
> +				 * very low.  Passing up the error number
> +				 * ensures that the user will see an error
> +				 * message saying OProfile did not start.
> +				 * Dmesg will contain an accurate message
> +				 * about the failure.
> +				 */

ENOSYS looks wrong though. It would appear to the user as if the oprofile
function in the kernel was not present. I'd suggest EIO, and not use 
an extra define for that.

>  static int
>  rtas_ibm_cbe_perftools(int subfunc, int passthru,
>  		       void *address, unsigned long length)
>  {
>  	u64 paddr = __pa(address);
>  
> -	return rtas_call(pm_rtas_token, 5, 1, NULL, subfunc, passthru,
> -			 paddr >> 32, paddr & 0xffffffff, length);
> +	pm_rtas_token = rtas_token("ibm,cbe-perftools");
> +
> +	if (unlikely(pm_rtas_token == RTAS_UNKNOWN_SERVICE)) {
> +		printk(KERN_ERR
> +		       "%s: rtas token ibm,cbe-perftools unknown\n",
> +		       __FUNCTION__);
> +		return -EFWCALL;
> +	} else {
> +
> +		return rtas_call(pm_rtas_token, 5, 1, NULL, subfunc, 
> +			 passthru, paddr >> 32, paddr & 0xffffffff, length); 
> +	}
>  }

Are you now reading the rtas token every time you call rtas? that seems
like a waste of time.

> +#define size 24
> +#define ENTRIES  (0x1<<8) /* 256 */
> +#define MAXLFSR  0xFFFFFF
> +
> +int initial_lfsr[] =
> +{16777215, 3797240, 13519805, 11602690, 6497030, 7614675, 2328937, 2889445,
> + 12364575, 8723156, 2450594, 16280864, 14742496, 10904589, 6434212, 4996256,
> + 5814270, 13014041, 9825245, 410260, 904096, 15151047, 15487695, 3061843,
> + 16482682, 7938572, 4893279, 9390321, 4320879, 5686402, 1711063, 10176714,
> + 4512270, 1057359, 16700434, 5731602, 2070114, 16030890, 1208230, 15603106,
> + 11857845, 6470172, 1362790, 7316876, 8534496, 1629197, 10003072, 1714539,
> + 1814669, 7106700, 5427154, 3395151, 3683327, 12950450, 16620273, 12122372,
> + 7194999, 9952750, 3608260, 13604295, 2266835, 14943567, 7079230, 777380,
> + 4516801, 1737661, 8730333, 13796927, 3247181, 9950017, 3481896, 16527555,
> + 13116123, 14505033, 9781119, 4860212, 7403253, 13264219, 12269980, 100120,
> + 664506, 607795, 8274553, 13133688, 6215305, 13208866, 16439693, 3320753,
> + 8773582, 13874619, 1784784, 4513501, 11002978, 9318515, 3038856, 14254582,
> + 15484958, 15967857, 13504461, 13657322, 14724513, 13955736, 5695315, 7330509,
> + 12630101, 6826854, 439712, 4609055, 13288878, 1309632, 4996398, 11392266,
> + 793740, 7653789, 2472670, 14641200, 5164364, 5482529, 10415855, 1629108,
> + 2012376, 13661123, 14655718, 9534083, 16637925, 2537745, 9787923, 12750103,
> + 4660370, 3283461, 14862772, 7034955, 6679872, 8918232, 6506913, 103649,
> + 6085577, 13324033, 14251613, 11058220, 11998181, 3100233, 468898, 7104918,
> + 12498413, 14408165, 1208514, 15712321, 3088687, 14778333, 3632503, 11151952,
> + 98896, 9159367, 8866146, 4780737, 4925758, 12362320, 4122783, 8543358,
> + 7056879, 10876914, 6282881, 1686625, 5100373, 4573666, 9265515, 13593840,
> + 5853060, 1188880, 4237111, 15765555, 14344137, 4608332, 6590210, 13745050,
> + 10916568, 12340402, 7145275, 4417153, 2300360, 12079643, 7608534, 15238251,
> + 4947424, 7014722, 3984546, 7168073, 10759589, 16293080, 3757181, 4577717,
> + 5163790, 2488841, 4650617, 3650022, 5440654, 1814617, 6939232, 15540909,
> + 501788, 1060986, 5058235, 5078222, 3734500, 10762065, 390862, 5172712,
> + 1070780, 7904429, 1669757, 3439997, 2956788, 14944927, 12496638, 994152,
> + 8901173, 11827497, 4268056, 15725859, 1694506, 5451950, 2892428, 1434298,
> + 9048323, 13558747, 15083840, 8154495, 15830901, 391127, 14970070, 2451434,
> + 2080347, 10775644, 14599429, 12540753, 4813943, 16140655, 2421772, 12724304,
> + 12935733, 7206473, 5697333, 10328104, 2418008, 13547986, 284246, 1732363,
> + 16375319, 8109554, 16372365, 14346072, 1835890, 13059499, 2442500, 4110674};
> +
> +/*
> + * The hardware uses an LFSR counting sequence to determine when to capture
> + * the SPU PCs.  The SPU PC capture is done when the LFSR sequence reaches the
> + * last value in the sequence.  An LFSR sequence is like a puesdo random
> + * number sequence where each number occurs once in the sequence but the
> + * sequence is not in numerical order.  To reduce the calculation time, a
> + * sequence of 256 precomputed values in the LFSR sequence are stored in a
> + * table.  The nearest precomputed value is used as the initial point from
> + * which to caculate the desired LFSR value that is n from the end of the
> + * sequence.  The lookup table reduces the maximum number of iterations in
> + * the loop from 2^24 to 2^16.
> + */
> +static int calculate_lfsr(int n)
> +{
> +  int i;
> +
> +  int start_lfsr_index;
> +  unsigned int newlfsr0;
> +  unsigned int lfsr = MAXLFSR;
> +  unsigned int binsize = (MAXLFSR+1)/ENTRIES;
> +  unsigned int howmany;
> +
> +  start_lfsr_index = (MAXLFSR - n) / binsize;
> +  lfsr = initial_lfsr[start_lfsr_index];
> +  howmany = (MAXLFSR - n) - (start_lfsr_index * (binsize));
> +
> +  for (i = 2; i < howmany+2; i++) {
> +    newlfsr0 = (((lfsr >> (size - 1 - 0)) & 1) ^
> +		((lfsr >> (size - 1 - 1)) & 1) ^
> +		(((lfsr >> (size - 1 - 6)) & 1) ^
> +		 ((lfsr >> (size - 1 - 23)) & 1)));
> +
> +    lfsr >>= 1;
> +    lfsr = lfsr | (newlfsr0 << (size - 1));
> +  }
> +  return lfsr;
> +}

I agree with Milton that it would be far nicer even to calculate
the value from user space, but since you say that would
violate the oprofile interface conventions, let's not go there.
In order to make this code nicer on the user, you should probably
insert a 'cond_resched()' somewhere in the loop, maybe every
500 iterations or so.

it also looks like there is whitespace damage in the code here.

> +
> +/* This interface allows a profiler (e.g., OProfile) to store
> + * spu_context information needed for profiling, allowing it to
> + * be saved across context save/restore operation.
> + *
> + * Assumes the caller has already incremented the ref count to
> + * profile_info; then spu_context_destroy must call kref_put
> + * on prof_info_kref.
> + */
> +void spu_set_profile_private(struct spu_context * ctx, void * profile_info,
> +			     struct kref * prof_info_kref,
> +			     void (* prof_info_release) (struct kref * kref))
> +{
> +	ctx->profile_private = profile_info;
> +	ctx->prof_priv_kref = prof_info_kref;
> +	ctx->prof_priv_release = prof_info_release;
> +}
> +EXPORT_SYMBOL_GPL(spu_set_profile_private);

I think you don't need the profile_private member here, if you just use
container_of with ctx->prof_priv_kref in all users.

	Arnd <><

Thread: Re: [Cbe-oss-dev] [RFC, PATCH] CELL Oprofile SPU profiling =?iso-8859-1?q?updated=09patch?=

oprofile-list