I doubt PyMC is the easiest way to do what you want. I'd rather build a
model from known statistical distributions and generate random
variables from it.
For instance:
1. Generate a random gender (0 or 1) from a Bernoulli distribution with
parameter p, the proportion of men.
2. Generate age, according to gender. Try to find data for a given
population and simply draw randomly from it.
3. Generate income (dependent on gender and age). This is the tricky
part: there are many ways to model correlation (bivariate Gumbel,
bivariate normal, copulas), but you'll have to specify a model for
the correlation anyway. I doubt a linear correlation would do...
4. Generate the outcome. Same problem as for income.
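
The four steps above could be sketched roughly as follows, using only the
standard random module. Every number here (P_MALE, the AGES lists, RHO, the
income mean and spread, the outcome coefficients) is a made-up assumption
for illustration, not real data. The income step uses a plain linear
correlation as the simplest placeholder, even though that may well be too
crude; a copula or another dependence model would replace that one step.

```python
import math
import random
import statistics

# All numbers below are illustrative assumptions, not real data.
P_MALE = 0.5                               # Bernoulli parameter p (proportion of men)
AGES = {                                   # stand-in for a real age sample per gender
    0: [22, 29, 34, 41, 47, 55, 63, 70],   # 0 = female
    1: [21, 27, 33, 39, 46, 52, 60, 68],   # 1 = male
}
RHO = {0: 0.8, 1: 0.4}                     # age-income correlation, stronger for females
INCOME_MEAN, INCOME_SD = 30000.0, 8000.0   # hypothetical income scale
BUY_INTERCEPT = {0: 0.5, 1: -0.5}          # females more likely to buy


def draw_person(rng):
    """Draw one record: gender, then age, then income, then outcome."""
    # 1. Gender: Bernoulli(p), 1 = male.
    gender = 1 if rng.random() < P_MALE else 0
    # 2. Age: resample from the empirical ages for that gender.
    ages = AGES[gender]
    age = rng.choice(ages)
    # 3. Income: linear in standardized age plus Gaussian noise, so the
    #    age-income correlation within each gender is RHO[gender].
    z_age = (age - statistics.mean(ages)) / statistics.pstdev(ages)
    rho = RHO[gender]
    z_income = rho * z_age + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    income = INCOME_MEAN + INCOME_SD * z_income
    # 4. Outcome: Bernoulli with a logistic probability that depends on
    #    gender and (standardized) income.
    logit = BUY_INTERCEPT[gender] + 0.5 * z_income
    buys = 1 if rng.random() < 1.0 / (1.0 + math.exp(-logit)) else 0
    return {"gender": gender, "age": age, "income": income, "buys": buys}


def generate(n, seed=0):
    """Return n independent records from a seeded generator."""
    rng = random.Random(seed)
    return [draw_person(rng) for _ in range(n)]
```

Calling generate(1000) returns a list of dicts, one per person. Values can
then be blanked out at random to get the missing-data experiments. Note that
with Gaussian noise a simulated income can occasionally come out negative;
a log-normal income would avoid that.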
You'll find a wide array of distributions to generate randomly from in
scipy.stats and in the random module.
Cheers,
David
2006/2/24, Hugo Koopmans <hug...@gm...>:
> Hi there,
>
> I have done some experiments with PyMC. Has been working very well so far,
> keep up the good work!!!
>
> Now, I want to use PyMC to generate data to do experiments with missing
> values. Therefore I need to generate toy data first.
> This toydata for example could consist of the following variables:
> age, income and gender and an outcome (e.g. chance of buying product X)
> now I would like to have an underlying model to generate data from in which
> for instance age and income are correlated in some way and females like the
> product more than males. Also the correlation between income and age is
> stronger for females than for males.
>
> Would this be possible using PyMC? Did anyone do something like this? Sample
> code would be appreciated very much!
>
> Regards,
>
> hugo