[Numpy-discussion] Sampling from the multivariate normal

Joshua Anthony Reyes joshua.reyes@gmail....
Wed Nov 9 10:40:50 CST 2011


Hi List,

I'm new to Numpy and I'm a little confused about the behavior of 
numpy.random.multivariate_normal(). I'm not sure I'm passing the 
variances correctly. My goal is to sample from a bivariate normal, but 
the kooky behavior shows up when I sample from a univariate distribution.

In short, the multivariate normal function doesn't seem to give me 
values in the appropriate ranges. Here's an example of what I mean.

In[1]: from numpy.random import normal, multivariate_normal

In [2]: normal(100, 100, 10)
Out[2]:
array([ 258.62344586,   70.16378815,   49.46826997,   49.58567724,
         182.68652256,  226.67292034,   92.03801549,   18.2686146 ,
          94.09104313,   80.35697507])

The samples look about right to me. But then when I try to do the same 
using the multivariate_normal, the values it draws look too close to the 
mean.

In [3]: multivariate_normal([100], [[100]], 10)
Out[3]:
array([[ 109.10083984],
        [  97.43526716],
        [ 108.43923772],
        [  97.87345947],
        [ 103.405562  ],
        [ 110.2963736 ],
        [ 103.96445585],
        [  90.58168544],
        [  91.20549222],
        [ 104.4051761 ]])

These values all fall within 10 units of the mean.

In [4]: multivariate_normal([100], [[1000]], 10)
Out[4]:
array([[  62.04304611],
        [ 123.29364557],
        [  83.53840083],
        [  64.67679778],
        [ 127.82433157],
        [  11.3305656 ],
        [  95.4101701 ],
        [ 126.53213908],
        [ 104.68868736],
        [  32.45886112]])

In [5]: normal(100, 1000, 10)
Out[5]:
array([ 1052.93998938, -1254.12576419,   258.75390045,   688.32715327,
           -2.36543806, -1570.54842269,   472.90045029,   394.62908014,
          137.10320437,  1741.85017871])

And just to exaggerate things a little more:

In [6]: multivariate_normal([100], [[10000]], 10).T][0]
Out[6]:
array([ 274.45446694,   85.79359682,  245.03248721,  120.10912405,
         -34.76526896,  134.93446664,   47.6768889 ,   93.34140984,
          80.27244669,  229.64700591])

Whereas
In [7]: normal(100, 10000, 10)
Out[7]:
array([  -554.68666687,   3724.59638363, -14873.55303901,  -3111.22162495,
        -10813.66412901,   4688.81310356,  -9510.92470735, -12689.02667559,
        -10379.01381925,  -4534.60480031])

I'm shocked that I don't get some negative values in Out[4]. And Out[6] 
really ought to have some numbers in the thousands.

I'd totally be willing to believe that I don't understand the 
multivariate normal and/or variance. Can someone tell me whether these 
numbers look sane?

For the bivariate case I do something like this:

means = [100, 100]
variances = [100, 1000]
Sx, Sy = variances
sx, sy = map(sqrt, variances)
cor = 0.7
cov = [[Sx, cor*sx*sy], [cor*sy*sx, Sy]]
draws = 10
samples = multivariate_normal(means, cov, draws)

As mentioned before, the samples are shockingly close to their means.

Thanks,
Josh


More information about the NumPy-Discussion mailing list