[Numpy-discussion] weighted mean; weighted standard error of the mean (sem)

josef.pktd@gmai... josef.pktd@gmai...
Thu Sep 9 22:44:06 CDT 2010


On Thu, Sep 9, 2010 at 11:32 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
> On Thu, Sep 9, 2010 at 8:07 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
>> On Thu, Sep 9, 2010 at 7:22 PM, cpblpublic <cpblpublic+numpy@gmail.com> wrote:
>>> I am looking for some really basic statistical tools. I have some
>>> sample data, some sample weights for those measurements, and I want to
>>> calculate a mean and a standard error of the mean.
>>
>> How about using a bootstrap?
>>
>> Array and weights:
>>
>>>> a = np.arange(100)
>>>> w = np.random.rand(100)
>>>> w = w / w.sum()
>>
>> Initialize:
>>
>>>> n = 1000
>>>> ma = np.zeros(n)
>>
>> Save mean of each bootstrap sample:
>>
>>>> for i in range(n):
>>   ....:     idx = np.random.randint(0, 100, 100)
>>   ....:     ma[i] = np.dot(a[idx], w[idx])
>>   ....:
>>   ....:
>>
>> Error in mean:
>>
>>>> ma.std()
>>   3.854023384833674
>>
>> Sanity check:
>>
>>>> np.dot(w, a)
>>   49.231127299096954
>>>> ma.mean()
>>   49.111478821225127
>>
>> Hmm...should w[idx] be renormalized to sum to one in each bootstrap sample?
>
> Or perhaps there is no uncertainty about the weights, in which case:
>
>>> for i in range(n):
>   ....:     idx = np.random.randint(0, 100, 100)
>   ....:     ma[i] = np.dot(a[idx], w)
>   ....:
>   ....:
>>> ma.std()
>   3.2548815339711115

Or maybe `w` reflects an underlying sampling scheme, and you should
draw the bootstrap samples with probabilities given by `w`?
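
A rough sketch of that variant, assuming the normalized `w` can be
read as selection probabilities: draw indices with probability `w`
and take the plain mean of each resample. The seed, `n_boot`,
`boot_mean` and `boot_se` are names made up for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# Same setup as in the quoted session: data and normalized weights
a = np.arange(100)
w = rng.random(100)
w = w / w.sum()

n_boot = 1000
ma = np.empty(n_boot)
for i in range(n_boot):
    # Resample indices with probability w (assumption: w acts as a
    # sampling probability), then use an unweighted mean
    idx = rng.choice(a.size, size=a.size, replace=True, p=w)
    ma[i] = a[idx].mean()

boot_mean = ma.mean()      # should land near np.dot(w, a)
boot_se = ma.std(ddof=1)   # bootstrap estimate of the standard error
```

Here the weights enter only through the resampling step, so there is
no question of renormalizing them inside each bootstrap sample.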

If the weighted average is a linear combination of (normally)
distributed random variables, its variance still depends on whether the
individual observations have the same or different variances, e.g.
http://en.wikipedia.org/wiki/Weighted_mean#Statistical_properties
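
To make the dependence concrete: for independent observations and
weights that sum to one, Var(sum_i w_i x_i) = sum_i w_i^2 sigma_i^2.
The sketch below evaluates that expression for equal and for unequal
variances and checks the unequal case by simulation. The sigma values
are invented for the illustration, and independence is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

n = 100
w = rng.random(n)
w = w / w.sum()  # weights normalized to sum to one

# Case 1: equal standard deviations (hypothetical value 2.0)
sigma = np.full(n, 2.0)
se_equal = np.sqrt(np.sum(w**2 * sigma**2))

# Case 2: a different standard deviation per observation (hypothetical)
sigma_i = rng.uniform(0.5, 3.0, n)
se_unequal = np.sqrt(np.sum(w**2 * sigma_i**2))

# Monte Carlo check of case 2: simulate many samples of independent
# normals and look at the spread of the weighted mean
x = rng.normal(0.0, sigma_i, size=(20000, n))
se_mc = (x @ w).std(ddof=1)
```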

What I can't figure out is this: if you assume sigma_i = sigma for
all observations i, do we use the weighted or the unweighted variance
to get an estimate of sigma? And I'm not able to replicate with simple
calculations what statsmodels.WLS gives me.
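
For comparison, here is a sketch of the two candidate calculations
side by side, using NumPy only. The first is the textbook WLS result
for a constant-only regression, under the assumption that `w_i` is
proportional to 1/sigma_i^2; the second assumes sigma_i = sigma and
uses the unweighted sample variance. The data and weights are
invented, and whether the first number matches statsmodels.WLS is not
checked here.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed

n = 100
x = rng.normal(50.0, 10.0, n)  # hypothetical sample data
w = rng.random(n) + 0.1        # hypothetical positive weights

mean_w = np.sum(w * x) / np.sum(w)  # weighted mean

# (a) Textbook WLS on a constant, assuming Var(x_i) = sigma^2 / w_i:
#     sigma^2 is estimated from the weighted squared residuals
resid = x - mean_w
sigma2_w = np.sum(w * resid**2) / (n - 1)
se_wls = np.sqrt(sigma2_w / np.sum(w))

# (b) Equal variances, sigma_i = sigma: unweighted sample variance,
#     scaled by the sum of squared normalized weights
wn = w / w.sum()
sigma2_u = x.var(ddof=1)
se_equal_var = np.sqrt(sigma2_u * np.sum(wn**2))

# Cross-check of (a): ordinary least squares on sqrt(w)-scaled data
sw = np.sqrt(w)
beta = np.linalg.lstsq(sw[:, None], sw * x, rcond=None)[0]
rss = np.sum((sw * x - sw * beta[0]) ** 2)
```

The two standard errors generally differ, which is the ambiguity
raised above: (a) lets the weights enter the estimate of sigma, while
(b) does not.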

???

Josef

