[Numpy-discussion] Rebinning numpy array

Olivier Delalleau shish@keba...
Sun Nov 13 11:55:12 CST 2011

```(Sorry for the spam, I should have given more thought to this before.)

It actually seems to me that using a linear interpolation is not a good
idea, since it will throw out a lot of information if you decrease the
number of bins: to compute the value at time t, it will only use the
closest bins (t_k and t_{k+1} such that t_k < t < t_{k+1}), so that data
stored in many of the bins will not be used at all.
I haven't looked closely at the suggestion from Robert but it may be a
better way to achieve what you want.

-=- Olivier

2011/11/13 Olivier Delalleau <shish@keba.be>

> Also: it seems like you are using values at the boundaries of the bins,
> while I think it would make more sense to compute interpolated values at
> the middle point of a bin. I'm not sure it'll make a big difference
> visually, but it may be more appropriate.
>
> -=- Olivier
>
>
> 2011/11/13 Olivier Delalleau <shish@keba.be>
>
>> Just one thing: numpy.interp says it doesn't check that the x coordinates
>> are increasing, so make sure it's the case.
>>
>> Assuming this is ok, I could still see how you may get some non-smooth
>> behavior: this may be because your spike can either be split between two
>> bins (which "dilutes" it somehow), or be included in a single bin (which
>> would make it stand out more). And as you increase your bin size, you will
>> switch between these two situations.
>>
>> -=- Olivier
>>
>>
>> 2011/11/13 Johannes Bauer <dfnsonfsduifb@gmx.de>
>>
>>> Hi group,
>>>
>>> I have a rather simple problem, or so it would seem. However I cannot
>>> seem to find the right solution. Here's the problem:
>>>
>>> A Geiger counter measures counts in distinct time intervals. The time
>>> intervals are not of constant length. Imagine, for example, that the
>>> counter would always create a table entry when the counts reach 10. Then
>>> we would have the following bins (made-up data for illustration):
>>>
>>> Seconds         Counts  Len     CPS
>>> 0 - 44          10      44      0.23
>>> 44 - 120        10      76      0.13
>>> 120 - 140       10      20      0.5
>>> 140 - 200       10      60      0.17
>>>
>>> So we have n bins (in this example 4), but they're not equidistant. I
>>> want to rebin samples to make them equidistant. For example, I would
>>> like to rebin into 5 bins of 40 seconds time each. Then the rebinned
>>> example (I calculate by hand so this might contain errors):
>>>
>>> 0-40            9.09
>>> 40-80           5.65
>>> 80-120          5.26
>>> 120-160         13.33
>>> 160-200         6.67
>>>
>>> That means, if a destination bin completely overlaps a source bin, its
>>> complete value is taken. If it overlaps partially, linear interpolation
>>> of bin sizes should be used.
>>>
>>> It is very important that the overall count amount stays the same (in
>>> this case 40, so my numbers seem to be correct, I checked that). In this
>>> example I increased the bin size, but usually I will want to decrease
>>> bin size (even dramatically).
>>>
>>> Now my pathetic attempts look something like this:
>>>
>>> interpolation_points = 4000
>>> xpts = [ time.mktime(x.timetuple()) for x in self.getx() ]
>>>
>>> interpolatedx = numpy.linspace(xpts[0], xpts[-1], interpolation_points)
>>> interpolatedy = numpy.interp(interpolatedx, xpts, self.gety())
>>>
>>> self._xreformatted = [ datetime.datetime.fromtimestamp(x) for x in
>>> interpolatedx ]
>>> self._yreformatted = interpolatedy
>>>
>>> This works somewhat, however I see artifacts depending on the
>>> destination sample size: for example when I have a spike in the sample
>>> input and reduce the number of interpolation points (i.e. increase
>>> destination bin size) slowly, the spike will get smaller and smaller
>>> (expected behaviour). After some amount of increasing, the spike however
>>> will "magically" reappear. I believe this to be an interpolation
>>> artifact.
>>>
>>> Is there some standard way to get from a non-uniformly distributed bin
>>> distribution to a uniformly distributed bin distribution of arbitrary
>>> bin width?
>>>
>>> Best regards,
>>> Joe
```
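
For reference, here is a minimal sketch of one count-conserving way to do the rebinning described in the thread. It is not necessarily what Robert suggested; it is just one common approach. It assumes the count rate is uniform within each source bin, builds the cumulative count at every source bin edge, interpolates that cumulative curve at the new edges, and takes differences. The function name `rebin_counts` and the variable names are made up for illustration. Only `numpy.cumsum`, `numpy.interp`, `numpy.diff` and `numpy.linspace` are actual NumPy calls.

```python
import numpy as np


def rebin_counts(src_edges, src_counts, dst_edges):
    """Redistribute counts from source bins onto new bin edges.

    Assumes counts are spread uniformly inside each source bin
    (an assumption of this sketch, not something the data guarantees).
    """
    src_edges = np.asarray(src_edges, dtype=float)
    src_counts = np.asarray(src_counts, dtype=float)
    dst_edges = np.asarray(dst_edges, dtype=float)

    # numpy.interp does not check this itself, so check it here.
    if np.any(np.diff(src_edges) <= 0):
        raise ValueError("src_edges must be strictly increasing")
    if src_counts.size != src_edges.size - 1:
        raise ValueError("need exactly one count per source bin")

    # Cumulative number of counts seen at each source edge (0 at the start).
    cumulative = np.concatenate(([0.0], np.cumsum(src_counts)))

    # Piecewise-linear cumulative curve evaluated at the new edges.
    # Outside the source range numpy.interp holds the end values constant,
    # so no counts are invented there.
    cumulative_at_dst = np.interp(dst_edges, src_edges, cumulative)

    # Counts per destination bin are differences of the cumulative curve.
    return np.diff(cumulative_at_dst)


# The made-up Geiger counter data from the original post.
src_edges = [0, 44, 120, 140, 200]
src_counts = [10, 10, 10, 10]
dst_edges = np.linspace(0, 200, 6)  # five bins of 40 seconds each

rebinned = rebin_counts(src_edges, src_counts, dst_edges)
print(rebinned)
print(rebinned.sum())
```

Because every destination bin is a difference of the same cumulative curve, the total stays at 40 whenever the new edges span the whole source range, whether the new bins are wider or much narrower than the old ones. Dividing the result by `np.diff(dst_edges)` gives counts per second for each new bin.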