[Numpy-discussion] Optimize removing nan-values of dataset

David Reed david.reed.c@gmail....
Tue Aug 13 16:32:47 CDT 2013


Hi Thomas,

Your array is Nx6. Do you want the nan values replaced by the mean of the
two adjacent elements by row or by column?
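
To make the two readings concrete, here is a minimal sketch (not Thomas's code) of a loop-free fill for isolated nan values. The helper name fill_isolated_nans and its axis argument are made up for illustration. With axis=0 the neighbours are the elements above and below in the same column; with axis=1 they are the elements to the left and right in the same row. Runs of several nans and nans at the edges are left untouched here.

```python
import numpy as np

def fill_isolated_nans(data, axis=0):
    """Replace each nan by the mean of its two neighbours along `axis`,
    but only when both neighbours are not nan. Hypothetical helper."""
    # Work on a float copy and move the working axis to the front, so the
    # slicing below is identical for the by-column and by-row cases.
    out = np.moveaxis(np.array(data, dtype=float), axis, 0)
    prev, mid, nxt = out[:-2], out[1:-1], out[2:]
    # True where the element is nan and both of its neighbours are valid.
    fill = np.isnan(mid) & ~np.isnan(prev) & ~np.isnan(nxt)
    # `mid` is a view into `out`, so this assignment updates `out` in place.
    mid[fill] = 0.5 * (prev[fill] + nxt[fill])
    return np.moveaxis(out, 0, axis)

a = np.array([[1.0, 2.0, 3.0],
              [4.0, np.nan, 6.0],
              [7.0, 10.0, 9.0]])
by_column = fill_isolated_nans(a, axis=0)  # centre becomes (2 + 10) / 2 = 6.0
by_row = fill_isolated_nans(a, axis=1)     # centre becomes (4 + 6) / 2 = 5.0
```

The same nan gets a different value depending on the axis, which is why the answer to the question matters.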


On Tue, Aug 13, 2013 at 2:50 AM, Thomas Goebel <
Thomas.Goebel@th-nuernberg.de> wrote:

> Hi,
>
> I am trying to remove nan-values from an array of shape (40, 6).
> A nan-value at point data[x] should be replaced by the mean
> of data[x-1] and data[x+1] if both values at x-1 and x+1 are not
> nan. The function nan_to_mean (see below) works, but I wonder
> if I could optimize the code.
>
> I thought about something like
>   1. Find all nan values in array:
>      nans = np.isnan(dataarray)
>   2. Check that the values before and after each nan index are not nan
>   3. Calculate mean
>
> When I run this script on my original dataset of
> shape (63856, 6), it takes 139.343 seconds. And some
> datasets are even bigger. I attached the example_dataset.txt and
> the example.py script.
>
> Thanks for any help,
> Tom
>
> import numpy as np
>
> def nan_to_mean(arr):
>     for cnt, value in enumerate(arr):
>         # Check if first value is nan, if so continue
>         if cnt == 0 and np.isnan(value):
>             continue
>         # Check if last value is nan:
>         #     If x-1 value is nan don't do anything!
>         #     If x-1 is float, last value will be value of x-1
>         elif cnt == (len(arr)-1):
>             if np.isnan(value) and not np.isnan(arr[cnt-1]):
>                 arr[cnt] = arr[cnt-1]
>         # If the first values of file are nan ignore them all
>         elif np.isnan(value) and np.isnan(arr[cnt-1]):
>             continue
>         # Found nan value and x-1 value is of type float
>         elif np.isnan(value) and not np.isnan(arr[cnt-1]):
>             # Check if x+1 value is not nan
>             if not np.isnan(arr[cnt+1]):
>                 arr[cnt] = '%.1f' % np.mean((
>                         arr[cnt-1],arr[cnt+1]))
>             # If x+1 value is nan, go to next value
>             else:
>                 for N in range(2, 30):
>                     if cnt+N == (len(arr)):
>                         break
>                     elif not np.isnan(arr[cnt+N]):
>                         arr[cnt] = '%.1f' % np.mean(
>                                 (arr[cnt-1], arr[cnt+N]))
>                         # Stop at the first non-nan value found
>                         break
>     return arr
>
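
For runs of several nans, one loop-free alternative is linear interpolation with np.interp, applied to each column. This is a swapped-in technique, not the same rule as nan_to_mean: an isolated nan still becomes the mean of its two neighbours, but inside a longer run the values are placed on a straight line between the surrounding valid values, nothing is rounded to one decimal, and leading and trailing nans are left as nan. The helper name interpolate_nans is made up for illustration, and the sketch assumes the by-column reading of the data.

```python
import numpy as np

def interpolate_nans(data):
    """Fill interior nans column by column with linear interpolation.
    Hypothetical helper; expects a 2-D array of shape (N, ncols)."""
    out = np.array(data, dtype=float)      # float copy, input is not modified
    rows = np.arange(out.shape[0])
    for col in range(out.shape[1]):        # loops over columns only, not rows
        column = out[:, col]               # view into `out`
        nans = np.isnan(column)
        if nans.all() or not nans.any():
            continue                       # nothing to fill from, or nothing to fill
        # left/right=np.nan keeps leading and trailing nans untouched.
        column[nans] = np.interp(rows[nans], rows[~nans], column[~nans],
                                 left=np.nan, right=np.nan)
    return out

nan = np.nan
example = np.array([[1.0, nan],
                    [nan, 2.0],
                    [nan, nan],
                    [4.0, 8.0],
                    [nan, nan]])
filled = interpolate_nans(example)
# column 0: 1, 2, 3, 4, nan    column 1: nan, 2, 5, 8, nan
```

The Python-level loop now runs once per column instead of once per element, so the cost on a (63856, 6) array should be dominated by a handful of vectorized calls.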