[Numpy-discussion] Optimize removing nan-values of dataset

Brett Olsen brett.olsen@gmail....
Wed Aug 14 10:48:50 CDT 2013


The example data and method you've provided don't do what you describe.
For example, your data contains several 2x2 blocks of NaNs. According
to your description, these should not be replaced (each of them has a
neighbor that is also a NaN). Your example method, however, replaces them.
In fact, it replaces any NaN value that is not in the first or last row or
contiguous with NaNs in the first or last row.

Here's a replacement method that does do what you've described:
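import numpy as np

def nan_to_mean(data):
    # NaNs in the interior rows; the first and last rows lack a neighbor
    mask = np.isnan(data[1:-1])
    # Mean of the rows above and below; stays NaN if either neighbor is NaN
    data[1:-1][mask] = ((data[:-2] + data[2:]) / 2)[mask]
    return data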
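
As a quick check, here is a small sketch of how this behaves. The 6x2 array is made up for illustration (it is not taken from example_dataset.txt), and the function is repeated only so the snippet runs on its own. It relies on NaN propagating through the addition: if either neighbor is NaN, the computed mean is NaN too, so that entry stays NaN.

```python
import numpy as np

def nan_to_mean(data):
    # Same approach as above, repeated so this example is self-contained.
    mask = np.isnan(data[1:-1])
    data[1:-1][mask] = ((data[:-2] + data[2:]) / 2)[mask]
    return data

# Hypothetical test array, chosen to cover the three cases of interest.
nan = np.nan
data = np.array([
    [1.0, 10.0],
    [nan, 20.0],   # isolated NaN in column 0: neighbors are 1.0 and 3.0
    [3.0, nan],    # start of a two-row NaN run in column 1
    [4.0, nan],    # end of that run
    [5.0, 50.0],
    [nan, 60.0],   # NaN in the last row: no row below it, so it is left alone
])

# The function modifies its argument in place, so pass a copy to keep `data`.
result = nan_to_mean(data.copy())
print(result)
```

The isolated NaN becomes 2.0, while the two-row run and the last-row NaN are left as NaN. Unlike your loop, this does not round to one decimal place; if you need that, applying np.round(result, 1) afterwards should do it.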

~Brett


On Tue, Aug 13, 2013 at 1:50 AM, Thomas Goebel <
Thomas.Goebel@th-nuernberg.de> wrote:

> Hi,
>
> I am trying to remove nan-values from an array of shape (40, 6).
> A nan-value at point data[x] should be replaced by the mean
> of data[x-1] and data[x+1] if both values at x-1 and x+1 are not
> nan. The function nan_to_mean (see below) works, but I wonder
> if I could optimize the code.
>
> I thought about something like
>   1. Find all nan values in array:
>      nans = np.isnan(dataarray)
>   2. Check that the values before and after each nan index are not nan
>   3. Calculate mean
>
> On my original dataset of shape (63856, 6), this script takes
> 139.343 seconds to run. And some datasets are even bigger.
> I attached the example_dataset.txt and the example.py script.
>
> Thanks for any help,
> Tom
>
> def nan_to_mean(arr):
>     for cnt, value in enumerate(arr):
>         # Check if first value is nan, if so continue
>         if cnt == 0 and np.isnan(value):
>             continue
>         # Check if last value is nan:
>         #     If x-1 value is nan dont do anything!
>         #     If x-1 is float, last value will be value of x-1
>         elif cnt == (len(arr)-1):
>             if np.isnan(value) and not np.isnan(arr[cnt-1]):
>                 arr[cnt] = arr[cnt-1]
>         # If the first values of file are nan ignore them all
>         elif np.isnan(value) and np.isnan(arr[cnt-1]):
>             continue
>         # Found nan value and x-1 value is of type float
>         elif np.isnan(value) and not np.isnan(arr[cnt-1]):
>             # Check if x+1 value is not nan
>             if not np.isnan(arr[cnt+1]):
>                 arr[cnt] = '%.1f' % np.mean((
>                         arr[cnt-1],arr[cnt+1]))
>             # If x+1 value is nan, go to next value
>             else:
>                 for N in xrange(2, 30):
>                     if cnt+N == (len(arr)):
>                         break
>                     elif not np.isnan(arr[cnt+N]):
>                         arr[cnt] = '%.1f' % np.mean(
>                                 (arr[cnt-1], arr[cnt+N]))
>     return arr
>