[Numpy-discussion] Column-Specific Conditions and Column-Specific Substitution Values

Warren Weckesser warren.weckesser@enthought....
Tue Mar 23 09:11:58 CDT 2010


Cristiano Fini wrote:
>  
> Hi Everyone,
> a beginner's question on how to perform some data substitution 
> efficiently. I have a panel dataset, or in other words x individuals 
> observed over a certain time span. For each column or individual, I 
> need to substitute a certain value anytime a certain condition is 
> satisfied. Both the condition and the value to be substituted into the 
> panel dataset are individual specific. I can tackle the fact that the 
> condition is individual specific but I cannot find a way to tackle the 
> fact that the value to be substituted is individual specific without 
> using a for loop. Frankly, considering the size of the dataset, the 
> use of a for loop is perfectly acceptable in terms of the time needed 
> to complete the task, but it would still be nice to learn a way to do 
> this (a task I implement often) more efficiently.
> Thanks in advance
> Cristiano
>  
>
> import numpy as np
> from copy import deepcopy
> Data = np.array([[0,4,0],
>                 [2,5,7],
>                 [2,5,6]])
> EditedData = deepcopy(Data)               
> Condition = np.array([0, 5, 6])     # individual-specific condition
> SubstituteData = np.array([1, 10,100])   
> # The logic here:
> # if the value of any observation for the 1st individual is 0, substitute 1,
> #                                     the 2nd individual is 5, substitute 10,
> #                                     the 3rd individual is 6, substitute 100.
>
> # This wouldn't be a problem if SubstituteData were not individual-specific,
> # e.g. EditedData[Data == Condition] = 555
> # As SubstituteData is individual-specific, I need to use a for loop
> for i in range(np.shape(EditedData)[1]):
>     TempData = EditedData[:, i]  # I introduce TempData to increase readability
>     TempData[TempData == Condition[i]] = SubstituteData[i]
>     EditedData[:, i] = TempData
>    
> print(EditedData)
>   

Instead of the loop, you could use:

EditedData = np.choose(Data == Condition, (Data, SubstituteData))
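
A minimal sketch of why this works, using the arrays from the question. 
The names mask, via_choose, via_where and via_loop are introduced here 
only for illustration. Data == Condition broadcasts the 1-D Condition 
across the rows, so column i is compared with Condition[i]; np.choose 
then takes the element from Data where the mask is False and from 
SubstituteData (broadcast the same way) where it is True. np.where is 
shown as an equivalent alternative, and the loop is kept as a check.

```python
import numpy as np

Data = np.array([[0, 4, 0],
                 [2, 5, 7],
                 [2, 5, 6]])
Condition = np.array([0, 5, 6])          # one condition per column
SubstituteData = np.array([1, 10, 100])  # one substitute per column

# Broadcasting: the shape (3,) Condition is compared against every row,
# so mask[r, i] is True exactly when Data[r, i] == Condition[i].
mask = Data == Condition

# Index 0 (False) selects from Data, index 1 (True) from SubstituteData.
via_choose = np.choose(mask, (Data, SubstituteData))

# Equivalent formulation: take SubstituteData where mask holds, else Data.
via_where = np.where(mask, SubstituteData, Data)

# The original column-by-column loop, for comparison.
via_loop = Data.copy()
for i in range(Data.shape[1]):
    column = via_loop[:, i]  # a view, so the assignment edits via_loop
    column[column == Condition[i]] = SubstituteData[i]

print(via_choose)
print(np.array_equal(via_choose, via_where))
print(np.array_equal(via_choose, via_loop))
```

Both vectorized forms return a new array and leave Data unchanged, so 
the deepcopy is not needed.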


Warren



