Hi Everyone,
a beginner's question on how to perform some data substitution efficiently.
I have a panel dataset, or in other words x individuals observed over a
certain time span. For each column or individual, I need to substitute a
certain value anytime a certain condition is satisfied. Both the condition
and the value to be substituted into the panel dataset are individual
specific. I can tackle the fact that the condition is individual specific
but I cannot find a way to tackle the fact that the value to be substituted
is individual specific without using a for loop. Frankly, considering the
size of the dataset, the use of a for loop is perfectly acceptable in terms
of the time needed to complete the task, but it would still be nice to learn
a more efficient way to do this (a task I implement often).
Thanks in advance
Cristiano
import numpy as np
from copy import deepcopy
Data = np.array([[0, 4, 0],
                 [2, 5, 7],
                 [2, 5, 6]])
EditedData = deepcopy(Data)
Condition = np.array([0, 5, 6]) # individual-specific condition
SubstituteData = np.array([1, 10,100])
# The logic here
# if the value of any observation for the 1st individual is 0, substitute 1,
# the 2nd individual is 5, substitute 10
# the 3rd individual is 6, substitute 100
# This wouldn't be a problem if SubstituteData were not individual specific,
# e.g. EditedData[Data == Condition] = 555
# As SubstituteData is individual specific, I need to use a for loop
for i in range(np.shape(EditedData)[1]):
    TempData = EditedData[:, i]  # I introduce TempData to increase readability
    TempData[TempData == Condition[i]] = SubstituteData[i]
    EditedData[:, i] = TempData
print(EditedData)
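
For comparison, here is a minimal loop-free sketch of the same substitution using broadcasting and np.where. It assumes Condition and SubstituteData each hold exactly one entry per column of Data; the names Mask and VectorizedData are introduced here only for illustration and are not part of the code above.

```python
import numpy as np

Data = np.array([[0, 4, 0],
                 [2, 5, 7],
                 [2, 5, 6]])
Condition = np.array([0, 5, 6])          # one condition per column (individual)
SubstituteData = np.array([1, 10, 100])  # one replacement per column

# Data == Condition compares each row with Condition element by element
# (broadcasting), giving a boolean mask with the same shape as Data.
Mask = Data == Condition

# np.where takes the column's replacement where the mask is True and keeps
# the original value elsewhere. It returns a new array; Data is unchanged.
VectorizedData = np.where(Mask, SubstituteData, Data)

print(VectorizedData)
```

For the example data this should give the same result as the loop, with rows [1, 4, 0], [2, 10, 7] and [2, 10, 100]. Because np.where builds a new array, no separate copy of Data is needed beforehand.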
