[Numpy-discussion] speeding up an array operation

Mag Gam magawake@gmail....
Thu Jul 9 06:14:20 CDT 2009


The problem is that the array is very large; we are talking about 200+ million rows.


On Thu, Jul 9, 2009 at 4:41 AM, David Warde-Farley<dwf@cs.toronto.edu> wrote:
> On 9-Jul-09, at 1:12 AM, Mag Gam wrote:
>
>> Here is what I have, which loads the array one element at a time:
>>
>> import csv
>>
>> z = {}  # per-key counter: next free index within each group
>> r = csv.reader(file)
>> for row in r:
>>     p = "/MIT/" + row[1]
>>
>>     if p not in z:
>>         z[p] = 0
>>     else:
>>         z[p] += 1
>>
>>     arr[p]['chem'][z[p]] = tuple(row)  # this loads the array 1 x 1
>>
>>
>> I would like to avoid this one-at-a-time loading and instead bulk
>> load the array. Let's say I load 5 million lines into memory and then
>> push them into the array. Any ideas on how to do that?
>
>
> Depending on how big your data is, this looks like a job for, e.g.,
> numpy.loadtxt(), to give you one big array.
>
> Then sort the array on the second column, so that all the rows with
> the same 'p' appear one after the other. Then you can assign slices of
> this big array to be arr[p]['chem'].
>
> David
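
For reference, a minimal sketch of the sort-then-slice approach suggested
above. It assumes a comma-delimited file whose second column holds the
grouping key, that loadtxt can read the columns as strings, and that each
arr[p]['chem'] (the preallocated structure from the original post) is large
enough for its group; the file name is hypothetical:

import numpy as np

# Load the whole file as one big 2-D array of strings.
data = np.loadtxt('data.csv', delimiter=',', dtype=str)

# Sort rows by the second column so equal keys sit next to each other;
# a stable sort preserves the original row order within each key.
data = data[np.argsort(data[:, 1], kind='mergesort')]

# Find where the key changes: the start and end of each group of rows.
keys = data[:, 1]
starts = np.concatenate(([0], np.flatnonzero(keys[1:] != keys[:-1]) + 1))
ends = np.concatenate((starts[1:], [len(keys)]))

# Assign each group as one slice instead of one row at a time.
for start, end in zip(starts, ends):
    p = '/MIT/' + keys[start]  # arr is from the original poster's setup
    arr[p]['chem'][:end - start] = data[start:end]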

