[Numpy-discussion] Outer join ?

A B python6009@gmail....
Thu Feb 12 09:52:36 CST 2009


This is probably more than I need but I will definitely keep it as
reference. Thank you.

On 2/12/09, bernhard.voigt@gmail.com <bernhard.voigt@gmail.com> wrote:
> You might consider the groupby from the itertools module.
>
> Do you have two keys only? I would prefer grouping on the first
> column. For groupby to work, you need to sort the array by the first
> column first.
>
> from itertools import groupby
> a.sort(order='col1')
>
> # target array: first col holds the unique dates, second col the
> # values for key1, third col the values for key2
> data = numpy.zeros(len(numpy.unique(a['col1'])),
>                    dtype=dict(names=['dates', 'key1', 'key2'],
>                               formats=[long, float, float]))
>
> for i, (date, items) in enumerate(groupby(a, lambda item: item['col1'])):
>     data[i]['dates'] = date
>     for col1, col2, col3 in items:
>         data[i][col2] = col3   # col2 ('key1' or 'key2') picks the field
>
> Hope this works! Bernhard
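A self-contained sketch of this groupby approach, using a small hand-built structured array in place of the asker's loadtxt result (the field names and sample values are taken from the question below):

```python
import numpy as np
from itertools import groupby

# sample data from the question, as a structured array
a = np.array([(20080101, 'key1', 4.0),
              (20080201, 'key1', 6.0),
              (20080301, 'key1', 5.0),
              (20080301, 'key2', 3.4),
              (20080601, 'key2', 5.6)],
             dtype=[('col1', 'i8'), ('col2', 'U4'), ('col3', 'f8')])

a.sort(order='col1')            # groupby requires rows sorted by the group key
dates = np.unique(a['col1'])

# one record per unique date: the date plus one value field per key
data = np.zeros(len(dates),
                dtype=[('dates', 'i8'), ('key1', 'f8'), ('key2', 'f8')])

for i, (date, items) in enumerate(groupby(a, lambda row: row['col1'])):
    data[i]['dates'] = date
    for _, key, value in items:
        data[i][key] = value    # the key string selects the field to fill
```

Missing (date, key) combinations stay at the zero the array was initialized with, which matches the fill-with-zero requirement in the question.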
>
> On Feb 12, 6:24 am, A B <python6...@gmail.com> wrote:
>> Hi,
>>
>> I have the following data structure:
>>
>> col1 | col2 | col3
>>
>> 20080101|key1|4
>> 20080201|key1|6
>> 20080301|key1|5
>> 20080301|key2|3.4
>> 20080601|key2|5.6
>>
>> For each key in the second column, I would like to create an array
>> where for all unique values in the first column, there will be either
>> a value or zero if there is no data available. Like so:
>>
>> # 20080101, 20080201, 20080301, 20080601
>>
>> key1 - 4, 6, 5,    0
>> key2 - 0, 0, 3.4, 5.6
>>
>> Ideally, the results would end up in a 2d array.
>>
>> What's the most efficient way to accomplish this? Currently, I am
>> getting a list of uniq col1 and col2 values into separate variables,
>> then looping through each unique value in col2
>>
>> a = loadtxt(...)
>>
>> dates = unique(a[:]['col1'])
>> keys = unique(a[:]['col2'])
>>
>> for key in keys:
>>     b = a[where(a[:]['col2'] == key)]
>>     ???
>>
>> Thanks in advance.
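One loop-free alternative (a sketch, not from the thread): since numpy.unique returns sorted values, numpy.searchsorted can map every record's date and key to a (row, column) index, and the whole 2-D result can be filled in a single assignment. The sample array below stands in for the loadtxt result:

```python
import numpy as np

# sample data from the question, as a structured array
a = np.array([(20080101, 'key1', 4.0),
              (20080201, 'key1', 6.0),
              (20080301, 'key1', 5.0),
              (20080301, 'key2', 3.4),
              (20080601, 'key2', 5.6)],
             dtype=[('col1', 'i8'), ('col2', 'U4'), ('col3', 'f8')])

dates = np.unique(a['col1'])    # sorted unique dates -> one column each
keys = np.unique(a['col2'])     # sorted unique keys  -> one row each

# map each record to its (row, col) position in the output grid
rows = np.searchsorted(keys, a['col2'])
cols = np.searchsorted(dates, a['col1'])

out = np.zeros((len(keys), len(dates)))
out[rows, cols] = a['col3']     # unfilled cells remain zero
```

This avoids the per-key Python loop entirely and scales with the number of records rather than the number of keys.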

