[AstroPy] anyone using Pandas?

Thøger Rivera-Thorsen trive@astro.su...
Mon Dec 17 18:41:04 CST 2012


OK that didn't turn out very clearly; I'll try again:

On 12/17/2012 11:59 PM, Wolfgang Kerzendorf wrote:
> So this doesn't work for me:
> The following should have many more entries
>
>   In [26]: myplasma.levels_data.ix[[14, 26]]
> Out[26]:
>                                           level_id        energy  g metastable
> atomic_number ion_number level_number
> 14            0          14            1400000014  9.392722e-12  3      False
>                           26            1400000026  9.943114e-12  9      False

Integers passed to .ix are interpreted here as NumPy-like positional
indexing, not as pandas label-based indexing. That is,
myplasma.levels_data.ix[[14, 26]] returns the rows at positions 14 and 26
of your DataFrame (counting from zero), while
myplasma.levels_data.ix[['14', '26']] should return the sub-frame with
the atomic numbers 14 and 26, as you expected.
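
In pandas versions that provide the explicit indexers, the distinction can be spelled out directly: .iloc is always positional and .loc is always label-based (.ix is no longer available in current releases). Below is a minimal sketch, assuming a small made-up frame with integer labels like the ones in your printout; the energies are placeholders. The isin mask on the index level picks the atoms regardless of where their rows sit.

```python
import pandas as pd

# Hypothetical miniature of levels_data; the values are placeholders.
index = pd.MultiIndex.from_tuples(
    [(8, 0, 0), (14, 0, 0), (14, 0, 1), (14, 1, 0), (26, 0, 0), (26, 0, 1)],
    names=["atomic_number", "ion_number", "level_number"],
)
levels_data = pd.DataFrame({"energy": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]}, index=index)

# Positional: the rows at positions 1 and 3, whatever their labels are.
by_position = levels_data.iloc[[1, 3]]

# Label-based, one atom: every row whose first index level equals 14.
single_atom = levels_data.loc[14]

# Label-based, several atoms: boolean mask built from the index level.
mask = levels_data.index.get_level_values("atomic_number").isin([14, 26])
by_label = levels_data[mask]

print(len(by_position), len(single_atom), len(by_label))
```

With this frame, by_label holds the rows for atoms 14 and 26 together, which is the 752+2486 behaviour you were after.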





> In [27]: myplasma.levels_data.ix[14]
> Out[27]:
> <class 'pandas.core.frame.DataFrame'>
> MultiIndex: 752 entries, (0, 0) to (3, 51)
> Data columns:
> level_id      752  non-null values
> energy        752  non-null values
> g             752  non-null values
> metastable    752  non-null values
> dtypes: bool(1), float64(1), int64(2)
>
> In [28]: myplasma.levels_data.ix[26]
> Out[28]:
> <class 'pandas.core.frame.DataFrame'>
> MultiIndex: 2486 entries, (0, 0) to (3, 275)
> Data columns:
> level_id      2486  non-null values
> energy        2486  non-null values
> g             2486  non-null values
> metastable    2486  non-null values
> dtypes: bool(1), float64(1), int64(2)
> ------
> What I expect is 752+2486 entries which is not the case.
> I'm on pandas 0.9.1 maybe that's the problem.
>
> My approach seems to work as well so it's not crucial.
>
> Cheers
>     Wolfgang
>
> On 2012-12-17, at 5:49 PM, Thøger Rivera-Thorsen <trive@astro.su.se> wrote:
>
>> On 12/17/2012 08:48 PM, Wolfgang Kerzendorf wrote:
>>> Hi Thoger,
>>>
>>> Thanks, maybe I didn't phrase my question properly. I already knew how to do a single slice; what I wanted to know is how to slice with multiple atomic numbers.
>>>
>>> I tried df.xs([1,3], level=0) which doesn't do it (same with xs).
>> For me, with the DataFrame from earlier, MyDF.ix[['1', '3']]['Nrg_lvl'] works just fine.
>> Note that there's a subtle difference: MyDF.ix[['1', '3']] selects the two row-wise index labels '1' and '3', while MyDF.ix[('1', '3')] selects the row-wise index '1' and the column-wise index '3'.
>>
>>
>>
>>> In NumPy I can select multiple indices by doing array[[1,3,4,5]]. I would like to do the same with Pandas, but using just the first index level. Here's how I do it now, but maybe there's a better way:
>>>
>>> levels_atom_filter = atom_data.levels_data['atomic_number'].isin([1,3,4,5])
>>> levels_data = atom_data.levels_data[levels_atom_filter]
>>>
>>>
>>> As for the second question, I have now figured out that groupby can do that.
>>>
>>> Cheers
>>>     Wolfgang
>>> On 2012-12-17, at 11:09 AM, Thøger Rivera-Thorsen <trive@astro.su.se> wrote:
>>>
>>>> As for the first question: You simply use
>>>>
>>>>     MyDF.xs('1')
>>>>
>>>> (level=0 is the default). That, or
>>>>
>>>>     MyDF.ix['1']
>>>>
>>>> if you want to also be able to change/set values (the .xs method only lets you read values).
>>>>
>>>> As for the second question, the .ix[] and .xs() methods slice index-wise, while column selection can be done with a simple [], as if you were slicing a Python list or 1D NumPy array:
>>>>
>>>> In [46]: MyDF.ix['1']['Nrg_lvl']
>>>> Out[46]:
>>>> levels
>>>> a         0.922009
>>>> b         0.575255
>>>> c         0.341259
>>>> Name: Nrg_lvl
>>>>
>>>> In [47]: MyDF.xs('1')['Nrg_lvl']
>>>> Out[47]:
>>>> levels
>>>> a         0.922009
>>>> b         0.575255
>>>> c         0.341259
>>>> Name: Nrg_lvl
>>>>
>>>> You can access the index through the .index attribute:
>>>>
>>>> In [73]: MyDF.index
>>>> Out[73]:
>>>> MultiIndex
>>>> [(1, a), (1, b), (1, c), (2, a), (2, b), (2, c), (3, a), (3, b), (3, c), (3, d), (3, e)]
>>>>
>>>> As you can see, the MultiIndex is essentially a list of tuples, so you can cycle through the list and,
>>>> for each tuple, select the relevant entry and use it for whatever you want.
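
The cycling described above can also be left to groupby on the index level, which you mention you have since found. This is a small sketch, assuming a frame shaped like the toy MyDF from earlier in the thread; the numbers are placeholders rather than the original random data, and the names my_df and energies_by_atom are only for illustration.

```python
import numpy as np
import pandas as pd

atoms = ["1", "1", "1", "2", "2", "2"]
levels = ["a", "b", "c", "a", "b", "c"]
idx = pd.MultiIndex.from_arrays([atoms, levels], names=["atoms", "levels"])
my_df = pd.DataFrame(
    {"Nrg_lvl": np.arange(6, dtype=float), "g.": np.ones(6)}, index=idx
)

# Option 1: walk the index tuples directly.
labels_seen = [(atom, level) for atom, level in my_df.index]

# Option 2: let groupby hand out one sub-frame per atom.
energies_by_atom = {}
for atom, sub_frame in my_df.groupby(level="atoms"):
    energies_by_atom[atom] = sub_frame["Nrg_lvl"].tolist()

print(energies_by_atom)
```

Each pass of the loop gives the atom label together with the list of energies for that atom.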
>>>>
>>>> In general, I recommend that you take some time to do what I have done: read the Pandas documentation. It's quite long, but it's full of examples and very comprehensive, and you can read only the chapters that seem relevant to what you are doing right now. Have an IPython interface of your choice open while you do it, run the examples, and maybe try out different variations of them; it gives a feeling for how it works far beyond what any e-mail explanation can provide. Also, I have learned a lot from simply using IPython's Tab-autocompletion: it tells you what methods are available for an object and gives an idea of which ones could be interesting.
>>>> Of course, there'll still be questions to ask, but you'll get many questions answered quickly that way.
>>>>
>>>> But before doing that, I think this video (if you haven't already watched it) is quite a thorough introduction without being too long and technical:
>>>>
>>>> http://www.youtube.com/watch?v=MxRMXhjXZos
>>>>
>>>>
>>>> Cheers
>>>> Thøger
>>>>
>>>>
>>>> On 12/16/2012 11:06 PM, Wolfgang Kerzendorf wrote:
>>>>> Hi Thoger,
>>>>>
>>>>> Thanks for your suggestions. I'm using multi indexed levels now - there's still a couple of questions I have:
>>>>>
>>>>> How can I filter by atoms - in your example only have a data frame with 1s in them?
>>>>>
>>>>> How can I loop through one of the index levels, so that I get 1 and then a list of (0.052846  0.533835  0.185949, …)
>>>>>
>>>>> Thanks for helping,
>>>>>       Wolfgang
>>>>>
>>>>>
>>>>> How can I get back all the
>>>>> On 2012-12-15, at 6:44 AM, Thøger Rivera-Thorsen <trive@astro.su.se> wrote:
>>>>>
>>>>>> Hi;
>>>>>>
>>>>>> It still depends. Your MultiIndex can be made from lists, tuples or
>>>>>> arrays; I usually use arrays since I'm familiar with them from using
>>>>>> NumPy. For a 3-level index, you need three 1D arrays of equal
>>>>>> length that cover all the atomic number - ion - level combinations you
>>>>>> want covered. Toy example with two index levels: three atoms denoted by
>>>>>> number, with levels denoted by letters. Let's say the first two atoms
>>>>>> have three levels each and the third atom has five. Then your arrays should be:
>>>>>> atoms = sp.array(['1', '1', '1', '2', '2', '2', '3', '3', '3', '3', '3'])
>>>>>> levels  = sp.array(['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'd', 'e'])
>>>>>>
>>>>>> idx = pd.MultiIndex.from_arrays([atoms, levels], names=['atoms', 'levels'])
>>>>>>
>>>>>> Your data should then be an 11x2 array with energy and g as the columns.
>>>>>> You then build the DataFrame as:
>>>>>>
>>>>>> MyDF = pd.DataFrame(dataarray, index=idx, columns=['Nrg_lvl', 'g.'])
>>>>>>
>>>>>>
>>>>>> That gives:
>>>>>>
>>>>>> In [22]: print MyDF
>>>>>>                Nrg_lvl        g.
>>>>>> atoms levels
>>>>>> 1     a       0.052846  0.533835
>>>>>>       b       0.185949  0.064069
>>>>>>       c       0.384630  0.646803
>>>>>> 2     a       0.835958  0.392594
>>>>>>       b       0.016399  0.165862
>>>>>>       c       0.300874  0.975590
>>>>>> 3     a       0.124640  0.815488
>>>>>>       b       0.590613  0.749555
>>>>>>       c       0.284481  0.299149
>>>>>>       d       0.104408  0.723406
>>>>>>       e       0.733087  0.730055
>>>>>>
>>>>>> (Here I have just used random numbers for the data).
>>>>>>
>>>>>> From here, you can do all the Pandas goodness you want:
>>>>>>
>>>>>> In [23]: print MyDF.xs('b', level=1)
>>>>>>         Nrg_lvl        g.
>>>>>> atoms
>>>>>> 1      0.185949  0.064069
>>>>>> 2      0.016399  0.165862
>>>>>> 3      0.590613  0.749555
>>>>>>
>>>>>>
>>>>>> Once you have the basic structure built, it is easy to extend, either by
>>>>>> concatenating or by using the set_value function. However, most of these
>>>>>> operations return a modified copy rather than working in place, so it is
>>>>>> better to build as much of the DataFrame as possible in one go if you're
>>>>>> concerned about memory overhead. Pandas doesn't seem to be designed for
>>>>>> building its data structures gradually.
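
Here is a self-contained version of the toy example, as a sketch. It uses NumPy directly rather than the sp alias, and dataarray is filled with random placeholder numbers since the original array is not shown. The last lines illustrate that concatenation returns a new frame instead of extending the old one; the extra row for atom '4' is invented for the illustration.

```python
import numpy as np
import pandas as pd

# Two equal-length 1D arrays, one per index level.
atoms = np.array(["1"] * 3 + ["2"] * 3 + ["3"] * 5)
levels = np.array(list("abc") + list("abc") + list("abcde"))
idx = pd.MultiIndex.from_arrays([atoms, levels], names=["atoms", "levels"])

# Placeholder data: 11 rows, 2 columns of random numbers.
rng = np.random.default_rng(0)
dataarray = rng.random((11, 2))
my_df = pd.DataFrame(dataarray, index=idx, columns=["Nrg_lvl", "g."])

# Cross-section: level 'b' of every atom.
level_b = my_df.xs("b", level="levels")

# Extending by concatenation builds a new DataFrame; my_df is unchanged.
extra_idx = pd.MultiIndex.from_tuples([("4", "a")], names=["atoms", "levels"])
extra = pd.DataFrame([[0.5, 2.0]], index=extra_idx, columns=["Nrg_lvl", "g."])
extended = pd.concat([my_df, extra])

print(my_df.shape, extended.shape)
```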
>>>>>>
>>>>>> Cheers;
>>>>>>
>>>>>> Thøger
>>>>>>
>>>>>>
>>>>>> On 12/15/2012 05:52 AM, Wolfgang Kerzendorf wrote:
>>>>>>> Sorry before this didn't go to the list:
>>>>>>>
>>>>>>> Hey guys,
>>>>>>>
>>>>>>> I'm slowly understanding better what pandas is about. The object I'm representing in a pandas data frame is an atomic database.
>>>>>>> Each line in there has atomic_number, ion_number, level_number, energy, g. So I have created a Pandas DataFrame and then set the index to atomic_number, ion_number, level_number. Now I want to make a new DataFrame where atomic_number is in (6, 7, 8, 9), but it is an index. How do I do that?
>>>>>>>
>>>>>>> Cheers
>>>>>>>     W
>>>>>>> On 2012-12-14, at 8:15 PM, Thøger Rivera-Thorsen <trive@astro.su.se> wrote:
>>>>>>>
>>>>>>>> Like Tyler said, can you be a bit more specific about what you want to
>>>>>>>> obtain?
>>>>>>>> What is your starting point, and where do you want to go from there?
>>>>>>>>
>>>>>>>> I've been looking quite a bit into multiindexing lately, and it is very
>>>>>>>> handy but it does have some caveats.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 12/14/2012 09:04 PM, Wolfgang Kerzendorf wrote:
>>>>>>>>> Hey guys,
>>>>>>>>>
>>>>>>>>> I'm trying to play around with pandas. Currently I'm looking at join, and it always seems to copy the data. I believe I want to use advanced indexing, but am not quite sure how to do that: any Pandas experts here?
>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>>     Wolfgang
>>>>>>>>> _______________________________________________
>>>>>>>>> AstroPy mailing list
>>>>>>>>> AstroPy@scipy.org
>>>>>>>>> http://mail.scipy.org/mailman/listinfo/astropy


