[AstroPy] anyone using Pandas?
Thøger Rivera-Thorsen
trive@astro.su...
Mon Dec 17 16:49:36 CST 2012
On 12/17/2012 08:48 PM, Wolfgang Kerzendorf wrote:
> Hi Thoger,
>
> Thanks, maybe I didn't phrase my question properly. I already knew how to do a single slice; what I wanted to know is how to slice with multiple atomic numbers.
>
> I tried df.xs([1,3], level=0) which doesn't do it (same with ix).
For me, with the dataframe from earlier, MyDF.ix[['1', '3']]['Nrg_lvl']
works just fine.
Note that there's a subtle difference: MyDF.ix[['1', '3']] will select
the two row-wise indices '1' and '3', while MyDF.ix[('1', '3')] will
select the row-wise index '1' and the column-wise index '3'.
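
For readers on a newer pandas: .ix was deprecated in pandas 0.20 and removed in 1.0. Below is a minimal sketch of list-based selection on the first index level using .loc instead. It assumes a recent pandas version; the frame is a rebuilt stand-in for the toy example, and its values are placeholders rather than the numbers from this thread. Note that .loc reads a tuple in the row position as one full MultiIndex key.

```python
import numpy as np
import pandas as pd

# Stand-in for the toy frame from the thread; the data values are placeholders.
atoms = ["1"] * 3 + ["2"] * 3 + ["3"] * 5
levels = list("abc") * 2 + list("abcde")
idx = pd.MultiIndex.from_arrays([atoms, levels], names=["atoms", "levels"])
df = pd.DataFrame(np.arange(22.0).reshape(11, 2), index=idx, columns=["Nrg_lvl", "g."])

# A list of labels selects every row whose first-level label is in the list.
both_atoms = df.loc[["1", "3"], "Nrg_lvl"]

# A tuple in the row position is one full key: atom '1', level 'a'.
one_row = df.loc[("1", "a"), :]
```

Here both_atoms keeps both index levels, so the level labels for atoms '1' and '3' stay distinguishable.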
> In numpy I can slice with multiple indices by doing array[[1,3,4,5]] - I would like to do the same with Pandas, but using only the first index level. Here's how I do it now - but maybe there's a better way:
>
> levels_atom_filter = atom_data.levels_data['atomic_number'].isin([1,3,4,5])
> levels_data = atom_data.levels_data[levels_atom_filter]
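
If atomic_number has already been moved into the index, the same isin filter can be built from the index level rather than from a column. This is a minimal sketch under that assumption; the small frame below is a hypothetical stand-in for atom_data.levels_data, and its numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for atom_data.levels_data, indexed as described in the thread.
levels_data = pd.DataFrame(
    {
        "atomic_number": [1, 1, 3, 4, 6],
        "ion_number": [0, 0, 1, 0, 2],
        "level_number": [0, 1, 0, 0, 0],
        "energy": [0.0, 10.2, 1.8, 2.7, 6.5],
        "g": [2, 8, 2, 1, 1],
    }
).set_index(["atomic_number", "ion_number", "level_number"])

# Build the boolean mask from the index level instead of from a column.
mask = levels_data.index.get_level_values("atomic_number").isin([1, 3, 4, 5])
selected = levels_data[mask]
```

Labels in the list that do not occur in the index (5 in this sketch) are simply ignored by isin.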
>
>
> As for the second question, I have now figured out that groupby can do that.
>
> Cheers
> Wolfgang
> On 2012-12-17, at 11:09 AM, Thøger Rivera-Thorsen <trive@astro.su.se> wrote:
>
>> As for the first question: You simply use
>>
>> MyDF.xs('1')
>>
>> (level=0 is the default). That, or
>>
>> MyDF.ix['1']
>>
>> if you want to also be able to change/set values (the .xs method only lets you read values).
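
A minimal sketch of the read versus write distinction, assuming a recent pandas version where .loc takes the place of .ix. The tiny frame and its values are placeholders for illustration only.

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays(
    [["1", "1", "2"], ["a", "b", "a"]], names=["atoms", "levels"]
)
df = pd.DataFrame({"Nrg_lvl": [0.1, 0.2, 0.3], "g.": [1.0, 2.0, 3.0]}, index=idx)

# Reading: .xs returns the cross-section for atom '1' with that level dropped.
atom_one = df.xs("1")

# Writing: assign through .loc on the original frame, using the full row key.
df.loc[("1", "a"), "Nrg_lvl"] = 0.0
```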
>>
>> As for the second question, the .ix[] and .xs() methods slice index-wise, while column selection can be done with a simple [], as if you were slicing a Python list or 1D NumPy array:
>>
>> In [46]: MyDF.ix['1']['Nrg_lvl']
>> Out[46]:
>> levels
>> a    0.922009
>> b    0.575255
>> c    0.341259
>> Name: Nrg_lvl
>>
>> In [47]: MyDF.xs('1')['Nrg_lvl']
>> Out[47]:
>> levels
>> a    0.922009
>> b    0.575255
>> c    0.341259
>> Name: Nrg_lvl
>>
>> You can access the index through the .index attribute:
>>
>> MyDF.index
>> Out[73]:
>> MultiIndex
>> [(1, a), (1, b), (1, c), (2, a), (2, b), (2, c), (3, a), (3, b), (3, c), (3, d), (3, e)]
>>
>> As you can see, the MultiIndex is essentially a list of tuples, so you can cycle through the list and,
>> for each tuple, select the relevant entry and use it for whatever you want.
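
A minimal sketch of both ways of looping: directly over the index tuples, and with groupby on the first level, which is the approach mentioned elsewhere in this thread. The frame and its values are placeholders, and a recent pandas version is assumed.

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays(
    [["1", "1", "2", "2"], ["a", "b", "a", "b"]], names=["atoms", "levels"]
)
df = pd.DataFrame({"Nrg_lvl": [0.1, 0.2, 0.3, 0.4]}, index=idx)

# Looping over the index yields one (atom, level) tuple per row.
pairs = list(df.index)

# groupby on the first level yields each atom label with its sub-frame.
energies_by_atom = {}
for atom, group in df.groupby(level="atoms"):
    energies_by_atom[atom] = group["Nrg_lvl"].tolist()
```

The groupby form gives one atom label together with the list of its values, which is what the question about looping through one index level asks for.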
>>
>> In general, I recommend that you take some time to do what I have done: read the Pandas documentation. It's quite long, but it's full of examples and very comprehensive, and you can read only the chapters that seem relevant to what you are doing right now. Have an IPython interface of your choice open while you do it, run the examples, and maybe try out different variations of them; it gives a feeling for how it works far beyond what any e-mail explanation can provide. Also, I have learned a lot from simply using IPython's Tab autocompletion: it tells you what methods are available for an object and gives an idea of which ones could be interesting.
>> Of course, there'll still be questions to ask, but you'll get many questions answered quickly that way.
>>
>> But before doing that, I think this video (if you haven't already watched it) is quite a thorough introduction without being too long and technical:
>>
>> http://www.youtube.com/watch?v=MxRMXhjXZos
>>
>>
>> Cheers
>> Thøger
>>
>>
>> On 12/16/2012 11:06 PM, Wolfgang Kerzendorf wrote:
>>> Hi Thoger,
>>>
>>> Thanks for your suggestions. I'm using multi indexed levels now - there's still a couple of questions I have:
>>>
>>> How can I filter by atoms, e.g. in your example get a data frame containing only the rows for atom 1?
>>>
>>> How can I loop through one of the index levels, i.e. get 1 and then a list of (0.052846 0.533835 0.185949, …)?
>>>
>>> Thanks for helping,
>>> Wolfgang
>>>
>>>
>>> How can I get back all the
>>> On 2012-12-15, at 6:44 AM, Thøger Rivera-Thorsen <trive@astro.su.se> wrote:
>>>
>>>> Hi;
>>>>
>>>> It still depends. Your MultiIndex can be made from lists, tuples or
>>>> arrays; I usually use arrays since I'm familiar with them from using
>>>> NumPy. For a 3-level index, you need three 1D arrays of equal
>>>> length that cover all the atomic number - ion - level combinations you
>>>> want covered. Toy example with two index levels: three atoms, denoted
>>>> by number, with levels denoted by letters. Let's say the first two atoms
>>>> have three levels each and your third atom has five. Then your arrays should be:
>>>> import numpy as np
>>>> import pandas as pd
>>>>
>>>> atoms = np.array(['1', '1', '1', '2', '2', '2', '3', '3', '3', '3', '3'])
>>>> levels = np.array(['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'd', 'e'])
>>>>
>>>> idx = pd.MultiIndex.from_arrays([atoms, levels], names=['atoms', 'levels'])
>>>>
>>>> Your data should then be an 11x2 array with energy and g as the columns.
>>>> You then build the DataFrame as:
>>>>
>>>> MyDF = pd.DataFrame(dataarray, index=idx, columns=['Nrg_lvl', 'g.'])
>>>>
>>>>
>>>> That gives:
>>>>
>>>> In [22]: print MyDF
>>>>                Nrg_lvl        g.
>>>> atoms levels
>>>> 1     a       0.052846  0.533835
>>>>       b       0.185949  0.064069
>>>>       c       0.384630  0.646803
>>>> 2     a       0.835958  0.392594
>>>>       b       0.016399  0.165862
>>>>       c       0.300874  0.975590
>>>> 3     a       0.124640  0.815488
>>>>       b       0.590613  0.749555
>>>>       c       0.284481  0.299149
>>>>       d       0.104408  0.723406
>>>>       e       0.733087  0.730055
>>>>
>>>>
>>>> (Here I have just used random numbers for the data).
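
The construction steps above can be put together as one self-contained sketch. It assumes NumPy and a recent pandas version; the seed and the random data are arbitrary placeholders, so the printed numbers will not match the table in this thread.

```python
import numpy as np
import pandas as pd

atoms = np.array(["1", "1", "1", "2", "2", "2", "3", "3", "3", "3", "3"])
levels = np.array(["a", "b", "c", "a", "b", "c", "a", "b", "c", "d", "e"])
idx = pd.MultiIndex.from_arrays([atoms, levels], names=["atoms", "levels"])

# Placeholder data: an 11x2 array of random numbers (seed chosen arbitrarily).
rng = np.random.default_rng(0)
dataarray = rng.random((11, 2))

df = pd.DataFrame(dataarray, index=idx, columns=["Nrg_lvl", "g."])
print(df)
```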
>>>>
>>>> From here, you can do all the Pandas goodness you want:
>>>>
>>>> In [23]: print MyDF.xs('b', level=1)
>>>>         Nrg_lvl        g.
>>>> atoms
>>>> 1      0.185949  0.064069
>>>> 2      0.016399  0.165862
>>>> 3      0.590613  0.749555
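
A minimal sketch of the same cross-section, selecting the level by name instead of by position. It assumes a recent pandas version, and the frame holds placeholder values. The drop_level=False variant keeps the selected level in the result.

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays(
    [["1", "1", "2", "2"], ["a", "b", "a", "b"]], names=["atoms", "levels"]
)
df = pd.DataFrame(
    {"Nrg_lvl": [0.1, 0.2, 0.3, 0.4], "g.": [1.0, 2.0, 3.0, 4.0]}, index=idx
)

# Cross-section of level 'b' for every atom; the 'levels' level is dropped.
level_b = df.xs("b", level="levels")

# Same selection, but keeping both index levels in the result.
level_b_full = df.xs("b", level="levels", drop_level=False)
```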
>>>>
>>>>
>>>> Once you have the basic structure built, it is easy to extend either by
>>>> concatenating or by using the set_value function. However, most of these
>>>> operations return a modified copy rather than working in place,
>>>> so it is better to build as much of the DataFrame as possible in one
>>>> go if you're concerned about memory overhead. Pandas doesn't seem to be
>>>> designed for building its data structures gradually.
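
One way to follow the "build it in one go" advice is to collect the pieces in a plain Python list and concatenate once at the end. This is a minimal sketch assuming a recent pandas version; the per-atom pieces and their values are hypothetical.

```python
import pandas as pd

# Hypothetical per-atom pieces, e.g. one small frame per input file.
pieces = []
for atom, energies in [("1", [0.1, 0.2]), ("2", [0.3])]:
    levels = list("abcde")[: len(energies)]
    idx = pd.MultiIndex.from_product([[atom], levels], names=["atoms", "levels"])
    pieces.append(pd.DataFrame({"Nrg_lvl": energies}, index=idx))

# One concatenation at the end instead of growing the frame piece by piece.
df = pd.concat(pieces)
```

Appending to a list is cheap, so the copy made by pd.concat happens only once.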
>>>>
>>>> Cheers;
>>>>
>>>> Thøger
>>>>
>>>>
>>>> On 12/15/2012 05:52 AM, Wolfgang Kerzendorf wrote:
>>>>> Sorry, this didn't go to the list before:
>>>>>
>>>>> Hey guys,
>>>>>
>>>>> I'm slowly understanding better what pandas is about. The object I'm representing in a pandas data frame is an atomic database.
>>>>> Each line in there has atomic_number, ion_number, level_number, energy, g. So I have created a Pandas DataFrame and then set the index to atomic_number, ion_number, level_number. Now I want to make a new DataFrame where atomic_number is in (6, 7, 8, 9) - but it is an index. How do I do that?
>>>>>
>>>>> Cheers
>>>>> W
>>>>> On 2012-12-14, at 8:15 PM, Thøger Rivera-Thorsen <trive@astro.su.se> wrote:
>>>>>
>>>>>> Like Tyler said, can you be a bit more specific about what you want to
>>>>>> obtain?
>>>>>> What is your starting point, and where do you want to go from there?
>>>>>>
>>>>>> I've been looking quite a bit into multiindexing lately, and it is very
>>>>>> handy but it does have some caveats.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 12/14/2012 09:04 PM, Wolfgang Kerzendorf wrote:
>>>>>>> Hey guys,
>>>>>>>
>>>>>>> I'm trying to play around with pandas. Currently I'm looking at join, and it always seems to copy the data. I believe I want to use advanced indexing, but am not quite sure how to do that: any Pandas experts here?
>>>>>>>
>>>>>>> Cheers
>>>>>>> Wolfgang
>>>>>>> _______________________________________________
>>>>>>> AstroPy mailing list
>>>>>>> AstroPy@scipy.org
>>>>>>> http://mail.scipy.org/mailman/listinfo/astropy