[Numpy-discussion] ANN: pandas 0.10.0 released
Wes McKinney
wesmckinn@gmail....
Mon Dec 17 11:19:49 CST 2012
hi all,
I'm super excited to announce the pandas 0.10.0 release. This is
a major release including a new high-performance file reading
engine with tons of new user-facing functionality, a
bunch of work on the HDF5/PyTables integration layer,
much-expanded Unicode support, a new option/configuration
interface, integration with the Google Analytics API, and a wide
array of other new features, bug fixes, and performance
improvements. I strongly recommend that all users upgrade as
soon as feasible. Many of the performance improvements are
substantial relative to 0.9.x; see the vbench results at the end of this e-mail.
As of this release, we are no longer supporting Python 2.5. Also,
this is the first release to officially support Python 3.3.
Note: there are a number of minor but necessary API changes that
long-time pandas users should pay attention to in the What's New.
Thanks to all who contributed to this release, especially Chang
She, Yoval P, and Jeff Reback (and everyone else listed in the
commit log!).
As always, source archives and Windows installers are on PyPI.
What's new: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html
Installers: http://pypi.python.org/pypi/pandas
$ git log v0.9.1..v0.10.0 --pretty=format:%aN | sort | uniq -c | sort -rn
246 Wes McKinney
140 y-p
99 Chang She
45 jreback
18 Abraham Flaxman
17 Jeff Reback
14 locojaydev
11 Keith Hughitt
5 Adam Obeng
2 Dieter Vandenbussche
1 zach powers
1 Luke Lee
1 Laurent Gautier
1 Ken Van Haren
1 Jay Bourque
1 Donald Curtis
1 Chris Mulligan
1 alex arsenovic
1 A. Flaxman
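The pipeline above counts commits per author name. As a rough illustration of what each stage does, here is a minimal Python sketch of the same tally, assuming the input is the text that `git log --pretty=format:%aN` prints (one author name per line). The function name `commits_per_author` and the sample names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def commits_per_author(log_output: str) -> List[Tuple[int, str]]:
    """Tally one-author-per-line text, like `sort | uniq -c | sort -rn`."""
    # Ignore blank lines; git prints one %aN value per commit.
    names = [line.strip() for line in log_output.splitlines() if line.strip()]
    counts = Counter(names)
    # Highest count first. Ties are ordered by name here, which need not
    # match the order `sort -rn` produces for tied counts.
    return sorted(((n, name) for name, n in counts.items()),
                  key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    sample = "alice\nbob\nalice\ncarol\nalice\nbob\n"  # made-up input
    for n, name in commits_per_author(sample):
        print(f"{n:>4} {name}")
```

Like `uniq -c`, this counts exact strings, so aliases such as `jreback` and `Jeff Reback` in the list above stay separate entries.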
Happy data hacking!
- Wes
What is it
==========
pandas is a Python package providing fast, flexible, and
expressive data structures designed to make working with
relational, time series, or any other kind of labeled data both
easy and intuitive. It aims to be the fundamental high-level
building block for doing practical, real-world data analysis in
Python.
Links
=====
Release Notes: http://github.com/pydata/pandas/blob/master/RELEASE.rst
Documentation: http://pandas.pydata.org
Installers: http://pypi.python.org/pypi/pandas
Code Repository: http://github.com/pydata/pandas
Mailing List: http://groups.google.com/group/pydata
Performance vs. v0.9.0
======================
Benchmarks from https://github.com/pydata/pandas/tree/master/vb_suite
Times in ms; ratio = v0.10.0 time / v0.9.0 time, so ratio < 1 means that v0.10.0 is faster
name v0.10.0 v0.9.0 ratio
unstack_sparse_keyspace 1.2813 144.1262 0.0089
groupby_frame_apply_overhead 20.1520 337.3330 0.0597
read_csv_comment2 25.3097 363.2860 0.0697
groupbym_frame_apply 75.1554 504.1661 0.1491
frame_iteritems_cached 0.0711 0.3919 0.1815
read_csv_thou_vb 35.2690 191.9360 0.1838
concat_small_frames 12.9019 55.3561 0.2331
join_dataframe_integer_2key 5.8184 21.5823 0.2696
series_value_counts_strings 5.3824 19.1262 0.2814
append_frame_single_homogenous 0.3413 0.9319 0.3662
read_csv_vb 18.4084 46.9500 0.3921
read_csv_standard 12.0651 29.9940 0.4023
panel_from_dict_all_different_indexes 73.6860 158.2949 0.4655
frame_constructor_ndarray 0.0471 0.0958 0.4918
groupby_first 3.8502 7.1988 0.5348
groupby_last 3.6962 6.7792 0.5452
panel_from_dict_two_different_indexes 50.7428 86.4980 0.5866
append_frame_single_mixed 1.2950 2.1930 0.5905
frame_get_numeric_data 0.0695 0.1119 0.6212
replace_fillna 4.6349 7.0540 0.6571
frame_to_csv 281.9340 427.7921 0.6590
replace_replacena 4.7154 7.1207 0.6622
frame_iteritems 2.5862 3.7463 0.6903
series_align_int64_index 29.7370 41.2791 0.7204
join_dataframe_integer_key 1.7980 2.4303 0.7398
groupby_multi_size 31.0066 41.7001 0.7436
groupby_frame_singlekey_integer 2.3579 3.1649 0.7450
write_csv_standard 326.8259 427.3241 0.7648
groupby_simple_compress_timing 41.2113 52.3993 0.7865
frame_fillna_inplace 16.2843 20.0491 0.8122
reindex_fillna_backfill 0.1364 0.1667 0.8181
groupby_multi_series_op 15.2914 18.6651 0.8193
groupby_multi_cython 17.2169 20.4420 0.8422
frame_fillna_many_columns_pad 14.9510 17.5114 0.8538
panel_from_dict_equiv_indexes 25.8427 29.9682 0.8623
merge_2intkey_nosort 19.0755 22.1138 0.8626
sparse_series_to_frame 167.8529 192.9920 0.8697
reindex_fillna_pad 0.1410 0.1617 0.8720
merge_2intkey_sort 44.7863 51.3315 0.8725
reshape_stack_simple 2.6698 3.0502 0.8753
groupby_indices 7.2264 8.2314 0.8779
sort_level_one 4.3845 4.9902 0.8786
sort_level_zero 4.3362 4.9198 0.8814
write_store 16.0587 18.2042 0.8821
frame_reindex_both_axes 0.3726 0.4183 0.8907
groupby_multi_different_numpy_functions 13.4164 15.0509 0.8914
index_int64_intersection 25.3705 28.1867 0.9001
groupby_frame_median 7.7491 8.6011 0.9009
frame_drop_dup_na_inplace 2.6290 2.9155 0.9017
dataframe_reindex_columns 0.3052 0.3372 0.9049
join_dataframe_index_multi 20.5651 22.6893 0.9064
frame_ctor_list_of_dict 101.7439 112.2260 0.9066
groupby_pivot_table 18.4551 20.3184 0.9083
reindex_frame_level_align 0.9644 1.0531 0.9158
stat_ops_level_series_sum_multiple 7.3637 8.0230 0.9178
write_store_mixed 38.2528 41.6604 0.9182
frame_reindex_both_axes_ix 0.4550 0.4950 0.9192
stat_ops_level_frame_sum_multiple 8.1975 8.9055 0.9205
panel_from_dict_same_index 25.7938 28.0147 0.9207
groupby_series_simple_cython 5.1310 5.5624 0.9224
frame_sort_index_by_columns 41.9577 45.1816 0.9286
groupby_multi_python 54.9727 59.0400 0.9311
datetimeindex_add_offset 0.2417 0.2584 0.9356
frame_boolean_row_select 0.2905 0.3100 0.9373
frame_reindex_axis1 2.9760 3.1742 0.9376
stat_ops_level_series_sum 2.3382 2.4937 0.9376
groupby_multi_different_functions 14.0333 14.9571 0.9382
timeseries_timestamp_tzinfo_cons 0.0159 0.0169 0.9397
stats_rolling_mean 1.6904 1.7959 0.9413
melt_dataframe 1.5236 1.6181 0.9416
timeseries_asof_single 0.0548 0.0582 0.9416
frame_ctor_nested_dict_int64 134.3100 142.6389 0.9416
join_dataframe_index_single_key_bigger 15.6578 16.5949 0.9435
stat_ops_level_frame_sum 3.2475 3.4414 0.9437
indexing_dataframe_boolean_rows 0.2382 0.2518 0.9459
timeseries_asof_nan 10.0433 10.6006 0.9474
frame_reindex_axis0 1.4403 1.5184 0.9485
concat_series_axis1 69.2988 72.8099 0.9518
join_dataframe_index_single_key_small 6.8492 7.1847 0.9533
dataframe_reindex_daterange 0.4054 0.4240 0.9562
join_dataframe_index_single_key_bigger 6.4616 6.7578 0.9562
timeseries_timestamp_downsample_mean 4.5849 4.7787 0.9594
frame_fancy_lookup 2.5498 2.6544 0.9606
series_value_counts_int64 2.5569 2.6581 0.9619
frame_fancy_lookup_all 30.7510 31.8465 0.9656
index_int64_union 82.2279 85.1500 0.9657
indexing_dataframe_boolean_rows_object 0.4809 0.4977 0.9662
frame_ctor_nested_dict 91.6129 94.8122 0.9663
stat_ops_series_std 0.2450 0.2533 0.9673
groupby_frame_cython_many_columns 3.7642 3.8894 0.9678
timeseries_asof 10.4352 10.7721 0.9687
series_ctor_from_dict 3.7707 3.8749 0.9731
frame_drop_dup_inplace 3.0007 3.0746 0.9760
timeseries_large_lookup_value 0.0242 0.0248 0.9764
read_table_multiple_date_baseline 1201.2930 1224.3881 0.9811
dti_reset_index 0.6339 0.6457 0.9817
read_table_multiple_date 2600.7280 2647.8729 0.9822
reindex_frame_level_reindex 0.9524 0.9674 0.9845
reindex_multiindex 1.3483 1.3685 0.9853
frame_insert_500_columns 102.1249 103.4329 0.9874
frame_drop_duplicates 19.3780 19.6157 0.9879
reindex_daterange_backfill 0.1870 0.1889 0.9899
stats_rank2d_axis0_average 25.0480 25.2801 0.9908
series_align_left_monotonic 13.1929 13.2558 0.9953
timeseries_add_irregular 22.4635 22.5122 0.9978
read_store_mixed 13.4398 13.4560 0.9988
lib_fast_zip 11.1289 11.1354 0.9994
match_strings 0.3831 0.3833 0.9995
read_store 5.5526 5.5290 1.0043
timeseries_sort_index 22.7172 22.5976 1.0053
timeseries_1min_5min_mean 0.6224 0.6175 1.0079
stats_rank2d_axis1_average 14.6569 14.5339 1.0085
reindex_daterange_pad 0.1886 0.1867 1.0102
timeseries_period_downsample_mean 6.4241 6.3480 1.0120
frame_drop_duplicates_na 19.3303 19.0970 1.0122
stats_rank_average_int 23.3569 22.9996 1.0155
lib_fast_zip_fillna 14.1394 13.8473 1.0211
index_datetime_intersection 17.2626 16.8986 1.0215
timeseries_1min_5min_ohlc 0.7054 0.6891 1.0237
stats_rank_average 31.3440 30.3845 1.0316
timeseries_infer_freq 10.9854 10.6439 1.0321
timeseries_slice_minutely 0.0637 0.0611 1.0418
index_datetime_union 17.9083 17.1640 1.0434
series_align_irregular_string 89.9470 85.1344 1.0565
series_constructor_ndarray 0.0127 0.0119 1.0742
indexing_panel_subset 0.5692 0.5214 1.0917
groupby_apply_dict_return 46.3497 42.3220 1.0952
reshape_unstack_simple 3.2901 2.9089 1.1310
timeseries_to_datetime_iso8601 4.2305 3.6015 1.1746
frame_to_string_floats 53.6217 37.2041 1.4413
reshape_pivot_time_series 170.4340 107.9068 1.5795
sparse_frame_constructor 6.2714 3.5053 1.7891
datetimeindex_normalize 37.2718 6.9329 5.3761
Columns: test_name | target_duration [ms] | baseline_duration [ms] | ratio
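The ratio column is the v0.10.0 (target) time divided by the v0.9.0 (baseline) time; for example, 1.2813 / 144.1262 is about 0.0089 for unstack_sparse_keyspace. The sketch below, a hypothetical helper and not part of vb_suite, recomputes that ratio for a few rows copied from the table and labels each one as faster or slower in v0.10.0.

```python
from typing import List, Tuple

# (name, v0.10.0 ms, v0.9.0 ms, ratio as printed), copied from the table above
ROWS = [
    ("unstack_sparse_keyspace", 1.2813, 144.1262, 0.0089),
    ("read_csv_vb", 18.4084, 46.9500, 0.3921),
    ("read_store", 5.5526, 5.5290, 1.0043),
    ("datetimeindex_normalize", 37.2718, 6.9329, 5.3761),
]


def ratio(target_ms: float, baseline_ms: float) -> float:
    """target / baseline: below 1 means the target (v0.10.0) run was faster."""
    return target_ms / baseline_ms


def summarize(rows) -> List[Tuple[str, float, str]]:
    out = []
    for name, new_ms, old_ms, printed in rows:
        r = ratio(new_ms, old_ms)
        # The table prints ratios to 4 decimal places, so allow rounding slack.
        if abs(r - printed) >= 5e-5:
            raise ValueError(f"{name}: computed {r:.6f}, table says {printed}")
        verdict = "faster" if r < 1 else "slower" if r > 1 else "unchanged"
        out.append((name, round(r, 4), verdict))
    return out


if __name__ == "__main__":
    for name, r, verdict in summarize(ROWS):
        print(f"{name:<26} {r:>8.4f}  v0.10.0 {verdict}")
```

Because the ratio is new/old, its reciprocal reads as a speedup: 144.1262 / 1.2813 is roughly 112, so unstack_sparse_keyspace ran about 112 times faster in v0.10.0.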