[SciPy-User] Proposal for a new data analysis toolbox
Wes McKinney
wesmckinn@gmail....
Wed Nov 24 16:04:13 CST 2010
On Wed, Nov 24, 2010 at 12:05 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
> On Wed, Nov 24, 2010 at 4:43 AM, Wes McKinney <wesmckinn@gmail.com> wrote:
>
>> I am not for placing arbitrary restrictions or having a strict
>> enumeration on what goes in this library. I think having a practical,
>> central dumping ground for data analysis tools would be beneficial. We
>> could decide about having "spin-off" libraries later if we think
>> that's appropriate.
>
> I'd like to start small (I've already bitten off more than I can chew)
> by delivering a well thought out (and implemented) small feature set.
> Functions of the form:
>
> sum(arr, axis=None)
> move_sum(arr, window, axis=0)
> group_sum(arr, label, axis)
>
> where sum can be replaced by a long (to be decided) list of functions
> such as std, max, median, etc.
>
> Once that is delivered and gets some use, I'm sure we'll want to push
> into new territory. What do you suggest for the next feature to add?
I have no problem if you would like to develop in this way, but I
don't personally work well like that. I think a library with twenty
80% solutions would be better than a library with five 100% solutions.
Of course, over time you eventually want to build out those twenty 80%
solutions into 100% solutions, but I think that approach is of greater
utility overall.
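The three function shapes Keith lists (sum, move_sum, group_sum) can be made concrete with a small NumPy sketch. This is an illustration under stated assumptions, not code from either author or from any released library. The NaN padding for incomplete windows, the axis=0 default for group_sum, and returning the unique labels alongside the sums are all choices made here for illustration. The plain sum(arr, axis=None) case is already covered by np.sum.

```python
import numpy as np


def move_sum(arr, window, axis=0):
    """Hypothetical moving-window sum along `axis`.

    Assumption: the first window - 1 positions, where a full window is
    not yet available, are filled with NaN.
    """
    arr = np.asarray(arr, dtype=float)
    if window < 1 or window > arr.shape[axis]:
        raise ValueError("window must be between 1 and the length of axis")
    a = np.moveaxis(arr, axis, 0)
    # Prepend a row of zeros to the cumulative sum so that
    # csum[k] == a[:k].sum(axis=0); then any window sum is a difference.
    csum = np.cumsum(a, axis=0)
    csum = np.concatenate([np.zeros((1,) + a.shape[1:]), csum], axis=0)
    out = np.full(a.shape, np.nan)
    # out[j] = a[j - window + 1 : j + 1].sum() for j >= window - 1
    out[window - 1:] = csum[window:] - csum[:-window]
    return np.moveaxis(out, 0, axis)


def group_sum(arr, label, axis=0):
    """Hypothetical group-by sum: adds the slices of `arr` along `axis`
    that share the same label. Returns (unique_labels, sums)."""
    arr = np.asarray(arr, dtype=float)
    # idx maps each position along `axis` to its group's row in the output.
    uniq, idx = np.unique(np.asarray(label), return_inverse=True)
    a = np.moveaxis(arr, axis, 0)
    out = np.zeros((len(uniq),) + a.shape[1:])
    # Unbuffered in-place add, so repeated group indices accumulate.
    np.add.at(out, idx, a)
    return uniq, np.moveaxis(out, 0, axis)
```

For example, move_sum([1, 2, 3, 4], 2) gives NaN, 3, 5, 7, and group_sum([1, 2, 3, 4], ['a', 'b', 'a', 'b']) gives labels a, b with sums 4 and 6. The cumulative-sum trick keeps the moving sum linear in the array length regardless of window size, at the cost of some floating-point drift on long arrays and of a single NaN in the input turning every later window into NaN.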
> So it could be that we are talking about the same end point but are
> thinking about different development models. I cringe at the thought
> of the package becoming a dumping ground.
I find that the best and most useful code gets written (and gets
written fastest) when the person writing it has a concrete problem
they are trying to solve. So if someone comes along and says "I have
problem X", where X lives in the general problem domain we are talking
about, I might say, "Well, I've never had problem X, but I have no
problem with you writing code to solve it and putting it in my library
for this problem domain." So "dumping ground" here is a bit too
pejorative, but you get the idea. Personally, if you or someone else
told me "don't put that code here, we are only working on a small set
of features for now," I would be kind of bothered (assuming that the
code was related to the general problem domain).
