[Numpy-discussion] datarray repositories have diverged

Fernando Perez fperez.net@gmail....
Fri Oct 8 21:21:13 CDT 2010


Hi Rob, Josh and Lluis,

On Thu, Sep 30, 2010 at 9:41 AM, Rob Speer <rspeer@mit.edu> wrote:
> The fact that I wasn't around for the sprint probably has a lot to do
> with how much the code had diverged. But it's not too bad -- I merged
> Fernando's branch into mine and only had to change a couple of things
> to make the tests pass.
>
> There seem to be two general patterns for decentralized projects on
> GitHub: either you have one de facto leader who owns what everyone
> considers the main branch (this is what datarray is doing now, with
> Fernando as the leader), or you create a GitHub "organization" that
> owns the main branch and make a bunch of key people members of the
> organization (which is what numpy is doing).
>
> The way you'd usually get something merged in this kind of project is
> to send a pull request to the leader using the "Pull Request" button.
> But in this case, I'm basically making my pull request on the mailing
> list, because it's not straightforward enough for a simple pull
> request.

sorry it took a bit longer than originally planned to merge this.
Some of us got together on Wednesday on campus and worked through a
lot of this code (we closed one pull request by Lluis and replied to
the other requesting more work, as it broke many tests).  We didn't
announce an IRC sprint because last time it just proved a bit hard to
be simultaneously productive on-premises and keep a good flow of
activity on IRC for remote contributors, sorry.  Perhaps with more
people available that will be possible later on, but for now it seemed
wiser for us just to push through with what little work we could do.

**Merged changes**

Here's a summary of the things we did merge in from your pull request:

- merged gitignore, thanks.

- your graft change to manifest.in wasn't needed, instead the bug was
that our setup file was missing properly listing the testing package,
and that's the right solution to apply:

  PACKAGES            = ["datarray", "datarray/tests", "datarray/testing"]

Thanks for catching that problem!

- merge readme improvements into readme.txt.  Note that since this is
a pure python project, markdown-formatted readme files are fairly out
of place (they don't render correctly on pypi, for one thing, and our
readme is the same as our long_description field in the setup.py).  So
the top-level readme must remain a reST file.  But we did incorporate
your text, thanks!

As a future note when editing text files, please keep text lines to 80
characters just like code ones.  Diff (and github) are mostly
line-oriented, so it's best to format text files with hard linebreaks
(even if many editors can handle soft linewraps correctly, it's just
not very portable).

- merged your fancy printing support, excellent work!  We did adjust
the tests a little bit so they would pass without named/attribute
support, since that will require more discussion (see below).  But the
main file is in unmodified, and the test changes are tiny.

We made a mini-release v0.0.5 now, with these changes in:

http://fperez.github.com/datarray-doc/0.0.5/index.html

**Workflow**

For future reference, while we were able to manually work through your
pull request, I'd like to suggest that you adopt a more traditional
workflow where a single pull request contains only a "conceptually
atomic" set of changes related to each other. That way it can either
be all merged or discussed and refined until merge more easily.  Your
pull request had work from multiple people, making changes of many
unrelated types (gh-pages, .gitignore, named access, printing,
etc...).  I was able to cherry-pick one or two commits, but by and
large I had to resort to manually copying files out of your repo,
because there were some things that should not be merged, and there
was no easy way to disentangle it.

Here's for example the currently active pull requests on ipython:

http://github.com/ipython/ipython/pulls

some of them are fairly extensive, but each is conceptually atomic, so
each can be studied in isolation and either will require refinement or
will be merged, but nobody is going to have to dissect it into pieces
to commit some and drop others.

I hope this is clear, let me know if you need any more info on this.
Ultimately it's a matter of making the process more efficient for all
involved.

**Changes where further discussion is needed**

There were some things we did *not* merge.  The sphinx extension isn't
needed (we use a different mechanism for gh-pages that is cleaner), so
we just ignored it.

But the important point are the changes to named access support.  I
realize since the conference you've wanted this, but we really would
like to proceed more carefully and implement first, only .axis.name
access.  The top-level access requires a changed __getattr__ method
(which slows down *all* attribute access), and opens the door for name
collisions.  I think the best approach will be to follow the lead of
numpy here: structured arrays offer only access by named key ['name'],
and for plain .name access you need to make them a recarray.  We
should also offer in our base class only the simpler, safer mechanism,
and then we can build one that uses the .name attributes as a subclass
for those who want the convenience and understand the risks and
tradeoffs.

How does this sound to you?  We should make sure we agree on the api
before writing too much more code along these lines.  I realize you've
actually contributed an implementation of this, but I think we need to
make sure it's the right design before merging it in.  We do need to
have this design discussion to find a class that will work for as many
people as possible, so many thanks for starting by offering working
code!

Regards,

f


More information about the NumPy-Discussion mailing list