[IPython-User] embed data in notebooks?

MinRK benjaminrk@gmail....
Tue Jun 5 18:04:30 CDT 2012


I don't disagree that each approach is preferable in different
circumstances.  But this project has very limited development/maintenance
resources and lots of exciting work to do, so I don't think we are going to
spend the time (yet, at least) working towards two parallel solutions to the
problem of shipping data with notebooks.  At the very least, we are going to
implement the project UI *first*, and only add a per-notebook approach if we
then decide that the project UI cannot suffice.

On Tue, Jun 5, 2012 at 3:22 PM, Thomas Breuel <tmbdev@gmail.com> wrote:

>> We did seriously consider the idea of archive file formats while planning
>> the notebook format, but we decided (largely from the perspective of VCS,
>> etc) that JSON makes much more sense,
>
>
> It's not an either/or choice.  In effect, I'm saying, when encountering a
> zip file in a path, IPython notebooks could behave as if the path element
> had been mounted with fuse-zip.  That way, people could choose whether to
> use zip files or directories, and whether to use one or multiple notebooks
> with separate or shared data.  I expect I would use both kinds of
> representations.
>
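
To make the path behaviour described above concrete, here is a minimal
sketch under stated assumptions.  The helper name open_through_zip is
hypothetical (it is not nb_open, nor any existing IPython API), it is
read-only, and it copies the archive member into memory instead of mounting
anything the way fuse-zip does:

```python
import io
import os
import zipfile


def open_through_zip(path):
    """Open `path` for binary reading.

    Hypothetical helper: if some leading part of `path` is a zip file,
    the rest of the path is looked up as a member of that archive.
    """
    # An ordinary file on disk needs no special handling.
    if os.path.isfile(path):
        return open(path, "rb")

    parts = path.replace(os.sep, "/").split("/")
    for i in range(1, len(parts)):
        prefix = "/".join(parts[:i])
        if os.path.isfile(prefix) and zipfile.is_zipfile(prefix):
            member = "/".join(parts[i:])
            with zipfile.ZipFile(prefix) as archive:
                try:
                    # Read the whole member into memory (no real mount).
                    return io.BytesIO(archive.read(member))
                except KeyError:
                    break  # the archive exists but lacks this member
    raise FileNotFoundError(path)
```

With something like this, open_through_zip("course/lesson1.zip/data/values.csv")
(an illustrative path) would read the same bytes whether lesson1.zip is an
archive or the data simply lives in a directory tree.
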
> Also, version control systems are learning to deal with zip files anyway
> because other document formats use zip files to solve the same problem
> (text-based documents with embedded binary objects).  Both Mercurial and
> git support this now.
>
>
>> What is unfortunate at this point is that we really haven't developed our
>> project-level UI/APIs yet, they only exist in the planning stages.  I think
>> once you have project as dir/repo, then the benefits of data in the
>> notebook file itself vanish, as the project becomes the unit of
>> sharing/etc.
>>
>
> Sure, when working on research projects, notebooks can live in project
> repositories with no problem, and you usually want them to share data.   I
> use notebooks a lot this way.
>
> When teaching, however, notebooks often need to be moved between
> "projects", emailed, etc. There are lots of little files, most of them used
> only in a single notebook.  Making sure that every notebook has all the
> files it needs becomes a headache.   I often end up having broken notebooks
> because the data that I used to run them has gotten lost, and since
> notebooks don't "fail" and can't be tested, these are even harder to find.
>
> The two situations are just different use cases with different
> requirements.  There is no single "right" solution, and supporting one use
> case doesn't break the other.  (By analogy, we also don't discuss whether
> we should either have global variables or local variables, we have both
> because each kind of scope has its own uses.)
>
> Anyway, there are really two different issues.  Adding nb_open and nb_data
> really would work with JSON or zip transparently and not break anything.
> Separately, long term, I encourage you to reconsider the decision about
> zip files, and how to support them.  I think the suggestion I made above
> (treating any zip file along a path as if it had been mounted with
> fuse-zip) would allow you to continue development along your preferred path
> (treating everything as directory trees) while still giving people the
> option of working with self-contained files when they need to.
>
> Tom
>
> PS:
>
>> I don't think storing binary data in the notebook file itself is worth a
>> new kernel-side API, more than existing systems for b64 data in Python
>> scripts, which will work just as well in the notebook as anywhere else (not
>> that they are great, of course).
>
>
> That's not practical: that is a lot of effort when there are dozens of
> notebooks and files involved, it's impossible to maintain, and the
> resulting notebooks look horrible.  The only way to get reasonably
> self-contained and transportable notebooks right now is to put all the
> shared data on a web server and access it with urlopen, but that has other
> obvious problems.
>

While data-in-code certainly suffers aesthetically (horribly so if it's
lots of binary data), nb_open as proposed offers no improvement at all for
sharing data between notebooks, which is much better served when the
project/dir/repo is the distributed entity.
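
For reference, the data-in-code approach mentioned above can be sketched
roughly as follows, using only the standard library.  The function and
variable names are illustrative assumptions, not an existing IPython API:

```python
import base64


def encode_for_embedding(raw):
    """Return an ASCII string that can be pasted into a script or cell."""
    return base64.b64encode(raw).decode("ascii")


def load_embedded(text):
    """Decode a pasted base64 string back into the original bytes."""
    return base64.b64decode(text)


# One-time step: produce the literal to paste (here, eight sample bytes).
EMBEDDED = encode_for_embedding(bytes(range(8)))

# In the shared script or notebook cell: recover the bytes from the literal.
data = load_embedded(EMBEDDED)
```

This keeps a single script or notebook self-contained, but every cell that
needs the data must carry its own literal, which is why it does nothing for
sharing data between notebooks.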

