[IPython-dev] magics and metadata

Aaron Meurer asmeurer@gmail....
Wed Jun 20 18:11:54 CDT 2012


On Jun 20, 2012, at 5:04 PM, MinRK <benjaminrk@gmail.com> wrote:



On Wed, Jun 20, 2012 at 3:09 PM, Aaron Meurer <asmeurer@gmail.com> wrote:

> On Jun 20, 2012, at 11:06 AM, Brian Granger <ellisonbg@gmail.com> wrote:
>
> > On Tue, Jun 19, 2012 at 7:49 PM, MinRK <benjaminrk@gmail.com> wrote:
> >>
> >>
> >> On Tue, Jun 19, 2012 at 7:25 PM, Brian Granger <ellisonbg@gmail.com>
> wrote:
> >>>
> >>> On Tue, Jun 19, 2012 at 5:01 PM, MinRK <benjaminrk@gmail.com> wrote:
> >>>>
> >>>>
> >>>> On Tue, Jun 19, 2012 at 4:20 PM, Brian Granger <ellisonbg@gmail.com>
> >>>> wrote:
> >>>>>
> >>>>> When the metadata PR came up, I was originally going to vote -1 on it
> >>>>> because of this issue.  I sat on it for a while and in the end decided
> >>>>> that it was OK because I think the need for metadata is already upon
> >>>>> us even though we don't have an actual usage case in our own code base
> >>>>> (for example, we don't have a metadata UI in the notebook web app).
> >>>>>
> >>>>> There is a fine line to walk here.  On one hand, I completely agree
> >>>>> with you that we should try to future-proof the notebook format to
> >>>>> minimize disruptive format changes.  On the other hand, adding things
> >>>>> too soon leads to even more potential disruption for the following
> >>>>> reason.  As I developed the notebook format and notebook UI last
> >>>>> summer, there were multiple situations where I added something to the
> >>>>> notebook format before I actually used it in the UI.  In many of
> these
> >>>>> cases, when I did get around to developing the UI for it, I realized
> >>>>> that my original thoughts on that element were incomplete.  It wasn't
> >>>>> until I wrote the UI that used the data that I realized exactly what
> >>>>> the format of that data needed to be.  As a result, I had to go back
> >>>>> and modify the notebook format.  After a few iterations of this, I
> >>>>> realized that this approach was broken and started to enforce the
> >>>>> following simple rule on myself: don't add it to the notebook format
> >>>>> until I am ready to write the UI code that uses it.  That rule served
> >>>>> me very well last summer.
> >>>>>
> >>>>> This is why for example the notebook and cells do not currently have
> >>>>> any timestamp information (even though I think we will eventually
> want
> >>>>> it).  The one notebook feature (which I regret adding to the format)
> >>>>> that doesn't have a UI is the multiple worksheets.  We absolutely
> want
> >>>>> that as a feature, I just wish I had waited to add it to the notebook
> >>>>> format.  When we do implement the multiple worksheet UI, it is likely
> >>>>> we will want to go back and make changes to the notebook format to
> >>>>> better reflect the UI (for example, we will probably want to persist
> >>>>> which worksheet is active/open).
> >>>>
> >>>>
> >>>> I couldn't agree less.  There is simply no reason that adding support
> >>>> for
> >>>> multiple worksheets in future versions of IPython should render
> >>>> single-sheet
> >>>> notebooks unreadable in 0.13, just like adding new metadata should not
> >>>> make
> >>>> the notebook artificially unreadable.
> >>>
> >>> I am not sure I am following you on this.  Are you suggesting that
> >>> 0.14 notebooks (let's say we bump to a v4 nbformat with expanded
> >>> worksheet support) should be readable in 0.13?
> >>
> >>
> >> I think I am saying the opposite - with the current state of 0.13,
> adding
> >> multi-worksheet support to the *javascript* should not result in
> >> incrementing the notebook version.
> >
> > With the current state of the notebook format, I think we can probably
> > pull this off.  So far, the only changes to the notebook format I can
> > imagine will be minor version incrementing ones.
> >
> >>>
> >>>
> >>>>>
> >>>>>
> >>>>> For the cell and worksheet metadata, I knew we would eventually need
> >>>>> it and I didn't want to hold up the beta release any longer.  But
> >>>>> there are still unanswered questions related to it:
> >>>>>
> >>>>> * What types of things go in the metadata?
> >>>>>
> >>>>> * Is this an area for us to write data to, or for advanced users to
> >>>>> write data to?
> >>>>> * Is it entirely unstructured, or will we require a discussion for
> >>>>> each new key/value entry into it?
> >>>>>
> >>>>>
> >>>>> It is not at all clear that the current metadata design will hold up
> >>>>> to our answers to these questions.  But in the end, I sort of wanted
> >>>>> to add the metadata as it is now, so we could begin to see how we and
> >>>>> others start to use it.  But adding the metadata to the notebook
> >>>>> format definitely doesn't mean that this part of the format is
> >>>>> future-proof.
> >>>>>
> >>>>>
> >>>>> Hope this clarifies things a bit.
> >>>>
> >>>>
> >>>> Sure, while it is extremely clear that we need cell metadata, we
> cannot
> >>>> be
> >>>> 100% certain that
> >>>> a simple dict will solve 100% of the cases we encounter.  But adding
> it
> >>>> now
> >>>> means that we have at least a *chance*
> >>>> of making a release that is not backwards-incompatible.
> >>>
> >>> Yes, I agree with this.
> >>>
> >>>>>
> >>>>>
> >>>>> Back to the question of output-level metadata.  When a bit of code
> >>>>> remains unused for almost a year, I start to question whether we
> >>>>> really need it.  I am not convinced we don't need it; I am not sure.
> >>>>> In light of this, I don't think that adding it to the notebook format
> >>>>> makes sense.  When one of us finds a good purpose for this metadata,
> >>>>> let's add it to the nbformat then.
> >>>>
> >>>>
> >>>> I believe the only current use is in the parallel display
> republishing,
> >>>> where the engine ID is added to the display data
> >>>> so that frontends could theoretically draw display data differently
> >>>> based on
> >>>> which engine it came from.
> >>>
> >>> Yes, we have discussed this.  The only other situation where I
> >>> remember thinking about this is if we wanted to use metadata to help a
> >>> frontend interpret JSON display data.  There are numerous reasons code
> >>> might display JSON data, and that code would have to help the frontend
> >>> to know what to do with that data.
> >>>
> >>> Do you think the engine ID idea makes sense to implement or should
> >>> that information just be passed in the formatted display data itself?
> >>> We could also handle by creating a custom JS widget that knows how to
> >>> intelligently display data from multiple engines.
> >>
> >>
> >> Right now I do both since the metadata is totally ignored, but I think it's
> >> better to have less markup in the output itself.  It is precisely the same
> >> reason we don't embed the rendered prompt in the output of execute replies -
> >> frontends have their own way of rendering them (in the prompt column, etc.).
> >> The metadata could be used to do that for parallel results, rather than the
> >> current behavior of having fake prompts in the general output area.
> >
> > OK if you think we want to go this route for displaying the engine
> > IDs, then we should i) keep the display data metadata in the message
> > itself and ii) move towards persisting that information in the
> > nbformat.
> >
> >>>
> >>>>>
> >>>>>
> >>>>> The other philosophical line of reasoning that I am being guided by
> >>>>> here is simplicity.  It would be very easy to over-design the notebook
> >>>>> format and add all sorts of features that we might need.  I think this
> >>>>> is the wrong direction to go.  We want a notebook format that is as
> >>>>> compact and minimal as possible, where each and every bit of data is
> >>>>> there for a well-defined and justified reason.
> >>>>
> >>>>
> >>>> I think it's simple: We have had ideas over and over and over again
> for
> >>>> features requiring metadata attached to cells (hashes, links,
> >>>> timestamps,
> >>>> etc.), so this is clearly a feature we have a need for right now.
> >>>
> >>> Yes - maybe I wasn't completely clear.  I do think that having cell
> >>> and worksheet metadata right now does make sense.
> >>>
> >>>>  It would
> >>>> be totally silly for adding timestamps to require updating the
> nbformat
> >>>> in a
> >>>> backward-incompatible way.
> >>>
> >>> And I am definitely not suggesting that it would or should.
> >>>
> >>>>  And the biggest advantage of using JSON is that
> >>>> adding keys has no effect on backwards *readability*.  It's only adding
> >>>> values/types that can cause problems, and should force new versions
> >>>> (e.g. changing worksheet to worksheets, or adding new cell types).
> >>>
> >>> Yes, JSON indeed turned out to be much nicer than XML for this type of
> >>> thing exactly because of this.
> >>>
> >>> But I am wondering what your thoughts are about newer notebook versions
> >>> being readable by older IPython versions.  I have always thought that
> >>> we would promise that older nbformats would *always* be readable by
> >>> newer IPython versions, but that we would make no promises about newer
> >>> nbformats being readable by older IPython versions.  I just want to
> >>> clarify what other people are thinking in this respect.
> >>
> >>
> >> Incrementing the nbformat means making notebooks unreadable in old
> versions,
> >> yes.
> >> This is very painful if we are doing it every six months.  I am only
> trying
> >> to make
> >> reasonable efforts that the current nbformat is prepared for changes we
> >> *know* we intend to make soon,
> >> so that incrementing the nbformat is reserved for changes we don't
> already
> >> have planned, and aren't
> >> already prepared for.
> >> Obviously, if we have a change that we cannot fit into the current
> format,
> >> then we increment.
> >
> > I honestly can't think of any upcoming changes to the notebook format
> > that we have thought about which would require a major version
> > increment like you are talking about.  I think there are lots of minor
> > ones that we can do using minor version increments.  I like the minor
> > versioning scheme we have now as it clarifies our policies on this.
> > So I think overall, the notebook format is pretty future safe for the
> > time being.  I hope we can stick with the 3.x nbformats for a few
> > IPython releases.
>
> I'm curious what the effective difference between a minor version and
> a major version would be to me, the user. Would you try to make minor
> versions backward compatible if possible, either by not putting in new
> keys if they don't need to be there or by somehow trying to future
> proof the notebook to new unexpected notebook format changes?
>

Major version: totally unreadable, don't even try
Minor revision: newer features are obviously unavailable, but the format is
fundamentally readable

The minor version stuff is not meant to make it impossible, or even any
harder, to update the nbformat.  Only to give us a mechanism for expressing
 "this notebook is newer, and may use features you don't have, but at least
you can still read it", which we did not have before - there was no
distinction between "created by exactly this version" and "totally
unreadable".
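
As a rough illustration of that distinction, a reader could compare the two
version numbers roughly as below.  This is only a sketch under assumptions:
the function and exception names and the SUPPORTED_* constants are
hypothetical and are not IPython's actual reader code, and it assumes the
notebook JSON carries integer "nbformat" and "nbformat_minor" keys.

```python
# Hypothetical sketch of the major/minor policy described above.
# Names and constants are illustrative, not IPython's real implementation.

SUPPORTED_MAJOR = 3  # assumed: the reader understands 3.x notebooks
SUPPORTED_MINOR = 0  # assumed: newest minor revision the reader knows


class UnreadableNotebook(Exception):
    """The notebook's major version is newer than this reader supports."""


def check_readable(nb):
    """Return a list of warnings, or raise if the notebook is unreadable.

    `nb` is the notebook as a plain dict (e.g. the result of json.load).
    """
    major = nb.get("nbformat", 1)
    minor = nb.get("nbformat_minor", 0)

    # Newer major version: totally unreadable, don't even try.
    if major > SUPPORTED_MAJOR:
        raise UnreadableNotebook(
            "notebook is v%d, this reader only supports up to v%d"
            % (major, SUPPORTED_MAJOR)
        )

    warnings = []
    # Same major, newer minor: still readable, some features may be missing.
    if major == SUPPORTED_MAJOR and minor > SUPPORTED_MINOR:
        warnings.append(
            "notebook is v%d.%d, newer than v%d.%d; some features may be "
            "unavailable" % (major, minor, SUPPORTED_MAJOR, SUPPORTED_MINOR)
        )
    return warnings
```

With something like this, a newer minor revision only produces a warning,
while a newer major version stops the reader before it tries to interpret
anything.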


So you are going to attempt to keep minor versions backwards compatible?
 Or maybe I'm misunderstanding what you mean by "readable".

Aaron Meurer



> Because as far as I, the user, am concerned, if a newer notebook
> format version doesn't work at all in older versions of IPython (such
> as is the case with notebook format v3 and IPython 0.12), then it
> hardly matters how "major" or "minor" the changes were. Or maybe you
> are thinking more for the benefit of people like Sage who are building
> on top of the notebook API?
>

> By the way, I completely agree with Brian that future proofing is
> usually a waste of time. But also be careful against overly "past
> proofing". I would much rather see new features added to the notebook,
> even every release, than to have them held back simply for the
> purposes of keeping things backwards compatible. Also, if jumping the
> gun on future proofing is a waste of time, so is spending a lot of
> effort on making sure that new notebook versions work correctly in
> older, unsupported releases.
>

I totally agree that we should not spend significant effort on future (or
past) proofing, and we haven't.  Nor is there any reason this would cause
resistance to new features that do require updating the nbformat.  If a
hoop must be leapt through to keep the nbformat, then the nbformat should
be updated.  We have a hoop threshold of zero.  This only aims to prevent
*known, planned, imminent features* from necessarily forcing that
unpleasantness (they still may, since they haven't actually been
implemented).

-MinRK


>
> Aaron Meurer
>
> >
> >> But where we are right now, adding to the metadata on cells or adding
> >> multiple worksheets will *not* require
> >> bumping the nbformat.
> >
> > Right.
> >
> > Cheers,
> >
> > Brian
> >
> >>>
> >>>
> >>> Cheers,
> >>>
> >>> Brian
> >>>
> >>>> -MinRK
> >>>>
> >>>>>
> >>>>> Cheers,
> >>>>>
> >>>>> Brian
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Tue, Jun 19, 2012 at 3:25 PM, MinRK <benjaminrk@gmail.com> wrote:
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Jun 19, 2012 at 3:23 PM, Brian Granger <ellisonbg@gmail.com
> >
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> On Tue, Jun 19, 2012 at 3:19 PM, MinRK <benjaminrk@gmail.com>
> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, Jun 19, 2012 at 3:18 PM, Brian Granger
> >>>>>>>> <ellisonbg@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> On Tue, Jun 19, 2012 at 2:59 PM, Fernando Perez
> >>>>>>>>> <fperez.net@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>> On Tue, Jun 19, 2012 at 1:17 PM, MinRK <benjaminrk@gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>> Yes - we put metadata on outputs for a reason, presumably.  If
> >>>>>>>>>>> this
> >>>>>>>>>>> shouldn't be saved, it should probably be removed from the
> >>>>>>>>>>> API.
> >>>>>>>>>>
> >>>>>>>>>> I can't recall precisely what we had in mind when we put it in,
> >>>>>>>>>> but
> >>>>>>>>>> something that springs to mind as potentially useful, for
> >>>>>>>>>> example,
> >>>>>>>>>> would be to specify a desired priority order for the various
> >>>>>>>>>> types
> >>>>>>>>>> of
> >>>>>>>>>> outputs. Right now when a client can display several kinds of
> >>>>>>>>>> output
> >>>>>>>>>> it just makes a choice, but we could let objects provide a hint
> >>>>>>>>>> of
> >>>>>>>>>> the
> >>>>>>>>>> preferred order, based on what they know about the relative
> >>>>>>>>>> quality
> >>>>>>>>>> of
> >>>>>>>>>> each.
> >>>>>>>>>
> >>>>>>>>> I originally put it there to allow objects to provide hints to
> >>>>>>>>> the
> >>>>>>>>> frontend on how it should display a representation.  This is
> >>>>>>>>> similar
> >>>>>>>>> to how the payloads can indicate where it came from.
> >>>>>>>>>
> >>>>>>>>>> So I'd vote for not removing this, as it may prove useful...
> >>>>>>>>>
> >>>>>>>>> I also think it could be useful, although it seems a bit
> >>>>>>>>> excessive
> >>>>>>>>> to
> >>>>>>>>> store metadata for each output.  Here is what I propose.  We
> >>>>>>>>> simply
> >>>>>>>>> leave it alone until we have an actual use case that will help us
> >>>>>>>>> figure out exactly what this should look like.  Without a
> >>>>>>>>> concrete
> >>>>>>>>> usage case, it is difficult to know what is needed.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> But this doesn't answer the immediate question: Should this
> >>>>>>>> metadata dict be included in the nbformat?
> >>>>>>>
> >>>>>>> I would vote no - not until we have a real usage case.  I don't
> like
> >>>>>>> to add things to the notebook format until we are actually using
> >>>>>>> them.
> >>>>>>
> >>>>>>
> >>>>>> Then should we remove all of the metadata stuff we just added?  The
> >>>>>> whole point was to prepare the nbformat for future changes so we don't
> >>>>>> have to update the nbformat, which is incredibly painful and should be
> >>>>>> done as rarely as possible.
> >>>>>>
> >>>>>> -MinRK
> >>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> f
> >>>>>>>>>> _______________________________________________
> >>>>>>>>>> IPython-dev mailing list
> >>>>>>>>>> IPython-dev@scipy.org
> >>>>>>>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Brian E. Granger
> >>>>>>>>> Cal Poly State University, San Luis Obispo
> >>>>>>>>> bgranger@calpoly.edu and ellisonbg@gmail.com
