[Numpy-discussion] Lazy imports again

Friedrich Romstedt friedrichromstedt@gmail....
Wed Jul 18 08:54:18 CDT 2012


A.S.: Some forewords. The first paragraphs below were annotated after I had reached the conclusion at the end of the email; read the annotations in square brackets as if they were footnotes. The main point for me to grasp and keep in mind, when it comes to package hierarchies, was that "import package.module" (and all other sorts of import statements) is completely distinct from the variable lookup "package.module", although the syntactic similarity is very sweet syntactic sugar and makes things look consistent. It did confuse me, so some of the initial words lack this clarity; this might explain them. (I don't want to delete them, since they are written and hence provide context; and since they lead to the end, I cannot just drop them for the sake of the ending. There is no ending, in fact :-). So now, have fun!

[This email asks whether there is interest in an existing, working (see below) import-delay mechanism that does not use meta levels inside of Python.]


Hi,

Some years ago I made a "lazy" (who's lazy here? We have to work with it, the code does the work, so?), so let's say a postponed import system, when I made Pmw py2exe-compatible (it has, or "had", a built-in version choice which broke when loading from zips, so that files were not readable; that was the issue I wrote it for). It's standard Python 2.6 or so. I would have to look up myself what the precise mechanism was, but in principle, the postponed-for-load modules of a package are objects with an overloaded __getattr__ which does the import. After this, I thought [and was wrong, see below], the loaded module is not placed in Python's sys.modules, but lookup always happens through this pseudo-module, which forwards the access and yields the same behaviour in terms of attribute access [correct w.r.t. the lookup]. 

I would not bet my hand that the module isn't placed in sys.modules [well done, Friedrich; good decision! :-)], but I think it isn't. The usage of the Pmw package was historically to never import submodules but always to leave that to the package's own system and to act as if they were already imported [which is true; nevertheless, it needs to import the modules somehow, and I didn't like using the standard importing module to circumvent ordinary Python syntax, so I used exec instead]. I think this would actually play nicely with what numpy was doing in the past and would hence conserve it [this seems to be true; a subtlety is pointed out in the post-postscriptum]. 

Imports from the submodules in the style "from package.submodule import Class" did work, by design and not by fiddling, as far as I recall [by Python design, yes; it has nothing to do with the import-delaying objects acting as pseudo-modules]. 

Attributes of a package which are supposed to act as modules whose loading is postponed are coded in the __init__.py of the package, for instance by code like "submodule = PostponedLoad("package.submodule")"; this is the style, don't take it literally. As I said, it's years ago, and I would look it up only if there is interest [probably I meant 'if I develop interest myself'; I didn't know :-)]. 
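
For concreteness, here is a minimal sketch of the idea in modern Python. The class name PostponedLoad is invented, as above; the real code is dynload.Dynload in the pmw2 repository, and it used exec rather than importlib:

```python
import importlib


class PostponedLoad(object):
    """Stands in for the module *name*; imports it on first attribute access."""

    def __init__(self, name):
        self._name = name

    def __getattr__(self, attr):
        # Called only for attributes not found on the instance itself,
        # i.e. for any "real" attribute request aimed at the module.
        module = importlib.import_module(self._name)  # also fills sys.modules
        return getattr(module, attr)


# In a package's __init__.py one would write, in this style:
#     submodule = PostponedLoad("package.submodule")
# Demonstrated here with a standard-library module instead:
json = PostponedLoad("json")
print(json.dumps({"lazy": True}))  # the real import happens only now
```

Note that every attribute access goes through __getattr__ again; the amendment further down discusses how the stand-in can even erase itself after the first access.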

I think I even made a standalone module of the functionality already, but I'm not sure. There might have been some pending cleanup which made me never publish it, although there was even interest on the Pmw list. Sigh. 

You can be sure that it was a clean solution, with all the constraints made clearly visible by modeling submodules as instances of a class. It was fully compatible with the handling Pmw was doing before, and although I don't have user feedback, I never had to change any code after switching to my own remake. This includes for sure [not for sure; there were no submodules to be specified] syntax like the one I mentioned above (from p.m import C). I guess I could be very much mistaken in this; I conclude without further reading that Python first looks up the named object "p.m" before traversing into sys.modules? These are details. [Yes, I was very much mistaken; it's only "from p import C", although internally it goes down to "p.m"; and yes, these are details indeed.]

What I am not sure about are local [that is, relative] imports, because I guess they might rely on something which causes incompatibility. This is a "might", and there is a chance that it turns into a "not", or not, depending on how relative imports (which I never use; bad style :-) are tied to the file structure. If they are inherently object- and name-based ... well, I don't know. What speculation. I'm not knowledgeable on this. [The import mechanism isn't touched by the proposal here; the only question is how e.g. "numpy.random.randn" is made available to the user without importing "numpy.random" before: and this is done by making numpy.random (when looked up by name, not in an import statement) an object which imports on demand and then pretends to be the module object itself.]

I'm currently not expecting myself to do anything substantial with this. I would just give the code away freely (public domain) and you can do whatever you want with it. Just use it freely. Even if I would like to get involved, I probably won't do more than give wise recommendations on how to use it. It's well documented inline. I don't think I'll get into coding here: no inclination and too much responsibility. After a few months someone else would have to maintain it anyway, so better do it yourself from the beginning. 

What I can help with is giving the working code away; there really was a bit of thinking involved in getting into it and solving it cleanly. 

I'm currently on the iPad (so I don't have the code around); let me know. [I looked the code up on github; couldn't resist :-)]

The other person interested was from NASA ... they said they had a huge codebase where it would be handy. That didn't drive me to publish it either; probably out of perfectionism. So it might help them too. I still think of him from time to time. 

So you would also do NASA peer review ... so that I can give it to him with your reference. :-D. Funny imagination. 

It would be good to have some good testers on this. Pmw isn't too well known. Actually, it's pretty dead in terms of development; it doesn't change anymore. (I don't know why we call this "dead" all the time.) In case anyone knows it: I hereby clearly distance myself from the mess in other parts of that codebase. 

Ralf, maybe we were in touch about this earlier; I have a vague memory. If so, it didn't work out that time either. I just got interested in the topic; I don't read the list regularly. If it helps you, fine; for me it's not important. I think, or pretend ;-)

Yours,
Friedrich

P.S.: If it weren't important to me I wouldn't write this email. I hope I can leave the bikeshedding to others. Probably it's partly that I would like to finish this project after three or so years of having it linger around, unfinished in terms of feeling. Too much of this stuff. It'll hurt to hand it over to you, but I think it would still be the right step. Things that can be handed over cannot be as important as one makes oneself believe they are; publication is probably just that. I would like to make this history, although I see the contradiction: my interest shows that it isn't history for me yet, and I probably won't even succeed in this. Coding and open source is probably partly about getting involved in things because one doesn't want to get involved in them; well: life. Why do we always want to make things with functionality where no one cares about how we made them and why; not even we ourselves? I don't know; I don't believe in the separation of content and form. I don't know why I'm trying to help you here with this tiny piece of postponed module loading. I don't want to know it. Even if I'd like to and had some knowledge, it would be plain useless and negligible, unimportant. :-) — F.

P.P.S.: Couldn't help it; usage is illustrated here: https://github.com/friedrichromstedt/pmw2/blob/pmw2/__init__.py, and the code which matters is here: https://github.com/friedrichromstedt/pmw2/blob/pmw2/dynload.py. I did this because I wanted to know whether my arising doubts about "from p.m import C" are sensible, since in pmw2 everything is top-level. I looked up the Python docs on how that works ... ah yes, and it (the dynload module) places the loaded module in sys.modules, since it's a simple "exec 'import ' + name" statement; so ... yes, importing ... but I think it should work, although not as expected. Say "from numpy.random import randn" is executed while the object present as the attribute "random" of the numpy module (intended to import numpy.random when its attributes are accessed, e.g. by "numpy.random.randn(...)") has not been used yet, so that numpy.random isn't loaded and not yet present in sys.modules. Then the import mechanism of Python 2.7.3 (currently at http://docs.python.org/reference/simple_stmts.html#the-import-statement) will import numpy.random, although without knowing about or making use of the attribute "random" already present in the "numpy" module object. It is a confusion between the syntax of the import statement "from numpy.random import (...)" and the syntax used for attribute access, "a = numpy.random.randn(...)": both use the dotted hierarchy naming, but they are actually something very different. In the end, if "from numpy.random import randn" is executed before accessing any of the attributes of the "random" attribute of "numpy", accessing numpy.random.randn afterwards will cause that import-delaying object to import "numpy.random" in an import statement once more, which doesn't harm, although it's not strictly necessary [see below for a refinement; there might very well be a slight alteration causing this overhead to disappear completely without trace; it's rather an implication, not an alteration]. 
I think everything should be fine as long as no one expects the result of a name lookup of "numpy.random" to yield a module object, since this lookup will instead return an import-delaying class instance behaving similarly to the module object it stands for in terms of attribute access. So as long as this similarity is sufficient, it would suffice (pleonasm! Say: code would not have to be changed to reflect the change of object type). Since import statements seem never to look up local variables (they just use the same syntax), judging from the above-noted docs for 2.7.3, it wouldn't even harm to have the numpy package in a local variable of the same name when doing "from numpy.random import randn" later in the same scope. 
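
To check the claim that the import statement bypasses the attribute, here is a small self-contained experiment (the package name "p" and its contents are made up for the test): the attribute "m" on the package object is a plain string, yet "from p.m import C" still works, because the import machinery consults sys.modules and the file system, not the attribute.

```python
import importlib
import os
import sys
import tempfile
import types

# Write a throwaway package to a temp directory:
#   p/__init__.py binds "m" to a non-module stand-in,
#   p/m.py defines the class C we want to import.
tmp = tempfile.mkdtemp()
os.mkdir(os.path.join(tmp, "p"))
with open(os.path.join(tmp, "p", "__init__.py"), "w") as f:
    f.write("m = 'not a module, just a stand-in attribute'\n")
with open(os.path.join(tmp, "p", "m.py"), "w") as f:
    f.write("class C(object):\n    pass\n")
sys.path.insert(0, tmp)
importlib.invalidate_caches()  # make sure the new files are seen

import p
print(type(p.m))       # the string from __init__.py, not a module

from p.m import C      # works nevertheless: no name lookup of "p.m" happens
print(C.__name__)

# Side effect: importing the submodule rebinds the parent's attribute,
# so the name lookup "p.m" now yields the real module object.
print(isinstance(p.m, types.ModuleType))
```

The side effect shown in the last line is exactly the "overwriting" discussed in the amendment further down: the import statement discards whatever stand-in object the package's __init__.py had bound to "m".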

Oh my god, so much of writing for so little code. And then it's still untested! :-)


Am 17.07.2012 um 05:37 schrieb Travis Oliphant <travis@continuum.io>:

> 
> On Jul 16, 2012, at 4:50 PM, Ralf Gommers wrote:
> 
>> 
>> 
>> On Mon, Jul 16, 2012 at 6:28 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
>> Hi All,
>> 
>> Working lazy imports would be useful to have. Ralf is opposed to the idea because it caused all sorts of problems on different platforms when it was tried in scipy.
>> 
>> Note that my being opposed is because the benefits are smaller than the cost. If there was a better reason than shaving a couple of extra ms off the import time or being able to more cleanly deprecate modules, then of course it's possible I'd change my mind.
> 
> This sort of cost / benefit analysis is always tricky.   The "benefits" are always subjective.  For some, shaving a few ms off of import times is *extremely* valuable and so they would opine that it absolutely outweights the "cost".  Others see such benefits as barely worth mentioning.   I doubt we can define an absolute scale that would allow such an optimization to actually be done.    
> 
[Precisely: there is no absolute scale allowing such optimisations to be done. There are absolute scales, but they all say: stay away from optimisation! :-)]

[Or, to frame it differently: there is a language barrier which is made to be there and to let one stick to what is called "subjectivism". I don't think Ralf was being subjective here; only the ironic, nearly cynical, slightly polemic example was. The first sentence is what counts. And Ralf, am I understanding you correctly that in some sense it contains the delusion that benefits have to outweigh costs? Would you agree that you just don't see why to pay for this, while not being reluctant to pay a lot for good stuff (so to speak); such that what is called cost and benefit isn't an opposing pair to be balanced but instead always comes together, and here there's little of both? :-)]

[I think the incendiary subject of milliseconds could be called barely worth mentioning, if one doesn't take that literally, since it actually is mentioned; well, it is mentioned, and I cannot mention things without mentioning them. So it cannot be so barely worth mentioning after all. Nevertheless, I think we all agree it would have been better if we had never come to that point. :-)]

> My personal view is that speeding up import times is an important goal (though not the *most* important goal).    The "costs" here are also very loosely defined.   Lazy imports were tried with SciPy.   Several people had problems with them.   Mostly it was difficult to not have a "brittle" implementation that didn't break someone downstream libraries concept of what "import" meant --- systems that try to "freeze" Python programs were particularly annoyed at SciPy's lazy import mechanism. 

What I'm proposing wouldn't offend anything here, as the Python import mechanism is used as-is; no change there. All changes are inside the code base of the package concerned, just as outlined above. This might be brittle too if more than attribute access is needed. For instance, reloads might not work as "reload(numpy.random)": I think this is because it depends on the name lookup of "numpy.random", which isn't a module anymore and hence cannot be reloaded (even though there is a module of that name in sys.modules). One would be able to use the numpy.random.reload() method I apparently coded for that task (in 2009?). :-)
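
The reload part is easy to confirm in isolation: reload() insists on a genuine module object, so handing it any stand-in (here a trivial dummy class, invented for the demonstration) fails:

```python
import importlib


class PseudoModule(object):
    """A stand-in object such as the import-delaying instance."""


refused = False
try:
    importlib.reload(PseudoModule())  # reload() type-checks its argument
except TypeError:
    refused = True
print(refused)  # reload() rejects anything that isn't a real module
```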

Amendment: The above-noted docs (http://docs.python.org/reference/simple_stmts.html#the-import-statement) do not state clearly how hierarchical imports are bound, but I guess that "import numpy.random" binds the module object for numpy.random to a variable called "random" which is an attribute of the "numpy" object (to be precise). The variable "numpy" is then the only variable bound in the current scope, pointing to the "numpy" module (or package). Since the __init__.py of the "numpy" package would also initialise a "random" attribute to be the import-delaying class instance, this "random" attribute would be overwritten, because the initialisation of the "numpy" package happens first (not entirely sure, but it should), before the import traversal goes on to import "numpy.random". At this point the import-delaying object is lost, without effect, i.e. not harmfully, since the goal was to import numpy.random anyway; thus the delaying object can be discarded from then on, which is even more efficient. I notice that this should also happen when the delay object is first used by accessing its attributes, because it just execs the same import: "exec 'import numpy.random'", essentially. Nevertheless, the overloaded attribute access is still needed, because the delaying object is what gets looked up when the user issues "numpy.random.randn(...)" for the first time; "numpy.random" needs to be imported for this. After this, the delay mechanism should destruct itself and leave no trace. It should be as if it had never been there, for all following uses of "numpy.random.anything". This should be verified, though; I never cared about it since I never considered it. 
It would be highly elegant and also highly efficient (as there is no penalty anymore after the first access; everything is just as right now), and I probably never noted this because the package I wrote dynload for (Pmw) couldn't make use of it: it accesses the classes in the package's modules as if they were imported into the package's module object via "from package.module import *", in principle. (Thus they are callable objects which, on call, instantiate an attribute retrieved from the import-delaying pseudo-module.) I think the mechanism is highly versatile, if tests do not fail, and would be superior to meta-level approaches in terms of simplicity and elegance as well as freedom from penalty. (Also, it does not affect other packages, since a package wanting to make use of it needs to do so itself, by publishing its submodules as import-delaying class instances as outlined in the beginning and in the illustration code, so by "module = dynload.Dynload('package.module')"; there might have been some renaming for the publication I planned after doing the Pmw stuff with it.)
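
The self-erasing behaviour can be sketched like this (SelfErasing is an invented name; xml/xml.dom merely serve as a convenient parent/submodule pair for the demonstration): the first attribute access imports the submodule, and the import machinery itself rebinds the parent's attribute to the real module, discarding the stand-in.

```python
import importlib
import sys
import types

import xml  # the parent package; xml.dom is not imported by it


class SelfErasing(object):
    """Import-delaying stand-in; replaced by the real module on first use."""

    def __init__(self, name):
        self._name = name

    def __getattr__(self, attr):
        # Importing "xml.dom" makes the import machinery set the "dom"
        # attribute on the parent package, overwriting this instance.
        module = importlib.import_module(self._name)
        return getattr(module, attr)


sys.modules.pop("xml.dom", None)   # ensure a cold start for the demo
stand_in = SelfErasing("xml.dom")
xml.dom = stand_in                 # install the stand-in on the parent

node = xml.dom.Node                # first access triggers the import
print(xml.dom is stand_in)         # the stand-in has erased itself
print(isinstance(xml.dom, types.ModuleType))
```

After the first access every further "xml.dom.anything" lookup hits the genuine module directly, so there is no residual overhead, matching the "leave no trace" claim above.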

> However, I don't think that lazy imports should be universally eschewed.    They are probably not right for the NumPy library by default, but I think good use could be made of a system loaded before any other import by users who are looking to avoid the costs of importing all the packages imported by a library.  In particular, I suspect we could come up with an approach for C-API users like Andrew so that they could call _import_lazy() and then _import_array() and get what they want (hooks to the C-API) without what they don't (everything else on the Python side). 

You can safely skip this paragraph; it says less than nothing:
[I think this is rather non-operational, no? From a universal not-eschewing follows plainly nothing but that the not-eschewing is then universal; something that's right for everything. I don't understand the relation between the second part and the third; I feel a bit hit over the head by technicalities. Here you have a perfect example of bikeshedding on my side: the discussion stays on a low level and still a lot of text is produced. :-)]

I'm not sure about the use; but frankly speaking, I think nothing is made because people think it's useful. If people say something wouldn't be useful, it is often nothing more than an objectivation of an observed lack of personal interest. I don't know; people who do their work with heart are probably easily driven to locate the reason in the outside world as soon as they are driven to notice the difference. If usefulness arguments come up, it might be a sign that the subject is a bike shed, though. Or that it's too far away for people to relate to; I don't know. Bike sheds tend to create arguments about their colour and size, not about their use. (So about how to do it, not whether.)

> I've implemented something like this in the past by using the metapath import hook to intercept import of a specific library.     It could be done for NumPy in a way that doesn't affect people who don't want it, but can be used by people who want to speed up their import times and don't need much from NumPy.

{Was this the thing Ilan gave the talk about at PyCon DE Leipzig last year?}

I never liked that; it's too elegant for me. Somehow I feel that adding meta levels takes out the meaning. And a meta level which solves "everything", or looks like it does even if this is not claimed, is just another thing of its own, where it is not really clear why it is done other than to replace flaws in the lower layer with other flaws. I agree that this even applies to the import-delaying class instances behaving like modules. Somehow it's like a motorised bike: it will never look sensible. I have in general the impression that Python is growing beyond its bounds in terms of complexity, and the package/module notion, together with importing submodules as if they were just groups of top-level objects (as is done in numpy), might be not a design flaw but a sign that numpy is bigger than it really is or wants to be. The same probably applies to scipy; I don't know it well enough. There should be something beyond packages and package installation/deinstallation, but aside from this need, probably no one knows anything?

> Certainly, in retrospect, it would have been a better design to allow the C-API to be loaded without everything else being loaded as well... -- maybe even providing a module *just* for importing the C-API that was in the top-level namespace "_numpy" or some such...    

In the future, I'm afraid, Python will expect too much of itself, which means having any expectations at all ... this is my personal opinion, although I don't think the truth it is bound to is related to me other than by experience ... :-/ After we've seen that we can do a lot with Python, we want a bit more. I don't think this will work. When people change from inventing to enhancing and improving, then normally, in my perception, the "new" things arising are not as sensible as the "invented" things before (which were not "new", but shared what all inventions share). It's probably not pessimistic to say that Python is beyond its peak; Python 3k might be a sign?

Things like this just happen, all the time. People live, grow, mature, and die; it's very normal. The same goes for other entities; the same for Python. Entities growing too fast probably lack something on other sides. And I really adore Python; I love it, although I didn't go beyond Python 2.6. I wrote tens of thousands of lines of code per year without trash; I used it every day and it worked just fine (mostly). I did my whole scientific education at university with it, as far as it was computer-related and not about text. I've been using it since about 2002, when it was at Python <faraway>. And it worked at that time too. Ten or twenty years are a long lifetime for a computer language, I think. If we need to change something "substantial" without having another "idea" with it, then I think we're doing something wrong. 

It won't lead anywhere, because it already knows what it wants. And the boundaries imposed by what is already there are then nothing more than an obstacle. 

Again, I'm not pessimistic. I see that this is unrelated to the list, but I also find that the subject is, in the sense I outlined above. It might explain that inconceivable colour smashing that happened recently ("by what?" – "exactly."). Unrelated people are attracted by unrelated things. I don't like this development on the list, I find it annoying, and although I'm about to unsubscribe I have to confess that I believe myself and this post are somehow part of it. It's just that I'm criticising it, and I do not believe it matters in the sense it should, or people might be inclined to think I would like to see it that way. 

I think mailing lists should allow their members to have thoughts like this without being dreamy or "unrelated", but serious and honest; there must be something wrong with the subject, no? Things grow with their people. Does anyone have similar thoughts or feelings? I might be touching an open miracle here. 

It's because I notice this tendency, regardless of age, in the minds of many people around me. And they agree with my thoughts, which is the most horrid part. 

Maybe we should >>> import lazy again. 


Friedrich. 


P.B.: I guess I will now get the usual banners saying "You know very well that tens of thousands of users and innocent and serious people will receive this text, even if they don't read it, and will waste at least a button click on it, or some minutes and some thinking otherwise?"; as well as "You wrote this email at a speed of xy.z characters per minute; do you think that was too fast?"; and "Do you want to let it cool off a bit?"; so there they have been already. I think I will answer them "Yes"; "hm, typing is probably fractal, so how can you know, but what I know is that the speed here was a quarter of an email per hour!"; and finally "No, not again! Then it will fall to ashes and become stone-like!" :-) Let's see ... ah right, the horrid impression mentioned above is also because people rather agree that they don't understand what I'm talking about, or say so themselves, if they don't plainly agree. So it cannot be that horrid. Probably it's an application of Goedel's theorem ;-)

Quality assurance is nearly passed ... about to smash QA ... Okay, I actually let it cool off for one day; I was driven to sending and redacting it [now] by the solution which came up and which I find inelegant and highly aggressive towards the interpreter internals; I know it's possible, but I find it a misuse in production environments. Yes, I don't think this of what I'm proposing here; I don't even call it a solution, not only because it's untested in the numpy context, but also because it doesn't solve a problem but rather implements a concept. :-) 

F.

> Best, 
> 
> -Travis
> 
> 
> 
>> 
>> Ralf
>> 
>>  
>> I thought I'd open the topic for discussion so that folks who had various problems/solutions could offer input and the common experience could be collected in one place. Perhaps there is a solution that actually works.

Some bikeshed questions for people (e.g., me) to fling around: What is a common experience? What is a solution that doesn't work? I agree that there are things which work but aren't solutions. :-) Only joking; I'm just smiling because I got through editing (I was afraid I never would :-). 

>> Ideas?

Yes! :-D

>> Chuck

Friedrich. 

Sending now. Hey, you got to here? Did you actually read? :-)

>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> 
>> 
> 

