[SciPy-user] The IO library and image file formats -- compare with PIL

Zachary Pincus zachary.pincus@yale....
Sun Apr 20 21:19:49 CDT 2008


On Apr 20, 2008, at 1:42 PM, Stéfan van der Walt wrote:

> On 18/04/2008, Zachary Pincus <zachary.pincus@yale.edu> wrote:
>> I have my own "internal fork" of PIL that I've been calling "PIL-
>> lite". I tore out everything except the file IO, and I fixed that to
>> handle 16-bit files correctly on all endian machines, and to have a
>> more robust array interface.
>>
>> If people wanted to make a proper "fork" of PIL into a numpy-
>> compatible image IO layer, I would be all for that. I'd be happy to
>> donate "PIL-lite" as a starting point. Now, the file IO in PIL is a
>> bit circuitous -- files are initially read by pure-Python code that
>> determines the file type, etc. This information is then passed to
>> (brittle and ugly) C code to unpack and swizzle the bits as  
>> necessary,
>> and pack them into the PIL structs in memory.
>
> I would really try to avoid the forking route, if we could.  Each
> extra dependency (i.e. libpng, libjpeg etc.) is a potential build
> problem, and PIL already comes packaged everywhere.  My changes can
> easily be included in SciPy, rather than in PIL.  Could we do the same
> for yours?  Then we could instead build scipy.image (Travis' and
> Robert's colour-space code could be incorporated there as well?) on
> top of PIL.

Nothing should be built "on top" of PIL, or any other image IO
library, IMO. Just build things to work with numpy arrays (or things
that have an array interface and so can be converted by numpy), and let
the user decide which package is best for getting bits into and out of
files on disk. Any explicit PIL dependency should really be
discouraged, given that library's continued unsuitability for
dealing with scientifically relevant file formats and data types.
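
To be concrete about what "build on numpy arrays" means in practice:
something like the sketch below (the function name is made up, and
it's only an illustration). Anything that exposes the array interface
-- a PIL image, another toolkit's image type, or a plain ndarray --
goes through numpy.asarray, and from then on the code neither knows
nor cares which IO library produced the pixels.

    import numpy as np

    def normalize_image(img):
        # Accept anything with an array interface (a PIL image, another
        # library's image type, or a plain ndarray); no PIL-specific calls.
        arr = np.asarray(img, dtype=np.float64)
        lo, hi = arr.min(), arr.max()
        if hi == lo:                      # flat image: avoid divide-by-zero
            return np.zeros_like(arr)
        return (arr - lo) / (hi - lo)     # rescale intensities to [0, 1]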

As to the problems with PIL that I've addressed (and several others),
these are deep-seated issues that won't be fixed without a major
overhaul. My thought was thus to take the pure-Python file-sniffing
part of PIL and marry it to numpy tools for reading in byte sequences
and interpreting them as necessary. This would have no library
dependencies, and really wouldn't be a "fork" of PIL so much as
reusing the small amount of non-broken PIL file-format-reading code
that's already there while abandoning the awkward/broken byte IO and
memory model. I can't promise I have any time to work on this -- but
I'll look into it, maybe -- and if anyone else wants to look into it
as well, I'm happy to provide some code to start with.
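
To sketch the idea (this is illustrative only -- a hand-rolled binary
PGM reader rather than actual PIL code, and the function name is
mine): the header gets sniffed by a few lines of pure Python, and
numpy does all the byte interpretation, with the endianness of 16-bit
samples spelled out explicitly in the dtype. The real version would
reuse PIL's existing format-sniffing logic instead of the ad-hoc
parsing here.

    import numpy as np

    def read_pgm(path):
        # Read a binary (P5) PGM file: sniff the header in pure Python,
        # then let numpy unpack the pixel bytes.
        with open(path, 'rb') as f:
            data = f.read()
        tokens, pos = [], 0
        while len(tokens) < 4:                        # magic, width, height, maxval
            while data[pos:pos+1].isspace():          # skip whitespace
                pos += 1
            if data[pos:pos+1] == b'#':               # skip comment lines
                pos = data.index(b'\n', pos) + 1
                continue
            end = pos
            while not data[end:end+1].isspace():
                end += 1
            tokens.append(data[pos:end])
            pos = end
        magic, width, height, maxval = tokens[0], int(tokens[1]), int(tokens[2]), int(tokens[3])
        if magic != b'P5':
            raise ValueError('not a binary PGM file')
        pos += 1                                      # single whitespace byte after maxval
        # numpy does the actual byte interpretation; 16-bit PGM is big-endian
        dtype = np.dtype('>u2') if maxval > 255 else np.dtype('u1')
        pixels = np.frombuffer(data, dtype=dtype, count=width * height, offset=pos)
        return pixels.reshape(height, width)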

> I'm really unhappy about the current state of ndimage.  It's written
> in (Python API) C, so no one wants to touch the code.  Much of it can
> be rewritten in equivalent pure Python, using modern NumPy constructs
> that weren't available to Peter.  What we really need is to get
> knowledgeable people together for a week and hack on this (ndimage is
> an extremely useful module!), but I don't know when we're going to
> have that chance.  Who fancies a visit to South Africa? :)
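
Quite a bit of it really does collapse into a few lines of array
manipulation these days. Just as an illustration (this is not ndimage
code -- it's 2-D only, with mirror-style boundary handling), a 3x3
maximum filter needs nothing beyond padding, slicing, and a ufunc:

    import numpy as np

    def maximum_filter_3x3(img):
        # Pure-NumPy rough equivalent of ndimage.maximum_filter(img,
        # size=3, mode='mirror') for a 2-D array: pad, then take the
        # element-wise maximum over the nine shifted views.
        h, w = img.shape
        padded = np.pad(img, 1, mode='reflect')       # numpy 'reflect' ~ ndimage 'mirror'
        out = padded[1:1 + h, 1:1 + w].copy()         # start from the centre pixel
        for dy in range(3):
            for dx in range(3):
                np.maximum(out, padded[dy:dy + h, dx:dx + w], out=out)
        return out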

A major difficulty with ndimage, beyond the hairy C code, is the
spline-interpolation model that nearly everything is built on. While
it's technically a nice piece of infrastructure, it's quite different
from how image-resampling systems are usually constructed, and from
what a lot of people (well, especially me) are used to. That makes it
a lot harder to hack on, or to track down and fix bugs in. I don't
really have a good suggestion for addressing this, though, because the
spline model is really quite nice when it works.
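
For anyone who hasn't looked under the hood, here's a quick sketch of
what I mean by "nearly everything is built on" the spline model (the
array sizes and zoom factors below are arbitrary): the geometric
operations -- zoom, rotate, map_coordinates, geometric_transform --
first convert the image into B-spline coefficients and then evaluate
that spline representation at the output coordinates; the 'order'
argument selects the spline degree rather than choosing between
separate resampling code paths.

    import numpy as np
    from scipy import ndimage

    img = np.random.rand(64, 64)

    # Both calls go through the same machinery: prefilter the image into
    # B-spline coefficients, then evaluate the spline at the output grid.
    linear = ndimage.zoom(img, 2.0, order=1)   # order 1 is effectively bilinear
    cubic = ndimage.zoom(img, 2.0, order=3)    # order 3: cubic B-spline (the default)

    # map_coordinates exposes the model directly: hand it an array of
    # coordinates and it evaluates the spline representation at those points.
    yy, xx = np.mgrid[0:63:128j, 0:63:128j]
    resampled = ndimage.map_coordinates(img, [yy, xx], order=3)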

Zach