[Numpy-discussion] More loadtxt() changes

Pierre GM pgmdevlist@gmail....
Tue Nov 25 18:00:30 CST 2008


Ryan,
Quick comments:

* I already have some unit tests for StringConverter; see the attached
file.

* Your str2bool will probably mess things up in upgrade compared to
the one JDH had written (the one I sent you): you don't wanna use
int(bool(value)), as it'll always give you 0 or 1 when you might need
a ValueError. Without that ValueError, upgrade never moves on to the
next converter.

* Your locked version of update probably won't work either, as you
force the converter to output a string (you set the status to the largest
possible, that's the one that outputs strings). Why don't you set the
status to the current one (make a tmp one if needed)?

* I'd probably get rid of StringConverter._get_from_dtype, as it is  
not needed outside the __init__. You may wanna stick to the original  
__init__.
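
A minimal sketch of the str2bool and locking points above, under stated
assumptions: MiniConverter is a hypothetical stand-in for StringConverter,
and the _locked flag is just one possible way to keep the current status
instead of jumping to the largest one. It is not the code from the patch
or from the attached tests.

```python
# Illustrative sketch only. Names mirror the patch where possible;
# MiniConverter and _locked are hypothetical and not part of numpy.

def str2bool(value):
    """Return True/False for 'true'/'false' (any case), else raise ValueError."""
    value = value.upper()
    if value == 'TRUE':
        return True
    if value == 'FALSE':
        return False
    # Raising here is what lets upgrade() try the next converter.
    raise ValueError("Invalid boolean: %r" % value)


class MiniConverter(object):
    """Stripped-down converter that steps through `mapper` on failure."""

    # (function, default) pairs, tried in order.
    mapper = [(str2bool, None),
              (lambda x: int(float(x)), -1),
              (float, float('nan')),
              (complex, complex(float('nan'), 0.0)),
              (str, '???')]

    def __init__(self):
        self.func, self.default = self.mapper[0]
        self._status = 0
        self._locked = False

    def __call__(self, value):
        return self.func(value)

    def upgrade(self, value):
        """Advance through the mapper until `value` converts cleanly."""
        while True:
            try:
                return self(value)
            except ValueError:
                last = len(self.mapper) - 1
                if self._locked or self._status >= last:
                    raise ValueError("Could not find a valid conversion function")
                self._status += 1
                self.func, self.default = self.mapper[self._status]

    def update(self, func, default=None, locked=False):
        """Set func/default directly; `locked` keeps the current status."""
        self.func = func
        self.default = default
        self._locked = locked
```

With a strict str2bool, upgrade('3') moves from the boolean converter to
the integer one, and a locked converter re-raises the ValueError instead
of falling through to str.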


-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_preview.py
Type: text/x-python-script
Size: 4871 bytes
Desc: not available
Url : http://projects.scipy.org/pipermail/numpy-discussion/attachments/20081125/20d3c39d/attachment-0001.bin 
-------------- next part --------------




All, another question:
What's the best way to have some kind of sandbox for code like the one
Ryan is writing? So that we can try it and modify it without committing
anything to SVN yet?



On Nov 25, 2008, at 6:08 PM, Ryan May wrote:

> Pierre GM wrote:
>> Sounds like a plan. Wouldn't mind getting more feedback from fellow
>> users before we get too deep, however...
>
> Ok, I've attached, as a first cut, a diff against SVN HEAD that does  
> (I think) what I'm looking for.  It passes all of the old tests and  
> passes my own quick test.  A more rigorous test suite will follow,  
> but I want this out the door before I need to leave for the day.
>
> What this changeset essentially does is just add support for  
> automatic dtypes along with supplying/reading names for flexible  
> dtypes.  It leverages StringConverter heavily, using a few tweaks so  
> that old behavior is kept.  This is by no means a final version.
>
> Probably the biggest change from what I mentioned earlier is that  
> instead of dtype='auto', I've used dtype=None to signal the  
> detection code, since dtype=='auto' causes problems.
>
> I welcome any and all suggestions here, both on the code and on the  
> original idea of adding these capabilities to loadtxt().
>
> Ryan
>
> -- 
> Ryan May
> Graduate Research Assistant
> School of Meteorology
> University of Oklahoma
> Index: lib/io.py
> ===================================================================
> --- lib/io.py	(revision 6099)
> +++ lib/io.py	(working copy)
> @@ -233,29 +233,138 @@
>     for name in todel:
>         os.remove(name)
>
> -# Adapted from matplotlib
> +def _string_like(obj):
> +    try: obj + ''
> +    except (TypeError, ValueError): return False
> +    return True
>
> -def _getconv(dtype):
> -    typ = dtype.type
> -    if issubclass(typ, np.bool_):
> -        return lambda x: bool(int(x))
> -    if issubclass(typ, np.integer):
> -        return lambda x: int(float(x))
> -    elif issubclass(typ, np.floating):
> -        return float
> -    elif issubclass(typ, np.complex):
> -        return complex
> +def str2bool(value):
> +    """
> +    Tries to transform a string supposed to represent a boolean to a boolean.
> +
> +    Raises
> +    ------
> +    ValueError
> +        If the string is not 'True' or 'False' (case independent)
> +    """
> +    value = value.upper()
> +    if value == 'TRUE':
> +        return True
> +    elif value == 'FALSE':
> +        return False
>     else:
> -        return str
> +        return int(bool(value))
>
> +class StringConverter(object):
> +    """
> +    Factory class for function transforming a string into another object (int,
> +    float).
>
> -def _string_like(obj):
> -    try: obj + ''
> -    except (TypeError, ValueError): return 0
> -    return 1
> +    After initialization, an instance can be called to transform a string
> +    into another object. If the string is recognized as representing a missing
> +    value, a default value is returned.
>
> +    Parameters
> +    ----------
> +    dtype : dtype, optional
> +        Input data type, used to define a basic function and a default value
> +        for missing data. For example, when `dtype` is float, the :attr:`func`
> +        attribute is set to ``float`` and the default value to `np.nan`.
> +    missing_values : sequence, optional
> +        Sequence of strings indicating a missing value.
> +
> +    Attributes
> +    ----------
> +    func : function
> +        Function used for the conversion
> +    default : var
> +        Default value to return when the input corresponds to a missing value.
> +    mapper : sequence of tuples
> +        Sequence of tuples (function, default value) to evaluate in order.
> +
> +    """
> +    from numpy.core import nan # To avoid circular import
> +    mapper = [(str2bool, None),
> +              (lambda x: int(float(x)), -1),
> +              (float, nan),
> +              (complex, nan+0j),
> +              (str, '???')]
> +
> +    def __init__(self, dtype=None, missing_values=None):
> +        if dtype is None:
> +            self.func = str2bool
> +            self.default = None
> +            self._status = 0
> +        else:
> +            dtype = np.dtype(dtype).type
> +            self.func,self.default,self._status = self._get_from_dtype(dtype)
> +
> +        # Store the list of strings corresponding to missing values.
> +        if missing_values is None:
> +            self.missing_values = []
> +        else:
> +            self.missing_values = set(list(missing_values) + [''])
> +
> +    def __call__(self, value):
> +        if value in self.missing_values:
> +            return self.default
> +        return self.func(value)
> +
> +    def upgrade(self, value):
> +        """
> +    Tries to find the best converter for `value`, by testing different
> +    converters in order.
> +    The order in which the converters are tested is read from the
> +    :attr:`_status` attribute of the instance.
> +        """
> +        try:
> +            self.__call__(value)
> +        except ValueError:
> +            _statusmax = len(self.mapper)
> +            if self._status == _statusmax:
> +                raise ValueError("Could not find a valid conversion function")
> +            elif self._status < _statusmax - 1:
> +                self._status += 1
> +            (self.func, self.default) = self.mapper[self._status]
> +            self.upgrade(value)
> +
> +    def _get_from_dtype(self, dtype):
> +        """
> +    Sets the :attr:`func` and :attr:`default` attributes for a given dtype.
> +        """
> +        dtype = np.dtype(dtype).type
> +        if issubclass(dtype, np.bool_):
> +            return (str2bool, 0, 0)
> +        elif issubclass(dtype, np.integer):
> +            return (lambda x: int(float(x)), -1, 1)
> +        elif issubclass(dtype, np.floating):
> +            return (float, np.nan, 2)
> +        elif issubclass(dtype, np.complex):
> +            return (complex, np.nan + 0j, 3)
> +        else:
> +            return (str, '???', -1)
> +
> +    def update(self, func, default=None, locked=False):
> +        """
> +    Sets the :attr:`func` and :attr:`default` attributes directly.
> +
> +    Parameters
> +    ----------
> +    func : function
> +        Conversion function.
> +    default : var, optional
> +        Default value to return when a missing value is encountered.
> +    locked : bool, optional
> +        Whether this should lock in the function so that no upgrading is
> +        possible.
> +        """
> +        self.func = func
> +        self.default = default
> +        if locked:
> +            self._status = len(self.mapper)
> +
> def loadtxt(fname, dtype=float, comments='#', delimiter=None, converters=None,
> -            skiprows=0, usecols=None, unpack=False):
> +            skiprows=0, usecols=None, unpack=False, names=None):
>     """
>     Load data from a text file.
>
> @@ -333,11 +442,10 @@
>             fh = gzip.open(fname)
>         else:
>             fh = file(fname)
>