[Numpy-discussion] [ANN] Theano 0.4.0 released

David Warde-Farley wardefar@iro.umontreal...
Mon Jun 27 18:14:53 CDT 2011

Announcing Theano 0.4.0

This is a major release, with lots of new features, bug fixes, and some
interface changes (deprecated or potentially misleading features were
removed).  The upgrade is recommended for everybody, unless you rely on
deprecated features that have been removed.

For those using the bleeding edge version in the
mercurial repository, we encourage you to update to the `0.4.0` tag.

Deleting old cache

The caching mechanism for compiled C modules has been updated.
In some cases, using previously-compiled modules with the new version of
Theano can lead to high memory usage and code slow-down. If you experience
these symptoms, we encourage you to clear your cache.

The easiest way to do that is to execute:
   theano-cache clear

(The theano-cache executable is in Theano/bin.)

What's New

Change in output memory storage for Ops:
If you implemented custom Ops, with either C or Python implementation,
this will concern you.

The contract for memory storage of Ops has been changed. In particular,
it is no longer guaranteed that output memory buffers are either empty,
or allocated by a previous execution of the same Op.

Right now, here is the situation:
* For Python implementation (perform), what is inside output_storage
  may have been allocated from outside the perform() function, for
  instance by another node (e.g., Scan) or the Mode. If that was the
  case, the memory can be assumed to be C-contiguous (for the moment).
* For C implementations (c_code), nothing has changed yet.

In a future version, the content of the output storage, both for Python and C
versions, will either be NULL, or have the following guarantees:
* It will be a Python object of the appropriate Type (for instance, a
  numpy.ndarray for a Tensor variable, or a CudaNdarray for a GPU variable)
* It will have the correct number of dimensions and the correct dtype
However, its shape and memory layout (strides) will not be guaranteed.

When that change is made, the config flag DebugMode.check_preallocated_output
will help you find implementations that are not up-to-date.
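
A minimal sketch of a perform() body written against the future contract
described above, assuming a hypothetical Op that doubles its input. It is plain
NumPy and not taken from Theano's sources: the function is shown as a free
function, and the doubling operation is made up for illustration. The point is
that a preallocated buffer is reused only after its shape has been checked, and
is written in a way that does not depend on its strides.

```python
import numpy as np


def perform(node, inputs, output_storage):
    """Hypothetical perform(): store 2 * x in the single output cell."""
    (x,) = inputs
    (out,) = output_storage  # each output is a one-element list (a storage cell)
    buf = out[0]

    # A non-empty cell is only assumed to hold an array of the right type,
    # number of dimensions and dtype. Its shape may be wrong, so check it.
    if buf is None or buf.shape != x.shape:
        buf = np.empty(x.shape, dtype=x.dtype)

    # Writing through out= works for any memory layout of buf, so no
    # assumption about contiguity or strides is needed.
    np.multiply(x, 2, out=buf)
    out[0] = buf
```

Calling it with an empty cell, `storage = [[None]]`, allocates a fresh array,
while a cell that already holds a correctly shaped (possibly non-contiguous)
array is filled in place.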

Deprecation:
* tag.shape attribute deprecated (#633)
* CudaNdarray_new_null is deprecated in favour of CudaNdarray_New
* Dividing integers with / is deprecated: use // for integer division, or
  cast one of the integers to a float type if you want a float result (you may
  also change this behavior with config.int_division).
* Removed (already deprecated) sandbox/compile module
* Removed (already deprecated) incsubtensor and setsubtensor functions,
  inc_subtensor and set_subtensor are to be used instead.
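
The integer-division item above can be illustrated with plain Python integers.
This is only an analogy for Theano's symbolic integer variables, and the
variable names are made up for the sketch: `//` asks explicitly for integer
division, while casting one operand to float asks explicitly for a float
result.

```python
a, b = 7, 2

# Explicit integer division: the result stays an integer.
int_quotient = a // b

# Cast one operand to a float type to get a float result.
float_quotient = float(a) / b

print(int_quotient, float_quotient)
```

Writing the intent out this way avoids depending on what `/` does for two
integers.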

Bugs fixed:
* CudaNdarray.__{iadd,idiv}__ now return the error when the operation is not
  implemented.
* THEANO_FLAGS='optimizer=None' now works as expected
* Fixed memory leak in error handling on GPU-to-host copy
* Fix relating specifically to Python 2.7 on Mac OS X
* infer_shape can now handle Python longs
* Trying to compute x % y with one or more arguments being complex now
  raises an error.
* The output of random samples computed with uniform(..., dtype=...) is
  guaranteed to be of the specified dtype instead of potentially being of a
  higher-precision dtype.
* The perform() method of DownsampleFactorMax did not give the right result
  when reusing output storage. This happened only if you used the Theano flag
  'linker=c|py_nogc' or manually specified the mode to be 'c|py_nogc'.

Crash fixed:
* Work around a bug in gcc 4.3.0 that makes the compilation of 2d convolution
  crash.
* Some optimizations crashed when the "ShapeOpt" optimization was disabled.

Optimization:
* Optimize all cases of a subtensor followed by a subtensor.

GPU:
* Move fused elemwise ops to the GPU even when they contain dtypes other than
  float32 (except float64), provided the inputs and outputs are float32.
  * This allows elemwise comparisons to be moved to the GPU if the result is
    cast to float32 afterwards.
* Implemented CudaNdarray.ndim to have the same interface as ndarray.
* Fixed slowdown caused by multiple chained views on CudaNdarray objects
* CudaNdarray_alloc_contiguous changed so as to never try to free
  memory on a view: new "base" property
* Safer decref behaviour in CudaNdarray in case of failed allocations
* New GPU implementation of tensor.basic.outer
* Multinomial random variates now available on GPU

New features:
* ProfileMode
   * profile the scan overhead
   * simple hook system to add profiler
   * reordered the output to go from more general to more specific
* DebugMode now checks Ops with different patterns of preallocated memory,
  configured by config.DebugMode.check_preallocated_output.
* var[vector of index] now works (the grad works recursively, the direct grad
  works inplace, and it works on the GPU).
   * limitation: works only on the outermost dimension.
* New way to test the graph as we build it. This makes it easy to find the
  source of shape mismatch errors:
* cuda.root inferred if nvcc is on the path, otherwise defaults to
* Better graph printing for graphs involving a scan subgraph
* Casting behavior can be controlled through config.cast_policy,
  new (experimental) mode.
* Smarter C module cache, avoiding erroneous usage of the wrong C
  implementation when some options change, and avoiding recompiling the
  same module multiple times in some situations.
* The "theano-cache clear" command now clears the cache more thoroughly.
* More extensive linear algebra ops (CPU only) that wrap scipy.linalg
  now available in the sandbox.
* CUDA devices 4 - 16 should now be available if present.
* infer_shape support for the View op, better infer_shape support in Scan
* infer_shape supported in all cases of subtensor
* tensor.grad now gives an error by default when computing the gradient
  wrt a node that is disconnected from the cost (not in the graph, or
  no continuous path from that op to the cost).
* New tensor.isnan and isinf functions.
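
The `var[vector of index]` item above mirrors NumPy's integer-array indexing.
The following is a minimal NumPy sketch of those semantics, not Theano code;
`x` and `idx` are made-up names. It indexes along the outermost dimension
only, which matches the limitation stated in the list.

```python
import numpy as np

x = np.arange(12).reshape(4, 3)  # 4 rows, 3 columns
idx = np.array([3, 0, 3])        # a vector of row indices; repeats are allowed

# One row of x is selected per entry of idx, in the order given by idx.
rows = x[idx]

print(rows.shape)
```

The result has one row per index, so its outermost dimension has the length
of `idx` rather than that of `x`.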

Documentation:
* Better commenting of cuda_ndarray.cu
* Fixes in the scan documentation: add missing declarations/print statements
* Better error message on failed __getitem__
* Updated documentation on profile mode
* Better documentation of testing on Windows
* Better documentation of the 'run_individual_tests' script

Unit tests:
* Stricter float comparison by default
* Reuse the tests for subtensor of tensor for GPU tensors (more GPU tests)
* Tests that check for aliased function inputs and assure appropriate copying
* Better test of copies in CudaNdarray
* New tests relating to the new base pointer requirements
* Better scripts to run tests individually or in batches
* Some tests are now run whenever cuda is available and not just when it has
  been enabled before
* Tests display fewer pointless warnings.

Other:
* Correctly set the broadcast flag to True in the output var of
  a Reshape op when we receive an int 1 in the new shape.
* pydotprint: high contrast mode is now the default, option to print
  more compact node names.
* pydotprint: now truncates labels that are too long.
* More compact printing (ignore leading "Composite" in op names)


You can download Theano from http://pypi.python.org/pypi/Theano.


Theano is a Python library that allows you to define, optimize, and
efficiently evaluate mathematical expressions involving
multi-dimensional arrays. It is built on top of NumPy. Theano features:

* tight integration with NumPy: a similar interface to NumPy's.
  numpy.ndarrays are also used internally in Theano-compiled functions.
* transparent use of a GPU: perform data-intensive computations up to
  140x faster than on a CPU (support for float32 only).
* efficient symbolic differentiation: Theano can compute derivatives
  for functions of one or many inputs.
* speed and stability optimizations: avoid nasty bugs when computing
  expressions such as log(1 + exp(x)) for large values of x.
* dynamic C code generation: evaluate expressions faster.
* extensive unit-testing and self-verification: includes tools for
  detecting and diagnosing bugs and/or potential problems.
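
To make the log(1 + exp(x)) item concrete, here is a hand-written sketch in
plain Python of why the naive form fails for large x and how an algebraically
equivalent form avoids it. This is not Theano's own optimization, which
rewrites the expression graph; the function names are made up for
illustration.

```python
import math


def softplus_naive(x):
    # math.exp(x) overflows for large x, so this raises OverflowError.
    return math.log(1.0 + math.exp(x))


def softplus_stable(x):
    # log(1 + exp(x)) == max(x, 0) + log1p(exp(-|x|)).
    # The argument of exp is never positive, so it cannot overflow.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


print(softplus_stable(1000.0))
```

Both functions agree for moderate inputs; only the second one returns a value
for an input such as 1000.0.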

Theano has been powering large-scale computationally intensive
scientific research since 2007, but it is also approachable
enough to be used in the classroom (IFT6266 at the University of Montreal).


About Theano:

About NumPy:

About SciPy:

Machine Learning Tutorial with Theano on Deep Architectures:


I would like to thank all contributors of Theano. For this particular
release, many people have helped during the release sprint:
(in alphabetical order) Frederic Bastien, James Bergstra, Nicolas
Boulanger-Lewandowski, Raul Chandias Ferrari, Olivier Delalleau,
Guillaume Desjardins, Philippe Hamel, Pascal Lamblin, Razvan Pascanu and
David Warde-Farley.

Also, thank you to all NumPy and SciPy developers, as Theano builds on
their strengths.

All questions/comments are always welcome on the Theano
mailing-lists ( http://deeplearning.net/software/theano/ )
