
=============================================
Writing an Op to work on an ``ndarray`` in C
=============================================

This section walks through a non-trivial example: an Op that does something
pretty weird and unrealistic, and that is hard to express with existing Ops.
(Technically, we could use ``Scan`` to implement the Op we're about to describe,
but we ignore that possibility for the sake of example.)

The following code works, but important error-checking has been omitted for
clarity.  For example, when you write C code that assumes memory is contiguous,
you should check the strides and alignment (e.g. with the NumPy C API macros
``PyArray_ISCONTIGUOUS`` and ``PyArray_ISALIGNED``).


.. testcode::

   import theano
   from theano import tensor

   class Fibby(theano.Op):
       """
       An arbitrarily generalized Fibonacci sequence
       """
       __props__ = ()

       def make_node(self, x):
           x_ = tensor.as_tensor_variable(x)
           assert x_.ndim == 1
           # Note: using x_.type() is dangerous, as it copies x's broadcasting
           # behaviour.
           return theano.Apply(self,
               inputs=[x_],
               outputs=[x_.type()])

       def perform(self, node, inputs, output_storage):
           x, = inputs
           y = output_storage[0][0] = x.copy()
           for i in range(2, len(x)):
               y[i] = y[i-1] * y[i-2] + x[i]

       def c_code(self, node, name, inames, onames, sub):
           x, = inames
           y, = onames
           fail = sub['fail']
           return """
   Py_XDECREF(%(y)s);
   %(y)s = (PyArrayObject*)PyArray_FromArray(
               %(x)s, 0, NPY_ARRAY_ENSURECOPY);
   if (!%(y)s)
     %(fail)s;
   {//New scope needed to make compilation work
     dtype_%(y)s * y = (dtype_%(y)s*)PyArray_DATA(%(y)s);
     dtype_%(x)s * x = (dtype_%(x)s*)PyArray_DATA(%(x)s);
     for (int i = 2; i < PyArray_DIMS(%(x)s)[0]; ++i)
       y[i] = y[i-1]*y[i-2] + x[i];
   }
           """ % locals()

       def c_code_cache_version(self):
           return (1,)

   fibby = Fibby()
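
With the Op defined, ``fibby`` works like any other Op. A minimal usage
sketch (the expected output follows from the recurrence
``y[i] = y[i-1] * y[i-2] + x[i]``):

.. code-block:: python

   import numpy

   x = tensor.dvector('x')
   f = theano.function([x], fibby(x))
   print(f(numpy.ones(5)))  # -> [ 1.  1.  2.  3.  7.]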


The first two statements of the C code make ``y`` point to a new array with
the correct size for the output, essentially simulating the line
``y = x.copy()``.
The variables ``%(x)s`` and ``%(y)s`` are set up by the ``TensorType`` to be
``PyArrayObject`` pointers. ``TensorType`` also sets up ``dtype_%(x)s`` to be a
typedef to the C type for ``x``.


.. code-block:: c

    Py_XDECREF(%(y)s);
    %(y)s = (PyArrayObject*)PyArray_FromArray(
        %(x)s, 0, NPY_ARRAY_ENSURECOPY);

The ``Py_XDECREF`` call releases the reference to whatever array ``y``
previously pointed to (it is a no-op when the pointer is ``NULL``). The
``PyArray_FromArray`` call then allocates a copy of ``x`` and makes ``y``
point to it.
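
The ``%(x)s``-style markers are filled in by ordinary Python ``%`` string
formatting against ``locals()``: before compilation, Theano substitutes the
generated C variable names into the template. A toy sketch of the mechanism
(``V1`` and ``V3`` are made-up placeholder names, not actual generated
identifiers):

.. code-block:: python

   # Hypothetical illustration of the substitution done in c_code;
   # 'V1' and 'V3' stand in for names Theano would generate.
   template = "%(y)s = (PyArrayObject*)PyArray_FromArray(%(x)s, 0, NPY_ARRAY_ENSURECOPY);"
   print(template % {'x': 'V1', 'y': 'V3'})
   # -> V3 = (PyArrayObject*)PyArray_FromArray(V1, 0, NPY_ARRAY_ENSURECOPY);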

In C code for a Theano Op, numpy arrays are represented as ``PyArrayObject`` C
structs. This is part of the numpy C API, documented at
http://docs.scipy.org/doc/numpy/reference/c-api.types-and-structures.html

The rest of the C code mirrors the Python ``perform`` implementation: it opens
a new scope (so that the variable declarations compile as C), uses
``PyArray_DATA`` to obtain raw pointers to the data of ``x`` and ``y``, and
then computes the same recurrence over the vector's single dimension, whose
length is ``PyArray_DIMS(%(x)s)[0]``.

.. _op_contract_fibby:

Writing an Optimization
=======================

``fibby`` of a vector of zeros is another vector of zeros of
the same size.
Theano does not attempt to infer this from the code provided via ``Fibby.perform`` or ``Fibby.c_code``.
However, we can write an optimization that makes use of this observation.
This sort of local substitution of special cases is common,
and there is a stage of optimization (specialization) devoted to such optimizations.
The following optimization (``fibby_of_zero``) tests whether the input is
guaranteed to be all zero, and if so it returns the input itself as a replacement
for the old output.

Theano's optimizations are organized into a sequence of stages (for example
canonicalization, stabilization, and specialization); ``register_specialize``
below places our optimization in the specialization stage.

.. If you modify this code, also change :
.. theano/tests/test_tutorial.py:T_fibby.test_fibby_1

.. testcode::

   import numpy

   from theano.tensor.opt import get_scalar_constant_value, NotScalarConstantError

   # Remove any fibby(zeros(...))
   @theano.tensor.opt.register_specialize
   @theano.gof.local_optimizer([fibby])
   def fibby_of_zero(node):
       if node.op == fibby:
           x = node.inputs[0]
           try:
               if numpy.all(0 == get_scalar_constant_value(x)):
                   return [x]
           except NotScalarConstantError:
               pass

The ``register_specialize`` decorator is what activates our optimization, and
tells Theano to use it in the specialization stage.
The ``local_optimizer`` decorator builds a class instance around our global
function.  The ``[fibby]`` argument is a hint that our optimizer works on nodes
whose ``.op`` attribute equals ``fibby``.
The function here (``fibby_of_zero``) is passed an ``Apply`` instance as its
``node`` argument. It calls ``get_scalar_constant_value``, which determines
whether a Variable (here ``x``) is guaranteed to be a constant and, if so,
which constant value it holds; otherwise it raises ``NotScalarConstantError``.
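
To see ``get_scalar_constant_value`` in isolation, here is a small sketch
(assuming the imports from the previous block):

.. code-block:: python

   import theano.tensor as T

   # zeros(...) builds an alloc of the constant 0, which
   # get_scalar_constant_value can see through.
   print(get_scalar_constant_value(T.zeros([5])))  # -> 0.0

   # A free variable is not a constant, so the call raises.
   try:
       get_scalar_constant_value(T.dvector())
   except NotScalarConstantError:
       print('not a constant')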

Test the optimization
=====================

Here is some code to test that the optimization is applied only when needed.

.. testcode::

   import numpy
   import theano.tensor as T
   from theano import function

   # Test it does not apply when not needed
   x = T.dvector()
   f = function([x], fibby(x))

   # We call the function to make sure it runs.
   # If you run in DebugMode, it will compare the C and Python outputs.
   f(numpy.random.rand(5))
   topo = f.maker.fgraph.toposort()
   assert len(topo) == 1
   assert isinstance(topo[0].op, Fibby)

   # Test that the optimization gets applied.
   f_zero = function([], fibby(T.zeros([5])))

   # If you run in DebugMode, it will compare the output before
   # and after the optimization.
   f_zero()

   # Check that the optimization removed the Fibby Op.
   # For safety, Theano's memory interface ensures that a function's
   # output is never aliased to its input; that is why a DeepCopyOp
   # remains in the graph.
   topo = f_zero.maker.fgraph.toposort()
   assert len(topo) == 1
   assert isinstance(topo[0].op, theano.compile.ops.DeepCopyOp)
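
To see the effect of the optimization directly, you can print the optimized
graph; a quick sketch using ``theano.printing.debugprint``:

.. code-block:: python

   # The printed graph should show only a DeepCopyOp: the Fibby node
   # was replaced by its constant-zero input.
   theano.printing.debugprint(f_zero)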
