Building Blocks
Here we explain the components of Ivy which are fundamental to its usage as a code converter.
Backend Functional APIs ✅
The first important point to make is that Ivy does not implement its own C++ or CUDA backend. Instead, Ivy wraps the functional APIs of existing frameworks, bringing them into syntactic and semantic alignment.
Let’s take the function ivy.stack() as an example. There are separate backend modules for JAX, TensorFlow, PyTorch, and NumPy, so we implement stack once for each backend, each in a separate backend file like so:
# ivy/functional/backends/jax/manipulation.py:
def stack(
    arrays: Union[Tuple[JaxArray], List[JaxArray]],
    /,
    *,
    axis: int = 0,
    out: Optional[JaxArray] = None,
) -> JaxArray:
    return jnp.stack(arrays, axis=axis)
# ivy/functional/backends/numpy/manipulation.py:
def stack(
    arrays: Union[Tuple[np.ndarray], List[np.ndarray]],
    /,
    *,
    axis: int = 0,
    out: Optional[np.ndarray] = None,
) -> np.ndarray:
    return np.stack(arrays, axis, out=out)

stack.support_native_out = True
# ivy/functional/backends/tensorflow/manipulation.py:
def stack(
    arrays: Union[Tuple[tf.Tensor], List[tf.Tensor]],
    /,
    *,
    axis: int = 0,
    out: Optional[Union[tf.Tensor, tf.Variable]] = None,
) -> Union[tf.Tensor, tf.Variable]:
    return tf.experimental.numpy.stack(arrays, axis)
# ivy/functional/backends/torch/manipulation.py:
def stack(
    arrays: Union[Tuple[torch.Tensor], List[torch.Tensor]],
    /,
    *,
    axis: int = 0,
    out: Optional[torch.Tensor] = None,
) -> torch.Tensor:
    return torch.stack(arrays, axis, out=out)

stack.support_native_out = True
No changes were required for this function; however, NumPy and PyTorch both had to be marked as supporting the out argument natively.
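To see why this flag matters, here is a minimal, illustrative sketch (not Ivy's actual implementation) of how an out-handling wrapper along the lines of handle_out_argument could use support_native_out, with ivy.inplace_update as one way to copy a result into out:

import ivy

def handle_out_argument(fn):
    # Sketch only: Ivy's real wrapper handles many more cases
    # (nested arguments, views, dtype promotion, ...).
    def wrapper(*args, out=None, **kwargs):
        if out is None:
            return fn(*args, **kwargs)
        if getattr(fn, "support_native_out", False):
            # the backend function writes into `out` natively
            return fn(*args, out=out, **kwargs)
        # otherwise compute first, then copy the result into `out`
        ret = fn(*args, **kwargs)
        return ivy.inplace_update(out, ret)
    return wrapper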
For more complicated functions, we need to do more than simply wrap and perhaps rename. For functions with differing behavior, we must modify them to fit the unified input-output behavior of Ivy’s API.
For example, the APIs of JAX, PyTorch, and NumPy all include a logspace function, but TensorFlow does not at the time of writing. Therefore, we need to construct it as a composition of existing TensorFlow ops, like so:
# ivy/functional/backends/tensorflow/creation.py:
def logspace(
    start: Union[tf.Tensor, tf.Variable, int],
    stop: Union[tf.Tensor, tf.Variable, int],
    num: int,
    base: float = 10.0,
    axis: Optional[int] = None,
    *,
    dtype: tf.DType,
    device: str,
) -> Union[tf.Tensor, tf.Variable]:
    power_seq = ivy.linspace(start, stop, num, axis, dtype=dtype, device=device)
    return base**power_seq
Ivy Functional API ✅
Calling the different backend files explicitly would work fine, but it would mean we need to import ivy.functional.backends.torch as ivy to use a PyTorch backend, or import ivy.functional.backends.tensorflow as ivy to use a TensorFlow backend. Instead, we allow these backends to be bound to the single shared namespace ivy. The backend can then be changed by calling, for example, ivy.set_backend('torch').
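A minimal sketch of what this unlocks: the same Ivy code runs unchanged under whichever backend is currently set (printed types shown as comments for illustration).

import ivy

for backend in ["numpy", "torch"]:
    ivy.set_backend(backend)
    x = ivy.array([1.0, 2.0])
    y = ivy.array([3.0, 4.0])
    # identical call, different underlying framework
    print(type(ivy.to_native(ivy.stack([x, y]))))
# -> <class 'numpy.ndarray'>
# -> <class 'torch.Tensor'>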
ivy.functional.ivy is the submodule where all the docstrings and argument typing reside for the functional Ivy API. For example, the function prod() is shown below:
# ivy/functional/ivy/elementwise.py:
@to_native_arrays_and_back
@handle_out_argument
@handle_nestable
def prod(
    x: Union[ivy.Array, ivy.NativeArray],
    *,
    axis: Optional[Union[int, Sequence[int]]] = None,
    dtype: Optional[Union[ivy.Dtype, ivy.NativeDtype]] = None,
    keepdims: bool = False,
    out: Optional[ivy.Array] = None,
) -> ivy.Array:
    """Calculate the product of input array x elements.

    Parameters
    ----------
    x
        input array. Should have a numeric data type.
    axis
        axis or axes along which products must be computed. By default, the product
        must be computed over the entire array. If a tuple of integers, products
        must be computed over multiple axes. Default: ``None``.
    keepdims
        bool, if True, the reduced axes (dimensions) must be included in the result
        as singleton dimensions, and, accordingly, the result must be compatible
        with the input array (see Broadcasting). Otherwise, if False, the reduced
        axes (dimensions) must not be included in the result. Default: ``False``.
    dtype
        data type of the returned array. If None,
        if the default data type corresponding to the data type “kind” (integer or
        floating-point) of x has a smaller range of values than the data type of x
        (e.g., x has data type int64 and the default data type is int32, or x has
        data type uint64 and the default data type is int64), the returned array
        must have the same data type as x. if x has a floating-point data type, the
        returned array must have the default floating-point data type. if x has a
        signed integer data type (e.g., int16), the returned array must have the
        default integer data type. if x has an unsigned integer data type (e.g.,
        uint16), the returned array must have an unsigned integer data type having
        the same number of bits as the default integer data type (e.g., if the
        default integer data type is int32, the returned array must have a uint32
        data type). If the data type (either specified or resolved) differs from
        the data type of x, the input array should be cast to the specified data
        type before computing the product. Default: ``None``.
    out
        optional output array, for writing the result to.

    Returns
    -------
    ret
        array, if the product was computed over the entire array, a zero-dimensional
        array containing the product; otherwise, a non-zero-dimensional array
        containing the products. The returned array must have a data type as
        described by the dtype parameter above.

    Examples
    --------
    >>> x = ivy.array([1, 2, 3])
    >>> z = ivy.prod(x)
    >>> print(z)
    ivy.array(6)

    >>> x = ivy.array([1, 0, 3])
    >>> z = ivy.prod(x)
    >>> print(z)
    ivy.array(0)
    """
    return current_backend(x).prod(
        x, axis=axis, dtype=dtype, keepdims=keepdims, out=out
    )
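The decorators stacked above prod implement behavior shared by every Ivy function. As a rough, illustrative sketch (not the actual implementation), to_native_arrays_and_back could look something like this:

import functools
import ivy

def to_native_arrays_and_back(fn):
    # Sketch only: Ivy's real wrapper also handles keyword arrays,
    # nested containers and the `out` argument.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # unwrap ivy.Array inputs to the backend's native arrays
        native_args = [ivy.to_native(a) for a in args]
        ret = fn(*native_args, **kwargs)
        # re-wrap the native result as an ivy.Array
        return ivy.to_ivy(ret)
    return wrapper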
Implicitly, Ivy uses NumPy as the default backend, or operates with the backend corresponding to the input data, until the user explicitly sets a different backend. Examples of both are shown below:
# implicit
import ivy

x = ivy.array([1, 2, 3])
print(type(ivy.to_native(x)))
# -> <class 'numpy.ndarray'>

import torch

t = torch.tensor([23, 42, -1])
print(type(ivy.to_native(ivy.sum(t))))
# -> <class 'torch.Tensor'>

# explicit
import ivy

ivy.set_backend("jax")
z = ivy.array([1, 2, 3])
print(type(ivy.to_native(z)))
# -> <class 'jaxlib.xla_extension.DeviceArray'>
This implicit backend selection, and the use of a shared global ivy namespace for all backends, are both made possible via the backend handler.
Frontend Functional APIs ✅
While the backend API, Ivy API, and backend handler enable all Ivy code to be framework-agnostic, they do not, for example, enable PyTorch code to be framework-agnostic. With the frontend APIs, we can achieve this too!
Let’s take a look at how an implementation of the clip function looks in the frontends:
# ivy/functional/frontends/jax/lax/functions.py
def clamp(x_min, x, x_max):
    return ivy.clip(x, x_min, x_max)

# ivy/functional/frontends/numpy/general.py
def clip(x, x_min, x_max):
    return ivy.clip(x, x_min, x_max)

# ivy/functional/frontends/tensorflow/general.py
def clip_by_value(x, x_min, x_max):
    return ivy.clip(x, x_min, x_max)

# ivy/functional/frontends/torch/general.py
def clamp(x, x_min, x_max):
    return ivy.clip(x, x_min, x_max)
Combined, we have the following situation: each frontend function is expressed as a composition of Ivy functions, and each Ivy function dispatches to the selected backend.
Importantly, we can select the backend and frontend independently from one another. For example, we can select a JAX backend, but also select the PyTorch frontend and write Ivy code which fully adheres to the PyTorch functional API. In the reverse direction, we can take pre-written pure PyTorch code, replace each PyTorch function with the equivalent function from Ivy’s PyTorch frontend, and then run this PyTorch code using JAX.
For this example the differences are only syntactic, but the above process works for any function. If there are semantic differences, these will be captured (a) in the wrapped frontend code, which expresses the frontend function as a composition of Ivy functions, and (b) in the wrapped backend code, which expresses each Ivy function as a composition of backend functions.
Let’s take a more complex example and convert the PyTorch function torch.nn.functional.one_hot() into NumPy code. The frontend is implemented by wrapping a single Ivy function, ivy.one_hot(), as follows:
# ivy/functional/frontends/torch/nn/sparse_functions.py
def one_hot(tensor, num_classes=-1):
    return ivy.one_hot(tensor, num_classes)
Let’s look at the NumPy backend code for this Ivy method:
# ivy/functional/backends/numpy/general.py
def one_hot(
    indices: np.ndarray, depth: int, *, device: str, out: Optional[np.ndarray] = None
) -> np.ndarray:
    # index the identity matrix with the flattened indices to get one-hot rows
    res = np.eye(depth)[np.array(indices).reshape(-1)]
    return res.reshape(list(indices.shape) + [depth])
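The np.eye indexing trick is worth unpacking: row i of the depth-by-depth identity matrix is exactly the one-hot vector for class i, so indexing with the flattened integer indices selects the one-hot rows directly:

import numpy as np

indices = np.array([0, 2, 1])
depth = 3
print(np.eye(depth)[indices])
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]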
By chaining these functions together, we can now call torch.nn.functional.one_hot() using NumPy:

import numpy as np
import ivy
import ivy.functional.frontends.torch as torch

ivy.set_backend('numpy')

x = np.array([0, 1, 2])  # integer class indices
ret = torch.nn.functional.one_hot(x, 3)
Let’s take one more example and convert the TensorFlow function tf.math.cumprod() into PyTorch code. This time, the frontend is implemented by wrapping two Ivy functions, ivy.cumprod() and ivy.flip(), as follows:
# ivy/functional/frontends/tensorflow/math.py
def cumprod(x, axis=0, exclusive=False, reverse=False, name=None):
    ret = ivy.cumprod(x, axis, exclusive)
    if reverse:
        return ivy.flip(ret, axis)
    return ret
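To make the exclusive flag concrete: for x = [1., 2., 3.], an inclusive cumprod gives [1., 2., 6.], while an exclusive cumprod shifts everything one step right and starts from 1, giving [1., 1., 2.]. A quick sketch (output shown as comments for illustration):

import ivy

ivy.set_backend("torch")
x = ivy.array([1.0, 2.0, 3.0])
print(ivy.cumprod(x, axis=0))                  # -> ivy.array([1., 2., 6.])
print(ivy.cumprod(x, axis=0, exclusive=True))  # -> ivy.array([1., 1., 2.])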
Let’s look at the PyTorch backend code for both of these Ivy functions:
# ivy/functional/backends/torch/general.py
def cumprod(
    x: torch.Tensor,
    axis: int = 0,
    exclusive: bool = False,
    *,
    out: Optional[torch.Tensor] = None,
) -> torch.Tensor:
    if exclusive:
        # move the target axis last, drop the final element and prepend ones,
        # so each position's product excludes its own element
        x = torch.transpose(x, axis, -1)
        x = torch.cat((torch.ones_like(x[..., -1:]), x[..., :-1]), -1)
        res = torch.cumprod(x, -1, out=out)
        return torch.transpose(res, axis, -1)
    return torch.cumprod(x, axis, out=out)
# ivy/functional/backends/torch/manipulation.py
def flip(
    x: torch.Tensor,
    axis: Optional[Union[int, Sequence[int]]] = None,
    *,
    out: Optional[torch.Tensor] = None,
) -> torch.Tensor:
    num_dims: int = len(x.shape)
    if not num_dims:
        return x
    if axis is None:
        new_axis: List[int] = list(range(num_dims))
    else:
        new_axis: List[int] = axis
    if isinstance(new_axis, int):
        new_axis = [new_axis]
    # wrap any negative axes to their positive equivalents
    new_axis = [item + num_dims if item < 0 else item for item in new_axis]
    ret = torch.flip(x, new_axis)
    return ret
Again, by chaining these functions together, we can now call tf.math.cumprod() using PyTorch:

import torch
import ivy
import ivy.functional.frontends.tensorflow as tf

ivy.set_backend('torch')

x = torch.tensor([[0., 1., 2.]])
ret = tf.math.cumprod(x, -1)
Backend Handler ✅
All code for setting and unsetting the backend resides in the submodule ivy/utils/backend/handler.py, and the front-facing function is ivy.current_backend(). The contents of this function are as follows:
# ivy/utils/backend/handler.py
def current_backend(*args, **kwargs):
    global implicit_backend
    # if a global backend has been set with set_backend then this will be returned
    if backend_stack:
        f = backend_stack[-1]
        return f
    # if no global backend exists, we try to infer the backend from the arguments
    f = _determine_backend_from_args(list(args) + list(kwargs.values()))
    if f is not None:
        implicit_backend = f.current_backend_str()
        return f
    return importlib.import_module(_backend_dict[implicit_backend])
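Before unpacking this, here is a short sketch of the inference branch in action (assuming PyTorch is installed and no backend has been set):

import torch
import ivy

# no global backend: the handler infers it from the tensor argument
t = torch.tensor([1.0, 2.0, 3.0])
backend = ivy.current_backend(t)
print(backend.current_backend_str())  # -> 'torch'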
If a global backend framework has been previously set using, for example, ivy.set_backend('tensorflow'), then this globally set backend is returned. Otherwise, the input arguments are type-checked to infer the backend, which is returned from the function as a callable module with all bound functions adhering to that specific backend. The functions in this returned module are populated by iterating through the global ivy.__dict__ (or a non-global copy of ivy.__dict__ if the backend is set non-globally), and overwriting every function which is also directly implemented in the backend-specific namespace. The following is a slightly simplified version of this code for illustration, which updates the global ivy.__dict__ directly:
# ivy/utils/backend/handler.py
def set_backend(backend: str):
    # un-modified ivy.__dict__
    global ivy_original_dict
    if not backend_stack:
        ivy_original_dict = ivy.__dict__.copy()
    # add the input backend to the global stack
    backend_stack.append(backend)
    # iterate through original ivy.__dict__
    for k, v in ivy_original_dict.items():
        # if the method doesn't exist in the backend
        if k not in backend.__dict__:
            # add the original ivy method to the backend
            backend.__dict__[k] = v
        # update global ivy.__dict__ with this method
        ivy.__dict__[k] = backend.__dict__[k]
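Because backends are kept on a stack, set calls nest and can be undone. A minimal sketch, assuming an unset helper such as ivy.unset_backend() pops the most recently set backend:

import ivy

ivy.set_backend("numpy")
ivy.set_backend("torch")  # torch is now on top of the backend stack
print(type(ivy.to_native(ivy.array([1.0]))))  # -> <class 'torch.Tensor'>

ivy.unset_backend()       # pop torch; numpy becomes active again
print(type(ivy.to_native(ivy.array([1.0]))))  # -> <class 'numpy.ndarray'>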
The functions implemented in backend-specific submodules such as ivy.functional.backends.torch only constitute a subset of the full Ivy API. This is because many higher-level functions are written as a composition of lower-level Ivy functions. These functions therefore do not need to be written independently for each backend framework. A good example is ivy.lstm_update(), as shown:
# ivy/functional/ivy/layers.py
@to_native_arrays_and_back
@handle_nestable
def lstm_update(
    x: Union[ivy.Array, ivy.NativeArray],
    init_h: Union[ivy.Array, ivy.NativeArray],
    init_c: Union[ivy.Array, ivy.NativeArray],
    kernel: Union[ivy.Array, ivy.NativeArray],
    recurrent_kernel: Union[ivy.Array, ivy.NativeArray],
    bias: Optional[Union[ivy.Array, ivy.NativeArray]] = None,
    recurrent_bias: Optional[Union[ivy.Array, ivy.NativeArray]] = None,
) -> Tuple[ivy.Array, ivy.Array]:
    """Perform long-short term memory update by unrolling time dimension of the
    input array.

    Parameters
    ----------
    x
        input tensor of LSTM layer *[batch_shape, t, in]*.
    init_h
        initial state tensor for the cell output *[batch_shape, out]*.
    init_c
        initial state tensor for the cell hidden state *[batch_shape, out]*.
    kernel
        weights for cell kernel *[in, 4 x out]*.
    recurrent_kernel
        weights for cell recurrent kernel *[out, 4 x out]*.
    bias
        bias for cell kernel *[4 x out]*. (Default value = None)
    recurrent_bias
        bias for cell recurrent kernel *[4 x out]*. (Default value = None)

    Returns
    -------
    ret
        hidden state for all timesteps *[batch_shape,t,out]* and cell state for
        last timestep *[batch_shape,out]*
    """
    # get shapes
    x_shape = list(x.shape)
    batch_shape = x_shape[:-2]
    timesteps = x_shape[-2]
    input_channels = x_shape[-1]
    x_flat = ivy.reshape(x, (-1, input_channels))

    # input kernel
    Wi = kernel
    Wi_x = ivy.reshape(
        ivy.matmul(x_flat, Wi) + (bias if bias is not None else 0),
        batch_shape + [timesteps, -1],
    )
    Wii_x, Wif_x, Wig_x, Wio_x = ivy.split(Wi_x, 4, -1)

    # recurrent kernel
    Wh = recurrent_kernel

    # lstm states
    ht = init_h
    ct = init_c

    # lstm outputs
    hts_list = []

    # unrolled time dimension with lstm steps
    for Wii_xt, Wif_xt, Wig_xt, Wio_xt in zip(
        ivy.unstack(Wii_x, axis=-2),
        ivy.unstack(Wif_x, axis=-2),
        ivy.unstack(Wig_x, axis=-2),
        ivy.unstack(Wio_x, axis=-2),
    ):
        htm1 = ht
        ctm1 = ct
        Wh_htm1 = ivy.matmul(htm1, Wh) + (
            recurrent_bias if recurrent_bias is not None else 0
        )
        Whi_htm1, Whf_htm1, Whg_htm1, Who_htm1 = ivy.split(
            Wh_htm1, num_or_size_splits=4, axis=-1
        )
        it = ivy.sigmoid(Wii_xt + Whi_htm1)
        ft = ivy.sigmoid(Wif_xt + Whf_htm1)
        gt = ivy.tanh(Wig_xt + Whg_htm1)
        ot = ivy.sigmoid(Wio_xt + Who_htm1)
        ct = ft * ctm1 + it * gt
        ht = ot * ivy.tanh(ct)
        hts_list.append(ivy.expand_dims(ht, -2))
    return ivy.concat(hts_list, -2), ct
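A quick usage sketch, with hypothetical sizes, runnable under any backend (shapes follow the docstring above):

import ivy

ivy.set_backend("torch")
b, t, ch_in, ch_out = 2, 5, 4, 8  # hypothetical batch, time and channel sizes
x = ivy.random_uniform(shape=(b, t, ch_in))
init_h = ivy.zeros((b, ch_out))
init_c = ivy.zeros((b, ch_out))
kernel = ivy.random_uniform(shape=(ch_in, 4 * ch_out))
recurrent_kernel = ivy.random_uniform(shape=(ch_out, 4 * ch_out))

hts, ct = ivy.lstm_update(x, init_h, init_c, kernel, recurrent_kernel)
print(hts.shape, ct.shape)  # -> (2, 5, 8) (2, 8)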
We could find and wrap the functional LSTM update methods for each backend framework, which might bring a small performance improvement. In this case, however, no functional LSTM methods are exposed in the official functional APIs of the backend frameworks, and the functional LSTM code which does exist for the backends is therefore much less stable and less reliable to wrap into Ivy. Generally, we have made our design decisions so that Ivy is as stable and scalable as possible, minimizing dependencies on backend framework code where possible, with minimal sacrifices in performance.
Source-to-Source Transpiler ✅
Round Up
Hopefully, this has painted a clear picture of the fundamental building blocks underpinning the Ivy framework: the backend functional APIs, the Ivy functional API, the frontend functional APIs, the backend handler, and the source-to-source transpiler 😄
Please reach out on Discord if you have any questions!