Superset Behaviour#
When implementing functions in Ivy, whether they are primary, compositional, or mixed, we are constantly faced with the question: which backend implementation should Ivy most closely follow?
Extending the Standard#
It might seem as though this question is already answered. Ivy fully adheres to the Array API Standard, which helpfully limits our design space for the functions, but in its current form this only covers a relatively small number of functions, which together make up less than half of the functions in Ivy. Even for Ivy functions which adhere to the standard, the standard permits the addition of extra arguments and function features, provided that they do not contradict the requirements of the standard. Therefore, we are still faced with the same kind of design decisions for all Ivy functions, even those appearing in the Array API Standard.
What is the Superset?#
We explain through examples how Ivy always goes for the superset of functionality among the backend frameworks. This means that even if only one framework supports a certain feature, we still strive to include this feature in the Ivy function; the Ivy function then encompasses the superset of all backend features. This is not always entirely possible, and in some cases certain framework-specific features must be sacrificed, but usually it is possible to implement a very general function which covers most of the unique features among the corresponding functions in each framework.
We strive to implement the superset for primary, compositional, and mixed functions.
In many cases compositional functions do not actually have corresponding backend-specific functions, but this is not always the case.
For example, ivy.linear() is a fully compositional function, but torch.nn.functional.linear() also exists.
We should therefore make sure the compositional ivy.linear() function includes all behaviours supported by torch.nn.functional.linear().
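As a rough sketch (not the actual Ivy source), a compositional linear function essentially reduces to a matrix multiplication with the transposed weight plus an optional bias, mirroring the x @ W^T + b convention of torch.nn.functional.linear():

def linear(x, weight, /, *, bias=None):
    # x has shape (..., in_features), weight has shape (out_features, in_features)
    res = ivy.matmul(x, ivy.swapaxes(weight, -1, -2))
    if bias is not None:
        res = ivy.add(res, bias)
    return res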
A Non-Duplicate Superset#
It would be easy to assume that implementing the superset simply means adding all arguments from all related functions into the Ivy function.
However, this is not the case for a few reasons.
Firstly, different functions might have different argument names for the same behaviour.
Looking at the functions numpy.concatenate and torch.cat, we of course do not want to add both of the arguments axis and dim to ivy.concat(), as these both represent exactly the same thing: the dimension/axis along which to concatenate.
In this case, the argument is covered in the Array API Standard, and so we opt for axis.
In cases where there are differences between the backend argument names, and the function or argument is not in the standard, it is up to us to decide which argument name to use.
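For illustration, the backend implementations might forward the single unified axis argument roughly as follows (a simplified sketch, not the actual Ivy source):

import numpy as np
import torch

def concat_numpy(xs, /, *, axis=0):
    # NumPy already names this argument "axis"
    return np.concatenate(xs, axis=axis)

def concat_torch(xs, /, *, axis=0):
    # PyTorch names the same argument "dim", so we simply forward axis to it
    return torch.cat(xs, dim=axis)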
What is not the Superset?#
We’ve already explained that we should not duplicate arguments in the Ivy function when striving for the superset.
Does this mean, provided that the proposed argument is not a duplicate, that we should always add this backend-specific argument to the Ivy function?
The answer is no.
When determining the superset, we are only concerned with the pure mathematics of the function, and nothing else.
For example, the name argument is common to many TensorFlow functions, such as tf.concat, and is used for uniquely identifying parts of the traced computation graph during logging and debugging.
This has nothing to do with the mathematics of the function, and so it is not included in the superset considerations when implementing Ivy functions.
Similarly, in NumPy the argument subok controls whether subclasses of the numpy.ndarray class should be permitted, and it is included in many functions, such as numpy.ndarray.astype.
Finally, in JAX the argument precision is quite common; it controls the precision of the return values, as used in jax.lax.conv for example.
Similarly, the functions jacfwd() and jacrev() in JAX are mathematically identical, and differ only in their underlying algorithm: forward mode or reverse mode automatic differentiation, respectively.
None of the above arguments or function variants are included in our superset considerations, as, again, they do not relate to the pure mathematics, but instead to framework, hardware, or algorithmic specifics. Given the abstraction layer that Ivy operates at, Ivy is fundamentally unable to control under-the-hood specifics such as those mentioned above. However, this is by design, and the central benefit of Ivy is the ability to abstract many different runtimes and algorithms under the same banner, unified by their shared fundamental mathematics.
A special case is the NumPy order argument, which controls the low-level memory layout of the array.
Although it normally has no effect on the mathematics of a function, in certain manipulation routines like reshape, flatten, and ravel, order determines the way the elements are read and placed into the reshaped array.
Therefore, Ivy supports order for these functions, and any remaining logic surrounding order is handled in the NumPy frontend.
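For instance, standard NumPy behaviour illustrates why order matters for these routines (the final comment sketches the assumed equivalent Ivy call):

import numpy as np

x = np.arange(6)  # [0, 1, 2, 3, 4, 5]

# C order (row-major, the default): elements are placed row by row
np.reshape(x, (2, 3), order="C")  # [[0, 1, 2], [3, 4, 5]]

# Fortran order (column-major): elements are placed column by column
np.reshape(x, (2, 3), order="F")  # [[0, 2, 4], [1, 3, 5]]

# Ivy mirrors the keyword for these manipulation routines, e.g.
# ivy.reshape(x, (2, 3), order="F")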
There are two exceptions to this pure-mathematics-only rule for the superset considerations: the handling of data type and device arguments. Neither of these relates to the pure mathematics of the function. However, as is discussed below, we always strive to implement Ivy functions such that they support as many data types and devices as possible.
Balancing Generalization with Efficiency#
Sometimes, the simplest way to implement superset behaviour comes at the direct expense of runtime efficiency.
We explore this through the example of softplus().
ivy.softplus
When looking at the softplus() (or closest equivalent) implementations for Ivy, JAX, TensorFlow, and PyTorch, we can see that torch is the only framework which supports the inclusion of the beta and threshold arguments, which are added for improved numerical stability.
We can also see that numpy does not support a softplus() function at all.
Ivy should also support the beta and threshold arguments, in order to provide the generalized superset implementation among the backend frameworks.
Let’s take the tensorflow backend implementation as an example when assessing the necessary changes. Without superset behaviour, the implementation is incredibly simple, with only a single tensorflow function called under the hood.
def softplus(x: Tensor,
             /,
             *,
             out: Optional[Tensor] = None) -> Tensor:
    return tf.nn.softplus(x)
The simplest approach would be to implement softplus() in each Ivy backend as a simple composition.
For example, a simple composition in the tensorflow backend would look like the following:
def softplus(x: Tensor,
             /,
             *,
             beta: Optional[Union[int, float]] = 1,
             threshold: Optional[Union[int, float]] = 20,
             out: Optional[Tensor] = None) -> Tensor:
    res = (tf.nn.softplus(x * beta)) / beta
    return tf.where(x * beta > threshold, x, res)
This approach uses the default argument values used by PyTorch, and it does indeed extend the behaviour correctly.
However, the implementation now uses six tensorflow function calls instead of one, these being: __mul__(), tf.nn.softplus(), __div__(), __mul__(), __gt__(), and tf.where(), in order of execution.
If a user doesn't care about the extra threshold and beta arguments, then a 6× increase in backend functions is a heavy price to pay efficiency-wise.
Therefore, we should in general adopt a different approach when implementing superset behaviour. We should still implement the superset, but keep this extended behaviour as optional as possible, with maximal efficiency and minimal intrusion in the case that this extended behaviour is not required. The following would be a much better solution:
def softplus(x: Tensor,
             /,
             *,
             beta: Optional[Union[int, float]] = None,
             threshold: Optional[Union[int, float]] = None,
             out: Optional[Tensor] = None) -> Tensor:
    if beta is not None and beta != 1:
        x_beta = x * beta
        res = (tf.nn.softplus(x_beta)) / beta
    else:
        x_beta = x
        res = tf.nn.softplus(x)
    if threshold is not None:
        return tf.where(x_beta > threshold, x, res)
    return res
You will notice that this implementation involves more lines of code, but this should not be confused with added complexity.
All Ivy code should be traced for efficiency, and in this case all the if and else statements are removed, and all that remains are the backend functions which were actually executed.
This new implementation will be traced to a graph of either one, three, four, or six functions depending on the values of beta and threshold, while the previous implementation would always trace to six functions.
This does mean we do not adopt the default values used by PyTorch, but that’s okay. Implementing the superset does not mean adopting the same default values for arguments, it simply means equipping the Ivy function with the capabilities to execute the superset of behaviours.
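For example, the following calls exercise different branches of the implementation above, and tracing keeps only the backend functions that were actually executed (a sketch, assuming x is an array on the tensorflow backend):

ivy.softplus(x)                        # only tf.nn.softplus remains after tracing
ivy.softplus(x, beta=2)                # the multiply, softplus and divide calls remain
ivy.softplus(x, beta=2, threshold=20)  # the comparison and tf.where calls are kept as well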
More Examples#
We now take a look at some examples, and explain our rationale for deciding upon the function signature that we should use in Ivy. The first three examples are more-or-less superset examples, while the last example involves a deliberate decision to not implement the full superset, for some of the reasons explained above.
ivy.linspace
When looking at the linspace() (or closest equivalent) implementations for Ivy, JAX, NumPy, TensorFlow, and PyTorch, we can see that torch does not support arrays for the start and end arguments, while JAX, numpy, and tensorflow all do.
Likewise, Ivy also supports arrays for the start and stop arguments, and in doing so provides the generalized superset implementation among the backend frameworks.
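For instance (a sketch; the argument names follow the Array API Standard's start, stop, and num):

# scalar endpoints, supported natively by every backend
ivy.linspace(0.0, 1.0, 5)

# array endpoints, which torch.linspace alone does not support natively;
# Ivy generates one evenly spaced sequence per pair of endpoints
ivy.linspace(ivy.array([0.0, 0.0]), ivy.array([1.0, 10.0]), 5)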
ivy.eye
When looking at the eye() (or closest equivalent) implementations for Ivy, JAX, NumPy, TensorFlow, and PyTorch, we can see that tensorflow is the only framework which supports a batch_shape argument.
Likewise, Ivy also supports a batch_shape argument, and in doing so provides the generalized superset implementation among the backend frameworks.
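For instance (a sketch; the batched layout follows tf.eye's batch_shape behaviour, which Ivy mirrors):

ivy.eye(3)                      # shape (3, 3): a single identity matrix
ivy.eye(3, batch_shape=[2, 4])  # shape (2, 4, 3, 3): a batch of identity matrices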
ivy.scatter_nd
When looking at the scatter_nd() (or closest equivalent) implementations for Ivy, JAX, NumPy, TensorFlow, and PyTorch, we can see that torch only supports scattering along a single dimension, while all other frameworks support scattering across multiple dimensions at once.
Likewise, Ivy also supports scattering across multiple dimensions at once, and in doing so provides the generalized superset implementation among the backend frameworks.
ivy.logical_and
When looking at the logical_and() (or closest equivalent) implementations for Ivy, JAX, NumPy, TensorFlow, and PyTorch, we can see that numpy and torch support the out argument for performing inplace updates, while JAX and tensorflow do not.
With regards to the supported data types, JAX, numpy, and torch support numeric arrays, while tensorflow supports only boolean arrays.
On both of these points, Ivy provides the generalized superset implementation among the backend frameworks, with support for the out argument and support for both numeric and boolean arrays in the input.
However, as discussed above, np.logical_and() also supports the where argument, which we opt not to support in Ivy.
This is because the behaviour can easily be created as a composition, like so: ivy.where(mask, ivy.logical_and(x, y), ivy.zeros_like(mask)), and in this case we prioritize simplicity, clarity, and function uniqueness in Ivy's API, which comes at the cost of reduced runtime efficiency for some functions when using a NumPy backend.
However, in future releases our automatic graph tracing and graph simplification processes will remove these minor inefficiencies entirely from the final computation graph, by fusing multiple operations into one at the API level where possible.
Maximizing Usage of Native Functionality#
While achieving the objective of superset behaviour across the backends, the native functionality of each framework should be used as much as possible. Even if a framework-specific function doesn't provide complete superset behaviour, we should still make use of the partial behaviour that it provides, and then add extra logic for the remaining part. This is for efficiency reasons, and is explained in more detail in the Mixed Function section. In cases where a framework-specific function exists for one or two backends but not the others, we implement a Mixed Function. But when the framework-specific functions do not cover all of the superset functionality, Ivy also allows for a mixed-compositional hybrid approach.
Consider the example of interpolate().
Most frameworks contain some kind of interpolation function, usually limited to 2D and/or 3D, but ivy.interpolate() should be much more general, including interpolations across a larger number of dimensions.
On top of this, different framework-specific functions support different sets of modes for interpolation.
For example, if we look at the framework-specific functions available that serve the purpose of interpolation:

- torch.nn.functional.interpolate() supports a larger number of dimensions in the input, but doesn't support the gaussian or mitchellcubic modes which are supported by tf.image.resize().
- tf.image.resize() supports the gaussian and mitchellcubic modes, but doesn't support some of the other modes in torch.nn.functional.interpolate(), and it also doesn't support inputs with more than 4 dimensions.
- jax.image.resize() also has missing modes and doesn't support a larger number of dimensions.
- numpy doesn't have an equivalent function for interpolation (numpy.interp() is very different from the functionality required).
So the ideal superset implementation for ivy.interpolate() would support the union of all modes offered by the different implementations, as well as a larger number of dimensions in the input.
But there are a few considerations to be made:

- Implementing all the modes in every backend-specific implementation would be tedious and repetitive, as some modes may not be supported by more than one framework.
- We would need a completely compositional implementation for the numpy backend, which doesn't have an equivalent framework-specific function.
- However, having a single compositional implementation for all backends would be considerably less efficient than the framework-specific functions with overlapping functionality.
As a workaround, we can simply make use of the backend-specific implementations for a certain number of dimensions and modes for each backend, and then have a general compositional implementation which covers all the remaining cases. This will make sure that we don’t introduce any inefficiencies and also avoid re-implementation for all the backends.
Ivy allows this using the partial_mixed_handler attribute on the backend-specific implementation.
So the torch backend implementation of interpolate() would look like the following:
def interpolate(
    x: torch.Tensor,
    size: Union[Sequence[int], int],
    /,
    *,
    mode: Literal[
        "linear",
        "bilinear",
        "trilinear",
        "nearest",
        "area",
        "nearest_exact",
        "tf_area",
        "bicubic",
        "mitchellcubic",
        "lanczos3",
        "lanczos5",
        "gaussian",
    ] = "linear",
    scale_factor: Optional[Union[Sequence[int], int]] = None,
    recompute_scale_factor: Optional[bool] = None,
    align_corners: Optional[bool] = None,
    antialias: bool = False,
    out: Optional[torch.Tensor] = None,
):
    return torch.nn.functional.interpolate(
        x,
        size=size,
        mode=mode,
        align_corners=align_corners,
        antialias=antialias,
        scale_factor=scale_factor,
        recompute_scale_factor=recompute_scale_factor,
    )
interpolate.partial_mixed_handler = lambda *args, mode="linear", **kwargs: mode not in [
    "tf_area",
    "tf_bicubic",
    "mitchellcubic",
    "lanczos3",
    "lanczos5",
    "gaussian",
]
When the backend is set, we use this attribute to apply the handle_partial_mixed_function decorator to the function.
The @handle_partial_mixed_function decorator accepts a function as input, which receives the arguments and keyword arguments that were passed to the backend-specific implementation.
This input function is expected to return a boolean: if it returns True the backend-specific implementation is used, and if it returns False the compositional implementation is used instead.
This provides the flexibility to add any custom logic based on the use-case for maximal use of framework-specific implementations while achieving superset generalization.
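For example, with the handler above, the following calls would be dispatched differently (a sketch; the input shapes and sizes are purely illustrative):

# "bilinear" is not in the excluded list, so the handler returns True and the
# native torch.nn.functional.interpolate implementation is used
ivy.interpolate(x, (32, 32), mode="bilinear")

# "gaussian" is in the excluded list, so the handler returns False and Ivy
# falls back to the general compositional implementation
ivy.interpolate(x, (32, 32), mode="gaussian")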
Note
Even though we are always striving to adhere to the superset, there might be cases where a feature has slipped under the radar. In case you stumble upon an Ivy function that you think has not included all native framework functionalities in the optimal way, you are invited to let us know in the comment section of this dedicated issue.
Round Up
This should have hopefully given you a good feel of what should and should not be included when deciding how to design a new Ivy function. In many cases, there is not a clear right and wrong answer, and we arrive at the final decision via open discussion. If you find yourself proposing the addition of a new function in Ivy, then we will most likely have this discussion on your Pull Request!
If you have any questions, please feel free to reach out on discord in the superset behavior thread!