FAQ#
These are some of the most common technical questions that continue to arise when we’re discussing Ivy with developers in the community.
As Ivy matures and we continue these discussions, many more questions and answers will no doubt be added!
We are all incredibly grateful to everyone in the community who has put in the time and effort to ask so many important and probing questions! We hope these Q&As are a useful reference!
Maintaining Backend Versions#
Q: Isn’t it complex to maintain support for all backend versions, particularly as they undergo constant changes? How are you going to handle this, will you have an option to select any version for any backend?
A: Ivy only wraps the functional APIs of each backend framework. The last two years of Ivy development have shown us how remarkably stable the functional APIs are for each backend framework. Not once have we needed to change an implementation or a unit test as a result of a version update of a backend framework. This is not entirely surprising; each framework has strong backward compatibility requirements, and the functional API is generally one of the lower-level building blocks upon which everything else in the framework depends. Our CI always tests against the latest version available on PyPI, and this has been the case since we started development. We do not lock in any versions during our continuous testing, and we will continue to always pull the latest version.
In the future, we hope to add explicit testing for previous versions as well, so that we can guarantee backward compatibility for each backend. We will also add an option to select backend versions for the small minority of cases where changes in a backend's functional API do cause breaking changes for Ivy.
Dynamic Sizes#
Q: Assuming a static computation graph, can tensors have sizes that dynamically change? XLA does not support dynamic sizes, because it JIT-compiles the graph, and pre-allocates all buffers in memory before the graph runs. TensorFlow and PyTorch do allow dynamic sizes, but only on certain backends. Dynamic sizes require a dynamic memory manager, which CPUs/GPUs have, but XLA currently doesn’t. How does Ivy deal with all of this?
A: Ivy assumes dynamic shapes are supported, and an error will be thrown if a function is traced with dynamic shapes enabled but the backend does not support dynamic shapes in the traced graph. For now, fully framework-agnostic traced graphs are only possible for static graphs.
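As a concrete illustration, consider boolean masking, whose output length depends on the values in the array. The snippet below (a minimal JAX sketch, not Ivy-specific) shows the kind of failure that surfaces when such an op is traced for a backend that requires static shapes:

```python
import jax
import jax.numpy as jnp

def nonzero_entries(x):
    # The output length depends on the *values* in x, i.e. a dynamic shape.
    return x[x > 0]

x = jnp.array([1.0, -2.0, 3.0])

nonzero_entries(x)           # works eagerly: returns [1., 3.]
jax.jit(nonzero_entries)(x)  # raises: XLA cannot compile a value-dependent output shape
```

A PyTorch backend would run the same function without complaint, which is why the error is deferred to trace time rather than forbidding dynamic shapes outright.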
Type and Shape Checking#
Q: What kind of type system does Ivy use? Does it do shape-checking of tensors? If so, how does it handle dynamic sizes? The gold standard here is a fully dependent type system, but this is very rare, with the exception of Dex.
A: The checks performed during tracing will remain backend-specific. The function `ivy.trace_graph()` wraps the backend tracing functions, for example `jax.jit()`, `tf.function()`, `torch.jit.script()` and `torch.jit.trace()`. For some backends, shape-checking will be performed during the tracing phase, and for others it will not.
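For example, JAX traces functions with shape-annotated abstract values, so a shape mismatch surfaces during tracing (i.e. on the first call to the jitted function), before any kernel runs. A minimal sketch:

```python
import jax
import jax.numpy as jnp

def f(a, b):
    return a @ b

# Tracing propagates abstract shapes, so this incompatible matmul
# fails during the first (tracing) call, not inside a running kernel.
jax.jit(f)(jnp.ones((2, 3)), jnp.ones((4, 5)))  # raises a shape error
```

By contrast, `torch.jit.trace` records an example execution, so shape errors surface only when the underlying ops actually run.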
GPU Handling#
Q: How does Ivy handle GPU usage?
A: Ivy handles GPU usage by simply wrapping the backend frameworks, so Ivy will use GPUs in the same manner as the backend framework does. For example, when using a torch backend, torch will be a dependency of Ivy, and its handling of GPU functionality will be inherited and extended upon by Ivy.
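As an illustrative sketch (the exact device strings may vary between Ivy versions, so treat them as assumptions), placing arrays on the GPU with a torch backend delegates directly to torch's own CUDA handling:

```python
import ivy

ivy.set_backend("torch")  # torch's CUDA support is inherited directly

# "gpu:0" is mapped to the backend's native device, i.e. torch.device("cuda:0")
x = ivy.array([1.0, 2.0, 3.0], device="gpu:0")
y = ivy.to_device(x, "cpu")  # device transfers also delegate to torch
```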
Auto-Differentiation#
Q: How do you handle reverse mode, forward mode, and Jacobians? How about stop gradients, gradient checkpointing, and custom gradients? What about autodiff for control-flow operators like `lax.scan`? This is where JAX really shines, and unless you are implementing your own autodiff framework, you are at the mercy of whatever the backend supports.
A: Ivy will implement all of the general autodiff methods that JAX supports, and will throw errors if/when the backend does not support them. In general, Ivy will support the superset of functionality, and not just the lowest common denominator. Ivy takes a fully functional approach like JAX, and the API enables arbitrarily nested `ivy.execute_with_gradients` calls up to an arbitrary gradient order. Again, if a backend does not support this, then an error will be thrown. This means Ivy code is not 100% framework-agnostic, and is indeed at the mercy of what the backend autograd package supports in these cases.
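As a rough sketch of the functional API (the exact signature of `ivy.execute_with_gradients` has varied between Ivy versions, so treat the details below as assumptions):

```python
import ivy

ivy.set_backend("torch")

def loss_fn(w):
    return (w ** 2).sum()

w = ivy.array([1.0, 2.0, 3.0])

# Returns the function's output together with gradients w.r.t. the inputs;
# with a torch backend this delegates to torch.autograd under the hood.
loss, grads = ivy.execute_with_gradients(loss_fn, w)
```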
Replicas, and Data vs Model Parallelism#
Q: Big models don't run on just one device, and the major frameworks have very different ways of splitting a model up so that it runs on a cluster. There are multiple competing paradigms for parallelisation, e.g. SPMD vs mixture of experts. JAX and TensorFlow are very sophisticated in this department, and routinely run models on hundreds or thousands of devices. How will Ivy support multi-device training, if at all?
A: This is not something we’re diving into too deeply at the moment. However, we have written our own API for distributed training, which broadly follows PyTorch’s approach using a CUDA-enabled multiprocessing module.
If heavily distributed training is important, then Ivy can be supplementary for the time being, rather than a total replacement. For example, someone can use TensorFlow's distributed training tools, and just use Ivy to copy over a PyTorch model into their TF pipeline.
We are not trying to encourage anyone to drop any existing tools and just use Ivy instead. Projects can use 1% Ivy code or 100%. We’re very happy in either case!
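A hypothetical sketch of that supplementary workflow (the exact `ivy.transpile` keyword arguments have changed across Ivy versions, so treat them as assumptions):

```python
import ivy
import torch

class TorchModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 1)

    def forward(self, x):
        return self.linear(x)

# Convert the PyTorch model for use inside an existing TensorFlow pipeline,
# which can keep using TensorFlow's own distributed training tooling.
tf_model = ivy.transpile(TorchModel(), source="torch", to="tensorflow")
```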
Support for Functions#
Q: Is it possible to trace tensor code into a reusable and differentiable function? If you can’t, then it will be difficult to apply any fancy kernel fusion algorithms, and you can expect to lose a lot of performance.
What about higher-order operations, like `jax.vmap` and `jax.pmap`?
A: Most functions in Ivy are primary functions, which are generally implemented as light wrapping around a near-identical backend-specific function, which itself will likely map to an efficient kernel. Compositional functions on the other hand are implemented as a composition of other Ivy functions, meaning there will not be a one-to-one mapping to a single backend kernel. However, our experiments (to be published soon!) show this does not lead to a significant run-time overhead, even when a composition of operations is required.
For methods like `jax.vmap` and `jax.pmap`, we will need to implement these as (possibly inefficient) compositions in the other frameworks until they are natively supported there. However, it seems as though other frameworks such as PyTorch are seeing the benefit of these functions, and will eventually fully support them.
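A minimal sketch of what such a composition could look like with a torch backend: map the function over slices with a Python loop and re-stack the results. This is functionally correct, but lacks the batching-rule fusion that makes `jax.vmap` efficient:

```python
import torch

def vmap_via_composition(fn, axis=0):
    # A (possibly inefficient) vmap substitute: apply fn to each slice
    # along the mapped axis, then stack the per-slice results back up.
    def mapped(x):
        return torch.stack([fn(xi) for xi in x.unbind(dim=axis)], dim=axis)
    return mapped

per_example_dot = vmap_via_composition(lambda v: v @ v)
batch = torch.randn(8, 3)
out = per_example_dot(batch)  # shape: (8,)
```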
Alternative Data Structures#
Q: Will Ivy support data structures such as tuples, dictionaries, lists etc.? For example, JAX code is full of them.
A: We will of course support these structures in pure Python code, but we will not support backend-specific alternative compilable data structures. While Ivy will not provide an interface to these data structures directly, Ivy code can easily supplement JAX code which does contain these data structures, and both can be traced together without issue. Ivy can act as a supplementary framework if/when some of the more unique backend-specific data structures are required.
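For example, plain Python containers are already valid JAX pytrees, so code that mixes Ivy with JAX can keep passing them through traced functions as usual. A minimal sketch:

```python
import jax
import jax.numpy as jnp

@jax.jit
def scale_params(params):
    # A plain Python dict is a valid JAX pytree: it traces without issue.
    return {name: value * 2.0 for name, value in params.items()}

params = {"w": jnp.ones((3,)), "b": jnp.zeros((3,))}
scaled = scale_params(params)
```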
Custom Operations#
Q: Most frameworks have a backdoor for user-defined ops, implemented in C++/CUDA, or some kind of host callback mechanism. Will Ivy support this ability also?
A: We will not attempt to provide a unified back-door for all possible backend kernel customizations, but of course users can still use the backend-specific backdoors which already exist when using Ivy.
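For example, with a torch backend nothing prevents mixing a standard PyTorch custom-op backdoor (here `torch.autograd.Function`) with surrounding Ivy code. A minimal sketch with a hand-written gradient:

```python
import torch

class ClampedExp(torch.autograd.Function):
    # A backend-specific custom op with a hand-written gradient,
    # exactly as one would define it in plain PyTorch.
    @staticmethod
    def forward(ctx, x):
        y = torch.exp(torch.clamp(x, max=10.0))
        ctx.save_for_backward(x, y)
        return y

    @staticmethod
    def backward(ctx, grad_out):
        x, y = ctx.saved_tensors
        # d/dx exp(clamp(x, 10)) is exp(x) below the clamp and 0 above it.
        return grad_out * y * (x <= 10.0)

x = torch.randn(4, requires_grad=True)
ClampedExp.apply(x).sum().backward()  # uses the custom backward above
```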