scaled_dot_product_attention#
- ivy.scaled_dot_product_attention(query, key, value, /, *, scale=None, mask=None, dropout_p=0.0, is_causal=False, training=False, out=None)[source]#
Apply scaled dot-product attention to the input queries, keys and values, using an optional mask.
- Parameters:
query (Union[Array, NativeArray]) – The queries input array. The shape of the queries should be [batch_shape, num_queries, feat_dim], and the queries should have the same size as the keys and values.
key (Union[Array, NativeArray]) – The keys input array. The shape of the keys should be [batch_shape, num_keys, feat_dim], and the keys should have the same size as the queries and values.
value (Union[Array, NativeArray]) – The values input array. The shape of the values should be [batch_shape, num_keys, feat_dim], and the values should have the same size as the queries and keys.
scale (Optional[float], default: None) – The scale float value, used to scale the query-key scores before the softmax.
mask (Optional[Union[Array, NativeArray]], default: None) – The mask to apply to the query-key scores. The shape of the mask should be [batch_shape, num_queries, num_keys]. Default is None.
dropout_p (Optional[float], default: 0.0) – Specifies the dropout probability; if greater than 0.0, dropout is applied.
is_causal (Optional[bool], default: False) – If True, assumes causal attention masking; an error is raised if both mask and is_causal are set.
training (Optional[bool], default: False) – If True, dropout is used; otherwise dropout is not activated.
out (Optional[Array], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.
- Returns:
ret – The output following application of scaled dot-product attention. The output array is the weighted sum produced by the attention scores and the values, and its shape is [batch_shape, num_queries, feat_dim].
Both the description and the type hints above assume an array input for simplicity, but this function is nestable, and therefore also accepts ivy.Container instances in place of any of the arguments.
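For intuition, the computation this function performs can be sketched as follows. This is a minimal, backend-agnostic sketch rather than the actual backend implementation: the masking convention (non-zero mask entries keep a score, zero entries suppress it), the lower-triangular causal mask, and the large negative fill value are assumptions made for illustration, and dropout and the resolution of a None scale are left out.

import ivy

def sdpa_sketch(q, k, v, scale=1.0, mask=None, is_causal=False):
    # similarity scores between every query and every key,
    # shape [batch_shape, num_queries, num_keys]
    scores = ivy.matmul(q, ivy.matrix_transpose(k)) * scale
    if is_causal:
        # assumed: a lower-triangular mask, so each query only attends
        # to keys at the same or earlier positions
        mask = ivy.tril(ivy.ones((scores.shape[-2], scores.shape[-1])))
    if mask is not None:
        # assumed convention: positions with a zero mask entry are excluded
        scores = ivy.where(ivy.astype(mask, "bool"), scores,
                           ivy.full_like(scores, -1e9))
    # attention weights: one weight per key, summing to 1 for each query
    weights = ivy.softmax(scores, axis=-1)
    # weighted sum of the values, shape [batch_shape, num_queries, feat_dim]
    return ivy.matmul(weights, v)

Under this reading, an all-zero mask suppresses every score equally, so the output reduces to a plain average of the values, which matches the all-zero mask examples below.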
Examples
With ivy.Array input:

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, dropout_p=0.1,
...                                           is_causal=True, training=True)
>>> print(result)
ivy.array([[[0.40000001, 1.29999995],
            [2.19994521, 3.09994531],
            [4.30000019, 5.30000019]]])
>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> mask = ivy.array([[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]])
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, mask=mask)
>>> print(result)
ivy.array([[[2.30000019, 3.23333359],
            [2.30000019, 3.23333359],
            [2.30000019, 3.23333359]]])
>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> out = ivy.zeros(shape=(1, 3, 2))
>>> ivy.scaled_dot_product_attention(q, k, v, scale=1, dropout_p=0.1,
...                                  is_causal=True, training=True, out=out)
>>> print(out)
ivy.array([[[0.40000001, 1.29999995],
            [2.19994521, 3.09994531],
            [4.30000019, 5.30000019]]])
>>> q = ivy.native_array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.native_array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.native_array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> mask = ivy.native_array([[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]])
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, mask=mask)
>>> print(result)
ivy.array([[[2.30000019, 3.23333359],
            [2.30000019, 3.23333359],
            [2.30000019, 3.23333359]]])
>>> q = ivy.native_array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.native_array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.native_array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> out = ivy.zeros(shape=(1, 3, 2))
>>> ivy.scaled_dot_product_attention(q, k, v, scale=1, dropout_p=0.1,
...                                  is_causal=True, training=True, out=out)
>>> print(out)
ivy.array([[[0.40000001, 1.29999995],
            [2.19994521, 3.09994531],
            [4.30000019, 5.30000019]]])
With ivy.Container input:

>>> q = ivy.Container(a=ivy.array([[[0.2, 1.], [2.7, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[1.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> k = ivy.Container(a=ivy.array([[[4.2, 1.], [2.2, 3.3], [4.4, 5.6]]]),
...                   b=ivy.array([[[3.2, 1.], [2.2, 3.6], [4.0, 5.6]]]))
>>> v = ivy.Container(a=ivy.array([[[5.2, 1.], [2.1, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, dropout_p=0.1,
...                                           is_causal=True, training=True)
>>> print(result)
{
    a: ivy.array([[[5.19999981, 1.],
                   [2.59249449, 2.68226194],
                   [4.4000001, 5.5999999]]]),
    b: ivy.array([[[0.2, 1.],
                   [2.19603825, 2.9960382],
                   [4.4000001, 5.5999999]]])
}

>>> q = ivy.Container(a=ivy.array([[[0.2, 1.], [2.7, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[1.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> k = ivy.Container(a=ivy.array([[[4.2, 1.], [2.2, 3.3], [4.4, 5.6]]]),
...                   b=ivy.array([[[3.2, 1.], [2.2, 3.6], [4.0, 5.6]]]))
>>> v = ivy.Container(a=ivy.array([[[5.2, 1.], [2.1, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> mask = ivy.Container(
...     a=ivy.array([[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]]),
...     b=ivy.array([[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]])
... )
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, mask=mask)
>>> print(result)
{
    a: ivy.array([[[4.26894283, 5.40236187],
                   [4.39999437, 5.59999037],
                   [4.4000001, 5.5999999]]]),
    b: ivy.array([[[4.35046196, 5.54282808],
                   [4.39989519, 5.5998764],
                   [4.4000001, 5.5999999]]])
}
With a mix of ivy.Array and ivy.NativeArray inputs:

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.native_array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.native_array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, dropout_p=0.1,
...                                           is_causal=True, training=True)
>>> print(result)
ivy.array([[[0.40000001, 1.29999995],
            [2.19994521, 3.09994531],
            [4.30000019, 5.30000019]]])

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.native_array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.native_array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> out = ivy.zeros(shape=(1, 3, 2))
>>> ivy.scaled_dot_product_attention(q, k, v, scale=1, out=out)
>>> print(out)
ivy.array([[[4.03946018, 5.0280633 ],
            [4.29981947, 5.29981089],
            [4.30000019, 5.30000019]]])
With a mix of ivy.Array and ivy.Container inputs:

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.Container(a=ivy.array([[[4.2, 1.], [2.2, 3.3], [4.4, 5.6]]]),
...                   b=ivy.array([[[3.2, 1.], [2.2, 3.6], [4.0, 5.6]]]))
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, is_causal=True)
>>> print(result)
{
    a: ivy.array([[[0.40000001, 1.29999995],
                   [2.06345534, 2.9634552],
                   [4.30000019, 5.30000019]]]),
    b: ivy.array([[[0.40000001, 1.29999995],
                   [2.19336844, 3.09336829],
                   [4.30000019, 5.30000019]]])
}

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.Container(a=ivy.array([[[4.2, 1.], [2.2, 3.3], [4.4, 5.6]]]),
...                   b=ivy.array([[[3.2, 1.], [2.2, 3.6], [4.0, 5.6]]]))
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> mask = ivy.native_array([[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]])
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, mask=mask,
...                                           dropout_p=0.1, training=True)
>>> print(result)
{
    a: ivy.array([[[2.30000019, 3.23333359],
                   [2.30000019, 3.23333359],
                   [2.30000019, 3.23333359]]]),
    b: ivy.array([[[2.30000019, 3.23333359],
                   [2.30000019, 3.23333359],
                   [2.30000019, 3.23333359]]])
}
- Array.scaled_dot_product_attention(self, key, value, /, *, scale=None, mask=None, dropout_p=0.0, is_causal=False, training=False, out=None)[source]#
ivy.Array instance method variant of ivy.scaled_dot_product_attention. This method simply wraps the function, and so the docstring for ivy.scaled_dot_product_attention also applies to this method with minimal changes.
- Parameters:
self (Array) – The queries input array. The shape of the queries should be [batch_shape, num_queries, feat_dim], and the queries should have the same size as the keys and values.
key (Union[Array, NativeArray]) – The keys input array. The shape of the keys should be [batch_shape, num_keys, feat_dim], and the keys should have the same size as the queries and values.
value (Union[Array, NativeArray]) – The values input array. The shape of the values should be [batch_shape, num_keys, feat_dim], and the values should have the same size as the queries and keys.
scale (Optional[float], default: None) – The scale float value, used to scale the query-key scores before the softmax.
mask (Optional[Union[Array, NativeArray]], default: None) – The mask to apply to the query-key scores. The shape of the mask should be [batch_shape, num_queries, num_keys]. Default is None.
dropout_p (Optional[float], default: 0.0) – Specifies the dropout probability; if greater than 0.0, dropout is applied.
is_causal (Optional[bool], default: False) – If True, assumes causal attention masking; an error is raised if both mask and is_causal are set.
training (Optional[bool], default: False) – If True, dropout is used; otherwise dropout is not activated.
out (Optional[Array], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.
- Return type:
Array
- Returns:
ret – The output following application of scaled dot-product attention. The output array is the weighted sum produced by the attention scores and the values, and its shape is [batch_shape, num_queries, feat_dim].
Examples
With ivy.Array input:

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, dropout_p=0.1,
...                                           is_causal=True, training=True)
>>> print(result)
ivy.array([[[0.40000001, 1.29999995],
            [2.19994521, 3.09994531],
            [4.30000019, 5.30000019]]])
>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> mask = ivy.array([[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]])
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, mask=mask)
>>> print(result)
ivy.array([[[2.30000019, 3.23333359],
            [2.30000019, 3.23333359],
            [2.30000019, 3.23333359]]])
>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> out = ivy.zeros(shape=(1, 3, 2))
>>> ivy.scaled_dot_product_attention(q, k, v, scale=1, dropout_p=0.1,
...                                  is_causal=True, training=True, out=out)
>>> print(out)
ivy.array([[[0.40000001, 1.29999995],
            [2.19994521, 3.09994531],
            [4.30000019, 5.30000019]]])
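The examples above call the functional form. Since this method simply wraps ivy.scaled_dot_product_attention, the same computation can be written as an instance-method call on the query array; a minimal sketch, reusing q, k, v and mask from the mask example above:

>>> result = q.scaled_dot_product_attention(k, v, scale=1, mask=mask)
>>> # result is identical to ivy.scaled_dot_product_attention(q, k, v, scale=1, mask=mask)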
- Container.scaled_dot_product_attention(self, key, value, /, *, scale, mask=None, dropout_p=0.0, is_causal=False, training=False, key_chains=None, to_apply=True, prune_unapplied=False, map_sequences=False, out=None)[source]#
ivy.Container instance method variant of ivy.scaled_dot_product_attention. This method simply wraps the function, and so the docstring for ivy.scaled_dot_product_attention also applies to this method with minimal changes.
- Parameters:
self (Container) – The queries input container. The shape of the query array leaves should be [batch_shape, num_queries, feat_dim], and the queries should have the same size as the keys and values.
key (Union[Array, NativeArray, Container]) – The keys input array or container. The shape of the key array leaves should be [batch_shape, num_keys, feat_dim], and the keys should have the same size as the queries and values.
value (Union[Array, NativeArray, Container]) – The values input array or container. The shape of the value array leaves should be [batch_shape, num_keys, feat_dim], and the values should have the same size as the queries and keys.
scale (Union[float, Container]) – The scale float value, used to scale the query-key scores before the softmax.
mask (Optional[Union[Array, NativeArray, Container]], default: None) – The mask to apply to the query-key scores. The shape of the mask array leaves should be [batch_shape, num_queries, num_keys]. Default is None.
dropout_p (Optional[float], default: 0.0) – Specifies the dropout probability; if greater than 0.0, dropout is applied.
is_causal (Optional[bool], default: False) – If True, assumes causal attention masking; an error is raised if both mask and is_causal are set.
training (Optional[bool], default: False) – If True, dropout is used; otherwise dropout is not activated.
key_chains (Optional[Union[List[str], Dict[str, str], Container]], default: None) – The key-chains to apply or not apply the method to. Default is None.
to_apply (Union[bool, Container], default: True) – If True, the method will be applied to key_chains, otherwise key_chains will be skipped. Default is True.
prune_unapplied (Union[bool, Container], default: False) – Whether to prune key_chains for which the function was not applied. Default is False.
map_sequences (Union[bool, Container], default: False) – Whether to also map the method to sequences (lists, tuples). Default is False.
out (Optional[Container], default: None) – optional output container, for writing the result to. It must have a shape that the inputs broadcast to.
- Return type:
Container
- Returns:
ret – The output container following application of scaled dot-product attention. The output arrays are the weighted sums produced by the attention scores and the values, and their shape is [batch_shape, num_queries, feat_dim].
Examples
With ivy.Container input:

>>> q = ivy.Container(a=ivy.array([[[0.2, 1.], [2.7, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[1.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> k = ivy.Container(a=ivy.array([[[4.2, 1.], [2.2, 3.3], [4.4, 5.6]]]),
...                   b=ivy.array([[[3.2, 1.], [2.2, 3.6], [4.0, 5.6]]]))
>>> v = ivy.Container(a=ivy.array([[[5.2, 1.], [2.1, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, dropout_p=0.1,
...                                           is_causal=True, training=True)
>>> print(result)
{
    a: ivy.array([[[5.19999981, 1.],
                   [2.59249449, 2.68226194],
                   [4.4000001, 5.5999999]]]),
    b: ivy.array([[[0.2, 1.],
                   [2.19603825, 2.9960382],
                   [4.4000001, 5.5999999]]])
}
>>> q = ivy.Container(a=ivy.array([[[0.2, 1.], [2.7, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[1.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> k = ivy.Container(a=ivy.array([[[4.2, 1.], [2.2, 3.3], [4.4, 5.6]]]),
...                   b=ivy.array([[[3.2, 1.], [2.2, 3.6], [4.0, 5.6]]]))
>>> v = ivy.Container(a=ivy.array([[[5.2, 1.], [2.1, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> mask = ivy.Container(a=ivy.array([[[1.0, 1.0, 1.0],
...                                    [1.0, 1.0, 1.0],
...                                    [1.0, 1.0, 1.0]]]),
...                      b=ivy.array([[[1.0, 1.0, 1.0],
...                                    [1.0, 1.0, 1.0],
...                                    [1.0, 1.0, 1.0]]]))
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, mask=mask)
>>> print(result)
{
    a: ivy.array([[[4.26894283, 5.40236187],
                   [4.39999437, 5.59999037],
                   [4.4000001, 5.5999999]]]),
    b: ivy.array([[[4.35046196, 5.54282808],
                   [4.39989519, 5.5998764],
                   [4.4000001, 5.5999999]]])
}
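The examples above again use the functional form. As a hedged sketch of the instance-method form combined with the key_chains argument, reusing q, k, v and mask from the example above (with the default prune_unapplied=False, leaves under skipped key chains are returned unchanged, as described in the parameter list):

>>> # apply attention only to the leaves under "a"; the leaves under "b"
>>> # are passed through untouched
>>> result = q.scaled_dot_product_attention(k, v, scale=1, mask=mask,
...                                         key_chains=["a"])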