scaled_dot_product_attention#
- ivy.scaled_dot_product_attention(query, key, value, /, *, scale=None, mask=None, dropout_p=0.0, is_causal=False, training=False, out=None)[source]#
Apply scaled dot-product attention to the input queries, keys and values, using an optional mask.
- Parameters:
query (Union[Array, NativeArray]) – The queries input array. The shape of the queries should be [batch_shape, num_queries, feat_dim], and the queries should have the same size as the keys and values.
key (Union[Array, NativeArray]) – The keys input array. The shape of the keys should be [batch_shape, num_keys, feat_dim], and the keys should have the same size as the queries and values.
value (Union[Array, NativeArray]) – The values input array. The shape of the values should be [batch_shape, num_keys, feat_dim], and the values should have the same size as the queries and keys.
scale (Optional[float], default: None) – The scale float value, used to scale the query-key scores before the softmax.
mask (Optional[Union[Array, NativeArray]], default: None) – The mask to apply to the query-key scores. The shape of the mask should be [batch_shape, num_queries, num_keys]. Default is None.
dropout_p (Optional[float], default: 0.0) – Specifies the dropout probability; if greater than 0.0, dropout is applied.
is_causal (Optional[bool], default: False) – If True, assumes causal attention masking; an error is raised if both mask and is_causal are set.
training (Optional[bool], default: False) – If True, dropout is used; otherwise dropout is not activated.
out (Optional[Array], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.
- Returns:
ret – The output following application of scaled dot-product attention. The output array is the weighted sum produced by the attention scores and the values, and its shape is [batch_shape, num_queries, feat_dim].
Both the description and the type hints above assume an array input for simplicity, but this function is nestable, and therefore also accepts ivy.Container instances in place of any of the arguments.
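For intuition, the computation this function performs can be sketched as follows. This is a minimal, backend-agnostic sketch rather than the actual backend implementation: the masking convention (non-zero mask entries keep a score, zero entries suppress it), the lower-triangular causal mask, and the large negative fill value are assumptions made for illustration, and dropout and the resolution of a None scale are left out.

import ivy

def sdpa_sketch(q, k, v, scale=1.0, mask=None, is_causal=False):
    # similarity scores between every query and every key,
    # shape [batch_shape, num_queries, num_keys]
    scores = ivy.matmul(q, ivy.matrix_transpose(k)) * scale
    if is_causal:
        # assumed: a lower-triangular mask, so each query only attends
        # to keys at the same or earlier positions
        mask = ivy.tril(ivy.ones((scores.shape[-2], scores.shape[-1])))
    if mask is not None:
        # assumed convention: positions with a zero mask entry are excluded
        scores = ivy.where(ivy.astype(mask, "bool"), scores,
                           ivy.full_like(scores, -1e9))
    # attention weights: one weight per key, summing to 1 for each query
    weights = ivy.softmax(scores, axis=-1)
    # weighted sum of the values, shape [batch_shape, num_queries, feat_dim]
    return ivy.matmul(weights, v)

Under this reading, an all-zero mask suppresses every score equally, so the output reduces to a plain average of the values, which matches the all-zero mask examples below.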
Examples
With ivy.Array input:

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, dropout_p=0.1,
...                                           is_causal=True, training=True)
>>> print(result)
ivy.array([[[0.40000001, 1.29999995],
            [2.19994521, 3.09994531],
            [4.30000019, 5.30000019]]])
>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> mask = ivy.array([[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]])
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, mask=mask)
>>> print(result)
ivy.array([[[2.30000019, 3.23333359],
            [2.30000019, 3.23333359],
            [2.30000019, 3.23333359]]])
>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> out = ivy.zeros(shape=(1, 3, 2))
>>> ivy.scaled_dot_product_attention(q, k, v, scale=1, dropout_p=0.1,
...                                  is_causal=True, training=True, out=out)
>>> print(out)
ivy.array([[[0.40000001, 1.29999995],
            [2.19994521, 3.09994531],
            [4.30000019, 5.30000019]]])
>>> q = ivy.native_array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.native_array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.native_array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> mask = ivy.native_array([[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]])
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, mask=mask)
>>> print(result)
ivy.array([[[2.30000019, 3.23333359],
            [2.30000019, 3.23333359],
            [2.30000019, 3.23333359]]])
>>> q = ivy.native_array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.native_array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.native_array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> out = ivy.zeros(shape=(1, 3, 2))
>>> ivy.scaled_dot_product_attention(q, k, v, scale=1, dropout_p=0.1,
...                                  is_causal=True, training=True, out=out)
>>> print(out)
ivy.array([[[0.40000001, 1.29999995],
            [2.19994521, 3.09994531],
            [4.30000019, 5.30000019]]])
With ivy.Container input:

>>> q = ivy.Container(a=ivy.array([[[0.2, 1.], [2.7, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[1.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> k = ivy.Container(a=ivy.array([[[4.2, 1.], [2.2, 3.3], [4.4, 5.6]]]),
...                   b=ivy.array([[[3.2, 1.], [2.2, 3.6], [4.0, 5.6]]]))
>>> v = ivy.Container(a=ivy.array([[[5.2, 1.], [2.1, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, dropout_p=0.1,
...                                           is_causal=True, training=True)
>>> print(result)
{
    a: ivy.array([[[5.19999981, 1.],
                   [2.59249449, 2.68226194],
                   [4.4000001, 5.5999999]]]),
    b: ivy.array([[[0.2, 1.],
                   [2.19603825, 2.9960382],
                   [4.4000001, 5.5999999]]])
}

>>> q = ivy.Container(a=ivy.array([[[0.2, 1.], [2.7, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[1.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> k = ivy.Container(a=ivy.array([[[4.2, 1.], [2.2, 3.3], [4.4, 5.6]]]),
...                   b=ivy.array([[[3.2, 1.], [2.2, 3.6], [4.0, 5.6]]]))
>>> v = ivy.Container(a=ivy.array([[[5.2, 1.], [2.1, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> mask = ivy.Container(
...     a=ivy.array([[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]]),
...     b=ivy.array([[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]])
... )
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, mask=mask)
>>> print(result)
{
    a: ivy.array([[[4.26894283, 5.40236187],
                   [4.39999437, 5.59999037],
                   [4.4000001, 5.5999999]]]),
    b: ivy.array([[[4.35046196, 5.54282808],
                   [4.39989519, 5.5998764],
                   [4.4000001, 5.5999999]]])
}
With a mix of ivy.Array and ivy.NativeArray inputs:

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.native_array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.native_array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, dropout_p=0.1,
...                                           is_causal=True, training=True)
>>> print(result)
ivy.array([[[0.40000001, 1.29999995],
            [2.19994521, 3.09994531],
            [4.30000019, 5.30000019]]])

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.native_array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.native_array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> out = ivy.zeros(shape=(1, 3, 2))
>>> ivy.scaled_dot_product_attention(q, k, v, scale=1, out=out)
>>> print(out)
ivy.array([[[4.03946018, 5.0280633 ],
            [4.29981947, 5.29981089],
            [4.30000019, 5.30000019]]])
With a mix of ivy.Array and ivy.Container inputs:

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.Container(a=ivy.array([[[4.2, 1.], [2.2, 3.3], [4.4, 5.6]]]),
...                   b=ivy.array([[[3.2, 1.], [2.2, 3.6], [4.0, 5.6]]]))
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, is_causal=True)
>>> print(result)
{
    a: ivy.array([[[0.40000001, 1.29999995],
                   [2.06345534, 2.9634552],
                   [4.30000019, 5.30000019]]]),
    b: ivy.array([[[0.40000001, 1.29999995],
                   [2.19336844, 3.09336829],
                   [4.30000019, 5.30000019]]])
}

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.Container(a=ivy.array([[[4.2, 1.], [2.2, 3.3], [4.4, 5.6]]]),
...                   b=ivy.array([[[3.2, 1.], [2.2, 3.6], [4.0, 5.6]]]))
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> mask = ivy.native_array([[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]])
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, mask=mask,
...                                           dropout_p=0.1, training=True)
>>> print(result)
{
    a: ivy.array([[[2.30000019, 3.23333359],
                   [2.30000019, 3.23333359],
                   [2.30000019, 3.23333359]]]),
    b: ivy.array([[[2.30000019, 3.23333359],
                   [2.30000019, 3.23333359],
                   [2.30000019, 3.23333359]]])
}
- Array.scaled_dot_product_attention(self, key, value, /, *, scale=None, mask=None, dropout_p=0.0, is_causal=False, training=False, out=None)[source]#
ivy.Array instance method variant of ivy.scaled_dot_product_attention. This method simply wraps the function, and so the docstring for ivy.scaled_dot_product_attention also applies to this method with minimal changes.
- Parameters:
self (Array) – The queries input array. The shape of the queries should be [batch_shape, num_queries, feat_dim], and the queries should have the same size as the keys and values.
key (Union[Array, NativeArray]) – The keys input array. The shape of the keys should be [batch_shape, num_keys, feat_dim], and the keys should have the same size as the queries and values.
value (Union[Array, NativeArray]) – The values input array. The shape of the values should be [batch_shape, num_keys, feat_dim], and the values should have the same size as the queries and keys.
scale (Optional[float], default: None) – The scale float value, used to scale the query-key scores before the softmax.
mask (Optional[Union[Array, NativeArray]], default: None) – The mask to apply to the query-key scores. The shape of the mask should be [batch_shape, num_queries, num_keys]. Default is None.
dropout_p (Optional[float], default: 0.0) – Specifies the dropout probability; if greater than 0.0, dropout is applied.
is_causal (Optional[bool], default: False) – If True, assumes causal attention masking; an error is raised if both mask and is_causal are set.
training (Optional[bool], default: False) – If True, dropout is used; otherwise dropout is not activated.
out (Optional[Array], default: None) – optional output array, for writing the result to. It must have a shape that the inputs broadcast to.
- Return type:
Array
- Returns:
ret – The output following application of scaled dot-product attention. The output array is the weighted sum produced by the attention scores and the values, and its shape is [batch_shape, num_queries, feat_dim].
Examples
With ivy.Array input:

>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, dropout_p=0.1,
...                                           is_causal=True, training=True)
>>> print(result)
ivy.array([[[0.40000001, 1.29999995],
            [2.19994521, 3.09994531],
            [4.30000019, 5.30000019]]])
>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> mask = ivy.array([[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]])
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, mask=mask)
>>> print(result)
ivy.array([[[2.30000019, 3.23333359],
            [2.30000019, 3.23333359],
            [2.30000019, 3.23333359]]])
>>> q = ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]])
>>> k = ivy.array([[[0.6, 1.5], [2.4, 3.3], [4.2, 5.1]]])
>>> v = ivy.array([[[0.4, 1.3], [2.2, 3.1], [4.3, 5.3]]])
>>> out = ivy.zeros(shape=(1, 3, 2))
>>> ivy.scaled_dot_product_attention(q, k, v, scale=1, dropout_p=0.1,
...                                  is_causal=True, training=True, out=out)
>>> print(out)
ivy.array([[[0.40000001, 1.29999995],
            [2.19994521, 3.09994531],
            [4.30000019, 5.30000019]]])
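The examples above call the functional form. Since this method simply wraps ivy.scaled_dot_product_attention, the same computation can be written as an instance-method call on the query array; a minimal sketch, reusing q, k, v and mask from the mask example above:

>>> result = q.scaled_dot_product_attention(k, v, scale=1, mask=mask)
>>> # result is identical to ivy.scaled_dot_product_attention(q, k, v, scale=1, mask=mask)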
- Container.scaled_dot_product_attention(self, key, value, /, *, scale, mask=None, dropout_p=0.0, is_causal=False, training=False, key_chains=None, to_apply=True, prune_unapplied=False, map_sequences=False, out=None)[source]#
ivy.Container instance method variant of ivy.scaled_dot_product_attention. This method simply wraps the function, and so the docstring for ivy.scaled_dot_product_attention also applies to this method with minimal changes.
- Parameters:
self (Container) – The queries input container. The shape of the query array leaves should be [batch_shape, num_queries, feat_dim], and the queries should have the same size as the keys and values.
key (Union[Array, NativeArray, Container]) – The keys input array or container. The shape of the key array leaves should be [batch_shape, num_keys, feat_dim], and the keys should have the same size as the queries and values.
value (Union[Array, NativeArray, Container]) – The values input array or container. The shape of the value array leaves should be [batch_shape, num_keys, feat_dim], and the values should have the same size as the queries and keys.
scale (Union[float, Container]) – The scale float value, used to scale the query-key scores before the softmax.
mask (Optional[Union[Array, NativeArray, Container]], default: None) – The mask to apply to the query-key scores. The shape of the mask array leaves should be [batch_shape, num_queries, num_keys]. Default is None.
dropout_p (Optional[float], default: 0.0) – Specifies the dropout probability; if greater than 0.0, dropout is applied.
is_causal (Optional[bool], default: False) – If True, assumes causal attention masking; an error is raised if both mask and is_causal are set.
training (Optional[bool], default: False) – If True, dropout is used; otherwise dropout is not activated.
key_chains (Optional[Union[List[str], Dict[str, str], Container]], default: None) – The key-chains to apply or not apply the method to. Default is None.
to_apply (Union[bool, Container], default: True) – If True, the method will be applied to key_chains, otherwise key_chains will be skipped. Default is True.
prune_unapplied (Union[bool, Container], default: False) – Whether to prune key_chains for which the function was not applied. Default is False.
map_sequences (Union[bool, Container], default: False) – Whether to also map the method to sequences (lists, tuples). Default is False.
out (Optional[Container], default: None) – optional output container, for writing the result to. It must have a shape that the inputs broadcast to.
- Return type:
Container
- Returns:
ret – The output container following application of scaled dot-product attention. The output arrays are the weighted sums produced by the attention scores and the values, and their shape is [batch_shape, num_queries, feat_dim].
Examples
With ivy.Container input:

>>> q = ivy.Container(a=ivy.array([[[0.2, 1.], [2.7, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[1.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> k = ivy.Container(a=ivy.array([[[4.2, 1.], [2.2, 3.3], [4.4, 5.6]]]),
...                   b=ivy.array([[[3.2, 1.], [2.2, 3.6], [4.0, 5.6]]]))
>>> v = ivy.Container(a=ivy.array([[[5.2, 1.], [2.1, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, dropout_p=0.1,
...                                           is_causal=True, training=True)
>>> print(result)
{
    a: ivy.array([[[5.19999981, 1.],
                   [2.59249449, 2.68226194],
                   [4.4000001, 5.5999999]]]),
    b: ivy.array([[[0.2, 1.],
                   [2.19603825, 2.9960382],
                   [4.4000001, 5.5999999]]])
}
>>> q = ivy.Container(a=ivy.array([[[0.2, 1.], [2.7, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[1.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> k = ivy.Container(a=ivy.array([[[4.2, 1.], [2.2, 3.3], [4.4, 5.6]]]),
...                   b=ivy.array([[[3.2, 1.], [2.2, 3.6], [4.0, 5.6]]]))
>>> v = ivy.Container(a=ivy.array([[[5.2, 1.], [2.1, 3.], [4.4, 5.6]]]),
...                   b=ivy.array([[[0.2, 1.], [2.2, 3.], [4.4, 5.6]]]))
>>> mask = ivy.Container(a=ivy.array([[[1.0, 1.0, 1.0],
...                                    [1.0, 1.0, 1.0],
...                                    [1.0, 1.0, 1.0]]]),
...                      b=ivy.array([[[1.0, 1.0, 1.0],
...                                    [1.0, 1.0, 1.0],
...                                    [1.0, 1.0, 1.0]]]))
>>> result = ivy.scaled_dot_product_attention(q, k, v, scale=1, mask=mask)
>>> print(result)
{
    a: ivy.array([[[4.26894283, 5.40236187],
                   [4.39999437, 5.59999037],
                   [4.4000001, 5.5999999]]]),
    b: ivy.array([[[4.35046196, 5.54282808],
                   [4.39989519, 5.5998764],
                   [4.4000001, 5.5999999]]])
}
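The examples above again use the functional form. As a hedged sketch of the instance-method form combined with the key_chains argument, reusing q, k, v and mask from the example above (with the default prune_unapplied=False, leaves under skipped key chains are returned unchanged, as described in the parameter list):

>>> # apply attention only to the leaves under "a"; the leaves under "b"
>>> # are passed through untouched
>>> result = q.scaled_dot_product_attention(k, v, scale=1, mask=mask,
...                                         key_chains=["a"])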