learn-cutlass-5
CUTLASS uses abstract layouts
to express the mapping from a logical index to a physical index.
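To make the idea of a layout concrete, here is a minimal sketch of two layout functors that map a logical (row, column) coordinate to a linear offset. The names `Coord`, `RowMajorSketch`, and `ColumnMajorSketch` are hypothetical stand-ins I made up for illustration; they are not the CUTLASS classes, although the CUTLASS layouts follow the same idea of a callable that turns a coordinate into an offset.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical types for illustration only; not the CUTLASS definitions.
struct Coord {
    int row;
    int col;
};

// Row-major: elements within a row are adjacent; rows are `ld` elements apart.
struct RowMajorSketch {
    int64_t ld;  // leading dimension: elements between consecutive rows

    int64_t operator()(Coord c) const {
        assert(c.row >= 0 && c.col >= 0);
        return c.row * ld + c.col;
    }
};

// Column-major: elements within a column are adjacent; columns are `ld` apart.
struct ColumnMajorSketch {
    int64_t ld;  // leading dimension: elements between consecutive columns

    int64_t operator()(Coord c) const {
        assert(c.row >= 0 && c.col >= 0);
        return c.col * ld + c.row;
    }
};
```

In both cases one of the two strides is fixed at 1, which is exactly the restriction that Affine2 lifts.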
Affine2
18_ampere_fp64_tensorop_affine2_gemm
Affine2 is a special layout in CUTLASS.
In a normal GEMM, the fast-changing dimension of a matrix always has a stride
of 1, as in ColumnMajor and RowMajor matrices. An Affine2 matrix can have a
stride larger than 1 in both dimensions. To support such a layout, we need to
change the way global memory is accessed:
- We can only access one element at a time because elements are no longer
stored contiguously. Vectorized loads/stores are not possible.
- One extra multiplication is needed when calculating the global memory
address:
addr = base_pointer + coord1 * stride1 + coord2 * stride2
The explanation is a little abstract, so let's create an example to illustrate it.
And the output should be
68, -21, 56, 59
So Affine2 is a layout that builds a submatrix by picking elements out of the original matrix at the given strides.
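The address formula can be tried out on the host without any CUTLASS code. Below is a minimal sketch, assuming a plain `std::vector<int>` as the backing buffer; `affine2_offset` and `gather_affine2` are hypothetical helper names of my own, not CUTLASS APIs, and the data is not the data from the example above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: offset of logical (row, col) when both dimensions
// carry their own stride, i.e. offset = row * stride_row + col * stride_col.
inline int64_t affine2_offset(int row, int col,
                              int64_t stride_row, int64_t stride_col) {
    return row * stride_row + col * stride_col;
}

// Hypothetical helper: read a rows x cols strided view of `buffer`
// and return it as a dense row-major vector.
inline std::vector<int> gather_affine2(const std::vector<int>& buffer,
                                       int rows, int cols,
                                       int64_t stride_row, int64_t stride_col) {
    std::vector<int> out;
    out.reserve(static_cast<std::size_t>(rows) * cols);
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            int64_t off = affine2_offset(r, c, stride_row, stride_col);
            // The view must stay inside the backing buffer.
            assert(off >= 0 && off < static_cast<int64_t>(buffer.size()));
            // One element per access: neighbours in the view are not
            // neighbours in memory unless one of the strides is 1.
            out.push_back(buffer[static_cast<std::size_t>(off)]);
        }
    }
    return out;
}
```

For a 4x4 row-major buffer holding 0 to 15, `gather_affine2(buf, 2, 2, 8, 2)` picks every other row and every other column and yields 0, 2, 8, 10. With strides (4, 1) the same call degenerates to an ordinary row-major read of the top-left 2x2 block: 0, 1, 4, 5.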
Quaternion
21_quaternion_gemm
Quaternions are an interesting concept mostly used in computer graphics. In my opinion, they can be seen as an analogue of complex numbers.
Detailed information about quaternions can be found here.
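To show the analogy with complex numbers, here is a minimal sketch of quaternion multiplication (the Hamilton product). `Quat` is a hypothetical struct written for this post and is not the CUTLASS quaternion type; the component order (w, x, y, z), with w as the real part, is my own assumption.

```cpp
#include <cassert>

// Hypothetical quaternion w + x*i + y*j + z*k, for illustration only.
struct Quat {
    double w, x, y, z;
};

inline bool operator==(const Quat& a, const Quat& b) {
    return a.w == b.w && a.x == b.x && a.y == b.y && a.z == b.z;
}

// Hamilton product, using i*i = j*j = k*k = i*j*k = -1.
inline Quat operator*(const Quat& a, const Quat& b) {
    return Quat{
        a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
        a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
        a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
        a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w};
}

// Small self-check of the basic identities.
inline void check_quat_identities() {
    const Quat i{0, 1, 0, 0}, j{0, 0, 1, 0}, k{0, 0, 0, 1};
    assert((i * j == k));                 // i*j = k
    assert((j * i == Quat{0, 0, 0, -1})); // j*i = -k, so the product is not commutative
    assert((i * i == Quat{-1, 0, 0, 0})); // i*i = -1, same as for complex numbers
}
```

When the y and z components are zero, the product reduces to ordinary complex multiplication: (1 + 2i)(3 + 4i) = -5 + 10i. A quaternion GEMM accumulates products like this one in place of scalar multiplications.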