learn-cutlass-5

CUTLASS uses abstract layouts to express the mapping from logical indices to physical indices.
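As a rough illustration of that idea, here is a minimal sketch in plain C++. It is not the CUTLASS API: `StridedLayout`, `make_row_major`, and `make_column_major` are hypothetical names made up for this example. It only assumes that a layout can be modelled as a pair of strides that turn a logical (row, col) coordinate into a linear offset.

```cpp
#include <cstddef>

// Hypothetical illustration type, NOT part of CUTLASS.
// Physical offset = row * stride_row + col * stride_col.
struct StridedLayout {
    std::size_t stride_row;
    std::size_t stride_col;

    // Map a logical (row, col) coordinate to a linear offset in memory.
    constexpr std::size_t operator()(std::size_t row, std::size_t col) const {
        return row * stride_row + col * stride_col;
    }
};

// Row-major: elements of one row are adjacent in memory.
constexpr StridedLayout make_row_major(std::size_t /*rows*/, std::size_t cols) {
    return StridedLayout{cols, 1};
}

// Column-major: elements of one column are adjacent in memory.
constexpr StridedLayout make_column_major(std::size_t rows, std::size_t /*cols*/) {
    return StridedLayout{1, rows};
}
```

For a 2x3 matrix, the same logical coordinate (1, 0) lands at offset 3 under the row-major sketch and at offset 1 under the column-major one, so code written against the layout object does not need to know which ordering is in use.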

Read more

NVVM (NVCC & LLVM)

NVIDIA’s CUDA Compiler (NVCC) is based on the widely used LLVM open source compiler infrastructure. Developers can create or extend programming languages with support for GPU acceleration using the NVIDIA Compiler SDK.

Read more

learn-cutlass-3

Warp-level GEMMs can be implemented either on Tensor Cores, by issuing mma.sync or wmma instructions, or as thread-level matrix computations issued to CUDA cores. WMMA is a CUDA C++ API for Tensor Cores; to use Tensor Cores through mma.sync, you have to write PTX via inline asm.

Read more

learn-cutlass-2

I have always wondered why CUTLASS provides so many implementations of GEMM instead of just one. In my opinion, the best GEMM implementation differs from situation to situation, and that is what sets CUTLASS apart from cuBLAS: you can build your own customized GEMM implementation to get the best performance.

Read more

Hail

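At noon I saw on Weibo that it was hailing in Guangzhou, but I did not run into any myself. This afternoon, just after I arrived at the supercomputing center, I heard several loud claps of thunder and then noticed it was hailing outside, so I wanted to go out and feel what it is like to be hit by hail. By the time I got downstairs it had already turned to rain. That hail really did not last long. The last time I saw hail was back in high school, during a mock exam in the classroom shortly before the college entrance exam; I even went over to the window to take a look.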

My One Student One Chip (一生一芯)

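One Student One Chip is an open, community-based, non-profit teaching project that mainly aims to address China's shortage of chip design talent. It is organized by the University of Chinese Academy of Sciences and the Institute of Computing Technology, and it requires students to design their own CPU and successfully tape it out.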

Read more

learn-cutlass-1

CUTLASS 3.0 introduces a new library, CuTe, to describe and manipulate tensors of threads and data.

Read more

learn-cutlass-0

learn-cutlass is a series of tutorials for learning CUTLASS by reading its examples and source code.

CUTLASS is a header-only template library. Once you start reading it, you will get lost in templates.

Read more