Mali G610 deep learning runtime notes

11.05.2024

I have had this Orange Pi 5 for over a year now, and I am using it to piece together my knowledge of mobile GPU runtimes and embedded (really, system-on-chip) applications. I am interested in using the Orange Pi as part of a robotics project I am currently working on, so today we’re trying to understand whether we can do this in Biden’s America, or whether I’ll have to wait a few more months for more features to be mainlined in the panfrost driver. To my understanding, Mesa already has solid OpenCL support for this platform. So, to make this guide different from others, we are going to try to use Vulkan, and ideally get the IREE runtime for MLIR working on top of it. Vulkan support in Mesa for this GPU is still experimental, and I know next to nothing about graphics driver development, but we’ll see if we can get something to work. Maybe we’ll learn something too.

Important: I also found this driver from Rockchip, so I want to see what functionality we can get out of the vendor’s own driver.

Setup

The installed system:

I also made sure that the panthor-gpu dts overlay is applied, so we should have the latest panfrost OpenGL support. I have tested that the desktop works, so I assume OpenGL is fine. I can also get clinfo to work if I add Rockchip’s userspace drivers.
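
Before touching Vulkan, it’s worth sanity-checking each layer of the stack from a shell. This is only a sketch; the exact driver names and packages depend on your image (glxinfo comes from mesa-utils, clinfo from the clinfo package).

# kernel side: confirm which Mali kernel driver actually bound to the GPU
sudo dmesg | grep -iE 'panfrost|panthor|mali'

# OpenGL side: the renderer string should mention the Mali-G610
glxinfo -B | grep -i renderer

# OpenCL side: with Rockchip's userspace driver installed, clinfo should list the Mali
clinfo | grep -i 'device name'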

However, I want to try the absolute latest version of Mesa, because it includes this patch from Collabora, which should allow us to use Vulkan. I want to run some benchmarks, but first we need to build Mesa from source so that I can have that sweet new PanVK driver.

Building Mesa

I built the PanVK driver from the current HEAD of Mesa with the following options:

# build libdrm first
git clone https://gitlab.freedesktop.org/mesa/drm
cd drm/
mkdir build
cd build/
meson setup ..
sudo ninja install

# now build the drivers
cd ../..
git clone https://gitlab.freedesktop.org/mesa/mesa.git
cd mesa
mkdir build
cd build
meson setup .. -Dvulkan-drivers=panfrost -Dgallium-drivers=panfrost,swrast -Dlibunwind=false -Dprefix=/opt/panfrost
sudo ninja install
echo /opt/panfrost/lib/aarch64-linux-gnu | sudo tee /etc/ld.so.conf.d/0-panfrost.conf
sudo ldconfig

Basically, it’s identical to this guide, except we make sure that vulkan-drivers=panfrost, because that’s a possibility now. The build is going to take a while, so make sure you get those flags right before you start.
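
If you’d rather not register the new prefix system-wide with ldconfig, a per-shell override works just as well for experimenting; a minimal sketch, assuming the /opt/panfrost prefix from above:

# point only the current shell at the freshly built libraries
export LD_LIBRARY_PATH=/opt/panfrost/lib/aarch64-linux-gnu:$LD_LIBRARY_PATH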

Vulkan Loader & ICD

According to IREE’s Vulkan documentation, “there are four main components involved in this architecture”: the Application, the Loader, Layers, and the Installable Client Drivers (ICDs).

We’ll of course concern ourselves with the Application later. The Loader is the system libvulkan that dispatches API calls to whichever drivers are installed; what Mesa built for us is the driver library itself, /opt/panfrost/lib/aarch64-linux-gnu/libvulkan_panfrost.so. The ‘Layers’ are “optional components that augment the Vulkan development environment”; they include some interesting and useful stuff like debugging, profiling and tracing, but we’ll ignore those for now too.

To my understanding, the ICD is the installable client driver itself, i.e. the library provided by a particular driver, and the Loader discovers it through a small JSON manifest that records the library path and the API version it supports. That manifest isn’t created by default with this experimental driver, so we have to write it ourselves at /etc/vulkan/icd.d/panvk_icd.json:

{
"file_format_version": "1.0.0",
	"ICD": {
		"library_path": "/opt/panfrost/lib/aarch64-linux-gnu/libvulkan_panfrost.so",
		"api_version": "1.0.0"
	}
}
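
As an aside, the Loader can also be told about a manifest without writing into /etc. VK_ICD_FILENAMES is the long-standing environment variable for this (newer loaders also accept VK_DRIVER_FILES); a sketch, assuming you saved the JSON somewhere in your home directory:

# use only this manifest for driver discovery, bypassing /etc/vulkan/icd.d
export VK_ICD_FILENAMES=$HOME/panvk_icd.json
vulkaninfo --summary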

After that, I ran vulkaninfo from the vulkan-tools package, and it worked. Badass. That environment variable does not bode well, though.

orangepi5:uVkCompute:% PAN_I_WANT_A_BROKEN_VULKAN_DRIVER=1 vulkaninfo --summary
'DISPLAY' environment variable not set... skipping surface info
WARNING: panvk is not a conformant Vulkan implementation, testing use only.
WARNING: [../src/panfrost/vulkan/panvk_physical_device.c:71] Code 0 : device /dev/dri/renderD129 does not use the panfrost kernel driver (VK_ERROR_INCOMPATIBLE_DRIVER)
WARNING: [../src/panfrost/vulkan/panvk_physical_device.c:71] Code 0 : device /dev/dri/renderD128 does not use the panfrost kernel driver (VK_ERROR_INCOMPATIBLE_DRIVER)
==========
VULKANINFO
==========

Vulkan Instance Version: 1.3.275


Instance Extensions: count = 15
-------------------------------
VK_EXT_acquire_xlib_display            : extension revision 1
VK_EXT_debug_report                    : extension revision 10
VK_EXT_debug_utils                     : extension revision 2
VK_EXT_headless_surface                : extension revision 1
VK_KHR_device_group_creation           : extension revision 1
VK_KHR_external_fence_capabilities     : extension revision 1
VK_KHR_external_memory_capabilities    : extension revision 1
VK_KHR_external_semaphore_capabilities : extension revision 1
VK_KHR_get_physical_device_properties2 : extension revision 2
VK_KHR_portability_enumeration         : extension revision 1
VK_KHR_surface                         : extension revision 25
VK_KHR_wayland_surface                 : extension revision 6
VK_KHR_xcb_surface                     : extension revision 6
VK_KHR_xlib_surface                    : extension revision 6
VK_LUNARG_direct_driver_loading        : extension revision 1

Instance Layers:
----------------

Devices:
========
GPU0:
	apiVersion         = 1.0.296
	driverVersion      = 24.2.99
	vendorID           = 0x13b5
	deviceID           = 0xa8670000
	deviceType         = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
	deviceName         = Mali-G610 (Panfrost)
	driverID           = DRIVER_ID_MESA_PANVK
	driverName         = panvk
	driverInfo         = Mesa 24.3.0-devel (git-07ca1bbb05)
	conformanceVersion = 0.0.0.0
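
If you ever need to check which manifests and drivers the Loader actually considered, the Khronos loader can log its own discovery process; a hypothetical invocation, assuming a reasonably recent loader:

# dump the loader's discovery log and keep the ICD/driver-related lines
VK_LOADER_DEBUG=all vulkaninfo --summary 2>&1 | grep -iE 'icd|driver' | head -n 40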

Update 2.4.25: According to this new post from Collabora, the special environment variable is no longer needed.

Tests

Time to run some benchmarks so I know what to expect. I will use μVkCompute, because we’re primarily interested in compute workloads on this thing (that’s why we’re using Vulkan: it makes a better backend for ML models), and I want the overhead to be as minimal as possible. I had to install two more packages, libvulkan-dev and vulkan-validationlayers, from apt. Then I built the repository¹.
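
For the record, the build itself is a plain CMake affair; roughly the following, though the exact benchmark binary paths may differ, so check build/benchmarks/ after building.

# grab the sources together with the bundled third-party dependencies
git clone --recursive https://github.com/google/uVkCompute.git
cd uVkCompute

# configure and build against the system Vulkan headers and loader
cmake -G Ninja -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build

# the benchmark binaries land under build/benchmarks/; run one as a smoke test
ls build/benchmarks/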

¹ All this compiling I’m actually doing on the device, because it’s perfectly fast enough. However, in the future I’d like to get a cross-compilation workflow going, so I can build my OS images with this toolchain once I know what’s going on and what the best thing to do is.