opencl: add optimized q4_1 mm kernel for adreno (#19840)
* Add Q4_1 OpenCL Kernels
* opencl: refactor transpose
* opencl: format
* opencl: refactor q4_1 unpack
* opencl: move `ggml_cl_mul_mat_q4_1_f32_adreno`
* opencl: refactor `ggml_cl_mul_mat_q4_1_f32_adreno` and kernels
* opencl: rename kernel files and kernels
* opencl: fix build for non-adreno
* opencl: move code around and format
---------
Co-authored-by: Li He <lih@qti.qualcomm.com>
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: