This repository contains a Keras (and TensorFlow Keras) reimplementation of EfficientNet, a lightweight convolutional neural network architecture achieving state-of-the-art accuracy with an order ...
CUDA-L2 is a system that combines large language models (LLMs) and reinforcement learning (RL) to automatically optimize Half-precision General Matrix Multiply (HGEMM) CUDA kernels. CUDA-L2 ...
Abstract: The current wave of advances in Deep Learning (DL) has been triggered by the availability of large-scale datasets, efficient CPU and GPU hardware, and the development of software frameworks ...