Traditional homogeneous architectures based on Von Neumann processors face challenges in meeting the requirements of AI workloads, which often involve massive parallelism, large data volumes, and complex computations. Heterogeneous computing architectures, which integrate different processing units with specialized capabilities, have emerged as promising solutions for AI applications. In our view, AI is driving the next era of software-defined heterogeneous computing, enabling better solutions for complex problems.
Heterogeneity in computing architectures refers to the situation where different types of computing devices, such as general-purpose processors, reconfigurable devices, accelerators, and sensors, are integrated and interconnected in various ways to form complex systems. These systems can span multiple levels of granularity, from a single chip to a data center to a distributed processing network. Heterogeneity can involve different instruction sets, memory models, programming models, and communication protocols working together.
The primary motivation for heterogeneity is to optimize the trade-offs between performance, power consumption, cost, and flexibility. AI applications like real-time video analysis or speech recognition may require high computational intensity and low latency; these can benefit from specialized hardware accelerators delivering high performance per watt. Other AI tasks, such as natural language processing or recommendation systems, may involve diverse and dynamic data sources and models; these can benefit from reconfigurable processor architectures that adapt to rapidly evolving AI workloads and data characteristics.
However, heterogeneous computing architectures also pose significant software development and optimization challenges. Meeting these challenges requires software tools and frameworks that can abstract away the complexity and heterogeneity of the underlying hardware while providing high-level programming models and interfaces.
AI techniques are emerging within these software tools and frameworks to optimize performance and energy consumption, especially for resource-constrained edge devices. For example, machine learning can model the behavior and characteristics of different processing units and predict the optimal system configuration. AI planning can explore the parameter space and generate efficient execution plans. Reinforcement learning can take feedback and adapt to dynamic environments and changing workloads.
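To make the idea of predicting an optimal configuration concrete, here is a minimal sketch that chooses a processing unit from simple per-device cost models. The device names, latency coefficients, and workload features are illustrative assumptions for this example only (in practice the models would be learned from profiling data); none of it reflects any real hardware.

```python
# Hypothetical sketch: picking the best device from simple cost models.
# All numbers below are invented for illustration.

def predict_latency_ms(device, ops_millions, batch_size):
    """Toy linear cost model per device: a fixed launch overhead plus a
    per-operation cost that differs by device type."""
    overhead_ms, ms_per_mop = {
        "cpu": (0.1, 0.80),   # low overhead, slow per operation
        "gpu": (2.0, 0.05),   # high launch overhead, fast per operation
        "npu": (0.5, 0.02),   # moderate overhead, fastest per operation
    }[device]
    return overhead_ms + ms_per_mop * ops_millions * batch_size

def pick_device(ops_millions, batch_size, devices=("cpu", "gpu", "npu")):
    """Select the device with the lowest predicted latency."""
    return min(devices, key=lambda d: predict_latency_ms(d, ops_millions, batch_size))

# Tiny workloads favor the CPU (no launch overhead); large ones favor the NPU.
print(pick_device(ops_millions=0.01, batch_size=1))
print(pick_device(ops_millions=500.0, batch_size=8))
```

A learned version of `predict_latency_ms` could be refit online from runtime feedback, which is where the reinforcement-learning framing above comes in.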
At EdgeCortix, we have developed MERA, a Machine-learning Enhanced Runtime Acceleration software and compiler framework designed for heterogeneous computing environments, especially edge AI solutions. MERA aims to simplify and automate software development for heterogeneous systems by providing a unified programming interface, a smart compiler that generates optimized code targeting a combination of processors, and a runtime that dynamically manages the execution and adaptation of heterogeneous systems. MERA enables software-defined heterogeneous computing by configuring the software and the underlying hardware to match the demands of an application.
Leveraging MERA's programming interface, machine learning developers can write application code in a high-level language such as Python or C++ without worrying about the details of each target processor or device. MERA's compiler then analyzes the code and automatically partitions it into segments that can run on the available devices, individually or in combination. The compiler applies optimization techniques such as loop unrolling, vectorization, task parallelization, memory management, and data layout transformation to generate efficient code for each device. It also produces an executable file containing each target device's code segments and metadata.
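As an illustration of one of the optimizations named above, loop unrolling rewrites a loop so that several iterations' worth of work happen per pass, reducing loop-control overhead and exposing instruction-level parallelism. A compiler applies this at the machine-code level; this Python sketch only demonstrates the transformation itself, not MERA's implementation.

```python
# Illustrative sketch of loop unrolling (not MERA code).

def dot(a, b):
    """Straightforward dot product: one multiply-add per iteration."""
    total = 0.0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total

def dot_unrolled(a, b):
    """The same computation with the loop body unrolled by a factor of 4."""
    total = 0.0
    n = len(a)
    i = 0
    while i + 4 <= n:
        total += (a[i] * b[i] + a[i + 1] * b[i + 1]
                  + a[i + 2] * b[i + 2] + a[i + 3] * b[i + 3])
        i += 4
    while i < n:          # handle the leftover tail elements
        total += a[i] * b[i]
        i += 1
    return total

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [5.0, 4.0, 3.0, 2.0, 1.0]
assert dot(x, y) == dot_unrolled(x, y) == 35.0
```

Both functions compute the same result; the unrolled form is the shape a compiler would emit for hardware that benefits from wider instruction scheduling.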
MERA's compiler and runtime are responsible for deploying and executing this file across a heterogeneous processor mix, selecting the best combination of devices (CPUs, neural processing units (NPUs), or even programmable logic) for each application. They can also monitor workload and environmental conditions, such as model type, model precision (INT8 or FP16, for example), data size or resolution, and power consumption, and dynamically adjust the configuration and behavior of the heterogeneous system accordingly. For example, the compiler and runtime can switch between devices or code segments based on model precision or performance criteria, and can partition one or more deep neural network models into optimal groups of operations for distribution across the available heterogeneous processors.
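One simple way to picture this partitioning is a greedy pass that walks a model's operations in execution order and groups consecutive operations assigned to the same processor, so each group becomes one code segment for that device. The operation list and device assignments below are invented for illustration and are not MERA's actual algorithm.

```python
# Hypothetical sketch: grouping a model's operations into per-device segments.

def partition(ops):
    """ops: list of (op_name, device) pairs in execution order.
    Returns a list of (device, [op_names]) segments, merging consecutive
    operations that target the same device."""
    segments = []
    for name, device in ops:
        if segments and segments[-1][0] == device:
            segments[-1][1].append(name)       # extend the current segment
        else:
            segments.append((device, [name]))  # start a new segment
    return segments

# Invented example model: most layers run on the NPU, with CPU fallbacks
# for operations the accelerator does not support.
model = [
    ("conv1", "npu"), ("relu1", "npu"), ("pool1", "npu"),
    ("reshape", "cpu"),
    ("dense", "npu"),
    ("softmax", "cpu"),
]
print(partition(model))
```

Minimizing the number of segments matters in practice, since each device boundary implies a data transfer between processors.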
MERA also supplies a library of pre-optimized kernels and a set of patterns it recognizes from standard AI operations (such as convolutions, pooling, activations, and masked or multi-head attention) that run optimally on the computing engines corresponding to the heterogeneous processors. During compilation, the MERA library applies different scheduling and memory allocation techniques to iteratively optimize a given target metric, such as latency or compute utilization, for a given configuration.
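A kernel library of this kind can be pictured as a registry that maps a recognized operation pattern and target device to a pre-optimized implementation, with a fallback when no device-specific kernel exists. Everything below, including the function and device names, is an invented sketch rather than MERA's API.

```python
# Hypothetical sketch of a kernel registry keyed by (operation, device).

KERNEL_REGISTRY = {}

def register_kernel(op, device):
    """Decorator that registers an optimized kernel for (op, device)."""
    def wrap(fn):
        KERNEL_REGISTRY[(op, device)] = fn
        return fn
    return wrap

@register_kernel("relu", "npu")
def relu_npu(x):
    # Stand-in for a pre-optimized accelerator kernel.
    return [max(0.0, v) for v in x]

@register_kernel("relu", "cpu")
def relu_cpu(x):
    # Portable reference implementation.
    return [v if v > 0.0 else 0.0 for v in x]

def lookup(op, device):
    """Return the kernel for (op, device), falling back to the CPU version."""
    return KERNEL_REGISTRY.get((op, device), KERNEL_REGISTRY[(op, "cpu")])

print(lookup("relu", "npu")([-1.0, 2.0]))  # → [0.0, 2.0]
```

A compiler can then iterate over candidate kernels and schedules, measuring the target metric (latency, utilization) for each before committing to one.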
The MERA software and compiler framework supports a wide range of software-defined heterogeneous compute engines. One example is EdgeCortix's SAKURA-I edge inference processor with its runtime-configurable dedicated NPU (based on EdgeCortix's Dynamic Neural Accelerator technology), which can pair with a general-purpose host processor (with Arm, x86, or RISC-V cores) along with an additional discrete FPGA accelerator. MERA enables a machine learning-based end application to be accelerated across this mixture of processing engines under low-power conditions while maximizing performance.
Designers developing with the EdgeCortix MERA software can achieve high-performance and low-power AI inference on EdgeCortix's SAKURA-I edge AI processor without writing low-level code or manually tuning hardware parameters. Moreover, when adding new types of processors or other third-party devices with a supported interface, the existing MERA library easily extends to support a new instruction set architecture.
As Moore’s Law sunsets, we see diminishing performance gains from transistor shrinkage, and homogeneous multicore solutions don't solve all computing problems, especially where efficiency is a deciding criterion. Semiconductor engineers are increasingly focused on architectural improvements, moving toward heterogeneous computing where multiple processors (CPUs, GPUs, ASICs, FPGAs, and NPUs) work together to improve performance.
New types of software and compilers are critical to accelerated AI performance. In this context, we introduced EdgeCortix's MERA software and compiler framework, created for deploying real-time yet high-performance edge AI solutions. In this new era of software-defined heterogeneous computing, such software is a necessary and flexible tool for developing and deploying AI applications at scale.
Sakya is the founder and Chief Executive Officer of EdgeCortix. Prior to founding EdgeCortix, Sakya was a Senior Research Scientist and Executive Lead for Embodied Robotics at IBM Research. He has over a decade of experience in cutting-edge artificial intelligence (AI) hardware and software research, building teams from the ground up, and creating real-world AI solutions. Sakya is an inventor on over 20 patents and has published widely on AI and edge computing, with over 1,000 citations. Sakya holds a PhD from the Max Planck Institute in Germany and a Master's in Artificial Intelligence from the University of Edinburgh.