Compilers are complex software systems that translate programming languages into instructions the hardware can execute. Modern compiler infrastructure such as LLVM has grown highly modular and reusable, supporting a large number of programming languages and models, and a wide range of target hardware, on the same intermediate representation (IR) infrastructure. The advent of specialized ML/AI chips, however, brings new challenges to compiler and IR design and to lowering such IR to hardware for high performance.
PolyMage Labs offers compiler building blocks that are highly reusable for optimization, analysis, and mapping to parallel hardware, including custom accelerators. Our technology is based on the MLIR infrastructure and can be customized to support a variety of programming languages and models as well as target hardware-specific optimizers and lowering frameworks. We particularly employ polyhedral optimization techniques and specialize in compiling computations on high-dimensional data spaces. Our technology fits into the compilation paths of a variety of programming models and languages, both domain-specific and general-purpose.
Our compiler building blocks take the form of additional MLIR “ops” and their surrounding infrastructure. These operations are lowered, or “code generated”, using a number of techniques, including polyhedral AST generation driven by the Integer Set Library (isl) and the MLIR Affine and Standard dialect infrastructure for analysis and transformation. They are highly reusable across the domains served by dense matrix/tensor computations, such as deep learning, image processing pipelines, stencil computations, and similar workloads in science and engineering.
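As a concrete illustration, here is a hypothetical sketch of what a matrix-matrix multiplication looks like when expressed with MLIR's Affine and Standard dialects; a loop nest of this shape is the kind of IR on which affine analysis, polyhedral scheduling, and code generation operate. The function name and fixed 2048x2048 shapes are illustrative assumptions, not a depiction of our actual ops.

```mlir
// Hypothetical sketch: a GEMM loop nest in MLIR's Affine dialect,
// using Standard-dialect arithmetic ops (mulf/addf).
func @matmul(%A: memref<2048x2048xf32>, %B: memref<2048x2048xf32>,
             %C: memref<2048x2048xf32>) {
  affine.for %i = 0 to 2048 {
    affine.for %j = 0 to 2048 {
      affine.for %k = 0 to 2048 {
        %a  = affine.load %A[%i, %k] : memref<2048x2048xf32>
        %b  = affine.load %B[%k, %j] : memref<2048x2048xf32>
        %ci = affine.load %C[%i, %j] : memref<2048x2048xf32>
        %p  = mulf %a, %b : f32
        %co = addf %ci, %p : f32
        affine.store %co, %C[%i, %j] : memref<2048x2048xf32>
      }
    }
  }
  return
}
```

Because the loop bounds and subscripts here are affine functions of the loop variables, the nest is exactly analyzable, which is what enables the polyhedral transformations discussed below.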
Peak Performance Library Generation
Our building blocks are also meant to be repurposed to generate high-performance libraries. This approach makes the development of highly tuned, commonly used routines more modular and scalable -- reducing the time needed to create a version that achieves machine peak performance. The example below shows how one can obtain near-peak performance on matrix-matrix multiplication (GEMM) entirely through automatic code optimization and generation, driven by a compact IR specification. The specification on the right realizes a highly complex schedule established to be the state of the art (more details here). It shows how powerful code transformation directives can be encoded compactly: polyhedral scheduling encoding multi-level tiling, loop interchange, unroll-and-jam / register tiling, and vectorization, letting the code generation infrastructure emit several thousand lines of highly optimized code. Using compiler infrastructure also makes the approach explorable and tunable: it significantly reduces the time needed to arrive at the best version if something in the hardware or in the computation patterns were to change.
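To make the transformations named above concrete, here is a plain-Python sketch (purely illustrative, not our generated code) of tiling plus unroll-and-jam / register tiling applied to GEMM; real code generators emit vectorized native code, but the loop structure is the same. All names and tile sizes are assumptions for the example.

```python
def matmul_naive(A, B, M, N, K):
    # Reference triple loop: C = A * B.
    C = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            for k in range(K):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_tiled(A, B, M, N, K, TI=4, TJ=4, TK=4, UI=2, UJ=2):
    # Tiled GEMM with unroll-and-jam: TI/TJ/TK are tile sizes, UI/UJ the
    # register-tile (unroll) factors. Assumes UI divides TI and UJ divides TJ.
    C = [[0.0] * N for _ in range(M)]
    for ii in range(0, M, TI):           # tile over i
        for jj in range(0, N, TJ):       # tile over j
            for kk in range(0, K, TK):   # tile over k (interchanged outward)
                for i in range(ii, min(ii + TI, M), UI):
                    for j in range(jj, min(jj + TJ, N), UJ):
                        # Register tile: keep a UI x UJ block of C in locals.
                        acc = [[C[i + di][j + dj]
                                for dj in range(min(UJ, N - j))]
                               for di in range(min(UI, M - i))]
                        for k in range(kk, min(kk + TK, K)):
                            for di in range(len(acc)):
                                a = A[i + di][k]
                                for dj in range(len(acc[di])):
                                    acc[di][dj] += a * B[k][j + dj]
                        # Write the accumulated block back.
                        for di in range(len(acc)):
                            for dj in range(len(acc[di])):
                                C[i + di][j + dj] = acc[di][dj]
    return C
```

The payoff of this structure is locality and register reuse: each `A[i + di][k]` value is reused across a whole row of the register tile, and the k-tile keeps the working set cache-resident; a compiler-driven generator additionally vectorizes the innermost loop and explores the tile and unroll factors automatically.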
We strongly believe that automatic code generators have a large role to play in the development of some of the most critical primitives used in deep learning. Performance obtained through MLIR-based automatic code generators can not only compete with but also surpass that achieved by expert hand-written and hand-tuned libraries. The experimental plot below shows one scenario where significant improvements were obtained over de facto state-of-the-art vendor libraries on a state-of-the-art accelerator. An automatic code generation approach also allows teams developing and optimizing libraries for deep learning models to be more productive while achieving close to theoretical machine peak performance.
The team at PolyMage Labs distinguishes itself through its deep and specialized expertise at the intersection of the polyhedral framework, high-performance computing, and MLIR. Its expertise and skills were acquired from academic research as well as from the creation of and continued involvement in a number of open-source compiler projects and tools based on the polyhedral framework.
PolyMage Labs' engineers actively contribute to open-source ML/AI compiler projects, notably MLIR and TensorFlow/MLIR, upstreaming code to their repositories and participating in their online design discussions and fora. A significant amount of our engineering and development work continues to benefit from, and contribute back to, state-of-the-art upstream MLIR infrastructure, staying in sync with its codebase and community.
Engaging with us
If you are a hardware vendor building specialized chips for Machine Learning and Artificial Intelligence and are interested in exploring our products or services for your software/compiler stack, please do reach out to us. Alternatively, if you are building new programming models or languages that require high-performance compiler technology, we may be able to help.
We have partnered with PolyMage Labs to solve core compiler technical challenges together using MLIR and its polyhedral abstractions. PolyMage Labs, with its unique and internationally recognized leading expertise in compiler infrastructure for ML/AI, was able to obtain highly promising results in a short period of time.
- Sean Lie (Co-Founder, Chief Hardware Architect, Cerebras Systems Inc.)