Technology


Based on cutting-edge compiler research

Compilers are complex software systems that translate programming languages into instructions that hardware can execute. Modern compiler infrastructure such as LLVM has grown highly modular and reusable, able to support a large number of programming languages and models and to target diverse hardware using the same intermediate representation (IR) infrastructure. The advent of specialized ML/AI chips, however, brings new challenges to compiler and IR design and to lowering such IR to hardware to realize high performance.

PolyMage Labs offers PolyBlocks: “blocks” that can be used to build compilers and code generators for high-performance computing. PolyBlocks is highly reusable for optimization, analysis, and mapping to parallel hardware, including custom accelerators.

Write high-level Python, and obtain highly optimized GPU code fully automatically with our PolyBlocks compiler!
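As a sketch of what that flow can look like in practice, consider the following. The import path, decorator name, and options below are illustrative assumptions, not the exact PolyBlocks API:

import numpy as np
# Hypothetical entry point: the exact PolyBlocks import and decorator
# name are assumptions made for illustration here.
from polyblocks import polyblocks_jit_numpy

@polyblocks_jit_numpy(compile_options={"target": "nvgpu"})
def fused_pipeline(image, scale):
    # A stencil followed by an elementwise op; the compiler can fuse,
    # tile, and map the whole function to the GPU instead of running
    # it one NumPy op at a time.
    blurred = (image[:-2, :] + image[1:-1, :] + image[2:, :]) / 3.0
    return blurred * scale

Everything below the decorator stays ordinary Python; compilation to optimized GPU code happens behind it.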

Our technology is based on the MLIR infrastructure; it can be customized to support a variety of programming languages and models, as well as optimizers and lowering frameworks specific to the target hardware. We particularly employ polyhedral optimization techniques and specialize in compiling computations on high-dimensional data spaces. Our technology is realizable on the compilation paths of a variety of programming models and languages, encompassing both domain-specific and general-purpose ones.

Our compiler building blocks comprise many reusable and extensible transformation passes, along with some new operations and surrounding infrastructure. These operations are lowered, or “code generated”, using a number of techniques, including polyhedral AST generation driven by the Integer Set Library (ISL) and the MLIR Affine and Standard dialect infrastructure for analysis and transformation. They are highly reusable across the various domains served by dense matrix/tensor computations, such as deep learning, image processing pipelines, stencil computations, and similar workloads in science and engineering.
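To make the ISL-driven step concrete, here is a minimal sketch of polyhedral AST generation using islpy, the Python bindings to the Integer Set Library. It only illustrates the underlying mechanism on a toy schedule (a loop interchange), not PolyBlocks’ actual MLIR-based implementation:

import islpy as isl

ctx = isl.Context()
# Iteration domain: an 8x8 set of statement instances S[i, j].
domain = isl.UnionSet.read_from_str(ctx, "{ S[i,j] : 0 <= i < 8 and 0 <= j < 8 }")
# Schedule: interchange the loops so that j becomes the outer one.
schedule = isl.UnionMap.read_from_str(ctx, "{ S[i,j] -> [j, i] }")

# Generate an AST (a loop nest) that scans the domain in schedule order.
build = isl.AstBuild.from_context(isl.Set.read_from_str(ctx, "{ : }"))
ast = build.node_from_schedule_map(schedule.intersect_domain(domain))

# Print the resulting loop nest as C-like code.
printer = isl.Printer.to_str(ctx).set_output_format(isl.format.C)
print(printer.print_ast_node(ast).get_str())

The printed loops reflect the interchanged schedule; a PolyBlocks-style flow drives the same machinery with far more elaborate schedules (tiling, fusion, and so on) computed automatically.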

Code Transformations

Our building blocks are also meant to be repurposed to generate high-performance libraries. This approach makes the development of highly tuned, commonly used routines more modular and scalable, reducing the time needed to create a version that achieves machine peak performance. The example below shows how one can obtain near-peak performance on matrix-matrix multiplication (GEMM) entirely through automatic code optimization and generation, driven by a compact IR specification. That specification realizes a highly complex schedule established to be state-of-the-art (more details here).

It shows how powerful code transformation directives can be encoded compactly: polyhedral scheduling that encodes multi-level tiling, loop interchange, unroll-and-jam / register tiling, and vectorization, letting the code generation infrastructure emit several thousand lines of highly optimized code; a minimal sketch of one such transformation follows below. Using compiler infrastructure makes the approach more explorable and tunable, significantly reducing the time needed to arrive at the best version if something in the hardware or in the computation patterns were to change.
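As a rough illustration of just one of these transformations, the sketch below applies a single level of cache tiling to GEMM by hand; the generated code composes several such levels with interchange, unroll-and-jam, and vectorization automatically:

import numpy as np

def gemm_tiled(A, B, C, tile=64):
    # Single-level tiling: walk the iteration space in tile x tile
    # blocks so each block of A, B, and C stays cache-resident.
    # (NumPy's @ stands in here for the vectorized inner kernel.)
    M, K = A.shape
    _, N = B.shape
    for i0 in range(0, M, tile):
        for j0 in range(0, N, tile):
            for k0 in range(0, K, tile):
                C[i0:i0 + tile, j0:j0 + tile] += (
                    A[i0:i0 + tile, k0:k0 + tile]
                    @ B[k0:k0 + tile, j0:j0 + tile]
                )
    return C

A production-quality schedule adds register tiling inside each block and packs operands; the point is that each such choice is a small, composable directive rather than thousands of hand-written lines.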

A Glimpse of PolyBlocks Compiler Performance


We strongly believe that automatic code generators have a large role to play in the development of some of the most critical primitives used in deep learning. Performance obtained through MLIR-based automatic code generators can not only compete with but also surpass that achieved by expert hand-written and tuned libraries. The experimental plot below shows one scenario where significant improvements were obtained over de facto state-of-the-art vendor libraries on a state-of-the-art accelerator. Using an automatic code generation approach also allows teams developing and optimizing libraries for deep learning models to be more productive while achieving close to theoretical machine peak performance.

Expertise

The team at PolyMage Labs distinguishes itself through its deep and specialized expertise at the intersection of the polyhedral framework, high-performance computing, and MLIR. Its expertise and skills were acquired from academic research as well as from the creation of and continued involvement in a number of open-source compiler projects and tools based on the polyhedral framework.

PolyMage Labs’ engineers actively contribute to open-source ML/AI compiler projects, notably MLIR and TensorFlow/MLIR, upstreaming code to their repositories and participating in their online design discussions and fora. Much of our engineering and development work continues to both benefit from and contribute to state-of-the-art upstream MLIR infrastructure, staying in sync with its codebase and community.

Engaging with us

If you are a hardware vendor building specialized chips for machine learning and artificial intelligence and are interested in exploring our products or services for your software/compiler stack, please do reach out to us. Alternatively, if your business is driven by ML/AI algorithms and you are looking for a productive, performant path to acceleration on the latest hardware such as NVIDIA GPUs, PolyBlocks can have a dramatic impact.

Testimonials