AAD tools: comparison of approaches

A detailed analysis of AAD tools, comparing the technology, advantages and disadvantages of tape-based, code transformation and code generation AAD™ tools, and of MatLogica AADC

Introduction

The benefits of Automated Adjoint Differentiation (AAD) libraries are clear: fast XVA risk, model calibration, hedging and live risk are only some of the use cases that are difficult to do well without a modern AAD tool. Many financial institutions have implemented an AAD solution, using in-house resources, an open-source library or a vendor’s product, and now experience both the benefits and the limitations that their chosen solution brings to their quant modelling software.

As Antoine Savine puts it, “The main challenge faced by global investment banks today is a computational one”, and the computation of risks, whether by manual differentiation, automatic differentiation or bump-and-revalue, is a primary driver of operational costs. Organisations have to employ teams of expensive quants to implement and support their solution, and the average compute bill for a Tier 2 bank exceeds $10M per year, so even a 10% reduction saves over $1M annually.

AAD is valued for its numerical stability and good performance, so having a suitable tool is important to a bank’s competitive advantage. There are currently two commonly practised approaches to implementing AAD: tape-based and code transformation. A third approach, code generation, is less well known. Each methodology comes with advantages and disadvantages.
Types of automatic differentiation tools

In this post, we’ll discuss the main features and differences between these approaches and go over the advantages and challenges of each one.

Tape-based automatic differentiation tools

The tape-based approach is currently the most commonly used for AAD.

Operator Overloading (OO) is used to capture the elementary operations while the analytics execute, and every mathematical operation (the computational graph) is recorded in a data structure known as the ‘tape’. The tape is then processed backwards to compute all of the risks using the adjoint differentiation method.
Tape-based AAD tool
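
As a minimal sketch of this mechanism, assuming Python for brevity (production tape-based tools such as Adept are C++ libraries), each overloaded operator below computes its result and appends a node holding its local partial derivatives to a global tape; `gradient` then walks the tape backwards, accumulating adjoints with the chain rule. The names `Tape`, `Var`, `exp` and `gradient` are hypothetical and not the API of any particular library.

```python
import math


class Tape:
    """Global record of elementary operations, in execution order."""

    def __init__(self):
        # Each node stores (parent indices, local partial derivatives).
        self.nodes = []

    def push(self, parents, partials):
        self.nodes.append((parents, partials))
        return len(self.nodes) - 1


TAPE = Tape()


class Var:
    """Active number: every overloaded operation records itself on TAPE."""

    def __init__(self, value, parents=(), partials=()):
        self.value = value
        self.idx = TAPE.push(parents, partials)

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, (self.idx, other.idx), (1.0, 1.0))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value, (self.idx, other.idx),
                   (other.value, self.value))

    __rmul__ = __mul__


def exp(x):
    e = math.exp(x.value)
    return Var(e, (x.idx,), (e,))


def gradient(output, inputs):
    """Reverse sweep: walk the tape backwards, propagating adjoints."""
    adjoints = [0.0] * len(TAPE.nodes)
    adjoints[output.idx] = 1.0
    for i in range(output.idx, -1, -1):
        parents, partials = TAPE.nodes[i]
        for p, d in zip(parents, partials):
            adjoints[p] += d * adjoints[i]
    return [adjoints[x.idx] for x in inputs]


# Usage: f(x, y) = x*y + exp(x), so df/dx = y + exp(x) and df/dy = x.
x, y = Var(1.0), Var(2.0)
f = x * y + exp(x)
dfdx, dfdy = gradient(f, [x, y])
print(f.value, dfdx, dfdy)
```

For a single output, one backward sweep returns the derivatives with respect to all inputs at once, which is what makes the adjoint method attractive for risk.
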

Although the OO tape-based approach is faster than bump-and-revalue, it inherits the limitations of the original software design, such as sparse use of vectorisation and multi-threading. These limitations are compounded by the overhead of operator overloading and tape processing, which is incurred again on every Monte Carlo iteration.

Due to the nature of the OO tape methodology, all computations are linearised, i.e. all loops are unrolled. The size of the tape therefore depends on the number of elementary operations executed by the original code, so a large amount of memory is required to store the tape, which is both slow and expensive.
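
The unrolling can be seen with a toy recorder, assuming Python (the names `Recorder`, `Active` and `simulate_path` are hypothetical): the loop body appears once in the source code, but every iteration appends fresh entries, so the tape length is proportional to the number of operations actually executed rather than to the size of the code.

```python
class Recorder:
    """Minimal tape: one entry per elementary operation executed."""

    def __init__(self):
        # A real tape would also store operand indices and local partials.
        self.entries = []


class Active:
    """Hypothetical active type whose operators log themselves on a Recorder."""

    def __init__(self, value, tape):
        self.value = value
        self.tape = tape

    def __mul__(self, other):
        self.tape.entries.append(("mul", self.value, other))
        return Active(self.value * other, self.tape)

    def __add__(self, other):
        self.tape.entries.append(("add", self.value, other))
        return Active(self.value + other, self.tape)


def simulate_path(s0, drift, shocks, tape):
    """Toy path s <- s * drift + shock: two elementary operations per step."""
    s = Active(s0, tape)
    for z in shocks:        # one loop in the source code...
        s = s * drift + z   # ...but new tape entries on every iteration
    return s


for n_steps in (10, 100, 1000):
    tape = Recorder()
    simulate_path(100.0, 1.001, [0.0] * n_steps, tape)
    print(n_steps, len(tape.entries))  # tape length is 2 * n_steps
```
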

Code Transformation

With the code transformation methodology, both the function and its adjoint version are generated automatically at the compilation stage (i.e. ahead of execution). This is similar to manual differentiation, where the adjoint function is written by hand, except that the adjoint code is produced programmatically.
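
For comparison, here is the kind of adjoint a source code transformation tool produces ahead of execution, written out by hand for the small function f(x, y) = x * y + exp(x) and assuming Python for brevity. There is no tape: the forward sweep keeps the intermediates, and the reverse sweep applies the chain rule statement by statement in reverse order. The name `f_adjoint` and the `_bar` suffix for adjoint variables are illustrative conventions, not the output format of any specific tool.

```python
import math


def f(x, y):
    """Original function: f(x, y) = x * y + exp(x)."""
    a = x * y
    b = math.exp(x)
    return a + b


def f_adjoint(x, y, f_bar=1.0):
    """Adjoint of f, in the style a source-to-source tool might emit."""
    # Forward sweep: recompute and keep the intermediates.
    a = x * y
    b = math.exp(x)
    out = a + b
    # Reverse sweep: chain rule applied to each statement, last one first.
    a_bar = f_bar          # out = a + b
    b_bar = f_bar
    x_bar = b_bar * b      # b = exp(x)
    x_bar += a_bar * y     # a = x * y
    y_bar = a_bar * x
    return out, x_bar, y_bar


value, dfdx, dfdy = f_adjoint(1.0, 2.0)  # df/dx = y + exp(x), df/dy = x
```

Every function in the call graph needs a counterpart like this, kept in sync with the source, which hints at why the approach is hard to scale across a large quant library.
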

However, whilst the approach can work well for localised projects, it does not scale to large codebases and is therefore not a feasible solution for a real-life quant software environment.
Source Code Transformation AAD approach

Code Generation

This novel method was introduced by Dmitri Goloubentsev and Evgeny Lakshtanov in Wilmott magazine in 2019. It generates the adjoint code at runtime, during the first execution of the calculation loop, by redefining the elementary operations using OO. When the function is executed for the first time, the system collects the sequence of operations (taking branching into account) and then generates code that replicates the original function and computes the adjoints using the chain rule for arbitrary inputs. A reference implementation of this approach, which uses a standard external C++ compiler, was published in 2019 and is available on GitHub.

The generated code can process arbitrary inputs (random numbers or market rates) and is used in all subsequent iterations. It delivers high performance because it exploits information available at runtime and can apply additional optimisations.

A run-time code generation AAD™ tool might look like this:
Code Generation AAD approach

Here, the AD library executes the original analytics for one data sample and produces intermediate code for a specialised function that replicates the original calculations and computes derivatives using the adjoint method. An external compiler then generates native machine code for the function and its adjoint.
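
The pipeline can be mimicked in miniature, assuming Python: a recording run with overloaded operators captures the operation sequence for one sample, `generate_source` turns it into source text for a kernel returning both the value and the adjoints, and Python's built-in `compile` plays the role of the external compiler. `Recorder`, `Active`, `generate_source` and `compile_kernel` are hypothetical names. A real tool emits C++ or machine code and takes branching into account; this sketch does not, so the branch taken on the recording sample would simply be baked into the kernel.

```python
import math


class Recorder:
    """Captures the elementary operations executed during one recording run."""

    def __init__(self):
        self.ops = []        # (op name, output index, argument indices)
        self.n_inputs = 0
        self.n_vars = 0

    def inputs(self, *values):
        self.n_inputs = self.n_vars = len(values)
        return [Active(self, i, v) for i, v in enumerate(values)]

    def record(self, op, args, value):
        out = self.n_vars
        self.n_vars += 1
        self.ops.append((op, out, tuple(a.idx for a in args)))
        return Active(self, out, value)


class Active:
    """Computes normally on the recording sample and logs each operation."""

    def __init__(self, rec, idx, value):
        self.rec, self.idx, self.value = rec, idx, value

    def __add__(self, other):
        return self.rec.record("add", (self, other), self.value + other.value)

    def __mul__(self, other):
        return self.rec.record("mul", (self, other), self.value * other.value)

    def exp(self):
        return self.rec.record("exp", (self,), math.exp(self.value))


FORWARD = {"add": "v{0} + v{1}", "mul": "v{0} * v{1}", "exp": "math.exp(v{0})"}


def generate_source(rec, out):
    """Emit source for kernel(v0, ..., vn) -> (value, input adjoints)."""
    params = ", ".join(f"v{i}" for i in range(rec.n_inputs))
    lines = [f"def kernel({params}):"]
    for op, o, args in rec.ops:               # replicate the original calculation
        lines.append(f"    v{o} = " + FORWARD[op].format(*args))
    lines += [f"    a{i} = 0.0" for i in range(rec.n_vars)]
    lines.append(f"    a{out.idx} = 1.0")     # seed the output adjoint
    for op, o, args in reversed(rec.ops):     # adjoint sweep via the chain rule
        if op == "add":
            lines += [f"    a{args[0]} += a{o}", f"    a{args[1]} += a{o}"]
        elif op == "mul":
            lines += [f"    a{args[0]} += a{o} * v{args[1]}",
                      f"    a{args[1]} += a{o} * v{args[0]}"]
        elif op == "exp":
            lines.append(f"    a{args[0]} += a{o} * v{o}")
    grads = ", ".join(f"a{i}" for i in range(rec.n_inputs))
    lines.append(f"    return v{out.idx}, [{grads}]")
    return "\n".join(lines)


def compile_kernel(source):
    """Python's built-in compile() stands in for the external compiler."""
    namespace = {"math": math}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["kernel"]


# Record once on a single sample, generate and compile, then reuse.
rec = Recorder()
x, y = rec.inputs(1.0, 2.0)
kernel = compile_kernel(generate_source(rec, x * y + x.exp()))
for xv, yv in [(1.0, 2.0), (0.5, 3.0)]:
    value, (dx, dy) = kernel(xv, yv)  # f, df/dx = y + exp(x), df/dy = x
```

Recording and compilation happen once; the loop then calls `kernel` for new inputs without re-executing the original analytics.
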

Compilation is done once per task (a trade, a portfolio, etc.) and does not depend on the number of iterations. However, both the function and its adjoint need to be regenerated each time the task configuration changes, for example when pricing a new trade, moving to a new trading date or amending the portfolio. In real-life quant and risk systems, the time taken to generate this code is therefore part of the overall execution time and is crucial to any overall performance gain.

In contrast to the tape-based approach, code generation AAD™ can speed up both the original function and its sensitivities. Therefore, second-order and scenario risk can be accelerated, unlike with a tape-based AAD tool, which can only accelerate first-order risk. However, code generation is only beneficial when the time saved over many iterations substantially outweighs the initial compilation time. Minimising the compilation time is thus the key to this solution, and an off-the-shelf compiler, such as LLVM or a standard C++ compiler, simply cannot deliver this.
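
The trade-off can be made explicit with a simplified cost model. Assume a one-off generation time T_c, a per-iteration cost t_0 for the existing method, a per-iteration cost t_k < t_0 for the generated kernel, and N iterations. Code generation pays off beyond a break-even N*, and the overall speed-up S(N) only approaches the kernel's own speed-up for large N:

```latex
\begin{align*}
T_c + N\,t_k < N\,t_0 \;&\Longleftrightarrow\; N > N^{*} = \frac{T_c}{t_0 - t_k}, \\
S(N) = \frac{N\,t_0}{T_c + N\,t_k} \;&\xrightarrow[N \to \infty]{}\; \frac{t_0}{t_k}.
\end{align*}
```

With purely illustrative values (not measurements) of T_c = 2 s, t_0 = 1 ms and t_k = 0.1 ms, break-even is at N* = 2 / 0.0009, about 2,222 iterations, and at N = 100,000 the overall speed-up is 100 s / 12 s, about 8.3x, against an asymptotic 10x. A slower compiler pushes N* up and eats into the achievable speed-up.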

Code generation + operator overloading - the MatLogica AADC way

Because compilation time is part of the overall execution time with the code generation method, an off-the-shelf compiler is good enough to demonstrate the potential in principle, but it cannot be used in practice. MatLogica’s AADC uses a fundamentally different approach: OO is used to generate machine code directly, in streaming mode. During the initial execution of the user analytics, AADC generates binary kernels on the fly for both the original and the adjoint function. This generation process is highly optimised for compilation speed and does not rely on an external compiler. Any configuration change, such as a new trade or a change to the portfolio, is therefore processed on the fly, making code generation a practical AAD solution.
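
The contrast with the external-compiler pipeline can be illustrated with a toy analogy, assuming Python: instead of emitting source text and handing it to a compiler, each overloaded operation below appends ready-to-run forward and backward steps to a `Kernel` while the analytics execute, so the kernel is complete the moment the recording run finishes. This is not how AADC is implemented (AADC emits vectorised native machine code, which this sketch does not attempt); `Kernel`, `Val` and their methods are hypothetical names chosen only to show the control flow.

```python
import math


class Kernel:
    """Toy kernel: steps are appended while the analytics execute, no compiler."""

    def __init__(self, n_inputs):
        self.n_inputs = n_inputs
        self.n_slots = n_inputs
        self.forward, self.backward = [], []
        self.output = None

    def inputs(self, *values):
        return [Val(self, i, v) for i, v in enumerate(values)]

    def new_slot(self):
        self.n_slots += 1
        return self.n_slots - 1

    def emit(self, fwd, bwd):
        self.forward.append(fwd)
        self.backward.append(bwd)

    def run(self, *x):
        """Replay the streamed steps for new inputs: value and input adjoints."""
        v = list(x) + [0.0] * (self.n_slots - self.n_inputs)
        a = [0.0] * self.n_slots
        for step in self.forward:
            step(v)
        a[self.output] = 1.0
        for step in reversed(self.backward):
            step(v, a)
        return v[self.output], a[: self.n_inputs]


class Val:
    """Active value: computes on the recording sample and streams steps."""

    def __init__(self, k, slot, value):
        self.k, self.slot, self.value = k, slot, value

    def __add__(self, other):
        i, j, o = self.slot, other.slot, self.k.new_slot()

        def fwd(v):
            v[o] = v[i] + v[j]

        def bwd(v, a):
            a[i] += a[o]
            a[j] += a[o]

        self.k.emit(fwd, bwd)
        return Val(self.k, o, self.value + other.value)

    def __mul__(self, other):
        i, j, o = self.slot, other.slot, self.k.new_slot()

        def fwd(v):
            v[o] = v[i] * v[j]

        def bwd(v, a):
            a[i] += a[o] * v[j]
            a[j] += a[o] * v[i]

        self.k.emit(fwd, bwd)
        return Val(self.k, o, self.value * other.value)

    def exp(self):
        i, o = self.slot, self.k.new_slot()

        def fwd(v):
            v[o] = math.exp(v[i])

        def bwd(v, a):
            a[i] += a[o] * v[o]

        self.k.emit(fwd, bwd)
        return Val(self.k, o, math.exp(self.value))


# The kernel is built as a side effect of executing f(x, y) = x*y + exp(x) once.
k = Kernel(n_inputs=2)
x, y = k.inputs(1.0, 2.0)
k.output = (x * y + x.exp()).slot
value, (dx, dy) = k.run(0.5, 3.0)  # replay for new inputs without re-recording
```

The closures stand in for generated instructions: there is no intermediate source text and no compiler invocation between recording and reuse.
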

In addition to compiling quickly, the generated kernels themselves are very fast: they are vectorised for the native AVX2 or AVX512 instruction sets and therefore process 4 or 8 samples in parallel, with AVX512 being about 1.7 times faster than AVX2. They are also thread-safe. This results in speed-ups of 100x, including the compilation time!
AADC Just-In-Time Compilation
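
The 4 and 8 figures follow from the register widths, assuming double-precision (64-bit) samples:

```latex
\text{lanes} = \frac{\text{vector register width}}{64\ \text{bits per double}}:
\qquad \text{AVX2: } \frac{256}{64} = 4,
\qquad \text{AVX512: } \frac{512}{64} = 8
```

The quoted 1.7x gain of AVX512 over AVX2 is below the ideal 2x ratio of lane counts, so doubling the lanes does not translate into a proportional speed-up.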

Conclusions

A modern AAD tool is a must-have for quantitative and risk management software. In the table below, we summarise the key practical aspects of the automatic differentiation tools we have discussed.
Comparison of AAD tools: performance, usability, memory

In this post, we discussed the inner workings of each automatic differentiation approach and compared the value each can deliver. Since the code transformation approach is nearly impossible to apply in practice, and taking all aspects into account, MatLogica AADC stands out in terms of performance. Compared with the tape-based Adept library, it is 23x faster for XVA pricing and Greeks on a single AVX512 CPU core. In addition, with MatLogica’s out-of-the-box thread-safe multithreading, the speed-ups scale almost linearly with the number of cores.

Thus, with very manageable changes to the quant libraries, speed-ups of 100x can be achieved on existing hardware, yielding substantial savings on compute bills. Even organisations that already use some form of AAD will benefit from the AADC approach, with speed-ups of 5-20x on a single core.

MatLogica AADC is a modern AAD tool that acts as an abstraction layer, delivering optimal performance and ease of use. Our clients get initial results in weeks, and once integration is complete, organisations see additional benefits such as faster turnaround for new model development and better numerical stability.

The ability to compute risks faster also makes it possible to run more what-if scenarios and additional backtesting, supporting better decision-making. The resulting Live Risk capability allows new trading opportunities to be seized before the competition!