MatLogica | Technology

How MatLogica AADC Technology Works

Understand how MatLogica transforms object-oriented code into high-performance data-oriented execution. Achieve 6-100x speedups with automatic adjoint differentiation through just-in-time compilation.

The Core Innovation

MatLogica AADC is a specialized just-in-time (JIT) compiler that transforms your object-oriented code into highly optimized machine code at runtime. It automatically adds vectorization, multi-threading, and automatic adjoint differentiation—achieving performance impossible with traditional approaches.

The Performance Challenge in Quantitative Finance

Implementing accurate, repetitive calculations such as Monte Carlo simulations in traditional object-oriented languages (C++, C#, Python) runs into fundamental challenges:

Traditional Approaches and Their Limitations

Manual Optimization Challenges:

  • Multithreading: Tricky to implement correctly; prone to race conditions and intermittent bugs
  • Vectorization (SIMD): Requires restructuring code; compilers often fail to vectorize OO code
  • Memory management: Cache-friendly data layouts conflict with OO design
  • Maintenance: Hand-optimized code becomes hard to maintain and extend

Traditional AAD Tool Limitations:

  • Tape-based AAD: Records operations to a tape and replays it backward for derivatives on every loop iteration
  • Memory overhead: Tape size grows with the number of operations, causing expensive memory access
  • Adjoint factor of 2-5x: Computing derivatives takes 2-5x the original function time
  • Cannot accelerate scenarios: Only first-order Greeks benefit; pricing and scenario runs are not sped up
  • Performance penalty: Recording actually slows down the original function execution
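To make the tape overhead concrete, here is a toy sketch of tape-based reverse-mode AAD (not any specific vendor's library): every elementary operation is recorded with its local partial derivatives, then the tape is replayed backward to accumulate adjoints. The `Tape` class and `price` payoff are illustrative inventions.

```python
# Toy sketch of tape-based reverse-mode AAD (not any specific library).
# Each elementary operation is recorded on a tape, then replayed backward.

class Tape:
    def __init__(self):
        self.ops = []      # (output_index, [(input_index, local partial)])
        self.values = []

    def new_var(self, value, parents=()):
        self.values.append(value)
        idx = len(self.values) - 1
        self.ops.append((idx, list(parents)))
        return idx

    def grad(self, out_idx, n_inputs):
        # Replay the tape backward, accumulating adjoints.
        adj = [0.0] * len(self.values)
        adj[out_idx] = 1.0
        for idx, parents in reversed(self.ops):
            for p_idx, d in parents:
                adj[p_idx] += adj[idx] * d
        return adj[:n_inputs]

def price(tape, s_idx, k):
    # Toy "payoff": f(s) = (s - k)**2, built from recorded elementary ops
    s = tape.values[s_idx]
    d_idx = tape.new_var(s - k, [(s_idx, 1.0)])               # d = s - k
    f_idx = tape.new_var((s - k) ** 2, [(d_idx, 2 * (s - k))])  # f = d**2
    return f_idx

tape = Tape()
s_idx = tape.new_var(105.0)        # input: spot
f_idx = price(tape, s_idx, 100.0)
delta = tape.grad(f_idx, 1)[0]     # df/ds = 2 * (s - k) = 10.0
```

Note that the tape holds an entry per executed operation and must be rebuilt for every scenario, which is exactly the memory and runtime overhead described above.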

What Quantitative Finance Needs

  • Fast pricing: Portfolio valuation in real-time
  • Complete Greeks: All sensitivities computed efficiently
  • Scenario analysis: Thousands of what-if calculations
  • Easy maintenance: Code remains readable and extensible
  • No rewrites: Leverage existing analytics

How MatLogica AADC Solves These Problems

The Code Generation Approach

MatLogica's easy-to-integrate JIT compiler converts user code (C#, C++, or Python, including combinations) into a vectorized, multi-threaded, NUMA-aware machine-code "kernel" containing close to the theoretical minimum number of operations needed to complete the task.

Key Innovation: Runtime Code Generation

By generating the kernel at runtime, AADC takes advantage of information available during program execution that ahead-of-time compilers cannot see:

  • Actual control flow paths taken (not all possible paths)
  • Exact data access patterns for optimal memory layout
  • Specific values enabling constant folding and algebraic simplification
  • Loop bounds and iteration patterns for perfect unrolling
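The value of runtime information can be illustrated with a small sketch (the `build_kernel` function and dividend flag are hypothetical, not AADC's API): once a branch condition is known at kernel-generation time, only the taken path needs to appear in the specialized code, and untaken terms fold away.

```python
# Sketch: why runtime information helps. With the branch flag known at
# kernel-generation time (runtime for the program, "compile time" for the
# kernel), only the taken path appears in the specialized code.

def build_kernel(use_dividends: bool, q: float):
    if use_dividends:
        return lambda s, r, t: s * ((1 + (r - q)) ** t)
    # Dividend-free path: the q term is folded away entirely
    return lambda s, r, t: s * ((1 + r) ** t)

kernel = build_kernel(use_dividends=False, q=0.02)
# Every subsequent call runs the specialized, branch-free code path.
fwd = kernel(100.0, 0.05, 2)   # 100 * 1.05**2
```

An ahead-of-time compiler would have to keep both branches and the runtime check; a JIT specializing on observed values does not.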

Core Technologies

1. State-of-the-Art Code Compression

AADC's advanced code compression mechanism compresses optimized machine code, leading to:

  • Better cache utilization: More code fits in L1/L2 cache
  • Reduced memory bandwidth: Fewer loads from main memory
  • Fewer CPU operations: Minimal theoretical operations executed
  • Maximum speedup extracted: A level of acceleration other AAD libraries fail to reach

2. Direct Machine Code AAD

AADC computes reverse accumulation equations directly in machine code rather than using tape recording. This fundamental difference delivers:

Tape-Based AAD:
  • Records operations to a tape on each loop iteration
  • Replays the tape backward on each loop iteration
  • High memory usage
  • Adjoint factor of 2-5x
  • Slows down the original function

AADC Code Generation:
  • Generates adjoint code directly, once
  • Reuses the generated code for every loop iteration
  • No tape overhead
  • Low memory usage
  • Adjoint factor below 1
  • Accelerates the original function
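The record-once, reuse-everywhere idea can be sketched as follows (a conceptual stand-in, not AADC's actual kernel API): the forward and adjoint code are produced a single time, then the same tape-free function is called for every scenario.

```python
# Sketch of the code-generation approach: the adjoint is derived once,
# then the same (tape-free) code is reused for every loop iteration.

def record_once(k: float):
    # Stand-in for kernel generation: returns a single function computing
    # both the value and its derivative, with no per-call tape.
    def kernel(s: float):
        d = s - k
        value = d * d        # forward pass
        delta = 2.0 * d      # adjoint pass, generated alongside the forward code
        return value, delta
    return kernel

kernel = record_once(100.0)
# Reuse the same generated code across all scenarios -- no recording per loop.
results = [kernel(s) for s in (95.0, 100.0, 105.0)]
```

Because nothing is recorded inside the loop, memory stays flat and the per-scenario cost is just the arithmetic itself.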

3. Automatic Vectorization

AADC automatically generates AVX2 and AVX512 vectorized code:

  • AVX2: Process 4 double-precision values per instruction
  • AVX512: Process 8 double-precision values per instruction
  • Automatic data layout: Arrays of structures → structures of arrays
  • No manual SIMD: Write scalar code, get vector performance
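The arrays-of-structures to structure-of-arrays transformation can be illustrated with NumPy standing in for SIMD lanes (the path fields and payoff formula here are invented for illustration):

```python
import numpy as np

# Sketch of the array-of-structures -> structure-of-arrays transformation
# that enables SIMD: one operation applied across contiguous lanes.

# Array of structures: natural OO layout, poor for SIMD
paths_aos = [{"spot": 100.0 + i, "vol": 0.2} for i in range(8)]

# Structure of arrays: each field contiguous in memory, so one vector
# instruction covers 4 (AVX2) or 8 (AVX512) doubles
spot = np.array([p["spot"] for p in paths_aos])
vol = np.array([p["vol"] for p in paths_aos])

# Scalar-looking formula, executed across all lanes at once
payoff = np.maximum(spot * (1.0 + vol) - 105.0, 0.0)
```

The user writes the scalar formula; the layout change and lane-wise execution are what the kernel generator supplies automatically.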

4. Multi-Threading and NUMA Awareness

Generated kernels are thread-safe by design:

  • No locks or synchronization: Each thread operates independently
  • Linear scaling: Performance scales with CPU cores
  • NUMA-aware: Memory allocated close to processing cores
  • Automatic parallelization: No manual thread management
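Why lock-free scaling is possible can be sketched with a stand-in kernel (the payoff below is invented): a generated kernel is a pure function of its inputs, so threads pricing disjoint scenarios share no mutable state and need no synchronization.

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of lock-free scenario execution: because the kernel is pure
# (no shared mutable state), threads can price disjoint scenarios with
# no locks. A stand-in kernel is used here.

def kernel(spot):
    # Stand-in for a generated, thread-safe kernel
    return max(spot - 100.0, 0.0)

scenarios = [90.0 + i for i in range(16)]
with ThreadPoolExecutor(max_workers=4) as pool:
    prices = list(pool.map(kernel, scenarios))
```

With no contention between threads, throughput scales with core count until memory bandwidth becomes the limit.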

The Breakthrough Result

Adjoint Factor Less Than 1

The adjoint factor is the time to compute the function together with all its derivatives, divided by the time to compute the function alone. AADC calculates both the original function and its derivatives 6-100x faster than the original code computes the function alone, driving this factor below 1.

This means you get all derivatives for free: the combined run is actually faster than not computing them at all.

Minimal Code Changes Required

This performance is achieved with minimal changes to original code since MatLogica's compiler does virtually all the work:

  1. Replace double with idouble: A drop-in active-type replacement
  2. Mark inputs/outputs: Identify what to differentiate
  3. AADC handles the rest: Compilation, optimization, vectorization, threading

No need for:

  • Manual vectorization or SIMD intrinsics
  • Thread synchronization code
  • Template metaprogramming
  • Code transformation or preprocessing
  • Tape management

How AADC Works: Step by Step

Step 1: The User Amends the Code

Semi-automated integration (weeks):

  • Replace double with the active type (idouble) throughout the code
  • For simulations: identify the main loop and instruct AADC where to start and stop recording
  • For a one-off trade where AAD is required: record the valuation function itself
  • Mark inputs and outputs, indicating where derivatives are required within the recording
  • Use the recording for simulations and/or AAD instead of the original function
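Why a drop-in active type is enough can be sketched with operator overloading (`ToyActiveDouble` is a toy invention; the real idouble feeds AADC's kernel generator rather than a Python list): unchanged pricing code records its own elementary operations simply by being run with the new type.

```python
# Minimal sketch of why a drop-in "active" type works: operator overloading
# lets unchanged pricing code record its own operations. The real idouble
# records into AADC's kernel generator; this toy version just logs ops.

class ToyActiveDouble:
    recorded_ops = []

    def __init__(self, v):
        self.v = v

    def __add__(self, other):
        other_v = other.v if isinstance(other, ToyActiveDouble) else other
        ToyActiveDouble.recorded_ops.append(("add", self.v, other_v))
        return ToyActiveDouble(self.v + other_v)

    def __mul__(self, other):
        other_v = other.v if isinstance(other, ToyActiveDouble) else other
        ToyActiveDouble.recorded_ops.append(("mul", self.v, other_v))
        return ToyActiveDouble(self.v * other_v)

def forward_value(spot, rate):
    # Unchanged "pricing" code: works for plain floats and the active type
    return spot * rate + spot

x = forward_value(ToyActiveDouble(100.0), 0.05)
# x.v holds the value; every elementary op was captured along the way
```

The same mechanism works in C++ via templates or typedefs, which is why the integration step is a type substitution rather than a rewrite.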

Step 2: AADC - at Runtime

At runtime, AADC generates an optimized binary kernel representing the original calculation and, if needed, its adjoint:

  • idouble captures the sequence of elementary operations
  • AADC builds the computational graph in binary code for the original function and, if required, its adjoint
  • Constant folding and algebraic simplification
  • Dead code elimination
  • Common subexpression elimination
  • Optimal memory use
  • Optimal register allocation
  • Recording overhead of idouble: ~2% performance penalty
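A toy illustration of the optimization passes listed above, applied to a recorded operation sequence (this is not AADC's internal representation; the op tuples and `optimize` function are invented for illustration):

```python
# Toy illustration of algebraic simplification, common subexpression
# elimination, and dead-op removal on a recorded operation list.

ops = [
    ("t1", "mul", "x", 0.0),   # x * 0 -> constant 0 (algebraic simplification)
    ("t2", "add", "x", "y"),
    ("t3", "add", "x", "y"),   # duplicate of t2 (common subexpression)
    ("t4", "mul", "t2", 2.0),
    ("out", "add", "t4", "t1"),
]

def optimize(ops):
    folded, replaced, seen = [], {}, {}
    for name, op, a, b in ops:
        a, b = replaced.get(a, a), replaced.get(b, b)
        if op == "mul" and b == 0.0:
            replaced[name] = 0.0          # fold to constant; emit nothing
            continue
        key = (op,) + tuple(sorted(map(str, (a, b))))
        if key in seen:
            replaced[name] = seen[key]    # common subexpression elimination
            continue
        seen[key] = name
        folded.append((name, op, a, b))
    return folded

optimized = optimize(ops)
# Five recorded ops shrink to three; t1 and t3 never reach the kernel.
```

Because the kernel is generated from observed operations, these passes see the whole calculation at once, not one translation unit at a time.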

Step 3: The AADC Binary Kernel Is Produced

AADC generates optimized machine code directly:

  • AADC emits an AVX2/AVX512 vectorized kernel
  • Kernels are generated for both the forward and adjoint passes
  • Code is compressed for cache optimization
  • The kernel is thread-safe by construction

Step 4: Execution Phase

The generated kernel is used for all subsequent calculations:

  • Process multiple scenarios in parallel
  • Vectorize across Monte Carlo paths
  • Multi-thread across CPU cores
  • Compute all sensitivities simultaneously
  • Kernels are serializable for cloud execution
  • Kernels contain no trade, portfolio, or model data - just binary code representing a specific calculation

Step 5 (Optional): Troubleshooting

In most cases, results match the original code to within numerical precision, but if discrepancies arise:

  • A debugging tool shows where the discrepancy occurred
  • The debugger also works for the adjoint pass
  • Missed branches are flagged automatically

Performance Characteristics

| Metric | Traditional OO Code | Tape-Based AAD | AADC |
|---|---|---|---|
| Original function speed | Baseline (1x) | 0.5x (slower due to tape) | 6-100x faster |
| Adjoint factor | N/A (no AAD) | 2-5x | <1x |
| Memory usage | Baseline | High (tape storage) | Low |
| Vectorization | Compiler-dependent | Difficult/impossible | Automatic AVX2/AVX512 |
| Multi-threading | Manual, error-prone | Complex tape management | Automatic, thread-safe |
| Second-order Greeks | Slow (double bump) | Cannot accelerate | Fast (bump the AAD delta) |
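The "bump the AAD delta" entry for second-order Greeks can be sketched as follows (the `value_and_delta` payoff is a stand-in for an AAD kernel, not AADC's API): since the first derivative comes from the adjoint, gamma needs only one central-difference bump of delta instead of a double bump of the function itself.

```python
# Sketch of "bump the AAD delta" for second-order Greeks: the first
# derivative comes from the adjoint pass, so only one finite-difference
# bump is needed on top of it.

def value_and_delta(s, k=100.0):
    # Stand-in for an AAD kernel returning f(s) = (s - k)**3 and df/ds
    d = s - k
    return d ** 3, 3.0 * d ** 2

def gamma_bump_aad(s, h=1e-4):
    _, delta_up = value_and_delta(s + h)
    _, delta_dn = value_and_delta(s - h)
    return (delta_up - delta_dn) / (2.0 * h)   # d2f/ds2

g = gamma_bump_aad(105.0)   # exact gamma here: 6 * (s - k) = 30.0
```

Two kernel calls per gamma, each already accelerated, is what makes second-order risk tractable compared with a slow double bump of an unaccelerated pricer.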

What AADC Enables

Pricing & Risk

  • Real-time portfolio valuation
  • Live Risk with intraday updates
  • Complete Greeks (delta through vanna)
  • XVA calculations (CVA, DVA, FVA)
  • FRTB sensitivities

Analysis & Testing

  • Thousands of what-if scenarios
  • Stress testing at scale
  • Historical VaR with full revaluation
  • Back-testing with Greeks
  • Model calibration

See AADC in Action

Experience how AADC transforms your analytics performance

Request a Demo

info@matlogica.com
