Ayi NEDJIMI - Cybersecurity & AI Tools

KVortex

Production-grade C++23 VRAM-to-RAM KV-cache offloader for vLLM

Overview

KVortex is a high-performance C++23 library that offloads KV-cache tensors from GPU VRAM to host RAM. This allows larger context windows and better GPU memory utilization for LLM inference with vLLM, without purchasing additional GPU hardware.

The problem it solves: Running large language models (LLaMA 3, Mistral, Qwen) with long context windows (32K-128K tokens) exhausts GPU VRAM quickly. KVortex transparently moves inactive KV-cache blocks to system RAM, then restores them when needed with minimal latency impact.
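
To see why long contexts exhaust VRAM, a back-of-the-envelope estimate of KV-cache size helps. The sketch below assumes the published Llama 3.1 8B shape (32 layers, 8 KV heads, head dimension 128) and FP16 storage. The function name is illustrative and not part of KVortex.

```python
def kv_cache_bytes(num_tokens, num_layers=32, num_kv_heads=8,
                   head_dim=128, bytes_per_elem=2):
    """Approximate KV-cache size in bytes for one sequence.

    Defaults are assumptions: a Llama 3.1 8B-style model stored in FP16.
    The leading factor 2 accounts for storing both keys and values.
    """
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return num_tokens * per_token


GIB = 1024 ** 3
for ctx in (32 * 1024, 128 * 1024):
    print(f"{ctx // 1024}K tokens: {kv_cache_bytes(ctx) / GIB:.1f} GiB")
# 32K tokens: 4.0 GiB
# 128K tokens: 16.0 GiB
```

Under these assumptions, a single 128K-token sequence needs about 16 GiB for its KV cache alone. Added to roughly 16 GB of FP16 weights, that exceeds the 24 GB of an RTX 3090, which is the gap that offloading to host RAM is meant to close.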


Key Features


Performance Benchmarks

Metric                          Target        Achieved
GPU to CPU bandwidth            20 GB/s       22.4 GB/s
Cache-hit TTFT improvement      6x faster     6.2x
Block scheduling latency        < 10 µs       7.3 µs
Memory leaks                    0 bytes       0 bytes
Effective context (RTX 3090)    32K tokens    256K tokens
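
As a rough guide to what the bandwidth figure means per block, the sketch below computes an ideal copy time from a block size and a sustained bandwidth. The block shape is an assumption (16 tokens of one layer, 8 KV heads, head dimension 128, FP16); KVortex's actual block layout may differ. The estimate ignores kernel launch and synchronization overhead.

```python
def transfer_time_us(num_bytes, bandwidth_gb_per_s):
    """Ideal copy time in microseconds at a sustained bandwidth in GB/s."""
    return num_bytes / (bandwidth_gb_per_s * 1e9) * 1e6


# Hypothetical block: keys and values (factor 2) for 16 tokens of one layer,
# 8 KV heads, head dimension 128, 2 bytes per FP16 element.
block_bytes = 2 * 16 * 8 * 128 * 2  # 65,536 bytes

print(f"{transfer_time_us(block_bytes, 22.4):.1f} us per block")
```

At the measured 22.4 GB/s, a 64 KiB block of this shape would take roughly 3 µs to copy. Larger blocks scale linearly.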

Requirements


Quick Start

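# Fetch the source
git clone https://github.com/ayinedjimi/KVortex.git
cd KVortex
# Configure a Release build; CUDA_ARCH=86 targets compute capability 8.6 (e.g. RTX 3090)
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DCUDA_ARCH=86
# Compile using all available CPU cores
make -j$(nproc)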

Python integration with vLLM:

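from vllm import LLM
from kvortex import KVortexConfig

# Offloader settings
config = KVortexConfig(
    ram_pool_gb=32,              # host RAM pool for offloaded KV-cache blocks, in GB
    num_cuda_streams=4,          # CUDA streams used for VRAM <-> RAM copies
    eviction_policy="lru_freq"   # policy for choosing which blocks to offload
)

# Select KVortex as the KV connector when constructing the engine
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    kv_connector="kvortex",
    kv_connector_config=config
)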
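
The eviction_policy name "lru_freq" suggests a policy that weighs both recency and access frequency. The toy model below is one possible reading, not KVortex's implementation: it offloads the block with the fewest accesses, breaking ties by least-recent use, and restores a block from host RAM on a miss. The class and method names are hypothetical, and plain dictionaries stand in for VRAM and RAM.

```python
import itertools


class LruFreqPool:
    """Toy GPU block pool that offloads to host RAM (illustrative only)."""

    def __init__(self, gpu_capacity):
        self.gpu_capacity = gpu_capacity
        self.gpu = {}    # block_id -> payload resident in "VRAM"
        self.host = {}   # block_id -> payload offloaded to "RAM"
        self.stats = {}  # block_id -> [access_count, last_access_tick]
        self._clock = itertools.count()

    def _touch(self, block_id):
        entry = self.stats.setdefault(block_id, [0, 0])
        entry[0] += 1
        entry[1] = next(self._clock)

    def _evict_one(self):
        # Lowest access count first; ties go to the least recently used block.
        victim = min(self.gpu, key=lambda b: tuple(self.stats[b]))
        self.host[victim] = self.gpu.pop(victim)

    def put(self, block_id, payload):
        if block_id not in self.gpu and len(self.gpu) >= self.gpu_capacity:
            self._evict_one()
        self.host.pop(block_id, None)  # drop any stale offloaded copy
        self.gpu[block_id] = payload
        self._touch(block_id)

    def get(self, block_id):
        if block_id in self.gpu:            # hit: block is already resident
            self._touch(block_id)
            return self.gpu[block_id]
        payload = self.host.pop(block_id)   # miss: restore from host RAM
        self.put(block_id, payload)
        return payload


pool = LruFreqPool(gpu_capacity=2)
pool.put("a", b"A")
pool.put("b", b"B")
pool.get("a")            # "a" now has two accesses, "b" has one
pool.put("c", b"C")      # pool is full, so "b" is offloaded
print(sorted(pool.gpu), sorted(pool.host))  # ['a', 'c'] ['b']
```

A later pool.get("b") would restore "b" and offload "c", since "c" then has the fewest accesses. A real offloader would replace the dictionary moves with asynchronous copies between device memory and pinned host memory.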

Frequently Asked Questions

Does KVortex work with any vLLM model?
Yes. Any model supported by vLLM 0.15 or later works without modification.

What is the latency overhead?
A cache miss on a block held in RAM costs about 7 µs. For blocks in the NVMe tier, expect 50-200 µs per block. Cache hits add near-zero overhead.

Is it production-ready?
Yes. v1.0 has been tested on production workloads with LLaMA 3.1 8B and Mistral 7B, with zero memory leaks.

Can I use it with multiple GPUs?
Not yet. v1.0 supports a single GPU only; multi-GPU support is planned for v2.0.


2026 Ayi NEDJIMI