OCTAX: Accelerated CHIP-8 Arcade Environments for Reinforcement Learning in JAX

Waris Radji¹, Thomas Michel¹, Hector Piteau²

¹Inria Scool ²Independent researcher

Accepted at the International Conference on Learning Representations 2026 (Poster)

Abstract

Reinforcement learning (RL) research requires diverse, challenging environments that are both tractable and scalable. While modern video games may offer rich dynamics, they are computationally expensive and poorly suited for large-scale experimentation due to their CPU-bound execution. We introduce OCTAX, a high-performance suite of classic arcade game environments implemented in JAX and built on emulation of CHIP-8, a predecessor of Atari, the platform widely adopted as a benchmark in RL research.

OCTAX provides the JAX community with a long-awaited end-to-end GPU alternative to Atari games: image-based environments that span puzzle, action, and strategy genres, all executable at massive scale on modern GPUs. Our JAX-based implementation achieves orders-of-magnitude speedups over traditional CPU emulators. We demonstrate OCTAX's capabilities by training RL agents across multiple games, showing significant improvements in training speed and scalability compared to existing solutions.

The environment's modular design enables researchers to easily extend the suite with new games or generate novel environments using large language models, making it an ideal platform for large-scale RL experimentation.

Why OCTAX?

The Computational Bottleneck in RL Research

Modern RL research demands extensive experimentation to achieve statistical validity, yet computational constraints severely limit experimental scale. RL papers routinely report results with fewer than five random seeds due to prohibitive training costs. The Rainbow paper alone required 34,200 GPU hours of experiments, a cost out of reach for small research laboratories.

End-to-End GPU Acceleration

Fully vectorized CHIP-8 emulation in JAX enables running thousands of game instances in parallel on modern GPUs, eliminating CPU-GPU transfer bottlenecks.
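The batching pattern behind this claim can be sketched with `jax.vmap` and `jax.jit`. The snippet below is a minimal illustration, not the OCTAX API: `toy_step` is a hypothetical stand-in for an emulator step, and the batch size of 1,024 is an arbitrary choice. State and actions are both device arrays, so nothing has to cross the CPU-GPU boundary between steps.

```python
import jax
import jax.numpy as jnp

NUM_ENVS = 1024          # illustrative batch size, not a figure from the paper
HEIGHT, WIDTH = 32, 64   # CHIP-8 display resolution


def toy_step(display, action):
    """Hypothetical stand-in for one emulator step: flip the pixel indexed by `action`."""
    row, col = action // WIDTH, action % WIDTH
    return display.at[row, col].set(~display[row, col])


# vmap maps the single-instance step over a leading batch axis; jit compiles
# the batched function so the whole batch advances in one device call.
batched_step = jax.jit(jax.vmap(toy_step))

displays = jnp.zeros((NUM_ENVS, HEIGHT, WIDTH), dtype=bool)
actions = jnp.arange(NUM_ENVS) % (HEIGHT * WIDTH)
displays = batched_step(displays, actions)

# Each instance flipped exactly one pixel from off to on.
lit_per_env = displays.sum(axis=(1, 2))
```

The step function is written for a single instance and knows nothing about the batch; a real emulator step would replace `toy_step` while the `jit(vmap(...))` wrapper stays the same.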

Authentic Game Mechanics

Perfect behavioral fidelity to original CHIP-8 games, providing Atari-like visual complexity with significantly reduced computational overhead.

Diverse Challenge Portfolio

20+ games spanning puzzle, action, strategy, exploration, and shooter genres — testing everything from long-horizon planning to reactive control.

LLM-Assisted Generation

Pipeline for automated environment generation using large language models that can directly output CHIP-8 assembly code, enabling curriculum learning research.

System Architecture

CHIP-8, a 1970s virtual machine specification, provides several research advantages:

64×32 monochrome display — Image-based observations without overwhelming computational resources
4KB memory footprint — Allows thousands of simultaneous game instances
35-instruction set — Reduces emulation overhead compared to modern processors
Deterministic execution — Ensures experimental reproducibility across hardware
16-key input system — Sufficient complexity for interesting control challenges
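To make these properties concrete, here is a minimal sketch of how a CHIP-8 machine state could be laid out as JAX arrays. The names `Chip8State`, `init_state`, `fetch`, and `step` are hypothetical and are not taken from the OCTAX codebase. The sketch handles only two of the 35 instructions, 6XNN (set VX to NN) and 7XNN (add NN to VX), and relies on two standard CHIP-8 conventions: programs load at address 0x200 and opcodes are two bytes, big-endian.

```python
from typing import NamedTuple

import jax.numpy as jnp


class Chip8State(NamedTuple):
    memory: jnp.ndarray     # (4096,) uint8, the 4KB address space
    registers: jnp.ndarray  # (16,) uint8, V0..VF
    pc: jnp.ndarray         # scalar int32 program counter
    display: jnp.ndarray    # (32, 64) bool, monochrome framebuffer
    keys: jnp.ndarray       # (16,) bool, hex keypad


def init_state(program):
    """Load a program at 0x200, the conventional CHIP-8 start address."""
    memory = jnp.zeros(4096, dtype=jnp.uint8)
    memory = memory.at[0x200:0x200 + program.shape[0]].set(program)
    return Chip8State(
        memory=memory,
        registers=jnp.zeros(16, dtype=jnp.uint8),
        pc=jnp.asarray(0x200, dtype=jnp.int32),
        display=jnp.zeros((32, 64), dtype=bool),
        keys=jnp.zeros(16, dtype=bool),
    )


def fetch(state):
    """Read the 16-bit big-endian opcode at the program counter."""
    hi = state.memory[state.pc].astype(jnp.int32)
    lo = state.memory[state.pc + 1].astype(jnp.int32)
    return (hi << 8) | lo


def step(state):
    """Execute one instruction; only 6XNN and 7XNN are handled in this sketch."""
    opcode = fetch(state)
    kind = opcode >> 12
    x = (opcode >> 8) & 0xF
    nn = (opcode & 0xFF).astype(jnp.uint8)
    vx = state.registers[x]
    # uint8 addition wraps modulo 256, matching 7XNN semantics (no carry flag).
    new_vx = jnp.where(kind == 0x6, nn, jnp.where(kind == 0x7, vx + nn, vx))
    return state._replace(registers=state.registers.at[x].set(new_vx), pc=state.pc + 2)


# VA = 5, then VA += 255, which wraps to 4.
program = jnp.array([0x6A, 0x05, 0x7A, 0xFF], dtype=jnp.uint8)
state = step(step(init_state(program)))
```

Because every field has a fixed shape and the step uses `jnp.where` rather than Python branching, a function of this form can be traced once and batched across instances.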

Performance Results

Performance scaling comparison between OCTAX and EnvPool

Key Performance Findings

14× faster than EnvPool at 8,192 parallel environments
350,000 steps/second (1.4 million frames/second) on consumer-grade RTX 3090
~2 MB of GPU memory per environment, so memory use grows linearly with the number of environments

OCTAX achieves near-linear scaling up to 350,000 steps/second with 8,192 parallel environments on a consumer-grade RTX 3090.
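The reported figures can be cross-checked with a few lines of arithmetic. The constants below are the numbers quoted above. Two things are assumptions rather than statements from the source: the 24 GB capacity is the RTX 3090's standard VRAM size, and "MB" is treated as MiB.

```python
STEPS_PER_SECOND = 350_000      # reported throughput
FRAMES_PER_SECOND = 1_400_000   # reported frame rate
NUM_ENVS = 8_192                # reported parallel environments
MB_PER_ENV = 2                  # reported as approximate (~2 MB)
RTX_3090_VRAM_MB = 24 * 1024    # assumption: 24 GB card, MB treated as MiB

# Ratio of emulated frames to agent steps.
frames_per_step = FRAMES_PER_SECOND / STEPS_PER_SECOND

# Throughput seen by each individual environment.
steps_per_env_per_second = STEPS_PER_SECOND / NUM_ENVS

# Total environment memory at the largest reported batch size.
total_env_memory_mb = NUM_ENVS * MB_PER_ENV
fits_on_card = total_env_memory_mb < RTX_3090_VRAM_MB

print(frames_per_step, round(steps_per_env_per_second, 1), total_env_memory_mb, fits_on_card)
```

The ratio works out to four frames per step, which is consistent with a frame skip of 4, although that reading is an inference from the two numbers. At roughly 2 MB each, 8,192 environments occupy about 16 GB, which leaves headroom on a 24 GB card.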

Training Results

PPO and PQN learning curves across 16 games — interquartile mean returns with 10th-90th percentile ranges over 5M timesteps, computed across 12 random seeds.
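For readers unfamiliar with these statistics, the sketch below shows one way to compute an interquartile mean and a 10th-90th percentile band across seeds. The function names and the synthetic `returns` array are illustrative assumptions, not the authors' evaluation code or data. The simple integer trimming is exact when the seed count is divisible by four, as it is with 12 seeds.

```python
import jax.numpy as jnp


def interquartile_mean(returns):
    """Mean of the middle 50% of seeds at each checkpoint.

    returns: array of shape (num_seeds, num_checkpoints).
    """
    n = returns.shape[0]
    cut = n // 4  # seeds dropped from each tail
    ordered = jnp.sort(returns, axis=0)
    return ordered[cut:n - cut].mean(axis=0)


def percentile_band(returns, low=10.0, high=90.0):
    """Lower and upper percentile across seeds, shape (2, num_checkpoints)."""
    return jnp.percentile(returns, jnp.array([low, high]), axis=0)


# Synthetic data: 12 seeds with returns 0..11 at each of 3 checkpoints.
returns = jnp.arange(12.0)[:, None] * jnp.ones((1, 3))
iqm = interquartile_mean(returns)
band = percentile_band(returns)
```

With 12 seeds, the three lowest and three highest returns are discarded at each checkpoint and the remaining six are averaged, which makes the curve less sensitive to outlier runs than a plain mean.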

BibTeX

@article{radji2025octax,
  title={Octax: Accelerated CHIP-8 Arcade Environments for Reinforcement Learning in JAX},
  author={Radji, Waris and Michel, Thomas and Piteau, Hector},
  journal={arXiv preprint arXiv:2510.01764},
  year={2025}
}

Acknowledgements

OCTAX builds upon the rich history of CHIP-8 games from the CHIP-8 Database. This work was inspired by the Arcade Learning Environment and is built with JAX, Flax, and Optax.

We thank the JAX and RL research communities for their invaluable open-source contributions.

Related JAX RL Environments

Brax

Gymnax

Jumanji

Pgx
