Abstract
Reinforcement learning (RL) research requires diverse, challenging environments that are both tractable and scalable. While modern video games may offer rich dynamics, they are computationally expensive and poorly suited for large-scale experimentation due to their CPU-bound execution. We introduce OCTAX, a high-performance suite of classic arcade game environments implemented in JAX and based on CHIP-8 emulation. CHIP-8 is a predecessor to Atari, whose games are widely adopted as a benchmark in RL research.
OCTAX provides the JAX community with a long-awaited end-to-end GPU alternative to Atari games: image-based environments spanning puzzle, action, and strategy genres, all executable at massive scale on modern GPUs. Our JAX-based implementation achieves orders-of-magnitude speedups over traditional CPU emulators. We demonstrate OCTAX's capabilities by training RL agents across multiple games, showing substantially faster training and better scalability than existing solutions.
OCTAX's modular design lets researchers easily extend the suite with new games or generate novel environments using large language models, making it an ideal platform for large-scale RL experimentation.
Why OCTAX?
The Computational Bottleneck in RL Research
Modern RL research demands extensive experimentation to achieve statistical validity, yet computational constraints severely limit experimental scale. Because training is so costly, RL papers routinely report results over fewer than five random seeds. The Rainbow paper alone required 34,200 GPU hours of experiments, a cost far beyond the budget of small research laboratories.
End-to-End GPU Acceleration
Fully vectorized CHIP-8 emulation in JAX enables running thousands of game instances in parallel on modern GPUs, eliminating CPU-GPU transfer bottlenecks.
Authentic Game Mechanics
Perfect behavioral fidelity to original CHIP-8 games, providing Atari-like visual complexity with significantly reduced computational overhead.
Diverse Challenge Portfolio
20+ games spanning puzzle, action, strategy, exploration, and shooter genres — testing everything from long-horizon planning to reactive control.
LLM-Assisted Generation
A pipeline for automated environment generation in which large language models write CHIP-8 assembly code directly, enabling curriculum learning research.
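The "End-to-End GPU Acceleration" point rests on a standard JAX pattern: write the step function for a single environment as a pure function, lift it to a batch with jax.vmap, compile it with jax.jit, and roll it out inside jax.lax.scan so observations, actions, and rewards never leave the device. The sketch below shows that pattern on a deliberately tiny, hypothetical environment (a 64×32 screen on which one pixel is toggled by XOR, as CHIP-8 sprites are drawn). The functions reset, step, and rollout and the action mapping are illustrative assumptions, not OCTAX's API.

```python
import jax
import jax.numpy as jnp

WIDTH, HEIGHT = 64, 32  # CHIP-8 display resolution


def reset(key):
    """Hypothetical single-environment reset: random player column, blank screen."""
    x = jax.random.randint(key, (), 0, WIDTH)
    screen = jnp.zeros((HEIGHT, WIDTH), dtype=jnp.bool_)
    return {"x": x, "screen": screen}


def step(state, action):
    """Hypothetical single-environment step (0 = left, 1 = stay, 2 = right)."""
    x = jnp.clip(state["x"] + action - 1, 0, WIDTH - 1)
    # CHIP-8 draws by XOR-ing sprite pixels onto the display; toggle one pixel.
    screen = state["screen"].at[HEIGHT - 1, x].set(~state["screen"][HEIGHT - 1, x])
    reward = (x == WIDTH // 2).astype(jnp.float32)
    return {"x": x, "screen": screen}, reward


# vmap lifts the single-env functions to a batch; jit compiles them for the device.
batched_reset = jax.jit(jax.vmap(reset))
batched_step = jax.jit(jax.vmap(step))


def rollout(states, key, num_steps):
    """Run num_steps random-action steps for every env inside one lax.scan,
    so the whole loop stays on the accelerator."""
    num_envs = states["x"].shape[0]

    def body(s, k):
        actions = jax.random.randint(k, (num_envs,), 0, 3)
        return batched_step(s, actions)  # (new carry, per-env rewards)

    return jax.lax.scan(body, states, jax.random.split(key, num_steps))


num_envs = 8192
states = batched_reset(jax.random.split(jax.random.PRNGKey(0), num_envs))
final_states, rewards = rollout(states, jax.random.PRNGKey(1), num_steps=32)
```

The batch size is just the leading axis of the state pytree, so scaling from 8 to 8,192 environments changes only num_envs; rewards comes back with shape (num_steps, num_envs).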
System Architecture
CHIP-8, a 1970s virtual machine specification, provides several research advantages:
- 64×32 monochrome display — Image-based observations without overwhelming computational resources
- 4KB memory footprint — Allows thousands of simultaneous game instances
- 35-instruction set — Reduces emulation overhead compared to modern processors
- Deterministic execution — Ensures experimental reproducibility across hardware
- 16-key input system — Sufficient complexity for interesting control challenges
Performance Results
Key Performance Findings
- 14× faster than EnvPool at 8,192 parallel environments
- 350,000 steps/second (1.4 million frames/second) on a consumer-grade RTX 3090
- ~2 MB of GPU memory per environment, so memory use scales linearly with the number of environments
OCTAX achieves near-linear scaling up to 350,000 steps/second with 8,192 parallel environments on a consumer-grade RTX 3090.
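A steps-per-second figure of this kind can be measured by timing a jitted, batched rollout after a warm-up call, so that compilation time is excluded and jax.block_until_ready forces the asynchronous GPU work to finish before the clock stops. The sketch below assumes a hypothetical toy_step stand-in rather than a real CHIP-8 step; it illustrates the measurement pattern only and is not the benchmarking protocol used for the numbers above.

```python
import time

import jax
import jax.numpy as jnp


def toy_step(state, action):
    """Hypothetical stand-in for one emulator step on a single environment."""
    new = (state * 31 + action) % 251
    return new, (new == 0).astype(jnp.float32)


def make_rollout(num_envs, num_steps):
    batched = jax.vmap(toy_step)

    @jax.jit
    def rollout(state, key):
        def body(s, k):
            a = jax.random.randint(k, (num_envs,), 0, 16)  # 16-key action space
            s, r = batched(s, a)
            return s, r.sum()
        return jax.lax.scan(body, state, jax.random.split(key, num_steps))

    return rollout


def measure_steps_per_second(num_envs=8192, num_steps=1000):
    rollout = make_rollout(num_envs, num_steps)
    state = jnp.zeros(num_envs, dtype=jnp.int32)
    key = jax.random.PRNGKey(0)
    jax.block_until_ready(rollout(state, key))  # warm-up: triggers compilation
    start = time.perf_counter()
    jax.block_until_ready(rollout(state, key))
    elapsed = time.perf_counter() - start
    return num_envs * num_steps / elapsed       # environment steps per second
```

Two figures above follow from simple arithmetic: 1.4 million frames/second divided by 350,000 steps/second implies four emulator frames per agent step, and 8,192 environments at ~2 MB each come to roughly 16 GB, within the RTX 3090's 24 GB of memory.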
Training Results
PPO and PQN learning curves across 16 games — interquartile mean returns with 10th-90th percentile ranges over 5M timesteps, computed across 12 random seeds.
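For readers reproducing these curves, the interquartile mean (IQM) is the mean of the middle 50% of runs, which makes it robust to a few outlier seeds. The sketch below assumes returns are stored as an array of shape (num_seeds, num_checkpoints) for one game and uses the same trimming rule as scipy.stats.trim_mean with proportion 0.25 (with 12 seeds, the 3 lowest and 3 highest are dropped). The function names are hypothetical, and any normalization or cross-game aggregation used in the paper is not covered here.

```python
import jax.numpy as jnp


def iqm(x, axis=0):
    """Interquartile mean: mean of the middle 50% of values along `axis`."""
    x = jnp.sort(x, axis=axis)
    n = x.shape[axis]
    k = int(0.25 * n)  # number of values trimmed from each end
    middle = jnp.take(x, jnp.arange(k, n - k), axis=axis)
    return middle.mean(axis=axis)


def summarize(returns):
    """returns: (num_seeds, num_checkpoints) array of episode returns.
    Produces the IQM curve and a 10th-90th percentile band per checkpoint."""
    center = iqm(returns, axis=0)
    low, high = jnp.percentile(returns, jnp.array([10.0, 90.0]), axis=0)
    return center, low, high


# Example: 12 seeds, one checkpoint, returns 0..11.
center, low, high = summarize(jnp.arange(12.0).reshape(12, 1))
```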
BibTeX
@article{radji2025octax,
title={Octax: Accelerated CHIP-8 Arcade Environments for Reinforcement Learning in JAX},
author={Radji, Waris and Michel, Thomas and Piteau, Hector},
journal={arXiv preprint arXiv:2510.01764},
year={2025}
}
Acknowledgements
OCTAX builds upon the rich history of CHIP-8 games from the CHIP-8 Database. This work was inspired by the Arcade Learning Environment and is built with JAX, Flax, and Optax.
We thank the JAX and RL research communities for their invaluable open-source contributions.