Mamba Long Context Training
Explore how Mamba, a model whose cost scales linearly with sequence length, is trained for long contexts using DeepSpeed and SlimPajama, demonstrating its potential for efficient, accurate long-sequence processing.
Mamba is a selective state space model architecture that performs competitively with transformers on benchmarks while scaling linearly with sequence length. It holds great promise for processing long input sequences, but the base model was pretrained only at a context length of 2,048 tokens. This project continues pretraining Mamba on longer sequences from the SlimPajama dataset to test whether the model can process long contexts accurately.
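To make the linear-scaling claim concrete, here is a minimal sketch of a state space recurrence in plain Python. It is a toy illustration, not Mamba's actual implementation: the function name `selective_scan`, the single input channel, the fixed decay rates, and the softplus step size are all assumptions chosen for readability. The point it shows is that each token triggers one fixed-cost state update, so total work grows in proportion to sequence length.

```python
import math


def selective_scan(xs, state_dim=4):
    """Toy single-channel scan: one constant-cost state update per token.

    Hypothetical simplification for illustration only. A real selective
    state space layer also makes its input and output projections depend
    on the token and runs over many channels in a fused kernel.
    """
    h = [0.0] * state_dim  # hidden state, fixed size regardless of length
    ys = []
    steps = 0
    for x in xs:
        # Input-dependent step size; softplus keeps it positive.
        delta = math.log1p(math.exp(x))
        for n in range(state_dim):
            # Assumed fixed decay rate -(n + 1), discretized with delta.
            a = math.exp(-delta * (n + 1))
            h[n] = a * h[n] + delta * x
        ys.append(sum(h))  # assumed readout: sum of state entries
        steps += 1
    return ys, steps
```

Calling `selective_scan` on a 2,048-token input and then on an 8,192-token input performs exactly four times as many state updates, while the state itself stays the same size. Attention, by contrast, compares every token with every other token, so its cost grows quadratically with length.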