🐍

【ML】Mamba Introduction Part1

Published on 2024/06/10

Machine Learning

mamba

tech

1. Introduction

I heard about the new machine learning model "Mamba". It has a very interesting architecture and great potential.
The model already seems to achieve higher performance and faster training and inference, which is exciting. Let's take a look inside the model.

2. Overview

2.1 SSM (State Space Model) architecture

Mamba adopts a "Selective SSM (State Space Model)" architecture, which attends only to the information it needs and thereby improves computational efficiency.
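To make the idea of selectivity a little more concrete, here is a minimal sketch of a single-channel selective state space recurrence in plain Python. It is not Mamba's actual implementation: the weight names (`w_b`, `w_c`, `w_delta`, `b_delta`), the scalar projections, and the simplified discretization are all assumptions chosen for illustration. What it tries to show is that the step size and the B and C terms are computed from the current input, so the model can decide per token how much to write into its state and how much to read out.

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z)); keeps the step size positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_ssm_scan(xs, a, w_b, w_c, w_delta, b_delta=0.0):
    """Run a toy single-channel selective SSM over a list of floats.

    a        : diagonal of the state matrix A (negative values, one per state dim)
    w_b, w_c : hypothetical weights that make B_t and C_t depend on the input
    w_delta  : hypothetical weight that makes the step size depend on the input
    """
    n = len(a)
    h = [0.0] * n  # hidden state, fixed size regardless of sequence length
    ys = []
    for x in xs:
        # Input-dependent step size: this is the "selective" part.
        delta = softplus(w_delta * x + b_delta)
        y = 0.0
        for i in range(n):
            a_bar = math.exp(delta * a[i])   # discretized state transition
            b_bar = delta * (w_b[i] * x)     # simplified discretized input term
            h[i] = a_bar * h[i] + b_bar * x  # state update
            y += (w_c[i] * x) * h[i]         # input-dependent readout
        ys.append(y)
    return ys


# Example call with made-up weights (two state dimensions).
outputs = selective_ssm_scan(
    xs=[1.0, 0.5, -0.3, 2.0],
    a=[-1.0, -0.5],
    w_b=[1.0, 0.5],
    w_c=[0.8, -0.2],
    w_delta=1.0,
)
print(outputs)
```

The loop visits each token once and keeps a state of fixed size, which is where the efficiency of this family of models comes from. The real model works on many channels at once and uses learned projections instead of the scalar weights assumed here.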

2.2 High-speed inference

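Mamba achieves high-speed inference (about 5x that of a Transformer) and avoids the rapid growth in cost that comes with long inputs: its computational cost increases in proportion to the input sequence length. (This has been observed for sequences up to 1000k in length.)

By contrast, the computational cost of a conventional model such as the Transformer increases with the square of the input length.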
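To get a feel for the difference between linear and quadratic growth, the following small sketch counts idealized operations for both cases. The function names and the state size of 16 are assumptions made for this example, and the numbers are rough operation counts, not measured benchmarks.

```python
def attention_pairwise_ops(seq_len):
    # Self-attention compares every token with every other token: L * L pairs.
    return seq_len * seq_len


def recurrent_scan_ops(seq_len, state_dim=16):
    # A recurrent scan does a fixed amount of work per token: L * N state updates.
    # state_dim=16 is an illustrative assumption.
    return seq_len * state_dim


for seq_len in (1_000, 10_000, 100_000, 1_000_000):
    quad = attention_pairwise_ops(seq_len)
    lin = recurrent_scan_ops(seq_len)
    print(f"L={seq_len:>9,}  quadratic={quad:.1e}  linear={lin:.1e}  ratio={quad / lin:,.0f}x")
```

Doubling the sequence length doubles the linear count but quadruples the quadratic one, so the gap widens as the input gets longer.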

2.3 Optimal memory usage

By adopting a GPU memory access scheme borrowed from FlashAttention, together with hardware-optimized access, Mamba reduces the amount of memory it needs.

Overall, Mamba has the potential to surpass the Transformer model, and further research is anticipated. (Of course, Mamba already achieves great results today.)

That's all for Part 1. I'll continue this article in the next part.
Part 2 is expected to cover Mamba's principles (architecture).