From Machine Learning

Introducing AutoMuon, a one-line drop-in for AdamW [P]

Hey everyone, I've been working on a small Python package called AutoMuon that makes the Muon optimizer usable as a drop-in replacement for AdamW in arbitrary PyTorch training pipelines.

The core idea is relatively simple: Muon is meant for the 2D weight matrices of hidden layers (linear projections, conv layers), but you still need AdamW for embeddings, norms, biases, etc. AutoMuon scans your model at init and figures out the right optimizer for each parameter automatically.
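To make the idea concrete, here is a rough sketch of that kind of parameter split in plain PyTorch. This is not AutoMuon's actual code or API: `split_params` and `ADAMW_ONLY_MODULES` are hypothetical names made up for illustration, and the exclusion list is only a guess at what a reasonable default might look like.

```python
import torch
from torch import nn

# Hypothetical exclusion list (an assumption, not AutoMuon's real one):
# parameters owned by these module types always stay on AdamW.
ADAMW_ONLY_MODULES = (nn.Embedding, nn.LayerNorm, nn.BatchNorm1d, nn.BatchNorm2d)


def split_params(model: nn.Module):
    """Partition trainable parameters into (muon_params, adamw_params)."""
    muon_params, adamw_params = [], []
    seen = set()  # guards against tied/shared parameters being added twice
    for module in model.modules():
        for param in module.parameters(recurse=False):
            if not param.requires_grad or id(param) in seen:
                continue
            seen.add(id(param))
            # Weight tensors with 2+ dims outside excluded modules go to Muon;
            # biases, norm gains, and embeddings go to AdamW.
            if param.ndim >= 2 and not isinstance(module, ADAMW_ONLY_MODULES):
                muon_params.append(param)
            else:
                adamw_params.append(param)
    return muon_params, adamw_params


model = nn.Sequential(
    nn.Embedding(100, 16),
    nn.Linear(16, 32),
    nn.LayerNorm(32),
    nn.Linear(32, 10),
)
muon_params, adamw_params = split_params(model)

# The AdamW group can go straight into the stock optimizer; the Muon group
# would be handed to whichever Muon implementation you use.
adamw = torch.optim.AdamW(adamw_params, lr=1e-3)
```

On this toy model the two `nn.Linear` weights land in the Muon group and everything else goes to AdamW. The sketch does not special-case the output head, which Muon users often keep on AdamW as well, so a real version would need a rule for that too.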

I am open to PRs, especially for expanding the module-type exclusion list if you hit edge cases in your architecture. Would love to know if anyone tries it on something other than transformers or CNNs and what they find. I suspect it would struggle with fully custom architectures such as flash-linear-attention, which would require some user tuning.

I am planning to add more tests for time series forecasting, genomics, language modeling, etc. I want to see how generalizable Muon really is!

https://github.com/SkyeGunasekaran/automuon

pip install git+https://github.com/SkyeGunasekaran/automuon.git

submitted by /u/Skye7821