Architecture: Transformer, MoE, Linear Attention
Components: Tokenizer, Norm
Model Family: BERT, GPT
Misc: Agent, SP, TP