anton on Twitter: “Interesting, George Hotz mentioning GPT-4 size/architecture in a recent podcast …
GPT-4: 8 x 220B experts trained with different data/task distributions and 16-iter inference. Glad that Geohot said it out loud.
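The “8 x 220B experts” claim reads like a mixture-of-experts setup, which taken at face value would put the total parameter count around 1.76T, though the tweet does not spell out how the “16-iter inference” works. As a rough illustration only (not the rumored model’s actual implementation), here is a minimal sketch of generic top-k expert routing; the class name, layer sizes, and top-k value are arbitrary assumptions, with only the expert count of 8 mirroring the rumor.

```python
# Illustrative top-k mixture-of-experts routing. All sizes are tiny stand-ins;
# only n_experts=8 echoes the rumored GPT-4 configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, d_hidden=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # token -> expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        scores = self.router(x)                # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e       # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

if __name__ == "__main__":
    moe = TinyMoE()
    tokens = torch.randn(10, 64)
    print(moe(tokens).shape)                   # torch.Size([10, 64])
```

Each token is scored by a small router, only its top-k experts run, and their outputs are blended by the renormalized routing weights; that sparsity is what lets total parameter count grow far beyond the compute used per token.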