Introduction

Today, we are excited to announce the release of Qwen3, the latest addition to the Qwen family of large language models. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math, general capabilities, and more, when compared with other top-tier models such as DeepSeek-R1, o1, o3-mini, Grok-3, and Gemini-2.5-Pro. Additionally, the small MoE model Qwen3-30B-A3B outcompetes QwQ-32B, which uses ten times as many activated parameters, and even a tiny model like Qwen3-4B can rival the performance of Qwen2.5-72B-Instruct.
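The excerpt above doesn't include usage instructions, so here is a minimal sketch of running the small MoE model through Hugging Face transformers. The checkpoint ID Qwen/Qwen3-30B-A3B is an assumption inferred from the model name in the announcement, and the generation settings are illustrative defaults rather than recommended ones.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Checkpoint ID assumed from the announcement's naming; adjust if it differs.
model_name = "Qwen/Qwen3-30B-A3B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" needs the `accelerate` package; dtype is read from the checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Briefly explain mixture-of-experts models."}]
# Render the chat turns with the model's own chat template.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=512)
# Strip the prompt tokens so only the model's reply is printed.
print(tokenizer.decode(output[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```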
I’ve had really good luck with ChatGPT-4o, and, to be fair, I have teased some decent responses out of DeepSeek-V3 (IIRC). Different ways of expanding on the basic principle of asking it to “step back and visualize different options before moving forward and fully implementing them with all necessary code, following best practices, etc.” tend to get pretty good results.
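For what it's worth, here is a minimal sketch of that "step back first" principle wired up as a system prompt via the OpenAI Python SDK. The prompt wording is my own paraphrase of the quoted instruction, and the task in the user message is just a placeholder.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical system prompt expanding the "step back and visualize options" idea.
SYSTEM_PROMPT = (
    "Before writing any code, step back and outline two or three different "
    "approaches to the task. Compare their trade-offs, pick one, and only then "
    "implement it fully, with all necessary code, following best practices."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Write a token-bucket rate limiter in Python."},
    ],
)
print(response.choices[0].message.content)
```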
That might just be LLMs in general. ChatGPT does the same. Copilot is a little better tuned, but I really only ever have it do boilerplate.