CyberSecQwen-4B: Why Defensive Cyber Needs Small, Specialized, Locally-Runnable Models
## Why this matters

Frontier models are very good at a great many things. They are also expensive to call, ship every prompt off to someone else's datacenter, and are explicitly trained to refuse the messy edge cases a real defender lives in: incident write-ups, attacker-grade payloads found in your own logs, vulnerability disclosure drafts. Defensive cybersecurity is not a place where any of those tradeoffs are acceptable.

- Sensitive evidence stays internal. A SOC analyst triaging a leaked credential dump, a malware reverse-engineer dissecting a sample, a vulnerability researcher writing up a CVE — none of them should be pasting that content into a hosted API. The data itself can be the breach.
- Per-call API cost compounds. A mid-size SOC processes thousands of low-confidence alerts per day. Hosted-API costs for "explain this CVE" or "what CWE applies here" turn defensive automation into a budget question.
- Air-gapped and partially connected environments are the rule, not the exception, in critical infrastructure, healthcare, and government work. If your tooling can't run on a laptop or a single on-prem GPU, it doesn't ship there.
- Adversaries are getting more automated. Ransomware gangs use LLMs to draft phishing in 30 languages; bug-bounty automators chain agentic tools to fuzz, triage, and exploit faster than humans can review. Defense at the same speed needs models defenders own and can run.

So: local matters. But "local" alone isn't enough.

## Why a small specialized model, not just a small model

A 70B generalist running locally on four GPUs is "local", but it isn't deployable. A 4B generalist running locally on a single consumer GPU is deployable, but it doesn't beat the 8B specialist on the work you actually need it to do.
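The deployability argument above comes down to simple weight-memory arithmetic. A minimal sketch, counting weights only at bf16 (2 bytes per parameter) and deliberately ignoring KV cache and activation memory, which add real overhead on top:

```python
def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate VRAM needed just to hold the model weights.

    Assumes bf16 storage (2 bytes/param); KV cache and activations
    are extra, so real headroom requirements are higher.
    """
    return params_billions * 1e9 * bytes_per_param / 1024**3

for name, size_b in [("70B generalist", 70.0),
                     ("8B specialist", 8.0),
                     ("4B fine-tune", 4.0)]:
    print(f"{name}: ~{weight_memory_gb(size_b):.1f} GB of bf16 weights")
```

By this rough count a 4B model needs about 7.5 GB for weights, leaving room on a 12 GB consumer card for KV cache; an 8B model's ~15 GB of weights alone already overflows it, and a 70B model needs multi-GPU sharding before it loads at all.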
The bet behind CyberSecQwen-4B is that for narrow, well-evaluated cyber threat intelligence tasks — CWE classification, CVE-to-CWE mapping, structured CTI Q&A — a careful 4B fine-tune can match or beat an 8B specialist while fitting on a 12 GB consumer card.

We tested this against the strongest public baseline we could find: Cisco's Foundation-Sec-Instruct-8B, evaluated under their own published protocol on CTI-Bench. CyberSecQwen-4B retains 97.3% of Foundation-Sec-Instruct-8B's CTI-RCM accuracy while exceeding its CTI-MCQ score by +8.7 points, at half the parameter count. That's the only number that should matter to a defender choosing what to deploy.

## A 5-minute walkthrough

The 5-minute video below walks through the training methodology, the AMD MI300X workflow, and the benchmark results in a more visual format. If you'd rather read everything in detail, the rest of the post covers the same ground with the exact configs.

## Why AMD MI300X

The whole pipeline — training, adapter merging, evaluation — runs end-to-end on a single AMD Instinct MI300X 192 GB instance via the AMD Developer Cloud. The combination of 192 GB of HBM3 and ROCm 7's vLLM stack means we never had to think about quantization tricks, gradient checkpointing, or splitting the model across devices. Full bf16, FlashAttention-2 forward+backward, batch size of 4, sequence length…