Artificial Intelligence

The DeepSeek Jailbreaking Concerns Highlight The Importance of Red Teaming

Published: Mar. 18, 2025

In the competitive landscape of AI development, rapid innovation can go hand-in-hand with vulnerabilities. Evolving legal requirements and regulatory frameworks are making AI security a key concern for those responsible for risk management. For Large Language Models (LLMs) the security concerns center around adverse attacks by users to generate outputs they know are restricted or forbidden.

Developmental testing – called red teaming – involves simulating malicious attacks on LLMs to identify vulnerabilities before they can be exploited. This proactive approach is essential for developing robust AI systems that can withstand potential threats. Once the model is launched and outside users attempt the same techniques, it is known as jailbreaking – the deliberate attempt to seek prohibited performance outcomes. Recently, jailbreaking been in the spotlight because of the extreme results noted when applied to the newly release DeepSeek LLM. 

DeepSeek, an AI model developed by a Chinese company of the same name, has rapidly emerged as a notable player in the LLM marketplace. Released in two major versions—DeepSeek-V3 on December 25, 2024, and DeepSeek-R1 in January 2025—DeepSeek immediately attracted significant attention for its advanced reasoning capabilities and cost-efficient training methods. (Although, those training methods are the source of further controversy due to the speculated use of competitor models to train the model). 

Despite its impressive performance, however, it quickly became known for its exceptional vulnerability to common jailbreaking techniques that other LLMs have long since been hardened against. Researchers from CiscoWallarm, and Palo Alto’s Unit 42 have raised serious concerns regarding the strength of its guardrails and susceptibility to jailbreaking, due to the ease of bypassing the built-in security restrictions.

Weaknesses that are found due to the inherent limitations and behavioral patterns of LLMs reveal the vulnerabilities in that system’s design and oversight. For instance, when a user asks an AI model to generate content that it has been instructed to avoid, such as information supporting illegal activities, a successful jailbreak would mean the user successfully elicited the the requested information despite the system’s safeguards. 

DeepSeek Jailbreaking Outcomes

Using common jailbreaking techniques, various researchers bypassed DeepSeek’s guardrails at high rates with minimal specialized knowledge. The outcomes were concerning and multifaceted. As noted in the testing reported above, researchers coerced DeepSeek into providing detailed instructions for creating malicious software, as well as malicious code snippets. The model also demonstrated how to perform various stages of potential cyberattacks, from initial compromise to post-exploitation activities. Perhaps most alarmingly, DeepSeek generated detailed steps for creating improvised incendiary devices such as Molotov cocktails. These vulnerabilities, when present in an enterprise platform, can lead to data breaches at a minimum and significant financial losses, legal liability, and widespread reputational damage in more extreme cases.

The ease with which researchers bypassed DeepSeek’s safeguards, particularly as compared to other commonly used LLMs, underscores the importance of comprehensive red teaming in AI development and deployment. Like white-hat hackers in the security domain, red-team testers attempt to simulate malicious, adversarial, or other unexpected system use to identify potential vulnerabilities before system deployment. Red teaming is also increasingly essential to align AI systems with evolving data protection and legislative AI standards, such as the EU AI Act and Colorado AI Act.

The recent DeepSeek jailbreaking incident serves as a case study in the risks these systems pose when not sufficiently tested and guarded against undesired or illegal outcomes. As LLMs become more prevalent in business operations and daily life, ensuring their security is crucial for both operational integrity and regulatory compliance.