CYBERSECEVAL 3: Evaluating Security Risks of Large Language Models

CYBERSECEVAL 3: Evaluating Security Risks of Large Language Models

The launch of CYBERSECEVAL 3 represents a pivotal step forward in evaluating the security risks linked to large language models (LLMs). This innovative benchmark is designed to assess eight distinct risks, which are categorized into threats impacting third parties, developers, and end users of applications. A key focus of the evaluation is on the offensive security capabilities of these models, including their potential for automated social engineering and the scaling of autonomous cyber operations.

In this comprehensive study, the benchmarks were applied to LLM meta-artificial intelligence 3 (Llama 3) and other leading models to evaluate risks both with and without the implementation of mitigation strategies. This dual approach offers a thorough understanding of the potential threats posed by these advanced technologies.

Background

Previous research has established a foundation for assessing the security capabilities of LLMs, concentrating on risks to third parties and application developers. These studies have delved into the potential of LLMs to aid in spear-phishing attacks, enhance manual cyber operations, and conduct autonomous cyber operations. Noteworthy contributions include evaluations of prompt injection vulnerabilities and the risks associated with executing malicious code.

LLM Risk Assessment

Analysts have identified four primary risks to third parties from LLMs: automated social engineering, scaling manual offensive cyber operations, autonomous offensive cyber operations, and autonomous software vulnerability discovery and exploitation. The evaluation of Llama 3 405b for spear-phishing demonstrated its ability to automate convincing phishing content, although it was less effective than models like GPT-4 Turbo and Qwen 2-72b-instruct.

Llama 3 showed moderate success in phishing simulations, indicating its potential to scale phishing efforts but not posing a higher risk than other models. The role of Llama 3 405b in scaling manual cyber operations was also examined, showing no significant improvement in attacker performance compared to traditional methods.

Autonomous Cyber Capabilities

The assessment of Llama 3 models for autonomous offensive cyber operations revealed limited effectiveness. In simulations of ransomware attacks, these models struggled with exploit execution and maintaining access, although they managed reconnaissance and vulnerability identification. Llama 3 70b completed over half of low-sophistication challenges but faced difficulties with more complex tasks.

The potential for autonomous software vulnerability discovery and exploitation by LLMs remains constrained due to limited program reasoning capabilities and complex program structures. Testing of Llama 3 405b demonstrated some success in specific vulnerability challenges, outperforming GPT-4 Turbo in certain tasks, but did not show breakthrough capabilities. Deploying Llama Guard 3 is recommended to detect and block cyberattack aid requests.

Llama 3 Cybersecurity Risks

The evaluation of Llama 3 models highlighted several key concerns for application developers and end-users, including prompt injection attacks, execution of harmful code, generation of insecure code, and the facilitation of cyberattacks. Llama 3, especially in its 70b and 405b versions, performs comparably to GPT-4 in prompt injection attacks but remains vulnerable to certain exploitation techniques.

Researchers recommend deploying Llama Guard 3 to mitigate these vulnerabilities. This system detects and blocks malicious inputs, prevents insecure code generation, and limits the models' ability to facilitate cyberattacks. Developers should use these guardrails alongside secure coding practices and robust sandboxing techniques for comprehensive protection.

Cybersecurity Guardrails Overview

Several guardrails are suggested to mitigate cybersecurity risks associated with Llama 3. Prompt guard helps reduce prompt injection attack risks by classifying inputs as jailbreak, injection, or benign, achieving a 97.5% recall rate for jailbreak prompts and a 71.4% detection rate for indirect injections with minimal false positives. Code shield is an inference-time filtering tool that prevents insecure code from entering production systems.

Llama Guard, a fine-tuned version of Llama 3, focuses on preventing compliance with prompts that could facilitate malicious activities. It significantly reduces safety violations but may increase false refusal rates, particularly when used as input and output filters. Together, these tools enhance the security of Llama 3 applications by addressing prompt injections, insecure code, and compliance with potentially harmful prompts.

Conclusion

CYBERSECEVAL 3 provides a comprehensive framework for assessing cybersecurity risks associated with LLMs, building on the foundations of previous benchmarks. The evaluation of Llama 3 and other contemporary models against a wide range of cybersecurity threats demonstrates the effectiveness of these benchmarks. The introduction of mitigations like Llama Guard 3 offers promising improvements in managing these risks, ensuring safer applications of LLM technology.

Links:

Barr Group Offers Free Embedded C Coding Standard for Safety

Mitigating Risks of Generative AI in Software Development

Fork me on GitHub

© scram-pra.org