Get Out of Jail Free Prompt: How Jailbreaking Compromises Safety in Large Language Models
Halmstad University, School of Information Technology.
Halmstad University, School of Information Technology.
2025 (English)
Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Abstract [en]

The increasing popularity of large language models (LLMs) has raised important concerns regarding their security and the implications of their misuse. This thesis evaluates the vulnerability of LLMs to known jailbreaking techniques and the potential harm caused by successful attacks. Using a structured scoring framework, three popular LLMs (GPT-4o, DeepSeek-V3, and Mistral’s Le Chat) were tested against five jailbreaking techniques. All models were susceptible to varying degrees, with DeepSeek-V3 being the most vulnerable, particularly to role-playing. Although most jailbroken responses were considered Low Harm, a few were highly actionable with weak ethical disclaimers, underscoring real-world risks. The thesis emphasizes the importance of prioritizing improved pattern recognition, stricter contextual adherence, and responsible access controls for sensitive capabilities, especially for cybersecurity-related content. Until these issues are addressed, jailbreaking will remain a threat requiring proactive and adaptive mitigation strategies.

Place, publisher, year, edition, pages
2025.
National Category
Artificial Intelligence; Security, Privacy and Cryptography
Identifiers
URN: urn:nbn:se:hh:diva-56596
OAI: oai:DiVA.org:hh-56596
DiVA, id: diva2:1973652
Subject / course
Digital Forensics
Educational program
IT Forensics and Information Security, 180 credits
Supervisors
Examiners
Available from: 2025-06-23
Created: 2025-06-19
Last updated: 2025-10-01
Bibliographically approved

Open Access in DiVA

fulltext (1497 kB), 297 downloads
File information
File name: FULLTEXT02.pdf
File size: 1497 kB
Checksum SHA-512:
74965f4a7a42b243d76970679c2a92400c85ebb68161f9c7a2f8cfce4d144c48291d797109b3f490f6c1dc6598b643dc523ecc57d83f159ebf6dd0c2f57adeb2
Type: fulltext
Mimetype: application/pdf

Total: 297 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 970 hits