Adversarial Misuse of Generative AI

Original Source: Mandiant (now part of Google)

GTIG takes a holistic, intelligence-driven approach to detecting and disrupting threat activity, and our understanding of government-backed threat actors and their campaigns provides the needed context to identify threat enabling activity. We use a wide variety of technical signals to track government-backed threat actors and their infrastructure, and we are able to correlate those signals with activity on our platforms to protect Google and our users. By tracking this activity, we’re able to leverage our insights to counter threats across Google platforms, including disrupting the activity of threat actors who have misused Gemini. We also actively share our insights with the public to raise awareness and enable stronger protections across the wider ecosystem.

Our analysis of government-backed threat actor use of Gemini focused on understanding how threat actors are using AI in their operations and whether any of this activity represents novel or unique AI-enabled attack or abuse techniques. Our findings, which are consistent with those of our industry peers, reveal that while AI can be a useful tool for threat actors, it is not yet the game-changer it is sometimes portrayed to be. While we do see threat actors using generative AI to perform common tasks like troubleshooting, research, and content generation, we do not see indications of them developing novel capabilities.

Our key findings include:

  • We did not observe any original or persistent attempts by threat actors to use prompt attacks or other machine learning (ML)-focused threats as outlined in the Secure AI Framework (SAIF) risk taxonomy. Rather than engineering tailored prompts, threat actors used more basic measures or publicly available jailbreak prompts in unsuccessful attempts to bypass Gemini's safety controls.
  • Threat actors are experimenting with Gemini to enable their operations, finding productivity gains but not yet developing novel capabilities. At present, they primarily use AI for research, troubleshooting code, and creating and localizing content.
  • APT actors used Gemini to support several phases of the attack lifecycle, including research into potential infrastructure and free hosting providers, reconnaissance on target organizations, vulnerability research, payload development, and assistance with malicious scripting and evasion techniques. Iranian APT actors were the heaviest users of Gemini, using it for a wide range of purposes. Of note, we observed limited use of Gemini by Russian APT actors during the period of analysis.
  • IO actors used Gemini for research; content generation including developing personas and messaging; translation and localization; and to find ways to increase their reach. Again, Iranian IO actors were the heaviest users of Gemini, accounting for three quarters of all use by IO actors. We also observed Chinese and Russian IO actors using Gemini primarily for general research and content creation.
  • In the activity observed in this dataset, Gemini's safety and security measures restricted content that would have enhanced adversary capabilities. Gemini provided assistance with common tasks like creating content, summarizing, explaining complex concepts, and even simple coding tasks. Requests for assistance with more elaborate or explicitly malicious tasks generated safety responses from Gemini.
  • Threat actors attempted unsuccessfully to use Gemini to enable abuse of Google products, including researching techniques for Gmail phishing, stealing data, coding a Chrome infostealer, and bypassing Google's account verification methods.

Rather than enabling disruptive change, generative AI allows threat actors to move faster and at higher volume. For skilled actors, generative AI tools provide a helpful framework, similar to the use of Metasploit or Cobalt Strike in cyber threat activity. For less skilled actors, they also provide a learning and productivity tool, enabling them to develop tools and incorporate existing techniques more quickly. However, current LLMs on their own are unlikely to enable breakthrough capabilities for threat actors. We note that the AI landscape is in constant flux, with new AI models and agentic systems emerging daily. As this evolution unfolds, GTIG anticipates that the threat landscape will evolve in stride as threat actors adopt new AI technologies in their operations.

AI-Focused Threats

Attackers can use LLMs in two ways. The first is using LLMs to accelerate their campaigns (e.g., by generating code for malware or content for phishing emails); the overwhelming majority of activity we observed falls into this category. The second is instructing a model or AI agent to take a malicious action (e.g., finding sensitive user data and exfiltrating it). Both classes of risk are outlined in Google's Secure AI Framework (SAIF) risk taxonomy.

We did not observe any original or persistent attempts by threat actors to use prompt attacks or other AI-specific threats. Rather than engineering tailored prompts, threat actors used more basic measures, such as rephrasing a prompt or sending the same prompt multiple times. These attempts were unsuccessful.

Jailbreak Attempts: Basic and Based on Publicly Available Prompts

We observed a handful of cases of low-effort experimentation using publicly available jailbreak prompts in unsuccessful attempts to bypass Gemini's safety controls. Threat actors copied and pasted publicly available prompts and appended small variations in the final instruction (e.g., basic instructions to create ransomware or malware). Gemini responded with safety fallback responses and declined to follow the threat actor's instructions.

In one example of a failed jailbreak attempt, an APT actor copied publicly available prompts into Gemini and appended basic instructions to perform coding tasks. These tasks included encoding text from a file and writing it to an executable, and writing Python code for a distributed denial-of-service (DDoS) tool. In the former case, Gemini provided Python code to convert Base64 to hex, but returned a safety-filtered response when the user entered a follow-up prompt requesting the same code as a VBScript.
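For context, converting Base64-encoded data to a hexadecimal representation is a routine, low-risk coding task. A minimal Python sketch of such a conversion might look like the following (an illustration of the kind of benign assistance described above, not the code Gemini actually returned):

    import base64

    def base64_to_hex(b64_text: str) -> str:
        # Decode the Base64 string to raw bytes, then render those bytes as hex.
        raw = base64.b64decode(b64_text)
        return raw.hex()

    # "SGVsbG8=" decodes to b"Hello", whose hex representation is "48656c6c6f".
    print(base64_to_hex("SGVsbG8="))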

The same group used a different publicly available jailbreak prompt to request Python code for DDoS. Gemini provided a safety-filtered response stating that it could not assist, and the threat actor abandoned the session and did not attempt further interaction.

Source URL: https://cloud.google.com/blog/topics/threat-intelligence/adversarial-misuse-generative-ai/
