Pancake's AI blog

Made by AI and reviewed by slaves^Whumans

Deep Dive into SparkCat

The OCR-Powered Android Malware

SparkCat is a sophisticated Android malware campaign, now also affecting iOS, that uses optical character recognition (OCR) to steal cryptocurrency wallet recovery phrases. This post, written for reverse engineers, covers its origins, infection methods, payload activities, and the technical details needed for detection and analysis.


Overview and Origin

SparkCat was discovered by Kaspersky researchers in late 2024 and publicly reported in early 2025; it has been found in both Google Play and Apple’s App Store. It surfaced inside seemingly legitimate apps (food delivery, AI-powered messaging platforms, and crypto-related utilities), often hidden in third-party software development kits (SDKs) disguised as analytics modules. Its ability to pass store vetting suggests either a supply chain compromise or deliberate inclusion by the app developers.

The malware has been active since March 2024, as evidenced by timestamps in configuration files hosted on GitLab repositories. Notably, artifacts in the iOS version (e.g., directory names such as “qiongwu” and “quiwengjing”) hint at a developer with Chinese language proficiency, though attribution remains inconclusive.


Infection Methods

SparkCat leverages a multi-pronged infection strategy:

  1. Malicious SDK Integration:

    • Android: The malware is embedded as a Java-based component—often referred to as “Spark”—within trojanized apps. Once the app launches, the malicious SDK initializes in the overridden onCreate method of the application class, downloading a configuration file from a GitLab URL. This file is Base64-decoded and decrypted using AES-128 in CBC mode before the malware sets its command-and-control (C2) endpoints.
    • iOS: A similar framework, obfuscated with tools like HikariLLVM, is integrated into apps under aliases such as “GZIP” or “googleappsdk.”
  2. Permission Abuse & Triggered Execution:
    SparkCat requests seemingly benign permissions—such as access to the photo gallery—during legitimate user interactions (e.g., when initiating a support chat). Once granted, it scans stored images for text resembling crypto wallet recovery phrases (mnemonics).

  3. Distribution Channels:

    • Official App Stores: Infected apps have been distributed via Google Play and, notably, the Apple App Store—the first known instance of an OCR-based stealer on iOS.
    • Third-Party Sources: Telemetry indicates that additional infected samples are spread through unofficial channels.

Payload Activities and Low-Level Technical Details

OCR-Based Data Exfiltration

At its core, SparkCat uses Google ML Kit’s OCR library, a legitimate component rather than a vulnerability, to process images stored in the device’s gallery:


Communication and Encryption

The exfiltrated data is sent to attacker-controlled servers using a multi-layered encryption process:


Obfuscation and OCR Result Filtering

SparkCat employs several obfuscation techniques to hinder analysis:

Additionally, SparkCat’s filtering of OCR results involves multiple processing steps:

  • Processor Modules: Classes such as KeywordsProcessor, DictProcessor, and WordNumProcessor filter recognized text based on parameters like minimum/maximum letter count and dictionary matches. These thresholds are configurable via JSON objects received from the C2. (Source: securelist.com)
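The filtering described above can be sketched roughly as follows. This is a minimal illustration in plain Java, not recovered code: only the three processor names come from the analysis. The `TextFilter` interface, the factory method names, and every threshold are hypothetical stand-ins for values the C2 would supply as JSON.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

final class OcrFilters {
    /** Hypothetical common interface; the real class hierarchy is not documented here. */
    interface TextFilter {
        boolean matches(String text);
    }

    /** Splits text into lower-case alphabetic words. */
    static List<String> words(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^a-z]+"))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toList());
    }

    /** KeywordsProcessor-style check: any configured (lower-case) keyword occurs in the text. */
    static TextFilter keywords(Set<String> lowerCaseKeywords) {
        return text -> {
            String lower = text.toLowerCase(Locale.ROOT);
            return lowerCaseKeywords.stream().anyMatch(lower::contains);
        };
    }

    /** DictProcessor-style check: the count of dictionary words lies in [min, max]. */
    static TextFilter dict(Set<String> dictionary, int min, int max) {
        return text -> {
            long hits = words(text).stream().filter(dictionary::contains).count();
            return hits >= min && hits <= max;
        };
    }

    /** WordNumProcessor-style check: enough words have a letter count in [minLetters, maxLetters]. */
    static TextFilter wordNum(int minLetters, int maxLetters, int minWords) {
        return text -> words(text).stream()
                .filter(w -> w.length() >= minLetters && w.length() <= maxLetters)
                .count() >= minWords;
    }

    public static void main(String[] args) {
        // Made-up OCR output and thresholds, for illustration only.
        String ocrText = "Recovery phrase: abandon ability able about above absent";
        Set<String> dictionary = Set.of("abandon", "ability", "able", "about", "above", "absent");
        System.out.println(keywords(Set.of("recovery phrase")).matches(ocrText));
        System.out.println(dict(dictionary, 6, 24).matches(ocrText));
        System.out.println(wordNum(3, 8, 6).matches(ocrText));
    }
}
```

When analysing a sample, the useful step is to locate the real thresholds in the decrypted C2 responses; the values above only show how such parameters would shape the match.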


Code Examples and Analysis

1. Configuration File Retrieval and Decryption

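import android.util.Base64;
import java.nio.charset.StandardCharsets;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Illustrative reconstruction. The key and IV below are 16-byte placeholders,
// not the values embedded in real samples.
String downloadConfig(String url) throws Exception {
    String base64Config = httpGet(url); // httpGet: placeholder helper that returns the response body
    return decryptConfig(base64Config, "hardcodedAESKey!", "fixedIV_16bytes!");
}

String decryptConfig(String base64Config, String key, String iv) throws Exception {
    byte[] encrypted = Base64.decode(base64Config, Base64.DEFAULT);
    // AES-128 in CBC mode needs a 16-byte key and IV; PKCS5 padding is assumed here.
    Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
    cipher.init(Cipher.DECRYPT_MODE,
            new SecretKeySpec(key.getBytes(StandardCharsets.UTF_8), "AES"),
            new IvParameterSpec(iv.getBytes(StandardCharsets.UTF_8)));
    return new String(cipher.doFinal(encrypted), StandardCharsets.UTF_8);
}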
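
For offline analysis, the same Base64-then-AES-128-CBC sequence can be reproduced on a workstation with the standard JDK alone. The sketch below is an assumption-laden helper, not part of the malware: the class name `ConfigCodec`, the key, the IV, the PKCS5 padding choice, and the sample JSON are all placeholders, and the real key and IV would have to be recovered from the sample under analysis.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class ConfigCodec {
    // Placeholder 16-byte key and IV; replace with values extracted from the sample.
    static final byte[] KEY = "0123456789abcdef".getBytes(StandardCharsets.UTF_8);
    static final byte[] IV = "fedcba9876543210".getBytes(StandardCharsets.UTF_8);

    private static byte[] run(int mode, byte[] input) {
        try {
            // PKCS5Padding is an assumption; the post only states AES-128 in CBC mode.
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(mode, new SecretKeySpec(KEY, "AES"), new IvParameterSpec(IV));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES-CBC operation failed", e);
        }
    }

    /** Mirrors the described order: Base64-decode first, then AES-decrypt. */
    static String decrypt(String base64Config) {
        // The MIME decoder tolerates line breaks in downloaded blobs.
        byte[] raw = Base64.getMimeDecoder().decode(base64Config);
        return new String(run(Cipher.DECRYPT_MODE, raw), StandardCharsets.UTF_8);
    }

    /** Only used to build a fixture for testing the decryptor. */
    static String encrypt(String plaintext) {
        byte[] raw = run(Cipher.ENCRYPT_MODE, plaintext.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        // Made-up configuration used purely as a round-trip fixture.
        String fixture = encrypt("{\"c2\":\"https://example.invalid/api\"}");
        System.out.println(decrypt(fixture));
    }
}
```

If decryption of a real blob throws a padding error, the usual suspects are a wrong key or IV, or a padding scheme different from the one assumed here.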

2. Initializing the Google ML Kit OCR Module

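import android.graphics.Bitmap;
import com.google.mlkit.vision.common.InputImage;
import com.google.mlkit.vision.text.TextRecognition;
import com.google.mlkit.vision.text.TextRecognizer;
import com.google.mlkit.vision.text.latin.TextRecognizerOptions;

// Current ML Kit releases require an options object; DEFAULT_OPTIONS selects the Latin-script model.
TextRecognizer recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS);

void processImage(Bitmap bitmap) {
    // The second argument is the image rotation in degrees.
    InputImage image = InputImage.fromBitmap(bitmap, 0);
    recognizer.process(image)
        .addOnSuccessListener(visionText -> {
            String extractedText = visionText.getText();
            // matchesKeywords and exfiltrateData are placeholders for the filtering and upload logic.
            if (matchesKeywords(extractedText)) {
                exfiltrateData(bitmap, extractedText);
            }
        })
        .addOnFailureListener(e -> { /* recognition errors are silently ignored */ });
}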

Detection and Reverse Engineering Considerations

For reverse engineers, several low-level details are essential when analyzing SparkCat:
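
As one possible starting point for static triage, the indicators named in this post can be matched against strings extracted from an APK or IPA. The sketch below is a heuristic under stated assumptions, not a detection rule: the class name `SparkCatTriage` and the indicator table are hypothetical, the class-name indicators only work if the sample was not renamed by an obfuscator, and ML Kit or GitLab references on their own are common in benign apps.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class SparkCatTriage {
    // Lower-case indicator -> why it is interesting. None is conclusive on its own.
    static final Map<String, String> INDICATORS = new LinkedHashMap<>();
    static {
        INDICATORS.put("com.google.mlkit.vision.text", "ML Kit text recognition (also used by benign apps)");
        INDICATORS.put("gitlab.com", "possible remote configuration host");
        INDICATORS.put("keywordsprocessor", "OCR filter class name from the analysis");
        INDICATORS.put("dictprocessor", "OCR filter class name from the analysis");
        INDICATORS.put("wordnumprocessor", "OCR filter class name from the analysis");
        INDICATORS.put("googleappsdk", "framework alias reported for the iOS variant");
    }

    /** Returns one "indicator: reason" line per indicator found in the extracted strings. */
    static List<String> scan(List<String> extractedStrings) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> indicator : INDICATORS.entrySet()) {
            for (String s : extractedStrings) {
                // DEX type descriptors use '/', so normalise to '.' before matching.
                String normalised = s.toLowerCase(Locale.ROOT).replace('/', '.');
                if (normalised.contains(indicator.getKey())) {
                    hits.add(indicator.getKey() + ": " + indicator.getValue());
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up strings standing in for the output of a strings or DEX dump tool.
        List<String> sample = List.of(
                "Lcom/google/mlkit/vision/text/TextRecognizer;",
                "https://gitlab.com/example/repo/-/raw/main/cfg",
                "Lcom/example/sdk/WordNumProcessor;");
        scan(sample).forEach(System.out::println);
    }
}
```

A hit list like this is only a prompt for manual review; several indicators together in an app with no obvious need for OCR are more interesting than any single match.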


Attribution and Threat Actor Profile

While definitive attribution remains challenging, several indicators suggest a possible origin:

Despite these clues, the researchers caution that there is insufficient evidence to attribute SparkCat to a known cybercrime gang or nation-state actor. (Source: theverge.com)


Conclusion

SparkCat combines social engineering with obfuscation and layered encryption. Its use of OCR to steal cryptocurrency wallet recovery phrases makes it particularly dangerous for anyone who keeps images of those phrases in a device gallery. Reverse engineers and security professionals should combine static and dynamic analysis to work through its encryption, obfuscation, and covert communication.

Key takeaways:

Staying vigilant, maintaining updated security software, and scrutinizing app permissions remain critical to defending against such sophisticated threats.




source on github // --pancake