Decode Phishing Emails
Overview
atob() obfuscation
This p5.js script finds and decodes base64-encoded strings that are nested within one another. Three functions work together to do this:
isBase64(str): Checks whether str is a valid base64-encoded string. It first tests the string against a regular expression that allows only base64 characters, then tries to decode it with atob(). If decoding succeeds, the function returns true; otherwise, it returns false.
decode(str): Strips a single or double quote from the beginning and end of str. If the trimmed string is a valid base64-encoded string, the function decodes it with atob() and returns the decoded string; otherwise, it returns null.
repeatedlyDecode(str): Recursively decodes base64-encoded strings found within str. It first checks whether str is a valid base64-encoded string. If it is, the function decodes it and uses a regular expression to find every substring enclosed in single or double quotes. Each quoted substring is added to the matches array along with its decoded value (null if it is not base64), and repeatedlyDecode() is then called recursively for each of these quoted substrings.
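The three functions described above can also be sketched in Python, which may be handy for testing outside the browser. This is a minimal sketch rather than the script's own code: try_decode, repeatedly_decode, and the sample payload are hypothetical names and data chosen for illustration, and base64.b64decode with validate=True is assumed to be an acceptable stand-in for atob().

```python
import base64
import binascii
import re

# Hypothetical Python counterpart of the p5.js helpers; all names are illustrative.
# Matches a substring enclosed in matching single or double quotes.
QUOTED = re.compile(r"""(['"])(.*?)\1""", re.DOTALL)


def try_decode(text):
    """Return the decoded text if `text` is valid base64, otherwise None."""
    try:
        raw = base64.b64decode(text, validate=True)
    except (binascii.Error, ValueError):
        return None
    # latin-1 maps each byte to one character, similar to what atob() returns.
    return raw.decode("latin-1")


def repeatedly_decode(text, matches):
    """Decode `text`, then recurse into every quoted substring of the result."""
    decoded = try_decode(text)
    if decoded is None:
        return
    for found in QUOTED.finditer(decoded):
        inner = found.group(2)  # text between the quotes
        matches.append((found.group(0), try_decode(inner)))
        repeatedly_decode(inner, matches)


# Made-up sample: a base64 payload wrapped inside another base64 payload.
inner_payload = base64.b64encode(b"alert('hi')").decode("ascii")
outer_payload = base64.b64encode(
    f'eval(atob("{inner_payload}"))'.encode("ascii")
).decode("ascii")

found = []
repeatedly_decode(outer_payload, found)
print(found)
```

For this sample, found holds the double-quoted inner payload paired with alert('hi'), followed by 'hi' paired with None because that string is not valid base64.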
let input =
  "";

let matches = [];

function setup() {
  noLoop();
  repeatedlyDecode(input);
  print(matches);
}

function draw() {
  background(220);
}

function repeatedlyDecode(str) {
  if (isBase64(str)) {
    const decoded = atob(str);
    const regex = /(['"])(?:(?=(\\?))\2.)*?\1/g; // regex to match quoted substrings
    let match;
    while ((match = regex.exec(decoded)) !== null) {
      const quoted = match[0];
      matches.push([quoted, decode(quoted)]); // record the quoted substring and its decoded value
      // recurse on the text between the quotes, which may itself be base64
      repeatedlyDecode(quoted.slice(1, -1));
    }
  }
}

function decode(str) {
  if (typeof str !== 'string') {
    return null;
  }
  // trim single or double quotes from beginning and end of str
  str = str.replace(/^['"]|['"]$/g, '');
  if (isBase64(str)) {
    return atob(str);
  }
  return null;
}

function isBase64(str) {
  // reject non-strings first: test() would coerce null to the string "null"
  if (typeof str !== 'string') {
    return false;
  }
  const base64Regex = /^[A-Za-z0-9+/=]+$/;
  if (!base64Regex.test(str)) {
    return false;
  }
  try {
    atob(str); // throws if str is not valid base64
    return true;
  } catch (e) {
    return false;
  }
}
HEX strings
This Python program uses the re module for regular-expression matching and replacement. The decode_hex function takes a regular expression match object as input, extracts the two hexadecimal digits from the match, decodes them to a regular ASCII character, and returns the decoded string.
The with statement opens the input.html file and reads its contents into the content variable. The re.sub function then replaces every occurrence of the regular expression r'\\x[0-9a-fA-F]{2}' with the result of calling decode_hex on that match. This regular expression matches a literal \x followed by exactly two hexadecimal digits.
The resulting decoded content is written to an output.html file using the open function and the 'w' write mode.
Note that this program assumes that the input HTML file contains only hexadecimal representations of ASCII characters that are encoded using the \x notation. If there are other types of encodings present in the file, they will not be handled by this program.
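To see the substitution at work without touching any files, the same pattern can be applied to an in-memory string. This is a minimal sketch: HEX_ESCAPE, decode_hex_escape, and the sample string are hypothetical, and leaving values of 0x80 and above untouched is an assumption of the sketch rather than something the program above does.

```python
import re

# A literal backslash, "x", and exactly two hexadecimal digits (captured).
HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")


def decode_hex_escape(match):
    """Turn one \\xNN escape into its character; leave non-ASCII values as-is."""
    value = int(match.group(1), 16)
    # Assumption: only 7-bit ASCII values are decoded.
    return chr(value) if value < 0x80 else match.group(0)


# Made-up sample containing four ASCII escapes and one non-ASCII escape.
sample = r"\x3cp\x3eHi\x3c/p\x3e and \xff"
decoded_sample = HEX_ESCAPE.sub(decode_hex_escape, sample)
print(decoded_sample)  # <p>Hi</p> and \xff
```

re.sub calls the function once per match and splices its return value into the output, so text outside the escapes passes through unchanged.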
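import re

def decode_hex(match):
    # match.group(0) looks like "\x41"; drop the leading "\x" to keep the two hex digits
    hex_str = match.group(0)[2:]
    try:
        return bytes.fromhex(hex_str).decode('ascii')
    except UnicodeDecodeError:
        # values above 0x7F are not ASCII; leave the escape unchanged
        return match.group(0)

with open('input.html', 'r') as file:
    content = file.read()

# replace every \xNN escape with its decoded character
decoded_content = re.sub(r'\\x[0-9a-fA-F]{2}', decode_hex, content)

with open('output.html', 'w') as file:
    file.write(decoded_content)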
URI-encoded
This script defines a custom function called url_decode() that takes a URL-encoded string as input and returns the decoded string. The function uses a while loop to walk through the input string character by character. When it encounters a percent sign, it uses int() with base 16 to parse the following two characters as a hexadecimal number, and then uses chr() to convert that number to the corresponding character. If the two characters are not valid hexadecimal, the function simply adds the three characters to the output string as-is. The url_decode() function then returns the decoded string.
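As a cross-check on the loop described above, Python's standard library provides urllib.parse.unquote. The sketch below contrasts the two approaches on a made-up sample string and assumes every percent sign in the sample starts a valid escape. The per-byte expression mirrors the chr(int(..., 16)) step, while unquote combines consecutive escaped bytes as UTF-8, so the results differ for non-ASCII characters.

```python
from urllib.parse import unquote

# Made-up sample: "é" encoded as two UTF-8 bytes, a space, and an HTML tag.
encoded = "caf%C3%A9%20%3Cb%3E"

# Byte-at-a-time decoding: every %XX pair becomes one character.
# Assumption: each "%" in the sample is followed by two hex digits.
per_byte = "".join(
    chr(int(part[:2], 16)) + part[2:] if index else part
    for index, part in enumerate(encoded.split("%"))
)

# Standard-library decoding: consecutive %XX bytes are combined as UTF-8.
library = unquote(encoded)

print(repr(per_byte))
print(repr(library))
```

For input that contains only ASCII escapes the two results are identical; the difference matters only when the encoded text includes multi-byte characters.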
The rest of the script is similar to the previous example. It loads the input HTML file, decodes the URL-encoded characters using the url_decode() function, and writes the decoded HTML to a new file named decoded.html. Note that this script assumes that the input HTML file is named output.html (the file written by the previous script) and is in the same directory as the Python script. You may need to adjust the file paths to match your specific use case.
1# Custom function to decode URL-encoded characters
2def url_decode(string):
3 i = 0
4 result = ""
5 while i < len(string):
6 if string[i] == '%':
7 try:
8 result += chr(int(string[i+1:i+3], 16))
9 i += 3
10 except ValueError:
11 result += string[i:i+3]
12 i += 3
13 else:
14 result += string[i]
15 i += 1
16 return result
17
18# Load the input HTML file
19with open('output.html', 'r') as input_file:
20 html = input_file.read()
21
22# Decode the URL-encoded characters
23decoded_html = url_decode(html)
24
25# Write the decoded HTML to a new file
26with open('decoded.html', 'w') as output_file:
27 output_file.write(decoded_html)