Configuration Extraction with YARA

Sept. 25, 2022 // theatha

Introduction

Hello everyone! Until I saw this blog post from _n1ghtw0lf, I did not know that YARA rules could be used for configuration extraction. He wrote a YARA rule for .NET samples using the dotnet module and a custom module. This inspired me to do the same for samples that are not written in .NET. However, I could not find any module that returns the data at a given offset, so I decided to write my own helper module. I will also show an example YARA rule that uses this module to extract the configuration of a Danabot sample.

The Situation

YARA is mostly aimed at helping people classify samples. Typically, researchers use YARA to detect patterns and identify a sample, then move on to configuration extraction scripts if they find known malware. If you insist on using YARA to extract valuable information from samples, there are some modules that can help. However, they are not very efficient for configuration extraction. For example, let's write a configuration extractor for Danabot samples without a custom module (only the console module, for printing). Take a look at the YARA rule below:

  import "console"

  rule DanabotV1_Config_Extraction {
    meta:
      author = "Taha Y."
      danabot_samples = "https://github.com/f0wl/danaConfig"
    strings:
      // UTF-16LE "MiniInit:Except" followed by a null terminator
      $s1 = {4D0069006E00690049006E00690074003A004500780063006500700074000000}
    condition:
      // four C2 IPv4 addresses, read one octet at a time; entries are 10 bytes apart
      $s1 and console.hex("[+] OFFSET ", @s1+214)
      and console.log("[+] C2-#1:") and console.log("octet-1: ", uint8(@s1+214))
      and console.log("octet-2: ", uint8(@s1+215)) and console.log("octet-3: ", uint8(@s1+216)) and console.log("octet-4: ", uint8(@s1+217))
      and console.log("[+] C2-#2:") and console.log("octet-1: ", uint8(@s1+224))
      and console.log("octet-2: ", uint8(@s1+225)) and console.log("octet-3: ", uint8(@s1+226)) and console.log("octet-4: ", uint8(@s1+227))
      and console.log("[+] C2-#3:") and console.log("octet-1: ", uint8(@s1+234))
      and console.log("octet-2: ", uint8(@s1+235)) and console.log("octet-3: ", uint8(@s1+236)) and console.log("octet-4: ", uint8(@s1+237))
      and console.log("[+] C2-#4:") and console.log("octet-1: ", uint8(@s1+244))
      and console.log("octet-2: ", uint8(@s1+245)) and console.log("octet-3: ", uint8(@s1+246)) and console.log("octet-4: ", uint8(@s1+247))
  }
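
For readers less familiar with YARA's condition syntax, the following standalone C sketch shows roughly what the rule computes: @s1 is the offset of the first match of the marker (the UTF-16LE string "MiniInit:Except" plus a null terminator), and uint8(@s1+214) is the single byte 214 bytes further on. The helper names find_pattern and read_uint8 are hypothetical, and the naive byte-by-byte search only stands in for YARA's real matching engine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the offset of the first occurrence of pattern in data, or -1 if it
 * does not occur. This plays the role of @s1 in the rule. */
static long find_pattern(const uint8_t *data, size_t data_len,
                         const uint8_t *pattern, size_t pattern_len)
{
  if (pattern_len == 0 || pattern_len > data_len)
    return -1;
  for (size_t i = 0; i + pattern_len <= data_len; i++)
    if (memcmp(data + i, pattern, pattern_len) == 0)
      return (long) i;
  return -1;
}

/* Read one byte at an absolute offset, like uint8(offset) in YARA.
 * Returns -1 when the offset is outside the data; YARA itself would treat
 * the value as undefined instead (the -1 is only this sketch's convention). */
static int read_uint8(const uint8_t *data, size_t data_len, long offset)
{
  if (offset < 0 || (size_t) offset >= data_len)
    return -1;
  return data[offset];
}
```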

As you can see, the built-in functions do not help much. Each octet has to be read with uint8() and logged on its own, so we cannot properly output the data at a given offset. We also cannot extract and display the strings in the sample directly. I do not think anyone will find this approach helpful, so let's achieve these goals with the help of YARA modules!

YARA Modules

Modules are the method YARA provides for extending its features. They allow you to define data structures and functions which can be used in your rules to express more complex conditions. Check the docs for more information.

Writing a YARA Module

Modules are written in C and built into YARA as part of the compiling process. In order to create your own modules you must be familiar with the C programming language and how to configure and build YARA from source code. Check the docs for more information.

YARA modules reside in libyara/modules, and it’s recommended to use the module name as the file name of the source file. Then you should include the necessary headers and define the module name in the source code.

  #include <yara/modules.h>
  #include <inttypes.h>

  #define MODULE_NAME parseutils

Used Structures and Functions Explained

If you look at the source code of parseutils.c, you will see several structures and macros in use. Here is what each of them is used for.

  /*
   * YR_SCAN_CONTEXT*: the context of the current scan; it is used to inspect
   *   the file or process memory being scanned.
   * YR_MEMORY_BLOCK*: one block of the scanned data.
   * YR_MEMORY_BLOCK_ITERATOR*: iterator over the memory blocks.
   * YR_OBJECT*: an object declared in the YARA module; yr_module() returns
   *   the object that represents the module itself.
   * define_function: macro that defines a module function with the given
   *   name and body.
   * print_int_data: the function's identifier.
   */
  // parseutils.print_int_data(offset, size)
  define_function(print_int_data)
  {
    YR_SCAN_CONTEXT* context = yr_scan_context();
    YR_MEMORY_BLOCK* block;
    YR_MEMORY_BLOCK_ITERATOR* iterator = context->iterator;
    YR_OBJECT* module = yr_module();
    /*
     * ... continues below
     */
  }
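
To make the roles of YR_MEMORY_BLOCK and YR_MEMORY_BLOCK_ITERATOR more concrete, here is a standalone C sketch that imitates them. The types mem_block and block_iterator are simplified, hypothetical stand-ins rather than YARA's real definitions, and the FOREACH_BLOCK loop is modeled on the first/next pattern that YARA's foreach_memory_block macro follows. The point it illustrates: offsets coming from a rule are absolute, so each block's base and size must be checked before indexing into the data returned by fetch_data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for YR_MEMORY_BLOCK and
 * YR_MEMORY_BLOCK_ITERATOR (not YARA's actual definitions). */
typedef struct mem_block {
  uint64_t base;         /* absolute offset/address of the block's first byte */
  size_t size;           /* number of bytes in the block */
  const uint8_t *bytes;  /* backing storage handed out by fetch_data */
  const uint8_t *(*fetch_data)(struct mem_block *block);
} mem_block;

typedef struct block_iterator {
  mem_block *blocks;
  size_t count;
  size_t pos;
  mem_block *(*first)(struct block_iterator *it);
  mem_block *(*next)(struct block_iterator *it);
} block_iterator;

static const uint8_t *fetch_bytes(mem_block *block) { return block->bytes; }

static mem_block *it_first(block_iterator *it)
{
  it->pos = 0;
  return it->count > 0 ? &it->blocks[0] : NULL;
}

static mem_block *it_next(block_iterator *it)
{
  it->pos++;
  return it->pos < it->count ? &it->blocks[it->pos] : NULL;
}

/* Loop over all blocks using the iterator's first/next callbacks. */
#define FOREACH_BLOCK(it, block) \
  for ((block) = (it)->first(it); (block) != NULL; (block) = (it)->next(it))

/* Return a pointer to `size` bytes starting at absolute `offset`, or NULL if
 * no single block contains the whole range. */
static const uint8_t *find_range(block_iterator *it, uint64_t offset, size_t size)
{
  mem_block *block;
  FOREACH_BLOCK(it, block)
  {
    /* Overflow-safe check that [offset, offset + size) lies inside the block. */
    if (offset < block->base || size > block->size ||
        offset - block->base > block->size - size)
      continue;
    const uint8_t *data = block->fetch_data(block);
    if (data != NULL)
      return data + (offset - block->base);
  }
  return NULL;
}
```

A file scan usually yields a single block starting at base 0, while a process scan yields many blocks at different base addresses. In this sketch a range that straddles two blocks is simply rejected.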

This function takes two parameters, offset and size, which are read from the rule's arguments like this:

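  // integer_argument(n) returns the n-th integer argument (1-based)
  // passed to the function from the YARA rule.
  int64_t offset_0 = integer_argument(1);
  int64_t size = integer_argument(2);
  // Reject negative values and oversized requests before allocating a
  // variable-length array on the stack (1024 is an arbitrary limit).
  if (offset_0 < 0 || size <= 0 || size > 1024)
    return_string(YR_UNDEFINED);
  // The data array holds the bytes read from the scanned data.
  uint8_t data[size];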

I used the foreach_memory_block macro and each block's fetch_data function to read the data at the given offset and size.

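  // define_function(print_int_data) continues
  // foreach_memory_block iterates over the scanned data, which may be split
  // into several blocks (e.g. when scanning process memory).
  int found = 0;
  foreach_memory_block(iterator, block)
  {
    // Skip blocks that do not contain the whole range [offset_0, offset_0 + size).
    if ((uint64_t) offset_0 < block->base ||
        (uint64_t) (offset_0 + size) > block->base + block->size)
      continue;
    // fetch_data returns a pointer to the block's data, or NULL on failure.
    const uint8_t* block_data = block->fetch_data(block);
    if (block_data == NULL)
      continue;
    // Copy the requested bytes; offsets are absolute, so subtract the base.
    for (int64_t i = 0; i < size; i++)
      data[i] = block_data[offset_0 - block->base + i];
    found = 1;
    break;
  }
  // The range was not found in any block, so the result is undefined.
  if (!found)
    return_string(YR_UNDEFINED);
  // "%d " needs up to 4 characters per byte ("255 ") plus the final NUL.
  char str[size * 4 + 1];
  int index = 0;
  str[0] = '\0';
  // convert the data array to a space-separated list of decimal values
  for (int64_t i = 0; i < size; i++)
    index += sprintf(&str[index], "%d ", data[i]);
  // store the result in the module's "str" field
  yr_set_string(str, module, "str");
  // return the string to the rule
  return_string(str);
}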
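
The formatting step is easy to get wrong: a uint8_t printed with "%d " takes up to four characters ("255 "), so the output buffer needs 4 * size + 1 bytes including the terminating NUL. The standalone sketch below isolates that step. format_decimal_bytes is a hypothetical helper, and it uses snprintf instead of sprintf so that each write is bounded by the remaining space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Worst case for "%d " with a uint8_t value is "255 " (4 characters). */
#define MAX_CHARS_PER_BYTE 4

/* Write the bytes as space-separated decimal values, e.g. "192 168 0 1 ".
 * Returns the number of characters written, or -1 if out is too small for
 * the worst case. */
static int format_decimal_bytes(const uint8_t *data, size_t size,
                                char *out, size_t out_len)
{
  if (out_len < size * MAX_CHARS_PER_BYTE + 1)
    return -1;
  size_t index = 0;
  out[0] = '\0';
  for (size_t i = 0; i < size; i++)
    index += (size_t) snprintf(out + index, out_len - index, "%d ", data[i]);
  return (int) index;
}
```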

Then, running the included build.sh script compiles YARA together with the newly created module.

The Action

Let's see parseutils in action!

Brief Summary of Danabot's Configuration Structure
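
The offsets in the rules of this post imply the part of the layout they rely on: four C2 IPv4 addresses stored as raw 4-byte values, the first one 214 bytes after the start of the "MiniInit:Except" marker and each following entry 10 bytes further on (offsets 214, 224, 234 and 244). The standalone C sketch below formats one entry as a dotted quad under exactly that assumption. The constants and the helper c2_ip are inferred or hypothetical, and the purpose of the other 6 bytes of each entry is not covered here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Layout assumed from the rule offsets in this post: the first C2 IPv4
 * address starts 214 bytes after the marker match, entries are 10 bytes
 * apart, and there are four of them. */
#define C2_FIRST_OFFSET 214
#define C2_STRIDE 10
#define C2_COUNT 4

/* Write C2 entry `index` (0-based) as "a.b.c.d" into out, which must hold at
 * least 16 bytes ("255.255.255.255" plus the NUL).
 * Returns 0 on success, -1 if the index or the entry is out of range. */
static int c2_ip(const uint8_t *data, size_t data_len, size_t marker_offset,
                 int index, char *out)
{
  if (index < 0 || index >= C2_COUNT)
    return -1;
  size_t start = marker_offset + C2_FIRST_OFFSET + (size_t) index * C2_STRIDE;
  if (start > data_len || data_len - start < 4)
    return -1;
  /* Each octet is a single raw byte, like uint8(@s1+214) in the rule. */
  snprintf(out, 16, "%d.%d.%d.%d", data[start], data[start + 1],
           data[start + 2], data[start + 3]);
  return 0;
}
```

For example, with the marker found at offset m, c2_ip(data, len, m, 0, buf) yields the same four octets that the first rule logs as C2-#1, joined with dots.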




Let's write a new YARA rule that actually outputs helpful information this time.

  import "console"
  import "parseutils"

  rule danabot_config_extractor {
    meta:
      author = "Taha Y."
      danabot_samples = "https://github.com/f0wl/danaConfig"
    strings:
      // UTF-16LE "MiniInit:Except" followed by a null terminator
      $s1 = {4D0069006E00690049006E00690074003A004500780063006500700074000000}
      $s2 = {2E6F6E696F6E} // ".onion"
    condition:
      $s1 and console.hex("OFFSET : ", @s1+224) and
      console.log("C2-ip1: ", parseutils.print_int_data(@s1+214, 4)) and
      console.log("C2-ip2: ", parseutils.print_int_data(@s1+224, 4)) and
      console.log("C2-ip3: ", parseutils.print_int_data(@s1+234, 4)) and
      console.log("C2-ip4: ", parseutils.print_int_data(@s1+244, 4)) and
      // 56 characters before ".onion" plus the 6-byte suffix
      console.log("TOR: ", parseutils.print_string_data(@s2-56, 62))
  }
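
The TOR lookup relies on the fixed length of a version 3 onion address: 56 base32 characters followed by ".onion", so the address starts 56 bytes before the $s2 match and is 56 + 6 = 62 bytes long. The post does not show print_string_data, so the sketch below is a standalone approximation of the same extraction rather than the module's code. The helper extract_onion_v3 is hypothetical, and the check that the 56 bytes use the lowercase base32 alphabet (a-z and 2-7) is an extra safeguard that the rule itself does not perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONION_V3_HOST_LEN 56  /* base32 characters before ".onion" */
#define ONION_SUFFIX ".onion"

/* Find the first ".onion" preceded by 56 base32 characters and copy the
 * whole 62-byte address into out (at least 63 bytes, NUL-terminated).
 * Returns 0 on success, -1 if no plausible address is found. */
static int extract_onion_v3(const uint8_t *data, size_t data_len, char *out)
{
  const size_t suffix_len = strlen(ONION_SUFFIX);
  for (size_t i = ONION_V3_HOST_LEN; i + suffix_len <= data_len; i++) {
    if (memcmp(data + i, ONION_SUFFIX, suffix_len) != 0)
      continue;
    size_t start = i - ONION_V3_HOST_LEN;  /* same as @s2-56 in the rule */
    /* Extra check (not in the rule): v3 hosts use base32, a-z and 2-7. */
    int ok = 1;
    for (size_t j = start; j < i && ok; j++)
      ok = (data[j] >= 'a' && data[j] <= 'z') ||
           (data[j] >= '2' && data[j] <= '7');
    if (!ok)
      continue;
    memcpy(out, data + start, ONION_V3_HOST_LEN + suffix_len);
    out[ONION_V3_HOST_LEN + suffix_len] = '\0';
    return 0;
  }
  return -1;
}
```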

The Conclusion

Although YARA is mainly used for detecting and identifying malware samples, I managed to use it for another goal: extracting valuable information. Deep deep down, far far in, I feel that this effort is redundant, yet it shows how flexible YARA is.

You can find the resources mentioned here, plus an extra YARA rule that extracts information from another Danabot variant, in the "YARA_for_config_extraction" repository. If you want to discuss anything or ask me something, you can reach me on Twitter.

Thank you for reading my blog post! Have a nice day, absolute legends!