The Power of Funneling — Part 1

Alpine Security

Understanding the Impact of Data Breaches on Organizations and Effective Funneling Strategies

2 November 2023 — Annabel López

Introduction:

This post kicks off a series on cybersecurity and computer forensics. This first part delves into “funneling”, a key forensic strategy for incidents in which a large volume of data has to be analyzed.

Organizations face increasingly complex challenges in a world where digital threats are advancing rapidly. In this article, we’ll address an exciting funneling use case that reveals how technology and human expertise combine to protect organizations.

In this series of posts, we’ll delve into the critical problem organizations face when data is compromised and potentially stolen. Several strategies can be considered to address it:

  • Identify the type of data compromised:

Knowing what types of data you hold, and which of them may have been compromised or even exfiltrated, is critical. We will describe ways to understand the different data sets and define an analysis strategy for each type of data, according to its volume.

  • Eliminate workload by reducing known elements:

By identifying known file types, we can reduce time, effort, and workload. This strategy can be combined with the others described below.

  • Identify and detect hidden malware within the data:

Business operations need to start using their files and data as soon as possible, so it is critical to discriminate clean files from the ones potentially containing malware.

  • Index the data and report on its content:

These are methods to understand the content and type of the compromised data: processes are run to detect specific kinds of information within it, such as PII (Personally Identifiable Information).

Compromise detected, but exfiltration from the data repositories not yet confirmed:

Time is critical during the incident response stage, where the priority is to contain the threat, prevent it from spreading or persisting, and identify the potential extent of the compromise with regard to the information held in the repository.

Regulatory requirements and potential fines if personal data has been exposed or extracted are a major concern as well. The goal is therefore not only to certify that the data is free of threats but also to identify the company’s potential exposure if any of the data has been compromised.

The Power of Funneling:

The concept of “funneling” in computer forensics is a valuable strategy for working through large amounts of data to identify the impact of an incident, which may be hidden in some of the files, and any threats spread across the affected information.

Security teams cannot simply review and evaluate every single file in a large repository; the task would take far too long.

What exactly is funneling and how does it help in these situations?

In simple terms, funneling is a forensic strategy for triaging a huge data set at a granular level, so that different approaches can be defined according to the goals we need to achieve. What kind of goals are we talking about?

  • Data classification
  • Data cleanliness validation
  • Data indexation

Funneling is key in these situations because it gives security teams a quicker way to separate clean data from potentially compromised data and to identify the type and content of the data involved in the incident. Most importantly, it speeds up the return to business operations, since data repositories can be released as soon as they are proven safe.

Funneling: Breakdown of methodologies

Funneling has two main objectives:

  • Identify potential threats hidden within the compromised files.
  • Identify the exact nature of the contents of the compromised files, so that the company can decide on a course of action based on the identified risks.

Funneling for malware

Funneling for malware identification relies on three main phases:

  • Automatic phase: In this initial stage, automated tools filter a large number of files for known threats. This includes searching for known hashes, antivirus scanning, and applying detection rules.
  • Semi-automatic phase: With the number of files reduced, tools that require a more detailed review are used. This includes timeline analysis, entropy checks, and the search for more subtle indicators of compromise.
  • Manual phase: Finally, with the files reduced to a manageable number, individual analysis techniques are applied, such as VirusTotal lookups, sandbox analysis, and in-depth network traffic investigation.
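Two of the checks named above, known-hash lookup and entropy checking, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names and the known_good / known_bad hash sets are hypothetical, and a real workflow would load those hashes from a hash database or threat feed rather than build them by hand.

```python
import hashlib
import math
from collections import Counter


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def triage_by_hash(files, known_good, known_bad):
    """Split a {name: bytes} mapping into clean, malicious, and unknown names.

    known_good and known_bad are sets of SHA-256 hex digests
    (hypothetical inputs for this sketch).
    """
    clean, malicious, unknown = [], [], []
    for name, data in files.items():
        digest = sha256_of(data)
        if digest in known_bad:
            malicious.append(name)
        elif digest in known_good:
            clean.append(name)
        else:
            # Not recognized either way: passed down the funnel.
            unknown.append(name)
    return clean, malicious, unknown


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 up to 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Only the files left in the unknown bucket move on to the next phase, where a high entropy value can hint at packed or encrypted content that deserves a closer look.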

Funneling for compromised data analysis

This process is also carried out in three main phases:

  • Automatic content indexing: Before anything else, the files must be categorized to identify the actual documents that contain information: PDFs, Office files, text files, etc. The content of those documents is then extracted and indexed automatically using search engines, OCR and image recognition tools, etc.
  • E-discovery phase: Once the data is indexed and organized, e-discovery tools are used to search for specific data patterns, so that the documents containing those patterns can be reviewed to assess the value or risk of the information.
  • Manual phase: Finally, once the list of files matching the search patterns has been generated, a manual assessment is required to evaluate the risk. This is normally not a technical task but a business-dependent one.
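The pattern search in the e-discovery phase can be illustrated with a small Python sketch. It assumes the text has already been extracted from the documents, and it looks for only two example patterns, email addresses and payment card numbers validated with the Luhn checksum. The regular expressions and the find_pii name are simplified assumptions for illustration, not the patterns used by any particular e-discovery product.

```python
import re

# Simplified, illustrative patterns; real e-discovery rules are broader.
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Check a candidate card number with the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return len(digits) >= 13 and checksum % 10 == 0


def find_pii(text: str) -> dict:
    """Return the email addresses and Luhn-valid card numbers found in text."""
    emails = EMAIL_RE.findall(text)
    cards = [m.group() for m in CARD_RE.finditer(text) if luhn_valid(m.group())]
    return {"email": emails, "card": cards}
```

Documents that produce hits are the ones handed over to the manual phase for a business risk assessment.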

ACME Case

ACME, a leading company in its sector, is facing a cybersecurity incident. A set of data repositories shared by compromised servers is suspected of hiding malware files that are part of the next stage of the attack, or of containing files that may have been exfiltrated.

This is an example of a common situation today. The suspected initial access came from a phishing email sent from a trusted provider: a fake HR Department message asking workers to complete a self-assessment in a malicious attached document.

When one of the employees with write access to the repositories opened and executed the attachment, several systems raised alerts consistent with a number of TTPs for malware deployment, data access, and exfiltration.

What was the challenge?

At this juncture, ACME engages Alpine Security with the following clear objectives:

  • Certify the nature of the incident (malware deployment or data theft).
  • Determine the exposure in case of a successful data breach.
  • If data has been compromised, specify the type of data (financial, intellectual property, personal, or other) that has been affected.
  • Decide how to deal with the data protection agency, depending on the type of information affected.

Data classification

The funneling strategy became central to ACME’s recovery of business operations. Access to the affected data repositories is released according to a prioritization from the most critical repository to the least needed, so that efforts are focused on the actual needs of the business.

Both funneling processes are applied: malware funneling first, and then the e-discovery process once the data is proven clean.

One of the most efficient tools in the process is a custom program called “funify.py”, developed by the ALPINE SECURITY LABS team. It classifies the different types of files present in the data repository and segregates them into categories based on file type and extension, so that specific analyses can then be run in parallel.

What does “funify.py” do?

The “funify.py” tool (https://github.com/alpine-sec/funify) is a command-line utility written in Python, designed to analyze and categorize the files present in a data repository. Its main function is to identify file extensions and group them into predefined categories according to their type.

Like most scripts, “funify.py” comes with a help menu:

Execution mode:

usage: funify.py [-h] [-V] -f FILE

Utility to create a Summary of file extensions

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -f FILE, --file FILE  Path to Excel (.xlsx) or CSV (.csv) file

Example:

python3.11.exe funify.py -f .\comp001-fstl.csv

To supply the list of files in the data repository as a parameter, we first generate the list of files with their full paths using another tool, “mftmactime.py” (https://github.com/kero99/mftmactime), also created by an Alpine Security team member.

mftmactime.py generates an output CSV file containing the whole file hierarchy and the paths identified within the data repository.

funify.py then reads each line of the input CSV file, examines every file path, and extracts the extension of each file.

As it processes the files, it counts how many times each extension appears, so that the team can later define different analysis strategies based on the exact content profile found in this initial stage.

import os
from collections import Counter


def read_file(file_path):
    # Read the file listing and strip trailing whitespace from every line.
    with open(file_path, 'r', encoding='utf-8') as file:
        lines = [line.rstrip() for line in file.readlines()]
    return lines


def count_extensions(file_paths):
    extensions_counter = Counter()

    for file_path in file_paths:
        # Skip deleted entries and directory entries (",d/d").
        if "(deleted)" not in file_path and ",d/d" not in file_path:
            _, extension = os.path.splitext(file_path)
            extensions_counter[extension] += 1
    return extensions_counter

The tool reports the extensions organized into predefined categories, such as “Office”, “Document”, “Archive”, “Image”, “Video”, “Database”, “Audio”, “Machine”, “Email”, “Binaries” and “Installers”.

print("==============================")
print(" FUNIFY ")
print("==============================")
print()
category_list = {
    "Office": [
        '.doc', '.dot', '.wbk', '.docx', '.docm', '.dotx', '.docb', '.pdf', '.wll', '.wwl',
        '.xls', '.xlsx', '.xlsm', '.xltx', '.xltm', '.xlsb', '.xla', '.xlam', '.xll', '.xlw',
        '.ppt', '.pptx', '.pptm', '.potx', '.potm', '.ppam', '.pps', '.ppsx', '.ppsm', '.sldx',
        '.sldm', '.pa', '.accda', '.accdb', '.accde', '.accdt', '.accdr', '.accdu', '.mda', '.mde',
        '.one', '.ecf', '.pub', '.xps'
    ],
    "Document": [
        '.csv', '.odf', '.odt', '.rtf', '.txt', '.wpd', '.wpg', '.wps', '.wri', '.xml', '.n43'
    ],
    "Archive": [
        '.7z', '.7zip', '.arj', '.bz', '.bz2', '.gz', '.gzip', '.rar', '.rzip', '.tar', '.taz',
        '.tgz', '.tib', '.zip', '.xz'
    ],
    "Image": [
        '.bmp', '.gif', '.jpeg', '.jpg', '.png', '.tiff'
    ],
    "Video": [
        '.asf', '.avi', '.mov', '.mp4', '.mpeg', '.mpg', '.wmv'
    ],
    "Database": [
        '.accdb', '.dbx', '.mdb'
    ],
    "Audio": [
        '.mp3', '.flac', '.wma'
    ],
    "Machine": [
        '.vmdk', '.ova'
    ],
    "Email": [
        '.cal', '.dbx', '.edb', '.eml', '.emlx', '.mbox', '.msf', '.msg', '.nsf', '.pst', '.snm',
        '.vcard', '.vcf', '.wab', '.ost'
    ],
    "Binaries": [
        '.exe', '.dll', '.bin', '.bat', '.cmd', '.com', '.cpl', '.gadget', '.inf1', '.ins', '.inx',
        '.isu', '.job', '.jse', '.lnk', '.msc', '.pif', '.ps1', '.reg', '.rgs', '.scr', '.sct',
        '.shb', '.shs', '.u3p', '.vb', '.vbe', '.vbs', '.vbscript', '.ws', '.wsf', '.wsh', '.cab'
    ],
    "Installers": [
        '.msi', '.msp', '.mst', '.paf'
    ]
}

Finally, funify.py will provide the file count for each extension and category, allowing you to better understand the composition of the data and focus on specific areas of interest during the forensic investigation.

def extract_extensions(extensions_counter):
    filtered_extensions_counter = Counter()
    other_extensions_counter = Counter()

    for extension, count in extensions_counter.items():
        if extension.count('.') > 1:
            # Several dots: keep only the part after the last one.
            last_dot = extension.rfind('.')
            last_extension = extension[last_dot:]
            filtered_extensions_counter[last_extension] += count
        elif len(extension) > 5 and extension.count('.') == 1:
            # Unusually long extensions are tracked separately.
            other_extensions_counter[extension] += count
        else:
            last_dot = extension.rfind('.')
            if last_dot != -1:
                last_extension = extension[last_dot:]
                filtered_extensions_counter[last_extension] += count
            else:
                # No dot at all: count the value as-is.
                filtered_extensions_counter[extension] += count

    return filtered_extensions_counter, other_extensions_counter

The results are printed to the screen, showing the file totals per category and per extension.

for category, extensions in category_list.items():
    if all(filtered_extensions.get(extension, 0) == 0 for extension in extensions):
        print(f"+ {category} Extensions: 0")
        #print(" Files: 0")
    else:
        print(f"+ {category} Extensions:")
        for extension in extensions:
            count = filtered_extensions.get(extension, 0)
            if count > 0:
                print(f" + {extension}: Files: {count}")
    print()
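To see how these pieces fit together, here is a minimal self-contained sketch. It is not the actual funify.py code: the category map is trimmed for brevity, the summarize function is a hypothetical name, and lowercasing the extension is an added assumption that the snippets above do not make. The input is assumed to be mactime-style lines ending in the file path; the exact mftmactime.py output format may differ.

```python
import os
from collections import Counter

# Trimmed, illustrative category map (the real tool defines many more).
CATEGORIES = {
    "Office": [".docx", ".xlsx", ".pdf"],
    "Archive": [".zip", ".7z"],
    "Binaries": [".exe", ".dll"],
}


def summarize(lines):
    """Count extensions per category, skipping deleted entries and directories."""
    counts = Counter()
    for line in lines:
        if "(deleted)" in line or ",d/d" in line:
            continue
        _, ext = os.path.splitext(line)
        counts[ext.lower()] += 1  # assumption: treat .PDF and .pdf alike
    return {
        category: {ext: counts[ext] for ext in exts if counts[ext] > 0}
        for category, exts in CATEGORIES.items()
    }
```

Calling summarize on the lines of a listing returns one dictionary per category, which makes it easy to see at a glance which categories are empty and which deserve a dedicated analysis track.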

Importance in Funneling:

In the context of Malware Funneling, “funify.py” plays a crucial role in the initial phase of automatic filtering. It helps to quickly identify the file extensions present in the data source, allowing the ACME investigation team to focus its efforts on the file categories most relevant to the investigation.

This significantly accelerates data reduction and the identification of potential threats, and it makes it possible to run the e-discovery processes once the actual documents with content have been identified.

The result is a summary of the scope of the funneling effort: the total number of files in the evidence, categorized by extension type.

With a clear view of the number of files to be analyzed and the key areas identified, our attention will be focused on decompressing compressed files, exploring email attachments, and delving into the details of relevant files.
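As a hedged illustration of that next step, the contents of a ZIP archive can be profiled by extension without extracting it, using Python’s standard zipfile module. The extensions_in_zip function is a hypothetical helper, not part of funify.py, and other archive formats would need other libraries.

```python
import io
import os
import zipfile
from collections import Counter


def extensions_in_zip(zip_source) -> Counter:
    """Count file extensions inside a ZIP archive without extracting it.

    zip_source may be a file path or a binary file-like object.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no extension
            _, ext = os.path.splitext(info.filename)
            counts[ext.lower()] += 1
    return counts


# Small in-memory archive, built only to demonstrate the call.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("docs/a.docx", b"x")
    archive.writestr("run.exe", b"y")
demo.seek(0)
demo_counts = extensions_in_zip(demo)
```

The resulting counts can be fed into the same category summary used for the top-level listing, so files hidden inside archives go through the same funnel.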

This is just the beginning of our funneling strategy. The story will continue…

References

https://github.com/alpine-sec/funify

https://github.com/kero99/mftmactime
