Danubius IT Solutions' tech blog


Danubius IT Solutions' tech blog

How may your file uploads get hacked?

How may your file uploads get hacked?

Danubius IT Solutions's photo
Danubius IT Solutions
·May 23, 2023·

7 min read

Featured on Hashnode

Shared from Danubius IT Solutions' tech blog.

In this blog post, we will cover the most important file input validation techniques on the web, some attacks against them, and highlight the importance of sanitizing data sent by the end user with a case study.

Purpose of input validation

On the internet, many endpoints are publicly available, therefore not only our targeted customer group will be able to use our service, but malicious hackers as well. Their goals may vary, but the most desired one is usually to get a remote shell to the server computer, and later collect different kinds of information, use CPU to mine cryptocurrency, etc. there. As a developer, we need to build secure applications, and sanitizing input data is only one, but very important part of this process.

Difference between client and server-side validation

Server and client side validation

As you can see in the figure above, the basic scheme is quite easy: client-side validation runs on the user’s computer, while server/application-side validation happens on the server side. There are many advantages and disadvantages of both of them, and we don’t plan to summarize all of them here, only a couple.

Client-side validation can be fast, does not affect our servers/infrastructure, and the performance is based only on the client computer’s resources, but it is very easy to bypass because it runs on the local machine. It is written usually in a scripting language (most often Javascript or Typescript) so the browser can run it by itself.

Server-side validation is slower, every check needs at least 1 HTTP request-response pair until it gets results and/or feedback to the user. Our servers’ resources are used, but in exchange, it can be far more secure. Any server-side language can be chosen, usually .NET, PHP, Python, Java and related frameworks are the most famous ones.

Now that we understand the basics (if you need more info, check here), let’s move on to the specific implementation of these.

File validation techniques

Let’s iterate through some common validation methods, which are used on the web frequently. String (text) validation will be completely skipped (if you’re interested in it, check this OWASP cheat sheet), and jump instantly to file uploads. Note that the mentioned techniques are not perfectly secure by definition, we need to understand where and how they can be mixed to mitigate the risks.

File extension filtering

if(file.extension != ".jpg"){ alert("Not valid file extension."); }

The easiest way is to check the extension of the uploading file. On Windows systems, the file extension still means something, it helps to define file type, while Unix-based operating systems rely on other methods (eg. magic numbers - see it later) to get information about files. This approach is easy to implement, but also easy to bypass (changeable locally), depending on the file.extension calculation method, blacklisting or whitelisting file extensions, etc.

File type filtering

Different techniques are available, let’s see the two most popular ones.

Mime type

When a client uploads a file via http/s, the header always contains a “Content-Type: “ part, where the value is in type/subtype format, e.g. image/jpeg for a jpg picture. It is almost the same as file extension filtering because mime types are based on an extension of the file.

Magic number

Magic numbers are the first few bytes of a file, which is unique to a certain file type. Linux file command is based on this string.

Magic number

Jpegs are usually allowed in a picture uploader web form, for testing purposes let’s assume we are an attacker and want to upload a malicious binary or shell script over a magic number validation.

Jpeg upload

We can see, that the hex signature of the bash script is 23 21, which is accurate. We want to change this beginning to FF D8 FF DB because that’s the signature of jpegs, so we add 4 chars to the beginning of the file, e.g. AAAA. These will be the bytes we want to change.

Bash script

This bash script won’t be able to run without change, but imagine the same with a PHP before the starting of <?php, a malicious backend code can be run after bypassing magic number validation. (Upload as .php, but with a jpeg magic number.)

File name, length, content filtering

Filtering control characters (slash, backslash, asterisk, semicolon, etc.) is mandatory, lengths check is pretty straightforward, and usually not a real problem for attackers (images are bigger than a functional reverse shell). Fully functional content filtering is complicated, and usually not implemented in a single web app, only the previously mentioned content typing (extension, etc.) is used.

Client-side attacks

These validations are unsafe, and we do not use these to provide security but to give customers fast feedback and pleasure. How can we bypass it?

  • First and foremost you can completely switch off Javascript in your browser.

  • Use browser Developer Tools for changing Javascript methods, even if those are obfuscated, use de-obfuscator.

  • Install a middle proxy between your client and the server, such as Burp, and manipulate every request after client validation.

  • Add or change the content type there.

Server-side attacks

  • Extension validation can be incomplete (e.g. filtering all .phpX stuff, but forgetting .phtmlX or similar, which is still processed by the web server and php plugin)

  • Content checks and filtering.

  • Magic numbers we already covered.

There are many other validation techniques, which were not mentioned here, know and use them properly for building a secure applications. Remember that security bugs may not be similar to functional ones, where you can fix errors even after customers faced it. In certain situations you may not have the chance to do this after critical, escalated unauthorised access over a security hole.

Let’s see how an insufficient validation may cause deeper problems.

A short case study

We use a vulnerable demo box from Tryhackme.

Tryhackme demo box

Let’s see what’s on client side in html/js code.

Client side

We can see that there is a length, a jpeg (magic number) and an extension validator. We need to bypass every one of them somehow. With Wappalyzer, we can see that the backend is served by NodeJS, which is important to know, because of the type of possibly usable reverse shells.

Backend details

When we try to upload a non-jpeg file, an error occurs, so the javascript does its job well. So right now we are able to upload jpegs, but we don’t know where those are stored. Let’s try to investigate web directory structure with Gobuster.

Javascript example

Many of them are http error 301, because directory listing is prohibited, but assets and content can be good candidates for storing the uploaded files in. The THM box helps us with a keyword list, which contains 3-letter words. Maybe those are directories, files or else, but it’s still a good hint. I only show the final solution below, because the investigation went on different failed ways. Content dir contains uploaded stuff.

Content dir

After uploading a file, it appears here with a randomly chosen 3-char long file name and jpg extension. From previous directory listing we realised that an admin directory is also reachable without any error code (200). This is how it looks like.

Admin page

It seems that the module directory contains scripts which can be executed by this form. The problem is that our uploads are not in that dir, and we can only upload jpegs. Earlier we acknowledged that NodeJS is serving the backend, let’s use this payload to get a reverse shell, and try to upload it as jpeg. Make sure that IP and port is correct, that’s where you should listen for reverse connection.

Reverse connection

Let’s upload it and check the files again in content directory with gobuster, so we know the actual name of it.


We can download the LGV.jpg from content directory to make sure that it’s our file. It is. Let’s go back to the admin page and examine that form, capture the request with Burp proxy.

Burp proxy

We see that the param is a direct POST parameter. Now start a netcat listener on the previously configured IP and port ( in my case). If it runs, we should call that “jpg” from the modules directory with relative path. We are pretty sure that the OS is a Linux or other NIX-based system, because it answers ICMP pings and nmap base scans. (Microsoft firewall drops every ICMP request by default.)

Netcat listener

The connection was successful, we have root shell to the server now, and we didn’t even need to handle the jpeg situation, backend was able to call a jpg directly. Good enough. The room’s input validation was way below average, and the Nginx/NodeJS shouldn’t be run as root, but it still shows us the importance of input validation.

Lessons learned

Sanitize every single user input, and handle those with proper care on both client and server side. Do not assume anything, check and test those values!

Share this