Playing Hide and Seek with Malicious Scripts

When I encounter a drive-by download that involves a compromised host, there is usually a malicious script somewhere on the website. The “malicious script” could be a meta refresh tag, an iframe, an external JavaScript file, or even code inside a SWF file. Its location can hint at how the site got compromised, and it helps the webmaster verify and remove the infected file(s).
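As a rough illustration of where to start looking, here is a minimal Python sketch that lists the elements mentioned above (external scripts, iframes, meta refresh tags, and SWF embeds) from a page's HTML. The names `LoadedObjectFinder` and `find_loaded_objects` are made up for this example, and the sketch only sees static markup: anything injected by inline or obfuscated script at runtime will not show up.

```python
from html.parser import HTMLParser


class LoadedObjectFinder(HTMLParser):
    """Collect elements that can pull in or redirect to other content."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (kind, value) tuples, in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "script" and attrs.get("src"):
            self.found.append(("external script", attrs["src"]))
        elif tag == "iframe" and attrs.get("src"):
            self.found.append(("iframe", attrs["src"]))
        elif tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            self.found.append(("meta refresh", attrs.get("content") or ""))
        elif tag in ("embed", "object"):
            # Flash is usually referenced via src (embed) or data (object).
            target = attrs.get("src") or attrs.get("data") or ""
            if target.lower().endswith(".swf"):
                self.found.append(("swf", target))


def find_loaded_objects(html):
    """Return every (kind, value) pair found in the given HTML string."""
    finder = LoadedObjectFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```

Calling `find_loaded_objects(page_source)` on a saved copy of the page gives a list to work through by hand; each entry is a candidate file for the webmaster to check.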

Oftentimes, these scripts can be found at the top or the bottom of the webpage:

Lately, however, I’m noticing that they’re getting harder to find. Here are a couple from just this week alone!

Near the top of an external JavaScript file called “site_util.js” is a self-executing function. The lack of formatting made this one stick out.

In this drive-by, the malicious script was buried in the “jquery.js” file. Do you see it? I nearly missed it myself. (Hint: look for the document.write statement.)
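The hint above can be turned into a quick check. This is only a sketch, assuming the script calls `document.write` (or `document.writeln`) by its plain name; the function name `find_document_writes` is made up for the example. Legitimate libraries use `document.write` too, and an obfuscated script can build the call from string fragments, so a hit is a place to look, not a verdict.

```python
import re

# Matches document.write( and document.writeln(, allowing stray whitespace.
WRITE_CALL = re.compile(r"document\s*\.\s*write(?:ln)?\s*\(")


def find_document_writes(source, context=60):
    """Return (line_number, snippet) for each document.write call in source."""
    hits = []
    for match in WRITE_CALL.finditer(source):
        # Line numbers are 1-based: count the newlines before the match.
        line_number = source.count("\n", 0, match.start()) + 1
        snippet = source[match.start():match.start() + context]
        hits.append((line_number, snippet))
    return hits
```

Running this over a local copy of “jquery.js” and comparing the hits against a clean copy of the same library version is one way to narrow down what was added.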

I’m trying to develop a method of identifying malicious scripts and working backwards to locate where they came from. The first hurdle is identifying the malicious script. As malware analysts, we can usually recognize that a script is suspicious just by looking at it. Here’s an example:

In the above, Sample #1 is a benign jQuery script. The other two look suspicious. How do you write a program that mimics what our eyes see? Yeah, right, I’m not crazy. What I can do is write a program that measures “entropy”. (I know that’s not the right word, but it’s close enough until I find a better one.) If you look closely at the jQuery script, you can recognize many valid JavaScript keywords. The other two scripts contain some keywords too, but not many; instead, there’s a ton of random characters and numbers. So if I write a program to look for randomness, I might be able to spot malicious scripts. Then I’ll need a way to enumerate all the objects used by a webpage and keep track of what it loads.
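To make the randomness idea a little more concrete, here is a minimal Python sketch of three possible measurements: the share of identifier-like tokens that are JavaScript keywords, the share of characters that are digits, and character-level Shannon entropy. This is not the program described in this post; the function names, the keyword list, and the choice of measurements are all assumptions for illustration, and no thresholds are implied. Character-level entropy in particular may not separate the samples well (hex-encoded payloads use a small alphabet), so the keyword and digit ratios are probably closer to what the eye picks up on.

```python
import math
import re
from collections import Counter

# A small, hand-picked keyword list (an assumption; extend as needed).
JS_KEYWORDS = {
    "function", "var", "return", "if", "else", "for", "while", "this",
    "new", "typeof", "null", "true", "false", "try", "catch", "switch",
    "case", "break",
}


def shannon_entropy(text):
    """Bits per character, based on character frequencies in text."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def keyword_ratio(text):
    """Fraction of identifier-like tokens that are JavaScript keywords."""
    tokens = re.findall(r"[A-Za-z_$][A-Za-z0-9_$]*", text)
    if not tokens:
        return 0.0
    return sum(1 for token in tokens if token in JS_KEYWORDS) / len(tokens)


def digit_ratio(text):
    """Fraction of characters that are digits."""
    if not text:
        return 0.0
    return sum(1 for ch in text if ch.isdigit()) / len(text)
```

Scoring each script a page loads with these functions and sorting the results would put the low-keyword, digit-heavy scripts at the top of the list for a human to review.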

Here are results from actual drive-bys using the above methodology:

The main site had an iframe to a local (infected) PHP file. The PHP file had obfuscated JavaScript that redirected visitors to an exploit pack.

In this drive-by, there was an iframe to Blackhole from the main website.

Last one. The main webpage loaded a local file via an iframe which had a meta refresh tag to Blackhole.

Looks promising but I still need to figure out how to find those pesky malicious needles in the proverbial haystack.


3 Responses to Playing Hide and Seek with Malicious Scripts

  1. Gianluca says:

    To detect malicious scripts you can extract ngrams from them to train a classifier like a neural network or others.

  2. I second the comment to use machine learning. You should be able to extract what bad JS looks like and represent it in a vector full of features. From those features you can then train the classifier and have it do all the hard work for you. Shoot me an email if you want some code I used to do the same sort of thing with PDF files. I had a lot of success and would love to see that sort of approach used more!

  3. sachin says:


    Last year I completed an algorithm along similar lines, but the objective was different (detection of phishing sites and malware-laced sites). When I started on the algorithm, I had no idea how it would shape up.

    How to find them? I used regex. If you need any more help, drop me a mail.

    sachin r.
