Detecting Phishing Sites in Your Logs
I recently read the Anti-Phishing Working Group's 2Q 2014 report, saw the number of unique phishing sites, and compared the figures with the previous year's.
After more than 10 years, phishing is still around, and growing! Back then, there were companies offering clients a way to detect phishing attacks by analyzing their own web server logs. I wrote my own program in 2006 that does this, and I've decided to update it and offer it as freeware in case anyone needs a tool like this. (I also wrote a Python script that does the same thing, which I'll probably push to GitHub one day.)
The idea behind this and other similar tools is to analyze the referers in your web server logs. These referers are generated when a user visits a phishing page and submits the form: upon receiving the user's credentials, the phishing page will often redirect the user to the legitimate website, and the referer of the resulting request will contain the URL of the phishing site.
Keep in mind, then, that if the phishing website is self-contained (that is, it does not need any files from the legitimate site) and does not redirect the user back to the legitimate site, there will be no trace in the web server logs.
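To make the mechanism concrete, here's a minimal sketch in Python (not the actual program) of pulling referers out of logs in Apache's "combined" format. The sample log line and hostnames are invented for illustration:

```python
import re

# Apache "combined" log format: the referer is the second-to-last
# quoted field, just before the user agent string.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def extract_referer(line):
    """Return the referer from a combined-format log line, or None."""
    m = LOG_PATTERN.match(line)
    if not m:
        return None
    referer = m.group("referer")
    return referer if referer != "-" else None

# Invented example line, for illustration only:
line = ('203.0.113.5 - - [12/Aug/2014:10:15:32 +0000] '
        '"GET /images/logo.gif HTTP/1.1" 200 4523 '
        '"http://evil.example/paypal.com.html" "Mozilla/5.0"')
print(extract_referer(line))  # http://evil.example/paypal.com.html
```

If your logs use a different format, only the regular expression needs to change; the rest of the idea stays the same.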
Let's take a look at a typical phish. I went to PhishTank.com and tried to find a phishing site that was still up:
Here's what the site looks like:
As I proceeded through the pages asking for more and more personal and financial information, I eventually got to the last page:
Clicking on the Continue button takes me to the main Danish Paypal site for some reason:
I captured the source code of the last phishing page, and it looks like this. Notice that it contains links back to the real Paypal site. I've highlighted the link to the main logo graphic.
If we were to look at Paypal's web server logs, we might see something like this (note the last line). There's a GET request for the logo graphic, and the referer is the URL of the last phishing page, which called up the graphic.
If we could find these entries in our log files, we could find these phishing sites and get them taken down, without relying on users to report them. There's also an added bonus: phishers will sometimes test their creation first, so their referers show up in the logs and we can take down those phishing sites before the phishing campaign even begins!
Here's where the program, Sounder aka FishFinder, comes in:
The top portion is where you define folders and filenames. You also need to define the column that contains the referer information (be sure your logs contain referer information or this program won't work!) and the line separator. There are also debug modes to help you.
You can have it check the contents of the potential phishing site by scanning for the content keywords defined below. For example, if you enter the login, password, email, and username keywords you see there, the program will check whether the website contains any of them and report any matches in the results file.
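The keyword check itself boils down to a case-insensitive substring scan. Here's a hedged sketch of that idea (the helper name and sample page are mine, not the program's internals):

```python
def find_keywords(html, keywords):
    """Return the content keywords that appear in the page (case-insensitive)."""
    text = html.lower()
    return [kw for kw in keywords if kw.lower() in text]

keywords = ["login", "password", "email", "username"]

# Invented page fragment for illustration:
page = "<form><b>Login</b> <input name='pw' placeholder='Password'></form>"
print(find_keywords(page, keywords))  # ['login', 'password']
```

Any non-empty result is a good hint that the suspicious referer really is hosting a credential-harvesting form.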
The Check Filename option checks whether the referer contains any of the blacklisted items. The blacklist textbox should contain filenames of known bad referers; in the case of Paypal, these might be something like "paypal.com.html" or "logon.php". The whitelist textbox should contain URLs you want to ignore, like partner websites, spiders, portals, etc.
If the Capture Screen option is set, the program will take a screenshot of the page for visual inspection. This feature requires PhantomJS (www.phantomjs.org). I've included the required "rasterize.js" file in my download, so you just need to copy the PhantomJS executable into the folder.
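If you wanted to script the same capture yourself, the underlying PhantomJS call is simple: rasterize.js takes a URL and an output filename. Here's a sketch that builds the command from Python (it assumes the phantomjs binary and rasterize.js sit next to your script; the URL is invented):

```python
import subprocess

def build_capture_cmd(url, out_png, script="rasterize.js"):
    """Build the PhantomJS command line used to screenshot a page."""
    # rasterize.js (from the PhantomJS examples) takes: URL, output file
    return ["phantomjs", script, url, out_png]

cmd = build_capture_cmd("http://evil.example/paypal.com.html", "suspect.png")
print(cmd)
# To actually run it (requires the phantomjs binary on your PATH):
# subprocess.run(cmd, check=True)
```

Screenshotting suspicious pages this way lets you triage them visually without opening each one in your own browser.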
Finally, the Server Files (Referers) textbox should contain the paths to files on your web server that are often used on phishing pages. Here, I've included the path to the logo file.
You can save (and load) the settings by clicking the appropriate buttons at the bottom. The program uses an INI file that contains helpful descriptions and is worth reading before you use the program.
When Sounder runs, it scans the files in the Logs folder looking for any HTTP request matching the items in the "Server Files (Referers)" textbox, then inspects the referer. If the referer is known bad, it is automatically flagged. If the referer is on the whitelist, it is ignored. If the referer is neither good nor bad, it is flagged as suspicious so you have a chance to inspect it; you should then add it to either the whitelist or the blacklist as appropriate for future runs.
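That triage step can be sketched in a few lines. This is illustrative Python, not the program's actual internals, and the example lists and URLs are invented:

```python
def classify_referer(referer, blacklist, whitelist):
    """Triage a referer: 'bad', 'ignored', or 'suspicious'."""
    if any(item in referer for item in blacklist):
        return "bad"        # known phishing filename -> flag automatically
    if any(referer.startswith(url) for url in whitelist):
        return "ignored"    # partner site, spider, portal, etc.
    return "suspicious"     # unknown -> inspect manually, then list it

blacklist = ["paypal.com.html", "logon.php"]
whitelist = ["https://partner.example"]

print(classify_referer("http://evil.example/paypal.com.html",
                       blacklist, whitelist))  # bad
print(classify_referer("http://unknown.example/page.html",
                       blacklist, whitelist))  # suspicious
```

The point of the suspicious bucket is that every run teaches the tool something: once you've inspected a referer, it moves onto one of the two lists and never bothers you again.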
If the referer is marked suspicious, the program will (optionally) visit the page, check whether the page contents contain any of the items in the "Content Keywords" textbox, and grab a screenshot, regardless of whether any keywords matched.
Here's the results file, which shows that this particular referer was suspicious and that the keyword "login" was found on the webpage.
This is the screenshot that PhantomJS captured.
I hope you find this program useful!