Friday, October 3, 2014

Automating Simple Website Reconnaissance Measures a.k.a. An Ounce of Prevention

As a pen tester on an internal security team, I'm responsible for periodically sweeping our networks to identify web servers and determine whether those websites present risks such as information disclosures, default credentials, or insufficient access and authorization controls.  (Aside: yes, change control would make sure this never happened again.  In a world filled with unicorns farting rainbows.) On anything other than a small network, this can quickly become a time-consuming task. It didn't take long to decide to automate as much of this process as possible.

Since our vulnerability scanners are regularly touching all parts of our network, they are a good choice as a source for a list of hostnames, IPs, and ports for any service speaking HTTP or HTTPS. After massaging the data in Excel, I have a list of URLs to test, each built from either the FQDN or IP and the port number.
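If you'd rather skip the spreadsheet step, the same URL list can be built in a few lines of Python. This is only a rough sketch: the two-column "host,port" export format, the function names, and the set of ports treated as HTTPS are all my assumptions for illustration, not something any particular scanner produces.

```python
import csv
import io

# Assumption: ports that usually speak TLS. Adjust for your environment.
HTTPS_PORTS = {443, 8443, 9443}


def build_url(host, port):
    """Return a URL for host:port, guessing the scheme from the port number."""
    port = int(port)
    scheme = "https" if port in HTTPS_PORTS else "http"
    # Leave the port off when it is the default for the scheme.
    if (scheme, port) in (("http", 80), ("https", 443)):
        return "%s://%s/" % (scheme, host)
    return "%s://%s:%d/" % (scheme, host, port)


def urls_from_csv(text):
    """Turn 'host,port' rows (hypothetical export format) into a de-duplicated URL list."""
    urls = []
    for row in csv.reader(io.StringIO(text)):
        # Skip the header row and anything without a numeric port.
        if len(row) < 2 or not row[1].strip().isdigit():
            continue
        url = build_url(row[0].strip(), row[1].strip())
        if url not in urls:
            urls.append(url)
    return urls
```

If your scanner export already says whether a service is HTTP or HTTPS, use that instead of guessing from the port.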

Once I have this list, typically several thousand different URLs to test, I need to quickly eliminate the systems I don't need or want to inspect.  To do this, I wrote a simple Python utility which uses urllib2 to pull in the page associated with each URL and analyze it through a simple string.find() loop.  I built a dictionary of common sites that I know I won't need to inspect, such as:
  • Sites with the corporate authentication mechanisms presented
  • Default Apache / IIS web pages
  • Default Tomcat or JBoss install
  • KVMs and SAN switch interfaces
  • etc.
When the utility finds that a URL matches something in the dictionary, it records this in the output file. The resulting report contains far fewer sites needing inspection than the original list.
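For anyone who wants to try the same approach, here is a minimal sketch of that kind of classifier. It is not my actual utility: it uses Python 3's urllib.request in place of urllib2, and the labels and marker strings in the dictionary are made-up examples that you would replace with strings taken from pages on your own network.

```python
import http.client
import urllib.request

# Hypothetical signatures: label -> substring expected somewhere in the page body.
SIGNATURES = {
    "Default Apache page": "It works!",
    "Default IIS page": "IIS Windows Server",
    "Default Tomcat install": "Apache Tomcat",
    "Default JBoss install": "Welcome to JBoss",
}


def classify(html, signatures=SIGNATURES):
    """Return the first label whose marker string appears in html, else None."""
    for label, marker in signatures.items():
        if html.find(marker) != -1:
            return label
    return None


def fetch(url, timeout=10):
    """Return the page body as text, or None when the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError, http.client.HTTPException):
        return None


def report(urls):
    """Yield (url, result) pairs; result is a label, 'unreachable', or 'unclassified'."""
    for url in urls:
        html = fetch(url)
        if html is None:
            yield url, "unreachable"
        else:
            yield url, classify(html) or "unclassified"
```

The "unclassified" rows are the ones worth a human look. Internal sites with self-signed certificates will show up as "unreachable" here unless you relax certificate checking, which this sketch deliberately does not do.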

The biggest return isn't in time saved, however. The real value comes when the utility isn't able to classify the site. These sites often contain information that should have been secured, or authentication mechanisms using weak/default credentials.  I can easily filter the output into additional tasks, such as testing for default Tomcat or JBoss credentials, etc.

In the past, I would dump these unclassified results into a spreadsheet and review them individually. Any site that performed a JavaScript redirect or refresh to a different landing page when '/' was requested would fool my utility, as urllib2 does not execute JavaScript and so cannot follow that kind of redirect. This led to manually reviewing a lot of sites that would otherwise have been easily identified if my utility could see the landing page.
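One cheap way to at least flag these pages, rather than misclassify them, is to look for redirect hints in the raw HTML. The sketch below is a heuristic I'm offering as an illustration, not something my utility did: the regular expressions and the needs_browser name are my own, and they will miss obfuscated scripts and occasionally fire on harmless pages.

```python
import re

# <meta http-equiv="refresh" ...> tags, with or without quotes around the value.
META_REFRESH = re.compile(
    r'<meta[^>]+http-equiv\s*=\s*["\']?refresh["\']?[^>]*>', re.IGNORECASE
)

# Common client-side redirects: assignments to location / location.href,
# or calls to location.replace(...).
JS_REDIRECT = re.compile(
    r'(?:window\.|document\.)?location(?:\.href)?\s*=|location\.replace\s*\(',
    re.IGNORECASE,
)


def needs_browser(html):
    """Return True when the page looks like it redirects on the client side."""
    return bool(META_REFRESH.search(html) or JS_REDIRECT.search(html))
```

Pages that trip this check can be routed straight to a tool that renders the page in a real browser engine instead of landing in the manual review pile.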

A while back I experimented with taking a screenshot of each site to quickly eliminate these sites visually. Unfortunately, at the time, every utility I investigated was also stumped by the redirect. AJAX-heavy sites fooled my utility as well as the other utilities I tested.

This summer Netflix released Sketchy, a tool they wrote to assist in their IR processes. Sketchy addresses the same issues I was experiencing with JavaScript and AJAX sites. After reading about Sketchy, I knew that I wanted to try applying this to my processes to see if I could get better results and be more efficient.

Feeling inspired by all the incredible talks presented at DerbyCon, I decided it was time to start putting Sketchy to work. I blogged earlier about my experience setting up Sketchy; you can read about it here.

While Sketchy does have an API, a quick and dirty shell script worked for my needs.  The script supports grabbing a screenshot (sketch), grabbing the DOM as text (scrape), or grabbing the rendered HTML (html). For sites Sketchy is unable to connect to, my script makes a log entry and does not produce an artifact. I can quickly view the resulting images and determine if the site is something that warrants further inspection.
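My script is shell, but the request-building step looks roughly like this in Python. Treat every specific here as an assumption to check against your own install: the base address, the "eager" endpoint name, and the url and type query parameters reflect how I understand a default local Sketchy setup, and capture_request is just a name I made up for this sketch.

```python
from urllib.parse import urlencode

# Assumption: a local Sketchy instance with an "eager" endpoint that accepts
# `url` and `type` query parameters. Verify against your own deployment.
SKETCHY_BASE = "http://127.0.0.1:8000/eager"

# The three capture modes described above.
CAPTURE_TYPES = ("sketch", "scrape", "html")


def capture_request(target, capture_type="sketch", base=SKETCHY_BASE):
    """Build the request URL asking Sketchy for one capture of target."""
    if capture_type not in CAPTURE_TYPES:
        raise ValueError("unknown capture type: %r" % (capture_type,))
    # urlencode percent-escapes the target so its own ':' and '/' survive intact.
    return base + "?" + urlencode({"url": target, "type": capture_type})
```

Fetching that URL and saving the response under a filename derived from the target is all the rest of the script needs to do; when the fetch fails, write a log line and move on.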


Linksys router login page
Twitter login page


Reviewing websites is essential to identifying information disclosures, weak authentication mechanisms, and new web apps or devices that may have been deployed without your knowledge. Regularly reviewing these websites for this information prevents audit findings and helps keep your network and data safe from unauthorized access.

Sketchy was easy to install, and it didn't take long to whip up a functioning system.  With a few hours of setup, scripting, and testing, I'm able to automate what used to be several hours of work. In the end, I'm free to get more done, and much more of the proverbial low-hanging fruit is picked.

If you're using different tools to achieve the same end, I'd love to hear about it. Leave me a comment or reach out to me on Twitter.


1 comment:

  1. Very cool Mike! Thanks for the details. Gonna have to try your script. You tried Chris Truncer's EyeWitness or Tim Tomes' PeepingTom?