There’s one thing I really don’t like about URL shorteners: We’re required to trust whoever’s posting the shortened URL that there’s nothing malicious behind it. For example, how do we know URL http://gu.com/p/3pgez points to a Guardian article and not a malicious web page?
The wget tool can be used to determine this:
$ wget -S http://www.domain.com
The ‘-S’ (‘--server-response’) option prints the HTTP headers the server sends back. On its own, wget still downloads the page; adding ‘--spider’ makes it check the URL without saving the HTML. It can be a more reliable fingerprinting method than the Telnet banner grabbing technique, and it can tell us whether a URL redirects to another.
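As a minimal sketch, the header check can be wrapped in a small shell function. The name show_headers is made up for illustration; ‘-S’ and ‘--spider’ are standard wget options. wget writes the headers to stderr, so the sketch merges them into stdout to make them easy to pipe into other tools.

```shell
# show_headers URL  (hypothetical helper name)
# Print the server's response headers for URL without saving the page.
show_headers() {
    # -S        print the HTTP response headers
    # --spider  check the URL only; do not save the page
    # 2>&1      wget logs to stderr, so merge it into stdout for piping
    wget -S --spider "$1" 2>&1
}
```

For example: show_headers http://www.domain.com | less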
So let’s test this on http://www.southwales.ac.uk, as I know there are two other domains that redirect to it (glam.ac.uk and newport.ac.uk). From the output we can see the IP address the URL resolves to, the server version (highlighted) and some other information.
There are several fields worth noting. The status line (near the top, for example ‘HTTP/1.1 200 OK’) gives a status code for the request, indicating how the server handled it. A code ‘200’ means the request was successful, and the home page is there.
Sometimes the output has a ‘Last-Modified‘ field, indicating when the home page was last modified. The browser might use this to determine whether the page was updated and therefore whether to use the cached page or download the updated file, but an attacker could also determine whether the site is regularly maintained and therefore whether it’s an easy target.
Two other fields that reveal something about the server:
* Server: As mentioned, fingerprinting the web server and software version can enable a lookup in a vulnerability and exploit database.
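* X-Varnish: Added by Varnish Cache, a caching reverse proxy (HTTP accelerator) that sits in front of the web server. The header carries request IDs, and two ID numbers generally mean the response was served from the cache. A little digging revealed that Varnish is commonly deployed in front of Drupal (though not only Drupal), so the University’s site is possibly a Drupal system running on nginx.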
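To pull just these fields out of a longer response, the output of ‘wget -S’ can be filtered. The following is a rough sketch, not a definitive tool: the function name fingerprint_headers is hypothetical, and it assumes the headers arrive on stdin in the indented form wget prints them.

```shell
# fingerprint_headers  (hypothetical helper name)
# Read `wget -S` output on stdin and keep only the status line and the
# Server, Last-Modified and X-Varnish headers.
fingerprint_headers() {
    # -i: header names are case-insensitive; -E: extended regex for the alternation.
    # wget indents server headers, so allow leading whitespace, then strip it.
    grep -iE '^[[:space:]]*(HTTP/|Server:|Last-Modified:|X-Varnish:)' |
        sed 's/^[[:space:]]*//'
}
```

For example: wget -S --spider http://www.southwales.ac.uk 2>&1 | fingerprint_headers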
Now, onto the redirection: It’s normally used for legitimate purposes, and in the example here it’s used for redirecting browsers visiting the old University of Glamorgan site to the new University of South Wales one.
Redirection also enables an attacker to disguise a malicious URL behind a more legitimate-appearing one (or a shortened URL), which is useful if the objective is to get users clicking links within an email. Redirection also makes it harder for anyone who’s unaware of the trick to investigate the actual location of a malicious page or blacklist it. Attackers can keep the malicious page on one domain and cycle through a list of expendable domains that redirect to it.
The first thing to look at is the status code. Here it’s ‘301’ (Moved Permanently), one of the 3xx codes that signal a redirection; ‘302’, ‘303’, ‘307’ and ‘308’ are other common ones. Further down, there is the ‘Location’ header, which didn’t appear in the last output and reveals the domain/URL a browser would be redirected to.
The way it works: the browser sends a request to the original URL, and the response includes a 3xx status code to indicate a redirection, along with the destination URL in the ‘Location’ header. The browser then makes another request to that URL and downloads the content if the status code is ‘200’. If the second response is itself a redirection, the same step repeats.
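The request-and-redirect sequence above can be traced from the command line, since wget follows redirects by default and ‘-S’ prints the headers of every response along the way. Here is a minimal sketch that boils that output down to one line per hop. The function name redirect_hops is hypothetical, and it assumes wget’s usual layout, where server headers are indented by two spaces and wget’s own messages are not.

```shell
# redirect_hops  (hypothetical helper name)
# Read `wget -S` output on stdin and print one line per response:
# the status code, followed by the Location target if the server sent one.
redirect_hops() {
    awk '
        # Print the hop collected so far, if any.
        function emit() {
            if (code != "") print (loc == "" ? code : code " -> " loc)
        }
        # A new status line starts a new hop: $2 is the numeric status code.
        /^  HTTP\// { emit(); code = $2; loc = "" }
        # Only the indented Location line is a server header.
        /^  [Ll]ocation:/ { loc = $2 }
        END { emit() }
    '
}
```

For example: wget -S --spider http://glam.ac.uk 2>&1 | redirect_hops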