Traffic filtering might be a legal or compliance requirement for some networks. It might also be deployed to prevent the network being used to access content that’s blatantly illegal, or to protect users against sites known to be genuinely dangerous. Unfortunately, there are cases where content filtering is detrimental, or where the motives extend to political, religious or ideological censorship. For example, ongoing research by the Harvard Law School, Dynamic Internet Technology and the rest of the Global Internet Freedom Consortium suggests the majority of sites inaccessible from within China are blocked for mainly political reasons.
This article is a very basic guide for defeating blocking/filtering systems, using a methodology that’s recently proven effective against a well-known commercial traffic filtering system, and where the use of circumvention software wasn’t an option. However, it’s not a guide for preserving anonymity or confidentiality – the techniques outlined here aren’t suitable for communicating sensitive information.
First, it’s important to learn something about how the different types of filtering work, and how they’re combined almost universally across Internet blocking systems. From the client’s point of view, a given connection is either allowed or disallowed, according to whatever policies are implemented. In reality, commercial systems combine several layers of filtering that work on different components of the traffic.
Figure 1 shows a very simplified representation of a TCP/IP packet. This is central to understanding the two main components of a filtering system: one deals with IP routing, and the other with the data being routed in TCP packets. Any anti-censorship system must obscure both in some way. In addition, there is a third blocking method that forwards or redirects DNS requests according to the URL being visited.
A paper called The Great Firewall Revealed (Global Internet Freedom Consortium) discusses the three methods in detail, but the following are basic descriptions of how they work, and how each can be countered individually.
IP Address Blocking
The destination IP address is checked against a blacklist of known addresses, whether they refer to banned sites or proxy servers. In larger networks this is more commonly deployed at the gateway level, as TCP inspection can be resource-intensive where high loads are involved. Even in relatively advanced systems, sites must be reviewed and added to the blacklist manually, which makes it ineffective against undiscovered proxy services that become available.
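As a rough illustration of the check described above, the following Python sketch compares a destination address with a blacklist. The `BLACKLIST` entries and the `is_blocked` name are made up for the example (the addresses come from ranges reserved for documentation), and a real gateway would do this in its routing hardware or firewall rules rather than in a script.

```python
import ipaddress

# Hypothetical blacklist: single hosts and whole networks can both be listed.
# These addresses are reserved for documentation and belong to no real site.
BLACKLIST = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_blocked(destination: str) -> bool:
    """Return True if the destination address falls inside a blacklisted network."""
    address = ipaddress.ip_address(destination)
    return any(address in network for network in BLACKLIST)


if __name__ == "__main__":
    for candidate in ("192.0.2.55", "198.51.100.17", "203.0.113.9"):
        print(candidate, "blocked" if is_blocked(candidate) else "allowed")
```

The last address passes simply because nobody has added it to the list yet, which is the weakness a newly available proxy exploits.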
TCP payload filtering works by inspecting the packets being routed to determine whether given keywords exist in the payload. This is how the traffic filtering system decides whether to block or allow a site based on the content of its web pages, or on whether the user has submitted certain information or queries. This can be defeated by encrypting the payload, usually through SSL/HTTPS by changing the URL from ‘http://’ to ‘https://’, provided the server accepts HTTPS connections.
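To show why encryption defeats this kind of inspection, here is a minimal Python sketch of a keyword scan over a payload. The keyword list, the `payload_is_blocked` name and both sample payloads are assumptions for illustration; in particular, the ‘encrypted’ sample is just made-up bytes standing in for ciphertext, not the output of a real SSL session.

```python
# Hypothetical keyword list; a commercial filter would hold a far larger one.
BANNED_KEYWORDS = [b"proxy", b"circumvent"]


def payload_is_blocked(payload: bytes) -> bool:
    """Return True if any banned keyword appears in the payload (case-insensitive)."""
    lowered = payload.lower()
    return any(keyword in lowered for keyword in BANNED_KEYWORDS)


# A plain HTTP response body is readable, so the scan can match keywords in it.
plain_payload = b"HTTP/1.1 200 OK\r\n\r\n<title>Free Web Proxy</title>"

# Made-up bytes standing in for an encrypted record: the filter sees only
# opaque data and has no keywords to match against.
opaque_payload = bytes.fromhex("1703030010a4c1e09b5d7f3382e6104bd95c2a7710")

if __name__ == "__main__":
    print("plain: ", "blocked" if payload_is_blocked(plain_payload) else "allowed")
    print("opaque:", "blocked" if payload_is_blocked(opaque_payload) else "allowed")
```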
In reality, DNS and URL blocking are slightly different things, but the problem’s the same from the user’s perspective: the web address determines whether the connection is blocked or not. The URL is scanned for keywords, or compared with a domain blacklist, before the request is resolved, redirected or dropped. Many otherwise decent proxy services become unavailable simply because their URLs contain the word ‘proxy’.
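The URL check can be sketched in a few lines of Python. The word ‘proxy’ is the example keyword from the text; the blacklisted domain, the sample URLs and the `url_is_blocked` name are hypothetical.

```python
from urllib.parse import urlsplit

URL_KEYWORDS = ["proxy"]                 # the keyword mentioned in the article
DOMAIN_BLACKLIST = {"blocked.example"}   # hypothetical blacklisted domain


def url_is_blocked(url: str) -> bool:
    """Return True if the host is blacklisted or the URL contains a keyword."""
    host = (urlsplit(url).hostname or "").lower()
    if host in DOMAIN_BLACKLIST:
        return True
    return any(keyword in url.lower() for keyword in URL_KEYWORDS)


if __name__ == "__main__":
    for url in (
        "http://freeproxy.example/",
        "http://blocked.example/page",
        "http://harmless.example/browse.php",
    ):
        print(url, "blocked" if url_is_blocked(url) else "allowed")
```

The third URL gets through even if it happens to host a proxy, because nothing in the address gives its purpose away.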
Certain domains can also be mapped to incorrect IP addresses in the local DNS, perhaps causing URLs to be redirected to some error/warning page instead of the actual site. Most commonly experienced in the latest round of ‘anti-piracy’ measures, this is easily defeated by accessing the server by its IP address (if known).
An Overview of Proxy Servers
A basic proxy server simply relays traffic between the client and the server, effectively enabling the client to access the blocked site through a different IP address and URL which the filtering system doesn’t recognise. The commonly available web proxies are servers running PHP forms into which users enter the URL of whatever site they wish to visit.
Most proxies themselves shouldn’t be trusted, as they might record or log whatever traffic is being relayed, unless they are specifically designed to protect the anonymity of their users. Traffic can be traced to the proxy, and the logs obtained from whoever’s running it.
Combining the Countermeasures into an Effective Methodology
All the above techniques are effective against the different methods of filtering traffic, but they must be combined where reasonably advanced systems are in place. Here the objective is to find a proxy service that uses SSL/HTTPS at an unrecognised IP address, and with a URL that doesn’t contain any keywords suggesting its function.
1. The first step is to consult a search engine, such as Google or Ixquick, and get a list of currently available proxies. It might be necessary to copy and paste the listed URLs into the address bar, and change ‘http://’ to ‘https://’. More often than not, there’ll be warnings about invalid SSL certificates, but they can be ignored here.
2. Unfortunately, most of the URLs listed will contain the word ‘proxy’ or other keywords that suggest their purpose. What’s needed are the ones without a suggestive URL. In Figure 3 there are only two proxies, which I’ve highlighted, that match this criterion.
3. The next problem is that the vast majority of web-based proxies have home pages containing keywords, and TCP filtering will cause this traffic to be blocked. Therefore, the TCP payload must be encrypted to prevent the content being scanned. This can be done by copying and pasting the selected URLs from the proxy list into the address bar of the browser, and again replacing ‘http://’ with ‘https://’. In most cases there’ll also be warnings here regarding dodgy SSL certificates, but these can be ignored unless there’s a real need to authenticate the server.
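The selection in steps 2 and 3 can be sketched in Python as follows. This is only an illustration of the idea: the keyword list, the sample URLs and the function names are all assumptions, and the script doesn’t check whether a given server actually answers over HTTPS.

```python
from urllib.parse import urlsplit

# Assumed keywords that give away a proxy's purpose; only 'proxy' is
# taken from the article, the others are guesses.
SUGGESTIVE_KEYWORDS = ["proxy", "unblock", "anonym"]


def has_suggestive_url(url: str) -> bool:
    """Return True if the URL contains a keyword hinting at its function."""
    lowered = url.lower()
    return any(keyword in lowered for keyword in SUGGESTIVE_KEYWORDS)


def to_https(url: str) -> str:
    """Swap the scheme for 'https', leaving the rest of the URL untouched."""
    return urlsplit(url)._replace(scheme="https").geturl()


def select_candidates(urls: list[str]) -> list[str]:
    """Keep the URLs without suggestive keywords, rewritten to use HTTPS."""
    return [to_https(url) for url in urls if not has_suggestive_url(url)]


if __name__ == "__main__":
    # Hypothetical proxy list, as it might be copied from a search result.
    listing = [
        "http://fastproxy.example/",
        "http://plain-site.example/index.php",
        "http://unblock-me.example/",
    ]
    for candidate in select_candidates(listing):
        print(candidate)
```

Each URL printed still has to be tried by hand in the browser, since the filter may already know the proxy’s IP address.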