Since everything went ‘Web 2.0’, there’s been a huge change in the way intelligence is gathered. It’s probably safe to say that most of today’s espionage and advanced targeted attacks begin with footprinting: Internet-based research, carried out without the target being aware of it, using information that’s published online or provided to the media. Because that information is already public, it’s much easier than ever to outsource, distribute and share intelligence efficiently.
This also means anyone can use the same methods to their advantage. In terms of enhancing security, OSINT can give us more detailed knowledge of the threats, their capabilities and relationships, and how they operate. Perhaps most importantly, in some cases we can use it to make fairly accurate predictions.
Although there’s a fair amount of literature on Open Source Intelligence (OSINT) suggesting structured processes for going about this, such as Treadstone 71’s Cyber Intelligence Lifecycle, a definitive ‘beginners’ guide’ seems hard to find online, if one exists at all. The good news is that anyone can develop effective OSINT capabilities using freely available tools (there are always new ones to discover). Perhaps the only caveats are the resources and background knowledge needed to do it professionally.
The Techniques and Attributes
Basically OSINT is another term for research, and there’s far more to it than scraping data from whatever sources are available. The information has to be verified and pieced together, since most of it will be biased in some way. Analytical skills, information management and objectivity are essential for decent results. The researcher must learn the context of each piece of information, and where it fits into the much larger picture.
Over the last couple of years I’ve adopted two techniques that can be applied to practically any investigation. The first is building a timeline, which is the quickest and easiest way of putting data into context and gradually forming a reliable hypothesis right from the start.
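A timeline doesn’t need special software. Here’s a rough sketch in Python; the `Event` class, the helper names and the sample entries are all hypothetical, made up for illustration. Each finding is recorded with a timestamp and its source, and sorting puts everything in order:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass(order=True)
class Event:
    # Only the timestamp is compared, so sorting orders events chronologically;
    # the source and note are carried along untouched.
    when: datetime
    source: str = field(compare=False)
    note: str = field(compare=False)


def build_timeline(events: List[Event]) -> List[Event]:
    """Return the events in chronological order (ties keep their input order)."""
    return sorted(events)


def render(timeline: List[Event]) -> str:
    """One line per event: timestamp, source in brackets, then the note."""
    return "\n".join(
        f"{e.when:%Y-%m-%d %H:%M}  [{e.source}]  {e.note}" for e in timeline
    )


if __name__ == "__main__":
    # Hypothetical findings, entered in the order they were discovered.
    findings = [
        Event(datetime(2012, 3, 14, 9, 30), "whois", "Domain registered"),
        Event(datetime(2011, 11, 2, 18, 5), "forum", "Username first seen"),
        Event(datetime(2012, 5, 1, 12, 0), "blog", "First blog post"),
    ]
    print(render(build_timeline(findings)))
```

Keeping the source next to each entry makes it easy to go back and re-verify a finding when the hypothesis changes.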
Relationship mapping is the other technique: an object-oriented structure is developed that describes the environment around the subject, and the relationships between the people and entities the subject interacts with. This lets the researcher see the bigger picture and draw inferences. For example, the subject is likely to share characteristics that are common across the entities in its network, or the subject may be influenced by other entities and events in ways that weren’t previously known. Paterva’s freely available Maltego Community Edition and CaseFile, which I’ve briefly played with, were developed specifically for this, but a conventional mind-mapping program could also be used.
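The same idea can be sketched as a toy in-memory structure. This is not Maltego’s data model; the `RelationshipMap` class, its methods and the `min_share` threshold are my own assumptions. Entities are linked to each other and tagged with attributes, and `shared_traits` lists the attributes held by a large share of the subject’s direct contacts, i.e. candidate inferences about the subject that still need verifying:

```python
from collections import Counter, defaultdict
from typing import Dict, Set


class RelationshipMap:
    """Toy undirected graph of entities, each tagged with free-form attributes."""

    def __init__(self) -> None:
        self.links: Dict[str, Set[str]] = defaultdict(set)
        self.attrs: Dict[str, Set[str]] = defaultdict(set)

    def link(self, a: str, b: str) -> None:
        # Relationships are treated as symmetric for simplicity.
        self.links[a].add(b)
        self.links[b].add(a)

    def tag(self, entity: str, *attributes: str) -> None:
        self.attrs[entity].update(attributes)

    def shared_traits(self, subject: str, min_share: float = 0.5) -> Dict[str, float]:
        """Attributes held by at least `min_share` of the subject's direct contacts.

        The returned shares are hypotheses about the subject, not findings."""
        contacts = self.links.get(subject, set())
        if not contacts:
            return {}
        counts = Counter(a for c in contacts for a in self.attrs.get(c, set()))
        return {
            attr: n / len(contacts)
            for attr, n in counts.items()
            if n / len(contacts) >= min_share
        }
```

For instance, if two of the subject’s three (hypothetical) contacts are tagged `linux`, `shared_traits("subject")` returns `linux` with a share of roughly 0.67: a lead worth checking, not a conclusion.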
Legal and Ethical Stuff
The main thing to remember is that OSINT is about examining information (and data) that’s public, and it should not involve invasions of privacy. A legitimate researcher must know where the line is drawn between OSINT and espionage, the latter including stuff like eliciting information, actual (illegal) network penetration and eavesdropping; in other words, gaining information that hasn’t proactively been made public. If this is being done as part of a penetration test, the researcher should be aware that this constraint doesn’t apply to most attackers, and that anything obtainable is fair game to them.
There are situations where gathering intelligence is the equivalent of poking a hornets’ nest: Op CARTEL, Op Darknet and basically anything that involves messing with organised crime. This kind of work requires experience and competence in a range of other areas, an understanding of how the players operate, a lot of preparation and, especially, the backing of some authority. These are things to consider before doing this on a freelance basis.
Where to Start?
First we need a starting point: a fairly unique identifier, such as a username, email address or even just an IP address, around which a profile can be built. It’s much easier if the subject owns a domain (I’ll come to that).
If we’re researching an organisation, its web site is always the best place to gather the initial information: email addresses, phone numbers, subdomains, job descriptions, information on recent procurements, the supply chain, etc. Don’t forget to examine the URLs and the HTML source.
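Going through the HTML by hand works, but a short script helps on larger sites. Here’s a minimal sketch using only Python’s standard library (`html.parser`); the `PageHarvester` name and the email regex are assumptions, and the regex will miss obfuscated addresses. It collects links (resolved against the page URL), `mailto:` and plain-text email addresses, and HTML comments, then filters the hostnames down to subdomains of the target:

```python
import re
from html.parser import HTMLParser
from typing import List, Set
from urllib.parse import urljoin, urlparse

# Simple pattern; deliberately loose, and blind to "name [at] domain" tricks.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class PageHarvester(HTMLParser):
    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: Set[str] = set()
        self.comments: List[str] = []
        self.emails: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                if value.startswith("mailto:"):
                    # Drop any ?subject=... suffix from mailto links.
                    self.emails.add(value[len("mailto:"):].split("?")[0])
                else:
                    self.links.add(urljoin(self.base_url, value))

    def handle_data(self, data):
        self.emails.update(EMAIL_RE.findall(data))

    def handle_comment(self, data):
        # Developer comments sometimes leak internal hostnames, names or notes.
        self.comments.append(data.strip())
        self.emails.update(EMAIL_RE.findall(data))

    def subdomains(self, domain: str) -> Set[str]:
        """Hostnames seen in links that are the domain itself or below it."""
        hosts = {urlparse(u).hostname for u in self.links}
        return {h for h in hosts if h and (h == domain or h.endswith("." + domain))}
```

The page source could come from wget or `urllib.request.urlopen`; feed it to `PageHarvester("https://www.domain.com/")`, call `close()`, then look at `emails`, `comments` and `subdomains("domain.com")`.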
Search Engine Techniques
Conventional search engines like Bing, Google and IXQuick are the next best source, and should turn up more data than we’ve gathered so far. A little more digging is needed to pull information from the ‘Deep Web’, the large portion of the web (often put at around 90%) that isn’t immediately accessible because of the way indexing, ranking and caching work. This is where ‘Google hacking’ and the ‘Advanced Search’ feature come in. A few examples are:
"search term" site:domain.com
"search term 1" AND "search term 2" AND "search term 3"
site:domain.com filetype:pdf
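When running the same queries against a lot of names or domains, it can help to build them programmatically. A minimal sketch, assuming Google-style operators (`site:`, `filetype:`, quoted phrases, `OR`); the helper names are hypothetical, and Google treats space-separated terms as an implicit AND anyway. It only builds the query and a search URL to open in a browser; it doesn’t fetch any results:

```python
from typing import Iterable, Optional
from urllib.parse import urlencode


def quote_term(term: str) -> str:
    # Exact-phrase match; strip stray quotes so the phrase stays balanced.
    return '"' + term.replace('"', "") + '"'


def build_query(terms: Iterable[str] = (), site: Optional[str] = None,
                filetype: Optional[str] = None, any_of: Iterable[str] = ()) -> str:
    """Combine required phrases, an optional OR group and site/filetype filters."""
    parts = [quote_term(t) for t in terms]
    alternatives = [quote_term(t) for t in any_of]
    if alternatives:
        parts.append("(" + " OR ".join(alternatives) + ")")
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


def search_url(query: str, base: str = "https://www.google.com/search") -> str:
    """URL-encode the query into a search link (spaces become '+')."""
    return base + "?" + urlencode({"q": query})
```

For example, `build_query(["annual report"], site="domain.com", filetype="pdf")` gives `"annual report" site:domain.com filetype:pdf`.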
Then there are other ‘deep web’ and ‘reputation management’ search engines like SiloBreaker, WhosTalkin, Pipl, etc. I’ve found them only marginally more useful than ‘Google hacking’.
If the subject owns a site or blog, it’s worth using its URL as a search term. The results should include other sites where that URL was posted, which could generate several more leads. The network of associates may reveal traits and attributes that were omitted from the subject’s own profile.
Command Line and Domain Tools
Often the best information comes from the old-fashioned command line tools, especially during the reconnaissance stage of a penetration test. Standard tools such as ping, traceroute, nslookup, whois, dig and even nmap are useful for footprinting, especially when combined with knowledge of how traffic is routed across the Internet. Many of these services are also provided online by Robtex, CentralOps.net and ServerSniff.net. If the subject owns a domain or web site, a good place to start is the common domain tools: whois, nslookup and dig. It’s not unknown for a whois entry to reveal the full name and address of an individual site owner. IP address and domain searches may return different results, so it’s important to check both.
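whois itself is a very simple protocol (RFC 3912): open TCP port 43, send the query followed by CRLF, and read until the server closes the connection. The sketch below is a set of hypothetical helpers, not a replacement for the real tools. It queries whois.iana.org, picks out the `refer:` line (or a registry’s `Registrar WHOIS Server:` line) pointing to the next server, and compares forward and reverse DNS for a host, which is one way of seeing why IP and domain searches can differ:

```python
import socket
from typing import List, Optional, Tuple


def whois_query(query: str, server: str = "whois.iana.org", port: int = 43,
                timeout: float = 10.0) -> str:
    """Plain WHOIS: send the query plus CRLF, read until the server closes."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(query.encode("utf-8") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def referral_server(response: str) -> Optional[str]:
    """Return the first referral to another WHOIS server in a response, if any."""
    for line in response.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("refer", "whois", "registrar whois server") \
                and value.strip():
            return value.strip()
    return None


def forward_and_reverse(name: str) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Resolve a hostname to its IPv4 addresses, then reverse-resolve each one.

    Mismatches (shared hosting, CDNs, stale PTR records) are why both the
    domain and the IP address are worth searching."""
    _, _, addresses = socket.gethostbyname_ex(name)
    reverse = []
    for ip in addresses:
        try:
            reverse.append((ip, socket.gethostbyaddr(ip)[0]))
        except OSError:
            reverse.append((ip, ""))  # no PTR record for this address
    return addresses, reverse
```

Typical use would be `referral_server(whois_query("domain.com"))` to find the registry’s server, then `whois_query("domain.com", server=...)` against it. Registries format their output differently and some rate-limit queries, so treat the parsing as best-effort.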
Two other command line tools are wget and strings: wget can pull all the content off a web site, and strings dumps the readable text from files such as images, documents and executables, which often includes embedded metadata. This by itself can reveal a host of other unvetted data the subject never intended to make public.
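Below is a rough Python equivalent of `strings`, for use on files mirrored with something like `wget --mirror`. The `metadata_hints` keyword list is my own assumption. It will only catch metadata stored as plain ASCII (for example uncompressed PDF info fields or EXIF text in JPEGs), not anything compressed or UTF-16 encoded, such as the XML inside .docx files:

```python
import re
import sys
from typing import Iterable, Iterator, List


def printable_strings(data: bytes, min_len: int = 4) -> Iterator[str]:
    """Yield runs of printable ASCII (plus tab) at least `min_len` bytes long,
    roughly what `strings` prints by default."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")


# Hypothetical keyword list: common metadata field names and a Windows path fragment.
HINTS = ("Author", "Creator", "Producer", "Software", "Company", "\\Users\\")


def metadata_hints(data: bytes, keywords: Iterable[str] = HINTS) -> List[str]:
    """Keep only the strings that look like metadata (case-sensitive keyword match)."""
    keywords = tuple(keywords)
    return [s for s in printable_strings(data) if any(k in s for k in keywords)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python strings_meta.py mirrored/site/report.pdf
    with open(sys.argv[1], "rb") as fh:
        for line in metadata_hints(fh.read()):
            print(line)
```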
Physical Locations and Geodata
Coupled with ping, traceroute and whois, InfoSniper can pin down the approximate physical locations of servers. Whether that’s of any use depends on what’s being researched and why. If the subject (person or organisation) runs its own Internet services, perhaps over a Wide Area Network, the physical locations of those sites may well be discoverable.
Of course, this could be taken a stage further with Google Maps and Street View, if a penetration test involves visiting any location in person.
A decade ago there were numerous obscure forums and chatrooms, on which people normally communicated under pseudonyms, so any third party wanting to build a profile of someone had to know where to look. Now people register on just a handful of social networks under their full names, which makes it possible to aggregate data from ready-made profiles containing personally identifiable information.
LinkedIn is something of a double-edged sword. It’s valuable for professional networking and information exchange, but it’s also a perfect tool for finding an entry point into an organisation, or for leveraging the information to compromise it. The profiles of an organisation’s employees/members can reveal whether they have common skills in particular software applications, platforms and operating systems, the points of contact for the IT department, key personnel, etc. Status updates can reveal whether the subject posted from an iPhone or Android device, and whether those devices are issued by the employer. Are there patterns in the timing and geolocation data that reveal a routine of some sort? Do the updates reveal when the subject is most likely to be online? Which employees are open to manipulation?
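The timing questions can be answered with a little counting. This sketch assumes the post timestamps have already been collected by hand as ISO 8601 strings (it doesn’t use any LinkedIn API); naive timestamps are assumed to be UTC, and the function names are hypothetical. Converting to the subject’s likely local time zone would give a more meaningful routine than UTC:

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable, List, Tuple


def to_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp; naive values are ASSUMED to already be UTC."""
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def active_hours(timestamps: Iterable[str], top: int = 3) -> List[Tuple[int, int]]:
    """The `top` most common UTC hours of day as (hour, number_of_posts) pairs."""
    return Counter(to_utc(ts).hour for ts in timestamps).most_common(top)


def weekday_share(timestamps: Iterable[str]) -> float:
    """Fraction of posts made Monday to Friday (UTC); 0.0 if there are none."""
    days = [to_utc(ts).weekday() for ts in timestamps]
    return sum(d < 5 for d in days) / len(days) if days else 0.0
```

A strong peak in one hour, mostly on weekdays, would hint at a work routine (posting from the office, say), which is exactly the kind of pattern worth confirming against other sources.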
Basic Principles of Sock Puppeting
An entire book (or at least another blog post) could be written about the creation, development and deployment of artificial online personas, or ‘sock puppets’, as they have recently become known. They are very commonly used, with varying levels of skill, for purposes other than OSINT: PR, ‘perception management’, infiltration, industrial espionage, etc.
Because the use of sock puppets is about deception rather than anonymity, it’s a grey area. Again, remember the differences between OSINT, social engineering and espionage, and know where to draw the line.
I purposely avoided the term ‘fake identity’ because sock puppeting involves deploying actual personas that are carefully developed and span multiple online accounts just like any legit identity, so they’re quite real in themselves. For this to work, each persona must have a background, must be established well in advance of any investigation, and should blend in perfectly with everyone else in the ‘online environment’. Anyone who does get curious will hopefully be content with finding whatever profiles were planted as cover. Treadstone 71 recommends building comprehensive profiles with detailed histories, but my advice would be to keep things simple and consistent so there’s less room for error.
Where server logs might be available to the subject/adversary, the researcher must take further measures to mask identifying data, such as routing traffic through a proxy combined with Privoxy. With everything set up properly, it should be very difficult for anyone to associate that persona with the researcher.
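On the client side, pointing scripts at the proxy is straightforward. Here’s a minimal sketch with Python’s `urllib.request`, assuming Privoxy is listening on its default 127.0.0.1:8118 and is itself chained to whatever upstream proxy is in use (that configuration isn’t shown). The `proxied_opener` name and the User-Agent string are illustrative values of my own:

```python
import urllib.request

# Privoxy's default listen address; an assumption about the local setup.
PRIVOXY = "http://127.0.0.1:8118"
# Illustrative browser-like User-Agent, so requests don't announce "Python-urllib".
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def proxied_opener(proxy: str = PRIVOXY,
                   user_agent: str = USER_AGENT) -> urllib.request.OpenerDirector:
    """Return an opener that sends both HTTP and HTTPS requests via `proxy`.

    Passing our own ProxyHandler replaces the default one, so environment
    proxy variables are not consulted for this opener."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    opener.addheaders = [("User-Agent", user_agent)]
    return opener
```

With Privoxy running, `proxied_opener().open("http://www.domain.com/", timeout=30)` fetches the page via the proxy chain. Only requests made with this opener are covered; other tools and the browser need their own proxy settings.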