Over the past month I’ve spent some time looking at intelligent fraud and anomaly detection systems, authoring a journal paper comparing a handful of methods, and more recently focussing my attention on Detica’s systems. Plus I’m working with someone to develop a multi-featured case management system for tracking malware, to save us switching between applications.
But anyway, the technologies for intelligence gathering are actually far beyond what Glenn Greenwald published from the rather outdated Snowden archive, and it’s not simply about the warehousing of intercepted data. Stream mining ‘digests’ the data in real-time, deriving from it information that analytics systems can evaluate and contextualise without human intervention. The analyst works on the end product of this. It’s also worth mentioning, on the Snowden/NSA story, that nobody’s quite sure what’s retained or discarded in the stream mining process.
Analytics was being deployed long before the ‘threat intelligence’ snake oil industry materialised, and it has uses in preventing or mitigating real threats. One example is the protection of bank accounts over perhaps the last two decades, which is where my interest in this began. An advanced field-tested detection system would also have prevented victims of identity theft being wrongly associated with Operation Ore between 1999 and 2002. Alert Logic’s own service, again a much-needed system for detecting genuine threats, was built around a core system dating from the late 90s.
I’ve singled out Detica’s NetReveal for reasons that should become obvious, the primary one being that it appears to be the most advanced I’ve come across after much digging around.
After first being deployed in 2005 as a proof-of-concept system for the Insurance Fraud Bureau, NetReveal was adopted by AXA, Zurich, Nationwide and several other major financial institutions, so it might well be the very system I was looking for when writing up the review paper.
Anti-Money Laundering (AML) is actually just one of the ‘use cases’ for NetReveal, and one application where the capabilities are truly tested. The whole point of money laundering is to get money from one place to another without authorities knowing, typically by disguising transactions among legitimate or routine activities. I’ve seen real-world examples of this kind of disguised fraud in the past: one involved ordering surplus on behalf of an employer, selling it, then pocketing the money. Another involved non-existent employees on the books, all presumably paid into the same bank account created by the fraudster. Of course, this probably continued for years after I resigned, because management only saw payrolls and accounts. They didn’t give minimum-wage employees the time of day, and so didn’t have the situational awareness for spotting the discrepancies.
So the problem NetReveal must solve is quite complex. How can it discern fraudulent transactions from legitimate transactions? How can transactions be associated with seemingly unconnected events? How could a system identify a suspicious event and tie it to a sequence of other events? More importantly, how can the system be made to work in real-time?
What I’ve found is there are two broad categories of fraud detection. First there are the rule-based, signature-based and expert systems. These tend to be static, comparing current transactions with signatures of known fraud cases. While these are fast and efficient, they’re less reliable, since they only catch what they already have a rule or signature for. Secondly there are deeper analysis methods such as clustering, pattern recognition and Bayesian systems. These are more adaptive and thorough, but computationally more expensive.
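To make the difference between the two categories concrete, here is a minimal sketch of my own, not anything taken from NetReveal. The `RULES` table, the country codes, the thresholds and the sample figures are all made up for illustration, and a plain z-score stands in for the far heavier clustering and Bayesian methods.

```python
from statistics import mean, stdev

# Hypothetical 'signatures' of known fraud: (field, predicate) pairs.
RULES = [
    ("amount", lambda v: v >= 10_000),         # large single transfer
    ("country", lambda v: v in {"XX", "YY"}),  # made-up high-risk codes
]

def rule_flags(txn):
    """Static check: return the fields whose predefined rule fired."""
    return [f for f, pred in RULES if f in txn and pred(txn[f])]

def zscore(amount, history):
    """Adaptive check: distance of this amount from the account's own norm."""
    if len(history) < 2:
        return 0.0
    sd = stdev(history)
    return 0.0 if sd == 0 else (amount - mean(history)) / sd

# Invented account history and an unusual, but sub-threshold, transaction.
history = [20.0, 25.0, 30.0, 22.0, 28.0]
txn = {"amount": 900.0, "country": "GB"}

print(rule_flags(txn))                         # no rule fires
print(zscore(txn["amount"], history) > 3.0)    # but it is far outside the norm
```

The rule check is a cheap lookup and misses this transaction entirely, while the statistical check needs the account’s history to hand but picks up behaviour nobody wrote a rule for.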
So what NetReveal does is take the raw data from whatever sources, categorise it into entities, perform some analysis, construct a relational map and determine the weighting of each link. The latter two stages are possible with Maltego Casefile and Palantir anyway, as a very simple but highly effective method of revealing patterns an intelligence analyst would otherwise miss. The following screenshot is an example from my malware tracking project that would, with a much larger database, be useful in attributing malware and incidents to known ‘actors’:
On the analytics side it appears to be a hybrid of several intelligent fraud detection systems, the specifics of which I won’t reveal here as they’re also used by electronic payment systems and have fundamental limitations that aren’t easily resolved. What I can reveal is that Detica was rather ahead of the curve, as the research papers I’ve found that proposed hybrid systems were mostly published after 2009. Alert Logic, an entirely different company that also deals with vast amounts of data, appears to have followed the hybrid model too.
From what I can determine, NetReveal is a modular system that can be remixed for whatever customer, with the following three ‘components’:
* Detection Modules
* Analysis Modules
* Investigation Modules
NetReveal also got reworked specifically for ‘cyber threats’, in the form of the CyberReveal product.
Detection modules appear to provide the basic rule-based system that’s highly efficient at handling real-time data. Their function is mainly to flag anything deemed suspicious or matching predefined rules, and they could also be used to filter out redundant data from the sources to reduce load on the analytics engine(s).
Analysis modules are essentially a highly advanced form of analytics engine, doing the work that’s computationally more expensive and time-consuming. They analyse transactions after they have been completed, possibly adapting the detection modules in response.
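How the two kinds of module might feed each other is guesswork on my part, but the general hybrid idea can be sketched in a few lines. `Detector`, `Analyser`, the single amount threshold and the retuning rule are all hypothetical names and choices of mine, not Detica’s design.

```python
from dataclasses import dataclass, field

@dataclass
class Detector:
    """Cheap real-time stage: one amount threshold (hypothetical rule)."""
    threshold: float = 1000.0

    def flag(self, amount: float) -> bool:
        return amount >= self.threshold

@dataclass
class Analyser:
    """Slower offline stage: reviews transactions after completion."""
    reviewed: list = field(default_factory=list)  # (amount, was_fraud) pairs

    def review(self, amount: float, was_fraud: bool) -> None:
        self.reviewed.append((amount, was_fraud))

    def retune(self, detector: Detector) -> None:
        """Lower the threshold to the smallest confirmed-fraud amount seen."""
        fraud = [a for a, bad in self.reviewed if bad]
        if fraud:
            detector.threshold = min(detector.threshold, min(fraud))

detector, analyser = Detector(), Analyser()
missed = detector.flag(600.0)        # slips under the initial rule
analyser.review(600.0, True)         # later confirmed as fraud
analyser.review(50.0, False)         # later confirmed as legitimate
analyser.retune(detector)
caught = detector.flag(650.0)        # a similar transaction is now flagged
```

The point is only the division of labour: the fast stage stays simple enough to run on every transaction, and the slow stage adjusts it after the fact.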
Investigation modules appear to provide a glorified search engine and visualisation front-end, which is pretty much what you get with Casefile. Whereas information is entered manually for Casefile, Detica’s eye candy presents a higher volume of information from an advanced back-end.
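The relational map and link weighting mentioned earlier are what sit behind that kind of visualisation, and they can be roughed out quite simply. This is a sketch under my own assumptions: the records are invented, entities are just prefixed strings, and a link’s weight is nothing more than the number of times two entities turn up in the same record.

```python
from collections import Counter
from itertools import combinations

# Invented records: each lists the entities observed together in one event.
records = [
    {"sample:a1", "domain:evil.example", "ip:192.0.2.10"},
    {"sample:b2", "domain:evil.example"},
    {"sample:a1", "ip:192.0.2.10"},
]

def build_links(records):
    """Count co-occurrences of entity pairs; the count is the link weight."""
    weights = Counter()
    for entities in records:
        # Sort so each pair is always stored in the same order.
        for a, b in combinations(sorted(entities), 2):
            weights[(a, b)] += 1
    return weights

links = build_links(records)
strongest = links.most_common(1)[0]
print(strongest)
```

With a much larger set of records, the heaviest links are the ones an analyst would want drawn most prominently, which is roughly what Casefile leaves you to do by hand.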