Speaker: Siobhan Mcnamara



Machine Learning for Identity Verification


I have been working on something we term the ‘dead zone’: the set of messages we must score despite having little prior information about them. For machine learning models to be effective, they must be trained on past knowledge of events. What if you have no past knowledge? In fraud, we rarely have it. Fraud is not static; criminals keep changing their behavior to circumvent new obstacles. Their sophistication is always growing, and we can easily get caught in a game of cat and mouse. In addition, the vast majority of emails are legitimate; only a few are potentially malicious.

To combat this, we design a hierarchy of machine learning models, heuristics and rules to score the various nuances of an email. By capturing and appropriately measuring the risk inherent in communication behavior and networks, we can assess how much trust and authenticity should be ascribed to a message. However, as with any model, there is an error rate. In contrast to many applications, the tolerance for error in fraud detection is close to zero. If we cast a wide net and quarantine every email that raises doubt, we will be wrong much of the time. In that scenario, confidence in our solution's trust scores erodes until they are essentially meaningless. On the other hand, if we miss malicious messages, we are equally ineffective.
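The base-rate problem behind this trade-off can be made concrete with a little arithmetic. The sketch below uses purely hypothetical numbers (one million emails, 0.1% malicious, and two made-up quarantine thresholds); they are illustrative assumptions, not measurements from any real system. The function name `quarantine_outcomes` is likewise hypothetical. It counts what each threshold quarantines correctly, what it quarantines wrongly, and what it misses.

```python
def quarantine_outcomes(n_emails, malicious_rate, tpr, fpr):
    """Return (true_pos, false_pos, missed, precision) for one quarantine threshold.

    All rates passed in are illustrative assumptions, not measured values.
    tpr: fraction of malicious emails the threshold quarantines.
    fpr: fraction of legitimate emails the threshold quarantines by mistake.
    """
    malicious = n_emails * malicious_rate
    legitimate = n_emails - malicious
    true_pos = malicious * tpr        # malicious emails correctly quarantined
    false_pos = legitimate * fpr      # legitimate emails wrongly quarantined
    missed = malicious - true_pos     # malicious emails that get through
    flagged = true_pos + false_pos
    precision = true_pos / flagged if flagged else 0.0
    return true_pos, false_pos, missed, precision


# Hypothetical "wide net": catches 99% of attacks but flags 5% of good mail.
wide = quarantine_outcomes(1_000_000, 0.001, tpr=0.99, fpr=0.05)
# Hypothetical conservative threshold: flags 0.01% of good mail, catches 60% of attacks.
narrow = quarantine_outcomes(1_000_000, 0.001, tpr=0.60, fpr=0.0001)

for name, (tp, fp, missed, prec) in [("wide net", wide), ("conservative", narrow)]:
    print(f"{name}: caught={tp:.0f} false_alarms={fp:.0f} missed={missed:.0f} precision={prec:.1%}")
```

With these assumed rates, the wide net catches roughly 990 of the 1,000 attacks, but about 98% of what it quarantines is legitimate mail; the conservative threshold keeps precision near 86% at the cost of letting about 400 attacks through. Because malicious mail is so rare, even a small false-positive rate swamps the true detections.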

In the dead zone our models have nothing in the past to learn from, and our rules and heuristics fall short. These are messages from domains we have never or rarely seen before. They have established no relationships, and there is no norm for their architecture. We give this segment a modest score. Inherent in a modest score are some misses: some malicious emails get past us with a low, but not ‘untrustworthy’, score. The goal in this context is to generate features that might enable us to score some aspects of these messages, or to develop a better prior for their risk probability.
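One common way to give rarely seen senders a principled "modest score" is Bayesian smoothing toward a prior. The sketch below is an assumption, not the method described in this talk: it treats each sending domain's malicious rate as Beta-distributed, so a domain with no history scores exactly at the prior mean, and its score moves toward its observed rate only as evidence accumulates. The names `DomainHistory`, `smoothed_risk`, `prior_mean` and `prior_strength` are hypothetical, as are the example values. In this framing, the dead-zone features mentioned above would serve to choose a better `prior_mean` for a domain before any history exists.

```python
from dataclasses import dataclass


@dataclass
class DomainHistory:
    # Hypothetical record of what has been observed from one sending domain.
    messages: int   # messages previously seen from the domain
    malicious: int  # of those, how many were confirmed malicious


def smoothed_risk(history: DomainHistory, prior_mean: float, prior_strength: float) -> float:
    """Posterior mean of a Beta prior updated with the domain's observed history.

    prior_mean: assumed risk for a domain we know nothing about (the dead zone).
    prior_strength: pseudo-count, i.e. how many observations the prior is worth.
    """
    if not 0.0 < prior_mean < 1.0 or prior_strength <= 0:
        raise ValueError("prior_mean must be in (0, 1) and prior_strength > 0")
    alpha = prior_mean * prior_strength          # pseudo-count of malicious messages
    beta = (1.0 - prior_mean) * prior_strength   # pseudo-count of legitimate messages
    return (alpha + history.malicious) / (alpha + beta + history.messages)


# Hypothetical domains: never seen, sparsely seen, and well established.
for label, hist in [
    ("unseen", DomainHistory(messages=0, malicious=0)),
    ("sparse", DomainHistory(messages=10, malicious=5)),
    ("established", DomainHistory(messages=1000, malicious=0)),
]:
    print(label, round(smoothed_risk(hist, prior_mean=0.05, prior_strength=20), 4))
```

The design choice here is that `prior_strength` controls how quickly a new domain "earns" its own score: a larger value keeps unseen and sparsely seen domains pinned near the prior for longer, which limits how far a handful of messages can swing the score in either direction.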

Research in the dead zone has led to the discovery of new trends in malicious emails that we had not previously been able to measure. In particular, we have identified a number of campaign techniques. Unlike spam, these are targeted, sophisticated micro-campaigns whose messages appear to be real.

I would like to present the various types of malicious campaigns we have identified, how we have come to measure them in a non-static threat environment, and how that has changed the nature of our dead zone.


I am a data scientist applying machine learning to email security and fraud detection at Agari. My background is in economics and behavioral research, with a particular focus on risk perception and the behavioral nuances involved in decision-making under risk and uncertainty.