OR/MS Today - February 2009

Mining Terrorists

Can Data Mining Turn Up Terrorists?

Probably not, but operations research can still play a role in helping to uncover terrorist plots.

By John Hollywood, Kevin Strom and Mark Pope

Several months after the 9/11 attacks, the New York Times ran an article about a mysterious new Department of Defense program, soon to be known as "Total Information Awareness" (TIA) [1]. The ostensible purpose of the program was to capture vast amounts of electronic data and conduct data mining on it to find potential terrorist activity. The program became extremely controversial (its vision of analyzing large amounts of data on individuals' activities and transactions raised major privacy concerns) and was soon cancelled by Congress [2]. However, the dream of using data mining techniques to detect "patterns of data" and flag would-be terrorists has lived on, for obvious reasons given the horrific consequences of the 9/11 attacks.

The National Research Council (NRC) has recently released a major report examining the use of data mining for counterterrorism purposes [3], making this a natural time to examine the questions: can you actually find terrorists with data mining? If not, why not, and what can be done instead?

The Data Mining Approach

The original vision of those favoring a "data mining approach" was to regularly run queries against multiple databases maintaining activity and transactional information on millions of individuals. The databases might have included financial databases (bank records, credit cards), phone records and travel records. The queries would search for records across the databases meeting criteria for being "of interest" to counterterrorism investigators. The specific queries used would be built using conventional data mining approaches, including both supervised and unsupervised learning techniques.
With supervised learning, one would train classification algorithms (using a training set which contains historical records from both actual terrorists and non-terrorists) to find rules linking certain values for fields in the databases with those possibly having terrorist intentions. An example might be a rule flagging individuals meeting the following criteria as possibly creating a truck bomb:
With unsupervised learning, one would use anomaly-detection methods to find people whose behavior, while not matching specific rules for terrorism, is "out of the ordinary" enough to be investigated.

The fundamental flaw in conducting data mining against large transactional databases to identify terrorists is the false positive problem. Any realistic set of selection rules derived from data mining would almost certainly have a significant false positive error rate. As an example, consider a set of selection rules resulting from data mining that are 99 percent accurate in excluding false positives (the rules probably would be much less accurate, in practice); queries across databases with records on 200 million individuals would still falsely select 2 million individuals. Large numbers of false positives are acceptable when all they mean is receiving extra catalogs or getting messages from credit card companies asking to confirm recent purchases during one's vacation. Large numbers of false positives are quite a different story when labeling people as potential terrorists [4].

In addition to heightening massive privacy concerns, the large number of false positives would likely drown out the ability to detect actual terrorist activity. Terrorism in the United States is fortunately a rare event. As an example, the FBI's Terrorism 2002-2005 report listed only 24 acts of domestic terrorism between 2002 and 2005, and the majority of these were destructions of property by eco-terrorist groups [5]. The comparatively few real plots would be lost in the noise of millions of false positives and would prove impossible to investigate.

The NRC Report identifies several other practical problems with the data mining approach:
The NRC Report does not claim, however, that data-mining algorithms have no usefulness in counterterrorism. The report explicitly mentions two exemptions.

The first applies to cases in which "good training data is available"; in other words, where there are very specific patterns of threatening behavior clearly linked to terrorist activity, based on prior attack history (or, alternately, expert judgment on what sorts of behavior would be clearly threatening). As an example, the NRC would support investigating flight school students who engage in the same type of behavior as the 9/11 hijackers. However, simply searching for very specific patterns of activity has an important drawback: it will miss activity by would-be terrorists who do not precisely match the patterns.

The second exemption applies to systems that generate extended social networks around a suspected terrorist: given that person A is a suspected terrorist, these systems search databases to find person A's recent activities, transactions and associates (such as who person A roomed with at an apartment or hotel or with whom person A owns property). Law enforcement analysts can use these results to guide subsequent investigations, with investigative results fed back into the extended networks. These systems can "grow" a great deal of useful information given an initial suspect "seed." The drawback is that they require an initial suspect, which immediately gets back to the question of how to find terrorist suspects in the first place.
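The idea of "growing" a network from an initial suspect "seed" can be sketched as a breadth-first expansion over association records. This is only an illustration of the general technique, not a description of any system the NRC report discusses; the `expand_network` function, the `max_hops` limit and the association data are all hypothetical.

```python
from collections import deque


def expand_network(associations, seed, max_hops=2):
    """Breadth-first expansion from a seed; returns {person: hops from seed}."""
    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        person = queue.popleft()
        # Do not expand beyond the hop limit, to keep the network focused.
        if hops[person] == max_hops:
            continue
        for associate in associations.get(person, ()):
            if associate not in hops:
                hops[associate] = hops[person] + 1
                queue.append(associate)
    return hops


# Hypothetical association records (e.g., shared lodging or co-owned property).
associations = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}

network = expand_network(associations, "A", max_hops=2)
print(network)
```

Note that nothing happens without the `seed` argument, which mirrors the drawback described above: the expansion is only as good as the initial suspect it starts from.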
As reported in the media, a good number of domestic terrorist plots have been foiled. These plots vary widely in terms of the actual maturity and threat posed; in some cases, whether these were actual terrorist plots is still in question, with investigations or trials still pending. Nonetheless, there have been enough cases to draw some general conclusions. Table 1 summarizes 25 recent disrupted terrorist plots reported by the media (the table distinguishes convictions from accusations), describing both the reported objectives of the plot and the initial clue leading to its foiling.
Of these 25 reported foiled plots, only five (20 percent) of the initial clues came from intelligence operations (from the FBI, DoD or CIA). Eight initial clues (32 percent) came from unexpected discoveries made during police investigations. Six (24 percent) came from tips reporting a potential plot to law enforcement. Finally, six initial clues came from following up on suspicious activity: two (8 percent) from direct police action in response to observing suspicious activity and four (16 percent) from following up on tips reporting suspicious activity. In summary, the large majority of the initial clues came from observing, reporting and properly acting on behavior of concern, including both directly threatening behavior (such as openly discussing planning terror attacks or finding bomb parts during routine police investigations) and suspicious activity (such as conducting target site surveillance).

To reconsider the earlier data mining example, simply conducting mass searches linking a person from a country of interest to a vehicle rental and a fertilizer purchase with no further information is both difficult and likely to lead to numerous false positives. In contrast, suppose a local police department receives a report about a person attempting to purchase two tons of fertilizer while making it clear that he or she knows nothing about farming or landscaping. Further, suppose that, in the investigation of the suspicious activity report, the person attempting to make the purchase was on a watch list. These findings genuinely justify a follow-up investigation.
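Why mass screening fares so poorly against a specific, corroborated report can be made concrete with the numbers from the earlier false positive example (records on 200 million individuals, rules 99 percent accurate in excluding false positives). The sketch below is illustrative only: the number of actual plotters and the assumption that the rules catch every one of them are hypothetical values, chosen to be generous to data mining, and are not figures from the article or the NRC report.

```python
def screening_outcome(population, false_positive_rate, true_plotters, detection_rate):
    """Return (false positives, true positives, share of flagged people who are plotters)."""
    innocents = population - true_plotters
    false_positives = innocents * false_positive_rate
    true_positives = true_plotters * detection_rate
    flagged = false_positives + true_positives
    precision = true_positives / flagged if flagged else 0.0
    return false_positives, true_positives, precision


# From the article's example.
POPULATION = 200_000_000
FALSE_POSITIVE_RATE = 0.01  # rules are 99 percent accurate in excluding false positives

# Hypothetical assumptions, generous to the data mining approach.
TRUE_PLOTTERS = 100
DETECTION_RATE = 1.0  # assume the rules catch every actual plotter

false_positives, true_positives, precision = screening_outcome(
    POPULATION, FALSE_POSITIVE_RATE, TRUE_PLOTTERS, DETECTION_RATE
)
print(f"False positives: {false_positives:,.0f}")
print(f"True positives:  {true_positives:,.0f}")
print(f"Share of flagged individuals who are actual plotters: {precision:.6%}")
```

Even under these generous assumptions, roughly 2 million people are flagged in order to find 100, so fewer than one flagged record in 10,000 points to a real plot.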
What Are The Challenges?

Significant security benefits are likely to result from focusing on ways to improve the observation, reporting and handling of behavior of concern. Two steps need to happen for behavior of concern to become an initial clue. The first step is having someone observe suspicious or clearly threatening activity (i.e., attack plans) and report it; the "someone" can be anyone ranging from a police officer to a security guard to a member of the general public. The second step is having a law enforcement agency recognize the report as significant enough to warrant an investigation. There are multiple types of challenges involved in having these steps take place. These include challenges related to:
With respect to "people," training of both law enforcement personnel and the general public is critical if suspicious activity reports are to be made in the first place. The importance of training is evidenced by the "Millennium Plot" to blow up Los Angeles International Airport, in which an alert border agent picked up on the suspect's suspicious behavior, which helped lead to his detention [7]. Conversely, prior to the 2002 Paradise Hotel bombing in Kenya, a farmer saw the SUV that would carry out the attack and noted the occupants behaving suspiciously, but did not know of any way to report it [8].

Even when reports are made, process and organizational shortfalls can cause the significance of these reports to be overlooked or diminished. For example, it has been reported that the CIA and FBI failed to share information that two men with terrorist connections had entered the United States. These two individuals, Khalid al-Mihdhar and Nawaf al-Hazmi, went on to help carry out the 9/11 attacks [9]. Similarly, FBI field offices made several pre-9/11 reports of suspicious activity by students at American flight schools, but these reports did not trigger further investigations [10]. To strengthen the reporting of suspicious activity, a partnership of federal agencies and major city police departments has developed recommendations for nationwide guidelines for preparing and sharing suspicious activity reports across local, state and federal lines [11].

Finally, while technology cannot stop a terrorist attack by itself, it can play a key role in managing data efficiently and in filtering and analyzing incoming reports. Since the volume of data that must be filtered often exceeds human capabilities (for example, millions of 911 calls per year in a major urban area), automated tools are needed to identify, link and prioritize cases of interest.
Pressing technology problems that law enforcement agencies face include: understanding what current data filtering and searching tools can do, how these tools can best be tailored to fit into their operational analysis processes, and how they can be improved.
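As a rough illustration of what a simple report-filtering tool might look like, the sketch below scores free-text reports against a list of indicator terms and ranks those that clear a threshold. It is a toy example and not any agency's or the authors' method; the terms, weights, threshold and sample reports are all hypothetical.

```python
import re

# Hypothetical expert-chosen indicator terms and weights.
INDICATOR_WEIGHTS = {
    "photographing": 2,
    "videotaping": 2,
    "surveillance": 3,
    "security": 1,
    "restricted": 2,
}


def score_report(text, weights=INDICATOR_WEIGHTS):
    """Sum the weights of indicator terms that appear in the report text."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(weights.get(word, 0) for word in words)


def prioritize(reports, threshold=3):
    """Return (score, report_id) pairs at or above the threshold, highest first."""
    scored = [(score_report(text), report_id) for report_id, text in reports]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


# Hypothetical incoming reports.
reports = [
    ("call-1", "Loud music complaint from a neighbor"),
    ("call-2", "Man photographing security cameras near a restricted entrance"),
    ("call-3", "Caller reports someone videotaping the bridge supports"),
]

ranked = prioritize(reports)
for score, report_id in ranked:
    print(report_id, score)
```

The `threshold` parameter is where the reporting tradeoff shows up: lowering it surfaces more reports such as "call-3" at the cost of more false positives for analysts to review.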
As an example of how operations research can help in the technology area, consider that there are many types of "suspicious activity reports" that law enforcement personnel encounter that do not have formal labels tying them to terrorism. These include 911 calls, non-emergency police calls and private security "suspicious activity" reports broadly linked to crime (trespassing, theft). These reports include instances of behavior potentially related to terrorism; however, they were not initially recognized and reported as such. Technological tools (and underlying algorithms) are needed to find and assess the relevant records. The authors have had some success in analyzing structured data and free text in 911 call records from Washington, D.C., reporting suspicious activity, using expert query and text mining methods to find potential instances of target surveillance and probing, and assessing the resulting risks to city landmarks [12].

In addition, operations research could be of particular value in assessing the tradeoffs involved in responding to these challenges. For example, consider: How assertive should we train people to be in reporting behavior of concern? Clearly, being too conservative can lead to missing plots, but being too aggressive in reporting will lead to volumes of false positives, as well as missing plots due to the noise of the false positives. Similar tradeoffs apply in setting criteria to determine which reports are worth further investigation and which are not.

In these and many other areas, operations researchers have potentially key roles to play in finding terrorists. We just won't be mining the American public's personal information to do it.
OR/MS Today copyright © 2009 by the Institute for Operations Research and the Management Sciences. All rights reserved.