OR/MS Today - February 2009



Mining Terrorists


Can Data Mining Turn Up Terrorists?

Probably not, but operations research can still play a role in helping to uncover terrorist plots.

By John Hollywood, Kevin Strom and Mark Pope


Several months after the 9/11 attacks, the New York Times ran an article about a mysterious new Department of Defense program, soon to be known as "Total Information Awareness" (TIA) [1]. The ostensible purpose of the program was to capture vast amounts of electronic data and conduct data mining on it to find potential terrorist activity. The program became extremely controversial — its vision of analyzing large amounts of data on individuals' activities and transactions raised major privacy concerns — and was soon cancelled by Congress [2]. However, the dream of using data mining techniques to detect "patterns of data" and flag would-be terrorists has lived on for obvious reasons, given the horrific consequences of the 9/11 attacks. The National Research Council (NRC) has recently released a major report examining the use of data mining for counterterrorism purposes [3], making this a natural time to examine the questions — can you actually find terrorists with data mining? If not, why not, and what can be done instead?

The Data Mining Approach


The original vision of those favoring a "data mining approach" was to regularly run queries against multiple databases maintaining activity and transactional information on millions of individuals. The databases might have included financial databases (bank records, credit cards), phone records and travel records. The queries would search for records across the databases meeting criteria for being "of interest" to counterterrorism investigators.

The specific queries used would be built using conventional data mining approaches, including both supervised and unsupervised learning techniques. With supervised learning, one would train classification algorithms (using a training set which contains historical records from both actual terrorists and non-terrorists) to find rules linking certain values for fields in the databases with those possibly having terrorist intentions. An example might be a rule flagging individuals meeting the following criteria as possibly creating a truck bomb:

  • from a country designated as supporting terrorism;

  • purchases fertilizer (component of certain types of explosives); and

  • rents a truck within a few months of purchasing fertilizer.
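As an illustration only, the three criteria above could be written as a simple predicate over a person's records. This is a minimal sketch: the record layout, the field names and the 90-day reading of "a few months" are all assumptions made for the example, not details of any actual program.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


# Hypothetical record layout; the field names are illustrative only.
@dataclass
class PersonRecord:
    country_flagged: bool                # from a country designated as supporting terrorism
    fertilizer_purchase: Optional[date]  # date of a fertilizer purchase, if any
    truck_rental: Optional[date]         # date of a truck rental, if any


# The text says "within a few months"; 90 days is an assumed window.
WINDOW = timedelta(days=90)


def matches_truck_bomb_rule(p: PersonRecord) -> bool:
    """Return True only if all three criteria of the example rule hold."""
    if not p.country_flagged:
        return False
    if p.fertilizer_purchase is None or p.truck_rental is None:
        return False
    # Rental and purchase must fall within the assumed window of each other.
    return abs(p.truck_rental - p.fertilizer_purchase) <= WINDOW


suspect = PersonRecord(True, date(2008, 1, 1), date(2008, 2, 15))
print(matches_truck_bomb_rule(suspect))
```

Note how little information the rule uses: anyone who happens to satisfy all three conditions is flagged, whatever the reason for the purchase or the rental.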

With unsupervised learning, one would use anomaly-detection methods to find people engaging in abnormal behavior that, while not matching specific rules for terrorism, are "out of the ordinary" enough to be investigated.

The Fundamental Flaw


So, can you find terrorists with data mining? No. Or, at least, it's not very likely.

The fundamental flaw in conducting data mining against large transactional databases to identify terrorists is the false positive problem. Any realistic set of selection rules derived from data mining would almost certainly have a significant false positive error rate. As an example, consider a set of selection rules resulting from data mining that is 99 percent accurate in excluding false positives (the rules probably would be much less accurate in practice); queries across databases with records on 200 million individuals would still falsely select 2 million individuals. Large numbers of false positives are acceptable when all they mean is receiving extra catalogs or getting messages from credit card companies asking to confirm recent purchases during one's vacation. Large numbers of false positives are quite a different story when labeling people as potential terrorists [4].

In addition to heightening massive privacy concerns, the large number of false positives would likely drown out the ability to detect actual terrorist activity. Terrorism in the United States is fortunately a rare event. As an example, the FBI's Terrorism 2002-2005 report listed only 24 acts of domestic terrorism between 2002 and 2005, and the majority of these were destructions of property by eco-terrorist groups [5]. The comparatively few real plots would be lost in the noise of millions of false positives and would prove impossible to investigate.
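The arithmetic behind this argument can be sketched in a few lines. The population size and the 1 percent false positive rate come from the example above; the number of actual plotters (1,000) and the perfect detection rate are assumptions chosen to be generous to the data mining approach.

```python
def screening_outcome(population, true_targets, false_positive_rate, detection_rate):
    """Return (false positives, true positives, share of flagged who are real targets)."""
    innocents = population - true_targets
    false_positives = innocents * false_positive_rate
    true_positives = true_targets * detection_rate
    flagged = false_positives + true_positives
    precision = true_positives / flagged if flagged else 0.0
    return false_positives, true_positives, precision


fp, tp, precision = screening_outcome(
    population=200_000_000,    # from the example in the text
    true_targets=1_000,        # assumed for illustration
    false_positive_rate=0.01,  # rules "99 percent accurate in excluding false positives"
    detection_rate=1.0,        # assumed: every real plotter is flagged
)
print(f"false positives: {fp:,.0f}")
print(f"true positives:  {tp:,.0f}")
print(f"share of flagged people who are real targets: {precision:.4%}")
```

Even under these favorable assumptions, only about 1 in 2,000 of the people flagged would be a real target; with fewer actual plotters or an imperfect detection rate, the ratio gets worse.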

The NRC Report identifies several other practical problems with the data mining approach:

  • Given the rarity of terrorist attacks, training sets would likely have only a few instances of actual terrorism to use in generating selection rules, making the resulting selection rules extremely inaccurate. Further, since the rules are generated using historical data, the approach would miss new types of terrorist activity.

  • The use of anomaly-detection approaches is inherently flawed since there are so many ways people can behave anomalously that have nothing to do with terrorism: "People travel to places they haven't been before, make larger withdrawals of funds than they have before, buy things they haven't bought before, and they call and e-mail people whom they have not called or e-mailed before" [6, p. 196].
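The anomaly-detection problem can be illustrated with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that each monitored behavior independently has a small chance of looking "anomalous" for an ordinary person in a given period; neither the independence assumption nor the 1 percent figure comes from the NRC report.

```python
def share_with_any_anomaly(n_behaviors: int, p_anomalous: float) -> float:
    """Probability that an ordinary person shows at least one anomaly,
    assuming each of n_behaviors is independently anomalous with probability p."""
    return 1.0 - (1.0 - p_anomalous) ** n_behaviors


# Assumed 1 percent chance per behavior; the behavior counts are arbitrary.
for k in (1, 10, 50, 100):
    share = share_with_any_anomaly(k, 0.01)
    print(f"{k:>3} monitored behaviors -> {share:.1%} of ordinary people flagged")
```

With 100 monitored behaviors, roughly 63 percent of ordinary people would show at least one anomaly under these assumptions, which is why "out of the ordinary" alone is such a weak signal.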

The NRC Report does not claim, however, that data-mining algorithms have no usefulness in counterterrorism. The report explicitly mentions two exemptions. The first applies to cases in which "good training data is available" — in other words, where there are very specific patterns of threatening behavior clearly linked to terrorist activity, based on prior attack history (or, alternately, expert judgment on what sorts of behavior would be clearly threatening). As an example, the NRC would support investigating flight school students who engage in the same type of behavior as the 9/11 hijackers. However, simply searching for very specific patterns of activity has an important drawback — it will miss activity by would-be terrorists who do not precisely match the patterns.

The second exemption applies to systems that generate extended social networks around a suspected terrorist — given that person A is a suspected terrorist, these systems search databases to find person A's recent activities, transactions and associates (such as who person A roomed with at an apartment or hotel or with whom person A owns property). Law enforcement analysts can use these results to guide subsequent investigations, with investigative results fed back into the extended networks. These systems can "grow" a great deal of useful information given an initial suspect "seed." The drawback is that they require an initial suspect, which immediately gets back to the question of how to find terrorist suspects in the first place.
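The idea of growing a network from a seed can be sketched as a breadth-first search over known associations. This is a simplified illustration, not a description of any fielded system; the `links` data, the person labels and the `max_hops` cutoff are hypothetical.

```python
from collections import deque


def grow_network(links, seed, max_hops):
    """Return {person: hops from seed} for everyone within max_hops of the seed."""
    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue  # do not expand beyond the cutoff
        for associate in links.get(person, ()):
            if associate not in hops:
                hops[associate] = hops[person] + 1
                queue.append(associate)
    return hops


# Hypothetical associations (shared address, co-owned property, and so on).
links = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}
network = grow_network(links, seed="A", max_hops=2)
print(network)
```

The search has nothing to do without a seed, which is the limitation noted above: the method expands an investigation but cannot start one.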

What Can Be Done Instead?


The critical problems of conducting data mining of large databases for finding terrorists do not mean that there is no role for operations research in helping to uncover terrorist plots. Let us consider some alternate questions:

  • How have terrorist plots been discovered in the past, in practice? Here, "discovered" means finding the initial clue that led to a larger investigation that, in turn, foiled the plot.

  • What can be done to improve the ways in which initial clues are actually found?

As reported in the media, a good number of domestic terrorist plots have been foiled. These plots vary widely in terms of the actual maturity and threat posed; in some cases, whether these were actual terrorist plots is still in question, with investigations or trials still pending. Nonetheless, there have been enough cases to draw some general conclusions. Table 1 summarizes 25 recent disrupted terrorist plots reported by the media (the table distinguishes convictions from accusations), describing both the reported objectives of the plot and the initial clue leading to its foiling.

Table 1. Initial Clues Leading to the Foiling of 25 Reported Terrorist Plots
Plot Description | Initial Clue
Yassin Aref and Mohammed Hossain. Convicted of plotting to use an RPG-7 to assassinate a Pakistani diplomat | Intelligence. Aref's name and address found in notebook in northern Iraq, plus other classified intelligence
Russell Defreitas et al. Accused of plotting to blow up fuel pipelines and fuel tanks at John F. Kennedy airport in New York | Intelligence. CIA operations in South America and the Caribbean
Assem Hammoud et al. Accused of plotting to attack New York - New Jersey transit lines | Intelligence. FBI monitoring of Internet chat rooms used by extremists
Iyman Faris. Convicted of plotting to destroy the Brooklyn Bridge using blowtorches, as well as derail a DC area train | Intelligence. Interviews of 9/11 mastermind Khalid Sheikh Mohammed (KSM) and searches of his residences
Dhiren Barot. Convicted of plotting to attack financial targets in New York, Washington, DC, and Newark, NJ, as well as UK targets | Intelligence. Interviews with Khalid Sheikh Mohammed; Barot's memo on elementary bomb-making was found on a laptop in Pakistan
Abdulla Ahmed Ali et al. ("Liquid Explosives Plot"). Accused of plotting to destroy transatlantic airliners using liquid explosives; convicted of plotting a terrorist bombing campaign | Discovery during police investigation. Ali's luggage was searched by UK police and found to contain suspicious material after his return from Pakistan (Ali was already under surveillance by police)
David Wayne Hull. Convicted of plotting to bomb abortion clinics | Discovery during police investigation. Explosives construction and plots found by informant during investigation of Hull
William Joseph Krar. Convicted of plotting to weaponize cyanide gas | Discovery during police investigation. FBI search of residence subsequent to Krar's arrest for delivering false ID badges
Seas of David group. Accused of plotting to blow up the Sears Tower and FBI headquarters | Discovery during police investigation. Group leader asked an undercover FBI agent he thought was affiliated with Al Qaeda for assistance
Syed Haris Ahmed and Ehsanul Islam Sadequee. Accused of videotaping US Capitol and World Bank and sharing tapes with a suspected overseas terrorist, as well as discussing various terror plots against US targets | Discovery during police investigation. Identified by law enforcement when they met with three Canadians already under investigation for suspected terrorist activities
Sean Michael Gillespie. Convicted of plotting attacks on Jewish sites | Discovery during police investigation. Investigation subsequent to being arrested for firebombing an Oklahoma City synagogue
Robert J. Goldstein et al. Convicted of plotting to attack the Islamic Center of Pinellas County, FL | Discovery during police investigation. Local police discovered weapons and mission statement for attack during a call for a domestic dispute
Jamiyyat Ul-Islam Is-Saheeh group. Convicted of plotting to attack Los Angeles Army National Guard facilities, synagogues, and other California targets | Discovery during police investigation. Local police investigation subsequent to members being arrested for armed robberies of gas stations
Ronald Allen Grecula. Convicted of attempting to provide an IED to "Al Qaeda" | Tip reporting a plot. A confidential source informed the DEA about Grecula's intentions
Stephen John Jordi. Convicted of plotting to bomb abortion clinics | Tip reporting a plot. Brother alerted FBI of Jordi's plans
Project 7 Militia. Accused of plotting assassinations of state and local officials to start an anti-government war; convicted of various conspiracy and weapons charges | Tip reporting a plot. County sheriff approached by member of group who offered to be an informant
Paul Douglas Revak. Accused of plotting to bomb the USCG station in Bellingham, WA; convicted of "threatening to use a weapon of mass destruction" | Tip reporting a plot. Fellow student at Western Washington University called authorities after Revak tried to recruit him to assist
Michael C. Reynolds. Convicted of plotting to destroy pipelines and a New Jersey refinery | Tip reporting a plot. Shannen Rossmiller met Reynolds on-line, through her private efforts in monitoring extremist web sites to find potential terrorists
Gale William Nettles. Convicted of plotting to assist in blowing up the Dirksen Federal Building in Chicago | Tip reporting a plot. Tip from prisoner incarcerated with Nettles
Ahmed Ressam ("Millennium Plot"). Convicted of plotting to bomb Los Angeles International Airport | Police action in response to suspicious activity. US Customs Agent noticed suspicious activity by Ressam and had his car searched at Port Angeles, WA
Islamic Jihad Group members. Accused of plotting to destroy US military facilities in Germany | Police action in response to suspicious activity. Suspects discovered conducting surveillance of US military facilities in Hanau, Germany
"Fort Dix Plot" group. Convicted of plotting to attack service members at Ft. Dix, NJ | Tip reporting suspicious activity. Circuit City employee reported video of group members firing weapons and calling for jihad (group members had given the employee the videotape to burn it to DVD)
Demetrius Van Crocker. Convicted of plotting to use explosives and Sarin against US targets | Tip reporting suspicious activity. Informant alerted authorities of Crocker's "anti-government rants"
Mohammad Zaki Amawi, Marwan Othman El-Hindi and Wassim Mazloum. Convicted of plotting to build IEDs to attack US forces in Iraq | Tip reporting suspicious activity. Tips from community about men as well as assistance from an informant
James Elshafay and Shahawar Matin Siraj. Convicted of plotting to bomb a New York City subway station during the Republican National Convention | Tip reporting suspicious activity. Tip to NYPD terrorism hotline about Siraj's "virulent anti-American tirades"

Of these 25 reported foiled plots, only five (20 percent) of the initial clues came from intelligence operations (from the FBI, DoD or CIA). Eight initial clues (32 percent) came from unexpected discoveries made during police investigations. Six (24 percent) came from tips reporting a potential plot to law enforcement. Finally, six initial clues came from following up on suspicious activity — two (8 percent) from direct police action in response to observing suspicious activity and four (16 percent) from following up on tips reporting suspicious activity. In summary, the large majority of the initial clues came from observing, reporting and properly acting on behavior of concern, including both directly threatening behavior (such as openly discussing planning terror attacks or finding bomb parts during routine police investigations) and suspicious activity (such as conducting target site surveillance).
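The breakdown can be checked with a short tally. The counts below are taken from Table 1; the category labels are shortened for the example.

```python
# Number of plots in Table 1 by type of initial clue.
clue_counts = {
    "intelligence operations": 5,
    "discovery during police investigation": 8,
    "tip reporting a plot": 6,
    "police action on suspicious activity": 2,
    "tip reporting suspicious activity": 4,
}

total = sum(clue_counts.values())
shares = {clue: 100 * n / total for clue, n in clue_counts.items()}
non_intel = total - clue_counts["intelligence operations"]

for clue, pct in shares.items():
    print(f"{clue:<40} {clue_counts[clue]:>2}  ({pct:.0f} percent)")
print(f"clues not from intelligence operations: {non_intel} of {total}")
```

Twenty of the 25 initial clues came from sources other than intelligence operations, which is the "large majority" referred to above.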

To reconsider the earlier data mining example, simply conducting mass searches linking a person from a country of interest to a vehicle rental and a fertilizer purchase with no further information is both difficult and likely to lead to numerous false positives. In contrast, suppose a local police department receives a report about a person attempting to purchase two tons of fertilizer while making it clear that he or she knows nothing about farming or landscaping. Further, suppose that, in the investigation of the suspicious activity report, the person attempting to make the purchase was on a watch list. These findings genuinely justify a follow-up investigation.

Note:

The incidents listed in Table 1 were initially identified in the following sources, along with two other cases widely reported in the national media ("Millennium Plot" and Hanau, Germany plot):
  1. Office of the Press Secretary of the President, "Fact Sheet: Plots, Casings and Infiltrations Referenced in President Bush's Remarks on the War on Terror," Oct. 6, 2005.

  2. U.S. Federal Bureau of Investigation.

  3. James Jay Carafano, "U.S. Thwarts 19 Terrorist Attacks Against America Since 9/11," Washington, D.C., The Heritage Foundation, Backgrounder No. 2085, Nov. 13, 2007.

For a complete list of references for the incidents, see the online version of this issue of OR/MS Today at www.lionhrtpub.com.


What Are The Challenges?


Significant security benefits are likely to result from focusing on ways to improve the observation, reporting and handling of behavior of concern. Two steps need to happen for behavior of concern to become an initial clue. The first step is having someone observe suspicious or clearly threatening activity (e.g., attack plans) and report it; the "someone" can be anyone ranging from a police officer to a security guard to a member of the general public. The second step is having a law enforcement agency recognize the report as significant enough to warrant an investigation. There are multiple types of challenges involved in having these steps take place. These include challenges related to:
  • People — how well individuals involved are trained to recognize and report on behavior of concern, including law enforcement, security guards and the general public;

  • Process — how well processes, if they exist, capture and assess reports of behavior of concern;

  • Organization — how well the law enforcement organizations involved are structured to capture and assess the reports; and

  • Technology — how effective information technology methods and tools are at filtering, storing and analyzing the reporting.

With respect to "people," training of both law enforcement personnel and the general public is critical if suspicious activity reports are to be made in the first place. The importance of training is evidenced by the "Millennium Plot" to blow up Los Angeles International Airport, in which an alert border agent picked up on the suspect's suspicious behavior, which helped lead to his arrest [7]. Conversely, prior to the 2002 Paradise Hotel bombing in Kenya, a farmer saw the SUV that would carry out the attack and noted the occupants behaving suspiciously, but did not know of any way to report it [8].

Even when reports are made, process and organizational shortfalls can cause the significance of these reports to be overlooked or diminished. For example, it has been reported that the CIA and FBI failed to share information that two men with terrorist connections had entered the United States. These two individuals, Khalid al-Mihdhar and Nawaf al-Hazmi, went on to help carry out the 9/11 attacks [9]. Similarly, FBI field offices made several pre-9/11 reports of suspicious activity by students at American flight schools, but these reports did not trigger further investigations [10]. To strengthen the reporting of suspicious activity, a partnership of federal agencies and major city police departments has developed recommendations for nationwide guidelines for preparing and sharing suspicious activity reports across local, state, and federal lines [11].

Finally, while technology cannot stop a terrorist attack by itself, it can play a key role in managing data efficiently and in filtering and analyzing incoming reports. Since the volume of data that must be filtered often exceeds human capabilities (for example, millions of 911 calls per year in a major urban area), automated tools are needed to identify, link and prioritize cases of interest. Pressing technology problems that law enforcement agencies face include understanding what current data filtering and searching tools can do, how these tools can best be tailored to fit into their operational analysis processes, and how they can be improved.

How Can Operations Researchers Help?


Operations researchers could make significant contributions in responding to these challenges. Key research questions to address include:

  • People: What sorts of training to recognize and report behavior of concern should be provided to groups ranging from law enforcement analysts to the general public? What material should be covered, and in what formats?

  • Process and Organization: What organizational structures and processes are best suited to sharing and assessing the reports? What policies should be used in assessing the reports, determining which ones are likely false positives and which warrant follow-up investigations?

  • Technology: What are the best approaches (including algorithms) to find and assess reports on behavior of concern — especially if they are hidden in much larger databases and repositories? What architectures should be used to store, share and manage counterterrorism data — both the initial reports and the reports of subsequent investigations?

As an example of how operations research can help in the technology area, consider that there are many types of "suspicious activity reports" that law enforcement personnel encounter that do not have formal labels tying them to terrorism. These include 911 calls, non-emergency police calls and private security "suspicious activity" reports broadly linked to crime (trespassing, theft). These reports include instances of behavior potentially related to terrorism; however, they were not initially recognized and reported as such. Technological tools (and underlying algorithms) are needed to find and assess the relevant records. The authors have had some success in analyzing structured data and free text in 911 call records from Washington, D.C., reporting suspicious activity, using expert query and text mining methods to find potential instances of target surveillance and probing, and assessing the resulting risks to city landmarks [12].
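To give a flavor of what a query over call records might look like, the sketch below filters calls by a structured field and then searches the free-text notes for phrases suggesting surveillance of a site. It is not the authors' actual method; the record fields, the call-type code and the phrase list are all hypothetical.

```python
import re

# Hypothetical phrases an analyst might associate with target surveillance.
SURVEILLANCE_PATTERNS = [
    r"\btaking (photos|pictures)\b",
    r"\bvideotap(ing|ed)\b",
    r"\bfilming\b",
    r"\bsketching\b",
    r"\basking about security\b",
]
PATTERN = re.compile("|".join(SURVEILLANCE_PATTERNS), re.IGNORECASE)


def flag_calls(calls):
    """Return calls coded as suspicious activity whose notes match a phrase."""
    return [
        c for c in calls
        if c["call_type"] == "SUSPICIOUS" and PATTERN.search(c["notes"])
    ]


# Made-up records for illustration only.
calls = [
    {"id": 1, "call_type": "SUSPICIOUS", "notes": "Male videotaping entrance of federal building"},
    {"id": 2, "call_type": "SUSPICIOUS", "notes": "Person sleeping in parked car"},
    {"id": 3, "call_type": "NOISE", "notes": "Neighbors filming a loud party"},
]
flagged_ids = [c["id"] for c in flag_calls(calls)]
print(flagged_ids)
```

In practice, the flagged records would be a starting point for analyst review and for assessing risk to specific locations, not a conclusion in themselves.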

In addition, operations research could be of particular value in assessing the tradeoffs involved in responding to these challenges. For example, consider: How assertive should we train people to be in reporting behavior of concern? Clearly, being too conservative can lead to missing plots, but being too aggressive in reporting will lead to volumes of false positives, as well as missing plots due to the noise of the false positives. Similar tradeoffs apply in setting criteria to determine which reports are worth further investigation and which are not. In these and many other areas, operations researchers have potentially key roles to play in finding terrorists. We just won't be mining the American public's personal information to do it.




John S. Hollywood (jhollywood@rti.org), a research scientist with RTI International, has research interests that include applying predictive analysis, program evaluation and systems engineering approaches to improving counterterrorism and criminal justice efforts.

Kevin J. Strom (kstrom@rti.org), a senior research scientist with RTI International, has research interests that include law enforcement responses to community violence, trends and causes of interpersonal violence and interagency coordination in response to terrorism.

Mark W. Pope (mpope@rti.org), a research analyst with RTI International, has research interests that include using information technology to develop data-driven solutions for crime and terrorism, law enforcement and homeland security preparedness and prisoner reentry.

The authors would like to acknowledge the National Institute of Justice (http://www.ojp.usdoj.gov/nij/) and the Institute for Homeland Security Solutions (https://www.ihssnc.org/). Some of the results in this article are taken from research the authors conducted for these organizations.


References


  1. John Markoff, "Chief Takes Over at Agency to Thwart Attacks on U.S.," New York Times, Feb. 13, 2002.
  2. 149 Cong. Rec. H8755-H8771 (Sept. 24, 2003).
  3. National Research Council Committee on Technical and Privacy Dimensions of Information for Terrorism Prevention and Other National Goals, "Protecting Individual Privacy in the Struggle Against Terrorists: A Framework for Program Assessment," Washington, D.C.: National Academies Press, 2008.
  4. The false positive problem in the counterterrorism context has been discussed for some time. See, for example, the following discussion from 2003, soon after TIA was revealed: Edelstein, Herb. "TIAin't: Data Mining in Depth," DM Review Magazine, April 2003.
  5. U.S. Federal Bureau of Investigation, "Terrorism 2002-2005," 2006.
  6. National Research Council, p. 196.
  7. WGBH Educational Foundation, "Ahmed Ressam's Millennium Plot," Frontline, 2008.
  8. Emily Wax, "Kenyan Farmer Spotted Bombers," Washington Post, Dec. 2, 2002.
  9. David Johnston, "9/11 Congressional Report Faults FBI-CIA Lapses," New York Times, July 24, 2003.
  10. Philip Shenon, "Traces of Terrorism: The Warnings; FBI Knew for Years About Terror Pilot Training," New York Times, May 18, 2002.
  11. Suspicious Activity Report Support and Implementation Project, "Findings and Recommendations of the Suspicious Activity Report Support and Implementation Project," October 2008.
  12. John Hollywood, Kevin Strom and Mark Pope, "Using 911 Calls to Identify Potential Instances of Terrorist Surveillance," The Police Chief, Vol. 75, No.10, pp. 160-165, October 2008.







    OR/MS Today copyright © 2009 by the Institute for Operations Research and the Management Sciences. All rights reserved.

