What is Data Mining?
Data Mining Defined: Data mining software sifts through very large volumes of data in search of patterns among the data. Kurt Thearling, an expert on data mining, calls it "the automated extraction of hidden predictive information from databases."
Data mining, which draws on techniques from artificial intelligence, has primarily been used to analyze business and scientific data. It can also be used in political campaigns to uncover trends and patterns among voters.
After the attacks of September 11, 2001, the U.S. government grew interested in applying data mining techniques to counterterrorism.
What is "data" and how is it used?: Data are numerical or other facts that are collected for the purpose of analysis. Phone numbers and sale totals are data; so are names and places. When you go to a store and the cashier asks for your phone number or zip code, the store is collecting data that will help it understand buying patterns, such as how many other people in your zip code bought the same item or spent the same amount of money you did. Data mining finds these patterns and permits businesses to make predictions about how buyers in that same zip code will behave in the future.
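The zip code example above can be sketched in a few lines of Python. This is only an illustration: the records, the field layout, and the function names `items_by_zip` and `most_popular_item` are all made up for this sketch and do not come from any real store system.

```python
from collections import Counter, defaultdict

# Hypothetical checkout records: (zip_code, item, amount_spent).
purchases = [
    ("10001", "coffee", 8.50),
    ("10001", "coffee", 9.00),
    ("10001", "tea", 4.25),
    ("94110", "coffee", 7.75),
    ("94110", "tea", 5.00),
    ("94110", "tea", 4.50),
]


def items_by_zip(records):
    """Count how many times each item was bought in each zip code."""
    counts = defaultdict(Counter)
    for zip_code, item, _amount in records:
        counts[zip_code][item] += 1
    return counts


def most_popular_item(records, zip_code):
    """Return the most frequently bought item in a zip code, or None."""
    counts = items_by_zip(records)
    if zip_code not in counts:
        return None
    return counts[zip_code].most_common(1)[0][0]


if __name__ == "__main__":
    for zip_code in ("10001", "94110"):
        print(zip_code, most_popular_item(purchases, zip_code))
```

A store with millions of such records could use the same kind of count to guess what the next shopper from a given zip code is likely to buy.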
How Data Mining Works: Data mining uses a variety of mathematical algorithms to analyze historical data. The results of this analysis are then used to build models based on real-world behavior, which are in turn used to analyze incoming data and make predictions about future behavior.
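As a rough sketch of that loop (historical data, then a model, then predictions on incoming data), the Python below builds a very simple model from past shopping baskets. The technique chosen here, counting how often buyers of one item also bought another, is just one of many possible algorithms, and the baskets, the function names `build_model` and `predict`, and the 0.6 threshold are all assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations

# Hypothetical historical data: each set is one shopper's basket.
history = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"milk", "cereal", "bread"},
]


def build_model(baskets):
    """Estimate, for each ordered pair (a, b), the share of baskets
    containing a that also contained b."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        item_counts.update(basket)
        pair_counts.update(permutations(sorted(basket), 2))
    return {(a, b): n / item_counts[a] for (a, b), n in pair_counts.items()}


def predict(model, basket, threshold=0.6):
    """Suggest items an incoming shopper is likely to buy next,
    based on what is already in the basket."""
    suggestions = {}
    for (a, b), share in model.items():
        if a in basket and b not in basket and share >= threshold:
            suggestions[b] = max(share, suggestions.get(b, 0.0))
    return suggestions


if __name__ == "__main__":
    model = build_model(history)
    print(predict(model, {"butter"}))
```

The model is only as good as the history behind it: every share it reports is a ratio of counts taken from past baskets, so thin history gives unreliable predictions.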
The government would like to apply these algorithms to intelligence questions. Take the real-world example of the 9/11 attacks. There is historical data about these attacks: a number of foreign men applied to flight school in the United States as part of their attack planning. Based on this historical data, a model can be built that screens data on all future applicants to flight schools to see whether any are foreign or share other characteristics with the attackers.
Data Mining Techniques and Counterterrorism: The 9/11 attacks increased government interest in technological approaches to preventing terrorism and brought that interest into public view. In February 2002, the U.S. Office of Science and Technology Policy convened government representatives and industry leaders to discuss how they could use data mining as a counterterrorism tool.
Interest in data mining actually predates September 11, 2001, however. In the late 1990s, the Department of Defense authorized a data mining program called Able Danger, which was used to gather counterterrorism information, including information about Al Qaeda, from late 1998 through early 2001.
Is Data Mining a Productive Counterterrorism Tool?
At present, probably not. Here are a few of the challenges preventing data mining from being a truly useful tool:
There is Not Enough Data to Indicate Patterns of Terrorist Behavior: Data mining is a technique best reserved for large data sets. There is not enough data about terrorism or terrorists to build a good predictive model.
While a great many people shop for groceries in a day, very few plan or execute terrorist attacks. This means there is very little data on which to base a model, which in turn means that the data being used to build a model does not represent a pattern of behavior, but a one- or two-time event.
Put this model to work, and it is more likely to amount to profiling than to find instances of a pattern that will reveal terrorists at work. For example, if we were to build a model predicting a terrorist attack based on 9/11, we might create a model that would correlate Arab men who had attended flight school with purchases of airline tickets. But there is no evidence whatsoever that this behavior is a pattern, rather than a one-time event.
And someone like Timothy McVeigh, the American-born man who destroyed the Alfred P. Murrah Federal Building in 1995, would be missed completely if he were preparing another attack with a truck bomb made from purchased fertilizer.
Using data mining to track activities that may indicate a terrorist cell probably has a higher chance of success than using it to predict an attack.
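The small-data problem described in this section can be made concrete with a little arithmetic. The sketch below uses the textbook standard error of an estimated proportion, sqrt(p(1-p)/n), as a rough measure of how much an estimate can be trusted; the counts are invented for illustration and are not real figures about shoppers or attacks.

```python
import math


def standard_error(occurrences, observations):
    """Standard error of a proportion estimated from a sample:
    sqrt(p * (1 - p) / n), where p = occurrences / observations."""
    p = occurrences / observations
    return math.sqrt(p * (1 - p) / observations)


if __name__ == "__main__":
    # Hypothetical: 3,000 of 10,000 shoppers bought a given item.
    large = standard_error(3000, 10000)
    # Hypothetical: a behavior was seen in 1 of only 2 known events.
    small = standard_error(1, 2)
    print(f"large sample: +/- {large:.4f}")
    print(f"tiny sample:  +/- {small:.4f}")
```

With thousands of observations the uncertainty is a fraction of a percentage point; with two observations it is so large that the estimate says almost nothing about whether the behavior is a pattern.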
Anomalies are not Anomalous Unless There is a Pattern: If the data provide no typical profile of terrorist behavior, then perhaps terrorists can be identified because they don't act the way most people do. However, this approach is neither useful nor in keeping with traditional American individualist ideals. Americans have never valued homogeneity, and there is something offensive about setting standards for behavior on the basis of how closely each American acts like every other one. As a 2006 Cato Institute study suggests, it would also not be very productive: "Terrorists could defeat it by acting as normally as possible." (Effective Counterterrorism and the Limited Role of Predictive Data Mining, Jeff Jonas and Jim Harper, 2006).
The Government Must Use a Great Deal of Private Information: The biggest complaint about data mining, whether by businesses or governments, is that it violates our privacy by collecting and examining a great deal of information about us. Government intrusion seems especially egregious to us because it is, after all, the government that is supposed to protect citizens' right to privacy.