by Farhad Manjoo
On Aug. 28, 2001, a 33-year-old Egyptian flight-school student named Mohamed Atta walked into a Kinko's copy shop in Hollywood, Fla., and sat down at a computer with Internet access. He logged on to American Airlines' Web site, punched in a frequent-flyer account number he'd signed up for three days before, and ordered two first-class, one-way e-tickets for a Sept. 11 flight from Boston to Los Angeles. Atta paid for the tickets -- one of which was for Abdulaziz Alomari, a Saudi flight student also living in Florida -- with a Visa card he had recently been issued.
The next day, Hamza Alghamdi, a Saudi man who was also training to become a pilot, went to the same Kinko's. There, he used a Visa debit card to purchase a one-way seat on United Airlines Flight 175, another Sept. 11 flight from Boston to Los Angeles. The day after that, Ahmed Alghamdi, Hamza's brother, used the same debit card to purchase a business-class seat on Flight 175; he might have done it from the Hollywood Kinko's, too. And at around the same time, all across the country, 15 other Arab men, several of them flight students, were also buying seats on California-bound flights leaving on the morning of Sept. 11. Six of the men gave the airlines Atta's home phone number as a principal point of contact. Some of them paid for the seats with the same credit card. A few used identical frequent-flyer numbers.
It's now obvious that there was a method to what the men did that August; had someone been on their trail, their actions would have seemed too synchronized, and the web of connections between them too intricate, to have been dismissed as mere coincidence. Something was up. And if the authorities had enjoyed access, at the time, to the men's lives -- to their credit card logs, their bank records, details of their e-mail and cell phone usage, their travel itineraries, and to every other electronic footprint that people leave in modern society -- the government might have seen in the disparate efforts of 19 men the makings of the plot they were to execute on Sept. 11, 2001. Right?
We could have predicted it.
That's the underlying assumption of Total Information Awareness, a new Defense Department program that aims to collect and analyze mountains of personal data -- on Americans as well as foreigners -- in the hope of spotting the sort of "suspicious" behavior that preceded the attacks on New York and Washington. The effort, sponsored by the Defense Advanced Research Projects Agency, or DARPA, is at this point only a vaguely defined research project; officials at the agency have so far declined to fully brief the public on the program and its potential cost, and the few documents made available have stressed that technologists will need several years to achieve many of TIA's goals.
Civil libertarians, not unexpectedly, are already raising a ruckus, their tempers brought to a flaring point by the man tapped to run the program: John Poindexter, Ronald Reagan's national security advisor, who was convicted of lying to Congress during the Iran-Contra scandal, though the convictions were later overturned on appeal. The invasion of privacy implied by the very name "Total Information Awareness" is also sure to raise constitutional questions. But computer scientists who specialize in the kinds of technologies necessary to make something like TIA work are intrigued -- even as they express concern. For some, the threat posed by terrorism is so great that a comprehensive response is as necessary now as the Manhattan Project was in its day -- a comparison meant to convey both how dangerous and how vital to our society constant data collection may be.
"Frankly, I don't see any other way for us to survive as a civilization," says Jeffrey Ullman, a computer scientist at Stanford University and an expert on database theory. "We're heading for a world where any creep with a grudge can build himself a dirty bomb. Al-Qaida has just broken new ground, but you can't see these things as a unique phenomenon. We have to have in place a system that makes it very hard for individuals anywhere to do such things."
But can a system like TIA ever work? There are obvious, huge technical problems, including the sheer amount of data that will have to be analyzed; the difficulties in integrating disparate databases; and the challenge of predicting unprecedented terrorist threats. The whole idea might seem, to a non-expert, like just another unwieldy, expensive and dangerous bit of American military excess.
Specialists in "data mining" technologies, including people who are critical of the Bush administration, are, however, guardedly hopeful. They worry about many aspects of such a program: the "false positives," the harm to privacy, the possibility that personal information will be misused, the almost inevitable codification of racial and religious profiling. They stress that there should be strict laws governing the collection of data. But most of them think it could work and should at least be researched. It's a conclusion that on at least one level is not too surprising: Funding for TIA means more funding for computer scientists.
Public outcry has so far been muted. People already feel constantly monitored, and one may wonder why the FBI shouldn't know you prefer Paul Newman's brand of marinara when your supermarket is well aware that you do. Privacy experts provide an obvious response: Your supermarket can't put you in jail. They also say that it's still early and that once the scope of TIA becomes widely known, there will be widespread agitation over its invasiveness and the consequences of its misuse.
They could be right. But what if TIA does work? What if it can spot the kind of trail the 9/11 hijackers left in their wake -- the test flights, the car rentals, the gym memberships, the flight schools, the public Internet terminals, the driver's licenses with fake addresses, the one-way tickets -- actions that are completely innocent when one person takes them, but that could raise flags when several people who know one another take them at around the same time? Would the public back a system that helps find terrorists, despite its concerns over civil liberties?
Shortly after Sept. 11, 2001, Stanford's Ullman posted on his Web site a long essay he'd written reacting to the attacks. The piece was mostly political; Ullman criticized religious fundamentalism, Palestinians who think terrorism will buy them freedom, and the misplaced zeal of our drug war (which he says can stand in the way of the war on terrorism). Only one part had anything to do with his research:
"Modern technology has given criminals and terrorists many new and deadly options," he wrote. "Just about the only defensive weapon to come out of the developments of the past 50 years is information technology: our ability to learn electronically what evils are being planned. If we use it wisely, we can keep our personal freedom, yet use information effectively against its enemies."
The specific information technology that Ullman believes will be our salvation is called data mining. If you tend to use such modern conveniences as credit cards, supermarkets and online bookstores, chances are you've been helped -- or, depending on how you see it, hurt -- by data mining. Broadly speaking, the phrase means the process of looking at a heap of information and finding something you think you might want. It implies a "fuzziness" about your search, a hunt for patterns buried in the data that are not obvious. Credit card companies use a form of data mining to determine whether your purchases look "unusual" and may, therefore, be fraudulent. Amazon.com uses it to recommend books by looking at other books you've purchased. When you hand over your discount-club card at a grocery store checkout, you're actually letting the store keep data on your personal shopping habits; some chains are finding ways to mine that data.
Total Information Awareness uses a data-mining system that DARPA calls Evidence Extraction and Link Discovery (EELD). According to the TIA site, the system will have "detection capabilities to extract relevant data and relationships about people, organizations and activities from message traffic and open source data. It will link items relating potential terrorist groups or scenarios, and learn patterns of different groups or scenarios to identify new organizations or emerging threats."
"Collecting everything -- that's what would give it its power," explains Raghu Ramakrishnan, a computer scientist at the University of Wisconsin at Madison. To determine whether an individual might be a threat, the system would look at all of his activities and all his relationships, and "you would ask if there is statistically significant evidence that these activities are 'suspicious,'" Ramakrishnan says. "If three things occur together, you might be able to make the statement that they are 'highly correlated' -- that in, say, 99.9 percent of the cases where I found these two activities occurring together, I would also find this other thing happening."
Take as an example the purchase of one-way airline tickets. For years, airlines have known that this is one signal of dangerous activity but does not in and of itself indicate a sure threat. (Before 9/11, only international passengers using one-way tickets were deemed a high security risk; domestic passengers going one way, even on a ticket purchased at the counter with cash, weren't seen as much of a problem at all, which is one reason why some of the hijackers weren't more closely examined.) Buying a one-way ticket could be one flag in TIA -- an indication of a marginally higher risk. But when TIA notices that someone has purchased a one-way ticket, it might also look to see if he has associated with anyone else who has done the same. Have they all recently done other things -- enrolled in flight schools, purchased weapons, etc. -- that would make them even more suspicious?
TIA would be set up to do its work automatically and in close to real time: The suspect buys the one-way ticket, his past activities and affiliations are examined, and then, if his risk factor meets a certain threshold, an intelligence or law enforcement analyst is notified. According to the Web site, TIA "provides focused warnings within an hour after a triggering event occurs or an evidence threshold is passed."
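What that loop might look like, stripped to its bones, is sketched below; the flag weights, the threshold and the alerting stub are all invented for illustration, and say nothing about how DARPA would actually calibrate such a system.

    # Hypothetical flag weights -- purely illustrative, not DARPA's.
    FLAG_WEIGHTS = {
        "one_way_ticket": 1,
        "flight_school_enrollment": 3,
        "weapons_purchase": 4,
    }
    ALERT_THRESHOLD = 6  # hypothetical evidence threshold

    def risk_score(own_events, associates_events):
        """Sum flag weights for a person's own events, plus a smaller
        contribution from flagged events among known associates."""
        score = sum(FLAG_WEIGHTS.get(e, 0) for e in own_events)
        for events in associates_events:
            score += 0.5 * sum(FLAG_WEIGHTS.get(e, 0) for e in events)
        return score

    def on_event(person, own_events, associates_events):
        """Called when a triggering event -- say, a ticket purchase -- is logged."""
        score = risk_score(own_events, associates_events)
        if score >= ALERT_THRESHOLD:
            print(f"ALERT: notify analyst about {person} (score={score})")

    # A lone one-way ticket doesn't trip the threshold; the same purchase by
    # someone whose associates are also enrolling in flight schools does.
    on_event("traveler A", {"one_way_ticket"}, [])
    on_event("traveler B", {"one_way_ticket", "flight_school_enrollment"},
             [{"one_way_ticket", "flight_school_enrollment"},
              {"flight_school_enrollment", "weapons_purchase"}])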
If TIA works this cleanly, many say that the chief problem it raises -- its knowledge about you, personally -- is not much of a problem at all: After all, it has information about you only so it can determine what a good guy looks like. You, as an innocent, are in the database mainly as an example of someone who's not a terrorist: the guy who buys a one-way ticket every once in a while because of some emergency business. John Poindexter would call you "noise." In an interview with the Washington Post, he described TIA as a giant filter to separate noise from what he calls "signal."
To hear Poindexter describe it, the system sounds almost elegant; taken to its technological extreme, it also has a supernatural aspect. TIA would know everything; TIA would predict evil; TIA could save the world. Indeed, some of TIA's research projects sound as though they've been copied from the Psychic Friends Network.
You can see why more than a few pundits have compared TIA to the notion of "pre-crime" imagined in the Philip K. Dick short story (and Tom Cruise movie) "Minority Report." The comparison is not meant to be a compliment.
There are several technological and mathematical reasons why TIA can't become truly oracular. Its main limitation is that it could never really know everything. Indeed, how much it could conceivably know -- and how fast it could know it -- is at this point unclear; a database on a huge scale that's meant to be as dynamic as TIA has never been set up before, experts say, and nobody knows if it's even possible. But even if DARPA does manage to create the database, TIA will face another limitation: It can only know what you do, not what you think. And, though it would have some idea -- maybe even a good idea -- of what a terrorist plan "looks" like, TIA would be limited to terrorist attacks it has seen in the past. And it's not clear that all new terrorism will look like old terrorism. Before Sept. 11, the possibility that a data-mining system might have predicted that four planes would be simultaneously hijacked and slammed into buildings would have been close to nil -- and the likelihood that terrorists will come up with new, unprecedented threats seems close to 100 percent.
DARPA's involvement in a research area tends to accelerate advances in that field, and the group's stated goal for TIA is to do things that have never been done before. In its solicitation for research ideas for TIA, the agency asks for ideas "that enable revolutionary advances in science, technology or systems." It's possible that DARPA could hit on some new, easy way of integrating information, which scientists say would be a good side benefit of the project.
"I'm retiring," says Stanford's Jeffrey Ullman, "so I'm not trying to use you as a way to get more money for my research project. But I think the government has made a huge mistake in not funding computer scientists, and this is an area -- the information-integration part of it -- which has good commercial use as well."
One of the main challenges John Poindexter will face in building his noise filter will be its calibration: Should TIA look at more specific, narrow traits of terrorism in an effort to reduce the false positives, while risking the chance that some novel disaster will slip through? Or should it do the opposite -- look for the more general characteristics of terrorists and risk pursuing thousands (or millions) of innocent people?
"That's a good question," says Gregory Piatetsky-Shapiro, a data-mining expert who runs KDnuggets, an online newsletter devoted to the subject. The answer, he says, "is that in general you do still want to protect against past attacks -- so you would look for the kinds of things that happen there and try to stop those. But also, there are general things that you would look for in other attacks" -- things that are statistically unlikely in the general population.
Ullman says, "You ask it about all of the unusual coincidences of people who are known to be involved with al-Qaida. The system should be able to notice that four guys have enrolled in different flight schools, and you have to distinguish that from noticing that four guys in al-Qaida have bought jeans at Macy's."
But what about regular people -- people who aren't suspected of being in al-Qaida? "That's where it becomes a hard algorithm problem and a good research problem," Ullman says. "This is something that requires the brightest minds in computer science."
But could even the brightest minds prevent TIA from fingering innocent people? Not long after he heard about the system, Bobby Gladd, a statistician and self-described "political pain in the ass" who lives in Las Vegas, set out to determine how many false positives a system like TIA would produce. It turns out that you don't need an advanced degree in statistics to show what Gladd showed: even if TIA is very good, it will still be frequently wrong.
Gladd figures that if TIA has a scheme that can correctly identify as innocent 99.9 percent of the innocent people it sees -- an exceptionally high percentage that is probably not achievable -- then it will still end up with about 240,000 falsely accused Americans. (That is, 0.1 percent of the 240 million adult Americans.) If you reduce the percentage to 80 -- more reasonable but probably still too high -- the number of false positives becomes 48 million!
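Gladd's arithmetic is easy to reproduce. Here it is as a few lines of Python, using the same figure of 240 million adult Americans; the numbers fall straight out of the false-positive rate.

    ADULT_POPULATION = 240_000_000  # the 240 million figure Gladd uses

    def false_positives(specificity, population=ADULT_POPULATION):
        """Innocent people wrongly flagged, assuming everyone in the
        population is in fact innocent of terrorism."""
        return (1 - specificity) * population

    print(f"{false_positives(0.999):,.0f}")  # 240,000 falsely accused
    print(f"{false_positives(0.80):,.0f}")   # 48,000,000 falsely accused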
"I am offended by the constitutional implications of it," Gladd says, "but at the same time I'm calling attention to it on the basis of what I do. This is a waste of time, and it's going to take away resources."
Like many other critics of the system, Gladd points out that intelligence analysts missed 9/11 not because they had too little information -- it turns out that, in retrospect, there were many "unconnected dots" pointing to an attack -- but because they didn't have the capability to analyze it. Gladd says that government money would be more wisely spent on information analysis. "Every dollar spent on TIA is going to be a dollar not spent on fighting terrorism," he says.
But Piatetsky-Shapiro says that we have to remember that law enforcement already falsely follows a lot of innocent people. Anyone who's seen Law & Order knows this. The recent hunt for the Washington sniper proved this too, as thousands of calls poured into hotlines, almost all of them pointing to people uninvolved with the crime. "I think we'll never be able to eliminate false positives," Piatetsky-Shapiro says, "but maybe this tool can improve the ratio."
It's true that there are significant dangers to Total Information Awareness, and the computer scientists all said they were worried about them. The whole thing may be unconstitutional; even if it's not, it would still be what many people consider an invasion of privacy, and there would need to be new rules governing its use. Can such rules be set up -- and will they be? And do we trust the people setting them up? Despite these questions, the computer scientists also said they think of TIA as a long-term research project, and as such, they say, both the policy that will govern it and the technology that will run inside it need to be publicly debated.
While Ullman, of Stanford, is intrigued by the tool, the people who might use it give him the willies. He's for a system like TIA, but he deeply mistrusts the people in power.
"For it to work," he says, "you'd have to get Republicans agreeing to not use it to track drug dealers and other civil crimes that are not acts of war. Whether a Republican administration would ever contemplate this, I don't know. Because Ashcroft wants to catch drug dealers."
But Ullman also adds that "once you get the right laws passed, the thing that makes it a crime to misuse the data there, then you're okay. Why doesn't the military stage a coup? Because there's a tradition built up over 200 years that doesn't let this thing happen. You have to get that here, that tradition."
In the end, the debate over TIA, if it comes, may hang on this point: Are the rules good enough? For some people, no set of safeguards will ever be enough. Lee Tien of the Electronic Frontier Foundation, for example, says: "I can't possibly say yes based on what I know now. I'd have to be convinced there would be a commitment to privacy from the get-go, and we just don't see that now. This administration is known for its secrecy. They are as bad as Nixon, maybe worse. We certainly cannot trust them with this system."
He adds that, "one of my biggest fears is that they are working on this stuff and they have some breakthroughs, and then something happens -- an attack -- and all of a sudden TIA's riding the white horse to the rescue. And then it's, 'Gee we haven't worked out the privacy,' and, 'We haven't had new legal protections, but the exigencies are such that we need it now.' "
That's probably a valid fear. But so is the fear of terrorism, says Ramakrishnan. "You know, not to make it sound grandiose, but I think there is a battle here, and we're facing the kinds of things the people who invented the atom bomb were thinking. I would rather that we understood this and took the time to enforce reasonable safeguards. To the extent that we do this in the open and have in place an array of legal, legislative guidelines, I'd be much happier with that. It's probably not whether we should -- I don't think we have a choice."