It is common knowledge that most Internet file sharing is illegal, but Sauhard Sahi ’10 has proof. Analyzing a random sample of 1,021 files available on a variant of the file-sharing application BitTorrent, Sahi found that 85 to 99 percent of files were shared in violation of copyright law.
Sahi, a computer science concentrator, explained that his independent work demonstrates that “copyright infringement is widespread among BitTorrent users.”
BitTorrent is a peer-to-peer file-sharing system that lets users quickly download large files, commonly called torrents. While users download a file, they also upload, or distribute, the parts they already have to other users.
Sahi studied the trackerless variant of BitTorrent, which, he explained, does not have a “central server to manage connections between peers.” Instead, the tracking work is spread across users: each user’s computer keeps track of who is sharing a small subset of the files.
Files on BitTorrent have unique ID numbers, called hashes. Sahi built his sample by downloading the files associated with the hashes his own computer was tracking.
Sahi found that 46 percent of the files were non-pornographic movies and television shows. Almost all of these files were illegally distributed, Sahi said.
In addition, 14 percent of the files were games and software, 14 percent were pornography, 10 percent were music, 1 percent were books and guides and 1 percent were images. Sahi could not classify 14 percent of the files.
Of the 1,021 files, only 10 were “definitely legal,” Sahi said.
Sahi said his research confirmed the file-sharing application’s role in promoting illegal activity.
“BitTorrent is ... passively encouraging open piracy,” Sahi explained. “This is the first place [users] ... go when they want to download a new movie without paying for it.”
Sahi explained that he was inspired to conduct the study by the large number of lawsuits filed by the Recording Industry Association of America (RIAA) and the Motion Picture Association of America.
Ed Felten, the computer science professor who supervised Sahi’s independent work, said the project is “novel” because it measures “something that others had not been able to measure before.”
“This is the first study I have seen that tries to get a general picture of what is available on BitTorrent,” Felten added.
Several technology websites wrote about Sahi’s study after Felten posted the results on his blog. Alongside the discussion of his research, Sahi noted, rumors spread that he was “being paid by the RIAA to make BitTorrent bad.”
Sahi explained that his next step would be to look at a larger sample of files and focus on the more popular, widely downloaded files rather than on all available files.
“We know there are some very popular files that are not copyright-infringing on BitTorrent,” Felten said. “It could be those are a much bigger part of the picture when weighted by actual download traffic.”