Until April, Microsoft boasted of having the largest collection of faces that anyone could use to train facial-recognition algorithms. Since then, the once publicly available dataset has quietly disappeared.
As the Financial Times reports, Microsoft quietly deleted the dataset after the paper called attention to privacy and ethical issues, including use of the dataset by military researchers.
Microsoft did not immediately respond to a request for comment from Fortune. But it told the Financial Times: “The site was intended for academic purposes. It was run by an employee that is no longer with Microsoft and has since been removed.”
The now-deleted dataset contained more than 10 million faces culled from websites like Flickr, which host photographs uploaded under a Creative Commons license—meaning many can be used free of copyright concerns.
The name of the Microsoft dataset, MS Celeb, was chosen because many of the images it contains are of famous people who live public lives. Many of the other faces in the set, however, belong to people who are not celebrities (including journalists and privacy researchers) and who were not aware their images had been included.
Microsoft is hardly the only company to assemble large datasets by scraping photos from the open Internet. In January, IBM announced it was sharing a collection of 1 million faces in the name of promoting more diversity in artificial intelligence. Meanwhile, a website called Megapixels identifies several other massive collections as part of a bid to halt what it describes as a “growing crisis of authoritarian biometric surveillance.”
While many of the facial-recognition datasets are culled from public websites like Flickr, that is not the only way companies obtain pictures of faces. As a recent Fortune investigation revealed, startups have been using photo collection apps to surreptitiously collect millions of faces, while other companies have been scanning public collections of mug shots.