A public data set of 40,000 images drawn from the Tinder online dating service has been withdrawn after the service said the image collection was a breach of its terms of use.
Developer Stuart Colianni had released the images under a public Creative Commons licence on Kaggle, a platform for predictive modelling and analytics competitions, where data miners compete to produce the best models for exploiting researchers’ data sets.
Colianni also released the script he used to automatically “scrape” the images from Tinder on the GitHub source code hosting website, saying he had found other facial data sets “disappointing”.
“The datasets tend to be extremely strict in their structure, and are usually too small,” he wrote on GitHub. “Tinder gives you access to thousands of people within miles of you. Why not leverage Tinder to build a better, larger facial dataset?”
Colianni said he used the script to scrape profile photos from 40,000 San Francisco Bay Area users’ profiles, 20,000 of each gender, and planned to use a Google-developed technology called Inception to create a neural network capable of distinguishing between male and female images.
Tinder offers “near unlimited access to create a facial data set,” Colianni wrote.
The data set was reportedly downloaded several hundred times before it was removed.
Tinder’s terms of use grant it broad and transferable rights to exploit content uploaded to it by users, but the company said Colianni violated section 11 of its terms of service, which prohibits the use of scraping tools.
Profile images can be viewed by any user of the free application, but can’t be harvested in a way that removes the images from their context, Tinder said.
“We… continue to implement measures against the automated use of our API, which includes steps to deter and prevent scraping,” the company stated.
Colianni said he had removed the data set from Kaggle at Tinder’s request, but continued to make the scraper tool available.
The broad access Tinder provides to developers via its API has generated controversy in the past, with some arguing information posted to the site is made too easily available to the public.
Last year developers released a service called Swipebuster that allowed interested parties to search for users on the dating app by first name, age, gender and location for a fee.
More broadly, the use of large data sets to train machine learning tools has increasingly come under fire for threatening individuals’ privacy, an issue that arose after the Royal Free NHS Trust agreed to provide patient data to Google’s DeepMind Health.