Developer Pulls ‘Scraped’ Image Database After Tinder Complains

A public data set of 40,000 images drawn from the Tinder online dating service has been withdrawn after the service said the image collection was a breach of its terms of use.

Developer Stuart Colianni had released the images under a public Creative Commons licence on Kaggle, a platform for predictive modelling and analytics competitions, where data miners compete to produce the best models for exploiting researchers’ data sets.

Image scraping

Colianni also published the script he used to automatically “scrape” the images from Tinder, posting it on the GitHub source code hosting website and saying he had found other facial data sets “disappointing”.

“The datasets tend to be extremely strict in their structure, and are usually too small,” he wrote on GitHub. “Tinder gives you access to thousands of people within miles of you. Why not leverage Tinder to build a better, larger facial dataset?”

Colianni said he used the script to scrape profile photos from 40,000 San Francisco Bay Area users’ profiles, 20,000 of each gender, and planned to use a Google-developed technology called Inception to create a neural network capable of distinguishing between images of men and women.

Tinder offers “near unlimited access to create a facial data set,” Colianni wrote.

The data set was reportedly downloaded several hundred times before it was removed.

Privacy issues

Tinder’s terms of use grant it broad and transferable rights to exploit content uploaded to it by users, but the company said Colianni violated section 11 of its terms of service, which prohibits the use of scraping tools.

Profile images can be viewed by any user of the free application, but can’t be harvested in a way that removes the images from their context, Tinder said.

“We… continue to implement measures against the automated use of our API, which includes steps to deter and prevent scraping,” the company stated.

Colianni said he had removed the data set from Kaggle at Tinder’s request, but continued to make the scraper tool available.

“The Tinder API documentation has been available to the public for years, and there are numerous open source projects on GitHub such as Pynder showing how to make Tinder bots and interact with the Tinder API,” he wrote.

The broad access Tinder provides to developers via its API has generated controversy in the past, with some arguing information posted to the site is made too easily available to the public.

Last year developers released a service called Swipebuster that, for a fee, allowed interested parties to search for users on the dating app by first name, age, gender and location.

More broadly, the use of large data sets to train machine learning tools has increasingly come under fire for threatening individuals’ privacy, an issue that arose after the Royal Free NHS Trust agreed to provide patient data to Google’s DeepMind Health.

Matthew Broersma

Matt Broersma is a long-standing tech freelancer who has worked for Ziff-Davis, ZDNet and other leading publications.
