Developer Pulls ‘Scraped’ Image Database After Tinder Complains

The developer said he planned to use a database of 40,000 Tinder profile photos to train artificially intelligent image recognition technology

A public data set of 40,000 images drawn from the Tinder online dating service has been withdrawn after the service said the image collection was a breach of its terms of use.

Developer Stuart Colianni had released the images under a public Creative Commons licence on Kaggle, a platform for predictive modelling and analytics competitions, where data miners compete to produce the best models for exploiting researchers’ data sets.

Image scraping

Colianni also released the script he used to automatically “scrape” the images from Tinder on the GitHub source code hosting website, saying he had found other facial data sets “disappointing”.

“The datasets tend to be extremely strict in their structure, and are usually too small,” he wrote on GitHub. “Tinder gives you access to thousands of people within miles of you. Why not leverage Tinder to build a better, larger facial dataset?”

Colianni said he used the script to scrape profile photos from 40,000 San Francisco Bay Area users’ profiles, 20,000 of each gender, and planned to use a Google-developed technology called Inception to create a neural network capable of distinguishing between male and female images.

Tinder offers “near unlimited access to create a facial data set,” Colianni wrote.

The data set was reportedly downloaded several hundred times before it was removed.

Privacy issues

Tinder’s terms of use grant it broad and transferable rights to exploit content uploaded to it by users, but the company said Colianni violated section 11 of its terms of service, which prohibits the use of scraping tools.

Profile images can be viewed by any user of the free application, but can’t be harvested in a way that removes the images from their context, Tinder said.

“We… continue to implement measures against the automated use of our API, which includes steps to deter and prevent scraping,” the company stated.

Colianni said he had removed the data set from Kaggle at Tinder’s request, but continued to make the scraper tool available.

“The Tinder API documentation has been available to the public for years, and there are numerous open source projects on GitHub such as Pynder showing how to make Tinder bots and interact with the Tinder API,” he wrote.

The broad access Tinder provides to developers via its API has generated controversy in the past, with some arguing that information posted to the site is made too easily available to the public.

Last year developers released a service called Swipebuster that allowed interested parties to search for users on the dating app by first name, age, gender and location for a fee.

More broadly, the use of large data sets to train machine learning tools has increasingly come under fire for threatening individuals’ privacy, an issue that arose after the Royal Free NHS Trust agreed to provide patient data to Google’s DeepMind Health.
