Symantec’s Machine Learning Prevents Data Loss

Symantec Data Loss Prevention 11 will feature “vector machine learning” to help define sensitive data policies

Symantec is adding new machine learning technology to its data loss prevention (DLP) product to ease efforts to classify sensitive data and define policies.

The feature, called Vector Machine Learning (VML), will be included in Symantec Data Loss Prevention 11 when it becomes available in the first half of 2011. According to the company, the technology aims to go beyond the traditional fingerprinting and data-description approaches used to find sensitive information.

Automating Policy Creation

“Vector Machine Learning is used to develop policies that define sensitive data, or data that the DLP system should look for, or detect,” explained Robert Hamilton, senior product marketing manager for Symantec. “Vector Machine Learning is trained using positive and negative sample documents to create a profile that is then used within a DLP policy.”

“For example, a software developer needs to protect its proprietary code from leaving the organisation via email or USB drives,” he said. “While it needs to protect proprietary code, it doesn’t want the DLP system to flag open source code, which can move around freely. So it uses samples of proprietary source code as the positive examples, and samples of open source code as the negative examples. The profile developed by VML is then configured into a policy that it names Proprietary Source Code.”
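Symantec has not described VML's internals beyond the training approach Hamilton outlines: positive and negative sample documents produce a profile that a policy then uses. The sketch below illustrates that general idea only. It assumes a bag-of-words vector for each document and swaps in a simple perceptron, a basic linear classifier, as the learner. It is not Symantec's algorithm, and the names `train_profile` and `matches_policy` and the sample snippets are hypothetical.

```python
import re
from collections import Counter

def vectorize(text: str) -> Counter:
    # Bag-of-words vector: lowercase word tokens mapped to their counts.
    return Counter(re.findall(r"[a-z_]+", text.lower()))

def train_profile(positives, negatives, epochs=10):
    # Hypothetical "profile": perceptron weights per token plus a bias.
    # Positive samples are labelled +1, negative samples -1.
    weights, bias = Counter(), 0.0
    examples = [(vectorize(d), 1) for d in positives] + \
               [(vectorize(d), -1) for d in negatives]
    for _ in range(epochs):
        for vec, label in examples:
            score = bias + sum(weights[t] * c for t, c in vec.items())
            if label * score <= 0:  # misclassified: nudge toward the label
                for t, c in vec.items():
                    weights[t] += label * c
                bias += label
    return weights, bias

def matches_policy(profile, text: str) -> bool:
    # A document is flagged when it scores on the positive side.
    weights, bias = profile
    vec = vectorize(text)
    return bias + sum(weights[t] * c for t, c in vec.items()) > 0

# Hypothetical training samples for a "Proprietary Source Code" policy.
PROPRIETARY = [
    "def compute_license_key(customer_id): # Acme proprietary, confidential",
    "Copyright Acme Corp. All rights reserved. Internal use only. def billing_engine():",
]
OPEN_SOURCE = [
    "Licensed under the MIT License. Permission is hereby granted, free of charge",
    "GNU General Public License version 3. def parse_args():",
]
profile = train_profile(PROPRIETARY, OPEN_SOURCE)
```

With these samples, `matches_policy(profile, "Acme confidential internal billing code")` returns `True`, while an MIT licence header returns `False`. A production system would use far richer features and a stronger learner; the perceptron is used here only because it fits in a few self-contained lines.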

The feature can help automate policy creation, Jon Oltsik, an analyst with Enterprise Strategy Group, told eWEEK.

“When you get beyond canned policies, many DLP technologies are hard to program and somewhat inflexible,” he said. “Machine learning can help create a map of users and data that can help pinpoint where sensitive content is, who accesses it and whether actual use supports business processes and security policies.”

Among the other capabilities slated for version 11 is a new application file access control feature designed to ensure that applications such as iTunes and Skype do not transmit sensitive data. Symantec has also added a FlexResponse feature that lets users apply encryption or Enterprise Rights Management (ERM) to files found on endpoints during discovery scanning.

Symantec is also streamlining remediation by identifying the locations where data is most at risk and automatically notifying the associated data owners. A risk scoring feature prioritises folders based on the amount and severity of the sensitive data they contain, as well as on how many people can read or write files in those folders, the company said.

“Organisations have a lot of unstructured data, often terabytes, and the sensitive data is hidden within that vast sea,” Hamilton said. “DLP can be intimidating – customers are concerned that DLP is going to tell them they have thousands of unprotected files, and they’ll have concerns about where to start their cleanup efforts. Risk Scoring helps them quickly find hot spots of risk out on their network file shares in order to understand where to start their cleanup efforts.”
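The company names the inputs to Risk Scoring (how much sensitive data a folder holds, how severe it is, and how many people can read or write there) but not the formula. The following is a minimal sketch of folder prioritisation under loudly stated assumptions: the severity weights, the choice to multiply exposure by the number of users with access, and the names `FolderScan`, `risk_score` and `prioritise` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical severity weights; Symantec's actual scale is not published.
SEVERITY_WEIGHT = {"low": 1, "medium": 3, "high": 5}

@dataclass
class FolderScan:
    path: str
    incident_severities: list[str]  # one entry per sensitive file found
    users_with_access: int          # people who can read or write files here

def risk_score(folder: FolderScan) -> int:
    # Amount and severity: sum the severity weights of the sensitive files.
    exposure = sum(SEVERITY_WEIGHT[s] for s in folder.incident_severities)
    # Assumption: broader access scales risk linearly.
    return exposure * folder.users_with_access

def prioritise(folders: list[FolderScan]) -> list[FolderScan]:
    # Highest-risk folders first, i.e. where cleanup should start.
    return sorted(folders, key=risk_score, reverse=True)

folders = [
    FolderScan("//fs01/finance", ["high", "high", "medium"], 40),  # 13 * 40
    FolderScan("//fs01/hr", ["high"] * 10, 5),                     # 50 * 5
    FolderScan("//fs02/public", ["low", "low"], 500),              # 2 * 500
]
```

Under these assumed weights, the widely readable public share ranks first (score 1000) even though it holds only low-severity files, ahead of finance (520) and HR (250). A different weighting would reorder the list, which is why the real formula matters.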