Membership inference attacks detect data used to train machine learning models

One of the wonders of machine learning is that it turns any kind of data into mathematical equations. Once you train a machine learning model on training examples—whether it’s on images, audio, raw text, or tabular data—what you get is a set of numerical parameters. In most cases, the model no longer needs the training dataset and uses the tuned parameters to map new and unseen examples to categories or value predictions.

You can then discard the training data and publish the model on GitHub or run it on your own servers without worrying about storing or distributing sensitive information contained in the training dataset.

But a type of attack called “membership inference” makes it possible to detect the data used to train a machine learning model. In many cases, the attackers can stage membership inference attacks without having access to the machine learning model’s parameters and just by observing its output. Membership inference can cause security and privacy concerns in cases where the target model has been trained on sensitive information.

From data to parameters

Above: Deep neural networks use multiple layers of parameters to map input data to outputs

Each machine learning model has a set of “learned parameters,” whose number and relations vary depending on the type of algorithm and architecture used. For instance, simple regression algorithms use a series of parameters that directly map input features to the model’s output. Neural networks, on the other hand, use multiple layers of parameters that process the input and pass intermediate results from one layer to the next before reaching the final layer.

But regardless of the type of algorithm you choose, all machine learning models go through a similar process during training. They start with random parameter values and gradually tune them to the training data. Supervised machine learning algorithms, such as those used in classifying images or detecting spam, tune their parameters to map inputs to expected outcomes.

For example, say you’re training a deep learning model to classify images into five different categories. The model might be composed of a set of convolutional layers that extract the visual features of the image and a set of dense layers that translate the features of each image into confidence scores for each class.

The model’s output will be a set of values that represent the probability that an image belongs to each of the classes. You can assume that the image belongs to the class with the highest probability. For instance, an output might look like this:

Cat: 0.90
Dog: 0.05
Fish: 0.01
Tree: 0.01
Boat: 0.01
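
A confidence vector like the one above is typically produced by a softmax layer that converts the final layer’s raw scores (logits) into probabilities that sum to 1. As a minimal sketch, with made-up logit values chosen only for illustration:

```python
import numpy as np

# Hypothetical raw scores (logits) from the model's final layer
logits = np.array([4.2, 1.3, -0.3, -0.3, -0.3])
classes = ["Cat", "Dog", "Fish", "Tree", "Boat"]

# Softmax turns the logits into probabilities that sum to 1
probs = np.exp(logits) / np.exp(logits).sum()

for name, p in zip(classes, probs):
    print(f"{name}: {p:.2f}")

# The class with the highest probability ("Cat" here) is taken as the prediction
```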

Before training, the model will provide incorrect outputs because its parameters have random values. You train it by providing it with a collection of images along with their corresponding classes. During training, the model gradually tunes the parameters so that its output confidence score becomes as close as possible to the labels of the training images.
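
For illustration, here is a minimal sketch of such a training loop in PyTorch. The architecture, optimizer, and the `train_one_epoch`/`train_loader` names are placeholders chosen for this sketch, not details from the article:

```python
import torch
import torch.nn as nn

# Toy stand-in for the five-class image classifier described above
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(32 * 32 * 3, 64),  # stand-in for the convolutional feature extractor
    nn.ReLU(),
    nn.Linear(64, 5),            # dense layer producing scores for the 5 classes
)
loss_fn = nn.CrossEntropyLoss()  # penalizes confidence scores that disagree with the label
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def train_one_epoch(train_loader):
    """train_loader is assumed to yield (images, labels) batches of labeled examples."""
    for images, labels in train_loader:
        optimizer.zero_grad()
        logits = model(images)           # raw confidence scores for each class
        loss = loss_fn(logits, labels)   # how far the scores are from the true labels
        loss.backward()                  # gradients of the loss w.r.t. every parameter
        optimizer.step()                 # nudge the parameters closer to the labels
```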

Basically, the model encodes the visual features of each type of image into its parameters.

Membership inference attacks

A good machine learning model is one that not only classifies its training data correctly but also generalizes to examples it hasn’t seen before. This goal can be achieved with the right architecture and enough training data.

But in general, machine learning models tend to perform better on their training data. Going back to the image classifier above, if you mix your training data with a batch of new images and run them through the neural network, you’ll see that the confidence scores on the training examples tend to be higher than those on images the model hasn’t seen before.

Above: Machine learning models perform better on training examples as opposed to unseen examples
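
Reusing the toy model from the earlier sketch (and assuming hypothetical `train_loader` and `unseen_loader` data loaders), the gap can be made visible by comparing the model’s average top confidence on the two sets:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def avg_top_confidence(model, loader):
    """Average of the highest softmax score the model assigns to each example."""
    top_scores = []
    for images, _ in loader:
        probs = F.softmax(model(images), dim=1)
        top_scores.append(probs.max(dim=1).values)
    return torch.cat(top_scores).mean().item()

# After training, the gap between these two numbers is what membership inference exploits:
# print(avg_top_confidence(model, train_loader))   # typically higher
# print(avg_top_confidence(model, unseen_loader))  # typically lower
```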

Membership inference attacks take advantage of this property to discover or reconstruct the examples used to train the machine learning model. This could have privacy ramifications for the people whose data records were used to train the model.

In membership inference attacks, the adversary does not necessarily need knowledge of the target machine learning model’s inner parameters. Instead, the attacker only needs to know the model’s algorithm and architecture (e.g., SVM, neural network, etc.) or the service used to create the model.

With the growth of machine learning as a service (MaaS) offerings from large tech companies such as Google and Amazon, many developers are compelled to use them instead of building their models from scratch. The advantage of these services is that they abstract many of the complexities and requirements of machine learning, such as choosing the right architecture, tuning hyperparameters (learning rate, batch size, number of epochs, regularization, loss function, etc.), and setting up the computational infrastructure needed to optimize the training process. The developer only needs to set up a new model and provide it with training data. The service does the rest.

The tradeoff is that if the attackers know which service the victim used, they can use the same service to create a membership inference attack model.

In fact, at the 2017 IEEE Symposium on Security and Privacy, researchers at Cornell University proposed a membership inference attack technique that worked on all major cloud-based machine learning services.

In this technique, an attacker creates random records for a target machine learning model served on a cloud service. The attacker feeds each record into the model. Based on the confidence score the model returns, the attacker tunes the record’s features and runs it through the model again. The process continues until the model reports a very high confidence score. At this point, the record is identical or very similar to one of the examples used to train the model.

Above: Membership inference attacks observe the behavior of a target machine learning model and predict examples that were used to train it.
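
A simplified sketch of that search might look like the following, where `query_model` is a placeholder for however the attacker queries the deployed model (such as a cloud prediction API) and is assumed to return a confidence vector for a single record; the paper’s actual synthesis procedure is more involved:

```python
import numpy as np

def synthesize_record(query_model, target_class, n_features,
                      max_queries=1000, threshold=0.95):
    """Hill-climb a random record until the target model reports
    high confidence for `target_class`."""
    record = np.random.rand(n_features)
    best_conf = query_model(record)[target_class]
    for _ in range(max_queries):
        candidate = record.copy()
        # Perturb a small random subset of the record's features
        idx = np.random.choice(n_features, size=max(1, n_features // 10), replace=False)
        candidate[idx] = np.random.rand(len(idx))
        conf = query_model(candidate)[target_class]
        if conf > best_conf:          # keep the change only if confidence improved
            record, best_conf = candidate, conf
        if best_conf >= threshold:    # record now resembles a training example
            break
    return record, best_conf
```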

After gathering enough high-confidence records, the attacker uses them to train a set of “shadow models” that imitate the target model’s behavior. The outputs of these shadow models, labeled by whether each record was part of their training data, are then used to train a membership inference attack model. The final model can predict whether a data record was included in the training dataset of the target machine learning model.
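
A minimal sketch of that shadow-model step, assuming a hypothetical `train_shadow` helper that mimics the target service and using a random forest as the attack model (the original paper trains a separate attack model per class), could look like this:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def build_attack_model(train_shadow, X, y, n_shadows=10):
    """X, y are the synthesized records (NumPy arrays) and their class labels.
    `train_shadow(X, y)` is assumed to return a fitted model with predict_proba."""
    attack_X, attack_y = [], []
    for _ in range(n_shadows):
        # Split the synthetic data into shadow-training ("in") and held-out ("out") halves
        idx = np.random.permutation(len(X))
        half = len(X) // 2
        in_idx, out_idx = idx[:half], idx[half:]
        shadow = train_shadow(X[in_idx], y[in_idx])
        # Confidence vectors of members are labeled 1, non-members 0
        attack_X.append(shadow.predict_proba(X[in_idx]))
        attack_y += [1] * len(in_idx)
        attack_X.append(shadow.predict_proba(X[out_idx]))
        attack_y += [0] * len(out_idx)
    # The attack model learns to distinguish member from non-member confidence vectors
    return RandomForestClassifier().fit(np.vstack(attack_X), attack_y)
```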

The researchers found that this attack was successful on many different machine learning services and architectures. Their findings show that a well-trained attack model can also tell the difference between training dataset members and non-members that receive a high confidence score from the target machine learning model.

The limits of membership inference

Membership inference attacks are not successful on all kinds of machine learning tasks. To create an efficient attack model, the adversary must be able to explore the feature space. For example, if a machine learning model is performing complicated image classification (multiple classes) on high-resolution photos, the costs of creating training examples for the membership inference attack will be prohibitive.

But in the case of models that work on tabular data such as financial and health information, a well-designed attack might be able to extract sensitive information, such as associations between patients and diseases or financial records of target people.

Above: Overfitted models perform well on training examples but poorly on unseen examples.

Membership inference is also closely associated with “overfitting,” an artifact of poor machine learning design and training. An overfitted model performs well on its training examples but poorly on novel data. Two common causes of overfitting are having too few training examples and running the training process for too many epochs.

The more overfitted a machine learning model is, the easier it will be for an adversary to stage membership inference attacks against it. Therefore, a machine learning model that generalizes well on unseen examples is also more secure against membership inference.

This story originally appeared on Bdtechtalks.com. Copyright 2021

Replicated: Demand for on-premises software equally as strong as SaaS

While there is a strong demand for cloud applications and software-as-a-service, security, regulatory, and compliance requirements continue to drive demand for on-premises software. In a new Dimensional Research report, 92% of companies said on-premises software was growing. The report, sponsored by Replicated, a software delivery and management company, found that current customer demand for on-premises software was equal to that of public cloud.

Above: Customer demand for on-premises software delivery isn’t slowing down anytime soon.

While it may be popular to believe that “cloud is king” and SaaS is the best and most in-demand modern enterprise software, data shows that demand for on-premises software is equally as strong. It’s the smart choice for customers operating under security, regulatory, and compliance requirements; many organizations cannot allow their customer data to be shared in multi-tenant environments. Additionally, software companies that do not currently provide an on-premises solution to customers leave money on the table and miss a significant business and competitive opportunity.

This new report from Dimensional Research, sponsored by Replicated, highlights the missed business opportunities for software vendors who are not offering an on-premises version. The report provides detailed insights around the current use, need, and challenges for on-premises software and its installation, configuration and management. This report also takes a closer look at the parallel rise in the adoption of container-based applications and the use of Kubernetes.

Perhaps the most important findings are that 92% of surveyed participants reported their on-premises software sales as growing, and that on-premises solutions are equally as popular as their public cloud alternatives. This directly counters the popular narrative that SaaS has overtaken on-premises software delivery, as security and data protection stay top of mind for enterprise software customers.

The survey from Dimensional Research includes feedback from 405 business and technology professionals at executive and manager seniority levels, representing software companies of all sizes around the world across a wide variety of different industries.

Read the full report from Replicated

Roblox hits Q1 bookings of $652.3 million, up 161%, in first report as public company

Roblox, the platform for Lego-like user-generated games, reported its earnings for the first time as a publicly traded company, and the results beat analysts’ expectations. Bookings for the first quarter ended March 31 were $652.3 million, up 161% from the same quarter a year ago.

Roblox has done well among its target audience of children and teens during the pandemic, as players turned to it for remote, socially distanced play with their friends at a time when they couldn’t meet in person.

Roblox previously raised $520 million at a $29.5 billion valuation in a financing round ahead of its direct listing on the New York Stock Exchange as a public company. It opened on March 10 at a valuation of $41.9 billion and has hovered around that value. Investors greeted the results positively, with Roblox trading up 5% at $67.18 a share in after-hours trading.

Q1 results

Analysts expected a loss of 21 cents a share on bookings of $568.6 million. Most video game companies emphasize non-GAAP bookings, or the total value of virtual currency purchases by players during the quarter, instead of revenues, which under accounting rules are limited to those purchases that are expected to be fully resolved within a certain time period. For instance, a player may buy Robux currency in the first quarter, but spend it over 10 months. That revenue has to be recognized over time, as it is spent inside the platform’s games.

Roblox’s quarterly revenue came in at $387 million, up 140% from a year earlier. The GAAP net loss for the quarter was $134.2 million. But operating cash flow was positive, which means cash is coming into the business, chief business officer Craig Donato said in an interview with GamesBeat.

“We had a strong quarter in terms of bookings, revenue, and operating cash flow, and more important, in terms of daily active user growth and time spent by players,” Donato said.

Roblox gets a 30% cut from the bookings generated by sales of Robux, the virtual currency used by players to play user-generated games. The company’s bookings for 2020 were $1.9 billion, double what they were the year before. Roblox’s games have become so popular that people have played the best ones billions of times. On average, 32.6 million people come to Roblox every day. More than 1.25 million creators have made money in Roblox. In the year ended December 31, 2020, users spent 30.6 billion hours engaged on the platform, an average of 2.6 hours per daily active user each day.

Above: Roblox’s user-generated game characters.

Image Credit: Roblox

Net cash provided by operating activities increased nearly four times in Q1 2021 over Q1 2020 to $164.5 million (including one-time direct listing expenses of $51.9 million). Exclusive of one-time expenses related to the direct listing, net cash provided by operating activities would have been $216.4 million.

Free cash flow increased 4.1 times over Q1 2020 to $142.1 million. Average daily active users (DAUs) were 42.1 million, an increase of 79% year over year driven by 87% growth in DAUs outside of the U.S. and Canada and 111% growth in DAUs over the age of 13.

Hours engaged were 9.7 billion, an increase of 98% year over year primarily driven by 104% growth in engagement in markets outside of the U.S. and Canada, and 128% growth from users over the age of 13. Average bookings per DAU (ABPDAU) was $15.48, an increase of 46% year over year.

April results

Rather than forecast how its upcoming quarter is expected to go, Roblox is disclosing the actual results for the month of April, which is part of the second quarter.

For the month of April alone, daily active users were 43.3 million, up 37% from April of last year and up sequentially from 42.3 million in the month of March 2021. Hours engaged in April were 3.2 billion, up 18% year over year and flat sequentially from March 2021.

Bookings were between $242 million and $245 million, up 59% to 61% year over year and up sequentially 7% to 9% from March 2021 when bookings were $225.3 million.

Average bookings per DAU were between $5.59 to $5.66, up 16% to 17% year over year and 5% to 6% sequentially from March 2021. April revenue was $143 million to $145 million, up 136% to 140% year over year and 5% to 7% sequentially from March 2021.

“Our first quarter 2021 results enabled us to continue investing aggressively in the key areas that we believe will drive long term growth and value, specifically hiring talented engineering and product professionals and growing the earnings for our developer community,” Roblox chief financial officer Michael Guthrie said in a statement. “We believe we must continue to innovate and so remain focused on building great technology to make progress on our key growth vectors, primarily international expansion and expanding the age demographic of our users.”

The company closed the March quarter with 1,054 employees, up from 651 a year earlier.

IronSource’s Supersonic launches LiveGames publishing service for indies

Mobile monetization firm IronSource said its Supersonic Studios division has launched LiveGames, a self-service way for indie game developers to manage mobile games and their live services (such as tournaments or updates).

This is part of the Supersonic publishing solution, which IronSource launched more than a year ago. The announcement comes after IronSource said it plans to go public via a special purpose acquisition company (SPAC) at an $11.1 billion valuation.

The product gives developers who publish their mobile games with Supersonic access to game management tools and full visibility into in-game metrics, enabling them to better manage and grow their published games.

Nadav Ashkenazy, the general manager of Supersonic Studios, said in an interview with GamesBeat that the goal is to make publishing tools accessible to indie developers so they can get their games off the ground. It helps with the “growth loop,” after a game reaches a large scale and then needs attention in terms of improving numbers, such as the average playtime per user.

“After you scale a game globally, everything gets more complicated,” Ashkenazy said. “For average playtime per user, we give you a snapshot for that.”

The idea is to support developers as independent companies by productizing what is otherwise a manual process. It also adds some important transparency for developers that they normally can’t get out of game publishers, said Omer Kaplan, the chief revenue officer at IronSource, in an interview with GamesBeat.

“Historically, publishing is a black box,” Kaplan said. “A developer’s game meets retention goals. Then a publisher handles growth and gives a revenue share. We make everything transparent. We have complete transparency for the developers using our publishing solution on the IronSource platform.”

Several rival products in the market help developers test the performance and marketability of their prototypes, with IronSource launching its self-serve testing product for Supersonic developers in 2020. However, one of the biggest challenges comes once a game has been published, since many of the insights relating to a game and its performance are not commonly visible to the developer, limiting the ability to understand, test, iterate and improve for the long term.

Above: IronSource’s LiveGames helps studios manage their game data.

Image Credit: IronSource

With Supersonic, IronSource has focused on helping game companies become better developers, rather than treat each game as a standalone unit.

Through LiveGames, developers will have access to data such as daily, monthly, and annual profit for each of their published games; advanced analytics including retention, playtime, lifetime value, and ad engagement for each region and user acquisition channel; rewarded video and interstitial ad analysis; and advanced analytics from A/B tests for test comparison.

Stan Mettra, the CEO of game studio Born2play, is using LiveGames with the game StackyDash. He said in a statement that this is the first time the company has had so much insight into the performance of the game. That takes away blind spots and helps the company take steps to increase revenue. About 25 studios used the LiveGames service in alpha testing, and the product is now ready for broader use.

“We’re encouraging the developers to remain independent,” Kaplan said.

Tel Aviv, Israel-based IronSource has previously said it would raise $2.3 billion in cash proceeds for both shareholders and the company itself through the transactions, which include both the proceeds from the SPAC (a faster way of going public compared to an initial public offering) and an additional private investment known as a PIPE, or private investment in public equity. SPACs have become a popular way for fast-moving companies to go public without all the hassle of a traditional IPO. Regulators have come up with more rules to govern SPACs, but the idea is to raise money faster.

IronSource said it recorded 2020 revenue of $332 million and adjusted earnings before interest, taxes, depreciation, and amortization (EBITDA) of $104 million. IronSource said its monetization platform is designed to enable any app or game developer to turn their app into a scalable, successful business by helping them to monetize and analyze their app and grow and engage their users through multiple channels, including unique on-device distribution through partnerships with telecom operators such as Orange and device makers such as Samsung.

In 2020, IronSource said 94% of its revenues came from 291 customers that each generated more than $100,000 of annual revenue, and it had a dollar-based net expansion rate of 149%.
