
Intel’s image-enhancing AI is a step forward for photorealistic game engines


Intel recently unveiled a deep learning system that turns 3D rendered graphics into photorealistic images. Tested on Grand Theft Auto 5, the neural network showed impressive results. The game’s developers have already done a great job of recreating Los Angeles and southern California in detail. But with Intel’s new machine learning system, the graphics turn from high-quality synthetic 3D to real-life depictions (with very minor glitches).

What’s even more impressive is that Intel’s AI does this at a relatively high framerate, as opposed to photorealistic render engines that can take minutes or hours for a single frame. And these are just preliminary results; the researchers say they can optimize the deep learning models to work much faster.

Does it mean that real-time photorealistic game engines are on the horizon, as some analysts have suggested? I would not bet on it yet, because several fundamental problems remain unsolved.

Deep learning for image enhancement

Before we can evaluate the feasibility of running real-time image enhancement, let’s have a high-level look at the deep learning system Intel has used.

The researchers at Intel have not provided full implementation details about the deep learning system they have developed. But they have published a paper on arXiv and posted a video on YouTube that provide useful hints on the kind of computation power you would need to run this model.

The full system, displayed below, is composed of several interconnected neural networks.

The G-buffer encoder transforms different render maps (G-buffers) into a set of numerical features. G-buffers are maps for surface normal information, depth, albedo, glossiness, atmosphere, and object segmentation. The neural network uses convolution layers to process this information and output a vector of 128 features that improve the performance of the image enhancement network and avoid artifacts that other similar techniques produce. The G-buffers are obtained directly from the game engine.

Above: The G-buffers (render maps) that feed Intel’s image enhancement system.
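To make the idea more concrete, here is a minimal sketch of what such a G-buffer encoder could look like in PyTorch. The layer sizes, the number of G-buffer channels, and the representation of the 128 features as a 128-channel map are my own illustrative assumptions, not Intel’s published architecture.

    import torch
    import torch.nn as nn

    class GBufferEncoder(nn.Module):
        """Illustrative sketch only: compresses stacked G-buffer maps into 128 features."""
        def __init__(self, gbuffer_channels=20, out_features=128):  # channel count is a guess
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(gbuffer_channels, 64, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(64, 128, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(128, out_features, kernel_size=1),
            )

        def forward(self, gbuffers):
            # gbuffers: (batch, channels, height, width) stack of normals, depth, albedo, etc.
            return self.net(gbuffers)

    encoder = GBufferEncoder()
    fake_gbuffers = torch.randn(1, 20, 270, 480)  # quarter-resolution full-HD frame
    print(encoder(fake_gbuffers).shape)           # torch.Size([1, 128, 270, 480])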

The image enhancement network takes as input the game’s rendered frame and the features from the G-buffer encoder and generates the photorealistic version of the image.

The remaining components, the discriminator and the LPIPS loss function, are used during training. They grade the output of the enhancement network by evaluating its consistency with the original game-rendered frame and by comparing its photorealistic quality with real images.
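As a rough illustration of how such training objectives are typically combined, the sketch below mixes an LPIPS perceptual term (using the open source lpips package) with a simplified adversarial term. The weighting and the discriminator are placeholders, not Intel’s actual training setup.

    import lpips  # pip install lpips; images are expected in the [-1, 1] range

    perceptual = lpips.LPIPS(net='vgg')

    def enhancer_loss(enhanced, game_frame, discriminator, realism_weight=0.05):
        # Consistency: stay perceptually close to the original rendered frame.
        consistency = perceptual(enhanced, game_frame).mean()
        # Realism: the discriminator scores how photo-like the enhanced frame looks.
        realism = -discriminator(enhanced).mean()  # simplified GAN-style term
        return consistency + realism_weight * realism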

Inference costs for image enhancement

First, let’s see whether, if the technology becomes available, gamers will be able to run it on their computers. For this, we need to estimate inference costs: how much memory and computing power you need to run the trained model. For inference, you only need the G-buffer encoder and the image enhancement network; the discriminator and the LPIPS loss are used only during training and can be dropped.

Above: Intel’s photorealistic enhancement architecture at inference time.

The enhancement network accounts for the bulk of the work. According to Intel’s paper, this neural network is based on HRNetV2, a deep learning architecture meant for processing high-resolution images. High-resolution neural networks produce fewer visual artifacts than models that down-sample images.

According to Intel’s paper, “The HRNet processes an image via multiple branches that operate at different resolutions. Importantly, one feature stream is kept at relatively high resolution (1/4 of the input resolution) to preserve fine image structure.”

This means that, if you’re running the game at full HD (1920×1080), then the top row layers will be processing inputs at 480×270 pixels. The resolution halves on each of the lower rows. The researchers have changed the structure of each block in the neural network to also compute inputs from the G-buffer encoder (the RAD layers).
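A quick back-of-the-envelope calculation shows the branch resolutions for a full-HD frame, assuming the standard four HRNet branches, with the top one at 1/4 resolution and each lower one halved again:

    width, height = 1920, 1080
    for branch in range(4):                      # four resolution streams is an assumption
        scale = 4 * (2 ** branch)
        print(f"branch {branch}: {width // scale} x {height // scale}")
    # branch 0: 480 x 270
    # branch 1: 240 x 135
    # branch 2: 120 x 67
    # branch 3: 60 x 33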

Above: The image enhancement network.

According to Intel’s paper, the G-buffer’s inputs include “one-hot encodings for material information, dense continuous values for normals, depth, and color, and sparse continuous information for bloom and sky buffers.”
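For readers unfamiliar with the term, a one-hot encoding turns each material ID into its own binary channel. A tiny illustrative example (the IDs and class count are made up):

    import torch
    import torch.nn.functional as F

    material_ids = torch.tensor([[0, 2], [1, 2]])       # 2x2 patch of material labels
    one_hot = F.one_hot(material_ids, num_classes=4)    # shape (2, 2, 4)
    one_hot = one_hot.permute(2, 0, 1).float()          # channels-first, ready for a conv layer
    print(one_hot.shape)                                 # torch.Size([4, 2, 2])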

The researchers note elsewhere in their paper that the deep learning model can still perform well with a subset of the G-buffers.

So, how much memory does the model need? Intel’s paper doesn’t state the memory size, but according to the HRNetV2 paper, the full network requires 1.79 gigabytes of memory for a 1024×2048 input. The image enhancement network used by Intel has a smaller input size, but we also need to account for the extra parameters introduced by the RAD layers and the G-buffer encoder. Therefore, it would be fair to assume that you’ll need at least one gigabyte of video memory to run deep learning–based image enhancement for full HD games and probably more than two gigabytes if you want 4K resolution.

Above: HRNet memory requirements.
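One crude way to sanity-check these numbers is to compare pixel counts against the 1024×2048 reference input, under the simplifying assumption that activation memory grows roughly with the number of pixels (and ignoring the extra overhead of the RAD layers and the G-buffer encoder):

    base_pixels = 1024 * 2048                     # input size in the HRNetV2 paper
    for name, (w, h) in {"full HD": (1920, 1080), "4K": (3840, 2160)}.items():
        print(f"{name}: {w * h / base_pixels:.2f}x the reference pixel count")
    # full HD: 0.99x the reference pixel count
    # 4K: 3.96x the reference pixel count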

One gigabyte of memory is not much given that gaming computers commonly have graphics cards with 4-8 GB of VRAM. And high-end graphics cards such as the GeForce RTX series can have up to 24 GB of VRAM.

But it is also worth noting that 3D games consume much of the graphics card’s resources. Games store as much data as possible on video memory to speed up render times and avoid swapping between RAM and VRAM, an operation that incurs a huge speed penalty. According to one estimate, GTA 5 consumes up to 3.5 GB of VRAM at full HD resolution. And GTA was released in 2013. Newer games such as Cyberpunk 2077, which have much larger 3D worlds and more detailed objects, can easily gobble up to 7-8 GB of VRAM. And if you want to play at high resolutions, then you’ll need even more memory.

So basically, with the current mid- and high-end graphics cards, you’ll have to choose between low-resolution photorealistic quality and high-resolution synthetic graphics.

But memory usage is not the only problem deep learning–based image enhancement faces.

Delays caused by non-linear processing

A much bigger problem, in my opinion, is the sequential and non-linear nature of deep learning operations. To understand this problem, we must first compare 3D graphics processing with deep learning inference.

Three-dimensional graphics rely on very large numbers of matrix multiplications. A rendered frame of 3D graphics starts from a collection of vertices, which are basically a set of numbers that represent the properties (e.g., coordinates, color, material, normal direction, etc.) of points on a 3D object. Before every frame is rendered, the vertices must go through a series of matrix multiplications that map their local coordinates to world coordinates to camera space coordinates to image frame coordinates. An index buffer bundles vertices into groups of threes to form triangles. These triangles are rasterized—or transformed into pixels— and every pixel then goes through its own set of matrix operations to determine its color based on material color, textures, reflection and refraction maps, transparency levels, etc.

Above: The 3D render pipeline (Source: LearnEveryone)
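As a toy illustration of that transform chain, here is a single vertex pushed through model, view, and projection matrices with NumPy. The matrices are deliberately trivial stand-ins for what a real engine would build from object, camera, and lens parameters.

    import numpy as np

    def translation(tx, ty, tz):
        m = np.eye(4)
        m[:3, 3] = [tx, ty, tz]
        return m

    model = translation(10.0, 0.0, 0.0)                  # local -> world
    view = translation(0.0, 0.0, -5.0)                   # world -> camera
    projection = np.eye(4)
    projection[3, 2], projection[3, 3] = -1.0, 0.0       # toy perspective: w' = -z

    vertex = np.array([1.0, 2.0, 3.0, 1.0])              # homogeneous coordinates
    clip = projection @ view @ model @ vertex            # three multiplications per vertex
    print(clip)                                          # [11.  2. -2.  2.], before the divide by w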

This sounds like a lot of operations, especially when you consider that today’s 3D games are composed of millions of polygons. But there are two reasons you get very high framerates when playing games on your computer. First, graphics cards have been designed specifically for parallel matrix multiplications. As opposed to the CPU, which has at most a few dozen computing cores, graphics processors have thousands of cores, each of which can independently perform matrix multiplications.

Second, graphics transformations are mostly linear. And linear transformations can be bundled together. For instance, if you have separate matrices for world, view, and projection transformations, you can multiply them together to create one matrix that performs all three operations. This cuts down your operations by two-thirds. Graphics engines also use plenty of tricks to further cut down operations. For instance, if an object’s bounding box falls out of the view frustum (the pyramid that represents the camera’s perspective), it will be excluded from the render pipeline altogether. And triangles that are occluded by others are automatically removed from the pixel rendering process.
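The savings from bundling linear transforms is easy to verify numerically: multiplying the matrices together once gives the same result as applying them one by one to every vertex.

    import numpy as np

    rng = np.random.default_rng(0)
    world, view, projection = rng.normal(size=(3, 4, 4))    # stand-in 4x4 transforms
    vertex = rng.normal(size=4)

    step_by_step = projection @ (view @ (world @ vertex))   # three multiplications per vertex
    combined = (projection @ view @ world) @ vertex         # matrices folded once, reused per vertex
    print(np.allclose(step_by_step, combined))              # True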

Deep learning also relies on matrix multiplications. Every neural network is composed of layers upon layers of matrix computations. This is why graphics cards have become very popular among the deep learning community in the past decade.

But unlike 3D graphics, the operations of deep learning can’t be combined. Layers in neural networks rely on non-linear activation functions to perform complicated tasks. Basically, this means that you can’t compress the transformations of several layers into a single operation.
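A two-line example makes the point: once a ReLU sits between two weight matrices, folding them into a single matrix no longer gives the same answer.

    import numpy as np

    relu = lambda v: np.maximum(v, 0.0)
    W1 = np.eye(2)
    W2 = np.array([[1.0, 1.0], [0.0, 1.0]])
    x = np.array([-1.0, 2.0])

    print(W2 @ relu(W1 @ x))   # [2. 2.]  what the two-layer network computes
    print((W2 @ W1) @ x)       # [1. 2.]  what a single folded matrix would compute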

For instance, say you have a deep neural network that takes a 100×100 pixel input image (10,000 features) and runs it through seven layers. A graphics card with several thousand cores might be able to process all pixels in parallel. But it will still have to perform the seven layers of neural network operations sequentially, which can make it difficult to provide real-time image processing, especially on lower-end graphics cards.

Therefore, another bottleneck we must consider is the number of sequential operations that must take place. If we consider the top row of the image enhancement network, there are 16 residual blocks that are sequentially linked. In each residual block, there are two convolution layers, RAD blocks, and ReLU operations that are sequentially linked. That amounts to 96 layers of sequential operations. And the image enhancement network can’t start its work before the G-buffer encoder outputs its feature encodings. Therefore, we must add at least the two residual blocks that process the first set of high-resolution features. That’s at least 12 more layers added to the sequence, which brings us to at least 108 layers of sequential operations for image enhancement.
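In rough numbers (assuming about six sequential operations per residual block, which is what the 96-layer figure implies):

    ops_per_block = 6          # two convolutions plus RAD and ReLU operations (rough count)
    enhancement_blocks = 16    # sequentially linked blocks in the top row
    encoder_blocks = 2         # G-buffer blocks that must finish before enhancement starts
    print((enhancement_blocks + encoder_blocks) * ops_per_block)   # 108 sequential operations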

This means that, in addition to memory, you need high clock speeds to run all these operations in time. Here’s an interesting quote from Intel’s paper: “Inference with our approach in its current unoptimized implementation takes half a second on a GeForce RTX 3090 GPU.”

The RTX 3090 has 24 GB of VRAM, which means the slow, 2 FPS render rate is not due to memory limitations but rather due to the time it takes to sequentially process all the layers of the image enhancer network. And this isn’t a problem that will be solved by adding more memory or CUDA cores, but by having faster processors.

Again, from the paper: “Since G-buffers that are used as input are produced natively on the GPU, our method could be integrated more deeply into game engines, increasing efficiency and possibly further advancing the level of realism.”

Integrating the image enhancer network into the game engine would probably give a good boost to the speed, but it won’t result in playable framerates.

For reference, we can go back to the HRNet paper. The researchers used a dedicated Nvidia V100, a massive and extremely expensive GPU specially designed for deep learning inference. With no memory limitation and no hindrance by other in-game computations, the inference time for the V100 was 150 milliseconds per input, which is ~7 fps, not nearly enough to play a smooth game.
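Converting those reported latencies into framerates is straightforward:

    for label, seconds in {"RTX 3090, Intel's unoptimized model": 0.5,
                           "V100, HRNetV2 paper": 0.150}.items():
        print(f"{label}: {1.0 / seconds:.1f} fps")
    # RTX 3090, Intel's unoptimized model: 2.0 fps
    # V100, HRNetV2 paper: 6.7 fps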

Developing and training neural networks

Another vexing problem is the development and training costs of the image-enhancing neural network. Any company that would want to replicate Intel’s deep learning models will need three things: data, computing resources, and machine learning talent.

Gathering training data can be very problematic. Luckily for Intel, someone had solved it for them. They used the Cityscapes dataset, a rich collection of annotated images captured from 50 cities in Germany. The dataset contains 5,000 finely annotated images. According to the dataset’s paper, each of the annotated images required an average of 1.5 hours of manual effort to precisely specify the boundaries and types of objects contained in the image. These fine-grained annotations enable the image enhancer to map the right photorealistic textures onto the game graphics. Cityscapes was the result of a huge effort supported by government grants, commercial companies, and academic institutions. It might prove to be useful for other games that, like Grand Theft Auto, take place in urban settings.

Above: The Cityscapes dataset is a collection of finely annotated images of urban settings
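For anyone who wants to experiment with the same kind of annotations, torchvision ships a Cityscapes loader (the dataset itself must be downloaded separately from cityscapes-dataset.com; the path below is a placeholder):

    from torchvision.datasets import Cityscapes

    dataset = Cityscapes(root="/data/cityscapes", split="train",
                         mode="fine", target_type="semantic")
    image, segmentation_mask = dataset[0]   # a street photo and its per-pixel class labels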

But what if you want to use the same technique in a game that doesn’t have a corresponding dataset? In that case, it will be up to the game developers to gather the data and add the required annotations (a photorealistic version of Rise of the Tomb Raider, maybe?).

Compute resources will also pose a challenge. Training a network of the size of the image enhancer for tasks such as image segmentation would be feasible with a few thousand dollars—not a problem for large gaming companies. But when you want to do a generative task such as photorealistic enhancement, then training becomes much more challenging. It requires a lot of testing and tweaking of hyperparameters, and many more epochs of training, which can blow up the costs. Intel tuned and trained their model exclusively for GTA 5. Games that are similar to GTA 5 might be able to slash training costs by finetuning Intel’s trained model on the new game. Others might need to test with totally new architectures. Intel’s deep learning model works well for urban settings, where objects and people are easily separable. But it’s not clear how it would perform in natural settings, such as jungles and caves.

Most gaming companies don’t have machine learning engineers on staff, so they’ll have to outsource the task or hire new engineers, which adds more costs. Each company will have to decide whether the huge cost of adding photorealistic rendering is worth the improved gaming experience.

Intel’s photorealistic image enhancer shows how far you can push machine learning algorithms to perform interesting feats. But it will take a few more years before the hardware, the companies, and the market will be ready for real-time AI-based photorealistic rendering.

Ben Dickson is a software engineer and the founder of TechTalks. He writes about technology, business, and politics.

This story originally appeared on Bdtechtalks.com. Copyright 2021

Lucidworks: Chatbots and recommendations boost online brand loyalty

Pandemic-related shutdowns led consumers to divert the bulk of their shopping online — and many of those shoppers are now hesitant about returning to stores as businesses begin to open back up. A recent survey of 800 consumers conducted by cloud company Lucidworks found that 59% of shoppers plan to either avoid in-person shopping as much as possible or visit in-person stores less often than before the pandemic.

Above: Shoppers across the U.S. and U.K. agree that high-quality products, personalized recommendations, and excellent customer service are the top three reasons they’re brand-loyal.

Image Credit: Lucidworks

As the world stabilizes, shoppers want brands to provide a multi-faceted shopping experience — expanded chatbot capabilities, diverse recommendations, and personalized experiences that take into account personal preferences and history, Lucidworks found in its study. More than half of shoppers in the survey, 55%, said they use a site’s chatbot on every visit. American shoppers use chatbots more than their counterparts in the United Kingdom, at 70%.

The majority of shoppers, 70%, use chatbots for customer service, and 53% said they want a chatbot to help them find specific products or check product compatibility. A little less than half, or 48%, said they use chatbots to find more information about a product, and 42% use chatbots to find policies such as shipping information and how to get refunds.

A quarter of shoppers will leave the website to seek information elsewhere if the chatbot doesn’t give them the answer they need. Brands that deploy chatbots capable of going beyond basic FAQs and performing product and content discovery will provide the well-rounded chatbot experience shoppers expect, Lucidworks said.

Respondents also pointed to the importance of content recommendations. The survey found that almost a third of shoppers said they find recommendations for “suggested content” useful, and 61% of shoppers like to do research via reviews on the brand’s website where they’ll be purchasing from. A little over a third — 37% — of shoppers use marketplaces such as Amazon, Google Shopping, and eBay for their research.

Brands should try to offer something for every step in the shopping journey, from research to purchase to support, to keep shoppers on their sites longer. How online shopping will look in coming years is being defined at this very moment as the world reopens. Brands that are able to understand a shopper’s goal in the moment and deliver a connected experience that understands who shoppers are and what they like are well-positioned for the future, Lucidworks said.

Lucidworks used a self-serve survey tool, Pollfish, in late May 2021 to survey 800 consumers over the age of 18—400 in the U.K. and 400 in the U.S.—to understand how shoppers interact with chatbots, product and content recommendations, where they prefer to do research, and plans for future in-store shopping.

Read the full U.S./U.K. Consumer Survey Report from Lucidworks.

Breakroom teams up with High Fidelity to bring 3D audio to online meetings

Social meeting space Breakroom has integrated High Fidelity‘s 3D audio into its 3D virtual world for social and business events.

The deal is a convergence of some virtual world pioneers who have made their mark on the development of virtual life. Philip Rosedale is the CEO of High Fidelity, and he also launched Second Life in 2003. And Sine Wave Entertainment, the creator of Breakroom, got its start as a content brand in Second Life before it spun out to create its own virtual meeting spaces for real world events.

Adam Frisby, chief product officer and cofounder of Sine Wave, said in our interview conducted inside Breakroom that the High Fidelity spatial audio will help Breakroom create a triple-A quality experience in a virtual world.

“The real benefit of having 3D audio in a virtual world like this is you can have lots of conversations going on simultaneously,” said Frisby. “3D audio is the only way to replicate the real-world experience in an online environment. You can have a 150-person conference and end up with 10 groups of people talking at the same time. That has helped us with engagement.”

Above: Breakroom lets an event have dozens of simultaneous conversations where people don’t talk over each other, thanks to High Fidelity.

Image Credit: Sine Wave

Most online events get engagement times of 20 or 30 minutes. But Breakroom’s average events, ranging from 600 to 1,000 attendees, have engagement times of an hour and 40 minutes, Frisby said.

Sine Wave’s Breakroom draws heavily on lessons learned in Second Life to create a frictionless, mass market, user-friendly virtual world.

“You can hear everything better with High Fidelity,” said Rosedale, in our interview in Breakroom. “Breakroom combines low-latency server-side video and spatial audio in a way that lets you hold an event like it’s in the real world.”

High Fidelity is a real-time communications company. Its mission is to build technologies that power more human experiences in today’s digital world. The company’s patented spatial audio technology, originally developed for its VR software platform, adds immersive, high-quality voice chat to any application — for groups of any size. You can really tell how close you are to someone in a High Fidelity space when they talk to you, as voices become fainter the farther away they are.
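That fading effect comes from distance attenuation. Here is a generic sketch of the idea (not High Fidelity’s actual algorithm, which also accounts for direction, reverb, and more):

    import math

    def gain(speaker_pos, listener_pos, reference_distance=1.0):
        distance = math.dist(speaker_pos, listener_pos)
        return min(1.0, reference_distance / max(distance, 1e-6))  # inverse-distance falloff

    print(gain((0, 0), (2, 0)))    # 0.5  -- twice the reference distance, half the volume
    print(gain((0, 0), (10, 0)))   # 0.1  -- ten times the distance, a tenth of the volume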

“We are super excited about this general direction and we wound up building the audio subsystem and extracting that first,” Rosedale said. “It works well where there is no possibility of face-to-face meetings.”

Above: I could hear Philip Rosedale’s voice clearly in this conversation in Breakroom.

Image Credit: Sine Wave

Spatial audio in a 3D virtual world helps turn spontaneous conversations into fun, productive exchanges, in a way that flatscreen video calls and webinars simply can’t match, Frisby said. It’s easy to tell in Breakroom who is speaking to you, and from what direction.

It took me a little while to figure out how to unmute my voice. Rosedale was jumping up and down while we were talking.

“It’s all remote rendered. And that means that we can bring people in on a variety of platforms,” Frisby said. “No matter what your target hardware is, you can actually get in here and still get good high fidelity. It’s a good quality 3D rendering experience here regardless of what device you’re on.”

I asked Rosedale if he could hear me chewing lettuce, which sounded very loud on my headsets. But he said no. It definitely helps if you have good headsets with 3D audio.

Breakroom is being used by organizations like Stanford University, the United Nations, and The Economist. Breakroom runs on any device with a Chrome browser, offering good 3D graphics and audio quality, with no installation required.

Frisby said that Breakroom is also a way for companies to let remote workers gather and meet each other in more relaxed environments, a kind of intermediate space between online-only tools and a full return to the office.

Above: Breakroom and High Fidelity are enabling conferences with spatial audio.

Image Credit: Sine Wave

Its full suite of communication tools includes voice chat, instant messenger, and in-world email. It has video conferencing, media sharing, and desktop sharing tools. It has a diverse range of fully customizable avatars and scenes. You can get around just by pointing and clicking on the environment.

It also has event management tools to facilitate conversation and agenda flow, branded interactive exhibition stands, and private meeting rooms, available for rent by sponsors. It has environments including dance clubs, beach and mountain retreats, casual games, quiz shows, and live music/comedy shows. It has an integrated shop where brands can upload and sell their content to customers for real cash.

It gives you the ability to seamlessly license and import any item from the Unity Asset Store (Sine Wave is a verified partner of Unity). The iOS and Android versions of Breakroom are in closed beta, and versions for consoles and the Oculus Quest 2 are coming soon. It has LinkedIn and Eventbrite integration, including ticket sales. It also has a self-serve portal for customers to quickly customize and configure their organization’s Breakroom, as well as sub-licensing agreements that enable Breakroom customers to host and monetize events and experiences for their own customer base.

Frisby said keeping people from getting kicked out of the room has been a technical challenge, but his team has managed to refine the technology during the pandemic. He thinks conferences are a great use case for the technology because so many people come together simultaneously and push the tech to its limits.

As for High Fidelity, Rosedale believes that the education market will come around, and the whole world will eventually move to better spatial experiences.

Moderne helps companies automate their code migration and fixes

https://www.youtube.com/watch?v=uR9EPALJKjI&feature=emb_title

While every company may well be a software company these days, the software development sphere has evolved greatly over the past decade to get to this stage, with developer operations (DevOps), agile, and cloud-native considerations at the forefront.

Moreover, with APIs and open source software now serving as critical components of most modern software stacks, tracking code changes and vulnerabilities introduced by external developers can be a major challenge. This is something fledgling startup Moderne is setting out to solve with a platform that promises to automatically “fix, upgrade, and secure code” in minutes, including offering support for framework or API migrations and applying CVE (common vulnerabilities and exposures) patches.

The Seattle-based company, which will remain in private beta for the foreseeable future, today announced a $4.7 million seed round of funding to bring its SaaS product to market. The investment was led by True Ventures, with participation from a slew of angel and VC backers, including GitHub CTO Jason Warner; Datadog cofounder and CEO Olivier Pomel; Coverity cofounder Andy Chou; Mango Capital; and Overtime.vc.

Version control

If a third-party API provider or open source framework is updated, with the older version no longer actively supported, companies need to ensure their software remains secure and compliant. “It requires revving dependencies [updating version numbers in configuration files] and changing all the call sites for the APIs that have changed — it’s tedious, repetitive, but hasn’t been automated,” Moderne CEO and cofounder Jonathan Schneider told VentureBeat.

Moderne is built on top of OpenRewrite, an open source automated code refactoring tool for Java that Schneider developed at Netflix several years ago. While developers can already use the built-in refactoring and semantic search features included in integrated development environments (IDEs), if they need to perform a migration or apply a CVE patch, they have to follow multiple manual steps. Moreover, they can only work on a single repository at a time.

“So if an organization has hundreds of microservices — which is not uncommon for even very small organizations, and larger ones have thousands — each repository needs to be loaded into [the] IDE and operated one by one,” Schneider said. “A developer can spend weeks or months doing this across the codebase.”

OpenRewrite, on the other hand, provides “building blocks” — individual search and refactoring operations — that can be composed into automated sequences, called recipes, that anyone can use. Moderne’s offering complements OpenRewrite and allows companies to apply these recipes in bulk across their codebases.

Above: Moderne screenshot

Enterprises, specifically, can accumulate vast amounts of code. One of Moderne’s early product design partners is a “large financial institution” that incorporates some 250 million lines of Java code — or “one-eighth of all GitHub Java code,” Schneider noted, adding that this is actually on the “low to medium” side for what a typical enterprise might have.

“Some of this code is obsolete (e.g. accrued through historical acquisitions), some is under rapid development (e.g. mobile apps) — but the majority represents super valuable business assets, such as ATM software and branch management software,” Schneider said.

And let’s say a company decides to redeploy developers internally to work on rapid development projects — it still needs to consider the core software components that underpin the business and need to be maintained. Moderne automates the code migration and CVE patching process, freeing developers to work on other mission-critical projects.

When Moderne eventually goes to market, it will adopt an open core business model, with a free plan for the open source community and individual users, while the premium SaaS plan will support larger codebases and teams with additional features for collaboration.

The company said it will use its fresh cash injection to grow a “vibrant open source community for OpenRewrite,” expand its internal engineering team, and bolster its SaaS product ahead of launch.
