Invented by Ding Ding, Roman Chernyak, and Shan Liu; Tencent America LLC

Video is everywhere. We watch it, share it, use it to teach, and even ask computers to analyze it for us. But what happens when video is not just for our eyes, but for smart machines to understand and use? A new patent application lays out a way to make video work better for computers. Let’s break down exactly what this patent covers, why it matters, and how it works, all in the simplest terms.

Background and Market Context

The world is changing fast, and so is the way we use video. Once, video was mainly for people to watch. Now, it’s just as likely to be used by machines. Think of self-driving cars, security systems, video search engines, or smart robots. These machines need to “see” what’s happening in videos, not just show it to humans. Machine vision is a big deal, and the better it gets, the more things computers can do for us.

But there’s a problem. Video files are huge, and sending them across the internet or storing them takes up lots of space. To help with this, we use video compression, which makes files smaller by removing redundant information. Standards like HEVC, VVC, AVC, and AV1 do a great job of shrinking video so people can watch it using less bandwidth and less storage.

Here’s the catch: these standards are made for human eyes. When a computer tries to analyze a compressed video, it sometimes loses tiny details that matter a lot to a machine, even if people don’t notice. That’s because we care about smooth motion and nice colors, but a computer might care about hard edges, tiny shapes, or patterns that help it do its job.

Industries like smart cameras, robotics, autonomous vehicles, and video analytics need to give their machines the best possible video data. But they still need it to be small and fast to send, just like regular video. There’s a gap between what works for people and what works for machines. The market wants a way to send video that works for both.

This need is growing. Smart cities, AI security, remote monitoring, and medical imaging are just a few fields racing to close the gap. Companies that make video chips, software, and cloud services all want to serve both people and machines. If they can use one video stream for both, they save time and money, and everything runs more smoothly.

This is where the new patent application fits in. It promises a way to send video that’s good for machines, but also fits the way we already send and store video for people. It keeps everything fast and small, but adds the extra help machines need to do their jobs.

Scientific Rationale and Prior Art

To understand the heart of this invention, we need to know a bit about how video compression works and what’s been done before.

When you record a video, you get a giant pile of pictures, called frames. Each picture is made of lots of tiny dots called pixels. Most of these pixels are similar from one frame to the next, or even from one spot in a frame to another. Video compression works by finding these similarities and only sending the differences. That’s how standards like HEVC or AV1 save space.

There are two main tricks: intra prediction and inter prediction. Intra prediction looks for patterns inside one frame, using what’s nearby to guess what a pixel should be. Inter prediction looks at earlier frames to see if the same thing is in the same spot (like a car moving), and only sends what’s changed. These tricks work great for making videos small and still look good to people.
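
To see why sending only the differences saves so much space, here is a tiny Python sketch of the inter-prediction idea. It is not how any real codec implements prediction; the frame size and the "moving object" are made up purely to show that most of the residual ends up being zero.

```python
import numpy as np

# A minimal sketch of inter prediction: predict the next frame from the
# previous one and store only the residual (the differences).
prev_frame = np.random.randint(0, 256, size=(4, 4), dtype=np.int16)
next_frame = prev_frame.copy()
next_frame[1, 1] += 10          # pretend one small region changed (a "moving car")

residual = next_frame - prev_frame   # what an encoder would actually transmit

# The decoder rebuilds the frame from the previous frame plus the residual.
reconstructed = prev_frame + residual
assert np.array_equal(reconstructed, next_frame)

# Most residual values are zero, which is why sending differences compresses so well.
print("non-zero residual samples:", np.count_nonzero(residual), "of", residual.size)
```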

But for machines, small changes can matter a lot. If a security camera loses a bit of sharpness, a person won’t care. But a machine might miss a face or a license plate. Traditional video standards just don’t care about this, and that’s a problem for machine tasks.

Some people have tried to fix this by adding “side data” to video streams. This could be things like extra information about depth, objects, or even raw data. Sometimes, this is sent as a separate file, or as “supplemental enhancement information” (SEI) attached to the video. The problem is, most video players and networks ignore this extra data because it’s not part of the official standard, or it’s not easy to keep it in sync with the main video.

Another approach has been to tweak the video compression itself by turning off some features that hurt machine analysis. But this often means making one special version of the video just for machines and another for people, which uses more storage, more bandwidth, and adds complexity.

A few standards groups have started looking at “video coding for machines” (VCM). The idea is to make a video stream that works for both people and computers by building in a way to send extra information machines need. But there’s no simple, standard way to do this yet.

What this patent does is clever: it wraps up regular compressed video with an extra layer of “restoration data.” This data helps machines rebuild the tiny details lost during normal compression. It does this inside the same file or stream, in a way that’s easy to send, store, and decode. The new method uses the same blocks and pieces (called NAL units) that video encoders and decoders already know, so it fits right in.

By using a special way to mix (or “interleave”) regular video blocks with machine-helpful blocks, the system lets computers pull out what they need, while people (and regular video players) just use the regular parts. This makes the system backward compatible and easy to use, unlike older methods.

So, the invention is not making a whole new video format. It’s making a smart way to add a “helper channel” for machines inside the existing video stream, using the same wrapping and signals that current software and hardware already expect.

Invention Description and Key Innovations

Now, let’s get to the heart of the invention. What exactly does this patent cover, and how does it change the game?

The patent describes a method for both video encoding (making the video file or stream) and video decoding (reading or playing it). The core idea is simple: whenever you send a video, you also send extra “restoration data” right alongside the regular compressed video. Both types of information are packed together, but clearly marked, so the decoder knows which is which.

Here’s how it works, step by step:

First, the video is compressed the normal way, using any of the standard formats: HEVC, VVC, AV1, and so on. This makes the main “first portion” of the data, which is just like any other video file.

Next, the system figures out what extra information is needed for machines to do their jobs well. This could be things like how to sharpen edges, restore lost detail, fix color shifts, or put back any other important information that regular compression removed. This becomes the “second portion,” called restoration data.
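
The application doesn’t pin down a single recipe for building restoration data, but one simple way to picture it is as the part of the reconstruction error a machine would actually care about. The sketch below is purely illustrative: the threshold, the helper names, and the "keep only the large errors" rule are assumptions for this example, not the patent’s method.

```python
import numpy as np

def make_restoration_data(original: np.ndarray, decoded: np.ndarray,
                          threshold: int = 4) -> np.ndarray:
    """Illustrative only: keep the reconstruction error where it is large,
    on the theory that those are the details a machine analyzer would miss."""
    error = original.astype(np.int16) - decoded.astype(np.int16)
    return np.where(np.abs(error) >= threshold, error, 0)

def apply_restoration(decoded: np.ndarray, restoration: np.ndarray) -> np.ndarray:
    """Rebuild a machine-oriented picture from the human-oriented decode."""
    return np.clip(decoded.astype(np.int16) + restoration, 0, 255).astype(np.uint8)

# Toy data standing in for a real picture and its lossy reconstruction.
original = np.random.randint(0, 256, size=(8, 8), dtype=np.uint8)
decoded = (original // 8) * 8          # crude stand-in for compression loss
rsd = make_restoration_data(original, decoded)
machine_picture = apply_restoration(decoded, rsd)
```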

Both the regular video and the restoration data are wrapped up in blocks called NAL units. Each NAL unit is marked with its type. Some are coded video data (CVD), some are restoration data (RSD), and some are parameter sets (VPS) that describe the rules.

What’s smart here is that these different NAL units are “interleaved”—mixed together in a set order. This keeps the video stream easy to send and store, and makes it easy for the decoder to pull out whichever type it needs.
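
Putting those two steps together, here is a rough structural sketch. It models the stream as a list of type-tagged units and interleaves a parameter set, coded video, and restoration data in a fixed order. The type labels (VPS, CVD, RSD) follow the names used above; the real byte-level syntax in the application will look different, so treat this as a shape, not a spec.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class NalUnit:
    unit_type: str   # "VPS" (parameter set), "CVD" (coded video), "RSD" (restoration data)
    payload: bytes

def build_interleaved_stream(video_units: List[bytes],
                             restoration_units: List[bytes],
                             parameter_set: bytes) -> List[NalUnit]:
    """Illustrative ordering: parameter set first, then each picture's coded
    video data immediately followed by that picture's restoration data."""
    stream = [NalUnit("VPS", parameter_set)]
    for cvd, rsd in zip(video_units, restoration_units):
        stream.append(NalUnit("CVD", cvd))
        stream.append(NalUnit("RSD", rsd))
    return stream
```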

On the receiving end, the decoder reads the stream. It separates out the regular video data and decodes it just like any other video, making pictures that people can watch. At the same time, it looks for the restoration data. If a machine needs to do something special (like analyze the video), it uses the restoration data to rebuild the tiny details and make a second set of pictures that are better for machine tasks.
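
On the receiving side, the demultiplexing step could look something like the hedged sketch below: walk the stream, send coded video to the normal decoder, and only touch the restoration data when a machine task asks for it. The decode_video and apply_restoration functions here are stand-ins, not real library calls.

```python
from typing import Iterable, List, Tuple

Unit = Tuple[str, bytes]   # (unit type, payload), matching the packaging sketch above

def decode_video(payload: bytes) -> bytes:
    # Stand-in for a real HEVC/VVC/AV1 decode step.
    return payload

def apply_restoration(picture: bytes, rsd: bytes) -> bytes:
    # Stand-in for using restoration data to rebuild machine-relevant detail.
    return picture + rsd

def split_stream(stream: Iterable[Unit]) -> Tuple[List[bytes], List[bytes]]:
    """Route coded video to the normal decoder and set restoration data aside.
    A regular player would simply skip anything that is not 'CVD'."""
    video, restoration = [], []
    for unit_type, payload in stream:
        if unit_type == "CVD":
            video.append(payload)
        elif unit_type == "RSD":
            restoration.append(payload)
        # "VPS" units would be parsed for decoding rules in a real system.
    return video, restoration

def decode(stream: Iterable[Unit], machine_task: bool) -> List[bytes]:
    video_units, restoration_units = split_stream(stream)
    pictures = [decode_video(u) for u in video_units]    # human-viewable output
    if not machine_task:
        return pictures
    return [apply_restoration(p, r)                      # machine-oriented output
            for p, r in zip(pictures, restoration_units)]
```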

What kind of restoration data can be sent? The patent lists several types:

– Spatial resampling: This helps fix details that got lost when the video was shrunk or resized.
– Retargeting: This helps if the video needs to be stretched or shifted for a machine’s needs.
– Temporal restoration: This helps fix small errors that build up over many frames.
– Bit depth shift: This restores the extra color precision (bits per sample) that was thrown away to save space; a small sketch of this idea follows below.
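
The bit depth shift item is the easiest one to make concrete. If an encoder drops 10-bit video down to 8 bits to save space, a restoration step can shift the samples back up and add the low-order detail back in. This is a simplified illustration of the general idea, not the exact signaling the application defines.

```python
import numpy as np

# Pretend the source was 10-bit video (values 0..1023).
source_10bit = np.random.randint(0, 1024, size=(4, 4), dtype=np.uint16)

# The encoder saves space by keeping only the top 8 bits.
coded_8bit = (source_10bit >> 2).astype(np.uint8)
low_bits = (source_10bit & 0b11).astype(np.uint8)   # what restoration data could carry

# A human viewer watches the 8-bit pictures; a machine task shifts the samples
# back up to 10 bits and restores the discarded low-order bits.
restored_10bit = (coded_8bit.astype(np.uint16) << 2) | low_bits
assert np.array_equal(restored_10bit, source_10bit)
```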

The restoration data can be sent for the whole video (sequence level), for just one picture (picture level), or even for the end of the stream. The system can use the same signaling as regular video (like SEI messages or parameter sets), so it’s easy to add to current workflows.

Let’s make this concrete. Imagine a smart camera in a parking lot. It records video and sends it to a server. The video is compressed using the usual standard, so people can watch it. But it also sends restoration data so the server can run license plate recognition, even if the video is blurry or low-res. The server grabs the restoration data and rebuilds the sharp edges needed for the computer to read the plates, without needing a whole second video stream.

Another key point is that everything is backward compatible. Regular video players just see the usual video and ignore the restoration data. Only systems set up for machine analysis need to use the extra bits.

The patent also explains how to organize the data in the file or stream. There can be containers for the whole video sequence, for each picture, and for the end of the stream. Each NAL unit can have its own header, so it’s easy to parse and process. The scheme can work across all the main video standards, which makes it very flexible.

This setup makes the system easy to add to existing products. You don’t need to change the whole video pipeline—just add the new NAL units, and update the encoder and decoder to handle them. That makes it attractive for chip makers, camera makers, and cloud services.

In summary, the core innovations are:

– Mixing regular video and machine-friendly restoration data in one stream, using clear, standard blocks.
– Allowing both human and machine uses from a single video stream, saving space and time.
– Making everything backward compatible, so old systems still work.
– Using standard signaling methods, so the new system fits right in with current video workflows.
– Supporting many types of restoration data, so machines can get exactly what they need for their tasks.

This is a big step forward for both the video and AI industries. It bridges the gap between human and machine video needs in a simple, practical way.

Conclusion

This patent application is a smart answer to a real need. As machines watch more and more video, we need to give them what they need, without breaking what already works for people. By mixing regular compressed video with special restoration data, all inside the same stream, the new method makes video better for both humans and machines. It’s easy to add to existing systems, works with all the big video standards, and opens the door to smarter, faster, and more useful video analysis in every field.

For companies and inventors, this means new chances to build products that work for everyone. For engineers, it means less complexity and more flexibility. For the world, it means smarter cities, safer roads, better security, and more powerful AI—all running on the video systems we already have.

If you work in video, AI, or smart devices, this is a technology to watch. It’s simple, but it changes everything.

To read the full application, visit https://ppubs.uspto.gov/pubwebapp/ and search for publication number 20250365445.