I think we will need to look at streaming scenes from a more powerful machine rather than creating them on board. I don't believe we are ready in any way to process that much information on such a small system at this time (the HoloLens also gives off quite a lot of heat from its on-board processor). The old Pi had trouble streaming higher bitrates, and since we still have the same Ethernet, I'm not sure how much is possible.
I'm waiting for the day when you can get a Raspberry Pi-style machine that will do full 1080p and transcode anything I throw at it in real time or faster. It can't be that far off, can it? A couple of years, maybe?
The Pi's biggest strength is its GPU. Using some of the OpenMAX APIs for video or 3D encoding would probably be a better use than a CPU-bound task like web serving.
The Pi is an interesting system in that it has a truly impressive GPU (according to the manufacturer, "capable of BluRay quality playback"), especially compared to its bottom-shelf CPU. That means you need everything encoded in H.264 or some other format the GPU can handle, but that's achievable.
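As a sketch of what "let the GPU do it" looks like in practice, here's one way to hand the encode off to the Pi's hardware block from Python via ffmpeg (`h264_v4l2m2m` is ffmpeg's name for the Pi's V4L2 mem2mem encoder; the helper function and filenames are illustrative, not from the thread):

```python
import shlex

def hw_encode_cmd(src, dst, bitrate="4M"):
    """Build an ffmpeg command that uses the Pi's hardware H.264
    encoder (exposed to ffmpeg as h264_v4l2m2m via V4L2 mem2mem),
    keeping the slow CPU out of the pixel-crunching path."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "h264_v4l2m2m",   # hardware encoder block
        "-b:v", bitrate,
        "-c:a", "copy",           # leave audio untouched
        dst,
    ]

# Print the shell-quoted command you'd run on the Pi:
print(shlex.join(hw_encode_cmd("in.mov", "out.mp4")))
```

The same idea works with GStreamer's `v4l2h264enc` element; the point is just that the encoder selection, not the container handling, is what decides whether the bottom-shelf CPU is in the loop.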
If bandwidth isn't much of an issue, you can just dump frames to a separate device for encoding. As an aside, I can't imagine someone using a Pi 5 just for camera usage (aside from projects needing cameras).
"In future we’ll have to do something, but for Pi 5 we feel the hardware encode is a mm^2 too far."
Sounds reasonable, given a fast CPU and less-than-optimal hw-accelerated encoding options. As for that "something", maybe:
1) Drop hw-accelerated encoding and decoding entirely, and use the freed-up silicon for much beefier CPUs (ones with bigger vector units, more cores, etc. Cortex-X?). That would be useful for any CPU-heavy application.
2) Include a hw encoder for a common, relatively 'heavy' codec, and a hw decoder for the same plus maybe others.
3) Only include decoder(s), as they seem to have done for the RPi 5.
4) Include some kind of flexible compute fabric that can be configured to do the heavy lifting for popular video codecs.
Combined with:
5) Move to a newer silicon node for higher efficiency and a larger transistor budget.
Whichever route a future RPi goes, imho hw-accelerated decoding is much more useful than encoding.
The only catch with the Pi might be decode/encode performance if you plan to do anything with the video on that side of things: you'd need video software that can take advantage of hardware acceleration on the Pi.
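A minimal sketch of what "taking advantage of hardware acceleration" can mean in practice, assuming ffmpeg is the video software in question (`h264_v4l2m2m` is ffmpeg's name for the Pi's V4L2 mem2mem H.264 decoder; the helper itself is hypothetical):

```python
def pick_h264_decoder(available):
    """Prefer the Pi's hardware H.264 decoder (exposed by ffmpeg
    as h264_v4l2m2m) and fall back to software decode if the
    build doesn't include it."""
    for name in ("h264_v4l2m2m", "h264"):
        if name in available:
            return name
    raise RuntimeError("no H.264 decoder available")

# `available` could come from parsing `ffmpeg -decoders` output:
print(pick_h264_decoder({"h264_v4l2m2m", "h264", "hevc"}))  # h264_v4l2m2m
print(pick_h264_decoder({"h264"}))                          # h264
```

Software that doesn't do this kind of probing silently falls back to CPU decode, which is exactly the trap being described.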
Personally I'd rather folks shoot high, aim for great, and if someone has special needs, they can transcode out their suboptimal version as they like.
Software rendering at 1080p should work just fine for everyone. Even if a Pi can't decode a high-bitrate AV1 stream in real time, it can probably re-encode a movie overnight, maybe in a day. That's a chore, yes, but in return we get to transfer the best quality available and get much better results.
I don't think there's any reason to wait. The good stuff should start getting seeded. We need media servers to help fill in the gap and do overnight, non-real-time transcoding.
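The overnight re-encode claim is easy to sanity-check with arithmetic (the 3 fps software-AV1 encode rate below is an assumed figure for a small ARM board, not a benchmark):

```python
def transcode_hours(duration_min, source_fps, encode_fps):
    """Rough wall-clock estimate for a non-real-time re-encode:
    total frames divided by the achievable encode rate."""
    frames = duration_min * 60 * source_fps
    return frames / encode_fps / 3600

# A 2-hour, 24 fps movie encoded at an assumed 3 fps software
# AV1 rate finishes well within a day:
print(round(transcode_hours(120, 24, 3), 1))  # 16.0 hours
```

So "overnight, maybe in a day" holds as long as the box manages even a few frames per second, which is the whole argument for non-real-time media servers.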
> You need way more processing power than an RPi to do this at 30fps, and C/C++, not Python. (There are literally dozens of projects for the RPi and TFlow online but they all get like 0.1 fps or less by using Flask and browser reload of a PNG... great for POC but not for real video)
I think 8 streams at 15 fps (aka 120 fps total) is possible with a ($35) Raspberry Pi 4 + ($75) Coral USB Accelerator. I say "I think" because I haven't tested on this exact setup yet. My Macbook Pro and Intel NUC are a lot more pleasant to experiment on (much faster compilation times). A few notes:
* I'm currently just using the coral.ai prebuilt 300x300 MobileNet SSD v2 models. I haven't done much testing but can see it has notable false negatives and positives. It'd be wonderful to put together some shared training data [1] to use for transfer learning. I think then results could be much better. Anyone interested in starting something? I'd be happy to contribute!
* iirc, I got the Coral USB Accelerator to do about 180 fps with this model. [edit: but don't trust my memory; it could have been as low as 100 fps.] It's easy enough to run detection at a lower frame rate than the input as well: do the H.264 decoding on every frame but only run inference at fixed PTS (presentation timestamp) intervals.
* You can also attach multiple Coral USB Accelerators to one system and make use of all of them.
* Decoding the 8 streams is likely possible on the Pi 4 depending on your resolution. I haven't messed with this yet, but I think it might even be possible in software, and the Pi has hardware H.264 decoding that I haven't tried to use yet.
* I use my cameras' 704x480 "sub" streams for motion detection and downsample that full image to the model's expected 300x300 input. Apparently some people do things like multiple inference against tiles of the image or running a second round of inference against a zoomed-in object detection region to improve confidence. That obviously increases the demand on both the CPU and TPU.
* The Orange Pi AI Stick Lite is crazy cheap ($20) and supposedly comparable to the Coral USB Accelerator in speed. At that price, if it works, buying one per camera doesn't sound too crazy. But I'm not sure if driver/toolchain support is any good. I have a PLAI Plug (basically the same thing but sold by the manufacturer). The PyTorch-based image classification on a prebuilt model works fine, but I don't have the software to build models or do object detection, so it's basically useless right now. They want to charge an unknown price for the missing software, but I think Orange Pi's rebrand might include it with the device?
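The decode-every-frame / infer-at-fixed-PTS-intervals idea above can be sketched as pure scheduling logic (the function and the numbers are illustrative, not from any real pipeline):

```python
def frames_to_infer(pts_list, interval):
    """Decode every frame, but only run the (expensive) detector
    when a frame's presentation timestamp reaches the next
    inference deadline. Returns the PTS values that get inference."""
    chosen, next_due = [], 0.0
    for pts in pts_list:
        if pts >= next_due:
            chosen.append(pts)
            next_due = pts + interval
    return chosen

# 30 fps input, one inference every 0.5 s -> ~2 fps of detector
# work per stream, so 8 streams fit easily within the TPU budget.
pts = [i / 30 for i in range(90)]  # 3 seconds of frames
print(len(frames_to_infer(pts, 0.5)))
```

This is also why a single accelerator rated around 100-180 fps can plausibly serve 8 cameras: the detector only sees a few frames per second per stream, while the hardware decoder handles the rest.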
I'm not at all familiar with video encoding/decoding processes or what they entail. Can you explain how hardware support for it would drastically change power requirements?
Video engineer here. Many seemingly network-restricted tasks could be unlocked by faster CPUs doing advanced compression and decompression.
1. Video Calls
In video calls, encoding and decoding are actually a significant cost, not just networking. Right now the peak is Zoom's 30 onscreen video streams, but with 1000x CPUs you could have hundreds of high-quality streams with advanced face detection and superscaling[1]. Advanced computer-vision models could analyze each face, creating a face mesh of vectors, then send those vector changes across the wire instead of a video frame. The receiving computers could then reconstruct the face for each frame. This could turn video calling into an entirely CPU-restricted task.
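To put rough numbers on the vectors-instead-of-frames idea (a back-of-the-envelope sketch: the 468-landmark count is borrowed from MediaPipe-style face meshes, and the 0.1 bits/pixel figure is an assumed typical H.264 ratio, not a measurement):

```python
def mesh_bytes(landmarks=468, coords=3, bytes_per=4):
    """Payload for one frame of face-mesh vectors:
    landmarks x (x, y, z) x float32."""
    return landmarks * coords * bytes_per

def frame_bytes(width=1280, height=720, bits_per_pixel=0.1):
    """Very rough H.264 frame size at an assumed compression
    ratio (~0.1 bits per pixel for a talking-head stream)."""
    return int(width * height * bits_per_pixel / 8)

print(mesh_bytes())    # 5616 bytes of mesh data per frame
print(frame_bytes())   # 11520 bytes per encoded 720p frame
```

Even with these generous assumptions the mesh is only about half the size of a compressed frame, and unlike a frame it compresses further as deltas between timesteps; the real saving is that all the reconstruction work moves onto the receiving CPU.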
2. Incredibly Realistic and Vast Virtual Worlds
Imagine the most advanced movie-realistic CGI being generated for each frame. Something like the new Lion King, or Avatar-like worlds, being created before you through your VR headset. With extremely advanced eye tracking and graphics, VR would hit that next level of realism. AR and VR use cases could explode with incredibly light headsets.
To be imaginative, you could have everything from huge concerts to regular meetings take place in the real world, but be scanned and sent to VR participants in real time. The entire space, including the room and whiteboard or live audience, could be rendered in real time for all VR participants.
Currently they're bandwidth bound, but I'm working on dissolving that. In the next year I imagine it'll become more memory bound. Then that will turn into being I/O bound, but that can be solved with memory and more clever caching schemes. CPU should never be an issue unless I move the video encoding back in house, but I don't see that happening soon, and I would separate that into its own server farm anyway.
I think this overstates it a bit. I know what hardware video encoding is, but I still think I'm part of the target audience for the Pi.
I have several Pis around the house that I use for various projects, and none of them have ever involved video encoding. The one thing I can think of where it might be beneficial is my Plex server, but that's an x86 machine right now, and it'll probably stay that way.
There is a video of NVIDIA's video-pipeline framework that might give a hint of what is possible. The Nano has H.264 encoders and decoders that can feed eight simultaneous 720p streams.
I presume this is over Ethernet or from flash, but it might not be applicable to CSI, since the CSI port/lane count seems lowish. Never tried it, though.