Undoubtedly point-clouds can be beaten, and adding a single wide-FOV render is an efficient way to fill space “offscreen.” I’m just cautious about explaining this because it invites the most baffling rejections. At one point I tried explaining the separation of figuring out where stuff is, versus showing that location to you, using beads floating in a fluid simulation. Tracking the liquid and how things move within it is obviously full of computer-melting complexity. Rendering a dot isn’t. And this brain case acted like I’d described simulating the entire ocean for free. As if the goal was plucking all future positions out of thin air, and not, y’know, remembering where everything is, now.
The lowest-bullshit way is probably frustum slicing. Picture the camera surrounded by transparent spheres. Anything between two layers gets rendered onto the further one. This is more-or-less how “deep view video” works. (Worked?) Depth information can be used per-layer to create lumpen meshes or do parallax mapping. Whichever is cheaper at obscene framerates. Rendering with alpha is dirt cheap because it’s all sorted.
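To make the "anything between two layers gets rendered onto the further one" rule concrete, here is a rough sketch of just the binning step. It is not how deep view video is actually implemented; the names `shell_index` and `slice_scene`, the shell radii, and the use of bare points instead of real geometry are all assumptions for illustration.

```python
import bisect
import math

def shell_index(point, radii):
    """Return the index of the nearest shell at or beyond `point`.

    `point` is an (x, y, z) offset from the camera and `radii` is an
    ascending list of shell radii (hypothetical values, not from any
    real system). Points past the last shell are clamped onto it.
    """
    distance = math.sqrt(sum(c * c for c in point))
    # bisect_left finds the first radius >= distance, which is the
    # further of the two shells bracketing the point.
    i = bisect.bisect_left(radii, distance)
    return min(i, len(radii) - 1)

def slice_scene(points, radii):
    """Group points into one bucket per shell, nearest shell first."""
    layers = [[] for _ in radii]
    for p in points:
        layers[shell_index(p, radii)].append(p)
    return layers

radii = [1.0, 2.0, 4.0, 8.0]
points = [(0.5, 0.0, 0.0), (0.0, 3.0, 0.0), (0.0, 0.0, 100.0)]
layers = slice_scene(points, radii)
```

Because the buckets come out in distance order, nothing needs sorting at draw time, which is where the cheap alpha comes from.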
Point clouds (or even straight-up original geometry) might be better at nose-length distances. Separating moving parts is almost mandatory for anything attached to your hands. Using a wide-angle point render instead of doing a cube map is one of several hacks available since Fisheye Quake, and a great approach if you expect to replace things before the user can turn around.
But I do have to push back on active fake focus. Lightfields are better. Especially if we’re distilling the scene to be renderable in a hot millisecond, there’s no reason to motorize the optics and try guessing where your pupils are headed. Passive systems can provide genuine focal depth.
That last paper is from ten years ago.
My suggestions are mostly about maintaining quality while limiting bandwidth requirements to the headset. Wouldn’t a lightfield require a fair bit of bandwidth to keep updated?
(Another idea is to annotate moving objects with predicted trajectories.)
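A minimal sketch of what that annotation could buy, assuming the simplest possible trajectory: each update carries a position and a constant velocity, and the headset extrapolates between updates. The function name `predict_position` and the linear model are assumptions; a real system might send curves or rotations as well.

```python
def predict_position(position, velocity, elapsed):
    """Linearly extrapolate from the last transmitted state.

    `position` and `velocity` are (x, y, z) tuples from the most recent
    update; `elapsed` is the time in seconds since that update arrived.
    """
    return tuple(p + v * elapsed for p, v in zip(position, velocity))

# Hypothetical update for one moving object.
last_update = {"position": (1.0, 0.0, -2.0), "velocity": (2.0, 0.0, -4.0)}
guess = predict_position(last_update["position"], last_update["velocity"], 0.25)
```

The headset only needs a fresh update when the guess drifts too far from the truth, so the cost scales with how unpredictable the motion is rather than with frame rate.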
Less than you might think, considering the small range of perspectives involved. Rendering to a stack of layers or a grid of offsets technically counts. It is more information than simply transmitting a flat frame… but update rate isn’t do-or-die, if the headset itself handles perspective.
Optimizing for bandwidth would probably look more like depth-peeled layers with very approximate depth values. Maybe rendering objects independently to lumpy reliefs. The illusion only has to work for a fraction of a second, from about where you’re standing.
How does it handle stuff like fog effects, by the way? Can it be made to work (efficiently) with reflections?
The “deep view” link has video - and interactive online demos.
Alpha-blending is easy because, again, it is a set of sorted layers. The only real geometry is some crinkly concentric spheres. I wouldn’t necessarily hand-wave Silent Hill 2 levels of subtlety, with one static moment, but even uniform fog would be sliced-up along with everything else.
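For a sense of how little work that is per pixel, here is a sketch of the usual back-to-front "over" blend, assuming each layer contributes one straight-alpha RGB sample and the samples already arrive ordered far to near. The function name `composite_back_to_front` is made up for the example, and fog is treated as nothing more than a layer sample with partial alpha.

```python
def composite_back_to_front(samples):
    """Blend (rgb, alpha) samples for one pixel, ordered far to near.

    Colors are straight (non-premultiplied) RGB tuples in [0, 1].
    """
    result = (0.0, 0.0, 0.0)
    for rgb, alpha in samples:
        # "Over" blend: the nearer sample covers whatever is behind it
        # in proportion to its alpha.
        result = tuple(alpha * c + (1.0 - alpha) * r
                       for c, r in zip(rgb, result))
    return result

# An opaque red surface on a far shell, seen through a half-transparent
# white fog sample on a nearer shell.
pixel = composite_back_to_front([
    ((1.0, 0.0, 0.0), 1.0),
    ((1.0, 1.0, 1.0), 0.5),
])
```

There is no depth test and no sort in that loop, because the layer order already is the depth order.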
Reflections are handled as cutouts with stuff behind them. That part is a natural consequence of their focus on lightfield photography, but it could be faked somewhat directly by rendering. Or you could transmit environment maps and blend between those. Just remember the idea is to be orders of magnitude more efficient than rendering everything normally.
Admittedly you can kinda see the gaps if you go looking.