Hi there everyone 🙂
I figured I would post this small progress report now, even if I don’t have much more to show yet. The past few weeks have not been very conducive to intensive coding on my part, but I still managed to pull a few things through.
Improved movement handling
As Keriel may have noticed when I made him play the bare-bones NeREIDS networking test from last time (he was kind enough not to comment on it), there was a significant stutter in the avatar movement back then: the rendering engine displayed the avatar position, body and head orientation exactly as computed during the most recent state update. Since state updates run at a much lower frequency (every 32 ms) than the current display framerate, this was quite noticeable and uncomfortable to the eye. I’ve now put some more work into the engine and implemented the typical solution to this problem: the rendering part uses a linear interpolation between the last two known states of a given entity, weighted by the ratio of the time elapsed since the last tick to the fixed time between game ticks. The result is a very smooth experience, even though it adds a little latency to the input (it is purely visual latency, on a scale small enough to be barely perceptible).
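The interpolation step can be sketched in a few lines of C++; the type and function names here are purely illustrative, not actual NeREIDS code:

```cpp
#include <cassert>

// Hypothetical entity state for illustration; orientation would be
// interpolated similarly (spherically, for rotations).
struct EntityState {
    float x, y, z;  // position
};

// Blend the two most recent simulation states for display.
// 'alpha' is (time since last tick) / (fixed tick duration), in [0, 1].
EntityState interpolate(const EntityState& prev, const EntityState& curr, float alpha)
{
    EntityState out;
    out.x = prev.x + (curr.x - prev.x) * alpha;
    out.y = prev.y + (curr.y - prev.y) * alpha;
    out.z = prev.z + (curr.z - prev.z) * alpha;
    return out;
}
```

Note that since the renderer blends between the last two *known* states, the displayed position always trails the simulation by up to one tick; that is the small, purely visual latency mentioned above.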
Research on rendering
I did some work on the rendering engine. Even though I’ve not yet improved the desperately flat and blocky look of past NeREIDS network tests, I’ve begun to switch from a forward rendering approach towards a deferred rendering approach.
“Forward rendering” refers to submitting triangles to your graphics card in a very straightforward style, asking for them to be drawn directly onto the backbuffer (the backbuffer being the resource that holds a color for every pixel, to be displayed on your screen once all the rendering for a given frame is done). This has been the canonical way for games to render things for as long as graphics cards have been in use.
“Deferred rendering”, on the other hand, takes another approach: instead of computing, for each pixel covered by a given triangle, a color based on a texture and light contributions, and drawing that pixel directly to the backbuffer, you render your scene to a temporary buffer, asking your shaders to output geometry values instead of colors. Those values typically are: the pixel “depth”, from which one can retrieve where the surface it represents lies relative to the viewpoint; the pixel “normal”, i.e. the orientation of said surface; the material it is made of… things like that. What you get is called a “geometry buffer”, and only once that geometry buffer has been fully drawn do you use its data to build the final image (this time on the backbuffer, to be displayed on screen). Hence the name “deferred”.
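As an illustration, here is what one texel of such a geometry buffer might hold, assuming a simple byte-packed layout (real engines pack this data across a handful of render targets, and NeREIDS’ actual layout may well differ). Since normal components live in [-1, 1], storing them in byte channels means remapping them to [0, 255] on write and back on read:

```cpp
// Hypothetical G-buffer texel, for illustration only.
struct GBufferTexel {
    float depth;               // used to reconstruct position relative to the viewpoint
    unsigned char nx, ny, nz;  // packed surface normal (the "orientation" above)
    unsigned char materialId;  // index into a table of material properties
};

// Remap a [-1, 1] normal component to a [0, 255] byte, and back.
unsigned char packUnit(float v)
{
    return (unsigned char)((v * 0.5f + 0.5f) * 255.0f + 0.5f);
}

float unpackUnit(unsigned char b)
{
    return (b / 255.0f) * 2.0f - 1.0f;
}
```

The round trip loses a little precision (the byte only has 256 steps), which is one of the small trade-offs such packed layouts accept in exchange for memory bandwidth.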
Each technique has its own set of strengths and drawbacks. Forward rendering is a very direct way to code, is easy to set up, and gives you a whole lot of graphic tricks for free (hardware edge anti-aliasing, transparency effects, and so on). Deferred rendering, on the other hand, requires more memory and more preparatory computations, does not provide the same ease of use for transparent materials and anti-aliasing, and is just harder to get right (at least for me, since some math is required to reinterpret the data contained in the geometry buffer before doing anything cool with it).
So, why would one bother to use a more cumbersome and seemingly less useful technique in the first place?
Well, deferred rendering has its advantages when it comes to lighting. When computing dynamic light effects with a forward renderer, you need to choose between two approaches. The first is to define a different shader, with its own lighting code, for each combination of lights an object may receive (whether it is lit by sunlight, or a torch, or two torches, or moonlight and a torch, and so on…), which is not a very scalable way of coding. The second is a multiple-pass approach: one pass per light, “adding up” the amount of light each source contributes to an object. But then the scene geometry must be redrawn on each pass, quite inefficiently going down the entire rendering pipeline each time: sending the same triangle data over and over, transforming it in the vertex shader, possibly sampling multiple textures in the pixel shader each time. That does not scale well either once the number of dynamic lights per object gets high.
Dynamic lighting using deferred rendering also uses multiple passes, but it benefits from the fact that the scene geometry has already been processed and is available in the geometry buffer, so you don’t pay the cost of rasterizing a given piece of geometry more than once. Moreover, you only perform your lighting computations on the final geometry buffer, that is, on the pixels that will actually be visible. This can also reduce costs compared to forward rendering, which would pay for lighting computations on pieces of geometry that may later be hidden behind another solid object standing closer to the viewpoint.
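A toy CPU-side sketch may make the saving concrete. Assuming simple directional Lambert lights (an assumption for illustration, not how NeREIDS will necessarily light things), each light “pass” below only does per-pixel math against the geometry buffer; no triangle is ever re-sent or re-transformed:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One G-buffer texel: just a world-space surface normal for this toy example.
struct Texel { float nx, ny, nz; };

// A directional light: unit direction towards the light, and an intensity.
struct Light { float dx, dy, dz; float intensity; };

// Accumulate Lambert lighting over the whole geometry buffer,
// one additive "pass" per light.
std::vector<float> shade(const std::vector<Texel>& gbuffer,
                         const std::vector<Light>& lights)
{
    std::vector<float> lit(gbuffer.size(), 0.0f);
    for (const Light& L : lights)                      // one pass per light...
        for (std::size_t i = 0; i < gbuffer.size(); ++i) {
            const Texel& t = gbuffer[i];
            float ndotl = t.nx * L.dx + t.ny * L.dy + t.nz * L.dz;
            lit[i] += std::max(0.0f, ndotl) * L.intensity;  // additive blending
        }
    return lit;                                        // ...no geometry redrawn
}
```

On a GPU, each of those passes would typically be a full-screen quad (or a light volume) reading the geometry buffer, with additive blending into the light accumulation target.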
I must admit, lighting scares me quite a bit. It scares me because it is a complex matter in 3D rendering, and relies on a lot of tricks that themselves scare me. You can read more of my thoughts on the matter in that other post I’ve written today. Maybe I should try to get lighting right within a forward rendering architecture before attempting a more complex deferred one? Well, as a matter of fact, I believe the deferred approach is actually cleaner, and will help decouple the topic of lighting from the other parts of the engine. And that’s about the only reason why I chose that path: deferred rendering really shines, performance-wise, when the number of dynamic lights affecting a given object gets quite high (a case often encountered in recent game titles), but I’m not sure NeREIDS would use that many lights in the first place. It’s just that I find the whole concept more elegant. So, deferred rendering it is.
But deferred rendering has one important drawback: you do not get the handy hardware-accelerated edge anti-aliasing features out of the box, and transparent objects require very special care, as a given texel in the geometry buffer represents just *one* layer of an opaque surface, and is not suited to layering several semi-transparent surfaces on top of each other before the final pixel color is computed. In fact, transparency in a deferred renderer is such a hairy topic that many deferred engines end up using forward rendering techniques for their transparent materials, after the opaque scene has been drawn with the deferred algorithm. Which… huh… seems so wrong, viewed from the shiny philosophical realm of architectural ideas.
But I’ll have to deal with it and come down from my platonic realm of ideas to the concrete, hardcore hardware reality of the real translucency of transparent materials :p. I don’t know yet what I can come up with. I also need to handle anti-aliasing of some kind, and I’m currently reviewing techniques such as FXAA, which operates as a per-pixel post-process, meaning it runs after an image has been rendered, and is therefore natively compatible with deferred renderers.
I also have in mind to give the final NeREIDS some kind of cartoonish look, which means a cel shading system and some black outlines. I’m thinking the cel-shading step effect should be applied to the light contribution derived from normals rather than to the final color, resulting in large solid-colored fills while still allowing temporal variations of light intensity (and possibly even HDR). For the outlines, I’m currently testing filters such as the Sobel filter for edge detection, in order to draw those edges black in a post-process. I’ve come up with an algorithm that works on the depth buffer, but it does not quite satisfy me yet (I guess most of the UDK users claiming their Sobel solution to be “Borderlands-style” are not quite correct, as the Borderlands game series achieves really impressive, good-looking results where a Sobel filter would have failed, imho).
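For the curious, a depth-based Sobel pass in the spirit of the one described above could look like this (a CPU sketch with an illustrative threshold parameter, not my actual shader code). The two 3×3 kernels are the standard Sobel pair; pixels where the depth gradient magnitude exceeds the threshold get flagged for the black outline:

```cpp
#include <cmath>
#include <vector>

// Flag outline pixels by running a 3x3 Sobel filter over a float depth buffer.
// Border pixels are skipped for simplicity.
std::vector<bool> depthEdges(const std::vector<float>& depth,
                             int w, int h, float threshold)
{
    std::vector<bool> edge(depth.size(), false);
    auto d = [&](int x, int y) { return depth[y * w + x]; };
    for (int y = 1; y < h - 1; ++y)
        for (int x = 1; x < w - 1; ++x) {
            // Horizontal and vertical depth gradients (Sobel kernels).
            float gx = -d(x-1,y-1) - 2*d(x-1,y) - d(x-1,y+1)
                       +d(x+1,y-1) + 2*d(x+1,y) + d(x+1,y+1);
            float gy = -d(x-1,y-1) - 2*d(x,y-1) - d(x+1,y-1)
                       +d(x-1,y+1) + 2*d(x,y+1) + d(x+1,y+1);
            edge[y * w + x] = std::sqrt(gx * gx + gy * gy) > threshold;
        }
    return edge;
}
```

One weakness this sketch shares with any depth-only approach: surfaces meeting at a crease with little depth difference produce no outline, which is precisely where a normals buffer could help.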
Since a deferred renderer would hand me an already computed normals buffer, maybe I could also use it to improve my edge detection filter. Really, I don’t quite know yet. My dream would be to come up with an implementation that unifies FXAA with the black outlines, as most of the pixels in need of edge anti-aliasing are also likely to need a black outline, and vice versa. I may have to learn some more maths before being able to do that, though.
Research on parallel computing
I’ve also done some work on the terrain & environment manager; more specifically, on parallelising the computations it needs to perform. The system I have in mind for NeREIDS will typically put a bigger strain on the CPU than other terrain systems I know of. This may become an issue, so I’ve redesigned the terrain manager architecture so that its task can be divided among any number of cores while requiring a minimal amount of synchronization. I hope to tell you about NeREIDS terrain soon, but I’d rather have something to show for it first.
As you may know, a few gigahertz of clock speed is what we have had in our home computers for almost ten years now; it no longer doubles every 18 months as it did during previous (glorious?) years. Moore’s law, regarding the exponential increase in the number of transistors, still more or less holds, but it now comes in the form of more processor cores per chip. What this means for us poor programmers is that we can’t just sit on our arses and assume that a sequential algorithm we’ve coded will run twice as fast on twice-as-powerful hardware 18 months from now. In 18 months, a sequential algorithm is likely to run on shiny new processors clocked at those same few gigahertz, using only one of their cores, meaning: no faster. To get the full benefit of modern and future processors, we need to design algorithms to run in parallel, so that they use all available cores instead of just one while all the others sit idle. And designing parallel algorithms is not only a different discipline than programming sequential ones, it is also a harder problem to solve. Or maybe most of us are just not trained enough for it, I dunno. The fact is, we now need to actually think before being able to take advantage of our current (and likely future) hardware. Imagine that.
I’ve thus experimented with different parallelization techniques on a few different platforms, to get a feel for what works best and what performance gains to expect from “best”. I’ve had to learn a lot about task parallelism in a short time, sometimes banging my head over new ways of doing things and new pitfalls. As I hope to develop in yet another post [link to come], I’m standing at the crossroads with C++11, and in some corners you can still pretty much smell the wet paint. So each time something does not work as expected, you may start by doubting your compiler and only find the true problem later, which does not improve productivity much…
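To give a flavour of the kind of C++11 task parallelism involved, here is a minimal sketch of the “divide work among cores with minimal synchronization” idea (using an illustrative reduction over a data array, not actual terrain code): each task owns a disjoint slice, so no locks are needed, and the only synchronization point is joining the futures at the end.

```cpp
#include <algorithm>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Sum a large array by splitting it into one slice per hardware thread,
// running each slice as an async task, then joining the partial results.
double sumParallel(const std::vector<double>& data)
{
    unsigned n = std::max(1u, std::thread::hardware_concurrency());
    std::size_t chunk = (data.size() + n - 1) / n;
    std::vector<std::future<double>> parts;
    for (std::size_t begin = 0; begin < data.size(); begin += chunk) {
        std::size_t end = std::min(begin + chunk, data.size());
        // Each task reads only its own [begin, end) slice: no shared writes.
        parts.push_back(std::async(std::launch::async, [&data, begin, end] {
            return std::accumulate(data.begin() + begin, data.begin() + end, 0.0);
        }));
    }
    double total = 0.0;
    for (auto& f : parts)
        total += f.get();  // join: the single synchronization point
    return total;
}
```

The terrain manager’s workload is obviously more involved than a sum, but the same shape applies: partition into independent slices, fan out, and synchronize only once to merge results.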
But in the end, I’m quite satisfied with this work on parallel computing, and I’m glad to have some new techniques under my belt now. I’ll surely dig into the matter some more in the future, but for now I need to get down to coding some of the core features of the terrain manager, and get real feedback from a few first tests.
That’s all for today, folks 🙂