Hi there everyone !
Remember, two weeks ago, I was pursuing the goal of displaying a few avatars that would react to networked inputs (over UDP). I repeatedly ran into solid walls when trying to implement network stuff, timing stuff, rendering stuff, math stuff, terrain stuff… and could not meet that three-week-old objective.
I also realized that my previous “progress reports” had taken on a more technical feel than originally intended. I should try to keep the discussion about programming aspects in other entries, and keep things casual over here. So, to apologize for having skipped last week’s report, I’ll try to post some more this week about the technical side of networking.
But here, tonight, let’s just celebrate.
Tonight, my three-week quest is fulfilled. With the help of Keriel again, hosting NeREIDS binaries from Paris, here’s some 600km networked and on-screen action :
Okay, THAT qualifies as ugly. Prettiness was not the primary objective, though.
I’m here in the foreground, see ? Keriel is waving his hands next to… to that cone sort of thing on the left side. Don’t pay attention to it, it’s a six-month-old remnant of a former test. It won’t make the cut into a final release, I promise 😛
In fact, would you please avoid paying attention to anything related to the looks of NeREIDS at this point. Better close your eyes and listen. I’ll just tell you what goes on behind the scenes.
Well, behind this scene. And yes, you have to take my word for it.
So, when the time had come (last week) for a report on NeREIDS (for two weeks ago… see, this blog is just like network messages : it has latency), I was totally stuck. My previous hopes of having found a solution to the step-locking of time between client and server had been blown away over the next few tests. And for more than two weeks I had been struggling with that very issue.
I wrote in a previous report that I had a hard time trying to find information about time synchronization as implemented by people before me. That’s because it seems that almost nobody even tries, or cares. Even in that (otherwise great) article about networking with Unreal Engine, the authors admit that they accept whatever timestamp the client passes to them, just simulating and validating the client state at that time, and that they lived happily (with speedhacking) ever after.
Well, I wasn’t in the state of mind to accept speed hacks as a given. Alright, chances are great that NeREIDS will never excite anyone enough to be an interesting thing to hack in the first place, but that doesn’t mean I’m not cursed by a very painful kind of perfectionism.
Anyway, NeREIDS’ planned features and architecture cannot tolerate having clients dictate what time it is. NeREIDS is not a game where you shoot a handful of zombies along with 3 friends. NeREIDS is (well, will be) a Multiplayer Role-playing game, also with lots of NPCs intended to have some kind of strategic AI, with a dynamic environment, with persistent server capabilities and things going on behind the scenes even without any client bothering. The NeREIDS server is the one that dictates what time it is. The NeREIDS server is the one that needs its current game-frame to be wrapped up and computed once. And when it’s done, it’s done. No, you, Client, cannot ask for the world state 35 seconds in the past ’cause there’s too much going on already and I, Server, don’t have enough resources to recompute that state for you. Go to hell and synchronize or die. Maybe ‘I’ should not be called a ‘server’ per se… but that’s another matter.
Each of my attempts to recalibrate the client’s current frame count based on server feedback seemed to perform worse than the previous one. Then, after another test where my frames were kept in sync very nicely for two hours and finally exploded into an exponential desynchronization and timeout as soon as I dragged a window, I decided to indulge my hatred for all things living, and spent the following days exterminating humanity in Plague Inc : Evolved.
When I finally devised the time-synchronization implementation that would become the successful one, I ran into another brick wall : Windows. More specifically, Windows and high-precision clocks. When I was laying out the structure of my core libraries, I spent a lot of time ensuring a state-of-the-art implementation of a full-precision clock. I had followed numerous pieces of advice from forums on how to code that kind of stuff, as well as the guidelines from big M itself, up to the point that I would force the main thread onto a single core so that the clock could stay steady. I basically hate those machine-level concerns and I hate those follow-that-manual-to-set-that-flag-there-and-trust-me pieces of code a great deal more. And by hating it I also suck at it, so this was very painful for me, yet I did all this. I did all this because none of those things are ever simple and out of the box, and because precise time mattered to me a lot.
See, when you begin with Windows programming, once you need to know about time beyond your everyday precise-to-the-second wall-clock, you quickly learn about GetTickCount(), which gives you the number of milliseconds since your computer started. Which is great. A few years later, armed with a little bit of experience, you know that GetTickCount() is only precise to about 16ms on today’s PCs, so when you need further precision, you try to look for better alternatives. When you look for those alternatives, you’ll get hundreds of coders who can point you towards a trickier implementation with QueryPerformanceCounter(). When you scratch beyond that new surface, you get to a dozen forum threads where people advise you to deal with QueryPerformanceCounter() in a specific way, otherwise it’s a mess. When you look for other solutions to that mess you’re left with a handful of wisdom holders who can give you very precious advice. And then you have it : a super steady and super precise clock. And you’re happy. And your implementation using QueryPerformanceCounter() is now something you’re proud of. Something that you programmed two years ago and that has never failed you since. And when someday you turn NeREIDS debugging up to “very verbose frame by frame hell of a log entry”, you realize that on a few occasions, the very fancy sub-microsecond-precise clock you had relied on just ate away entire seconds, stalled while GetTickCount() kept ticking.
You double check your code, you scratch the matter some more and you get there, finally, to the guy with the real insight, who can tell you that “QueryPerformanceCounter() sometimes jumps when there is heavy traffic on the bus”.
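For the curious, here is a minimal sketch of one way to guard against that kind of stall or jump. It is not the actual NeREIDS clock (its internals are not described here), just a common cross-checking idea: only trust the fine-grained counter while it roughly agrees with the coarse but steady one. The class name `GuardedClock`, the microsecond units and the 50ms tolerance are all assumptions made for illustration. On Windows, the fine sample would typically come from QueryPerformanceCounter() scaled by QueryPerformanceFrequency(), and the coarse one from GetTickCount64().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: accumulates elapsed time from a fine-grained source,
// but falls back to a coarse, steady source whenever the two disagree.
// All values are in microseconds.
class GuardedClock {
public:
    // toleranceUs must be larger than the coarse source's granularity
    // (about 16ms for GetTickCount), hence the assumed 50ms default.
    explicit GuardedClock(std::int64_t toleranceUs = 50000)
        : toleranceUs_(toleranceUs) {
        assert(toleranceUs > 0);
    }

    // Feed one sample from each source, taken at (nearly) the same moment.
    // Returns the total elapsed time since the first call.
    std::int64_t update(std::int64_t fineUs, std::int64_t coarseUs) {
        if (started_) {
            const std::int64_t fineDelta = fineUs - lastFineUs_;
            const std::int64_t coarseDelta = coarseUs - lastCoarseUs_;
            const std::int64_t diff = fineDelta - coarseDelta;
            const bool plausible = fineDelta >= 0 &&
                                   diff <= toleranceUs_ &&
                                   diff >= -toleranceUs_;
            if (plausible) {
                elapsedUs_ += fineDelta;    // normal case: keep full precision
            } else {
                elapsedUs_ += coarseDelta;  // stall or jump: trust the coarse clock
                ++fallbacks_;
            }
        }
        started_ = true;
        lastFineUs_ = fineUs;
        lastCoarseUs_ = coarseUs;
        return elapsedUs_;
    }

    // Number of updates where the fine source was rejected.
    int fallbacks() const { return fallbacks_; }

private:
    std::int64_t toleranceUs_;
    std::int64_t lastFineUs_ = 0;
    std::int64_t lastCoarseUs_ = 0;
    std::int64_t elapsedUs_ = 0;
    int fallbacks_ = 0;
    bool started_ = false;
};
```

The trade-off is that a rejected sample only gets the coarse clock’s precision, so a design like this ends up a little less precise but harder to fool.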
Thus I only reached the keep-everyone-in-sync objective in the middle of last week. I had yet to render a piece of terrain, render some avatars, have one react to input, process it, send all this to a server, process it again there, send back an answer as well as the position of other entities… Well, my 3D programming is a few years rusty, and all that messaging stuff is far from trivial, having to deal with queues of inputs, remembering past states and notifying different kinds of objects all over the place. So this took some time (stupid as I was, I kinda hoped I could wrap this up on Saturday…).
So, what is really going on behind the scenes at this point ?
On the downside, the code for messaging the entity states (you know, the position of the avatars, their velocity, orientation, things like that… Also Keriel waving his hand… wait, are you saying he doesn’t have hands ? Okay, but his head moves at least ! See ?) is an awful mess. And it is nowhere near a robust, efficient, usable implementation. It’ll need to be worked on in the near future. At the moment, it’s mostly hard-coded and ugly, in order to quickly hack this little network test and provide you, dear reader, with a screenie. But hey, it is networked. And it kinda works.
Another annoying thing at the moment is some unknown bottleneck at the OS or process level. Two days ago, when testing with input, I had very serious framerate drops while holding down a key to move the avatar around, and client and server were stuttering and coughing upon each other’s messages. Simply adding a Sleep() of one millisecond on each process allowed for a smooth experience again. I guess the two processes were competing for the CPU too much, and keyboard input made one process gain too much of an upper hand in some kind of priority sorting at the OS level. I dunno, really.
After adding the Sleep(1) on each loop, I also had a constant, very round and pretty 1000 fps reported on the client. This is kinda weird, but I guess the Sleep(n) semantics mean “at most n milliseconds” and use a 1ms resolution clock, so it slept a little less than 1ms (leaving me just enough time to run the loop). Well, at least the Sleep() fixed the framerate issue two days ago, but it reappeared tonight. I don’t know yet what kind of weird resource race is at play here… when logging very verbose stuff (slowing the rendering some more), the framerate gets smooth again. That is the reason for the 333 fps on the screenshot. Now that I’m writing about it, I have the feeling that I was just getting too close to the 1ms barrier from the Sleep(), so my system was choking on an almost-100% CPU loop again… I should try to sleep for 2ms, maybe.
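To make the arithmetic behind those numbers concrete, here is a small portable sketch of a loop that yields the CPU on every pass. It uses the standard `std::this_thread::sleep_for` rather than the Windows Sleep() call, and the two are not interchangeable: `sleep_for` blocks for at least the requested duration, while Sleep() depends on the system timer resolution. The function names `expectedFps` and `runYieldingLoop` are made up for this example, and reading 333 fps as “1ms of sleep plus roughly 2ms of work per frame” is only an assumption.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Rough estimate: a loop that naps for sleepMs and then works for workMs
// completes about 1000 / (sleepMs + workMs) frames per second.
inline double expectedFps(double sleepMs, double workMs) {
    assert(sleepMs + workMs > 0.0);
    return 1000.0 / (sleepMs + workMs);
}

// Runs `frames` iterations of a loop that yields the CPU for `nap` on every
// pass, then returns the measured frames per second.
template <typename Work>
double runYieldingLoop(int frames, std::chrono::milliseconds nap, Work work) {
    using Clock = std::chrono::steady_clock;  // monotonic clock
    const auto start = Clock::now();
    for (int i = 0; i < frames; ++i) {
        work();                            // poll input, network, render...
        std::this_thread::sleep_for(nap);  // blocks for at least `nap`
    }
    const std::chrono::duration<double> elapsed = Clock::now() - start;
    return frames / elapsed.count();
}
```

With these assumptions, `expectedFps(1.0, 0.0)` gives 1000 and `expectedFps(1.0, 2.0)` gives about 333, while a 2ms nap would cap the loop at 500 fps at most.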
On the upside, I got some work done that wasn’t planned for, and ran some live tests of features and techniques that, up till now, I only hoped would be usable :
– First is that new clock implementation, which is less precise but seems far more robust than the previous one.
– Second is per-bit serialization, which I thought I would do without for the time being, but I finally implemented it and it works pretty well. This allows me to send about two thirds of the data I would have sent otherwise, on average with my current messages.
– Third is the use of my implementation of fixed-point maths I mentioned earlier. This helps a great deal with compressing messages that need to be sent over the wire, and this will be a good basis for getting consistent results across all platforms, as well as providing a steady resolution anywhere in the game world. It seems to hold well, and this live demo exposed a few bugs that escaped the unit tests (yes, finding a bug *is* an upside from an engineering standpoint).
– Fourth is the proof of concept for what I call my “sub-frame-derivatives” being tested live and working : I use a millimeter resolution for position, meaning that at any given server frame, an avatar can be for example 3mm or 4mm away from a given axis, but never at 3.5mm. Yet this technique allows me to deal with speed epsilons far below 1mm per tick (each server frame or “tick” being 32ms apart).
– Fifth, and last but not least, is the terrain. I promised you a flat terrain, and here you have it. But the implementation of this flat piece of land has already set the foundations for the whole NeREIDS terrain & environment system, the design of which I refined when I got too depressed by the time-synchronization issues. I have very high hopes for it.
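Two of the items above, per-bit serialization and sub-millimeter speeds on a millimeter grid, are easier to picture with a bit of code. The details of the NeREIDS implementation are not given here, so the following is only a guess at the general ideas: a writer that packs fields bit by bit instead of padding each one to a whole byte, and a per-axis accumulator that carries a fractional remainder from tick to tick. The names `BitWriter` and `AxisState`, and the scale of 1024 sub-units per millimeter, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bit-level writer: appends the low `bits` bits of `value`,
// most significant bit first, without padding each field to a byte.
class BitWriter {
public:
    void write(std::uint32_t value, int bits) {
        assert(bits >= 1 && bits <= 32);
        for (int i = bits - 1; i >= 0; --i) {
            if (bitCount_ % 8 == 0) bytes_.push_back(0);  // start a new byte
            if ((value >> i) & 1u) {
                bytes_.back() |= static_cast<std::uint8_t>(0x80u >> (bitCount_ % 8));
            }
            ++bitCount_;
        }
    }
    const std::vector<std::uint8_t>& bytes() const { return bytes_; }
    std::size_t bitCount() const { return bitCount_; }

private:
    std::vector<std::uint8_t> bytes_;
    std::size_t bitCount_ = 0;
};

// Hypothetical per-axis state: the position is a whole number of millimeters,
// the velocity is given in 1/1024 mm per tick (assumed scale), and whatever
// does not add up to a full millimeter is kept for the next tick.
struct AxisState {
    static constexpr std::int32_t kSubUnits = 1024;
    std::int32_t positionMm = 0;
    std::int32_t remainder = 0;  // always in [0, kSubUnits)

    void tick(std::int32_t velocitySub) {
        const std::int32_t total = remainder + velocitySub;
        std::int32_t whole = total / kSubUnits;
        std::int32_t frac = total % kSubUnits;
        if (frac < 0) {  // turn C++ truncation into floor division
            frac += kSubUnits;
            --whole;
        }
        positionMm += whole;
        remainder = frac;
    }
};
```

In this sketch, writing a 3-bit, a 1-bit and a 10-bit field takes 14 bits, so 2 bytes on the wire. An avatar moving at 256 sub-units (a quarter of a millimeter) per tick stays at 0mm for three ticks and lands on 1mm at the fourth, so the position itself never leaves the millimeter grid.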
The past weeks’ race for a network has tempered some of my enthusiasm for posting objectives for next week… We’ll just see what I can come up with 😉
Have a nice week !