OpenSimulator Virtual World Server Case Study (part 2)
OpenSimulator (http://opensimulator.org) is an open source project building a general-purpose virtual world simulator. As part of a larger effort to research the scaling of virtual worlds, Intel Labs has been using OpenSimulator as a test case to understand the design requirements for the server portion of a multi-user virtual world system. The previous article explained OpenSimulator's architecture; this article shows how that architecture affects the operation of various workloads.
Stress testing is one way to measure the limits of a simulator: it indicates the upper bounds of simulator operation and exposes possible weaknesses in its implementation.
Three stress tests were developed: scripts, physics, and avatars.
Scripting is stressed by dynamically creating scripts until CPU execution is saturated. While it is possible to write compute-bound scripts, scripts placed inside objects in a virtual world tend to be small and timer- or sensor-driven. To measure the limits of many small scripts, our test creates groups of small, timer-driven scripted cubes until the time needed to create the cubes exceeds some limit. Each script's operation is simple: it changes the cube's color and rotates it.
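To make the shape of this workload concrete, here is a minimal Python sketch of many small, timer-driven scripts dispatched to a pool of worker threads. `ScriptedCube`, `on_timer`, `fire_timers`, the color cycle, and the 15-degree rotation step are all hypothetical illustrations, not OpenSimulator's script engine or scripting API. The pool size of 16 only mirrors the 16 hardware threads of the test server. In CPython the global interpreter lock keeps these threads from running Python code in parallel, so the sketch shows the dispatch structure rather than a multi-core speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

COLORS = ("red", "green", "blue")   # hypothetical color cycle


@dataclass
class ScriptedCube:
    """Hypothetical scripted object: each timer event recolors and rotates it."""
    color_index: int = 0
    rotation_deg: float = 0.0

    def on_timer(self) -> None:
        # The whole "script": advance to the next color and turn 15 degrees.
        self.color_index = (self.color_index + 1) % len(COLORS)
        self.rotation_deg = (self.rotation_deg + 15.0) % 360.0

    @property
    def color(self) -> str:
        return COLORS[self.color_index]


def fire_timers(cubes, pool: ThreadPoolExecutor) -> None:
    """Deliver one timer event to every cube on the worker pool, then wait."""
    futures = [pool.submit(cube.on_timer) for cube in cubes]
    for future in futures:
        future.result()   # surfaces any exception raised by a script


cubes = [ScriptedCube() for _ in range(400)]
with ThreadPoolExecutor(max_workers=16) as pool:
    for _ in range(4):   # four timer ticks
        fire_timers(cubes, pool)
```

Because each tick waits for all of its events before the next tick starts, no cube is ever updated by two workers at once.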
Physics is stressed by dynamically creating physical objects that interact with one another and noting how many can be interacting before the frame rate drops below an acceptable level. Because the objects are created dynamically, their number increases gradually until the frame rate limit is reached.
The avatar stress test introduces active, moving avatars into a simulator until the simulator frame rate drops below an acceptable level. The active avatars move between random waypoints: each picks a destination, walks to it, then chooses another random destination and walks there. An avatar-driving routine performs this by simulating a user issuing the "move forward" command until the avatar reaches its destination. This reproduces both the execution load and the communication load that many active avatars place on the simulator.
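The random-waypoint behavior can be sketched as a small state machine that is stepped once per tick. This is a hypothetical Python illustration, not the actual driving routine: `WanderingAvatar`, the 256 m square region, and the 3.2 m/s walking speed are assumptions, and a real bot would send protocol requests to the simulator instead of updating its own coordinates.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class WanderingAvatar:
    """Hypothetical bot that walks between random waypoints in a square region."""
    x: float
    y: float
    region_size: float = 256.0   # assumed region edge length, in meters
    speed: float = 3.2           # assumed walking speed, in meters per second
    rng: random.Random = field(default_factory=random.Random)
    target: Optional[Tuple[float, float]] = None
    waypoints_reached: int = 0

    def pick_destination(self) -> None:
        """Choose a new random waypoint anywhere in the region."""
        self.target = (self.rng.uniform(0, self.region_size),
                       self.rng.uniform(0, self.region_size))

    def step(self, dt: float) -> None:
        """One tick of holding 'move forward' while facing the current waypoint."""
        if self.target is None:
            self.pick_destination()
        tx, ty = self.target
        dx, dy = tx - self.x, ty - self.y
        dist = math.hypot(dx, dy)
        stride = self.speed * dt
        if dist <= stride:
            # Arrived: stop on the waypoint and pick the next destination.
            self.x, self.y = tx, ty
            self.waypoints_reached += 1
            self.pick_destination()
        else:
            # Still walking: advance one stride along the heading.
            self.x += dx / dist * stride
            self.y += dy / dist * stride


bot = WanderingAvatar(128.0, 128.0, rng=random.Random(42))
for _ in range(10_000):   # 10,000 ticks of 0.1 s each
    bot.step(0.1)
```

Seeding the random generator makes a run repeatable, which helps when comparing simulator builds under the same avatar load.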
The scripted objects are created in groups of 400 until the time needed to create one group exceeds some threshold. This is a rough measure of when the CPU is saturated.
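The ramp-up procedure can be sketched as a loop that times each group of creations. In this hypothetical Python version, `create_object` stands in for whatever call actually creates one scripted object in the simulator, and the 5-second default threshold is an assumed value, since the article does not give one. The clock is injectable so that the usage example can run against a fake simulator.

```python
import time
from typing import Callable


def ramp_until_saturated(create_object: Callable[[int], None],
                         group_size: int = 400,
                         threshold_s: float = 5.0,
                         max_groups: int = 1000,
                         clock: Callable[[], float] = time.perf_counter) -> int:
    """Create objects in groups of `group_size` until one group takes longer
    than `threshold_s` seconds; return how many objects were created.

    `create_object` is a hypothetical hook that creates one scripted object;
    `threshold_s` is an assumed cutoff.
    """
    total = 0
    for _ in range(max_groups):
        start = clock()
        for _ in range(group_size):
            create_object(total)
            total += 1
        if clock() - start > threshold_s:
            break   # creation has slowed down: treat the CPU as saturated
    return total


# Usage with a fake simulator whose per-object cost grows as the CPU fills up.
fake_now = [0.0]


def fake_create(i: int) -> None:
    fake_now[0] += 1e-4 * (1 + i / 1000)


created = ramp_until_saturated(fake_create, threshold_s=0.2,
                               clock=lambda: fake_now[0])
```

With the fake cost model, the eleventh group is the first to exceed 0.2 s, so `created` ends at 4,400 objects.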
As can be seen in Figure 1, the CPU becomes busier as the number of scripted objects increases, until it reaches nearly 100% of available compute resources. Here, 100% means full utilization of all 16 hardware threads in a dual quad-core server (a dual Intel® Xeon® E5540 processor-based server, with two hardware threads per core). This demonstrates that the script engine is multi-threaded and can use all available processing power to execute the many scripts. The simulator frame rate did not change, suggesting that the script engine is scheduled independently of the main simulator heartbeat loop.
A Galton box (http://wikipedia.org/wiki/Galton_box) is a board with a regular arrangement of pins. Balls are dropped onto a single point at the top of the board, bounce down through the pins, and drop out of the bottom in a binomial distribution.
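The binomial result follows from an idealized model: at each row of pins, a ball goes left or right with equal probability, so its final bin is the number of right-bounces. The hypothetical helpers below simulate that idealized box in Python and compute the expected count per bin, `balls * C(rows, k) / 2**rows`, which is the reference that lets a physics run be checked for correctness.

```python
import random
from collections import Counter
from math import comb


def drop_balls(rows: int, balls: int, seed: int = 0) -> Counter:
    """Idealized Galton box: each pin sends a ball left or right with equal
    probability; a ball's bin is the number of times it went right."""
    rng = random.Random(seed)
    return Counter(sum(rng.random() < 0.5 for _ in range(rows))
                   for _ in range(balls))


def binomial_expected(rows: int, balls: int) -> list:
    """Expected number of balls in each bin k = 0..rows."""
    return [balls * comb(rows, k) / 2 ** rows for k in range(rows + 1)]


counts = drop_balls(rows=10, balls=20_000)
expected = binomial_expected(rows=10, balls=20_000)
```

A physics-engine run can be compared with `expected` in the same way; large deviations would point to incorrect collision handling rather than random noise.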
To test the limits of the physics engine, we built a 3D Galton box in our virtual world and dropped hundreds of balls into the top. This created many physical interactions and many individual physics actors, and it provided a way to check whether the interactions are correct, because the balls should exit in a binomial distribution. The balls are scripted to disappear when they leave the Galton box so that physical objects on the ground do not affect the test results. For stress testing, we are not interested in correctness, but in how many physical balls can be added to the Galton box before the physics engine becomes overloaded.
In OpenSimulator, physics engine performance is measured with a scaled "frame rate". As described in the previous article, the physics engine is invoked once every simulator heartbeat period; for compatibility reasons, this rate is scaled up and reported as 46. Some testing has shown that when the physics frame rate drops below 30, overall simulator performance degrades. The physics stress test metric is therefore the number of physics-enabled balls interacting in the Galton box when the reported physics frame rate drops below 30.
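Extracting this metric from a stress-test log amounts to finding the first sample whose reported frame rate falls below 30. The Python sketch below assumes the log is a time-ordered sequence of `(physical_object_count, reported_physics_fps)` pairs. That format and the sample values are made up for illustration and are not measurements from this study.

```python
from typing import Optional, Sequence, Tuple

FPS_THRESHOLD = 30.0   # reported physics frame rate below which the sim degrades


def objects_at_threshold(samples: Sequence[Tuple[int, float]],
                         threshold: float = FPS_THRESHOLD) -> Optional[int]:
    """Return the object count at the first sample whose reported physics
    frame rate is below `threshold`, or None if it never drops that low.

    samples: (physical_object_count, reported_physics_fps) in time order.
    """
    for count, fps in samples:
        if fps < threshold:
            return count
    return None


# Illustrative samples only, not measurements from this article.
samples = [(120, 46.0), (300, 38.5), (380, 29.0), (420, 24.0)]
crossing = objects_at_threshold(samples)
```

Returning `None` instead of a sentinel count makes it explicit when a run never reached the overload point.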
As can be seen in Figure 2, the frame rate decreases as the number of physical objects in the Galton box increases. New physical balls are created and enter the top of the Galton box for a period of time, after which no new balls are added. Balls that leave the Galton box are scripted to delete themselves. As a result, the number of physically interacting balls grows, then declines once new balls stop arriving and the exiting balls disappear, which gives the "Objects" curve its shape. The simulator frame rate mirrors that curve: it falls as the number of physical objects increases and recovers as the number decreases.
The line at the bottom of Figure 2 is total CPU utilization, expressed as a percentage of all 16 hardware threads. The physics engine uses only one hardware thread, hence the total CPU utilization of less than 10%. One way to interpret the graph is that physical objects are added until that one thread is fully utilized, after which the simulator frame rate falls as more physical objects are added.
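As a quick check on the "less than 10%" figure, one saturated hardware thread out of the server's 16 accounts for

$$\frac{1}{16} = 0.0625 = 6.25\%$$

of total CPU capacity, so a single-threaded physics engine running flat out registers as only about 6% of the whole machine.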
At around 400 physical objects, the physics frame rate drops below the threshold of 30.
As mentioned above, the avatar stress test consists of adding wandering avatars until simulator performance begins to falter. The avatar creation and driving routines try to mimic a human user by creating and operating each avatar through the normal login and navigation mechanisms. For example, the "move forward" action is performed by sending the same protocol request the client sends when a user presses the key that moves the avatar forward.
The avatars stay active by wandering around. This means that messages are sent from each client to the server to operate its avatar, and update information is sent from the server to all the clients: whenever an avatar moves, a position update must be sent to every client. This partially simulates the load and networking requirements of real users.
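Because each wandering bot is also a logged-in client, this fan-out grows with the square of the avatar count. The hypothetical Python helper below counts the position updates sent in one update interval. It assumes the simplest possible scheme, with every moving avatar's position sent to every client and no interest management, culling, or batching. It does not describe how OpenSimulator actually throttles or aggregates updates.

```python
def position_updates_per_tick(moving_avatars: int, connected_clients: int) -> int:
    """Messages the server sends in one update interval if every moving
    avatar's new position goes to every connected client (assumes no
    interest management, culling, or batching)."""
    return moving_avatars * connected_clients


# Each bot is also a connected client, so N bots produce N * N updates.
for n in (25, 350, 450):
    print(n, position_updates_per_tick(n, n))
```

Under this naive model, going from 350 to 450 avatars raises the per-interval message count by roughly two thirds, which helps explain why communication load can limit a simulator before its CPU is saturated.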
Figure 3 shows the simulator frame rate as avatars are added to a scene. As the number of avatars increases, the simulator frame rate eventually begins to fall. Avatars are logged in in groups of 25 to spread out the overhead of initialization. For OpenSimulator running on a quad-core server, the simulator's responsiveness begins to degrade at about 350 wandering avatars, and the simulator frame rate drops below 30 at about 450 wandering avatars.
The scripting and physics workloads offer a view of contrasting architectures. Two observations can be made about the script engine. First, it is multi-threaded and uses all the CPU threads to execute scripts; second, it is not tied to the heartbeat thread. The script engine can therefore execute many scripts without affecting the responsiveness of the simulator. Figure 1 shows this: as the number of scripts increases, CPU utilization reaches 100% while the simulator frame rate stays unchanged.
The physics workload, on the other hand, used only 7% of a dual quad-core (16 hardware threads) server, which suggests that the physics engine is single-threaded. As discussed in the previous article, the physics engine is invoked from the simulator's heartbeat loop (a central loop that invokes object updates and physics several times a second). OpenSimulator's physics implementation therefore has two design problems: 1) it does not take advantage of the multiple hardware threads available, and 2) because physics executes on the heartbeat thread, an overloaded physics engine slows the entire simulator.
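The second problem can be modeled with a simple timing rule. A loop that sleeps out the remainder of a fixed period cannot start its next frame until the physics step on the same thread has finished, so its frame time is the larger of the period and the work done per frame. The Python function below is a hypothetical model of that rule, not OpenSimulator's heartbeat code, and the 0.1 s period in the usage lines is an assumed value.

```python
def heartbeat_fps(period_s: float, physics_s: float, other_s: float = 0.0) -> float:
    """Frame rate of a single-threaded heartbeat loop.

    Each frame runs the other per-frame work and then the physics step on the
    same thread, then sleeps out whatever is left of `period_s`. When the
    work takes longer than the period, the next frame simply starts late.
    """
    frame_time = max(period_s, other_s + physics_s)
    return 1.0 / frame_time


# Hypothetical 0.1 s heartbeat: physics fits, then physics overruns the period.
print(heartbeat_fps(0.1, physics_s=0.05))   # 10.0
print(heartbeat_fps(0.1, physics_s=0.20))   # 5.0
```

Once physics overruns the period, every extra millisecond of physics work lowers the frame rate of everything else on the heartbeat, which is the coupling described above.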
These results offer several lessons for virtual world server design. The various functions of the simulator (physics, scripts, communication, etc.) should be multi-threaded to use all of the hardware threads available in modern servers. Additionally, the functions must be scheduled independently so that their operation does not affect other functions. This leads to a server design of multiple independently scheduled modules that rely on locking of the central data structures.
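A minimal Python sketch of that design follows. Each simulator function runs on its own thread at its own rate and takes a lock on the shared scene only while it commits results. `Scene`, `Module`, `slow_physics`, and `scripts` are hypothetical names, and the rates and sleep times are arbitrary. Because of CPython's global interpreter lock, the sketch demonstrates independent scheduling, not parallel execution on multiple cores. OpenSimulator itself is written in C#, where threads can run in parallel.

```python
import threading
import time


class Scene:
    """Hypothetical central scene data shared by every simulator module."""

    def __init__(self):
        self.lock = threading.Lock()   # guards the dictionaries below
        self.positions = {}            # object id -> (x, y, z)
        self.colors = {}               # object id -> color name


class Module(threading.Thread):
    """One simulator function (physics, scripts, ...) on its own schedule."""

    def __init__(self, name, scene, rate_hz, step):
        super().__init__(name=name, daemon=True)
        self.scene = scene
        self.period = 1.0 / rate_hz
        self.step = step               # callable(scene); takes scene.lock itself
        self.ticks = 0
        self.stopping = threading.Event()

    def run(self):
        while not self.stopping.is_set():
            self.step(self.scene)
            self.ticks += 1
            self.stopping.wait(self.period)   # sleep, but wake promptly on stop()

    def stop(self):
        self.stopping.set()
        self.join()


def slow_physics(scene):
    time.sleep(0.05)                   # stand-in for an expensive physics step
    with scene.lock:                   # lock only while committing results
        for oid, (x, y, z) in list(scene.positions.items()):
            scene.positions[oid] = (x, y, z - 0.1)


def scripts(scene):
    with scene.lock:
        for oid in list(scene.colors):
            scene.colors[oid] = "red" if scene.colors[oid] == "blue" else "blue"


scene = Scene()
scene.positions[1] = (0.0, 0.0, 10.0)
scene.colors[1] = "blue"
physics_module = Module("physics", scene, 100, slow_physics)
script_module = Module("scripts", scene, 100, scripts)
physics_module.start()
script_module.start()
time.sleep(0.3)
physics_module.stop()
script_module.stop()
```

In a typical run, the script module completes several times as many ticks as the slow physics module, because a slow module delays the others only while it holds the lock.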
In this article, we stress tested OpenSimulator and identified the need for multi-threaded subsystems and the value of independent task scheduling. Both of these promote scaling of the virtual world server. The next article will explore platform power and networking as they relate to a virtual world server.
About the Author
Robert Adams is a software engineer in Intel Labs and is a member of the Virtual World Infrastructure team investigating systems architectures for scalable virtual environments.