Performance Comparison: Starling vs GPU Render Mode

[** UPDATE **] Starling has significantly improved performance since I ran these tests. For more current numbers check out my Performance Showdown!

With the advent of Stage3D, many developers are wondering whether it’s time to make the switch from renderMode=GPU to a Stage3D Framework such as Starling or ND2D.

To that end, I’ve created a simple test suite which I’m calling “RunnerMark”. The idea is to simulate a sample Endless Runner game and see how it renders on various devices.

Overview
The goal of this test is to simulate the load that you’d experience within a simple Endless Runner game. Here’s a quick breakdown of what’s going on:

  • 1 Main character with a run animation
  • 3 on-screen enemies with a “Chomp” animation
  • 1 stationary background image (sky)
  • 2 Parallax scrolling backgrounds
  • Scrolling Ground Tiles
  • Scrolling Platforms
  • ~30 small dust sprites as your character runs, to simulate some level of particle load

Project Files
If you would like to view the source, and try this on your own devices:

Testing Methods
We’re going to look at the following rendering methods:

  • GPU Render Mode
    Uses the simple “bitmapData cache” method.
  • Starling – Dynamic
    Uses a combination of a Dynamic Texture Atlas and Images. This is the best workflow available to Starling, as it makes it very easy to convert your embedded assets, but it may not yield the best performance.
  • Starling – TextureAtlas
    In this test, TexturePacker is used to pack all assets into a single PNG file, with an associated XML file. This is the recommended method for optimal performance; however, the TexturePacker workflow is definitely a step backwards from the DynamicAtlas.

NOTE: In order to streamline the code, I’ve abstracted the displayList out of the main game loop using a GenericSprite wrapper class. Consider implementing something like this in your projects going forward; it’s a good way to keep your apps highly portable.
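
To make the wrapper idea concrete, here is a minimal sketch of such an abstraction layer. It is not the RunnerMark source: it is written in TypeScript (syntactically close to ActionScript 3) so it can be run anywhere, and every name in it (GameSprite, SpriteFactory, RecordingSprite, RecordingFactory, RunnerGame) is hypothetical. The point is only that the game loop talks to an interface, and each renderer supplies its own implementation.

```typescript
// Hypothetical sketch: none of these names come from the RunnerMark source.
// The game loop only ever touches GameSprite, never a renderer-specific class.
interface GameSprite {
  x: number;
  y: number;
  setFrame(index: number): void; // switch to another animation frame
}

// Each rendering backend supplies its own factory.
interface SpriteFactory {
  create(assetId: string): GameSprite;
}

// Stand-in backend that only records state. A real backend would wrap a
// display-list Bitmap (GPU mode) or a Stage3D-backed Image instead.
class RecordingSprite implements GameSprite {
  x = 0;
  y = 0;
  frame = 0;
  constructor(readonly assetId: string) {}
  setFrame(index: number): void {
    this.frame = index;
  }
}

class RecordingFactory implements SpriteFactory {
  readonly created: RecordingSprite[] = [];
  create(assetId: string): GameSprite {
    const sprite = new RecordingSprite(assetId);
    this.created.push(sprite);
    return sprite;
  }
}

// Game logic depends only on the two interfaces above.
class RunnerGame {
  private readonly enemies: GameSprite[] = [];

  constructor(factory: SpriteFactory, enemyCount: number) {
    for (let i = 0; i < enemyCount; i++) {
      const enemy = factory.create("enemy");
      enemy.x = 400 + i * 200; // space the enemies out horizontally
      this.enemies.push(enemy);
    }
  }

  // Scroll every enemy left by `speed` pixels.
  step(speed: number): void {
    for (const enemy of this.enemies) {
      enemy.x -= speed;
    }
  }
}
```

With this shape, switching renderers means passing a different factory to `new RunnerGame(...)`; the game logic itself does not change.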

Results

[Chart: iPad 1 and iPhone 4 results]

Strangely, both the iPad 1 and iPhone 4 struggle mightily with Starling. While this fairly simple scene runs at 40 to 50 fps in GPU mode, it becomes completely unplayable with Starling. This is likely caused by the ActionScript overhead of Stage3D, which is very unfortunate.

Surprisingly, using a combined Texture Atlas seems to have no real impact on our performance.

Next up, let’s try the Nexus One. Its CPU and GPU combo is much more balanced, pairing a far better CPU with a significantly weaker GPU:

[Chart: Nexus One results]

Again, GPU Mode is the clear winner here, and again, no noticeable impact from combining the Textures.

In our final test we’ll look at the iPad 2:

[Chart: iPad 2 results]

Clearly the benchmark is getting close to maxed out at this point, but still even on the mighty iPad 2, we see that Starling struggles to maintain 60fps.

After fixing some bugs I had in the original test, Starling runs this scene with ease, as does GPU Mode, which is much closer to what I was expecting.

As usual, the iPad 2 is just an unstoppable force. We’d probably need to add at least another couple hundred sprites to begin slowing it down…

In Conclusion

Based on these results, it’s very hard to recommend that anyone adopt Starling today, for current mobile projects. GPU Render mode seems to be more stable and performant across the entire gamut of devices. If you need the best performance, and you need it now, then the choice is GPU Mode.

The one reason to choose Starling going forward is to future-proof your app. We know that Stage3D is the future, and there’s no guarantee that GPU Mode will always be available, so if you are planning for the long term, it may be in your best interest to use Stage3D. Also, Starling gives you much more fine-grained control over the rendering than GPU Mode does, so if you’re building an advanced game, you may be better off with Starling.

My personal recommendation would be to build your apps on top of GPU Render Mode, but use an abstraction layer like I do in these benchmarks. This way, your projects will have no direct tie to the displayList, and you will be ready to easily port them to Starling (or whatever you like!) down the road.


29 Comments to “Performance Comparison: Starling vs GPU Render Mode”

  1. Nick says:

    I have been using rendermode=gpu since the beginning. I recently took a shot at Starling and also failed to match the GPU mode performance on a Motorola DROID X, although it was close. It was a quick attempt, so I’m not sure if I missed anything. I am interested to see if someone can come along and optimize your code for Starling to beat your GPU mode results. Good challenge! I’ll be waiting!

  2. Hi Shawn,
    thanks a lot for the comparison! Indeed very interesting to see how fast GPU mode can be when used correctly!
    I had a quick look at your Starling code and noticed a few things which will have an extremely negative impact on performance:
    * Each texture is its own bitmap, which absolutely disables any of Starling’s performance optimizations. It’s strongly recommended to use a texture atlas.
    * You are displaying an FPS counter, which is a conventional display object (not Stage3D). This causes a big performance penalty, especially on mobile.
    This is definitely something that should be changed to have a fair comparison. Even if Starling uses a very similar API to conventional Flash, there are some things that have to be done differently, simply because of the underlying Stage3D technology. I totally agree that this can be misleading.
    Best regards, Daniel

    • shawn says:

      Thanks Daniel! I’m not singling out Starling at all, these performance issues seem to affect all the frameworks, it’s just that Starling and Stage3D seem to be becoming synonymous.

      Regarding optimization, each bitmap is backed by the same Embedded PNG, so shouldn’t Starling just recognize that it is the same Texture and not re-upload it? Is this something that could be added? GPU Mode seems to do this fine.

      I know that a TextureAtlas is the way to go for peak performance, but it seems a bit unfair, because at that point we’re doing a ton of optimization for Starling vs just one or two lines of code for GPU Mode. Even if we get comparable results, it’s hard to recommend the tougher workflow.

      Nevertheless, I will update and test with TextureAtlas :)

      PS. I think the issue with DisplayList being expensive on top of Stage3D is fixed. Adobe used this identical FPS counter for their bunnyMark demo, and was able to push many thousands of bunnies on Android and iOS. It only updates once/second. But I will go ahead and implement the Starling Stats port :)

      • Hi Shawn,
        thanks for the reply! Well, you’re right, using a texture atlas and optimizations like that may look unfair — because they are not necessary with GPU mode, why should they be with Stage3D? But you could look at this the following way, too: you *can* do those optimizations with Starling/stage3D, while in GPU mode, you never really know what’s happening behind the scenes. In a full-fledged game, you add some innocent looking object, and suddenly the performance drops, and you never know why. In stage3D, you always know exactly what is going on.
        That said, I’m sure Adobe has worked hard in the past to get the GPU mode as fast as possible, so there will be situations where it’s faster than stage3D. E.g. one thing that will limit all Stage3D libraries is that any CPU-intensive code is written in ActionScript, while the native display list was written in C/C++. This will become less limiting when Adobe enhances AS3 performance (which they are trying to do), but it’s just that way in the present.
        As for the DisplayList overlay: you’re right, I haven’t tried this in a while! Perhaps this has become better in the latest version. To be 100% sure, you could run the benchmark for, say, 30 seconds, and display the average fps only after that time.
        Please keep me updated! It’s important to know how those two techniques compare.
        Thanks for your efforts!
        Daniel
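
The measurement idea suggested above (run for a fixed window, then report a single average) can be sketched roughly as follows. This is an illustration only, written in TypeScript rather than ActionScript 3; AverageFpsMeter and its method names are hypothetical and appear in neither RunnerMark nor Starling.

```typescript
// Hypothetical helper: counts frames silently and produces one average
// once the measurement window has elapsed, so nothing needs to be drawn
// on screen while the benchmark is running.
class AverageFpsMeter {
  private frames = 0;
  private startMs: number | null = null;
  private result: number | null = null;

  constructor(private readonly windowMs: number) {}

  // Call once per frame with a monotonic timestamp in milliseconds.
  // Returns the average fps once the window is over, otherwise null.
  tick(nowMs: number): number | null {
    if (this.result !== null) {
      return this.result; // measurement finished, keep reporting it
    }
    if (this.startMs === null) {
      this.startMs = nowMs; // the first call only starts the clock
      return null;
    }
    this.frames++;
    const elapsed = nowMs - this.startMs;
    if (elapsed >= this.windowMs) {
      this.result = (this.frames * 1000) / elapsed;
    }
    return this.result;
  }
}
```

For a 30 second run you would construct it with `new AverageFpsMeter(30000)` and call `tick` from the frame handler, showing the number only once it stops returning null.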

        • shawn says:

          Hey Daniel, an updated source drop has been uploaded, and I have completely optimized the Starling code. I used TexturePacker to create one big TextureAtlas for the whole scene.

          I didn’t see any improvement in the iPad 1 test results. It seems this is completely CPU limited, which as you say mostly comes down to AS3 and is largely outside of your control…

          I’ll need to wait until tonight to test the other devices and update my charts, as I don’t have all the devices with me.

          Cheers!

          • Thanks a lot for the update, Shawn! The results are of course quite unfortunate — I wouldn’t have expected that huge difference on iPad 1. I will have another look at that, but at the moment I can’t think of much I could do …

          • I had a closer look, and it turns out that the CPU is not the issue.
            Even if I remove the particles and all of the “onEnterFrame” logic, we’re still at only 21 fps. That’s with just 25 objects on the stage!
            It has to be the fillrate, then. I’ve read about that observation several times in the forum: as soon as you’ve got big objects on the iPad 1, performance starts to drop extremely. This does not happen on later versions (say, iPad 2).
            The big question is: why does GPU mode not suffer from that?!

    • shawn says:

      Daniel, do you think backing the Images by a Texture is good enough, or does it need to be a TextureAtlas to get the best performance?

      • Optimally, all assets would be in one png! But having the movie clip in one, and the other assets in another will be fine, too. (Perhaps you could try both and find out if it makes a difference!)

  3. Sebastiano says:

    I do not have much experience with mobile development yet, but if I am not wrong, Stage3D was not hardware accelerated on mobile so far. Hardware acceleration was introduced with the latest version of the AIR SDK: http://helpx.adobe.com/flash-player/release-note/release-notes-developer-flash-player.html

    Have you used AIR 3.2 to make these tests?

  4. I have done a similar test on the Samsung Galaxy S2, Motorola Xoom, iPod 4th generation and the new iPad 3. On each one of them GPU mode is faster than Stage3D. I am really puzzled by this.

  5. Elliot Geno says:

    I’ve heard that ND2D and Genome2D are both an order of magnitude faster than Starling. I have only played with Starling, however. It’s great that it’s so close to the display list, but it’s young, and will likely continue to improve.

  6. focus says:

    Hi, Shawn! Last graph caption says Nexus One instead of iPad2.

  7. sHTiF says:

    Hi Shawn, can I use the RunnerMark on my blog? I just finished the GNativeRenderer for G2D that will render your RunnerMark GPU version without any change to the code through Genome2D. All you need is one line of code, and it will parse the display list, manage textures (a single texture per unique BitmapData) and render it through low-level Genome2D shaders.

    If there is huge interest in this I can do it even as separate framework where you simply say render root through Genome2D ;)

  8. [...] demonstrate this feature I took Shawn’s RunnerMark the display list version and run it through GNativeRenderer. Click the image to [...]

  9. Philippe says:

    Hey, just finished porting the RunnerMark to haxe NME (will put it on github soon) and I’ve got a nice score of 875 on my iPod Touch 4 (which means it should be even more on an iPhone 4).

    Just braggin’ ;)

  10. Arby says:

    AIR 3.4 should offer some improvements. Please post an update!

  11. [...] Performance: Starling vs GPU Render Mode [...]

  12. [...] post shows how, with the right optimizations, you can achieve great perfomance using “GPU” [...]

  13. Nick says:

    Hi Shawn
    Can you tell us some tips/tutorials/articles on switching over to starling, coming from an Air GPU render mode skill set?
    For example in GPU mode we set the stage quality to low, and move Bitmaps around the stage, simply updating or “swapping” the BitmapData property for animations. That’s all we needed for top performance.

    At its core, what is the best way to simply move graphic objects around and animate them in Starling?
    I know Starling’s MovieClip object is supposed to operate like a sprite sheet under the hood. Do MovieClip, Sprite, and Bitmap perform the same in Starling? Is one better? I assume the Bitmap.bitmapData “swapping” technique is no good in Starling?
    Thanks
    We’re not looking for a full lesson, just the core idea behind Object+animation cells…Or maybe MovieClip does its job in starling and that’s all you need?

    • shawn says:

      Hey Nick,

      It’s pretty easy, everything is just a little more work than gpu mode, but not much.
      1. BitmapData must be converted to Texture, and then rendered with the Image class. Starling has a utility method for that: Texture.fromBitmapData(). Functionally, Image == Bitmap and Texture == BitmapData. Share Textures and things will be fast. For best results, all small textures can be combined into a single TextureAtlas.

      2. Starling’s MovieClip class works with TexturePacker files, or you can generate a dynamic TextureAtlas using an extension:
      http://forum.starling-framework.org/topic/dynamic-texture-atlas-generator-starling-extension

      The nice thing about using Starling, is that you can mix regular DisplayObject content over top of the Starling stuff, and as long as the redraw-regions are small, it renders really efficiently. This lets you mix-n-match rendering load between CPU and GPU, only using GPU when it’s really necessary.
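
The "share Textures" advice in point 1 usually comes down to a small cache, so that each unique asset is uploaded to the GPU only once. The sketch below shows that pattern in TypeScript (close to ActionScript 3 syntactically); TextureCache and GpuTexture are hypothetical names, and the object literal stands in for the real upload call, which in Starling would be Texture.fromBitmapData().

```typescript
// Hypothetical stand-in for a GPU texture handle.
interface GpuTexture {
  readonly assetId: string;
}

// Hands out one shared texture per asset id, uploading only on first use.
class TextureCache {
  // Object.create(null) gives a plain lookup table with no inherited keys.
  private readonly textures: { [assetId: string]: GpuTexture } =
    Object.create(null);

  uploads = 0; // counts simulated uploads, for illustration only

  get(assetId: string): GpuTexture {
    let texture = this.textures[assetId];
    if (texture === undefined) {
      texture = { assetId }; // stands in for the real, expensive upload
      this.uploads++;
      this.textures[assetId] = texture;
    }
    return texture;
  }
}
```

Every sprite that asks for the same asset id then receives the same texture object, so thirty dust sprites cost one upload rather than thirty.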

  14. Nick says:

    Excellent thank you, time to play!
