In my previous performance showdown, I took a look at the various Stage3D-based 2D frameworks like Starling, ND2D and Genome2D, and compared them to GPU render mode and Haxe NME.
It’s been a few months and several revisions later for both Starling and ND2D, so let’s see how they’re looking now! Haxe has evolved as well, so I’ve updated with the latest numbers from the Git repo.
Testing Notes: AIR 3.4 (Beta) SDK, using the latest nightly builds of Starling and ND2D as of July 18, 2012.
Test Devices
- Nexus One – Android 2.3.3
- Galaxy Nexus – Android 4.0.3
- iPhone 4 – iOS 5
- iPad 2 – iOS 5
Results
*NOTE: ND2D refused to render with the latest build + AIR 3.4 (Beta)
Conclusion
A bit of a mixed bag. ND2D showed great improvement on low-powered devices, but still lags behind Starling. Starling has also received a speed boost in most tests, but all in all the performance level is a little disappointing when compared to Haxe NME. We are seeing a huge performance delta on many of the devices.
RunnerMark awards 580 points for running the entire scene at 58fps, then awards 1 additional point for each animated enemy added to the scene. Once the scene is unable to render at 60fps, the test ends.
When we look at Haxe NME on the Galaxy Nexus, it scores 1081, which means 501 additional enemy sprites (1081 – 580 = 501). On the same device, Starling can only display 32 extra sprites. We see the same pattern on the iPhone 4, where Haxe can push +398 but Starling only +88, and on the iPad 2: +2641 for NME versus +406 for Starling.
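The scoring arithmetic above is simple enough to sketch (a minimal illustration; the 580-point baseline and the 1081 Galaxy Nexus score come straight from the results above):

```python
# RunnerMark scoring: 580 points for running the baseline scene,
# plus 1 point for each additional animated enemy sprite.
BASELINE_SCORE = 580

def extra_sprites(score):
    """Extra enemy sprites a framework pushed beyond the baseline scene."""
    return score - BASELINE_SCORE

# Haxe NME on the Galaxy Nexus scored 1081:
print(extra_sprites(1081))  # -> 501 additional sprites
```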
Another frustrating thing is that GPU mode continues to heavily outperform ND2D and Starling on modern Android devices. For example, on the cutting-edge Galaxy Nexus, GPU mode can push an extra +179 sprites, where Starling can only do +32. This is a huge difference!
Now, admittedly, there is plenty of performance here to create most types of games, and most games can run OK at even 30fps. However, it’s really disappointing to see how much performance is being left on the table, and I really hope the Flash Runtime team is focused hard on narrowing this gap.
The new Falcon Compiler and AS4 can’t get here soon enough!
I will prepare a version with the Marmalade SDK because I am very curious how it will compare to Haxe and Stage3D.
Cool, look forward to checking it out.
A Stage3D version:
http://code.google.com/p/animation-render-engine/downloads/detail?name=yd.ipa&can=2&q=#makechanges
Nicely done, have you tried benchmarking on the iPad 1 or Kindle Fire? I am worried that Stage3D runs even worse on weaker devices compared to GPU.
I will try and add these; in general the iPad 1 is about the same as the iPhone 4, -20% or so. The chips are very similar.
Just a note, the Genome2D benchmark is still heavily unoptimized; the stats are drawn entirely through the CPU there instead of Genome2D! The Genome2D benchmark also doesn’t use texture atlases. Both are issues that can have a major impact.
Can I write this benchmark for Genome2D from scratch instead of extending the runner engine? The abstraction layers you use aren’t that optimal for a component-based approach.
Ya I’m totally down for that, you can do a pull request if you like and I’ll just bring it in and benchmark.
Couple comments though. As far as I know, the display list on top of Stage3D is rendered optimally using dirty regions, and it only updates once per second, so it should be totally negligible really.
In terms of the TextureAtlas, this is a bit of a limitation of the engine, as I can’t combine multiple MovieClips in one atlas (if I remember right…). Unless I use TexturePacker of course… but when supporting resolutions from 800×480 –> 2100×1500, pre-rendered bitmaps are really an inferior solution to SWF-based vector animations that are rasterized at startup.
With the SWF version, your file size is much lower, your graphics quality is much higher on HD displays, and your performance will be better suited to the resolution/power of the device. The only downside is 1 or 2 seconds at startup as you build all the assets… and you could even cache them to a local file after that.
Not that I have a problem with you using a pre-baked asset in your tests, it’s what Haxe is doing after all… I just think this is a missing piece of Genome. If using a TextureAtlas is integral to performance, then I think TextureAtlases should have a rich API that lets me build them in a variety of ways.
Shawn’s point is correct; the most amazing thing about Adobe AIR is the possibility to load a 1 KB SWF and then, at runtime, create a 20 MB BitmapData to display on super-high-resolution screens. The idea of having to pre-bake all the rasterized assets, to me, defies the real gain that you get by using Flash and vector art. I am personally still using GPU renderMode because it is so much easier to do this task; it would be nice to see something like this optimized for Stage3D.
I thought you might be interested in these benchmarks of ND2D, Starling, and blitting:
http://forums.tigsource.com/index.php?topic=24895.30
Those are basically worthless, as he’s testing on a 320×480 device. Ya, the CPU works great there… run it on a Retina device though, and watch it fall off a cliff.
[...] and fast BitmapData blitting, as opposed to Stage3D. Shawn Blais has conducted some great performance showdowns. His conclusions might surprise a lot of developers who equate Stage3D with fast graphics [...]
Hi Shawn,
I don’t know if you’ve read my recent blog, but I’m really disappointed by the limitations of Stage3D for scrolling. In the situation where you want to recycle texture rows or regions for memory efficiency, uploading textures dynamically is incredibly slow on mobile. Unbelievably slow.
Should I look into BitmapData blitting for scrolling instead? What sort of performance would you anticipate on Retina displays? What if it was full screen? (I’d anticipate that this wouldn’t be good – am I right?) Will direct render mode give me the same performance as GPU with large-area BitmapData blitting? (In the situation where I’m using Stage3D for other stuff.)
I don’t have any test results off hand for direct mode, but I can’t imagine it would be performant at all on Retina.
In terms of Stage3D, I’m using it on my first project now. The key to managing texture uploads seems to be many very small uploads; then everything is fine. As soon as you start caching very large items, you’re in trouble.
I don’t think this is limited to Flash though; it’s just the nature of GPUs, you need to think differently. For example, look at how Apple renders pages in Safari: lots of small 128×128 squares. This lets them stream data to the GPU in a manageable way.
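To see why small tiles are friendlier, compare the per-upload byte cost (a rough sketch; 2048×1536 is a hypothetical stand-in for a Retina iPad framebuffer):

```python
def rgba_bytes(width, height):
    """Bytes for one uncompressed RGBA texture upload."""
    return width * height * 4

tile = rgba_bytes(128, 128)     # 65536 bytes (~64 KB) per Safari-style tile
full = rgba_bytes(2048, 1536)   # 12582912 bytes (~12 MB) for a full frame

# Streaming small tiles lets you spread the cost across many frames
# instead of stalling on one giant upload:
print(full // tile)             # -> 192 tiles cover the same area
```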
When I designed my UI for PicShop (after Android 4.0) with GPU render mode, I built the entire thing out of single-colored pixels. I have like 5 core textures (pixels) that make up the bulk of the UI, and I just stretch them into different-sized rectangles and borders. It was ridiculously fast.
The power is there, you just have to harness it properly
Oh, and I did read the blog. Best of luck with everything; I hope you keep at it, we really need alternative libs to continue to push things forward.
I do the 128×128 squares (following the prerelease discussion). Actually, lists are strips, not squares – 128 high by the width of the list. Maybe I should try making them thinner? (Just a constant at the top.) Will a smaller area result in better performance?
I do the single pixel colour textures for the scrollbar, and the row highlight.
Ya, area is everything. You have only a limited number of bytes you can send to the GPU per frame, and the formula is:
width * height * 4 (rgba)
Also, since GPUs must render in powers of 2, any texture you create will be padded up to the next power of 2: 128, 256, 512, 1024, etc.
So caching entire rows in a list would never be good; each row would probably end up in a 1024×1024 texture (it couldn’t fit into a 512). Instead, cache all the little bits and pieces that make up the row and you’ll get way better performance. And make sure the duplicated textures are shared between all renderers.
Instead of 20+ 1024×1024 textures, you could probably squeeze everything into a single texture.
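Putting the two rules together – the width * height * 4 byte formula and the power-of-2 padding – here’s a minimal sketch; the 600×90 row dimensions are hypothetical, just to show how a single list row lands in an oversized texture:

```python
def texture_bytes(width, height):
    """Uncompressed RGBA texture size in bytes: width * height * 4."""
    return width * height * 4

def next_pow2(n):
    """Smallest power of two >= n; GPUs pad each texture dimension to this."""
    p = 1
    while p < n:
        p *= 2
    return p

def padded_bytes(width, height):
    """Bytes actually consumed once each dimension is padded to a power of 2."""
    return texture_bytes(next_pow2(width), next_pow2(height))

# A hypothetical 600x90 list row pads up to a 1024x128 texture:
print(next_pow2(600))         # -> 1024
print(padded_bytes(600, 90))  # -> 524288 bytes (~0.5 MB for one cached row)
```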
Hi, thanks for your article.
I’m facing a really big dilemma: switch to Stage3D, or stick with render mode GPU…
Yesterday I started to learn Starling, but after seeing this comparison, I think it’s better to stick with GPU, because I’m really comfortable with it. Learning Starling is like starting from zero.
What’s your opinion? I’m planning to build a side-scrolling game for iOS… have you tried the latest Starling 1.3, and how does it compare with GPU render mode?
I don’t want to talk about the other Stage3D engines. One thing I want to know: is Stage3D much better than GPU render mode, or is it about the same?
Thanks!!
If you’re making games, I’d go with Starling. It allows you a lot more control over things, and is faster in some cases.
GPU render mode is still great for making apps, or small-ish games, but something of significant size would probably be best done with Starling.
I’d be interested to see how Scaleform compares in these benchmarks – but maybe you’d have to buy a license to test it on a device, so not worth it just to generate a graph.
Nice, good idea, I’ll try that out.
[...] The performance is also quite good. My previous love of developing mobile applications “Starling”, barely even compares to the speed of Haxe NME. There is a nice comparison of NME against some other popular frameworks, take a look at the stats here http://esdot.ca/site/2012/runnermark-scores-july-18-2012. [...]