Fast Rendering in AIR: Cached SpriteSheets

In a previous post I showed how proper use of AIR 3.0’s GPU render mode can boost your frame rate by 500% on mobile devices. Here we’ll look at how you can do the same thing, and get even bigger gains, with your MovieClip animations. How would you like a 4000% boost in performance!?

The first time I saw this run I couldn’t believe what I was seeing… my iPad 2 was outperforming my 3.2 GHz quad-core desktop CPU by nearly 2x! Welcome to GPU mode for mobile…

If you haven’t read the previous post, it will help clarify the technique we’re using below. To summarize, the basic premise was to use a single BitmapData instance for each type of asset (library or embedded). We’d cache the BitmapData in a static property, and then all instances of that asset would share the same BitmapData.

In its simplest form, this is what we did:

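import flash.display.Bitmap;
import flash.display.BitmapData;
import flash.display.Sprite;

public class MyClass extends Sprite {
    //Shared by every instance of MyClass
    protected static var cache:BitmapData;

    public function MyClass(){
        //createCache() (not shown) draws the asset into a BitmapData
        if(!cache){ cache = createCache(); } //This will only ever run once.
        addChild(new Bitmap(cache)); //Bitmap wrappers share the same BitmapData
    }
}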

The difference now is that instead of sharing a single BitmapData, we’re going to share an array of BitmapData objects. Let that sink in, read it again. Ok. And we’re also gonna cache frame labels so we can have some gotoAndPlay() action :)

Now, there are actually a couple of ways this can be done. You can use PNG SpriteSheets, or you can dynamically cache your MovieClips at runtime using the draw() and gotoAndStop() APIs. There are pros and cons to both approaches, but in the name of simplicity, I’m going to focus on the PNG SpriteSheet approach.
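For reference, the runtime alternative might look roughly like the sketch below. It is not code from the sample project: MovieClipCacher, cacheFrames and the bounds parameter are hypothetical names, and the caller is assumed to supply one rectangle large enough to cover every frame so that all cached frames share the same size. It also assumes each frame is fully built by the time draw() runs; clips with nested timelines may need to wait a frame between gotoAndStop() and draw().

```actionscript
package {
    import flash.display.BitmapData;
    import flash.display.MovieClip;
    import flash.geom.Matrix;
    import flash.geom.Rectangle;

    // Hypothetical helper, not part of the attached project.
    public class MovieClipCacher {
        // Rasterizes every frame of mc into its own BitmapData.
        // bounds: a rectangle, in mc's coordinate space, covering the whole animation.
        public static function cacheFrames(mc:MovieClip, bounds:Rectangle):Array {
            var w:int = Math.max(1, Math.ceil(bounds.width));
            var h:int = Math.max(1, Math.ceil(bounds.height));
            // Shift the artwork so the top-left of bounds lands at (0,0)
            var m:Matrix = new Matrix(1, 0, 0, 1, -bounds.x, -bounds.y);
            var frames:Array = [];
            for (var i:int = 1; i <= mc.totalFrames; i++) { // timeline frames are 1-based
                mc.gotoAndStop(i);
                var frame:BitmapData = new BitmapData(w, h, true, 0x0);
                frame.draw(mc, m);
                frames.push(frame);
            }
            return frames;
        }
    }
}
```

The returned array plays the same role as the frames ripped from a PNG SpriteSheet: cache it once in a static property and let every instance share it.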

 

Step 1: MovieClips to SpriteSheets

The first step is to determine which of your assets need to become SpriteSheets. Anything that is repeated many times, or is rendered constantly on screen, should be made into a SpriteSheet. For items that are only displayed briefly, and only as a single instance, you can just let the normal Flash rendering engine do its job. This is one of the beautiful things about GPU render mode: not everything needs to be cached. You can cheat a lot (i.e. embed library animations directly), as long as you optimize what’s important.

Note: Transforms are extremely cheap on the display list with this method. So if you’re just scaling, rotating, or moving something, don’t make a SpriteSheet for it; just tween it instead. It’s only a little slower, and you save a ton of memory on the GPU. (Remember when you used copyPixels, and had to pre-cache rotations to make it run really fast? HA!)

Once you’ve decided which animations you want to accelerate, you’ll need to export them as a PNG sequence. We’ll use Zoe from gskinner.com to help. Zoe takes a SWF and renders each frame into a PNG SpriteSheet; it also inspects the timeline for any labels, and saves all the data in a JSON file.

The steps to do so are as follows:

  • Take your animation and move it into its own FLA. Save the FLA somewhere in your assets directory, and export the SWF.
  • Download and install Zoe: http://easeljs.com/zoe.html
  • Within Zoe, open the SWF you just exported; Zoe should auto-detect the bounds. Click “Export”.

Note: Zoe measures the main timeline to determine how many frames are in your SpriteSheet. It’s ok if your animation is nested in a MovieClip, but make sure to extend the main timeline to match its duration.

If everything went smoothly, you now have a JSON and a PNG file within your assets directory. On to step 2!

Step 2: Play back the SpriteSheets in Flash, really really fast.

The next step is to load the JSON and PNG files into Flash and play them back. We also want to make sure that all instances of a specific animation share the same SpriteSheet in memory; this is what will give us full GPU acceleration.

Including the JSON and bitmaps is simple:

[Embed("assets/Animation.png")]
public var AnimationImage:Class;
 
[Embed("assets/Animation.json", mimeType="application/octet-stream")]
public var AnimationData:Class;

Next you need a class to take these objects and figure out how to play them. This is essentially just a matter of analyzing the JSON file from Zoe, and cutting the big BitmapData up into small BitmapData objects. You also need to devise an API to play those frames, swapping the bitmapData each frame and respecting the basic MovieClip APIs.

I wrote a simple class to aid in this called SpriteSheetClip.

//Just pass in the data from Zoe...
var mc:SpriteSheetClip = new SpriteSheetClip(AnimationImage, AnimationData);
mc.gotoAndPlay("someLabel");
addChild(mc);
 
//For max performance, all cached sprites must be manually ticked
addEventListener(Event.ENTER_FRAME, onEnterFrame);
function onEnterFrame(event:Event):void {
    mc.step();
}

SpriteSheetClip directly extends Bitmap and emulates the MovieClip API. Without going over the entire class, the core code here is the caching and ripping of the SpriteSheets that are passed in. Notice how I use the JSON data to get frameWidth and frameHeight, and getQualifiedClassName for my unique identifier. After that it’s a simple loop:

public static var frameCacheByAsset:Object = {};

public function SpriteSheetClip(bitmapAsset:Class, jsonAsset:Class){

    _currentStartFrame = 1;
    var assetName:String = getQualifiedClassName(bitmapAsset);

    //If there's a cache use it, the textures are probably on the gpu already
    if(frameCacheByAsset[assetName]){
        frameCache = frameCacheByAsset[assetName].frames;
        frameLabels = frameCacheByAsset[assetName].labels;

        _frameWidth = frameCache[0].width;
        _frameHeight = frameCache[0].height;
    }
    //If not cached, rip frames from bitmapData and grab json
    else {
        //rip clip!
        var data:Object = JSON.parse(new jsonAsset().toString());
        var bitmap:Bitmap = new bitmapAsset();
        var spriteSheet:BitmapData = bitmap.bitmapData;

        _frameWidth = data.frames.width;
        _frameHeight = data.frames.height;

        frameLabels = data.animations;

        var cols:int = spriteSheet.width/_frameWidth|0;
        var rows:int = spriteSheet.height/_frameHeight|0;
        var l:int = cols * rows;
        frameCache = [];

        var m:Matrix = new Matrix();

        //Loop through all frames...
        for(var i:int = 0; i < l; i++){
            var col:int = i%cols;
            var row:int = i/cols|0;

            m.identity(); //Reset matrix
            m.tx = -_frameWidth * col;
            m.ty = -_frameHeight * row;
            //Draw one frame and cache it
            var bmpData:BitmapData = new BitmapData(_frameWidth, _frameHeight, true, 0x0);
            bmpData.draw(spriteSheet, m, null, null, null, true);
            frameCache[i] = bmpData;
        }

        //scale is defined elsewhere in the class (not shown here)
        _frameWidth *= scale;
        _frameHeight *= scale;

        //Add frameData array to the static cache
        frameCacheByAsset[assetName] = {
            frames: frameCache, //Cache the BitmapData frames
            labels: frameLabels //Cache frameLabels
        };
    }

    //Set the frame range in both branches, so cached instances know their length too
    _currentEndFrame = frameCache.length;
    numFrames = _currentEndFrame;

    //Show frame 1
    this.bitmapData = frameCache[_currentStartFrame-1];
}
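The playback half of the class isn’t shown above, so here is a minimal sketch of what the frame swapping could look like. This is not the actual SpriteSheetClip source: MiniSheetClip and its members are hypothetical, and it assumes each label maps to a [startIndex, endIndex] pair of zero-based frame indices, which may not match the exact layout of your Zoe export.

```actionscript
package {
    import flash.display.Bitmap;

    // Hypothetical, stripped-down player; not the actual SpriteSheetClip source.
    public class MiniSheetClip extends Bitmap {
        private var frames:Array;       // shared BitmapData frames, one per animation frame
        private var labels:Object;      // assumed shape: { labelName: [startIndex, endIndex] }
        private var startIndex:int = 0;
        private var endIndex:int = 0;
        private var index:int = 0;
        private var playing:Boolean = false;

        public function MiniSheetClip(frames:Array, labels:Object) {
            super(frames[0]);
            this.frames = frames;
            this.labels = labels;
            endIndex = frames.length - 1;
        }

        // Jump to the first frame of a labelled range and start looping it
        public function gotoAndPlay(label:String):void {
            var range:Array = labels[label] as Array;
            if (range == null) { return; } // unknown label: keep the current state
            startIndex = range[0];
            endIndex = range[1];
            index = startIndex;
            playing = true;
            showFrame();
        }

        public function stop():void {
            playing = false;
        }

        // Advance one frame; the owner calls this once per ENTER_FRAME
        public function step():void {
            if (!playing) { return; }
            index = (index >= endIndex) ? startIndex : index + 1;
            showFrame();
        }

        private function showFrame():void {
            bitmapData = frames[index]; // swap in the cached frame, nothing is redrawn
            smoothing = true;           // re-apply smoothing after swapping bitmapData
        }
    }
}
```

The point of the design is that step() only reassigns a reference to an already cached BitmapData, so no pixels are copied or redrawn while the animation plays.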

Now, using this class, we can make multiple copies of the same animation and run them extremely cheaply. You can run hundreds of animations, even on the oldest Android devices. On newer devices like the iPad 2 or Galaxy Nexus you can push upwards of 500-800 animations at once. Plus, scaling, alpha and rotation are all very cheap.

You probably noticed in the code that, for performance reasons, my class will not update itself. So if you call play() nothing will happen! Rather than have a bunch of enterFrame listeners, I put the responsibility on the parent class to call step() on all its running children, so there is a single enterFrame handler instead of hundreds.
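A parent that ticks its children could be sketched like this. AnimationLayer, addClip and removeClip are hypothetical names rather than part of the sample project; the only thing assumed about SpriteSheetClip is the step() method described above, and that the class lives in the same package.

```actionscript
package {
    import flash.display.Sprite;
    import flash.events.Event;

    // Hypothetical container that ticks every SpriteSheetClip it owns.
    public class AnimationLayer extends Sprite {
        private var clips:Vector.<SpriteSheetClip> = new Vector.<SpriteSheetClip>();

        public function AnimationLayer() {
            // One listener for the whole layer, instead of one per clip
            addEventListener(Event.ENTER_FRAME, onEnterFrame);
        }

        public function addClip(clip:SpriteSheetClip):void {
            clips.push(clip);
            addChild(clip);
        }

        public function removeClip(clip:SpriteSheetClip):void {
            var i:int = clips.indexOf(clip);
            if (i >= 0) { clips.splice(i, 1); }
            if (clip.parent == this) { removeChild(clip); }
        }

        private function onEnterFrame(event:Event):void {
            // Advance every registered clip by one frame
            for (var i:int = 0; i < clips.length; i++) {
                clips[i].step();
            }
        }
    }
}
```

Keeping the clips in a typed Vector means the handler is a plain loop over known objects, with no event dispatch per sprite.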

There’s a bit more to the class in terms of managing frames, so feel free to check it out in the attached source project. Be warned though, it’s a little buggy… I consider this a sample implementation rather than production code, but do as you will.

Note: In terms of workflow, once set up this is quite good. Zoe remembers all projects and settings, so it takes only about 10 seconds to update an animation FLA and re-export from Zoe.

Next up let’s run some benchmarks, and see how many of these we can push…

Benchmarks!

In this benchmark I add as many animations as possible while maintaining 30 fps.

I couldn’t get a good shot of it running on device, so here’s a boring video of what the benchmark looks like on PC:

I compared the SpriteSheetClip with a regular MovieClip, and also a copyPixels implementation. The results are impressive: over a 40x increase in speed over the stock MovieClip, and up to a 10x improvement over copyPixels. That’s a full order of magnitude faster than the previous ‘fastest’ method of rendering in Flash.

[Charts: benchmark results by device]

Here you can see that even on older Android devices like the Nexus One, you can actually get pretty great results. 150 animated sprites @ 30 fps is enough to make almost any 2D game. And when you look at the newer devices, it really becomes impressive: 735 animated sprites on an iPad 2 @ 30 fps!?

The full Flash Builder project can be downloaded here, please try it on your own devices and see how it runs. Let me know in the comments.

A word on memory management

One last thing I’d really like to stress is the importance of memory management. Because you’re now pushing things to the GPU, you need to always be conscious of your memory usage. Once you fill up the GPU’s RAM, it is forced to swap textures, and this will absolutely kill your GPU performance if it happens continuously.

This is the one major change you need to make to your thinking, and it’s the same one you’ll have to make if you eventually switch to Stage3D. Everything that is rendered is a texture. Textures are expensive, and refreshing textures is expensive. You need to have a clear understanding of this concept, and be focused on managing your textures (read: BitmapData objects) in a smart manner.

So, ok, we have limited memory, but how limited? I’ve read that the recommended target for iOS is 24 MB of texture memory. For uncompressed 32-bit bitmaps (4 bytes per pixel), that works out to about 6.3 million pixels, or roughly 2500px x 2500px; note that a full 4096 x 4096 texture would take 64 MB. So, if you imagine all your BitmapData objects for the current scene smushed into one big PNG, they should all be able to fit in about 2500 x 2500. Do that and you should run fairly well across iOS devices.
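A quick way to sanity-check a budget like this is to do the arithmetic in code. The helper below is a hypothetical sketch, not part of the sample project. It assumes textures are stored uncompressed at 4 bytes per pixel and ignores any padding the GPU may add, so treat the results as rough estimates.

```actionscript
package {
    // Hypothetical helper for estimating texture memory.
    public class TextureBudget {
        public static const BYTES_PER_PIXEL:int = 4; // 32-bit ARGB, uncompressed

        // Estimated bytes for one bitmap of the given size
        public static function bytesFor(width:int, height:int):Number {
            return width * height * BYTES_PER_PIXEL;
        }

        // Side length, in pixels, of the largest square that fits in a budget given in megabytes
        public static function maxSquareSide(budgetMB:Number):int {
            var pixels:Number = budgetMB * 1024 * 1024 / BYTES_PER_PIXEL;
            return int(Math.floor(Math.sqrt(pixels)));
        }
    }
}
```

Under those assumptions, TextureBudget.bytesFor(4096, 4096) / (1024 * 1024) comes to 64 (MB), and TextureBudget.maxSquareSide(24) comes to 2508 (pixels per side). Summing bytesFor() over every cached frame in a scene gives a rough total to compare against the target.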

Now, you’ll also find that some older Android devices have even smaller memory allowances. So you want to aim for the lowest possible memory footprint you can.

Always optimize your texture management as much as possible; this is probably the single biggest factor that will affect the smoothness of your rendering. If you’re going to spend time optimizing your app, a great place to spend it is minimizing your texture footprint. You can do this by using small textures that are tiled or repeated, or even by sharing textures across components and sprites.

One technique I used in SnowBomber was to scale my caches according to the device. In order for the game to run on the Nexus One, which has a very weak GPU, I scale my BitmapData down by 50% before caching it. This allowed me to pull off a playable 25 fps or so even on the Nexus One, something I firmly believed would be impossible. This was almost too easy: just a simple matrix passed to a draw call, and voila, I had dynamically sized textures at runtime…
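The SnowBomber code isn’t included here, but the idea can be sketched as follows. ScaledCache and scaleBitmapData are hypothetical names, and the scale factor is assumed to be chosen per device by the caller.

```actionscript
package {
    import flash.display.BitmapData;
    import flash.geom.Matrix;

    // Hypothetical helper; a sketch of scaling a frame down before caching it.
    public class ScaledCache {
        // Returns a copy of source drawn at the given scale (0.5 halves each dimension)
        public static function scaleBitmapData(source:BitmapData, scale:Number):BitmapData {
            var w:int = Math.max(1, Math.round(source.width * scale));
            var h:int = Math.max(1, Math.round(source.height * scale));
            var scaled:BitmapData = new BitmapData(w, h, true, 0x0);
            var m:Matrix = new Matrix();
            m.scale(scale, scale);
            scaled.draw(source, m, null, null, null, true); // last argument enables smoothing
            return scaled;
        }
    }
}
```

Halving each dimension cuts the texture’s memory to a quarter. To keep the sprite the same size on screen, the Bitmap that displays the scaled frame can be scaled back up, for example with bitmap.scaleX = bitmap.scaleY = 1 / scale, at the cost of some sharpness.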

 

32 Comments to “Fast Rendering in AIR: Cached SpriteSheets”

  1. Great write-up and benchmark comparison, Shawn! I absolutely love the graphs and demonstration vid. This is exactly how to blit for optimal performance on mobile devices. And the best part is all of those Sprite clips can be controlled and transformed via the ol’ fashion display list, which means you also don’t have to worry about clearing/redrawing the whole display list every frame. Another way to describe this technique is also “Partial Blitting”. The bitmapData = cachedBitmapData gives it the edge over bitmapData = copyPixels (from spritesheet, traditional blitting method).

    Perhaps one thing to add about sharing BitmapData, though, is that you can’t make alterations to the bitmap data without affecting all of the instances (you can alter the bitmap instance, however). This probably won’t be a problem for most users, but- in cases where you needed to alter specific parts of the image (such as transforming specific colors to indicate a player team) using copyPixels gives you this flexibility at the cost of performance.

    If your project requires detailed animations with a lot of frames or requires team-customization, you can save on memory and filespace at the cost of performance (nothing is going to beat GPU cached bitmapdata) and use Bitmap Armatures, or models that are animated by transforming individual body pieces via code or timeline animation. More info: http://www.indieflashblog.com/understanding-gpu-rendering-in-adobe-air-for-mobile.html

  2. Ooops…wrong article link. Sorry, this one discusses the pros/cons of different rendering techniques: http://www.adobe.com/devnet/games/articles/rendering-animated-models.html.

  3. Shawn, it’s not mentioned in your tutorial, but I notice in your code that you used StageQuality = StageQuality.LOW for all of your benchmarks. This alone will add a huge boost to performance using bitmaps with GPU render mode (particularly on iPad 1st generation devices) – without losing any image quality! The only downside here is that Bitmaps appear to tween on exact pixels vs. fractional pixels, meaning that very small animated objects moved short distances may appear to animate more jagged.

    I’m curious if you would consider adding another benchmark/comparison chart on your post that shows the effect of removing StageQuality.LOW. There is currently a major bug with AIR related to using StageQuality.LOW where TextFields that are anti-aliased for readability will become incorrectly scaled/displaced when added/removed from display list. It would be effective to showcase these benchmarks to Adobe to reinforce the importance of correcting this issue.

  4. [...] this step out yet, but it’s an extension of the class I offered up in Step 3: automatically convert each frame from a MovieClip to cached bitmap data (and store that stuff in the GPU). If I had animations in my most recent games, I would be all [...]

    • Daniel Dourado says:

      Hey, I think you should check out SwfSheet. It has more options than Zoe.

  5. Paul says:

    Hi Shawn,

    this sounds interesting, but I don’t understand it…

    You are creating a BitmapData for each animation frame, which makes a new texture to upload to the GPU for every frame… so what are the SpriteSheets for? In “real” GPU texturing, you would only show the “masked” section of a SpriteSheet as a single animation frame which is very fast and only uses one big texture. But you are creating a huge amount of BitmapDatas. Why should this be any faster than using a MovieClip with a timeline consisting of PNGs?

    Hope you can help me with this :)

    • shawn says:

      It’s not uploading a new texture each frame, it will upload 30 textures for a 30 frame animation, and never upload again.

      I can’t say what the flash player does behind the scenes, but it works and it works well. This method still outperforms Starling running off a single TextureAtlas.

      It’s an interesting idea, a MovieClip full of PNGs; it should run fast, I would assume… but it’s a weird workflow: export PNGs from one FLA, import into another. Seems like it’s faster to just do it the normal way with a SpriteSheet and let some code take care of the grunt work…

      I did mention at the top of the article, another way to do this would just be to run the draw() API on the movieclip.

    • shawn says:

      And an alternative I’ve seen used, which might be a little faster, would be to simply do a graphics.beginBitmapFill(), painting frames of your spritesheet just like you would on the gpu…

      I’m just not sure the cost of sampling the frame each time would outweigh the benefits of having it cached in an array ready to be assigned. In theory, this is a little less work for the GPU, and a little more for the CPU.

      In practice, gpu doesn’t seem to mind lots and lots of small textures…

  6. Daniel Dourado says:

    If I don’t scale the frames of the SpriteSheet, would it be faster to use copyPixels over the draw method? I know draw() is much slower than copyPixels when you draw a vector, but is it the same when drawing a bitmap?

    Thanks!

    • shawn says:

      Well, it would be slightly faster to use copyPixels within the main ripping loop, but that only runs once per asset, so speed is not really a concern there; that code never runs while your characters are animating.

      In terms of rendering each frame, copyPixels is a bad solution because it essentially creates a new texture each frame that must be uploaded to the GPU.

      • Daniel Dourado says:

        It is because in my game a character has about 120 frames of animation, and there are 20 characters. Using the draw method to draw the vectors for only 3 characters takes 15 seconds on iPad, so yes, speed is a concern there.

        But as I said, it draws a vector, I’d like to know how much slower is the draw method to draw a bitmap over copyPixels.

        Thanks.

        • shawn says:

          Test it and let me know…? Easy change.

          Remember that you don’t need to cache everything at once, and if you do, hide it behind a splash screen.

          Not sure why your draw calls are taking so long, I have many animations of 60-70 frames which take only ~20ms to draw() into an array.

          • Daniel Dourado says:

            I’ll test it.
            I think it is taking too long because each frame is about 100×60 pixels and they are very complex vectors…

          • shawn says:

            Ah, right. Ya that’s what ZOE is for, zoe will rip your swf’s into png’s, and then they are extremely fast to draw().

          • Daniel Dourado says:

            Test result:
            8600 frames of 100×60 pixels bitmap
            using draw method: 1.3 seconds.
            using copyPixels: 1.1 seconds.

            PC: Intel Q8200 4GB ram.

            So, as result I think when it comes to rasterize a bitmap, it is better to use draw method because it is almost the same speed as copyPixels and you can use matrix to scale and colorTransform etc.

  7. edi says:

    hello..
    can You tell me why starling example works very bad??
    did You check it out with air 3.2 ??

  8. Emil says:

    Here are some numbers for iPad 1.
    SpriteSheetClip ~ 360 instances (StageQuality.LOW)
    SpriteSheetClip ~ 170 instances (StageQuality.HIGH)
    CopyPixel ~ 24 instances (not specified in the test)
    Starling ~ 120 instances (StageQuality.LOW)
    Here are the very same ones running on iPad2
    SpriteSheetClip ~ 630 instances (StageQuality.LOW)
    SpriteSheetClip ~ 610 instances (StageQuality.HIGH)
    CopyPixel ~ 89 instances (not specified in the test)
    Starling ~ 285 instances (StageQuality.LOW)

    I’ll run some tests on the new iPad and will share the data as well.

    • Emil says:

      Here are the numbers for the new iPad:
      SpriteSheetClip ~ 764 instances (StageQuality.LOW)
      SpriteSheetClip ~ 761 instances (StageQuality.HIGH)
      CopyPixel ~ 89 instances
      Starling ~ 287 instances

  9. Marvin says:

    When I remove the spritesheetclip object from the display list, the image stays on the screen. Anyone else encountered this issue? Do I need to call something to dispose the image?

  10. Bruce says:

    My bitmaps don’t seem to hold their smoothing every time I set new data via the BitmapData. ie:
    trace(bitmap.smoothing); // true
    bitmap.bitmapData = frames[newFrame];
    trace(bitmap.smoothing); // false

    Simply assigning smoothing to true again makes it work just fine. Has anyone else run into this? Any side effects I should be aware of? I definitely want to implement this system.

    • shawn says:

      That’s just the way it works, no worries :) This is standard syntax for an animation:
      bitmap.bitmapData = frames[index];
      bitmap.smoothing = true;

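  11. [...] So how do these approaches compare in efficiency? Let’s look at the actual results through a test. The test was run using a test project provided by ESDOT (click here); the original project already included the bitmap animation and Starling parts, and the author made further performance optimizations and added ND2D and Genome2D parts. [...]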

  13. [...] Add a comment [Author: Dom Chen, Category: Personal Log] Following on from the previous post, original link: Fast Rendering in AIR: Cached SpiteSheet [...]

  15. [...] The first question to ask is whether to use GPU or CPU mode. There are a lot of info about this already. Basically to me, if you have a lot of animations, consider using GPU, shared bitmapdata and spritesheets. My game has 14 types of customers, up to 8 working family members, 5 different room types, 12 amenities plus a bunch of menus, texts and icons. It was difficult trying to get it to work for mobile but I found a method which worked quite nicely: Spritesheetclip [...]

  16. soham says:

    This and the previous part are the most helpful articles I have read, as I am facing framerate issues with my new mobile app for kids. I want to thank you for sharing all this excellent info and results. I would like to ask your opinion on something specific to my project:

    I have a rather large scene: 3 iPhone screens wide and 2 screens in height. The scene is draggable and can be panned around to view different parts of the scene. There are animals walking all over. I am using GPU mode as it is giving the best performance, since I have lots of animations running in the scene.

    Q : how can i reduce the performance impact by not rendering the part of the scene which is not visible (at a time only one screen is visible while scrolling). will setting visible = false on the sprites which are not in the scene and dynamically setting it to true as they come into the scene help for gpu mode?

    thanks.

  17. Dymitr says:

    Hi Shawn

    Thanks for this great post, I learned more about optimization from this one post than from all others articles i read b4. THX MATE !

  18. Julian says:

    Hi, i have a game already made using cs pro5.5 and flash develop. I have tried over and over to get this to work even using your files but am getting all sorts of errors. Once i get past 1 i hit another. I have the spritesheet, i have uploaded it into cspro and given it as linkage but i just can’t get it to animate through the sprites that i need. As a general rule i make myself a template then just chop and change, ie different sheet etc but this has got me stumped (though i admit to being new to flash). Any advice would be welcome thanks

  19. Saar says:

    ” you could dynamically cache your MovieClip’s at runtime, by using the draw() and gotoAndStop() API’s”

    hey all,
    that is what i am aiming for. my concern is looping through the movieclip’s frames. If I have these lines of code:
    mc.gotoAndStop(i);
    bitmapData.draw(mc);

    I can’t be sure the frame got “constructed” before being drawn, and my tests with an Android tablet prove this right – sometimes frames aren’t drawn.

    This mc is off the display list obviously (we dont need to render to the screen). So is there a way to make sure the frame has been built before drawing it to a bitmapdata? (and waiting for FRAME_CONSTRUCTED, EXIT_FRAME, etc.. is obviously slow and unneeded)

    any ideas?
    thanx
    Saar

  20. Saar says:

    So sorry, MY BAD

    waiting for FRAME_CONSTRUCTED is the answer!

  21. Max says:

    This is awesome! Thanks so much! I was beginning to think there wasn’t a good way to use traditional texture atlases on mobile.

    One issue I’m running into:

    I’m using the cached bitmap frames as background tiles for a scrolling background game. Whereas before I’d have drawn each tile with copyPixels to one large background canvas (and then scrolled the canvas), now my canvas is composed of a grid of sprites which get their respective bitmapData values on-the-fly as cached frames.

    This seems to work fine with the exception of some visible seams which appear between the tiles. The seams appear every 4 tiles or so and get slightly better (hairline) if I set the resolution to ‘high’ in the export settings– that said they’re still pretty noticeable. I’ve tried forcing the bitmapData smoothing property to true after assigning the tiles and still no dice.

    Have you run into anything like this? How would you go about stitching a few frames together to form a ‘seamless’ texture?

    Thanks!

    • shawn says:

      I haven’t tried this myself, but some things you could try:
      1. Check PixelSnapping property, and see if this helps one way or another.
      2. Try only scrolling your bg on whole pixels: bg.x = xPosition|0;
      3. Try overlapping all edges by 1px (if the art allows it)
