When I started working on Loose Leaf, I quickly realized that the hardest part of the app was finding a fast and efficient drawing UIView that I could add into my app. There were a fair number of tutorials that would solve the first 80%, but there were always some important pieces left as an exercise for the reader: smooth curves, removing lag and stutter, or saving and loading without blocking the main thread.
All of the code I found could do some, but not all, of the following:
- Nice smooth curving lines instead of rigid line segments
- Variable width/opacity depending on pressure/velocity
- Undo and Redo
- Save and load asynchronously without lag while drawing
- Low memory footprint
After trying innumerable existing options, I decided to build my own drawing engine to deliver all of these features in one framework. The JotUI framework is the result.
If all you need is the drawing view, head straight to the GitHub project page for the code and sample project. If you’re curious about how it was built and how I navigated those trade-offs, then read on below.
Building a Drawing View for iOS
For smooth curves, I had a few requirements that I wanted to keep:
- The line should always curve through the touch points exactly
- The line should always pass through the most recently added point as soon as it’s added
- Bonus points if it’s configurable for how hard/soft the smoothing is
It turns out that doing all of those things is harder than it sounds!
After trying probably tens of options for a smoothing algorithm, I finally found a single algorithm that satisfied all of these constraints. The algorithm is defined here by Maxim Shemanarev, which I found through Sean Christmann’s post.
The smoothing code is integrated in SegmentSmoother::addPoint:andSmoothness:.
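To make the idea concrete, here is a minimal Python sketch of the control-point construction as I understand it from Shemanarev's article. It is not the SegmentSmoother code from JotUI; the function name `control_points` and the default smoothness value are assumptions made for illustration. Given four consecutive touch points, it returns the two Bezier control points for the middle segment, so the curve still passes exactly through the touch points.

```python
import math


def control_points(p0, p1, p2, p3, smoothness=0.7):
    """Return the two cubic Bezier control points for the segment p1 -> p2.

    p0 and p3 are the neighbouring touch points. smoothness=0 gives a
    straight segment; larger values give a rounder curve.
    (Hypothetical helper, not part of JotUI.)
    """
    # Midpoints of the three line segments around p1 -> p2.
    c1 = ((p0[0] + p1[0]) / 2, (p0[1] + p1[1]) / 2)
    c2 = ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2)
    c3 = ((p2[0] + p3[0]) / 2, (p2[1] + p3[1]) / 2)

    len1 = math.hypot(p1[0] - p0[0], p1[1] - p0[1])
    len2 = math.hypot(p2[0] - p1[0], p2[1] - p1[1])
    len3 = math.hypot(p3[0] - p2[0], p3[1] - p2[1])

    # Split the midpoint-to-midpoint lines in proportion to the
    # neighbouring segment lengths (fall back to 0.5 for repeated points).
    k1 = len1 / (len1 + len2) if len1 + len2 else 0.5
    k2 = len2 / (len2 + len3) if len2 + len3 else 0.5
    m1 = (c1[0] + (c2[0] - c1[0]) * k1, c1[1] + (c2[1] - c1[1]) * k1)
    m2 = (c2[0] + (c3[0] - c2[0]) * k2, c2[1] + (c3[1] - c2[1]) * k2)

    # Scale by the smoothness and translate so the anchors sit on p1 and p2.
    ctrl1 = (m1[0] + (c2[0] - m1[0]) * smoothness + p1[0] - m1[0],
             m1[1] + (c2[1] - m1[1]) * smoothness + p1[1] - m1[1])
    ctrl2 = (m2[0] + (c2[0] - m2[0]) * smoothness + p2[0] - m2[0],
             m2[1] + (c2[1] - m2[1]) * smoothness + p2[1] - m2[1])
    return ctrl1, ctrl2
```

With a smoothness of 0 the control points collapse onto the touch points and the segment is a straight line, which is what makes the hard/soft setting configurable.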
Pressure and Velocity
Now that we have a nice curved Bezier path, it’s time to add some body to it. At each touch point along the curve, I can use the pressure to determine the width at that point. So I only need to smooth between widths at each point – shouldn’t be too hard…. right?!
This post by Akeil Khan describes the first strategy that I looked at. Essentially: at each point, find two points perpendicular to the line so that we can define the outline of the path, instead of its centerline. Then, in the case of Core Graphics, we can fill that path, or we can use a triangle strip to render it in OpenGL.
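A rough Python sketch of that outline idea follows. It is only an illustration under simple assumptions: the function name `outline` is made up, and the direction at each point is estimated from its two neighbours rather than from the curve's true tangent.

```python
import math


def outline(points, widths):
    """Return (left, right) edge points offset perpendicular to the centerline.

    points is a list of (x, y) centerline samples and widths holds the
    stroke width at each sample. (Hypothetical helper for illustration.)
    """
    left, right = [], []
    last = len(points) - 1
    for i, (x, y) in enumerate(points):
        # Approximate the stroke direction from the neighbouring samples.
        ax, ay = points[max(i - 1, 0)]
        bx, by = points[min(i + 1, last)]
        dx, dy = bx - ax, by - ay
        length = math.hypot(dx, dy) or 1.0  # avoid dividing by zero
        # Unit normal: the direction rotated by 90 degrees.
        nx, ny = -dy / length, dx / length
        half = widths[i] / 2
        left.append((x + nx * half, y + ny * half))
        right.append((x - nx * half, y - ny * half))
    return left, right
```

Walking `left` forwards and then `right` backwards gives a closed polygon to fill, while alternating `left[i]` and `right[i]` gives the vertex order for a triangle strip.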
Up until this point, I had been using Core Graphics to render my lines, but I found that the computation necessary to calculate this path outline and fill it was simply too slow for what I needed. This wasn’t the fault of the outline algorithm as much as it was the performance of Core Graphics.
If I drew on the main thread, it was slow enough to lag the following touch inputs. And if I drew on the background thread, then the touch inputs wouldn’t lag but the round trip from main=>background=>main for the view refresh was long enough that the stroke appeared to follow the user’s finger after a delay. This also only let me interpolate between widths at each point, but I wasn’t able to interpolate between opacity. Also, sharp turns – like at the top of a cursive lowercase b or d – had uncomfortably flat tops.
It was here that I found the GLPaint example app from Apple. This example app doesn’t smooth between touch points, doesn’t use variable width or opacity for touch inputs, doesn’t do much of anything that I need, except: it renders lines entirely differently than Core Graphics. Instead of filling a bounded shape with the same color, it draws tons of points along the line – each point rendered independently of the one before it – and together those points look and act like a single stroke.
I built on this base code, added in the smoothing algorithm from before, and researched how to add variable radius and opacity for each of the points along the path. With these additions, I had a viable rendering view that could:
- Smooth the curve between touch points
- Interpolate width and opacity along the line
- Bonus: Supports different brush textures and shapes
The code for this is in CurveToPathElement::generatedVertexArrayWithPreviousElement:forScale:.
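The stamping approach can be sketched in a few lines of Python. This is not the JotUI vertex generator: the names `bezier_point` and `stamp_vertices` are invented here, the width and opacity are interpolated linearly, and the step count is passed in directly, whereas a real renderer would more likely derive it from the curve length and screen scale.

```python
def bezier_point(p0, c1, c2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1 - t
    x = u**3 * p0[0] + 3 * u * u * t * c1[0] + 3 * u * t * t * c2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u * u * t * c1[1] + 3 * u * t * t * c2[1] + t**3 * p3[1]
    return x, y


def stamp_vertices(p0, c1, c2, p3, width0, width1, alpha0, alpha1, steps):
    """Return (x, y, size, alpha) tuples for point sprites along the curve.

    Each tuple is one independent "stamp" of the brush; width and opacity
    are blended linearly from the start values to the end values.
    (Hypothetical helper; steps must be at least 1.)
    """
    vertices = []
    for i in range(steps + 1):
        t = i / steps
        x, y = bezier_point(p0, c1, c2, p3, t)
        size = width0 + (width1 - width0) * t
        alpha = alpha0 + (alpha1 - alpha0) * t
        vertices.append((x, y, size, alpha))
    return vertices
```

Because every stamp carries its own size and alpha, a brush texture can be drawn at each one, which is what makes different brush shapes possible with the same data.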
Next up was undo & redo, saving asynchronously, and optimizing for memory.
Undo and Redo
The example app from Apple didn’t support undo/redo; it just wrote directly to an OpenGL view’s layer – not even something I knew how to quickly/easily get a PNG out of. Despite that, I was able to add in undo/redo support by adding in a cache of stroke objects before they were written to the underlying texture.
Instead of only writing directly to the view’s context, I also added in a backing texture context. Every time the user’s finger/stylus lifted from the screen, I’d push another stroke onto a queue. After the queue grew longer than the max undo count, I’d write a stroke from the queue to the permanent backing texture – exactly the same as if I was writing to the screen, except this time it was to a separate texture.
Then, if the user wanted to undo a stroke, I could calculate the bounding rect of the top stroke on the queue, clip the screen rendering to that box, draw the base texture and then all of the strokes still in the undo queue. Doing something like this in Core Graphics would take 100s of milliseconds, but doing it directly in OpenGL was significantly faster – I was super excited by how quickly I could undo and redo multiple strokes one after another without any noticeable lag for the user.
The code for this is in JotView::undo.
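Stripped of all rendering, the bookkeeping looks roughly like the Python model below. It is a simplified sketch, not the JotView implementation: the class `UndoableCanvas` and its method names are hypothetical, strokes are plain lists of (x, y) points, and the backing texture is stood in for by a list of strokes that can no longer be undone.

```python
class UndoableCanvas:
    """Toy model of a backing texture plus a bounded undo queue."""

    def __init__(self, max_undo):
        self.max_undo = max_undo
        self.backing = []    # strokes already flattened into the backing texture
        self.undoable = []   # finished strokes that can still be undone
        self.redoable = []   # undone strokes that can be redone

    def end_stroke(self, stroke):
        """Called when the finger or stylus lifts from the screen."""
        self.undoable.append(stroke)
        self.redoable.clear()
        # Strokes beyond the undo limit become permanent.
        while len(self.undoable) > self.max_undo:
            self.backing.append(self.undoable.pop(0))

    def undo(self):
        """Remove the newest stroke and return the rect that needs redrawing."""
        if not self.undoable:
            return None
        stroke = self.undoable.pop()
        self.redoable.append(stroke)
        xs = [x for x, _ in stroke]
        ys = [y for _, y in stroke]
        return (min(xs), min(ys), max(xs), max(ys))

    def redo(self):
        if self.redoable:
            self.undoable.append(self.redoable.pop())

    def visible_strokes(self):
        """What the clipped redraw paints: backing first, then the queue."""
        return self.backing + self.undoable
```

The rect returned by `undo()` is the region a renderer would clip to before repainting the backing texture and the strokes still in the queue.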
Saving and Loading Asynchronously
At this point, I had an OpenGL view that had an OpenGL texture and a stack of OpenGL points sitting in memory, and I needed a way to get these out of memory and on to disk as quickly as possible while still allowing the main thread to write to and mutate these data structures.
I set up a background timer that would fire a method continuously on a background thread. If the undo queue was larger than the max number of undo items, then that background thread would: lock the backing texture, render undo items onto that texture until the count was back below the maximum number, then unlock the backing texture. This meant that the main thread and background thread would wait on each other if either was trying to read or write the backing texture.
To make sure the background thread never entirely blocked the main thread, I set a limit on the number of undo items to render in one pass – so if the background thread was taking too long it would punt back to the main thread and just pick up where it left off next time the timer fired. This kept the main thread responsive no matter how much data was queued to be written to disk.
The code for this is in JotView::validateUndoState.
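The locking and the per-pass limit can be modelled with a short Python sketch. It is an illustration only: `BackingStore`, `add_stroke`, and `tick` are hypothetical names, the "rendering" is just moving an item between lists, and the timer that would call `tick()` on a background thread is left out.

```python
import threading


class BackingStore:
    """Toy model of flattening old strokes in bounded background passes."""

    def __init__(self, max_undo, max_per_tick):
        self.lock = threading.Lock()
        self.max_undo = max_undo
        self.max_per_tick = max_per_tick
        self.undo_queue = []   # strokes that can still be undone
        self.flattened = []    # strokes rendered into the backing texture

    def add_stroke(self, stroke):
        """Main thread: only holds the lock long enough to append."""
        with self.lock:
            self.undo_queue.append(stroke)

    def tick(self):
        """Background timer: flatten at most max_per_tick strokes per pass."""
        done = 0
        with self.lock:
            while len(self.undo_queue) > self.max_undo and done < self.max_per_tick:
                self.flattened.append(self.undo_queue.pop(0))
                done += 1
        # Anything left over waits for the next tick.
        return done
```

Capping the work per pass bounds how long the lock is held, so the main thread never waits for more than one small batch.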
Keeping Memory Low
In addition to optimizing CPU performance and keeping the main thread as clear as possible, I also aimed for the library to keep a very low memory footprint. A drawing view isn’t helpful if after a few saves and loads it crashes the app because of high memory – and working with uncompressed retina-screen sized textures meant this was a real threat.
I worked to optimize memory use in two different ways:
- minimizing the data for rendering each stroke
- re-using full-screen textures when possible
As I worked to minimize memory use, I found that even if I was diligently freeing unneeded memory – that didn’t mean that the system would reuse that memory on a later malloc(). And it makes sense in retrospect: If I allocate/deallocate memory quickly over and over and over, the system is going to have a harder and harder time finding chunks of memory to give me as the pages of memory become more fragmented. Stroke data makes this issue even more apparent: there are lots of strokes to allocate, each made up of lots of smaller chunks to allocate, meaning lots and lots of malloc() and free() of very small pieces of memory.
To combat this, I set up two caches: one for screen-sized textures, and one for the stroke data.
In the texture cache, I know that I’m only ever going to create textures up to the size of the screen, but never larger. In some situations, like generating thumbnails, I may generate smaller textures, but never larger ones. To prevent multiple allocation/deallocations, I only ever allocate memory chunks large enough to fit an entire screen texture into memory. Since smaller textures can only use smaller amounts of memory, it’s safe to just return that one single-sized block of memory for any texture use.
This way, small textures do get too much memory sometimes, but only briefly before they return it to the cache. Only a few textures are ever alive at a time, keeping total texture allocation very small.
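As a sketch of that single-size pool, here is a small Python model. The class `TextureBufferCache` and its methods are hypothetical and use `bytearray` in place of real texture memory; the point is only that every request, large or small, is served from the same pool of screen-sized buffers.

```python
class TextureBufferCache:
    """Pool of identically sized buffers, each big enough for a full screen."""

    def __init__(self, screen_width, screen_height, bytes_per_pixel=4):
        self.buffer_size = screen_width * screen_height * bytes_per_pixel
        self._free = []
        self.allocations = 0  # how many buffers were ever created

    def take(self, width, height, bytes_per_pixel=4):
        """Return a buffer for a texture no larger than the screen."""
        if width * height * bytes_per_pixel > self.buffer_size:
            raise ValueError("texture is larger than the screen")
        if self._free:
            return self._free.pop()       # reuse instead of allocating
        self.allocations += 1
        return bytearray(self.buffer_size)

    def give_back(self, buffer):
        """Return a buffer to the pool once its texture is no longer needed."""
        self._free.append(buffer)
```

A thumbnail that borrows a full-size buffer wastes some space for a moment, but no new allocation happens as long as a free buffer is in the pool.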
Each stroke is composed of smaller stroke segments, and each segment handles its own malloc() to store its points’ data. Since segments are fairly small, maybe a few hundred bytes, I first tried allocating these small chunks of memory independently. Interestingly, even if I only malloc()’d 100 bytes, I found that the smallest amount of memory allocated was always a 2kb page. And just like the texture cache, I found that the system was more apt to allocate fresh blocks of memory for new allocations instead of reusing old freed memory. For hundreds of segments across hundreds of strokes during a drawing session, this could add up in a hurry.
To optimize this, I set up a two-layered cache: one for the 2kb blocks of memory, and another for smaller units of that 2kb chunk. This let a 2kb allocation in the first cache be used as four 500-byte items in the second cache, or even twenty 100-byte items. In this way, the stroke cache was kept much smaller than it would’ve been otherwise, and caused far less churn and fragmentation in memory as well.
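The two layers can be illustrated with the Python sketch below. It is a model of the idea rather than JotUI's allocator: `BlockCache` and `SlotCache` are made-up names, `bytearray` and `memoryview` stand in for raw memory, and handing a whole block back to the first layer once all of its slots are free is not tracked here.

```python
BLOCK_SIZE = 2048  # the 2kb unit described above


class BlockCache:
    """First layer: hands out and takes back whole 2kb blocks."""

    def __init__(self):
        self._free = []
        self.allocated = 0  # how many blocks were ever created

    def take(self):
        if self._free:
            return self._free.pop()
        self.allocated += 1
        return bytearray(BLOCK_SIZE)

    def give_back(self, block):
        self._free.append(block)


class SlotCache:
    """Second layer: carves blocks into fixed-size slots for stroke segments."""

    def __init__(self, blocks, slot_size):
        self.blocks = blocks
        self.slot_size = slot_size
        self._free = []

    def take(self):
        if not self._free:
            # Split one block into as many whole slots as fit.
            block = memoryview(self.blocks.take())
            last_start = BLOCK_SIZE - self.slot_size
            for offset in range(0, last_start + 1, self.slot_size):
                self._free.append(block[offset:offset + self.slot_size])
        return self._free.pop()

    def give_back(self, slot):
        self._free.append(slot)
```

Each slot is a view into its parent block, so taking and returning segment storage never touches the system allocator after the block itself exists.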
There’s lots more love that has gone into JotUI that I haven’t talked about here – things like CPU optimization, brush textures, brush rotation, file and disk caching, and a lot more – but I hope this gives you a taste for what’s in the repo.
If you decide to use JotUI in one of your projects, I’d love for you to reach out and let me know @adamwulf!
And of course – support more open source code and download Loose Leaf today! 🙂