About
This is the M Cubed Software weblog. To find out more about us head to our about page.
Syntax Colouring Blues… and Reds… and Greens…
Posted on 14/11/2007 at 04:44 AM
With development on Minim 1.2 winding down, I’ve been working on the next update to Code Collector Pro, version 1.1. It went into beta this week and we’re hoping to have it ready for release not long after Minim 1.2. One of the big improvements in 1.1 is the syntax colouring, which I haven’t been happy with since the release of 1.0. It was slow and not very accurate: one 500-line test snippet took up to 8.5 seconds to colour in 1.0. Version 1.0.1, released soon after, fixed a few bugs and almost halved the time to render this large snippet, to around 4.5 seconds. That was still too slow to be acceptable, so I’ve spent the past few days rewriting much of the syntax colouring engine.
Step 1: Figure out what I was smoking
Sometimes I code late into the night. I almost always regret these sessions, because they produce code that works, but not in any obvious way. I must’ve written parts of the syntax colouring engine very late indeed, because looking back they made no sense, even with lots of comments. For example, instead of using a single regex to match a whole string, I was using a regex to find the start of a string, then using NSScanner to find its end, and then colouring the range in between. I was doing something similar for comments.
Before changing any code I measured how fast the system currently was. I have a small test set of snippets, a mixture of my own code and a friend’s. Here are the colour times, in seconds, for those snippets:
| Snippet | Time (s) |
| --- | --- |
| Ideal Color | 0.016 |
| Zend | 1.366 |
| Askpass | 0.483 |
| SystemConfig | 0.152 |
| 250 Obj-C | 0.683 |
| 500 Obj-C | 4.533 |
Some of these aren’t too bad, but two take over a second to render. So I set to work re-coding strings and comments, replacing a lot of messy code with significantly less, and cleaner, code, which shaved 1.5 seconds off the large Obj-C snippet.
In the testing process I also found that a few items in TextMate bundles were, for some reason, taking up a large share of the rendering time (almost a second). These were relatively minor bits of syntax that were often defined elsewhere in the bundle anyway. Commenting out a small bit of code took another 0.8 seconds off the colouring time of the 500-line snippet. That still left 2.2 seconds to render the large snippet: a huge improvement, but still an eternity.
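As a rough illustration of the single-regex approach (not the actual Code Collector Pro code, which is Objective-C on top of AGRegex), here is a minimal Python sketch. The patterns assume a C-like language with double-quoted strings and `//` and `/* */` comments; `find_spans` and the pattern names are made up for this example.

```python
import re

# Hypothetical patterns for a C-like language; real grammars are more involved.
STRING = r'"(?:\\.|[^"\\\n])*"'      # opening quote, escaped or plain chars, closing quote
LINE_COMMENT = r'//[^\n]*'           # from // to the end of the line
BLOCK_COMMENT = r'/\*.*?\*/'         # shortest run between /* and */

# One combined pattern: whichever construct starts first in the text wins,
# so a // inside a string or a quote inside a comment is not misread.
TOKEN = re.compile(
    f'(?P<string>{STRING})|(?P<comment>{LINE_COMMENT}|{BLOCK_COMMENT})',
    re.DOTALL,
)


def find_spans(source):
    """Return (kind, start, end) for every string and comment in source."""
    return [(m.lastgroup, m.start(), m.end()) for m in TOKEN.finditer(source)]


if __name__ == "__main__":
    sample = 'a = "x // y"; // note "z"\n/* b\nc */ d = "q";'
    for kind, start, end in find_spans(sample):
        print(kind, repr(sample[start:end]))
```

Each match already covers the whole string or comment, so there is no second scanning step to find where it ends.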
Step 2: Blame someone else’s code
At this point I started looking at the regex engine I was using, AGRegex. I was pretty sure something in the framework was slowing down the colouring, as I couldn’t see how my code alone could be responsible for 2.2 seconds. This was a good time to try out one of the new toys that comes with Leopard: Instruments. A little playing around later, I had a pretty good idea that my assumption was (mostly) right. Most of the processing time seemed to go on conversions to UTF8 strings within AGRegex.
Looking through the code, most of the conversions were on different strings, so I couldn’t change much. Luckily I did find one line that was an issue: the same string was being converted to a UTF8 string twice in one line. Well, we can’t have that, can we? A small change and a recompile later, I had shaved another 0.7 seconds off the colouring time. I had been right to blame AGRegex, but I still had 1.5 seconds to excuse away. Then something hit me…
| Snippet | Time (s), speed-up over starting time |
| --- | --- |
| Ideal Color | 0.012 (1.33x faster) |
| Zend | 0.384 (3.56x) |
| Askpass | 0.175 (2.76x) |
| SystemConfig | 0.064 (2.38x) |
| 250 Obj-C | 0.258 (2.65x) |
| 500 Obj-C | 1.529 (2.96x) |
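The duplicated UTF8 conversion found in Step 2 is the kind of thing that is easy to sketch. The following Python is purely illustrative and is not AGRegex’s code: `CountingText`, `slice_before` and `slice_after` are hypothetical names, and the wrapper exists only to count how often the conversion runs.

```python
class CountingText:
    """Wraps a str and counts how many times it is converted to UTF-8."""

    def __init__(self, text):
        self.text = text
        self.conversions = 0

    def utf8(self):
        self.conversions += 1
        return self.text.encode("utf-8")


def slice_before(subject, offset):
    # Before: the same string is converted twice in one line.
    return subject.utf8()[offset:], len(subject.utf8())


def slice_after(subject, offset):
    # After: convert once and reuse the result.
    utf8 = subject.utf8()
    return utf8[offset:], len(utf8)
```

Both functions return the same value; the second simply does half the conversion work, which adds up when the line runs once per regex match.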
Step 3: Curse the facts of life
... my hand on my forehead. It had taken this long to realise that the main problem with my code wasn’t so much its efficiency as its scalability. Why did the 250-line snippet take 0.26 seconds when a snippet twice the size took 6 times as long? Because when you are performing several regexes on code, the time taken doesn’t grow linearly with the amount of text but much faster than that. I had finally figured out the core problem with my system, and the fix was easy.
Not long after, I had my engine eating the code in chunks of up to 5000 characters, rather than swallowing whatever it was given whole. Great, now my code can be fast even for large snippets. Let’s test it and compare it to what we had when we started:
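To put a rough number on how steep that growth is, one can assume the time scales like some power of the snippet size and estimate the power from the two Obj-C timings in the Step 2 table. This is only a back-of-the-envelope sketch: the two snippets are different code, two data points cannot pin down a curve, and `growth_exponent` is a name invented for the example.

```python
import math


def growth_exponent(size_small, time_small, size_large, time_large):
    """Estimate k in time ~ size**k from two (size, time) measurements."""
    return math.log(time_large / time_small) / math.log(size_large / size_small)


# 250-line and 500-line Obj-C snippets, timings taken from the Step 2 table.
k = growth_exponent(250, 0.258, 500, 1.529)
print(f"time grows roughly like size**{k:.1f}")
```

An exponent of 1 would mean linear growth; anything well above 1 means a large snippet costs far more than its share, which is why cutting the input into smaller pieces pays off.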
| Snippet | Time (s), change from starting time |
| --- | --- |
| Ideal Color | 0.025 (1.56x slower) |
| Zend | 0.318 (4.29x faster) |
| Askpass | 0.112 (4.31x faster) |
| SystemConfig | 0.289 (1.90x slower) |
| 250 Obj-C | 0.315 (2.16x faster) |
| 500 Obj-C | 0.547 (8.29x faster) |
OK, that’s not good… Some snippets are slower than they were before we broke up the code, and some are even slower than when we started messing around. On the plus side, we’ve sped up our large snippet by over 8 times. Well, it shouldn’t be too hard to fix, should it? Just use the old method for smaller snippets, thereby avoiding all of the extra processing needed for larger snippets. Let’s try again:
| Snippet | Time (s), speed-up over starting time |
| --- | --- |
| Ideal Color | 0.012 (1.33x faster) |
| Zend | 0.308 (4.44x) |
| Askpass | 0.107 (4.51x) |
| SystemConfig | 0.059 (2.58x) |
| 250 Obj-C | 0.284 (2.40x) |
| 500 Obj-C | 0.524 (8.65x) |
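The combination described in Step 3, chunks of up to 5000 characters for large snippets and the old single-pass method for small ones, might look something like the Python sketch below. Several things here are assumptions rather than details from the engine itself: chunks are broken at line ends, the small/large cut-off is taken to be the same 5000 characters, and `colour_span` is a hypothetical callback standing in for the real colouring code. A multi-line comment that straddles a chunk boundary would need extra care that this sketch ignores.

```python
CHUNK_LIMIT = 5000  # characters; the cut-off for the single-pass path is assumed equal


def split_into_chunks(text, limit=CHUNK_LIMIT):
    """Split text into pieces of at most `limit` characters, at line ends where possible."""
    chunks, current, size = [], [], 0

    def flush():
        nonlocal current, size
        if current:
            chunks.append("".join(current))
            current, size = [], 0

    for line in text.splitlines(keepends=True):
        # A single line longer than the limit is cut into hard slices.
        while len(line) > limit:
            flush()
            chunks.append(line[:limit])
            line = line[limit:]
        if size + len(line) > limit:
            flush()
        current.append(line)
        size += len(line)
    flush()
    return chunks


def colour(text, colour_span, limit=CHUNK_LIMIT):
    """Colour small snippets in one pass and large ones chunk by chunk.

    `colour_span(piece, offset)` is a hypothetical callback that returns a
    list of spans whose positions are already shifted by `offset`.
    """
    if len(text) <= limit:
        return colour_span(text, 0)  # old method: no chunking overhead
    spans, offset = [], 0
    for chunk in split_into_chunks(text, limit):
        spans.extend(colour_span(chunk, offset))
        offset += len(chunk)
    return spans
```

Because the pieces are joined back from whole lines, concatenating the chunks always reproduces the original text, so offsets reported per chunk map straight back onto the full snippet.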
And those are the current speeds for Code Collector Pro 1.1b1. Obviously some snippets will see a bigger benefit than others (mostly large ones with lots of strings and comments). On top of the increase in speed comes increased accuracy. And this is just one of the cool things we’ve done for Code Collector Pro 1.1. Keep an eye on this blog over the next few weeks as we reveal more.