A Character Encoding Glitch In Messages.app

By now, a lot of Mac users are probably familiar with Messages, Apple’s new Instant Messaging app. One of the features which this app stealthily removed was formatting: that is, all incoming messages are automatically stripped of bold, italic, and various other font tags. But, being the poorly coded program that it is, Messages sometimes fails to reformat an incoming message. I decided to investigate this further.

I first noticed that Messages didn’t always reformat incoming messages when my friend sent me a quote which was copied from a website. For some reason, the text remained bold when I saw it on my machine. I decided to fire up the LibOrange Xcode project that I already had lying around on my computer in order to log the HTML of the message which had somehow eluded Apple’s reformatting.

At first glance, the body of the message seemed ordinary: no tags stood out, and in fact the HTML itself had nothing to do with why Messages was failing to reformat. However, the message did contain one unusual attribute which allowed it to slip past the reformatting process.

As a quote, the message was not only formatted, but also surrounded by ” characters. And, as many fellow copy-and-pasters know, most websites use special characters for open/close quotes; these characters are not the standard ASCII quote character, but rather Unicode characters outside the ASCII range. When I realized that these fancy quotes surrounded the message, I came to a conclusion: non-ASCII Unicode characters break Messages’ reformatter.

I tested this theory with LibOrange. I set up a simple AOL screenname which automatically echoed messages I typed, replacing &lt; and &gt; with < and >. I found that, if an HTML format tag came after a Unicode character in the message, Messages did not remove it:

A UTF-16 quote breaks Messages' reformatting
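To make the distinction between the plain quote and the fancy quotes concrete, here is a small Python sketch that compares them. It only illustrates the characters themselves; it assumes nothing about how Messages handles text internally, and the `describe` helper is a name made up for this example.

```python
# Compare the ASCII straight quote with the curly quotes many websites use.
quotes = {
    "straight": '"',           # U+0022, part of ASCII
    "left curly": "\u201c",    # U+201C, outside ASCII
    "right curly": "\u201d",   # U+201D, outside ASCII
}

def describe(ch):
    """Return (code point, is ASCII?, UTF-8 byte length, UTF-16 byte length)."""
    return (
        f"U+{ord(ch):04X}",
        ord(ch) < 128,
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),
    )

for name, ch in quotes.items():
    print(name, describe(ch))
```

The curly quotes take one 16-bit unit in UTF-16 just like the straight quote does, but only the straight quote fits in ASCII, so code that quietly assumes ASCII input is the kind of code that could trip over them.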

So, what conclusion can I draw from this? One which I already drew a while ago: Messages was poorly implemented. On top of that, what I’ve found shows that Apple’s code which processes incoming messages is not perfect. With this in mind, it might be possible for someone to derive an exploit which could be triggered simply by sending someone a message. That doesn’t seem likely, but who knows; if Apple fails at one thing, what else may they have failed at?

Why Would a Programmer Like Latin?

My interests reside primarily in modern activities: programming, watching movies, and learning about math and science. However, I have another interest which, at first, appears to have no relevance to anything modern at all: the Latin language.

Starting this year in high school, I decided to drop French and take Latin instead. While this decision was primarily rooted in my distaste for the school’s designated French teacher, it turned out to be a good decision for an entirely different reason; as it turns out, I find Latin fun.

But, you might ask, why would I think Latin is fun? Well, quite simply, for the same reason that some people believe mathematics to be fun, and for the same reason that I rejoice when I’m writing seemingly boring lines of source code. The joy which I get out of programming and math comes from one simple idea: there is a set of rules which can be applied to tons of situations.

In mathematics, the rules are fairly straightforward: there are operators, variables, functions, and then different sorts of more complex rules as you move down the road (e.g. rules for evaluating limits, sums, derivatives, etc.). These rules can be applied to creating formulas for all sorts of scenarios. In programming, there are variables, operators, functions, classes, and different notations for all of these ideas. With only these small components, pretty much any program can be put together.

Similarly in Latin, there are nouns, verbs, adjectives, etc., all of which work together to express any sort of idea. Just like with programming and math, I can understand a set of rules the minute I learn them, but they don’t become a habit (a natural instinct) until I use them a lot. Consider algebra’s order of operations, for instance: at first it is something you know (to some people, it is known simply as PEMDAS), but then it becomes something you don’t have to think about. I went through a similar process when learning to type in Dvorak, but I won’t go into that here.

About halfway through this last school year, I discovered that I actually liked Latin just as much as I typically enjoyed programming. This interest derived from learning the language through formally introduced ideas rather than simply hearing the language spoken and picking it up naturally.

This method of “analyzing” the language probably originated in classrooms as the result of Latin’s status as “dead,” or “unspoken.” However, the process of standardizing and labeling the language caught my interest far more than any living language could. Assigning a formal set of rules to explain a huge corpus of literature, literary dialog, and even Roman graffiti is the same kind of thing that Newton set out to do with his laws of motion (which, I might add, were originally published in Latin).

At some point in the school year, I realized that the classroom wouldn’t thoroughly indulge my interest. For one thing, Latin class only took place five blocks a week with each block lasting 35-40 minutes. Secondly, what ever was I to do in the summer?

As a response to these dilemmas, I bought a textbook and readily began to teach myself Latin. I probably spent several hours a day studying Latin, with no intention of improving my English vocabulary or getting an edge up in class. It was purely for the sake of pursuing my interest. I did, however, notice that I no longer had to do any studying or work for the class; nonetheless, this should be expected if you triple or quadruple your external study time.

This summer, I’ve continued to pursue my interest to the extent of finishing the textbook which I had bought during the school year. This textbook (Wheelock’s Latin) covered many grammatical concepts, although I’m sure that there are still some pieces of literature which I would find significantly difficult to understand. Now I spend some time every day translating various Latin literature for the sake of practice and enjoyment.

Latin opened my eyes to many new ideas in linguistics. On top of this, it made me realize that something doesn’t have to have blinking lights and a metallic exterior in order to be interesting. While I realize that most programmers and mathematicians probably wouldn’t share my interest, I encourage all of you who find mathematics or programming fun to also try out Latin. In fact, any language may satisfy your interests, so long as you study it in terms of pure rules.

Setting a Custom Resolution on a Retina MBP

The new MacBook Pro, along with its Retina display, includes a new, simplified display settings preference pane. This preference pane, while allowing the user to switch between various Retina-friendly preset resolutions, does not allow for the selection of custom resolutions. In fact, the available resolutions do not even include the display’s native resolution, 2880×1800.

Despite the simplified preference pane, however, it is still possible for an application to modify the resolution manually. In fact, I personally crafted a simple command-line application to do just this: the application allows the user to set many common resolutions, including the native, full resolution. A user can download this executable with this link; it can be executed by dragging the unzipped file into Terminal, typing a width, a space, and a height (both in pixels), and hitting enter.

I have posted the source code to GitHub. In the future, I plan to craft a more user-friendly version of this software; a CLI is typically scary to users, especially when they are required to compile it themselves through Xcode. As of now, the command-line executable supports additional arguments to set fields such as the bit depth and scale (a value of 2 for Retina displays, 1 for regular displays). Since the program only allows the user to specify predefined screen settings which are built in to I/O Kit’s drivers, a --modes flag allows the user to list all available settings.

One other technical note: in the process of working with Apple’s Quartz Display Services framework, I discovered something which was essential and which Apple’s API refused to provide: the ability to get a dictionary of attributes for a given screen mode. However, I noticed that the CFType which Apple would give me responded to -description, printing out all of the desired info. Once I knew that this data structure was hiding my precious metadata, I did some hacking. I found the offset in the CFType at which a pointer to the info dictionary is stored; I then made a method to retrieve that dictionary. While this is a dirty, disgusting hack, it had to be done.

My Progress With Dvorak

I made my last post right after I began learning Dvorak. Now, I am going to give a little update on the experience I had learning it, and the problems that I ran into. First off, let’s get something out of the way: I am typing this article in Dvorak, and I am typing at about 50 words per minute, a speed that I perceive to be above average for only 1-2 weeks of practice. So, how did I come this far in so little time? Simple. I immersed myself.

Two weeks ago began my spring break, and, as a programmer with a limited range of interests, I was inclined to spend pretty much all of my time in front of a computer, either typing out code or messaging friends online. Because both of these activities require lots of typing, it is safe to say that I spent at least 10 hours a day pounding away at my keyboard. Of course, being the huge geek that I am, I did not let this time go to waste. Instead, I spent every moment of it with my computer set to the Dvorak keyboard layout, despite the agony that I knew I would experience in the midst of my slowness.

However, immersing oneself in Dvorak does carry dangerous consequences. There was a point a few days ago when I wondered how my QWERTY typing was being affected, so I flipped back over to give it a try. Now, I may have been a rare case, since I never typed correctly in QWERTY to begin with. I typically find that, when using QWERTY, I use the wrong hand for keys such as Y and B, and I sometimes even use my index fingers in place of my pinkies. Because of this, touch-typing 100% correctly in Dvorak seems to have adversely affected my QWERTY muscle memory, leading to the trouble that I encountered. In fact, I could not type anything comprehensible at all. After several hours of looking at the keyboard and thinking whenever I got confused, though, I completely regained my QWERTY abilities. Once I had proved my ability to recall QWERTY if necessary, I transitioned once again to Dvorak and continued my immersion.

One of the interesting things I found when learning Dvorak was this: there are really three unique stages of learning to type with Dvorak, or any other keyboard layout, for that matter. The first stage is what I like to call the “Visual Stage.” This is the stage when, having been previously unexposed to the Dvorak layout, a typist needs to look at a diagram of the keyboard layout or the keyboard itself for the sake of pressing the right key.

The next stage comes around when the typist comes to know the position of the keys, but still must use some sort of mental process to recall the location of any given key. The time that the typist takes to do this could range from a quarter of a second to 3 or 4 seconds, but the idea remains the same: the typist must think about every individual key. This stage seems to carry the most rapid improvement, and the typist should notice an increase from something like 12 WPM to something more like 25-30 WPM. This change reflects the decrease in response time for each individual key. Remember, during this stage any improvements are usually on a key-by-key level. The transition to the next stage happens when the typist gets more used to typing a given sequence of letters at once, and begins thinking more in terms of words than of letters.

When the next phase comes around, which I like to call the “Word Phase,” it is best to practice by thinking of English sentences and typing them out. Once in this phase, pretty much all improvement reflects the typist’s ability to type a common sequence of letters without thinking about doing so. This muscle memory can only be gained by repetition, and such repetition is easy to obtain by typing sentences, thoughts, words, etc. I found that writing code was not particularly helpful in this stage. Instead, I downloaded a list of common English words and typed them repetitively for several hours, phenomenally increasing my speed for those particular words, and even for words with similar patterns.

All in all, learning Dvorak was a frustrating yet rewarding experience. I am still nowhere near as good at Dvorak as I am with QWERTY, but I am now at a point where my conversations, programs, and writing are not limited by my keyboard layout. I anticipate that I will continue to use this layout in the future, and that I will someday use Dvorak to surpass my QWERTY record of 106 WPM. I wish any of you out there who attempt to do likewise the best of luck. Such a process is easiest if you think not of the final goal, but instead calmly observe the process through which your skills develop.

Slowly Learning Dvorak

If you are unfamiliar, Dvorak is an alternative keyboard layout that supposedly triumphs over QWERTY from an ergonomic standpoint. This weekend I decided to take it for a spin, and was met by a lot of frustration. At this point I can get about 12 words per minute with Dvorak, and plan to continue practicing on a daily basis. I just typed those last three sentences with Dvorak, and it must have taken me at least 5 minutes. I have to say, as a skilled QWERTY typist, it’s going to be difficult to learn a new keyboard layout from scratch. However, I’m sure it will pay off once I do.

I highly suggest Dvorak for any of you out there who are interested in trying something new, or for those of you who consistently experience pain in your hands from typing too much. Even after only a few hours of using Dvorak, it is apparent that typing with it will require much less effort than QWERTY, and that I may ultimately be able to improve my typing speed from what I currently have with QWERTY. At this point I am familiar with the Dvorak layout, and the only thing slowing me down is the time that it takes me to recall where any given key is located. Because of the frustration level, I plan to only use Dvorak for an hour or two a day, which will hopefully be enough for me to become proficient with it.

Why Focus on What You’re Good At?

Recently, I haven’t been programming as much as I should have been, mainly because of school and general laziness. There could be many opinions about this change, such as those from educational boneheads who would say something like “yeah, that’s good, what’s the computer gonna do for you anyway?” There are also those opinions of people who would say “well, that’s a shame, but it’s a good thing that you’re focusing on school, anyway.” Well, it’s opinions like these that caused me to dig myself into a psychological rut in the first place. The fact of the matter is, I am not smarter than anyone else, and I am no better at doing things such as studying history or writing about literature than anybody else. What I am better at is programming, and in general I find that programming is pretty much the only activity that I genuinely enjoy doing.

Last year we took a survey at school that featured questions such as “do you find yourself excelling above your peers in any academic subjects?” It also included other questions such as “do you believe yourself to be a valuable member of society?” The thing is, I answered “No” for the first, and “Yes” for the second. Although I never found myself excelling in any particular subject in school, I programmed a whole lot, and was aware that other people were envious of my programming abilities.

This year, however, I managed to focus all of my energy on school, which in turn lessened the amount of energy that I focused on programming. After a while the pattern of not programming became imposed on my brain in such a way that I no longer had any motivation to program, since I had not experienced the thrill of it for such an extended period of time. I noticed this phenomenon after winter break this year when I looked back on what I had coded during that time, and realized how little I had done compared to last year’s winter break. Given this realization, I began to worry that I was no longer interested in programming, and that the one thing that I was really good at was beginning to slip away.

Unfortunately, right after this year’s break, teachers began piling work on students, which was probably driven by a feeling of a lack of accomplishment that was left from two weeks of not teaching. This made it difficult for me to get any programming time at all. Without anything to turn to for the comfort and stability which programming used to provide me, I managed to get very depressed during the course of the past two weeks. I won’t go into the details of what I felt, but I can tell you that I would not have felt that way had I programmed more during those two weeks.

Finally, though, this weekend arrived and I saw a great opportunity. Because Monday (Jan 16, 2012) is MLK day, it was a three-day weekend, and I was finally able to get in some quality programming time. Starting on Friday night, I worked on a typing test application for Mac, which I had started a few weeks before but had not made much progress on since. I also worked on an iOS application that I called GuessNumber, about which I wrote another post to this same blog. I also worked on a Connect4 application for HTML5, which was a rather moot project as I realized on Sunday morning. But I didn’t let anything stop me from working on another application. For all of Sunday and into Monday (today), I worked on an application called ABCleaner, which clears out duplicates from the user’s AddressBook. For this app, it was especially fun to design the GUI, which is usually a nice treat, given that I don’t program GUIs regularly.

Overall, I wrote at least a thousand lines of code this weekend, if not more. And, to be honest, it was the most fun I’ve had since the summer. I mean, I probably had some school work to do that I haven’t done yet, but honestly I needed to get school and people out of my mind and focus on what actually makes me happy. Right now I have more self-confidence and feeling of accomplishment than school will ever be able to provide me with, and it’s all because I did something that I love, and that I’m good at. So, my message to all that are reading this is simple: do what makes you happy, even if it’s not what others think you should be doing. You’re not going to be able to accomplish anything if what you’re trying to accomplish won’t make you feel good about yourself, because even if you do “accomplish” an uninteresting task, did you really accomplish anything?

Can You Find My Number?

We’ve all seen this trick before. A magician lays out several cards, each with 50 numbers on them, and asks you to think of a number between 1 and 100. You then tell the magician which cards contain your number, and, magically, he tells you what your number was. Anybody who has any sense of statistics understands that there is no magic going on here; given enough pointers, the number can be singled out. In this case, the cards are set up in such a way that the magician simply has to add up the first number on every card that contains the chosen number, and the sum will be the number in question. As a programmer, I came to the realization that a machine would be much better at performing such a trick, since a machine could work perfectly, and would be able to compare the numbers on all of the cards in order to use brute force to locate the user’s chosen number. Such an idea is easy enough to implement in an iOS application, so I went ahead and did just that.
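For the curious, one common way to build such cards uses binary digits: card k lists every number whose k-th bit is set, so the first number on each card is a power of two, and adding the first numbers of the chosen cards rebuilds the secret number. The Python sketch below illustrates that construction only. It is not the code from GuessNumber (which eliminates candidates by brute force), and `make_cards` and `guess` are names invented for this example.

```python
def make_cards(limit=100):
    """Build one card per binary digit; card k lists every number whose bit k is set."""
    cards = []
    bit = 1
    while bit <= limit:
        cards.append([n for n in range(1, limit + 1) if n & bit])
        bit <<= 1
    return cards

def guess(cards, answers):
    """Add the first number of every card the player says contains their number."""
    return sum(card[0] for card, present in zip(cards, answers) if present)

cards = make_cards(100)
secret = 77
answers = [secret in card for card in cards]  # what an honest player would report
print(len(cards), guess(cards, answers))
```

With this construction, seven cards are enough to cover every number from 1 to 100.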

GuessNumber, which took me about a day to implement, is a small iOS app that presents the user with several cards containing an array of numbers, and requires that the user tap either “Present” or “Absent” with respect to their chosen number. After doing this several times (usually 5-7 times), the app shows the user what it believes their number to have been. Of course, the app only has a 100% accuracy rate if the user is honest, and usually I find that they aren’t, intentionally or not. The source code for the Xcode project is on GitHub, and a demo is available on YouTube:



Why Am I Writing a BASIC Interpreter?

Many years ago, a young and curious version of myself took an interest in calculators, and with such an interest, I felt that it was necessary for me to own every different kind of calculator that I could get my hands on. After expressing my interest to my father, he went out and bought me a Casio fx-9750G PLUS programmable calculator. At the time, however, I was not that big into programming, and thus the calculator was left to collect dust in my closet.

Recently, however, I decided to take it out again, simply because it had advanced expression parsing that I believed to be necessary for certain subjects in school. For instance, the calculator allows the entry of expressions such as “(((2.5*9.81)^2 + 2) - 32)/(64^2)”, making something that would otherwise be a hassle to calculate into a piece of cake. At first, my only ambition was to be able to do such calculations with ease, which would lessen my anxiety on science and math exams. Being a programmer, though, I soon discovered the underlying delight of my calculator, the BASIC programming environment. The fx-9750G PLUS includes an environment in which you can enter programs in Casio’s BASIC language, using one of many built-in functions and operations. I quickly began to write programs that helped me with problems on tests, and other programs that really weren’t useful for school at all.

Pretty soon, I realized that entering programs on my calculator before I planned them wasn’t the best practice, since I would then have to modify the program on my calculator in order to fix bugs. I took to the habit of writing programs in my notebook before entering them on my calculator, which did ease some of the pain, but also left something to be desired. The problem was, I could think through programs all I wanted to on paper, but how could I test them? That’s when I made the decision to make a custom BASIC interpreter for my computer that would allow me to enter and test BASIC programs on the fly, with the joy of typing on a QWERTY keyboard instead of Casio’s poorly thought-out keyboard layout.

Since I did not plan on fully recreating Casio’s language, I decided to call my mutated spin-off ANBasic. Before I even set out on this project, I’d been dreaming for a long while of making some form of proprietary byte-code, just because I liked the idea of creating data that is unreadable to humans but easily understandable to machines. So, it was my decision early on that my ANBasic interpreter would process a script, compile it to ANBasic byte-code, and then would be able to execute this byte-code at a later point in time.

The first step to implementing this project was to design a simple tokenizer for the ANBasic language. This tokenizer would read a raw script file, split it up line by line, and pick out tokens, such as mathematical operators, function and variable names, etc. Once the tokenizer was done, I created a “grouper,” a small set of subroutines that processed control-flow statements, applied the order of operations (PEMDAS), and grouped functions with arguments. The grouped script would then be written to a binary byte-code file in my custom byte-code format. The runtime would then load this grouped script from a file, and execute the grouped script using a series of Objective-C categories on different code objects.
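As a rough illustration of the tokenizing step, here is a minimal line-by-line tokenizer in Python. It is only a sketch under my own assumptions: the real ANBasic tokenizer is written in Objective-C, and the token classes below (`number`, `name`, `operator`) are hypothetical stand-ins, since the actual token set is not listed here.

```python
import re

# Hypothetical token classes; the real ANBasic token set may differ.
TOKEN_PATTERN = re.compile(r"""
    (?P<number>\d+(?:\.\d+)?)      # integer or decimal literal
  | (?P<name>[A-Za-z_]\w*)         # variable or function name
  | (?P<operator>[-+*/^=<>(),])    # single-character operators and punctuation
  | (?P<space>\s+)                 # whitespace, skipped
""", re.VERBOSE)

def tokenize_line(line):
    """Split one line of source into (kind, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(line):
        match = TOKEN_PATTERN.match(line, pos)
        if match is None:
            raise ValueError(f"unexpected character {line[pos]!r} at column {pos}")
        if match.lastgroup != "space":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

def tokenize(script):
    """Tokenize a whole script line by line, skipping blank lines."""
    return [tokenize_line(line) for line in script.splitlines() if line.strip()]

print(tokenize("X = (2.5*9.81)^2\nY = sqrt(X)"))
```

A later pass, like the grouper described above, would then work on these token lists rather than on raw text.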

Although the compiler itself generates a grouped script, and could easily execute it without writing it to a file, I already had my mind set on designing a byte-code format, so that is what I did. And, if I do say so myself, the final product is pretty nifty. Although my ANBasic project isn’t quite done as of yet, it can currently compile a variety of control-flow statements, functions, expressions, etc., and can execute them. I have tested it with some of the programs that I wrote for my calculator, and it works like a charm. You can check out the ANBasicCompiler GitHub repository for yourself and see what I’ve been working on. At this point, I’ve already pretty much lost interest in my Casio, as I’ve recently obtained a new TI-Nspire calculator. Despite this change, I still stuck to finishing my ANBasic project, which ended up teaching me some valuable lessons about tokenization and lexical analysis, not to mention the fact that I simply had nothing better to do.

Unit Conversion: A Good Use For Queues

In computer science, a queue is an abstract data structure that, when used correctly, can be used to search through nodes in an efficient and exhaustive way. This kind of search is also known as a breadth-first search. A queue is a first-in, first-out list: “nodes” are pushed onto the end and popped from the front. A node can be expanded into zero or more sub-nodes, which are then pushed to the end of the queue. Every time a node is popped from the front of the queue, a check is made to see if it is the search destination. If it is not, it is expanded, and the sub-nodes are pushed to the end of the queue. If there are no items left to pop from the queue, the search has completed with no results.

Breadth-first tree diagram
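The search loop described above can be sketched in a few lines of Python. This is a generic illustration, not code from my project; the function name and its `is_goal` and `expand` callbacks are made up for the example, and the `seen` set is an addition that keeps the search from revisiting nodes in graphs that contain cycles.

```python
from collections import deque

def breadth_first_search(start, is_goal, expand):
    """Return the first node satisfying is_goal, visiting nodes level by level."""
    queue = deque([start])
    seen = {start}
    while queue:
        node = queue.popleft()          # pop from the front of the queue
        if is_goal(node):
            return node
        for child in expand(node):      # push sub-nodes onto the end
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return None                         # queue ran dry: no result

tree = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}
print(breadth_first_search("A", lambda n: n == "E", lambda n: tree[n]))
```

Because new nodes always go to the back of the queue, every node at one depth is checked before any node at the next depth.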

This search algorithm can be applied to many things, one of which is unit conversion. In science and throughout one’s life, units are used for different things. One might use meters for distance, kilograms for mass, or minutes for time. Most units can be linked to other units with equivalencies. For instance, one foot is equal to 12 inches, meaning that there is an equivalency between feet and inches. Furthermore, one yard is equal to three feet, meaning that a yard is 36 inches. A conversion like this can be done by following an equivalency chain. Although there are multiple ways to implement something that does this kind of conversion, I chose to use queues, creating an interesting and thought-intensive programming exercise.

I designed a unit converter in Objective-C that uses queues for unit conversion. The converter runs with a pre-compiled list of pretty basic equivalencies. From these, it can convert between any compatible units that were programmed in beforehand. For example, it knows that one foot is 12 inches, and that one yard is three feet. So, in order for it to convert from inches to yards, it must follow this equivalency chain, in this case using a queue.

First, the queue starts out with one node, the starting point. In this case, we are starting with inches. It then pushes the available equivalencies that it knows for inches. Since the pre-programmed equivalencies are pretty bare-bones, the inch unit only has one equivalency, stating that there are 12 inches in a foot. So, the new feet node is pushed to the queue, and the inches node is removed. It then pops and expands the feet node, which has an equivalency to yards. When it gets around to popping the yards node, it realizes that yards is the unit that it wants, and traces back a history data structure to figure out the equivalency chain that it has followed.
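Here is a compact Python sketch of that walk-through. It is not the Objective-C converter itself: the `EQUIVALENCIES` table, the `convert` function, and the shape of the `history` dictionary are all assumptions made for illustration, and the meter-to-yard factor is simply a value picked so that the numbers line up with a kilometer-to-inch conversion.

```python
from collections import deque

# Hypothetical table: (from_unit, to_unit, factor) means 1 from_unit = factor to_unit.
EQUIVALENCIES = [
    ("kilometer", "meter", 1000.0),
    ("meter", "yard", 1.0936133),
    ("yard", "foot", 3.0),
    ("foot", "inch", 12.0),
]

def build_graph(equivalencies):
    """Index every equivalency in both directions."""
    graph = {}
    for src, dst, factor in equivalencies:
        graph.setdefault(src, []).append((dst, factor))
        graph.setdefault(dst, []).append((src, 1.0 / factor))
    return graph

def convert(have, want, equivalencies=EQUIVALENCIES):
    """Return (factor, path) such that 1 `have` = factor `want`, or None if no chain exists."""
    graph = build_graph(equivalencies)
    history = {have: (None, 1.0)}   # unit -> (previous unit, cumulative factor)
    queue = deque([have])
    while queue:
        unit = queue.popleft()
        if unit == want:
            path = []
            step = unit
            while step is not None:  # trace the history back to the start
                path.append(step)
                step = history[step][0]
            return history[unit][1], path[::-1]
        for neighbour, factor in graph.get(unit, []):
            if neighbour not in history:
                history[neighbour] = (unit, history[unit][1] * factor)
                queue.append(neighbour)
    return None

factor, path = convert("inch", "yard")
print(factor, " -> ".join(path))
```

Storing each equivalency in both directions means only one fact per pair of units has to be written down, which is the same spirit as getting by with very little equivalency information.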

On GitHub, I have posted a sample project that allows the user to enter two units. It then tells you the final equivalency (e.g. inches to yards), as well as the equivalency path that it took to get that answer. Here is an example of the usage:

Have: km
Want: inch
1 kilometer = 39370.078800 inchs
kilometer -> meter -> yard -> foot -> inch

As you can see, the user entered the “have” and “want” units, and the program did the rest. The framework for unit conversion is pretty easily expandable, having a simple ConversionConst.h file where equivalencies and units can be added/removed. This file gives the program its intelligence. I worked hard to make the API able to convert units without being given very much equivalency information. It can do its job as long as the equivalency chain between two units exists in the pre-compiled information.

Porting Some Code to ARC

For the past couple of days I have been investigating Automatic Reference Counting (or ARC), which comes with LLVM 3.0. This compiler feature takes memory management off the developer’s hands, provided that a few simple rules are followed, and that conventions are kept.

I decided that the best way to get used to ARC would not be to write a new project using ARC, but instead to port some existing code to ARC. One of the libraries that I maintain, ANImageBitmapRep, seemed like a good candidate, since it is used in a number of projects that, in the future, may be ported to ARC themselves.

Realizing that ARC is not yet widely used, I decided to add conditional statements to the code that would allow it to be compiled with ARC, while still being able to work with old compilers, or with ARC disabled. I did this using the __has_feature(objc_arc) macro in conjunction with the #if compiler directive.

One of the confusing pieces of the migration to ARC was that ANImageBitmapRep, using underlying CoreGraphics objects, has many functions that return Core Foundation types such as CGImageRef. Because of this, I had to figure out how to autorelease a CF type manually, without directly calling the now-forbidden autorelease instance method. In order to trick ARC into doing this, I created a function marked with __attribute__((ns_returns_autoreleased)), indicating that the return value should be retained and autoreleased. This helper function took a CGImageRef and returned an object (the result of a bridged cast from CGImageRef to id).

Besides this, all that I really had to do was remove all of my retain/release calls, and fix a couple of retain cycles. Finally, I got ANImageBitmapRep and an iOS demo app to compile and run using ARC. Testing for leaks on the final product was a success, revealing that ARC had done its job of releasing everything that it ought to. This port was a great success, and I suspect that in the future I will be writing more code for ARC only, despite the fact that I will not be able to use the __weak ownership qualifier because of compatibility issues.
