Supporting Unicode and Emoji Characters in a Unity3D (or Other Engine) Game - The Lazy Way

Recently I was working on a client project. Most of their target customers were not English speakers, so they wanted their games to support multiple languages. As if that were not enough, they also wanted to support emoji.

We were developing their game in Unity3D. Most of the game's actions were validated and executed on the backend, which ran on a LAMP stack [Linux, Apache, MySQL and PHP]. By the time this issue was raised, most of the work was already done. The database structure was already in place and the game data was already populated.

The Mess

A quick search on Google was enough to confuse me even more. It turns out that Unicode characters can be stored using various encodings such as UTF-8 and UTF-16. On top of that, UTF-8 itself has variants: standard UTF-8, CESU-8 and Modified UTF-8. And then there was MySQL, which has its own set of character encodings. MySQL does support UTF-8, but its utf8 character set stores at most three bytes per character, which is not enough for emoji. If you also want emoji, you need the four-byte version, known as utf8mb4. So although Unicode is meant to be universal, there is nothing universal about how it gets encoded. Which one to use?
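To see why emoji are the troublemakers here, it helps to look at how many bytes different characters take in UTF-8. Below is a minimal C# sketch; the class and method names are made up for illustration and are not part of any library.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class Utf8ByteCounts
{
    // Number of bytes the given text occupies when encoded as UTF-8.
    public static int Utf8Length(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }
}

// Utf8ByteCounts.Utf8Length("a")          -> 1 byte  (ASCII)
// Utf8ByteCounts.Utf8Length("\u00E9")     -> 2 bytes (e with acute accent)
// Utf8ByteCounts.Utf8Length("\u4F60")     -> 3 bytes (a CJK character)
// Utf8ByteCounts.Utf8Length("\U0001F600") -> 4 bytes (grinning face emoji)
```

Anything that needs four bytes, which includes the emoji, is what pushes you to utf8mb4.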

Since my database was already set up, I wanted the shortest possible way to get this sorted out. I configured the database to use utf8mb4 and found that even though the non-English characters showed up properly, the emoji were not visible on the mobile devices. It turns out that even emoji are not standardised. There are at least four different versions of the emoji character space [DoCoMo, SoftBank, Google, etc.].

Fortunately, a guy named Cal did an awesome job of creating a PHP library that brings some semblance of order to the emoji mess. It is linked below.

http://code.iamcal.com/php/emoji/

However, the above library will work only for web pages. Inside an app, it'd be difficult to slice an image and show the appropriate emoji.

The Lazy Solution


Long story short, I finally thought of converting the Unicode byte sequence to ASCII text. I checked a few options before settling on Base64 encoding.

$asciiString = base64_encode($unicodeString);    // PHP Code

What this does is convert your Unicode string with emoji [or anything else] into an encoded ASCII string, which you can store in your normal database without altering the database's or table's encoding. The flip side is that the stored string gets longer. Base64 produces 4 characters for every 3 input bytes, and a non-ASCII character takes 2 to 4 bytes in UTF-8, so the result can be around four to five times the original character count for CJK text or emoji. You may need to increase the field's length based on your requirements.
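To get a feel for how much bigger the field needs to be, here is a rough C# sketch of the size arithmetic. The class and method names are hypothetical, and it assumes the text is encoded as UTF-8 before Base64 is applied.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class Base64Size
{
    // Length of the Base64 text for a given number of input bytes:
    // every group of 3 bytes (the last group is padded) becomes 4 characters.
    public static int EncodedLength(int byteCount)
    {
        return 4 * ((byteCount + 2) / 3);
    }

    // Encodes the text as UTF-8, then Base64, and returns the resulting length.
    public static int EncodedLength(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return Convert.ToBase64String(bytes).Length;
    }
}
```

With this, EncodedLength("hello") comes out at 8, and ten single-code-point emoji such as U+1F600 (40 bytes in UTF-8) come out at 56 characters, so a column sized for the raw text would need to grow accordingly.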

When you need the Unicode back to display it on device, use the below method.

$unicodeString = base64_decode($asciiString);    // PHP Code

You can of course do this inside the game code instead. In Unity, import the following namespaces:

using System;
using System.Text;

And then Base64 encode a Unicode string to ASCII by using:

byte[] byteStream = Encoding.UTF8.GetBytes (unicodeString);
string asciiString = Convert.ToBase64String (byteStream);

And then convert back from ASCII to Unicode using:

byte[] byteStream = Convert.FromBase64String (asciiString);
string unicodeString = Encoding.UTF8.GetString (byteStream);
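If you prefer to keep the two directions together, the snippets above can be wrapped in a small helper. This is only a sketch: the class name UnicodeBase64 and its method names are my own invention, not something Unity or .NET provides.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class UnicodeBase64
{
    // Unicode text -> UTF-8 bytes -> Base64 (plain ASCII).
    public static string Encode(string unicodeString)
    {
        byte[] byteStream = Encoding.UTF8.GetBytes(unicodeString);
        return Convert.ToBase64String(byteStream);
    }

    // Base64 -> UTF-8 bytes -> Unicode text.
    // Convert.FromBase64String throws FormatException if the input is not valid Base64.
    public static string Decode(string asciiString)
    {
        byte[] byteStream = Convert.FromBase64String(asciiString);
        return Encoding.UTF8.GetString(byteStream);
    }
}
```

For example, UnicodeBase64.Encode("\U0001F600") gives "8J+YgA==", and passing that back through Decode returns the original emoji. As long as the PHP side is handling UTF-8 strings, text encoded with base64_encode on the server should decode with the same helper on the client.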

A Unicode string covers practically every written language as well as emoji, and there's also the option of using the Private Use Areas. ;)

Conclusion


This may not be the most elegant solution, and perhaps there are better ones out there. But it gets the job done and it is not messy. If you're stuck with a similar issue, give it a try. If you have a better solution, I'd love to know. :)


Comments

Anonymous said…
Thank You. Very much.
Unknown said…
You're welcome First Commenter on my entire Blog!
Someday I'll name a Star after you (if only I knew your name :D )!
