Sunday 26 November 2017

Computer Science Internationalization - identifiers

When programming, I sometimes use Unicode identifiers instead of ASCII identifiers. The basic principle of Unicode identifiers in a programming context is that the characters used in an identifier should belong to the script of some human language. There are of course exceptions, such as Apple's Swift programming language, which allows Emoji in program identifiers.

Egyptian Hieroglyphs have the Unicode general category Lo (Letter, other), so each one counts as a letter of the Egyptian Hieroglyphs script. I have just tested using an Egyptian Hieroglyph (codepoints.net/U+13001) as a variable name and it works fine.
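The test described above can be sketched roughly as follows. This is only an illustration, not the exact code I ran: the names hieroglyph, isOtherLetter and isEgyptianHieroglyph are made up for this sketch, and it assumes a JavaScript engine that supports Unicode property escapes in regular expressions (ES2018 or later).

```javascript
// U+13001 is the code point mentioned in the article.
const hieroglyph = String.fromCodePoint(0x13001);

// Check the general category Lo ("Letter, other") and the script,
// using Unicode property escapes (requires the "u" flag).
const isOtherLetter = /^\p{Lo}$/u.test(hieroglyph);
const isEgyptianHieroglyph = /^\p{Script=Egyptian_Hieroglyphs}$/u.test(hieroglyph);

// The same code point used as a variable name. The \u{13001} escape is
// equivalent to typing the character itself into the source file.
var \u{13001} = "hello from a hieroglyph variable";

console.log(isOtherLetter, isEgyptianHieroglyph, \u{13001});
```

Writing the identifier as an escape keeps the sketch readable in editors whose fonts lack hieroglyphs; typing the character directly declares the same variable.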

I recently decided to identify my html code files with my adopted Chinese name 小山. My standard working practice is to now have the start html tag of my code files as <html id="小山">. It is so cool to be able to write document.getElementById("小山").

Many programming languages now support Unicode identifiers. How good is the support and how far can one go? I chose one of my html code files in which, apart from <html id="小山">, all the identifiers were ASCII: ASCII JavaScript variable names, ASCII JavaScript function names and ASCII CSS class and id names. My aim was to replace all these ASCII identifiers with Unicode identifiers. Furthermore, I chose to write my Unicode identifiers in CJK (Chinese, Japanese and Korean) scripts.

The end result was an html code file with all ASCII identifiers replaced by Unicode identifiers, and it all works just fine. I expected it to work, but this is the first time I have had a code file with all-Unicode identifiers, so it was best to check.

My Unicode identifiers code file is a bit too long to include in this blog article. Instead, I list some of the original ASCII identifiers and their Unicode replacements.

Firstly CSS: I used CJK identifiers for class and id names. Below I show before (ASCII) and after (Unicode). I think you can guess that ✅️ means they worked just fine.
.keys .keys:hover & class="keys" ➽ .ボタン .ボタン:hover & class="ボタン" ✅️
.emphasise class="emphasise" ➽ .エンファサイズ class="エンファサイズ" ✅️
#footy id="footy" ➽ #바닥글 id="바닥글" ✅️
#earth id="earth" getElementById("earth") ➽ #地球 id="地球" getElementById("地球") ✅️
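Why do these CSS names work without any escaping? As far as I understand it, CSS treats any non-ASCII character as a valid identifier character. The check below is a simplified sketch of that rule, not the full CSS grammar: cssIdentPattern and isPlainCssIdentifier are names invented here, and backslash escapes and custom-property names (starting with --) are deliberately ignored.

```javascript
// Simplified rule: an optional leading hyphen, then a letter, underscore
// or any non-ASCII code point (U+0080 and above), followed by letters,
// digits, hyphens, underscores or non-ASCII code points.
const cssIdentPattern =
  /^-?[A-Za-z_\u{80}-\u{10FFFF}][-A-Za-z0-9_\u{80}-\u{10FFFF}]*$/u;

// Returns true when the name can be written in a selector as-is,
// according to the simplified rule above.
function isPlainCssIdentifier(name) {
  return cssIdentPattern.test(name);
}

// Class and id names taken from the article.
for (const name of ["ボタン", "エンファサイズ", "바닥글", "地球"]) {
  console.log(name, isPlainCssIdentifier(name));
}
```

A name that starts with a digit, such as 1st, fails this check and would need escaping in a selector.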

I changed all function names to Unicode CJK names. Here are 2 examples of my changes.
moveMoon() onclick="moveMoon()" ➽ 달을움직이다() onclick="달을움직이다()" ✅️
stopMoon() onclick="stopMoon()" ➽ 달을멈추다() onclick="달을멈추다()" ✅️

Finally, I changed all variable names to Unicode CJK names. Here are 2 examples.
increment ➽ 增量 ✅️
raceTrack1Width ➽ 跑道一宽度 ✅️

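...and here is one of my JavaScript functions, with an embedded function:
function 달을움직이다(){
  var 月亮=document.getElementById("まんげつ"),
      位置左月亮=0,
      月亮增量=增量;
  // Do nothing if the moon is already moving.
  if(月亮在移动)return;
  // Start the animation timer and keep its id in 身份月亮.
  身份月亮=setInterval(달케이크,月亮时间);
  月亮在移动=true;
  // Embedded function: runs on every timer tick.
  function 달케이크(){
    // Reverse direction at either end of the track.
    if(位置左月亮>跑道二宽度-50||位置左月亮<0)月亮增量*=(-1);
    位置左月亮+=月亮增量;
    月亮.style.left=位置左月亮+"px";
  }
}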

My Unicodified html code file was successfully validated by validator.w3.org.

Here is the Unicode Consortium's take on Unicode identifiers: Unicode Standard Annex #31, Unicode Identifier and Pattern Syntax, unicode.org/reports/tr31/
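The annex defines identifiers in terms of the character properties ID_Start and ID_Continue, and JavaScript's own identifier rule builds on those, adding $ and _ plus the two joiner characters. Here is a minimal sketch of that rule, assuming an engine with Unicode property escapes (ES2018 or later). The names identifierPattern and looksLikeIdentifier are invented for this sketch, and it does not reject reserved words such as var or function.

```javascript
// First character: ID_Start, "$" or "_".
// Later characters: ID_Continue, "$", ZERO WIDTH NON-JOINER (U+200C)
// or ZERO WIDTH JOINER (U+200D).
const identifierPattern = /^[\p{ID_Start}$_][\p{ID_Continue}$\u200C\u200D]*$/u;

// Returns true when the name has the shape of a JavaScript identifier.
// Reserved words are not checked.
function looksLikeIdentifier(name) {
  return identifierPattern.test(name);
}

// Identifiers taken from the article, plus one emoji for contrast.
for (const name of ["달을움직이다", "月亮", "增量", "跑道一宽度", "小山", "😀"]) {
  console.log(name, looksLikeIdentifier(name));
}
```

The emoji fails the check because it is a symbol rather than a letter, which fits the point made at the start: Swift's acceptance of Emoji is an exception to the usual rule.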