Friday 31 March 2017

Computer Science Internationalization - Adaptive URL

A URL can consist of a Domain Name and a pathname. In the examples below, x.y.z represents the Domain Name and the remainder is the pathname. In my experience of the internet, the pathname is usually written in English or, more accurately, ASCII. The ASCII pathname below represents a multi-page website in the form of a journey from home to a hotel in Korea.

x.y.z/home/bus/airplane/korea/taxi/hotel

Websites such as Google adapt the language of their text content to the browser's preferred display language (BL), which the user can set. Let's go one step further than Google and adapt the language of the URL pathname to the BL as well. Here is the ASCII pathname rewritten in Chinese, Japanese and Korean.

x.y.z/家/公共汽车/飞机/韩国/出租车/饭店

x.y.z/ホーム/バス/飛行機/韓国/タクシー/ホテル

x.y.z/홈/버스/비행기/한국/택시/호텔
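One way to picture the rewriting is as a lookup table from the English pathname components to their translations. The Python below is a minimal sketch of that idea; the names MASTER, TRANSLATIONS, localized_pathname and wire_form are mine and purely illustrative. It also shows that a non-ASCII pathname travels over HTTP as percent-encoded UTF-8, even though the browser address bar normally displays the readable form.

```python
from urllib.parse import quote, unquote

# Pathname components taken from the examples above.
MASTER = ["home", "bus", "airplane", "korea", "taxi", "hotel"]
TRANSLATIONS = {
    "zh": ["家", "公共汽车", "飞机", "韩国", "出租车", "饭店"],
    "ja": ["ホーム", "バス", "飛行機", "韓国", "タクシー", "ホテル"],
    "ko": ["홈", "버스", "비행기", "한국", "택시", "호텔"],
}


def localized_pathname(language):
    """Return the pathname for a language, or the English master pathname."""
    segments = TRANSLATIONS.get(language, MASTER)
    return "/" + "/".join(segments)


def wire_form(pathname):
    """Percent-encode the pathname as UTF-8, leaving the slashes intact."""
    return quote(pathname, safe="/")


if __name__ == "__main__":
    for language in ("en", "zh", "ja", "ko"):
        readable = localized_pathname(language)
        print(readable, "->", wire_form(readable))
```

Decoding the wire form with unquote gives back the readable pathname, so the server sees the same directory names the user sees.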

So, how do we implement these language-adaptive URL pathnames? First, we need to determine the BL programmatically. One way of achieving this is to examine the Accept-Language HTTP header sent from the browser to the server. This will contain one or more language tags. If there is more than one, each can carry a q weight that gives its priority, and browsers normally list them in priority order. Language tags can take many forms. They include: zh and zh-CN for Chinese, and cmn for Mandarin Chinese specifically; ja for Japanese and ko for Korean. Now that we can determine the BL, we can select the appropriate URL pathname, thus internationalizing our website with a language-adaptive URL pathname.
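To make the header examination concrete, here is a minimal Python sketch. The function names parse_accept_language and pick_language, the default of English, and the treatment of cmn as Chinese are my assumptions for illustration. It handles only the common "tag;q=weight" form and is not a complete implementation of HTTP language negotiation.

```python
def parse_accept_language(header):
    """Return the language tags in an Accept-Language value, best first."""
    entries = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a tag without an explicit q weight counts as 1.0
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed weight as "not wanted"
        if quality > 0:
            # Sort by weight (highest first), then by position in the header.
            entries.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(entries)]


def pick_language(header, supported=("zh", "en", "ja", "ko"), default="en"):
    """Return the first supported primary language, or the default."""
    for tag in parse_accept_language(header):
        primary = tag.split("-")[0]
        if primary == "cmn":  # assumption: serve Mandarin with the Chinese pathname
            primary = "zh"
        if primary in supported:
            return primary
    return default


if __name__ == "__main__":
    print(pick_language("ko-KR,ko;q=0.9,en;q=0.8"))
```

With the BL in hand, the server can redirect the browser to the pathname written in that language.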

On a Linux machine, each component of the pathname will be a directory. In my schema I am assuming one index.html or index.php per directory. A requirement of this schema is that we want neither a separate directory hierarchy for each language nor a separate index.html or index.php for each language.

My native language is English, so I will make my master pathname directory names English, i.e. home, bus, airplane, korea, taxi and hotel. I will make the Chinese, Japanese and Korean directory names aliases of the English-named master directories. This can be easily achieved on Linux with the ln -s command, where ln means link and the -s option means create a symbolic link, as opposed to a hard link.

ln -s home 家
ln -s home ホーム
ln -s home 홈

ln -s hotel 饭店
ln -s hotel ホテル
ln -s hotel 호텔
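As a sketch of how the links fit the nested pathname, the Python below builds the English master directories one inside the other under a document root and places each alias beside its master, which is equivalent to running ln -s inside each parent directory. The names MASTER, ALIASES and build_tree are mine, and putting every alias at every level is my assumption about the layout. The web server would also have to be configured to follow symbolic links.

```python
import os
import tempfile

MASTER = ["home", "bus", "airplane", "korea", "taxi", "hotel"]
ALIASES = {
    "home": ["家", "ホーム", "홈"],
    "bus": ["公共汽车", "バス", "버스"],
    "airplane": ["飞机", "飛行機", "비행기"],
    "korea": ["韩国", "韓国", "한국"],
    "taxi": ["出租车", "タクシー", "택시"],
    "hotel": ["饭店", "ホテル", "호텔"],
}


def build_tree(docroot):
    """Create the nested master directories and their language aliases."""
    parent = docroot
    for name in MASTER:
        master_dir = os.path.join(parent, name)
        os.mkdir(master_dir)
        for alias in ALIASES[name]:
            # Relative target, like running `ln -s name alias` inside parent.
            os.symlink(name, os.path.join(parent, alias))
        # One index.html per master directory; the aliases share it.
        index = os.path.join(master_dir, "index.html")
        with open(index, "w", encoding="utf-8") as handle:
            handle.write("<p>" + name + "</p>\n")
        parent = master_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_tree(root)
        print(os.path.realpath(os.path.join(root, "家", "公共汽车")))
```

Because every alias points at the same master directory, there is still only one directory hierarchy and one index file per page, whatever language the pathname is written in.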

What if your native language is not English? In that case, create the master pathname directory names in your native language. If your native language is Korean, then the master directory names will be 홈, 버스, 비행기, 한국, 택시 and 호텔, and your links will be:

ln -s 홈 home
ln -s 홈 家
ln -s 홈 ホーム

ln -s 호텔 hotel
ln -s 호텔 饭店
ln -s 호텔 ホテル

Emoji are hugely popular so letʼs construct a totally cool Emoji pathname.

x.y.z/🏡/🚌/🛩/🇰🇷/🚕/🏨

ln -s home 🏡
ln -s bus 🚌
ln -s airplane 🛩
ln -s korea 🇰🇷
ln -s taxi 🚕
ln -s hotel 🏨

I have never encountered an Emoji URL pathname on a website, so implementing such a pathname on your website would be both totally cool and unique. You could also use an Emoji pathname for those languages your website does not support. My schema only supports Chinese, English, Japanese and Korean. If the BL were an unsupported language, such as Arabic, then the Emoji pathname could be displayed in the browser address bar instead of, for example, defaulting to English.
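The fallback idea can be sketched in a few lines of Python. The dictionary PATHNAMES and the function pathname_for are hypothetical names of mine. The sketch looks only at the primary part of a language tag and returns the Emoji pathname for anything outside the four supported languages.

```python
PATHNAMES = {
    "en": "/home/bus/airplane/korea/taxi/hotel",
    "zh": "/家/公共汽车/飞机/韩国/出租车/饭店",
    "ja": "/ホーム/バス/飛行機/韓国/タクシー/ホテル",
    "ko": "/홈/버스/비행기/한국/택시/호텔",
}
EMOJI_PATHNAME = "/🏡/🚌/🛩/🇰🇷/🚕/🏨"


def pathname_for(language_tag):
    """Return the pathname for a language tag, or the Emoji pathname."""
    primary = language_tag.lower().split("-")[0]
    return PATHNAMES.get(primary, EMOJI_PATHNAME)


if __name__ == "__main__":
    print(pathname_for("ja-JP"))
    print(pathname_for("ar"))
```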

I have used x.y.z to represent the Domain Name, the implication being that it is ASCII. We can complete the language-adaptive equation by having Domain Names in the supported BL languages. Thus my completed schema would have Chinese, Japanese and Korean Domain Names in addition to an ASCII Domain Name.
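Non-ASCII Domain Names are carried in DNS in an ASCII form whose labels begin with xn--. The Python sketch below converts between the two forms using the language's built-in idna codec, which follows the older IDNA 2003 rules. The helper names are mine, and 호텔.example is a made-up Domain Name used only for illustration.

```python
def to_ascii_hostname(hostname):
    """Convert a Unicode Domain Name to the ASCII form used in DNS."""
    return hostname.encode("idna").decode("ascii")


def to_unicode_hostname(hostname):
    """Convert the ASCII form of a Domain Name back to Unicode."""
    return hostname.encode("ascii").decode("idna")


if __name__ == "__main__":
    ascii_name = to_ascii_hostname("호텔.example")  # hypothetical name
    print(ascii_name)
    print(to_unicode_hostname(ascii_name))
```

An ASCII Domain Name such as x.y.z passes through both helpers unchanged.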