Sunday 17 July 2011

Twitter Hashtags

One can now use Unicode hashtags on Twitter, which means that hashtags are no longer restricted to ASCII characters. One can now, for instance, write hashtags in Chinese or Japanese. The hashtag #loughborough can be written in Chinese as #拉夫堡 and in Japanese as #ラフバラ. Unicode hashtags have been operational since 13th July 2011. The original announcement is on the Twitter Japan Blog at blog.jp.twitter.com/2011/07/blog-post.html. There is also an announcement of the new hangeul (한글) hashtags (해시태그/해쉬태그) on Twitter's Korea Blog at blog.kr.twitter.com/2011/07/blog-post.html.

I gather from the announcement that the scripts currently supported for hashtags are Chinese, Japanese, Hangeul and Cyrillic. The supported scripts are therefore a small subset of the scripts available in the Unicode Character Set. I tested some scripts in hashtags and my results are:

  1. Chinese ✓
  2. Japanese ✓
  3. Hangeul ✓
  4. Cyrillic ✓
  5. Thai ✗
  6. Arabic ✗
  7. Hebrew ✗
  8. Devanagari ✗
  9. Tamil ✗
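
The results above can be turned into a rough checker. What follows is only a sketch: Python's standard library has no direct script lookup, so it guesses a script label from the first word of each character's Unicode name, and the set PASSED_IN_TESTS simply mirrors my test results rather than any rule published by Twitter. The function names are my own, and the Cyrillic, Thai and Hebrew sample tags are made up for illustration.

```python
import re
import unicodedata

# Script labels that passed in the tests listed above (July 2011).
# Assumption: this set mirrors the test results, not Twitter's actual rule.
PASSED_IN_TESTS = {"CJK", "HIRAGANA", "KATAKANA", "HANGUL", "CYRILLIC"}


def rough_script(ch):
    """Guess a script label from the first word of the character's Unicode name."""
    name = unicodedata.name(ch, "UNKNOWN")
    # Split on space or hyphen so that "KATAKANA-HIRAGANA PROLONGED SOUND MARK"
    # is labelled "KATAKANA".
    return re.split(r"[ -]", name)[0]


def scripts_in(tag):
    """Return the set of rough script labels used in a hashtag body (no '#')."""
    return {rough_script(ch) for ch in tag}


def matches_test_results(tag):
    """True if every character belongs to a script that passed in the tests."""
    return scripts_in(tag) <= PASSED_IN_TESTS


if __name__ == "__main__":
    for tag in ["拉夫堡", "ラフバラ", "한글", "хэштег", "ไทย", "עברית"]:
        print(tag, sorted(scripts_in(tag)), matches_test_results(tag))
```

The name-prefix trick is a heuristic; a proper implementation would use the Unicode Script property, which needs a third-party library or the Unicode data files.
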
The announcement states that symbols are not allowed in hashtags. I tested #→ #① #∛ #≤ #△ #☃ #◲ #✈ #❄ #☺ and none of them work as hashtags.
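
One way to look at the symbols I tested is through their Unicode general category. The sketch below prints the category of each one and checks whether any is a letter (a category beginning with "L"). That the letter/non-letter distinction is what Twitter actually uses is my assumption, not something stated in the announcement; the helper name is_letter is my own.

```python
import unicodedata

# The symbols tested above, without the leading '#'.
SYMBOLS_TESTED = "→①∛≤△☃◲✈❄☺"


def is_letter(ch):
    """True if the character's Unicode general category is a letter (L*)."""
    return unicodedata.category(ch).startswith("L")


if __name__ == "__main__":
    for ch in SYMBOLS_TESTED:
        code = f"U+{ord(ch):04X}"
        print(code, unicodedata.category(ch), unicodedata.name(ch, "?"))
    # Assumption being illustrated: none of the rejected symbols is a letter.
    print("any letters:", any(is_letter(ch) for ch in SYMBOLS_TESTED))
```

Note that ① is categorised as a number (No) rather than a symbol, so "symbol" in the announcement is probably meant loosely.
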

An example tweet using Japanese hashtags is at http://twitter.com/#!/andreschappo/...

A previous article of mine, at http://schappo.blogspot.com/2011/..., covered some of the Twitter i18n issues. The implementation of Unicode hashtags is a significant i18n step forward. It will be interesting to see how this new feature develops amongst Japanese-language tweeters. I have already noticed that there are some long Japanese hashtags. As I write this blog post, there are currently two Japanese hashtag Trending Topics in Japan:

  • #名言の文末を過去形にすると深みが増す
  • #文頭に週刊をつけるとディアゴスティーニ風になる

Japanese does not use the space character between words, and so it is easy and natural to create long Japanese hashtags. A hashtag could even be the length of a complete tweet, simply by having # as the tweet's first character.
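
To get a feel for how much of a tweet such a hashtag uses up, here is a small sketch that counts characters as Unicode code points. The 140-character limit is the tweet limit at the time of writing; normalising to NFC before counting is my assumption, made so that composed and decomposed forms of the same kana count the same. The function names are hypothetical.

```python
import unicodedata

TWEET_LIMIT = 140  # tweet length limit at the time of writing (2011)


def tag_length(hashtag):
    """Count code points in a hashtag (including '#') after NFC normalisation."""
    return len(unicodedata.normalize("NFC", hashtag))


def characters_left(hashtag):
    """Characters remaining in a tweet that contains only this hashtag."""
    return TWEET_LIMIT - tag_length(hashtag)


if __name__ == "__main__":
    for tag in ["#名言の文末を過去形にすると深みが増す",
                "#文頭に週刊をつけるとディアゴスティーニ風になる"]:
        print(tag_length(tag), characters_left(tag), tag)
```
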