When coding, it is essential to consider and account for edge cases. Whilst coding Internationalised Programming Challenge 17 (jsfiddle.net/coas/4djhso1y), I happened upon an unexpected and fascinating edge case.
Before I reveal the edge case I need to give you some background information, starting with some Chinese characters.
娥，鄂，鹅，仒，厄，戹*，屵*，阨*，阿*，呝，俄，砨*，偔，堨*，圔*，誒*，噁*，儑*，貖*，礘*，櫮，鰪*，岋，阸*，妸，咢，匎*，卾，隘*，廅*，僫，蕚，噩，鍔，額，鰐，讹，吪，妿，咹，胺，啞*，蛯*，搤，磀，遻*，嶭*，騀，顎，鶚 ...and many more at chinese-tools.com/tools/sinograms.html?p=e
All these Chinese characters can be written in pinyin as E or e. The characters marked with * have multiple meanings, hence multiple pronunciations, hence multiple ways of being written in pinyin. The characters not marked with * are only written in pinyin as E or e. Some of you may well be thinking: what of tone marks? Well, unless I explicitly request it, I have never seen a Chinese person write pinyin with tone marks.
Some of these Chinese characters are family names and some would be suitable for given names.
So, now to the edge case. A Chinese name, when written in pinyin, could be E E or Ee E. I asked on Weibo 微博 whether anyone knew of any Chinese name which, when written in pinyin, is E E or Ee E. One person responded with the Chinese name 鄂娥, which written in pinyin is E E.
I reason that a person whose only language is English would think E E are initials and not the full name. Actually, before I considered name edge cases, I would probably also have thought E E were initials. I have been aware for a long time that some Chinese characters can be written in pinyin as single letters, such as e or a, but I had not made the connection with people's names.
This example illustrates that programmers need to thoroughly research naming conventions in different countries/cultures/languages before writing validation code.
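To make the risk concrete, here is a minimal JavaScript sketch of how a well-meaning validation rule could reject E E. Both functions and their rules are hypothetical illustrations, not code from Challenge 17: the first assumes a programmer has decided that every part of a name must be at least two letters long, and the second simply drops that assumption.

```javascript
// Hypothetical naive rule: every part of a name must have at least
// two Latin letters, so single letters are treated as initials.
function naiveIsFullName(name) {
  return name
    .trim()
    .split(/\s+/)
    .every((part) => /^[A-Za-z]{2,}$/.test(part));
}

// Hypothetical relaxed rule: a part may be a single Latin letter.
function relaxedIsFullName(name) {
  return name
    .trim()
    .split(/\s+/)
    .every((part) => /^[A-Za-z]+$/.test(part));
}

console.log(naiveIsFullName("E E"));    // false: a real name is rejected
console.log(relaxedIsFullName("E E"));  // true
console.log(relaxedIsFullName("Ee E")); // true
```

The relaxed rule is only there to show the contrast. It is still limited to unaccented Latin letters, so it is not a recommendation for real name validation.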
I would like to encompass several international naming conventions in my Challenge 17. So far, I have coded validation rules for Chinese and a catchall. I welcome contributions of international naming rules, which I will code and incorporate into Challenge 17. You can email me or, if you do not know my email address, you can tweet me @andreschappo or contact me on Weibo 微博 @schappo.
Techie stuff: The regex I use for Chinese name validation is:
My regex is using negative look-ahead, recognisable by the ?! construct. For each character, I am checking that it is a Han* character and is not a radical or symbol or punctuation character. This can be generalised to:
which reads as: a character must be in Character_Set_A and not in Character_Set_B in order to be valid.
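As a rough illustration of the Character_Set_A and Character_Set_B idea, here is a minimal JavaScript sketch. It is not the regex used in Challenge 17. The character sets are my assumptions, expressed with Unicode property escapes (which need the u flag): Script=Han stands in for Character_Set_A, and Radical, P (punctuation) and S (symbol) stand in for Character_Set_B. The names CHARACTER_SET_A, CHARACTER_SET_B and isPlausibleHanName are hypothetical.

```javascript
// Assumed stand-ins for the two character sets (illustration only).
const CHARACTER_SET_A = "\\p{Script=Han}";          // must be in this set
const CHARACTER_SET_B = "\\p{Radical}\\p{P}\\p{S}"; // must not be in this set

// (?![B])[A] : the negative look-ahead rejects a character from set B
// before the character class accepts a character from set A.
const oneValidChar = `(?![${CHARACTER_SET_B}])[${CHARACTER_SET_A}]`;

// A whole name is one or more valid characters and nothing else.
const hanNamePattern = new RegExp(`^(?:${oneValidChar})+$`, "u");

function isPlausibleHanName(name) {
  return hanNamePattern.test(name);
}

console.log(isPlausibleHanName("鄂娥"));     // true
console.log(isPlausibleHanName("\u2F00"));  // false: KANGXI RADICAL ONE is a radical
console.log(isPlausibleHanName("E E"));     // false: Latin letters are not Han
```

The look-ahead consumes no input, so each character is tested against both sets at the same position, which is what makes "in A and not in B" expressible in a single pattern.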
* A Han character, in this context, is actually a CJK (Chinese or Japanese or Korean) character but that is far too long a story for this blog article.