
Kakasi kanji to roomaji converter encoding difficulties

Vikrant Srivastava


( 6 months ago )

I am trying to use the Kakasi kanji/hiragana/katakana to roomaji converter as an aid to learning kanji pronunciation within specific sentences. I am using the following command and parameters:

kakasi -Ja -Ha -Ka -Ea -s

For example, converting today's date gives:

$ echo "7月31日" | kakasi -Ja -Ha -Ka -Ea -s
7 shin ?? 1 ka �

There is clearly a configuration error, which I think comes from the input encoding (UTF-8) not being understood correctly by the tool.

Could anybody with experience on this matter please advise on how to either tell kakasi to accept Unicode input, or suggest an alternative open-source tool for conversion that works better? (Please, no Windows software.)

Katie George


( 6 months ago )


Thanks to comments by @Earthliŋ and @blutorange (recognition where recognition is due), the combination of iconv with kakasi has finally worked. An initial conversion from UTF-8 to Shift-JIS is required, and is performed using:

$ echo "7月31日" | iconv -f utf8 -t shift-jis | kakasi -Ja -Ha -Ka -Ea -s

7 gatsu 31 nichi
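
To see why the conversion step matters, here is a minimal sketch that prints the bytes of the same character, 月 (U+6708), in both encodings, and shows that plain ASCII text is unchanged by the conversion. The helper name hex_bytes is my own, not part of kakasi or iconv, and the character is written as octal escapes so the result does not depend on how the script file itself is saved.

```shell
#!/bin/sh
# hex_bytes is a hypothetical helper: it prints stdin as one lowercase hex string.
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
}

# 月 (U+6708) written as its three UTF-8 bytes.
printf '\346\234\210' | hex_bytes; echo                                  # e69c88
# The same character after conversion to Shift-JIS is two different bytes.
printf '\346\234\210' | iconv -f UTF-8 -t SHIFT_JIS | hex_bytes; echo    # 8c8e
# ASCII letters come out byte-for-byte identical.
printf 'gatsu' | iconv -f UTF-8 -t SHIFT_JIS | hex_bytes; echo           # 6761747375
```

A tool that expects the two-byte form but receives the three-byte form has no way to recognise the character, which matches the garbled output in the question.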

Converting back in the other direction is not needed when the output is roomaji, since the Latin letters, digits, and spaces it consists of are plain ASCII and are encoded identically in Shift-JIS and UTF-8. If necessary, conversion from Shift-JIS back to UTF-8 can be performed with:

$ echo "7月31日" | iconv -f utf8 -t shift-jis | kakasi -Ja -Ha -Ka -Ea -s | iconv -f shift-jis -t utf8

7 gatsu 31 nichi

The back-conversion is needed, for instance, when converting into hiragana:

$ echo "7月31日" | iconv -f utf8 -t shift-jis | kakasi -JH -KH -Ea -s | iconv -f shift-jis -t utf8

7 がつ 31 にち

Update

As pointed out by @oals in the comments, newer versions of kakasi have the little-documented parameters -iutf8 and -outf8, which specify UTF-8 encoding for input and output respectively. The above conversion to hiragana can then be performed more simply using:

$ echo "7月31日" | kakasi -JH -KH -Ea -s -iutf8 -outf8

7 がつ 31 にち
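
If both old and new kakasi versions are in use, the two approaches can be wrapped in one shell function. This is only a sketch: the function name to_romaji and the KAKASI_UTF8 switch are my own inventions, and the script does not try to detect which options the installed kakasi supports, so the switch has to be set by hand.

```shell
#!/bin/sh
# to_romaji: romanize UTF-8 Japanese text read from stdin.
# Assumption: KAKASI_UTF8 is a hypothetical switch, not a kakasi setting.
#   KAKASI_UTF8=1 (default): kakasi is new enough to accept -iutf8/-outf8.
#   any other value:         route the text through Shift-JIS with iconv.
to_romaji() {
    if [ "${KAKASI_UTF8:-1}" = 1 ]; then
        kakasi -Ja -Ha -Ka -Ea -s -iutf8 -outf8
    else
        iconv -f UTF-8 -t SHIFT_JIS |
            kakasi -Ja -Ha -Ka -Ea -s |
            iconv -f SHIFT_JIS -t UTF-8
    fi
}

# Usage:
#   echo "7月31日" | to_romaji
#   echo "7月31日" | KAKASI_UTF8=0 to_romaji
```

Converting back to UTF-8 in the fallback branch is harmless for roomaji output and keeps the function usable if the kakasi options are later changed to produce kana.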

Thanks for your help.
