NAME
    mb - Easy scripting in Big5, Big5-HKSCS, GBK, Sjis (also CP932), UHC,
    UTF-8, ...

SYNOPSIS
      $ perl mb.pm              script_by_mbcs.pl      (auto detect encoding of script)
      $ perl mb.pm -e big5      script_by_big5.pl
      $ perl mb.pm -e big5hkscs script_by_big5hkscs.pl
      $ perl mb.pm -e eucjp     script_by_eucjp.pl
      $ perl mb.pm -e gb18030   script_by_gb18030.pl
      $ perl mb.pm -e gbk       script_by_gbk.pl
      $ perl mb.pm -e sjis      script_by_sjis.pl
      $ perl mb.pm -e sjis      script_by_cp932.pl
      $ perl mb.pm -e uhc       script_by_uhc.pl
      $ perl mb.pm -e utf8      script_by_utf8.pl
      $ perl mb.pm -e wtf8      script_by_wtf8.pl

      C:\WINDOWS> perl mb.pm script.pl ??-DOS-like *wildcard* available

      MBCS quotes:
        qq/ DAMEMOJI 功声乗ソ /
        q/ DAMEMOJI 功声乗ソ /
        m/ DAMEMOJI 功声乗ソ /
        s/ DAMEMOJI 功声乗ソ / DAMEMOJI 功声乗ソ /
        split / DAMEMOJI 功声乗ソ /
        tr/ DAMEMOJI 功声乗ソ / DAMEMOJI 功声乗ソ /
        y/ DAMEMOJI 功声乗ソ / DAMEMOJI 功声乗ソ /
        qr/ DAMEMOJI 功声乗ソ /

      MBCS subroutines:
        mb::chop(...);
        mb::chr(...);
        mb::do 'file';
        mb::dosglob(...);
        mb::eval 'string';
        mb::getc(...);
        mb::index(...);
        mb::index_byte(...);
        mb::length(...);
        mb::ord(...);
        mb::require 'file';
        mb::reverse(...);
        mb::rindex(...);
        mb::rindex_byte(...);
        mb::substr(...);
        mb::use Module;
        mb::no Module;

      MBCS special variables:
        $mb::PERL
        $mb::ORIG_PROGRAM_NAME

      supported encodings:
        Big5, Big5-HKSCS, EUC-JP, GB18030, GBK, Sjis (also CP932), UHC,
        UTF-8, WTF-8

      supported operating systems:
        Apple Inc. OS X,
        Hewlett-Packard Development Company, L.P. HP-UX,
        International Business Machines Corporation AIX,
        Microsoft Corporation Windows,
        Oracle Corporation Solaris,
        and Other Systems

      supported perl versions:
        perl version 5.005_03 to newest perl

DESCRIPTION
    This software is a source code filter, a transpiler-modulino.

    Perl is said to have been able to handle Unicode since version 5.8.
    However, unlike JPerl, the principle of "Easy jobs easy" was lost along
    the way. (But we have got it back again :-D)

    Shift_JIS and similar encodings (Big5, Big5-HKSCS, GB18030, GBK, Sjis,
    CP932, UHC) contain DAMEMOJI: multibyte characters whose second octet is
    the same as an ASCII metacharacter.
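
    To make the second-octet problem concrete, here is a minimal sketch in
    plain Perl that does not use mb.pm at all. It relies on one fact: in
    Shift_JIS the katakana ソ is encoded as the octets 0x83 0x5C, and 0x5C
    is the ASCII backslash. The helper names has_meta_second_octet and
    compiles_in_double_quotes are hypothetical, the metacharacter set is
    only illustrative, and the extra backslash shown at the end is one
    possible escape, not necessarily the transformation mb.pm performs.

```perl
use strict;
use warnings;

# Shift_JIS octets of katakana SO, one of the DAMEMOJI shown in the SYNOPSIS.
# Written as hex escapes so that this example itself is plain ASCII.
my $so = "\x83\x5C";

# Hypothetical helper (not part of mb.pm): true when the second octet of a
# double-octet character is an ASCII character that Perl may treat specially
# in quotes or patterns. The character set here is illustrative only.
sub has_meta_second_octet {
    my ($char) = @_;
    return 0 if length($char) != 2;
    return substr($char, 1, 1) =~ /[\\\@\$\[\]\^`\{\|\}]/ ? 1 : 0;
}

# Hypothetical helper: compile a double-quoted literal whose body is the
# given octets, the way perl would see them in a script file.
sub compiles_in_double_quotes {
    my ($octets) = @_;
    my $source = 'my $s = "' . $octets . '"; 1';
    return eval($source) ? 1 : 0;
}

printf "second octet: 0x%02X\n", ord substr($so, 1, 1);
print "meta second octet: ", has_meta_second_octet($so), "\n";

# The raw octets end in a backslash, which swallows the closing quote.
print "raw literal compiles: ", compiles_in_double_quotes($so), "\n";

# Assumed escape: an extra backslash in front of the second octet.
print "escaped literal compiles: ",
    compiles_in_double_quotes("\x83\\\\"), "\n";
```

    The raw literal fails to compile because perl reads the trailing 0x5C
    and the closing quote as an escaped quote, so the string never ends.
    With the extra backslash the literal compiles and holds the intended
    two octets.
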
    Which characters count as DAMEMOJI depends on whether the enclosing
    delimiter is a single quote or a double quote. This software escapes the
    DAMEMOJI in your script, generates a new script, and runs it.

  Larry Wall san's Style
    If you are using the utf8 pragma and have a big headache, you are
    probably on the wrong path. Go back, just once, to Larry Street, where
    there is a sign that says ver.5.00503. There is another path there.
    Follow that path. Soon your headache will improve.

    Every "length()" written in the script functions as "bytes::length()",
    and every "substr()" in the script functions as "bytes::substr()". If
    you want to know the number of code points of the multibyte characters
    contained in a scalar value, you have to write "mb::length()". If you
    want to execute "substr()" in code point context, you have to write
    "mb::substr()".

    Larry Wall san once said: "Easy jobs must be easy."

    Welcome to the world of Larry Wall san's Style!!

SEE ALSO
    https://metacpan.org/author/INA
    http://backpan.cpantesters.org/authors/id/I/IN/INA/
    https://metacpan.org/release/Jacode4e-RoundTrip
    https://metacpan.org/release/Jacode4e
    https://metacpan.org/release/Jacode