POD Documentation
Unicode::MapUTF8 - Conversions to and from arbitrary character sets and
UTF8
use Unicode::MapUTF8 qw(to_utf8 from_utf8 utf8_supported_charset);
# Convert a string in 'ISO-8859-1' to 'UTF8'
my $output = to_utf8({ -string => 'An example', -charset => 'ISO-8859-1' });
# Convert a string in 'UTF8' encoding to encoding 'ISO-8859-1'
my $other = from_utf8({ -string => 'Other text', -charset => 'ISO-8859-1' });
# List available character set encodings
my @character_sets = utf8_supported_charset;
# Convert between two arbitrary (but largely compatible) charset encodings
# (SJIS to EUC-JP)
my $utf8_string   = to_utf8({ -string => $sjis_string, -charset => 'sjis' });
my $euc_jp_string = from_utf8({ -string => $utf8_string, -charset => 'euc-jp' });
# Verify that a specific character set is supported
if (utf8_supported_charset('ISO-8859-1')) {
    # Yes
}
Unicode::MapUTF8 provides an adapter layer over existing routines for converting
between UTF8 and other encodings. In essence, it gives several existing
Unicode modules a single common interface, so you do not need to know the
underlying implementations to do simple conversions between UTF8 and other
character set encodings. To that end, it wraps the Unicode::String,
Unicode::Map8, Unicode::Map and Jcode modules in a standardized, simple API.
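To make the adapter idea concrete, here is a minimal sketch of one front-end function that looks up a charset name in a dispatch table and hands the work to a backend-specific converter. It is not the module's actual internals: the table %BACKENDS, the function my_to_utf8, and the use of the core Encode module as a stand-in for the wrapped backends are all assumptions made for illustration.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Encode ();

# HYPOTHETICAL backend table: lower-cased charset name => converter sub.
# Unicode::MapUTF8 itself wraps Unicode::String, Unicode::Map8,
# Unicode::Map and Jcode; here each entry is a stand-in built on Encode.
my %BACKENDS = (
    'iso-8859-1' => sub { Encode::encode('UTF-8', Encode::decode('iso-8859-1', $_[0])) },
    'utf8'       => sub { $_[0] },    # already UTF8: pass through unchanged
);

# HYPOTHETICAL single front end: callers name a charset and never
# need to know which backend performs the conversion.
sub my_to_utf8 {
    my ($args)  = @_;
    my $charset = lc($args->{-charset} // '');
    my $backend = $BACKENDS{$charset}
        or croak "my_to_utf8: charset '$charset' is not supported";
    return $backend->($args->{-string});
}

my $utf8 = my_to_utf8({ -string => "caf\xE9", -charset => 'ISO-8859-1' });
print length($utf8), "\n";    # the e-acute becomes two UTF8 bytes
```

Adding support for a new encoding in this design means adding one table entry; callers do not change.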
Because UTF8 serves as a common intermediate form, the module also supports
general character set conversion: you can convert between any two compatible,
supported character sets by chaining two conversions (source charset to UTF8
with to_utf8, then UTF8 to the target charset with from_utf8).
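The two-step chaining can be wrapped in a small helper. This is a sketch, not part of the module: the name convert_charset and its argument order are hypothetical. It uses only to_utf8, from_utf8 and utf8_supported_charset as documented here, and checks both charsets up front so that a bad name fails with a clear message before any conversion runs.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Unicode::MapUTF8 qw(to_utf8 from_utf8 utf8_supported_charset);

# HYPOTHETICAL helper (not exported by Unicode::MapUTF8): convert $string
# from $from_charset to $to_charset by pivoting through UTF8.
sub convert_charset {
    my ($string, $from_charset, $to_charset) = @_;
    for my $charset ($from_charset, $to_charset) {
        croak "convert_charset: unsupported charset '$charset'"
            unless utf8_supported_charset($charset);
    }
    # Step 1: source charset -> UTF8
    my $utf8 = to_utf8({ -string => $string, -charset => $from_charset });
    # Step 2: UTF8 -> target charset
    return from_utf8({ -string => $utf8, -charset => $to_charset });
}
```

With this helper, the SJIS to EUC-JP conversion from the synopsis becomes convert_charset($sjis_string, 'sjis', 'euc-jp').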
As with most things in Perl, the module handles many more characters per
second when you give it a few large chunks of text to chew on rather than
many small ones.
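One way to follow that advice is to read an entire input in one go and convert it with a single call, rather than calling to_utf8 once per line. The sketch below assumes the whole input fits comfortably in memory; the helper name slurp_to_utf8 is hypothetical.

```perl
use strict;
use warnings;
use Unicode::MapUTF8 qw(to_utf8);

# HYPOTHETICAL helper: read everything from $fh and convert it in ONE
# to_utf8 call instead of one call per line.
sub slurp_to_utf8 {
    my ($fh, $charset) = @_;
    binmode $fh;                          # raw bytes, no I/O-layer decoding
    my $raw = do { local $/; <$fh> };     # undef $/ slurps the whole stream
    $raw = '' unless defined $raw;        # empty input -> empty string
    return to_utf8({ -string => $raw, -charset => $charset });
}
```

For example, open my $fh, '<', 'input.txt' and then slurp_to_utf8($fh, 'ISO-8859-1') converts the whole file at once (the file name is illustrative).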
By design, it can easily be extended to support new charset conversion
modules as they become available.
1.09 2001.08.22 - Fixed multiple typo occurrences of 'uft'
where 'utf' was meant in the code. The problem
affected the utf16 and utf7 encodings. Problem found
by devon smith <devon@taller.PSCL.cwru.edu>.
1.08 2000.11.06 - Added the 'utf8_charset_alias' function to
allow runtime setting of character
set aliases. Added several alternate
names for 'sjis' (shiftjis, shift-jis,
shift_jis, s-jis, and s_jis).
Corrected 'croak' messages in the
'from_utf8' functions to report the
correct function name.
Tightened up initialization encapsulation.
Corrected a fatal problem in the Jcode
conversion from Unicode internals. Problem
and fix found by Brian Wisti <wbrian2@uswest.net>.
1.07 2000.11.01 - Added 'croak' to the 'use Carp' declaration to fix error messages.
Problem and fix found by Brian Wisti <wbrian2@uswest.net>.
1.06 2000.10.30 - Fix to handle change in stringification of overloaded
objects between Perl 5.005 and 5.6.
Problem noticed by Brian Wisti <wbrian2@uswest.net>.
1.05 2000.10.23 - Corrected an error in conversions from UTF8 to multibyte
encodings.
1.04 2000.10.23 - Added diagnostic messages for internal error
conditions.
1.03 2000.10.22 - Bug fix for load-time autodetection of Unicode::Map8
encodings.
1.02 2000.10.22 - Added load time autodetection of Unicode::Map8 supported
character set encodings.
Fixed internal calling error for some character sets with 'from_utf8'.
Thanks go to Ilia Lobsanov <ilia@lobsanov.com> for reporting the
problem.
1.01 2000.10.02 - Fixed handling of empty strings and added more
identification for error messages.
1.00 2000.09.29 - Pre-release version
- utf8_supported_charset($charset_name);
Returns true if the named charset is supported, false if it is not.
Example:
if (! utf8_supported_charset('VISCII')) {
# No support yet
}
If called in a list context with no parameters, it will return a list of
all supported character set names.
Example:
my @charsets = utf8_supported_charset;
- to_utf8({ -string => $string, -charset => $source_charset });
Returns the string converted to UTF8 from the specified source charset.
- from_utf8({ -string => $string, -charset => $target_charset });
Returns the string converted from UTF8 to the specified target charset.
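A short round trip shows the two functions working together. This sketch assumes ISO-8859-1 input; converting a single-byte Latin-1 string to UTF8 and back should reproduce the original bytes, while the UTF8 form is longer because the non-ASCII character needs two bytes.

```perl
use strict;
use warnings;
use Unicode::MapUTF8 qw(to_utf8 from_utf8);

my $latin1 = "Espa\xF1a";    # 'España' in ISO-8859-1, one byte per character
my $utf8   = to_utf8({ -string => $latin1, -charset => 'ISO-8859-1' });
my $back   = from_utf8({ -string => $utf8, -charset => 'ISO-8859-1' });

printf "%d bytes in ISO-8859-1, %d bytes in UTF8\n", length($latin1), length($utf8);
print $back eq $latin1 ? "round trip ok\n" : "round trip changed the text\n";
```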
Version: 1.07 2000.11.01
Copyright September, 2000 Benjamin Franz. All rights reserved.
This software is free software. You can redistribute it and/or modify it
under the same terms as Perl itself.
Author: Benjamin Franz <snowhare@nihongo.org>
To do: regression tests for Jcode and 2-byte encodings.
See also: Unicode::String, Unicode::Map8, Unicode::Map, Jcode