UTF-8 routines

version 1
package wikindx4\core\display
author Mark Grimshaw

 Methods

UTF8

__construct() 

UTF-8 decoding: decode UTF-8 PROPERLY, because PHP's utf8_decode() cannot cope with it.

decodeUtf8(string $utf8_string) : string

Freely borrowed from morris_hirsch at http://www.php.net/manual/en/function.utf8-decode.php

bytes  bits  representation
1      7     0bbbbbbb
2      11    110bbbbb 10bbbbbb
3      16    1110bbbb 10bbbbbb 10bbbbbb
4      21    11110bbb 10bbbbbb 10bbbbbb 10bbbbbb

Each b represents a bit that can be used to store character data.

Input CANNOT have single-byte upper-half extended ASCII codes.

Parameters

$utf8_string

string

Returns

string
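The byte layout described for decodeUtf8() can be sketched in a few lines. This is a minimal illustration, not the WIKINDX implementation: the function names sketch_sequence_length() and sketch_first_code_point() are hypothetical, and the sketch does not reject overlong encodings or surrogates.

```php
<?php
// Hypothetical helper (not part of WIKINDX): classify a lead byte
// according to the bytes/bits/representation layout.
function sketch_sequence_length(int $byte): int
{
    if (($byte & 0x80) === 0x00) { return 1; } // 0bbbbbbb
    if (($byte & 0xE0) === 0xC0) { return 2; } // 110bbbbb
    if (($byte & 0xF0) === 0xE0) { return 3; } // 1110bbbb
    if (($byte & 0xF8) === 0xF0) { return 4; } // 11110bbb
    return 0; // continuation byte (10bbbbbb) or invalid lead byte
}

// Hypothetical helper: decode the first character of a UTF-8 string
// to its code point, or return null if the bytes do not fit the layout.
function sketch_first_code_point(string $s): ?int
{
    if ($s === '') {
        return null;
    }
    $lead = ord($s[0]);
    $len = sketch_sequence_length($lead);
    if ($len === 0 || strlen($s) < $len) {
        return null;
    }
    if ($len === 1) {
        return $lead;
    }
    // Keep only the payload bits of the lead byte (5, 4 or 3 bits).
    $code = $lead & (0xFF >> ($len + 1));
    for ($i = 1; $i < $len; $i++) {
        $next = ord($s[$i]);
        if (($next & 0xC0) !== 0x80) {
            return null; // not a 10bbbbbb continuation byte
        }
        // Each continuation byte contributes 6 payload bits.
        $code = ($code << 6) | ($next & 0x3F);
    }
    return $code;
}
```

A lone Latin-1 byte such as "\xE9" looks like the start of a three-byte sequence, so the sketch returns null for it. This is the reason the input cannot contain single-byte upper-half extended ASCII codes.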

Decode UTF-8 if the string is UTF-8 encoded

decode_if_utf8(string $string) : string

Parameters

$string

string

Returns

string

Detect whether a string is UTF-8

detectUtf8(string $string) : boolean

Parameters

$string

string

Returns

boolean
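One common way to detect UTF-8 in PHP is sketched below. It is an assumption about how such a check can be written, not a copy of detectUtf8(); the name sketch_is_utf8() is hypothetical. Note that plain ASCII is also valid UTF-8, so this check cannot tell the two apart.

```php
<?php
// Hypothetical helper (not part of WIKINDX): report whether a byte
// string is well-formed UTF-8.
function sketch_is_utf8(string $s): bool
{
    // Prefer mbstring when the extension is loaded.
    if (function_exists('mb_check_encoding')) {
        return mb_check_encoding($s, 'UTF-8');
    }
    // Fallback: with the /u modifier PCRE refuses malformed UTF-8
    // subjects, so preg_match() returns false instead of 1.
    return preg_match('//u', $s) === 1;
}
```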

Encode a string as UTF-8

encodeUtf8(string $str) : string

Parameters

$str

string

Returns

string

Decode UTF-8 to Unicode ONLY if the input has been UTF-8 encoded.

smartUtf8_decode(string $inStr) : string

Adapted from 'nospam' in the user contributions at: http://www.php.net/manual/en/function.utf8-decode.php

Parameters

$inStr

string

Returns

string

Encode UTF-8 if not already UTF-8

smartUtf8_encode(string $str) : boolean|string
version 1

Tools for validating that a UTF-8 string is well formed. The Original Code is Mozilla Communicator client code. The Initial Developer of the Original Code is Netscape Communications Corporation. Portions created by the Initial Developer are Copyright (C) 1998 the Initial Developer. All Rights Reserved. Ported to PHP by Henri Sivonen (http://hsivonen.iki.fi). Slight modifications to fit with the phputf8 library by Harry Fuecks (hfuecks gmail com).

Tests a string as to whether it's valid UTF-8 and supported by the Unicode standard. Note: this function has been modified to simply return true or false.

author hsivonen@iki.fi
see http://lxr.mozilla.org/seamonkey/source/intl/uconv/src/nsUTF8ToUnicode.cpp
see http://lxr.mozilla.org/seamonkey/source/intl/uconv/src/nsUnicodeToUTF8.cpp
see http://hsivonen.iki.fi/php-utf8/
see \utf8_compliant

Parameters

$str

string

UTF-8 encoded string

Returns

boolean

true if valid

string

Test form input for a character set we can handle.

testFormInput(string $input) : string

$this->vars['utf8CharTest'] is a hidden field that WIKINDX automatically adds to all forms; it contains the characters 'ä™®'.

See http://dev.mysql.com/tech-resources/articles/4.1/unicode.html

Parameters

$input

string

Returns

string

UTF-8 version of chr()

utf8_chr(string $code) : string

Parameters

$code

string

Returns

string

UTF-8 version of htmlspecialchars()

utf8_htmlspecialchars(string $str) : string

Parameters

$str

string

Returns

string
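PHP's own htmlspecialchars() can be made UTF-8 safe by naming the encoding explicitly. The wrapper below is only a sketch of that idea; sketch_utf8_htmlspecialchars() is a hypothetical name and the choice of flags is an assumption, not taken from WIKINDX.

```php
<?php
// Hypothetical helper (not part of WIKINDX): escape HTML special
// characters while treating the input as UTF-8.
function sketch_utf8_htmlspecialchars(string $s): string
{
    // ENT_QUOTES escapes both single and double quotes.
    // ENT_SUBSTITUTE replaces invalid byte sequences instead of
    // making the function return an empty string.
    return htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}
```

Multi-byte characters pass through untouched; only the characters <, >, &, " and ' are replaced by entities.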

A Unicode-aware replacement for strlen()

utf8_strlen(string $string) : string

Uses the mbstring extension if available

author Andreas Gohr
see \strlen()

Parameters

$string

string

Returns

string
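The "mbstring if available" pattern can be sketched as follows. The function name sketch_utf8_strlen() is hypothetical and the fallback is one possible choice, not necessarily the one WIKINDX uses.

```php
<?php
// Hypothetical helper (not part of WIKINDX): count characters rather
// than bytes in a UTF-8 string.
function sketch_utf8_strlen(string $s): int
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($s, 'UTF-8');
    }
    // Fallback: every character has exactly one byte that is not a
    // continuation byte (10bbbbbb), so count those.
    $count = 0;
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        if ((ord($s[$i]) & 0xC0) !== 0x80) {
            $count++;
        }
    }
    return $count;
}
```

For the string "h\xC3\xA9llo" (the word "héllo"), strlen() reports 6 bytes while the sketch reports 5 characters.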

A Unicode-aware replacement for strtolower()

utf8_strtolower(string $string) : string

Uses the mbstring extension if available

author Andreas Gohr
see \strtolower()
see \utf8_strtoupper()

Parameters

$string

string

Returns

string

A Unicode-aware replacement for strtoupper()

utf8_strtoupper(string $string) : string

Uses the mbstring extension if available

author Andreas Gohr
see \strtoupper()
see \utf8_strtolower()

Parameters

$string

string

Returns

string

A Unicode-aware replacement for substr()

utf8_substr(string $str, int $start, int $length) : string

Uses the mbstring extension if available

author Andreas Gohr
see \substr()

Parameters

$str

string

$start

int

$length

int

Default is NULL

Returns

string
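A character-based substr() with an optional length can be sketched like this. The name sketch_utf8_substr() and the preg_split() fallback are assumptions for illustration only.

```php
<?php
// Hypothetical helper (not part of WIKINDX): substr() that counts
// characters instead of bytes. A null $length means "to the end".
function sketch_utf8_substr(string $s, int $start, ?int $length = null): string
{
    if (function_exists('mb_substr')) {
        return mb_substr($s, $start, $length, 'UTF-8');
    }
    // Fallback: split into an array of characters, then slice it.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return ''; // input was not valid UTF-8
    }
    return implode('', array_slice($chars, $start, $length));
}
```

Byte-based substr() would cut "h\xC3\xA9llo" in the middle of a character for some offsets; the sketch always returns whole characters.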

A Unicode-aware replacement for ucfirst()

utf8_ucfirst(string $str) : string
author Andrea Rossato
see \ucfirst()

Parameters

$str

string

Returns

string
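Upper-casing only the first character can be sketched with mbstring. This assumes the mbstring extension is loaded; sketch_utf8_ucfirst() is a hypothetical name and not the WIKINDX code.

```php
<?php
// Hypothetical helper (not part of WIKINDX): upper-case the first
// character of a UTF-8 string. Assumes the mbstring extension.
function sketch_utf8_ucfirst(string $s): string
{
    if ($s === '') {
        return '';
    }
    $first = mb_substr($s, 0, 1, 'UTF-8');   // first character, not first byte
    $rest = mb_substr($s, 1, null, 'UTF-8'); // everything after it
    return mb_strtoupper($first, 'UTF-8') . $rest;
}
```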

Convert a number to UTF-8

code2utf8(int $num) : string

Parameters

$num

int

Returns

string
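Converting a number to UTF-8 is the reverse of the bit layout listed under decodeUtf8(). The sketch below shows one way to do it; sketch_code_point_to_utf8() is a hypothetical name, and the sketch does not reject surrogate code points.

```php
<?php
// Hypothetical helper (not part of WIKINDX): encode a Unicode code
// point as a UTF-8 byte string. Returns '' for out-of-range input.
function sketch_code_point_to_utf8(int $num): string
{
    if ($num < 0 || $num > 0x10FFFF) {
        return '';
    }
    if ($num < 0x80) {            // 7 bits:  0bbbbbbb
        return chr($num);
    }
    if ($num < 0x800) {           // 11 bits: 110bbbbb 10bbbbbb
        return chr(0xC0 | ($num >> 6))
             . chr(0x80 | ($num & 0x3F));
    }
    if ($num < 0x10000) {         // 16 bits: 1110bbbb 10bbbbbb 10bbbbbb
        return chr(0xE0 | ($num >> 12))
             . chr(0x80 | (($num >> 6) & 0x3F))
             . chr(0x80 | ($num & 0x3F));
    }
    // 21 bits: 11110bbb 10bbbbbb 10bbbbbb 10bbbbbb
    return chr(0xF0 | ($num >> 18))
         . chr(0x80 | (($num >> 12) & 0x3F))
         . chr(0x80 | (($num >> 6) & 0x3F))
         . chr(0x80 | ($num & 0x3F));
}
```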

UTF-8 Case lookup table

loadVars() 

This lookup table maps upper-case letters to their corresponding lower-case letters in UTF-8.

author Andreas Gohr

This function converts a Unicode array back to its UTF-8 representation

unicode_to_utf8(string $str) : string

This function returns any UTF-8 encoded text as a list of Unicode values

utf8_to_unicode(string $str) : string
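The round trip between UTF-8 text and a list of Unicode values can be sketched as below. Both function names are hypothetical, the sketch assumes the mbstring extension (PHP 7.4 or later for the arrow functions), and it is not the WIKINDX implementation.

```php
<?php
// Hypothetical helper (not part of WIKINDX): UTF-8 text to a list of
// Unicode code points.
function sketch_utf8_to_code_points(string $s): array
{
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return []; // input was not valid UTF-8
    }
    return array_map(fn(string $c): int => mb_ord($c, 'UTF-8'), $chars);
}

// Hypothetical helper: list of Unicode code points back to UTF-8 text.
// Invalid code points are dropped (mb_chr() returns false for them).
function sketch_code_points_to_utf8(array $codes): string
{
    return implode('', array_map(
        fn(int $n): string => (string) mb_chr($n, 'UTF-8'),
        $codes
    ));
}
```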

 Properties

 

$vars 

UTF-8 routines

version 1
package wikindx4\core\utf8
author Mark Grimshaw

 Methods

UTF8

__construct() 

UTF-8 decoding: decode UTF-8 PROPERLY, because PHP's utf8_decode() cannot cope with it.

decodeUtf8(string $utf8_string) : string

Freely borrowed from morris_hirsch at http://www.php.net/manual/en/function.utf8-decode.php

bytes  bits  representation
1      7     0bbbbbbb
2      11    110bbbbb 10bbbbbb
3      16    1110bbbb 10bbbbbb 10bbbbbb
4      21    11110bbb 10bbbbbb 10bbbbbb 10bbbbbb

Each b represents a bit that can be used to store character data.

Input CANNOT have single-byte upper-half extended ASCII codes.

Parameters

$utf8_string

string

Returns

string

Encode UTF-8 from Unicode &#xxx characters

encodeUtf8(string $str) : string

Parameters

$str

string

Returns

string
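Turning &#xxx references into UTF-8 can be sketched as follows. The sketch assumes that xxx is a decimal code point followed by a semicolon and that the mbstring extension is available; sketch_entities_to_utf8() is a hypothetical name and the real encodeUtf8() may behave differently.

```php
<?php
// Hypothetical helper (not part of WIKINDX): replace decimal numeric
// character references such as &#233; with their UTF-8 characters.
function sketch_entities_to_utf8(string $s): string
{
    $result = preg_replace_callback(
        '/&#(\d+);/',
        function (array $m): string {
            $char = mb_chr((int) $m[1], 'UTF-8');
            // Leave the reference untouched if the code point is invalid.
            return $char === false ? $m[0] : $char;
        },
        $s
    );
    // preg_replace_callback() returns null on error; fall back to input.
    return $result ?? $s;
}
```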

Decode UTF-8 ONLY if the input has been UTF-8-encoded.

smartUtf8_decode(string $inStr) : string

Adapted from 'nospam' in the user contributions at: http://www.php.net/manual/en/function.utf8-decode.php

Parameters

$inStr

string

Returns

string

Encode UTF-8 if not already UTF-8

smartUtf8_encode(string $str) : boolean
version 2

Tools for validating that a UTF-8 string is well formed. The Original Code is Mozilla Communicator client code. The Initial Developer of the Original Code is Netscape Communications Corporation. Portions created by the Initial Developer are Copyright (C) 1998 the Initial Developer. All Rights Reserved. Ported to PHP by Henri Sivonen (http://hsivonen.iki.fi). Slight modifications to fit with the phputf8 library by Harry Fuecks (hfuecks gmail com).

Tests a string as to whether it's valid UTF-8 and supported by the Unicode standard.

author hsivonen@iki.fi
see http://lxr.mozilla.org/seamonkey/source/intl/uconv/src/nsUTF8ToUnicode.cpp
see http://lxr.mozilla.org/seamonkey/source/intl/uconv/src/nsUnicodeToUTF8.cpp
see http://hsivonen.iki.fi/php-utf8/
see \utf8_compliant

Parameters

$str

string

UTF-8 encoded string

Returns

boolean

true if valid

UTF-8 version of htmlspecialchars()

utf8_htmlspecialchars(string $str) : string

Parameters

$str

string

Returns

string

A Unicode-aware replacement for strlen()

utf8_strlen(string $string) : string

Uses the mbstring extension if available

author Andreas Gohr
see \strlen()

Parameters

$string

string

Returns

string

A Unicode-aware replacement for strtolower()

utf8_strtolower(string $string) : string

Uses the mbstring extension if available

author Andreas Gohr
see \strtolower()
see \utf8_strtoupper()

Parameters

$string

string

Returns

string

A Unicode-aware replacement for strtoupper()

utf8_strtoupper(string $string) : string

Uses the mbstring extension if available

author Andreas Gohr
see \strtoupper()
see \utf8_strtolower()

Parameters

$string

string

Returns

string

A Unicode-aware replacement for substr()

utf8_substr(string $str, int $start, int $length) : string

Uses the mbstring extension if available

author Andreas Gohr
see \substr()

Parameters

$str

string

$start

int

$length

int

Default is NULL

Returns

string

A Unicode-aware replacement for ucfirst()

utf8_ucfirst(string $str) : string
author Andrea Rossato
see \ucfirst()

Parameters

$str

string

Returns

string

Convert an integer to its chr() representation

code2utf8(int $num) : string

Parameters

$num

int

Returns

string

UTF-8 Case lookup table

loadVars() 

This lookup table maps upper-case letters to their corresponding lower-case letters in UTF-8.

author Andreas Gohr

This function converts a Unicode array back to its UTF-8 representation

unicode_to_utf8(string $str) : string

This function returns any UTF-8 encoded text as a list of Unicode values

utf8_to_unicode(string $str) : string