Encoding
Some UTF-8 encoding/decoding
version | 1 |
---|---|
package | wikindx4\core\utf8 |
author | "Sebastián Grignoli" |
version | 1.1 |
link | http://www.framework2.com.ar/dzone/forceUTF8-es/ |
example | http://www.framework2.com.ar/dzone/forceUTF8-es/ |
UTF8FixWin1252Chars(string $text) : string
If you received a UTF-8 string that was converted from Windows-1252 as if it were ISO 8859-1 (ignoring the Windows-1252 characters from 0x80 to 0x9F), use this function to fix it. See: http://en.wikipedia.org/wiki/Windows-1252
string
string
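The damage that UTF8FixWin1252Chars targets can be reproduced in a few lines. The sketch below is written in Python rather than the library's PHP and only illustrates the effect at the character level. The helper name `fix_win1252_chars` is hypothetical, and the repair strategy shown is an assumption, not the library's implementation.

```python
# Illustration only: reproduces the mis-conversion described above and one
# possible repair. `fix_win1252_chars` is a hypothetical helper, not the
# library's code.

def fix_win1252_chars(text: str) -> str:
    """Re-read C1 control characters (U+0080..U+009F) as Windows-1252."""
    out = []
    for ch in text:
        if 0x80 <= ord(ch) <= 0x9F:
            try:
                # The code point equals the original Windows-1252 byte value.
                out.append(bytes([ord(ch)]).decode("cp1252"))
            except UnicodeDecodeError:
                # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in Windows-1252.
                out.append(ch)
        else:
            out.append(ch)
    return "".join(out)


# Euro sign and en dash: both live in the 0x80..0x9F range of Windows-1252.
original = "\u20ac5 \u2013 ok"

# Stored as Windows-1252, then wrongly decoded as ISO 8859-1.
broken = original.encode("cp1252").decode("latin-1")

fixed = fix_win1252_chars(broken)
```

Here `broken` contains the control characters U+0080 and U+0096 where the euro sign and the dash should be, and `fixed` restores them.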
encode(string $encodingLabel, string $text) : string
string
string
string
fixUTF8(string $text) : string
string
string
normalizeEncoding(string $encodingLabel) : string
string
string
removeBOM(string $str) : string
string
Default is ""
string
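No description is given for removeBOM here. Assuming that "BOM" means the three-byte UTF-8 byte order mark (EF BB BF) and that only a leading mark is stripped, the behaviour might look like the following Python sketch. The name `remove_bom` is hypothetical and the code is not taken from the library.

```python
# Assumption: "BOM" means the 3-byte UTF-8 byte order mark EF BB BF.
UTF8_BOM = b"\xef\xbb\xbf"


def remove_bom(data: bytes = b"") -> bytes:
    """Return data without a leading UTF-8 BOM; other input is unchanged."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


with_bom = UTF8_BOM + "h\u00e9llo".encode("utf-8")
without_bom = remove_bom(with_bom)
```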
toISO8859(string $text) : string
string
string
toLatin1(string $text) : string
string
string
toUTF8(string $text) : string
This function leaves UTF8 characters alone, while converting almost all non-UTF8 to UTF8.
It assumes that the encoding of the original string is either Windows-1252 or ISO 8859-1.
It may fail to convert characters to UTF-8 if they fall into one of these scenarios:
1) when any of these characters: ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß are followed by any of these ("group B"): ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶•¸¹º»¼½¾¿
For example: %ABREPRESENT%C9%BB, that is, «REPRESENTÉ». The "«" (%AB) character will be converted, but the "É" followed by "»" (%C9%BB) is also a valid UTF-8 sequence, so it will be left unchanged.
2) when any of these: àáâãäåæçèéêëìíîï are followed by TWO chars from group B,
3) when any of these: ðñòó are followed by THREE chars from group B.
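The first scenario can be reproduced with a small Python sketch of a "keep valid UTF-8, convert everything else" pass. This is only an illustration of why the ambiguity exists; `to_utf8_sketch` and `_decode_single` are hypothetical names, and the logic is an assumption, not the library's PHP algorithm.

```python
# Illustration only: a hypothetical stand-in for a "leave valid UTF-8 alone,
# convert the rest from Windows-1252 / ISO 8859-1" pass over raw bytes.

def _decode_single(byte: int) -> str:
    """Interpret one byte as Windows-1252, falling back to ISO 8859-1."""
    try:
        return bytes([byte]).decode("cp1252")
    except UnicodeDecodeError:
        return bytes([byte]).decode("latin-1")


def to_utf8_sketch(data: bytes) -> str:
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(b))
            i += 1
            continue
        # Sequence length implied by the lead byte (0 = not a lead byte).
        if 0xC0 <= b <= 0xDF:
            n = 2
        elif 0xE0 <= b <= 0xEF:
            n = 3
        elif 0xF0 <= b <= 0xF7:
            n = 4
        else:
            n = 0
        chunk = data[i:i + n]
        if n and len(chunk) == n:
            try:
                # Already valid UTF-8: keep it as it is.
                out.append(chunk.decode("utf-8"))
                i += n
                continue
            except UnicodeDecodeError:
                pass
        out.append(_decode_single(b))
        i += 1
    return "".join(out)


raw = b"\xabREPRESENT\xc9\xbb"      # the %ABREPRESENT%C9%BB example
intended = raw.decode("cp1252")     # what the author of the bytes meant
converted = to_utf8_sketch(raw)     # what the heuristic produces
```

The lone 0xAB byte is converted to "«", but the pair C9 BB is the valid UTF-8 encoding of U+027B, so `converted` ends in that character rather than in "É»" as `intended` does.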
name | toUTF8 |
---|---|
string
Any string.
string
The same string, UTF-8 encoded.
toWin1252(string $text) : string
string
string
$brokenUtf8ToUtf8
$utf8ToWin1252
$win1252ToUtf8