Encoding

Static helpers for converting strings to and from UTF-8.

version 1
package wikindx4\core\utf8
author "Sebastián Grignoli"
version 1.1
link http://www.framework2.com.ar/dzone/forceUTF8-es/
example http://www.framework2.com.ar/dzone/forceUTF8-es/

 Methods

Fix Windows-1252 characters in UTF-8

UTF8FixWin1252Chars(string $text) : string
Static

If you received a UTF-8 string that was converted from Windows-1252 as if it were ISO-8859-1 (ignoring the Windows-1252 characters in the range 0x80 to 0x9F), use this function to fix it. See: http://en.wikipedia.org/wiki/Windows-1252

Parameters

$text

string

Returns

string
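A minimal sketch of the repair described above. It assumes the damage consists of Windows-1252 bytes 0x80 to 0x9F that were decoded as ISO-8859-1 and so became the C1 control characters U+0080 to U+009F. The function name fixWin1252C1Chars() and its partial lookup table are illustrative assumptions, not this class's code; the $brokenUtf8ToUtf8 property listed under Properties presumably plays the role of the full table.

```php
<?php
// Hypothetical helper, not this class's implementation.
// Assumes Windows-1252 bytes 0x80-0x9F were read as ISO-8859-1, producing
// C1 control characters (UTF-8 C2 80 .. C2 9F) instead of the intended symbols.
function fixWin1252C1Chars(string $text): string
{
    // Partial table for illustration; a full one covers all 27 characters that
    // Windows-1252 assigns in 0x80-0x9F (0x81, 0x8D, 0x8F, 0x90, 0x9D are unassigned).
    $map = [
        "\u{0080}" => "\u{20AC}", // 0x80: EURO SIGN
        "\u{0085}" => "\u{2026}", // 0x85: HORIZONTAL ELLIPSIS
        "\u{0091}" => "\u{2018}", // 0x91: LEFT SINGLE QUOTATION MARK
        "\u{0092}" => "\u{2019}", // 0x92: RIGHT SINGLE QUOTATION MARK
        "\u{0093}" => "\u{201C}", // 0x93: LEFT DOUBLE QUOTATION MARK
        "\u{0094}" => "\u{201D}", // 0x94: RIGHT DOUBLE QUOTATION MARK
        "\u{0096}" => "\u{2013}", // 0x96: EN DASH
        "\u{0097}" => "\u{2014}", // 0x97: EM DASH
        "\u{0099}" => "\u{2122}", // 0x99: TRADE MARK SIGN
    ];
    // strtr() replaces each key once and never rescans replaced text.
    return strtr($text, $map);
}

echo fixWin1252C1Chars("it\u{0092}s"), "\n"; // prints it’s
```

strtr() is used instead of str_replace() so that a replacement can never be matched again by a later key.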

Encode to the given encoding

encode(string $encodingLabel, string $text) : string
Static

Parameters

$encodingLabel

string

$text

string

Returns

string
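No description is given for encode(). Judging by its parameters, it converts $text to the encoding named by $encodingLabel. The sketch below assumes that, and further assumes that $text is valid UTF-8 and that only Latin-1/Windows-1252 labels trigger a conversion. encodeAs() is a hypothetical name, and the conversion uses PHP's mbstring extension rather than this class.

```php
<?php
// Hypothetical sketch of label-based dispatch; assumes $text is valid UTF-8.
function encodeAs(string $encodingLabel, string $text): string
{
    // Reduce "ISO-8859-1", "latin 1", "windows-1252" etc. to a bare upper-case key.
    $key = preg_replace('/[^A-Z0-9]/', '', strtoupper($encodingLabel));
    $singleByteLabels = ['ISO88591', 'LATIN1', 'WINDOWS1252', 'CP1252'];
    if (in_array($key, $singleByteLabels, true)) {
        // Assumption: treat all of these as Windows-1252, which agrees with
        // ISO-8859-1 outside 0x80-0x9F; unmappable characters become '?'.
        return mb_convert_encoding($text, 'Windows-1252', 'UTF-8');
    }
    // Any other label: keep the text as UTF-8.
    return $text;
}

echo bin2hex(encodeAs('latin1', "Caf\u{00E9}")), "\n"; // 436166e9
```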

Fix UTF-8

fixUTF8(string $text) : string
Static

Parameters

$text

string

Returns

string
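fixUTF8() has no description here. Judging by its name and the forceUTF8 project linked above, it is meant for text that was UTF-8 encoded more than once, such as "CafÃ©" for "Café". Below is a sketch of one way to undo that, assuming each extra layer came from reading UTF-8 bytes as Windows-1252. fixDoubleEncodedUtf8() is a hypothetical name, and the sketch uses mbstring rather than this class.

```php
<?php
// Hypothetical sketch, not this class's code: peel off extra UTF-8 layers.
function fixDoubleEncodedUtf8(string $text, int $maxRounds = 5): string
{
    for ($i = 0; $i < $maxRounds; $i++) {
        // Map each character back to a single Windows-1252 byte.
        $bytes = mb_convert_encoding($text, 'Windows-1252', 'UTF-8');
        // Stop if that step lost characters (round trip differs), changed nothing,
        // or produced bytes that are not UTF-8: no further layer to remove.
        if (mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252') !== $text
            || $bytes === $text
            || !mb_check_encoding($bytes, 'UTF-8')) {
            break;
        }
        $text = $bytes;
    }
    return $text;
}

echo fixDoubleEncodedUtf8("Caf\u{00C3}\u{00A9}"), "\n"; // prints Café
```

The round-trip check keeps text containing characters outside Windows-1252 (for example CJK) from being damaged.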

Normalize encoding label

normalizeEncoding(string $encodingLabel) : string
Static

Parameters

$encodingLabel

string

Returns

string
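A sketch of label normalization, assuming the goal is to map spelling variants such as "utf-8", "UTF8" and "Latin 1" onto one canonical name. The alias table and the fallback to UTF-8 for unknown labels are assumptions, and normalizeEncodingLabel() is a hypothetical name.

```php
<?php
// Hypothetical sketch: canonicalize a free-form encoding label.
// The alias table is an assumption, not taken from this class.
function normalizeEncodingLabel(string $encodingLabel): string
{
    // Upper-case and drop spaces/punctuation so "utf-8", "UTF 8" and "utf8" compare equal.
    $key = preg_replace('/[^A-Z0-9]/', '', strtoupper($encodingLabel));
    $aliases = [
        'ISO88591'    => 'ISO-8859-1',
        'LATIN1'      => 'ISO-8859-1',
        'WINDOWS1252' => 'Windows-1252',
        'CP1252'      => 'Windows-1252',
        'UTF8'        => 'UTF-8',
    ];
    // Assumed fallback: unknown labels are treated as UTF-8.
    return $aliases[$key] ?? 'UTF-8';
}

echo normalizeEncodingLabel('Latin 1'), "\n"; // ISO-8859-1
```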

Remove BOM

removeBOM(string $str) : string
Static

Parameters

$str

string

Default is ""

Returns

string
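The UTF-8 byte order mark is the three-byte sequence EF BB BF. A minimal sketch of stripping it from the start of a string, with the same empty-string default shown above; stripUtf8Bom() is a hypothetical name, and only the UTF-8 BOM (not UTF-16 or UTF-32 BOMs) is assumed to be handled.

```php
<?php
// Hypothetical sketch: remove a leading UTF-8 BOM (EF BB BF), if present.
function stripUtf8Bom(string $str = ''): string
{
    $bom = "\xEF\xBB\xBF";
    // Compare only the first three bytes; anything else is returned untouched.
    return strncmp($str, $bom, 3) === 0 ? substr($str, 3) : $str;
}

echo stripUtf8Bom("\xEF\xBB\xBFtitle,year"), "\n"; // title,year
```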

Encode to ISO-8859-1

toISO8859(string $text) : string
Static

Parameters

$text

string

Returns

string

Encode to Latin1

toLatin1(string $text) : string
Static

Parameters

$text

string

Returns

string

Encode to UTF-8

toUTF8(string $text) : string
Static

This function leaves UTF-8 characters alone, while converting almost all non-UTF-8 characters to UTF-8.

It assumes that the encoding of the original string is either Windows-1252 or ISO 8859-1.

It may fail to convert characters to UTF-8 if they fall into one of these scenarios:

1) when any of these characters: ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß are followed by any of these ("group B"): ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿. For example, in %ABREPRESENT%C9%BB («REPRESENTÉ»), the "«" (%AB) character will be converted, but the "É" followed by "»" (%C9%BB) also forms a valid UTF-8 character (U+027B) and will be left unchanged.

2) when any of these: àáâãäåæçèéêëìíîï are followed by TWO characters from group B.

3) when any of these: ðñòó are followed by THREE characters from group B.


Parameters

$text

string

Any string.

Returns

string

The same string, UTF-8 encoded.
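The failure scenarios for toUTF8() follow from a byte-level heuristic: a byte run that already looks like a UTF-8 sequence (a lead byte followed by the right number of bytes in 0x80 to 0xBF) is kept, and every other non-ASCII byte is converted from Windows-1252. The sketch below is an assumption about how such a heuristic can be written, not this class's code; toUtf8Sketch() is a hypothetical name, and single-byte conversion uses mbstring.

```php
<?php
// Hypothetical sketch of the "keep UTF-8, convert the rest" heuristic.
function toUtf8Sketch(string $text): string
{
    $out = '';
    $len = strlen($text);
    for ($i = 0; $i < $len; $i++) {
        $c = ord($text[$i]);
        // Number of continuation bytes a UTF-8 lead byte would require.
        if ($c >= 0xC0 && $c <= 0xDF) {
            $n = 1;
        } elseif ($c >= 0xE0 && $c <= 0xEF) {
            $n = 2;
        } elseif ($c >= 0xF0 && $c <= 0xF4) {
            $n = 3;
        } else {
            $n = 0;
        }
        // Valid only if all $n following bytes exist and lie in 0x80-0xBF.
        $valid = $n > 0 && $i + $n < $len;
        for ($k = 1; $valid && $k <= $n; $k++) {
            $b = ord($text[$i + $k]);
            $valid = $b >= 0x80 && $b <= 0xBF;
        }
        if ($valid) {
            $out .= substr($text, $i, $n + 1); // looks like UTF-8 already: keep it
            $i += $n;
        } elseif ($c < 0x80) {
            $out .= $text[$i];                 // ASCII is the same in both encodings
        } else {
            $out .= mb_convert_encoding($text[$i], 'UTF-8', 'Windows-1252');
        }
    }
    return $out;
}

echo bin2hex(toUtf8Sketch("\xABREPRESENT\xC9\xBB")), "\n"; // c2ab524550524553454e54c9bb
```

The output shows scenario 1 in action: %AB is converted to « (C2 AB), while %C9%BB passes the lead-byte test and is kept as U+027B instead of becoming "É»".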

Encode to Win1252

toWin1252(string $text) : string
Static

Parameters

$text

string

Returns

string
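Windows-1252 and ISO-8859-1 agree except in the range 0x80 to 0x9F, where Windows-1252 places printable characters such as curly quotes and the euro sign. The sketch below shows the difference, assuming valid UTF-8 input; it uses mbstring rather than this class, and toWin1252Sketch() is a hypothetical name.

```php
<?php
// Hypothetical sketch, not this class's implementation.
function toWin1252Sketch(string $text): string
{
    // Assumption: input that is not valid UTF-8 is already single-byte, so leave it alone.
    if (!mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Characters with no Windows-1252 byte become '?' (mbstring's default substitute).
    return mb_convert_encoding($text, 'Windows-1252', 'UTF-8');
}

$utf8 = "\u{201C}5\u{20AC}\u{201D}"; // curly quotes around "5€"
echo bin2hex(toWin1252Sketch($utf8)), "\n";                            // 93358094
echo bin2hex(mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8')), "\n"; // 3f353f3f
```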

 Properties

 

$brokenUtf8ToUtf8 
 

$utf8ToWin1252 
 

$win1252ToUtf8