
UTF-8

UTF-8 is a Unicode encoding that uses 1 to 4 bytes to represent a Unicode character.

Essential characteristics of UTF-8

One of the main characteristics of UTF-8 is that it preserves the full range of the US-ASCII characters: every ASCII character is encoded as the same single byte. Thus, when UTF-8 was introduced, it was already compatible with software and data that processed ASCII characters exclusively. (See also Adoption of ISO 10646).
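The ASCII compatibility can be checked with a minimal Python sketch. The sample strings and variable names are illustrative assumptions, not part of the original notes.

```python
# Pure ASCII text produces identical bytes under both encodings.
text = "Hello, world!"
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
same_bytes = ascii_bytes == utf8_bytes

# Every byte of ASCII-only text is below 0x80 ...
all_below_0x80 = all(b < 0x80 for b in utf8_bytes)

# ... while every byte of a non-ASCII character is 0x80 or above,
# so it can never be mistaken for an ASCII character.
non_ascii_bytes = "\u00e9\u20ac".encode("utf-8")  # U+00E9 and U+20AC
all_high = all(b >= 0x80 for b in non_ascii_bytes)

print(same_bytes, all_below_0x80, all_high)
```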

UTF-8 uses a varying number of bytes to encode the individual characters of the Universal Character Set (UCS).
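As a small illustration of the varying length, the following Python sketch encodes one sample character from each length class. The chosen characters are arbitrary examples.

```python
# One sample character per encoded length (1, 2, 3 and 4 bytes).
samples = {
    "U+0041": "\u0041",       # LATIN CAPITAL LETTER A
    "U+00E9": "\u00e9",       # LATIN SMALL LETTER E WITH ACUTE
    "U+20AC": "\u20ac",       # EURO SIGN
    "U+1F600": "\U0001F600",  # GRINNING FACE
}

# Map each label to the number of bytes in its UTF-8 encoding.
lengths = {label: len(ch.encode("utf-8")) for label, ch in samples.items()}

for label, n in lengths.items():
    print(label, n)
```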

The byte values C0, C1, and F5 through FF never occur in UTF-8 encoded text.
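Which byte values go unused can be checked by brute force. The sketch below encodes every Unicode code point (skipping the surrogate range, which cannot be encoded) and collects the byte values that never show up. It is only a verification aid and takes about a second to run.

```python
# Collect every byte value that appears in the UTF-8 encoding
# of any encodable code point.
used = set()
for cp in range(0x110000):
    if 0xD800 <= cp <= 0xDFFF:
        continue  # surrogates have no UTF-8 encoding
    used.update(chr(cp).encode("utf-8"))

# Byte values that never appear in valid UTF-8.
unused = sorted(set(range(256)) - used)
print(" ".join(f"{b:02X}" for b in unused))
```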

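The boundaries between characters are easily found from any byte in a UTF-8 stream, because continuation bytes always have the form 10xxxxxx and the first byte of a character never does.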
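A minimal sketch of finding a character boundary: starting at an arbitrary byte index, step backwards while the byte looks like a continuation byte (top two bits `10`). The function name `char_start` is hypothetical, and the sketch assumes the input is valid UTF-8.

```python
def char_start(data: bytes, index: int) -> int:
    """Return the index of the first byte of the character containing data[index]."""
    # Continuation bytes match the bit pattern 10xxxxxx.
    while index > 0 and (data[index] & 0xC0) == 0x80:
        index -= 1
    return index

data = "a\u20acb".encode("utf-8")  # bytes: 61 E2 82 AC 62

# For every byte position, the start of the character it belongs to.
starts = [char_start(data, i) for i in range(len(data))]
print(starts)
```

At most three backward steps are needed, since a character is never longer than four bytes.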

The Boyer-Moore fast search algorithm can be applied directly to the encoded bytes.
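To illustrate byte-level searching, here is a sketch of the Boyer-Moore-Horspool variant (a simplification of Boyer-Moore that uses only the bad-character shift, not the full algorithm) running on UTF-8 bytes. The function name `horspool_find` is hypothetical. Because lead bytes and continuation bytes use disjoint value ranges, a validly encoded needle can only match at a character boundary.

```python
def horspool_find(haystack: bytes, needle: bytes) -> int:
    """Return the byte offset of the first match of needle, or -1."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    # Bad-character table: shift distance keyed by the haystack byte
    # aligned with the last position of the needle.
    shift = {b: m - 1 - i for i, b in enumerate(needle[:-1])}
    pos = 0
    while pos + m <= n:
        if haystack[pos:pos + m] == needle:
            return pos
        # Bytes not in the table allow skipping the whole needle length.
        pos += shift.get(haystack[pos + m - 1], m)
    return -1

haystack = "na\u00efve caf\u00e9, cafe".encode("utf-8")
needle = "caf\u00e9".encode("utf-8")

offset = horspool_find(haystack, needle)
print(offset)
```

The result is a byte offset, not a character index. In practice Python's built-in `bytes.find` would be the usual choice; the hand-written loop is only meant to show that no decoding step is needed.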

Byte format types

from	to	byte sequence
`0x0000`	`0x007f`	`0xxxxxxx`
`0x0080`	`0x07ff`	`110xxxxx 10xxxxxx`
`0x0800`	`0xffff`	`1110xxxx 10xxxxxx 10xxxxxx`
`0x010000`	`0x10ffff`	`11110xxx 10xxxxxx 10xxxxxx 10xxxxxx`
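The table translates fairly directly into code. The following is a sketch of a hand-written encoder, checked against Python's built-in codec. The function name `utf8_encode` is hypothetical, and rejecting surrogates (U+D800 to U+DFFF) is an addition not shown in the table.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode a single code point following the byte-format table."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp <= 0x7F:      # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:    # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

# Compare against the built-in codec at the edges of each range.
edges = [0x00, 0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF]
matches = all(utf8_encode(cp) == chr(cp).encode("utf-8") for cp in edges)
print(matches)
```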

BOM (Byte Order Mark)

The BOM for UTF-8 is the byte sequence ef bb bf. However, the Unicode standard neither requires nor recommends it.

Even though the BOM is not recommended for UTF-8, PowerShell checks scripts for such a BOM, and especially in conjunction with COM it is often very beneficial to use the BOM.
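For handling the BOM programmatically, a short Python sketch using the standard `codecs.BOM_UTF8` constant and the `utf-8-sig` codec is shown here. The script content is a made-up example.

```python
import codecs

# A made-up script body, stored with a leading UTF-8 BOM.
raw = codecs.BOM_UTF8 + "Write-Host 'hi'".encode("utf-8")

# Detect the BOM by looking at the first three bytes.
has_bom = raw.startswith(codecs.BOM_UTF8)

# 'utf-8-sig' drops a leading BOM when decoding (if one is present) ...
text = raw.decode("utf-8-sig")

# ... whereas plain 'utf-8' keeps it as the character U+FEFF.
text_with_marker = raw.decode("utf-8")

# When encoding, 'utf-8-sig' prepends the BOM.
encoded = "Write-Host 'hi'".encode("utf-8-sig")

print(has_bom, text, encoded == raw)
```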

Misc

UTF-8 was developed by Ken Thompson and Rob Pike.
