Character encoding

A character encoding specifies how a sequence of Unicode characters is converted to bytes. Decoding is the reverse step: it transforms such bytes back into the corresponding Unicode characters.
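
The round trip can be sketched in Python as follows. This is only a minimal illustration; the sample string and the choice of UTF-8 and Latin-1 are assumptions made for the example.

```python
text = "Zürich"

# Encoding: Unicode characters -> bytes.
encoded = text.encode("utf-8")
print(encoded)                      # b'Z\xc3\xbcrich'

# Decoding: bytes -> Unicode characters, using the same encoding.
decoded = encoded.decode("utf-8")
print(decoded == text)              # True

# Decoding with a different encoding does not fail here,
# but it produces the wrong characters.
wrong = encoded.decode("latin-1")
print(wrong)                        # ZÃ¼rich
```

The last line shows why the encoding of a byte sequence has to be known (or guessed) before the bytes can be interpreted as text.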

In Windows, a character encoding is identified by a Windows code page.
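
As a rough illustration, Python ships codecs named after code page numbers (for example cp1252 and cp437). The sketch below assumes those two code pages and a single sample character; it shows that the code page decides which byte stands for which character.

```python
char = "é"

# The same character maps to different bytes under different code pages.
as_cp1252 = char.encode("cp1252")
as_cp437 = char.encode("cp437")
print(as_cp1252)                    # b'\xe9'
print(as_cp437)                     # b'\x82'

# Conversely, the same byte decodes to different characters.
raw = b"\xe9"
print(raw.decode("cp1252"))         # é
print(raw.decode("cp437"))          # Θ
```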

Unicode and ISO 10646 define a few encodings for the UCS (Universal Character Set):

UTF-8
UTF-16
UTF-32
UCS-2
UCS-4
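
To get a feeling for how the UTF encodings differ, the following sketch counts the bytes each one needs for a few sample characters. The helper name encoded_sizes and the sample characters are hypothetical, and the little-endian codec variants are used only so that Python does not prepend a byte order mark.

```python
def encoded_sizes(ch):
    """Return the number of bytes needed for ch in each UTF encoding."""
    return {enc: len(ch.encode(enc))
            for enc in ("utf-8", "utf-16-le", "utf-32-le")}

# One ASCII letter, one character from the Basic Multilingual Plane,
# and one character beyond U+FFFF.
for ch in ("A", "€", "\U0001F600"):
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

UTF-8 and UTF-16 are variable-width, while UTF-32 always uses four bytes. For the character beyond U+FFFF, UTF-16 needs a surrogate pair (four bytes), which is what distinguishes it from the fixed-width UCS-2.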

Determining the character encoding of a file or a byte stream

uchardet (by freedesktop.org) takes a sequence of bytes in an unknown character encoding and attempts to determine the encoding of the text. The returned encoding names are iconv-compatible.

In a Unix shell, the character encoding of a file might be determined with file or file -i.
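
If the guess of file is needed inside a script, the command can be called from Python. The sketch below is an assumption-laden example: the function name guess_encoding_with_file is made up, it uses the --mime-encoding option (which prints only the charset part of what file -i reports), and it returns None when no file executable is found.

```python
import os
import shutil
import subprocess
import tempfile

def guess_encoding_with_file(path):
    """Return the charset reported by file(1), or None if it cannot be run."""
    if shutil.which("file") is None:
        return None
    result = subprocess.run(
        ["file", "--brief", "--mime-encoding", path],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()

# Try it on a small temporary file containing non-ASCII UTF-8 text.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as f:
    f.write("Zürich, Genève\n".encode("utf-8"))
    sample_path = f.name

guessed = guess_encoding_with_file(sample_path)
print(guessed)
os.unlink(sample_path)
```

The result is a heuristic guess based on the file's content, not a guarantee.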

With Python, the encoding of a byte stream can be guessed with bs4.UnicodeDammit (part of Beautiful Soup).
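
A minimal sketch of how UnicodeDammit might be used is shown below. The wrapper detect_encoding and its candidates parameter are hypothetical names introduced for this example, and the import is guarded because bs4 is a third-party package that may not be installed.

```python
try:
    from bs4 import UnicodeDammit    # third-party: pip install beautifulsoup4
except ImportError:                  # keep the sketch runnable without bs4
    UnicodeDammit = None

def detect_encoding(data, candidates=("utf-8", "windows-1252")):
    """Return (text, encoding) as guessed by UnicodeDammit, or None without bs4."""
    if UnicodeDammit is None:
        return None
    # The candidate encodings are tried before any other guess.
    dammit = UnicodeDammit(data, list(candidates))
    return dammit.unicode_markup, dammit.original_encoding

result = detect_encoding("Sacré bleu!".encode("utf-8"))
print(result)
```

original_encoding holds the name of the encoding that UnicodeDammit settled on, and unicode_markup holds the decoded text.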
