Handling international characters in PHP



If you need to handle international characters in PHP, the "PHP: The Right Way" guide recommends doing something like this:

[Image: international characters in PHP]

The mb_internal_encoding('UTF-8') call sets PHP's internal character encoding to UTF-8, which the mb_* functions then use by default. It is typically placed at the top of the PHP script.

The mb_http_output('UTF-8') call sets the HTTP output character encoding to UTF-8.

More important still is to have <meta charset="utf-8"> in your HTML (as shown in the HTML above and explained in a separate tutorial).
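As a rough sketch of how these three pieces fit together, the script below sets both encodings and then emits a small UTF-8 page. The sample phrase and the page markup are made up for illustration, and the sketch assumes the mbstring extension is enabled and the file itself is saved as UTF-8.

```php
<?php
// Assumption: the mbstring extension is enabled and this file is saved as UTF-8.
mb_internal_encoding('UTF-8'); // default encoding for the mb_* functions
mb_http_output('UTF-8');       // encoding for HTTP output

// Hypothetical sample phrase, chosen for illustration.
$phrase = '¿Dónde está el baño?';

// Escape for HTML; accented letters pass through unchanged.
$safe = htmlspecialchars($phrase, ENT_QUOTES, 'UTF-8');

// Emit a minimal page that declares its charset in the markup.
echo <<<HTML
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <title>International characters in PHP</title>
</head>
<body>
    <p>{$safe}</p>
</body>
</html>
HTML;
```

The explicit 'UTF-8' argument to htmlspecialchars is redundant on recent PHP versions, but it keeps the intent visible.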

Note that if you are using PHP string functions such as substr, strlen, and strpos with international characters, you need to use the corresponding mb_* version of the functions: mb_substr, mb_strlen, and mb_strpos.
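The same difference shows up with strpos, which reports a byte offset rather than a character offset. Here is a minimal sketch using a hypothetical phrase, assuming the file is saved as UTF-8 and mbstring is enabled:

```php
<?php
// Assumption: the mbstring extension is enabled and this file is saved as UTF-8.
mb_internal_encoding('UTF-8');

// Hypothetical sample phrase, chosen for illustration.
$phrase = '¿Dónde está el baño?';

// strpos counts bytes: the three multibyte characters before "baño"
// (¿, ó, á) each take two bytes in UTF-8, so the offset is inflated by 3.
$byteOffset = strpos($phrase, 'baño');

// mb_strpos counts characters.
$charOffset = mb_strpos($phrase, 'baño');

echo "strpos:    {$byteOffset}\n"; // 18
echo "mb_strpos: {$charOffset}\n"; // 15
```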

For example, the code above contains a Spanish phrase from which we want to extract the first 14 characters. When we use mb_substr, we get the expected result:

[Image: correct result when using mb_substr]
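A minimal sketch of this extraction follows. The phrase is a stand-in chosen for illustration, not necessarily the one used in the original example, and the file is assumed to be saved as UTF-8.

```php
<?php
// Assumption: the mbstring extension is enabled and this file is saved as UTF-8.
mb_internal_encoding('UTF-8');

// Hypothetical sample phrase, chosen for illustration.
$phrase = '¿Dónde está el baño?';

// mb_substr counts characters, so this returns exactly 14 of them.
$first14 = mb_substr($phrase, 0, 14);

echo $first14, "\n"; // ¿Dónde está el
```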

But if we run it with substr, we get an incorrect result:

[Image: incorrect result when using substr]
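To see why substr goes wrong, it helps to remember that it counts bytes, not characters. The sketch below uses a hypothetical phrase (not necessarily the one in the original example) and assumes the file is saved as UTF-8.

```php
<?php
// Assumption: the mbstring extension is enabled and this file is saved as UTF-8.

// Hypothetical sample phrase, chosen for illustration.
$phrase = '¿Dónde está el baño?';

// substr takes 14 bytes. Since ¿, ó and á use two bytes each,
// those 14 bytes hold only 11 characters.
$byBytes = substr($phrase, 0, 14);
echo $byBytes, "\n"; // ¿Dónde está

// Taking 13 bytes is worse: the cut lands in the middle of "á",
// leaving a string that is no longer valid UTF-8.
$split = substr($phrase, 0, 13);
var_dump(mb_check_encoding($split, 'UTF-8')); // bool(false)
```

A truncated multibyte character like the one in $split is typically what shows up in the browser as a replacement glyph.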

String length is also counted incorrectly when you use strlen with international characters, because strlen returns the number of bytes rather than the number of characters. You need to use mb_strlen, as the example below demonstrates:

[Image: strlen versus mb_strlen]
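A minimal sketch of the comparison, again with a hypothetical phrase and assuming the file is saved as UTF-8:

```php
<?php
// Assumption: the mbstring extension is enabled and this file is saved as UTF-8.
mb_internal_encoding('UTF-8');

// Hypothetical sample phrase, chosen for illustration.
$phrase = '¿Dónde está el baño?';

// strlen counts bytes: 20 characters, four of which (¿, ó, á, ñ)
// take two bytes each in UTF-8.
$bytes = strlen($phrase);

// mb_strlen counts characters.
$chars = mb_strlen($phrase);

echo "strlen:    {$bytes}\n"; // 24
echo "mb_strlen: {$chars}\n"; // 20
```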