====== Handling UTF-8 with PHP ======
This page is intended as a reference for functionality PHP provides which can either help with handling UTF-8 or should be regarded as a risk when used in conjunction with UTF-8 encoded strings. Further information can be found on the [[php:i18n]] and [[php:i18n:charsets]] pages.
Note that this page applies to PHP versions before 6; PHP 6 is expected to have native support for Unicode / UTF-8.
===== UTF-8 "Dangerous" PHP Functionality =====
The following functions / functionality in PHP may pose issues when used in conjunction with UTF-8, depending on what you're doing.
**Important Note** the words //"it depends"// are critical to bear in mind here - do not blindly replace all use of these functions without understanding //why// you're doing so. Remember ASCII (aka US-ASCII or ASCII7) is a subset of UTF-8 and that UTF-8 has been designed so that no character sequence in a //well formed// UTF-8 string can be mistaken as a sub-sequence of another, longer character. These two facts will often mean you can survive with PHP's own string functions depending on the //exact nature of what you are doing with them// - see the strpos discussion below. Blindly replacing all uses "just in case" is likely to lead to apps which run like lame dogs.
**Note on Locales** the discussion below could be read to suggest "locales are evil", which would be to misunderstand the problem.
If you're writing code for yourself, to be used on a server you control, locales could be made to work if your server has locales installed which support UTF-8. That would mean functions like [[phpfn>strtolower]] behave correctly.
But this is no use if you're writing applications which will be installed by third parties (like [[http://sourceforge.net/softwaremap/trove_list.php?form_cat=183|these]] for example) because it's //system specific// (it's not even just OS specific). If the default system locale does not support UTF-8, in theory your application could change the locale "on the fly" using [[phpfn>setlocale]] but in practice that requires two things: that there is a locale available on the system which supports UTF-8 (not guaranteed) and that the correct locale identifier string can be found (there are definitely differences between Windows and *Nix locale identifiers, and even amongst the Unixes there appear to be variations e.g. [[http://www.php.net/manual/en/function.setlocale.php#40396|FreeBSD]]). What's more, you can't rely on users to be able to change the locale correctly to suit your application's needs - on a shared host they probably won't be able to change the locale for the user that Apache is running as. //Bottom line// - locales are not the way to go for applications intended to be "write once, run anywhere".
//Update//: You can downgrade your character type locale to the POSIX (C) locale via setlocale(), like
setlocale ( LC_CTYPE, 'C' );
This //should// work on all platforms. It would mean functions like ''strtolower()'' are only considering characters in the ASCII range - that opens the way for some significant performance optimizations.
**Note on well formedness** the term "well formed UTF-8" appears frequently here. See [[php:i18n:charsets#checking_utf-8_for_well_formedness]] for details of how to check for well formedness. The point there is you should check UTF-8 strings for well formedness when using functions like [[phpfn>explode]] (see below) which will work with UTF-8 so long as it is well formed.
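As a minimal sketch of such a check (not necessarily the method described on the linked page): with the /u modifier, PCRE validates the subject string before matching, so an empty pattern can serve as a well formedness test. The helper name ''utf8_is_well_formed'' is made up for this example.

```php
<?php
// Hypothetical helper: returns true when $str is well formed UTF-8.
// With /u, PCRE validates the subject first; on badly formed input
// preg_match() does not return 1.
function utf8_is_well_formed($str)
{
    return preg_match('//u', $str) === 1;
}

var_dump(utf8_is_well_formed("I\xC3\xB1t")); // valid: I, ñ, t
var_dump(utf8_is_well_formed("I\xC3"));      // truncated two byte sequence
```

Running a check like this once, at the point where a string enters the application, is usually cheaper than guarding every later string operation.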
**Note** that you can find "UTF-8 aware" implementations of many of these functions under CVS [[http://sourceforge.net/projects/phputf8/|here]].
==== The PCRE Extension ====
Official docs at [[http://www.php.net/pcre]].
=== /i (PCRE_CASELESS) pattern modifier ===
* Official documentation: [[http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php|PCRE pattern modifiers]]
* Risk: high
* Impact: could corrupt a UTF-8 string
Unless the /u modifier is used as well, this modifier picks up its understanding of upper and lowercase from the server's locale. Depending on what you're doing, this may result in false matches which in turn lead to corrupt UTF-8 strings.
=== /u (PCRE_UTF8) pattern modifier ===
* Official documentation: [[http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php|PCRE pattern modifiers]]
* Risk: low
* Impact: matches 5 and 6 byte sequences which are not Unicode
UTF-8 allows for 5 and 6 byte character sequences but these have no meaning in Unicode (i.e. there are no displayable characters for these sequences). This might lead to "junk" in a web page (browsers would display a ?). See [[http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php#54805|this PHP manual comment]] - you should filter for 5/6 byte sequences.
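One possible filter, sketched here under the assumption that simply dropping the offending sequences is acceptable. It deliberately runs in byte mode (no /u) and removes anything that looks like a 5 or 6 byte sequence. The helper name is hypothetical and this is not the code from the manual comment.

```php
<?php
// Hypothetical helper: removes 5 byte sequences (lead bytes 0xF8-0xFB)
// and 6 byte sequences (lead bytes 0xFC-0xFD), which lie outside Unicode.
// No /u modifier: the pattern is matched byte by byte on purpose.
function utf8_strip_5_6_byte($str)
{
    return preg_replace(
        '/[\xF8-\xFB][\x80-\xBF]{4}|[\xFC-\xFD][\x80-\xBF]{5}/',
        '',
        $str
    );
}

echo utf8_strip_5_6_byte("a\xF8\x88\x80\x80\x80b"), "\n"; // ab
```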
=== \w \W \b \B meta characters ===
* Official documentation: [[http://www.php.net/manual/en/reference.pcre.pattern.syntax.php|PCRE pattern syntax]]
* Risk: high
* Impact: could result in corrupt UTF-8
The \w means "word character", the meaning of which is loaded from the server's current locale. From the manual;
> A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word". The definition of letters and digits is controlled by PCRE's character tables, and may vary if locale-specific matching is taking place []. For example, in the "fr" (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w.
Depending on what you are doing and the strings involved, this may mean \w makes a match across two UTF-8 sequences, leading to corrupted (badly formed) UTF-8 strings.
Similar applies to \W (non-word meta char), \b (word boundary) and \B (non-word boundary) all of which pick up their meaning from the server's locale settings.
Using the /u pattern modifier prevents words from being mangled but instead PCRE skips strings of characters with code values greater than 127. Therefore, \w will not match a multibyte (non-lower ascii) word at all (but also won't return portions of it). From the ''pcrepattern'' man page;
> In UTF-8 mode, characters with values greater than 128 never match ''\d'', ''\s'', or ''\w'', and always match ''\D'', ''\S'', and ''\W''. This is true even when Unicode character property support is available.
It's worth researching the PCRE [[http://www.php.net/manual/en/reference.pcre.pattern.syntax.php#regexp.reference.unicode|Unicode character properties]] (available since PHP 4.4.0 / 5.1.0) - for example ''\p{L}'' should match anything in Unicode which can be regarded as a letter.
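A short sketch of ''\p{L}'' in use, assuming the PCRE library was built with Unicode property support. The helper name ''utf8_words'' is made up for this example.

```php
<?php
// Hypothetical helper: returns every run of Unicode letters in $str.
// Requires the /u modifier and PCRE Unicode property support.
function utf8_words($str)
{
    preg_match_all('/\p{L}+/u', $str, $matches);
    return $matches[0];
}

print_r(utf8_words('Iñtërnâtiônàlizætiøn für alle, 2006'));
// three matches: Iñtërnâtiônàlizætiøn, für, alle
```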
==== The String Extension ====
Official docs at [[http://www.php.net/strings]].
=== htmlentities ===
* Official documentation: [[phpfn>htmlentities]]
* Risk: high
* Impact: could corrupt a UTF-8 string
Rumour - although this function (claims) to have UTF-8 support, bug reports claim it's broken at least until PHP 5.
Using it on a UTF-8 string with the wrong charset would, very likely, result in corruption / junk output.
Otherwise, when using UTF-8, you don't need entities - see [[php:i18n:charsets#common_problem_areas_with_utf-8|Common Problem Areas with UTF-8]].
=== html_entity_decode ===
* Official documentation: [[phpfn>html_entity_decode]]
* Risk: high
* Impact: could corrupt a UTF-8 string
Highly suspect - see comments on htmlentities above - this does the reverse.
=== htmlspecialchars ===
* Official documentation: [[phpfn>htmlspecialchars]]
* Risk: low
* Impact: in theory (not confirmed) should not damage a UTF-8 string
htmlspecialchars //should// (not confirmed) do the right thing by default (without the third argument specifying UTF-8) if it is given a //well formed// UTF-8 string because the characters it replaces are all within the ASCII 7 range.
That said you can also explicitly tell it to expect UTF-8 like;
$html = htmlspecialchars($utf8_string, ENT_COMPAT, 'UTF-8');
=== sprintf ===
* Official documentation: [[phpfn>sprintf]]
* Risk: medium
* Impact: The problematic features are probably rare.
* FIXME TODO
Yet to investigate in detail. The ''x'' and ''X'' type specifiers are probably an issue. See also [[http://www.php.net/manual/en/function.sprintf.php#55837|this manual comment]] (not verified).
The string padding functionality assumes single-byte input, so it won't pad correctly if there are multi-byte utf8 characters in the arguments.
FIXME TODO [[phpfn>printf]], [[phpfn>sscanf]], [[phpfn>fscanf]] and [[phpfn>vsprintf]] - same arguments probably apply
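The padding problem is easy to reproduce. The sketch below compares ''sprintf'' padding with a hypothetical helper, ''utf8_pad_right'', which counts characters rather than bytes; it assumes well formed UTF-8 input.

```php
<?php
// Hypothetical helper: right-pads $str with spaces to $width characters.
// Assumes $str is well formed UTF-8.
function utf8_pad_right($str, $width)
{
    // In UTF-8 mode "." matches one whole character per match.
    $chars = preg_match_all('/./su', $str, $unused);
    return $str . str_repeat(' ', max(0, $width - $chars));
}

// "ñu" is 2 characters but 3 bytes.
echo '[' . sprintf('%-4s', "\xC3\xB1u") . "]\n";   // one space: padded to 4 bytes
echo '[' . utf8_pad_right("\xC3\xB1u", 4) . "]\n"; // two spaces: padded to 4 characters
```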
=== str_ireplace ===
* Official documentation: [[phpfn>str_ireplace]]
* Risk: high
* Impact: could corrupt a UTF-8 string
str_ireplace() relies on the server's locale setting to convert all characters to lower case. If the locale setting is something other than ASCII or UTF-8, it may mistakenly match UTF-8 sub-sequences with characters in the locale, and corrupt the string while replacing. Certainly it cannot be relied upon to understand what "uppercase" and "lowercase" mean in UTF-8, unless the locale explicitly supports UTF-8.
=== str_split ===
* Official documentation: [[phpfn>str_split]]
* Risk: high
* Impact: could corrupt a UTF-8 string
str_split() breaks up a string given a length argument. The length is a length in //bytes// not //characters//. That means it could break a multibyte UTF-8 sequence into invalid parts.
That said, if you know for sure that a given UTF-8 string contains, say, only 2 byte sequences, you might reasonably want to use [[phpfn>str_split]] to break it up into single character sequences.
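For strings where the sequence lengths are not known in advance, one option is to split on character boundaries with PCRE in UTF-8 mode. This is only a sketch; ''utf8_str_split'' is a hypothetical name and the input is assumed to be well formed.

```php
<?php
// Hypothetical helper: like str_split(), but $len counts characters.
// Assumes $str is well formed UTF-8.
function utf8_str_split($str, $len = 1)
{
    // An empty pattern with /u splits between every character.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    $out = array();
    foreach (array_chunk($chars, $len) as $chunk) {
        $out[] = implode('', $chunk);
    }
    return $out;
}

print_r(utf8_str_split('Iñtër', 2)); // Iñ, të, r
```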
=== strcasecmp ===
* Official documentation: [[phpfn>strcasecmp]]
* Risk: medium
* Impact: results cannot be trusted
strcasecmp() internally converts the two strings it is comparing to lowercase, based on the server locale settings. As such, it cannot be relied upon to be able to convert appropriate //multibyte// characters in UTF-8 to lowercase and, depending on the actual locale, may have internally corrupted the UTF-8 strings it is comparing, having falsely matched byte sequences. It won't actually damage the UTF-8 string but the result of the comparison cannot be trusted.
That said, if two given UTF-8 strings are known to contain only characters in the ASCII 7 range, strcasecmp() could be used to compare them successfully, irrespective of the locale setting.
=== strcspn ===
* Official documentation: [[phpfn>strcspn]]
* Risk: medium
* Impact: results cannot be trusted
strcspn() will return a length in bytes not characters, which may not always be what you require.
Also if the mask you provide it contains multibyte characters, these will be split, internally, into their component bytes, perhaps meaning results which are not semantically true - ''10xxxxxx'' bytes in a sequence could be matched ((see table here [[wp>UTF-8]])).
=== stristr ===
* Official documentation: [[phpfn>stristr]]
* Risk: high
* Impact: could return a corrupt UTF-8 string
stristr internally converts characters to lower case using the server's locale, so when it determines the substring to return, the result may be a corrupted UTF-8 string and the matching will be unpredictable (locale dependent).
=== strlen ===
* Official documentation: [[phpfn>strlen]]
* Risk: low
* Impact: results in bytes not characters
strlen simply counts the number of bytes in a string, not the number of characters. This means for UTF-8 the integer it returns may be greater than the number of characters in the string.
Note that this may not always be a problem - see the strpos discussion below for an example where working in bytes not characters produces expected results.
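When a character count really is needed, one way to get it is to subtract the continuation bytes (''10xxxxxx'') from the byte length, since every character has exactly one byte that is not a continuation byte. A minimal sketch, assuming well formed input; ''utf8_char_count'' is a made-up name.

```php
<?php
// Hypothetical helper: counts characters in a well formed UTF-8 string
// by ignoring continuation bytes (0x80-0xBF).
function utf8_char_count($str)
{
    return strlen($str) - preg_match_all('/[\x80-\xBF]/', $str, $unused);
}

$s = 'Iñtërnâtiônàlizætiøn';
echo strlen($s), "\n";          // 27 (bytes)
echo utf8_char_count($s), "\n"; // 20 (characters)
```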
=== strpos ===
* Official documentation: [[phpfn>strpos]]
* Risk: low
* Impact: results in bytes not characters
strpos will behave correctly with //well formed// UTF-8 but the result it returns will be in bytes not characters, which may or may not be what you desire, depending on what you want to do with that result.
You //would// be able to use the result in conjunction with [[phpfn>substr]] for example (remember each UTF-8 sequence is unique) but if you want to validate a string in some manner, based on character length not byte length, strpos may not be semantically correct.
Consider the following example;
$haystack = 'Iñtërnâtiônàlizætiøn';
$needle = 'ô';
$pos = strpos($haystack, $needle);
print "Position in bytes is $pos\n";
$substr = substr($haystack, 0, $pos);
print "Substr: $substr\n";
This will display;
Position in bytes is 12
Substr: Iñtërnâti
The point being it "works" despite the fact the string is UTF-8 - there's no need to replace the use of [[phpfn>substr]] or [[phpfn>strpos]] in this case.
By contrast, pulling out an arbitrary substring which happens to cut a 2 byte UTF-8 sequence breaks the string;
// byte offset 2 falls in the middle of the two byte sequence for ñ
$substr = substr($haystack, 0, 2);
print "Substr: $substr\n";
''$substr'' now contains badly formed UTF-8 and your browser should display something weird as a result (probably a ?).
=== strrev ===
* Official documentation: [[phpfn>strrev]]
* Risk: high
* Impact: could return a corrupt UTF-8 string
strrev first has to split a string into an array of bytes then reverse their order - this would corrupt multibyte characters in a UTF-8 string.
Note you could still use strrev() if you know that a given UTF-8 string only contains characters in the ASCII 7 range.
=== strrpos ===
* Official documentation: [[phpfn>strrpos]]
* Risk: low
* Impact: results in bytes not characters
strrpos will return an answer in bytes not characters. See strpos above for more info.
=== strspn ===
* Official documentation: [[phpfn>strspn]]
* Risk: low
* Impact: results in bytes not characters
strspn will return an answer in bytes not characters. See strpos above for more info - similar arguments apply.
=== strtolower ===
* Official documentation: [[phpfn>strtolower]]
* Risk: high
* Impact: could return a corrupt UTF-8 string
strtolower uses the server's locale setting to understand the meaning of "uppercase" and "lowercase". Depending on the locale character set, this could mean it falsely matches parts of a UTF-8 string with sequences in the character set it thinks it's using - the result would be "corrupt" UTF-8.
Otherwise strtolower would fail to be able to understand the meaning of "uppercase" and "lowercase" in UTF-8 if the locale does not support UTF-8 (your locale might be US-ASCII, in which case strtolower won't corrupt the UTF-8 but also won't convert uppercase multibyte UTF-8 characters to their lowercase equivalent).
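If lowercasing only the ASCII letters is good enough (for example when normalising identifiers), a locale independent alternative is a fixed byte translation table. The sketch below uses ''strtr'' for that; the helper name is hypothetical, and non-ASCII letters are deliberately left untouched.

```php
<?php
// Hypothetical helper: lowercases A-Z only, regardless of locale.
// Bytes >= 0x80 are never touched, so multibyte sequences stay intact.
function utf8_ascii_strtolower($str)
{
    return strtr(
        $str,
        'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
        'abcdefghijklmnopqrstuvwxyz'
    );
}

echo utf8_ascii_strtolower('ÑANDÚ Test'), "\n"; // ÑandÚ test
```

The multibyte uppercase letters are not converted, which is the trade-off described above for an ASCII-only locale, but the string can never be corrupted this way.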
=== strtoupper ===
* Official documentation: [[phpfn>strtoupper]]
* Risk: high
* Impact: could return a corrupt UTF-8 string
See notes on strtolower above.
=== substr ===
* Official documentation: [[phpfn>substr]]
* Risk: medium to high
* Impact: accepts arguments in bytes positions not characters - could corrupt a UTF-8 string
If used in an arbitrary manner to chop off part of a string, it could potentially split UTF-8 sequences resulting in corruption. At the same time if used in conjunction with functions like strpos (see notes above), would be able to extract a portion of a UTF-8 string without corrupting it, although you'll be passing it arguments in terms of byte positions not character positions.
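Where a string has to be cut at an arbitrary byte limit (a database column width, say), one approach is to step back from the cut point until it no longer falls inside a sequence. This is a sketch under the assumption that the input is well formed UTF-8 and the limit is not negative; ''utf8_truncate_bytes'' is a made-up name.

```php
<?php
// Hypothetical helper: returns at most $maxBytes bytes of $str without
// splitting a multibyte sequence. Assumes well formed UTF-8.
function utf8_truncate_bytes($str, $maxBytes)
{
    if (strlen($str) <= $maxBytes) {
        return $str;
    }
    $cut = $maxBytes;
    // If the first excluded byte is a continuation byte (10xxxxxx),
    // the cut is inside a sequence: move it back to the lead byte.
    while ($cut > 0 && (ord($str[$cut]) & 0xC0) === 0x80) {
        $cut--;
    }
    return substr($str, 0, $cut);
}

echo utf8_truncate_bytes('Iñtërnâtiônàlizætiøn', 2), "\n"; // I
echo utf8_truncate_bytes('Iñtërnâtiônàlizætiøn', 3), "\n"; // Iñ
```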
=== substr_replace ===
* Official documentation: [[phpfn>substr_replace]]
* Risk: medium to high
* Impact: accepts arguments in bytes positions not characters - could corrupt a UTF-8 string
If arbitrary start and length arguments are supplied, could corrupt a UTF-8 string. Otherwise could be used in some instances when working with relative UTF-8 character positions - see notes on substr above.
=== trim, ltrim, rtrim ===
* Official documentation: [[phpfn>trim]], [[phpfn>ltrim]], [[phpfn>rtrim]]
* Risk: low
* Impact: could corrupt a UTF-8 string if second (optional) charlist arg is used
Used in the "default" manner (without the second charlist argument) these functions are safe to use on a UTF-8 string, because the whitespace characters they are searching for are all in the ASCII 7 range.
If the 2nd argument is used, to extend the list of characters these functions attempt to trim, //and// multibyte (non-ASCII7) characters are in the 2nd argument, then there is a risk of corrupting the returned subject string. This is because (l/r)trim will split the charlist into its component bytes, and any of those bytes could be trimmed from other multibyte sequences in the subject string: bytes of the form ''10xxxxxx''((referring to the table here [[wp>UTF-8]])) from the right hand end, and lead bytes (which many characters share) from the left hand end. All three of [[phpfn>trim]], [[phpfn>ltrim]] and [[phpfn>rtrim]] are therefore affected.
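The corruption can be shown with two characters that share a byte: "é" is ''C3 A9'', "©" is ''C2 A9'' and "Ã" is ''C3 83''. The sketch below first demonstrates the damage, then offers a hypothetical character-aware replacement for the right hand case, built on PCRE in UTF-8 mode.

```php
<?php
// Demonstration: the charlist "é" (C3 A9) is treated as two separate bytes.
echo bin2hex(rtrim("x\xC2\xA9", "\xC3\xA9")), "\n"; // 78c2 - "©" lost its last byte
echo bin2hex(ltrim("\xC3\x83x", "\xC3\xA9")), "\n"; // 8378 - "Ã" lost its lead byte

// Hypothetical helper: trims whole characters listed in $chars from the
// right hand end. Assumes both arguments are well formed UTF-8.
function utf8_rtrim_chars($str, $chars)
{
    $class = preg_quote($chars, '/');
    // /u makes the class work on characters; D anchors $ at the very end.
    return preg_replace('/[' . $class . ']+$/uD', '', $str);
}

echo utf8_rtrim_chars("caf\xC3\xA9\xC3\xA9", "\xC3\xA9"), "\n"; // caf
```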
=== ucfirst ===
* Official documentation: [[phpfn>ucfirst]]
* Risk: high
* Impact: could return a corrupt UTF-8 string
See notes to strtolower above.
=== ucwords ===
* Official documentation: [[phpfn>ucwords]]
* Risk: high
* Impact: could return a corrupt UTF-8 string
See notes to strtolower above.
=== wordwrap ===
* Official documentation: [[phpfn>wordwrap]]
* Risk: medium to high
* Impact: could return a corrupt UTF-8 string
If the fourth "cut" argument is used, could split a UTF-8 sequence, resulting in corruption.
**To be confirmed** - what is the meaning of a "word" to this function. Is it the same as [[phpfn>ucwords]];
> The definition of a word is any string of characters that is immediately after a whitespace (These are: space, form-feed, newline, carriage return, horizontal tab, and vertical tab).
If that is correct, wordwrap will only be dangerous if the cut argument is used.
==== Array Extension ====
Official docs at [[http://www.php.net/array]].
FIXME - needs to become an explicit list of functions. Just a description right now.
The main issue related to arrays is sorting and (thankfully) this will be non-critical to most applications.
Functions like [[phpfn>sort]], when sorting alphanumerically, will lack the knowledge to know how to sort //multi byte// UTF-8 characters in a manner which is semantically correct. [[phpfn>sort]] will still sort ASCII 7 characters correctly (semantically correct) but will only be able to sort multibyte UTF-8 characters based on their byte-by-byte values.
Because of UTF-8's design, this will mean, after a sort, ASCII 7 characters will be at one end of a range while 4 byte sequences are at the other, with 2 and 3 byte sequences in between.
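A small illustration of that ordering, using made-up sample words: the accented word sorts after every ASCII word because its first byte (''0xC3'') is greater than any ASCII byte.

```php
<?php
// Byte-by-byte string sort: "éclair" starts with byte 0xC3, so it lands
// after "zebra" rather than next to "eclair".
$words = array('zebra', 'éclair', 'apple', 'eclair');
sort($words, SORT_STRING);
print_r($words); // apple, eclair, zebra, éclair
```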
==== Mail Functions ====
FIXME - [[phpfn>mail]] and UTF-8 - content type headers? base64 encoding?
As mentioned at [[wp>UTF-8]] (compared to UTF-7);
> UTF-8 requires the transmission system to be eight-bit clean. In the case of e-mail this means it has to be further encoded using quoted printable or base64.
Some links;
* [[phpfn>mb_send_mail]]
* [[http://www.php.net/manual/en/function.mb-send-mail.php#29843|Example of base64 encoding headers]]
* [[http://www.quakemachine.com/blog/myplugins/utf-mail/|utf-mail]] - download utf-mail.zip - plugin for Wordpress but shows one way to do it (without using mb_string).
* [[http://www.advogato.org/article/812.html|Sending Unicode e-mail through a script (as in PHP)]] - how to do it with mb_string
There seem to be two approaches (at least specific to the body of the email - ignoring subject / headers) - if you want to send plain text you have to encode that body with something like [[phpfn>base64_encode]]. Alternatively you could "attach" an HTML body which then only needs to have the correct charset declaration.
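A rough sketch of the base64 route, covering the subject as well as the body. The helper name and the returned array layout are invented for this example, and it ignores the length limit that applies to encoded header words, so long subjects would need splitting.

```php
<?php
// Hypothetical helper: prepares the pieces of a plain text UTF-8 mail.
// Sketch only: does not split long subjects into several encoded words.
function utf8_mail_parts($subject, $body)
{
    return array(
        // Encoded-word form: =?charset?B?base64 data?=
        'subject' => '=?UTF-8?B?' . base64_encode($subject) . '?=',
        'headers' => "MIME-Version: 1.0\r\n"
                   . "Content-Type: text/plain; charset=UTF-8\r\n"
                   . "Content-Transfer-Encoding: base64",
        // chunk_split() wraps the base64 data into 76 character lines.
        'body'    => chunk_split(base64_encode($body)),
    );
}

$parts = utf8_mail_parts('Iñtërnâtiônàl', 'Grüße aus Köln');
echo $parts['subject'], "\n";
// mail($to, $parts['subject'], $parts['body'], $parts['headers']);
```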
==== Variables Handling ====
See official docs at [[http://www.php.net/manual/en/ref.var.php]].
=== serialize / unserialize ===
* Official documentation: [[phpfn>serialize]], [[phpfn>unserialize]]
* Risk: low
* Impact: problem when using these for stuff like RPC / data exchange with external systems
Sometimes people use this functionality as a manner to talk to PHP from other languages. The serialized encoding embeds string lengths (in bytes!) into the encoded string. External languages / environments may have different understandings of string lengths.
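For example, the two byte character "ñ" is recorded with a length of 2, which a consumer that counts characters would read as 1:

```php
<?php
// The length prefix counts bytes: "ñ" is one character but two bytes.
echo serialize("\xC3\xB1"), "\n"; // s:2:"ñ";
```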
=== var_dump / debug_zval_dump ===
* Official documentation: [[phpfn>var_dump]], [[phpfn>debug_zval_dump]]
* Risk: low
* Impact: lengths reported on strings will be in bytes
Just a potential debugging "gotcha" - if the web page is encoded as UTF-8, you may only see 3 characters, for example, while these functions report, say, 5 as the string length.
==== XML Extension (SAX) ====
Official docs at [[http://www.php.net/xml]].
The SAX parser (officially)((PHP5 uses libxml2 which supports more encodings - [[http://minutillo.com/steve/weblog/2004/6/17/php-xml-and-character-encodings-a-tale-of-sadness-rage-and-data-loss|rumour has it]] (not confirmed) that creating the parser like ''xml_parser_create("");'' will get it to support more than just the three official character sets, auto-detecting from the charset declaration)) supports three encodings: ISO-8859-1, US-ASCII and UTF-8 - see [[http://www.php.net/manual/en/ref.xml.php#xml.encoding|here]]. It distinguishes between //source encoding// (the encoding of an XML document it is parsing) and //target encoding// - the encoding of strings passed to your SAX callback functions.
The source encoding is either passed explicitly to [[phpfn>xml_parser_create]] or (since PHP 5) determined automatically from the charset declaration in the XML document. If no source encoding is specified, PHP defaults to ISO-8859-1 (perhaps a design flaw - would have been smarter to default to UTF-8). If the source encoding contains byte sequences PHP doesn't understand, it will raise an error e.g. the XML_ERROR_UNKNOWN_ENCODING or XML_ERROR_INCORRECT_ENCODING error codes.
The target encoding can be controlled with the [[phpfn>xml_parser_set_option]] function. Any incoming characters outside the range of the target encoding are replaced with a question mark. That means if the source encoding is UTF-8 and the target encoding is US-ASCII, multibyte UTF-8 characters will be replaced with a question mark.
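A minimal sketch of setting both encodings explicitly, so that neither depends on version specific defaults. It assumes the xml extension is loaded and a PHP version with closures; ''sax_collect_text_utf8'' is a made-up helper name.

```php
<?php
// Hypothetical helper: parses $xml as UTF-8 and returns the concatenated
// character data, or false if the parser reports an error.
function sax_collect_text_utf8($xml)
{
    $parser = xml_parser_create('UTF-8'); // source encoding
    xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'UTF-8'); // target encoding
    $text = '';
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$text) {
            $text .= $data; // $data arrives in the target encoding
        }
    );
    $ok = xml_parse($parser, $xml, true);
    return $ok === 1 ? $text : false;
}

echo sax_collect_text_utf8('<r>Iñtërnâtiônàl</r>'), "\n";
```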
Note that the XML SAX extension //should// (not confirmed) spot badly formed UTF-8 in the source encoding. Also its definition of what is UTF-8 covers only sequences within the Unicode range (unlike the PCRE extension) - i.e. it doesn't regard 5 and 6 byte sequences as being UTF-8.
See [[http://minutillo.com/steve/weblog/2004/6/17/php-xml-and-character-encodings-a-tale-of-sadness-rage-and-data-loss|PHP, XML, and Character Encodings: a tale of sadness, rage, and (data-)loss]]. See also [[http://magpierss.sourceforge.net/|Magpie RSS 0.7+]] which implements a work around for detecting / converting other character sets (currently in the rss.parse.inc file).
==== XML DOM Extension ====
Both the PHP4 and PHP5 xml-dom extensions use UTF-8 as their internal encoding. This means that they mostly get it right, however there is one major GOTCHA, since they expect input strings to be UTF-8 encoded. If you use ISO-8859-1 as your internal encoding (which you most likely do), this means that each and every string that you input to the DOM API should be encoded with utf8_encode. It's important to realize that you have to do this regardless of which encoding the document is output in. Annoying to say the least, but at least it's consistent.
=== utf8_encode and utf8_decode ===
* Official documentation: [[phpfn>utf8_encode]], [[phpfn>utf8_decode]]
* Risk: medium
* Impact: will result in corrupt UTF-8 if used incorrectly - they are used to convert //only// between UTF-8 and ISO-8859-1 - use on any other charset (excepting ASCII-7) would result in junk / lost characters
These functions are designed to convert between ISO-8859-1 and UTF-8 (nothing more, nothing less). In particular older versions of IE / Win98 used CP1252 (a Windows encoding similar to but not the same as ISO-8859-1). See [[http://www.php.net/manual/en/function.utf8-encode.php#45226|this manual entry]].
Some links
* [[http://www.zend.com/codex.php?id=838&single=1|utf8encode]] utf-8 encodes HTML unicode entities (&#NNNN;).
* [[http://www.zend.com/codex.php?id=835&single=1|utf8ToUnicodeEntities]] decodes utf-8 encoded strings into HTML unicode entities (&#NNNN;) or javascript ones (%uNNNN).
==== URL Functions ====
Is it a good idea to use UTF-8 in URLs (security issues / mapping to filesystem / DB primary keys etc.)?
=== urlencode, rawurlencode ===
* Official documentation: [[phpfn>urlencode]], [[phpfn>rawurlencode]]
* Risk: low
* Impact: encoding a string that has previously been utf-8 encoded is generally safe - it'll appear as a multibyte sequence compliant with [[http://tools.ietf.org/html/rfc3986|RFC 3986]].((Note that this is not compatible with the ECMAScript %uNNNN-style encoding used by the escape() and unescape() functions, but is compatible with the new [[http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Global_Functions:encodeURIComponent|encodeURIComponent()]] and friends.)) The multibyte sequence will present correctly on a page declared to be encoded with the UTF-8 charset. However, a utf-8 encoder other than [[phpfn>utf8_encode]] should be used to convert unicode entities to a utf-8 encoded string.
=== urldecode, rawurldecode ===
* Official documentation: [[phpfn>urldecode]], [[phpfn>rawurldecode]]
* Risk: medium
* Impact: incoming [[http://tools.ietf.org/html/rfc3986|RFC 3986]] compliant strings will be correctly decoded as valid utf-8. Note however that these functions operate on bytes rather than characters, and thus encoded strings that do not represent valid utf-8 (e.g. "%80%80%80" or "%c0%bc") will be decoded without error. ECMAScript %uNNNN-style encodings are not supported.
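Because decoding never fails, a well formedness check after decoding is a sensible precaution. The sketch below wraps ''rawurldecode'' in a hypothetical helper that rejects badly formed results, using PCRE's /u validation.

```php
<?php
// Hypothetical helper: decodes a percent-encoded string and returns false
// when the decoded bytes are not well formed UTF-8.
function utf8_rawurldecode_checked($encoded)
{
    $decoded = rawurldecode($encoded);
    return preg_match('//u', $decoded) === 1 ? $decoded : false;
}

echo rawurlencode("\xC3\xB1"), "\n";                    // %C3%B1
var_dump(utf8_rawurldecode_checked('%C3%B1'));           // the string "ñ"
var_dump(utf8_rawurldecode_checked('%80%80%80'));        // false: stray continuation bytes
var_dump(utf8_rawurldecode_checked('%c0%bc'));           // false: overlong encoding of "<"
```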
Some links
* [[http://www.zend.com/codex.php?id=839&single=1|utf8RawUrlDecode]] (which needs [[http://www.zend.com/codex.php?id=838&single=1|utf8encode]]) can safely replace php's built-in ([[phpfn>urldecode]], [[phpfn>rawurldecode]]) decoding functions.
* [[http://www.zend.com/codex.php?id=835&single=1|utf8ToUnicodeEntities]] decodes utf-8 encoded strings into HTML unicode entities (&#NNNN;) or javascript ones (%uNNNN).
* [[http://keithdevens.com/weblog/archive/2005/Nov/22|uriescape]]
==== GD Extension ====
Official docs at [[http://www.php.net/gd]].
FIXME Stuff todo here. In particular functions like [[phpfn>imagettftext]]. Guessing it will depend largely on what the GD font you are using is able to support.
Some links;
* [[http://www.webclass.ru/eng/Tutorials/PHP/Setting_Cyrillic_for.html|Setting Cyrillic for GD]] - slightly suspect (e.g. interchange between use of ISO-8859-1 and CP1251) but basic principle seems correct.
* [[http://www.phpclasses.org/browse/package/2132.html|Write Farsi to Image]]
Otherwise suspect Gallery v2 has this nailed these days - need to look
==== exif extension ====
Official docs at [[http://www.php.net/exif]].
FIXME Stuff to research here - what are the issues in reading exif data - are exotic charsets used? etc.
Some links;
* [[http://www.ozhiker.com/electronics/pjmt/|The PHP JPEG Metadata Toolkit]] - aside from having built in UTF-8 support, very cool library
===== UTF-8 Safe Functionality =====
Special mentions for stuff which may be "surprisingly" safe with UTF-8. Note if "well formedness" is mentioned, it may mean you should be checking the strings for well formedness before using these functions.
==== explode ====
* Official documentation: [[phpfn>explode]]
* Risk: none
So long as all arguments used are //well formed// UTF-8, no problems.
This works because every complete character sequence in a UTF-8 string is unique (cannot be mistaken as part of a longer sequence).
==== str_replace ====
* Official documentation: [[phpfn>str_replace]]
* Risk: none
So long as all arguments used are //well formed// UTF-8, no problems.
This works because every complete character sequence in a UTF-8 string is unique (cannot be mistaken as part of a longer sequence).
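A quick illustration with both functions, using a multibyte separator and a multibyte search string on a well formed UTF-8 subject:

```php
<?php
// Splitting on a multibyte character leaves both halves well formed.
$parts = explode('ô', 'Iñtërnâtiônàlizætiøn');
print_r($parts); // Iñtërnâti, nàlizætiøn

// Replacing a multibyte character only touches complete matches.
$replaced = str_replace('æ', 'ae', 'Iñtërnâtiônàlizætiøn');
echo $replaced, "\n"; // Iñtërnâtiônàlizaetiøn
```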