While developing a web app, I needed PHP's multibyte family of functions. Because I was dealing with Greek characters (I always use UTF-8), I needed a multibyte equivalent of strtok() to tokenize a stream of Greek text. A quick look through the PHP documentation turned up almost every other function, but nothing relevant to what I needed, so I decided to write my own version. I'm sharing it with you guys, as Google won't help you either.
function mb_strtok($delimiters, $str = null)
{
    // Keep track of the position in the string across subsequent calls.
    static $pos = 0;
    static $string = "";

    // If a new string is passed (even an empty one), reset the static state.
    if ($str !== null) {
        $pos = 0;
        $string = $str;
    }

    // Initialize the token.
    $token = "";
    $length = mb_strlen($string);

    while ($pos < $length) {
        $char = mb_substr($string, $pos, 1);
        $pos++;

        if (mb_strpos($delimiters, $char) === false) {
            $token .= $char;
        } elseif ($token !== "") {
            // Don't return empty strings.
            return $token;
        }
    }

    // Return the last token, or false when nothing is left.
    return $token !== "" ? $token : false;
}
On the first call of mb_strtok(), you must pass a string containing the delimiters as the first parameter and the string to tokenize as the second. Both parameters may be multibyte strings.
Each subsequent call of mb_strtok() must have only the first parameter, i.e. the string containing the delimiters.
Calling mb_strtok() again with both parameters discards the state of the previous string and starts a new round of tokenization.
You should use this function as you would use strtok(), for example in a while loop. The function returns a boolean false when there are no more tokens to return, so compare the result with === false: a token such as "0" is a valid result but evaluates as falsy.
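To make the while-loop usage concrete, here is a minimal sketch. It repeats the function from this post so the snippet runs on its own; the mb_tokenize() helper and the sample Greek text are my own illustration, not part of the original code, and I'm assuming the mbstring extension is loaded and the input is UTF-8.

```php
<?php
mb_internal_encoding('UTF-8'); // assumption: all input is UTF-8

// Copy of mb_strtok() from this post, so the example is self-contained.
function mb_strtok($delimiters, $str = null)
{
    static $pos = 0;
    static $string = "";

    if ($str !== null) {
        $pos = 0;
        $string = $str;
    }

    $token = "";
    $length = mb_strlen($string);
    while ($pos < $length) {
        $char = mb_substr($string, $pos, 1);
        $pos++;
        if (mb_strpos($delimiters, $char) === false) {
            $token .= $char;
        } elseif ($token !== "") {
            return $token;
        }
    }
    return $token !== "" ? $token : false;
}

// Hypothetical helper: collects every token of $text into an array.
function mb_tokenize($delimiters, $text)
{
    $tokens = [];
    // First call: delimiters plus the string to tokenize.
    $token = mb_strtok($delimiters, $text);
    while ($token !== false) {
        $tokens[] = $token;
        // Subsequent calls: delimiters only.
        $token = mb_strtok($delimiters);
    }
    return $tokens;
}

$words = mb_tokenize(" ,", "καλημέρα, κόσμε");
print_r($words);
```

The strict `!== false` check is what lets the loop survive a token like "0". Consecutive delimiters (the comma followed by a space above) are skipped rather than producing empty tokens.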
You may have noticed that the order of the parameters is reversed compared with strtok(). This is because I wanted to keep the code simple and avoid using func_get_args(), which would complicate it.
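If you would rather keep strtok()'s argument order, one possible sketch is below. It is not the function from this post: mb_strtok_compat() is a hypothetical name, and instead of func_get_args() it simply treats a lone argument as the delimiter list. As before, I'm assuming mbstring is loaded and the input is UTF-8.

```php
<?php
mb_internal_encoding('UTF-8'); // assumption: all input is UTF-8

// Hypothetical variant that takes its arguments in strtok()'s order:
//   mb_strtok_compat($string, $delimiters)  starts tokenizing a new string
//   mb_strtok_compat($delimiters)           continues with the current one
function mb_strtok_compat($first, $second = null)
{
    static $pos = 0;
    static $string = "";

    if ($second !== null) {
        // Two arguments: the string first, then the delimiters.
        $string = $first;
        $delimiters = $second;
        $pos = 0;
    } else {
        // One argument: it is the delimiter list.
        $delimiters = $first;
    }

    $token = "";
    $length = mb_strlen($string);
    while ($pos < $length) {
        $char = mb_substr($string, $pos, 1);
        $pos++;
        if (mb_strpos($delimiters, $char) === false) {
            $token .= $char;
        } elseif ($token !== "") {
            return $token;
        }
    }
    return $token !== "" ? $token : false;
}

$first  = mb_strtok_compat("ένα δύο", " "); // new string
$second = mb_strtok_compat(" ");            // continue
$third  = mb_strtok_compat(" ");            // nothing left
var_dump($first, $second, $third);
```

The price is that the meaning of the first parameter changes with the number of arguments, which is exactly the kind of complication the simpler reversed order avoids.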
good job mate
PHP Niiiiinjaaaaa 😛