mb_strtok() – A PHP implementation

While developing a web app, I needed PHP's multibyte (mb_*) family of functions. I was dealing with Greek characters specifically (I always use UTF-8), and I needed a multibyte equivalent of strtok() to tokenize a stream of them. A quick look at the PHP documentation turned up multibyte versions of almost every other string function, but nothing for tokenizing, so I wrote my own. I'm sharing it here, as Google won't help you either.

function mb_strtok($delimiters, $str=NULL)
{
	static $pos = 0; // Keep track of the position on the string for each subsequent call.
	static $string = "";

	// If a new string is passed, reset the static parameters.
	if($str!==NULL) // Strict check, so that "" and "0" also count as a new string.
	{
		$pos = 0;
		$string = $str;
	}

	// Initialize the token.
	$token = "";

	while ($pos < mb_strlen($string))
	{
		$char = mb_substr($string, $pos, 1);
		$pos++;

		if(mb_strpos($delimiters, $char)===FALSE)
		{
			$token .= $char;
		}
		else
		{
			// Don't return empty strings.
			if($token!="")
				return $token;
		}

	}

	// Check whether there is a last token to return.
	if ($token!="")
	{
		return $token;
	}
	else
	{
		return false;
	}
}

On the first call of mb_strtok(), you must pass a string containing the delimiters as the first parameter, and the string to tokenize as the second. Both parameters may be multibyte strings.

Every subsequent call of mb_strtok() must pass only the first parameter, i.e. the string containing the delimiters.

Calling mb_strtok() again with both parameters discards the state of the previous string and starts a new round of tokenization.

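You should use this function as you would use strtok(), for example in a while loop. The function returns a boolean false when there are no more tokens to return. Compare the result with !== false rather than relying on truthiness, because a token such as "0" is itself falsy in PHP.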
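The loop described above might look like the following sketch. To keep the snippet runnable on its own it repeats a condensed copy of mb_strtok(); the collect_tokens() helper and the sample sentence are made up for illustration and are not part of the original function. It assumes the mbstring extension is loaded and that the text is UTF-8.

```php
<?php
// Assumption: mbstring is available and the input is UTF-8.
mb_internal_encoding('UTF-8');

// Condensed copy of mb_strtok() so this snippet runs on its own.
function mb_strtok($delimiters, $str = null)
{
    static $pos = 0;      // Position in the string across calls.
    static $string = '';

    // A new string resets the stored state.
    if ($str !== null) {
        $pos = 0;
        $string = $str;
    }

    $token = '';
    $length = mb_strlen($string);

    while ($pos < $length) {
        $char = mb_substr($string, $pos, 1);
        $pos++;

        if (mb_strpos($delimiters, $char) === false) {
            $token .= $char;
        } elseif ($token !== '') {
            return $token; // A delimiter ends a non-empty token.
        }
    }

    // Return the last token, or false when nothing is left.
    return $token !== '' ? $token : false;
}

// Hypothetical helper (not in the original post): gather every token.
function collect_tokens($delimiters, $text)
{
    $tokens = [];
    $token = mb_strtok($delimiters, $text);  // First call: delimiters and string.
    while ($token !== false) {
        $tokens[] = $token;
        $token = mb_strtok($delimiters);     // Later calls: delimiters only.
    }
    return $tokens;
}

$words = collect_tokens(' ,', 'καλημέρα, κόσμε και καλή μέρα');
print_r($words);
```

The strict !== false comparison in the loop matters: a looser while ($token = ...) test would stop early on a token like "0".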

You may have noticed that the order of the parameters is reversed compared with strtok(). This is because I wanted to keep the code simple and avoid func_get_args(), which would complicate it.
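For readers who would rather keep strtok()'s argument order, here is one possible sketch. It is not the function from this post: mb_strtok_compat() is a hypothetical name, and it decides how to read its arguments by checking whether the second one was supplied. It assumes mbstring and UTF-8 input.

```php
<?php
// Assumption: mbstring is available and the input is UTF-8.
mb_internal_encoding('UTF-8');

// Hypothetical variant with strtok()'s argument order:
//   two arguments -> (string, delimiters), start a new tokenization
//   one argument  -> (delimiters), continue with the stored string
function mb_strtok_compat($first, $second = null)
{
    static $pos = 0;
    static $string = '';

    if ($second !== null) {
        $string = $first;
        $delimiters = $second;
        $pos = 0;
    } else {
        $delimiters = $first;
    }

    $token = '';
    $length = mb_strlen($string);

    while ($pos < $length) {
        $char = mb_substr($string, $pos, 1);
        $pos++;

        if (mb_strpos($delimiters, $char) === false) {
            $token .= $char;
        } elseif ($token !== '') {
            return $token;
        }
    }

    return $token !== '' ? $token : false;
}

// Usage mirrors the usual strtok() idiom.
$parts = [];
for ($t = mb_strtok_compat('α;β;;γ', ';'); $t !== false; $t = mb_strtok_compat(';')) {
    $parts[] = $t;
}
print_r($parts);
```

The trade-off is that the meaning of the first argument now depends on how many arguments are passed, which is the kind of complication the reversed order avoids.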
