mb_strtok() – A PHP implementation

While developing a web app, I needed PHP's multibyte (mb_*) family of functions. I was dealing with Greek characters specifically (I always use UTF-8), and I needed a multibyte equivalent of strtok() to tokenize a stream of Greek text. A quick look through the PHP documentation turned up a multibyte version of almost every other string function, but nothing for strtok(), so I decided to write my own. I'm sharing it with you guys, as Google won't help you either.

function mb_strtok($delimiters, $str=NULL)
{
	static $pos = 0; // Keep track of the position on the string for each subsequent call.
	static $string = "";

	// If a new string is passed, reset the static parameters.
	// Strict comparison, so that an empty string also counts as a new string.
	if($str!==NULL)
	{
		$pos = 0;
		$string = $str;
	}

	// Initialize the token.
	$token = "";

	while ($pos < mb_strlen($string))
	{
		$char = mb_substr($string, $pos, 1);
		$pos++;

		if(mb_strpos($delimiters, $char)===FALSE)
		{
			$token .= $char;
		}
		else
		{
			// Don't return empty strings.
			if($token!="")
				return $token;
		}

	}

	// Check whether there is a last token to return.
	if ($token!="")
	{
		return $token;
	}
	else
	{
		return false;
	}
}

On the first call of mb_strtok(), you must pass a string containing the delimiters as the first parameter, and the string to tokenize as the second. Both parameters may be multibyte strings.

Every subsequent call of mb_strtok() must pass only the first parameter, i.e. the string containing the delimiters.

Calling mb_strtok() again with both parameters discards the state of the previous string and starts a new round of tokenization.

You should use this function as you would use strtok(), for example in a while loop. The function returns a boolean false when there are no more tokens to return.
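A minimal usage sketch of such a loop follows. The sample text and the delimiter set are my own assumptions, and the function is repeated in condensed form (behind a function_exists() guard) only so that the snippet runs on its own. Note the strict comparison against false: a token such as "0" is falsy in PHP, so a plain while ($token) would stop too early.

```php
<?php
mb_internal_encoding('UTF-8');

// Condensed copy of mb_strtok() from this post, guarded so the snippet is self-contained.
if (!function_exists('mb_strtok')) {
    function mb_strtok($delimiters, $str = null)
    {
        static $pos = 0;
        static $string = "";

        if ($str !== null) {
            $pos = 0;
            $string = $str;
        }

        $token = "";
        $length = mb_strlen($string);

        while ($pos < $length) {
            $char = mb_substr($string, $pos, 1);
            $pos++;

            if (mb_strpos($delimiters, $char) === false) {
                $token .= $char;
            } elseif ($token !== "") {
                return $token;
            }
        }

        return $token !== "" ? $token : false;
    }
}

// Sample input (an assumption for illustration): Greek words split on spaces and commas.
$tokens = [];
$token = mb_strtok(' ,', 'καλημέρα, κόσμε 0 τέλος');

// Compare with !== false so that the token "0" does not end the loop.
while ($token !== false) {
    $tokens[] = $token;
    $token = mb_strtok(' ,'); // subsequent calls: delimiters only
}

print_r($tokens);
```

The first call passes both parameters, and every call inside the loop passes only the delimiters.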

You may have noticed that the order of the parameters is reversed compared with strtok(). This is because I wanted to keep the code simple and avoid func_get_args(), which would complicate it.
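If you would rather keep the parameter order of strtok(), one possible alternative is to decide what the first argument means based on whether a second one was passed. This is only a sketch under that assumption, not the function above: the name mb_strtok_compat() is hypothetical, and the scanning loop is written a little differently (it skips leading delimiters, then cuts the token out with a single mb_substr() call).

```php
<?php
mb_internal_encoding('UTF-8');

// Hypothetical variant that follows strtok()'s argument order:
//   mb_strtok_compat($string, $delimiters)  starts a new tokenization
//   mb_strtok_compat($delimiters)           continues with the stored string
function mb_strtok_compat($first, $second = null)
{
    static $pos = 0;
    static $string = "";

    if ($second !== null) {
        // Two arguments: a new string and its delimiters.
        $string = $first;
        $delimiters = $second;
        $pos = 0;
    } else {
        // One argument: delimiters only.
        $delimiters = $first;
    }

    $length = mb_strlen($string);

    // Skip any delimiters in front of the next token.
    while ($pos < $length && mb_strpos($delimiters, mb_substr($string, $pos, 1)) !== false) {
        $pos++;
    }

    if ($pos >= $length) {
        return false; // nothing left to return
    }

    // Advance to the next delimiter or to the end of the string.
    $start = $pos;
    while ($pos < $length && mb_strpos($delimiters, mb_substr($string, $pos, 1)) === false) {
        $pos++;
    }

    $token = mb_substr($string, $start, $pos - $start);
    $pos++; // step over the delimiter that ended the token

    return $token;
}

// Sample input is an assumption for illustration.
$words = [];
$word = mb_strtok_compat('άλφα; βήτα;γάμμα', '; ');

while ($word !== false) {
    $words[] = $word;
    $word = mb_strtok_compat('; ');
}

print_r($words);
```

The trade-off is that the meaning of the first argument now depends on the number of arguments, which is exactly the kind of complication the reversed order avoids.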
