mb_strtok() – A PHP implementation

While developing a web app, I needed PHP's multibyte (mb_*) family of functions. Since I was dealing with Greek characters (in UTF-8, as always), I needed a multibyte equivalent of strtok() to tokenize a stream of Greek characters. A quick look at the PHP documentation turned up a multibyte version of almost every other string function, but not this one, so I decided to write my own. I'm sharing it here, as Google won't help you either.

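function mb_strtok($delimiters, $str = null)
{
	static $pos = 0;    // Character offset into the string, kept between calls.
	static $string = "";
	static $length = 0; // Cached character count, so it is not recomputed on every iteration.

	// If a new string is passed (even an empty one), reset the static state.
	// A strict comparison is needed here: "" != NULL is false in PHP.
	if ($str !== null)
	{
		$pos = 0;
		$string = $str;
		$length = mb_strlen($string);
	}

	// Initialize the token.
	$token = "";

	while ($pos < $length)
	{
		$char = mb_substr($string, $pos, 1);
		$pos++;

		if (mb_strpos($delimiters, $char) === false)
		{
			// Not a delimiter: the character belongs to the current token.
			$token .= $char;
		}
		elseif ($token !== "")
		{
			// A delimiter ends the token. Leading and repeated delimiters
			// are skipped, so empty strings are never returned.
			return $token;
		}
	}

	// Return the last token, or false when nothing is left.
	return ($token !== "") ? $token : false;
}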

On the first call of mb_strtok(), you must pass a string containing the delimiters as the first parameter, and the string to tokenize as the second. Both parameters may be multibyte strings.

Every subsequent call of mb_strtok() must pass only the first parameter, i.e. the string containing the delimiters.

Calling mb_strtok() again with both parameters discards the state of the previous string and starts a new round of tokenization.

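You should use this function as you would use strtok(), for example in a while loop. The function returns a boolean false when there are no more tokens to return, so compare the result with !== false: a token such as "0" is falsy in PHP and would otherwise end the loop early.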

You may have noticed that the order of the parameters is reversed compared with strtok(). I did this to keep the code simple and to avoid func_get_args(), which would complicate it.
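If you would rather keep strtok()'s parameter order, here is a rough sketch of one way to do it without func_get_args(). The name mb_strtok_compat() is hypothetical and not part of the function above. It assumes UTF-8 input and passes the encoding explicitly, and it treats a single argument as the delimiters, the way strtok() does on subsequent calls.

```php
<?php
// Hypothetical variant, for illustration only: same scanning logic,
// but with strtok()'s parameter order.
//   mb_strtok_compat($string, $delimiters)  starts a new tokenization
//   mb_strtok_compat($delimiters)           continues the current one
function mb_strtok_compat($first, $second = null)
{
	static $pos = 0;     // Character offset kept between calls.
	static $string = "";

	if ($second !== null)
	{
		// Two arguments: a new string and its delimiters. Reset the state.
		$string = $first;
		$pos = 0;
		$delimiters = $second;
	}
	else
	{
		// One argument: only the delimiters. Keep the saved state.
		$delimiters = $first;
	}

	$token = "";
	$length = mb_strlen($string, 'UTF-8'); // Assumption: input is UTF-8.

	while ($pos < $length)
	{
		$char = mb_substr($string, $pos, 1, 'UTF-8');
		$pos++;

		if (mb_strpos($delimiters, $char, 0, 'UTF-8') === false)
		{
			$token .= $char;
		}
		elseif ($token !== "")
		{
			return $token;
		}
	}

	return ($token !== "") ? $token : false;
}

// Usage: collect the tokens in a while loop, testing with !== false.
$delimiters = " ,«»"; // The guillemets are multibyte delimiters.
$tokens = array();
$token = mb_strtok_compat("«καλημέρα» κόσμε, τι κάνεις", $delimiters);
while ($token !== false)
{
	$tokens[] = $token;
	$token = mb_strtok_compat($delimiters);
}
// $tokens now holds: καλημέρα, κόσμε, τι, κάνεις
```

The trade-off is that the meaning of the first argument depends on whether a second one is given, which is exactly the ambiguity the reversed order avoids.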
