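3. PHP Multibyte Implementation
3.1 Overview
Joomla! 1.5 stores and processes all content and strings as UTF-8. This means every language's character set must be assumed to contain multibyte characters: the accented Latin characters common in European languages and all non-Latin characters are multibyte. To avoid truncation and logic errors, PHP string handling must be multibyte-aware, yet the native string functions in both PHP 4 and PHP 5 operate on single bytes.
PHP's 'mbstring' extension provides multibyte string functions, but it is not installed on every server. To preserve the PHP compatibility required by the Joomla! specification, Joomla! 1.5 introduces a multibyte string handling library, JString. The functions in this class have the same names as the corresponding PHP string functions.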
strtolower($text) // single-byte PHP function
JString::strtolower($text) // UTF-8 aware string function
If 'mbstring' is installed, the library uses the 'mbstring' extension. The library checks for the extension automatically when it is loaded.
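The detect-and-dispatch idea described above can be sketched as follows. This is a minimal illustration, not the JString or phputf8 code: the function name utf8_lower_sketch is hypothetical, and the fallback branch is a deliberate simplification that lowercases ASCII letters only.

```php
<?php
// Hypothetical helper, not part of Joomla!: lowercases UTF-8 text,
// preferring mbstring when the extension is available.
function utf8_lower_sketch($text)
{
    if (function_exists('mb_strtolower')) {
        // mbstring handles case mapping for multibyte characters.
        return mb_strtolower($text, 'UTF-8');
    }
    // Simplified fallback (an assumption of this sketch): map ASCII letters
    // only. Bytes of multibyte UTF-8 sequences are all >= 0x80, so they
    // pass through untouched and the string stays valid.
    return strtr($text, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz');
}

$result = utf8_lower_sketch("\xC3\x80BC"); // "ÀBC" written as explicit bytes
echo $result, "\n";
```

With mbstring loaded the leading character is lowercased as well; without it only the ASCII letters change, which is why a real library needs a fuller pure-PHP fallback.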
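Before this library is loaded, the application loads a set of functions from the 'libraries/joomla/common/compat' directory; 'phputf8env.php' dynamically loads mbstring and then sets the appropriate environment settings.
At installation time, two mbstring settings that cannot be overridden at runtime are checked. They are: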
mbstring.language = neutral
mbstring.func_overload = 0
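The first setting is set to Japanese on some installations; the second, when enabled, makes mbstring functions override the ordinary string functions. Both can be set in php.ini or in .htaccess. To set them in .htaccess, the Apache 'AllowOverride' directive must be 'All' or 'Options'; the following lines can then be added to .htaccess: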
php_value mbstring.language neutral
php_value mbstring.func_overload 0
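As a rough illustration of what such an installation check might look like, the sketch below reads both settings with ini_get(). The function name check_mbstring_settings_sketch and the message wording are hypothetical and are not taken from the Joomla! installer.

```php
<?php
// Hypothetical install-time check; names and messages are illustrative only.
function check_mbstring_settings_sketch()
{
    $problems = array();

    if (!extension_loaded('mbstring')) {
        // Without mbstring neither setting applies.
        return $problems;
    }

    $language = ini_get('mbstring.language');
    if ($language !== false && strtolower($language) !== 'neutral') {
        $problems[] = 'mbstring.language is "' . $language . '", expected neutral';
    }

    // ini_get() returns false when a directive does not exist
    // (mbstring.func_overload was removed in PHP 8.0).
    $overload = ini_get('mbstring.func_overload');
    if ($overload !== false && (int) $overload !== 0) {
        $problems[] = 'mbstring.func_overload is ' . $overload . ', expected 0';
    }

    return $problems;
}

foreach (check_mbstring_settings_sketch() as $problem) {
    echo $problem, "\n";
}
```

An empty result means either that both settings already have the required values or that mbstring is not loaded at all.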
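3.2 Guidelines for Using JString Functions
Not every string function needs to be replaced with its multibyte version. In fact, using the multibyte functions in some situations can cause Joomla! to crash. The following guidelines therefore indicate which functions need replacing:
Guideline 1: Not every string function has a JString version. If JString has no corresponding function, the native PHP function is safe to use, for example 'explode()'.
Guideline 2: This applies only to strlen.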
$byte_count = strlen($utf_string); // returns the number of bytes
$char_count = JString::strlen($utf_string); // returns the number of characters
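Use PHP's single-byte strlen whenever the length in bytes is required, for example when calling fwrite.
Guideline 3: Other string functions do not need to be replaced when the data being handled is ASCII or binary, for example in the gzip functionality.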
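The three guidelines can be illustrated with plain PHP. This sketch does not depend on Joomla!: the character count uses preg_match_all() with the /u modifier as a stand-in for JString::strlen(), and the sample strings are made up for the example.

```php
<?php
// "héllo, wörld" written as explicit bytes; é and ö take two bytes each.
$utf_string = "h\xC3\xA9llo, w\xC3\xB6rld";

// Guideline 1: explode() matches an exact byte sequence. UTF-8 is
// self-synchronizing, so the delimiter cannot match inside a character.
$parts = explode(', ', $utf_string);

// Guideline 2: strlen() counts bytes, which is what fwrite() expects.
$byte_count = strlen($utf_string);
$handle = fopen('php://memory', 'w+');
$written = fwrite($handle, $utf_string, $byte_count);
fclose($handle);

// Stand-in for JString::strlen(): count characters with a UTF-8 aware regex.
$char_count = preg_match_all('/./su', $utf_string, $matches);

// Guideline 3: binary data has no characters, so byte functions are correct.
$binary = pack('N', 258); // a 4-byte big-endian integer
$binary_length = strlen($binary);

echo $byte_count, ' bytes, ', $char_count, " characters\n";
```

Here the byte count is 14 and the character count is 12; passing the character count to fwrite() would silently drop the last two bytes.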
3.3 JString Class API
/**
* String handling class for utf-8 data
* Wraps the phputf8 library
* All functions assume the validity of utf-8 strings. If in doubt use TODO
*
* @author David Gal
* @package Joomla.Framework
* @since 1.5
*/
class JString
{
/**
* UTF-8 aware alternative to strpos
* Find position of first occurrence of a string
*
* @static
* @access public
* @param $str - string String being examined
* @param $search - string String being searched for
* @param $offset - int Optional, specifies the position from
* which the search should be performed
* @return mixed Number of characters before the first match or FALSE on failure
* @see http://www.php.net/strpos
*/
function strpos($str, $search, $offset = FALSE)
/**
* UTF-8 aware alternative to strrpos
* Finds position of last occurrence of a string
*
* @static
* @access public
* @param $str - string String being examined
* @param $search - string String being searched for
* @return mixed Number of characters before the last match or FALSE on failure
* @see http://www.php.net/strrpos
*/
function strrpos($str, $search)
/**
* UTF-8 aware alternative to substr
* Return part of a string given character offset (and optionally length)
*
* @static
* @access public
* @param string
* @param integer number of UTF-8 characters offset (from left)
* @param integer (optional) length in UTF-8 characters from offset
* @return mixed string or FALSE if failure
* @see http://www.php.net/substr
*/
function substr($str, $offset, $length = FALSE)
/**
* UTF-8 aware alternative to strtolower
* Make a string lowercase
* Note: The concept of a character's "case" only exists in some alphabets
* such as Latin, Greek, Cyrillic, Armenian and archaic Georgian - it does
* not exist in the Chinese alphabet, for example. See Unicode Standard
* Annex #21: Case Mappings
*
* @access public
* @param string
* @return mixed either string in lowercase or FALSE if UTF-8 invalid
* @see http://www.php.net/strtolower
*/
function strtolower($str)
/**
* UTF-8 aware alternative to strtoupper
* Make a string uppercase
* Note: The concept of a character's "case" only exists in some alphabets
* such as Latin, Greek, Cyrillic, Armenian and archaic Georgian - it does
* not exist in the Chinese alphabet, for example. See Unicode Standard
* Annex #21: Case Mappings
*
* @access public
* @param string
* @return mixed either string in uppercase or FALSE if UTF-8 invalid
* @see http://www.php.net/strtoupper
*/
function strtoupper($str)
/**
* UTF-8 aware alternative to strlen
* Returns the number of characters in the string (NOT THE NUMBER OF BYTES),
*
* @access public
* @param string UTF-8 string
* @return int number of UTF-8 characters in string
* @see http://www.php.net/strlen
*/
function strlen($str)
/**
* UTF-8 aware alternative to str_ireplace
* Case-insensitive version of str_replace
*
* @static
* @access public
* @param string string to search
* @param string existing string to replace
* @param string new string to replace with
* @param int optional count value to be passed by reference
* @see http://www.php.net/str_ireplace
*/
function str_ireplace($search, $replace, $str, $count = NULL)
/**
* UTF-8 aware alternative to str_split
* Convert a string to an array
*
* @static
* @access public
* @param string UTF-8 encoded
* @param int number of characters to split string by
* @return array
* @see http://www.php.net/str_split
*/
function str_split($str, $split_len = 1)
/**
* UTF-8 aware alternative to strcasecmp
* A case insensitive string comparison
*
* @static
* @access public
* @param string string 1 to compare
* @param string string 2 to compare
* @return int < 0 if str1 is less than str2; > 0 if str1 is
* greater than str2, and 0 if they are equal.
* @see http://www.php.net/strcasecmp
*/
function strcasecmp($str1, $str2)
/**
* UTF-8 aware alternative to strcspn
* Find length of initial segment not matching mask
*
* @static
* @access public
* @param string
* @param string the mask
* @param int Optional starting character position (in characters)
* @param int Optional length
* @return int the length of the initial segment of str1 which does not
* contain any of the characters in str2
* @see http://www.php.net/strcspn
*/
function strcspn($str, $mask, $start = NULL, $length = NULL)
/**
* UTF-8 aware alternative to stristr
* Returns all of haystack from the first occurrence of needle to the end.
* needle and haystack are examined in a case-insensitive manner
* Find first occurrence of a string using case insensitive comparison
*
* @static
* @access public
* @param string the haystack
* @param string the needle
* @return string the sub string
* @see http://www.php.net/stristr
*/
function stristr($str, $search)
/**
* UTF-8 aware alternative to strrev
* Reverse a string
*
* @static
* @access public
* @param string String to be reversed
* @return string The string in reverse character order
* @see http://www.php.net/strrev
*/
function strrev($str)
/**
* UTF-8 aware alternative to strspn
* Find length of initial segment matching mask
*
* @static
* @access public
* @param string the haystack
* @param string the mask
* @param int start optional
* @param int length optional
* @see http://www.php.net/strspn
*/
function strspn($str, $mask, $start = NULL, $length = NULL)
/**
* UTF-8 aware substr_replace
* Replace text within a portion of a string
*
* @static
* @access public
* @param string the haystack
* @param string the replacement string
* @param int start
* @param int length (optional)
* @see http://www.php.net/substr_replace
*/
function substr_replace($str, $repl, $start, $length = NULL )
/**
* UTF-8 aware replacement for ltrim()
* Strip whitespace (or other characters) from the beginning of a string
* Note: you only need to use this if you are supplying the charlist
* optional arg and it contains UTF-8 characters. Otherwise ltrim will
* work normally on a UTF-8 string
*
* @static
* @access public
* @param string the string to be trimmed
* @param string the optional charlist of additional characters to trim
* @return string the trimmed string
* @see http://www.php.net/ltrim
*/
function ltrim( $str, $charlist = FALSE )
/**
* UTF-8 aware replacement for rtrim()
* Strip whitespace (or other characters) from the end of a string
* Note: you only need to use this if you are supplying the charlist
* optional arg and it contains UTF-8 characters. Otherwise rtrim will
* work normally on a UTF-8 string
*
* @static
* @access public
* @param string the string to be trimmed
* @param string the optional charlist of additional characters to trim
* @return string the trimmed string
* @see http://www.php.net/rtrim
*/
function rtrim( $str, $charlist = FALSE )
/**
* UTF-8 aware replacement for trim()
* Strip whitespace (or other characters) from the beginning and end of a string
* Note: you only need to use this if you are supplying the charlist
* optional arg and it contains UTF-8 characters. Otherwise trim will
* work normally on a UTF-8 string
*
* @static
* @access public
* @param string the string to be trimmed
* @param string the optional charlist of additional characters to trim
* @return string the trimmed string
* @see http://www.php.net/trim
*/
function trim( $str, $charlist = FALSE )
/**
* UTF-8 aware alternative to ucfirst
* Make a string's first character uppercase
*
* @static
* @access public
* @param string
* @return string with first character as upper case (if applicable)
* @see http://www.php.net/ucfirst
*/
function ucfirst($str)
/**
* UTF-8 aware alternative to ucwords
* Uppercase the first character of each word in a string
*
* @static
* @access public
* @param string
* @return string with first char of each word uppercase
* @see http://www.php.net/ucwords
*/
function ucwords($str)
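To make the difference between byte offsets and character offsets concrete, here is a minimal sketch of character-aware counterparts to strlen, substr and strrev. These helpers are hypothetical and are not the JString or phputf8 implementation: they assume valid UTF-8 input and a PCRE build with UTF-8 support, and unlike the documented substr they return an empty string rather than FALSE when nothing matches.

```php
<?php
// Hypothetical helpers for illustration; not the JString implementation.

// Split a valid UTF-8 string into an array of single characters.
function utf8_chars_sketch($str)
{
    return preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
}

// Number of characters, not bytes.
function utf8_strlen_sketch($str)
{
    return count(utf8_chars_sketch($str));
}

// Offset and length are counted in characters.
function utf8_substr_sketch($str, $offset, $length = null)
{
    $chars = utf8_chars_sketch($str);
    $slice = ($length === null)
        ? array_slice($chars, $offset)
        : array_slice($chars, $offset, $length);
    return implode('', $slice);
}

// Reverse character order while keeping each multibyte sequence intact.
function utf8_strrev_sketch($str)
{
    return implode('', array_reverse(utf8_chars_sketch($str)));
}

// Two 3-byte characters followed by "abc", written as explicit bytes.
$str = "\xE4\xB8\xAD\xE6\x96\x87abc";

echo strlen($str), ' bytes, ', utf8_strlen_sketch($str), " characters\n";
echo utf8_substr_sketch($str, 1, 2), "\n";
echo utf8_strrev_sketch($str), "\n";
```

For this sample, the native substr($str, 1, 2) would return two continuation bytes that do not form a valid character, while the sketch returns the second character followed by "a".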