Joomla!-开源天空

2008-09-08


[Translation] Joomla! UTF-8 Specification (WIP), Part 3

3. PHP Multibyte Implementation
3.1 Overview

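Joomla! 1.5 stores and processes all content and strings in UTF-8. This means every language's character set must be assumed to contain multibyte characters: the accented Latin characters common in European languages and all non-Latin characters are multibyte. To avoid truncation and logic errors, PHP string handling functions need to be multibyte-aware, yet the native string functions in both PHP 4 and PHP 5 are single-byte.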
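
As a small illustration of the truncation problem, the sketch below cuts a UTF-8 string first by bytes and then by characters. It uses only core PHP and PCRE, not JString; the variable names are chosen for this example.

```php
<?php
// "é" is two bytes in UTF-8 (0xC3 0xA9). Byte escapes are used so that
// the encoding of this source file does not matter.
$text = "\xC3\xA9t\xC3\xA9"; // "été": 3 characters, 5 bytes

// Byte-based cut: keeps only the first byte of the two-byte "é".
$byteCut = substr($text, 0, 1);

// Character-based cut: with the /u modifier, "." matches one whole
// UTF-8 character (the /s modifier lets it match newlines as well).
preg_match('/^./us', $text, $match);
$charCut = $match[0];

// preg_match() with /u returns false when the subject is not valid UTF-8,
// so it doubles as a simple validity check here.
$byteCutIsValid = preg_match('//u', $byteCut) === 1;
$charCutIsValid = preg_match('//u', $charCut) === 1;
```

The byte-based cut leaves a lone lead byte, which is no longer valid UTF-8, while the character-based cut returns the complete "é".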

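PHP's 'mbstring' extension provides multibyte string functions, but it is not installed on every server. To preserve the PHP compatibility that the Joomla! specification requires, Joomla! 1.5 introduces its own multibyte string library, JString. The methods of this class have the same names as the corresponding PHP string functions.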

strtolower($text)           // single-byte PHP function
JString::strtolower($text)  // UTF-8 aware string function

If 'mbstring' is installed, the library uses it. The library checks for the 'mbstring' extension automatically when it is loaded.
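
The "use mbstring if present, otherwise fall back" idea can be sketched as follows. This is not the JString or phputf8 source; `utf8_char_count` is a hypothetical helper, and the PCRE-based fallback is an assumption chosen for this example.

```php
<?php
/**
 * Hypothetical helper (not part of JString): counts UTF-8 characters,
 * using mbstring when it is loaded and PCRE otherwise.
 */
function utf8_char_count($str)
{
    if (extension_loaded('mbstring')) {
        return mb_strlen($str, 'UTF-8');
    }
    // Fallback: with /u, "." matches one whole UTF-8 character, so the
    // number of matches equals the number of characters.
    return preg_match_all('/./us', $str, $unused);
}

$count = utf8_char_count("na\xC3\xAFve"); // "naïve": 5 characters, 6 bytes
```

Both branches return the same count for valid UTF-8 input, so callers do not need to know which one ran.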

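Before this library is loaded, the application loads a set of functions from the 'libraries/joomla/common/compat' directory. 'phputf8env.php' loads mbstring dynamically and then sets the appropriate environment variables.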

During installation, Joomla! checks two mbstring settings that cannot be overridden at runtime:

mbstring.language = neutral
mbstring.func_overload = 0

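The first is set to Japanese on some installations; the second, when non-zero, makes mbstring functions overload the ordinary string functions. Both can be set in php.ini or in .htaccess. To set them in .htaccess, the Apache 'AllowOverride' directive must be 'All' or 'Options'; .htaccess can then contain: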

php_value mbstring.language neutral
php_value mbstring.func_overload 0
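
A check of these two settings might look like the sketch below. It is not Joomla!'s installer code: `mbstring_setting_problems` is a hypothetical function, and treating a missing setting as "nothing to fix" is an assumption made for this example.

```php
<?php
/**
 * Hypothetical check (not Joomla!'s installer code).
 * Takes the raw ini values and returns a list of problems, empty if none.
 */
function mbstring_setting_problems($language, $funcOverload)
{
    $problems = array();
    // ini_get() returns false when a setting does not exist, for example
    // when mbstring is not loaded; that case is treated as nothing to fix.
    if ($language !== false && strtolower($language) !== 'neutral') {
        $problems[] = 'mbstring.language should be neutral';
    }
    if ($funcOverload !== false && (int) $funcOverload !== 0) {
        $problems[] = 'mbstring.func_overload should be 0';
    }
    return $problems;
}

// Usage: pass in the values PHP is currently running with.
$problems = mbstring_setting_problems(
    ini_get('mbstring.language'),
    ini_get('mbstring.func_overload')
);
```

Keeping the check separate from the `ini_get()` calls makes it easy to exercise with known values.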


3.2 Guidelines for Using JString Functions

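Not every string function needs to be replaced with its multibyte version. In fact, using a multibyte function in the wrong place can break Joomla!. The following guidelines indicate which functions need replacing: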

Guideline 1: Not every string function has a JString version. If JString has no counterpart, the native PHP function is safe to use, for example 'explode()'.
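
To see why a byte-based function such as 'explode()' can be safe, consider splitting on an ASCII delimiter. In UTF-8, every byte of a multibyte character is 0x80 or higher, so an ASCII byte never occurs inside one. The sketch below uses only core PHP; the sample data is made up for this example.

```php
<?php
// "é,中,z" with the non-ASCII characters written as UTF-8 byte escapes.
$csv = "\xC3\xA9,\xE4\xB8\xAD,z";

// explode() works on bytes, but the comma (0x2C) cannot appear inside
// a multibyte sequence, so no character is split.
$parts = explode(',', $csv);

// Verify that every piece is still valid UTF-8.
$allValid = true;
foreach ($parts as $part) {
    if (preg_match('//u', $part) !== 1) {
        $allValid = false;
    }
}
```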


Guideline 2 (applies only to strlen): choose the version according to what needs to be counted.

$byte_count = strlen($utf_string);          // returns the number of bytes
$char_count = JString::strlen($utf_string); // returns the number of characters

Use PHP's single-byte function whenever the binary length is needed, for example when calling fwrite.
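
The fwrite case can be sketched as follows. The example writes to an in-memory stream and uses a PCRE-based character count as a stand-in for JString::strlen, which is an assumption so that the snippet runs without Joomla!.

```php
<?php
$text = "h\xC3\xA9llo"; // "héllo": 5 characters, 6 bytes

$byteCount = strlen($text);                            // bytes
$charCount = preg_match_all('/./us', $text, $unused);  // characters (stand-in for JString::strlen)

// fwrite() takes its length argument in bytes and returns the number
// of bytes written, so the byte count is the right one to pass.
$handle = fopen('php://memory', 'w+');
$written = fwrite($handle, $text, $byteCount);
fclose($handle);
```

Passing the character count as the length would have written only 5 of the 6 bytes and cut off the final character.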

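Guideline 3: The other string functions do not always need replacing either, for example when the data being processed is ASCII or binary, as in the gzip functionality.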
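
A minimal sketch of the binary case: a packed 16-bit field that happens to have the same bytes as a UTF-8 "é". The field and variable names are invented for this example.

```php
<?php
// A big-endian unsigned 16-bit value; its two bytes are 0xC3 0xA9,
// which also happen to form a valid UTF-8 "é".
$field = pack('n', 0xC3A9);

// For binary data the byte-based functions give the intended answers.
$fieldLength = strlen($field);       // size of the field in bytes
$highByte    = substr($field, 0, 1); // first byte of the field

// A character-aware count would treat the field as a single character.
$asCharacters = preg_match_all('/./us', $field, $unused);
```

Here the byte-based functions report the true field size, whereas a character-aware function would misreport it.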


3.3 JString Class API

/**
 * String handling class for utf-8 data
 * Wraps the phputf8 library
 * All functions assume the validity of utf-8 strings. If in doubt use TODO
 *
 * @author David Gal
 * @package Joomla.Framework
 * @since 1.5
 */
class JString
{
 /**
  * UTF-8 aware alternative to strpos
  * Find position of first occurrence of a string
  *
  * @static
  * @access public
  * @param $str - string String being examined
  * @param $search - string String being searched for
  * @param $offset - int Optional, specifies the position from
  *                  which the search should be performed
  * @return mixed Number of characters before the first match or FALSE on failure
  * @see http://www.php.net/strpos
  */
 function strpos($str, $search, $offset = FALSE)

 /**
  * UTF-8 aware alternative to strrpos
  * Finds position of last occurrence of a string
  *
  * @static
  * @access public
  * @param $str - string String being examined
  * @param $search - string String being searched for
  * @return mixed Number of characters before the last match or FALSE on failure
  * @see http://www.php.net/strrpos
  */
 function strrpos($str, $search)

 /**
  * UTF-8 aware alternative to substr
  * Return part of a string given character offset (and optionally length)
  *
  * @static
  * @access public
  * @param string
  * @param integer number of UTF-8 characters offset (from left)
  * @param integer (optional) length in UTF-8 characters from offset
  * @return mixed string or FALSE if failure
  * @see http://www.php.net/substr
  */
 function substr($str, $offset, $length = FALSE)

 /**
  * UTF-8 aware alternative to strtolower
  * Make a string lowercase
  * Note: The concept of a character's "case" only exists in some alphabets
  * such as Latin, Greek, Cyrillic, Armenian and archaic Georgian - it does
  * not exist in the Chinese alphabet, for example. See Unicode Standard
  * Annex #21: Case Mappings
  *
  * @access public
  * @param string
  * @return mixed either string in lowercase or FALSE if UTF-8 invalid
  * @see http://www.php.net/strtolower
  */
 function strtolower($str)

 /**
  * UTF-8 aware alternative to strtoupper
  * Make a string uppercase
  * Note: The concept of a character's "case" only exists in some alphabets
  * such as Latin, Greek, Cyrillic, Armenian and archaic Georgian - it does
  * not exist in the Chinese alphabet, for example. See Unicode Standard
  * Annex #21: Case Mappings
  *
  * @access public
  * @param string
  * @return mixed either string in uppercase or FALSE if UTF-8 invalid
  * @see http://www.php.net/strtoupper
  */
 function strtoupper($str)

 /**
  * UTF-8 aware alternative to strlen
  * Returns the number of characters in the string (NOT THE NUMBER OF BYTES),
  *
  * @access public
  * @param string UTF-8 string
  * @return int number of UTF-8 characters in string
  * @see http://www.php.net/strlen
  */
 function strlen($str)

 /**
  * UTF-8 aware alternative to str_ireplace
  * Case-insensitive version of str_replace
  *
  * @static
  * @access public
  * @param string string to search
  * @param string existing string to replace
  * @param string new string to replace with
  * @param int optional count value to be passed by reference
  * @see http://www.php.net/str_ireplace
 */
 function str_ireplace($search, $replace, $str, $count = NULL)

 /**
  * UTF-8 aware alternative to str_split
  * Convert a string to an array
  *
  * @static
  * @access public
  * @param string UTF-8 encoded
  * @param int number of characters to split string by
  * @return array
  * @see http://www.php.net/str_split
 */
 function str_split($str, $split_len = 1)

 /**
  * UTF-8 aware alternative to strcasecmp
  * A case-insensitive string comparison
  *
  * @static
  * @access public
  * @param string string 1 to compare
  * @param string string 2 to compare
  * @return int < 0 if str1 is less than str2; > 0 if str1 is
  *         greater than str2, and 0 if they are equal.
  * @see http://www.php.net/strcasecmp
 */
 function strcasecmp($str1, $str2)

 /**
  * UTF-8 aware alternative to strcspn
  * Find length of initial segment not matching mask
  *
  * @static
  * @access public
  * @param string
  * @param string the mask
  * @param int Optional starting character position (in characters)
  * @param int Optional length
  * @return int the length of the initial segment of str1 which does not
  *         contain any of the characters in str2
  * @see http://www.php.net/strcspn
 */
 function strcspn($str, $mask, $start = NULL, $length = NULL)

 /**
  * UTF-8 aware alternative to stristr
  * Returns all of haystack from the first occurrence of needle to the end.
  * needle and haystack are examined in a case-insensitive manner
  * Find first occurrence of a string using case insensitive comparison
  *
  * @static
  * @access public
  * @param string the haystack
  * @param string the needle
  * @return string the sub string
  * @see http://www.php.net/stristr
 */
 function stristr($str, $search)

 /**
  * UTF-8 aware alternative to strrev
  * Reverse a string
  *
  * @static
  * @access public
  * @param string String to be reversed
  * @return string The string in reverse character order
  * @see http://www.php.net/strrev
 */
 function strrev($str)

 /**
  * UTF-8 aware alternative to strspn
  * Find length of initial segment matching mask
  *
  * @static
  * @access public
  * @param string the haystack
  * @param string the mask
  * @param int start optional
  * @param int length optional
  * @see http://www.php.net/strspn
 */
 function strspn($str, $mask, $start = NULL, $length = NULL)

 /**
  * UTF-8 aware substr_replace
  * Replace text within a portion of a string
  *
  * @static
  * @access public
  * @param string the haystack
  * @param string the replacement string
  * @param int start
  * @param int length (optional)
  * @see http://www.php.net/substr_replace
 */
 function substr_replace($str, $repl, $start, $length = NULL )

 /**
  * UTF-8 aware replacement for ltrim()
  * Strip whitespace (or other characters) from the beginning of a string
  * Note: you only need to use this if you are supplying the charlist
  * optional arg and it contains UTF-8 characters. Otherwise ltrim will
  * work normally on a UTF-8 string
  *
  * @static
  * @access public
  * @param string the string to be trimmed
  * @param string the optional charlist of additional characters to trim
  * @return string the trimmed string
  * @see http://www.php.net/ltrim
 */
 function ltrim( $str, $charlist = FALSE )

 /**
  * UTF-8 aware replacement for rtrim()
  * Strip whitespace (or other characters) from the end of a string
  * Note: you only need to use this if you are supplying the charlist
  * optional arg and it contains UTF-8 characters. Otherwise rtrim will
  * work normally on a UTF-8 string
  *
  * @static
  * @access public
  * @param string the string to be trimmed
  * @param string the optional charlist of additional characters to trim
  * @return string the trimmed string
  * @see http://www.php.net/rtrim
 */
 function rtrim( $str, $charlist = FALSE )

 /**
  * UTF-8 aware replacement for trim()
  * Strip whitespace (or other characters) from the beginning and end of a string
  * Note: you only need to use this if you are supplying the charlist
  * optional arg and it contains UTF-8 characters. Otherwise trim will
  * work normally on a UTF-8 string
  *
  * @static
  * @access public
  * @param string the string to be trimmed
  * @param string the optional charlist of additional characters to trim
  * @return string the trimmed string
  * @see http://www.php.net/trim
 */
 function trim( $str, $charlist = FALSE )

 /**
  * UTF-8 aware alternative to ucfirst
  * Make a string's first character uppercase
  *
  * @static
  * @access public
  * @param string
  * @return string with first character as upper case (if applicable)
  * @see http://www.php.net/ucfirst
 */
 function ucfirst($str)

 /**
  * UTF-8 aware alternative to ucwords
  * Uppercase the first character of each word in a string
  *
  * @static
  * @access public
  * @param string
  * @return string with first char of each word uppercase
  * @see http://www.php.net/ucwords
 */
 function ucwords($str)
}
