GD输出汉字的函数的分析

把GB码转变成UTF8,php中TTF扶植UTF8编码的非ASCII字符输出.在言之有序这段代码之后,发掘能够兑现粤语与ASCII混合输出图象,那样在大家操作图象函数时能够更方便.
代码如下: ? function gb2utf8($gb卡塔尔国 { if(!trim($gb)) return $gb;
$filename=gb2312.txt; $tmp=file($filename); $codetable=array();
while(list($key,$value)=each($tmp))
$codetable[hexdec(substr($value,0,6))]=substr($value,7,6); $utf8=;
while($gb) { if (ord(substr($gb,0,1))127) { $this=substr($gb,0,2);
$gb=substr($gb,2,strlen($gb)-2);
$utf8.=u2utf8(hexdec($codetable[hexdec(bin2hex($this))-0x8080])); }
else { $this=substr($gb,0,1); $gb=substr($gb,1,strlen($gb)-1);
$utf8.=u2utf8($this); } } /*$ret=; for($i=0;$istrlen($utf8);$i+=3)
$ret.=chr(substr($utf8,$i,3)); return $ret;*/ return $utf8; } function
u2utf8($c) { /*for($i=0;$icount($c);$i++)*/ $str=; if ($c 0x80) {
$str.=$c; } else if ($c 0x800) { $str.=chr(0xC0 | $c6); $str.=chr(0x80 |
$c & 0x3F); } else if ($c 0x10000) { $str.=chr(0xE0 | $c12);
$str.=chr(0x80 | $c6 & 0x3F); $str.=chr(0x80 | $c & 0x3F); } else if ($c
0x200000) { $str.=chr(0xF0 | $c18); $str.=chr(0x80 | $c12 & 0x3F);
$str.=chr(0x80 | $c6 & 0x3F); $str.=chr(0x80 | $c & 0x3F); } return
$str; } ? ——————————————– ? Header
(Content-type: image/jpeg); $im = imagecreate (800, 400); $black =
ImageColorAllocate ($im, 0, 0, 0); $white = ImageColorAllocate ($im,
255, 255, 255); include(gb2utf8.php); $str=gb2utf8(aaa中过32434);
ImageTTFText ($im, 90, 10, 110, 300, $white,
/usr/share/fonts/default/TrueType/simsun.ttc, $str); ImageJPEG ($im);
ImageDestroy ($im); ?

很早早前找到三个把GB码转变为UTF-8的函数,合作二个GB到UNICODE的相比较表,用于在GD中输出汉字。后来察觉在欲输出的内容中带有西方文字字符时,会产出混乱。后来找到了改造后的代码,化解了难点。现将四个函数做一相比深入分析如下。
首先,那是多少个UNICODE到UTF-8编码调换的函数,这一有的改良前后没有生成:
function u2utf8($cState of Qatar { for($i=0;$icount($cState of Qatar;$i++State of Qatar $str=””; if ($c 0x80卡塔尔 {
$str.=$c; } else if ($c 0x800State of Qatar { $str.=(0xC0 | $c6State of Qatar; $str.=(0x80 | $c
0x3F卡塔尔(قطر‎; } else if ($c 0x10000State of Qatar { $str.=(0xE0 | $c12卡塔尔(قطر‎; $str.=(0x80 | $c6
0x3F卡塔尔国; $str.=(0x80 | $c 0x3F卡塔尔(قطر‎; } else if ($c 0x二零零四00卡塔尔 { $str.=(0xF0 |
$c18卡塔尔; $str.=(0x80 | $c12 0x3FState of Qatar; $str.=(0x80 | $c6 0x3F卡塔尔国; $str.=(0x80 |
$c 0x3F卡塔尔; } return $str; }
这里完全都以依照UTF-8编码的平整,通过判别字符归属差异的UNICODE编码段范围,举办分歧的活动和位与操作,以转账为UTF-8编码。关于该法规可参谋上的辨证。
那是改进前的GB转变为UTF-8编码的函数,在那之中调用了地方的u2utf8函数。
function gb2utf8($gb卡塔尔(قطر‎/* Program writen by sadly */ { if(!trim($gb))
return $gb; $filename=”gb2312.txt”; $tmp=file($filename);
$codetable=array(); while(list($key,$value)=each($tmp))
$codetable[hexdec(substr($value,0,6))]=substr($value,7,6); $utf8=””;
while($gb) { if (ord(substr($gb,0,1))127) { $this=substr($gb,0,2);
$gb=substr($gb,2,strlen($gb));
$utf8.=u2utf8(hexdec($codetable[hexdec(bin2hex($this))-0x8080]卡塔尔国卡塔尔(قطر‎; }
else { $gb=substr($gb,1,strlen($gb卡塔尔(قطر‎卡塔尔; $utf8.=u2utf8(substr($gb,0,1卡塔尔卡塔尔(قطر‎; }
} $ret=””; for($i=0;$istrlen($utf8卡塔尔国;$i+=3卡塔尔国$ret.=chr(substr($utf8,$i,3卡塔尔State of Qatar; return $ret; }
函数中while循环部分,把汉字每个遵照“对照表”转变为UNICODE,再通过u2utf8函数转变为UTF-8。但从当中能够看看,while循环甘休后,又用二个for循环,把每三个字节合成了一个UTF-8字符,未有思索到中间的西方文字字符。所以,倘诺欲输出的内容中不管是先河时现身西文字符,或是汉字个中穿插西文字符,转变为UTF-8后,都会被遵照“每多少个字节截取”的法门截开,导致乱码。
以下是改进后的函数: function gb2utf8($gb卡塔尔/* Program writen by
sadlymodified by agun */ { if(!trim($gb)) return $gb;
$filename=”gb2312.txt”; $tmp=file($filename); $codetable=array();
while(list($key,$value)=each($tmp))
$codetable[hexdec(substr($value,0,6))]=substr($value,7,6); $ret=””;
$utf8=””; while($gb) { if (ord(substr($gb,0,1))127) {
$this=substr($gb,0,2); $gb=substr($gb,2,strlen($gb));
$utf8=u2utf8(hexdec($codetable[hexdec(bin2hex($this))-0x8080]卡塔尔State of Qatar;
for($i=0;$istrlen($utf8卡塔尔(قطر‎;$i+=3State of Qatar $ret.=chr(substr($utf8,$i,3卡塔尔卡塔尔国; } else {
$ret.=substr($gb,0,1State of Qatar; $gb=substr($gb,1,strlen($gb卡塔尔国卡塔尔(قطر‎; } } return $ret; }
修正后的函数将
GB转变为UNICODE、UNICODE转变为UTF-8、多少个字节合成叁个UTF-8字符,那多个步骤在八个周而复始里变成,非常是多少个字节合成五个UTF-8字符这一步骤,放在判定了字符归于西方文字依然归属汉字的基准分支里,据此决定截取叁个字节依然多少个字节。于是结果正确了!

You can leave a response, or trackback from your own site.

Leave a Reply

网站地图xml地图