===== PSCWS23 - Documentation =====
$Id: readme.txt,v 1.3 2008/12/21 04:37:59 hightman Exp $

[ About PSCWS23 ]

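PSCWS23 is the collective name for the second and third versions of a simple
Chinese word segmentation system written in pure PHP, developed by hightman in 2006.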

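PSCWS is the acronym of PHP Simple Chinese Words Segmentation. It is the
predecessor of the SCWS project. SCWS now continues as a sub-project of the
FTPHP project, and this release was revised in 2008-12.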

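SCWS is an open-source, free Chinese word segmentation system that provides an
easy-to-use PHP interface.
Project home page: http://www.ftphp.com/scws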

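The second and third versions of PSCWS have identical calling interfaces and
share the same dictionaries; only the internal segmentation algorithm differs.
The second version uses maximum matching repeated N times (N defaults to 2).
The third version uses bidirectional matching, compares the two results, and
picks the better one based on word frequency.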

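In terms of speed, the second version is slightly faster, though the difference
is small; each version also has its own accuracy characteristics.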
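
As a rough illustration of the maximum-matching idea mentioned above, here is a
minimal sketch of a single-pass forward maximum matching segmenter. It is not
the PSCWS2 implementation. The function name fmm_segment, the forward direction,
the tiny in-memory dictionary and the UTF-8 input are all assumptions made for
this example (PSCWS23 itself reads a dictionary file and works on GBK text), and
the repeated N-pass refinement is not shown.

```php
<?php
// Toy forward maximum matching (hypothetical helper, not part of PSCWS23).
// At each position, take the longest dictionary word that starts there;
// fall back to a single character when nothing matches.
function fmm_segment($text, $dict, $maxLen)
{
    // Split a UTF-8 string into characters (assumes valid UTF-8 input).
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $count = count($chars);
    $words = array();
    $i = 0;
    while ($i < $count) {
        $matched = $chars[$i];   // fallback: one character
        $step = 1;
        // Try the longest candidate first, then shorter ones.
        for ($len = min($maxLen, $count - $i); $len > 1; $len--) {
            $candidate = implode('', array_slice($chars, $i, $len));
            if (isset($dict[$candidate])) {
                $matched = $candidate;
                $step = $len;
                break;
            }
        }
        $words[] = $matched;
        $i += $step;
    }
    return $words;
}

// Assumed sample dictionary: words are stored as array keys for fast lookup.
$dict = array('中文' => 1, '分词' => 1, '系统' => 1, '中文分词' => 1);
$words = fmm_segment('中文分词系统', $dict, 4);
print_r($words);
```

Because the longest candidate is tried first, the sample text is split into
"中文分词" and "系统" rather than three two-character words. A greedy single pass
like this can pick the wrong boundary on ambiguous text, which is one reason a
real segmenter adds further passes or frequency-based comparison.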


[ Performance ]

Segmentation was tested with demo.php on FreeBSD 6.2 with dual Xeon 3.0 GHz CPUs.

PSCWS2 - text of length 80,535: 4.9 seconds, 44688 words
         precision 93.67%, recall 88.54% (F-1: 0.91)

PSCWS3 - text of length 80,535: 6.8 seconds, 48181 words
         precision 92.99%, recall 87.91% (F-1: 0.90)

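For text of the same length, scws-1.0 (as a PHP extension) takes 0.65 seconds
(the C program takes 0.17 seconds).
    precision 95.60%, recall 90.51% (F-1: 0.93); switching to scws-1.0 (C version)
    is strongly recommended.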

Note: results on machines with current CPUs are about the same.
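
The F-1 figures above are consistent with the usual harmonic mean of precision
and recall, F1 = 2PR / (P + R). The short check below recomputes them from the
reported percentages. The helper name f1_score is introduced here for
illustration only and is not part of PSCWS23.

```php
<?php
// F-1 is the harmonic mean of precision and recall (both given as fractions).
function f1_score($precision, $recall)
{
    if ($precision + $recall == 0) {
        return 0.0; // avoid division by zero
    }
    return 2.0 * $precision * $recall / ($precision + $recall);
}

// Precision and recall as reported in the benchmark figures.
$results = array(
    'PSCWS2'   => array(0.9367, 0.8854),
    'PSCWS3'   => array(0.9299, 0.8791),
    'scws-1.0' => array(0.9560, 0.9051),
);

foreach ($results as $name => $pr) {
    printf("%-8s F-1: %.2f\n", $name, f1_score($pr[0], $pr[1]));
}
```

Rounded to two decimals, this gives 0.91, 0.90 and 0.93, matching the values
listed for PSCWS2, PSCWS3 and scws-1.0.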

[ File structure ]

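  File                                                   Required?
  --------------------------------------------------------------
  dict/dict.xdb        - XDB-format dictionary            (required)
  pscws2.class.php     - PSCWS second version class       (required)
  pscws3.class.php     - PSCWS third version class        (required)
  dict.class.php       - dictionary access class          (required)
  xdb_r.class.php      - XDB-format reader class          (required)

  demo.php             - demo file, web/command line      (optional)
  readme.txt           - this documentation file          (optional)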

[ Usage ]

PSCWS2 and PSCWS3 are defined in pscws2.class.php and pscws3.class.php, which
are the second and third versions respectively. They are used in PHP code as follows:

// Load the class file; for version 3, use pscws3.class.php instead
require '/path/to/pscws2.class.php';

// Create the segmenter object; the argument is the dictionary path
$pscws = new PSCWS2('/path/to/dict/dict.xdb');

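//
// Next, set any segmentation options you need
// e.g. set_dict, set_ignore_mark, set_autodis, set_debug ... and so on
//

// Call segment to split the text into words. The second argument of segment
// is a callback: the system passes each array of segmented words to that
// callback for processing. If it is omitted, the array of words is returned.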

$res = $pscws->segment($string);
print_r($res);

// With a callback (note that the callback may be invoked multiple times):

function seg_cb($res) { print_r($res); }
$pscws->segment($string, 'seg_cb');

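--- Class method reference ---
(Note: the constructor accepts the dictionary path as its argument, which has the same effect as set_dict)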

class PSCWS2 { | class PSCWS3 {
  
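  void set_dict(string dict_fpath);
  Description: sets the dictionary file used for segmentation.
  Parameters: dict_fpath is the dictionary path; the matching handler is chosen
              internally from the file extension of the path.
  Returns: none.
  Errors: on failure, a WARNING-level error message is raised.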

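  void set_ignore_mark(bool set);
  Description: sets whether punctuation marks are ignored in the segmentation result.
  Parameters: set is a boolean, true or false, meaning ignore or do not ignore.
  Returns: none.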

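  void set_autodis(bool set);
  Description: sets whether the segmentation algorithm enables automatic recognition.
  Parameters: set is a boolean, true or false, meaning recognize or do not recognize.
  Returns: none.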

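  void set_debug(bool set);
  Description: sets whether debugging information about the segmentation process is output.
  Parameters: set is a boolean, true or false, meaning output or do not output.
  Returns: none.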

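  void set_statistics(bool set);
  Description: sets whether the occurrence count and positions of each word are recorded.
  Parameters: set is a boolean, true or false, meaning record or do not record.
  Returns: none.
  Note: after segment() has run, call get_statistics() to obtain the statistics.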

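  Array &get_statistics(void);
  Description: returns the occurrence count and position information of each word
               in the result of the last segment() call (returned by reference).
  Parameters: none.
  Returns: an array keyed by word; each entry consists of the occurrence count
           (times) and a list of positions (poses).
  Note: call this after segment(). The statistics are reset automatically
        before each segment() call.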

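  mixed &segment(string text [, string cb]);
  Description: performs word segmentation on the string text.
  Parameters: text is the string to segment.
              cb is the name of the callback that handles the result; its single
              argument is the array of segmented words.
  Returns: when cb is not given, the array of segmented words (returned by reference);
           when a callback is used, the results are passed to it and true is returned.
  Note: cb may be called multiple times during a single segment() call.
        When cb is not given, segment() segments all of text and returns the
        result in one go. When text is very long, splitting it at obvious line
        breaks and calling segment() several times is recommended for better
        efficiency.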
};
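
The advice about long text can be sketched as follows: split the input at line
breaks and call segment() once per non-empty line, merging the results. The
helper segment_by_lines and the stand-in class WhitespaceSegmenterStub are
hypothetical and exist only so the example runs without a dictionary file. With
PSCWS23 you would pass a PSCWS2 or PSCWS3 object instead, assuming segment()
returns an array of words as described above.

```php
<?php
// Hypothetical helper: segment a long text line by line.
// $segmenter is any object exposing segment($text) that returns an array of words.
function segment_by_lines($segmenter, $text)
{
    $words = array();
    // Split on Windows, old Mac and Unix line endings.
    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        if (trim($line) === '') {
            continue; // skip blank lines
        }
        foreach ($segmenter->segment($line) as $word) {
            $words[] = $word;
        }
    }
    return $words;
}

// Stand-in segmenter (an assumption for this sketch): splits on whitespace.
class WhitespaceSegmenterStub
{
    function segment($text)
    {
        return preg_split('/\s+/', trim($text));
    }
}

$all = segment_by_lines(new WhitespaceSegmenterStub(), "first line here\n\nsecond line");
print_r($all);
```

Keeping each call short bounds the size of the array built per call. The
callback form of segment() is an alternative when the words can be processed as
they arrive.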

[ About dictionaries ]

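PSCWS23 supports the following dictionary formats: XDB, SQLite, CDB/GDBM and
plain text (Txt). The format is detected automatically from the file extension
of the dictionary (the extension must be lowercase), e.g. dict.xdb, dict.sqlite ...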
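
Extension-based detection of this kind can be sketched as a simple lookup. The
function guess_dict_format below is hypothetical and does not reproduce the
code in dict.class.php. Only the .xdb and .sqlite extensions appear in this
document; the .cdb, .gdbm and .txt entries are assumptions.

```php
<?php
// Hypothetical sketch: map a dictionary path to a format name by its extension.
// The comparison is case-sensitive, so an uppercase extension is not recognized.
function guess_dict_format($path)
{
    $ext = pathinfo($path, PATHINFO_EXTENSION);
    $known = array(
        'xdb'    => 'XDB',     // named in this document
        'sqlite' => 'SQLite',  // named in this document
        'cdb'    => 'CDB',     // assumed extension
        'gdbm'   => 'GDBM',    // assumed extension
        'txt'    => 'Txt',     // assumed extension
    );
    return isset($known[$ext]) ? $known[$ext] : null;
}

var_dump(guess_dict_format('/path/to/dict/dict.xdb'));
var_dump(guess_dict_format('dict.XDB'));
```

The second call returns null because the extension is not lowercase, which
mirrors the lowercase requirement stated above.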

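The recommended and default format is currently XDB, which was designed
specifically for SCWS. It is an XTreeDB implemented in pure PHP and is very
efficient, slightly faster even than CDB.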

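The other formats are generally not recommended. CDB/GDBM require PHP's dba
extension and the related libraries; the configure options are
--enable-dba --with-cdb --with-gdbm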

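The default dictionary provided is a general-purpose vocabulary collection of
about 260,000 words. If you need a custom dictionary, please get in touch; a fee
may apply.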

[ Notes ]

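PSCWS23 is implemented in pure PHP. Some dictionary formats require the
corresponding PHP extensions, but the default recommended format has been
changed to XDB, which, unlike the original CDB, needs no external extension.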

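PSCWS23 runs on all versions of PHP4 and PHP5, but supports only the GBK
character set. It is not suitable for UTF-8; for that, see scws-1.0.0 on the
project home page, which supports both GBK and UTF-8 and also supports
part-of-speech tagging and keyword extraction. Note: BIG5 text can be handled
as GBK.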

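The dictionaries provided for download were built on Intel-architecture
platforms. On machines of other architectures they may be read or written
completely incorrectly (a typical case is Solaris/SunOS on Sparc). If you run
into this, please contact us promptly for a solution.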

[ Contact ]

SCWS project site: http://www.ftphp.com/scws
My personal email: hightman2@yahoo.com.cn   Thank you.


--

2008.12.20 - hightman
