Ruby/CHISE
■What's Ruby/CHISE?
Ruby/CHISE is an implementation of the Chaon model.
What's the Chaon model?
The Chaon model treats a character as a bundle of features rather than as a code point.
Ruby/CHISE extends the model and treats each character as an object.
■download
CHISE Character database
You need the CHISE Character database to use Ruby/CHISE. It is distributed with XEmacs CHISE. For your convenience, I also provide CHISE Character database tarballs.
- 2003-11-04 : char-db.tar.gz for Windows
Built from XEmacs CHISE on GNU/Linux with "make-chisedb-tarball.rb".
- 2003-11-04 : char-db-linux.tar.bz2 for Linux
A tarball of "/usr/local/lib/xemacs-21.4.12/i686-pc-linux/char-db".
CVS access
You can access the CVS tree.
license
This software is released under the GPL. See COPYING.
■install
Move the directory "chise" to a directory on Ruby's load path, usually under /usr/local/lib/ruby/site_ruby/.
e.g.
# mv chise /usr/local/lib/ruby/site_ruby/1.8
To build the C extension, cd to "ext" and run:
% ruby extconf.rb
% make
# make install
Ruby/CHISE is fully functional without the C extension; with it, Ruby/CHISE runs faster.
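An optional C extension with a pure-Ruby fallback is usually wired up with Ruby's require/rescue LoadError idiom. The sketch below assumes that pattern; it is not Ruby/CHISE's actual loader, and "chise_ext_hypothetical" is a made-up feature name.

```ruby
# Sketch of the optional-extension idiom (an assumption, not Ruby/CHISE's
# actual loader). "chise_ext_hypothetical" is a made-up feature name.
begin
  require "chise_ext_hypothetical" # fast path: compiled C extension
  CHISE_BACKEND = :c_extension
rescue LoadError
  CHISE_BACKEND = :pure_ruby       # fallback: pure-Ruby code path
end

# Human-readable name of the backend selected at load time.
def chise_backend_name
  CHISE_BACKEND == :c_extension ? "C extension" : "pure Ruby"
end

puts "Using #{chise_backend_name}"
```

Because the fallback is chosen when the file is loaded, the rest of the code only has to check one constant.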
config
In chise/config.rb and ext/config.h, change this line to the location of your CHISE Character database:
DB_DIR = '/usr/local/lib/xemacs-21.4.10/i686-pc-linux/char-db'
In chise/config.rb, also change this line:
IDS_DB_DIR = '/home/eto/work/chise/ids/'
Store the IDS Text database files there (see "preparation to use IDS" below).
dependency
You need the following:
- db3-3.2.9 or higher.
- bdb-0.5.0 or higher.
- ruby 1.8.1 or higher.
You can find the Ruby packages on RAA (Ruby Application Archive).
Unicode
Currently, you should use UTF-8 as the charset when using Ruby/CHISE.
On Windows, you can use the editor "Meadow + Mule-UCS" to handle Unicode.
Some other tools can also handle UTF-8:
- Sakura editor
- Notepad
- IE (for viewing)
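If your text is in a legacy Japanese encoding such as Shift_JIS, it has to be converted to UTF-8 before Ruby/CHISE sees it. A minimal sketch, assuming Ruby 1.9 or later (String#encode); on Ruby 1.8 you would need iconv or NKF instead. The helper names are hypothetical.

```ruby
# Convert text from a legacy encoding (e.g. "Shift_JIS") to UTF-8.
# String#encode(dst, src) interprets the bytes as src and transcodes to dst.
def to_utf8(text, from_encoding)
  text.encode(Encoding::UTF_8, from_encoding)
end

# True when the string is tagged UTF-8 and its bytes are valid UTF-8.
def utf8_ok?(text)
  text.encoding == Encoding::UTF_8 && text.valid_encoding?
end

sjis = "字".encode("Shift_JIS")
p utf8_ok?(to_utf8(sjis, "Shift_JIS"))
```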
■How to
Basic
require "chise"
str = "字"              # extends String; use UTF-8 as the charset.
p str.ucs               # show the code point in UCS.
p str.total_strokes     # show the total strokes.
p str.gb2312            # etc.
str.each_feature {|f, v|    # show all features.
  print f, ": ", v, "\n"
}
■Ideographics Structure
I designed Ruby/CHISE to handle the ideographic structure of Kanji characters.
Ruby/CHISE uses IDS (Ideographic Description Sequence), defined in the Unicode Standard, to describe ideographic structure. An IDS starts with an IDC (Ideographic Description Character, U+2FF0 to U+2FFB) that specifies how the parts are arranged. The following two or three components (three for U+2FF2 and U+2FF3, two for the others) are combined in that arrangement, and each component may itself be an IDS.
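The prefix structure of an IDS makes it easy to parse recursively: read an IDC, then read as many components as it takes. The sketch below is not Ruby/CHISE code; parse_ids, parse_ids_string and IDC_ARITY are hypothetical names, and it assumes Ruby 1.9 or later.

```ruby
# IDC => number of components. U+2FF2 and U+2FF3 take three; the rest take two.
IDC_ARITY = Hash[(0x2FF0..0x2FFB).map { |cp| [cp.chr(Encoding::UTF_8), 2] }]
IDC_ARITY["\u2FF2"] = 3
IDC_ARITY["\u2FF3"] = 3

# Consumes one (possibly nested) IDS from the front of +chars+ and returns a
# tree: a plain component is a String, a composed part is [idc, *children].
def parse_ids(chars)
  head = chars.shift
  raise ArgumentError, "truncated IDS" if head.nil?
  arity = IDC_ARITY[head]
  return head if arity.nil?                      # ordinary character: a leaf
  [head] + Array.new(arity) { parse_ids(chars) } # children parsed in order
end

# Parses a whole string and rejects leftover characters.
def parse_ids_string(str)
  chars = str.chars.to_a
  tree = parse_ids(chars)
  raise ArgumentError, "trailing characters after IDS" unless chars.empty?
  tree
end

p parse_ids_string("⿱⿰木木木")
```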
preparation to use IDS
Get the IDS Text database as follows:
% cd ~/work/chise            (change this to suit your environment)
% cvs -d :pserver:anonymous@cvs.m17n.org:/cvs/root login
password:                    (just hit return)
% cvs -d :pserver:anonymous@cvs.m17n.org:/cvs/chise co -d ids ids
Then, change this line in chise/config.rb:
IDS_DB_DIR = '/home/eto/work/chise/ids/'
Set it to the directory where you checked out the IDS Text database.
After setting IDS_DB_DIR, run:
./tools/idsdbdumpall.rb (this takes a long time.)
Then you will have features such as "ids".
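Loading the IDS Text database amounts to turning each text line into a character-to-IDS mapping. The sketch below ASSUMES each data line is tab-separated as code, character, IDS (e.g. "U+6797", "林", "⿰木木") and that lines starting with ";" are comments; check the actual files before relying on this. parse_ids_line and load_ids_features are hypothetical names, not the code of idsdbdumpall.rb.

```ruby
# ASSUMED line format: "<code>\t<character>\t<IDS>", ";" starts a comment.
def parse_ids_line(line)
  line = line.chomp
  return nil if line.empty? || line.start_with?(";")
  code, char, ids = line.split("\t", 3)
  return nil if ids.nil? || ids.empty?
  { :code => code, :char => char, :ids => ids }
end

# Builds a character => IDS hash from an enumerable of lines (or an open file).
def load_ids_features(lines)
  features = {}
  lines.each do |line|
    entry = parse_ids_line(line)
    features[entry[:char]] = entry[:ids] if entry
  end
  features
end

p load_ids_features(["U+6797\t林\t⿰木木\n"])
```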
Decompose
There are two methods, String#decompose and String#decompose_all. String#decompose decomposes one level; String#decompose_all decomposes recursively.
p "字".decompose
p "字".decompose_all
p "榊".decompose
p "榊".decompose_all
p "終了".decompose
p "終了".decompose_all
p "鬱".decompose
p "鬱".decompose_all
The result is an IDS. Many environments cannot display IDCs correctly; you can view them with IE.
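To illustrate the difference between one-level and recursive decomposition, here is a minimal sketch over a small hand-written table. SAMPLE_IDS, decompose_one and decompose_all_sketch are hypothetical names, the table is illustrative data rather than the IDS Text database, and the real String#decompose may work differently.

```ruby
# Illustrative character => IDS table (hand-written, not from the database).
SAMPLE_IDS = {
  "榊" => "⿰木神",
  "神" => "⿰示申",
}

# One level: replace every character that has an entry by its IDS.
def decompose_one(str, table = SAMPLE_IDS)
  str.chars.to_a.map { |c| table.fetch(c, c) }.join
end

# Recursive: repeat until nothing changes. Assumes the table has no cycles
# (a self-mapping such as "一" => "一" is harmless because nothing changes).
def decompose_all_sketch(str, table = SAMPLE_IDS)
  result = decompose_one(str, table)
  result == str ? result : decompose_all_sketch(result, table)
end

p decompose_one("榊")
p decompose_all_sketch("榊")
```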
Compose
Use String#compose.
p "⿰木木".compose # 林
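Composition can be viewed as the reverse of decomposition: look up the character whose IDS matches exactly. This sketch assumes that model with a hypothetical hand-written table; compose_sketch is not the real String#compose.

```ruby
# Illustrative character => IDS table (hand-written, not from the database).
SAMPLE_IDS = {
  "林" => "⿰木木",
  "森" => "⿱木林",
}

# Reverse index: IDS => character.
SAMPLE_COMPOSE = SAMPLE_IDS.invert

# Returns the composed character, or nil when no character has this exact IDS.
def compose_sketch(ids, reverse = SAMPLE_COMPOSE)
  reverse[ids]
end

p compose_sketch("⿰木木")
```

Hash#invert keeps only one character per IDS; a real database in which several characters share an IDS would need a list per key.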
You can search for characters by their components with String#find.
p "日雲".find
The output is a string of the characters that contain both "日" and "雲".
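One way to implement such a search is to collect every component of a character at all decomposition levels and keep the characters whose components include every query character. This sketch assumes that approach; SAMPLE_IDS, components and find_sketch are hypothetical, and the table is illustrative data.

```ruby
IDC_RANGE = 0x2FF0..0x2FFB

# Illustrative character => IDS table (hand-written, not from the database).
SAMPLE_IDS = {
  "曇" => "⿱日雲",
  "雲" => "⿱雨云",
  "林" => "⿰木木",
}

# All components of +char+ at every level (IDCs dropped). Assumes no cycles.
def components(char, table)
  ids = table[char]
  return [] if ids.nil? || ids == char
  parts = ids.chars.to_a.reject { |c| IDC_RANGE.cover?(c.ord) }
  parts + parts.flat_map { |part| components(part, table) }
end

# String of characters whose components include every character of +query+.
def find_sketch(query, table = SAMPLE_IDS)
  wanted = query.chars.to_a
  table.keys.select { |char| (wanted - components(char, table)).empty? }.join
end

p find_sketch("日雲")
```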
■usage
Please see the test cases for usage.
class String
  char : converts the string to a Character.
class Character
  [] : gets a feature. Returns nil if there is no such feature.
  []= : sets a feature.
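The Character interface above reflects the Chaon idea of a character as a bundle of features. A minimal sketch of such an object, assuming features are kept in a hash keyed by name; SketchCharacter is a hypothetical stand-in, not Ruby/CHISE's Character class, which reads its features from the database.

```ruby
# Hypothetical stand-in for a Character: a bundle of named features.
class SketchCharacter
  def initialize(features = {})
    @features = features.dup
  end

  # Returns the feature value, or nil if the feature is absent.
  def [](name)
    @features[name]
  end

  # Sets (or overwrites) a feature.
  def []=(name, value)
    @features[name] = value
  end

  # Yields each feature name and value.
  def each_feature(&block)
    @features.each(&block)
  end
end

ch = SketchCharacter.new("ucs" => 0x5B57)
ch["total_strokes"] = 6
ch.each_feature { |f, v| print f, ": ", v, "\n" }
```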
tools
- define-kanji-meaning.rb
- Adds new features describing the meaning of Kanji characters in terms of their IDS.
- dump-database.rb
- This tool dumps all data in the database to text files so that you can inspect its contents.
- make-chisedb-tarball.rb
- Usage: make-chisedb-tarball.rb <directory of XEmacs CHISE> <tmp dir>
Example: % ./make-chisedb-tarball.rb /usr/local/lib/xemacs-21.4.14/i686-pc-linux
This makes a tarball (chise-db.tar.gz) of the CHISE Character database files. Pass the directory that contains the "chise-db" directory as the first argument, and optionally a temporary directory as the second (default: /var/tmp). The tool also renames files whose names contain characters forbidden on Windows. Please use eo to extract the tarball on Windows.
- make-ids-database.rb
- This tool reads the whole IDS Text database and stores the entries as features. It takes a long time.
- move-obsolete-files.rb
- Some features are obsolete. This tool moves the obsolete files to another directory.
- rename-files.rb
- Currently, the directory tree of XEmacs CHISE differs from the one libchise requires. This tool renames the old file tree to the new one.
- trim_bom.rb
- This tool removes a BOM (Byte Order Mark) from the head of a file.
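Removing a UTF-8 BOM means dropping the three bytes EF BB BF when a file starts with them. A minimal sketch of that operation in binary mode; trim_bom and trim_bom_file! are hypothetical names, not the code of trim_bom.rb, and the sketch assumes Ruby 1.9 or later.

```ruby
UTF8_BOM = "\xEF\xBB\xBF".b # the BOM as raw bytes

# Returns +bytes+ (as binary) without a leading UTF-8 BOM.
def trim_bom(bytes)
  data = bytes.b
  data.start_with?(UTF8_BOM) ? data.byteslice(UTF8_BOM.bytesize..-1) : data
end

# Rewrites the file in place only if it started with a BOM.
def trim_bom_file!(path)
  data = File.binread(path)
  trimmed = trim_bom(data)
  File.binwrite(path, trimmed) unless trimmed.bytesize == data.bytesize
  trimmed
end

p trim_bom("\uFEFFabc")
```

Working on bytes avoids any dependence on the file's declared encoding.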
■description of each library
- ext/*
C extension that uses libchise.
- network.rb, makegraph.rb, graphviz.rb, defkanji.rb, kanjilist.rb
Calculate the network of Kanji characters and draw it as a graph with Graphviz.
ruby makegraph.rb
You need Graphviz. The output is "min.svg".
ruby defkanji.rb
Defines the meaning of Kanji characters in terms of their ideographic structure.
- stroke.rb, kage.rb, kageserver.rb, csf.rb
Libraries for using stroke fonts. Two systems are supported: KAGE by Koichi Kamichi and CSF by Saka Naozumi. You need the font files.
Change the line CSF_FONT_DIR = 'd:/work/chise/csf/' in csf.rb. You need sgl (my own graphics library) to use this; sgl is not published yet.
ruby stroke.rb
Shows a code table in which you can view the characters.
- www.fonts.jp
- KAGE model
- KST32ZX, Zhuanwen-Like Kanji StrokeFont(KST)
- KST32A, Very compact JIS X 0208 Kanji (with no-Kanji) Stroke Table.
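A Kanji network like the one drawn by makegraph.rb can be expressed as a Graphviz DOT file with one edge from each character to each of its components. The sketch below assumes that edge model; ids_to_dot is a hypothetical helper, not code from network.rb or graphviz.rb, and the input is an illustrative character-to-IDS hash.

```ruby
IDC_RANGE = 0x2FF0..0x2FFB

# Builds a DOT digraph: an edge from each character to each direct component
# (IDCs skipped, duplicate edges and self-loops removed).
def ids_to_dot(table)
  lines = ["digraph kanji {"]
  table.each do |char, ids|
    parts = ids.chars.to_a.reject { |c| IDC_RANGE.cover?(c.ord) }.uniq
    parts.each do |part|
      next if part == char
      lines << %(  "#{char}" -> "#{part}";)
    end
  end
  lines << "}"
  lines.join("\n") + "\n"
end

puts ids_to_dot("林" => "⿰木木", "森" => "⿱木林")
```

Saved as kanji.dot, the output can be rendered with Graphviz, e.g. `dot -Tsvg kanji.dot -o kanji.svg`.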
■discussion
compatibility with Ruby/M17N
How should Ruby/CHISE be made compatible with Ruby/M17N?
■links
■history
- 2003-01-10 : Test
- 2003-01-12 : add XString
- 2003-01-15 : add reading of IDS Text DB
- 2003-01-16 : IDS Text DB 1.0
- 2003-01-17 : remove XString, move its methods to String. read more of the IDS Text DB.
- 2003-01-20 : add IDS_Tree. check integrity of the IDS Tree structure.
- 2003-01-30 : add reverse translation of IDS.
- 2003-02-13 : ruby-chise-20030213.tar.bz2
change the name from Ruby/UTF-2000 to Ruby/CHISE.
- 2003-03-12 : ruby-chise-20030312.tar.bz2
add some libraries.
- 2003-10-04 : ruby-chise-20031004.tar.bz2
change feature names. add copyright notice.
- 2003-10-31 : presentation at LC2003.
- 2003-11-10 : ruby-chise-20031110.tar.bz2
change directory structure to "chise/*.rb". add installer.
- 2004-07-08 : ruby-chise-0.2.targz
add libchise extension. make many changes.