Ruby/CHISE

What's Ruby/CHISE?

Ruby/CHISE is an implementation of the Chaon model.

What's the Chaon model?

The Chaon model treats a character as a bundle of features, not as a code point.

Ruby/CHISE extends the model and treats a character as an object.
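
As a rough illustration of the idea (not Ruby/CHISE's actual implementation), a character can be modelled as an object that carries a table of named features, with the code point being just one feature among others. The class name ToyCharacter and the feature names below are hypothetical; the [] and []= accessors mirror the Character interface listed under "usage".

```ruby
# A toy model of "a character is a bundle of features".
# ToyCharacter and the feature names are made up for illustration.
class ToyCharacter
  def initialize(features = {})
    @features = features.dup
  end

  # Get a feature; returns nil if there is no such feature.
  def [](name)
    @features[name]
  end

  # Set a feature.
  def []=(name, value)
    @features[name] = value
  end

  # Yield every feature name and value.
  def each_feature(&block)
    @features.each(&block)
  end
end

ji = ToyCharacter.new("ucs" => 0x5B57)   # U+5B57 is "字"
ji["total-strokes"] = 6                  # the code point is just one feature
ji.each_feature { |name, value| puts "#{name}: #{value}" }
```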

download

new

CHISE Character database

You need the CHISE Character database to use Ruby/CHISE. The CHISE Character database is distributed with XEmacs CHISE. For your convenience, I provide a CHISE Character database tarball.

  • 2003-11-04 : char-db.tar.gz for Windows
    Made by building XEmacs CHISE on GNU/Linux and running "make-chisedb-tarball.rb".
  • 2003-11-04 : char-db-linux.tar.bz2 for Linux
    A tarball of "/usr/local/lib/xemacs-21.4.12/i686-pc-linux/char-db".

CVS access

You can access the CVS tree.

license

This software is released under the GPL. See COPYING.

install

Move the directory "chise" to somewhere on Ruby's load path.
E.g.

# mv chise /usr/local/lib/ruby/site_ruby/1.8

Usually, it will be under /usr/local/lib/ruby/site_ruby/.

Then cd to "ext" and run the following.

% ruby extconf.rb
% make
# make install

Ruby/CHISE is fully functional without the C extension. With the extension, it runs faster.

config

In chise/config.rb and ext/config.h, change the following lines to suit your environment.

DB_DIR = '/usr/local/lib/xemacs-21.4.10/i686-pc-linux/char-db'
Set this to the directory of your CHISE Character database.

IDS_DB_DIR = '/home/eto/work/chise/ids/'
Store the IDS Text database files here (see "preparation to use IDS" below).
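
Before running Ruby/CHISE it can help to confirm that the configured directories exist. The sketch below is a hypothetical check, not part of Ruby/CHISE; the helper name missing_dirs is made up, and the two paths are simply the example values from this section.

```ruby
# Hypothetical sanity check: list the configured directories that do not exist.
def missing_dirs(dirs)
  dirs.reject { |dir| File.directory?(dir) }
end

# Example values only; use the paths from your own config.rb.
db_dir     = '/usr/local/lib/xemacs-21.4.10/i686-pc-linux/char-db'
ids_db_dir = '/home/eto/work/chise/ids/'

missing_dirs([db_dir, ids_db_dir]).each do |dir|
  warn "directory not found: #{dir}"
end
```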

dependency

You need the following.

You can find Ruby packages in the RAA.

Unicode

Currently, you should use UTF-8 as the charset when using Ruby/CHISE.

To edit Unicode text on Windows, you can use the editor "Meadow + Mule-UCS".

There are some other free editors as well.
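
For reference, a UTF-8 string can be turned into UCS code points with plain Ruby, without Ruby/CHISE. This sketch only uses String#unpack from the core library and says nothing about how Ruby/CHISE itself looks up code points.

```ruby
# Decode a UTF-8 string into UCS code points using only core Ruby.
str = "字"
code_points = str.unpack("U*")   # "U" decodes UTF-8 characters
code_points.each { |cp| printf("U+%04X\n", cp) }   # prints U+5B57
```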

How to

Basic

require "chise"

str = "字"		# Ruby/CHISE extends String. use UTF-8 as the charset.
p str.ucs		# show the code point in UCS.
p str.total_strokes	# show the total number of strokes.
p str.gb2312		# etc.
str.each_feature {|f, v|	# show all the features.
  print f, ": ", v, "\n"
}

Ideographic Structure

I designed Ruby/CHISE to make use of the ideographic structure of Kanji characters.

Ruby/CHISE uses IDS (Ideographic Description Sequence) to describe the ideographic structure. IDS is defined in the Unicode standard. A sequence starts with an IDC (Ideographic Description Character, U+2FF0 to U+2FFB) that specifies how the parts are arranged. The two or three components that follow it are composed accordingly.
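
A minimal sketch of how such a sequence can be read, assuming only the Unicode definition of the IDCs (U+2FF2 and U+2FF3 take three components, the other IDCs in U+2FF0 to U+2FFB take two). The method name parse_ids is hypothetical, and this is not the parser used inside Ruby/CHISE.

```ruby
# Parse an IDS into a nested array: [idc, component, component, ...].
# parse_ids is a hypothetical helper, not part of Ruby/CHISE.
def parse_ids(chars)
  head = chars.shift
  raise ArgumentError, "IDS ended too early" if head.nil?
  cp = head.unpack("U").first
  return head unless (0x2FF0..0x2FFB).include?(cp)   # a plain component
  arity = [0x2FF2, 0x2FF3].include?(cp) ? 3 : 2      # these two IDCs take three parts
  [head] + Array.new(arity) { parse_ids(chars) }
end

tree = parse_ids("⿱木⿰木木".chars)
p tree   # nested array mirroring the structure
```

Each IDC becomes the head of an array and its components follow it, so nested sequences turn into nested arrays.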

preparation to use IDS

Get the IDS Text database as follows.

% cd ~/work/chise (change to suit your environment)
% cvs -d :pserver:anonymous@cvs.m17n.org:/cvs/root login
password: (just hit return again)

% cvs -d :pserver:anonymous@cvs.m17n.org:/cvs/chise co -d ids ids

Then change the following line in chise/config.rb to point at that directory.

IDS_DB_DIR = '/home/eto/work/chise/ids/'

After setting IDS_DB_DIR, run the following (it takes a long time).

% ./tools/idsdbdumpall.rb

After that, characters will have features such as ids.

Decompose

There are two methods, String#decompose and String#decompose_all. String#decompose decomposes one level. String#decompose_all decomposes recursively.

p "字".decompose
p "字".decompose_all
p "榊".decompose
p "榊".decompose_all
p "終了".decompose
p "終了".decompose_all
p "鬱".decompose
p "鬱".decompose_all

The result is an IDS. Many environments cannot display IDCs correctly. You can see them with IE.
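
The difference between the one-level and the recursive method can be sketched with a toy table standing in for the IDS database. The table toy_ids and the helpers toy_decompose and toy_decompose_all are hypothetical; the real methods read their data from the CHISE database rather than from a Hash.

```ruby
# Toy stand-in for the IDS database: character => IDS (illustrative data only).
toy_ids = { "林" => "⿰木木", "森" => "⿱木林" }

# One level: replace each character by its IDS if the table has one.
def toy_decompose(str, table)
  str.chars.map { |c| table.fetch(c, c) }.join
end

# Recursive: repeat the one-level step until nothing changes any more.
def toy_decompose_all(str, table)
  once = toy_decompose(str, table)
  once == str ? str : toy_decompose_all(once, table)
end

puts toy_decompose("森", toy_ids)       # one level only
puts toy_decompose_all("森", toy_ids)   # expanded down to the leaves
```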

Compose

Use String#compose.

p "⿰木木".compose	# 林

You can find characters with the String#find method.

p "日雲".find

The output is a string of the characters that contain both "日" and "雲".
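
Both operations can be pictured as lookups over the same kind of table. The sketch below uses a toy Hash and hypothetical helpers (toy_compose, toy_find); it only looks at direct components and is not how Ruby/CHISE implements String#compose or String#find.

```ruby
# Toy stand-in for the IDS database: character => IDS (illustrative data only).
toy_ids = {
  "林" => "⿰木木",
  "明" => "⿰日月",
  "曇" => "⿱日雲",
}

# Compose: reverse lookup from an IDS to the character it describes.
def toy_compose(ids, table)
  table.invert.fetch(ids, ids)
end

# Find: characters whose IDS contains every given component.
def toy_find(parts, table)
  table.select { |_char, ids| parts.chars.all? { |part| ids.include?(part) } }.keys.join
end

puts toy_compose("⿰木木", toy_ids)
puts toy_find("日雲", toy_ids)
```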

usage

Please see the test cases for usage.

class String
	char	convert it to a Character.

class Character
	[]	get a feature.
		return nil if there is no such feature.
	[]=	set a feature.

tools

define-kanji-meaning.rb
Adds new features describing the meaning of Kanji characters in IDS.
dump-database.rb
This tool dumps all the data in the database to text files, so you can inspect the contents of the database.
make-chisedb-tarball.rb
Usage: make-chisedb-tarball.rb <directory of XEmacs CHISE> <tmp dir>
Example:
 % ./make-chisedb-tarball.rb /usr/local/lib/xemacs-21.4.14/i686-pc-linux
This makes a tarball (chise-db.tar.gz) of the CHISE Character database files. Give the directory that contains the "chise-db" directory as the first argument. You can give a temporary directory as the second argument (default: /var/tmp). This tool also renames files whose names contain characters that are forbidden on Windows. Please use eo to extract the tarball on Windows.
make-ids-database.rb
This tool reads the whole IDS Text database and stores the entries as features. It takes a long time.
move-obsolete-files.rb
There are some obsolete features. This tool moves the obsolete files to another directory.
rename-files.rb
Currently, the directory tree of XEmacs CHISE differs from the layout that libchise requires. This tool renames the old file tree to the new one.
trim_bom.rb
This tool removes the BOM (Byte Order Mark) at the head of a file.
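
As an illustration of the last item, removing a BOM comes down to dropping U+FEFF from the very start of the text. This sketch assumes UTF-8 input; the helper name strip_utf8_bom is hypothetical and this is not the code of trim_bom.rb.

```ruby
# Remove a leading BOM (U+FEFF) from UTF-8 text, if there is one.
# strip_utf8_bom is a hypothetical helper, not taken from trim_bom.rb.
def strip_utf8_bom(text)
  text.sub(/\A\uFEFF/, "")
end

with_bom = "\uFEFF字"
p strip_utf8_bom(with_bom)   # the BOM is gone, the rest is untouched
```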

description of each library

  • ext/*
    C extension that uses libchise.
  • network.rb, makegraph.rb, graphviz.rb, defkanji.rb, kanjilist.rb
    Calculate the network of Kanji characters and draw it as a graph with Graphviz.

    ruby makegraph.rb

    You need Graphviz. The output is "min.svg".

    ruby defkanji.rb

    Defines the meaning of the ideographic components of Kanji characters.

  • stroke.rb, kage.rb, kageserver.rb, csf.rb
    Libraries for using stroke fonts. Two systems are supported: KAGE by Koichi Kamichi and CSF by Saka Naozumi. You need the font files.
    Change the line CSF_FONT_DIR = 'd:/work/chise/csf/' in csf.rb to suit your environment.

    You need sgl (my own graphics library) to use this. sgl is not published yet.

    ruby stroke.rb

    Shows a code table in which you can see the characters.

discussion

history

  • 2003-01-10 : Test
  • 2003-01-12 : add XString
  • 2003-01-15 : add reading IDS Text DB
  • 2003-01-16 : IDS Text DB 1.0
  • 2003-01-17 : remove XString, move the methods to String. read more IDS Text DB.
  • 2003-01-20 : add IDS_Tree. check integrity of IDS Tree structure.
  • 2003-01-30 : add reverse translation of IDS.
  • 2003-02-13 : ruby-chise-20030213.tar.bz2
    change the name to Ruby/CHISE from Ruby/UTF-2000.
  • 2003-03-12 : ruby-chise-20030312.tar.bz2
    add some libraries.
  • 2003-10-04 : ruby-chise-20031004.tar.bz2
    change feature names. add Copyright notice.
  • 2003-10-31 : presentation at LC2003.
  • 2003-11-10 : ruby-chise-20031110.tar.bz2
    change directory structure to "chise/*.rb". add installer.
  • 2004-07-08 : ruby-chise-0.2.tar.gz
    add libchise extension. make many changes.