What's Ruby/CHISE?

Ruby/CHISE is an implementation of the Chaon model.

What's the Chaon model?

The Chaon model treats a character as a bundle of features rather than as a code point.

Ruby/CHISE extends the model and treats a character as an object.
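
As a rough illustration of the idea (not Ruby/CHISE's actual implementation), a character can be sketched as an object that wraps a hash of features. The class name FeatureChar and the feature values below are hypothetical and chosen only for this example.

```ruby
# A toy stand-in for the "character as a bundle of features" idea.
# FeatureChar is a hypothetical name; it is not part of Ruby/CHISE.
class FeatureChar
  def initialize(features = {})
    @features = features.dup
  end

  # Get a feature; returns nil when the feature is absent.
  def [](name)
    @features[name]
  end

  # Set a feature.
  def []=(name, value)
    @features[name] = value
  end

  # Two characters compare equal when their feature bundles match.
  def ==(other)
    other.is_a?(FeatureChar) && features == other.features
  end

  protected

  attr_reader :features
end

ji = FeatureChar.new(:ucs => 0x5B57, :total_strokes => 6)
p ji[:ucs]        # the UCS code point is just one feature among others
p ji[:gb2312]     # nil, because no such feature has been set
ji[:note] = "example"
p ji[:note]
```

In this sketch the code point has no special status: it is looked up the same way as any other feature.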



CHISE Character database

You need the CHISE Character database to use Ruby/CHISE. The CHISE Character database is distributed with XEmacs CHISE. For your convenience, I provide CHISE Character database tarballs.

  • 2003-11-04 : char-db.tar.gz for Windows
    Build XEmacs CHISE on GNU/Linux, then use "make-chisedb-tarball.rb" to make the tarball.
  • 2003-11-04 : char-db-linux.tar.bz2 for Linux
    A tarball made from "/usr/local/lib/xemacs-21.4.12/i686-pc-linux/char-db".

CVS access

You can access the CVS tree.


This software is released under the GPL. See COPYING.


Move the directory "chise" into your Ruby library directory.

# mv chise /usr/local/lib/ruby/site_ruby/1.8

Usually, that is under /usr/local/lib/ruby/site_ruby/.

Change to the "ext" directory and run the following commands.

% ruby extconf.rb
% make
# make install

Ruby/CHISE is fully functional without the C extension. With the extension, it runs faster.


In chise/config.rb and ext/config.h, change the following lines.

DB_DIR = '/usr/local/lib/xemacs-21.4.10/i686-pc-linux/char-db'
Change this line to the location of your char-db directory.

IDS_DB_DIR = '/home/eto/work/chise/ids/'
Store the IDS Text database files here. (See the "ids" section below.)
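
Before running anything, it may help to confirm that both directories exist. The following is a minimal sketch; the helper name missing_dirs is hypothetical, and the paths are the example values from above, which you would replace with your own.

```ruby
# Hypothetical helper: return the entries whose directory does not exist.
def missing_dirs(dirs)
  dirs.reject { |_name, path| File.directory?(path) }
end

# Example values copied from the text above; adjust to your environment.
settings = {
  "DB_DIR"     => "/usr/local/lib/xemacs-21.4.10/i686-pc-linux/char-db",
  "IDS_DB_DIR" => "/home/eto/work/chise/ids/",
}

# Print one warning per setting that does not point to a directory.
missing_dirs(settings).each do |name, path|
  warn "#{name} does not point to a directory: #{path}"
end
```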


You need the following.

You can find the Ruby packages through RAA.


Currently, you should use UTF-8 as the charset when using Ruby/CHISE.

On Windows, you can use the editor "Meadow + Mule-UCS" to edit Unicode text.

There are some other free editors as well.

How to


require "chise"

str = "字"		# Ruby/CHISE extends String. Use UTF-8 as the charset.
p str.ucs		# show the code point in UCS.
p str.total_strokes	# show the total number of strokes.
p str.gb2312		# etc.
str.each_feature {|f, v|	# show all the features.
  print f, ": ", v, "\n"
}
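
If you want to cross-check the value printed by str.ucs, plain Ruby can decode a UTF-8 string into code points without Ruby/CHISE. This is only a sanity check built on the core String#unpack method; the helper name code_points is made up for this sketch.

```ruby
# Decode a UTF-8 string into an array of Unicode code points.
def code_points(str)
  str.unpack("U*")
end

p code_points("字")                        # => [23383]
printf("U+%04X\n", code_points("字")[0])   # prints "U+5B57"
```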

Ideographic Structure

I designed Ruby/CHISE to work with the ideographic structure of Kanji characters.

Ruby/CHISE uses IDS (Ideographic Description Sequence) to describe the ideographic structure. IDS is part of the Unicode specification. A sequence starts with an IDC (Ideographic Description Character, U+2FF0 to U+2FFB) that specifies how the parts are connected. The two or three characters that follow are the parts to be composed.
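
To make the prefix notation concrete, here is a small sketch that parses an IDS into a nested array. It is independent of Ruby/CHISE's own IDS handling, and the names idc_arity and parse_ids are hypothetical. It relies on the fact that U+2FF2 and U+2FF3 combine three parts, while the other IDCs in U+2FF0 to U+2FFB combine two.

```ruby
# Number of parts an Ideographic Description Character (IDC) combines,
# or nil if the code point is not an IDC in U+2FF0..U+2FFB.
def idc_arity(cp)
  return nil unless (0x2FF0..0x2FFB).include?(cp)
  [0x2FF2, 0x2FF3].include?(cp) ? 3 : 2
end

# Consume one IDS from the front of an array of code points.
# Returns a plain character, or [idc, part, part(, part)] for a subtree.
def parse_ids_codepoints(cps)
  cp = cps.shift
  raise ArgumentError, "IDS ended too early" if cp.nil?
  char = [cp].pack("U")
  arity = idc_arity(cp)
  return char if arity.nil?
  [char] + Array.new(arity) { parse_ids_codepoints(cps) }
end

# Parse a whole string as exactly one IDS.
def parse_ids(str)
  cps = str.unpack("U*")
  tree = parse_ids_codepoints(cps)
  raise ArgumentError, "trailing characters after IDS" unless cps.empty?
  tree
end

p parse_ids("⿰木木")        # => ["⿰", "木", "木"]
p parse_ids("⿱木⿰木木")    # => ["⿱", "木", ["⿰", "木", "木"]]
```

Because each IDC announces how many parts follow, the sequence can be read in one pass from left to right without any brackets.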

Preparation to use IDS

Get the IDS Text database as follows.

% cd ~/work/chise (change to suit your environment)
% cvs -d login
password: (just hit return again)

% cvs -d co -d ids ids

Then, change this line in chise/config.rb.

IDS_DB_DIR = '/home/eto/work/chise/ids/'

Set it to the directory you checked out. After setting IDS_DB_DIR, run the following. (It takes a long time.)

% ./tools/idsdbdumpall.rb

After that, features such as ids become available.


There are two methods, String#decompose and String#decompose_all. String#decompose decomposes one level. String#decompose_all decomposes recursively.

p "字".decompose
p "字".decompose_all
p "榊".decompose
p "榊".decompose_all
p "終了".decompose
p "終了".decompose_all
p "鬱".decompose
p "鬱".decompose_all

The result is an IDS. Many environments cannot display IDCs correctly. You can view them with IE.
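
The difference between one-level and recursive decomposition can be sketched without the database. The table TOY_IDS below is assumed toy data, and the toy_decompose helpers are hypothetical stand-ins, not the library's methods.

```ruby
# Toy IDS table standing in for the real database (assumed data).
TOY_IDS = {
  "林" => "⿰木木",
  "森" => "⿱木林",
}

# Replace each character by its IDS once; unknown characters are kept.
def toy_decompose(str)
  str.chars.map { |ch| TOY_IDS.fetch(ch, ch) }.join
end

# Repeat one-level decomposition until nothing changes.
# This terminates only if the table contains no cycles.
def toy_decompose_all(str)
  loop do
    next_str = toy_decompose(str)
    return str if next_str == str
    str = next_str
  end
end

p toy_decompose("森")       # => "⿱木林"
p toy_decompose_all("森")   # => "⿱木⿰木木"
```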


To compose a character from an IDS, use String#compose.

p "⿰木木".compose	# 林

You can find characters with the String#find method.

p "日雲".find

The output is a string of the characters that contain "日" and "雲".
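
How String#find works internally is not described here, but one plausible approach is to keep, for each character, the set of parts it contains and select the characters that contain every part of the query. The table TOY_PARTS and the helper toy_find are hypothetical and exist only for this sketch.

```ruby
# Toy table: character => every part it contains, at any depth
# (assumed data for illustration only).
TOY_PARTS = {
  "明" => ["日", "月"],
  "雲" => ["雨", "云"],
  "曇" => ["日", "雲", "雨", "云"],
}

# Return the characters that contain every character of the query.
def toy_find(query)
  wanted = query.chars
  TOY_PARTS.select { |_ch, parts| (wanted - parts).empty? }.keys.join
end

p toy_find("日雲")   # => "曇"
p toy_find("日")     # => "明曇"
```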


Please see the test cases for usage.

class String
	char	convert it to a Character.

class Character
	[]	get a feature.
		return nil if there is no such feature.
	[]=	set a feature.


Adds new features about the meanings of Kanji characters in IDS.
This tool dumps all the data in the database to text files, so you can inspect the contents of the database.
Usage: make-chisedb-tarball.rb <directory of XEmacs CHISE> <tmp dir>
 % ./make-chisedb-tarball.rb /usr/local/lib/xemacs-21.4.14/i686-pc-linux
This makes a tarball (chise-db.tar.gz) of the CHISE Character database files. Pass the directory that contains the "chise-db" directory as the first argument. You can pass a temporary directory as the second argument (default: /var/tmp). This tool also renames files whose names contain characters that are forbidden on Windows. Please use eo to extract the tarball on Windows.
This tool reads all of the IDS Text database and stores the entries as features. It takes a long time.
There are some obsolete features. This tool moves the obsolete files to another directory.
Currently, the directory tree of XEmacs CHISE does not match the layout that libchise requires. This tool renames the old file tree to the new file tree.
This tool removes a BOM (Byte Order Mark) from the head of a file.
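
For reference, removing a UTF-8 BOM amounts to dropping the three bytes EF BB BF when they appear at the start of the data. The sketch below is not the tool's source; the helper names strip_bom and strip_bom_file are hypothetical.

```ruby
# The UTF-8 encoding of U+FEFF, as raw bytes.
UTF8_BOM = [0xEF, 0xBB, 0xBF].pack("C*")

# Return the content without a leading UTF-8 BOM.
def strip_bom(bytes)
  data = bytes.dup.force_encoding(Encoding::BINARY)
  data = data[3..-1] if data.start_with?(UTF8_BOM)
  data.force_encoding(Encoding::UTF_8)
end

# Rewrite a file in place, but only when it actually starts with a BOM.
def strip_bom_file(path)
  raw = File.binread(path)
  cleaned = strip_bom(raw)
  File.binwrite(path, cleaned) if cleaned.bytesize != raw.bytesize
end

p strip_bom("\xEF\xBB\xBFabc")   # => "abc"
p strip_bom("abc")               # => "abc"
```

Working on raw bytes keeps the check independent of how the file was opened, and files without a BOM are left untouched.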

description of each library

  • ext/*
    C extension that uses libchise.
  • network.rb, makegraph.rb, graphviz.rb, defkanji.rb, kanjilist.rb
    Calculate the network of Kanji characters and draw it as a graph with Graphviz.
    ruby makegraph.rb

    You need Graphviz. The output is "min.svg".

    ruby defkanji.rb

    Defines the meanings of the ideographic parts of Kanji characters.

  • stroke.rb, kage.rb, kageserver.rb, csf.rb
    Libraries for stroke fonts. You can use two systems: KAGE by Koichi Kamichi and CSF by Saka Naozumi. You need the font files.
    Change the line CSF_FONT_DIR = 'd:/work/chise/csf/' in csf.rb to suit your environment.

    You need sgl (my own graphic library) to use this. sgl is not published yet.

    ruby stroke.rb

    Shows a code table in which you can view the characters.



  • 2003-01-10 : Test
  • 2003-01-12 : add XString
  • 2003-01-15 : add reading IDS Text DB
  • 2003-01-16 : IDS Text DB 1.0
  • 2003-01-17 : remove XString, move the methods to String. read more IDS Text DB.
  • 2003-01-20 : add IDS_Tree. check integrity of IDS Tree structure.
  • 2003-01-30 : add reverse translation of IDS.
  • 2003-02-13 : ruby-chise-20030213.tar.bz2
    change the name to Ruby/CHISE from Ruby/UTF-2000.
  • 2003-03-12 : ruby-chise-20030312.tar.bz2
    add some libraries.
  • 2003-10-04 : ruby-chise-20031004.tar.bz2
    change feature names. add Copyright notice.
  • 2003-10-31 : presentation at LC2003.
  • 2003-11-10 : ruby-chise-20031110.tar.bz2
    change directory structure to "chise/*.rb". add installer.
  • 2004-07-08 : ruby-chise-0.2.tar.gz
    add libchise extension. make many changes.