Ruby の REXML を使って XML ドキュメントをリレーショナル・データベースのテーブルに格納する (Store a XML Document into a relational table using REXML of ruby)
XMLドキュメントをリレーショナルのテーブルにマッピングする。 (Store a XML document int a relational table.)
REXMLのクラスのうち主要なもの (main REXML classes)
XML Declaration (XML宣言) | REXML::XMLDecl |
Document Type Definition (文書型定義) | REXML::DocType |
Comment (コメント) | REXML::Comment |
Element (要素) | REXML::Element |
CDATA Section (CData セクション) | REXML::CData |
Processing Instruction (処理命令) | REXML::Instruction |
Text (テキスト) | REXML::Text |
REXML::ElementDecl | |
REXML::NotationDecl | |
REXML::AttlistDecl | |
REXML::Entity |
前もって決めておく事項
- 使用するデータベース管理システム: sqlite3
- データベースファイル名: mydb
- 格納する XML ドキュメントのファイル名: hoge.xml
- Ruby のプログラム名 conv.rb
テーブル定義 (table definition)
create table R (
start integer primary key not null,
end INTEGER UNIQUE not null,
parent INTEGER not null,
level INTEGER not null,
class TEXT not null,
tagname text,
path text,
text TEXT not null );
◆ sqlite3 での操作手順例 (Table definition using sqlite3)
sqlite3 mydb
create table R (
start integer primary key not null,
end INTEGER UNIQUE not null,
parent INTEGER not null,
level INTEGER not null,
class TEXT not null,
tagname text,
path text,
text TEXT not null );
.exit
◆ Windows での実行結果例 (in case of Windows)

◆ Linux での実行結果例 (in canse of Linux)

テーブルへの格納を行うプログラム (a program to store a XML document into the table 'R')
ファイル hoge.xml を読み込み,テーブル R(start, end, parent, level, class, tagname, path, text) に格納するプログラム (The program read a 'hoge.xml' file and store it into the table R(start, end, parent, level, class, tagname, path, text))
次のプログラムを conv.rb のようなファイル名で保存する. (The program file name is 'conv.rb')
require "rexml/document"
f = open("hoge.xml")
source = f.read
f.close
def pre(n, parent, line, level, pathbase, tagname)
line << "insert into R (start, parent, level, class, text, tagname, path, end) values("
line << $number.to_s
line << ", "
line << parent.to_s
line << ", "
line << level.to_s
line << ", '"
line << n.class.to_s
line << "', '"
str = n.to_s
# text
# SQL escape \' \" \\ \% \_ \n \t
line << str.gsub(/\\/,"\\\\\\").gsub(/'/,"\\\\'").gsub(/'/,"\\\"").gsub(/%/,"\\%").gsub(/_/,"\\_").gsub(/\n/,"\\n").gsub(/\t/,"\\\t")
if n.is_a?(REXML::Element) then
if ( tagname.size == 0 ) then
line << "', NULL"
else
line << "','"
# SQL escape \' \" \\ \% \_ \n \t
line << tagname.gsub(/\\/,"\\\\\\").gsub(/'/,"\\\\'").gsub(/'/,"\\\"").gsub(/%/,"\\%").gsub(/_/,"\\_").gsub(/\n/,"\\n").gsub(/\t/,"\\\t")
line << "'"
end
else
line << "', NULL"
end
# generate path
if ( pathbase.size == 0 ) then
path = tagname
else
if ( tagname.size == 0 ) then
path = pathbase
else
path = pathbase + "/" + tagname
end
end
# print path
if ( path.size == 0 ) then
line << ", NULL, "
else
# SQL escape \' \" \\ \% \_ \n \t
line << ", '"
line << path.gsub(/\\/,"\\\\\\").gsub(/'/,"\\\\'").gsub(/'/,"\\\"").gsub(/%/,"\\%").gsub(/_/,"\\_").gsub(/\n/,"\\n").gsub(/\t/,"\\\t")
line << "', "
end
end
def post(n, line)
line << $number.to_s
line << ");"
end
def trav(n, parent, level, pathbase, tagname)
line = ""
$number = $number + 1
# extract tag name
tagname = ""
if n.is_a?(REXML::Element) then
tagname = ((n.to_s.scan(/^<[^>]*/)[0]).scan(/[^\s]*/)[0]).gsub(/^</,"").gsub(/\/$/,"")
if ( tagname == "?xml" ) then
# "?xml" is not tag name!
tagname = ""
elsif ( tagname == "?xml-stylesheet" ) then
# "?xml-stylesheet" is not tag name!
tagname = ""
end
end
pre(n, parent, line, level, pathbase, tagname)
current = $number
if n.is_a?(REXML::Parent) then
n.children.each {|e|
if ( pathbase.size == 0 ) then
path = tagname
else
path = pathbase + "/" + tagname
end
trav(e, current, level + 1, path, tagname)
}
end
if n.is_a?(REXML::Element) then
if n.has_attributes? then
n.attributes.each_pair { |k,v|
str = ""
$number = $number + 1
str << "insert into R (start, parent, level, class, text, tagname, path, end) values("
str << $number.to_s
str << ", "
str << parent.to_s
str << ", "
str << level.to_s
str << ", '"
str << "Attribute"
str << "', '"
# text
# SQL escape \' \" \\ \% \_ \n \t
str << k.to_s.gsub(/\\/,"\\\\\\").gsub(/'/,"\\\\'").gsub(/'/,"\\\"").gsub(/%/,"\\%").gsub(/_/,"\\_").gsub(/\n/,"\\n").gsub(/\t/,"\\\t")
str << "="
# SQL escape \' \" \\ \% \_ \n \t
str << v.to_s.gsub(/\\/,"\\\\\\").gsub(/'/,"\\\\'").gsub(/'/,"\\\"").gsub(/%/,"\\%").gsub(/_/,"\\_").gsub(/\n/,"\\n").gsub(/\t/,"\\\t")
str << "', NULL, '"
# SQL escape \' \" \\ \% \_ \n \t
str << pathbase.gsub(/\\/,"\\\\\\").gsub(/'/,"\\\\'").gsub(/'/,"\\\"").gsub(/%/,"\\%").gsub(/_/,"\\_").gsub(/\n/,"\\n").gsub(/\t/,"\\\t")
str << "', "
$number = $number + 1
str << $number.to_s
str << ");"
puts str
}
end
end
$number = $number + 1
post(n, line)
puts line
end
$doc = REXML::Document.new source
$number = 0
puts "begin transaction;"
trav($doc, $number, 0, "", "")
puts "commit;"
使ってみる (try to use)
◆ テスト用に使う XML ドキュメント (an example XML document for a test).
次のような XML ドキュメントを,ファイル名 hoge.xml で保存 (Store the following file. the file name is 'hoge.xml').
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="www.hogehoge.com/css/orange.xsl" type="text/xsl" ?>
<order_records>
<order_record hoge="aaa">
<id>1</id>
<year>2011</year>
<month>12</month>
<day>1</day>
<customer_name>kaneko</customer_name>
<product_name>orange A</product_name>
<unit_price>1.2</unit_price>
<qty>10</qty>
<created_at>2011-12-01 10:42:48</created_at>
</order_record>
<order_record hoge="bbb">
<id>2</id>
<year>2011</year>
<month>12</month>
<day>1</day>
<customer_name>miyamoto</customer_name>
<product_name>Apple M</product_name>
<unit_price>2.5</unit_price>
<qty>2</qty>
<created_at>2011-12-01 10:42:59</created_at>
</order_record>
<order_record hoge="ccc">
<id>3</id>
<year>2011</year>
<month>12</month>
<day>3</day>
<customer_name>kaneko</customer_name>
<product_name>orange B</product_name>
<unit_price>1.2</unit_price>
<qty>8</qty>
<created_at>2011-12-03 11:08:56</created_at>
</order_record>
<order_record hoge="ddd">
<id>4</id>
<year>2011</year>
<month>12</month>
<day>4</day>
<customer_name>miyamoto</customer_name>
<product_name>Apple L</product_name>
<unit_price>3</unit_price>
<qty>1</qty>
<created_at>2011-12-04 15:35:12</created_at>
</order_record>
</order_records>
テスト結果の例 (test result example)
ruby conv.rb > hoge.sql
sqlite3 mydb
.read hoge.sql
.exit
◆ Windows での実行結果例 (in case of Windows)

◆ Linux での実行結果例 (in case of Linux)

テーブル R の確認 (examine the table R)
sqlite3 mydb
SELECT * FROM R;
.exit
◆ Windows での実行結果例 (in case of Windows)

◆ Linux での実行結果例 (in case of Linux)

sqlite3 mydb
select distinct tagname FROM R;
select distinct path FROM R;
.exit
◆ Linux での実行結果例 (in case of Linux)

タグ名のテーブル (generate a table for tag name)
テーブルRから,新しいテーブル tag(tagid, tagname) を作る. (generate a new table tag(tagid, tagname) from table R).
sqlite3 mydb
CREATE TEMPORARY TABLE T1 AS select distinct tagname FROM R WHERE tagname IS not null;
CREATE TEMPORARY TABLE T2 AS SELECT OID AS tagid, * FROM T1;
create table tag (
tagid integer primary key not null,
tagname TEXT UNIQUE not null
);
insert into tag(tagid, tagname) SELECT * FROM T2;
SELECT * FROM tag;
.exit
◆ Linux での実行結果例 (in case of Linux)

パスのテーブル (generate a table for path)
テーブルRから,新しいテーブル path(pathid, path) を作る. (generate a new table path(pathid, path) from table R).
sqlite3 mydb
CREATE TEMPORARY TABLE T1 AS select distinct path FROM R WHERE path IS not null;
CREATE TEMPORARY TABLE T2 AS SELECT OID AS pathid, * FROM T1;
create table path (
pathid integer primary key not null,
path TEXT UNIQUE not null
);
insert into path(pathid, path) SELECT * FROM T2;
SELECT * FROM path;
.exit
◆ Linux での実行結果例 (in case of Linux)

REXML::Element に関するテーブル (generate a table for REXML::Elementh)
テーブルRから,新しいテーブル elements(start, parent, level, pathid, tagid) を作る. (generate a new table elements(start, parent, level, pathid, tagid) from table R).
sqlite3 mydb
CREATE TEMPORARY TABLE R1 AS SELECT start, parent, level, path.pathid, tag.tagid
FROM R, tag, path
WHERE R.tagname = tag.tagname
AND R.path = path.path
AND R.class = "REXML::Element";
create table elements (
start integer primary key not null,
parent INTEGER not null,
level INTEGER not null,
pathid INTEGER not null,
tagid INTEGER not null
);
CREATE INDEX idxtagid ON elements(pathid);
insert into elements(start, parent, level, pathid, tagid) SELECT start, parent, level, pathid, tagid
FROM R1 ORDER BY pathid;
DROP TABLE R1;
SELECT start, parent, level, path, tagname FROM elements, tag, path WHERE elements.tagid = tag.tagid AND elements.pathid = path.pathid;
.exit
◆ Linux での実行結果例 (in case of Linux)
