XMLドキュメントをリレーショナルのテーブルにマッピングする。 (Store a XML document int a relational table.)
REXMLのクラスのうち主要なもの (main REXML classes)
XML Declaration (XML宣言) | REXML::XMLDecl |
Document Type Definition (文書型定義) | REXML::DocType |
Comment (コメント) | REXML::Comment |
Element (要素) | REXML::Element |
CDATA Section (CData セクション) | REXML::CData |
Processing Instruction (処理命令) | REXML::Instruction |
Text (テキスト) | REXML::Text |
REXML::ElementDecl | |
REXML::NotationDecl | |
REXML::AttlistDecl | |
REXML::Entity |
create table R ( start integer primary key not null, end INTEGER UNIQUE not null, parent INTEGER not null, level INTEGER not null, class TEXT not null, tagname text, path text, text TEXT not null );
◆ sqlite3 での操作手順例 (Table definition using sqlite3)
sqlite3 mydb create table R ( start integer primary key not null, end INTEGER UNIQUE not null, parent INTEGER not null, level INTEGER not null, class TEXT not null, tagname text, path text, text TEXT not null ); .exit
◆ Windows での実行結果例 (in case of Windows)
◆ Linux での実行結果例 (in canse of Linux)
ファイル hoge.xml を読み込み,テーブル R(start, end, parent, level, class, tagname, path, text) に格納するプログラム (The program read a 'hoge.xml' file and store it into the table R(start, end, parent, level, class, tagname, path, text))
次のプログラムを conv.rb のようなファイル名で保存する. (The program file name is 'conv.rb')
require "rexml/document" f = open("hoge.xml") source = f.read f.close def pre(n, parent, line, level, pathbase, tagname) line << "insert into R (start, parent, level, class, text, tagname, path, end) values(" line << $number.to_s line << ", " line << parent.to_s line << ", " line << level.to_s line << ", '" line << n.class.to_s line << "', '" str = n.to_s # text # SQL escape \' \" \\ \% \_ \n \t line << str.gsub(/\\/,"\\\\\\").gsub(/'/,"\\\\'").gsub(/'/,"\\\"").gsub(/%/,"\\%").gsub(/_/,"\\_").gsub(/\n/,"\\n").gsub(/\t/,"\\\t") if n.is_a?(REXML::Element) then if ( tagname.size == 0 ) then line << "', NULL" else line << "','" # SQL escape \' \" \\ \% \_ \n \t line << tagname.gsub(/\\/,"\\\\\\").gsub(/'/,"\\\\'").gsub(/'/,"\\\"").gsub(/%/,"\\%").gsub(/_/,"\\_").gsub(/\n/,"\\n").gsub(/\t/,"\\\t") line << "'" end else line << "', NULL" end # generate path if ( pathbase.size == 0 ) then path = tagname else if ( tagname.size == 0 ) then path = pathbase else path = pathbase + "/" + tagname end end # print path if ( path.size == 0 ) then line << ", NULL, " else # SQL escape \' \" \\ \% \_ \n \t line << ", '" line << path.gsub(/\\/,"\\\\\\").gsub(/'/,"\\\\'").gsub(/'/,"\\\"").gsub(/%/,"\\%").gsub(/_/,"\\_").gsub(/\n/,"\\n").gsub(/\t/,"\\\t") line << "', " end end def post(n, line) line << $number.to_s line << ");" end def trav(n, parent, level, pathbase, tagname) line = "" $number = $number + 1 # extract tag name tagname = "" if n.is_a?(REXML::Element) then tagname = ((n.to_s.scan(/^<[^>]*/)[0]).scan(/[^\s]*/)[0]).gsub(/^</,"").gsub(/\/$/,"") if ( tagname == "?xml" ) then # "?xml" is not tag name! tagname = "" elsif ( tagname == "?xml-stylesheet" ) then # "?xml-stylesheet" is not tag name! tagname = "" end end pre(n, parent, line, level, pathbase, tagname) current = $number if n.is_a?(REXML::Parent) then n.children.each {|e| if ( pathbase.size == 0 ) then path = tagname else path = pathbase + "/" + tagname end trav(e, current, level + 1, path, tagname) } end if n.is_a?(REXML::Element) then if n.has_attributes? then n.attributes.each_pair { |k,v| str = "" $number = $number + 1 str << "insert into R (start, parent, level, class, text, tagname, path, end) values(" str << $number.to_s str << ", " str << parent.to_s str << ", " str << level.to_s str << ", '" str << "Attribute" str << "', '" # text # SQL escape \' \" \\ \% \_ \n \t str << k.to_s.gsub(/\\/,"\\\\\\").gsub(/'/,"\\\\'").gsub(/'/,"\\\"").gsub(/%/,"\\%").gsub(/_/,"\\_").gsub(/\n/,"\\n").gsub(/\t/,"\\\t") str << "=" # SQL escape \' \" \\ \% \_ \n \t str << v.to_s.gsub(/\\/,"\\\\\\").gsub(/'/,"\\\\'").gsub(/'/,"\\\"").gsub(/%/,"\\%").gsub(/_/,"\\_").gsub(/\n/,"\\n").gsub(/\t/,"\\\t") str << "', NULL, '" # SQL escape \' \" \\ \% \_ \n \t str << pathbase.gsub(/\\/,"\\\\\\").gsub(/'/,"\\\\'").gsub(/'/,"\\\"").gsub(/%/,"\\%").gsub(/_/,"\\_").gsub(/\n/,"\\n").gsub(/\t/,"\\\t") str << "', " $number = $number + 1 str << $number.to_s str << ");" puts str } end end $number = $number + 1 post(n, line) puts line end $doc = REXML::Document.new source $number = 0 puts "begin transaction;" trav($doc, $number, 0, "", "") puts "commit;"
◆ テスト用に使う XML ドキュメント (an example XML document for a test).
次のような XML ドキュメントを,ファイル名 hoge.xml で保存 (Store the following file. the file name is 'hoge.xml').
<?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet href="www.hogehoge.com/css/orange.xsl" type="text/xsl" ?> <order_records> <order_record hoge="aaa"> <id>1</id> <year>2011</year> <month>12</month> <day>1</day> <customer_name>kaneko</customer_name> <product_name>orange A</product_name> <unit_price>1.2</unit_price> <qty>10</qty> <created_at>2011-12-01 10:42:48</created_at> </order_record> <order_record hoge="bbb"> <id>2</id> <year>2011</year> <month>12</month> <day>1</day> <customer_name>miyamoto</customer_name> <product_name>Apple M</product_name> <unit_price>2.5</unit_price> <qty>2</qty> <created_at>2011-12-01 10:42:59</created_at> </order_record> <order_record hoge="ccc"> <id>3</id> <year>2011</year> <month>12</month> <day>3</day> <customer_name>kaneko</customer_name> <product_name>orange B</product_name> <unit_price>1.2</unit_price> <qty>8</qty> <created_at>2011-12-03 11:08:56</created_at> </order_record> <order_record hoge="ddd"> <id>4</id> <year>2011</year> <month>12</month> <day>4</day> <customer_name>miyamoto</customer_name> <product_name>Apple L</product_name> <unit_price>3</unit_price> <qty>1</qty> <created_at>2011-12-04 15:35:12</created_at> </order_record> </order_records>
ruby conv.rb > hoge.sql sqlite3 mydb .read hoge.sql .exit
◆ Windows での実行結果例 (in case of Windows)
◆ Linux での実行結果例 (in case of Linux)
sqlite3 mydb SELECT * FROM R; .exit
◆ Windows での実行結果例 (in case of Windows)
◆ Linux での実行結果例 (in case of Linux)
sqlite3 mydb select distinct tagname FROM R; select distinct path FROM R; .exit
◆ Linux での実行結果例 (in case of Linux)
テーブルRから,新しいテーブル tag(tagid, tagname) を作る. (generate a new table tag(tagid, tagname) from table R).
sqlite3 mydb CREATE TEMPORARY TABLE T1 AS select distinct tagname FROM R WHERE tagname IS not null; CREATE TEMPORARY TABLE T2 AS SELECT OID AS tagid, * FROM T1; create table tag ( tagid integer primary key not null, tagname TEXT UNIQUE not null ); insert into tag(tagid, tagname) SELECT * FROM T2; SELECT * FROM tag; .exit
◆ Linux での実行結果例 (in case of Linux)
テーブルRから,新しいテーブル path(pathid, path) を作る. (generate a new table path(pathid, path) from table R).
sqlite3 mydb CREATE TEMPORARY TABLE T1 AS select distinct path FROM R WHERE path IS not null; CREATE TEMPORARY TABLE T2 AS SELECT OID AS pathid, * FROM T1; create table path ( pathid integer primary key not null, path TEXT UNIQUE not null ); insert into path(pathid, path) SELECT * FROM T2; SELECT * FROM path; .exit
◆ Linux での実行結果例 (in case of Linux)
テーブルRから,新しいテーブル elements(start, parent, level, pathid, tagid) を作る. (generate a new table elements(start, parent, level, pathid, tagid) from table R).
sqlite3 mydb CREATE TEMPORARY TABLE R1 AS SELECT start, parent, level, path.pathid, tag.tagid FROM R, tag, path WHERE R.tagname = tag.tagname AND R.path = path.path AND R.class = "REXML::Element"; create table elements ( start integer primary key not null, parent INTEGER not null, level INTEGER not null, pathid INTEGER not null, tagid INTEGER not null ); CREATE INDEX idxtagid ON elements(pathid); insert into elements(start, parent, level, pathid, tagid) SELECT start, parent, level, pathid, tagid FROM R1 ORDER BY pathid; DROP TABLE R1; SELECT start, parent, level, path, tagname FROM elements, tag, path WHERE elements.tagid = tag.tagid AND elements.pathid = path.pathid; .exit
◆ Linux での実行結果例 (in case of Linux)