To parse the posts on a snippets.dzone.com page, Nokogiri was used to read the HTML document, and REXML was then used to parse each individual post.

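require 'nokogiri'
require 'open-uri'
require 'rexml/document'
include REXML

html_url = 'http://snippets.dzone.com/user/jrobertson/tag/nokogiri'

# Nokogiri reads the HTML; the <body> is re-serialised as XML for REXML.
# (URI.open is required on Ruby 3.0+, where Kernel#open no longer opens URLs.)
doc = Nokogiri::HTML(URI.open(html_url))
doc2 = Document.new(doc.xpath('html/body').to_xml)

# multi-post: class="post"

entries = XPath.match(doc2.root, 'div/div[3]/div[5]/div/div').map do |post|
  # the post title and the href of its link, which ends in the post id
  title, raw_id = XPath.match(post, 'h3/a/text() | h3/a/attribute::href').map(&:to_s)
  id = raw_id[/\d+$/]
  # the first div is the post body, the second holds the metadata (tags etc.)
  body, metadata = XPath.match(post, 'div | div[2]')
  tags = XPath.match(metadata, "a[@class='tag ']/text()").map(&:to_s)

  # convert <pre> (containing code) to <code>
  XPath.each(body, 'pre[span]') do |code|
    raw_code = code.to_s

    # strip every tag (the <pre> itself and its highlighting <span>s), keeping the text
    code_tag = Element.new('code').add_text raw_code.gsub(/<\/?[^>]*>/, "")
    code.parent.insert_before code, code_tag
    code.parent.delete code
  end

  # [23..-7] drops the wrapping div's opening tag (assumed 23 characters) and its closing </div>
  {title: title, id: id, body: body.to_s[23..-7], tags: tags}
end
puts entries.map{|x| x[:title]}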
#=> Transform dynarex.xsl to Nokogiri XML::Builder code
#   Introducing the PrettyXML gem
#   Pretty print XML from Nokogiri's XML::Builder
#   Reading an HTML file with help from an RSF job
#   S-Rscript: Edit Dynarex records from a flat file view
#   Fetch the weather conditions in XML format
#   Sinatra-Rscript: Supplying a parameter to Nokogiri::XSLT
#   Using Nokogiri's XML::Builder
#   Reliably converting from a Nokogiri to a REXML document
#   Traversing XML using REXML or Nokogiri

puts entries[1][:title]
#=> Introducing the PrettyXML gem

puts entries[1][:id]
#=> 11025

puts entries[1][:body]
#=> The PrettyXML gem accepts a string of XML and ...

p entries[1][:tags]
#=> ["ruby", "gem", "nokogiri", "pretty-xml"]
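The <pre>-to-<code> conversion inside the loop can also be exercised on its own. The sketch below uses a hypothetical post body in which a <pre> holds syntax-highlighting <span> elements. As a swapped-in variation on the script's regex stripping, it joins the descendant text nodes with Text#value, which returns unescaped text, so an entity such as &lt; is not escaped a second time when the new <code> element is written out.

```ruby
require 'rexml/document'

# Hypothetical post body; the <span> mimics syntax highlighting.
body = REXML::Document.new(<<~XML).root
  <div class="post-body"><p>Example:</p><pre><span class="k">puts</span> 1 &lt; 2</pre></div>
XML

REXML::XPath.match(body, 'pre[span]').each do |pre|
  # Join every descendant text node; Text#value returns the unescaped string.
  source = REXML::XPath.match(pre, './/text()').map(&:value).join

  code = REXML::Element.new('code')
  code.add_text(source)   # REXML escapes "<" again when serialising
  pre.parent.insert_before(pre, code)
  pre.parent.delete(pre)
end

# Inner markup of the body, without the wrapping <div>.
puts body.children.map(&:to_s).join
```

Printing the children and joining them gives the inner markup of the body without relying on the wrapping tag having a fixed length, as body.to_s[23..-7] does.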
