
Parsing HTML and XML Documents

Posted in Forums : Ask a Rails expert

 

Hi,

I want to parse an HTML page (http://xmlfeed.jobcentral.com/) for the hyperlinks it contains.

I want to store all the hyperlinks on the page in a MySQL database.

The links point to different XML files, and once the links are stored I also want to store the data from those XML files in the database.

Can we parse the XML files on the website directly, or should we parse the hyperlinks first and then go on to the XML parsing? Which method is more efficient?
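The two-step approach in the question (collect the hyperlinks first, then fetch the XML files they point to) can be sketched in plain Ruby. This is a minimal sketch, not Hpricot's API: it uses only the standard `uri` library and a regular expression, assumes the feed links appear as ordinary `<a href>` tags ending in `.xml`, and runs against a made-up sample page rather than the real site. `extract_links` and `xml_links` are hypothetical helper names.

```ruby
require 'uri'

# Hypothetical sample of the index page. In practice you would fetch
# http://xmlfeed.jobcentral.com/ (e.g. with Net::HTTP) and pass its body here.
SAMPLE_HTML = <<~HTML
  <html><body>
    <a href="feeds/jobs1.xml">Jobs 1</a>
    <a href='http://xmlfeed.jobcentral.com/feeds/jobs2.xml'>Jobs 2</a>
    <a href="/about.html">About</a>
  </body></html>
HTML

# Collect the href values of <a> tags. A regular expression is only a rough
# stand-in for a real HTML parser (such as Hpricot) and will miss unusual markup.
def extract_links(html, base_url)
  html.scan(/<a\s[^>]*href\s*=\s*["']([^"']+)["']/i).flatten.map do |href|
    URI.join(base_url, href).to_s # resolve relative links against the page URL
  end.uniq
end

# Keep only the links whose path ends in ".xml" (an assumption about the site).
def xml_links(links)
  links.select { |link| URI(link).path.end_with?('.xml') }
end

base  = 'http://xmlfeed.jobcentral.com/'
links = extract_links(SAMPLE_HTML, base)
xml   = xml_links(links)
puts xml
```

Each URL returned by `xml_links` could then be saved to the database and fetched in turn (for example with `Net::HTTP.get`) for the XML step, so parsing the links first gives you the list of feeds to process.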

 
Reply:

Hpricot rules and should be perfect for the parsing.
http://code.whytheluckystiff.net/hpricot/

As for the other part, are you going to grab all their listings and store them in your database? If it's specific to just this site and that XML, I would just make a Job model and write a rake task that parses the XML and stores it in the Job model. They even have a guid attribute, so you can easily avoid duplicate jobs.
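A minimal sketch of the rake-task idea, using Ruby's bundled REXML library instead of Hpricot. A plain Hash keyed by guid stands in for the Job model so it runs without Rails. The sample XML and the `item`/`guid`/`title`/`link` element names are assumptions about the feed's layout, and `import_jobs` is a hypothetical helper name. It treats the guid as a child element, as in RSS; if the feed really uses a `guid` attribute, `item.attributes['guid']` would be the lookup instead.

```ruby
require 'rexml/document'

# Hypothetical feed fragment; the real JobCentral XML layout is assumed, not
# known. Each job is taken to be an <item> with <guid>, <title> and <link>.
SAMPLE_XML = <<~XML
  <rss><channel>
    <item><guid>job-1</guid><title>Ruby Developer</title><link>http://example.com/1</link></item>
    <item><guid>job-2</guid><title>DBA</title><link>http://example.com/2</link></item>
    <item><guid>job-1</guid><title>Ruby Developer</title><link>http://example.com/1</link></item>
  </channel></rss>
XML

# Stand-in for the Job model's table: jobs keyed by guid.
# In a Rails app this hash would be replaced by a Job ActiveRecord model.
def import_jobs(xml, store = {})
  doc = REXML::Document.new(xml)
  doc.elements.each('//item') do |item|
    guid = item.elements['guid']&.text
    # Skip items without a guid, and items whose guid is already stored.
    next if guid.nil? || store.key?(guid)
    store[guid] = {
      title: item.elements['title']&.text,
      link:  item.elements['link']&.text
    }
  end
  store
end

jobs = import_jobs(SAMPLE_XML)
puts jobs.size # prints 2: the repeated job-1 item is skipped
```

Inside a rake task, the same loop would create one Job record per new guid instead of writing to the hash; re-running it on an unchanged feed adds nothing, which is the duplicate avoidance the guid provides.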

 
Reply:

Hi Piyush, there are many ways to parse HTML and XML data.
I have worked a lot in this field and can definitely tell you the following are the best options:
1. Use Rubyful Soup (it's a gem). You can get more info at http://www.crummy.com/software/RubyfulSoup/ and http://www.crummy.com/software/RubyfulSoup/documentation.html
2. Use Hpricot and Mechanize.
3. For feeds, use feed_tools to read the feeds and Rubyful Soup or Hpricot to parse the data.
If you have any difficulty, mail me at saurabh[at]railsworkways[dot]com
