Scraping the World

25 Oct 2016 . category: tech
#ruby

Hi guys!

A big part of my professional experience is dedicated to scraping. In this post I’ll explain how to write a simple Ruby script for scraping a website.

This post uses Ruby for the examples, although you can choose another language that is better established in the “scraping world”, such as Python.

You’ll need previous experience with HTML and the HTTP request/response flow.

The basic tools you need for scraping:

require 'nokogiri'
require 'tweakphoeus'

Nokogiri is an HTML parser. We need it to handle the response: the response body is always a string, and we need to convert it into a structure we can query. Tweakphoeus is a gem that helps us make HTTP connections.

I always write a main method where I can see the whole “scraping flow”. I think it’s good practice because it makes errors easier to isolate.

In our case we are going to implement a method called extract_urls, which receives a URL as a string. The following lines show how to initialize the Tweakphoeus client and make a request to the received URL. The response is then saved and parsed into an HTML object, because the raw response body is just a string.

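def extract_urls(url)
  # Create the HTTP client and request the page
  @http = Tweakphoeus::Client.new
  response = @http.get(url)

  # The body is a plain string; parse it into a Nokogiri document
  page = Nokogiri::HTML(response.body)

  [...]

end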

Me

I am Alejandro, a software developer who lives in Spain, where I work on web engineering projects. I spend my spare time studying computer engineering.