Tuesday, December 28, 2010

Perl script to GET a web page

Recently I found it necessary to do a little web page scraping and concocted a little Perl script to help me get the job done. The script I've provided below contains a single function that will get the contents of a web page whose URL is passed as the sole parameter to the function. I would likely embed this function within another script to make it easily accessible. I've taken the time to place a lot of comments into the script itself so this post will be short and sweet. You'll need to have the CURL library installed in order for this script to work, and please make sure you are using the most recent version (at the time of writing this post it is 3.12).

#!/usr/bin/perl

use strict;
use warnings;
use WWW::Curl::Easy;

# Script produced by Benjamin Crudo on December 28, 2010.
# Feel free to use it and learn from it without me suing you.

# Note this function expects a single parameter which should be in the form of a URL
# Here is an example of a valid call to this function:
# $web_page = &getWebPage("http://www.google.com");

sub getWebPage 
{
  # Setting up the Curl parameters
  my $curl = WWW::Curl::Easy->new; # create a variable to store the curl object

  # A parameter set to 1 tells the library to include the header in the body output.
  # This is only relevant for protocols that actually have headers preceding the data (like HTTP).
  $curl->setopt(CURLOPT_HEADER, 1);

  # Setting the target URL to retrieve with the passed parameter
  $curl->setopt(CURLOPT_URL, @_);

  # Declaring a variable to store the response from the Curl request
  my $response_body = '';

  # Creating a file handle for CURL to output to, then redirecting our output to the $response_body variable
  open(my $fileb, ">",\$response_body) or die $!;
  $curl->setopt(CURLOPT_WRITEDATA, $fileb);

  # getting the return code from the header to see if the GET was successful
  my $return_code = $curl->perform;

  # capturing the response code from the GET request in the HTTP header, i.e... 200, 404, 500, etc...
  # 200 is success
  my $response_code = $curl->getinfo(CURLINFO_HTTP_CODE);  

  # if the return code is zero than the request was a success
  if ($return_code == 0)
  {    
    # A little debug output to keep you informed
    print ("Success ". $response_code.": ".@_."\n");

    # return whatever was contained on the web page that we just got using a GET
    return $response_body;
  }

  else
  {
    print ("Failure ". $response_code.": ".@_."\n");
  }

  close($fileb); # close the file-handle
}


Monday, December 27, 2010

MySQL Reference Book

Before you begin reading my code please take note of the following conventions.
Wherever I expect you to replace my generic code with your own specific code I have placed square brackets around the term, such as this [replaceMe]. Also please note that it is conventional to place SQL commands in upper-case letters to distinguish those terms from your input.


Ubuntu Commands:
Start MySQL server:
mysql -u [user_name] -p

SQL Commands:
Show all databases on server:
SHOW DATABASES;

Delete a database:
DROP DATABASE [database_name];
Note this function will return the number of tables that were removed. This corresponds to the number of .frm files removed.

Select database for use:
USE [database_name];

Show tables in a database:
SHOW TABLES;

Show a table's schema:
DESCRIBE [table_name];

Add a column to an existing table:
ALTER TABLE [table_name] ADD COLUMN [column_name] varchar(255); 
Note in this case the varchar(255) data type can be replaced by any valid SQL data type. Also further restrictions can be placed after the data type just like in a CREATE statement for foreign keys, NOT NULL, etc...

Java and MySQL
In order to connect your Java program to a MySQL database there are a couple of things you need to handle first. Obviously you need a MySQL database server to be running somewhere, locally or remotely. Next you will need the MySQL driver mysql-connector-java-5.1.14-bin.jar file to be included in the class path of your Java project. Please take care that you don't skip the last step and you will never be able to achieve a connection to the database without the driver properly installed, and most unfortunately the compiler will not return any meaningful errors other than a connection to the database is not possible. Next you'll need some Java code which I provide below and a table in the database which you can query. Here is a sample program that connects to a database called taskManager and performs a query to return all of the data from the singleTask table.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;


public class MysqlDriver {
 
 public static void main (String[] args) {
  MysqlDriver driver = new MysqlDriver();
 }
 
 private String db_path = "localhost:3306";
 private String db_name = "taskManager";
 private String db_user = "root";
 private String db_pass = "secret";
 private String db_url = "jdbc:mysql://"+db_path+"/"+db_name;
 
 public MysqlDriver() {
  
  try {
   Class.forName("com.mysql.jdbc.Driver");
   Connection db = DriverManager.getConnection(db_url, db_user, db_pass);
    System.out.println("Connected to the database");
   
   String statement = "SELECT * FROM singleTask";
   PreparedStatement db_statement = db.prepareStatement(statement);
   ResultSet result_set = db_statement.executeQuery();
   
   while(result_set.next()) {
    String oid = result_set.getString("oid");
    String desc = result_set.getString("description");
    System.out.println("ObjectID: "+oid+"\tDescription: "+desc);
   }
   
  }
  catch(Exception e) {
   System.out.println("We could not connect to the database located at: " + db_path);
  }
 }
 
}


References:
Online Marketing