Using cURL with PHP

The background

If you write a PHP program that is designed to go and fetch a webpage from the World Wide Web, you soon find out that you are not allowed to, because of a sensible restriction placed on the use of fopen(), simplexml_load_file() and the like. On shared servers this is usually done by switching off PHP's allow_url_fopen setting.

This restriction is absolutely vital to maintain the integrity of a shared server system, and you would have to be ten bob short of a quid to allow the retrieval of all that untrusted material from the web. So an intermediate step is used: you can use cURL (http://uk.php.net/manual/en/ref.curl.php).
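
If you want to see which of the two routes your own server offers, a small check along these lines may help. It is only a sketch: fetch_options() is a name made up for this example, and it assumes the restriction on your host is the allow_url_fopen setting rather than something else.

```php
<?php
// Hypothetical helper (not part of PHP): reports which ways of fetching
// a remote page this PHP installation appears to offer.
function fetch_options()
{
    return array(
        // true when fopen() and friends may open http:// URLs directly
        'url_fopen' => filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN),
        // true when the cURL extension is loaded
        'curl'      => function_exists('curl_init'),
    );
}

foreach (fetch_options() as $name => $available) {
    print $name . ': ' . ($available ? 'available' : 'not available') . "\n";
}
```

If url_fopen comes back as not available and curl as available, you are in exactly the situation this page describes.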

But cURL looks complicated, especially to me, and it just looks like something else to learn. All you need, though, is to have that webpage fetched, so you only need a bit of cURL code to do that. The good news is that it has already been written. The bad news is that it is not obvious where to stick it!

Looking at the cURL and PHP code

So here is an example I have come up with to explain what I do.

This bit of code from the O'Reilly book Learning PHP 5 by David Sklar (Copyright 2004 O'Reilly Media, Inc., ISBN 0-596-00560-1) fetches a nice list of items from a Yahoo News RSS feed.

The PHP portion

(save as rsayahoo.php)

<?php 
$xml = simplexml_load_file('http://rss.news.yahoo.com/rss/oddlyenough');
print "<ul>\n";
foreach ($xml->channel->item as $item){
  print "<li>$item->title</li>\n";
}
print "</ul>";
?> 

The cURL portion

But the simplexml_load_file() call in that program is not allowed out into the wild. So you need an intermediary step, cURL, to go and fetch the page.

So the cURL sample would be:

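(save as geturl.php)

<?php
// Fetch the feed and write it straight into a local file.
$ch = curl_init("http://rss.news.yahoo.com/rss/oddlyenough");
$fp = fopen("example_htmlpage.html", "w");

curl_setopt($ch, CURLOPT_FILE, $fp);   // send the response body to $fp
curl_setopt($ch, CURLOPT_HEADER, 0);   // leave the HTTP headers out of the file
curl_exec($ch);
curl_close($ch);
fclose($fp);
?>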

(The page is fetched and written to the local file example_htmlpage.html, creating and/or overwriting it as necessary.)

Now when you upload this code to your web site folder and run it, you will get just a blank page! But it has done what it should: it has fetched "http://rss.news.yahoo.com/rss/oddlyenough" and written it out as a local file in your web site folder.
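
A blank page does not tell you whether the fetch actually worked. As a sketch of how the same steps could report back, the function below wraps them and returns either true or a message. The name fetch_to_file() and the 30 second timeout are assumptions made for this example, not part of the original code.

```php
<?php
// Hypothetical helper: fetch $url into the local file $path.
// Returns true on success, or a string describing what went wrong.
function fetch_to_file($url, $path)
{
    $fp = fopen($path, "w");
    if ($fp === false) {
        return "fetch failed: could not open $path for writing";
    }

    $ch = curl_init($url);
    if ($ch === false) {
        fclose($fp);
        return "fetch failed: could not start cURL";
    }

    curl_setopt($ch, CURLOPT_FILE, $fp);    // send the response body to $fp
    curl_setopt($ch, CURLOPT_HEADER, 0);    // leave the HTTP headers out
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);  // assumed limit, in seconds

    $ok    = curl_exec($ch);   // false when the transfer failed
    $error = curl_error($ch);  // empty string when nothing went wrong
    curl_close($ch);
    fclose($fp);

    return $ok ? true : "fetch failed: $error";
}
```

Calling fetch_to_file("http://rss.news.yahoo.com/rss/oddlyenough", "example_htmlpage.html") and printing anything other than true would then show the reason on the page instead of leaving it blank.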

Now, "example_htmlpage.html" is a local file and you can fopen() this to your heart's content.

Going back to the original problem, the rsayahoo.php program, you can upload it to your web site folder (where "example_htmlpage.html" has just been written out) with this change to the simplexml_load_file() call:

After the cURL portion has executed

(upload as localrsayahoo.php)

<?php 
$xml = simplexml_load_file('example_htmlpage.html');
print "<ul>\n";
foreach ($xml->channel->item as $item){
  print "<li>$item->title</li>\n";
}
print "</ul>";
?>

This will now print out the news items as you wanted in the first place, just like this:

    "I'll have what Spot is having..." (Reuters)
    Loving your pets to death (Reuters)
    Ryanair boss offers "Fuel Monty" to Polish airline (Reuters)
    Gas station's shocking sign of times (Reuters)
    Activists get PM's ear from poster (Reuters)
    Lords rule in favor of women in key divorce cases (Reuters)
    Zoo apes have taste for red wine (Reuters)
    Well, just as long as they're not going overboard (Reuters)
    Qatari pays $2.75 mln for mobile phone number (Reuters)
    New taxpayers' association says be happy (Reuters)
    18,000 pounds of fireworks seized in N.Y. (AP)
    Fans think crowds sway referees - study (Reuters)
    Michigan bakery sells Hoffa cupcakes (AP)
    Joan Baez joins tree-sitting bid to save LA garden (Reuters)
    Dead horse washes ashore in woman's yard (AP)
    Murder defendant tries to strangle lawyer in court (Reuters)
    SoCal county carriers bitten by 94 dogs (AP)
    "I'll be back", Berlusconi tells world leaders (Reuters)
    Ind. male, in a dress, barred from prom (AP)
    Boo Boo the Chicken dies (AP)
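
Since the titles come from that untrusted material on the web, it may be worth escaping them before they go into your page. The sketch below shows one way to do it with htmlspecialchars(). The function name feed_to_list() and the sample feed fragment are made up for this example; in real use you would pass in the result of simplexml_load_file('example_htmlpage.html').

```php
<?php
// Made-up feed fragment standing in for example_htmlpage.html.
$sample = '<rss><channel>'
        . '<item><title>Cats &amp; dogs &lt;b&gt;reunited&lt;/b&gt;</title></item>'
        . '</channel></rss>';

// Hypothetical helper: build the list with every title escaped for HTML.
function feed_to_list($xml)
{
    $html = "<ul>\n";
    foreach ($xml->channel->item as $item) {
        // Cast to string first, then escape &, < and > for safe display.
        $html .= "<li>" . htmlspecialchars((string) $item->title) . "</li>\n";
    }
    return $html . "</ul>";
}

print feed_to_list(simplexml_load_string($sample));
```

With the sample above, the <b> tags inside the title are shown as text in the browser rather than being treated as markup.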

Adding the cURL and PHP portions together

Although the two programs (the cURL part and the PHP) have been shown as two separate actions, you could of course combine them into one program.

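<?php

// Step 1: fetch the feed with cURL and save it as a local file.
$ch = curl_init("http://rss.news.yahoo.com/rss/oddlyenough");
$fp = fopen("example_homepage.html", "w");

curl_setopt($ch, CURLOPT_FILE, $fp);   // send the response body to $fp
curl_setopt($ch, CURLOPT_HEADER, 0);   // leave the HTTP headers out of the file

curl_exec($ch);
curl_close($ch);
fclose($fp);

// Step 2: load the local copy and print the item titles as a list.
$xml = simplexml_load_file('example_homepage.html');
print "<ul>\n";
foreach ($xml->channel->item as $item){
  print "<li>$item->title</li>\n";
}
print "</ul>";

?>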

Replacement function for simplexml_load_file

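This function uses the CURLOPT_RETURNTRANSFER option, so you don't have to write the results to a local file to retrieve them.

<?php

// Replacement for simplexml_load_file() that fetches the URL via cURL.
// Returns a SimpleXMLElement, or false if the fetch or the parse fails.
function My_simplexml_load_file($URL)
  {
  $ch = curl_init($URL);

  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the page as a string
  curl_setopt($ch, CURLOPT_HEADER, 0);          // body only, no HTTP headers

  $data = curl_exec($ch);

  curl_close($ch);

  // curl_exec() returns false on failure; don't hand that to the parser.
  if ($data === false)
    {
    return false;
    }

  return simplexml_load_string($data);
  }

?>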
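
Because simplexml_load_string() returns false when it cannot parse what it is given, the code that calls My_simplexml_load_file() may want to check for that before looping over the items. Here is a sketch of such a check. The name titles_or_message() and the sample XML are made up for this example, and the two calls at the bottom stand in for the two possible outcomes so that nothing has to be fetched.

```php
<?php
// Hypothetical helper: turns the result of My_simplexml_load_file() into
// a list, or into a short message when the result is false.
function titles_or_message($xml)
{
    if ($xml === false) {
        return "<p>Sorry, the feed could not be fetched.</p>";
    }

    $html = "<ul>\n";
    foreach ($xml->channel->item as $item) {
        $html .= "<li>$item->title</li>\n";
    }
    return $html . "</ul>";
}

// Stand-ins for the two possible outcomes; the XML string is made up.
print titles_or_message(false) . "\n";
print titles_or_message(simplexml_load_string(
    '<rss><channel><item><title>Zoo apes</title></item></channel></rss>'
)) . "\n";
```

With the replacement function in place, the real call would be titles_or_message(My_simplexml_load_file('http://rss.news.yahoo.com/rss/oddlyenough')).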