Build a customizable RSS feed aggregator in PHP
Realize the power of RSS in Ajax and Web 2.0 applications
Summary: RSS (Rich Site Summary, RDF Site Summary, or Really Simple Syndication) has been around since the late 1990s. Over the years, several variants of the RSS format have appeared, and several claims have been made about its ownership. Despite these differences, RSS has remained useful for distributing Web content from one Web site to many others. The popularity of RSS gave rise to a new class of Web software called the feed reader, also known as the feed aggregator. Although there are several commercially available feed aggregators, it's easy to develop your own feed aggregator, which you can integrate with your Web applications. This article provides fully functional PHP code snippets that demonstrate the use of PHP-based server-side functions to develop a customizable RSS feed aggregator. In addition, you can download the complete RSS feed aggregator code from this article and put it to use right away.
Date: 22 Jan 2008
Level: Advanced
RSS provides an elegant and widely accepted format for content providers to encode and syndicate the periodic changes made to their site's content. Content providers then make this content available to the entire Web, or to a subset of the Web, such as an internal syndication inside a company. On the other end of the spectrum are content consumers, who are caught up in the practice of jumping from one site to the other and filtering through page after page to locate just the information they need, even before beginning to digest that information. The magic of RSS feeds eliminates the need for this type of page hopping by enabling consumers to receive information on any selected category in one chunk. In addition, RSS feeds allow consumers to get these information highlights on a selected category from any number of content providers.
As already mentioned, RSS has two parts: syndication and feed aggregation. The main benefit of RSS feeds to consumers is their feed aggregation capability — the focus of this article. You will learn details about developing a customizable RSS feed aggregator that will suit the specific needs of content consumers.
Introduction
As Web-based information sources grow pervasively, content providers and content consumers are presented with several challenges. For content providers, a main challenge is their ability to stand out from the crowded competition and attract as many content consumers as possible to their content. Conversely, for content consumers, their main challenge is their ability to quickly get to the information in myriad categories offered by a multitude of content providers. Content consumers also have the burden of filtering out distractions, such as animated graphics and flashy advertisements, that can obscure the content they're trying to access.
Because of the pervasiveness of the Web in our lives, developers must focus on helping consumers access information quickly and avoid the clutter around desired content. Content providers have the growing need for targeting the delivery of dynamic content to several different consumer mediums, ranging from Web browsers to hand-held mobile devices. As outlined in the beginning of this article, RSS alleviates many concerns with syndicating and aggregating content. If you are new to RSS, you can get some background by reading the RSS-related articles referenced in the Resources section.
RSS brings so much potential to the Internet that it can be put to use in many different (and perhaps unanticipated) ways. Executives of some major software companies have recently readjusted their core Internet missions to tap into the power of RSS. For a discussion of this, see the concise article by Dion Hinchcliffe, the founding Editor-in-Chief of Web 2.0 Journal and AjaxWorld Magazine, listed in the Resources section. In Figure 1, Mr. Hinchcliffe illustrates how RSS enables the Web 2.0 information ecosystem. All the major Web 2.0 building blocks, such as wikis, blogs, news services, aggregation services, mashup services, and search engines, use RSS as the glue that connects them and makes the social computing vision of Web 2.0 a reality:
Figure 1. Hinchcliffe's view: Role of RSS in Web 2.0
There are several versions of the RSS format in use. RSS versions 0.91, 0.92, and 2.0 follow a particular XML format in which RSS feeds are described. RSS version 1.0 uses a subtly different XML format from the rest. This minor difference should be accounted for when developing RSS feed aggregators. You can read about the interesting history of RSS in one of the referenced articles in the Resources section.
Having gained an understanding of the importance and usefulness of the RSS format, let's delve into developing a customizable RSS feed aggregator. We will be using the PHP language for developing our RSS feed aggregator. PHP offers several built-in Web and XML functions, which will speed up development. In addition, the code written in PHP can be easily run in a PHP server such as Zend Core, which includes the Zend Framework. To develop the feed aggregator, you need to have PHP installed on your machine and configured with cURL (Client URL) and XML packages. I recommend that you install the Zend Core PHP server, which is available for free and is easy to install and configure.
Let's take a brief look at the basics of the RSS format.
RSS basics
Several companies and groups, including Apple, Microsoft®, Netscape, Userland, and the RSS-DEV working group, have contributed to where RSS is today. Because many different companies were involved, we ended up with different versions of the RSS format. Most notably, RSS Version 1.0 (developed by the RSS-DEV) uses a slightly different format from all the other versions (0.91, 0.92, and 2.0, which all share a backward-compatible format).
Listing 1 shows a high-level format of RSS Version 1.0. An important thing to note in this format is that the root element is named <rdf> and has one <channel> element and one or more <item> elements as its children. Remember that in this format, the <channel> and <item> elements are siblings.
Listing 1. A skeleton format of RSS version 1.0
<rdf>
<channel>
.....
.....
</channel>
<item>
<title>My Title</title>
<link>My URL</link>
<description>My Description</description>
.....
.....
</item>
</rdf>
Listing 2 shows a high-level format of RSS versions 0.91, 0.92, and 2.0. You can see that in this format, the root element is named <rss> and not <rdf>. The root element <rss> has only one child element named <channel>, and all the <item> elements are children of the <channel> element. This is notably different from RSS 1.0, and the code that parses these RSS formats must account for the difference. For the purposes of this article, we are interested in three children of the <item> element: <title>, <link>, and <description>. Fortunately, these three child elements are the same in all the RSS formats.
Listing 2. A skeleton format of RSS versions 0.91, 0.92, and 2.0
<rss>
<channel>
.....
.....
<item>
<title>My Title</title>
<link>My URL</link>
<description>My Description</description>
.....
.....
</item>
</channel>
</rss>
For detailed RSS specifications, refer to the relevant articles referenced in the Resources section. The following sections will explain the components in our RSS feed aggregator.
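The structural difference between the two skeletons can be checked directly with SimpleXML. The following is a minimal sketch, not the article's parser: the helper name locate_item_parent and the two sample strings are assumptions made for illustration, and the samples reuse the simplified skeletons from Listings 1 and 2 rather than real, namespaced feeds.

```php
<?php
// Hypothetical helper: returns the element whose children are the <item> elements.
function locate_item_parent(SimpleXMLElement $root): SimpleXMLElement
{
    // RSS 1.0 style: <item> elements sit directly under the root.
    if (count($root->item) > 0) {
        return $root;
    }
    // RSS 0.91/0.92/2.0 style: <item> elements sit under <channel>.
    return $root->channel;
}

// Simplified samples modeled on Listings 1 and 2 (illustrative data only).
$rss_1_0_sample = '<rdf><channel><title>C</title></channel>'
    . '<item><title>A</title></item><item><title>B</title></item></rdf>';
$rss_2_0_sample = '<rss><channel><title>C</title>'
    . '<item><title>A</title></item></channel></rss>';

$parent_1_0 = locate_item_parent(simplexml_load_string($rss_1_0_sample));
$parent_2_0 = locate_item_parent(simplexml_load_string($rss_2_0_sample));

echo $parent_1_0->getName(), ': ', count($parent_1_0->item), " item(s)\n";
echo $parent_2_0->getName(), ': ', count($parent_2_0->item), " item(s)\n";
```

Once the parent element is known, the same loop over its <item> children works for either format.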
Functional components
The RSS function described in this article is made up of the following components, explained in the following order:
- Feed reader
- Feed sources input
- Feed aggregator
- Feed results output
Figure 2. Overview of RSS functional components
Observe the following in Figure 2:
- The feed reader component does the bulk of the job and focuses on obtaining feeds provided by a given feed source. A feed source is nothing but a URL at which a particular content provider periodically syndicates the content for a given information category. For instance, a feed source might point to a URL at which the New York Times publishes all its latest news blurbs about the business category/channel using the XML-based RSS format.
- The feed aggregator component takes several user-specified feed sources as input and then it invokes the feed reader component to get all feed items from each customized feed source.
- The feed sources input component defines and reads the details about the user-specified feed sources. The feed source details can be provided in the form of a string stored in system memory, via an input file, or as records in a database.
- The feed results output component stores the aggregated RSS feed item results received from a particular feed source. It can store the results as a string in system memory, into a file, or into database tables.
We are now ready to focus on the inner workings of these individual components.
Feed reader
The feed reader component is the core engine that does the necessary network communications with the feed sources and parses the XML-based RSS feeds returned by the feed sources. It primarily uses the following three simple, yet powerful, features of a PHP environment:
- cURL
- SimpleXML
- PHP arrays
SimpleXML is another extension of PHP. As its name implies, this extension makes it simple to work with XML, whether it be reading from or writing to an XML document. In PHP Versions 5 and above, this extension is enabled by default.
Arrays are part of the base PHP language and they make the collection data structures a breeze to work with. We will be using PHP arrays to collect RSS feed item results.
Source code for the feed reader component is in a PHP file named rss_feed_reader.php. It is composed of the following three custom functions:
- get_rss_feeds
- perform_curl_operation
- parse_rss_feed_xml
As shown in Listing 3, get_rss_feeds is the main business logic function that gets called from the feed aggregator component. This function accepts the following three referenced values as input from the caller:
- RSS provider name
- RSS provider URL
- Maximum number of RSS feed items that the caller wants to obtain from the given feed provider
After fetching the RSS content from the provider URL, the function creates three empty arrays to hold the three values (title, link, description), which we will be collecting from every feed item in the received RSS feed content. You might recall from Listing 1 that they are the child elements inside the <item> elements in the received RSS feed content. Then, the function's logic passes the received RSS content string and the three arrays to another function, which parses the XML content and collects the RSS feed items. When the result of parsing the RSS feed content is false, the received content could not be parsed or contained no feed items. In that case, this function returns an empty array to the caller. When the result of parsing is true, the feed items were successfully parsed and the three arrays now contain the title, URL, and description of every received feed item. The function then creates a new result array to be returned to the caller. In the first five indices of the result array (0 through 4), this function stores the provider name, total number of feed items received, title array, URL array, and the description array, respectively. Finally, it returns this result array to the caller.
Listing 3. get_rss_feeds function
function get_rss_feeds(& $rss_provider_name, & $rss_provider_url,
& $max_rss_items_required) {
// Check if the max_rss_items_required is 0
if ($max_rss_items_required <= 0) {
// Return an empty array.
$empty_array = array();
return($empty_array);
} // End of if ($max_rss_items_required <= 0)
// Let us go ahead and fetch the RSS contents from the given RSS provider.
$received_rss_feeds = perform_curl_operation($rss_provider_url);
// At times, if the XML data is not properly utf8 encoded,
// it possibly could fail in parsing. Let us encode it properly.
$received_rss_feeds = utf8_encode($received_rss_feeds);
// Is it empty?
if (empty($received_rss_feeds)) {
// Return an empty array.
$empty_array = array();
return($empty_array);
} // End of if (empty($received_rss_feeds))
// We have a non-empty result from the RSS feed provider.
// Create three empty arrays to hold the values from the received rss feed items.
$rss_feed_title_array = array();
$rss_feed_url_array = array();
$rss_feed_description_array = array();
// We can now parse the individual RSS feed items.
$parser_result = parse_rss_feed_xml($received_rss_feeds, $max_rss_items_required,
$rss_feed_title_array, $rss_feed_url_array, $rss_feed_description_array);
// Check if we were able to parse the RSS feed XML content.
if ($parser_result == true) {
// We have successfully parsed the RSS feed results.
// Create an array and fill it with the results as
// described in the function description comments above.
$result_array = array();
// Send the rss provider name.
$result_array[0] = $rss_provider_name;
// Tell how many rss feed items are being returned.
$result_array[1] = sizeof($rss_feed_title_array);
// Send the array containing different RSS feed titles.
$result_array[2] = $rss_feed_title_array;
// Send the array containing different RSS feed URLs.
$result_array[3] = $rss_feed_url_array;
// Send the array containing different RSS feed descriptions.
$result_array[4] = $rss_feed_description_array;
// Return the result array now.
return($result_array);
} else {
// We were not successful in parsing the RSS feed items.
// Return an empty array as the result.
$empty_array = array();
return($empty_array);
} // End of if ($parser_result == true)
} // End of function get_rss_feeds
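One caveat about Listing 3: utf8_encode treats its input as ISO-8859-1, so calling it unconditionally re-encodes feeds that are already UTF-8 and garbles their non-ASCII characters. The function is also deprecated as of PHP 8.2. The sketch below shows one possible alternative, not the article's method: the helper name ensure_utf8 and the ISO-8859-1 fallback are assumptions, and it relies on the iconv extension being available.

```php
<?php
// Hypothetical helper: converts to UTF-8 only when the input is not already valid UTF-8.
function ensure_utf8(string $text, string $assumed_source_encoding = 'ISO-8859-1'): string
{
    // With the /u modifier, preg_match() fails on byte sequences that are not valid UTF-8.
    if (preg_match('//u', $text) === 1) {
        return $text;
    }
    // Assumption: anything that is not valid UTF-8 is in the assumed source encoding.
    $converted = iconv($assumed_source_encoding, 'UTF-8', $text);
    // iconv() returns false on failure; fall back to an empty string,
    // which get_rss_feeds already treats as "nothing received".
    return ($converted === false) ? '' : $converted;
}
```

With a helper like this, the utf8_encode call could be replaced by ensure_utf8($received_rss_feeds), leaving well-formed UTF-8 feeds untouched.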
As shown in Listing 4, the perform_curl_operation function uses the PHP cURL library to perform an HTTP GET operation that fetches the contents of a given remote URL. This function accepts a referenced remote URL value as an input argument from the caller. In our case, the caller is the get_rss_feeds function, which we discussed previously. The logic in the perform_curl_operation function initializes a new cURL session. Then it sets various cURL options: the remote URL, an option not to include the HTTP headers in the response, an option to follow redirects when a Location HTTP header is present, and an option to instruct cURL to return the HTTP response as a string from the curl_exec function. Then, it calls the curl_exec function, which connects to the remote URL and fetches the RSS feed contents available at that time. The curl_exec function blocks until the HTTP operation is completed. Finally, the function closes the cURL session and returns the received RSS feed content to the caller, or an empty string if the fetch failed.
Listing 4. perform_curl_operation function
function perform_curl_operation(& $remote_url) {
$remote_contents = "";
$empty_contents = "";
// Initialize a cURL session and get a handle.
$curl_handle = curl_init();
// Do we have a cURL session?
if ($curl_handle) {
// Set the required CURL options that we need.
// Set the URL option.
curl_setopt($curl_handle, CURLOPT_URL, $remote_url);
// Set the HEADER option. We don't want the HTTP headers in the output.
curl_setopt($curl_handle, CURLOPT_HEADER, false);
// Set the FOLLOWLOCATION option. We will follow if location header is present.
curl_setopt($curl_handle, CURLOPT_FOLLOWLOCATION, true);
// Instead of using WRITEFUNCTION callbacks, we are going to receive the
// remote contents as a return value for the curl_exec function.
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, true);
// Try to fetch the remote URL contents.
// This function will block until the contents are received.
$remote_contents = curl_exec($curl_handle);
// Do the cleanup of CURL.
curl_close($curl_handle);
// Check the CURL result now.
if ($remote_contents != false) {
return($remote_contents);
} else {
return($empty_contents);
} // End of if ($remote_contents != false)
} else {
// Unable to initialize cURL.
// Without it, we can't do much here.
return($empty_contents);
} // End of if ($curl_handle)
} // End of function perform_curl_operation
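Because curl_exec blocks until the transfer finishes, one slow feed source can stall the whole aggregation run. The following sketch shows how timeouts and basic error reporting might be added. It is a hypothetical variant, not the article's code: the function name fetch_url_with_timeout, the 10-second default, and the redirect limit of 5 are assumptions.

```php
<?php
// Hypothetical variant of perform_curl_operation with timeouts and error reporting.
function fetch_url_with_timeout(string $remote_url, int $timeout_seconds = 10): string
{
    $curl_handle = curl_init();
    if ($curl_handle === false) {
        return '';
    }
    curl_setopt_array($curl_handle, [
        CURLOPT_URL            => $remote_url,
        CURLOPT_HEADER         => false,            // No HTTP headers in the output.
        CURLOPT_FOLLOWLOCATION => true,             // Follow Location redirects...
        CURLOPT_MAXREDIRS      => 5,                // ...but not indefinitely.
        CURLOPT_RETURNTRANSFER => true,             // Return the body as a string.
        CURLOPT_CONNECTTIMEOUT => $timeout_seconds, // Limit on establishing the connection.
        CURLOPT_TIMEOUT        => $timeout_seconds, // Limit on the whole transfer.
    ]);
    $remote_contents = curl_exec($curl_handle);
    if ($remote_contents === false) {
        // Record why the fetch failed, then report "nothing received".
        error_log('cURL error: ' . curl_error($curl_handle));
        $remote_contents = '';
    }
    // The handle is released when $curl_handle goes out of scope.
    return $remote_contents;
}
```

Like Listing 4, this returns an empty string on failure, so get_rss_feeds could call it without other changes.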
As shown in Listing 5, the parse_rss_feed_xml function is responsible for getting the individual feed items from the received RSS feed content. This function accepts all its input arguments as referenced values. Its input arguments include the received RSS feed content string, the maximum number of feed items that the user wishes to receive, and the three arrays in which the title, URL, and description of all the feed items will be returned to the caller. If you have not experienced the simplicity of the PHP SimpleXML extension, you will get to see it in this function. SimpleXML lets you manipulate the XML structure as a PHP object structure, which avoids much of the complexity of other XML parsing techniques.
The very first step is simply to load the received RSS feed content string and get an equivalent object structure. Before we can start parsing, we have to determine whether the received RSS feed content is encoded using RSS Version 1.0 or any of the other RSS versions. One way to do this is by checking whether the <item> element is a child of the root element or a child of the <channel> element. You may want to refer to the RSS basics section to recall the subtle format differences among the different versions of RSS. After the RSS format version is determined and the validity of the received XML content is confirmed, the function iterates over all the <item> elements (actually PHP objects in this case) and reads the title, link, and description fields. Each of these three values is added to its respective array, which was passed as a referenced input argument to this function. The iteration stops once the maximum number of RSS feed items required has been collected or the feed has no more items. The function returns true if it collected at least one feed item. If it couldn't parse even a single RSS feed item, the function returns false.
Listing 5. parse_rss_feed_xml function
function parse_rss_feed_xml(& $received_rss_feeds,
& $max_rss_items_required, & $rss_feed_title_array,
& $rss_feed_url_array, & $rss_feed_description_array) {
/*
We will tap into the elegance of the PHP SimpleXML API to
parse these RSS feeds encoded in XML format.
There are multiple versions of RSS out there namely 0.91, 0.92, 1.0 and 2.0
The basic difference between these versions comes down to one of the
following two formats.
1) <rss><channel><item>...</item><item>...</item><item>...</item></channel></rss>
2) <rdf><channel>...</channel><item>...</item><item>...</item><item>...</item></rdf>
In format 1, <item> elements are the children of the <channel> element.
In format 2, <item> elements are direct children of the root element <rdf>.
In other words, in format 2, <item> elements are siblings of the <channel> element.
RSS version 1.0 uses format 2, whereas all the other versions follow format 1.
In both these formats, we are interested only in the children between
<item>...</item>.
Our parsing logic here should handle both of these formats.
*/
// To begin with load the XML string to get a SimpleXML object representation.
$xml = simplexml_load_string($received_rss_feeds);
// Is it a valid XML document.
if ((is_object($xml) == false) || (sizeof($xml) <= 0)) {
// XML parsing error. Return now.
return(false);
} // End of if ((is_object($xml) == false) ...
// Now we have to determine, if we have the <item> elements as the
// children of the <channel> element i.e. Format 1 above or
// if we have the <item> elements as the direct children of the
// <rdf> root element i.e. Format 2 above.
$obj1 = $xml->item;
if ((is_object($obj1) == false) || (sizeof($obj1) <= 0)) {
// <item> elements are not direct children of the document root element.
// In that case, it is not format 2. It should be as in format 1.
// Move to the <channel> element so that will be our new root.
$xml = $xml->channel;
} // End of if ((is_object($obj1) == false) ...
// Check for XML validity one more time before we parse this.
if ((is_object($xml) == false) || (sizeof($xml) <= 0)) {
// XML parsing error. Return now.
return(false);
} // End of if ((is_object($xml) == false) ...
// Initialize a variable to count the <item> elements retrieved.
$count_of_rss_items_retrieved = 0;
// Stay in a loop and collect the details from the <item> elements.
foreach ($xml->item as $item) {
// At this stage, we have access to the <item> elements one at a time.
// We don't know how many <item> elements are there.
// Let us read the title, link and description elements.
$rss_feed_title = trim(strval($item->title));
$rss_feed_url = trim(strval($item->link));
$rss_feed_description = trim(strval($item->description));
// Let us now add these values to the array references we have.
array_push($rss_feed_title_array, $rss_feed_title);
array_push($rss_feed_url_array, $rss_feed_url);
array_push($rss_feed_description_array, $rss_feed_description);
// We have to filter out specific number of <item> elements
// as required by the user. Let us try to do that now.
$count_of_rss_items_retrieved++;
if ($count_of_rss_items_retrieved >= $max_rss_items_required) {
// Exit from this loop now.
break;
} // End of if ($count_of_rss_items_retrieved >= $max_rss_items_required)
} // End of foreach ($xml->item as $item)
if ($count_of_rss_items_retrieved > 0) {
// At last, it turned out to be fruitful.
return(true);
} else {
// All the hard work didn't yield anything.
// Better luck next time.
return(false);
} // End of if ($count_of_rss_items_retrieved > 0)
} // End of function parse_rss_feed_xml
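Listing 5 returns false when the XML cannot be loaded, but it does not say why. As a hedged sketch of one way to find out, libxml's internal error buffer can be enabled around simplexml_load_string. The helper name load_feed_xml and the shape of its return value are assumptions for illustration only.

```php
<?php
// Hypothetical helper: parses XML and returns the document plus any libxml error messages.
function load_feed_xml(string $xml_string): array
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous_setting = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($xml_string);
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    // Empty the buffer and restore the caller's error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous_setting);
    return [
        'document' => ($xml === false) ? null : $xml,
        'errors' => $messages,
    ];
}

// Illustrative inputs: one well-formed feed skeleton and one with an unclosed <item>.
$good = load_feed_xml('<rss><channel><item><title>A</title></item></channel></rss>');
$bad = load_feed_xml('<rss><channel><item></channel></rss>');

echo 'Well-formed sample errors: ', count($good['errors']), "\n";
echo 'Malformed sample errors: ', count($bad['errors']), "\n";
```

The collected messages could be logged next to the provider name so that a broken feed source is easy to identify.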
We've finished looking at all the major tasks performed by the feed reader component; let's move on to the feed sources input component.
Feed sources input
Feed sources input is the simplest component in this system. Its job is to get the list of feed sources that the user has customized. This component has one function, and it is called by the feed aggregator at the very beginning of the program invocation. As discussed at the beginning of this article, feed sources information can be specified in the program data structures, in a database table, or in a file. In our case, we expect the feed sources to be supplied through an XML file. The logic in the feed sources input component simply reads from a file, whose name is passed as a function input argument. It reads the XML file contents and returns a string to the caller. As mentioned earlier, this component is combined in the same source file (rss_feed_aggregator.php) along with the feed aggregator component. Listing 6 shows the trivial logic involved in reading the contents of the feed sources input file:
Listing 6. get_list_of_rss_feed_sources function
function get_list_of_rss_feed_sources($input_xml_file) {
//Read the XML contents from the input file.
file_exists($input_xml_file) or die("Could not find file " . $input_xml_file);
$xml_string_contents = file_get_contents($input_xml_file);
// Return the XML contents now to the caller.
return($xml_string_contents);
} // End of function get_list_of_rss_feed_sources
The contents of the feed sources input file should contain one or more XML elements that provide information about the feed provider name, feed provider URL, and the maximum number of RSS feed items that the user wants to receive from the feed provider. Listing 7 shows the format of the feed sources input XML file:
Listing 7. Feed sources input XML file format
<?xml version="1.0" encoding="UTF-8"?>
<ListOfRssFeedSources>
<!-- This is the data set for RSS Feed Provider 1 -->
<RssFeedSourceInfo>
<rssFeedProviderName>Barron's: Markets</rssFeedProviderName>
<rssFeedProviderUrl>
http://online.barrons.com/xml/rss/3_7517.xml
</rssFeedProviderUrl>
<maximumRssItemsToBeReturned>5</maximumRssItemsToBeReturned>
</RssFeedSourceInfo>
<!-- There can be more RSS Feed Provider elements defined here. -->
</ListOfRssFeedSources>
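Because this file is edited by hand, it can be useful to check each entry before fetching anything. The sketch below reads the format of Listing 7 from a string and keeps only entries with a well-formed URL and a positive item limit. The helper name read_feed_sources, the returned array keys, and the second "Broken entry" in the sample are assumptions made for illustration.

```php
<?php
// Hypothetical helper: reads feed sources from an XML string and keeps only usable entries.
function read_feed_sources(string $sources_xml): array
{
    $xml = simplexml_load_string($sources_xml);
    if ($xml === false) {
        return [];
    }
    $sources = [];
    foreach ($xml->RssFeedSourceInfo as $info) {
        // The URL may be wrapped in whitespace, as in Listing 7, so trim it first.
        $url = trim((string) $info->rssFeedProviderUrl);
        $max_items = (int) trim((string) $info->maximumRssItemsToBeReturned);
        if (filter_var($url, FILTER_VALIDATE_URL) === false || $max_items <= 0) {
            continue; // Skip entries with a malformed URL or a non-positive limit.
        }
        $sources[] = [
            'name' => trim((string) $info->rssFeedProviderName),
            'url' => $url,
            'max_items' => $max_items,
        ];
    }
    return $sources;
}

$sample = <<<XML
<ListOfRssFeedSources>
  <RssFeedSourceInfo>
    <rssFeedProviderName>Barron's: Markets</rssFeedProviderName>
    <rssFeedProviderUrl>
      http://online.barrons.com/xml/rss/3_7517.xml
    </rssFeedProviderUrl>
    <maximumRssItemsToBeReturned>5</maximumRssItemsToBeReturned>
  </RssFeedSourceInfo>
  <RssFeedSourceInfo>
    <rssFeedProviderName>Broken entry</rssFeedProviderName>
    <rssFeedProviderUrl>not a url</rssFeedProviderUrl>
    <maximumRssItemsToBeReturned>3</maximumRssItemsToBeReturned>
  </RssFeedSourceInfo>
</ListOfRssFeedSources>
XML;

$sources = read_feed_sources($sample);
echo count($sources), " usable feed source(s)\n";
```

Converting the item limit to an integer here also means later comparisons against it do not depend on PHP's string-to-number conversion.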
Feed aggregator
Feed aggregator is a wrapper component that wraps the feed reader component to meet the main objective of this article: to create a customizable RSS feed aggregation function. It uses the SimpleXML PHP extension, which was discussed in the previous section. In addition, it uses a set of custom logic to do the aggregation of RSS feeds. The source code for this component is in the PHP file rss_feed_aggregator.php.
As shown in Listing 8, the feed aggregator component has one function named aggregate_rss_feeds. It takes a function argument of an input filename, in which details about the RSS feed sources are specified. If it is called with no input argument, then it will use a default filename called rss_feed_sources.xml. At first, it calls the feed sources input component to get the string-formatted XML structure, in which details about the RSS feed sources are specified. Then it uses the SimpleXML extension to convert the string-formatted XML contents into a PHP object. Next, it iterates over each of the feed sources that we have and retrieves the feed provider name, feed provider URL, and the maximum number of RSS feed items that the user wants to receive from that provider. It then calls one of the feed reader component functions named get_rss_feeds, which is explained in an earlier section. If it gets a successful result of RSS feed items from a provider, then it calls the feed results output component. Once all the feed sources are iterated, this function ends by printing a summary of the feed aggregation activity.
Listing 8. aggregate_rss_feeds function
function aggregate_rss_feeds($input_xml_file = RSS_FEED_SOURCES_FILE_NAME) {
// Declare a variable to track the current
// RSS feed source being processed.
$feed_source_sequence_number = 0;
// Let us get the list of RSS feed sources.
// In our case, we will read them from an input file.
$xml_string_contents = get_list_of_rss_feed_sources($input_xml_file);
/*
We will tap into the elegance of the PHP SimpleXML API to
parse these RSS feeds encoded in XML format.
*/
// To begin with, load the XML string to get a SimpleXML object representation.
$xml = simplexml_load_string($xml_string_contents);
// Is it a valid XML document?
// Use a strict comparison: simplexml_load_string returns false on failure.
if ($xml === false) {
print ("Sorry. Your RSS feed sources input file contains invalid data.\n");
// XML parsing error. Return now.
return;
} // End of if ($xml === false)
print ("\n");
/*
Stay in a loop and get the RSS feeds from each source.
The document root element of the input xml file is <ListOfRssFeedSources>
Under the root element, we will have one or more blocks of data with the
following format.
<RssFeedSourceInfo>
<rssFeedProviderName>....</rssFeedProviderName>
<rssFeedProviderUrl>....</rssFeedProviderUrl>
<maximumRssItemsToBeReturned>....</maximumRssItemsToBeReturned>
</RssFeedSourceInfo>
We are going to iterate over all the <RssFeedSourceInfo> elements.
*/
foreach ($xml->RssFeedSourceInfo as $feed_source) {
// Read the details about the next feed source from the input file.
$feed_source_sequence_number++;
$rss_provider_name = trim(strval($feed_source->rssFeedProviderName));
$rss_provider_url = trim(strval($feed_source->rssFeedProviderUrl));
$max_rss_items_required =
trim(strval($feed_source->maximumRssItemsToBeReturned));
print ("Getting RSS feeds from $rss_provider_name ...\n");
// Go and get the RSS feeds now from this feed source.
$rss_feeds_result_array =
get_rss_feeds($rss_provider_name, $rss_provider_url, $max_rss_items_required);
if (empty($rss_feeds_result_array) == false) {
// We will store only if we receive one or more RSS feed results.
// The result array format is explained in the store function called below.
store_rss_feed_results($feed_source_sequence_number,
$rss_feeds_result_array);
} // End of if (empty($rss_feeds_result_array) == false)
} // End of foreach ($xml->RssFeedSourceInfo as $feed_source)
print ("\nFinished getting RSS feeds from $feed_source_sequence_number " .
"feed sources.\n\n");
print ("You can view the received feed items in the ./feed_results directory.\n\n");
print ("Feeds from each active feed source are stored in separate files.\n\n");
print ("These files are named NNN_rss_feed_items.txt, where NNN corresponds to\n" .
"the sequence number of the order in which the feed source is\n" .
"listed in your $input_xml_file file.\n");
} // End of function aggregate_rss_feeds
Feed results output
The feed results output component is used to store RSS feed results. This component has one function, which is called by the feed aggregator, when all the RSS feed items are parsed from the received RSS feed content. As discussed at the beginning of this article, RSS feed results can be sent to a browser or another stand-alone program, or they may be stored in the program data structures, in a database table, or in a file. In our case, we are going to store the feed results received from each RSS feed provider in their own file. All the result files are stored in a subdirectory named feed_results. This subdirectory will be automatically created under the directory from where the feed aggregator program is run.
As shown in Listing 9, the feed results output component's single function, named store_rss_feed_results, takes two function input arguments. The first argument is a file sequence number, which will be used to form the result filename. The format of the result filename is NNN_rss_feed_items.txt, where NNN will be substituted with the value of the first function argument. This function takes a second argument, which is a PHP nested array containing all the RSS feed items received from a particular RSS feed provider. This result array will have five elements and each of those array elements will contain one of these values:
- a[0] = RSS feed provider name
- a[1] = Number of feed items received from the feed provider
- a[2] = (rss_feed_title_array) An array of RSS feed item titles
- a[3] = (rss_feed_url_array) An array of RSS feed item URLs
- a[4] = (rss_feed_description_array) An array of RSS feed item descriptions
Contents in each index of the three arrays a[2], a[3], and a[4] put together will provide all the information related to one particular RSS feed item present in the received RSS XML. For example, information in rss_feed_title_array[0], rss_feed_url_array[0], and rss_feed_description_array[0] combined together corresponds to the first RSS feed item in the received RSS XML content.
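To make the parallel-array layout concrete, the following sketch regroups a result array into one record per feed item. It is illustrative only: the helper name result_array_to_items, the record keys, and the sample data are assumptions, not part of the article's code.

```php
<?php
// Hypothetical helper: turns the five-slot result array into one record per feed item.
function result_array_to_items(array $result_array): array
{
    list($provider_name, $item_count, $titles, $urls, $descriptions) = $result_array;
    $items = [];
    for ($i = 0; $i < $item_count; $i++) {
        // Index $i in each of the three arrays refers to the same feed item.
        $items[] = [
            'provider' => $provider_name,
            'title' => $titles[$i],
            'url' => $urls[$i],
            'description' => $descriptions[$i],
        ];
    }
    return $items;
}

// Made-up sample data in the a[0]..a[4] layout described above.
$sample_result = [
    'Example Provider',
    2,
    ['First title', 'Second title'],
    ['http://example.com/1', 'http://example.com/2'],
    ['First description', 'Second description'],
];
$items = result_array_to_items($sample_result);
echo $items[1]['title'], "\n";
```

A per-item record like this can be easier to hand to a template or a database insert than three separate arrays.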
First, the logic in this function creates the feed_results subdirectory, if it doesn't already exist. Next, it creates a file with a unique name for a given RSS feed provider. Then it iterates over the result array and writes the details of every feed item in the result array to the file. As you know by now, an RSS feed item includes the feed title, the URL for the full story about the feed, and a brief description of the feed. With these received feed items, content consumers can quickly go through the titles and descriptions of a selected feed channel. For the feed items that interest them, they can follow the URL to fetch the full story. Such is the benefit derived from the use of RSS feed aggregation.
Listing 9. store_rss_feed_results function
function store_rss_feed_results($file_sequence_number, $result_array) {
    // Let us first check if a subdirectory named "feed_results" exists.
    if (file_exists(RSS_FEED_RESULTS_DIRECTORY) == false) {
        // Directory doesn't exist. Create it now.
        mkdir(RSS_FEED_RESULTS_DIRECTORY);
    } // End of if (file_exists(RSS_FEED_RESULTS_DIRECTORY) == false)
    // Form the file name.
    $result_file_name = sprintf("%s/%03d%s", RSS_FEED_RESULTS_DIRECTORY,
        $file_sequence_number, RSS_FEED_RESULTS_FILE_NAME_SUFFIX);
    // If this file already exists from previous runs, simply delete it.
    // We will overwrite it with the latest feed data.
    if (file_exists($result_file_name) == true) {
        unlink($result_file_name);
    }
    // Create and open the file.
    $handle = fopen($result_file_name, FILE_CREATE_WRITE_FLAG);
    if ($handle == false) {
        // File creation failed. Return now.
        return;
    }
    // We can start writing the received RSS feeds into this file.
    // Write the Feed provider sequence number.
    $feed_provider_number = FEED_PROVIDER_SEQUENCE_NUMBER .
        $file_sequence_number . NEW_LINE;
    fwrite($handle, $feed_provider_number);
    // Write the Feed provider name.
    $feed_provider_name = RSS_FEED_PROVIDER_NAME . $result_array[0] . NEW_LINE;
    fwrite($handle, $feed_provider_name);
    // Write the number of feed items received.
    $number_of_received_rss_feeds = RECEIVED_RSS_FEEDS_CNT .
        $result_array[1] . NEW_LINE;
    fwrite($handle, $number_of_received_rss_feeds);
    $rss_feed_title_array = $result_array[2];
    $rss_feed_url_array = $result_array[3];
    $rss_feed_description_array = $result_array[4];
    // Stay in a loop and write the title, URL and Description.
    for ($cnt = 0; $cnt < sizeof($rss_feed_title_array); $cnt++) {
        $feed_item_separator = FEED_ITEM_SEPARATOR_LINE . NEW_LINE;
        fwrite($handle, $feed_item_separator);
        $feed_item_sequence_number = FEED_ITEM_SEQUENCE_NUMBER .
            ($cnt + 1) . NEW_LINE;
        fwrite($handle, $feed_item_sequence_number);
        $feed_item_title = FEED_ITEM_TITLE .
            $rss_feed_title_array[$cnt] . NEW_LINE;
        fwrite($handle, $feed_item_title);
        $feed_item_url = FEED_ITEM_URL .
            $rss_feed_url_array[$cnt] . NEW_LINE;
        fwrite($handle, $feed_item_url);
        $feed_item_description = FEED_ITEM_DESCRIPTION . NEW_LINE .
            $rss_feed_description_array[$cnt] . NEW_LINE;
        fwrite($handle, $feed_item_description);
    } // End of for ($cnt = 0; $cnt < sizeof($rss_feed_title_array); $cnt++)
    $feed_item_separator = FEED_ITEM_SEPARATOR_LINE . NEW_LINE;
    fwrite($handle, $feed_item_separator);
    fclose($handle);
} // End of function store_rss_feed_results
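The file name formed near the top of Listing 9 relies on constants defined elsewhere in the script. As a rough illustration of what the sprintf format produces, the sketch below uses stand-in constants: the directory name feed_results comes from the article, but the suffix value and the build_result_file_name() helper are assumptions made for this example only.

```php
<?php
// Stand-in constants for this sketch. "feed_results" is the directory name
// the article uses; the suffix is a placeholder, not the real value.
define('SKETCH_RESULTS_DIRECTORY', 'feed_results');
define('SKETCH_RESULTS_FILE_NAME_SUFFIX', '_feed.txt');

// Hypothetical helper that mirrors the file name logic in Listing 9.
function build_result_file_name($file_sequence_number) {
    // %03d left-pads the sequence number with zeros to three digits.
    return sprintf("%s/%03d%s", SKETCH_RESULTS_DIRECTORY,
        $file_sequence_number, SKETCH_RESULTS_FILE_NAME_SUFFIX);
}

echo build_result_file_name(7), "\n";   // prints feed_results/007_feed.txt
echo build_result_file_name(42), "\n";  // prints feed_results/042_feed.txt
```

Because the sequence number is zero-padded, an alphabetical directory listing shows the files in feed provider order for up to 999 providers.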
This completes the walkthrough of all of the functional components we have in this program. Hopefully, you have gained a better understanding of how these components work together. Now let's see the RSS feed aggregator at work.
Putting the RSS feed aggregator to work
From the Download section, you can download a compressed file containing the following files:
- rss_feed_aggregator\rss_feed_reader.php
- rss_feed_aggregator\rss_feed_aggregator.php
- rss_feed_aggregator\rss_feed_sources.xml
To put the RSS feed aggregator to work, you have to use the following command syntax (note that the last token at the end of the command syntax below is an optional command-line argument):
php -f rss_feed_aggregator.php <Feed sources input XML filename>
As discussed previously, if you don't give a command-line argument, the application uses the default feed sources input XML file (rss_feed_sources.xml). If you used the default file and everything worked correctly, you should see the results shown in Figure 3:
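The fallback to the default input file can be pictured with a few lines of PHP. This is only a sketch of the behavior described above, not the code from the download: DEFAULT_FEED_SOURCES_FILE and resolve_feed_sources_file() are names invented for the example.

```php
<?php
// Constant name chosen for this sketch; the file name is the default
// mentioned in the article.
define('DEFAULT_FEED_SOURCES_FILE', 'rss_feed_sources.xml');

// Hypothetical helper: pick the feed sources file from an argument list
// shaped like PHP's $argv (script name first, user arguments after it).
function resolve_feed_sources_file(array $arguments) {
    if (isset($arguments[1]) && $arguments[1] !== '') {
        return $arguments[1];          // the user supplied a file name
    }
    return DEFAULT_FEED_SOURCES_FILE;  // nothing supplied: use the default
}

// No argument given: falls back to the default file.
echo resolve_feed_sources_file(array('rss_feed_aggregator.php')), "\n";
// One argument given: that file is used instead.
echo resolve_feed_sources_file(array('rss_feed_aggregator.php', 'my_feeds.xml')), "\n";
```

In a command-line script, the same helper could be called with the built-in $argv array.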
Figure 3. Results from running the RSS feed aggregator
Inside the directory from where you ran the RSS feed aggregator, you should see a new subdirectory named feed_results. There should be several files in that subdirectory containing the RSS feed items received from each of the RSS feed providers, which the user has specified in the feed sources input XML file.
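To check the output without opening the directory by hand, a short script along the following lines could list the generated files. It is a sketch under the assumption that the results sit in a feed_results subdirectory of the current directory; list_feed_result_files() is a hypothetical helper, not part of the download.

```php
<?php
// Hypothetical helper: return the regular files in a results directory,
// sorted by name. Returns an empty array if the directory is missing.
function list_feed_result_files($directory) {
    if (!is_dir($directory)) {
        return array();
    }
    $entries = glob($directory . '/*');
    if ($entries === false) {
        return array();
    }
    // Keep regular files only, then reindex and sort by path.
    $files = array_values(array_filter($entries, 'is_file'));
    sort($files);
    return $files;
}

// Print each result file with its size in bytes.
foreach (list_feed_result_files('feed_results') as $file) {
    echo $file, ' (', filesize($file), " bytes)\n";
}
```

If the aggregator has not been run yet, the loop simply prints nothing.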
Conclusion
RSS has been around for several years. Its popularity is now surging due to the advent of Web 2.0 technologies such as wikis, blogs, mashups, social networking portals, and other information aggregation services. This article has shown the simplicity of the RSS format, which makes it a great choice for information integration with other emerging Web technologies. Because the Web is all about information, RSS will continue to play a central role in determining how that information is syndicated and disseminated in powerful and useful ways.
This article has given you details on developing a feed aggregator with the help of fully functional PHP scripts. You can use the PHP source code provided in this article in multiple ways: as a stand-alone tool, as a shared library to be used in an existing PHP server-side program, or as a SOAP/REST Web service function to participate in an enterprise Service-Oriented Architecture (SOA).
Many great things in (Web) life come in simple forms. RSS is simply one of them. No pun intended!