The ProgrammersTalk Community
  #1 (permalink)  
Old 05-15-2008, 12:47 PM
MrPickle
Full Programmer
Join Date: Nov 2007
Location: England, Lincolnshire
Posts: 247
Pulling Data from a web page

How would you pull data from a web page? E.g., if there's something in between some tags with id foo, how would I pull the data from in between those tags?

  #2 (permalink)  
Old 05-15-2008, 08:29 PM
ccoonen
PT Staff
Join Date: Jun 2007
Location: Wisconsin
Posts: 306
Run regular expressions to parse out the data you want. This might help you out:

Basic PHP Web Scraping Script Tutorial - Oooff.com
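
As a rough sketch of the regular-expression approach suggested above (not taken from the linked tutorial), the snippet below pulls the text between the opening and closing tags of an element whose id is foo. The sample markup, the variable names, and the use of a div are assumptions for illustration; a real script would first fetch the page, for example with file_get_contents().

```php
<?php
// Hypothetical sample page; a real script would fetch this from a URL.
$html = '<html><body><div id="foo">Hello from foo</div><div id="bar">Other</div></body></html>';

// Match an opening tag that carries id="foo" and capture everything up to
// the first closing tag with the same name. The "s" flag lets "." span
// newlines, and the lazy ".*?" stops at the first closing tag.
$pattern = '/<(\w+)[^>]*\bid=["\']foo["\'][^>]*>(.*?)<\/\1>/is';

$foo_contents = null;
if (preg_match($pattern, $html, $matches)) {
    // $matches[1] is the tag name, $matches[2] is the captured contents.
    $foo_contents = trim($matches[2]);
}

echo $foo_contents;
```

This works for simple pages, but it will cut the result short if the element contains nested tags of the same name, so treat it as a starting point rather than a general HTML parser.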
  #3 (permalink)  
Old 05-15-2008, 08:50 PM
sssddd
Novice
Join Date: Apr 2008
Posts: 9
Is this similar to how search engines work?

  #4 (permalink)  
Old 05-15-2008, 09:29 PM
HelloWorld
Programming Expert
Join Date: Jun 2007
Location: In front of computer...
Posts: 1,107
Quote:
Originally Posted by MrPickle View Post
How would you pull data from a web page? E.g., if there's something in between some tags with id foo, how would I pull the data from in between those tags?
Really, you can do this in many different ways. In PHP, I believe there's a function that extracts data from XML and puts it into an array. You could also do this in JavaScript, using AJAX to fetch the page and reading the data from its nodes through the DOM (Document Object Model). But your question is kind of unclear.
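
The post does not name a specific function. One built-in option for this kind of node-based access is PHP's DOM extension, so the following is only a sketch under that assumption. The sample markup and variable names are made up for illustration.

```php
<?php
// Hypothetical sample markup for illustration.
$html = '<html><body><p id="foo">Text inside foo</p><p>Other text</p></body></html>';

$doc = new DOMDocument();
// Real-world pages are often not valid markup; collect parser warnings
// internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// XPath query for any element whose id attribute equals "foo".
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//*[@id="foo"]');

$foo_text = null;
if ($nodes !== false && $nodes->length > 0) {
    // textContent returns the text of the node and all of its descendants.
    $foo_text = trim($nodes->item(0)->textContent);
}

echo $foo_text;
```

Compared with string searching, a parser copes better with attributes in any order and with nested tags, at the cost of loading the whole document.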

  #5 (permalink)  
Old 05-16-2008, 09:08 AM
TeraTask
PT Staff*
Join Date: Jun 2007
Location: Reno, NV
Posts: 395
Well, I use strpos and strrpos to find the start and end tag. Then, I use substr to grab the correct part of the string.

__________________
Jeremy Miller
Content Farmer - Optimized Automated Blog Posting

  #6 (permalink)  
Old 05-16-2008, 01:56 PM
Lee
PT Staff*
Join Date: Jun 2007
Location: Blackpool, UK
Posts: 611
Quote:
Originally Posted by TeraTask View Post
Well, I use strpos and strrpos to find the start and end tag. Then, I use substr to grab the correct part of the string.
Could you maybe give an example? (This is something I have been looking at recently.) E.g., if you have a string $str which contains many of what you want, how would you loop through that?
  #7 (permalink)  
Old 05-16-2008, 02:23 PM
MrPickle
Lee took the words out of my mouth.

  #8 (permalink)  
Old 05-17-2008, 09:38 AM
TeraTask
For the scenario where there are many occurrences, I usually use preg_split. When there is only one, then I use strpos. Something like this:

PHP Code:
<?php
$string = '<html>
  <head>
    <title>My page</title>
  </head>
  <body>
    <h1>A page of content</h1>
    <p>The paragraph of text on the page of content.</p>
  </body>
</html>';

$desired_contents = ''; // Initialize so the echo below is safe if nothing matches

// NOTE: The closing '>' has been left off in case the tag has attributes
$start_tag_start = strpos($string, '<body');

if ($start_tag_start !== false) {
  // Calculate end of start tag
  $start_tag_end = strpos($string, '>', $start_tag_start) + 1;

  // Since we're searching for the end tag, start from the end and search backwards
  $end_tag_start = strrpos($string, '</body>');

  /*
   * Alternatively, we could find the ending tag following the start tag with this
   * Use the method which is likely to have to search through fewer characters
   * to return the result.
   *
   * $end_tag_start = strpos($string,'</body>',$start_tag_end);
   */

  if ($end_tag_start !== false && $end_tag_start > $start_tag_end) {
    // trim() usage is optional
    $desired_contents = trim(substr($string, $start_tag_end, $end_tag_start - $start_tag_end));
  }
}
echo '<textarea rows="30" cols="80">'.$desired_contents.'</textarea><br /><br />';

// Now, to search using preg_split

$string = '<html>
  <head>
    <title>My page</title>
  </head>
  <body>
    <h1>A page of content</h1>
    <p>The paragraph of text on the page of content.</p>
    <p>The second paragraph of text on the page of content.</p>
  </body>
</html>';

$tag_name = 'p'; // ONLY LETTERS.  Try $tag_name = 'body';
// Split on the opening and closing tags; \b keeps 'p' from also matching e.g. <pre>
$tag_matches = preg_split('/<\/?'.$tag_name.'\b[^>]*>/i', $string);

$desired_contents = '';

$match_count = count($tag_matches);
if ($match_count > 1) {
  $shown_count = 0;
  // Odd indexes hold the text between an opening tag and its closing tag
  for ($index = 1; $index < $match_count; $index += 2) {
    $desired_contents .= ++$shown_count.") ".trim($tag_matches[$index])."\n\n";
  }
}
echo '<textarea rows="30" cols="80">'.$desired_contents.'</textarea>';
?>
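
For the looping case Lee asked about, the strpos approach can also be repeated with a moving offset instead of switching to preg_split. The helper below is a hypothetical sketch (the function name and sample string are not from the post) that collects the text between every opening and closing tag pair.

```php
<?php
/**
 * Hypothetical helper: returns the text between every $open ... $close pair.
 * $open is the start of the opening tag without its closing '>' (e.g. '<p').
 */
function extract_all_between($string, $open, $close)
{
    $results = array();
    $offset = 0;

    while (($start = strpos($string, $open, $offset)) !== false) {
        // Find the '>' that ends the opening tag.
        $content_start = strpos($string, '>', $start);
        if ($content_start === false) {
            break;
        }
        $content_start++;

        // Find the closing tag that follows it.
        $content_end = strpos($string, $close, $content_start);
        if ($content_end === false) {
            break;
        }

        $results[] = trim(substr($string, $content_start, $content_end - $content_start));

        // Continue searching after this closing tag.
        $offset = $content_end + strlen($close);
    }

    return $results;
}

$string = '<body>
  <p>The paragraph of text.</p>
  <p class="note">The second paragraph of text.</p>
</body>';

$paragraphs = extract_all_between($string, '<p', '</p>');
print_r($paragraphs);
```

Note that strpos is case-sensitive and that '<p' would also match tags such as '<pre', so this sketch assumes lowercase tags and a page where that collision does not occur.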

The Following 2 Users Say Thank You to TeraTask For This Useful Post:
Lee (05-20-2008), MrPickle (05-20-2008)
  #9 (permalink)  
Old 05-20-2008, 03:25 PM
Lee
Thanks Tera, that has been of great use. I understand how to get things from between tags, but what about those within a tag?

E.g., there are two links in the HTML:
Code:
<html>
  <head>
    <title>My page</title>
  </head>
  <body>
    <h1>A page of content</h1>
    <a href="www.google.com">Google</a>
    <a href="www.yahoo.com">Yahoo</a>
  </body>
</html>
How would I actually get the contents of the link? E.g.:
Quote:
www.google.com
  #10 (permalink)  
Old 05-20-2008, 07:38 PM
TeraTask
Lee: Check my code more carefully. Replace your question with "How do I find all the text in <p> tags?" and you'll see that the answer is before you.

Glad the code helped.
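
The preg_split pattern from the earlier post returns the text between tags. To read a value inside a tag, such as the link address in Lee's example, one possible variation is preg_match_all with a capture group. This is a sketch only, assuming the attribute is the standard href and its value is quoted; the variable names are illustrative.

```php
<?php
$html = '<html>
  <body>
    <h1>A page of content</h1>
    <a href="www.google.com">Google</a>
    <a href="www.yahoo.com">Yahoo</a>
  </body>
</html>';

// Group 1 captures the opening quote so \1 can require the same closing quote.
// Group 2 captures the attribute value between the quotes.
preg_match_all('/<a\b[^>]*\bhref=(["\'])(.*?)\1/i', $html, $matches);

// $matches[2] holds the second capture group for every match found.
$links = $matches[2];
print_r($links);
```

Swapping 'a' and 'href' for another tag and attribute name gives the same result for other attributes, for example the src of img tags.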

The Following User Says Thank You to TeraTask For This Useful Post:
Lee (05-21-2008)