The W
27.11.08 2038
The 7 - Site Bashing - A1 and!
Post (4 total)
#1 Posted on 11.6.03 2211.59
Reposted on: 11.6.10 2212.09
Can ANYONE tell me how they do that with the headlines? I know all about RSS feeds and stuff. But we all know that and 1wrestling don't offer feeds.

What is the secret? Anyone?
#2 Posted on 11.6.03 2338.24
Reposted on: 11.6.10 2340.16
They either "screen scrape" (parse the relevant pages in Perl or PHP or whatever) or do it manually.
#3 Posted on 12.6.03 0526.39
Reposted on: 12.6.10 0529.01
Just something to note: a1wrestling only ever links to the headlines page, whereas wrestlingdb links to the article itself.
#4 Posted on 13.6.03 1420.12
Reposted on: 13.6.10 1427.34
I can answer that... since wrestlingdb is my site.

It's really no secret, and vacheroi is correct. I use PHP with Perl-compatible regular expressions (the preg_* functions) to "match" the headlines and rip them out of the page. It's not an exact science and it takes a little trial and error, but it works pretty well.

For example, for 1wrestling, after I've grabbed the page into a buffer, I use this code:

$buffer = preg_replace("'^.*?<td width\=\"1%\">\&nbsp\;\&nbsp\;&nbsp\;\</td\>(.*?)<td width\=\"1\%\">\&nbsp\;\&nbsp\;</td>[^>]*?>.*'s",'\1',$buffer);
$buffer = str_replace('&nbsp;','',strip_tags($buffer,'<a>'));
preg_match_all("'<a href=\'/(.*?)\'>(.*?)</a>\s?by\s(.*?)\s-.*?:\s(.*?M)'s",$buffer,$items,PREG_SET_ORDER);

The first line takes the page and strips out most of the header/footer type stuff, leaving the body of headlines.
The second line strips out all of the tags except for the links.
The third line then matches certain parts of the links and puts them into an array, which I can then insert into a database.
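
As a rough, self-contained sketch of those three steps: the function name extract_headlines(), the array keys, and the sample markup below are all made up for illustration. The real 1wrestling page layout isn't shown in this thread, so treat the sample HTML as an assumption that merely fits the patterns.

```php
<?php
// Sketch only: extract_headlines() and the sample markup are hypothetical.

function extract_headlines(string $buffer): array
{
    // Step 1: keep only what sits between the two spacer cells.
    $buffer = preg_replace(
        '~^.*?<td width="1%">&nbsp;&nbsp;&nbsp;</td>(.*?)<td width="1%">&nbsp;&nbsp;</td>[^>]*?>.*~s',
        '$1',
        $buffer
    );
    if ($buffer === null) {
        return []; // the regex engine reported an error
    }

    // Step 2: drop every tag except <a>, then remove leftover &nbsp; entities.
    $buffer = str_replace('&nbsp;', '', strip_tags($buffer, '<a>'));

    // Step 3: capture path, headline, author, and timestamp for each link.
    preg_match_all(
        '~<a href=\'/(.*?)\'>(.*?)</a>\s?by\s(.*?)\s-.*?:\s(.*?M)~s',
        $buffer,
        $items,
        PREG_SET_ORDER
    );

    $rows = [];
    foreach ($items as $m) {
        $rows[] = ['path' => $m[1], 'title' => $m[2], 'author' => $m[3], 'posted' => $m[4]];
    }
    return $rows;
}

// Invented markup that mimics a headline table between two spacer cells.
$sample = <<<'HTML'
<html><body><table><tr>
<td width="1%">&nbsp;&nbsp;&nbsp;</td>
<td><a href='/news/1.html'>First headline</a> by Some Author - Posted: 6/13/03 2:20 PM<br>
<a href='/news/2.html'>Second headline</a> by Other Author - Posted: 6/13/03 9:05 AM<br></td>
<td width="1%">&nbsp;&nbsp;</td><td>footer</td>
</tr></table></body></html>
HTML;

foreach (extract_headlines($sample) as $row) {
    echo "{$row['title']} ({$row['author']}, {$row['posted']}) -> /{$row['path']}\n";
}
```

Using ~ as the regex delimiter and single-quoted PHP strings is just a readability choice here; it avoids most of the backslash escaping and should behave the same as the quote-delimited patterns above.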

Each site is a bit different, and when sites redesign, I have to go through the whole process of determining what'll work again.
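
One possible way to cope with that, sketched under assumptions: keep one pattern per site and treat "zero matches" as a hint that the layout changed. The site keys, both patterns, and the function name scrape_or_flag() are hypothetical and not taken from wrestlingdb.

```php
<?php
// Hypothetical per-site patterns; the keys and regexes are invented for illustration.
$patterns = [
    'site-a' => '~<a href=\'/(.*?)\'>(.*?)</a>~s',
    'site-b' => '~<h3><a href="(.*?)">(.*?)</a></h3>~s',
];

// Returns the matched items, or null when nothing matched (likely a redesign).
function scrape_or_flag(string $site, string $html, array $patterns): ?array
{
    $count = preg_match_all($patterns[$site], $html, $items, PREG_SET_ORDER);
    if ($count === false || $count === 0) {
        // No headlines found: the pattern for this site probably needs rework.
        return null;
    }
    return $items;
}

$ok = scrape_or_flag('site-a', "<a href='/x'>Y</a>", $patterns);
$stale = scrape_or_flag('site-a', '<div>redesigned page</div>', $patterns);
var_dump($ok !== null, $stale === null);
```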

If you have any other questions, let me know.

(edited by FriedEgg on 13.6.03 1520)
The W™ message board - 7 year recycle

©2001-2015 Brothers Zim