Link Harvester - Free & Deep Access to Link Information
Tool from Last Month:
None of the major text link analysis tools for sale allow you to check co-citation, or pages which link to multiple related resources.
Last month I had a friend create Hub Finder, a free on-topic link analysis tool that looks for co-citation. I have not received much feedback on the tool yet, but a few people have said they found it useful.
New SEO tool for this month:
Another common problem with most link analysis tools is that they do not make it quick, easy, and convenient to search past the 1,000 backlink barrier set by most search engines. What is the point of a slow tool that gives you more detail than you need while only surveying a small portion of the inbound links?
A friend of mine is a decent programmer, and I had him whip up a tool I call Link Harvester, which has a ton of cool features:
- uses the Yahoo! API, so it is in compliance with their TOS.
- free
- makes saving and exporting data in CSV as simple as a click of the mouse
- does not require any software downloading
- quickly grabs the number of .gov, .edu, & .ac.uk inbound links while also listing each individual link.
- quickly grabs the number of unique linking domains while listing them
- quickly grabs the number of unique linking C block IP addresses while listing the C block next to each domain
- allows you to check links pointing at a page or at a domain
- displays the total number of links shown by Yahoo!
- displays the total number of pages indexed by Yahoo!
- places links next to each domain pointing at its WhoIs information and Wayback Machine history.
- if a site links to your site more than 5 times it is bolded in the results and a checkbox is auto-checked, which allows you to filter out that site and spider deeper through the link database. This harvesting action is how you can spider past the 1,000 backlink limit, and it is where the tool got its name (a rough sketch of the underlying query loop appears just after this list).
- Link Harvester is open source. If you like the tool & find it useful you can add it to your site. Also if you can think of ways to make it better you can modify it however you please.
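For those curious how the harvesting works under the hood, here is a rough sketch of the request loop - an illustration only, not the actual Link Harvester source. The YOUR_APP_ID key is a placeholder for your own Yahoo! API key, and the SimpleXML parsing is just one simple (PHP 5) way to read the response; the real tool does its own parsing:

<?php
// Rough illustration only - NOT the shipped Link Harvester code.
// Sends linkdomain: queries to the Yahoo! Web Search API, walks the results
// 50 at a time, and collects unique linking domains plus their C block IPs.

$appid   = 'YOUR_APP_ID';                  // placeholder - register your own key
$query   = 'linkdomain:example.com';       // or link:http://example.com/page.html for one page
$linkers = array();

for ($start = 1; $start <= 951; $start += 50) {   // ~20 requests covers the first 1,000 links
    $url = 'http://api.search.yahoo.com/WebSearchService/V1/webSearch'
         . '?appid=' . urlencode($appid)
         . '&query=' . urlencode($query)
         . '&results=50&start=' . $start;

    $response = @file_get_contents($url);
    if ($response === false) break;               // a 403 here usually means the daily quota is gone

    $xml = @simplexml_load_string($response);
    if (!$xml || !isset($xml->Result)) break;     // no more results to harvest

    foreach ($xml->Result as $result) {
        $host = parse_url((string) $result->Url, PHP_URL_HOST);
        if (!$host || isset($linkers[$host])) continue;

        $ip = gethostbyname($host);               // group linking domains by C block
        $cblock = ($ip == $host)
            ? 'unresolved'
            : implode('.', array_slice(explode('.', $ip), 0, 3));
        $linkers[$host] = $cblock;
    }
}

echo count($linkers) . " unique linking domains found\n";
echo count(array_unique($linkers)) . " unique C blocks\n";

Each pass pulls 50 results, so twenty passes cover the first 1,000 links; filtering out heavily represented domains before re-querying is what lets the tool harvest past that point.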
Why Not Look at Anchor Text?
- I did not want this tool to spider websites.
- I wanted this tool to be faster than anything on the market.
- It is important to understand what anchor text variations people are using, but usually you can figure out how stiff the competition is just by quickly glancing through their backlink profile, without necessarily looking too deeply into anchor text. The current off-the-shelf tools that monitor anchor text only give you a small sample of backlink data.
- This tool was not designed to be the comprehensive show-all link analysis tool, just something useful, quick, and easy to use.
After you see enough linkage data you become aware of how competitive a site is and how you should go about promoting it. It is kinda like the thin slicing concept Malcolm Gladwell talks about in Blink.
Feedback:
Please let me know what you think about Link Harvester in the comments below.
Want to Host Link Harvester? Want to make it better?
Grab the source code here.
Comments
Well, I installed and re-installed it carefully a couple of times: http://www.afroarticles.com/SEO-Search-Engine-Optimization-Tools/Link-Ha...
------------------------------
Getting error: Fatal error: Call to undefined function: domxml_open_mem() in /home/afroarti/public_html/SEO-Search-Engine-Optimization-Tools/Link-Harvester/class_gamy.php on line 200
------------------------------
Checked all the includes...they look fine to me.
What's happening here?
Aaron, I have got to tell you that I use this tool every week. In fact, I probably use one of your great tools every day. Thanks!
Would you consider adding the "target page" of the link that is found?
I can't find a single tool out there that will find backlinks for an entire domain (not just a single URL at a time) *and* give the target page. This is critical when performing site redesigns.
This script is a great start - any chance of those features being added?
Hi Kevin
When my programmer is free there is a pretty good chance of it being added, I think...
You also realize that Yahoo! Site Explorer is pretty handy for doing what you want to do, right?
Although the tool seems to be working, the following warning also gets displayed on repeated lines when you run a query:
Warning: parse_url(http://) [function.parse-url]: Unable to parse url in /home/latentse/public_html/backlinks.php on line 309.
See here:
http://www.latentsemanticindexing.co.uk/backlinks.php
Any ideas please?
Looks like a great tool. I am trying to install it, but it seems as though it was written for PHP4 and not PHP5, i.e. it won't operate under PHP5.
Do you have any plans to update the code to PHP5?
I am sure your handy dandy coder could fix it in no time flat :)
Thanks so much - again, great tool.
Hi Curt
Since it is currently free and functional an upgrade is low on the priority list, but it may eventually happen.
Aaron,
Thanks. I have actually tried to install it on a PHP4 machine and there seem to be additional errors - undefined variables, etc. So I will continue to try to diagnose the issues.
I'd like to know more about this. Can anyone help me? Thanks.
Link Harvester for MSN is showing these problems with the coding:
Warning: Invalid argument supplied for foreach() in /home/linkhou/public_html/link-harvester/class_gamy.php on line 62
Warning: Invalid argument supplied for foreach() in /home/linkhou/public_html/link-harvester/class_gamy.php on line 62
Warning: Invalid argument supplied for foreach() in /home/linkhou/public_html/link-harvester/class_gamy.php on line 62
Warning: Invalid argument supplied for foreach() in /home/linkhou/public_html/link-harvester/class_gamy.php on line 62
Warning: Invalid argument supplied for foreach() in /home/linkhou/public_html/link-harvester/class_gamy.php on line 62
I noticed it the other day. It is still showing the problem today so I thought I should let you know.
Thanks for the great tool.
Rhoda
Excellent!
But: Put up a little key/guide to what everything means
* The checkboxes - what are they for and why are these domains bolded?
* The letters - what do they mean? (I can work out most by mousing over, but it's inconvenient)
Is there any way to mark probable scraper sites linking in? For example, can it be figured out by all the Google or whatever search results clogging up their pages or by the percentage of adverts on the page? (Probably not, but scraper sites are SOO annoying)
Hi Rhoda
I just used Link Harvester and it worked for me.
On Dreamhost it doesn't work :(
Tried your Link Harvester trial... you have a code error on the MSN query that might need some attention... no use buying something that doesn't work :)
Good tool; however, I'm quite worried that people are also able to see private addresses published with this tool. Is this not a breach of privacy?
When you run the tool it lists the links--when you click on a link it goes to that page (which is fine). But, if you use the Back button you lose all the data you just got by running the tool.
Those links need to open up in a new window so you don't have to run the tool again.
Hi,
I just tried link harvester and it's a great tool. I'm confused though: when I export the .csv file instead of getting the actual URLs of pages linking to my site, I get the text "Array" in each field. I used link harvester a couple of months ago and was getting the actual URLs, so I'm not sure what has changed. Any help?
thanks for the clever tool.
Hi Aaron,
Thanks for having a great tool developed. I started using your tool when you first wrote about it and now it is part of my "SEO Tool Chest". I didn't want to use up too much of your API limit, so I mirrored the tool at http://webseodesign.com/seo-tool-chest/backlinks.php.
Thanks for making this Open Source.
martin
power comment here ;)
>What does "filtered sites" mean in the link harvester returns?
it means many links came from that site, so it was filtered out to let you view more sites linking in
>Question though. When I am searching on a particular page ie www.domain.com/pagename.htm what is the difference between "links to domain" and "links to homepage"?
Links to home page means links pointing at www.seobook.com or seobook.com.
Links to domain uses the linkdomain function (all links pointing anywhere into the site).
>Love the SEO tools. Can you clarify something. It's a question that I've always had in regards to the API's. The limit is 1,000 to 5,000 daily requests depending on the search engine, correct? What exactly is a request?
I believe you can grab up to 50 search results per request.
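(So, at 50 results per request, working through 1,000 backlinks comes to roughly 1,000 / 50 = 20 requests against that day's quota.)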
>When you run the tool it lists the links--when you click on a link it goes to that page (which is fine). But, if you use the Back button you lose all the data you just got by running the tool.
>Those links need to open up in a new window so you don't have to run the tool again.
Will try to get that fixed soon, Bill.
>export the .csv file instead of getting the actual URLs of pages linking to my site, I get the text "Array" in each field. I used link harvester a couple of months ago and was getting the actual URLs, so I'm not sure what has changed. Any help?
My friend must have whacked the tool while adding features ;)
Will try to get that fixed soon Kate.
Link Harvester has been offline for some days already. What happened? Too much traffic? Do you need a mirror or something?
Hi Aaron,
Excellent tool; however, I am experiencing the same problem as Howard with the parse_url function.
"Warning: parse_url(http://): Unable to parse url in /var/www/html/tools/link-harvest/backlinks.php on line 309"
Please advise.
Rgds.
http://tools.zettwalls.com/backlinks.php
This tool doesn't work at all.
Can you tell me what happened?
Great tools, Aaron - I like how Link Harvester groups the links by domain.
Aaron,
Love the SEO tools. Can you clarify something? It's a question I've always had in regard to the APIs. The limit is 1,000 to 5,000 daily requests depending on the search engine, correct? What exactly is a request? When I use your tool to search for the backlinks to my site, is that one request? Or is each of the 1,000 returned results considered a request? This has always confused me. Thanks in advance for any info.
Hello!
Fantastic tool, thanks for making it open source. I <3 this thing. I <3 it.
Question though: when I am searching on a particular page, i.e. www.domain.com/pagename.htm, what is the difference between "links to domain" and "links to homepage"?
I'm guessing that "links to homepage" is how many total links (even if more than one comes from the same URL) I have to the URL www.domain.com/pagename.htm.
I tried the Link Harvester, but regrettably discovered an error. Surprisingly, it occurred on domains that I know have extensive backlinks.
Warning: Division by zero in /**/**/**/**/**/backlinks.php on line 447
I hope this helps.
2 Mike
when you do links to domain it does linkdomain:
when you do links to page it does link:
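So, roughly, the two query strings sent to Yahoo! look something like this (seobook.com is used purely as an example target):
linkdomain:seobook.com
link:http://www.seobook.com/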
2 Bill
each engine has its own API
this is the current info (which may change):
Google's API limit is 1,000 daily usages per user key
Yahoo!'s API limit is 5,000 daily uses per IP address. If the tool is web-based then all queries using that tool count against the IP address of the website
MSN's is like Yahoo!'s, but offers 10,000 daily uses
the limits can be modified if you get permission
>displays the total number of pages indexed by Yahoo!
Is this the total number of pages indexed for the domain you are checking backlinks for?
Getting a similar problem when trying to check backlinks of a particular page, like...
www.domain.com/pagename.shtml
Warning: Division by zero in /home/wmcommun/public_html/hounds/link-harvester/backlinks.php on line 447
>displays the total number of pages indexed by Yahoo!
>is this the total number of pages indexed of the domain you search backlinks for?
it should be. Sometimes the number might be a little off, since the API data and Yahoo!'s index may be a bit behind actual web conditions - it takes time for them to spider pages, find links, and update their data.
>Getting similar problem when trying to check backlinks of a particular page. Like...
www.domain.com/pagename.shtml
the backlink checking function is different for individual pages than for whole sites.
When you check backlinks to a specific page you need to add http:// in front of the rest of the URL. Not sure why Yahoo! made it that way, but they did.
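(So, for the example above, the query needs to look something like link:http://www.domain.com/pagename.shtml rather than link:www.domain.com/pagename.shtml.)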
Hopefully I can have my friend rewrite it with better error checking to auto-correct that situation.
thanks for the feedback
cheers
aaron
the tool was changed so the http:// part is no longer needed. Also, the results now show the request URL that was sent to Yahoo! for troubleshooting purposes.
I also added a mirror to my site:
http://www.webmasterinvestments.com/backlinks/
Thanks for the cool tool.
What does "filtered sites" mean in the link harvester returns?
R
Hi, I've added a mirror on my seo-scoop site for the tool, and I will be blogging about it tomorrow.
Yes, I would like to know as well; I have the same problem, a blank page. I also noticed that the "Start over" button on that site does not work, while with the code you can download now it does work but does not submit; the CSV export also gets a Java error, yet the other sites that work don't have these problems.
http://pcaccessoriesparts.com/Tools/Link_Harvester/backlinks.php
ERROR FIXES
To some of the other posts above, regarding the errors below:
Warning: Invalid argument supplied for foreach() in /home/linkhou/public_html/link-harvester/class_gamy.php on line 62
_____________________________________
Getting error: Fatal error: Call to undefined function: domxml_open_mem() in /home/afroarti/public_html/SEO-Search-Engine-Optimization-Tools/Link-Harvester/class_gamy.php on line 200
______________________________________
I had both at one stage or another. I don't think both are API related, but the latter is - you need to go get an API key for your site.
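If a fresh API key does not clear the domxml_open_mem error, one other thing worth checking - just a guess, not a confirmed fix - is whether the domxml extension is installed at all, since it was dropped in PHP 5. A quick test you could drop near the top of the script:

if (!function_exists('domxml_open_mem')) {
    die('The domxml extension is not available on this server - the original script expects PHP 4 with domxml.');
}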
With the Yahoo! engine the tool doesn't do anything at all on my server. The MSN engine is working just fine. I tried changing the Yahoo! API key, but it didn't change anything.
Just tried to use your link harvesting program and received the following error:
Warning: file_get_contents(http://api.search.yahoo.com/WebSearchService/V1/webSearch?query=linkdoma...): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden in /home/wmcommun/public_html/hounds/link-harvester/backlinks.php on line 338
Fatal error: Call to a member function on a non-object in /home/wmcommun/public_html/hounds/link-harvester/backlinks.php on line 342
403 errors mean the query limit is used up for the day. Try one of the mirrors.
Tried to install Link Harvester version 3.0.
The form appears, but when the query is run it produces no results (i.e. nothing happens) with Yahoo! or MSN. The program seems to run, but only for a nanosecond, and then says done with blank output. I tried one of the mirror sites (Donna's) and it does the exact same thing.
Any ideas? Thanks!!
Hi
Great software, just what I was looking for. I downloaded and installed it and replaced the Yahoo! API key with my own after registering the application with Yahoo!. Nothing seemed to work.
After some diagnosing I found that the form fields were not being passed in backlinks.php
As a workaround I inserted GET statements at line 225, as below, and all works well.
$query = $_GET["query"];
$engine = $_GET["engine"];
$linktype = $_GET["linktype"];
$manual_filter = $_GET["manual_filter"];
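(If your PHP setup throws notices when a field is missing, a slightly more defensive version of the same workaround - just a sketch - is:
$query = isset($_GET["query"]) ? $_GET["query"] : "";
and likewise for the engine, linktype, and manual_filter fields.)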
Hope this helps those experiencing issues.
CR
Now have online version working @ http://www.sector3it.com/pages/backlinks.php
I entered the URL "rdesgr.com/WhatsAllThisThen", clicked [Query] and received the following error with 8 variations:
Warning: file_get_contents(http://api.search.yahoo.com/WebSearchService/V1/webSearch?query=linkdoma...): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden in /home/linkhou/public_html/link-harvester/class_gamy.php on line 199
Thank you for your time and attention. I hope that this helps.
That host may have exceeded its limit for daily API usage. Look to some of the mirrors and use them if the main tool is down that day.
Once I tried to enter a big amount of sites into the filter, I got an error: the browser (Opera 9.50) said that the URL was too long, so I have a suggestion to fix this trouble. Please change the request method from "GET" to "POST" and also change all the "$_GET" calls to "$_POST". I tried to do this on my website, but my webmaster said the server we use was not good enough. So please try to optimise it, because I am sure I am not the only one having this trouble. I also have the whole script ready, so feel free to contact me in order to get it.
Thank you,
Alex
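(For anyone making the same change to their own copy, the edit Alex describes amounts to roughly the following - the exact form markup here is an assumption, not the shipped code:
<form method="post" action="backlinks.php"> instead of method="get"
$query = $_POST["query"]; instead of $query = $_GET["query"];
and likewise for the engine, linktype, and manual_filter fields.)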
How long was the URL you were using Alex?
I had to filter 382 websites.
You may try it with "POST" and "$_POST" at http://www.dropshiparea.com/prov/Link_Harvester/backlinks.php
The simple analysis works well, but once I try to filter those 382 websites it just refuses to work properly.
I get an error after installing:
Fatal error: Cannot redeclare class soapclient in /home/user/mydomain/public_html/linkhound/nusoap.php on line 4104
I think it is designed for PHP 4 and would need to be re-coded to be PHP 5 friendly.
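(For what it is worth, that particular fatal error usually comes from NuSOAP's soapclient class colliding with the SoapClient class bundled with PHP 5's SOAP extension - PHP class names are case-insensitive - so disabling that extension, or moving to a NuSOAP build that names its class nusoap_client, will normally clear it.)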
Dear Mr. Wall,
Still haven't received a response from you concerning the filter limit (due to URL length) enquiry. Please let me know if there is anything that can be done about that. I can understand if you will not change anything, because your tool is free, but I was just hoping you could make another improvement.
Thank you for your attention,
Alex Tapper
Can you try the tool now Alex?
Thank you. It is much better this way.
Hi,
Maybe I am over-asking, but I am also interested in a PHP5 version.
Because PHP4 is now officially retired, I think you should upgrade it - nobody can use it on fresh servers anymore. No servers will be serving PHP4 anymore...
Hi Aaron,
When the tool was at www.linkhounds.com I got a much bigger number for linking domains. For some sites I got information that they had 670 unique domains linking to them.
Why did you set the maximum to 250 now?
I saw this when you moved the tool to seobook.com.
Can you answer?
This is now not complete information from the tool.
Since this site is so much more popular I had to lower the limits to ensure the API key lasts longer.
Can we expect that you will bring it back, with the max number at 1,000?
Because this is now incomplete information.
I'm disappointed. I loved this tool.
If I bring it back at that level it will become a member's only tool for paying subscribers.