Daniel Brandt seems to regard Google as a corporate criminal, and stealing back as the solution. Google, according to Brandt, has built a commercial empire upon scraping free content from the internet and assigning relevant advertising to it. So, turning the tables, Brandt has used a loophole in the Google API to create an ad-free Google; in other words, he is scraping the scraper. He expects to be sued, and even seems to hope for it. Brandt's point is not entirely capricious. Google does tread on shaky copyright ground. Even if its result links do not copy enough content to infringe copyright, Google's cache represents a more arguable infringement. As TechDirt notes, it is probably in Google's interest to let this guy froth unchallenged. Few people are going to use his interface, so why raise potentially uncomfortable copyright issues?
Open-source, Ad-Free, and Possibly Illegal Google Clone
Reader Comments
3. Google indexes and archives pages from the internet. Even if only a part of that page appears in search engine results then it would be a breach of copyright.
Most pages are copyrighted directly or indirectly, and the most common infringement is unauthorised copying. Google pretty much infringes copyright on every website it archives so 8 billion pages later, Google may go bust.
Will it ever happen? I think not.
4. If someone in "the land of the lawyers" (read USA) should be able to sue Google for copyright infringement for indexing their materials, and do so without first using the option of telling Google's spiders to stay away from the page (read about robots.txt to find out more about search spider exclusions), then the world would be a poorer place.
Being able to index and make sense of the world wide web has become very useful for the majority of the people using the net. And it won't become any less important in the future.
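For readers who have not met robots.txt, here is a minimal sketch of how a well-behaved spider might check it before fetching a page. The robots.txt content, the `example.com` URLs and the `OtherBot` name are made up for illustration; only Python's standard `urllib.robotparser` module is used, and this is not a description of how Googlebot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site (an assumption, not a real file).
# Googlebot may crawl everything except /private/; all other bots are shut out.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A polite spider asks before every fetch.
print(parser.can_fetch("Googlebot", "http://example.com/index.html"))         # True
print(parser.can_fetch("Googlebot", "http://example.com/private/page.html"))  # False
print(parser.can_fetch("OtherBot", "http://example.com/index.html"))          # False
```

A site owner who does not want to be indexed only has to publish such a file; whether a given spider honours it is up to the spider.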
6. Buncha freakin' wackos.
CONSPIRACIES DON'T HAPPEN IN REAL LIFE!
Posted at 6:06AM on Dec 19th 2005 by suntiger
8. I am not accessing Google automatically. Every click is from a live user who enters search terms because he wants to see Google's results. In fact, to make sure of this I throttle anyone who tries to do more than 20 searches per hour.
I can add CSS code to Firefox and block Google's ads. See www.scroogle.org/gscrape.html at the bottom of the page for detailed instructions. What I can do with Firefox is no different than what Google's toolbar does to pop-up ads.
My scraper is more private than Firefox can possibly be, even though Firefox has the best cookie control tools I've seen on any browser. That's because Google only sees my IP address if you use the scraper, not the searcher's IP address, and I don't keep logs.
Finally, my scraper uses less of Google's bandwidth than a browser uses, because I access a lean interface at Google that isn't used by the average searcher.
Posted at 6:06AM on Dec 19th 2005 by Daniel Brandt
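The comment above does not say how the 20-searches-per-hour throttle works, so the following is only a sketch of one common approach, a sliding-window limiter keyed by client. The class name `HourlyThrottle`, the `client_id` key and the in-memory storage are all assumptions for illustration, not the scraper's actual code.

```python
import time
from collections import defaultdict, deque


class HourlyThrottle:
    """Sliding-window limiter: at most `limit` requests per `window_seconds` per client."""

    def __init__(self, limit=20, window_seconds=3600.0):
        self.limit = limit
        self.window = window_seconds
        # Hypothetical in-memory store: client id -> timestamps of recent searches.
        self._hits = defaultdict(deque)

    def allow(self, client_id, now=None):
        """Return True and record the request if the client is under the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Forget searches that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


throttle = HourlyThrottle()
# Simulate 21 searches from one client, one second apart.
results = [throttle.allow("client-a", now=float(i)) for i in range(21)]
print(results.count(True), results[-1])  # 20 False
```

Only timestamps are held, and they are dropped once they fall outside the hour, so a limiter like this need not amount to a permanent log of who searched for what.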
10. Y would anyone wanna block google ad..........;)
Posted at 6:06AM on Dec 19th 2005 by Toronto Student
11. use Google API's to query Google index.
"Do you search with Google a hundred times a day? Do you reach for Google before the phonebook, the dictionary or the newspaper? Do you think, just maybe, you're a Google frequent searcher?
--------------------------------------------
A searchbot that would never stop (e.g. Googlebot not saying "I crawled 10,000 from this domain, that's my limit")
Automatically generated random pages (e.g. from a dictionary database backend)
Automatically generated random links to more random pages from every random page (all on the same server, of course)
Maybe some way to hide that it's always the same script, if a bot cares about such (like by the use of Apache's htaccess file)
...?
Posted at 6:06AM on Dec 19th 2005 by Jaspal Singh
12. Google treads on Shaky ground?
To reduce Google's accomplishment, the subtle and ingenious mathematical modeling that is the substance of "Google", by the way, and the even more brilliant engineering of the software and networking architecture - which allow Page and Brin to index millions of pages and deliver fast queries on basically scrap heap computers back in the beginning - to reduce that to "scraping" is so foolishly arrogant that it's simply comical.
The modeling and the engineering, combined with fierce dedication and perseverance and a willingness to take the time and the excruciating effort to make the best technology in the world, then continue making it better, are precisely what constitutes "Google" - the search tool. Google, the business, is another matter.
And that brings me to Google, the business. Google's ad programs are viable because of the quality of their search results. Advertisers *want* to be there. The advertisers drive the marketplace, not vice versa, pal.
They want to be on Google, so they'll be on Google. Surreptitious advertisers (search engine "optimizers") want to be there too, quite apparently, and so they are.
Now, if someone wants to come along and build an index off of Google's, I see no reason why that's not perfectly legal.
Except for one thing. If Google requests via the robots.txt or other robots instruction meta declarations, that it wishes bots to stay off its pages, then they are entitled to expect crawlers to stay off their pages. If a crawler chronically ignores the robots instructions, they have the right to contact the owner of the crawler software and ask that he or she please follow the robots exclusion protocols.
If the crawler continues to deep index Google's results pages, then Google will have a pretty good cause of action to file a lawsuit and try and set case law precedent for the legal enforcement of protocol adherence by automatic agents on the web or any other interconnected network.
"Scraping" doesn't make you a programmer, or anything really, other than a "SKRIPT KIDDEIZ". To equate scraping with Google, sir, is simply delusional. And to suggest that scraping Google is an "eye for an eye" justice because Google "scrapes the web anyway" is ignorant to the point of utter laughability. Perhaps you've heard Newton's famous quote about standing on the shoulders of giants?
Posted at 6:06AM on Dec 19th 2005 by youve_got_to_be_kidding
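Besides robots.txt, the comment above mentions robots instructions given as meta declarations. As a rough sketch, a crawler could read a page's `<meta name="robots">` tag like this before deciding whether to index it. The sample HTML and the names `RobotsMetaParser` and `robots_directives` are invented for illustration; only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list such as "noindex, nofollow".
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page that asks crawlers not to index it or follow its links.
PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head><body></body></html>'

directives = robots_directives(PAGE)
may_index = "noindex" not in directives
print(sorted(directives), may_index)  # ['nofollow', 'noindex'] False
```

As with robots.txt, this is a request rather than an enforcement mechanism: the crawler has to choose to look for the tag and respect it.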