> Fair use does not give you the right to wholesale scrape content
Yes, it potentially does. There are court cases establishing precedent that copying something in its entirety can still be fair use, as well as law and court cases establishing specific allowances for archives/libraries/etc.
What you've failed to mention are the criteria used to determine if a usage is indeed "fair". There are 4 basic criteria[0], but they can be summarized as "If the usage doesn't affect the market for the original work, is substantially transformative, is proportionally insignificant or is used for critique/parody then it is fair". Or, at the risk of oversimplifying: "Does the usage grant a net public benefit without significantly hurting the copyright holder's ability to make money?".
>Can I send an email to Netflix and tell them "Hey, if you don't want me to copy your shows, please add this in your page's HEAD element: <meta name='please-dont-download-my-shows-sir'>"?
Actually, under fair use you certainly can make a personal copy (see the Betamax case). If you distribute the work, you would likely run afoul of the criteria summarized above.
The relevance of robots.txt is being overstated in your argument. The main criteria used in this case are summarized above. The fact that Google provides an opt-out mechanism is a secondary, supporting argument.
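For anyone unfamiliar with how that opt-out works mechanically, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleBot`/`OtherBot` names, and the `may_fetch` helper are all made up for illustration; this is not how any particular search engine implements it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; a real crawler would fetch this from the site.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # ExampleBot has been opted out of the whole site by the publisher.
    print(may_fetch("ExampleBot", "https://example.com/page"))
    # Other crawlers are only excluded from /private/.
    print(may_fetch("OtherBot", "https://example.com/page"))
    print(may_fetch("OtherBot", "https://example.com/private/x"))
```

Note that honoring the file is voluntary on the crawler's side, which is part of why it works as a supporting argument rather than the main one.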
>What if I started indexing and rehosting thumbnails? I can assure you that I would get C&D'd almost immediately
A determination of infringement would depend entirely on the context as related to the aforementioned criteria. The fact that someone might try to sue is a product of the terrible system in general and you're absolutely right - as with any legal matter, the entity with the deeper pockets can often bully the other guy into submission.
>In Craigslist v. 3Taps, while primarily a CFAA case, 3Taps was found to be infringing copyrights
My understanding is that the copyright part of the case was thrown out [1] and thus was settled solely around CFAA matters.
>In Ticketmaster v. RMG Technologies, RMG was found to infringe just by parsing a page.
I agree that the logic used for the judgement is absurd (for reasons that are plainly obvious to any HN user). But it's less clear whether the case would have met the fair use criteria outlined above had it come to that. My guess is that it wouldn't qualify, since the usage affects the copyright holder's ability to make money on the work and doesn't meet any of the other criteria for fair use.
>Facebook v. Power Ventures
This is not a case involving a defense of fair use (as far as I can tell). Facebook even acknowledged the users owned the data and had a right to it. The defendant was actually found to be violating the CFAA and the CAN-SPAM Act.
>It seems Google is the only entity capable of making unauthorized copies and then getting courts to agree that it's fair use. For the rest of us, it's infringement
Provably false [2]. It sounds like perhaps your personal experience has soured your opinion on the matter? That's understandable. But none of the evidence you've cited supports the argument that Google is infringing copyrights in its core activities nor that Google is the only entity where copyright laws and fair use legislation don't apply.
PS: To be clear, my argument revolves specifically around copyright infringement and fair use. I don't have enough understanding of other, separate legislation like CFAA to comment on that except to say that it seems overly broad and unrealistic. But that's another topic. I'm specifically arguing against calling Google a copyright infringer in a broad sense which is what you've done. That's not been proven.
Fair use certainly doesn't apply. You're using Google search technology (which, btw, involves a bit more than 'scraping' the web) and stripping out the ads (their source of revenue). Expect a cease and desist letter soon.
Indeed - and I believe scraping data off a publicly accessible webpage (e.g., one that does not require registration or a login/password) falls under fair use, provided you do not take up more bandwidth/resources than the average user of that site.
I recall there being some precedent for this sort of fair use - something like a phone directory - where the information is not copyrighted, but the arrangement and layout are. Hence, you can't just iframe a site and present it, but obtaining the data and deriving a new work from it should fall under fair use.
What you have come up with is a fuzzy case, not clear cut. Even with fair use, the way search engines use content can easily be considered infringement.
I still stand by my statement: the only reason search engines are allowed to copy petabytes of COPYRIGHTED material is that they are so darn useful. I don't know of any other service that sidesteps intellectual property rights (whether through fair use or not), makes a good amount of money, and has been allowed to exist and thrive.
It seems fairly similar, at least to me, to a search engine copying snippets of other people's web sites and displaying them on a page. Admittedly, there's still some discussion as to whether or not _that_ is fair use, but I think enough of the population thinks it is (with many news organizations disagreeing).
I didn't think either was fair use. I think fair use applies to the utilization, not the acquisition. For example, if I bought a movie and wanted to discuss it using clips, I can re-encode it because I physically own the medium. Fair use comes into play when the clips are in use. However, when I download something from Google (YouTube) with a tool, I'm violating their ToS. I might also be violating something in the middle that opens up some legal issues. So when you want a video only for fair use, you should ask the poster or the source for an unencrypted copy. They should also do the responsible thing and provide it.
I also have the right to copy and extract parts of the content under Fair Use. If content providers are making this technically impossible - thus depriving us of the possibility of using it for teaching, research, news reporting, or criticism - how are they not violating the social contract?
Fair use might work but maybe not? If I were to argue against it, I'd probably compare something like a recording of music vs. a MIDI file. Same raw data scaling.
For example: Scraping <title> tags off Netflix would be legit, but copying Netflix video files wouldn't.
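To make the distinction concrete, pulling only the `<title>` text out of a page is a few lines with Python's standard `html.parser`. This is just a rough sketch: the `TitleExtractor` class, the `extract_title` helper, and the sample HTML are invented for illustration, and it says nothing about what is legally safe.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects only the text inside <title>; everything else is ignored."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def extract_title(html: str) -> str:
    """Return the page title, or an empty string if there is none."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.title_parts).strip()


if __name__ == "__main__":
    # Hypothetical page; the video source is never touched or downloaded.
    page = (
        "<html><head><title> Example Show </title></head>"
        "<body><video src='episode.mp4'></video></body></html>"
    )
    print(extract_title(page))
```

The point being that the scraper keeps a tiny piece of metadata and never fetches the actual work.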