Show HN: Spotter – Search for copies of videos on the internet (dashboard.spotter.tech)
90 points by joaodmj | karma 54 | avg karma 1.35 2017-05-10 06:22:44 | 90 comments




Do you track what people search? What do you do with that data? Who do you share it with? What's the business model, if any?

Who says they're looking to monetize?

Just asking.

Suggested by the fact that it's not Free Software, or even open source.

Suggested, yes, but anecdotally I release a lot of web services in the wild that are not open source, and I don't have any plans to make money off them. It's not a binary choice between OS and monetization.

Servers ain't free, mate.

Thanks for asking aaaawweeeee. Answers below:

Do you track what people search? - No, we just get the video and search the web based on the keywords.

What do you do with that data? - We don't keep the videos. We use our own fingerprints.

Who do you share it with? - With the uploader of the video

What's the business model - Trying to figure it out... Have any suggestions? :)


I can see content holders willing to pay for an automated service if it's better than what's already available. Particularly if it issued take down requests as well.

> No, we just get the video and search the web based on the keywords.

So you're not indexing every video, you can only find videos that share some keywords with the original. If I had a video with no keywords it wouldn't show up even if it was an exact copy?

How many videos will you analyze per search? If it's something with popular keywords and shows a million videos, will you download all of those and compare?


ikeboy: At this stage we are not indexing to control server costs. However, our goal is to have all videos indexed to reduce the searching time. We are searching the web based on keywords to reduce the searching space.

It'd be interesting to build a tool to detect all the copyrighted material on YouTube that bypasses the automated checks. Some of the tricks used:

* picture-in-picture: https://www.youtube.com/watch?v=Rp1aOWUSRZg

* mirrored image

* higher pitch

* other?
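To illustrate why a mirrored re-upload can slip past a naive frame comparison, here is a minimal sketch of a difference hash (dHash), a common perceptual-hash technique, extended to also hash the horizontally flipped frame. It is a hypothetical example, not how Spotter or YouTube's Content ID actually work: it assumes frames have already been decoded and shrunk to 8 rows of 9 grayscale values, and the names `dhash`, `mirror`, `hamming`, `frames_match` and the 10-bit threshold are illustrative. It only addresses the mirroring trick; pitch shifting is an audio problem, and picture-in-picture would need region detection first.

```python
from typing import List

# Assumed input: a frame already downscaled to 8 rows x 9 columns of grayscale (0-255).
Frame = List[List[int]]


def dhash(frame: Frame) -> int:
    """Difference hash: one bit per adjacent-pixel comparison (8 rows x 8 pairs = 64 bits)."""
    bits = 0
    for row in frame:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def mirror(frame: Frame) -> Frame:
    """Horizontally flipped copy of the frame."""
    return [list(reversed(row)) for row in frame]


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def frames_match(original: Frame, suspect: Frame, max_bits: int = 10) -> bool:
    """True if the suspect frame is close to the original, either as-is or mirrored."""
    h = dhash(suspect)
    return min(hamming(dhash(original), h), hamming(dhash(mirror(original)), h)) <= max_bits
```

For a simple left-to-right gradient, the plain hashes of the frame and of its mirror differ in all 64 bits, yet `frames_match` still reports a match because it also compares against the flipped original.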


How will you deal with fair-use?

You don't have to, fair-use is a defence only applicable in court, the easiest way is to sue everyone and let their lawyers assert fair-use. AFAIK it's not unlawful, nor will you incur any fines, if you sue someone for using your copyrighted material if it ends up being declared fair-use. Additionally, I think youtube is even more lenient than that towards people asserting unlawful use of their copyrighted material. In short, fair-use does not exist outside of the courtroom and no amount of "I own no copyrights" or "No copyright intended" tags, or citing the copyright code on your videos can summon it.

YouTube already does an absolutely horrible job with handling takedowns and fair use. I have zero interest in seeing any systems or algorithms built that would aid the sloppy, lazy, and greedy organizations shotgunning takedowns, even on their own material and channels (always funny).

This amuses me greatly, got any tangible examples?

jwz posted a thing about his long saga fighting a takedown of horror movie reviews.

>even on their own material and channels (always funny).

I was referring specifically to this? Am I right to infer that DMCAs are so scatter gun that publishers have managed to DMCA themselves?


>Am I right to infer that DMCAs are so scatter gun that publishers have managed to DMCA themselves?

Yes this has happened multiple times. Notable one that's somewhat recent that comes to mind: https://torrentfreak.com/warner-bros-flags-website-piracy-po...


I posted an unlisted video of my daughter at a noisy indoor theme park. It got automatically removed because the venue had a song playing in the background.

> sue everyone and let their lawyers assert fair-use

YouTube would turn into a tumbleweed landscape overnight!

I've often thought YT should just add "No copyright intended" et al as a ContentID trigger though.


"No copyright intended" always makes me laugh. What does that even mean?

It means the average person has no idea how copyright law works. It means lots of average people think there is nothing wrong with non-commercial use of copyrighted works so long as you don't try to pass something off as your own. It means the legal definition of copyright infringement is out of step with what most people think of as right and wrong.

Your line of thinking is why we need anti-SLAPP style laws protecting Fair Use.

Strategic lawsuits against Fair Use are a form of stifling protected speech.


Since there is no algorithmic way of defining fair use, it can only really be dealt with by lawyers in court.

Hi abdias, thanks for your question! At this stage we only inform our users where the uploaded video is online. Do you think we should deal with fair-use?

Interesting in what way? Everyone who has spent more than five minutes on youtube knows their bread and butter is copyright infringing videos. It's the unspoken dirty secret of online video.

Just search for 'full album' or 'full movie'.


It's hardly unspoken, Viacom sued them for a billion dollars way back before Google bought them. They only lost the case because they couldn't find evidence that YouTube execs mentioned any specific video when discussing the money they made from piracy.

The fact that they included videos they had specifically authorised or uploaded themselves in the list of supposedly infringing videos couldn't have helped.

One correction: Google bought YouTube in 2006. Viacom sued YouTube in 2007.

I guess that makes more sense, since YouTube wouldn't have had $1B to sue for. But the actions brought up in the trial were pre-acquisition.

Beyond the outright movie/album piracy, there's another phenomenon that happens constantly at Youtube:

1. Someone uploads funny/interesting content.

2. Other people (bots maybe) notice that it's popular and download and re-upload it, titling and promoting it so that it ranks above the original in search results. Often the re-uploaders file takedown notices against the original video or other re-uploaders, so they're removed.

3. Repeat step 2 hundreds of times, until any search for "funny/interesting content" returns only a series of almost-identical 240p 15fps cropped, mirrored, or otherwise unwatchable versions of the original video.

I don't know if they condone this because it results in more overall views, or if they're just terrible at fighting it, but it's largely made Youtube useless for its original purpose as a many-to-many video sharing platform.


Hey CommieBobDole, that's exactly the problem we are solving. Our technology is robust enough to detect all those tweaks.

Other tricks I've seen:

* skip 1-3 seconds every so often

* chop it into blocks and re-arrange them, so you watch out of order but still get the gist (1,2,3,4,5 becomes 1,3,2,5,4)
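Skipping seconds and shuffling blocks mostly defeat matchers that compare the two timelines in order. One way around that, sketched here under the assumption that each video has already been cut into short segments with one integer fingerprint per segment (for example a perceptual hash of a key frame), is to ignore order entirely and measure what fraction of the suspect's segments match any segment of the original. `coverage` and its 6-bit threshold are hypothetical, not a description of Spotter's method, and the all-pairs comparison is only practical for small inputs.

```python
from typing import Sequence


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def coverage(original: Sequence[int], suspect: Sequence[int], max_bits: int = 6) -> float:
    """Fraction of suspect segments that match *some* original segment.

    Order is ignored, so re-arranged blocks still count, and skipped
    seconds simply mean fewer suspect segments to check.
    """
    if not suspect:
        return 0.0
    hits = sum(1 for s in suspect if any(hamming(s, o) <= max_bits for o in original))
    return hits / len(suspect)
```

With `original = [0x1111, 0x2222, 0x3333, 0x4444, 0x5555]`, the reordered copy `[0x1111, 0x3333, 0x2222, 0x5555, 0x4444]` (the 1,3,2,5,4 pattern above) still scores 1.0, as does a copy with segments dropped.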


Hi robbiemitchell, we can also recognize videos altered with that technique :) You should try the platform.

Cell phone video of a television.

Hey emodendroket, that's one of the use cases we are better at: recognizing videos from a scan of the TV.

Thanks for starting a good discussion smnscu. That's exactly what we are here for: we are solving that problem!

It'd be more interesting to build a tool to detect all the videos on YouTube that fall under fair use but are still being DMCA'd.

Could be a cool open source project. Would almost definitely be a worse business.

We've been looking into this for a couple of years now. We're making steady progress, but we're far from a final result.

The problem is incredibly complex, because Fair Use doesn't have any quantifiable metrics that you can translate into a computer algorithm. The gist of Fair Use is that you have to prove the intent of the uploader trying to rip you off.

As you can imagine, this is very hard to prove, and the number of videos uploaded every day makes it even harder to crack.

If you're interested in this problem, reach out and I'll be happy to share more.


Yeah, those are fun ones. We [0] are dealing with most of those.

Here are some fun ones based on Shia LaBeouf's green screen performance: https://www.youtube.com/watch?v=ZXsQAXx_ao0

- https://vk.com/video-67185996_171327939

- https://www.youtube.com/watch?v=w_TuTzzSUb0&t=0s

- https://www.youtube.com/watch?v=i4ktEzJvaGM&t=186s

- https://www.youtube.com/watch?v=5BLMW3vQV20&t=124s

etc. There is much more we can do, but there are some we can't, for instance a 3D movie recorded on a phone in a cinema from a weird angle.

[0] https://pex.com


just fyi - got flagged as malware in my company network.

Hi mholmes680! That's important for us to know. Can you share some details? What page were you in or what were you doing?

Just clicked the link directly from HN, redirect ends up "This domain is blocked due to a security threat."

no real info on that page that i can share though. My company is very aggressive with security and maybe out of date blacklist?


Thanks mholmes680... probably related to the previous owner... not sure how we can change that. If you have any insight please share

And thanks for the heads up :)


It's still beta (i.e., slightly buggy) I think.

I did their suggested Pepsi video search[1], then picked a lower ranked YouTube video titled "GiveMeNews"[2] where it claims that "61% of your video was found here".

But that video consists of a single frozen frame from the Pepsi ad and someone talking for a couple of minutes. One frame isn't 61% of the video.

[1] https://dashboard.spotter.tech/reports/1?platform=youtube

[2] https://www.youtube.com/watch?v=sKrDECFhosA


Hey mysterypie. Yes, we just launched and want to collect as much feedback as possible.

Thanks for that note, we are checking it. The good thing is that we were able to find a single frame from the original video :)


Yeah, that actually sounds like it could be very useful as well! Good work on this so far.

FYI: You can only test the default videos. Trying any other (Youtube) video URLs leads to them asking for you to sign up in order to see the results.

Hi personlurking! Yes, because we just launched today and need to control our server load, we are asking for your email so we can send you the reports and hear about your experience. Thanks for trying :)

Jumped through the hoops and signed up to submit my own video, still "Queued to analyze..." after 1h.

Got the results now, posted one of my own videos I know has been reuploaded multiple times:

    0 total copies were found
    with 0 total views
Original: https://vimeo.com/132700334

Some known re-uploads:

https://www.youtube.com/watch?v=X0oSKFUnEXc

https://www.youtube.com/watch?v=onbi3Ws8fng

https://www.youtube.com/watch?v=SCE-QeDfXtA

---

Great concept, would love if it actually worked.


rwinn, thanks for checking our platform and for your patience waiting for the results. It's great to know your experience and the results you got (or lack of those :P). We'll check it and get back to you with some news.

(We just launched the tool and it's still in beta.)


I know you didn't ask for this, but I took the liberty of running the video you mentioned through our data, and here are the results [0]. As you can see, a pretty decent number of copies were found across many different sites. These results look much more impressive for massively viral videos, but even this paints an interesting picture.

[0] https://www.dropbox.com/s/2913aiiug2bvwb9/pex_Inside_an_Arti...


Impressive results! Even where the original is distorted and mixed with other videos it's detected. I take it you use a different technique than spotter?

Thank you. I can't comment on their technology, as I have no idea what they are using. Our service is a very complex environment that consists of many different parts and pieces. We run at a huge scale (many thousands of servers) and have indexed more than 4B videos. That allows us to do what you see above.

Duh. Spotter requires sign-up with an email address to receive the report. No Thank You.

Hi rixrax!

We struggled with that, but it was the only way we found to provide a fair and good service. You can check an example report on our dashboard (https://dashboard.spotter.tech). Check Psy video for example.

Thanks for your feedback!


Sounds like an excellent tool for censorship and copyright enforcement. Can we not make things like this?

Because copyright enforcement is inherently evil? Please

Yes, if you use a shotgun approach. Fair use is a thing.

Fair use is a thing in the US. It's not a thing in most other places, at least not in anything like the same generic form. And of course a great deal of online copying isn't fair use even in the US.

A shotgun approach is one thing, but speaking as someone who's seen content produced by his businesses shared online, 100% of the copies we've found were blatantly infringing. Trying to keep track of them and get them taken down does waste time that we'd rather spend on activities like making more content. Given that YouTube are awful at dealing with legitimate takedown requests from small businesses (again, personal experience talking here) a tool that would rapidly identify infringing content and let us review it and submit a properly formed takedown notice quickly could be very useful.


>Trying to keep track of them and get them taken down does waste time that we'd rather spend on activities like making more content

Then don't. If people want to steal your content, it's going to be stolen. You cannot solve that. It's better to spend your time making more content and better content so your paying customers pay more and refer more paying friends.


> If people want to steal your content, it's going to be stolen. You cannot solve that. It's better to spend your time making more content and better content so your paying customers pay more and refer more paying friends.

You can solve a lot of it. I wrote a book that took me a lot of time and effort to produce. People started posting it for free. I forced them to take it down. That was much, much easier than writing another book (which would take me about a year)


How many of the people who were going to steal it do you think were converted into legitimate buyers when you took down the free source? Also, I expect that your book is still available in certain circles for free.

Legitimate buyers can convert the other way.

> How many of the people who were going to steal it do you think were converted into legitimate buyers when you took down the free source

More than zero. My book is a particular niche that people need, and they will pay what they have to in order to get it. If zero is an option some will take zero, if not they'll pay.


> How many of the people who were going to steal it do you think were converted into legitimate buyers when you took down the free source?

I can't speak for Austen, but in our case we've seen people attempting to rip our stuff and use it in such a way that there was definitely revenue being generated. What's more, we know that the people providing that revenue liked what they saw and were looking for more of it, because some of the comments or ratings on some of the distribution channels are public. It's just that those people who were demonstrably willing to support the content were sending their support to the wrong place.

> Also, I expect that your book is still available in certain circles for free.

Again, I can't speak for Austen, but in our case those "certain circles" would have to be pretty tight and obscure. As I said before, we operate in a relatively small world, and if someone were leaking any of our stuff on a large scale, we'd almost certainly know about it. And if anything out there is only on a small enough scale that we haven't heard about it, then almost all of our other potential customers probably haven't either. That's a big difference from seeing people putting something up on YouTube or Vimeo or whatever and watching them pick up thousands of views/listens and positive feedback in a matter of hours.


> If people want to steal your content, it's going to be stolen. You cannot solve that.

Man who say something cannot be done should not interrupt man who is doing it.

We're talking about small producers operating in niche markets here, not the latest Hollywood summer blockbuster, Taylor Swift album or Game of Thrones episode. You can't just go onto YouTube and find our stuff, except on the rare occasions when someone's managed to post a little of it for a short time before we get it taken down.

The downside of this is that I'm writing this from a normal house, not a luxury yacht somewhere nice and sunny.

The upside is that most of our customers don't know how to save/share our stuff, and the few who do stick out like a sore thumb in their access patterns and can rapidly be cut off. The biggest real world threat to us is not acting quickly on those or the redistribution they attempt, meaning we wind up losing other potential customers to rips of our stuff with someone else's branding/advertising slapped all over them.

> It's better to spend your time making more content and better content so your paying customers pay more and refer more paying friends.

No, it isn't. The evidence in our case is beyond any doubt. Of course we would much rather be making new content and doing more for our paying customers, but we can't afford to just ignore the small minority of abusers.


OP's software identifies videos that could potentially be copyright infringement. The conclusion that we shouldn't build software like that because there's such a thing as fair use doesn't follow.

The subsequent DMCA notice that's filed is the problem. DMCA has a huge problem with abuse from automated filings. Every takedown notice should be prepared by a human. Writing software that automates the process is a crime against culture.

Again, the conclusion doesn't follow. DMCA being abused doesn't mean that copyright is inherently wrong or that software that makes it possible for copyright holders to find offenders is evil or a "crime against culture."

And here I was thinking porn.

Hi tokenizerrr, I was counting down until someone mentioned porn, you were the first!!! YAY!!!! :)))

So what sites do you scrape? Just YouTube?

At this stage we analyze YouTube and Facebook. But we are also able to do it with LinkedIn, Vimeo, Twitter, Instagram, the Chinese platform Tencent Video and a couple of porn sites :D

A knife can be used for good or for evil. You don't stop tools from being made just because they can be used in bad ways.

Wrong. You should always consider the potential for abuse when you make software. OP seems to be pushing a commercial angle as well - who exactly do you think is going to pay?

similar service https://pex.com/

Thanks gondo! Please try our tool and give us feedback ;) Which one got the best results?

Congratulations to the team on the release. It's exciting to see more companies entering the market.

We built a similar service [0] although from the first look it seems we took a different path. Our approach is to crawl all videos and music on the web, fingerprint the multimedia content and search through the fingerprints.

We just recently passed 4B indexed videos (we run at a fairly large scale [1]), so our results are a bit different. Here is an example for Gangnam Style for comparison [2].

Anyway good luck. Feel free to reach out if there is anything we can do to help.

[0] https://pex.com

[1] https://news.ycombinator.com/item?id=13259415

[2] http://i.imgur.com/3KDKHsI.png
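The crawl, fingerprint, and search approach described in the comment above depends on not comparing a query against billions of stored fingerprints one by one. A minimal sketch of one standard indexing technique (multi-index hashing, a form of banding), assuming 64-bit fingerprints: split each fingerprint into 4 bands of 16 bits and index every band. By the pigeonhole principle, two fingerprints that differ in at most 3 bits must agree exactly on at least one band, so exact band lookups find every such candidate, which is then verified with a full Hamming-distance check. `FingerprintIndex` and its parameters are hypothetical; this is not a description of Pex's actual architecture.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

BANDS = 4          # a 64-bit fingerprint split into 4 bands
BAND_BITS = 16
MASK = (1 << BAND_BITS) - 1


def bands(fp: int) -> List[Tuple[int, int]]:
    """(band index, band value) pairs for a 64-bit fingerprint."""
    return [(i, (fp >> (i * BAND_BITS)) & MASK) for i in range(BANDS)]


class FingerprintIndex:
    def __init__(self) -> None:
        # (band index, band value) -> ids of videos with a fingerprint containing that band
        self._table: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
        # video id -> all fingerprints stored for it, used for final verification
        self._fps: Dict[str, List[int]] = defaultdict(list)

    def add(self, video_id: str, fp: int) -> None:
        self._fps[video_id].append(fp)
        for key in bands(fp):
            self._table[key].add(video_id)

    def search(self, fp: int, max_bits: int = BANDS - 1) -> Set[str]:
        """Videos holding a fingerprint within max_bits of fp.

        Candidate lookup is guaranteed complete only for max_bits < BANDS.
        """
        candidates: Set[str] = set()
        for key in bands(fp):
            candidates |= self._table.get(key, set())
        return {
            vid for vid in candidates
            if any(bin(fp ^ other).count("1") <= max_bits for other in self._fps[vid])
        }
```

A query that flips 3 bits of a stored fingerprint is still found, while one that flips 4 bits is rejected by the default threshold even though it shares three bands with the stored fingerprint.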


Hey doh! Thanks for reaching out. We have no videos indexed and we are doing strictly visual search - we are Computer Vision evangelists :)

Thanks for your availability, likewise: feel free to drop us a note.


Please don't reply to people with their usernames; HN's threaded comments make that redundant, and it breaks the sense of normal conversation in much the way that repeating people's names out loud would.

Thanks for letting me know. I'm new to HN :)

You're welcome, and welcome!
