Python API for Zero Day Phishing Detection Based on Computer Vision

kijeda | karma 619 | avg karma 4.42 · 2018-04-20 14:41:56

This entire project is less than 20 lines of code:

https://github.com/phishai/phish-ai-api/blob/master/phish_ai...

krapp | karma 17552 | avg karma 1.17 · 2018-04-20 14:45:03

That's not impressive when those lines are just API calls to a hosted service that does all of the actual work.

wyldfire | karma 20828 | avg karma 4.03 · 2018-04-20 14:58:42

I suppose that comment is to indicate exactly what's been linked to and not to indicate an impressive feat.

So, yes, you are likely both in agreement.

Unlike the content that often shows up on HN with lots of technical detail, this amounts to a service announcement.

This service sounds interesting but a much better HN submission would be one that talks about how it works, how it was improved, etc. Browsing the phish.ai website uncovers several (very brief) articles that are more interesting than this repo.

BTW footnote wrt the API's design -- {"verdict": "clean"} -- this is not ideal for a non-interactive service. It should return either a number (like a confidence score) or a boolean value.
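
To illustrate the footnote above: a non-interactive caller has to map the verdict string to something it can branch on. This is a minimal sketch, assuming a response shaped like the {"verdict": "clean"} example. The "malicious" value, the optional "score" field, and the names `Verdict` and `parse_verdict` are hypothetical and are not taken from the Phish.AI API.

```python
import json
from typing import NamedTuple, Optional


class Verdict(NamedTuple):
    is_phishing: Optional[bool]  # None when the verdict string is unrecognized
    score: Optional[float]       # hypothetical confidence score, if present


def parse_verdict(raw: str) -> Verdict:
    """Map a JSON response body to a boolean plus an optional score."""
    body = json.loads(raw)
    label = str(body.get("verdict", "")).strip().lower()
    # Only "clean" appears in the thread; "malicious" is an assumed counterpart.
    known = {"clean": False, "malicious": True}
    score = body.get("score")
    return Verdict(known.get(label), float(score) if score is not None else None)
```

Returning None for an unrecognized label, rather than guessing, lets an automated pipeline route those cases to manual review.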

yevpats | karma 233 | avg karma 1.52 · 2018-04-20 17:19:09

Hey, thanks for the feedback. It's a great idea, and I got the same feedback from other people, so we will publish a blog post on how it works. I replied to encoderer with a short explanation. The footnote on API design is noted and will be addressed :) thx

yevpats | karma 233 | avg karma 1.52 · 2018-04-20 17:16:38

Hey there. So the name of the project is API: it will contain more repos with APIs for different languages and integrations. You can read more about our open-source strategy, and why we can't open-source the backend, here: https://www.phish.ai/2018/03/26/phish-ai-love-open-source/

fredley | karma 11683 | avg karma 6.88 · 2018-04-20 16:07:20+00:00

This is really great. I think if this gains traction, it could be fascinating for Computer Vision in general, as inevitably scammers would try to introduce adversarial elements to fool the CV, potentially by training a CV AI of their own to try and replicate (mis)classification. The end result of such an arms race is CV that better mimics human perception.

yevpats | karma 233 | avg karma 1.52 · 2018-04-20 17:27:31+00:00

Thanks!

btown | karma 14946 | avg karma 5.75 · 2018-04-20 20:29:14

Even before it has an impact on CV, this is really important. If widely adopted, it mitigates against the "script kiddie-ization" of phishing sites, where it becomes easier and easier to closely mimic a legitimate site by reusing (perhaps in real time via proxy) its original assets. Such attackers won't have the ability to create novel adversarial elements, though, and you could train yourselves on any that made it into common exploit scripts.

That said, your browser plugin is extremely aggressive at uploading screenshots of potentially sensitive information to your servers (for instance, someone's stealth mode startup's unindexed Heroku staging URL), doing so by default if it hasn't been whitelisted [0]. And your privacy policy [1] permits you to resell "non-personally identifiable visitor information... to other parties for marketing, advertising, or other uses."

I don't blame you for needing to make money or for launching an MVP, but there must be better zero-knowledge ways of accomplishing this. It's an important goal and I wish you all the best... but don't be the source of the data leaks you intend to prevent.

[0] https://github.com/phishai/phish-protect/blob/master/js/back...

[1] https://www.phish.ai/phish-ai-privacy-policy/

yevpats | karma 233 | avg karma 1.52 · 2018-04-21 09:34:05

Hey, Thanks for the extensive reply.

Regarding the browser extension: the free extension doesn't upload every screenshot, only the ones for a particular webpage you choose to scan, as we can't afford to scan every webpage for free. We don't sell any information, as that is not our business model. I take your point about the privacy policy, and we will make it more accurate.

The business model is mainly the API product, where you will see more integrations coming in the next couple of months. As I said, the use case for the API is more for incident response teams and hosting providers that need to go through a lot of URLs manually and tag them.

_pdp_ | karma 813 | avg karma 1.69 · 2018-04-20 16:20:20

I am not sure this works. I just tested it on a dummy phishing site I made and it came as clean.

yevpats | karma 233 | avg karma 1.52 · 2018-04-20 17:15:14+00:00

Hey, thanks for the feedback! We are currently in beta, and some false negatives/positives can occur. Also, I'll publish an up-to-date list of the brands that we detect.

encoderer | karma 9841 | avg karma 3.34 · 2018-04-20 17:18:32

The best I get reading the website is that it ‘proactively crawls’ websites of “top brands”

And then with the extension it takes a screenshot of every page you visit and uploads it to their server, which does something with CV fast enough that it can block you from submitting data to a phishing attempt.

Anybody have any details of how this actually works?

Seems a little magic.

yevpats | karma 233 | avg karma 1.52 · 2018-04-20 17:24:27

Hey, so it looks like there is quite a lot of confusion and our content marketing needs to be improved. Anyway, I'll try to explain here:

We crawl websites of known brands and take screenshots of them. Then we extract AI & computer vision features from them and create signatures for every website.

Now there are two products:

1) API: you can use this for any type of use case: incident response, automated phishing classification for hosting providers, or any other use case that you can think of.

2) Chrome extension: the free version does not upload screenshots in real time, as it's very expensive and resource-consuming to process a lot of screenshots, so that is available only in the enterprise version. The free Chrome extension has two features: Unicode detection and a link to actively scan the current website with Phish.AI, so really nothing to call home about.

Hope it made some things a bit clearer
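
The crawl, screenshot, signature flow described above can be sketched with a simple perceptual hash. This is only an illustration: it uses an average hash with Hamming distance as a stand-in, since the thread does not say which features or signatures Phish.AI actually computes. The input is assumed to be a small grayscale pixel grid already downscaled from a screenshot, and all function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

Grid = Sequence[Sequence[int]]  # grayscale pixels (0-255), already downscaled


def average_hash(pixels: Grid) -> int:
    """Build a bit signature: 1 where a pixel is brighter than the mean."""
    flat: List[int] = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two signatures."""
    return bin(a ^ b).count("1")


def closest_brand(signature: int, known: Dict[str, int],
                  max_distance: int) -> Optional[str]:
    """Return the brand with the nearest stored signature, or None."""
    if not known:
        return None
    brand, dist = min(
        ((name, hamming_distance(signature, sig)) for name, sig in known.items()),
        key=lambda item: item[1],
    )
    return brand if dist <= max_distance else None
```

A page that reuses a brand's layout with slightly different pixel values still hashes to the same or a nearby signature, which is the property a visual lookup needs. A production system would need features far more robust than this.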

SmooL | karma 511 | avg karma 3.96 · 2018-04-20 17:55:54

This is very cool; as computer vision gets better and the database grows, I can easily see this being the default way to do phishing detection.

yevpats | karma 233 | avg karma 1.52 · 2018-04-20 18:14:38+00:00

True, the database is the key here. We are still in beta and our database grows every day!

ris | karma 4200 | avg karma 2.93 · 2018-04-20 18:28:17

Pipe... all of my email... through some startup's random opaque web service...?

yevpats | karma 233 | avg karma 1.52 · 2018-04-20 18:38:45

Why email? It's not an email service at all... I guess we need to improve our content marketing, but we don't have the word "email" anywhere on our site. Having said that, the main service is an API that you can use for a lot of use cases. For example, a good use case is an incident response team that goes manually through a lot of phishing reports to tag them, or hosting providers that go through their websites to take down phishing or hacked websites.

ris | karma 4200 | avg karma 2.93 · 2018-04-20 19:31:03+00:00

So... the idea is (potentially) for a provider's mail filters to extract links from emails and submit them to your API, raising an alert if it's a close match for a known site but not on a whitelisted domain?

yevpats | karma 233 | avg karma 1.52 · 2018-04-20 19:40:42

No, it won't work for email at all. You can't integrate it with email links because the service needs to visit the URL to take a screenshot and then analyze the screenshot. If we clicked on every link in an email, it would cause chaos: it would unsubscribe everyone from everything and decline/accept/maybe all invitations :)

The use case is more appropriate for incident response teams that go manually through tons of URLs to classify them, or hosting providers that go through tons of websites to check whether phishing websites are hosted or some sites were hacked. Essentially, any use case where you have a feed of URLs that are accessible from the web and you want to tag/find phishing websites.

jlgaddis | karma 11467 | avg karma 2.4 · 2018-04-20 22:42:39+00:00

The existing URIBLs are great at this (without having to upload every web page you're looking at).

ghostly_s | karma 3338 | avg karma 2.84 · 2018-04-20 19:40:13

"Phishing is the attempt to obtain sensitive information such as usernames, passwords, and credit card details (and money), often for malicious reasons, by disguising as a trustworthy entity in an electronic communication." [1]

'electronic communication' is generally considered a synonym for 'e-mail'. If that's not what your service is targeting, you should probably explain it.

1. https://en.wikipedia.org/wiki/Phishing

yevpats | karma 233 | avg karma 1.52 · 2018-04-20 19:51:26

So, to make it a little clearer: phishing can come in a lot of forms, from telephone to websites to SMS phishing, WhatsApp, etc.

We specialize in visually comparing a website to our own database of legitimate websites and then checking the domain to detect fake/phishing websites that are hosted on different domains. For example, we detect that a website looks like PayPal and then we check whether it's hosted on one of PayPal's domains (paypal.com, paypal.uk, etc.).
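
The second half of that check, comparing the domain once a page visually matches a brand, might look like the sketch below. It assumes the visual match has already produced a brand name (or None). The `BRAND_DOMAINS` table, the label strings, and the function names are hypothetical, and the domain list is illustrative rather than a complete record of any brand's real domains.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical allowlist; a real one would be maintained per brand.
BRAND_DOMAINS = {"paypal": {"paypal.com"}}


def is_brand_domain(url: str, brand: str) -> bool:
    """True if the URL's host is a listed brand domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    return any(
        host == domain or host.endswith("." + domain)
        for domain in BRAND_DOMAINS.get(brand, ())
    )


def classify(url: str, matched_brand: Optional[str]) -> str:
    """Combine the visual match result with the domain check."""
    if matched_brand is None:
        return "no_match"  # page does not look like any known brand
    if is_brand_domain(url, matched_brand):
        return "legitimate"
    return "suspected_phishing"
```

Matching on the full host or a dot-prefixed suffix, instead of a plain substring, keeps a host such as paypal.com.evil.example from passing as a PayPal domain.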

alexbeloi | karma 756 | avg karma 2.89 · 2018-04-20 20:40:46

Who is your target market?

endymi0n | karma 11254 | avg karma 9.9 · 2018-04-20 18:39:55

Old and busted: Phishing attempt leaks all my company's emails.

New and hot: Breach at phish.ai's screenshot database leaks all my company's emails.

yevpats | karma 233 | avg karma 1.52 · 2018-04-20 18:44:23+00:00

Hey there, I already answered this in the thread: we are not an email solution. Please see my other reply to ris. I hope people will at least read the thread or the website before posting negative feedback.

jeffnappi | karma 539 | avg karma 2.58 · 2018-04-20 21:06:50+00:00

This is a Python API client for a zero day phishing detection service. The title of this post is misleading and makes it sound like it is an open-source project.