Python API for Zero Day Phishing Detection Based on Computer Vision (github.com)
60.0 points by yevpats | karma 233 | avg karma 1.52 2018-04-20 14:18:16+00:00 | 27 comments




This entire project is less than 20 lines of code:

https://github.com/phishai/phish-ai-api/blob/master/phish_ai...


That's not impressive when those lines are just API calls to a hosted service that does all of the actual work.

I suppose that comment is to indicate exactly what's been linked to and not to indicate an impressive feat.

So, yes, you are likely both in agreement.

Unlike the content that often shows up on HN with lots of technical detail, this amounts to a service announcement.

This service sounds interesting but a much better HN submission would be one that talks about how it works, how it was improved, etc. Browsing the phish.ai website uncovers several (very brief) articles that are more interesting than this repo.

BTW, a footnote on the API's design: {"verdict": "clean"} is not ideal for a non-interactive service. It should return either a number (like a confidence score) or a boolean value.
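To make that suggestion concrete, here is a minimal sketch of what a machine-friendly response could look like. The field names (`phishing_score`, `is_phishing`) and the 0.5 threshold are assumptions made up for illustration; the only response shape shown in this thread is {"verdict": "clean"}.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ScanResult:
    """Hypothetical response shape; not the actual Phish.AI API."""

    url: str
    phishing_score: float  # assumed scale: 0.0 (clean) to 1.0 (phishing)

    def is_phishing(self, threshold: float = 0.5) -> bool:
        # The caller picks the threshold instead of parsing a verdict string.
        return self.phishing_score >= threshold


def to_json(result: ScanResult, threshold: float = 0.5) -> str:
    """Serialize the score together with a boolean derived from it."""
    payload = asdict(result)
    payload["is_phishing"] = result.is_phishing(threshold)
    return json.dumps(payload, sort_keys=True)
```

Returning the raw score lets an automated consumer tune its own false-positive rate, while the boolean keeps the simple case simple.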


Hey, thanks for the feedback. It's a great idea, and I got the same feedback from other people, so we will publish a blog post on how it works. I replied to encoderer with a short explanation. The footnote on API design is noted and will be added :) Thanks.

Hey there. The project is named "API" because it will contain more repos with APIs for different languages and integrations. You can read more about our open-source strategy, and why we can't open-source the backend, here: https://www.phish.ai/2018/03/26/phish-ai-love-open-source/

This is really great. I think if this gains traction, it could be fascinating for Computer Vision in general, as inevitably scammers would try to introduce adversarial elements to fool the CV, potentially by training a CV AI of their own to try and replicate (mis)classification. The end result of such an arms race is CV that better mimics human perception.

Thanks!

Even before it has an impact on CV, this is really important. If widely adopted, it mitigates against the "script kiddie-ization" of phishing sites, where it becomes easier and easier to closely mimic a legitimate site by reusing (perhaps in real time via proxy) its original assets. Such attackers won't have the ability to create novel adversarial elements, though, and you could train yourselves on any that made it into common exploit scripts.

That said, your browser plugin is extremely aggressive at uploading screenshots of potentially sensitive information to your servers (for instance, someone's stealth mode startup's unindexed Heroku staging URL), doing so by default if it hasn't been whitelisted [0]. And your privacy policy [1] permits you to resell "non-personally identifiable visitor information... to other parties for marketing, advertising, or other uses."

I don't blame you for needing to make money or for launching an MVP, but there must be better zero-knowledge ways of accomplishing this. It's an important goal and I wish you all the best... but don't be the source of the data leaks you intend to prevent.

[0] https://github.com/phishai/phish-protect/blob/master/js/back...

[1] https://www.phish.ai/phish-ai-privacy-policy/


Hey, Thanks for the extensive reply.

Regarding the browser extension: the free extension doesn't upload every screenshot, only when you choose to scan a particular webpage, as we can't afford to scan every webpage for free. We don't sell any information, as that is not our business model. I take your point about the privacy policy, and we will make it more accurate.

The business model is mainly the API product, for which you will see more integrations coming in the next couple of months. As I said, the use case for the API is more for incident response teams and hosting providers that need to go manually through a lot of URLs and tag them.


I am not sure this works. I just tested it on a dummy phishing site I made and it came back as clean.

Hey, thanks for the feedback! We are currently in beta, and some false negatives/positives can occur. Also, I'll publish an up-to-date list of brands that we detect.

The best I can gather from reading the website is that it ‘proactively crawls’ websites of “top brands”.

And then with the extension it takes a screenshot of every page you visit and uploads it to their server, which does something with CV fast enough that it can block you from submitting data to a phishing attempt.

Anybody have any details of how this actually works?

Seems a little magic.


Hey, so it looks like there is quite a lot of confusion, and our content marketing needs to be improved. Anyway, I'll try to explain here:

We crawl websites of known brands and take screenshots of them. Then we extract AI and computer vision features from them and create signatures for every website.

Now there are two products:

1) API: you can use this for any type of use case: incident response, automated phishing classification for hosting providers, or any other use case that you can think of.

2) Chrome extension: the free version does not upload screenshots in real time, as processing a lot of screenshots is very expensive and resource-consuming, so that is available only in the enterprise version. The free Chrome extension has two features: Unicode detection and a link to actively scan the current website with Phish.AI, so really nothing to call home about.

Hope that makes things a bit clearer.
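For readers wondering what a screenshot "signature" could look like, here is a toy sketch. It uses an average hash over a grayscale pixel grid, a simple stand-in technique chosen for illustration only; the thread does not say which features Phish.AI extracts, and nothing here is their method. The function names and the `max_distance` parameter are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

Grid = Sequence[Sequence[int]]  # grayscale pixels, assumed already downscaled


def average_hash(pixels: Grid) -> List[int]:
    """One bit per pixel: 1 if the pixel is brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two equal-length signatures differ."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_brand(
    sig: Sequence[int], known: Dict[str, List[int]], max_distance: int
) -> Optional[str]:
    """Return the nearest known brand, or None if nothing is close enough."""
    best_name, best_dist = None, max_distance + 1
    for name, brand_sig in known.items():
        dist = hamming(sig, brand_sig)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name
```

A real system would need features far more robust than this, but the shape is the same: build signatures from crawled brand pages, then look up the nearest one for each scanned screenshot.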


This is very cool; as computer vision gets better and the database grows, I can easily see this being the default way to do phishing detection.

True, the database is the key here. We are still in beta and our database grows every day!

Pipe... all of my email... through some startup's random opaque web service...?

Why email? It's not an email service at all. I guess we need to improve our content marketing, but we don't have the word "email" anywhere on our site. Having said that, the main service is an API that you can use for a lot of use cases. For example, a good use case is an incident response team that goes manually through a lot of phishing reports to tag them, or a hosting provider that goes through its websites to take down phishing or hacked websites.

So... the idea is (potentially) for a provider's mail filters to extract links from emails and submit them to your API, raising an alert if it's a close match for a known site but not on a whitelisted domain?

No, it won't work for email at all. You can't integrate it with email links, because the service needs to browse to the URL to take a screenshot and then analyze the screenshot. If we clicked on every link in an email, it would cause chaos: unsubscribing everyone from everything and declining/accepting/maybe-ing all invitations :)

The use case is more appropriate for Incident-response teams that go manually through tons of URLs to classify them or Hosting providers that go through tons of websites to check if phishing websites are hosted or some sites were hacked. Essentially any use case where you have a feed of urls that you can access from the web and tag/find phishing websites


The existing URIBLs are great at this (without having to upload every web page you're looking at).

"Phishing is the attempt to obtain sensitive information such as usernames, passwords, and credit card details (and money), often for malicious reasons, by disguising as a trustworthy entity in an electronic communication." [1]

'electronic communication' is generally considered a synonym for 'e-mail'. If that's not what your service is targeting, you should probably explain it.

1. https://en.wikipedia.org/wiki/Phishing


So, to make it a little clearer: phishing can come in a lot of forms, from telephone to websites to SMS phishing, WhatsApp, etc.

We specialize in visually comparing a website to our own database of legitimate websites and then checking the domain, to detect fake/phishing websites that are hosted on different domains. For example, we detect that a website looks like PayPal and then check whether it's hosted on one of PayPal's domains (paypal.com, paypal.uk, etc.).
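The domain check described above can be sketched roughly as follows. This is an illustration under assumptions, not the service's code: `BRAND_DOMAINS` is a hypothetical table seeded with the two PayPal domains mentioned in the comment, the visual match is assumed to have already produced a brand name, and "phishing" is an assumed verdict string (the thread only shows "clean").

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical lookup table; only the PayPal entries come from the comment above.
BRAND_DOMAINS = {"paypal": {"paypal.com", "paypal.uk"}}


def is_on_brand_domain(url: str, brand: str) -> bool:
    """True if the URL's host is a listed brand domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    return any(
        host == domain or host.endswith("." + domain)
        for domain in BRAND_DOMAINS.get(brand, ())
    )


def verdict(url: str, matched_brand: Optional[str]) -> str:
    """Combine an (assumed) visual brand match with the domain check."""
    if matched_brand is None:
        return "clean"  # the page does not look like any known brand
    return "clean" if is_on_brand_domain(url, matched_brand) else "phishing"
```

Matching on the parsed hostname rather than the raw string matters: a URL like http://paypal.com.evil.example/login contains "paypal.com" but is not hosted on it.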


Who is your target market?

Old and busted: Phishing attempt leaks all my company's emails.

New and hot: Breach at phish.ai's screenshot database leaks all my company's emails.


Hey there, I already answered this in the thread: we are not an email solution. Please see my other reply to ris. I hope people will at least read the thread or the website before posting negative feedback.

This is a Python API client for a zero day phishing detection service. The title of this post is misleading and makes it sound like it is an open-source project.
