>> For example, what if the sender types in `You should go to http://www.example.com, where "example" must be replaced with your company name`? Suddenly `www.example.com` has an unintended DDoS!
> Ah, so that doesn't happen if the sender types in the wrong thing in HTML...?
I can't tell if you're being purposely contrarian or simply don't understand users.
The problem: Things that shouldn't be links get turned into links.
Your response: $SOME_OTHER_PROBLEM
I mean, really? You can't tell the difference between someone making a typo when intending to write a link and someone making a typo that results in a link?
Okay, I have no evidence of this, but they'd better not be using another AI classifier to try and tell whether that link is safe.
Props for finally doing something I guess, but leave it to OpenAI to pick arguably the weirdest mitigation strategy that I've ever seen for a problem like this -- to literally send the URL into another opaque system that just spits out "safe" or "dangerous" with no indication to the user why that is.
Well, you've got usability problems. I actually tried to find the catalog, and could not.
Now, I was able to see that it's a link in the top right corner, for some reason. Very unlike what I'd expect, being used to browsing 4chan.
>Every page is available in plain HTML and it even works without client-side JavaScript.
I scrolled the overboard, and stuff suddenly appeared while scrolling. I'll have to assume that the page was still loading, slower than I'd expect due to the HN hug of death.
> being a developer and security conscious, I inspect URLs before I click on them. The longer they are, the more frustrated I get examining them.
it doesn't really matter if a url is long or short, right? as long as the domain looks good then there's no reason to check every character in the url? not sure why it's more secure to have a shorter url if the domain is safe
> "Slashdot" is an intentionally obnoxious URL. When Rob registered the domain http://slashdot.org, he wanted a URL that was confusing when read aloud. (Try it!)
It is not a proper URL without the protocol part up front, browser creators improperly trying to hide the protocol notwithstanding.
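For what it's worth, generic URL parsers agree: without the scheme there is nothing marking the host as a host. A quick sketch with Python's standard library (the hostname is just the one from the quote):

```python
from urllib.parse import urlparse

# Without a scheme, a parser has no way to know "slashdot.org" is a host;
# it comes back as a bare path.
print(urlparse("slashdot.org"))
# ParseResult(scheme='', netloc='', path='slashdot.org', params='', query='', fragment='')

# With the protocol up front it parses as an actual URL with a host.
print(urlparse("http://slashdot.org"))
# ParseResult(scheme='http', netloc='slashdot.org', path='', params='', query='', fragment='')
```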
> just happened to me: in a youtube description you can't just put ycombinator.com,
ycombinator.com is not technically a URL; it is simply a domain name. The fact that browsers all invisibly 'auto convert' it to a URL when entered into their URL/magic bars does not magically make it a URL. If YouTube's code that looks for URLs to wrap in anchor links simply looked for words with dots, a large number of false positives would occur (things that are not URLs getting wrapped in anchor links).
You can blame the browser makers, and the evil one specifically, for all the hiding of URL structure such that people now think "ycombinator.com" should be interpreted as if it were a URL everywhere.
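To make the trade-off concrete, here's a rough sketch (these regexes are illustrative, not YouTube's actual code): matching any word containing a dot catches bare domains like ycombinator.com, but it also catches filenames, abbreviations, and so on, while requiring an explicit scheme avoids those false positives at the cost of ignoring bare domains.

```python
import re

text = "See ycombinator.com, open index.html, and e.g. the node.js docs at https://nodejs.org/api/"

# Naive: any whitespace-delimited word containing a dot.
# Catches the https URL and 'ycombinator.com,' but also 'index.html,', 'e.g.' and 'node.js'.
naive = re.findall(r"\S+\.\S+", text)

# Stricter: only treat it as a link when an explicit scheme is present.
# Here that's only 'https://nodejs.org/api/'.
strict = re.findall(r"https?://\S+", text)

print(naive)
print(strict)
```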
> Do they really care to see* if they are at www.site.com or m.site.com or site.com or do they just want to know they are at site.com
www.example.com == example.com is an assumption that has never been true and I have seen it screw over regular users countless times. Quietly rewriting the URL and hiding that fact will just make that worse.
As for m.example.com, that is even further from standardized. At least one person I know and I have personal websites with short aliases like mike.lastname.tld == m.lastname.tld, not to mention the plethora of sites that don't have matching paths on mobile vs desktop. Even if copying the URL quietly switches it out for the real one (which is bad on its own), that breaks the surprisingly common workflow of writing down the URL you see on a piece of paper.
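A small sketch of why the equivalence can't be assumed (the hostnames are illustrative): the www, m, and apex names are independent DNS records, so nothing guarantees they resolve to the same place, or at all.

```python
import socket

def resolve(host):
    """IPv4 addresses a name resolves to, or None if it doesn't resolve."""
    try:
        return {info[4][0] for info in socket.getaddrinfo(host, 80, socket.AF_INET)}
    except socket.gaierror:
        return None

# These are three separate DNS names; any of them may be missing,
# point somewhere else, or serve different content even when the IPs match.
for host in ("example.com", "www.example.com", "m.example.com"):
    print(host, "->", resolve(host))
```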
> Not to be snarky, but haven't people written tools to help with this? This seems like a common issue. I mean, there's `sed` and similar tools, obviously, but something that could go, validate that the link works over https://, and update it. I don't see why that would need to be some monumental amount of work.
Not as trivial as you'd think: if there's an HTTP URL on the page when it should be HTTPS, how did the URL end up there? Dynamically from PHP code? Dynamically from JavaScript code? Did the URL come from a database? Did the URL come from an environment variable? It can be a lot of work to track all these down, and a lot of them you won't be able to find using grep/sed; e.g., URLs might appear as relative URLs in code, with the "http" part being added dynamically.
You'll get insecure content warnings as well if you try to load HTTP images, CSS, iframes, or JavaScript on an HTTPS page. Likewise, the URLs for these can come from lots of places.
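The easy textual part can at least be semi-automated. A rough sketch (the file extensions and the HEAD-probe heuristic are assumptions, and it only reports rather than rewriting, precisely because of the dynamically-built URLs described above):

```python
import pathlib
import re
import urllib.request

HTTP_URL = re.compile(r"""http://[^\s"'<>)]+""")

def https_works(url, timeout=5):
    """Assume a successful HEAD request over https means the URL can be upgraded."""
    req = urllib.request.Request("https" + url[len("http"):], method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except Exception:
        return False

for path in pathlib.Path(".").rglob("*"):
    if path.suffix not in {".html", ".php", ".js", ".css"}:
        continue
    for url in HTTP_URL.findall(path.read_text(errors="ignore")):
        verdict = "upgradeable" if https_works(url) else "check manually"
        print(f"{path}: {url} ({verdict})")
```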
> Also, it's a terrible default (for security reasons) to let the web pages you're parsing automagically initiate new requests to arbitrary urls.
Right. We'd have to only grab the article-id, validate that it is in fact an integer in the right range, and only then piece the URL back together and request it.
On the other hand, maybe just checking that we stay within the domain is enough. If the website wants to screw with us, they can send us any reply they want to any URL anyway.
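A sketch of both approaches (the host, path layout, and id range are made-up examples):

```python
from urllib.parse import urlparse

ALLOWED_HOST = "example.com"      # assumption: the site being scraped
MAX_ARTICLE_ID = 10_000_000       # assumption: a plausible upper bound

def url_for_article(raw_id):
    """Option 1: keep only the article id, validate it, rebuild the URL ourselves."""
    article_id = int(raw_id)      # raises ValueError for anything that isn't an integer
    if not 0 < article_id <= MAX_ARTICLE_ID:
        raise ValueError(f"article id out of range: {article_id}")
    return f"https://{ALLOWED_HOST}/articles/{article_id}"

def stays_on_site(url):
    """Option 2: take the URL from the page, but only follow it if it stays on the expected host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and parsed.hostname == ALLOWED_HOST

print(url_for_article("12345"))
print(stays_on_site("https://example.com/articles/12345"))   # True
print(stays_on_site("https://attacker.example/articles/1"))  # False
```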
> So like, if you're on my-spa.com, and click a link to a post, the JS loads the post and updates the URL, but doesn't perform a full page load. So while you might be on https://my-spa.com/user/39820002/post/39811155, your browser likely never actually sent a GET request to /user/39820002/post/39811155. But this creates the issue that if you manually navigate to that URL, your browser WILL send a request for it.
Exactly, you got it. This is why SPA configurations for webservers exist, like the SPA_MODE on the one we are commenting on.
Note that they strongly suggest not using hash routing unless you have no other choice.
I agree that returning 200 to every route isn't great for some scenarios, but realistically, SPAs exist, they usually work like this, and they are deployed like this; we can't really change that now.
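For reference, the usual fallback behaviour those configurations implement looks roughly like this (a minimal sketch, not the SPA_MODE implementation being discussed): serve the file if it exists, otherwise hand back index.html so the client-side router can deal with the path.

```python
import http.server
import os

class SpaHandler(http.server.SimpleHTTPRequestHandler):
    """Serve real files normally; route every unknown path to index.html."""

    def send_head(self):
        if not os.path.exists(self.translate_path(self.path)):
            # Unknown route: let the client-side router handle it.
            # This is also why such routes answer 200 rather than 404.
            self.path = "/index.html"
        return super().send_head()

if __name__ == "__main__":
    http.server.HTTPServer(("", 8000), SpaHandler).serve_forever()
```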
I would just not build an SPA with client-side routing at this point in time, since there are solutions that I personally prefer; other people may think differently, though.
> Namely, avoiding using /id/date/title-of-post (or something similar) and just using the rootdomain/title-of-post to make it rank higher and seem more important than it is.
This causes the small wayward fragments of Library Science curriculum embedded in my brain to quiver with rage.
Bonus points if the tail end of the URL contains what may-or-may-not be a bunch of tracking shit and it's not obvious how much it can be shortened without breaking the link.
I've seen some random blogging software break when you embed certain kinds of links into articles, for example, ones that have query strings (?key=value), where the question mark gets interpreted as something else by the software and thus the link gets cut off and no longer works.
In cases like that, it makes sense (because you literally don't have a choice, unless you want to migrate to something else).
Also, some might enjoy the consistency of short links instead of having a URL that's really long. That is also pretty nice for putting the URL on a poster or flier, or for sending it to someone.
Some also just don't care about the implications of the link going down in the future.