
>> For example, what if the sender types in `You should go to http://www.example.com, where "example" must be replaced with your company name`? Suddenly `www.example.com` has an unintended DDoS!

> Ah, so that doesn't happen if the sender types in the wrong thing in HTML...?

I can't tell if you're being purposely contrarian or simply don't understand users.

The problem: Things that shouldn't be links get turned into links.

Your response: $SOME_OTHER_PROBLEM

I mean, really? You can't tell the difference between someone making a typo when intending to write a link and someone making a typo that results in a link?
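
To make the quoted failure mode concrete, here's a minimal sketch (no particular mail client's implementation, just the general shape of a naive autolinker) showing how placeholder text gets turned into a live link:

```python
import re

# A deliberately naive autolinker: anything that looks like a hostname gets
# wrapped in an anchor tag, whether or not the author meant it as a link.
NAIVE_LINK_RE = re.compile(r'\b((?:https?://)?[\w-]+(?:\.[\w-]+)+[^\s,]*)')

def naive_autolink(text: str) -> str:
    def wrap(m: re.Match) -> str:
        target = m.group(1)
        href = target if target.startswith("http") else "http://" + target
        return f'<a href="{href}">{target}</a>'
    return NAIVE_LINK_RE.sub(wrap, text)

# The placeholder URL from the quoted sentence becomes a clickable link, so
# every recipient who clicks it sends real traffic to example.com even though
# the sender never meant it as a destination.
print(naive_autolink(
    "You should go to http://www.example.com, where example must be replaced "
    "with your company name"))
```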




> like http://ai./

Slightly off topic, but it annoys me to no end that verisign doesn't operate a website at http://com./

I feel like people typing that in, then wondering what the heck just happened would be a great source of people understanding DNS more deeply.
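
The trailing dot already makes these fully-qualified names you can resolve directly, with no local search domain appended; a small stdlib sketch shows which of these TLD zones even publish an address (results obviously depend on what the registries serve when you run it):

```python
import socket

# The trailing dot marks a fully-qualified name: the resolver asks about the
# TLD zone itself instead of appending any local search domain.
for name in ("ai.", "com."):
    try:
        addrs = {info[4][0] for info in socket.getaddrinfo(name, 80)}
        print(name, "has address records:", sorted(addrs))
    except socket.gaierror:
        # Typically means the zone exists but publishes no address record,
        # so http://com./ gives a browser nothing to connect to.
        print(name, "did not resolve to an address")
```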


> Remember GeoCities? The web doesn't anymore.

Bad example:

http://reocities.com/


> It's unclear why:

> https://wuzzi.net/img/a.png is a safe URL and is rendered but

> https://wuzzi.net/img/d.png is not a safe URL?

Okay, I have no evidence of this, but they'd better not be using another AI classifier to try and tell whether that link is safe.

Props for finally doing something I guess, but leave it to OpenAI to pick arguably the weirdest mitigation strategy that I've ever seen for a problem like this -- to literally send the URL into another opaque system that just spits out "safe" or "dangerous" with no indication to the user why that is.


> but if the user clicks, the link actually goes to evilscam.com

Often it's more like youtue.com, something the user is unlikely to notice as a slight deviation from the expected URL.
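
If a client wanted to catch that kind of near-miss, one rough approach is to compare the link's hostname against domains the user actually visits and warn when it's close but not identical. A sketch of that idea; the domain list and threshold here are made up purely for illustration:

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse

# Hypothetical allow-list of domains the user is known to visit.
KNOWN_DOMAINS = {"youtube.com", "ycombinator.com", "example.com"}

def lookalike_warning(url: str, threshold: float = 0.85) -> str | None:
    host = (urlparse(url).hostname or "").removeprefix("www.")
    if host in KNOWN_DOMAINS:
        return None
    for known in KNOWN_DOMAINS:
        if SequenceMatcher(None, host, known).ratio() >= threshold:
            return f"{host} looks a lot like {known} but is not the same domain"
    return None

print(lookalike_warning("https://youtue.com/watch?v=x"))  # flags the near-miss
```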


> For example, we might say "we believe site-a.com has a problem with spam or inorganic links. An example link is site-b.com/spammy-link.html."

If you are able to identify problem links, you can just ignore them. Or am I missing something?


>Wrong: https://smelle.xyz/lynx/catalog.html

Well, you've got usability problems. I actually tried to find the catalog, and could not.

Now, I was able to see that it's a link in the top right corner, for some reason. Very unlike what I'd expect, being used to browsing 4chan.

>Every page is available in plain HTML and it even works without client-side JavaScript.

I scrolled the overboard, and stuff suddenly appeared while scrolling. I'll have to assume that the page was still loading, slower than I'd expect due to the HN hug of death.


> being a developer and security conscious, I inspect URLs before I click on them. The longer they are, the more frustrated I get examining them.

it doesn't really matter if a url is long or short, right? as long as the domain looks good then there's no reason to check every character in the url? not sure why it's more secure to have a shorter url if the domain is safe

*not a security expert
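
One caveat worth knowing even so: the part you eyeball has to actually be the host. A quick sketch of pulling the real hostname out with a parser, since userinfo tricks can make the wrong part of a long URL look like the domain (evil.example below is just an illustrative name):

```python
from urllib.parse import urlparse

urls = [
    "https://news.ycombinator.com/item?id=13329525",
    # Everything before the '@' is userinfo, not the host; the request
    # actually goes to evil.example (a name chosen purely for illustration).
    "https://news.ycombinator.com@evil.example/login",
]

for url in urls:
    parsed = urlparse(url)
    print(f"{url}\n  actual host: {parsed.hostname}")
```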


> Back in the day, all the tech folks hung out online at a place called slashdot (/. – CLI folks will get it)

It's not a CLI reference. Slashdot was named for how it sounds read aloud: https://slashdot.org/faq/slashmeta.shtml

> "Slashdot" is an intentionally obnoxious URL. When Rob registered the domain http://slashdot.org, he wanted a URL that was confusing when read aloud. (Try it!)


> you never know when you need to write https://

It is not a proper URL without the protocol part up front. Browser creators improperly trying to hide the protocol notwithstanding.

> just happened to me: in a youtube description you cant just put ycombinator.com,

ycombinator.com is not technically a URL; it is simply a domain name. The fact that browsers all invisibly 'auto convert' it to a URL when entered into their URL/magic bars does not magically make it into a URL. If YouTube's code, which looks for URLs to automatically wrap in anchor links, simply looked for words with dots, then a large number of false positives would occur (things that are not URLs getting wrapped in anchor links).
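
For example, a linkifier that insists on an explicit scheme avoids those false positives, at the cost of never linkifying bare domains; a minimal sketch of that stricter heuristic (not YouTube's actual code, obviously):

```python
import re

# Only linkify tokens that explicitly carry a scheme; bare "words with dots"
# like ycombinator.com are left alone, at the cost of never linkifying them.
SCHEME_LINK_RE = re.compile(r'\bhttps?://[^\s<>"]+')

def strict_autolink(text: str) -> str:
    return SCHEME_LINK_RE.sub(
        lambda m: f'<a href="{m.group(0)}">{m.group(0)}</a>', text)

print(strict_autolink("see ycombinator.com and https://news.ycombinator.com/"))
# -> only the https:// URL is wrapped in an anchor tag
```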

You can blame the browser makers, and the evil one specifically, for all the hiding of URL structure such that people now think "ycombinator.com" should be interpreted as if it were a URL everywhere.


> Seems like if it were simply renamed to .html with no content changes, then it would be okay.

Imagine you do that and I DDoS the URL. CF will then mitigate this DDoS by, in part, replacing your html with their Browser Integrity Check html.

If you're serving 'web pages and websites', everything continues to work. But what would happen if this list suddenly became an actual webpage?

If your site is serving 'a disproportionate percentage' of non-html, you decrease the ability of CF to tell good traffic from bad.


> Do they really care to see* if they are at www.site.com or m.site.com or site.com or do they just want to know they are at site.com

www.example.com == example.com is an assumption that has never been true, and I have seen it screw over regular users countless times. Quietly rewriting the URL and hiding that fact will just make that worse. As for m.example.com, that is even further from standardized. Me and at least one person I know have personal websites with short aliases like mike.lastname.tld == m.lastname.tld, not to mention the plethora of sites that don't have matching paths on mobile vs desktop. Even if copying the URL quietly switches it out for the real one (which is bad on its own), that breaks the surprisingly common workflow of writing down the URL you see on a piece of paper.

> Presumably the implementation is smarter than being defeated by this easy trick, but I too wonder how it works.

I wouldn't make too many assumptions. Browser vendors have overlooked seemingly "simple" things in the past [1].

[1]: https://news.ycombinator.com/item?id=13329525


> Not to be snarky, but haven't people written tools to help with this? This seems like a common issue. I mean, there's `sed` and similar tools, obviously, but something that could go, validate that the link works over https://, and update it. I don't see why that would need to be some monumental amount of work.

Not as trivial as you'd think: if there's an HTTP URL on the page when it should be HTTPS, how did the URL end up there? Dynamically from PHP code? Dynamically from JavaScript code? Did the URL come from a database? Did the URL come from an environment variable? It can be a lot of work to track all these down, and a lot of them you won't be able to find using grep/sed, e.g. URLs might appear as relative URLs in code with the "http" part being added dynamically.

You'll get insecure content warnings as well if you try to load HTTP images, css, iframes or JavaScript on an HTTPS page. Likewise, the URL for these can come from lots of places.
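
For the subset of URLs that really are sitting in static files as literal strings, the tool the parent describes is short to sketch: find http:// links, probe whether the https:// version answers, and report which ones can be upgraded. Everything else still needs the detective work above. A rough sketch, with a hypothetical file passed on the command line:

```python
import re
import sys
import urllib.request

HTTP_URL_RE = re.compile(r"""http://[^\s"'<>)]+""")

def https_works(url: str, timeout: float = 5.0) -> bool:
    """Return True if the https:// version of an http:// URL responds."""
    try:
        with urllib.request.urlopen("https" + url[len("http"):], timeout=timeout) as resp:
            return resp.status < 400
    except Exception:
        return False

# Usage sketch: python check_https.py page.html
text = open(sys.argv[1], encoding="utf-8").read()
for url in sorted(set(HTTP_URL_RE.findall(text))):
    print(("upgradable" if https_works(url) else "http-only"), url)
```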


> Also, it's a terrible default (for security reasons) to let the web pages you're parsing automagically initiate new requests to arbitrary urls.

Right. We'd have to only grab the article-id, validate that it is in fact an integer in the right range, and only then piece the URL back together and request it.
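
Something like the following sketch, where the only thing taken from the scraped page is the numeric id and the request URL is rebuilt from a trusted template (the template and the range bound here are placeholders):

```python
from urllib.request import urlopen

# Trusted template: the scheme and host never come from the scraped page.
ARTICLE_URL = "https://news.ycombinator.com/item?id={id}"
MAX_ARTICLE_ID = 50_000_000  # placeholder upper bound for "the right range"

def fetch_article(raw_id: str) -> bytes:
    article_id = int(raw_id)  # rejects anything that is not an integer
    if not (0 < article_id <= MAX_ARTICLE_ID):
        raise ValueError(f"article id out of range: {article_id}")
    # Only now is the URL pieced back together and requested.
    with urlopen(ARTICLE_URL.format(id=article_id)) as resp:
        return resp.read()
```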

On the other hand, maybe just checking that we stay within the domain is enough. If the website wants to screw with us, they can send us any reply they want to any url anyway.


> I love Stack Overflow's URLs. Here's an example: https://stackoverflow.com/users/6380/scott-hanselman

> The only thing that matters there is the 6380. Try it https://stackoverflow.com/users/6380 or https://stackoverflow.com/users/6380/fancy-pants also works. SO will even support this! http://stackoverflow.com/u/6380.

This works too: https://stackoverflow.com/users/6380/a3n

That doesn't seem right, and on a different site could even be dangerous.
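
The usual defence is to treat the numeric id as the key and the slug as pure decoration, then 301-redirect to the canonical slug so a misleading one never sticks in the address bar. A Flask-style sketch of that general pattern (a guess at the technique, not a claim about how Stack Overflow implements it):

```python
from flask import Flask, abort, redirect

app = Flask(__name__)

# Hypothetical lookup table standing in for a users database.
USERS = {6380: {"name": "Scott Hanselman", "slug": "scott-hanselman"}}

@app.route("/users/<int:user_id>/")
@app.route("/users/<int:user_id>/<slug>")
def user_profile(user_id: int, slug: str = ""):
    user = USERS.get(user_id)
    if user is None:
        abort(404)
    if slug != user["slug"]:
        # Whatever slug the visitor (or a prankster) typed, send them to the
        # canonical one so the misleading URL never stays in the address bar.
        return redirect(f"/users/{user_id}/{user['slug']}", code=301)
    return f"profile of {user['name']}"
```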


> So like, if you're on my-spa.com, and click a link to a post, the JS loads the post and updates the URL, but doesn't perform a full page load. So while you might be on https://my-spa.com/user/39820002/post/39811155, your browser likely never actually sent a GET request to /user/39820002/post/39811155. But this creates the issue that if you manually navigate to that URL, your browser WILL send a request for it.

Exactly, you got it. This is why SPA configurations for webservers exist, like the SPA_MODE on the one we are commenting on.

> EDIT: I think a better way to do this whole thing is to make the resource an anchor tag in the URL. Instead of https://my-spa.com/user/39820002/post/39811155, make it https://my-spa.com/#user/39820002/post/39811155

As someone else said:

https://news.ycombinator.com/item?id=39819120

Hash routes have their own problems; I don't think they are a great solution, but indeed SPA routers often support them. Example from React Router:

https://reactrouter.com/en/main/router-components/hash-route...

Note that they strongly suggest not using hash routing unless you have no other choice.

I agree that returning 200 to every route isn't great for some scenarios, but realistically, SPAs exist, they usually work like this, and they are deployed like this; we can't really change that now. I would just not build an SPA with client-side routing at this point in time, since there are solutions that I personally prefer. Other people may think differently, though.
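
For what it's worth, the server-side half of that deployment is tiny: any path that doesn't match a real file falls back to index.html with a 200, and the client-side router takes over from there. A generic sketch of that fallback (an illustration of the pattern, not the SPA_MODE implementation being discussed):

```python
import http.server
import os

class SPAHandler(http.server.SimpleHTTPRequestHandler):
    """Serve static files, but fall back to index.html for unknown paths."""

    def send_head(self):
        path = self.translate_path(self.path)
        if not os.path.exists(path):
            # Unknown route: serve the app shell and let the client-side
            # router interpret the URL (this is the "200 for every route").
            self.path = "/index.html"
        return super().send_head()

if __name__ == "__main__":
    http.server.HTTPServer(("", 8000), SPAHandler).serve_forever()
```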


> "How does the url http://redacted/ work?"

I apologize for the confusion. I used an actual server there (i.e. http://somename.com) but chose to redact the actual URL from this post.


> Namely, avoiding using /id/date/title-of-post (or something similar) and just using the rootdomain/title-of-post to make it rank higher and seem more important than it is.

This causes the small wayward fragments of Library Science curriculum embedded in my brain to quiver with rage.

Bonus points if the tail end of the URL contains what may-or-may-not be a bunch of tracking shit and it's not obvious how much it can be shortened without breaking the link.
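
When you do have to guess, dropping only the well-known tracker parameters and keeping everything else is usually the safe first cut. A small sketch; the parameter list is just the common convention, nowhere near exhaustive:

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

TRACKING_PREFIXES = ("utm_", "fbclid", "gclid")  # common conventions, not exhaustive

def strip_tracking(url: str) -> str:
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not k.startswith(TRACKING_PREFIXES)]
    return urlunparse(parts._replace(query=urlencode(kept)))

print(strip_tracking("https://example.com/title-of-post?utm_source=hn&utm_medium=social&id=42"))
# -> https://example.com/title-of-post?id=42
```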


> What’s wrong with normal URLs?

I've seen some random blogging software break when you embed certain kinds of links into articles, for example, ones that have query strings (?key=value), where the question mark gets interpreted as something else by the software and thus the link gets cut off and no longer works.

In cases like that, it makes sense (because you literally don't have a choice, unless you want to migrate to something else).

Also some might enjoy the consistency of short links, instead of having a URL that's really long. That is also pretty nice for putting the URL on a poster or flyer, or for sending it to someone.

Some also just don't care about the implications of the link going down in the future.

