
> We had already had created this anonymous favicon service for our private search engine.

I don't think there's a need for adjectives here. Why stress that it's anonymous (when that's hard to verify) or that the search engine is private, when that too is starting to come into question? Repeating these things won't will them into the reader's perception.

> In addition, doing it this way avoids another request (and potentially multiple) to the end site.

This isn't true, unless I'm missing something here? When I access a website, the HTML response I get from that website includes all the information my browser needs to, on its own, get and display the favicon. Can you clarify why you think/say this avoids one or more requests? What mechanism is this service a substitute for?




>I clicked your links and I read your description but I don't get how they are tied together. For example, how does your opening sentence have anything to do with the Google Docs links?

I would have posted a link to the live search engine, but as the OP says it "folded" after a year, so I did the next best thing and gave links to two screenshots of example search results pages. Sorry, I just do not understand why the description of my search engine leaves you lost, especially to the point that you feel the description and the screenshots are not tied together. The only differences between the description and the screenshots are that, because they are screenshots, you do not get the benefit of continuing to scroll down and having infinite results populate, and that most of the results shown are not favicons but custom uploaded logos (a function I mention in the OP).

>are you trying to offer a more visual display for the favicon next to the search results? Is that your "differentiator" I assume? I'm just so lost here.

As the links show, my results display the favicon only - no text based results.

>EDIT: No one wants a "web search engine that displays results as an infinite list of favicons by default." People might want to view the favicon next to the search results but no one wants an "infinite list of favicons by default."

I asked for insight, but ouch. Did I shut it down for lack of traction? Yes, but starting from day 1 I had over 10,000 queries/day so I would not say "no one wants".

Besides, I can build a search engine right now that people do want; that does not mean anyone will jump ship from Google for it. For example, I know from posts on HN that people have been waiting for years for Google to permit infinite results (see: http://news.ycombinator.com/item?id=4875463) instead of x results per page. In minutes I could create a Google API search engine that provides results identical to Google's and offers infinite results on the first page - but do you know how many people would use it? None. For anyone who thinks it is as easy as building something people want, test your theory now: build the infinite-scroll Google search engine that people on HN have posted about, and see whether building something people want actually gets people using your engine instead of Google.

Disclosure: I know you commented on the post I linked above about infinite scrolling, specifically that you do not like infinite scrolling for web search but you like it for image searches - a reply to your comment was from another HN user saying they prefer the infinite scroll feature of DuckDuckGo but find the actual Google results to be better quality.


> What's the general lesson taught by such redactions by the browser vendor?

That it drives traffic to their search engine and increases their revenue. A lot of people always use a search engine to navigate to sites anyway, even if it's a site they've been to many times before and even if the URI is as simple as https://somecompany.com.

Personally, I always want to see the full URI.


> they’ve demonstrated gross incompetence in privacy

Not sure I buy the example that is given here.

1. It's an issue in their browser app, not their search service.

2. It's not completely indefensible: it allows fetching favicons (potentially) much faster, since they're cached, and they promise that the favicon service is 100% anonymous anyway.

3. They responded to user feedback and switched to fetching favicons locally, so this is no longer an issue. https://github.com/duckduckgo/Android/issues/527#issuecommen...

> The search results suck! The authoritative sources for anything I want to find are almost always buried beneath 2-5 results from content scrapers and blogspam. This is also true of other search engines like Google.

This part is kinda funny because "DuckDuckGo sucks, it's just as bad as Google" is ... not the sort of complaint you normally hear about an alternative search engine, nor does it really connect with any of the normal reasons people consider alternative search engines.

That said, I agree with this point. Both DDG and Google seem to be losing the spam war, from what I can tell. And the diagnosis is a good one too: the problem with modern search engines is that they're not opinionated / biased enough!

> Crucially, I would not have it crawling the entire web from the outset. Instead, it should crawl a whitelist of domains, or “tier 1” domains. These would be the limited mainly to authoritative or high-quality sources for their respective specializations, and would be weighed upwards in search results. Pages that these sites link to would be crawled as well, and given tier 2 status, recursively up to an arbitrary N tiers.

This is, obviously, very different from the modern search engine paradigm where domains are treated neutrally at the outset, and then they "learn" weights from how often they get linked and so on. (I'm not sure whether it's possible to make these opinionated decisions in an open source way, but it seems like obviously the right way to go for higher quality results.) Some kind of logic like "For Python programming queries, docs.python.org and then StackExchange are the tier 1 sources" seems to be the kind of hard-coded information that would vastly improve my experience trying to look things up on DuckDuckGo.
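A minimal sketch of how that tiering could work, assuming a hypothetical hand-picked whitelist and a simple per-tier decay on ranking weight (the domains, MAX_TIER, and TIER_DECAY values below are illustrative, not anything the parent comment specified):

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical tier-1 whitelist; a real engine would curate this per topic.
TIER_1_DOMAINS = {"docs.python.org", "stackexchange.com"}
MAX_TIER = 3          # the "arbitrary N tiers" from the comment above
TIER_DECAY = 0.5      # each extra tier halves the ranking weight (illustrative)

def assign_tiers(link_graph):
    """Breadth-first walk outward from the whitelist, giving each domain
    the lowest (best) tier through which it is reachable."""
    tiers = {d: 1 for d in TIER_1_DOMAINS}
    queue = deque(TIER_1_DOMAINS)
    while queue:
        domain = queue.popleft()
        if tiers[domain] >= MAX_TIER:
            continue                      # stop expanding beyond tier N
        for linked in link_graph.get(domain, ()):
            if linked not in tiers:
                tiers[linked] = tiers[domain] + 1
                queue.append(linked)
    return tiers

def ranking_weight(url, tiers):
    """Weigh results upward the closer their domain is to tier 1."""
    tier = tiers.get(urlparse(url).netloc)
    if tier is None:
        return 0.0                        # never crawled, never shown
    return TIER_DECAY ** (tier - 1)

# Toy link graph: tier-1 sites link out to these domains.
graph = {"docs.python.org": ["github.com"], "github.com": ["example-blog.net"]}
tiers = assign_tiers(graph)
print(tiers)                                                       # tiers 1..3
print(ranking_weight("https://github.com/python/cpython", tiers))  # 0.5
```

The interesting design decision is the same one the comment raises: the tier-1 set is an editorial, opinionated input rather than something learned from link counts.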


> I wonder why Google shows a Q

A placeholder used if the algorithm thinks the favicon is not appropriate for some reason.


> For example, your average non-tech-savvy user would never realize that an important privacy setting is in a search setting marked "suggestions".

Is this not clear enough?

http://i.imgur.com/i1Zq3E4.png


> A reasonable person would not assume that content you can only get to by editing a URL manually is supposed to be accessible to the public.

A reasonable person could totally use crawlers like DownThemAll, and fail to notice that some URLs they request are not, in fact, accessible by clicking through a web page. That's different from accessing something you know isn't accessible by mainstream means.

I did that several times to download some porn. The process is simple: search for whatever I'm interested in in a search engine, click on whatever image looks interesting, and see if the URL has numbers I can modify to access nearby images (they will hopefully have the same theme, or even depict the same scene).

The first URL was clearly publicly available. I got it legitimately through a search engine, or by clicking around. How am I supposed to guess that some of the others are off limits?


   > Original commenter is right about the feature obsolescence and didn't seem condescending to me
Maybe it wasn't; intention and tone are really hard to convey through text. That's just how it felt to me when I read it.

   > That said, URL filtering isn't necessarily effective at keeping your behavior private either. There's an argument to be made about ClearURLs and URL filtering in general being counter intuitive, as you might stick out among a sea of other users with marketing params in their URLs.
I'm personally kind of torn on this kind of thing, because fingerprinting is the default on the web since you expose your IP to every server you connect to. I believe it's worth trying to reclaim that privacy even if it could expose you to even more advanced tracking techniques. Also, things like removing Google Analytics tags and stripping the "google.com" redirect from result URLs in Google searches are probably really effective. (You'll notice that Google only adds this redirect mechanism if you have JavaScript disabled, probably because they don't need it if you're running JavaScript anyway.)
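For illustration, this is roughly the kind of query-parameter filtering a ClearURLs-style tool does; the TRACKING_PARAMS set below is a made-up sample, not any extension's actual rule list:

```python
from urllib.parse import urlparse, urlunparse, urlencode, parse_qsl

# Illustrative sample of tracking parameters; real filter lists are far longer.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"}

def strip_tracking(url):
    """Return the URL with known tracking query parameters removed."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in TRACKING_PARAMS]
    return urlunparse(parts._replace(query=urlencode(kept)))

print(strip_tracking("https://example.com/article?id=42&utm_source=newsletter&gclid=abc"))
# https://example.com/article?id=42
```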

   > Still wishing for a Tor-like solution to anonymizing all users on a browser configuration level.
One can wish. I'm very pessimistic about Tor and i2p though; the market incentives to block these networks are just too great to ignore for most businesses. Ultimately, though, I believe the problem is that privacy is not a computer problem but a human one.

> It would be nice if I could configure my browser to rotate through different searx instances rather than configuring one as the default search engine.

That can be achieved with the Privacy Redirect [1] extension: set it to redirect search engine calls and it will use a random engine. The list contains more than just Searx instances and cannot be edited by users by default, so you might have to get the source [2] and build a version with only the search engines you want to use. It can also redirect many other corporate services like YouTube, Twitter, Instagram (which does not really seem to work, but since I never go there anyway I don't really know), Reddit, Maps (Google etc.) and others. I have it redirect to private instances of Invidious (for YouTube), Nitter (for Twitter) and LibReddit. I do not use the search engine redirect myself, since I run a custom Searx instance which doubles as an intranet search engine and as such offers more than any public instance.

[1] https://addons.mozilla.org/en-US/firefox/addon/privacy-redir...

[2] https://github.com/SimonBrazell/privacy-redirect


> Can you explain, clearly, how this URL would end up catalogued?

I don't have to explain any mechanics about how, because that's how definitions work (whether you accept it or not).

If you really need an example, how about taking 10 seconds to demonstrate some awareness of the catalyst of this tedious exchange: a company founded on cataloguing documents and their public identifiers is doing things to/with those identifiers?


> 1. all links are click-able at this point; what's more plain-text would force to provide just a link without all the tracking garbage

If only. Everyone I know who isn't an IT professional -- and many who are, too! -- sends links like https://www.example.com/something/somepage/?tracker=ASfas142......


> Putting this logic in the client is not feasible.

> You want to send requests directly to every shady site that shows up in your search results, load their pages in the background, work through network delays and HTTP errors, and parse out the location/format of the favicon files?

Looking at DuckDuckGo search results and visiting a page that you navigate to are two different things.

1. DuckDuckGo search results:

DDG already returns the search results so there's no privacy violation to return the favicon or the URL for the favicon in the list.

2. Any page that isn't a DDG search results page:

Use client side logic to locate the favicon. This means worse performance but better privacy - which aligns with DDG's goals.

If you want to optimise this then DDG could send the client some precalculated Bloom filters with info about known sites. The client could use these to try certain methods of favicon retrieval first.
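A toy sketch of that Bloom-filter idea, assuming a filter precomputed server-side that encodes "domains known to serve /favicon.ico at the root"; the hand-rolled filter, its parameters, and the site list are all illustrative (a real client would use a proper library and a filter shipped by DDG):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: no false negatives, tunable false-positive rate."""
    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# Hypothetical filter DDG could ship: domains with a favicon at the root path.
root_favicon_sites = BloomFilter()
for domain in ["example.com", "python.org"]:
    root_favicon_sites.add(domain)

def favicon_candidates(domain):
    """Order in which the client should try favicon retrieval methods."""
    if domain in root_favicon_sites:
        return [f"https://{domain}/favicon.ico"]
    # Otherwise fetch the page first and read <link rel="icon"> from the HTML.
    return [f"https://{domain}/", f"https://{domain}/favicon.ico"]

print(favicon_candidates("python.org"))
```

The filter only saves requests; all actual fetching still happens from the client, which is the privacy property being argued for above.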


> Note that the only information sent is domains—not interests, not profiles, just general-purpose news domains like bbc.co.uk.

Alex requests The Washington Post and CNN.

Bob requests The Wall Street Journal and Fox News.

And Joe requests PornHub.

I know the last one is not in the list, but my point is: the domain alone is more than enough to profile you for profit.

Plus, the fact that this isn't addressed or even mentioned in the article is not good.

Will we ever get a browser that does not send anything to the company/organization developing it, for any reason/excuse?


> You could do that with regular HTML.

You ignored the "safely" qualifier. You would deanonymize a user to the publisher and the third-party trackers on their page even if the user never clicks through to visit the page.


> Some companies and their lobbies want to sell their products for filtering or whatever

It really shouldn't be needed: web pages are all about tags with data and metadata. It shouldn't be hard to add metadata about content type.

I never quite understood why labeling initiatives never gained traction:

* https://en.wikipedia.org/wiki/Platform_for_Internet_Content_...

* https://www.w3.org/PICS/

* https://www.w3.org/2007/powder/

Throw some <meta> tags in that browsers can parse, then have a password-protected "filter controls" area in settings (and perhaps a GPO for corporate environments).
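As a rough illustration, a browser-side filter could look for something like the meta tag parsed below; the tag name, rating vocabulary, and blocking logic are all hypothetical, made up for this sketch rather than taken from the PICS or POWDER formats:

```python
from html.parser import HTMLParser

# Hypothetical tag a page could declare, e.g.:
#   <meta name="content-rating" content="violence=none, nudity=none, language=mild">
BLOCKED_CATEGORIES = {"violence", "nudity"}   # set in the "filter controls" area

class RatingParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.ratings = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "content-rating":
            for pair in attrs.get("content", "").split(","):
                key, _, value = pair.strip().partition("=")
                self.ratings[key] = value

def page_allowed(html):
    parser = RatingParser()
    parser.feed(html)
    return not any(parser.ratings.get(cat, "none") != "none"
                   for cat in BLOCKED_CATEGORIES)

sample = '<html><head><meta name="content-rating" content="violence=mild, nudity=none"></head></html>'
print(page_allowed(sample))   # False: "violence" is rated above "none"
```

The hard part, of course, is the same one the labeling initiatives hit: getting publishers to label honestly, and deciding who defines the vocabulary.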


> and (2) there are privacy implications, since this would leak information about what users are searching for, even if they don't click the results.

This is wrong. Google can proxy the traffic like they do with Gmail, negating that privacy issue.

I was typing a comment in which I said that I'd prefer your privacy risk (where I leak info to random sites that I might want to visit anyway) over the current privacy issue (all sites using AMP leaking data about me to Google), but it's worse than that: the risk you identify is easily mitigated by proxying the content.

The other risk you mention is not as easily mitigated. Proxying might help in part, because then the servers can be selective and/or stop at a certain point. Another idea is to only preload the HTML, and perhaps a limited number of resources smaller than a certain size (mainly css and small-size images, since JS is usually not used in styling, so preloading html and css should give a first, static impression before the rest can be loaded).
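A rough sketch of that selective-preload idea on the client side; the size cutoff, the tag choices, and the injected content_length_of helper are assumptions for illustration, and a real implementation would do HEAD requests, relative-URL resolution, caching, and error handling properly:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

MAX_PRELOAD_BYTES = 50_000   # arbitrary cutoff for "small" resources

class PreloadCandidates(HTMLParser):
    """Collect only stylesheets and images from a page's HTML."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "stylesheet" and "href" in attrs:
            self.urls.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and "src" in attrs:
            self.urls.append(urljoin(self.base_url, attrs["src"]))
        # <script> is deliberately ignored: JS is not needed for a static first paint.

def plan_preload(base_url, html, content_length_of):
    """Preload the HTML itself plus only the CSS/images below the size cutoff.

    `content_length_of` stands in for a HEAD request so the sketch stays
    self-contained.
    """
    parser = PreloadCandidates(base_url)
    parser.feed(html)
    return [u for u in parser.urls if content_length_of(u) <= MAX_PRELOAD_BYTES]

sample = '<link rel="stylesheet" href="/site.css"><img src="/hero.jpg"><script src="/app.js"></script>'
sizes = {"https://example.com/site.css": 12_000, "https://example.com/hero.jpg": 400_000}
print(plan_preload("https://example.com/", sample, lambda u: sizes.get(u, 0)))
# ['https://example.com/site.css']  -> the large hero image and all JS are skipped
```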

All of this would have been better than any part of what they chose. I can see only two reasons for Google not to choose this: (1) more control over the web (this I think is true) and (2) more tracking (this I think a few people within Google might see as a nice side-effect, even if the vast majority of employees have only good intentions, so I don't think it was a driving reason but it probably contributed).


> I made this search engine: https://www.gigablast.com/

Ye gods. It's actually a useful search engine. So useful I'm going to try it as a default for a while.

And it's a solo effort? My hat is off to you, sir.


> This submission is obviously not meant for me, it seems fair that I would ask for the possibility of not seeing it, don't you think?

You're clearly thinking of another internet service. Everyone sees the same Hacker News front page. You want something customizable.


> Frankly, i wish facebook or cloudflare offered their previewer as a free service, since most websites have them whitelisted.

Yup, and exposing just a few key pieces of information (the title and some of the meta/og tags) without the body would limit the potential for abuse, while still being fairly useful for legitimate purposes.


> This seems like a cut-and-dry case of getting caught in monopolistic behavior. The code is right there.

???

Is "Darn, their browser only gets to track me on their own websites; if Google were playing fairly, they'd send the tracking header to all websites so I can be tracked more and have less privacy" the argument you're making here?

And it's debatable that this header is actually serving a tracking purpose at all. Being limited to their own web properties cements it as a diagnostic to me. What use is a tracking header that only gets sent to domains they already know you're visiting?

