
Parsing a huge /etc/hosts for every single hostname lookup certainly is expensive.



What software actually parses /etc/hosts, at least on Linux?

I'm wondering whether having a huge hosts file could create any performance issues, since I assume it needs to get parsed regularly.

Does anybody have experience in this regard? What about a basic version with ~100 entries?
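
One way to get an empirical answer is to time lookups before and after growing /etc/hosts. A minimal sketch, assuming a POSIX system where getaddrinfo() consults the hosts file through the nsswitch "files" backend and no caching daemon (nscd, systemd-resolved) sits in front of it:

    import socket
    import time

    def avg_lookup_us(name, n=1000):
        """Average time of n getaddrinfo() calls for one hostname, in microseconds."""
        start = time.perf_counter()
        for _ in range(n):
            socket.getaddrinfo(name, None)
        return (time.perf_counter() - start) / n * 1e6

    # 'localhost' is a baseline; swap in a name near the end of your hosts file
    # to see the worst case.
    print(f"{avg_lookup_us('localhost'):.1f} µs per lookup")

Run it once with a ~100-entry file and once with a 10,000-entry file; if the numbers barely move, the file size isn't your bottleneck.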


It's a lot less resource-intensive not to use a hosts file. This might not be a concern for people with modern machines, but having a hosts file with 12,000 lines in it does take a certain amount of processing.

If I might mention my own site for a minute, I maintain a list of ad server (and tracking server) hostnames: http://pgl.yoyo.org/adservers/

You can view the list as a dnsmasq config file, a BIND config file, and a bunch of other formats.
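
For anyone who needs a format the list isn't already published in, converting from plain hosts format takes a few lines. A sketch that turns "0.0.0.0 hostname" lines into dnsmasq address= rules; the input being on stdin and 0.0.0.0 as the sink address are just assumptions:

    import sys

    # Read hosts-format lines ("0.0.0.0 ads.example.com ...") on stdin and emit
    # dnsmasq rules ("address=/ads.example.com/0.0.0.0") on stdout.
    for line in sys.stdin:
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            if name != "localhost":
                print(f"address=/{name}/{ip}")

Usage would be something like: python3 hosts2dnsmasq.py < hosts.txt > adblock.conf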


On Windows, a large hosts file may lead to noticeably slower name resolution performance. Maybe it's less of a problem on Linux/macOS...?

Not if you're just putting these entries in your hosts file

The issue with that, on Windows at least, is that host lookups become a lot slower with a larger hosts file; a local caching DNS server with a block list is probably a better solution, and one some people already use.

My /etc/hosts has 3000 lines...

If it weren't hard, more people would be doing it; I think for most users, this is hard.


but they're all related hostnames

Huh? I used to have ~100 hosting clients per IP address, none of whom were in any way related to each other (other than in having chosen me as a hosting provider).


How big would the list need to get before it starts affecting performance? There is obviously some kind of lookup against the hosts file for every HTTP request. I assume the hosts file gets converted into some sort of hash table?

If you want to put in the effort, you can sniff the hostname lookups and, if a name looks halfway dedicated to ads or tracking, add it as an entry for 0.0.0.0 in the hosts file.
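
A sketch of the bookkeeping half of that idea: it reads one hostname per line on stdin (however you captured them, e.g. from your resolver's query log) and appends a 0.0.0.0 entry for each name not already listed. The hosts path and the choice of which names to feed it are assumptions left to you:

    import sys

    HOSTS_PATH = "/etc/hosts"    # assumption: adjust the path, run with write access

    known = set()
    with open(HOSTS_PATH) as f:
        for line in f:
            # collect every name/alias already listed, ignoring comments
            known.update(line.split("#", 1)[0].split()[1:])

    with open(HOSTS_PATH, "a") as f:
        for name in (l.strip() for l in sys.stdin):
            if name and name not in known:
                f.write(f"0.0.0.0 {name}\n")
                known.add(name)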

This is downright silly.

HOSTS files are static. They were never designed for blocking ads or tracking. And for all we know, every connection does a linear search through the HOSTS file, so the larger it gets, the more time is wasted; it was never designed to have millions of entries.
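
Whether or not the resolver really rescans the file per lookup, the intuition behind the complaint is easy to demonstrate: a linear scan over N entries costs O(N) per query, while hashing the same entries once costs O(1) per query afterwards. A toy comparison (the entry count and names are made up):

    import time

    # 100,000 fake blocklist entries
    entries = [("0.0.0.0", f"ad{i}.example.net") for i in range(100_000)]

    def linear_lookup(name):
        for ip, host in entries:          # what a naive per-query file scan amounts to
            if host == name:
                return ip
        return None

    table = {host: ip for ip, host in entries}   # hash the entries once

    target = "ad99999.example.net"               # worst case for the linear scan

    t0 = time.perf_counter(); linear_lookup(target); t1 = time.perf_counter()
    t2 = time.perf_counter(); table.get(target);     t3 = time.perf_counter()
    print(f"linear scan: {(t1 - t0) * 1e3:.3f} ms   hashed: {(t3 - t2) * 1e6:.1f} µs")

This is also why a local caching DNS server with a block list tends to hold up better than a giant hosts file: it pays the parsing cost once, not per lookup.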


The zero[1] version of it works a little faster.

I am using the Unified hosts file[2] (mentioned in the article); it is a great way to combine many other hosts files, including Dan Pollock's list.

[1] https://someonewhocares.org/hosts/zero/

[2] https://github.com/StevenBlack/hosts


Looked into this - it looks like it's not much more than what you could do with a sufficiently advanced hosts file?

I'd tried this in the past but my machine slowed to a crawl. I guess it had to do with the algorithm used for handling the list of hosts (this sounds like a job for a Bloom filter; see the sketch after this comment).

I've just tried the list you provided and it seems to be ok. Will try it for a while to see how I get on.
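
On the Bloom filter idea: the point would be that most lookups are for hosts that are not blocked, and a Bloom filter can answer "definitely not in the list" without touching the full list. A minimal sketch, with the bit-array size and hash count picked arbitrarily:

    import hashlib

    class BloomFilter:
        """Answers 'definitely not blocked' or 'maybe blocked (check the real list)'."""

        def __init__(self, size_bits=1 << 20, num_hashes=4):
            self.size = size_bits
            self.num_hashes = num_hashes
            self.bits = bytearray(size_bits // 8)

        def _positions(self, item):
            for i in range(self.num_hashes):
                digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
                yield int.from_bytes(digest[:8], "big") % self.size

        def add(self, item):
            for p in self._positions(item):
                self.bits[p // 8] |= 1 << (p % 8)

        def __contains__(self, item):
            return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

    blocked = BloomFilter()
    blocked.add("ads.example.com")
    print("ads.example.com" in blocked)    # True: maybe blocked, consult the full list
    print("news.example.org" in blocked)   # False: definitely not blocked, resolve it

A "maybe" answer falls back to the real list, so false positives only cost one extra check, and the filter itself fits in a fraction of the memory of the full hostname set.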


I've always wondered if Windows or other operating systems read the entire hosts file every time they want to resolve an address. Maybe a big hosts file is bad for network performance?

> scan the hosts in your web history

That's a simple SQL query against the Firefox profile SQLite database (see the sketch below). No problem.

> and follow the cert chains to the various roots.

Doesn't scale. If it can't be scripted, then it can't be done for the tens of sites I regularly visit, and the hundreds more I come across.
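
For the first half (pulling hostnames out of history), something like this works against a copy of places.sqlite: moz_places stores the hostname reversed with a trailing dot in rev_host, so it just needs flipping back. The database path is an assumption, and the schema could of course change between Firefox versions:

    import sqlite3

    # Copy places.sqlite out of your profile directory first; Firefox keeps the live one locked.
    DB = "places.sqlite"

    con = sqlite3.connect(f"file:{DB}?mode=ro", uri=True)
    rows = con.execute("SELECT DISTINCT rev_host FROM moz_places WHERE rev_host <> ''")
    # rev_host is the hostname reversed with a trailing dot, e.g. "moc.elpmaxe." for example.com
    hosts = sorted({r[0][::-1].lstrip(".") for r in rows})
    print("\n".join(hosts))

Chasing the cert chains for each of those hosts is the part that's harder to script, which is the parent's point.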


Then the hostname is selected poorly.

Ah, the hosts file. Let’s research which ways of name resolution use it and which don’t. [A few hours later] …

Apps use hostnames 99% of the time.
