Hacker Read

ramraj07 · 2023-05-20 20:59:07

I mean, Google search obviously does it using PageRank: if someone links to the page using that word, the word counts toward that page. Me, I was searching for arcane words that I remembered hearing in the middle of a podcast. I doubt anyone actually linked to, or searched for, those words in connection with that podcast. Also, the solution you're describing needs far more intuitive and intricate development than just indexing captions lol.

llambda | karma 58401 | avg karma 18.99 · | 2023-02-09 22:11:14

They could and probably should offer semantic search, which would be far more powerful than searching exact match keywords.

If you could identify podcasts that often talk about a domain more broadly, you'd have a higher hit rate and overall a better audience fit.

patrickg-zill | karma 957 | avg karma 2.23 · | 2008-04-11 11:22:35

I would have thought that a lack of search capability would matter. How many neat pages or sites have you come across due to searching via Google? There is currently no easy way to search the podcast content for stuff I am interested in.

dumbfoundded | karma 2200 | avg karma 3.24 · | 2020-05-12 17:49:05+00:00

There are tools like automatic captions + manual tagging that can make search much better. The tools are out there, just not well adopted at this point.

not2b | karma 7590 | avg karma 4.12 · | 2021-05-07 21:33:44+00:00

But are you solving the right problem? This sounds like someone has produced a very good and efficient version of AltaVista. Back in the 1990s, if you wanted to do classic keyword searches of the web, and find all pages that had terms A and B but not C, it would give them to you, in a big unsorted pile. The web was still small enough that this was sometimes useful, but until Google came along with tricks to rank pages that are obvious in retrospect, it just wasn't useful for common search terms.
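A minimal sketch of the kind of classic keyword query described above (all pages with terms A and B but not C), assuming a tiny in-memory inverted index. The page texts and the names `build_index` and `boolean_search` are made up for illustration and do not describe how AltaVista actually worked. Note that the result is an unranked set, which is the "big unsorted pile" problem.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def boolean_search(index, must_have, must_not_have=()):
    """Return ids of pages with every must_have term and no must_not_have term."""
    if not must_have:
        return set()
    # Intersect the page sets of all required terms.
    result = set.intersection(*(index.get(t, set()) for t in must_have))
    # Drop any page that contains an excluded term.
    for term in must_not_have:
        result -= index.get(term, set())
    return result

# Hypothetical toy pages, invented for illustration.
pages = {
    1: "jaguar car review",
    2: "jaguar cat habitat",
    3: "car engine review",
}
index = build_index(pages)
matches = boolean_search(index, ["jaguar", "review"], ["cat"])
print(sorted(matches))  # prints [1]
```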

kanzure | karma 3634 | avg karma 4.32 · | 2023-05-08 13:30:02

It would be a lot better if you could just search an index of all the words on the web, and then we can refine our queries against the results to narrow things down even more. As it is right now, search just doesn't work anymore.

323 | karma 5003 | avg karma 4.18 · | 2021-12-14 13:03:15

People say google search is terrible these days, but I find the opposite.

I can vaguely describe in a sentence the gist of an article I've read, or an image, and the proper result will usually be in the first page.

Of course, it doesn't always work, sometimes there are "hash collisions" so to speak, but I don't think the old algorithm would have been more successful either, since if I knew the exact keywords to use, I wouldn't need to start with a vague description in the first place.

jshen | karma 5007 | avg karma 2.17 · | 2022-08-25 13:50:59

That’s how you make a worse search engine than Google. If you are serious about competing in that space I think you need to do something fundamentally different than Google. Treating pages as a bag of words leads to a shitty search engine. Like I said, I’ve built a few search engines, and I have tried this.

Edit: https://en.wikipedia.org/wiki/Bag-of-words_model

gagege | karma 988 | avg karma 2.04 · | 2020-08-27 17:01:13+00:00

What it might need to do is index commonly searched terms and provide a reverse lookup to the location of the row... oh wait, that's called a search engine. :)

haldean | karma 1167 | avg karma 3.49 · | 2012-10-31 03:59:23

My experience with this:

Think of a random word that comes to mind: let's try "animus". "No Matches Found. Please rephrase your query and try again.". Hm, okay. How about "theory"? "No Matches Found." What about "set"? Nope. "rank"? No. "war of 1812"? No. "barack obama": nope. The first term I got to return content was the biggest softball I could think of - "procog".

I'm all for a new search engine (especially one that really lets content providers know what they need to do to rank highly for queries) but I'd say this isn't ready yet; most of those queries are the sorts of queries I search for every few minutes.

volaski | karma 890 | avg karma 2.17 · | 2013-10-01 13:55:24

good idea in theory, but the search is "ridiculously" slow. Not to mention the fact that it doesn't work after the first search. Also, you're using the term "semantic" wrong.

user24 | karma 4152 | avg karma 3.28 · | 2012-03-17 23:46:46+00:00

Yeah semantic search, if solved, would address this problem.

That's really what I was getting at. Stripped right down, Google is still just viewing documents as a bag of words[1]. I mean, they have pagerank and they will apply more weight to words in headings, and they have 6-gram indexes and synonyms and all that clever stuff, but at its core it's still lexically centered, not semantically centered.

[1] Further reading: http://en.wikipedia.org/wiki/Bag_of_words_model
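To make the bag-of-words point concrete, here is a small sketch, assuming whitespace tokenization and a hypothetical helper `bag_of_words`. It only illustrates the model from the linked article and says nothing about what Google actually does: two sentences with opposite meanings end up with the same representation because word order is discarded.

```python
from collections import Counter

def bag_of_words(text):
    """Count words, ignoring case and word order."""
    return Counter(text.lower().split())

a = bag_of_words("dog bites man")
b = bag_of_words("man bites dog")
print(a == b)  # prints True: the model cannot tell the two sentences apart
```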

melonbar | karma 150 | avg karma 2.21 · | 2018-08-31 16:21:46

Wow, that is so true! I never thought of it like that but it can be a bother when you realize the search is going to need google behind it.

jeffbee | karma 21041 | avg karma 2.25 · | 2020-07-09 00:23:09+00:00

You touched on what I was driving at in my original comment. There is no doubt whatsoever that if you were able to make a full pass over the corpus with a regular expression you could find docs you can't find on Google's search. But that's obviously not how their search works. They have to make it work at their scale, which dictates the format of the index, which in turn limits the possibilities for query operators. They have to make these design choices so that their product can exist at all.

"Grep the world" is a fine strategy for corpora up to a certain size, and I do wish there was a product that just stored everything I've ever seen and let me run expensive searches on that.
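A rough sketch of the "grep the world" approach for a small personal corpus, assuming the documents fit in a dict in memory. The archive contents and the name `grep_corpus` are hypothetical. Every query does a full pass over the text, which is why this only scales to corpora of limited size, but it allows patterns a keyword index cannot express.

```python
import re

def grep_corpus(docs, pattern):
    """Scan every document with a regex; cost grows with total corpus size."""
    regex = re.compile(pattern)
    return [doc_id for doc_id, text in docs.items() if regex.search(text)]

# Hypothetical personal archive, invented for illustration.
docs = {
    "note-1": "error code E1234 seen in the build log",
    "note-2": "nothing unusual today",
    "note-3": "retry after error code E9 failed",
}
# Match an "E" followed by exactly four digits.
print(grep_corpus(docs, r"E\d{4}\b"))  # prints ['note-1']
```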

lovecg | karma 1824 | avg karma 2.8 · | 2021-11-12 16:30:14

I mean “wiki <search>” works on Google too. But I don’t want to type anything extra. My problem here is that if we assume the search engine should be good at predicting what I actually want to see for a given term, Google is failing at that.

klibertp | karma 4682 | avg karma 1.75 · | 2023-01-06 13:27:50

> This doesn't always work

The problem is with when it doesn't work: when it's the least convenient and the most irritating. For popular content it doesn't matter if you forget a precise word used there, you'll get to it soon enough. But for specific, niche content - which is the most valuable to me most of the time - even if you get all the keywords right you might not find what you're looking for on Google. The reasons range from the keywords being too generic to the site being no longer online and it's really frustrating when it happens.

OneTab and bookmarks are not an answer because they don't save the content. I tried Joplin + Web Clipper which does, but it works on a single-tab basis, and when I have 200 tabs open sending them all to Joplin manually takes ages... and then Joplin slows down to a crawl when you're done.

anfilt | karma 3200 | avg karma 3.05 · | 2018-01-15 19:24:33

True, but Google is not searching their entire index for those. A simple linear search takes N time, so for a word that occurs billions of times, Google is not going to go through that entire list. They might use some clever hashing to jump around, and sorting. However, when trying to intersect two keywords they either have to pre-generate the intersection or make the data set they are intersecting small enough to get those 10 results quickly.
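One way to picture "jumping around" instead of scanning linearly is the sketch below. It assumes each keyword has a sorted list of document ids and uses binary search as a stand-in for whatever skipping structure a real engine uses; `intersect_top_k` and the sample lists are hypothetical, not a description of Google's index. It walks the rarer term's list, jumps ahead in the common term's list, and stops once it has enough results.

```python
from bisect import bisect_left

def intersect_top_k(short_list, long_list, k=10):
    """Intersect two sorted doc-id lists, stopping once k matches are found."""
    results = []
    lo = 0
    for doc_id in short_list:
        # Binary search jumps ahead in the long list instead of scanning it.
        lo = bisect_left(long_list, doc_id, lo)
        if lo == len(long_list):
            break  # the long list is exhausted
        if long_list[lo] == doc_id:
            results.append(doc_id)
            if len(results) == k:
                break  # enough results for the first page
    return results

rare = [3, 40, 41, 900]            # ids for an uncommon word
common = list(range(0, 1000, 2))   # ids for a very common word
print(intersect_top_k(rare, common, k=2))  # prints [40, 900]
```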

tonfa | karma 2151 | avg karma 2.4 · | 2011-02-20 23:22:40+00:00

Aren't keyword searches sufficient? Like "w something" to search wikipedia for "something".

brianpan | karma 1759 | avg karma 3.03 · | 2012-12-13 16:32:20+00:00

Well, that's the point I'm trying to make. It's not a back-to-the-drawing-board problem, it's a refine-the-algorithm problem. How many "if you google this string, you can't find the right site" problems have there been in the life of Google search? They continue to refine PageRank, don't they?

imp | karma 1274 | avg karma 2.1 · | 2009-02-04 02:08:23+00:00

The problem with that is that your search ranking for any given keyword is probably different in each search engine.