It's not the same as Ctrl+R - it actually doesn't look for a match in the middle of the string, only from the beginning of it, so while it's better at going back and forth between the matches, it matches less data to begin with.
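To make the difference concrete, here is a minimal sketch, assuming the history is just a list of strings. The function names `prefix_search` and `substring_search` are made up for illustration and are not any shell's actual API.

```python
def prefix_search(history, typed):
    """Entries that START with `typed`, most recent first (prefix-only matching)."""
    return [cmd for cmd in reversed(history) if cmd.startswith(typed)]


def substring_search(history, typed):
    """Entries that CONTAIN `typed` anywhere, most recent first (Ctrl+R-style matching)."""
    return [cmd for cmd in reversed(history) if typed in cmd]


# Hypothetical history, oldest entry first.
history = ["git status", "echo git", "ls -la", "git commit -m wip"]

print(prefix_search(history, "git"))     # misses "echo git"
print(substring_search(history, "git"))  # finds it
```

The prefix version returns a subset of what the substring version returns, which is the "matches less data" trade-off.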
I wonder whether it would help to match from both sides (start and end) simultaneously, since you know you're not looking in the middle of the string. You also don't care about capture groups.
Shorter, yes, but reversing the whole string and comparing it with the input value is actually slower/less efficient than the algorithm shown in the article. Sorry for my nitpicking ;)
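For comparison, a rough sketch of both approaches, assuming the check being discussed is a palindrome test (the comment doesn't say so outright, and the article's actual algorithm isn't reproduced here). Both function names are hypothetical.

```python
def is_palindrome_reverse(s):
    # Short version: builds a full reversed copy, then compares the two strings.
    return s == s[::-1]


def is_palindrome_two_pointer(s):
    # Walks inward from both ends; no copy is made and it stops at the first mismatch.
    i, j = 0, len(s) - 1
    while i < j:
        if s[i] != s[j]:
            return False
        i += 1
        j -= 1
    return True
```

The second version does at most len(s) // 2 character comparisons and allocates nothing. Whether that wins in practice depends on the language and runtime, so measure before trusting either.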
The difference may be regex matching. This can often be optimized to an impressive degree, depending on the regex, but unless it's a simple substring search without any metacharacters, I'm not sure those approaches are comparable.
Pretty much, yes. Longer strings should also match fewer times than short strings, which should also speed things up (because reporting a match has its own overhead associated with it, like printing text to the terminal).
As a for-your-consideration: C-r leaves the cursor at the matched string when doing a reverse search. So: "echo helol world", Enter, mutter "rats", C-r, o-l-space, Right (just to break out of the search), and voila, you are now positioned on the offending substring.
So you build his graph thing at the end. And if you go partway down and then hit a dead end (no match), don't you have to go back to the 2nd char of the non-match and try to match a word from there, and thus do a lot more comparisons than the number of bytes?
The reason is that when you put 100+ words in the tree, they'll share some substrings.
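A small sketch of the sharing part, assuming the "graph thing" is a trie keyed on characters. `build_trie` and `count_nodes` are illustrative names, and the `"$"` end marker assumes no word contains that character. It only shows how common prefixes collapse into shared nodes; it does not settle the backtracking question.

```python
def build_trie(words):
    """Nested-dict trie; "$" marks the end of a word (assumed absent from the words)."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})  # reuse the child if this prefix already exists
        node["$"] = True
    return root


def count_nodes(node):
    """Number of character nodes below `node`, ignoring end-of-word markers."""
    return sum(1 + count_nodes(child) for key, child in node.items() if key != "$")


words = ["search", "seal", "sea", "team"]
trie = build_trie(words)
print(sum(len(w) for w in words), "characters stored in", count_nodes(trie), "nodes")
```

Here "search", "seal" and "sea" all share the path s-e-a, so the trie needs fewer nodes than the words have characters.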
Eh? PCMPISTRI has a few different modes of operation, including full substring search and character classes. E.g., you can use PCMPISTRI on a needle that contains adjacent classes: `azAZ09` would check if any byte in the search string is in any of the ranges a-z, A-Z and 0-9.
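To spell out what that ranges needle means, here is a scalar emulation in Python. It is not the instruction itself (which works on 16-byte vector registers), just a sketch of the semantics described above. The helper name `first_byte_in_ranges` is made up.

```python
def first_byte_in_ranges(ranges, haystack):
    """Index of the first haystack byte inside any (lo, hi) pair of `ranges`, or -1.

    `ranges` is read as consecutive inclusive pairs, so b"azAZ09" means a-z, A-Z, 0-9.
    """
    if len(ranges) % 2 != 0:
        raise ValueError("ranges must contain an even number of bytes")
    pairs = [(ranges[i], ranges[i + 1]) for i in range(0, len(ranges), 2)]
    for idx, byte in enumerate(haystack):
        if any(lo <= byte <= hi for lo, hi in pairs):
            return idx
    return -1


print(first_byte_in_ranges(b"azAZ09", b"--- x1"))
```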
Regardless, in the OP, they're specifically looking for one of a small number of bytes, which is exactly what PCMPISTRI is supposed to be good for.
With that said, my experience mirrors glangdale's. Every time I've tried to use PCMPISTRI, it's either been slower than other methods or not enough of an improvement to justify it.
Problem is, plenty of software doesn't actually look at the match but rather just validates that there was a match (and then continues to use the input to that match).
Wait, I always tried to make my pattern as short as possible and I thought it would speed up the searches. So I guess this means I'm actually better off searching for the longest possible match then?
Matches should monotonically disappear and not get reordered as you enter more letters.
This is trivial to do on exact contexts (like start menus - why does no one get this?). It's ok to wait until the user enters some letters to start showing options, and it's also ok to limit the number of results as long as you say there are more somewhere. What is not ok is to show a single match, and then after the user presses another key show two matches, with the first one gone.
Distance based matching can't strictly follow this rule, but if you are optimizing one, it is a good goal to get approximately right.
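A minimal sketch of the exact-match case, assuming case-insensitive substring matching over a fixed, ordered list of entries (`filter_monotonic` is a hypothetical name). Because the filter never reorders and a longer query can only match a subset of what its prefix matched, results only ever disappear as the user types.

```python
def filter_monotonic(items, query):
    """Keep items containing `query` (case-insensitive), in their original order."""
    q = query.lower()
    return [item for item in items if q in item.lower()]


# Hypothetical start-menu entries.
items = ["Terminal", "Text Editor", "Calculator", "Settings"]

print(filter_monotonic(items, "te"))
print(filter_monotonic(items, "ter"))
```

Anything that ranks by score (fuzzy or distance-based matching) loses this guarantee unless the ranking is kept stable on purpose.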