Ask HN: Why is Confluence Wiki Search so bad?
172 points by nicktorba | karma 64 | avg karma 3.56 2021-09-20 14:39:23 | 127 comments

The title says it all. To me, the most important component of a wiki is search. With that said, why is confluence wiki search basically unusable?

(by unusable, I mean I can never find the page I am looking for when I search. Basically, I have to maintain my own wiki of important links I may need to reference in the future)




Yes it only finds you crap results. Not sure why they have the most naive search algorithm out there. Maybe good search needs more AI and CPU power than we think.

Maybe this is something Google should take on. A search plugin for Confluence where Google's crawlers log in from time to time for internal crawling, to enable non-public search requests on that data. That would boost knowledge workers' efficiency a lot. I hope somebody from Google reads this and takes on the challenge. I'm sure companies would pay a lot for this.



Wohoo nice. So now management needs to hear the cries and buy it.

Try gmail. More than a decade on and still no partial word match.

And from a search company, no less.

The other one that gets me is that Google Docs search... doesn't search the contents of documents.

Gmail produces by far the best search results for me (comparing to apple mail and thunderbird) and makes me reach for it regularly for search alone, which I find pretty annoying. If there is anything better out there I am all ears.

> If there is anything better out there I am all ears.

Mutt, Pine, grep, awk, etc. I don't understand why throwing a GUI on top automatically seems to make email search absolutely awful; this includes Gmail. I so often need to find a specific old email using hazy match criteria that I am half tempted to pipe my email into Splunk (I run a small Splunk cluster at home for other needs) and use it (as then I don't need a local copy of every email on all devices, or to SSH into a central box to do a TUI-based search).


I too need to find specific old email. For that, I type what comes to mind into the gmail search box.

To each their own.


So gmail search box allows you to find an email based on email header data? ;)

Sure, gmail search works "ok" when you are searching based on a word in the email or the sender/recipient email. For anything more advanced than that, it really falls short.


I usually have better luck with the "autocomplete" results in gmail search than with the actual search results. I don't even know how you manage to screw up your core competency that badly.

Same experience for me. However, I started to be more diligent on tagging each Confluence page whenever I see them lacking and that definitely helps with the searches.

So I am interested in this space. There are some alternatives out there but I suspect companies will be concerned with letting a 3rd party have access to the data needed. If you are interested in this space and would be willing to chat with me about what you're looking for OR what you are currently using I'd love to chat! My email is my username at gmail.com

Some existing tooling:

Google cloud search has a confluence connector https://developers.google.com/cloud-search/docs/connector-di...

Elastic workplace search has a connector. https://www.elastic.co/guide/en/workplace-search/current/wor...

Lessonly has / had a thing called Obie https://www.lessonly.com/blog/how-to-search-better-in-conflu...

Raytion https://www.raytion.com/connectors/raytion-confluence-connec...


Hi there. One of the first customers we had (zir-ai.com) asked for help building a better JIRA search.

I think neural-network powered search will be the long-term solution for Wiki search specifically, and SaaS search more generally.

Keyword search has too many failure cases, and works poorly when there's not a lot of data, or when searching through content authored by others.

I'll contact you offline. Would love to hear more about your experience in this area.


Most search engines are pretty bad because the developers of most search engines don't do any work to improve relevance.

This methodology works

https://ccc.inaoep.mx/~villasen/bib/AN%20OVERVIEW%20OF%20EVA...

and I used it to tune up the relevance of a search engine for patents to the point where users could immediately perceive that it worked better than other products.

After I worked on that I wound up talking to the developers and/or marketing people for many enterprise search engines and few of them, if any, did any kind of formal benchmarking of relevance.

People at one firm told me that they used to go to TREC conferences because they thought it got them visibility but that they decided it didn't so they quit going.

A message I got repeatedly was that these firms thought that the people who bought the search engines didn't care much about relevance, but they did care about there being 200 or more plug-ins to import data from various sources.

In principle the tuning is unique to the text corpus. One reason for that is that there is a balancing act of having a search engine that prefers small documents (they have spiky vectors that look more like query vectors) or large documents (they have so many words they match everything.) Different corpuses have different distributions of document sizes, not to mention different distributions of words that appear.
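The short-versus-long document balancing act can be made concrete with a ranking formula that has an explicit length-normalization knob. The sketch below uses BM25, which is an assumption for illustration: the comment does not say which ranking function the patent search engine used. The parameter `b` controls how strongly long documents are penalized, and the default values of `k1` and `b` are just commonly quoted starting points.

```python
import math
from collections import Counter
from typing import List


def bm25_score(query: List[str], doc: List[str], corpus: List[List[str]],
               k1: float = 1.2, b: float = 0.75) -> float:
    """Score one tokenized document against a tokenized query with BM25."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc)
    score = 0.0
    for term in set(query):
        df = sum(1 for d in corpus if term in d)  # documents containing the term
        if df == 0 or tf[term] == 0:
            continue
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # b = 0 ignores document length; b = 1 normalizes fully by length.
        norm = 1 - b + b * len(doc) / avg_len
        score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
    return score


short_doc = ["patent", "claim"]
long_doc = ["patent"] + ["filler%d" % i for i in range(99)]
corpus = [short_doc, long_doc]

no_norm = [bm25_score(["patent"], d, corpus, b=0.0) for d in corpus]
with_norm = [bm25_score(["patent"], d, corpus, b=0.75) for d in corpus]
```

With `b=0.0` both documents tie because each contains the term once; with `b=0.75` the short document wins. Tuning for a corpus amounts to picking values like these against judged queries rather than accepting the defaults.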

Few organizations are willing to do the work to tune up a search engine (you have to decide about the relevance of 10,000+ document hits), but I've had the experience that you can beat the pants off the defaults even using a generic tuning. For instance that patent search engine was tuned up against the GOV2 corpus instead of a patent corpus. A small patent corpus showed us we were on the right track, however.
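A minimal sketch of what "formal benchmarking of relevance" can look like once people have judged which hits are relevant for a set of queries. The metrics here (precision at k and mean average precision) are standard TREC-style measures, but which ones the commenter actually used is not stated, so treat the choice and the function names as assumptions.

```python
from typing import Dict, List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that were judged relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant doc appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: Dict[str, List[str]],
                           judgments: Dict[str, Set[str]]) -> float:
    """Average AP over all judged queries; compare this number across tunings."""
    scores = [average_precision(runs.get(q, []), judgments[q]) for q in judgments]
    return sum(scores) / len(scores) if scores else 0.0


ranked = ["a", "b", "c", "d"]   # engine output for one query
relevant = {"a", "c"}            # human judgments for that query
p2 = precision_at_k(ranked, relevant, 2)
ap = average_precision(ranked, relevant)
```

Running the same judged queries against two configurations and comparing the resulting numbers is what turns "it feels better" into something measurable.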


The good news here is that the Confluence API is actually really good, and very easy to integrate with.

I wrote a custom search engine that worked by running on cron, pulling in all of the content from Confluence and writing it into a SQLite table with SQLite full-text search enabled (using https://sqlite-utils.datasette.io/en/stable/python-api.html#...), then sticking a https://datasette.io/ interface in front of it.
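A rough sketch of the indexing half of that idea, using only Python's standard `sqlite3` module rather than sqlite-utils or Datasette. It assumes the SQLite build includes the FTS5 extension, and it takes already-fetched `(title, body)` pairs as input; the Confluence API calls are deliberately left out, and the table and function names are made up for illustration.

```python
import sqlite3
from typing import Iterable, List, Tuple


def build_index(pages: Iterable[Tuple[str, str]]) -> sqlite3.Connection:
    """Load (title, body) pairs into an in-memory FTS5 table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE pages USING fts5(title, body)")
    conn.executemany("INSERT INTO pages (title, body) VALUES (?, ?)", pages)
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, query: str, limit: int = 10) -> List[str]:
    """Return matching page titles, best match first (FTS5's built-in rank)."""
    rows = conn.execute(
        "SELECT title FROM pages WHERE pages MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [title for (title,) in rows]


conn = build_index([
    ("Deploy checklist", "Steps to deploy the billing service"),
    ("Meeting notes", "We talked about lunch"),
])
results = search(conn, "deploy")
```

A cron job would rebuild or update the table from fresh page content; a file path instead of `:memory:` makes the index persistent.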


On one level that's great and I'm certainly glad you made it work.

On another level, and bearing in mind that Confluence is a paid product, this absolutely should not be necessary and competent search is something that Atlassian should provide out of the box.

(Yes, I have beef with Confluence, but in my case it's primarily due to the historically awful editing experience.)


The API for writing docs/content to Confluence is the worst I've ever seen. You are expected to use their custom syntax which then gets converted again before rendering.

The docs for the POST content literally say to write what you want in Confluence's WYSIWYG, then do a GET API call to see what it should look like.


It seems to me that the big problem with Confluence search (once you have a lot of pages) is that the results have poor relevance ranking. Wouldn't tossing the content into SQLite have the same problem?

Have you shared the code or any more details for that custom search engine anywhere? I'd love to see how you did that. I'm considering doing the same thing soon

My company uses Coveo [www.coveo.com] for their intranet. They have a native connector for Confluence, it works MUCH better: https://docs.coveo.com/en/1716/index-content/install-the-cov...

It's been a long time since I worked at Google but when I did (10 yrs ago), the search system for the intranet was notoriously awful. Part of the reason was that PageRank tends not to work so well in places where things aren't heavily cross-linked, which is a hard place to get to if your search system already sucks.
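To see why sparse cross-linking hurts, here is a small textbook-style PageRank power iteration. It is only an illustration of the general algorithm, not a description of Google's internal system; the damping factor and iteration count are conventional defaults chosen for the sketch.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over a {page: [pages it links to]} graph."""
    pages = list(links)
    n = len(pages)
    if n == 0:
        return {}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            targets = [t for t in outgoing if t in links]
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


unlinked = pagerank({"a": [], "b": [], "c": []})
linked = pagerank({"a": ["c"], "b": ["c"], "c": []})
```

With no links every page ends up with the same score, so the signal contributes nothing to ordering; only once pages point at each other does one of them (`c` here) stand out.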

I always found those complaints funny. Google's internal search was and is light years ahead of every other company's. Those complaints were probably coming from people who never worked at any other large company and were expecting internal search to be as good as web search despite the relatively tiny corpus.

On a confluence that covers the whole of the Fortune 500 company I work for, I do NOT want to search over the corpus of all the documents hosted on it. I want a persistent search filter where I can easily restrict my results within certain parameters without having to constantly re-filter my results.

I think most search engine designers want to make the index as broad as possible, but the problem seems to be that people rarely want such broad searches. What they really want are very detailed indices and metadata implications over well trodden folders.


It didn't really seem to have any prioritisation - e.g. around titles, headings or any metadata (view count, edits, last updates). Agree completely it was awful.
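As an illustration of the kind of prioritisation being asked for, the sketch below weights a match in the title more heavily than one in a heading or the body. The field names and weights are hypothetical and say nothing about how Confluence ranks internally.

```python
from typing import Dict, List

# Hypothetical weights: a title hit counts five times as much as a body hit.
FIELD_WEIGHTS = {"title": 5.0, "headings": 2.0, "body": 1.0}


def field_score(query: str, page: Dict[str, str]) -> float:
    """Sum weighted term counts across the fields of one page."""
    terms = query.lower().split()
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = page.get(field, "").lower().split()
        score += weight * sum(words.count(t) for t in terms)
    return score


def rank_pages(query: str, pages: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Order pages by descending field-weighted score."""
    return sorted(pages, key=lambda p: field_score(query, p), reverse=True)


pages = [
    {"title": "Meeting notes 2014", "body": "apollo apollo was mentioned"},
    {"title": "Apollo project", "body": "Overview"},
]
ranked = rank_pages("apollo", pages)
```

Here the project page outranks the old meeting notes even though the notes mention the term more often. View counts or last-edit dates could be folded in as further weighted terms.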

OTOH I'm also a believer that you should be able to navigate to the right information.

People seem to think that writing pages is sufficient. A library works because pages are gathered in books, organised by sections and has an army of librarians to keep it running smoothly.

I treat documentation like code - DRY and refactoring apply just the same. e.g. I might split a page up so that some common part can be re-used. I'll cull obsolete information or mark it obsolete. I'll _also_ update headings to help them show up in searches.


I used to work at Atlassian but NOT on Confluence and I have no special information about this. But I can tell you that internally it is well known how awful the search is - they run one of the biggest known instances of Confluence - and there have been many spikes and projects to improve it. I have spoken to lots of people and asked why it continues to be so bad but all I get is hand-waving about how it's such a hard problem.

Honestly I wish I knew more but it was like pulling teeth trying to get people there to speak openly about why it's so hard when it is solved in so many other products.


This thread is well timed, I was just about to pick a wiki solution and was leaning towards confluence. But search is really important to me.

What’s the prevailing wisdom these days on the best solution for an internal knowledge base/wiki platform?


Markdown + Grep

As for an ok way to manage internal knowledge, I've yet to see it. I've wanted to try out the Johnny Decimal System because if you can create a solid hierarchy of a filing system, everyone should be able to drill down to the right doc. Confluence search doesn't work. Neither does google docs. I think I now want the ability to just pull a local copy of a section of docs, say "all engineering," and just use grep locally.

What are your issues with Google Docs search? Just curious as I'm building a product to generate a better wiki out of Docs/Drive.

I tried to replicate an exact issue I had last week, but I can't, which is interesting. I knew I had read a doc days earlier. I could not find it by partial title matches, content searches, nothing. I had to search email for "so and so shared a doc" to get back to the document. Of course, now I can find it the expected way. I routinely need to search for things that are buried in docs, like our intern program. This was another one where I couldn't quite remember the title of the doc nor who wrote it. I searched and searched variations of things containing "intern" and even though I had been in the document within the last couple of weeks, I just plumb couldn't find it. I had to go back to a calendar invite to find the doc. Part of the problem could be that sometimes people write things differently. I might look for "precap," a part of a doc title I'd expect for interviews, but someone else stored it as "interview pre-read" and so I can't find it. That's why I like document hierarchies / trees for finding things. I can go HR -> interviewing -> $something_more_if_needed and browse a couple of files and find what I want. Labels are cool for when something fits into multiple locations in the tree.

Great insight, thanks!

Got one more that showed up yesterday. A doc was shared via link, but I didn't see where the link went so I searched for it by title (looking at the shared screen during a meeting). No results in google docs. Found the link, and the doc opened right up. It was shared and I wouldn't be able to find it.

I'm working on one, V1 is going to be released in a few days (you can find the link in my profile). It is meant to be a big improvement to Confluence if your goal is to organize the knowledge at company or department level. If you are a smaller team, Notion is what I would recommend as long as you are smaller than 100 people

Because "enterprise" tools are bought by people who don't have to use them, so improvements that actually matter to users are not a priority.

Lucene

In my experience almost everything that Atlassian makes is total garbage. Bitbucket, Jira, Confluence, etc. are all horribly slow to the point of being unusable and most of it has very poor UI/UX. I pretty much don't recommend anything they make. It's not surprising at all that a fundamental feature of a wiki, search, doesn't work very well.

That's what everyone has said about every piece of enterprise software ever.

It's the incentives that are in place. Most enterprises buy products based on feature sets. Therefore, enterprise software companies prioritize delivering features.

wait until you see medical enterprise software or defense industry enterprise software

It happens any time the buyer isn’t the user. Atlassian products are terrible because some manager buys them and tells everyone they have to use it, and if the engineers complain they’ll probably just blow it off as “they’re too demanding” or “they don’t want to do Agile right”.

I remember what a friend said about software.

The desktop people want the latest and greatest software ASAP if not sooner.

The server people want nothing to change, ever.

I'm sure enterprise software has similar rules and incentives.


IMO bitbucket is okay. Its UX for PRs is amazing, 1000x better than GitHub's. Especially its side by side diff.

This concludes, and fully encompasses, everything good that I have to say about Atlassian products.


Try to use Intellij's Github plugin. It does wonders.

Atlassian bought Bitbucket after it was already mature. That's why!

Not quite. Bitbucket was acquired in 2011, only supported Mercurial and was missing a lot of features, including the pull requests available today.

Perhaps they didn't fire the whole bitbucket team after buying them, so the people were still able to produce good software for a short time.

Bitbucket Server, which some people are referring to here, was built from the ground up, tailored to a self-hosting environment.

We use bitbucket cloud, and the PR UX is awful. Which version are you using? Are you using a browser extension or something? Compared to UpSource or GitHub, Bitbucket PRs are very rough.

Bitbucket Cloud and Bitbucket On-prem are two entirely separate products. It makes about as much sense as you can expect from Atlassian. The former was a Mercurial thing that they purchased then later removed Mercurial support. The latter used to be called Stash.

We moved from Bitbucket On-prem to Gitlab and I must admit I do miss parts of Bitbucket's UI. It was much easier to find reviews you needed to do and it was much clearer when reviewers had finished reviewing and if work needed to be done. Gitlab should just copy this stuff.


I was the head of product for the developer tools at Atlassian in 2012. We thought long and hard about taking Bitbucket cloud and packaging it in a VM (which is what GitHub did at the time) or leveraging the platforms we’d already built for Confluence and Jira that would give us access control and a plug-in system from day 1. It was a tough call.

Ultimately we decided to build on top of our server platforms and target companies with 1000+ employees from day one. That decision had a huge impact on how we approached performance and what features we prioritised. The hierarchy of projects and permissions associated with them as well as the way we designed Pull Requests are good examples of that.

It was the right decision at the time, even if the product happened to be different in cloud and server, which did lead to some confusion. But Stash customers were really happy with the product.


I have a few plugins to improve the PRs in bitbucket too, showing you the relative size of the PR compared to others you normally work with.

Includes the language breakdown and such. Makes it much easier to know if you should be blocking out 5 mins to 5 hours to review something and if you even should be depending on the language.

Alas, Atlassian does not offer PVA for Bitbucket (it was meant to be there in July) so I cannot release it since it costs me money to host. I really wish they would invest more time into Bitbucket.


bitbucket PR is horrible compared to reviewboard

So true. The way Atlassian hijacks browser keyboard hotkeys in Jira/Confluence/Bitbucket is purely infuriating.

Well, they bought Trello and ruined it too :(

What’s wrong with Trello? It still seems to run fast? And has some new stuff added that seems to be useful? Dunno, still seems to be fine to me.

Atlassian products feel like raw database frontends. I feel like each screen in each Atlassian product is always exactly a database table, being presented to me as an auto-generated form. Might as well use SQL directly.

The truly impressive feat (of Jira in particular, but also all of Atlassian's products in general) is how incredibly slow they are. I assume each page somehow touches every single row of every single table in the database because I don't know what else it could be doing to make page loads take so long.

It’s artificially slow to get you to upgrade. Wish I was joking. Thankfully my company uses Clubhouse/Shortcut which is orders of magnitude better.

Why use a single query language when one for each view is possible?

- Atlassian probably


That really couldn't be further from the truth, especially in Jira. Jira keeps virtually every piece of interesting information in a custom field, including built-in fields like issue titles and points (known as system fields but effectively the same thing). Every view you see is the product of a zillion complicated joins across field definitions, field schemes, field values, field permissions and other bits and pieces.

+1, Bitbucket search often returns results from older versions of a repo. Wouldn't be an issue if syncing to the current master didn't take a few days...

I have seen self-hosted Jira installs that took 20+ sec to load a page.

Today I use one that they host and there is nothing wrong with it.


We used the hosted version. It would lag on the order of seconds while trying to type in the issue description box. We switched to another issue management software.

I tried their hosted version for a bit on their 30-day trial or whatever.

Virtually every page load took upwards of 5 seconds.


Also, having 5 jira issues opened in separate tabs simply kills the browser. It’s awful.

They are garbage for developers but managers love it. Guess who decides in the end?

Hey even Trello got ruined, lol. Fucking Atlassian, feature overload.

Confluence search is great! I could always find what I needed. In fact it's my favorite feature about Confluence. I'd say it's my favorite search outside of Google.

I crossed a huge milestone last week. I actually found something I was looking for in confluence.

It's unbelievably bad. This is literally the only thing you need a wiki for. I can't believe this is the market leader. Notion is going to crush them.

Our search is also quite poor though, something we’re hoping to improve soon!

Thank you for underpromising, and hopefully soon overdelivering!

I don't think it's unusually bad. Rather, if an app offers open ended search, it will generally generate fairly poor results.

No, it really is exceptionally bad even among half-assed search implementations.

For a start, it interprets multiple words in a query as an OR. You search for "hello world", you get "hello nobody" and "goodbye world" in the search results.
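The difference between OR and AND semantics for a multi-word query is small in code but large in result quality. A toy sketch with whitespace tokenization (the function names are illustrative only):

```python
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace; deliberately simplistic."""
    return text.lower().split()


def match_any(query: str, doc: str) -> bool:
    """OR semantics: one shared word is enough to match."""
    words = set(tokenize(doc))
    return any(t in words for t in tokenize(query))


def match_all(query: str, doc: str) -> bool:
    """AND semantics: every query word must appear in the document."""
    words = set(tokenize(doc))
    return all(t in words for t in tokenize(query))


docs = ["hello world", "hello nobody", "goodbye world"]
or_hits = [d for d in docs if match_any("hello world", d)]
and_hits = [d for d in docs if match_all("hello world", d)]
```

OR matching returns all three documents, while AND matching returns only "hello world". An engine can still use OR safely if documents that contain every term are ranked well above partial matches.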

It also always applies stemming, which mangles technical terms. At Cloudflare we have a daemon called "cloudflared" and it's impossible to find it in the damn wiki.
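A toy illustration of how unconditional stemming can make a term like "cloudflared" unfindable. The suffix stripper below is intentionally crude and is not Confluence's actual analyzer, which is an assumption-free statement only about this sketch: it just shows the general failure mode of two distinct technical terms collapsing to the same stem.

```python
def toy_stem(word: str) -> str:
    """Strip one common English suffix; a crude stand-in for a real stemmer."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stemmed_match(query: str, doc: str) -> bool:
    """Match if the stemmed query equals the stem of any word in the doc."""
    return toy_stem(query) in {toy_stem(w) for w in doc.split()}


def exact_match(query: str, doc: str) -> bool:
    """Match only on the literal (case-insensitive) word."""
    return query.lower() in doc.lower().split()


company_page = "Cloudflare company holiday schedule"
daemon_page = "restart cloudflared after upgrade"
```

Under `stemmed_match`, a query for "cloudflared" hits both pages, because "cloudflare" and "cloudflared" reduce to the same stem; `exact_match` hits only the daemon page. Offering an exact-match mode (for example via quoted terms) is one way around this.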

If it even tries to do any prioritization, it's indistinguishable from random. I search for a project's name, I get a fragment of meeting notes from 7 years ago, not the project's homepage.

And the UI is unusably awful too. The fancy-ajaxy JS overlay breaks the Back button, so if you click on an irrelevant result (and all of them are irrelevant), pressing back doesn't go back to search results, but instead makes you lose the document you were on.


If possible please try this https://marketplace.atlassian.com/apps/1225034/better-instan... and let me know how it goes for you. No stemming applied, no term expansion etc... The back button issue exists (not sure if it's possible to fix that as a plugin), but I'd suggest opening results in a new tab to solve that issue.

I don't understand why people use confluence.

I can gain far more functionality with a properly implemented self-hosted mediawiki server (the same code that runs wikipedia itself) with a number of useful plugins installed and enabled.

It doesn't require a rocket science level of apache2+php7+mariadb knowledge to set up. The instructions are really quite straightforward.


In a corporate environment, paying for a Confluence Cloud subscription can be cheaper than having even a part-time admin to install and maintain a self-hosted solution (proper security, backups, handling compatibility issues on updates etc etc). It may not be the best solution, but it is good enough.

In a corporate environment, how do you not already have an admin who can handle this just like they handle any of your other self-hosted needs? I've never worked for a single company that didn't have something hosted internally.

Regardless of whether a company has an admin with necessary qualifications or not, their time is not free and can be used elsewhere.

In the company where I work we do not have anything but network equipment in the server room (500 employees distributed across Europe). All „self-hosted“ solutions are on AWS instances behind VPN, and there are only four 3rd party systems of this kind, where investment in a self-hosted setup did make sense at some point (two of them will be replaced with commercial SaaS soon). We of course have a devops team capable of maintaining those systems, but it is not a given that it is worth it.

When talking about „self-hosted“ solutions it is important to consider all factors that contribute to TCO. It is not as simple as getting some hardware and running an installer.


Of course their time is not free. Neither is the time of the SaaS provider's employees. What's your point?

Of course it's important to consider all the factors. That's important no matter what. The point is that people aren't considering all the factors because the reality is, the costs of SaaS are almost always higher.

You don't have any local backup, by the way? Oof.


If this is a serious question, this is why:

Confluence users are enterprise companies, and getting a self-hosted server up and running is too much pain to be bothered to deal with.

This is a process problem. The steps to get one would be something like:

- try and find the “provision a server” option in the corporate service portal (there probably isn’t one)

- ask someone if they know how to provision one. Get a link to a separate system where you can make the request

- you need to associate the instance with a cost centre, or maybe you literally need a credit card number, don’t forget to attach written manager approval

- update the project’s budget to include the unexpected cost of this internal service. Hopefully there’s actually some margin to afford it.

- wait a day or two for the request to go through

- get the instance details, RDP in and try and set everything up. Realise you need to make a separate request for admin rights to install non-base software if you don’t want to use IIS and MSSQL server

- wait a day for admin rights. Don’t forget to add written manager approval to the request or else it will be denied

- realise you need to make a separate DNS request to get a friendly url for the team to access it. Also, how are you going to secure access to just your team members? Need to integrate with the corporate AD

- …about a dozen more steps

Compare all of that with:

- Go to the corporate confluence instance

- click “Create”, add your team members with edit rights.

- done

Confluence itself may not be a great experience to use, but it’s solving the problem of getting to the point of having a wiki setup in the first place.


And firewall rules!

Oh, and updating the CMDB too!

> getting a self-hosted server up and running is too much pain

And yet many of them self-host Confluence. And many other things. And provision servers all the time. And you have to provide a CC (or maybe PO) for Confluence in any case. And you can't just associate Confluence with a cost centre. And you have to budget it. And... literally every single one of your arguments applies just as much to Confluence.


Self-hosted Confluence Server edition is a legacy of the times when cloud SaaS was not an option. Now you cannot even buy it, because it is being replaced with DataCenter edition.

It’s not that confluence doesn’t require a server, or a technological feature at all, it’s about the business processes.

The business is guided to build a setup where setting up Confluence is no-friction, whereas a one-off generic server has much higher friction in comparison.


"It's not any of the things claimed in the comment you were replying to, but I'm going to argue with you anyway"

We started with a self-hosted mediawiki server and this did not go well. Expecting someone not very computer savvy (and there are lots of those in my company) to dive into the markup on a page and not make a mess of it was a bad idea. At that time at least the WYSIWYG editor was not very usable. Don't know if that is still the case.

So off we went to Atlassian. It has many flaws, but nobody is pining for the old days of Mediawiki. And the hooks Confluence has into Jira are something you don't get with plain Mediawiki, and that has real use for us.


You can literally go see for yourself how the WYSIWYG editor works these days. I suspect it's come a long way since the last time you checked it.

My bigger question though is why the average user is important. Most large companies have employees whose entire job is ... knowledge management. If they can't figure out how to write wikitext then maybe they're not a good fit for the role?


> My bigger question though is why the average user is important. Most large companies have employees whose entire job is ... knowledge management. If they can't figure out how to write wikitext then maybe they're not a good fit for the role?

If your wiki limits its contributors to experts, you're doing it wrong.


Regardless of limitations, the vast majority of edits tend to be by those whose job it is.

I'm struggling to remember any job I've had where documentation was primarily the realm of tech writers. Certainly none of the large companies. One startup had a dedicated tech writer but engineers and evangelists still wrote much of the documentation.

> If they can't figure out how to write wikitext then maybe they're not a good fit for the role?

This is absolutely wrong.

It is correct for Wikipedia. Because you don't want idiots aka Twitter users in there.

For almost any other wiki it's hard to get anyone to give a fuck to actually enter data and harder still to update data.

Even in imaginary companies that pay people to 'Wiki', other than why you want to add to things for them to 'work out', any shit they do will be corporate crap everyone knows.

It's the long tail on those big companies you want on your wiki.


Searching corporate wiki is pretty difficult, because contrary to something like Google, you can't use context of a search query to recommend content.

* First, you have only a few occurrences of the same search query in your search history (because only a few people searched similar words in the past)

* You also can't use synonyms or remove stop words to recommend better content ("IT" can mean "information technology" or the pronoun; "THE" can be an acronym, ...).

So basically the only thing you can do is search words. Confluence is worse than that because it tries to remove stop words and do things that break exact-match search. But this is a difficult job. Ways to improve search: allow multiple titles, index with tags and attributes, only do exact word matches, allow users to suggest content for a specific search query, search autocompletion, live search while typing ... (many things that Confluence doesn't care about). You also have to respect rights when returning documents: each document can have rights from the folder or the document itself, inherited from team access or user access, so this is really computation-intensive too, or you have to pre-compute rights.
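A minimal sketch of the rights problem described above, assuming a simple model where a document without its own reader list inherits the nearest ancestor folder's list, and readers can be users or teams. The `Node` class, the `team:` naming and the helper functions are all hypothetical; real products have richer permission models.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    """A folder or document; readers=None means 'inherit from parent'."""
    name: str
    parent: Optional["Node"] = None
    readers: Optional[Set[str]] = None  # user names or team names


def effective_readers(node: Node) -> Set[str]:
    """Walk up the tree until an explicit reader list is found."""
    current: Optional[Node] = node
    while current is not None:
        if current.readers is not None:
            return current.readers
        current = current.parent
    return set()


def can_read(user: str, teams: Set[str], node: Node) -> bool:
    allowed = effective_readers(node)
    return user in allowed or bool(teams & allowed)


def filter_hits(hits: List[Node], user: str, teams: Set[str]) -> List[Node]:
    """Drop search hits the user may not see, checked at query time."""
    return [h for h in hits if can_read(user, teams, h)]


def precompute(nodes: List[Node]) -> Dict[str, Set[str]]:
    """Resolve inheritance once at index time instead of on every query."""
    return {n.name: effective_readers(n) for n in nodes}


hr = Node("HR", readers={"team:hr"})
salaries = Node("Salaries", parent=hr)  # inherits from HR
handbook = Node("Handbook", parent=hr, readers={"team:hr", "team:all"})
visible = filter_hits([salaries, handbook], "alice", {"team:all"})
acl_index = precompute([salaries, handbook])
```

Checking at query time walks the tree for every hit; pre-computing trades that cost for having to refresh the stored lists whenever a folder's permissions change.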

(Working on a competitor [0] of Confluence and I have put plenty of hours of work on that specific issue, and I can tell you this is really hard)

[0] https://dokkument.com


It seems like there ought to be some recognition that these are business tools, and ought to be designed with power users in mind. Instead, "search" in B2B products is built with the same uber-minimalist UX as B2C search.

Even early Google had more power user features than a typical B2B product search bar.

Boolean expressions (NOT, OR, AND), exact match strings, links-to, linked-from, in-folder/category, etc. should be mandatory for these workflows. Better if you can include search queries as live page content, as in Notion & Height.


Knowledge management is still a neglected area in most companies. No money => a few players. Confluence has been there for years with almost no competition. Notion has emerged recently but is not really a good fit for medium to large companies. As a result Confluence is not worried and doesn't have to improve its product.

Power users are a small share of users of knowledge management software, so it is difficult to build a system only for them. Most people just type a few words and give up if they don't find the result in the first 5 results.


We're also trying to build something in the space with www.archbee.io, a YC company.

> Power users are a small share of users of knowledge management software, so it is difficult to build a system only for them

In practice, knowledge management at companies is a specialization. There are <5% of employees that go around and document/organize things for everyone else. Most employees are passively consuming information and information hierarchies built by someone else.

If you're not building tools for those power users, you're not building for creating and organizing content in your system at all.

As an example of how nuts this is, managers at my company regularly try out various search terms, create index documents, and do "internal SEO" to optimize how other employees will discover documents. This isn't a byzantine environment like public web search is, why do I have to hack around the wiki's default notion of page relevance?


Well, it depends on what you are talking about. Usually people who produce content are power users. But the people who search content, as you said, are the other 95% of users, and they are the ones who also need search that is relevant to them.

My belief is that knowledge management can't exist without power users, whom we call "admins". These are the ones responsible for making sure content is well organized for others, and for creating content if necessary. Those people need specific tools to do their job well, which to me is something you can have in an admin interface while all the other users use the basic interface.

Those tools have two sets of users: admins (curators, creators, organizers) and regular users. Each needs a different interface. And that's exactly what we are working on.

> This isn't a byzantine environment like public web search is, why do I have to hack around the wiki's default notion of page relevance?

That's exactly why I suggested having multiple titles per document. Once you have that and make it easy to suggest new titles, anyone who finds a document can suggest the query terms they used, and that benefits other users.


Confluence does search while typing, it's just so abysmally slow that you typically won't get a result until you've stopped typing.

Confluence/Jira is abysmally slow with everything it does, anyone know how/why that is?

Have a look at how many http queries it is sending. It's like they are sending a unique query for every word used in each page. This is horrendous.

Do you really think you will be able to get close to Google results?

FYI, the "blog" page on your website returns an error. Plus it appears to be a django setup in debug mode, and is serving full error messages w/ settings to the web.

We are using Confluence for our public and internal wikis. Its search is bad and really slow, but no matter how much everyone hates it, the market does not provide worthy alternatives.

When choosing 3 years ago, we used the following criteria:

* WYSIWYG editor. Any user should be able to write documentation with minimal effort

* Flexible access permissions to various parts of the documentation. Public documentation is open to anonymous users, the internal one is divided into many sections with access for certain groups

* Multilingual support. Not out of the box, but possible with plugins

* Multilingual pdf export. In some markets, some customers prefer to have exported manuals

* The ability to inherit articles. We need to be able to make edits once, instead of duplicating the same articles

* Have a relatively modern appearance. Wiki engines are familiar to many because the whole world uses Wikipedia, but this does not make them more pleasing to the eyes, if I can say so

3 years have passed and I periodically look at alternatives. So far only wiki.js seems like a good solution, but it's not even close yet.


> the market does not provide worthy alternatives.

MediaWiki?


I agree. This is why I've tried to make use of Confluence's other tools to make content findable and also improve search…

1. give pages labels. This lets you insert a label-based index, and also makes it possible to narrow search by label

2. use spaces. Separate the content into spaces based on who is likeliest to need that information. You can narrow search by space, and put a search box on the page in the space.

3. use the hierarchy. You have to put the pages somewhere in the hierarchy anyway, so try to make it reasonable.

4. Make useful index pages. Obviously, this doesn't scale, but if you can provide people with useful starting points, it will help them. For example, at Khan Academy we have a space for the whole org with a front page to get you to every team's front page. The engineering team has a front page with a small collection of useful & commonly-used links

5. if you have a page in your hierarchy with a lot of content underneath it, add a search box on that page that constrains the search to that set of pages.

The biggest problem Confluence search has is that it's terrible with relevance, and using its tools to narrow down the search can improve the relevance of the results considerably.


In my understanding, you have to prefix all your keywords with "+" for all of them to be necessary for a page to be included in your results. This makes the behavior slightly closer to Google.
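As a minimal sketch of that tip, assuming the search box accepts the Lucene-style "+" required-term operator as described above: the helper name require_all_terms is hypothetical, and it only rewrites the query string, it does not talk to Confluence.

```python
def require_all_terms(query: str) -> str:
    """Prefix each whitespace-separated term with '+' so every term is required.

    Terms that already start with '+' or '-' are left untouched.
    """
    terms = query.split()
    return " ".join(t if t[0] in "+-" else "+" + t for t in terms)


print(require_all_terms("deploy staging checklist"))
# prints: +deploy +staging +checklist
```

Quoted phrases are not handled here; a phrase like "release notes" would be split into two terms.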

Compared to Jira search, Confluence search is quite good.

I use BitBucket, because it's free and I've been using it for a long time. Maybe GitHub is faster, but I don't access BitBucket enough to justify migrating ~50 repos I have. Can't be bothered. Its UI/UX? meh. I got used to it.

I use Confluence and Jira because, again, we use them at work. So I guess I'm using them because I have to. I also understand it's a pain to move our company from one to another (oh we've had discussions to move to Coda and others) but again, I'm not taking on that project. Again, UI/UX, search - all meh - they are working and I got used to it.

The inconvenience of using them does not justify the amount of time I need to spend to overcome my inconvenience. Some things, you just have to let them slide.


Let's stop asking "why does a closed feature in a closed product work so badly?" type of questions. The only appropriate answer is: because customers continue to use it.

> customers continue to use it

The people who make the decision to buy Confluence aren't the ones who have to use it.


Yes both Jira and Confluence search are frustrating at times. This is one of the big wins of using Glean (https://glean.com) for me as a developer :-)

I'll take a stab at actually guessing why aside from the issue that people making purchasing decisions don't see how bad it is until work has already gone into bringing in docs and pushing people to use it.

Aside from the organizational issues, I think there's a problem where basically no search system can be good for every org with any kind of internal info and different queries from perhaps several distinct types of users with different goals. To get good, a system needs to improve through at least rudimentary ML. At its simplest, if Alice searches for X today and clicks doc3, then when Bob searches for X tomorrow, doc3 should rank higher. This requires collecting and aggregating click stream data, and using this count info (with cardinality #docs x #queries) at search time. But sometimes it requires a richer model relating search terms to terms in relevant (clicked) docs and optimizing for some measure of search quality (NDCG, etc.). All of this requires detailed access to docs, search/click histories, and a fair amount of computation and storage. But customers have legit reasons for wanting these docs to only be accessible by their own employees. And they don't want to dedicate their own staff to improving such a system. No one wants to hear that their model retraining ran out of memory, etc. So shipping a simple system which doesn't improve but doesn't have moving parts becomes a local optimum.
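The Alice/Bob example can be sketched in a few lines. This is only an illustration of the "count clicks per query and boost at search time" idea, not how any real product does it: ClickBoostedRanker, the boost weight, and the log damping are all assumptions made up for the example.

```python
import math
from collections import Counter, defaultdict


class ClickBoostedRanker:
    """Re-rank base search scores using per-query click counts (hypothetical sketch)."""

    def __init__(self, boost: float = 0.5):
        # normalized query -> Counter of doc_id -> number of clicks
        self.clicks = defaultdict(Counter)
        self.boost = boost

    @staticmethod
    def _normalize(query: str) -> str:
        return query.lower().strip()

    def record_click(self, query: str, doc_id: str) -> None:
        self.clicks[self._normalize(query)][doc_id] += 1

    def rerank(self, query: str, base_scores: dict) -> list:
        counts = self.clicks.get(self._normalize(query), Counter())

        def score(doc_id):
            # log1p damps the boost so heavily clicked docs do not dominate forever
            return base_scores[doc_id] + self.boost * math.log1p(counts[doc_id])

        return sorted(base_scores, key=score, reverse=True)


ranker = ClickBoostedRanker()
base = {"doc1": 1.0, "doc2": 0.9, "doc3": 0.8}  # scores from the text-only engine
print(ranker.rerank("vpn setup", base))   # ['doc1', 'doc2', 'doc3']
ranker.record_click("VPN setup", "doc3")  # Alice clicks doc3 today
print(ranker.rerank("vpn setup", base))   # ['doc3', 'doc1', 'doc2'] for Bob tomorrow
```

Even this toy version needs persistent storage that grows with the number of distinct queries and documents, which is the moving part the comment is talking about.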


Great take. I worked on Confluence for a few years and have a bit of insight.

Search has been an area of focus on and off for the most part of the last 15 years. It actually has gotten a lot better and Atlassian has an entire team focused on improving the search experience across their suite of products (they started with Confluence). And from what I hear, they are focusing on all the right things.

To your point, no search system can be a good fit for every possible use-case. Confluence has a number of different use-cases, but let's just pick "documentation" and "intranet" as an example here.

Intranets are, to a large degree, about keeping up with what's new in a company. Therefore recent content is likely more relevant than older content.

When used for documentation, recency doesn't matter at all. If a document was written 2 years ago, but the content is still accurate, it's just as relevant as it was on day one.

That means no single relevance configuration will work well for all use-cases. Leveraging ML is essential. But even a single ML model across an entire Confluence instance is not going to work as different spaces are used for different use-cases. What's really required here is to build different models for different spaces to create a tailored relevancy for each space. It's not an easy problem to solve, but I'm confident they will get there with time.
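To make the intranet vs. documentation contrast concrete, here is a rough sketch of per-space relevance settings. It uses hand-set exponential recency decay rather than a learned model, and SpaceProfile, PROFILES, the 30-day half-life, and adjusted_score are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpaceProfile:
    # Half-life of the recency decay in days; None means age is ignored.
    half_life_days: Optional[float]


# Assumed settings: intranet content goes stale quickly, documentation does not.
PROFILES = {
    "intranet": SpaceProfile(half_life_days=30.0),
    "documentation": SpaceProfile(half_life_days=None),
}


def adjusted_score(text_score: float, age_days: float, space_kind: str) -> float:
    """Scale a text-match score by a recency factor that depends on the space."""
    profile = PROFILES[space_kind]
    if profile.half_life_days is None:
        return text_score
    return text_score * 0.5 ** (age_days / profile.half_life_days)


print(adjusted_score(1.0, 730, "documentation"))  # 1.0, a 2-year-old doc keeps its score
print(adjusted_score(1.0, 30, "intranet"))        # 0.5, one half-life old
```

A per-space learned model would replace the fixed half-life with something fitted to each space's usage, but the shape of the problem is the same: the scoring function has to know which kind of space a page lives in.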

Seeing the challenges with Search at Atlassian, despite having a large, dedicated team of engineers working on the problem, is what motivated me to join http://sajari.com. We've been doing a lot of work on reinforcement learning and Neural Search. Our focus right now is on public content websites and e-commerce, but eventually we will get around to enabling products like Confluence to create a great search experience without the need for an entire team. Search is a hard problem, but there is so much opportunity to improve the experiences that are available today. Exciting times.


It may be depressing to be on the Confluence “search” team. But wouldn’t it be MORE depressing to be on their “editing” team? Or how about their “performance” team? :-(

The comment you responded to said nothing about being on the search team being depressing.

I found the search pretty iffy at times. There was an existing marketplace app for it that was not much better, so I wrote my own. Then I turned it into a full marketplace app so others could benefit.

It does partial matches anywhere in a word, supports every language even in the same document, and even has regex support for those who need it. It updates instantly, with instant filters.

It can find things like 168.0 in 192.168.0.1, for example, which the existing Confluence search cannot. Or you can search for AKIA credentials with /AKIA[A-Z0-9]{16}/. I have heard people describe it as Algolia for Confluence, which makes me happy.
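For anyone curious what those two kinds of query look like in code, here is a brute-force sketch over an in-memory dict of pages. It says nothing about how the app itself is implemented; substring_search, regex_search, and the sample pages are made up, and the key shown is the placeholder access key ID from AWS's own documentation.

```python
import re

# Hypothetical page store: title -> body text
pages = {
    "Network notes": "Gateway is 192.168.0.1 on the lab VLAN.",
    "Old deploy script": "export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE",
}


def substring_search(pages: dict, needle: str) -> list:
    """Return titles whose body contains needle anywhere, even mid-token."""
    return [title for title, body in pages.items() if needle in body]


def regex_search(pages: dict, pattern: str) -> dict:
    """Return title -> list of regex matches, for pages with at least one match."""
    rx = re.compile(pattern)
    hits = {}
    for title, body in pages.items():
        found = rx.findall(body)
        if found:
            hits[title] = found
    return hits


print(substring_search(pages, "168.0"))
# ['Network notes']
print(regex_search(pages, r"AKIA[A-Z0-9]{16}"))
# {'Old deploy script': ['AKIAIOSFODNN7EXAMPLE']}
```

A token-based index typically misses the first query because "168.0" is not a whole token of "192.168.0.1"; a linear scan like this finds it but does not scale, so a real implementation would need something like an n-gram index.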

https://marketplace.atlassian.com/apps/1225034/better-instan...

As for why their search is so bad? It's probably due to how they apply permissions. Every permission for their search needs to apply per search per user. It makes it complex and hard to apply changes, making it hard to improve things. I imagine it's one of those parts of confluence that is a major pain to work with.
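A rough sketch of why per-user permissions complicate things, assuming a simple group-based model (the actual Confluence permission model and implementation are not described in this thread): every ranked result has to be checked against the searching user before it can be shown. visible_results and the group data are hypothetical.

```python
def visible_results(ranked_page_ids, page_groups, user_groups, limit=10):
    """Post-filter a ranked result list down to pages the user may view.

    page_groups maps page_id -> set of groups allowed to view it.
    Pages with no entry are treated as hidden.
    """
    results = []
    for page_id in ranked_page_ids:
        allowed = page_groups.get(page_id, set())
        if allowed & user_groups:  # user shares at least one allowed group
            results.append(page_id)
            if len(results) == limit:
                break
    return results


ranked = ["p1", "p2", "p3"]
page_groups = {"p1": {"hr"}, "p2": {"eng", "hr"}, "p3": {"eng"}}
print(visible_results(ranked, page_groups, {"eng"}))
# ['p2', 'p3']
```

Filtering after ranking means the engine may have to over-fetch to fill a page of results, and anything like result counts or cached rankings has to be computed per user rather than once per query.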

I think a lot of this is also due to their cloud migration. With the server version, which you could host yourself, the index could be stored on disk. With cloud they suddenly need to keep the index state somewhere persistent, but they also want to dynamically scale up and down.

Lastly, they also apply stop words, stemming and such, using out-of-the-box Lucene. Lucene is a great tool, but it can also be a pain to work with. You can see problems when you mix languages on the page too, such as having Thai, Chinese and English on a single page, which confuses the Lucene tokeniser.


ysk: You can save sites for reference later if you don't want to create a page in Confluence to do it: https://support.atlassian.com/confluence-cloud/docs/save-a-p...

If you want the best of both worlds, you can use the "Favorite Pages Macro" on any page to reference all of the pages that you have saved for later, which makes keeping that page up to date with your latest changes to saved pages trivial.


I'm amazed to see this here.

My colleagues and I have been grumbling for ages that our instance of Confluence must be really badly configured. If you put in a single word search term, there will be lots of results, but no guarantee that any pages containing that word in the title (or body), will appear above ones where it doesn't.

The search problem was solved long ago by Apache Solr/Lucene. Although this may not be true for multiple languages.


As others have said Atlassian don't care about you, the user. Their products are piles of features that perform well in feature comparisons, with the minimum amount of effort to UX.

"Atlassian Tools" is on my list of automatic rejections for companies I'm thinking of working at for this reason.


I really wish one day I can search Bitbucket as I can search Github.

In my experience, it's usually that the person who created the page did not title it with something that a person would search for.

The organization of most teams' documentation is horrendous at my company. There are at least 3 different pages I have to go to for how-to articles and that's just within my current team's space. Not to mention there's limited information on those pages.

Documentation is an afterthought. We've also seen a lot of attrition this year. I'm the senior person on my team as a midlevel. I have one contractor whose term is up in a couple months and one junior. They can't fill the 4 positions that have been open for 2-3 months.


I've not really found any of the searches in wikis like this to be good. Notion is a beautiful wiki-type tool let down by its absolutely atrocious search capability.
