"LibreOffice is better at reading old Word files than Word" (eldritch.cafe)
448 points by sohkamyung | karma 76115 | avg karma 9.95 | 2024-01-07 17:28:38 | 196 comments




It's unsurprising that this would be the case, since LibreOffice strives for compatibility while MS would want clients to upgrade to newer, more lucrative editions.

Or, you know, to standard open formats if you prefer them. But then there are always the MS trolls in the comments.

I converted 100 GB of xls/doc files to upload to SharePoint and netted something like 5x space savings just by resaving in an open format.

I am not surprised at all that Libre is better at opening old files. There are files that Microsoft won't let you open without going back into the security settings and allowing them. We were converting a library that goes back nearly 30 years, with lots of stuff dated 20+ years ago. Why would you carry compat for this trash today?


Standard formats? I don't think so. The "standard format" they use for OOXML is not compliant with the strict version, and it is so loose that LibreOffice couldn't go to the standard to fix the compatibility issues.

It was never a standard to begin with.

Countries were refusing to use MS office because it didn't support open document standards. MS basically showed up to the spec committee with a bunch of MS Word documentation and said it was a new spec. It was flatly rejected, but then at the last second, a whole bunch of the committee changed their minds for no logical reason (I suspect they were simply paid off) and the "standard" was adopted.

Of course, it's impossible to write something compliant with their garbage and Office didn't comply with the "spec" in the slightest.

At the same time, MS refused to keep up with ODF standards in a massive case of malicious compliance.

In the end, it worked out for MS. They faked being open just enough for European governments to continue giving them huge piles of money for systems that are as locked down and proprietary as ever and consumers (like always) were the big losers.


I remember that I was following the saga on Slashdot. It went on for months.

There is a Wikipedia page about that standardization process https://en.wikipedia.org/wiki/Standardization_of_Office_Open...


There is the "standard" [ISO-29500], there are patch notes [MS-OI29500] for the standard and there are Excel only extensions [MS-XLSX].

As someone who develops a library for xlsx, I can say you have to check all three, as well as spend some time experimenting with various changes to xlsx files. The patch notes are generally updated regularly with valuable information (when MS is asked a question, the answer is often incorporated there). The standard is full of inaccuracies and is mostly a primer. Some essential info is missing from all three and is described as "implementation specific", e.g. text-to-number conversion in formulas.

Excel has gotten better over the years, so it is mostly in line with the standard.


Is it not a lot better on that front than .doc was, though? I realize a lot of stuff can read .doc today but it's due to a lot of reverse engineering that happened over like 20 years because it was entirely proprietary.

Back in the days, Microsoft took backwards compatibility very seriously, maybe more than anyone else besides mainframes. Going as far as making special cases so that applications that relied on some bugs or undocumented features continued working.

IIRC not for Word file formats. Usually only the previous version would definitely convert, any older and things started breaking. And it would always save in the newest version by default so everyone else has to update to edit your file (there were free readers, but they were separate programs and I don't know how popular they were).

Yup, exactly this. And there would usually be an official plugin you could optionally install to read files from ~2 versions ago, presumably because businesses demanded it, but still wouldn't go back farther.

I can't think of any other program that removed the ability to open older versions of its own files. Kinda crazy.


Aren't there only two word file formats, going back to the 1990s? I've never seen a conversion problem.

DOC and DOCX right? I feel like there might have been some DOC 97 and DOC 2003 etc stuff going on but my memory fails me

There are three: .doc pre-Office 97, .doc after 97, and .docx.
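For anyone who wants to check which family a given file belongs to, here is a minimal sketch that looks only at the container signature. It assumes the well-known OLE2 compound-file magic bytes (the container used by Word 97-2003 .doc files) and the ZIP local-file-header magic (the container used by .docx). `sniff_container` and `sniff_file` are hypothetical helper names, and the check says nothing about which Word version wrote the file.

```python
# Signature of an OLE2 compound file (container of Word 97-2003 .doc files).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Signature of a ZIP local file header (container of .docx, .xlsx, .odt).
ZIP_MAGIC = b"PK\x03\x04"


def sniff_container(header: bytes) -> str:
    """Classify a file by its first bytes (hypothetical helper)."""
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    # Anything else: older or unrelated formats this sketch does not know.
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of the file to run the signature check."""
    with open(path, "rb") as f:
        return sniff_container(f.read(8))
```

An "unknown" result only means the file is neither of the two containers; very old word processor formats would need their own detection.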

There are two file extensions, but with every new Word version there were new features, and thus new stuff that had to be represented in the saved file. If you only went forwards, didn't skip too many versions, and remembered to save the file again in the newer version, you'd be fine. But as the Mastodon thread says, anything from before a certain date won't open at all, and more complex documents from the 90s and early 00s might have surprising issues in modern Word.

Yeah I remember as a kid saving a file with Word 97, then opening it on a different computer with Word 95, and there was a box between every pair of characters. Looking back, it's clear that the internal representation had changed from an 8-bit encoding to UTF-16 but I learned to be careful to "save as Word 95" every time from then on.
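The "box between every pair of characters" symptom is easy to reproduce. A minimal sketch, assuming the old reader treats each byte as one character (Latin-1 is used here as a stand-in for whatever 8-bit code page the old program actually used; `misread_utf16_as_8bit` is a hypothetical name):

```python
def misread_utf16_as_8bit(text: str) -> str:
    """Encode as UTF-16LE, then decode byte-by-byte as an 8-bit encoding."""
    raw = text.encode("utf-16-le")   # each ASCII char becomes <char byte> <0x00>
    misread = raw.decode("latin-1")  # 8-bit reader: every byte is one character
    # NUL has no glyph, so show it as a box like an old renderer might.
    return misread.replace("\x00", "\u25a1")


print(misread_utf16_as_8bit("Hello"))
```

For plain ASCII text the high byte of every UTF-16 code unit is zero, which is why the stray character shows up after every single letter.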

That is forward compatibility and is much more difficult to maintain.

Somebody more knowledgeable than me can correct me, but iirc some of the earlier versions of Rollercoaster Tycoon 2 were incompatible with a version of Windows that was still unreleased at the time of the game's launch (because RCT2 relied on a memory management bug? I don't remember), so the Windows team coded up a special exception for RCT2, because they knew that users would blame Windows rather than the RCT2 team if the game failed to launch. This patched version of Windows ended up shipping.

I have no source to back me up. This is something I read about sometime back and it's probably lost in some deeply nested bookmark folder I'll never open.


I've heard a similar story but about SimCity.

Ah, that must have been what I was confused about. Thanks for correcting.

How I imagine the conversation:

  "It took a lot of work, but LibreOffice 7.6.4 can now open a wide range of word processor documents from pre-1994 on both Windows and Mac machines."
  "Can it edit .docx files without screwing up formatting yet?"
  "No, but who uses THAT?"
(shamelessly adapted from https://xkcd.com/619/)

Many many years ago I was distributing my resume as MSWord documents that I had so nicely crafted in LibreOffice. Years later I happened to open one of those resumes in actual Word and realized all my careful formatting was completely trashed.

That's ok. HR departments were going to trash the formatting even if it looked good in MS Word.

They manage to trash the formatting for plain text resumes too.


It's not just between LibreOffice and Word. Up until at least Office 2016, .docx formatting would regularly break for me between Word versions or between Office for Windows/Mac. Admittedly, Microsoft has got better with this as of late, but the .odt format is much more solid.

Formatting for docx files isn't the same between modern desktop Word and the online version either; I've had trouble with tables, for example.

I stick to PDF for that sort of thing. No need for them to edit your resume, right?

I've had recruiters ask for editable resumes because they don't want to reveal your name until you've gone far along enough in the hiring pipeline.

I'd just send them a blank named PDF document copy if that was their only justification.

In all likelihood, they're going to copy and paste your resume into their tracker, and it's going to look like garbage regardless.

that xkcd has aged beautifully.

Back during the Windows 7 upgrade cycle, my company discovered that the old timecard software they were using no longer worked. Fortunately, the IT department came up with a solution: issue everyone in the company an Ubuntu VM, and have them run the software under Wine. That went on for years until they eventually switched over to a web-based solution.

Water under the bridge but Microsoft takes compatibility very seriously and provides tools for you to write your own compatibility patches:

https://techcommunity.microsoft.com/t5/ask-the-performance-t...


I bet that the timecard application was a 16 bit executable. 32 bit Windows was able to run 16 bit executables using the NT Virtual DOS Machine (NTVDM), but 64 bit Windows couldn't.

Wine is however able to run 16 bit executables on 64 bit hosts using their version of NTVDM.
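Whether an old .exe is 16-bit can be checked from its headers. A minimal sketch, assuming the standard layout: an `MZ` DOS header whose 4-byte little-endian field at offset 0x3C points at either an `NE` signature (16-bit Windows) or a `PE\0\0` signature (32/64-bit Windows). `classify_executable` is a hypothetical helper name.

```python
import struct


def classify_executable(data: bytes) -> str:
    """Classify an executable image by its header signatures."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return "not an MZ executable"
    # e_lfanew: file offset of the "new" header, stored at 0x3C.
    (offset,) = struct.unpack_from("<I", data, 0x3C)
    signature = data[offset:offset + 4]
    if signature == b"PE\x00\x00":
        return "PE (32/64-bit Windows)"
    if signature[:2] == b"NE":
        return "NE (16-bit Windows)"
    # No recognizable new header: most likely a plain DOS program.
    return "MZ only (likely plain DOS)"
```

Usage would be something like `classify_executable(open("timecard.exe", "rb").read())`, where the file name is only an illustration.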


This is one of the rare use cases where Wine on Windows actually makes sense - you can build it under SFU/SUA, or at least you used to be able to. (I never managed to get FreeType working, so messages weren't sized properly for dialogue boxes, but it was enough to do what I needed).

Not rare at all! If you work with older games, you often will struggle to get the correct version of direct3D to work on Windows 11. Most of those games start first try in Wine on Windows.

And I honestly never understood this apart from Microsoft wanting to finally kill Win16: after all, it’s not like V86 mode (whose absence in x86-64 is the official justification) is actually required—non-x86 versions of NT had perfectly serviceable Win16 support through the extremely expected approach of having a full PC emulator inside[1]. (Some even extremely briefly had x86 Win32 support[2], but that wasn’t resurrected for Windows for ARM either, perhaps because the sheer amount of stuff in the system has grown so much since that time.)

[1] http://bytepointer.com/resources/old_new_thing/20060525_178_...

[2] http://retro.ircx.net.pl/nt/mips/wx86/


Do you want to troubleshoot 16-bit apps written in 1993 in 2024? Because that is what you would be doing if you left 16-bit support in your OS in 2024.

And it would cost a lot too.


Mind you, the decision was made with the release of (I think) Windows XP for x86-64, in 2005. I actually briefly used a computer with Windows 3.11 on it around that time (oh the wonders of school IT). I’m not saying Microsoft’s decision was wrong, though, only that the official rationale is kind of bullshit.

Couldn't you still run 16-bit apps in Windows 10 32-bit or did they drop it entirely by then?

In 2005 it was not only Win16 apps from 1993, but DOS3 apps from 1985 too.

Not to mention that v86 is still available on x86. You just have to temporarily drop out of 64 bit mode. This is not the 286 anymore...

... I’m trying to imagine how one’d handle interrupts in this setup, and it sounds kind of terrifying. Reprogram the LAPIC each time? Carefully code the interrupt entry to work as both 32- and 64-bit code to determine if it needs to switch back? Ouch.

You don't need to use the same IDT as in long mode. Just have a 32-bit one that decides whether to handle an interrupt (e.g. vm86 traps/emulation) or re-enable long mode and forward it to the 64-bit handler.

But you don't have to imagine -- early amd64 Linux could do it, and there was a patch floating around that kept that feature all the way to 2.6.2x series.

These days vm86 is borked even in 32-bit kernels.


You are right that Windows 7 x64 is unable to run 16-bit applications but I suspect it would have been far easier to do a Windows 7 x86 installation than Ubuntu VM + Wine per PC.

We once bought Parallels licenses for 3 marketing users that demanded Macs.

The main reason was so they could use our timecard software. We could have just given them a $2 RFID card like the production employees, but then they would have had to walk down to the time clock and "punch in".

I think they also ended up using it for the Windows version of Outlook because there was some feature they wanted to use that the Mac version did not have.


My first Mac at work was around 2009. To facilitate using all the internal tools, all Mac users were issued a Windows virtual machine (VMware) that booted the company-maintained Windows image. Some of that stuff included things that were dependent on IE 6 era APIs, which at that point were already deprecated. But MS still supported that stuff.

The amazing stuff was that that vm ran circles around my previous windows laptop, which at that point was three years old. Just way faster than that ever was. Virtual machines and emulators are a good solution for legacy applications. Much easier than pretending ancient APIs are still supported. MS could have saved themselves a lot of grief by just embracing that decades ago and breaking compatibility with each new windows version. Just run the old crap in some vm or emulator. Apple has done that a few times when they switched CPU architectures and when they introduced OS X. They of course removed legacy support as well. But the point is valid: emulation works great.

And it works well enough with Linux support in windows these days; so why not apply that for legacy stuff as well. There's no reason why you wouldn't be able to run everything from the DOS era forty years ago up until now on a modern laptop. And there's no reason to burden modern APIs with all that crap.


It’s worth remembering the x86 emulation on Apple Silicon is as good as it is (reportedly better than most other x86-on-ARM efforts) in part due to CPU features Apple added to the host platform for that explicit purpose, from memory ordering[1,2] to obscure flag bits[3].

[1] https://news.ycombinator.com/item?id=28731534

[2] https://github.com/saagarjha/TSOEnabler

[3] https://news.ycombinator.com/item?id=33635720


Around 2009 (the timeframe GP describes), Apple was still using Intel CPUs.

My girlfriend is a fan of the old point-and-click Nancy Drew games. Nearly all of them are sold on Steam, and while some work fine, others won't even launch or have immediate breaking issues. We discovered that the games run better (granted, not really well enough to actually play them..) on Linux via Steam Proton than they do on modern Windows.

Some of the Nancy Drew games are supported in ScummVM, which means you'd be able to play them in higher resolution on many different platforms.

I have had much better experiences with gog.com instead of Steam for old games. They tend to actually make sure something runs before putting it up for sale.

Might be worth checking out for you. Though I see only one Nancy Drew game there.


I worked for a company once whose software was written for bigger monitors (and that size ONLY) that was common in the day.

So, as a customer you got a free properly sized monitor with your software purchase. True story.


wine is better at running everything prior to Windows 7 than current Windows

This hasn't been my experience at all.

WINE does fine for the top 10k or so most common apps. But whenever I've tried something in the long tail, I've been disappointed by cryptic error messages that don't turn up any useful answers in a search.

Meanwhile, Windows's compatibility mode settings for Win XP have worked fine every time I've tried.

The problem with Windows isn't a lack of backwards compatibility, the problem is that Windows 11 is Windows XP with lots of cruft added on top over the decades.


The reason WINE error messages often don't have google hits is because they're generally notes saying "new code needs to be written here".

> The problem with Windows isn't backwards compatibility, the problem is that Windows 11 is Windows XP with lots of cruft added on top over the decades.

It is not only cruft but also:

1 - Security. Lots of stuff doesn't work because it relied on features that couldn't be made secure and had to be removed from Windows.

2 - Developers that used undocumented behavior that ended up being removed in modern windows versions, a lot of times because of the first item. Being undocumented behavior, it is a lot more probable that WINE doesn't implement it. And sometimes this was not a fault of the developer, but from a particular development environment and framework he used.


Windows XP was arguably Windows 2000 with a Fisher-Price UI bolted on, which in turn was "just" Windows NT with the Win98SE shell bolted on, so there's that, too.

(My MS kernel developer friends might disagree with this assessment, though)


Windows XP incorporated some QOL features we take (probably?) for granted today from Windows ME into the Windows NT line, like System Restore and Automatic Windows Update.

First and probably last time I've seen automatic windows updates described as QOL.

I guess it's true they do affect quality of life ;)


It wasn't just the shell. Win2000 had PnP. NT didn't.

XP introduced VEH, a more "Unix-like" method of handling exceptions. I recall this was one of the reasons the Golang devs dropped support for 2000 at 1.3x.

You should be able to run these under WSL2 in Windows 11 then, I assume?

Abiword does OK with old .doc files, if anyone was wondering

Unfortunately, AbiWord is effectively abandonware on Mac and Windows.

I tried compiling it for Mac recently and there is code that was deprecated in 2016 with the release of macOS Sierra.


it's better at reading UTF8 CSV files than Excel, too.

There is probably nothing that can read CSVs that does it worse than Excel.

An internal tool I'm responsible for maintaining (that coworkers at my company use) consumes CSVs, and a lot of the time they're made by less-technical employees who use Excel to create/edit them. Very often Excel causes unexpected bugs that baffle me when the tool can't parse it. One time it surprised me when all the cells looked right in Excel itself, but when I popped it into Emacs, there were ~15 extra commas at the end of every row. Don't know what world that would be expected behaviour in, but alright.

Extra commas mean extra cells. There are probably cells to the right of the data that are empty, but have formatting or something so that Excel thinks they are part of what needs to be saved.

Probably, but why would I want to keep formatted blank cells in my CSV data, when it doesn't even represent the style/formatting in any way? If there was any data I'd care about from those cells, it'd only maybe be the styling. But if it's not going to export the styling information, and the content is blank, then don't export those cells.
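On the consuming side, those phantom columns can be trimmed before parsing. A minimal sketch, assuming that a column counts as phantom only if it is blank in every row (so genuinely empty values in real columns survive); `used_width` and `drop_empty_trailing_columns` are hypothetical helper names:

```python
import csv
import io


def used_width(row):
    """Number of cells up to and including the last non-blank one."""
    n = len(row)
    while n > 0 and row[n - 1].strip() == "":
        n -= 1
    return n


def drop_empty_trailing_columns(csv_text):
    """Remove trailing columns that are blank in every row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    width = max((used_width(r) for r in rows), default=0)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(r[:width] for r in rows)
    return out.getvalue()
```

For example, `"name,qty,,,\nbolt,3,,,\nnut,,,,\n"` keeps the blank `qty` for `nut` but loses the three formatting-only columns.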

=cmd|' /C calc'!A0
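The one-liner above looks like the classic CSV formula-injection payload: a cell that starts with `=` can be evaluated by a spreadsheet instead of being shown as text. A minimal sketch of the commonly recommended mitigation when exporting untrusted values, assuming the usual trigger characters and a leading single quote as the neutralizer (`neutralize_cell` is a hypothetical name, and note that this also quotes legitimate negative numbers like `-5`):

```python
# Leading characters that spreadsheets may treat as the start of a formula.
FORMULA_TRIGGERS = ("=", "+", "-", "@", "\t", "\r")


def neutralize_cell(value: str) -> str:
    """Prefix risky cells with a single quote so they are treated as text."""
    if value.startswith(FORMULA_TRIGGERS):
        return "'" + value
    return value
```

This is a sketch of one mitigation, not a complete defense; whether the quote is displayed depends on the application that opens the file.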

I don't understand why it is so bad. If you use "insert data from file", it asks you to choose the separator and all that, and it works fine.

If I open the CSV directly, it doesn't ask, and unless the file follows your system settings for separator, date format and similar, it opens it wrong.
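A reader that does not depend on system settings can guess the separator from the file itself. A deliberately simple sketch, assuming the header line contains the real delimiter more often than any other candidate; `guess_delimiter`, `read_rows` and the candidate list are hypothetical choices, not how Excel or LibreOffice actually decide:

```python
import csv
import io

# Candidate separators, in order of preference when counts tie.
CANDIDATES = [",", ";", "\t", "|"]


def guess_delimiter(sample: str) -> str:
    """Pick the candidate that occurs most often in the first line."""
    lines = sample.splitlines()
    first_line = lines[0] if lines else ""
    counts = {d: first_line.count(d) for d in CANDIDATES}
    best = max(CANDIDATES, key=lambda d: counts[d])
    return best if counts[best] > 0 else ","


def read_rows(text: str):
    """Parse CSV text using the guessed delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=guess_delimiter(text)))
```

With a semicolon-separated file that uses decimal commas, `read_rows("a;b;c\n1,5;2;3\n")` keeps `1,5` as a single cell instead of splitting it.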


[flagged]

Stallman knew exactly what was coming

Iirc Stallman knew because it was already happening with printer drivers and operating systems first.

Yeah, well... Just don't assume the doc/docx file you are working on with multiple revisions and comments and you are dutifully saving every 5 minutes will open next time you try loading it... Libreoffice will eat your document sooner or later if you edit a word file [0].

Switch to Libreoffice file format then convert to doc/docx then send back that doc/docx file.

[0] It's sneaky: basically LibreOffice will show you your edits and let you manipulate and modify the document as you wish, but what it saves into the file is corrupted, and you will only notice it the next time you open that file. Sometimes it's a mismatched tag in the internal XML representation (can be fixed), sometimes a huge chunk of the doc will be missing (can't be fixed).


This. LibreOffice once erased all the footnotes in an article I was editing as a .docx. Never had such problems with .odt.

> Libreoffice will eat your document sooner or later if you edit a word file

To be fair, Word won't eat it, but it will eventually shred it quite well, to the point where there is no practical difference.

Word's format is just something to avoid.


Please do not work on docx.

Import docx if necessary, but work on the OASIS OpenDocument format.

>Switch to Libreoffice file format then convert to doc/docx then send back that doc/docx file.

If you need to send work to a msoffice user, just send OpenDocument. Office can open them. Should there be issues, it's more likely to be their bug, not libreoffice's.

MS Office users need to get used to working with the standard document format, which is what OpenDocument is.


Ubuntu Bug #1 has been closed for over 10 years. Time to let it go.

>Ubuntu Bug #1 has been closed for over 10 years. Time to let it go.

Could you please elaborate?


I interpreted this as "LibreOffice is primarily used on Linux, while Office is almost exclusively used on Windows. Windows has much larger desktop market share than Linux, so it is not surprising that Office prefers the Office-specific format."

For what it's worth, [docx][1] is technically standard. Not sure how that pans out in practice.

[1]: https://en.wikipedia.org/wiki/Office_Open_XML


> For what it's worth, docx is technically standard. Not sure how that pans out in practice.

It doesn’t, basically: the spec isn’t enough to render it, and Microsoft has stopped engaging with ISO in favour of publishing what is now nineteen major versions[1] of their own spec in step with Office updates. There also was a huge shitstorm[2] around the ratification of the original, unsurprisingly given how useless and patent-infested it was.

[1] https://learn.microsoft.com/en-us/openspecs/office_standards...

[2] http://www.groklaw.net/article.php?story=2007011720521698


It's an "international standard" via both ECMA and the ISO/IEC JTC1. Although ISO in particular seems very pleased that JTC1 exists, this is a terrible way to do technical work, basically the idea is that countries get to agree the world's standards using a democratic process.

But why would countries be the right entities to do this work? They aren't, but there are a conveniently small number of them internationally and there were already bodies to represent them. Specifically they send representatives from their own national standards bodies to the relevant JTC1 sub-sub-committees. Yes that means Taiwan isn't represented.

For situations where there's just a matter of agreeing a few narrow specifics, such as the A-series paper standards, it doesn't really matter how it's done. For a huge problem like "Standardize Word processor application data" it's completely impractical and the results are all you'd expect. Microsoft basically leaned on national representatives from smaller countries to push their pointless vanity standard through both ECMA and subsequently JTC1.

After all that, it's basically futile because of course Microsoft can't magically make their "Office" suite and particularly Word behave in a documented internationally standard way, they don't even know how to describe much of the behaviour except "You know, that's how Word does it". And so, the "Office Open XML" standard has long sections where there's a magic escape hatch for "legacy" documents, which Word uses extensively, and it will always do that.

"Standard" except that yeah, all this non-standard stuff is critical and will be used forever. Futile.


I am not sure about the first part. Most Libre Office users I have come across seem to be running it on Windows.

Of course MS Office is easily dominant on Windows.


> Most Libre Office users I have come across seem to be running it on Windows.

Right you are! I stand corrected: https://stats.documentfoundation.org/downloads#week,os


Those stats are of direct downloads?

Linux is probably under-represented on this one because most Linux users will install from distro repos (or Flatpak, snap or similar).


> worth, [docx][1] is technically standard.

There was the funny thing that MS published the standard and at the same time implemented a different one. Then, due to some stuff, MS had to implement their standard properly, and now you had two docx standards and had to "choose" which one to use when setting up MS Word (back in ~2007 or so). Later they switched to always using the standardized docx, but they kinda continued to mess with it, to the point where you shouldn't expect an MS Word document to be readable by anything else, even if it is supposedly saved in the standardized format (and, when nitpicking, it theoretically is, just not in a practically useful way).



Context

Bug: Microsoft has a majority market share

https://bugs.launchpad.net/ubuntu/+bug/1


> If you need to send work to a msoffice user, just send OpenDocument. Office can open them. Should there issues, it's more likely to be their bug, not libreoffice's.

In the real world, we can't all afford to be this idealistic. When I need to send someone a document for work purposes and they can't open my .odt file in MS Word (something that frequently happens to me), I'm not going to say "You need to get used to working with the standard document format. It isn't my problem that you can't open it - it's more likely to be your bug, not LibreOffice's". I'm going to send them a .docx so that we can both get on with our work.


MS Office has officially supported ODT for years. Are you sending files to people using decade-old versions of MS Word, or are you using too recent a version of ODT, or is MS's ODT support incredibly buggy, or what?

MS Office should be able to open and display ODT correctly but MS Office will produce "rainbow" ODT, ODT with extensions that diverge. I can't find the reference/source at the moment, I will update ASAP.

>MS Office has officially supported ODT for years. [...], or are is MS's ODT support incredibly buggy or what?

The bugs of programs trying to read others' file formats go both ways. LibreOffice has problems reading some *.docx files, and likewise Microsoft Word has problems reading some *.odt files.

Example thread of trying to keep "tracked changes" preserved in .odt files when co-workers open it in MS Word: https://forum.openoffice.org/en/forum/viewtopic.php?t=99962

That type of interoperability issue also happens with *.xlsx and *.ods spreadsheets that have non-trivial formatting or advanced functionality.

The advice of "just use the OpenDocument format" is not that simple because round-trip fidelity of the file may not be 100% preserved depending on what features of the software the collaborators use.


> Are you sending files to people using decade old

It's sadly not that rare.

> or are is MS's ODT support incredibly buggy or what?

Sometimes, _especially_ if it's the "web" Word 365, which in my experience is really good at messing up files, including, ironically, docx files (though I haven't used it in the last ~2 years, so maybe it got better).


and yet they will happily say to you: "just get software compatible with a shitty office suite from a crappy company determined to kill interoperability" and expect you to eat it.

I think that's reasonable though. I haven't used Windows or any Microsoft product for over 10 years. However, I accept that that means I use software which is relatively obscure and unpopular. I'm happy with my choice, but I don't expect to enforce it on others, and I think it's entirely reasonable (even if it doesn't personally please me) that when communicating with others for work purposes I should use a medium that is used by virtually everyone else rather than expecting them to adopt mine.

Question: how far does this logic go? Would you expect people to consider the environmental ramifications if they use a V12 20-liter car to go to the grocery store every day? Or whether it's proper to prepare A LOT of food and just throw out what you don't eat?

Why is responsibility for the software YOU decide to use somehow not something you get, but responsibility for the car you choose to drive is most certainly something you get?


The world would be a lot better if people didn't use obnoxious tools for jobs. No, you shouldn't be commuting in your truck that is a tool meant to haul things. Doubly so if it's within walking distance and the weather conditions are favorable.

> software compatible with a shitty office suite

I bet most people would consider Libre/OpenOffice the shitty office suite and Microsoft Office the standard.


yeah and at one point most people considered the earth flat

send a PDF

Works except when you want to collaborate. Joke's on you, I built my career never to collaborate on files but projects.

When I want to collaborate I send a fillable pdf. If I need more collaboration than that obviously mailing back and forth a document in any format is not going to cut it so we're using google doc et al.

”MS Office users need to get used to working with the standard document format, which is what OpenDocument is.”

This is not how the world in which most of the Fortune 500 operate works. The software and format people use is usually the one mandated by their working environment. If your org operates in Office, you operate in Office. The question 99.9% of all users are concerned with is ”I want to review/edit this document I got from Sarah/need to send to Jane”. Not ”does this conform to standard xyz”.

Standards are just paper that software may support. The real question is phenomenological: is there support, at what level, and is the software vendor incentivized to implement it?

There is no ”this format must be followed” convention in userspace software (sadly) unlike in say, hardware drivers for a desktop os. Even if the standard would have an ISO label.


> Office can open them.

Maybe, but it's probably more reliable for them to just grab LibreOffice to open it.

It's much easier to get LibreOffice than MS Office.


Not on a work computer

Depends on your work

I can't imagine working on a large document in opaque WYSIWYG formats anymore. Plain text file formats with version control are the only scalable toolset that won't lose your work.
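The practical upside of plain text is that every change is a readable line diff. A minimal sketch using Python's standard `difflib` to mimic what a version control system would show; the file name `spec.md`, the sample text and the `text_diff` helper are made up for illustration:

```python
import difflib

old = "# Spec\n\nThe service retries 3 times.\n"
new = "# Spec\n\nThe service retries 5 times.\n"


def text_diff(before: str, after: str) -> str:
    """Return a unified diff between two versions of a text document."""
    return "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile="spec.md (old)",
            tofile="spec.md (new)",
        )
    )


print(text_diff(old, new))
```

The same two revisions saved as a binary or zipped word processor file would typically show up only as "file changed".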

And there are no file formats that support it, unless you go to something like Markdown.

LaTeX seems like it fits the bill.

I have used LaTeX for a lot of things and it is a pain to use. It is like everything is stuck in 70s, from the syntax to the toolchain to package management.

I don't use LaTeX for anything these days but Typst popped up recently and seems like a decent alternative: https://github.com/typst/typst

I don't like the idea of embedding a scripting language in my documents at all.

I mean LaTeX is Turing complete as well. It is just that scripting is very clunky to use.

Well, sure, I think that's part of the problem with it.

I don't think either of them has to be Turing complete though.


Have a look at Typst then. It's a fresh take on the concept.

Otoh I feel like it forces me to keep it simple. I have two or three templates I've been using for decades, occasionally needing an update because some package was deprecated. Documents are not like Web Design where every two years someone comes up with completely new design guidelines based on the latest super duper awesome user testing and UX research and everything we did before was inferior.

To be fair, printed documents used to be like that for the first one or two centuries of their existence.

Another way of looking at it is that LaTeX is so good no one has managed to come up with something better.

I recently started using it because LyX does not work on my ARM-based Linux tablet, and it's not too bad when used with a GUI editor.

It is a lot more powerful than other things I use (Markdown, Sphinx) and a lot better for version tracking and multiple output formats than word processors.


You could also use docbook. My gut feeling is, that if you have large documents that you collaboratively work on, there's probably a bit of a compilation process to be done and other stuff you want to do with the document.

So the overhead of semantic markup of the whole content (instead of just marking words as bold) might be worth it.


Typically the big task of a large document is focused on content and organization, and the formatting is a separate concern.

XML is a good start to describing structured text.


Well, it depends.

If you're compiling a report for a college group project (or the workplace equivalent of that) you might well need things like equations and tables and suchlike.

For example if you've got a table of performance results with the best performer in each row highlighted in bold, a fact the text references - then separating content and formatting doesn't really make that much sense.


Just an example. Obviously the more formatting control you need the fancier your language and tool is. Tex is the upper bound.

> For example if you've got a table of performance results with the best performer in each row highlighted in bold, a fact the text references - then separating content and formatting doesn't really make that much sense.

You can tag the text as "important": see <strong> vs. <b> in HTML.


Markdown-in-git used to be used for tech specs at my company. Now we switched to Google Docs, which means we need to maintain a "change log" section at the bottom because Docs history is... not a commit log.

Markdown in git, reviewed as a pull request on GitHub, is the best way to do an RFC. I will die on this hill.

Alternatively, I wonder if an old school message board would work. Each RFC gets its own (sub)board, and within that board each thread is a discussion about some topic -- an individual review, debate about a section, etc. I wonder if such a thing already exists, _specifically_ for technical specification review.


> Markdown-in-git used to be used for tech specs at my company. Now we switched to Google Docs

I'm so sorry. Any rueful ideas on how another company might avoid that pitfall?


Not sure how to avoid. I think it was two things:

1. Some devs just didn't like having review discussions as PR comments. Maybe it's the way that the threads break up the markdown source. Or maybe WYSIWYG feels better for doc review. I also recall people saying "it's hard to know what's changed, could you add a changelog section?" When I replied "it's a git repo, and I leave detailed commit messages," there was no response. I think this means that they don't like that.

2. The managers who decide things don't use git. We use Google Docs for everything else, why not for this?

In the end, Google Docs works well enough. I do miss the commit log, though. It's tricky to link to a previous revision in the changelog table -- maybe I'll get into the habit of it.

Both GitHub PRs and Google Docs have this problem, though: There's no ready record of the review process -- the comments. They eventually disappear and are difficult or impossible to retrieve, and they lose context.

Not like anybody is bothering to dig.


> There's no ready record of the review process -- the comments. They eventually disappear

I wonder if someone could build something clever with git-notes [0] to capture such discussions in the repo itself.

I'm sure others have thought about it already, so there must be some kind of pitfall... A little more searching finds projects like git-appraise. [1]

[0] https://git-scm.com/docs/git-notes

[1] https://github.com/google/git-appraise


What about a wiki, like XWiki or others? They have diffs for changes (a changelog), comments, and inline @user mentions.

> I wonder if an old school message board would work.

It's called Redmine.

Eg: https://redmine.pfsense.org/issues/14139


Trac is still around, too. 1.6 came out a few months ago.

Confluence does a great job of having a WYSIWYG editor and keeping a log of changes on a document.

It's true for any non-native format (of some complexity) in any application.

Vendor A's development team spends all their time developing, testing, bug-squashing, updating, etc. Format A, and developing, etc. their application, and developing the two to work together. They also have all the bug data, data from users, etc.

Vendor B can't possibly keep up. They have their own application and format to develop. They lack the institutional knowledge, the data, etc. There's no way their application will correctly handle complex data in Format A nearly as reliably as Vendor A (which will have bugs itself).

And in this case, Microsoft's Office team probably has far greater resources overall than LibreOffice.


Got bug numbers on any of those? It's not clear what you mean. How recent was the LO?

I won't bother with providing more than this: https://ask.libreoffice.org/search?q=SAXParseException%3A%20...

Feel free to craft problematic docx that trigger conversion errors and report them as bug for the converter.


Similar effect can happen the other way around and in fact I have seen that more often (albeit mostly 10 years ago):

- create DOCX with change tracking in Word

- edit that in LibreOffice and save multiple times

- open it in Word modify something and save

- Word cannot load that file. Loading and saving in LibreOffice fixes that.

All cases I have seen involved change tracking, which is probably not that surprising as the internal representation of change tracking in DOCX is totally brain damaged (even more than the rest of that "dump RTF and random crap as XML" format).


> it's sneaky

I don't know if I would call LibreOffice sneaky here, they're the good guys IMO enabling you to break the clutches of MS Office


I am speechless.

These days it seems at least faster at opening even new Word documents, or maybe that's the network delay of the OneDriveification at work?

Don't even get me started on Adobe Creative Cloud.


Yep! I got some old word files off a floppy disk from 1996 and LibreOffice was the only thing I could find that could read them.

It’s great for all the people still living in 1992. The rest want good collaboration tools with a grammar checker

LibreOffice supports grammar checking with LanguageTool, which blows Word's grammar checker out of the water.

Fragmented sentence. Please rewrite your comment.

[Edit] I guess someone doesn't get the joke about how bad Word's grammar checker is. It used to flag so much stuff as "fragmented" and didn't suggest improvements.


To the Microsoft apologizers: Can't you be happy that open source software provides solutions to problems users face?

I WORK at Microsoft 365 (not on Word), and it's actually good LibreOffice can read old Word files even if Word can't. I wasn't even born until 1997, tho when I was young I played a lot with MS-DOS and Win9x, since my toys were old PCs.

The reality is, both MS Office and LibreOffice are legacy messes. In the 90s MS aggressively broke compatibility to get users to upgrade, which is why LO can read it. Now MS is excellent at compatibility, because the focus is subscription software your employer pays for.


[flagged]

I use LibreOffice to open all my CSV files because somehow, in the year 2024, Excel can’t do that without messing them up. And yes, I know about the import wizard that still manages to mess them up at least some of the time.

> I use LibreOffice to open all my CSV files

I really love the preview, where you can choose what delimiters and character encoding you want to work with, which row to start from and so on. It's so very user friendly and nice to use.


It’s the best! I have no clue how Excel continues to be so awful at this, by contrast.

The Excel team treats the CSV format like Voldemort. They put no effort into it because it's an open file format they can't seize control of.

You should use Power Query for this usage, something like Get Data, From File, yada yada

I should double click on the file and it open correctly, like LibreOffice does when I set it as the default for this file type.

The problem is that Excel lets you open a file in an arbitrary format. Same for LO. CSV data is usually so dirty that you shouldn't handle it without a proper loading process.

…but Excel already does it without a proper loading process and just breaks the files. LibreOffice figured this out. Just have a small modal that pops up when you open with sane defaults. This works the vast majority of the time.
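For anyone who wants the same control in a script, here is a minimal sketch of what that import dialog exposes: explicit encoding, explicit delimiter, a starting row, and no type guessing at all. The sample data and the `read_csv_as_text` helper are made up for illustration; only the standard `csv` module is used.

```python
import csv
import io

# Hypothetical sample: semicolon-delimited, with values that spreadsheet
# auto-conversion tends to rewrite (leading zeros, "1E5", date-like text).
SAMPLE = "id;code;date\n001;1E5;2024-01-07\n002;00123;03-04\n".encode("utf-8")


def read_csv_as_text(raw, encoding="utf-8", delimiter=",", skip_rows=0):
    """Decode and parse CSV bytes, keeping every cell as an unmodified string."""
    text = raw.decode(encoding)
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # csv.reader never converts types, so "001" stays "001".
    return list(reader)[skip_rows:]


rows = read_csv_as_text(SAMPLE, delimiter=";")
data_only = read_csv_as_text(SAMPLE, delimiter=";", skip_rows=1)
```

Nothing is converted unless you ask for it, which is roughly the "sane defaults" behaviour being asked for above.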

A Linux user friend of mine once told me that he saves his files as .doc. His argument was that if he saved as .odt, only libreoffice would be able to read it but with .doc, both platforms (LO and word) would "work fine".

One of my "culture shocks" from the game dev industry was the number of professional artists who paint with Clip Studio Paint but use .psd as an exchange format[1].

[1]: Of course .psd isn't an exchangeable format. It's proprietary. But just like your friend's case, CSP can kinda read/write .psd, while Photoshop can't read .clip at all.


Could you share some other "culture shocks" from game dev?

With regard to proprietary formats, the prevalence of FBX is another one. Especially when there is an open format specifically designed for this purpose (glTF) by an important organization (Khronos).

Generally speaking I think game dev industry relies on closed-source stuff much more than web dev. It's not that surprising tho; after all the most targeted platforms (Windows, iOS, consoles), except Android, are all closed OS.

Another one is salary (again compared to web dev). But I heard web devs had a hard time lately too.


The salary issue (along with the working conditions) is because game dev is an "art" field, which means that, like other creative jobs, companies can lure in young, naive people and exploit their passion to get them to accept terrible conditions because "wow I get to make a game". I suspect this is why triple-A games are so terrible nowadays: most of the talented people realised that corporate game dev sucks and going indie is the best path. Web design is also creative, obviously, but it mostly consists of making corporate pages, not something most people can be passionate about.

To be fair I did that too when I was still in school and regularly shared files, just for the simplicity. However I rarely used more than default formatting so it didn't really matter

It's rather ironic that the one old file format I've tried to open in LibreOffice, namely old StarOffice 5 files, does not open. (LibreOffice forked from OpenOffice, which was the open-sourcing of StarOffice.) The converters were removed to simplify the codebase.

If they are converters, I wonder how difficult it would be to make them standalone utilities that did the format conversion and nothing more (assuming there doesn’t need to be manual intervention). I’m sure there is complexity but it seems like StarOffice is a fixed format at this point, and OpenDocument has some previous fixed version that could be targeted.

Not that anyone should expect different, but Apple has not preserved backwards compatibility with their iWork formats and I have several legacy documents that I need to find old versions to open.


LO now has libstaroffice, which should read them. What happens with recent LO?

Try opening with Libreoffice 3.x. A lot of support for Staroffice was dropped in LO 4.

https://wiki.documentfoundation.org/ReleaseNotes/4.0#Feature...


And Wine is better at running some older Windows software than Windows.

We have a full circle. "The job ain't done until Word won't run".


Never open old Japanese Word files in LibreOffice. It will destroy the file.

This is because old Japanese Word files are saved in Shift JIS, but LibreOffice opens them in UTF-8.
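Whether LibreOffice actually does this is the parent's claim, not something verified here. As a generic illustration of what such an encoding mismatch looks like, this small sketch (sample text chosen arbitrarily) decodes Shift JIS bytes as UTF-8:

```python
# Arbitrary sample text, encoded the way an old Japanese file might store it.
text = "日本語の文書"
sjis_bytes = text.encode("shift_jis")

# Decoding with the matching codec round-trips cleanly.
round_trip = sjis_bytes.decode("shift_jis")

# A strict UTF-8 decode rejects the bytes outright...
try:
    sjis_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ...and a lenient one substitutes U+FFFD replacement characters.
garbled = sjis_bytes.decode("utf-8", errors="replace")
```

If the garbled text is then saved back over the original, the original bytes are gone, which is how a wrong guess at load time can turn into real data loss.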


Do you know if there is an issue for that particular problem in their Bugzilla[1]? If not, have you tried opening a new one? Could you share a sample to reproduce? Because I tried to search for "Shift JIS" on Bugzilla and found nothing.

[1] https://bugs.documentfoundation.org/


Similar story: I managed to get a Windows XP-era game to run on Ubuntu but not Windows 11. The issue is that the game uses some sort of Direct3D which doesn't work on Windows 10 or later. I don't want to install Windows XP/7/8 on a machine, and virtual machines have bad support for graphics acceleration. But somehow it can run in Wine on a "real" Ubuntu installation after a bit of poking around and getting the correct files.

And what's even funnier? I installed Wine in Ubuntu running in WSL on a Windows 11 machine, and the game runs in that environment! Never thought I would run an old game in such a convoluted way.


I've been wanting to run old Photoshop in this way.

Was the process straightforward?


How old is the Photoshop? I was recently able to run Macromedia Flash 2004 in Windows XP compatibility mode without any hiccups.

This does not surprise me at all. WINE under Linux through Yabridge runs perfectly some very old (20+ years) but still excellent audio plugins from the Win98/XP era that have problems or don't work at all on newer Windows versions. Gaming also benefits from this incredibly faithful level of emulation (WINE Is Not an Emulator), and as of today many Windows games, including newer ones, can be installed in a transparent way under Linux using WINE.

https://www.youtube.com/watch?v=Bg1NiXtrJ6g

https://www.youtube.com/watch?v=3b50Stm8gu4


The Steam Deck runs Linux, and I can get many games that shit the bed on Windows 7, 8, or 10 but run perfectly fine under 98 and XP to run under Proton.

> I installed Wine in Ubuntu running in WSL on a Windows 11 machine, and the game runs in that environment! Never thought I would run an old game in such a convoluted way.

You don't need WSL.

https://fdossena.com/?p=wined3d/index.frag

Compiled wined3d dlls that work on windows.

https://github.com/doitsujin/dxvk

Works on windows.

Just drop them in the game folder and the game should load them instead of the real directx.

There's also other implementations of old APIs to keep old video games running, some of them are even used by linux users who use wine, like dgVoodoo :

http://dege.freeweb.hu/dgVoodoo2/ (supports Glide, a proprietary API from the 3dfx era cards, along with dx 1-7, 8.1 and 9)

https://github.com/FunkyFr3sh/cnc-ddraw (fixes all issues you can have with DirectDraw, an old 2d API, can have its use for both windows users and people who use wine on linux)

https://github.com/otya128/winevdm (runs 16-bit apps on 64-bit Windows)

This, along with Windows's own compatibility mode tweaks, should run almost any game that has ever been released on Windows, without the heavy overhead of a VM (as far as I know, WSL doesn't even know how to free memory it has claimed).


Thanks! This didn't turn up in my previous search. Will give it a try.

I've recounted a similar story to this in the past: Basically some custom DOS app being used at a small chain of quick oil change places wouldn't run on Windows 7+ anymore, regardless of compatibility settings but worked fine on Linux under wine. This was done as a stopgap until they could redevelop the application. Because of the experience they had with their console app, they kept it a console app and just had its replacement modernized and written to run in a linux shell (saved all their licensing costs too). They seemed pretty happy in the end.

Microsoft moved away from the core tenet of backwards compatibility at all costs more than a decade ago seemingly across the entire company all at once.


Sincere question: Did that core tenet become incompatible with security against malware? Were the attack surfaces too big when supporting the old stuff?

It is infeasible and unmaintainable for them to be compatible with everything they have ever released since the 1980s. 16-bit DOS and Windows applications don't run on 64-bit versions of Windows.

Can’t it just be emulated?

A while back I read a piece by, I think, Joel Spolsky on the "design" of the XLS format. It was mostly a dump of Excel's internal memory structures to disk, with some optimizations to speed up saving and loading on the slow disks of the time. He seemed quite satisfied with how well the software performed but all I was thinking is "any other program trying to deal with those files is screwed." And "any other program" includes later versions of Excel.

It'd be nice to have some documentation of the internals of old DOC and XLS formats, just for the sake of recovering old files for archival purposes, but it's likely that Microsoft never bothered documenting them, even internally.


This is also true of DOC. It was just a dump of the internal representation of Word.

Hence why they had to completely overhaul it with DOCX/XLSX when we had the open standards requirements come in. There was no possible way for Microsoft to document it. Although it doesn't help that DOCX/XLSX have tags for "Internal binary dump of Word/Excel" so they also fall into the same trap.


They would, presumably, just need to release the header files?

All file formats are, in the end, just memory structures written in binary.
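A toy sketch of why "dump the in-memory struct" formats age badly. The record layouts below are invented for illustration and have nothing to do with the real XLS or DOC layouts; the point is only that a reader built around a newer layout cannot make sense of bytes written with an older one.

```python
import struct

# Hypothetical "cell" record, version 1: row (uint16), column (uint16),
# value (float64), little-endian, no padding.
CELL_V1 = struct.Struct("<HHd")

# Hypothetical version 2: a flags byte was inserted before the value.
CELL_V2 = struct.Struct("<HHBd")


def dump_cell_v1(row, col, value):
    """Write the record exactly as the v1 program holds it in memory."""
    return CELL_V1.pack(row, col, value)


blob = dump_cell_v1(3, 7, 1.5)

# The matching layout reads it back fine.
cell = CELL_V1.unpack(blob)

# The newer layout does not: the sizes differ, so unpack raises struct.error.
try:
    CELL_V2.unpack(blob)
    v2_reads_v1 = True
except struct.error:
    v2_reads_v1 = False
```

With no version tag or self-describing structure in the bytes, every later reader has to carry an exact copy of each old layout, which is the trap described above.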


Then their dark history would have been obvious.


Microsoft has in fact documented these old formats, because they were forced to in antitrust actions!

Per Spolsky, they're an absolute shower. But they are, technically, documented!


This helped me at least once. Office 365 doesn't support an old .doc format, so I used LibreOffice to save it in a newer version...
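That kind of resave can also be scripted with LibreOffice's headless mode. A minimal sketch, assuming the binary is on PATH as `soffice` (some installs call it `libreoffice` instead); the helper names and file names are made up, and whether a given old file converts cleanly still depends on LibreOffice's import filters.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, outdir, target_format="docx", binary="soffice"):
    """Build the headless conversion command without running it."""
    return [
        binary,
        "--headless",
        "--convert-to", target_format,
        "--outdir", str(outdir),
        str(source),
    ]


def convert(source, outdir, target_format="docx", binary="soffice"):
    """Run the conversion and return the expected output path."""
    if shutil.which(binary) is None:
        raise RuntimeError(f"{binary} not found on PATH")
    subprocess.run(build_convert_command(source, outdir, target_format, binary), check=True)
    return Path(outdir) / (Path(source).stem + "." + target_format)


# Hypothetical file names, shown only to illustrate the command shape.
command = build_convert_command("old_report.doc", "converted")
```

Calling `convert("old_report.doc", "converted")` would then write `converted/old_report.docx` if the import succeeds.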

Back in the mid-to-late 1990s I worked for a software company producing anti-virus software, and scanning Word files for macro viruses was one of those things they had to do. However, they weren't able to open Word 1.0 files, so they brought over some actual Microsoft employees from the US to help. For a relatively small (at the time) UK company that was actually quite a big thing and the only time I'm aware they did something like that. It was all a bit hush-hush, but the story I heard was that Microsoft had lost the original source code to read and write Word 1.0 format files (although they still had compiled binaries), so new code had to be reverse-engineered.

I have Word documents from 1997-1999 that are useless. The recent Word version does not open them. I open the documents in Nisus Writer on my Mac, but the formatting is basically useless. I'll give it a try in LibreOffice.

You can find Word 2003-2007 and use that to convert it.

This LibreX for open source products really needs to stop. Also the loose-attribution of panasian or ancient greek/latin…

LibreOffice - in my experience - is also much better about dealing with huge CSV files.

WINE is better at running old Windows software than Windows, so there seems to be some consistency at least.

Pity that the same can't be said of modern Word files. My father asked me for help in converting a book he had written in Word to an EPUB. I opened it up in LibreOffice so that I could try to fix a few things like insert a proper ToC (he generated his by hand, and is constantly fiddling with the page numbers whenever he changes something). LO renders it well enough that I can see the intent, but there are formatting artifacts galore, and as soon as I start fiddling with anything, it seems like I've consigned it to a limbo state where it looks like crap in both LO and Word.

Regrettably, my personal devices are all Linux or Android only at this point, so, I lack a proper copy of Word to fix it for him.


If this is still an issue for you I believe you can convert/import docx files into ebooks via calibre, from there you can just use calibre's ebook editor to fix things up

Is the difference between the macos version of word and the windows version of word significant here? I have to assume microsoft didn't ship a whole COM compatibility layer office for mac, like apple did with objective-c when they put safari on windows.

Google Docs is now also a roach motel for DOCX files. It is apparently no longer possible to export as DOCX. I am rather disappointed, but not surprised. I have become quite skilled and experienced with Google Docs and Sheets, but frankly I'd prefer, for reasons of compatibility and interoperability, to author stuff in Microsoft Word instead.
