lbriner
Don't get me started on the automatic conversion of strings that it thinks are in scientific notation into numbers - which you can't switch off!

We have large data exports from systems that include things like unique location code. You accidentally happen to notice that a block of these look weird and it isn't just the display of them that has changed, the contents of the cell were changed by Excel automatically, without asking, and you cannot disable it.

Absolute BS after all these years. I hate that they won't fix these niggling issues that keep tripping people up over the years and just make excuses. Microsoft's usual response is: "We only work on things that affect a large number of customers". Yeah Microsoft, if you keep closing these bug reports, then each time someone reports it, you can just say that it only affects one person and close it again.

Or...you could show how amazing your company is by doing what most of us have to do: Fix it, add more debugging for the next time it happens if you can't recreate it, or have a properly tracked reason to say, "only a very few people have asked for this but changing it might break these other areas/bacwards compatability" or something.

Prickle
Previous popular thread, Aug 2020: https://news.ycombinator.com/item?id=24070385

> Scientists rename human genes to stop MS Excel from misreading them as dates (theverge.com)

Related details:

2023:

https://www.pcmag.com/news/microsoft-finally-fixes-excel-gli...

> Years after introducing Excel's automatic conversion features, Microsoft rolls out an update to prevent it from changing gene symbols to dates.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9325790/

> Gene Updater: a web tool that autocorrects and updates for Excel misidentified gene names

zelphirkalt
Know your tools. When I see such headlines, I feel sad about the state of this world and obstacles to progress, angry about the stupidity of designing all the tools for the most lazy of users, and at the same time I also laugh about the failures of proprietary software and its users, who could easily have invested a little bit of time to learn alternatives, but apparently chose not to.

Yes the tooling they use might be terrible. It is your responsibility to either deal with the terrible tooling yourself, learn better tooling, or get a capable computer person in the room, who can navigate the tooling landscape and get you the results.

And of course, that is not even addressing checking your result yet. This is a sad state of the research landscape, often financed by public money, and then throwing money at MS for using a proprietary tool and messing up.

bikenaga
An annoyance with spreadsheets that deterred me from ever using them in teaching is that they've perpetuated a arithmetic order-of-precedence bug. ("Bug" in the sense that it contradicts long-standing mathematical convention.) If you type

    -3^2
in a cell and press ENTER, the spreadsheet tell you it's "9". It should be "-9"; in math, exponentiation has precedence over unary minus, so you square 3, then negate the result. For instance, if you tell students to graph "y = -x^2", they should draw a parabola opening downward.

I don't have a recent copy of Excel to check this in, but this was the case in the '97 version. I just tried it in the current LibreOffice calc, and it returns "9". My guess is that one of the early spreadsheets messed up the order of precedence, and everybody after copied it for compatibility.

On the other hand, I just tried maxima and python and they both give "-9".

I wonder if this particular problem afflicts people who copy formulas from (say) math books into spreadsheets.

smartmic
The problem with Excel is that it tries to do almost everything in one software tool: front-end (editing, presentation), analysis, semantic modeling, data storage/database.

Almost all reasonable engineers see that there is something wrong with such an approach. But almost all everyday computer users think that this is the way computing has to be.

Sometimes I wonder why even I voluntarily open it for certain tasks - anyway, despite all the criticism, Excel has reached the Lindy[1] threshold for me and is here to stay.

[1] https://en.wikipedia.org/wiki/Lindy_effect

v1ne
We have a saying in German: A bad worker always blames their tools.

Excel has many quirks, but I'm still very grateful that it exists, for quickly putting together some numbers and still being able to change the inputs to my formulas.

Eddy_Viscosity2
But why oh why can't you disable automatic date reading? Put something in settings with a checkbox that I, the user, can tell excel to F-off with that nonsense.

Is there anybody who can argue the 'for' case for having this on all the time without recourse?

jasinjames
I had a similar issue just two weeks ago at $DAYJOB. I was scraping a log file and I had written an awk script to convert the HH:MM:SS.uuuuuu timestamps to nanoseconds since epoch for ingestion by another tool. Little did I know that awk uses floating point for numbers internally, and so my conversion function was incorrectly rounding the result, which messed up my later analysis in arcane ways. To add insult to injury, I had the same problem again when paging through the data in Visidata, but there was an easy workaround I don't exactly remember.
freehorse
At work we have had problems with excel changing values of numerical entries due to different locale formatting multiple times in the past, and many times causing hours of lost time till the culprit (excel) was found. Other types of values that are completely inconvenient for excel include social security numbers and phone numbers; which one would assume are more general purpose and thus closer to the standard use cases of excel than genes. Still excel is messing these things up. Opening a csv file is also a non-trivial overly complicated endeavour for excel in year 2024, which should be done with a double click (somehow other spreadsheet programs manage to handle csv files much more easily). The worst is that excel is usually used by non-technical people, who misunderstand these idiosyncrasies and time is wasted over and over.
redbell
Here's another interesting article on the same topic (2016) entitled "Gene name errors are widespread in the scientific literature" : https://genomebiology.biomedcentral.com/articles/10.1186/s13...

From the article:

"The problem of Excel software inadvertently converting gene symbols to dates and floating-point numbers was originally described in 2004. For example, gene symbols such as SEPT2 (Septin 2) and MARCH1 [Membrane-Associated Ring Finger (C3HC4) 1, E3 Ubiquitin Protein Ligase] are converted by default to ‘2-Sep’ and ‘1-Mar’, respectively."

golergka
I studied bioinformatics in university back in 2005, and the first subject we were trained on was not Python or Java, but using Excel. Our teachers explicitly told us about excel's conversions to dates and other pitfalls. I find it mind-boggling that some researchers still made these mistakes 10 years later.

Excel is a wonderful tool. But you need to learn your tools and find out about possible footguns.

__mharrison__
One wonders if these scientists took a week to learn sufficient Python and pandas, how much better the world would be?

(Certainly possible, I teach "I don't want to be a programmer"-types all the time. Taught a class this week in fact.)

mbreese
Even when this paper was published, errors due to Excel mangled gene names were well known for at least a decade.

Somewhat recently, some of the more error prone genes were renamed to accommodate Excel. (Ex: SEPT7 -> SEPTIN7)

bluedino
Missing checkbox. "Assume Everything Is A String"

We used to hit all kinds of Excel weirdness with inventory etc.

It was our fault that our part numbers could look like this:

00010190-95.020

benrutter
I know that the "anything" goes nature of excel is part of what makes it such a universally reached for tool, but every time I see sonething like this, I think that there would be a clear use case for a more restrictive spreadsheet.

9/10 spreadsheets are tables, but because they're spreadsheets they inherit the behaviour of "no conistent behaviour in columns, everything is independent and different".

rectang
Only one in five? Evidence of egregious overengineering by the Excel team.
ThinkBeat
This seems to be a problem quite smilar to dynamic or static typing.

The users expect that Excel will figure out what to do with the input correctly.

But you can manually tell Excel what to do with data in a column.

magicalhippo
I use Excel a fair bit at work, mostly for looking at data, but also to fairly easily generate SQL statements based on such data, often after some formulas have been applied.

Though I think what I'd really want is some tool which has the same grid-like visualization, filtering and direct entering, but was code-based under the hood and without magic conversions.

So, take Excel, and when you enter a formula in a cell, it actually writes a line of code for you, which you can inspect and edit. Including adding your own functions and such.

When importing delimited text data, you'd have to specify what the data is in each column. It should still save the original text data so you can change your mind, but yeah, no automagic stuff.

theodpHN
So, the question is: Would there be fewer or more errors if something other than Excel was used? Has much rigorous research been done on this?
databoy24
Excel will convert many CUSIPs (strings) into numbers (scientific notation). For a relatively comprehensive list, see below.

https://www.empirasign.com/cusip-excel-rosetta/

Barring some types of corporate actions, CUSIPs numbers cannot change, and I doubt the ABA is aware of this issue.

jimnotgym
Fixing the headline...

One in five genetics papers have errors caused by mistakes in the use of Excel

As if longhand calculations never have errors?

fifilura
I always felt that there is a hierarchy for working with numbers that goes something like this

Ease of use:

1. Excel

2. SQL

3. Functional programming (e.g. Scala, Python to some measure e.g. Pandas)

4. Imperative programming (C/C++/Java)

But then there another hierarchy that (roughly) goes in the other direction, which is about quality, repeatability, tooling.

If you are at 1 or 2, you responsibility will not be about writing tests and verifying your code using traditional engineering methods.

However! You are responsible for cross checking your results based on the input. This may be a manual process. But actually looking at the numbers from several different angles can give higher quality than writing contrived testcases (in 3 or 4).

HenryBemis
Let me reword this. "One in five genetics papers contains errors because the authors were careless when using Excel".

I dislike Excel for what it does (EUCs, nearly impossible to track changes, etc.) But on the other hand it is an amazing tool.

codr7
At least we got rid of Access?

I've seen things created in MS Access that can't be unseen.

me_vinayakakv
Reminds me of YAML's handling of booleans (`no` -> false) and git commit SHAs (treated as numbers in scientific notation)
dugmartin
US ZIP codes are what I always have issue with during imports. They start with 0 in New England and the zero always gets removed.
om8
Younger me would've said that WYSIWYG editors were a mistake and that researchers should've used LaTeX.

Now I think these errors are a small price to pay for convenience. One could waste a lifetime fighting small things like this and still lose. It's just the world we live in.

stellalo
> The researchers […] say the issue can be fixed by formatting Excel columns as text and remaining vigilant—or switching to Google Sheets, where gene names are stored exactly as they're entered.

I think the safest fix is to avoid spreadsheets altogether, as long as scientific research is concerned.

nuc1e0n
A HN thread similar to this one about how using Microsoft Excel corrupts data occurs about once a month. Nothing ever comes of it. Those emotionally invested in using Excel actively ignore criticism of it, or themselves for that matter.
hilbert42
Don't we ever learn. We've known since VisiCalc in the '80s not to trust spreadsheets for important data without crosschecking the results.

The question is why do we still use substandard tools for processing important data like this.

mixedbit
Does Excel have some reasonable way of unit testing of formulas?
a-dub
kinda like automatic fallback or other automatic help/fix features.

they sound like something that would be helpful but in practice they just end up being a massive violation of the principle of least surprise.

wodenokoto
What do you guys recommend for viewing and scrolling around tabular data? Most of my excel usage is literally just double clicking to have a look at a file.
huijzer
No, it is because incentives are misaligned and people care about getting cited. It doesn’t matter whether it’s correct or not.
amai
Thank got we don't have a monopoly in office software and can easily switch to a competitor.
bbarnett
It's not really Excel, is it? But users not reviewing what's happening, not paying attention.
jtrueb
Quit using Excel in research!
Gimpei
Didn’t the same thing happen in economics because of a bug in Stata in the 90s?
BurningFrog
This is 100% on the scientists who continue to use a tool that doesn't work for their use case!
darkhorn
This is why Statisticians do not use Excel for scientific work. They use SPSS, R, Minitab, MATLAB, SAS, etc.
bvan
Sounds like user-error, not a software error. Learn to use the software and its idiosyncrasies. RTFM.
oglop
lol. What a solution. Maybe learn how to use a proper DB. Research science is like 14 years behind the times. I was amazed at the SciPy24 conference how almost no data scientists use DBs. Wild. Pandas is your friend here and can at least help you with json stuff. Using excel though, yeesh. Yeah the problem isn’t excel, it’s bad training on data handling from schools.
robblbobbl
lol
aaron695
[dead]
temptemptemp111
[dead]
moi2388
The problem is scientists insisting on using tools like excel and Python instead of decent type safe language to do their calculations in
gverrilla
The problem with most software for complex tasks, including Excel and Python, is the hidden and/or non-intuitive mechanics. That's why a AI companion should accompany them for non-expert users (99%), which would be able to check what the user is doing for intention-result errors and warn.