> Scientists rename human genes to stop MS Excel from misreading them as dates (theverge.com)
Related details:
2023:
https://www.pcmag.com/news/microsoft-finally-fixes-excel-gli...
> Years after introducing Excel's automatic conversion features, Microsoft rolls out an update to prevent it from changing gene symbols to dates.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9325790/
> Gene Updater: a web tool that autocorrects and updates for Excel misidentified gene names
Yes the tooling they use might be terrible. It is your responsibility to either deal with the terrible tooling yourself, learn better tooling, or get a capable computer person in the room, who can navigate the tooling landscape and get you the results.
And of course, that is not even addressing checking your result yet. This is a sad state of the research landscape, often financed by public money, and then throwing money at MS for using a proprietary tool and messing up.
-3^2
in a cell and press ENTER, the spreadsheet tell you it's "9". It should be "-9"; in math, exponentiation has precedence over unary minus, so you square 3, then negate the result. For instance, if you tell students to graph "y = -x^2", they should draw a parabola opening downward.I don't have a recent copy of Excel to check this in, but this was the case in the '97 version. I just tried it in the current LibreOffice calc, and it returns "9". My guess is that one of the early spreadsheets messed up the order of precedence, and everybody after copied it for compatibility.
On the other hand, I just tried maxima and python and they both give "-9".
I wonder if this particular problem afflicts people who copy formulas from (say) math books into spreadsheets.
Almost all reasonable engineers see that there is something wrong with such an approach. But almost all everyday computer users think that this is the way computing has to be.
Sometimes I wonder why even I voluntarily open it for certain tasks - anyway, despite all the criticism, Excel has reached the Lindy[1] threshold for me and is here to stay.
Excel has many quirks, but I'm still very grateful that it exists, for quickly putting together some numbers and still being able to change the inputs to my formulas.
Is there anybody who can argue the 'for' case for having this on all the time without recourse?
From the article:
"The problem of Excel software inadvertently converting gene symbols to dates and floating-point numbers was originally described in 2004. For example, gene symbols such as SEPT2 (Septin 2) and MARCH1 [Membrane-Associated Ring Finger (C3HC4) 1, E3 Ubiquitin Protein Ligase] are converted by default to ‘2-Sep’ and ‘1-Mar’, respectively."
Excel is a wonderful tool. But you need to learn your tools and find out about possible footguns.
(Certainly possible, I teach "I don't want to be a programmer"-types all the time. Taught a class this week in fact.)
Somewhat recently, some of the more error prone genes were renamed to accommodate Excel. (Ex: SEPT7 -> SEPTIN7)
We used to hit all kinds of Excel weirdness with inventory etc.
It was our fault that our part numbers could look like this:
00010190-95.020
9/10 spreadsheets are tables, but because they're spreadsheets they inherit the behaviour of "no conistent behaviour in columns, everything is independent and different".
The users expect that Excel will figure out what to do with the input correctly.
But you can manually tell Excel what to do with data in a column.
Though I think what I'd really want is some tool which has the same grid-like visualization, filtering and direct entering, but was code-based under the hood and without magic conversions.
So, take Excel, and when you enter a formula in a cell, it actually writes a line of code for you, which you can inspect and edit. Including adding your own functions and such.
When importing delimited text data, you'd have to specify what the data is in each column. It should still save the original text data so you can change your mind, but yeah, no automagic stuff.
https://www.empirasign.com/cusip-excel-rosetta/
Barring some types of corporate actions, CUSIPs numbers cannot change, and I doubt the ABA is aware of this issue.
One in five genetics papers have errors caused by mistakes in the use of Excel
As if longhand calculations never have errors?
Ease of use:
1. Excel
2. SQL
3. Functional programming (e.g. Scala, Python to some measure e.g. Pandas)
4. Imperative programming (C/C++/Java)
But then there another hierarchy that (roughly) goes in the other direction, which is about quality, repeatability, tooling.
If you are at 1 or 2, you responsibility will not be about writing tests and verifying your code using traditional engineering methods.
However! You are responsible for cross checking your results based on the input. This may be a manual process. But actually looking at the numbers from several different angles can give higher quality than writing contrived testcases (in 3 or 4).
I dislike Excel for what it does (EUCs, nearly impossible to track changes, etc.) But on the other hand it is an amazing tool.
I've seen things created in MS Access that can't be unseen.
Now I think these errors are a small price to pay for convenience. One could waste a lifetime fighting small things like this and still lose. It's just the world we live in.
I think the safest fix is to avoid spreadsheets altogether, as long as scientific research is concerned.
The question is why do we still use substandard tools for processing important data like this.
they sound like something that would be helpful but in practice they just end up being a massive violation of the principle of least surprise.
We have large data exports from systems that include things like unique location code. You accidentally happen to notice that a block of these look weird and it isn't just the display of them that has changed, the contents of the cell were changed by Excel automatically, without asking, and you cannot disable it.
Absolute BS after all these years. I hate that they won't fix these niggling issues that keep tripping people up over the years and just make excuses. Microsoft's usual response is: "We only work on things that affect a large number of customers". Yeah Microsoft, if you keep closing these bug reports, then each time someone reports it, you can just say that it only affects one person and close it again.
Or...you could show how amazing your company is by doing what most of us have to do: Fix it, add more debugging for the next time it happens if you can't recreate it, or have a properly tracked reason to say, "only a very few people have asked for this but changing it might break these other areas/bacwards compatability" or something.