For the past few days I have been trying to compile as complete a list of gene names as possible. To start with, I was given an initial list of genes in an Excel file, taken from the HUGO Gene Nomenclature Committee (HGNC). Unfortunately, the gene names had been pasted from the original source (HGNC) into an Excel spreadsheet without first changing the cell format of the column. As a result, Excel tried to “help” by formatting the inserted values, and converted the gene names that resemble dates into actual dates. In bioinformatics, misnaming a gene can have disastrous consequences, such as misidentifying a causal gene in a clinical setting. Thus:
Beware of pasting gene names into an Excel spreadsheet that uses the default cell format, as some of them may be changed into dates.
From the list of 19,026 genes that I have compiled so far, here are the genes whose names were automatically changed by Excel into dates. In the table below, the first column shows the date the gene name was changed to, the middle column the Ensembl ID of the gene, and the right column the actual gene name that Excel converted.
Excel date   Ensembl ID        Gene name
Sep-01       ENSG00000180096   SEPT1
Sep-02       ENSG00000168385   SEPT2
Sep-03       ENSG00000100167   SEPT3
Sep-04       ENSG00000108387   SEPT4
Sep-05       ENSG00000184702   SEPT5
Sep-06       ENSG00000125354   SEPT6
Sep-07       ENSG00000122545   SEPT7
Sep-08       ENSG00000164402   SEPT8
Sep-09       ENSG00000184640   SEPT9
Sep-10       ENSG00000186522   SEPT10
Sep-11       ENSG00000138758   SEPT11
Sep-12       ENSG00000140623   SEPT12
Sep-14       ENSG00000154997   SEPT14
Mar-01       ENSG00000145416   MARCH1
Mar-02       ENSG00000099785   MARCH2
Mar-03       ENSG00000173926   MARCH3
Mar-04       ENSG00000144583   MARCH4
Mar-05       ENSG00000198060   MARCH5
Mar-06       ENSG00000145495   MARCH6
Mar-07       ENSG00000136536   MARCH7
Mar-08       ENSG00000165406   MARCH8
Mar-09       ENSG00000139266   MARCH9
Mar-10       ENSG00000173838   MARCH10
Mar-11       ENSG00000183654   MARCH11
Dec-01       ENSG00000173077   DEC1
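If you suspect that a gene list has already passed through Excel, a quick check for date-like values can flag the damage. The snippet below is only a minimal sketch: the function names `find_date_like` and `restore_symbol` are my own, the month-to-prefix mapping is inferred solely from the three gene families in the table above, and the sample column (including `TP53` and `BRCA1`) is made up for illustration. It also assumes the dates appear in the `Mon-DD` form shown in the table; Excel may display them differently depending on locale and settings.

```python
import re

# Assumption: prefixes inferred only from the table above (SEPT, MARCH, DEC).
# Other gene lists may contain additional affected families.
MONTH_TO_PREFIX = {"Sep": "SEPT", "Mar": "MARCH", "Dec": "DEC"}

# Matches values such as "Sep-01" or "Mar-11" (the form shown in the table).
DATE_LIKE = re.compile(
    r"^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-(\d{2})$"
)


def find_date_like(values):
    """Return the values that look like Excel's 'Mon-DD' date rendering."""
    return [v for v in values if DATE_LIKE.match(v)]


def restore_symbol(value):
    """Map e.g. 'Sep-01' back to 'SEPT1'; return None if no known prefix applies."""
    match = DATE_LIKE.match(value)
    if match is None:
        return None
    prefix = MONTH_TO_PREFIX.get(match.group(1))
    if prefix is None:
        return None
    # int() drops the leading zero: "01" -> 1
    return f"{prefix}{int(match.group(2))}"


# Illustrative column of gene names, some already mangled into dates.
column = ["TP53", "Sep-01", "Mar-11", "BRCA1", "Dec-01"]
suspects = find_date_like(column)
restored = {value: restore_symbol(value) for value in suspects}
print(restored)
# {'Sep-01': 'SEPT1', 'Mar-11': 'MARCH11', 'Dec-01': 'DEC1'}
```

Treat the restored names as candidates rather than as the truth, since more than one gene symbol could collapse to the same date. Where an Ensembl ID is available, as in the table above, it is the safer key for recovering the original gene.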