I would like to compare two Excel spreadsheets
Both have the same columns, in the same order, but not the same rows.
I have a spreadsheet with 1,000 rows and 50 columns including a unique ID for each row and personal data (address, phone number, etc.).
I want to compare it with the same base extracted one year ago.
My objective is to know all changes (for example, someone who changed his address).
I tried with spreadsheet (as explained here: Compare two excel files) but it doesn't work because in my "new" file I have new rows, and I have some rows in the old file which are not in the new.
Spreadsheet compare with Excel line number. Is it possible to compare with the first column (Unique ID)?
Here is a very simplified example: https://filebin.net/g4w98251y9mfwug6
There are two sheets representing my two spreadsheets.
As you can see both have the same columns but:
- There are rows that are on both sheets with no changes (all the row is the same)
- There are rows that are on both sheets but there are changes (in orange on my example)
- There are rows that are only in the December sheet (I don't need to identify these lines)
- There are rows that are only in the January sheet (in orange on my example)
The orange color is exactly what I want, but in this example I can do it manually: comparing December and January. In the real spreadsheets I cannot do it manually as there are too many changes, data, lines, columns, etc. and I will do this monthly.
There is only one column which can never change: the column A.
Answer
It’s convenient that your spreadsheet uses 50 columns, because that means that columns #51, #52, …, are available. Your problem is fairly easily solved with the use of a “helper column”, which we can put into Column AZ
(which is column #52). I’ll assume that row 1 on each of your sheets contains headers (the words ID
, Name
, Address
, etc.) so you don’t need to compare those (since your columns are in the same order in both sheets). I’ll also assume that the ID
(unique identifier) is in Column A
. (If it isn’t, the answer becomes a little bit more complicated, but still fairly easy.) In cell AZ2
(the available column, in the first row used for data), enter
=B2&C2&D2&…&X2&Y2&Z2&AA2&AB2&AC3&…&AX2
listing all the cells from B2
through AX2
. &
is the text concatenation operator, so if B2
contains Andy
and C2
contains New York
, then B2&C2
will evaluate to AndyNew York
. Similarly, the above formula will concatenate all the data for a row (excluding the ID
), giving a result that might look something like this:
AndyNew York1342 Wall StreetInvestment BankerElizabeth2catcollege degreeUCLA…
The formula is long and cumbersome to type, but you only need to do it once (but see the note below before you actually do so). I showed it going through AX2
because Column AX
is column #50. Naturally, the formula should cover every data column other than ID
. More specifically, it should include every data column that you want to compare. If you have a column for the person’s age, then that will (automatically?) be different for everybody, every year, and you won’t want that to be reported. And of course the helper column, which contains the concatenation formula, should be somewhere to the right of the last data column.
Now select cell AZ2
, and drag/fill it down through all 1000 rows. And do this on both worksheets.
Finally, on the sheet where you want the changes to be highlighted (I guess, from what you say, that this is the more recent sheet), select all the cells that you want to be highlighted. I don’t know whether this is just Column A
, or just Column B
, or the entire row (i.e., A
through AX
). Select these cells on rows 2 through 1000 (or wherever your data might eventually reach), and go into “Conditional Formatting” → “New Rule…”, select “Use a formula to determine which cells to format”, and enter
=IFERROR(VLOOKUP($A2,'December 2017'!$A$2:$AZ$1000,52,FALSE), "") <> $AZ2
into the “Format values where this formula is true box”. This takes the ID
value from the current row of the current (“January 2018”) sheet (in cell $A2
), searches for it in Column A
of the previous (“December 2017”) sheet, gets the concatenated data value from that row and compares it to the concatenated data value on this row. (Of course AZ
is the helper column, 52
is the column number of the helper column, and 1000
is the last row on the “December 2017” sheet that contains data — or somewhat higher; e.g., you could enter 1200
rather than worrying about being exact.) Then click on “Format” and specify the conditional formatting that you want (e.g., orange fill).
I did an example with only a few rows and only a few data columns, with the helper column in Column H
:
Observe that Andy’s row is colored orange, because he moved from New York to Los Angeles, and Debra’s row is colored orange, because she is a new entry.
Note: If a row might have values like the
and react
in two consecutive columns, and this could change in the following year to there
and act
, this would not be reported as a difference, because we’re just comparing the concatenated value, and that (thereact
) is the same on both sheets. If you’re concerned about this, pick a character that is unlikely to ever be in your data (e.g., |
), and insert it between the fields. So your helper column would contain
=B2&"|"&C2&"|"&D2&"|"&…&"|"&X2&"|"&Y2&"|"&Z2&"|"&AA2&"|"&AB2&"|"&AC3&"|"&…&"|"&AX2
resulting in data that might look like this:
Andy|New York|1342 Wall Street|Investment Banker|Elizabeth|2|cat|college degree|UCLA|…
and the change will be reported, because the|react
≠ there|act
. You probably should be concerned about this, but, based on what your columns actually are, you might have reason to be confident that this will never be an issue.
Once you get this working, you can hide the helper columns.
No comments:
Post a Comment