Hi
I'm in a data migration exercise and have a s/sheet of over 100k rows and so I need a solution beyond a manual update.
For the sake of this post I'll simplify the problem: I have a s/sheet with three columns (A, B & C) (see attached), with headings [data type] as follows: Name [text], ID [number], CreateDate [date]. The data has been sorted on Col A to deliberately bring the duplicate values in focus in that column. The IDs are unique and the dates are not unique.
In the example there are five rows which can be 'grouped' into two groups by virtue of duplicated values in column A. So for rows 2 and 3 the value in Col A is abc, for rows 4, 5 & 6 the value in Col A is xyz.
The first task is to determine the latest CreateDate in each of the 'row-groups'. So in this example that would be 5/11/1999 for rows 2 and 3, and 3/12/2001 for rows 4 to 6. I do this by using the array formula {=MAX(IF(A2=$A:$A,$C:$C))} in column D. See the attached s/sheet.
The next task is proving difficult. Now that I have determined the latest date for each row-group, I want to put the ID that corresponds to that LatestDate in column E, so that the result would look like Column E in the attached s/sheet (colored red). But I need to do it with a formula/function, not manually. This is for a migration project so it doesn't have to be pretty.
Putting the matching ID in column E is easy (an IF statement) - see rows 3 & 6 - but I'm not pretending that's the solution. But I can't work out how to populate the other rows in each group with that same ID - in my attached example that would be rows 2, 4 & 5. After futile attempts with various formulae I am at a loss. My guess is that it needs to rely on column C (the matching values), and functions such as index and match ... or maybe VBA. Can anyone help me with this?
Hi Kevin
You can try this formula. The assumption is that the latest ID for each group is the last transaction.
It doesn't look at the dates.
In cell F2 enter =LOOKUP(2,1/($A$2:$A$6=$A2),$B$2:$B$6)
Hope this helps.
Sunny
Thanks for that.
We can't assume anything about the value of the ID and its relationship to Createdate, but I think your suggestion will work so long as the data is ordered by Name, then CreateDate, both descending.
Would you agree?
Hi Kevin
Agree!!
Try this if the dates are not in sequence:
=LOOKUP(2,1/(($A$2:$A$6=$A2)*($C$2:$C$6=$D2)),$B$2:$B$6)
Thank you Sunny, I think this may have cracked it.
My only concern is the non-uniqueness of the dates may produce erroneous results but that will take some testing.
Hi Kevin
Thanks for your feedback. I am only able to test based on your sample data.
If there are other scenarios, then do post them for us to take a look.
Cheers