Hi,
I download payment information from our bank system and now I am trying to retrieve the invoice numbers from this information.
I can get a list of invoice numbers from our ERP system and I am trying to use Philip Treacy's method from his blog post to get the information.
The received information includes a lot of reference numbers which I don't need, but these numbers may get flagged as a invoice number if I perform a partial match, therefore I would like to get an exact match.
The problem with this is, that the information is not clean. The text column includes a lot of spaces which the TRIM function does not get rid of. Also, the customer may have put : or / before the invoice number.
I have attached a small data set, where the solution only gets some of the invoice numbers, due to these issues.
I am now lost as to how to get to the desired result. Maybe there is another way to retrieve numbers? If so, I know that my invoice number is always 6 digits long.
Thanks in advance for any help.
Kind regards,
Peter Gulstad
Hi Peter,
See file attached for solution. The general approach with this is to split the text into sections using Text.Split. This gives you a List of text strings. You can then check for strings that are 6 characters long. If you find one, try to convert it to a number. If that fails then it was just text but if it succeeds then it is a number and that's your invoice number.
Of course this will only work if there are no other 6 digit numbers in the text string. But with the string formats not being consistent, we have to start somewhere.
In order to split the strings I replaced certain characters with spaces, like : and /, which you mentioned customers sometimes insert. There were also ASCII 160 characters (non-breaking space) that had to be replaced by a space.
let
Source = Excel.CurrentWorkbook(){[Name="RawData"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Tekst", type text}, {"Beløb i DKK", Int64.Type}}),
#"Replaced Value" = Table.ReplaceValue(#"Changed Type",":"," ",Replacer.ReplaceText,{"Tekst"}),
#"Replaced Value1" = Table.ReplaceValue(#"Replaced Value","/"," ",Replacer.ReplaceText,{"Tekst"}),
#"Replaced Value2" = Table.ReplaceValue(#"Replaced Value1",Character.FromNumber(160)," ",Replacer.ReplaceText,{"Tekst"}),
#"Added Custom" = Table.AddColumn(#"Replaced Value2", "Custom", each List.Sum(List.Transform(Text.Split([Tekst], " "), each if Text.Length(Text.Trim(_)) = 6 then try Number.From(_) otherwise 0 else 0)))
in
#"Added Custom"
Regards
Phil
Hi Phil,
Thanks so much for your solution. It really takes care of most of the problems. However, now I have to lines that yields some unexpected results. I could not find out why, but I've realized the issue now. In lines 2 and 6, the invoice number is twice in the input, and the final output gives me the double value of the invoice number 🙂 Can we also account for that?
Invoice numbers are currently in the series 21xxxx.
Thanks again.
Best regards,
Peter
Hi Peter,
Yes this situation can be dealt with. The code creates a list of values that can be converted to a number. If the value is text and can't be converted then we get a 0. So you end up with a list like this
0
0
212981
0
212981
0
This list can be sorted
212981
212981
0
0
0
0
Then you just take the first element in the list as the result - see attached file.
Regards
Phil
Thank you so much, Phil! I really appreciate it!
Kind regards,
Peter
no worries 🙂