Reading & processing a PDF file that could have a variable number of pages (same format each) with Power Query

7 Posts
3 Users
0 Reactions
665 Views
(@miko)
Posts: 4
Active Member
Topic starter
 

Good day to all,

This is my first time here "officially", so greetings to the forum owners and all the users.

I have been trying for a while to solve an issue without any luck, and I'm wondering whether it has a solution at all.

I will try to explain the best I can:

- Every day I receive a PDF file that contains a simple table that needs to be cleaned. I implemented the cleaning with PQ without any problems.

- Now the issue is that the table could be on 1 page or multiple pages (the number of pages changes every day).

- I know how to implement the PDF reading/processing for 1 page, 2 pages, or 3 pages, but each of those is a separate implementation.

So the question is:

  How can I implement the solution with PQ for a variable number of pages? Is this even possible?

I will appreciate any tips that you can provide.

 

Thank you

Miko

 
Posted : 31/01/2021 1:01 am
(@mynda)
Posts: 4761
Member Admin
 

Hi Miko,

Welcome to our forum!

The Pdf.Tables function has a parameter for multiple pages. Have you tried setting this to TRUE?
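As a minimal sketch of what that could look like: the second argument of Pdf.Tables is an options record, and MultiPageTables is one of its documented fields. The file path below is a made-up placeholder, and the Implementation value is only an example, so adjust both to your setup.

```powerquery
let
    // Hypothetical path, for illustration only; point this at the real PDF.
    PdfPath = "C:\Reports\daily.pdf",
    // Options are passed as a record: field names and values inside square brackets.
    // MultiPageTables asks the connector to merge similar tables on consecutive pages.
    Source = Pdf.Tables(
        File.Contents(PdfPath),
        [Implementation = "1.1", MultiPageTables = true]
    )
in
    Source
```

The result is a navigation table (columns Id, Name, Kind, Data), so it is worth checking in the preview whether the pages really were merged into one table before building further steps on it.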

Mynda

 
Posted : 01/02/2021 10:59 pm
(@maxdatabook)
Posts: 2
New Member
 

I may have a solution, but it's ugly.  Interesting that this question is here!  I was going to start a thread looking for comments on a tech article I'm writing for Medium.  A problem you'll face, at least I have, is the integration, or lack of it, between Power Query and Excel (VBA).  I explain it in the article.

Does anyone have a real, or better, solution to the wait problem between Excel VBA and Power Query?  I learned the dummy wait trick on this forum, sorry I forgot who.  However, I can only get this process working by using both that trick and the "sleep" kludge.

https://docs.google.com/document/d/1A1SGyg92Av3Q000Ofx_F97mvpg333vDI3AlauAN9qwU/edit?usp=sharing

Here is the latest Excel workbook and sample files

https://drive.google.com/file/d/1Opx1fFwADUB2rT1vvWIKD1cJ93-nY6LK/view?usp=sharing

Thanks!

Max

 
Posted : 02/02/2021 11:03 am
(@miko)
Posts: 4
Active Member
Topic starter
 

Hello Mynda

 

I was reading the syntax of the Pdf.Tables function, and the description says that "TRUE" is the default value if the option is omitted. To tell you the truth, I'm very new to PQ, so even when I tried to declare it explicitly I failed completely.

 

This is part of the code that I'm using

FilePath = Excel.CurrentWorkbook(){[Name="FilePath"]}[Content]{0}[Column1],
FileName = Excel.CurrentWorkbook(){[Name="FileName"]}[Content]{0}[Column1],
Source = Pdf.Tables(File.Contents(FilePath & FileName), [Implementation="1.1"]),
Table002 = Source{[Id="Table002"]}[Data],

 

The lines in bold are causing me a lot of trouble... I have no idea how to define the "options" mentioned in the documentation.

 

If it is possible I will appreciate a little help

 

Thank you

Miko

 
Posted : 02/02/2021 2:22 pm
(@miko)
Posts: 4
Active Member
Topic starter
 

Good day

 

I found a solution that works in my case

 

Not sure if I can post a link to another website... just in case I'm requesting authorization from this forum's owner

 

Thank you

Miko

 
Posted : 02/02/2021 4:17 pm
(@mynda)
Posts: 4761
Member Admin
 

Yes, please share, Miko. Thanks for asking 😉

 
Posted : 02/02/2021 10:30 pm
(@miko)
Posts: 4
Active Member
Topic starter
 

Good day Mynda

 

Here is the link with the solution that solved my case... implementing it was a breeze, and I learned a few valuable things in the process

https://community.powerbi.com/t5/Power-Query/PDF-Connector-table-on-multiple-pages/m-p/687409

 

Note that I didn't use the Start Page & End Page parameters at all, because in my case they were not needed.

Hope this helps others.
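For anyone who cannot open the link, here is a rough sketch of the general idea of combining every page regardless of how many there are. This is my own approximation and not necessarily the exact code from the linked post. It reuses the FilePath and FileName named ranges from my earlier snippet and assumes every page has the same column layout; filtering on Kind = "Page" is an assumption, and Kind = "Table" may suit other files better.

```powerquery
let
    // Named ranges in the workbook, as in the earlier snippet.
    FilePath = Excel.CurrentWorkbook(){[Name="FilePath"]}[Content]{0}[Column1],
    FileName = Excel.CurrentWorkbook(){[Name="FileName"]}[Content]{0}[Column1],
    Source = Pdf.Tables(File.Contents(FilePath & FileName), [Implementation="1.1"]),
    // Keep one entry per page instead of picking a fixed Id such as "Table002".
    // Assumption: page-level entries carry the data you need.
    Pages = Table.SelectRows(Source, each [Kind] = "Page"),
    // Stack the data of all remaining entries into a single table,
    // however many pages the PDF has that day.
    Combined = Table.Combine(Pages[Data])
in
    Combined
```

The cleaning steps that worked for a single page can then be applied once to Combined, for example removing header rows that repeat on each page.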

 

Miko

 
Posted : 03/02/2021 9:07 am