From: Thomas S. <ts...@la...> - 2009-09-02 13:02:35
|
Hello

I'm a new subscriber on this list; greetings to everyone.

I have a bash script which at some point should translate a PDF file to plain text. Let's say we have foobar.pdf and want to convert it to foobar.txt. I can do this from the GUI but I'm unable to figure out what the command should be to do the same from the command line.

Yes, I read the docs, manpage, wiki, archives, but still no luck. Your help would be very much appreciated.

Details: PDFedit 0.4.2 from the SuSE-11.1 packman repo.

Best regards,
Tom
|
From: Alister H. <ali...@sy...> - 2009-09-04 00:46:18
|
Sorry if someone else replied and I missed it.
I don't know how to do this with pdfedit, but you could alternatively try the pdftotext tool from xpdf, or pdftohtml if that is more suitable for your purpose.

Alister
|
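A minimal sketch of the pdftotext route suggested above, assuming pdftotext (from xpdf or poppler-utils) is installed and on PATH. The helper names txt_name_for and pdf_to_txt are made up for illustration; foobar.pdf is the example file from the original question.

```shell
#!/bin/sh
# Hypothetical helper: derive the output name, foobar.pdf -> foobar.txt.
txt_name_for() {
    printf '%s\n' "${1%.pdf}.txt"
}

# Hypothetical helper: convert one PDF with pdftotext, if it is installed.
pdf_to_txt() {
    in=$1
    out=$(txt_name_for "$in")
    if ! command -v pdftotext >/dev/null 2>&1; then
        echo "pdftotext not found in PATH" >&2
        return 127
    fi
    # pdftotext takes the input PDF and, optionally, the output text file.
    pdftotext "$in" "$out"
}

# Only runs when the example file is actually present.
if [ -f foobar.pdf ]; then
    pdf_to_txt foobar.pdf
fi
```

Inside a larger bash script the function can be called once per input file; its exit status is that of pdftotext, or 127 when the tool is missing.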
From: Thomas S. <ts...@la...> - 2009-09-04 09:53:10
|
On Fri, 4 Sep 2009, Alister Hood wrote:

> Sorry if someone else replied and I missed it.
> I don't know how to do this with pdfedit, but you could alternatively
> try the pdftotext tool from xpdf, or pdftohtml if that is more suitable
> for your purpose.
>
> Alister

I am currently using pdftotext in my script. However, it doesn't work well: it drops a lot of spaces between words, which makes the output almost unusable. This may be a problem with the PDF input, but I have no influence on that. For this reason I tried pdfedit and found that it is much better: the output is perfect.

From the man page I can see that there is a command line mode. I found the script savealltext.qs on the wiki, but I can't figure out how to use it from the command line. I still guess it must be easy, but I have had no success so far. Unfortunately I could not find any examples of how to use pdfedit in command line mode.

Thomas
|
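One thing that may be worth trying for the missing-spaces problem described above is pdftotext's -layout and -raw modes, which change how text is assembled. This is only a sketch: whether either mode helps depends on the PDF, and the word count is a rough hint (merged words lower it), not a quality measure. The function name compare_modes and the PDFTOTEXT override variable are assumptions introduced here.

```shell
#!/bin/sh
# Assumption: PDFTOTEXT may be overridden; it defaults to the pdftotext binary.
PDFTOTEXT=${PDFTOTEXT:-pdftotext}

# Hypothetical helper: run each extraction mode on one PDF and print
# "<mode> <word count>" for the resulting text file.
compare_modes() {
    pdf=$1
    base=${pdf%.pdf}
    for mode in default layout raw; do
        out="$base.$mode.txt"
        case $mode in
            default) "$PDFTOTEXT" "$pdf" "$out" ;;
            *)       "$PDFTOTEXT" "-$mode" "$pdf" "$out" ;;
        esac || return 1
        # tr strips the padding some wc implementations add.
        printf '%s %s\n' "$mode" "$(wc -w < "$out" | tr -d ' ')"
    done
}

# Only runs when the example file is actually present.
if [ -f foobar.pdf ]; then
    compare_modes foobar.pdf
fi
```

The three output files (foobar.default.txt, foobar.layout.txt, foobar.raw.txt) can then be compared by eye to see which mode, if any, keeps the word spacing.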
From: Jozef M. <mis...@ho...> - 2009-09-04 19:52:53
|
hi,

i changed the algorithm of pdftotext a bit but it is far from what i would like it to be. nevertheless, i can provide you with the source code of a tool using our pdfedit library's extract text function, but you would have to compile it yourself. will it help?

/jozo
|
From: Thomas S. <ts...@la...> - 2009-09-05 11:05:11
|
Hi Jozef

That would certainly help a lot. And I would be happy to compile it myself and give it a try.

Thanks in advance.
Thomas
|
From: Michal H. <ms...@gm...> - 2009-09-05 11:10:32
|
On Fri, Sep 04, 2009 at 07:52:43PM +0000, Jozef Misutka wrote:
> i changed the algorithm of pdftotext a bit but it is far from what i
> would like it to be. nevertheless, i can provide you with source code
> of you tool using our pdfedit library extract text function but you
> would have to compile it by your own. will it help?

Isn't this just overkill for something that is already scripted, where it is only a matter of how to call the script? Martin, could you help with this?

--
Michal Hocko
|
From: Jozef M. <mis...@ho...> - 2009-09-05 19:30:20
|
> Date: Sat, 5 Sep 2009 13:10:19 +0200
> From: ms...@gm...
>
> Isn't this just overkill for something that is already scripted and it
> is only matter how to call the script? Martin, could you help with this?

no, because it is definitely a useful tool.

will do it on monday as i am away from pc.

/jozo
|
From: Thomas S. <ts...@la...> - 2009-09-06 14:41:14
|
On Sat, 5 Sep 2009, Jozef Misutka wrote:

> no, because it is definitely a useful tool.
>
> will do it on monday as i am away from pc.
>
> /jozo

I agree. A better pdftotext would certainly be a benefit, as it could improve many existing scripts using it.

Nevertheless I would like to learn how to use pdfedit from the command line as well.

Thomas
|
From: Jozef M. <mis...@ho...> - 2009-09-07 16:07:42
|
done,

check pdf_to_text.cpp in newest tools package in sourceforge downloads (tools-Win32-20090907_1746.zip) or
http://pdfedit.cvs.sourceforge.net/viewvc/pdfedit/pdfedit/src/tests/tools/pdf_to_text.cc?revision=1.1&view=markup

/jozo
|
From: Thomas S. <ts...@la...> - 2009-09-08 15:10:28
|
On Mon, 7 Sep 2009, Jozef Misutka wrote:

> done,
>
> check pdf_to_text.cpp in newest tools package in sourceforge downloads.
> (tools-Win32-20090907_1746.zip) or
> http://pdfedit.cvs.sourceforge.net/viewvc/pdfedit/pdfedit/src/tests/tools/pdf_to_text.cc?revision=1.1&view=markup
>
> /jozo

Hi Jozef

Thanks a lot!!! My progress is as follows:

- checked out the cvs repository

- did autoconf and compiled it (success)

- went to the directory src/tests/tools/

- noticed that your pdf_to_text.cc is not (yet) in the Makefile

- added pdf_to_text.cc to TARGET_SRCS and pdf_to_text to TARGET and
  inserted a rule like this:

pdf_to_text: pdf_to_text.o
	$(LINK) $(LDFLAGS) -o pdf_to_text pdf_to_text.o $(UTILS_OBJS) \
	$(MANDATORY_LIBS)

- tried: make --> error regarding missing common.o which was not
  compiled automatically.

- compiled common.o manually (success)

- again: make -->

g++ -c -O2 -fmessage-length=0 -D_FORTIFY_SOURCE=2 -fno-strict-aliasing
 -fexceptions -pipe -posix -I. -I/home/tsp/programs/pdfedit/CVS/pdfedit/src
 -I/home/tsp/programs/pdfedit/CVS/pdfedit/src/xpdf/ -I/usr/include
 -I/usr/include/freetype2 -o pdf_to_text.o pdf_to_text.cc
g++ -o pdf_to_text pdf_to_text.o common.o -L/usr/lib -lkernel
 -L/home/tsp/programs/pdfedit/CVS/pdfedit/src/kernel -lutils
 -L/home/tsp/programs/pdfedit/CVS/pdfedit/src/utils -lxpdf
 -L/home/tsp/programs/pdfedit/CVS/pdfedit/src/xpdf/xpdf -lfofi
 -L/home/tsp/programs/pdfedit/CVS/pdfedit/src/xpdf/fofi -lGoo
 -L/home/tsp/programs/pdfedit/CVS/pdfedit/src/xpdf/goo -lsplash
 -L/home/tsp/programs/pdfedit/CVS/pdfedit/src/xpdf/splash -lfreetype -lz
pdf_to_text.o: In function `main':
pdf_to_text.cc:(.text+0x1a3): undefined reference to
 `boost::program_options::options_description::m_default_line_length'
pdf_to_text.cc:(.text+0x1b9): undefined reference to
 `boost::program_options::options_description::options_description(std::basic_string<char,
 std::char_traits<char>, std::allocator<char> > const&, unsigned int)'

(many more of those, all about boost)

So the compile is fine but linking fails. What other directory must be included for linking to make this work?

Best regards
Thomas
|
From: Michal H. <ms...@gm...> - 2009-09-08 17:20:43
|
On Tue, Sep 08, 2009 at 05:10:05PM +0200, Thomas Spahni wrote:
[...]
> - noticed that your pdf_to_text.cc is not (yet) in the Makefile

Tools are not incorporated into our build system yet. This is currently
being discussed on our devel mailing list, but I assume it will take
some time until it works.

> - added pdf_to_text.cc to TARGET_SRCS and pdf_to_text to TARGET and
>   inserted a rule like this:
>
> pdf_to_text: pdf_to_text.o
> 	$(LINK) $(LDFLAGS) -o pdf_to_text pdf_to_text.o $(UTILS_OBJS) \
> 	$(MANDATORY_LIBS)
>
> - tried: make --> error regarding missing common.o which was not
>   compiled automatically.

This is because pagemetrics and displaycs want to link with UTILS_OBJS
even though they don't need it.

> - compiled common.o manually (success)
> - again: make -->
[...]
> pdf_to_text.o: In function `main':
> pdf_to_text.cc:(.text+0x1a3): undefined reference to
> `boost::program_options::options_description::m_default_line_length'
[...]
> (many more of those, all about boost)
>
> So the compile is fine but linking fails. What other directory must be
> included for linking to make this work?

You are missing the boost::program_options library. We do not detect
and configure it during the configure phase. We will add this test (I
hope), but it will take some time. Until then the following workaround
should work:

pdf_to_text: pdf_to_text.o
	$(LINK) $(LDFLAGS) -o pdf_to_text pdf_to_text.o $(MANDATORY_LIBS) -lboost_program_options-mt

You may need to use -lboost_program_options instead, and maybe also
some /usr/lib64 tweaking.

--
Michal Hocko |
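[Editor's sketch] The choice between -lboost_program_options-mt and -lboost_program_options, and between /usr/lib and /usr/lib64, can be checked before editing the Makefile rule. The helper below is only an illustration: the name pick_boost_flag and the default directory list are assumptions, not part of PDFedit, and it only looks for files named lib<name>.so or lib<name>.a.

```shell
#!/bin/sh
# Hypothetical helper (not part of PDFedit): print linker flags for an
# installed boost program_options library, preferring the -mt variant.
# Usage: pick_boost_flag [dir ...]   (defaults: /usr/lib /usr/lib64)
pick_boost_flag() {
    [ "$#" -gt 0 ] || set -- /usr/lib /usr/lib64
    for name in boost_program_options-mt boost_program_options; do
        for dir in "$@"; do
            # Accept either a shared (.so) or a static (.a) library.
            if [ -e "$dir/lib$name.so" ] || [ -e "$dir/lib$name.a" ]; then
                echo "-L$dir -l$name"
                return 0
            fi
        done
    done
    echo "boost program_options not found in: $*" >&2
    return 1
}
```

Calling pick_boost_flag with no arguments searches the two default directories; whatever it prints would replace -lboost_program_options-mt at the end of the workaround rule.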
From: Michal H. <ms...@gm...> - 2009-09-08 21:59:14
Attachments:
tools-revork.patch
|
On Tue, Sep 08, 2009 at 07:20:30PM +0200, Michal Hocko wrote:
[...]
> Tools are not incorporated into our build system yet. This is just
> discussed in our devel mailing list but I assume that it will take some
> time until this will work.

OK, it went better than I expected ;) Could you try the attached patch
series? Please run

cvs -q up -P -d

before applying it, since there have been some changes in that area.
You will need to run autoconf to regenerate the configure script and
then run

./configure --enable-tools [--disable-gui]

(--disable-gui, as the name suggests, prevents the GUI from being
compiled.)

Then go to the src directory and run make. Make sure that you have run
make clean before that.

Btw. what kind of system do you use? (OS, architecture, version of the
boost-program-options library)

Thanks
--
Michal Hocko |
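[Editor's sketch] The steps in the message above can be collected into one shell function. This is a rough outline under stated assumptions, not a script from the PDFedit project: the names run and rebuild_with_tools are made up here, the first argument is assumed to be the top-level directory of the CVS checkout, and the patch step is left out because the thread does not say which patch level (-p0 or -p1) the series needs.

```shell
#!/bin/sh
# Hypothetical wrapper: with DRY_RUN set, print each command instead of
# running it, so the sequence can be reviewed first.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# rebuild_with_tools CHECKOUT_DIR [extra configure options...]
# CHECKOUT_DIR is assumed to be the top-level pdfedit directory.
rebuild_with_tools() {
    root=$1
    shift
    run cd "$root" || return 1
    run cvs -q up -P -d || return 1
    # Apply the patch series here; the patch level is not stated in
    # the thread, so that step is intentionally not part of the sketch.
    run autoconf || return 1
    run ./configure --enable-tools "$@" || return 1
    run cd src || return 1
    run make clean || return 1
    run make || return 1
}
```

For example, DRY_RUN=1 followed by rebuild_with_tools /path/to/pdfedit --disable-gui prints the seven commands without touching anything. Without DRY_RUN the function changes the current directory, so it may be more convenient to call it inside a subshell.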
From: Jozef M. <mis...@ho...> - 2009-09-09 06:26:25
|
the patch is in binary format ?! |
From: Michal H. <ms...@gm...> - 2009-09-09 07:22:57
|
On Wed, Sep 09, 2009 at 06:23:30AM +0000, Jozef Misutka wrote:
>
> the patch is in binary format ?!

Sorry, I forgot to add tar.gz to the name... It is an archive with 3
patches inside.

--
Michal Hocko |
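[Editor's sketch] A file that looks binary but was announced as a patch can be checked for the gzip signature before trying to unpack it. The function name is_gzip is an assumption made for this example; it only tests the first two bytes (1f 8b) and says nothing about whether a tar archive is inside.

```shell
#!/bin/sh
# Hypothetical helper: succeed if FILE starts with the gzip magic
# bytes 1f 8b, fail otherwise.
is_gzip() {
    # Dump the first two bytes as hex and strip od's padding.
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}
```

With the attachment from this thread, something like is_gzip tools-revork.patch && tar -xzf tools-revork.patch would then unpack the three patches into the current directory.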
From: Thomas S. <ts...@la...> - 2009-09-10 10:44:31
|
On Tue, 8 Sep 2009, Michal Hocko wrote:

> OK, it went better than I expected ;) Could you try the attached patch
> series (please run cvs -q up -P -d before applying - there has been some
> changes in the area). You will need to run autoconf to re-generate
> configure script and then run ./configure --enable-tools [--disable-gui]
[...]
> Btw. what kind of system do you use? (OS, Architecture, version of the
> boost-program-options library).

Hello Michal & Jozef

Thank you very much. This works like a charm! Absolutely perfect.

I took a fresh copy of the CVS, applied your patches, autoconf,
configure, make --> build went ok. pdf_to_text works! (gui as well).

There is one minor glitch: output of pdf_to_text is utf-8 (as
documented) and when I recode that to latin1 with 'recode' it complains
about non-valid input. Forcing it with 'recode -f' works. There must be
some non-utf8 code in the *.txt file. It seems to stop at the following
sequence in the text (hex): e2 96 a0. This could be a problem of the
PDF-source, I don't know.

I'm on a Linux install of SuSE-11.1 i686 32-bit with kernel
2.6.27.29-0.1-pae. Installed packages are from the standard repo (not
yet updated). libboost_program_options is Ver. 1.36.0; gcc 4.3.2.

Thomas |
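[Editor's sketch] The byte sequence e2 96 a0 quoted above can be tested in isolation. The snippet below uses iconv rather than recode, and the function names are made up for this example. It shows that the three bytes decode as a single code point, U+25A0 (BLACK SQUARE), and that a conversion to ISO-8859-1 fails because Latin-1 has no such character. That suggests the complaint may come from the target charset rather than from malformed UTF-8, although the thread itself does not confirm what recode objected to.

```shell
#!/bin/sh
# Emit the three bytes from the message; printf takes octal escapes
# (hex e2 96 a0 = octal 342 226 240).
black_square() { printf '\342\226\240'; }

# Decode the bytes as UTF-8 and print the result as UTF-16BE hex.
# A single code point in the basic plane prints as four hex digits.
utf8_codepoint() {
    black_square | iconv -f UTF-8 -t UTF-16BE | od -An -tx1 | tr -d ' \n'
}

# Try a strict conversion to Latin-1; the exit status reports success.
latin1_ok() {
    black_square | iconv -f UTF-8 -t ISO-8859-1 >/dev/null 2>&1
}
```

Running utf8_codepoint prints 25a0, and latin1_ok returns a non-zero status. Replacing or dropping such characters before the conversion is one way to avoid forcing it.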
From: Michal H. <ms...@gm...> - 2009-09-10 10:54:56
|
On Thu, Sep 10, 2009 at 12:44:05PM +0200, Thomas Spahni wrote:
[...]
> Hello Michal & Jozef
>
> Thank you very much. This works like a charm! Absolutely perfect.
>
> I took a fresh copy of the CVS, applied your patches, autoconf, configure,
> make --> build went ok. pdf_to_text works! (gui as well).

Thanks for testing! I am currently cleaning up the patches and I expect
them to go into CVS shortly.

> There is one minor glitch: output of pdf_to_text is utf-8 (as documented)
> and when I recode that to latin1 with 'recode' it complains about
> non-valid input. Forcing it with 'recode -f' works. There must be some
> non-utf8 code in the *.txt file. It seems to stop at the following
> sequence in the text (hex): e2 96 a0. This could be a problem of the
> PDF-source, I don't know.
>
> I'm on a Linux install of SuSE-11.1 i686 32-bit with kernel
> 2.6.27.29-0.1-pae. Installed packages are from the standard repo (not yet
> updated). libboost_program_options is Ver. 1.36.0; gcc 4.3.2.

Thanks!

--
Michal Hocko |