Re: [Air-l] ensuring full-text pdfs [was Re: Citation Managers - Alternatives to Endnote/CiteULike/... ?]

19 Mar 2006


      ...
I too strongly prefer full-text pdfs, I'd love to know how you ensure
this.
I've written to our e-journal subscriptions person at the library
explaining their importance and requesting that the library prefer
full-text pdf providers when purchasing eJournal subscriptions.
Unfortunately this crucial bit of metadata doesn't appear to make its
way into library catalogues (or services like openurl lookups) so I
find it hard to remember which providers have full-text searchable
pdfs while grabbing research papers online.  This means I'm often
annoyed later finding image PDFs (yes, I'm looking at you JSTOR) in
my personal library.
i use acrobat pro 7 to batch convert jstor's high-quality image pdfs to
searchable text pdfs.  i put all of the pdfs that need converting into
one directory and run the batch ocr on it.  it takes about an hour for
every 30 jstor pdfs.  none of the jstor pdfs are protected, so this
works.
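
for filling that one directory, here is a rough sketch of how the image-only pdfs might be picked out of a larger library before the batch run.  it is not part of acrobat or devonthink; the function names and the "no /Font entry means no text layer" test are assumptions made for illustration.  the test is a crude heuristic: pdfs that store their objects in compressed streams can hide the font entry, so some searchable files may be flagged anyway, which only costs an unnecessary ocr pass.

```python
import shutil
from pathlib import Path


def looks_image_only(pdf_path: Path) -> bool:
    """Heuristic (an assumption, not a PDF-spec guarantee): a file whose raw
    bytes contain no /Font entry probably has no text layer."""
    data = pdf_path.read_bytes()
    return b"/Font" not in data


def sort_for_ocr(library: Path, ocr_queue: Path) -> list[Path]:
    """Copy PDFs that look image-only from `library` into `ocr_queue`.

    Originals are left in place; the list of copies made is returned.
    Only the top level of `library` is scanned.
    """
    ocr_queue.mkdir(parents=True, exist_ok=True)
    copied = []
    for pdf in sorted(library.glob("*.pdf")):
        if looks_image_only(pdf):
            target = ocr_queue / pdf.name
            shutil.copy2(pdf, target)
            copied.append(target)
    return copied
```

calling `sort_for_ocr(Path("papers"), Path("papers/to-ocr"))` (both paths hypothetical) would leave the originals untouched and give a single directory to point the batch ocr at.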
...
To be clear, I'm referring to online journal providers. If one is
printing html or other full-text documents---ie creating one's own
PDF records---then it is fairly easy to ensure: It's the default mode
in 'printing PDF to disk' on OS X, and I assume that Acrobat has that
same capability on windows.
some journal providers protect their pdfs.  if they do that, you are
not supposed to circumvent their protection, so don't use them.
alternatively, you could, and i am not advising this, just ignore the
protection in one of the ways suggested on any number of websites.
or you can just refuse to cite protected manuscripts.  this has the
effect of lowering the citation impact score of those journals, which
encourages people not to publish there and encourages editors to
change their policy.
...
Any hints?  Are you using OCR to convert the image PDFs?  Is that
pretty effective?  Does it integrate with DEVONthink?
once they are ocr'd in acrobat pro, i then import them into  
devonthink.   acrobat pro is excellent for ocr.
...
Perhaps we should create a list of full-text PDF providers on a wiki
somewhere?  Does anyone in the e-journal purchasing world already
prefer full-text PDF providers?  Or maybe end-user OCR is sufficient?
feel free to create a section on wiki.aoir.org for this.
...
Thanks,
James
yep
...
jeremy hunsinger
jhuns@vt.edu
www.cddc.vt.edu
jeremy.tmttlt.com
www.tmttlt.com

()  ascii ribbon campaign - against html mail
/\                        - against microsoft attachments
http://www.stswiki.org/  sts wiki
http://cfp.learning-inquiry.info/  LI-the journal

Jeremy Hunsinger