Wednesday, December 08, 2010

Tools for combining BibTeX, PDFs, and e-Readers

UPDATE: If you are looking for a simpler solution that updates PDF's in your BibTeX database with the corresponding BibTeX data, check out Be careful though; the old PDF::API2 module only supports Adobe PDF 1.4 (i.e., compatibility-mode PDF's).
UPDATE: Apparently, you can also manage your Kindle collections efficiently using emacs, using Calibre, or manually so long as you're okay with possibly having to reboot your Kindle after every change. So it would be possible for me to add Kindle collections.json support to my fix-pdf-tags script; I just don't anticipate having the time to do that in the near future (plus, I don't own a Kindle anymore, and so I wouldn't be able to test it).
In trying to migrate to e-Readers, I've been experimenting with both the Amazon Kindle 3 and the Sony Daily Reader PRS-950. They both have nice features... I don't have time to go into a review, but I'll give a teaser...
  • Kindle PRO's: It has a nice web browser (that works on 3G too!), and makes it super easy to get new content onto the device. In fact, you can even download PDF's from Dropbox via the nifty browser (although e-mailing to your Amazon Kindle e-mail address is convenient too). Plus, Amazon makes for a nice e-Book store -- lots of the books I would want.
  • Kindle CON's: However, PDF's really have to be in compatibility mode (Acrobat 1.4). Otherwise, the Kindle will miss all of the metadata. More importantly, it is difficult to manage collections through the USB. So if you have hundreds of PDF's, you'll spend days tagging them via the clunky Kindle keyboard.
    • UPDATE: Apparently, the Kindle uses a simple SHA-1 hash of the file's full path as a key in the collections.json file that is accessible via USB. Consequently, you can manage your collections data more efficiently. You can do so with an emacs script or with a calibre plugin or manually. However, you may have to reboot your Kindle every time you make a change. At least with the older Kindles, the collections.json file was only read on boot. It's possible that the newer Kindles are smart enough to refresh collections data every time the USB is unplugged (like the Sony does), but I honestly don't know. I have a feeling that Hannes, the author of bibtex-kindle, knows though.
  • Sony Daily PRO's: It has the optical touch screen. It doesn't require compatibility-mode PDF's. It has a large screen. It has terrific page viewing options. In theory, the PDF note options are very nice, but e-Reader notes just seem tedious to me in general regardless of interface. More importantly, it is essentially an "open" platform so long as you are OK with a little bit of reverse engineering. It is easy to write a few scripts to manage your XML files, and so keeping your PDF's organized is easy for your average script kiddy.
  • Sony Daily CON's: The optical touch screen means the screen is sunk down so far that the chassis casts a small shadow around the edge of the screen. The Sony case-with-light isn't as nice as the Kindle's case-with-light. The Sony Bookstore doesn't have as many books (or at least the books I care about). The zoom modes leave much to be desired. In PDF's that work fine on the Kindle, trying to click a word for dictionary lookup often leads to selecting a phrase (and there's nothing you can do about it).
And there's plenty more to talk about, but those are the quick things off the top of my head. So it looks like I'm probably going to keep both... so I can have a diversity of e-Books available to me. Plus, the e-Book experience is a little nicer on the Kindle, but the PDF experience might be a little nicer on the Sony. It's hard to tell.
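As a rough illustration of the Kindle UPDATE above, here is a minimal Python sketch of computing the SHA-1 hash of a file's full path, which is what the collections.json file reportedly uses as a key. The mount point `/mnt/us`, the `documents` folder, and the function names are all assumptions made for illustration; check them against your own device before relying on the output.

```python
import hashlib

# Assumption: the path the Kindle itself sees for its USB-visible storage.
# This is not confirmed by the post; adjust it for your device.
KINDLE_ROOT = "/mnt/us"


def kindle_item_key(relative_path, kindle_root=KINDLE_ROOT):
    """Return the SHA-1 hex digest of a document's full on-device path."""
    full_path = kindle_root.rstrip("/") + "/" + relative_path.lstrip("/")
    return hashlib.sha1(full_path.encode("utf-8")).hexdigest()


def keys_for(relative_paths, kindle_root=KINDLE_ROOT):
    """Map each relative path to its hash so the keys can be looked up later."""
    return {path: kindle_item_key(path, kindle_root) for path in relative_paths}


if __name__ == "__main__":
    # Hypothetical file names, used only to show the call.
    for path, key in keys_for(["documents/paper.pdf", "documents/thesis.pdf"]).items():
        print(key, path)
```

Because the hash covers the whole path, renaming or moving a file changes its key, so any collection entry pointing at the old key would have to be regenerated.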

But what this post is really about is a utility I've put together that automatically manages my research PDF collection on either the Kindle or the Sony Reader. In particular,
  • It updates PDF's with metatags to match author/editor/title information from a central BibTeX database.
  • If you invoke it with a "kindle" argument, it converts PDF's to 1.4 so the Kindle can read the metatags.
  • If you invoke it with a "reader" argument, it also automatically generates categories based on file hierarchy (i.e., the folders in which your PDF files live). In fact, symbolic links indicate that multiple tags should be applied to the same file (i.e., the target of the symbolic link).
So maybe that will be helpful to someone (at least as an example to generate some ideas). The project started out as something customized for me, but I've tried to make the documentation clear (see the chunk at the top to start). Plus, most of the important custom information (paths, preferences, etc.) is at the top of the script.
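To make the folder-to-category idea concrete, here is a small Python sketch. It is not the fix_pdf_tags script itself; the function name `collect_tags` and the choice to use the folder path relative to the library root as the tag are assumptions for illustration. It walks a PDF tree, treats each folder as a tag, and resolves symbolic links so that a link in a second folder adds that folder's tag to the link's target.

```python
import os
from collections import defaultdict


def collect_tags(library_root):
    """Map each real PDF path to the set of folder-derived tags pointing at it."""
    tags = defaultdict(set)
    library_root = os.path.realpath(library_root)
    for dirpath, _dirnames, filenames in os.walk(library_root):
        # The tag is the folder path relative to the library root.
        tag = os.path.relpath(dirpath, library_root)
        if tag == ".":
            continue  # files sitting directly in the root get no tag here
        for name in filenames:
            if not name.lower().endswith(".pdf"):
                continue
            # realpath follows symbolic links, so a link in another folder
            # contributes that folder's tag to the target file.
            target = os.path.realpath(os.path.join(dirpath, name))
            tags[target].add(tag)
    return dict(tags)


if __name__ == "__main__":
    # Hypothetical library location, used only to show the call.
    for pdf, pdf_tags in sorted(collect_tags(os.path.expanduser("~/papers")).items()):
        print(pdf, sorted(pdf_tags))
```

Keying the result on the resolved path means each physical file appears once, however many links point at it.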

Check out the most recent version of my fix_pdf_tags script; it resides in my bibtex_to_pdf Mercurial repository where you can view its change history.

P.S. I know that Calibre is an existing software package that has very similar aims and a nice graphical environment. However, it really is a poor choice for managing PDF research. Plus, the Calibre folks have basically written off Kindle users as poor schmucks with hobbled readers. More importantly for me, I'm much happier with scripted solutions that can be fired off quickly.

Wednesday, September 08, 2010

Red Queen Hypothesis 2.0: Social Networking (and the Internet) as a response to disease

After hearing someone in the next cubicle start hacking away, I got thinking about how the probability of me staying healthy is so greatly decreased by working in this common environment. That made me start dreading the arrival of new undergraduate and graduate students who might further pollute the air with their... um... filth.

The recent bedbug scare spreading across the nation (where Ohio is one hotspot) fits into this line of thinking too. One unlucky or unsanitary person turns into a possibly unaware transmission vector. Even those who are aware of the problem may be unwilling or unable to stop it. Of course, this goes for other pests like fleas as well. Moreover, as more people come back to the workplace, there is more chance of these pests moving from person to person, perhaps with a chair or a floor or a cubicle wall in between. It is the dual to herd immunity; it's herd vulnerability: a large group of healthy people is only as strong as its sickest link.

Driving to work today, I heard someone on NPR talk about how the Internet is making offices unnecessary or deprecated. People are able to do work at home while still staying in contact with their customers and the rest of the work force. That is, they are still able to leverage the power of humans to form productive groups without having to actually be in the same space as those humans. With this still ringing in my ears, coupled with the new sound of the guy in the next cubicle hacking away, I started to think that maybe the Internet and social networking are just another product of the Red Queen running to stay in one place. That is, although the ostensible purpose of physical isolation alongside virtual collaboration has nothing to do with disease, a collateral effect is that many communicable diseases have a hard time commuting across wires and fibers. So that's a happy thought, right?

On the other hand, there's that Bruce Willis movie that seems to show the dystopia of my fantasy disease-free future...

As if on cue, that thought is interrupted by a sneeze from the next cubicle over.

How to put "1 of ..." page numbers on a LaTeX letter

Someone e-mailed me recently to ask how to add "1 of LAST_PAGE" page numbers to a LaTeX letter document. I generated the sample LaTeX document fancy_letter_numbering.tex to show how it's done using the fancyhdr and lastpage packages.

Here is my response, which gives more details:
I have attached a sample letter with "1 of ..." page numbering throughout. fancyhdr works with "letter" just as well as it works with "article." You just have to treat "\opening" just like you do "\maketitle."

In particular, put this up in your preamble:
Then, after each "\opening" (in the case of a single letter, there will only be one), put the line:
Otherwise you will get page numbers on every page except for the first one.
Of course, you will have to run LaTeX (or PDFLaTeX) at least twice to place the "LastPage" label properly and generate the correct page reference.

[ Additionally, you may further customize the headings and footers making use of all of the nice features that come with the fancyhdr package. ]

Monday, May 10, 2010

When the deli says "surprise" on the menu, they mean it

Now that's one helpful nutrition label...

Wednesday, April 21, 2010

How to ungrey "Download to Picasa" on PicasaWeb

I notice that when I run Picasa and "Import from Web Albums," many of the albums never download all of their pictures. For example, in one album that has 373 pictures, Picasa downloads a seemingly random 272 of them. Others who had this problem on-line said that using the "Download to Picasa" link on PicasaWeb (under the "Download" menu) would fix this problem. That is, that link would download all pictures. Unfortunately, all of the "Download" links are greyed out in my browser (Firefox on Linux).

To fix this problem, I installed the Greasemonkey Firefox add-on and then created this simple Greasemonkey script to trick Google into "ungreying" those download links.

Now I can click "Download to Picasa" and have it download all of those photos. My first trial download hasn't finished yet because it doesn't seem to care that 272 photos have already been downloaded, and so I'm not quite sure what the ultimate outcome will be... However, I think this is progress.

NOTE: For Linux users, adding this single line to the file ~/.mailcap:
application/x-picasa-detect; false; description=Picasa Installation detection
(all on one line; no wrapping) will also ungrey "Download to Picasa". Unfortunately, that only seems to convince PicasaWeb that Picasa2 is installed. To get all four options ungreyed, the force script is needed.

Friday, April 16, 2010

Is an SPSS monster like a SAS bunny rabbit?

A friend of mine had a Google Talk status of "Now I'm the SPSS monster" today. Lately, I have picked up the contagious habit of making fun of people who use gooey (GUI) SPSS, and so I responded by e-mail, "Is an SPSS monster like a SAS bunny rabbit?" She responded, "Could be. Or an R-invader." I couldn't resist letting this snowball turn into the avalanche it really could be, and so...
Kick S. Way to JMP on that one and even Z-score. Such a rejoinder makes me want to click away to one of the Minitabs of my browser. Phew, all of this stat talk makes me want to regress back into MATLAB; even if I am still centrally limited there, at least I can feel normal again.

Anyway, I wasn't trying to be mean. If I was, I hope you won't log this transformation and hold it against me later. I'm certain I can transcend and function better in the future; a higher power law need not intervene. Hopefully this hypothesis is correct and you will see some significant change. That should help you restore your confidence.

On a different note, I saw some Monte Carlo tulips at the zoo last weekend; it seems risky to have planted those at this time of the season, but hopefully they will Excel. If they do die, I'm afraid this story will have a heavy tail indeed.

By the way, yesterday for graduate appreciation day, Jessie got a coupon for $1 coffee at the expensive campus Starbucks. With the discount, prices are about normal. I guess there is no such thing as a scale free lunch. Shoot, I'm afraid my coffee has gone cold and is starting to taste a little bit like Poisson.

Well, enough of this. I'm sure if you remove the outlier that is this e-mail thread, you'll find that the remaining e-mails are far less skewed and better fit the distribution you have come to expect.

I hope all of your days are better than average! --
There are parts of that that I'm not that excited about, but overall I'm pretty proud of myself.