Readwise + Reader API notes
A few weeks back I switched from Raindrop to Readwise for my reading, highlights, and related notes. The main reason for switching was the Kindle integration, but one of my criteria was also an API that lets me automatically collect + archive the data into my personal cloud. It ended up being a bit of a learning curve, so below are notes for anyone who might be interested in doing the same:
- The Readwise umbrella has two apps: Readwise and Reader. Readwise focuses on highlights + reviewing highlights, whereas Reader is a web bookmarking + highlighting tool that's more akin to Raindrop. The APIs are separate, with their own docs + internal concepts: [Readwise API] [Reader API]
- Consistent with the apps, the Readwise API focuses on "Highlights" + "Books" and the Reader API centers around "Documents" which can be links, highlights, and notes.
- Readwise has many integrations, including with Reader. So a saved link + highlights in Reader will also show up in Readwise (despite the "book"-centric terminology). Saved links without highlights will not show up in Readwise.
- Somewhat annoyingly, the two use completely different id spaces. e.g. if I have "17 ChatGPT Use Cases", it might have `user_book_id` `123` in Readwise but `document_id` `asdf` in Reader. There is an indirect one-way lookup between the two: the Readwise record contains a `unique_url` field that has the format `https://read.readwise.io/read/asdf`. Other than that, the only way to tell if a Reader item is "the same" as a Readwise item is via other attributes like article title or url.
- Another idiosyncrasy: highlight entries from the Readwise API contain location information, i.e. "where in an article" or "what page in the book" the highlight came from. The Reader API does not have this information – even for items generated by the Reader app.
- Finally: the Reader API recently added a `withHtmlContent` parameter, which returns a document's scraped HTML from Reader. Readwise has no equivalent of this field.
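The one-way `unique_url` lookup from the list above is easy to get slightly wrong, so here is a minimal sketch of it. It assumes the URL always has the `https://read.readwise.io/read/<id>` shape shown above and that items which did not come from Reader have an empty or unrelated `unique_url`. The function name is mine, not something from either API.

```python
from typing import Optional
from urllib.parse import urlparse

READER_HOST = "read.readwise.io"  # host seen in the unique_url example above


def reader_id_from_unique_url(unique_url: Optional[str]) -> Optional[str]:
    """Return the Reader document id embedded in a Readwise unique_url, or None."""
    if not unique_url:
        return None
    parsed = urlparse(unique_url)
    if parsed.netloc != READER_HOST:
        return None
    # Expect a path of the form /read/<document_id>
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) == 2 and parts[0] == "read":
        return parts[1]
    return None


print(reader_id_from_unique_url("https://read.readwise.io/read/asdf"))
```

Returning `None` instead of raising keeps this usable in a loop over every Readwise record, since plenty of them (Kindle books, for example) will not map to a Reader document at all.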
So in my case, where I want the maximum information around my reading + highlights + notes, I have to sort of stitch together the two APIs. Roughly:
- Pull non-highlight items that get saved in Reader so I have the most complete list of links / articles / videos / etc.
- Readwise is the source of truth for highlights + metadata. It contains the most info including location information.
- When an item gets saved in Reader and then highlighted later (i.e. in between archival script runs), use the `unique_url` above to link the original Reader item to the Readwise item for highlight information.
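Roughly, the stitching step could look like the sketch below. The dict shapes are simplified assumptions based on the field names mentioned in this post (`document_id`, `user_book_id`, `unique_url`), not the exact response schemas of either API, and `stitch` is a hypothetical helper name.

```python
import re
from typing import Any, Dict, List, Optional

# Matches the unique_url format described above and captures the Reader id.
_READER_URL = re.compile(r"^https://read\.readwise\.io/read/([^/?#]+)")


def _reader_id(unique_url: Optional[str]) -> Optional[str]:
    match = _READER_URL.match(unique_url or "")
    return match.group(1) if match else None


def stitch(
    reader_docs: List[Dict[str, Any]],
    readwise_books: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Attach Readwise highlights to Reader documents via unique_url."""
    # Index the Readwise records by the Reader id embedded in unique_url.
    by_reader_id = {}
    for book in readwise_books:
        rid = _reader_id(book.get("unique_url"))
        if rid is not None:
            by_reader_id[rid] = book

    merged = []
    for doc in reader_docs:
        book = by_reader_id.get(doc["document_id"])
        merged.append({
            **doc,
            # Reader items without highlights have no Readwise counterpart.
            "user_book_id": book["user_book_id"] if book else None,
            "highlights": book["highlights"] if book else [],
        })
    return merged


reader_docs = [
    {"document_id": "asdf", "title": "17 ChatGPT Use Cases"},
    {"document_id": "qwer", "title": "A saved link with no highlights"},
]
readwise_books = [
    {
        "user_book_id": 123,
        "unique_url": "https://read.readwise.io/read/asdf",
        "highlights": [{"text": "an example highlight", "location": 42}],
    },
    # e.g. a Kindle book: no Reader counterpart, so it is never matched.
    {"user_book_id": 456, "unique_url": None, "highlights": []},
]
merged = stitch(reader_docs, readwise_books)
```

Reader drives the outer loop because it has the most complete list of saved items, while Readwise only contributes highlight data where a match exists.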
Despite some workarounds, this API integration has been working alright so far. The products themselves are also pretty top notch. Now I just have to go and backfill all the Reader `html_content` for my items from before they introduced it...
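For the backfill, a sketch along these lines should work. The endpoint URL, the `pageCursor` / `nextPageCursor` pagination names, and the `Token` auth header are my reading of the Reader API docs, so treat them as assumptions and double check before relying on them. `fetch_page` and `iter_documents` are hypothetical helper names, and rate limiting is not handled here.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, Iterator, Optional

LIST_URL = "https://readwise.io/api/v3/list/"  # assumed Reader list endpoint


def fetch_page(token: str, cursor: Optional[str]) -> Dict[str, Any]:
    """Fetch one page of Reader documents, asking for html_content too."""
    params = {"withHtmlContent": "true"}
    if cursor:
        params["pageCursor"] = cursor  # assumed pagination parameter
    request = urllib.request.Request(
        LIST_URL + "?" + urllib.parse.urlencode(params),
        headers={"Authorization": f"Token {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_documents(
    get_page: Callable[[Optional[str]], Dict[str, Any]],
) -> Iterator[Dict[str, Any]]:
    """Yield every document, following the cursor until it runs out."""
    cursor = None
    while True:
        page = get_page(cursor)
        yield from page.get("results", [])
        cursor = page.get("nextPageCursor")  # assumed response field
        if not cursor:
            break
```

Usage would be something like `for doc in iter_documents(lambda c: fetch_page(MY_TOKEN, c)): ...`, storing `doc.get("html_content")` for each item. Passing the page fetcher in as a function keeps the pagination loop testable without hitting the network.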