Read Me
---
output: rmarkdown::github_document
---
[](https://travis-ci.org/hrbrmstr/pubcrawl)
[](https://ci.appveyor.com/project/hrbrmstr/pubcrawl)
[](https://codecov.io/github/hrbrmstr/pubcrawl?branch=master)
# pubcrawl
Convert 'epub' Files to Text
## Description
Convert 'epub' Files to Text
The 'epub' file format is really just a structured 'ZIP' archive with
metadata, graphics and (usually) 'HTML' text. Tools are provided to turn an 'epub'
file into a tidy data frame.
## What's Inside The Tin
The following functions are implemented:
- `epub_to_text`: Convert an epub file into a data frame of plaintext chapters
## NOTE
There are edge cases I've totally not covered yet. Feel free to jump in and make this a real, useful package!
## TODO
- [X] Refactor so there aren't so many heavy dependencies
- <strike> [ ] Try to get `hgr` on CRAN so it's not a GH dep</strike> Moved the cleaner code into here
- [ ] Better docs
- [X] Embed some epubs for examples and tests
- [X] Setup Travis, Appveyor, code coverage
## Installation
```{r eval=FALSE}
devtools::install_github("hrbrmstr/pubcrawl")
```
```{r message=FALSE, warning=FALSE, error=FALSE, include=FALSE}
options(width=120)
```
## Usage
```{r message=FALSE, warning=FALSE, error=FALSE}
library(pubcrawl)
library(tidyverse)
# current verison
packageVersion("pubcrawl")
```
### An O'Reilly epub
```{r}
epub_to_text("~/Data/R Packages.epub")
```
### A Project Gutenberg epub that comes with the package
```{r}
epub_to_text(system.file("extdat", "augustine.epub", package="pubcrawl")) %>%
mutate(path = abbreviate(path))
```
## Code of Conduct
Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md). By participating in this project you agree to abide by its terms.