pubcrawl Code

🍺📖 Convert 'epub' Files to Text

Brought to you by: brudis

Tree [a977f3] master / History

HTTPS access

File	Date	Author	Commit
R	2018-06-08	boB Rudis	[a977f3] better HTML file regex
inst	2018-04-12	boB Rudis	[ddb0a6] spiffied it up a bit
man	2018-04-12	boB Rudis	[da73ca] Experimental encoding support to help #4 along
tests	2018-04-12	boB Rudis	[ddb0a6] spiffied it up a bit
.Rbuildignore	2018-04-12	boB Rudis	[ddb0a6] spiffied it up a bit
.codecov.yml	2018-04-12	boB Rudis	[6ef504] initial commit
.gitattributes	2018-04-12	boB Rudis	[2cfb61] Update travis & appveyor configs
.gitignore	2018-04-12	boB Rudis	[6ef504] initial commit
.travis.yml	2018-04-12	boB Rudis	[2cfb61] Update travis & appveyor configs
CONDUCT.md	2018-04-12	boB Rudis	[ddb0a6] spiffied it up a bit
DESCRIPTION	2018-04-12	boB Rudis	[ddb0a6] spiffied it up a bit
NAMESPACE	2018-04-12	boB Rudis	[ddb0a6] spiffied it up a bit
NEWS.md	2018-04-12	boB Rudis	[6ef504] initial commit
README.Rmd	2018-04-12	boB Rudis	[ddb0a6] spiffied it up a bit
README.md	2018-04-12	boB Rudis	[b915b6] rookie mistake leaving hack-test code in a func...
appveyor.yml	2018-04-12	boB Rudis	[2cfb61] Update travis & appveyor configs
codecov.yml	2018-04-12	boB Rudis	[ddb0a6] spiffied it up a bit
pubcrawl.Rproj	2018-04-12	boB Rudis	[6ef504] initial commit

Read Me

---
output: rmarkdown::github_document
---

[![Travis-CI Build Status](https://travis-ci.org/hrbrmstr/pubcrawl.svg?branch=master)](https://travis-ci.org/hrbrmstr/pubcrawl) 
[![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/hrbrmstr/pubcrawl?branch=master&svg=true)](https://ci.appveyor.com/project/hrbrmstr/pubcrawl) 
[![Coverage Status](https://img.shields.io/codecov/c/github/hrbrmstr/pubcrawl/master.svg)](https://codecov.io/github/hrbrmstr/pubcrawl?branch=master)

# pubcrawl

Convert 'epub' Files to Text

## Description

Convert 'epub' Files to Text

The 'epub' file format is really just a structured 'ZIP' archive with 
metadata, graphics and (usually) 'HTML' text. Tools are provided to turn an 'epub'
file into a tidy data frame.

## What's Inside The Tin

The following functions are implemented:

- `epub_to_text`:	Convert an epub file into a data frame of plaintext chapters

## NOTE

There are edge cases I've totally not covered yet. Feel free to jump in and make this a real, useful package!

## TODO

- [X] Refactor so there aren't so many heavy dependencies
- <strike> [ ] Try to get `hgr` on CRAN so it's not a GH dep</strike> Moved the cleaner code into here
- [ ] Better docs
- [X] Embed some epubs for examples and tests
- [X] Setup Travis, Appveyor, code coverage

## Installation

```{r eval=FALSE}
devtools::install_github("hrbrmstr/pubcrawl")
```

```{r message=FALSE, warning=FALSE, error=FALSE, include=FALSE}
options(width=120)
```

## Usage

```{r message=FALSE, warning=FALSE, error=FALSE}
library(pubcrawl)
library(tidyverse)

# current verison
packageVersion("pubcrawl")

```

### An O'Reilly epub

```{r}
epub_to_text("~/Data/R Packages.epub")
```

### A Project Gutenberg epub that comes with the package

```{r}
epub_to_text(system.file("extdat", "augustine.epub", package="pubcrawl")) %>% 
  mutate(path = abbreviate(path))
```

## Code of Conduct

Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md). By participating in this project you agree to abide by its terms.

pubcrawl Code

🍺📖 Convert 'epub' Files to Text

Branches

Tree [a977f3] master / Download Snapshot History

Read Me

Tree [a977f3] master /

History