TextricatorWeb

This is a web service for Textricator, a library for parsing data from computer-generated PDF files.

Notes:

Web UI

The web UI for each PDF provides a link to download the PDF and a button that shows its raw text content. You can also type in a Textricator configuration and click a button to parse the PDF and display the results.

The /files/ page links to the web UI for each available PDF.

Endpoints:

GET /files/

Get a list of the available files. Produces text/html or application/json.
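A minimal client sketch for this endpoint, assuming the service chooses between its HTML and JSON representations based on the request's Accept header, and assuming a hypothetical base URL of http://localhost:8080. The shape of the returned JSON is not documented here, so the sketch only decodes it. The request is built separately from sending it so it can be inspected without a running server.

```python
import json
import urllib.request

# Hypothetical deployment address; replace with wherever the service runs.
BASE_URL = "http://localhost:8080"


def list_files_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build GET /files/, asking for JSON rather than HTML.

    Assumes the server performs content negotiation on the Accept header.
    """
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/files/",
        headers={"Accept": "application/json"},
        method="GET",
    )


def list_files(base_url: str = BASE_URL):
    """Send the request and decode the JSON body (its structure is unspecified)."""
    with urllib.request.urlopen(list_files_request(base_url)) as resp:
        return json.load(resp)
```

Calling list_files() against a running instance returns whatever JSON the service produces for the file list.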

GET /files/{id}.pdf

Download the PDF for the specified ID. Produces application/pdf.
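A sketch of downloading a PDF to disk, assuming the same hypothetical base URL (http://localhost:8080) and an ID taken from the file list. The helper names pdf_url and download_pdf are illustrative, not part of the service. The ID is percent-encoded so that unusual characters cannot alter the path, and the body is streamed to the file instead of being read into memory all at once.

```python
import shutil
import urllib.parse
import urllib.request

# Hypothetical deployment address; replace with wherever the service runs.
BASE_URL = "http://localhost:8080"


def pdf_url(file_id: str, base_url: str = BASE_URL) -> str:
    """Return the URL for GET /files/{id}.pdf, percent-encoding the ID."""
    encoded_id = urllib.parse.quote(file_id, safe="")
    return f"{base_url.rstrip('/')}/files/{encoded_id}.pdf"


def download_pdf(file_id: str, dest_path: str, base_url: str = BASE_URL) -> None:
    """Stream the PDF response body into dest_path."""
    with urllib.request.urlopen(pdf_url(file_id, base_url)) as resp, \
            open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
```

For example, download_pdf("report-1", "report-1.pdf") would save the file with the hypothetical ID report-1.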

GET /files/{id}.pdf/text

Extract the raw text content of the PDF as JSON. Produces application/json.
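A sketch of fetching the raw text, under the same assumptions as above: a hypothetical base URL of http://localhost:8080 and illustrative helper names. The JSON schema of the text dump is not described in this document, so the sketch returns the decoded value as-is. Inspecting this output is a reasonable first step before writing a parser configuration.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical deployment address; replace with wherever the service runs.
BASE_URL = "http://localhost:8080"


def text_url(file_id: str, base_url: str = BASE_URL) -> str:
    """Return the URL for GET /files/{id}.pdf/text."""
    encoded_id = urllib.parse.quote(file_id, safe="")
    return f"{base_url.rstrip('/')}/files/{encoded_id}.pdf/text"


def fetch_text(file_id: str, base_url: str = BASE_URL):
    """Fetch the raw text content and decode the JSON response."""
    req = urllib.request.Request(
        text_url(file_id, base_url), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```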

POST /files/{id}.pdf/fsm/{format}

Parse the PDF and return the extracted data in the specified format.

Request body: the YAML configuration for the finite-state machine (FSM) parser.

{format} options:
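A sketch of submitting a parser configuration, assuming a hypothetical base URL of http://localhost:8080. The Content-Type value application/x-yaml is also an assumption; the service may expect a different type such as text/plain. The format string "csv" in the usage line is a placeholder, not a confirmed option, and config.yml and report-1 are hypothetical names. The response is decoded as text using the charset the server declares, falling back to UTF-8.

```python
import urllib.parse
import urllib.request

# Hypothetical deployment address; replace with wherever the service runs.
BASE_URL = "http://localhost:8080"


def parse_request(file_id: str, output_format: str, config_yaml: str,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build POST /files/{id}.pdf/fsm/{format} with the YAML config as the body."""
    encoded_id = urllib.parse.quote(file_id, safe="")
    encoded_fmt = urllib.parse.quote(output_format, safe="")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/files/{encoded_id}.pdf/fsm/{encoded_fmt}",
        data=config_yaml.encode("utf-8"),
        # Assumed content type; adjust if the server rejects it.
        headers={"Content-Type": "application/x-yaml"},
        method="POST",
    )


def parse_pdf(file_id: str, output_format: str, config_yaml: str,
              base_url: str = BASE_URL) -> str:
    """Send the configuration and return the parsed result as text."""
    req = parse_request(file_id, output_format, config_yaml, base_url)
    with urllib.request.urlopen(req) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

Usage, with hypothetical file and ID names:

```python
with open("config.yml", encoding="utf-8") as f:
    print(parse_pdf("report-1", "csv", f.read()))
```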

GET /files/{id}.pdf/ui

Load the web UI for the specified PDF.

Parser configuration

See the Textricator GitHub page for documentation on writing the parser configuration.

Legal

TextricatorWeb is licensed under the AGPLv3 (as is Textricator).

Source code for TextricatorWeb 9.0.20 is available at https://github.com/measuresforjustice/textricator-web

Source code for Textricator 9.0.46 is available at https://github.com/measuresforjustice/textricator

This service uses the following libraries:
