Pantomime 2.5.0 is Released
TL;DR
Pantomime is a Clojure interface to Apache Tika.
Changes between Pantomime 2.4.0 and 2.5.0
Content Extraction API
Pantomime now provides access to Tika’s content extraction
functionality via pantomime.extract/parse:
(require '[clojure.java.io :as io]
         '[clojure.pprint :refer [pprint]]
         '[pantomime.extract :as extract])
(pprint (extract/parse "test/resources/pdf/qrl.pdf"))
;= {:producer ("GNU Ghostscript 7.05"),
;= :pdf:pdfversion ("1.2"),
;= :dc:title ("main.dvi"),
;= :dc:format ("application/pdf; version=1.2"),
;= :xmp:creatortool ("dvips(k) 5.86 Copyright 1999 Radical Eye Software"),
;= :pdf:encrypted ("false"),
;= ...
;= :text "\nQuickly Reacquirable Locks∗\n\nDave Dice Mark Moir ... "
;= }
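In the output above, metadata values are sequences of strings, while :text is a plain string. A minimal sketch of reading individual fields follows. It works on a literal map copied (and shortened) from that output, so it runs without Pantomime on the classpath. The meta-value helper is hypothetical and not part of Pantomime.

```clojure
;; Shortened literal copy of the result map shown above.
;; In real use this would come from (extract/parse "test/resources/pdf/qrl.pdf").
(def result
  {:producer      '("GNU Ghostscript 7.05")
   :dc:title      '("main.dvi")
   :pdf:encrypted '("false")
   :text          "\nQuickly Reacquirable Locks"})

(defn meta-value
  "Hypothetical helper: returns the first value stored under
  metadata key k, or nil when the key is absent."
  [result k]
  (first (get result k)))

;; Metadata entries are sequences, so take the first element.
(meta-value result :dc:title)

;; The extracted body text is a plain string and can be read directly.
(:text result)
```

Taking the first element is a simplification: a key may hold several values, in which case the whole sequence is the more faithful thing to keep.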
If extraction fails, extract/parse
will return the following:
{:text "",
:content-type ("application/octet-stream"),
:x-parsed-by ("org.apache.tika.parser.EmptyParser")}
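Since a failed extraction returns a map rather than throwing, callers may want to check for it explicitly. One possible approach, sketched below, is to look for Tika's EmptyParser in :x-parsed-by, as it appears in the failure value above. The extraction-failed? function is a hypothetical helper, not part of Pantomime, and it assumes the failure map always has the shape shown here.

```clojure
(def empty-parser "org.apache.tika.parser.EmptyParser")

(defn extraction-failed?
  "Hypothetical helper: true when the result map lists Tika's
  EmptyParser under :x-parsed-by, as in the failure value above."
  [result]
  (boolean (some #{empty-parser} (:x-parsed-by result))))

;; The failure value from the post, written as a literal map.
(def failed
  {:text ""
   :content-type '("application/octet-stream")
   :x-parsed-by  '("org.apache.tika.parser.EmptyParser")})

(extraction-failed? failed)
```

Checking the parser name rather than an empty :text avoids treating a genuinely empty document as a failure, though that distinction is an assumption about how Tika reports such files.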
extract/parse is a simple interface to Tika’s own Parser.parse method.
Contributed by Joshua Thayer.
Change Log
Pantomime change log is available on GitHub.
Pantomime is a ClojureWerkz Project
Pantomime is part of the group of libraries known as ClojureWerkz, together with
- Langohr, a Clojure client for RabbitMQ that embraces the AMQP 0.9.1 model
- Elastisch, a minimalistic Clojure client for ElasticSearch
- Cassaforte, a Clojure Cassandra client built around CQL
- Monger, a Clojure MongoDB client for a more civilized age
- Welle, a Riak client with batteries included
- Neocons, a client for the Neo4J REST API
- Quartzite, a powerful scheduling library
and several others. If you like Pantomime, you may also like our other projects.
Let us know what you think on Twitter or on the Clojure mailing list.
Michael on behalf of the ClojureWerkz Team