
pdf-extractor

This is an optimized version of Apache PDFBox. It extracts the rough structure of a document (pages, blocks of text, and paragraphs, as well as formatting information) and is intended to improve text extraction results for scientific papers. The output can easily be converted to plain text (toString) or to an XML format (toXML).
http://openresearch.cit-ec.de/projects/scie/
The GNU Affero General Public License, Version 3
Raphael Dickfelder, Jan Göpfert, Benjamin Paaßen, Andreas Stöckel
Major version  Version  Last updated
2.0            2.0.1    2014-12-10 22:47:18
2.0            2.0      2014-11-18 06:13:50