Package org.jpedal.tika
Class PDFParser
java.lang.Object
org.jpedal.tika.PDFParser
- All Implemented Interfaces:
Serializable, org.apache.tika.parser.Parser
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static enum | PDFParser.Ability | An enum containing the abilities of the parser.
-
Field Summary
Fields
PASSWORD
-
Constructor Summary
Constructors
Constructor | Description
PDFParser() | Creates a new instance of a PDFParser that outputs unstructured text.
PDFParser(PDFParser.Ability ability) | Creates a new instance of a PDFParser.
-
Method Summary
Modifier and Type | Method | Description
Set<org.apache.tika.mime.MediaType> | getSupportedTypes(org.apache.tika.parser.ParseContext parseContext) | Get the supported types of this parser.
void | parse(InputStream inputStream, ContentHandler contentHandler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext parseContext) | Attempt to parse the given PDF file.
-
Field Details
-
PASSWORD
-
Constructor Details
-
PDFParser
public PDFParser(PDFParser.Ability ability)
Creates a new instance of a PDFParser.
- Parameters:
ability - the ability of the parser.
-
PDFParser
public PDFParser()
Creates a new instance of a PDFParser that outputs unstructured text.
-
-
Method Details
-
getSupportedTypes
public Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext parseContext)
Get the supported types of this parser. This parser only supports the type "application/pdf".
- Specified by:
getSupportedTypes in interface org.apache.tika.parser.Parser
- Parameters:
parseContext - Parse context. Unused.
- Returns:
A singleton containing the "application/pdf" type.
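The singleton return value lends itself to a simple membership check before dispatching a file to this parser. The sketch below illustrates only that idea using the JDK: it uses plain String values as a stand-in for org.apache.tika.mime.MediaType, and the class and method names (SupportedTypesSketch, supportedTypes, canParse) are hypothetical, not part of this API.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical stand-in: models the documented contract with String
// values instead of org.apache.tika.mime.MediaType.
class SupportedTypesSketch {

    // Mirrors the documented return value: a set with exactly one entry.
    static Set<String> supportedTypes() {
        return Collections.singleton("application/pdf");
    }

    // A caller-side check before handing a document to the parser.
    static boolean canParse(String mimeType) {
        return supportedTypes().contains(mimeType);
    }
}
```

With the real class, the same check would presumably be made against the Set<MediaType> returned by getSupportedTypes.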
-
parse
public void parse(InputStream inputStream, ContentHandler contentHandler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext parseContext) throws IOException, SAXException, org.apache.tika.exception.TikaException
Attempt to parse the given PDF file.
- Specified by:
parse in interface org.apache.tika.parser.Parser
- Parameters:
inputStream - Input stream. Please use an instance of TikaInputStream to pass the file path.
contentHandler - Content handler.
metadata - Metadata, used to retrieve the PDF file password, if it has one.
parseContext - Parse context. Unused.
- Throws:
IOException - Thrown if the PDF file cannot be opened.
SAXException - Thrown if there is a problem generating the XHTML document.
org.apache.tika.exception.TikaException - Thrown if there is a problem parsing the PDF file.
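Because parse reports its output as an XHTML document through the contentHandler argument, a caller needs a SAX handler to receive it. The following is a minimal sketch of such a handler, assuming the extracted text arrives through the standard characters() callback. The class name TextCollectingHandler and the collect helper are hypothetical, and the JDK's own SAX parser is used here only as a stand-in event source so that the sketch runs without Tika or JPedal on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: accumulates every run of character data it receives.
class TextCollectingHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one text node in several chunks, so always append.
        text.append(ch, start, length);
    }

    String getText() {
        return text.toString();
    }

    // Stand-in driver: feeds an XHTML string through the JDK SAX parser,
    // in place of the events that PDFParser.parse would generate.
    static String collect(String xhtml) {
        TextCollectingHandler handler = new TextCollectingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)),
                    handler);
        } catch (Exception e) {
            // Rethrown unchecked to keep the sketch short.
            throw new IllegalStateException("Could not read XHTML", e);
        }
        return handler.getText();
    }
}
```

In real use, an instance of a handler like this would be passed as the contentHandler argument of parse, and getText() read once parse returns.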
-