Package org.jpedal.tika
Class PDFParser
java.lang.Object
org.jpedal.tika.PDFParser
All Implemented Interfaces:
Serializable, org.apache.tika.parser.Parser
Nested Class Summary
Modifier and Type | Class | Description
static enum | PDFParser.Ability | An enum containing the abilities of the parser.
Field Summary
PASSWORD
Constructor Summary
Constructor | Description
PDFParser() | Creates a new instance of a PDFParser that outputs unstructured text.
PDFParser(PDFParser.Ability ability) | Creates a new instance of a PDFParser with the given ability.
Method Summary
Modifier and Type | Method | Description
Set<org.apache.tika.mime.MediaType> | getSupportedTypes(org.apache.tika.parser.ParseContext parseContext) | Get the supported types of this parser.
void | parse(InputStream inputStream, ContentHandler contentHandler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext parseContext) | Attempt to parse the given PDF file.
Field Details
PASSWORD
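According to the parameter description of parse, the Metadata argument is used to retrieve the PDF file password, and PASSWORD is presumably the metadata key that the caller stores it under; this page shows neither its type nor its value. The JDK-only sketch below models that handshake under those assumptions: a Map<String, String> stands in for org.apache.tika.metadata.Metadata, and PASSWORD_KEY and both helper methods are hypothetical placeholders, not the real constant or API.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical JDK-only model of passing a PDF password through parse metadata. */
final class PasswordLookupSketch {
    // Hypothetical key; the real PDFParser.PASSWORD value is not documented on this page.
    static final String PASSWORD_KEY = "hypothetical.pdf.password";

    private PasswordLookupSketch() {
    }

    // Caller side: store the password in the metadata before calling parse(...).
    static void supplyPassword(Map<String, String> metadata, String password) {
        metadata.put(PASSWORD_KEY, password);
    }

    // Parser side: an absent entry means the file is opened without a password.
    static Optional<String> lookupPassword(Map<String, String> metadata) {
        return Optional.ofNullable(metadata.get(PASSWORD_KEY));
    }
}
```

With Tika on the classpath, the caller-side step would presumably be metadata.set(PDFParser.PASSWORD, "secret") before calling parse(...); Metadata.set has overloads taking either a String or a Property key, so this should hold whichever of the two PASSWORD turns out to be.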
Constructor Details
PDFParser
PDFParser(PDFParser.Ability ability)
Creates a new instance of a PDFParser with the given ability.
Parameters:
ability - the ability of the parser.
PDFParser
public PDFParser()
Creates a new instance of a PDFParser that outputs unstructured text.
Method Details
getSupportedTypes
public Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext parseContext)
Get the supported types of this parser. This parser only supports the type "application/pdf".
Specified by:
getSupportedTypes in interface org.apache.tika.parser.Parser
Parameters:
parseContext - Parse context. Unused.
Returns:
A singleton containing the "application/pdf" type.
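Because getSupportedTypes returns a singleton, deciding whether this parser applies comes down to a single membership test. A detected type may, however, carry parameters such as "application/pdf; version=1.7", so those should be dropped before comparing. The JDK-only sketch below illustrates this with plain strings standing in for org.apache.tika.mime.MediaType; SupportedTypesSketch and its methods are hypothetical. In Tika itself, MediaType.getBaseType() performs the parameter-stripping step.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Set;

/** Hypothetical JDK-only model of the getSupportedTypes contract; strings stand in for MediaType. */
final class SupportedTypesSketch {
    // Mirrors the documented result: a singleton containing "application/pdf".
    static final Set<String> SUPPORTED = Collections.singleton("application/pdf");

    private SupportedTypesSketch() {
    }

    // Drops MIME parameters (e.g. "; version=1.7") and normalises case.
    static String baseType(String mimeType) {
        int semicolon = mimeType.indexOf(';');
        String base = semicolon >= 0 ? mimeType.substring(0, semicolon) : mimeType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // True when the base type of the detected MIME type is in the supported set.
    static boolean canHandle(String detectedType) {
        return SUPPORTED.contains(baseType(detectedType));
    }
}
```

Comparing base types rather than full strings keeps a version parameter from causing a false "unsupported" result.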
parse
public void parse(InputStream inputStream, ContentHandler contentHandler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext parseContext) throws IOException, SAXException, org.apache.tika.exception.TikaException
Attempt to parse the given PDF file.
Specified by:
parse in interface org.apache.tika.parser.Parser
Parameters:
inputStream - Input stream. Pass an instance of TikaInputStream so that the parser can obtain the file path.
contentHandler - Content handler that receives the generated XHTML document.
metadata - Metadata, used to retrieve the PDF file password, if the file has one.
parseContext - Parse context. Unused.
Throws:
IOException - Thrown if the PDF file cannot be opened.
SAXException - Thrown if there is a problem generating the XHTML document.
org.apache.tika.exception.TikaException - Thrown if there is a problem parsing the PDF file.
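The inputStream parameter asks for a TikaInputStream so that the file path is available, presumably because the parser opens the PDF from disk rather than reading a plain byte stream. The sketch below illustrates that pattern with the JDK only. PathAwareStream and its methods are hypothetical and are not TikaInputStream; the real class, typically created with TikaInputStream.get(Path), offers its own hasFile() and getPath() methods.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical stand-in for the idea behind TikaInputStream: a stream that may remember its source file. */
final class PathAwareStream extends FilterInputStream {
    private final Path source; // null when the bytes did not come from a file
    private Path spooled;      // temp copy, created lazily for non-file streams

    private PathAwareStream(InputStream in, Path source) {
        super(in);
        this.source = source;
    }

    static PathAwareStream fromFile(Path file) throws IOException {
        return new PathAwareStream(Files.newInputStream(file), file);
    }

    static PathAwareStream fromStream(InputStream in) {
        return new PathAwareStream(in, null);
    }

    boolean hasFile() {
        return source != null;
    }

    /** Returns the original file if known; otherwise copies the remaining bytes to a temp file once. */
    Path resolveFile() throws IOException {
        if (source != null) {
            return source;
        }
        if (spooled == null) {
            spooled = Files.createTempFile("sketch-", ".pdf");
            spooled.toFile().deleteOnExit();
            Files.copy(in, spooled, StandardCopyOption.REPLACE_EXISTING);
        }
        return spooled;
    }
}
```

A file-backed stream hands back the original path without copying anything, while a plain stream must first be spooled to a temporary file, which costs a full copy of the document.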