Package org.jpedal.tika
Class PDFParser
java.lang.Object
org.jpedal.tika.PDFParser
- All Implemented Interfaces:
- Serializable, org.apache.tika.parser.Parser

Nested Class Summary
Nested Classes
- static enum PDFParser.Ability
  An enum containing the abilities of the parser.

Field Summary
Fields
- PASSWORD

Constructor Summary
Constructors
- PDFParser()
  Creates a new instance of a PDFParser that outputs unstructured text.
- PDFParser(PDFParser.Ability ability)
  Creates a new instance of a PDFParser.

Method Summary
- Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext parseContext)
  Get the supported types of this parser.
- void parse(InputStream inputStream, ContentHandler contentHandler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext parseContext)
  Attempt to parse the given PDF file.

Field Details
PASSWORD

Constructor Details
PDFParser
PDFParser(PDFParser.Ability ability)
Creates a new instance of a PDFParser.
- Parameters:
- ability - the ability of the parser.

PDFParser
public PDFParser()
Creates a new instance of a PDFParser that outputs unstructured text.

Method Details
getSupportedTypes
public Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext parseContext)
Get the supported types of this parser. This parser only supports the type "application/pdf".
- Specified by:
- getSupportedTypes in interface org.apache.tika.parser.Parser
- Parameters:
- parseContext - Parse context. Unused.
- Returns:
- A singleton containing the "application/pdf" type.

parse
public void parse(InputStream inputStream, ContentHandler contentHandler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext parseContext) throws IOException, SAXException, org.apache.tika.exception.TikaException
Attempt to parse the given PDF file.
- Specified by:
- parse in interface org.apache.tika.parser.Parser
- Parameters:
- inputStream - Input stream. Use an instance of TikaInputStream to pass the file path.
- contentHandler - Content handler.
- metadata - Metadata, used to retrieve the password of the PDF file, if it has one.
- parseContext - Parse context. Unused.
- Throws:
- IOException - Thrown if the PDF file cannot be opened.
- SAXException - Thrown if there is a problem generating the XHTML document.
- org.apache.tika.exception.TikaException - Thrown if there is a problem parsing the PDF file.

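The contentHandler argument of parse is a SAX ContentHandler, and the SAXException description suggests that the parser reports its output as an XHTML document. The following is a minimal sketch of a handler that collects the text of such a document. The class name TextCollector is hypothetical and not part of this package, and the use of p elements as paragraph boundaries is an assumption about the generated XHTML. The sketch depends only on the JDK, so its main method simulates the SAX events instead of calling PDFParser.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical ContentHandler that gathers character data from SAX events.
 * It is an illustration only and is not provided by org.jpedal.tika.
 */
public class TextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append every run of character data reported by the parser.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Assumption: paragraphs arrive as XHTML <p> elements.
        if ("p".equals(localName)) {
            text.append('\n');
        }
    }

    public String getText() {
        return text.toString();
    }

    public static void main(String[] args) throws SAXException {
        // Simulate the kind of events an XHTML-producing parser might emit.
        TextCollector collector = new TextCollector();
        String xhtml = "http://www.w3.org/1999/xhtml";
        collector.startDocument();
        collector.startElement(xhtml, "p", "p", new AttributesImpl());
        char[] chars = "Hello PDF".toCharArray();
        collector.characters(chars, 0, chars.length);
        collector.endElement(xhtml, "p", "p");
        collector.endDocument();
        System.out.print(collector.getText());

        // With Tika and JPedal on the classpath, the same handler could be
        // handed to the parser (shown as a comment, not compiled here):
        //   new PDFParser().parse(tikaInputStream, collector, metadata, new ParseContext());
    }
}
```

In the commented call, tikaInputStream and metadata stand for values the caller prepares as described in the parameter list above: a TikaInputStream that carries the file path, and a Metadata object that holds the password if the file has one.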