Package org.jpedal.tika
Class PDFParser
java.lang.Object
org.jpedal.tika.PDFParser
- All Implemented Interfaces:
 Serializable, org.apache.tika.parser.Parser
- 
Nested Class Summary
Nested Classes
static enum PDFParser.Ability - An enum containing the abilities of the parser.
Field Summary
Fields
PASSWORD
Constructor Summary
Constructors
PDFParser() - Creates a new instance of a PDFParser that outputs unstructured text.
PDFParser(PDFParser.Ability ability) - Creates a new instance of a PDFParser.
Method Summary
Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext parseContext) - Get the supported types of this parser.
void parse(InputStream inputStream, ContentHandler contentHandler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext parseContext) - Attempt to parse the given PDF file.
- 
Field Details
- 
PASSWORD
 - 
 - 
Constructor Details
- 
PDFParser
PDFParser(PDFParser.Ability ability)
Creates a new instance of a PDFParser.
- Parameters:
 ability - the ability of the parser.
 - 
PDFParser
public PDFParser()
Creates a new instance of a PDFParser that outputs unstructured text.
 - 
 - 
Method Details
- 
getSupportedTypes
public Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext parseContext)
Get the supported types of this parser. This parser only supports the type "application/pdf".
- Specified by:
 getSupportedTypes in interface org.apache.tika.parser.Parser
- Parameters:
 parseContext - Parse context. Unused.
- Returns:
 A singleton containing the "application/pdf" type.
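Because the parser accepts only "application/pdf", a caller may want a cheap sanity check before handing a file over. The sketch below is not part of org.jpedal.tika or Tika: PdfSniffer and looksLikePdf are hypothetical names, and the check relies only on the fact that PDF files begin with the ASCII bytes %PDF-. It assumes Java 11 or later for InputStream.readNBytes(int).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper (not part of JPedal or Tika) that checks for the PDF header bytes. */
final class PdfSniffer {
    // Every PDF file starts with the ASCII marker "%PDF-" followed by a version number.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    private PdfSniffer() {
    }

    /** Returns true if the given leading bytes start with the PDF marker. */
    public static boolean looksLikePdf(byte[] head) {
        int n = PDF_MAGIC.length;
        return head.length >= n && Arrays.equals(head, 0, n, PDF_MAGIC, 0, n);
    }

    /** Reads only the first few bytes of the file and checks them for the PDF marker. */
    public static boolean looksLikePdf(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return looksLikePdf(in.readNBytes(PDF_MAGIC.length));
        }
    }
}
```

A check like this only inspects the header; it says nothing about whether the rest of the file is well formed, so parse may still throw for a file that passes it.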
 
 - 
parse
public void parse(InputStream inputStream, ContentHandler contentHandler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext parseContext) throws IOException, SAXException, org.apache.tika.exception.TikaException
Attempt to parse the given PDF file.
- Specified by:
 parse in interface org.apache.tika.parser.Parser
- Parameters:
 inputStream - Input stream. Please use an instance of TikaInputStream to pass the file path.
 contentHandler - Content handler.
 metadata - Metadata, used to retrieve the PDF file password, if it has one.
 parseContext - Parse context. Unused.
- Throws:
 IOException - Thrown if the PDF file cannot be opened.
 SAXException - Thrown if there is a problem generating the XHTML document.
 org.apache.tika.exception.TikaException - Thrown if there is a problem parsing the PDF file.
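The contentHandler argument is a standard org.xml.sax.ContentHandler, and the SAXException clause indicates that the output arrives as XHTML SAX events. As a rough sketch of what a caller might pass in, the handler below gathers character data into a string. TextCollectingHandler is a hypothetical name, and the assumption that paragraphs are reported as p elements is not confirmed by this page.

```java
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that collects the text content of incoming SAX events. */
final class TextCollectingHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    /** Appends each chunk of character data as it is reported. */
    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    /** Adds a line break after each paragraph (assumes paragraphs are emitted as p elements). */
    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("p".equals(localName) || "p".equals(qName)) {
            text.append('\n');
        }
    }

    /** Returns everything collected so far. */
    public String getText() {
        return text.toString();
    }
}
```

An instance of such a class could be supplied as the contentHandler argument and queried with getText() once parse returns. Tika's own org.apache.tika.sax.BodyContentHandler serves a similar purpose and is likely the more practical choice.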
 
 -