Class PDFParser

java.lang.Object
org.jpedal.tika.PDFParser
All Implemented Interfaces:
Serializable, org.apache.tika.parser.Parser

public class PDFParser extends Object implements org.apache.tika.parser.Parser
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static enum 
    PDFParser.Ability
    An enum containing the abilities of the parser.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final String
     
  • Constructor Summary

    Constructors
    Constructor
    Description
    PDFParser()
    Creates a new instance of a PDFParser that outputs unstructured text.
    PDFParser(PDFParser.Ability ability)
    Creates a new instance of a PDFParser with the given ability.
  • Method Summary

    Modifier and Type
    Method
    Description
    Set<org.apache.tika.mime.MediaType>
    getSupportedTypes(org.apache.tika.parser.ParseContext parseContext)
    Get the supported types of this parser.
    void
    parse(InputStream inputStream, ContentHandler contentHandler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext parseContext)
    Attempt to parse the given PDF file.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

  • Constructor Details

    • PDFParser

      public PDFParser(PDFParser.Ability ability)
      Creates a new instance of a PDFParser with the given ability.
      Parameters:
      ability - the ability of the parser.
    • PDFParser

      public PDFParser()
      Creates a new instance of a PDFParser that outputs unstructured text.
  • Method Details

    • getSupportedTypes

      public Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext parseContext)
      Get the supported types of this parser. This parser only supports the type "application/pdf".
      Specified by:
      getSupportedTypes in interface org.apache.tika.parser.Parser
      Parameters:
      parseContext - Parse context. Unused.
      Returns:
      A singleton containing the "application/pdf" type.
    • parse

      public void parse(InputStream inputStream, ContentHandler contentHandler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext parseContext) throws IOException, SAXException, org.apache.tika.exception.TikaException
      Attempt to parse the given PDF file.
      Specified by:
      parse in interface org.apache.tika.parser.Parser
      Parameters:
      inputStream - Input stream. Use an instance of TikaInputStream so that the file path is passed to the parser.
      contentHandler - Content handler.
      metadata - Metadata, used to retrieve the PDF file password, if it has one.
      parseContext - Parse context. Unused.
      Throws:
      IOException - Thrown if the PDF file cannot be opened.
      SAXException - Thrown if there is a problem generating the XHTML document.
      org.apache.tika.exception.TikaException - Thrown if there is a problem parsing the PDF file.
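
      The contentHandler argument receives the generated XHTML document as SAX events. The following is a minimal sketch of a handler that gathers the character data of those events into a single string. It uses only JDK classes. TextCollectingHandler and its collect method are hypothetical names introduced for illustration, and the hand-written XHTML snippet fed through the JDK SAX parser is only a stand-in for the events this parser would emit; the actual structure of the output is not described here.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: collects the character data of SAX events into one string. */
class TextCollectingHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    /** Called by the event source for every run of character data. */
    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    /** Returns all character data received so far. */
    String text() {
        return text.toString();
    }

    /**
     * Stand-in driver (an assumption for this sketch): replays a hand-written
     * XHTML string through the JDK SAX parser so the handler can be exercised
     * without a PDF file or the Tika library on the classpath.
     */
    static String collect(String xhtml) {
        TextCollectingHandler handler = new TextCollectingHandler();
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalStateException("Could not read the XHTML snippet", e);
        }
        return handler.text();
    }
}
```

      Since DefaultHandler implements ContentHandler, an instance of such a class could be supplied as the contentHandler argument of parse, and its text() method read once parse returns.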