public class PDFText extends Object
PDFText is the class to extract the list of words contained in a PDF document as a Vector. It also returns the DocumentInfo for the document as well as the page count.

Modifier and Type | Class and Description |
---|---|
static class | PDFText.KeyInfoText: This is the Main-Class for the jPDFText.jar that can generate server key requests, validate a key, and display server information. |

Constructor and Description |
---|
PDFText(InputStream inStream, IPassword password): Creates a PDFText object from a PDF InputStream. |
PDFText(String fileName, IPassword password): Loads a PDFText object from a file. |
PDFText(URL url, IPassword password): Loads a PDFText object from a URL. |

Modifier and Type | Method and Description |
---|---|
void | close(): Closes and releases all resources held by this document. |
Vector<TextPosition> | findText(int pageIndex, String searchText, boolean caseSensitive, boolean wholeWords): Searches a page for text and returns a list of TextPosition objects, one for each occurrence of the string in the page. |
DocumentInfo | getDocumentInfo(): Returns a DocumentInfo object containing the information section of a PDF document (author, title, etc.). |
static DocumentInfo | getDocumentInfo(InputStream inStream, IPassword password): Returns a DocumentInfo object containing the information section of a PDF document (author, title, etc.). |
String | getFileName(): Returns the name of the PDF document. |
Vector<TextPosition> | getLinesWithPositions(int pageIndex): Returns position information for all the lines of text in the specified page of the PDF document. |
int | getPageCount(): Returns the number of pages of the PDF document. |
String | getText(): Returns the text in the PDF document as a String. |
String | getText(int pageIndex): Returns text contained in the specified page of the PDF document as a String. |
String | getTextInArea(int pageIndex, Rectangle2D textArea): Returns text contained in an area of a page. |
String | getTextWithCursors(int pageIndex, Point2D startCursor, Point2D endCursor): Returns the text contained between the start and end cursors, in "reading mode". |
static String | getVersion(): Returns version string for jPDFText. |
Vector<String> | getWords(): Returns all words in the PDF document as a Vector of Strings. |
Vector<String> | getWords(int pageIndex): Returns all words contained in the specified page of the PDF document as a Vector of Strings. |
Vector<TextPosition> | getWordsWithPositions(int pageIndex): Returns position information for all the words in the specified page of the PDF document. |
Vector<TextPosition> | getWordsWithPositions(int pageIndex, String wordSeparators): Returns position information for all the words in the specified page of the PDF document, given a set of word separators. |
static void | loadLicense(InputStream licenseStream): Method to load a license file from an InputStream. |
static void | loadLicense(String licenseFile): Method to load a license file. |
static boolean | setAppletKey(String key, Applet applet): Method to unlock the production version of the library. |
static boolean | setKey(String key): Method to unlock the production version of the library. |
boolean | usePermissionsPassword(String password): Tells the PDFText object that the permissions password is known, so all permissions are granted. |

public PDFText(InputStream inStream, IPassword password) throws PDFException
Parameters:
inStream - InputStream to read the PDF document from.
password - An object that provides passwords to open the document; leave null if not needed. When working with documents that have no passwords, the host application should pass null for the value of this parameter. When documents are known to have passwords, the host application should pass an instance of the PDFPassword class that can hold a single password or a list of passwords.
Throws:
PDFException
public PDFText(String fileName, IPassword password) throws PDFException
Parameters:
fileName - Name of the PDF file.
password - An object that provides passwords to open the document; leave null if not needed. When working with documents that have no passwords, the host application should pass null for the value of this parameter. When documents are known to have passwords, the host application should pass an instance of the PDFPassword class that can hold a single password or a list of passwords.
Throws:
PDFException
public PDFText(URL url, IPassword password) throws PDFException
Parameters:
url - URL pointing to the location of the PDF file.
password - An object that provides passwords to open the document; leave null if not needed. When working with documents that have no passwords, the host application should pass null for the value of this parameter. When documents are known to have passwords, the host application should pass an instance of the PDFPassword class that can hold a single password or a list of passwords.
Throws:
PDFException
public DocumentInfo getDocumentInfo()
public static DocumentInfo getDocumentInfo(InputStream inStream, IPassword password) throws PDFException
Parameters:
inStream - InputStream to read the PDF document from.
password - An object that provides passwords to open the document; leave null if not needed. When working with documents that have no passwords, the host application should pass null for the value of this parameter. When documents are known to have passwords, the host application should pass an instance of the PDFPassword class that can hold a single password or a list of passwords.
Throws:
PDFException
public String getFileName()
public int getPageCount()
public String getText() throws PDFException
Returns the text in the PDF document as a String. Pages are separated with a return char.
Throws:
PDFException
public String getText(int pageIndex) throws PDFException
Parameters:
pageIndex - The 0-based page number; pageIndex = 0 is the first page of the document.
Throws:
PDFException
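
The 0-based page index and the page separator described above can be illustrated with a short loop. This is a minimal sketch, not jPDFText code: PageSource is a hypothetical stand-in that mirrors only getPageCount() and getText(int) so the example compiles without the library, and using '\n' as the "return char" is an assumption.

```java
import java.util.List;

class PageLoopSketch {
    // Hypothetical stand-in mirroring two PDFText methods so the sketch
    // compiles without the jPDFText jar; it is not the library's type.
    interface PageSource {
        int getPageCount();
        String getText(int pageIndex);
    }

    // Visits pages 0 .. getPageCount() - 1 and joins their text.
    // Assumption: '\n' stands in for the "return char" between pages.
    static String joinPages(PageSource doc) {
        StringBuilder all = new StringBuilder();
        for (int pageIndex = 0; pageIndex < doc.getPageCount(); pageIndex++) {
            if (pageIndex > 0) {
                all.append('\n');
            }
            all.append(doc.getText(pageIndex));
        }
        return all.toString();
    }

    // In-memory stand-in used only to exercise the loop.
    static PageSource of(List<String> pages) {
        return new PageSource() {
            public int getPageCount() { return pages.size(); }
            public String getText(int pageIndex) { return pages.get(pageIndex); }
        };
    }

    public static void main(String[] args) {
        System.out.println(joinPages(of(List.of("first page", "second page"))));
    }
}
```

Note that the last valid index is getPageCount() - 1, so the loop condition uses a strict less-than comparison.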
public String getTextInArea(int pageIndex, Rectangle2D textArea) throws PDFException
Parameters:
pageIndex - The 0-based page number; pageIndex = 0 is the first page of the document.
textArea - The area of the page to get text from.
Throws:
PDFException
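
As a rough illustration of what extracting text from an area means, the sketch below keeps only the words whose bounding boxes fall inside a Rectangle2D. It is not the library's implementation: PlacedWord is a hypothetical type, the coordinates are arbitrary, and the choice of full containment (rather than overlap) is an assumption, since the text above does not state which rule jPDFText applies.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

class AreaFilterSketch {
    // Hypothetical word-with-bounds pair, for illustration only.
    static final class PlacedWord {
        final String text;
        final Rectangle2D bounds;

        PlacedWord(String text, Rectangle2D bounds) {
            this.text = text;
            this.bounds = bounds;
        }
    }

    // Keeps words whose bounds lie entirely inside textArea, in input order,
    // and joins them with single spaces.
    static String textInArea(List<PlacedWord> words, Rectangle2D textArea) {
        List<String> kept = new ArrayList<>();
        for (PlacedWord word : words) {
            if (textArea.contains(word.bounds)) {
                kept.add(word.text);
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        List<PlacedWord> words = new ArrayList<>();
        words.add(new PlacedWord("Invoice", new Rectangle2D.Double(10, 10, 40, 12)));
        words.add(new PlacedWord("Total", new Rectangle2D.Double(10, 200, 30, 12)));
        Rectangle2D header = new Rectangle2D.Double(0, 0, 100, 50);
        System.out.println(textInArea(words, header));
    }
}
```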
public String getTextWithCursors(int pageIndex, Point2D startCursor, Point2D endCursor) throws PDFException
Parameters:
startCursor - The location where the cursor starts the selection.
endCursor - The location where the cursor ends the selection.
Throws:
PDFException
public static boolean setAppletKey(String key, Applet applet)
Parameters:
key - Production key.
public static boolean setKey(String key)
Parameters:
key - Production key.
public static void loadLicense(InputStream licenseStream) throws LicenseException
Parameters:
licenseStream - The input stream for the license file contents.
Throws:
LicenseException - If there are any problems with the license file.
public static void loadLicense(String licenseFile) throws LicenseException, IOException
Parameters:
licenseFile - The full path to the license file.
Throws:
LicenseException - If there are any problems with the license file.
IOException
public static String getVersion()
public Vector<String> getWords() throws PDFException
Returns all words in the PDF document as a Vector of Strings.
The default separators used to separate words are the following: ,/;\n><():?&.@*\t
To customize separators, see getWordsWithPositions(int, String).
Throws:
PDFException
public Vector<String> getWords(int pageIndex) throws PDFException
See getWordsWithPositions(int, String).
Parameters:
pageIndex - The 0-based page number; pageIndex = 0 is the first page of the document.
Throws:
PDFException
public Vector<TextPosition> getLinesWithPositions(int pageIndex) throws PDFException
Parameters:
pageIndex - The 0-based page number; pageIndex = 0 is the first page of the document.
Throws:
PDFException
public Vector<TextPosition> getWordsWithPositions(int pageIndex) throws PDFException
See getWordsWithPositions(int, String).
Parameters:
pageIndex - The 0-based page number; pageIndex = 0 is the first page of the document.
Throws:
PDFException
public Vector<TextPosition> getWordsWithPositions(int pageIndex, String wordSeparators) throws PDFException
Parameters:
pageIndex - The 0-based page number; pageIndex = 0 is the first page of the document.
wordSeparators - A list of single-character word separators.
Throws:
PDFException
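
To show how a list of single-character separators divides text into words, here is a minimal sketch using the JDK's StringTokenizer. It does not call jPDFText and is not its implementation. The default separator list is the one given in the getWords() description on this page; adding a space to that list in main is an assumption made so that ordinary prose splits as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class WordSeparatorSketch {
    // Default separators as listed in the getWords() description.
    static final String DEFAULT_SEPARATORS = ",/;\n><():?&.@*\t";

    // Splits text on any single character found in 'separators';
    // empty tokens between adjacent separators are dropped.
    static List<String> split(String text, String separators) {
        List<String> words = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(text, separators);
        while (tokens.hasMoreTokens()) {
            words.add(tokens.nextToken());
        }
        return words;
    }

    public static void main(String[] args) {
        // Assumption: a space is appended so that prose splits on blanks too.
        String separators = DEFAULT_SEPARATORS + " ";
        System.out.println(split("sales@example.com (call: 555)", separators));
    }
}
```

Because each character in the string is treated as its own separator, a multi-character delimiter cannot be expressed this way.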
public Vector<TextPosition> findText(int pageIndex, String searchText, boolean caseSensitive, boolean wholeWords) throws PDFException
Parameters:
pageIndex - The index of the page to search in.
searchText - The text to search for.
caseSensitive - Flag indicating whether the search should be case sensitive.
wholeWords - Flag indicating whether the search should only look at whole words.
Throws:
PDFException
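
The effect of the caseSensitive and wholeWords flags can be sketched on a plain string with java.util.regex. This is an illustration of the flag semantics under stated assumptions, not the library's search code: it returns character offsets rather than TextPosition objects, and it treats a "whole word" as a match bounded by regex word boundaries, which may differ from jPDFText's definition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindFlagsSketch {
    // Returns the start offsets of searchText within pageText.
    static List<Integer> find(String pageText, String searchText,
                              boolean caseSensitive, boolean wholeWords) {
        // Quote the search text so regex metacharacters match literally.
        String regex = Pattern.quote(searchText);
        if (wholeWords) {
            // Assumption: "whole word" means bounded by \b on both sides.
            regex = "\\b" + regex + "\\b";
        }
        int flags = caseSensitive ? 0 : Pattern.CASE_INSENSITIVE;
        Matcher matcher = Pattern.compile(regex, flags).matcher(pageText);
        List<Integer> offsets = new ArrayList<>();
        while (matcher.find()) {
            offsets.add(matcher.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        String page = "Text context TEXT";
        System.out.println(find(page, "text", false, true));
    }
}
```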
public boolean usePermissionsPassword(String password) throws PDFException
Parameters:
password - The permissions password.
Throws:
PDFException
public void close()