java.lang.Object
- org.faceless.pdf2.PageExtractor.Text

All Implemented Interfaces:

Comparable<PageExtractor.Text>

Enclosing class:

PageExtractor
```
public abstract class PageExtractor.Text
extends Object
implements Comparable<PageExtractor.Text>
```
A class representing a piece of text which is extracted from the PageExtractor. Each text object has a location on the page, font-size, font-name, color and text.

Since:

2.6.2

Constructor Summary

Constructors
Constructor Description

Text()

Method Summary

Modifier and Type	Method	Description
`AnnotationMarkup`	`createAnnotationMarkup(String type)`	Create a new `AnnotationMarkup` of the specified type to cover this text.
`float`	`getAngle()`	Return the angle of rotation of this text on the page, in degrees clockwise from 12 o'clock.
`abstract float`	`getBaseline()`	Return the baseline of the text item, as a fraction between 0 and 1. 0 would indicate the baseline is at the top of the text, 1 at the absolute bottom.
`abstract int`	`getByteLength()`	Get the length of the original text in bytes.
`abstract int`	`getByteToCharOffset(int byteoffset)`	Given a byte offset into the original String, return the Character offset it refers to.
`abstract Paint`	`getColor()`	Return the color of this text, or `null` if none was set
`float[]`	`getCorners()`	Return the four corners (x1,y1) (x2,y2) (x3,y3) (x4,y4) of the quadrilateral that encompasses the text.
`abstract float`	`getEndOffset(int pos)`	As for `getOffset()` but return the end position of that letter
`abstract Reader`	`getFontMetaData()`	Return any XMP MetaData that has been set on the Font, or `null` if none exists.
`abstract String`	`getFontName()`	Return the font name of this text
`abstract float`	`getFontSize()`	Return the font size of this text in points
`abstract float`	`getHorizontalScale()`	Return an indication of the horizontal scale of the text.
`float`	`getLength()`	Return the length of this Text in points.
`abstract Paint`	`getLineColor()`	Return the outline color of this text, or `null` if none was set
`abstract String`	`getNormalizedText()`	Return a normalized form of the text, for text comparison purposes while searching.
`abstract float`	`getOffset(int pos)`	Given an offset into the text, return the start position of that letter.
`PDFPage`	`getPage()`	Return the `PDFPage` this text was found on - simply the page the parent `PageExtractor` was created from.
`PageExtractor`	`getPageExtractor()`	Return the `PageExtractor` this text was created from
`abstract PageExtractor.Text`	`getPrimaryText()`	If this text is a subtext or a collection of Text objects, return the primary text it starts with.
`abstract int`	`getPrimaryTextOffset()`	If this text is a subtext or a collection of Text objects, return the offset into the `primary text` where it starts.
`abstract PageExtractor.Text`	`getRowNext()`	Return the next Text item in this row, or `null` if there are none
`abstract PageExtractor.Text`	`getRowPrevious()`	Return the previous Text item in this row, or `null` if there are none
`abstract PageExtractor.Text`	`getSubText(int off, int len)`	Return a substring of this Text object as another Text object
`abstract String`	`getText()`	Return the text content of this text
`abstract int`	`getTextLength()`	Return the length of the String returned by `getText()`
`abstract Shape`	`getVisualBounds()`	Return the visual bounds of this text.
`abstract boolean`	`isHorizontal()`	Indicates whether this text is horizontal or vertical.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface java.lang.Comparable
compareTo

- Constructor Detail
  - Text
```
public Text()
```
- Method Detail
  - getLength
```
public float getLength()
```
    Return the length of this Text in points. This method measures the baseline of the text, so for rotated text the value will always be positive regardless of the angle.
    
    Returns:
    
    the length of the text in points at its baseline
  - getCorners
```
public final float[] getCorners()
```
    Return the four corners (x1,y1) (x2,y2) (x3,y3) (x4,y4) of the quadrilateral that encompasses the text. The order of these corners is as follows. For horizontal text: bottom-left, top-left, top-right, bottom-right. For vertical text: top-left, top-right, bottom-right, bottom-left. For horizontal text, the text baseline runs from (x1,y1) to (x4,y4).
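    As a rough illustration of working with the corner list, the sketch below derives an axis-aligned box from the four corners. It assumes the array holds eight floats in the order x1, y1, x2, y2, x3, y3, x4, y4; the source lists the corners in that order but does not spell out the array layout. `CornerBounds` and `boundsOf` are hypothetical names, not part of the library.

```java
/** Hypothetical helper, not part of the library. */
final class CornerBounds {

    /**
     * Returns {minX, minY, maxX, maxY} for a quadrilateral given as
     * {x1, y1, x2, y2, x3, y3, x4, y4} (assumed array layout).
     */
    static float[] boundsOf(float[] corners) {
        if (corners == null || corners.length != 8) {
            throw new IllegalArgumentException("expected 8 values");
        }
        float minX = corners[0], maxX = corners[0];
        float minY = corners[1], maxY = corners[1];
        // Walk the remaining three (x, y) pairs and widen the box as needed.
        for (int i = 2; i < 8; i += 2) {
            minX = Math.min(minX, corners[i]);
            maxX = Math.max(maxX, corners[i]);
            minY = Math.min(minY, corners[i + 1]);
            maxY = Math.max(maxY, corners[i + 1]);
        }
        return new float[] { minX, minY, maxX, maxY };
    }

    public static void main(String[] args) {
        // Sample horizontal text: bottom-left, top-left, top-right, bottom-right.
        float[] corners = { 100f, 700f, 100f, 712f, 180f, 712f, 180f, 700f };
        float[] box = boundsOf(corners);
        System.out.println(box[0] + ", " + box[1] + ", " + box[2] + ", " + box[3]);
    }
}
```

    With a real Text object the argument would presumably be `text.getCorners()`. For rotated text the resulting box is larger than the quadrilateral itself.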
  - createAnnotationMarkup
```
public AnnotationMarkup createAnnotationMarkup(String type)
```
    Create a new AnnotationMarkup of the specified type to cover this text. The annotation is not added to the page
    
    Parameters:
    
    type - the type of markup - "Highlight", "Underline" etc.
    
    Since:
    
    2.8
  - getAngle
```
public final float getAngle()
```
    Return the angle of rotation of this text on the page, in degrees clockwise from 12 o'clock. Most text is not rotated and so will return 0.
    
    Returns:
    
    the angle of the text
  - getFontSize
```
public abstract float getFontSize()
```
    Return the font size of this text in points
  - isHorizontal
```
public abstract boolean isHorizontal()
```
    Indicates whether this text is horizontal or vertical. Note that vertical text will never be successfully positioned in the methods on this class that attempt to convert PDF text content into plain text.
    
    Since:
    
    2.18.3
  - getHorizontalScale
```
public abstract float getHorizontalScale()
```
    Return an indication of the horizontal scale of the text. Typically this will be a value of 1; a value of 2 would mean the text had been stretched to double its natural width
    
    Since:
    
    2.18.1
  - getBaseline
```
public abstract float getBaseline()
```
    Return the baseline of the text item, as a fraction between 0 and 1. 0 would indicate the baseline is at the top of the text, 1 at the absolute bottom. The value will normally be 0.8
    
    Since:
    
    2.11.7
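    To make the fraction concrete, the sketch below interpolates between the top and bottom edges of the text to find the baseline coordinate. The caller supplies both edges; the source does not say which method yields them, so treat the inputs as an assumption. `BaselinePosition` and `baselineY` are hypothetical names.

```java
/** Hypothetical helper, not part of the library. */
final class BaselinePosition {

    /**
     * Linear interpolation between the top and bottom edges:
     * fraction 0 gives the top edge, fraction 1 gives the bottom edge.
     */
    static float baselineY(float top, float bottom, float fraction) {
        if (fraction < 0f || fraction > 1f) {
            throw new IllegalArgumentException("fraction must be between 0 and 1");
        }
        return top + fraction * (bottom - top);
    }

    public static void main(String[] args) {
        // Sample edges; 0.8f is the value the documentation calls normal.
        System.out.println(baselineY(712f, 700f, 0.8f));
    }
}
```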
  - getOffset
```
public abstract float getOffset(int pos)
```
    Given an offset into the text, return the start position of that letter. Because text may not be on a horizontal line, this value is returned as a float in the range 0 to 1 (0 being the start of the text, 1 being the end). For the common case where text is horizontal, you can calculate its start position like so:
    float left = text.getCorners()[0] + (text.getOffset(pos) * text.getLength());
    
    Parameters:
    
    pos - the position of the letter in the Text to retrieve the position for, in the range 0 to getText().length() - 1
    
    Since:
    
    2.6.12
  - getEndOffset
```
public abstract float getEndOffset(int pos)
```
    As for getOffset() but return the end position of that letter
    
    Since:
    
    2.16.1
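    Combining the start and end fractions gives the horizontal extent of a single letter. The sketch below applies the formula quoted under getOffset to plain numbers, so it only covers horizontal text. `GlyphExtent`, `xAt` and `extent` are hypothetical names; the arguments stand in for the values of `getCorners()[0]`, `getLength()`, `getOffset(pos)` and `getEndOffset(pos)`.

```java
/** Hypothetical helper, not part of the library. */
final class GlyphExtent {

    /** Applies the documented formula: left + fraction * length. */
    static float xAt(float left, float length, float fraction) {
        return left + fraction * length;
    }

    /** Returns {startX, endX} for one letter of horizontal text. */
    static float[] extent(float left, float length, float startFraction, float endFraction) {
        return new float[] {
            xAt(left, length, startFraction),
            xAt(left, length, endFraction)
        };
    }

    public static void main(String[] args) {
        // Sample values: text starting at x=100, 80 points long,
        // with a letter spanning fractions 0.25 to 0.5.
        float[] x = extent(100f, 80f, 0.25f, 0.5f);
        System.out.println(x[0] + " to " + x[1]);
    }
}
```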
  - getPage
```
public PDFPage getPage()
```
    Return the PDFPage this text was found on - simply the page the parent PageExtractor was created from.
    
    Since:
    
    2.6.12
  - getPageExtractor
```
public PageExtractor getPageExtractor()
```
    Return the PageExtractor this text was created from
    
    Since:
    
    2.10.3
  - getColor
```
public abstract Paint getColor()
```
    Return the color of this text, or null if none was set
    
    Returns:
    
    the color
  - getLineColor
```
public abstract Paint getLineColor()
```
    Return the outline color of this text, or null if none was set
    
    Returns:
    
    the outline color
    
    Since:
    
    2.17.1
  - getFontName
```
public abstract String getFontName()
```
    Return the font name of this text
    
    Returns:
    
    the name of the font
  - getText
```
public abstract String getText()
```
    Return the text content of this text
    
    Returns:
    
    the text
  - getNormalizedText
```
public abstract String getNormalizedText()
```
    Return a normalized form of the text, for text comparison purposes while searching. Normalization is done by converting to NFKD form and removing all diacritics.
    
    Returns:
    
    the normalized text
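    To search for a phrase you would normally normalize the search term the same way before comparing. The sketch below approximates the described normalization with the standard `java.text.Normalizer`: decompose to NFKD, then drop combining marks. It is an assumption that this matches the library's output exactly; `SearchNormalizer` is a hypothetical name.

```java
import java.text.Normalizer;

/** Hypothetical helper, not part of the library. */
final class SearchNormalizer {

    /** NFKD decomposition followed by removal of combining marks. */
    static String normalize(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFKD);
        // \p{M} matches the combining marks that NFKD splits off base letters.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // An accented letter and the "fi" ligature, written as escapes.
        System.out.println(normalize("caf\u00e9 \ufb01le"));
    }
}
```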
  - getTextLength
```
public abstract int getTextLength()
```
    Return the length of the String returned by getText()
    
    Since:
    
    2.11.7
  - getRowNext
```
public abstract PageExtractor.Text getRowNext()
```
    Return the next Text item in this row, or null if there are none
    
    Since:
    
    2.10.3
  - getRowPrevious
```
public abstract PageExtractor.Text getRowPrevious()
```
    Return the previous Text item in this row, or null if there are none
    
    Since:
    
    2.10.3
  - getFontMetaData
```
public abstract Reader getFontMetaData()
                                throws IOException
```
    Return any XMP MetaData that has been set on the Font, or null if none exists.
    Since 2.24.3, the returned type is guaranteed to have a toString() method that will return the metadata as a String.
    
    Throws:
    
    IOException
    
    Since:
    
    2.11.6
    
    See Also:
    
    PDF.getMetaData()
  - getSubText
```
public abstract PageExtractor.Text getSubText(int off,
                                              int len)
```
    Return a substring of this Text object as another Text object
    
    Parameters:
    
    off - the offset into the text
    
    len - the number of characters to return
    
    Since:
    
    2.11.7
  - getPrimaryText
```
public abstract PageExtractor.Text getPrimaryText()
```
    If this text is a subtext or a collection of Text objects, return the primary text it starts with. If not, returns null
    
    Since:
    
    2.11.7
  - getPrimaryTextOffset
```
public abstract int getPrimaryTextOffset()
```
    If this text is a subtext or a collection of Text objects, return the offset into the primary text where it starts. If not, returns 0
    
    Since:
    
    2.11.7
  - getByteLength
```
public abstract int getByteLength()
```
    Get the length of the original text in bytes. This method is required because the Highlight File Format contains references to the byte offset into the string, not the character offset (as it states).
    
    Since:
    
    2.11.12
  - getByteToCharOffset
```
public abstract int getByteToCharOffset(int byteoffset)
```
    Given a byte offset into the original String, return the Character offset it refers to.
    
    Since:
    
    2.11.12
    
    See Also:
    
    getByteLength()
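    Byte and character offsets diverge whenever a character occupies more than one byte. The sketch below shows the idea using UTF-8 purely as a stand-in encoding; the original encoding of text in a PDF depends on the font, and the library's implementation is not shown in the source. `ByteOffsets` and its method are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of the library. */
final class ByteOffsets {

    /**
     * Maps a byte offset in the UTF-8 encoding of s to the index of the
     * char that contains that byte. An offset equal to the total byte
     * length maps to s.length().
     */
    static int byteToCharOffset(String s, int byteOffset) {
        if (byteOffset < 0) {
            throw new IndexOutOfBoundsException("negative offset");
        }
        int bytes = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            int len = new String(Character.toChars(cp))
                    .getBytes(StandardCharsets.UTF_8).length;
            if (byteOffset < bytes + len) {
                return i; // the offset falls inside this character
            }
            bytes += len;
            i += Character.charCount(cp);
        }
        if (byteOffset == bytes) {
            return s.length();
        }
        throw new IndexOutOfBoundsException("offset past end of text");
    }

    public static void main(String[] args) {
        // "a", e-acute (two bytes in UTF-8), "b"
        System.out.println(byteToCharOffset("a\u00e9b", 3));
    }
}
```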
  - getVisualBounds
```
public abstract Shape getVisualBounds()
```
    Return the visual bounds of this text. This should be a shape that just encloses the visual edges of the glyphs. If the text is rotated, it will be a generic shape, but if the text is horizontal the shape will be a Rectangle2D object.
    
    Since:
    
    2.16.1
