Class HWPFDocumentCore

java.lang.Object
org.apache.poi.POIDocument
org.apache.poi.hwpf.HWPFDocumentCore
All Implemented Interfaces:
Closeable, AutoCloseable
Direct Known Subclasses:
HWPFDocument, HWPFOldDocument

public abstract class HWPFDocumentCore extends POIDocument
This class holds much of the core of a Word document, but without some of the table structure information. You generally want to work with one of HWPFDocument or HWPFOldDocument.
  • Field Details

    • STREAM_OBJECT_POOL

      protected static final String STREAM_OBJECT_POOL
    • STREAM_WORD_DOCUMENT

      protected static final String STREAM_WORD_DOCUMENT
    • STREAM_TABLE_0

      protected static final String STREAM_TABLE_0
    • STREAM_TABLE_1

      protected static final String STREAM_TABLE_1
    • FIB_BASE_LEN

      protected static final int FIB_BASE_LEN
      Size of the unencrypted part of the FIB.
    • RC4_REKEYING_INTERVAL

      protected static final int RC4_REKEYING_INTERVAL
      [MS-DOC] 2.2.6.2/3 Office Binary Document ... Encryption: "... The block number MUST be set to zero at the beginning of the stream and MUST be incremented at each 512 byte boundary. ..."
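
The block-numbering rule quoted above can be illustrated with a small, library-independent sketch. It is not POI's implementation: the class name `Rc4BlockSketch`, its methods, and the local constant are hypothetical, and the value 512 is taken only from the quoted specification text.

```java
// Illustrative sketch only; names here are hypothetical and not part of POI.
class Rc4BlockSketch {
    // From the quoted spec text: the block number advances at each 512-byte boundary.
    static final int REKEYING_INTERVAL = 512;

    // Block number that applies to the byte at the given offset in the stream.
    public static int blockNumber(long streamOffset) {
        if (streamOffset < 0) {
            throw new IllegalArgumentException("offset must be non-negative");
        }
        return (int) (streamOffset / REKEYING_INTERVAL);
    }

    // Number of bytes that can still be processed before the next re-keying point.
    public static int bytesUntilRekey(long streamOffset) {
        if (streamOffset < 0) {
            throw new IllegalArgumentException("offset must be non-negative");
        }
        return REKEYING_INTERVAL - (int) (streamOffset % REKEYING_INTERVAL);
    }

    public static void main(String[] args) {
        long[] offsets = {0, 511, 512, 1300};
        for (long offset : offsets) {
            System.out.println("offset " + offset
                    + " -> block " + blockNumber(offset)
                    + ", bytes until rekey " + bytesUntilRekey(offset));
        }
    }
}
```

A cipher wrapper following this rule would re-initialize its key with the new block number each time `bytesUntilRekey` bytes have been consumed.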
    • _objectPool

      protected ObjectPoolImpl _objectPool
      Holds OLE2 objects
    • _fib

      protected FileInformationBlock _fib
      The File Information Block (FIB).
    • _ss

      protected StyleSheet _ss
      Holds styles for this document.
    • _cbt

      protected CHPBinTable _cbt
      Contains formatting properties for text
    • _pbt

      protected PAPBinTable _pbt
      Contains formatting properties for paragraphs
    • _st

      protected SectionTable _st
      Contains formatting properties for sections.
    • _ft

      protected FontTable _ft
      Holds fonts for this document.
    • _lt

      protected ListTables _lt
      Holds list tables.
    • _mainStream

      protected byte[] _mainStream
      Main document stream buffer.
  • Constructor Details

    • HWPFDocumentCore

      protected HWPFDocumentCore()
    • HWPFDocumentCore

      public HWPFDocumentCore(InputStream istream) throws IOException
      This constructor loads a Word document from an InputStream.
      Parameters:
      istream - The InputStream that contains the Word document.
      Throws:
      IOException - If there is an unexpected IOException from the passed in InputStream.
    • HWPFDocumentCore

      public HWPFDocumentCore(POIFSFileSystem pfilesystem) throws IOException
      This constructor loads a Word document from a POIFSFileSystem
      Parameters:
      pfilesystem - The POIFSFileSystem that contains the Word document.
      Throws:
      IOException - If there is an unexpected IOException from the passed in POIFSFileSystem.
    • HWPFDocumentCore

      public HWPFDocumentCore(DirectoryNode directory) throws IOException
      This constructor loads a Word document from a specific directory within a POIFSFileSystem, which need not be the root. Typically used to open embedded documents.
      Parameters:
      directory - The DirectoryNode that contains the Word document.
      Throws:
      IOException - If there is an unexpected IOException from the passed in POIFSFileSystem.
  • Method Details

    • verifyAndBuildPOIFS

      public static POIFSFileSystem verifyAndBuildPOIFS(InputStream istream) throws IOException
      Takes an InputStream, verifies that it's not RTF or PDF, builds a POIFSFileSystem from it, and returns that.
      Throws:
      IOException
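
To make the "not RTF or PDF" check concrete, here is a minimal sketch of how a stream's leading bytes can be inspected without consuming them. It is an assumption about the general technique, not POI's actual code: the class `MagicSniffSketch`, the `Kind` enum, and both methods are hypothetical. The signatures used are the well-known ones (`{\rtf` for RTF, `%PDF` for PDF, and the 8-byte OLE2 compound file header).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; names here are hypothetical and not part of POI.
class MagicSniffSketch {
    enum Kind { RTF, PDF, OLE2, UNKNOWN }

    private static final byte[] RTF_MAGIC = "{\\rtf".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] PDF_MAGIC = "%PDF".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Peeks at the first bytes and rewinds, so the caller can still read from the start.
    public static Kind sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] head = new byte[OLE2_MAGIC.length];
        in.mark(head.length);
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break; // stream is shorter than the longest signature
            }
            n += r;
        }
        in.reset();
        if (startsWith(head, n, RTF_MAGIC)) return Kind.RTF;
        if (startsWith(head, n, PDF_MAGIC)) return Kind.PDF;
        if (startsWith(head, n, OLE2_MAGIC)) return Kind.OLE2;
        return Kind.UNKNOWN;
    }

    // Convenience wrapper for in-memory data.
    public static Kind sniffBytes(byte[] data) {
        try {
            return sniff(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static boolean startsWith(byte[] data, int valid, byte[] prefix) {
        if (valid < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.7".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniffBytes(pdf));
    }
}
```

A caller would reject `RTF` and `PDF` results with a descriptive error before attempting to parse the stream as an OLE2 container.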
    • getRange

      public abstract Range getRange()
      Returns the range which covers the whole of the document, but excludes any headers and footers.
    • getOverallRange

      public abstract Range getOverallRange()
      Returns the range that covers all text in the file, including main text, footnotes, headers and comments.
    • getDocumentText

      public String getDocumentText()
      Returns document text, i.e. text information from all text pieces, including OLE descriptions and field codes.
    • getText

      @Internal public abstract StringBuilder getText()
      Internal method to access document text.
    • getCharacterTable

      public CHPBinTable getCharacterTable()
    • getParagraphTable

      public PAPBinTable getParagraphTable()
    • getSectionTable

      public SectionTable getSectionTable()
    • getStyleSheet

      public StyleSheet getStyleSheet()
    • getListTables

      public ListTables getListTables()
    • getFontTable

      public FontTable getFontTable()
    • getFileInformationBlock

      public FileInformationBlock getFileInformationBlock()
    • getObjectsPool

      public ObjectsPool getObjectsPool()
    • getTextTable

      public abstract TextPieceTable getTextTable()
    • getMainStream

      @Internal public byte[] getMainStream()
    • getEncryptionInfo

      public EncryptionInfo getEncryptionInfo() throws IOException
      Overrides:
      getEncryptionInfo in class POIDocument
      Returns:
      the encryption info if the document is encrypted, otherwise null
      Throws:
      IOException - If retrieving the encryption information fails
    • updateEncryptionInfo

      protected void updateEncryptionInfo()
    • getDocumentEntryBytes

      protected byte[] getDocumentEntryBytes(String name, int encryptionOffset, int len) throws IOException
      Reads an OLE stream into a byte array. If an EncryptionInfo is available, the bytes are decrypted starting at encryptionOffset; if encryptionOffset is -1, no decryption is attempted.
      Parameters:
      name - the name of the stream
      encryptionOffset - the offset from which to start decrypting, use -1 for no decryption
      len - length of the bytes to be read, use Integer.MAX_VALUE for all bytes
      Returns:
      the read bytes
      Throws:
      IOException - if the stream can't be found
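
The parameter conventions described above (`-1` disables decryption, `Integer.MAX_VALUE` reads everything) can be sketched without POI. Everything in this example is hypothetical: the class `EntryBytesSketch`, the `readEntry` method, the `encrypted` flag standing in for "an EncryptionInfo is available", and the toy XOR transform standing in for a real cipher.

```java
import java.util.Arrays;

// Illustrative sketch only; names here are hypothetical and not part of POI.
class EntryBytesSketch {
    // Stand-in for a real cipher: XOR every byte with a fixed key byte.
    static final byte TOY_KEY = 0x5A;

    public static byte[] readEntry(byte[] stream, int encryptionOffset, int len, boolean encrypted) {
        if (len < 0 || encryptionOffset < -1) {
            throw new IllegalArgumentException("len must be >= 0 and encryptionOffset >= -1");
        }
        // Integer.MAX_VALUE (or any oversized len) is clamped to the stream length.
        int n = Math.min(len, stream.length);
        byte[] out = Arrays.copyOf(stream, n);
        // Bytes before encryptionOffset stay untouched; -1 skips decryption entirely.
        if (encrypted && encryptionOffset != -1) {
            for (int i = encryptionOffset; i < n; i++) {
                out[i] ^= TOY_KEY;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] stream = {1, 2, 3, 4};
        System.out.println(Arrays.toString(readEntry(stream, -1, Integer.MAX_VALUE, true)));
        System.out.println(Arrays.toString(readEntry(stream, 2, Integer.MAX_VALUE, true)));
    }
}
```

The offset parameter matters because part of a stream can be stored in the clear, as the FIB_BASE_LEN field above indicates for the start of the FIB.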