Apache POI: How to Read Word Document? - BunksAllowed

In this tutorial, we will discuss how to read a Microsoft Word document.

Dependencies


First, download the dependency files as described in the Introduction to Apache POI For Manipulation of MS Office Documents with Java tutorial. After downloading the libraries, add them to the Java classpath. Alternatively, you can create an Eclipse project and add them to the project as a library. We suggest adding the libraries as internal JARs (copied into the project) rather than as external JARs.

If you are not familiar with library linking, follow these steps.

Create a directory named lib in your project and put the JAR files in it. Right-click on the project, select Properties, go to Java Build Path, click on Add JARs, and browse to the lib directory you have created. It's done!

Explanation of Source Code

    Create an instance of FileInputStream for the Demo.docx file.
    Create an instance of XWPFDocument class for .docx file.
    Create an instance of XWPFWordExtractor class.
    Read the content of the file using xwpfWordExtractor.getText().

Try the following code.



    package com.bunks.demo.poi;

    import java.io.FileInputStream;

    import org.apache.poi.openxml4j.opc.OPCPackage;
    import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;

    public class WordDocumentExtractDemo {

        public static void main(String[] args) {
            try {
                // Open the .docx file as a stream
                FileInputStream fileInputStream = new FileInputStream("Demo.docx");
                // Load the document from the OPC (zip) package
                XWPFDocument xwpfDocument = new XWPFDocument(OPCPackage.open(fileInputStream));
                // The extractor returns the text of the whole document as one string
                XWPFWordExtractor xwpfWordExtractor = new XWPFWordExtractor(xwpfDocument);
                System.out.println(xwpfWordExtractor.getText());
                xwpfWordExtractor.close();
                fileInputStream.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

The following code sample shows how to read both .doc and .docx files paragraph by paragraph.

    package com.t4b.demo.poi;

    import java.io.File;
    import java.io.FileInputStream;
    import java.util.List;

    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.extractor.WordExtractor;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;
    import org.apache.poi.xwpf.usermodel.XWPFParagraph;

    public class WordDocumentReaderDemo {

        // Reads a legacy binary .doc file using the HWPF classes
        public static void readDocFile(String fileName) {
            try {
                File file = new File(fileName);
                FileInputStream fileInputStream = new FileInputStream(file.getAbsolutePath());
                HWPFDocument hwpfDocument = new HWPFDocument(fileInputStream);
                WordExtractor wordExtractor = new WordExtractor(hwpfDocument);
                String[] paragraphs = wordExtractor.getParagraphText();
                System.out.println("Number of paragraphs: " + paragraphs.length);
                for (String para : paragraphs) {
                    System.out.println(para);
                }
                wordExtractor.close();
                fileInputStream.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        // Reads an OOXML .docx file using the XWPF classes
        public static void readDocxFile(String fileName) {
            try {
                File file = new File(fileName);
                FileInputStream fileInputStream = new FileInputStream(file.getAbsolutePath());
                XWPFDocument xwpfDocument = new XWPFDocument(fileInputStream);
                // The element type is required for the typed for-each loop below
                List<XWPFParagraph> paragraphs = xwpfDocument.getParagraphs();
                System.out.println("Number of paragraphs: " + paragraphs.size());
                for (XWPFParagraph paragraph : paragraphs) {
                    System.out.println(paragraph.getText());
                }
                xwpfDocument.close();
                fileInputStream.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        public static void main(String[] args) {
            readDocxFile("Demo.docx");
            readDocFile("Demo.doc");
        }
    }
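
The sample above hard-codes which method handles which file. If the file name is only known at run time, one possible approach is to choose the reader from the file extension. The sketch below is a minimal illustration using only the Java standard library; the class WordFormatDetector, the enum WordFormat, and the method detectFormat are hypothetical names introduced here and are not part of Apache POI. It checks the name only, not the file contents, so a wrongly named file would still be misclassified.

```java
import java.util.Locale;

public class WordFormatDetector {

    // Hypothetical enum for this sketch; not an Apache POI type
    public enum WordFormat { DOC, DOCX, UNKNOWN }

    // Guesses the Word format from the file extension (case-insensitive).
    // The file itself is never opened or inspected.
    public static WordFormat detectFormat(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        // Check ".docx" first for clarity; ".doc" would not match it anyway
        if (lower.endsWith(".docx")) {
            return WordFormat.DOCX;
        }
        if (lower.endsWith(".doc")) {
            return WordFormat.DOC;
        }
        return WordFormat.UNKNOWN;
    }

    public static void main(String[] args) {
        String[] names = {"Demo.docx", "Demo.doc", "Demo.txt"};
        for (String name : names) {
            System.out.println(name + " -> " + detectFormat(name));
        }
    }
}
```

With such a helper, a caller could invoke readDocxFile when the result is DOCX, readDocFile when it is DOC, and report an error otherwise.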


Happy Exploring!
