In this tutorial, we will discuss how to read a Microsoft Word document (.doc or .docx) in Java using Apache POI.
Dependencies
First, download the dependency files as described in the Introduction to Apache POI For Manipulation of MS Office Documents with Java tutorial. After downloading the libraries, add them to the Java classpath. Alternatively, create an Eclipse project and add them to the project as a library. We recommend adding them as internal libraries (jars stored inside the project) rather than as external libraries.
If you are not familiar with library linking, follow these steps.
Create a directory named lib in your project and put the jars in it. Right-click the project, select Properties, go to Java Build Path, click Add JARs, and browse to the lib directory you created. That's it!
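To confirm that the jars were actually picked up, a quick check like the one below may help. It is only a sketch: the class name ClasspathCheck and the helper isOnClasspath are made up for this example, and it simply asks the class loader for one of the POI classes used later in this tutorial.

```java
public class ClasspathCheck {

    // Hypothetical helper: returns true if the named class can be loaded.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class itself, or something it depends on, is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        String poiClass = "org.apache.poi.xwpf.usermodel.XWPFDocument";
        if (isOnClasspath(poiClass)) {
            System.out.println(poiClass + " found");
        } else {
            System.out.println(poiClass + " NOT found; check the build path");
        }
    }
}
```

If the class is reported as not found, revisit the Java Build Path settings before running the samples.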
Explanation of Source Code
Create an instance of FileInputStream for the Demo.docx file.
Create an instance of the XWPFDocument class for the .docx file.
Create an instance of the XWPFWordExtractor class.
Read the content of the file using xwpfWordExtractor.getText().
Try the following code.
package com.bunks.demo.poi;

import java.io.FileInputStream;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class WordDocumentExtractDemo {
    public static void main(String[] args) {
        try {
            // Open the .docx file and wrap it in a POI document
            FileInputStream fileInputStream = new FileInputStream("Demo.docx");
            XWPFDocument xwpfDocument = new XWPFDocument(OPCPackage.open(fileInputStream));
            // The extractor returns the whole document text as a single String
            XWPFWordExtractor xwpfWordExtractor = new XWPFWordExtractor(xwpfDocument);
            System.out.println(xwpfWordExtractor.getText());
            // Release the extractor and the underlying stream
            xwpfWordExtractor.close();
            fileInputStream.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
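The extractor's getText() hands back the entire document as one String. If individual lines are needed, a sketch like the following could be used. It assumes the extracted text separates paragraphs with line breaks; the class ExtractedTextDemo, the helper nonBlankLines, and the sample string (which stands in for real getText() output) are all made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ExtractedTextDemo {

    // Hypothetical helper: split text on any line break and drop blank lines.
    static List<String> nonBlankLines(String text) {
        return Arrays.stream(text.split("\\R"))
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in for the String returned by xwpfWordExtractor.getText()
        String sample = "First paragraph\n\nSecond paragraph\n";
        List<String> lines = nonBlankLines(sample);
        System.out.println("Number of lines: " + lines.size());
        lines.forEach(System.out::println);
    }
}
```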
The following code sample shows how to read .doc and .docx files paragraph by paragraph.
package com.t4b.demo.poi;

import java.io.File;
import java.io.FileInputStream;
import java.util.List;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

public class WordDocumentReaderDemo {

    // Reads a binary .doc file using the HWPF classes
    public static void readDocFile(String fileName) {
        try {
            File file = new File(fileName);
            FileInputStream fileInputStream = new FileInputStream(file.getAbsolutePath());
            HWPFDocument hwpfDocument = new HWPFDocument(fileInputStream);
            WordExtractor wordExtractor = new WordExtractor(hwpfDocument);
            String[] paragraphs = wordExtractor.getParagraphText();
            System.out.println("Number of paragraphs: " + paragraphs.length);
            for (String para : paragraphs) {
                System.out.println(para);
            }
            wordExtractor.close();
            fileInputStream.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Reads an XML-based .docx file using the XWPF classes
    public static void readDocxFile(String fileName) {
        try {
            File file = new File(fileName);
            FileInputStream fileInputStream = new FileInputStream(file.getAbsolutePath());
            XWPFDocument xwpfDocument = new XWPFDocument(fileInputStream);
            // The element type is required; a raw List would not compile in the loop below
            List<XWPFParagraph> paragraphs = xwpfDocument.getParagraphs();
            System.out.println("Number of paragraphs: " + paragraphs.size());
            for (XWPFParagraph paragraph : paragraphs) {
                System.out.println(paragraph.getText());
            }
            xwpfDocument.close();
            fileInputStream.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        readDocxFile("Demo.docx");
        readDocFile("Demo.doc");
    }
}
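Because .doc and .docx files need different reader classes, it can be handy to decide which one to call from the file name. The sketch below is one possible way to do that. The class WordFormatDemo, the enum WordFormat, and the helper detectFormat are hypothetical names introduced for this example, and the check looks only at the extension, not at the file contents.

```java
import java.util.Locale;

public class WordFormatDemo {

    // Hypothetical enum naming the two Word formats covered in this tutorial.
    enum WordFormat { DOC, DOCX, UNKNOWN }

    // Hypothetical helper: guess the format from the file extension only.
    static WordFormat detectFormat(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".docx")) {
            return WordFormat.DOCX;
        }
        if (lower.endsWith(".doc")) {
            return WordFormat.DOC;
        }
        return WordFormat.UNKNOWN;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"Demo.docx", "Demo.doc", "Demo.xlsx"}) {
            System.out.println(name + " -> " + detectFormat(name));
        }
    }
}
```

In a program like WordDocumentReaderDemo, DOCX would then be routed to readDocxFile and DOC to readDocFile. An extension is only a hint, so a renamed file can still fail when it is opened.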