Java – Read and Write Microsoft Word with Apache POI
In this article, we’ll learn how to read and write Microsoft Word documents using the Apache POI library.
Topics
- Apache POI – HWPF and XWPF
- Apache POI dependencies
- Create a Word document
- Working with Paragraphs, Fonts, and Page breaks
- Document Header and Footer
- Add Image
- Create a table
- Read and parse document, paragraph, and table
- Update document
P.S. Tested with Apache POI 4.1.2 and Java 8.
1. Apache POI – HWPF and XWPF
Apache POI provides Java APIs for working with Microsoft Office documents. We will cover paragraphs, fonts, headers, footers, tables, and images, as well as writing, reading, parsing, and updating documents. Before exploring the use cases, let’s cover the basic concepts of the Apache POI library.
1.1 HWPF (Horrible Word Processor Format)
These APIs are for the older binary .doc format used by Word 97-2003, with limited read-only support for the Word 6 and Word 95 file formats.
1.2 XWPF (XML Word Processor Format)
These APIs are for the newer .docx format introduced with Word 2007.
Component | Description |
---|---|
XWPFDocument | Provides high-level APIs to create and edit .docx files |
XWPFParagraph | Used for working with paragraphs |
XWPFRun | Defines a region of text with a common set of properties; also used for editing font properties |
XWPFTable | Represents a table in the document |
XWPFTableRow | Represents a row within a table |
XWPFWordExtractor | Extracts the text from a .docx file |
P.S. This article focuses on the newer XWPF APIs for working with .docx documents.
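The two APIs target different file containers: a .doc file is an OLE2 compound document, while a .docx file is a ZIP package of XML parts. It can therefore help to check a file's first bytes before choosing HWPF or XWPF, instead of trusting the file extension. The following is a minimal plain-Java sketch; the class `WordFormatSniffer` and its `detect` method are hypothetical names, not part of Apache POI. It only identifies the container, so an Excel .xls or .xlsx file would be reported the same way.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper (not a POI class): guesses HWPF vs XWPF from the file's leading bytes
class WordFormatSniffer {

    // OLE2 compound document signature, used by binary .doc files (HWPF)
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};

    // ZIP local file header signature; .docx files are ZIP packages (XWPF)
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    static String detect(byte[] header) {
        if (startsWith(header, OLE2)) return "HWPF (.doc)";
        if (startsWith(header, ZIP)) return "XWPF (.docx)";
        return "unknown";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        // read only the first 8 bytes of the file given on the command line
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            byte[] header = new byte[8];
            int n = in.read(header);
            System.out.println(detect(Arrays.copyOf(header, Math.max(n, 0))));
        }
    }
}
```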
2. Apache POI dependencies
We only need the poi-ooxml dependency to work with Microsoft Word .docx files.
```xml
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>4.1.2</version>
</dependency>
```
3. Create a Word document
This example shows how to create a Microsoft Word .docx file. We use XWPFParagraph to create paragraphs and XWPFRun to create runs of text.
```java
package com.mkyong.poi.word;

import org.apache.poi.xwpf.usermodel.ParagraphAlignment;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

import java.io.FileOutputStream;
import java.io.IOException;

public class CreateDocumentSimple {

    public static void main(String[] args) throws IOException {

        String fileName = "c:\\test\\hello.docx";

        try (XWPFDocument doc = new XWPFDocument()) {

            // create a paragraph
            XWPFParagraph p1 = doc.createParagraph();
            p1.setAlignment(ParagraphAlignment.CENTER);

            // set font
            XWPFRun r1 = p1.createRun();
            r1.setBold(true);
            r1.setItalic(true);
            r1.setFontSize(22);
            r1.setFontFamily("Times New Roman");
            r1.setText("I am first paragraph.");

            // save it to .docx file
            try (FileOutputStream out = new FileOutputStream(fileName)) {
                doc.write(out);
            }
        }
    }
}
```
Output
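When testing code like this, writing to a file under c:\test is not always convenient. As a sketch, assuming the same poi-ooxml dependency, the hypothetical helper class `InMemoryRoundTrip` below writes the document to a `ByteArrayOutputStream` and parses the bytes back with `new XWPFDocument(InputStream)`, so the result can be checked without touching the file system.

```java
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical helper: builds a .docx in memory and reads it back
class InMemoryRoundTrip {

    // creates a one-paragraph document and returns its serialized .docx bytes
    static byte[] createDocx(String text) throws IOException {
        try (XWPFDocument doc = new XWPFDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            doc.createParagraph().createRun().setText(text);
            doc.write(out);
            return out.toByteArray();
        }
    }

    // parses .docx bytes and returns the text of the first paragraph
    static String firstParagraphText(byte[] docx) throws IOException {
        try (XWPFDocument doc = new XWPFDocument(new ByteArrayInputStream(docx))) {
            XWPFParagraph p = doc.getParagraphs().get(0);
            return p.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] docx = createDocx("I am first paragraph.");
        System.out.println(docx.length + " bytes, text: " + firstParagraphText(docx));
    }
}
```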
4. Working with paragraphs, fonts, and page breaks
We can use XWPFRun to set the styling properties of the font. It has methods for changing font properties such as color, font size, italic, bold, capitalization, embossing, and many more.
This example creates four paragraphs with different font properties, and uses line breaks and a page break in the fourth paragraph.
```java
package com.mkyong.poi.word;

import org.apache.poi.xwpf.usermodel.ParagraphAlignment;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

import java.io.FileOutputStream;
import java.io.IOException;

public class WorkingParagraphFonts {

    public static void main(String[] args) throws IOException {

        try (XWPFDocument doc = new XWPFDocument()) {

            XWPFParagraph p1 = doc.createParagraph();
            p1.setAlignment(ParagraphAlignment.CENTER);

            // Set Text to Bold and font size to 22 for first paragraph
            XWPFRun r1 = p1.createRun();
            r1.setBold(true);
            r1.setItalic(true);
            r1.setFontSize(22);
            r1.setText("I am first paragraph. My Text is bold, italic, Courier and capitalized");
            r1.setFontFamily("Courier");

            XWPFParagraph p2 = doc.createParagraph();
            // Set color for second paragraph
            XWPFRun r2 = p2.createRun();
            r2.setText("I am second paragraph. My Text is Red in color and is embossed");
            r2.setColor("ff0000");
            r2.setEmbossed(true);

            XWPFParagraph p3 = doc.createParagraph();
            // Set strike for third paragraph and capitalization
            XWPFRun r3 = p3.createRun();
            r3.setStrikeThrough(true);
            r3.setCapitalized(true);
            r3.setText("I am third paragraph. My Text is strike through and is capitalized");

            XWPFParagraph p4 = doc.createParagraph();
            p4.setWordWrapped(true);
            p4.setPageBreak(true); // new page break
            p4.setIndentationFirstLine(600);

            XWPFRun r4 = p4.createRun();
            r4.setFontSize(40);
            r4.setItalic(true);
            r4.setText("Line 1");
            r4.addBreak();
            r4.setText("Line 2");
            r4.addBreak();
            r4.setText("Line 3");

            // save the docs
            try (FileOutputStream out = new FileOutputStream("c:\\test\\simple.docx")) {
                doc.write(out);
            }
        }
    }
}
```
Figure: Page 1
Figure: Page 2
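Each paragraph above uses a single run, but a paragraph can hold several runs, each with its own formatting. As a sketch (the class `MixedRuns` and its method are hypothetical), the following builds one paragraph with plain, bold, and red runs, and shows `run.addBreak(BreakType.PAGE)` as an alternative to `paragraph.setPageBreak(true)` for starting a new page. The output path is an assumption.

```java
import org.apache.poi.xwpf.usermodel.BreakType;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical example: one paragraph made of several differently formatted runs
class MixedRuns {

    // each run carries its own formatting; the paragraph text is their concatenation
    static XWPFParagraph addMixedParagraph(XWPFDocument doc) {
        XWPFParagraph p = doc.createParagraph();
        p.createRun().setText("Normal text, ");

        XWPFRun bold = p.createRun();
        bold.setBold(true);
        bold.setText("bold text, ");

        XWPFRun red = p.createRun();
        red.setColor("FF0000");
        red.setText("red text.");
        return p;
    }

    public static void main(String[] args) throws IOException {
        try (XWPFDocument doc = new XWPFDocument()) {
            addMixedParagraph(doc);

            // page break inside a run: the text after the break starts on a new page
            XWPFRun next = doc.createParagraph().createRun();
            next.addBreak(BreakType.PAGE);
            next.setText("Second page");

            try (FileOutputStream out = new FileOutputStream("c:\\test\\mixed.docx")) {
                doc.write(out);
            }
        }
    }
}
```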
5. Document header and footer
This example creates the document’s header and footer for all pages.
```java
package com.mkyong.poi.word;

import org.apache.poi.wp.usermodel.HeaderFooterType;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFFooter;
import org.apache.poi.xwpf.usermodel.XWPFHeader;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class CreateDocumentHeader {

    public static void main(String[] args) throws IOException {

        try (XWPFDocument doc = new XWPFDocument()) {

            XWPFParagraph p = doc.createParagraph();
            XWPFRun r = p.createRun();
            r.setBold(true);
            r.setFontSize(30);
            r.setText("Create document header and footer!");

            // next page
            XWPFParagraph p2 = doc.createParagraph();
            p2.setWordWrapped(true);
            p2.setPageBreak(true); // new page break
            XWPFRun r2 = p2.createRun();
            r2.setFontSize(40);
            r2.setItalic(true);
            r2.setText("New Page");

            // document header and footer
            XWPFHeader head = doc.createHeader(HeaderFooterType.DEFAULT);
            head.createParagraph()
                    .createRun()
                    .setText("This is document header");

            XWPFFooter foot = doc.createFooter(HeaderFooterType.DEFAULT);
            foot.createParagraph()
                    .createRun()
                    .setText("This is document footer");

            try (OutputStream os = new FileOutputStream(new File("c:\\test\\header.docx"))) {
                doc.write(os);
            }
        }
    }
}
```
Figure: Page 1
Figure: Page 2
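To verify the result, the headers and footers can be read back. A minimal sketch, assuming the header.docx file created above: `XWPFDocument.getHeaderList()` and `getFooterList()` return the header and footer parts, and `getText()` returns their text. The class `ReadHeaderFooter` and its `headerFooterText` method are hypothetical names.

```java
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFFooter;
import org.apache.poi.xwpf.usermodel.XWPFHeader;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the text of every header and footer part
class ReadHeaderFooter {

    static List<String> headerFooterText(XWPFDocument doc) {
        List<String> result = new ArrayList<>();
        for (XWPFHeader header : doc.getHeaderList()) {
            // getText() ends each paragraph with a newline, so trim it
            result.add("header: " + header.getText().trim());
        }
        for (XWPFFooter footer : doc.getFooterList()) {
            result.add("footer: " + footer.getText().trim());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        try (XWPFDocument doc = new XWPFDocument(
                Files.newInputStream(Paths.get("c:\\test\\header.docx")))) {
            headerFooterText(doc).forEach(System.out::println);
        }
    }
}
```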
6. Add an image
This example adds an image to the document.
```java
package com.mkyong.poi.word;

import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.util.Units;
import org.apache.poi.xwpf.usermodel.Document;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class AddImage {

    public static void main(String[] args) throws IOException, InvalidFormatException {

        String imgFile = "c:\\test\\google.png";

        try (XWPFDocument doc = new XWPFDocument()) {

            XWPFParagraph p = doc.createParagraph();
            XWPFRun r = p.createRun();
            r.setText(imgFile);
            r.addBreak();

            // add png image
            try (FileInputStream is = new FileInputStream(imgFile)) {
                r.addPicture(is,
                        Document.PICTURE_TYPE_PNG, // png file
                        imgFile,
                        Units.toEMU(400),
                        Units.toEMU(200)); // 400x200 points; Units.toEMU converts points to EMUs
            }

            try (FileOutputStream out = new FileOutputStream("c:\\test\\images.docx")) {
                doc.write(out);
            }
        }
    }
}
```
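Note that `Units.toEMU` converts points, not pixels, to EMUs (English Metric Units; 1 point = 12,700 EMUs), so the example above inserts the image at 400 x 200 points regardless of the image's real proportions. As a sketch, the hypothetical helper below reads the image size with `javax.imageio.ImageIO` and computes the height from a target width in points, keeping the aspect ratio. It assumes a PNG input that `ImageIO` can decode; the class name and file paths are placeholders.

```java
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.util.Units;
import org.apache.poi.xwpf.usermodel.Document;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFRun;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: inserts a PNG scaled to a given width, keeping its aspect ratio
class AddImageScaled {

    static XWPFRun addPng(XWPFDocument doc, byte[] png, String name, double widthPoints)
            throws IOException, InvalidFormatException {
        // read the pixel dimensions only to get the aspect ratio
        BufferedImage img = ImageIO.read(new ByteArrayInputStream(png));
        double heightPoints = widthPoints * img.getHeight() / img.getWidth();

        XWPFRun run = doc.createParagraph().createRun();
        // Units.toEMU expects points, not pixels
        run.addPicture(new ByteArrayInputStream(png), Document.PICTURE_TYPE_PNG, name,
                Units.toEMU(widthPoints), Units.toEMU(heightPoints));
        return run;
    }

    public static void main(String[] args) throws Exception {
        byte[] png = Files.readAllBytes(Paths.get("c:\\test\\google.png"));
        try (XWPFDocument doc = new XWPFDocument()) {
            addPng(doc, png, "google.png", 300); // 300 points wide
            try (FileOutputStream out = new FileOutputStream("c:\\test\\images-scaled.docx")) {
                doc.write(out);
            }
        }
    }
}
```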
7. Create a table
In this example, we insert a table into the Word document, using the XWPFTable component to create and initialize the table element. The table contains 3 rows and 3 columns.
```java
package com.mkyong.poi.word;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

import java.io.FileOutputStream;

public class CreateTable {

    public static void main(String[] args) throws Exception {

        try (XWPFDocument doc = new XWPFDocument()) {

            XWPFTable table = doc.createTable();

            // Creating first Row
            XWPFTableRow row1 = table.getRow(0);
            row1.getCell(0).setText("First Row, First Column");
            row1.addNewTableCell().setText("First Row, Second Column");
            row1.addNewTableCell().setText("First Row, Third Column");

            // Creating second Row
            XWPFTableRow row2 = table.createRow();
            row2.getCell(0).setText("Second Row, First Column");
            row2.getCell(1).setText("Second Row, Second Column");
            row2.getCell(2).setText("Second Row, Third Column");

            // create third row
            XWPFTableRow row3 = table.createRow();
            row3.getCell(0).setText("Third Row, First Column");
            row3.getCell(1).setText("Third Row, Second Column");
            row3.getCell(2).setText("Third Row, Third Column");

            // save to .docx file
            try (FileOutputStream out = new FileOutputStream("c:\\test\\table.docx")) {
                doc.write(out);
            }
        }
    }
}
```
P.S. Warning: in OpenOffice 4.1.7, the table created by this example is not formatted properly.
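The example above grows the first row with `addNewTableCell()` and relies on `createRow()` copying the first row's column count. An alternative sketch, assuming a rectangular `String[][]` input, creates the whole grid up front with `XWPFDocument.createTable(rows, cols)` and then fills each cell. The class name and output file are hypothetical.

```java
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;

import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical helper: builds a table from a 2D array (all rows must have the same length)
class CreateTableFromArray {

    static XWPFTable addTable(XWPFDocument doc, String[][] data) {
        int cols = data[0].length;
        // creates data.length rows, each with cols empty cells
        XWPFTable table = doc.createTable(data.length, cols);
        for (int i = 0; i < data.length; i++) {
            for (int j = 0; j < cols; j++) {
                table.getRow(i).getCell(j).setText(data[i][j]);
            }
        }
        return table;
    }

    public static void main(String[] args) throws IOException {
        String[][] data = {
            {"Name", "Language"},
            {"Apache POI", "Java"},
        };
        try (XWPFDocument doc = new XWPFDocument()) {
            addTable(doc, data);
            try (FileOutputStream out = new FileOutputStream("c:\\test\\table-array.docx")) {
                doc.write(out);
            }
        }
    }
}
```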
8. Read and parse documents, paragraphs, and tables
This section covers how to use Apache POI to extract and read text from documents.
8.1 We can extract the entire text of a document using XWPFWordExtractor. The file simple.docx is generated by example 4, Working with paragraphs, fonts, and page breaks.
```java
package com.mkyong.poi.word;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class ReadParseDocument1 {

    public static void main(String[] args) throws IOException {

        String fileName = "c:\\test\\simple.docx";

        try (XWPFDocument doc = new XWPFDocument(
                Files.newInputStream(Paths.get(fileName)))) {

            XWPFWordExtractor xwpfWordExtractor = new XWPFWordExtractor(doc);
            String docText = xwpfWordExtractor.getText();
            System.out.println(docText);

            // find number of words in the document
            long count = Arrays.stream(docText.split("\\s+")).count();
            System.out.println("Total words: " + count);
        }
    }
}
```
Output
I am first paragraph. My Text is bold, italic, Courier and capitalized
I am second paragraph. My Text is Red in color and is embossed
I am third paragraph. My Text is strike through and is capitalized
Line 1
Line 2
Line 3
Total words: 43
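One caveat with `split("\\s+")`: if the text starts with whitespace, the first element of the resulting array is an empty string, and it is counted as a word. A small sketch of a more defensive count (the `CountWords` class is hypothetical) trims the text first and treats blank text as zero words.

```java
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: whitespace-based word count that ignores leading/trailing blanks
class CountWords {

    // trims first, so leading whitespace does not produce an empty first "word"
    static int countWords(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // extracts all text of the document; the caller owns and closes the document
    static int countWords(XWPFDocument doc) {
        XWPFWordExtractor extractor = new XWPFWordExtractor(doc);
        return countWords(extractor.getText());
    }

    public static void main(String[] args) throws IOException {
        try (XWPFDocument doc = new XWPFDocument(
                Files.newInputStream(Paths.get("c:\\test\\simple.docx")))) {
            System.out.println("Total words: " + countWords(doc));
        }
    }
}
```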
8.2 In this example, we use doc.getParagraphs() to get a list of the document’s paragraphs. Once we have the list, we can iterate over the individual paragraphs and get their text.
```java
package com.mkyong.poi.word;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class ReadParseDocument2 {

    public static void main(String[] args) throws IOException {

        String fileName = "c:\\test\\simple.docx";

        try (XWPFDocument doc = new XWPFDocument(
                Files.newInputStream(Paths.get(fileName)))) {

            // output the same as 8.1
            List<XWPFParagraph> list = doc.getParagraphs();
            for (XWPFParagraph paragraph : list) {
                System.out.println(paragraph.getText());
            }
        }
    }
}
```
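Paragraphs can be broken down further into runs with `XWPFParagraph.getRuns()`, which gives access to each run's formatting as well as its text. As a sketch (the class `FindBoldText` is hypothetical), the following collects the text of every bold run. For simple.docx from example 4, it should list only the first paragraph, since `r1` is the only bold run there.

```java
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: goes one level below paragraphs and filters runs by formatting
class FindBoldText {

    static List<String> boldRuns(XWPFDocument doc) {
        List<String> result = new ArrayList<>();
        for (XWPFParagraph paragraph : doc.getParagraphs()) {
            for (XWPFRun run : paragraph.getRuns()) {
                // isBold() is false when the run has no bold property set
                if (run.isBold()) {
                    result.add(run.text());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        try (XWPFDocument doc = new XWPFDocument(
                Files.newInputStream(Paths.get("c:\\test\\simple.docx")))) {
            boldRuns(doc).forEach(System.out::println);
        }
    }
}
```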
8.3 The file table.docx is generated by example 7, Create a table. We iterate over the document’s body elements and check each one for the TABLE element type. Once we find a table, we iterate over its rows, and then over each row’s cells, to extract the table’s values.
```java
package com.mkyong.poi.word;

import org.apache.poi.xwpf.usermodel.BodyElementType;
import org.apache.poi.xwpf.usermodel.IBodyElement;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Iterator;

public class ReadParseDocumentTable {

    public static void main(String[] args) throws IOException {

        String fileName = "c:\\test\\table.docx";

        try (XWPFDocument doc = new XWPFDocument(
                Files.newInputStream(Paths.get(fileName)))) {

            Iterator<IBodyElement> docElementsIterator = doc.getBodyElementsIterator();

            // Iterate through the body elements and check for the table element type
            while (docElementsIterator.hasNext()) {
                IBodyElement docElement = docElementsIterator.next();
                if (docElement.getElementType() == BodyElementType.TABLE) {
                    // the body element itself is the table; getBody().getTables()
                    // would return every table in the document on each match
                    XWPFTable xwpfTable = (XWPFTable) docElement;
                    System.out.println("Total Rows : " + xwpfTable.getNumberOfRows());
                    for (int i = 0; i < xwpfTable.getRows().size(); i++) {
                        for (int j = 0; j < xwpfTable.getRow(i).getTableCells().size(); j++) {
                            System.out.println(xwpfTable.getRow(i).getCell(j).getText());
                        }
                    }
                }
            }
        }
    }
}
```
Output
Total Rows : 3
First Row, First Column
First Row, Second Column
First Row, Third Column
Second Row, First Column
Second Row, Second Column
Second Row, Third Column
Third Row, First Column
Third Row, Second Column
Third Row, Third Column
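If only the tables are needed, and not their position relative to the paragraphs, `XWPFDocument.getTables()` returns the tables in the document body directly. As a sketch (the `TableToList` class is hypothetical), the following converts a table into a list of rows, where each row is a list of cell texts, using `XWPFTableRow.getTableCells()`.

```java
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: copies a table's cell texts into plain Java lists
class TableToList {

    static List<List<String>> toRows(XWPFTable table) {
        List<List<String>> rows = new ArrayList<>();
        for (XWPFTableRow row : table.getRows()) {
            List<String> cells = new ArrayList<>();
            for (XWPFTableCell cell : row.getTableCells()) {
                cells.add(cell.getText());
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        try (XWPFDocument doc = new XWPFDocument(
                Files.newInputStream(Paths.get("c:\\test\\table.docx")))) {
            for (XWPFTable table : doc.getTables()) {
                System.out.println(toRows(table));
            }
        }
    }
}
```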
9. Update document
In this example, we use a pre-populated template file, template.docx, which contains the text "Hello ${name}, Congratulations, you are learning something new!". We replace the string ${name} with a predefined input and save the updated contents to a new file, output.docx.
Figure: c:\test\template.docx
```java
package com.mkyong.poi.word;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class UpdateDocument {

    public static void main(String[] args) throws IOException {

        UpdateDocument obj = new UpdateDocument();

        obj.updateDocument(
                "c:\\test\\template.docx",
                "c:\\test\\output.docx",
                "mkyong");
    }

    private void updateDocument(String input, String output, String name)
            throws IOException {

        try (XWPFDocument doc = new XWPFDocument(
                Files.newInputStream(Paths.get(input)))) {

            List<XWPFParagraph> xwpfParagraphList = doc.getParagraphs();
            // Iterate over the paragraphs and check each run for the replaceable text
            for (XWPFParagraph xwpfParagraph : xwpfParagraphList) {
                for (XWPFRun xwpfRun : xwpfParagraph.getRuns()) {
                    String docText = xwpfRun.getText(0);
                    // runs without text (e.g. only an image or a break) return null
                    if (docText != null && docText.contains("${name}")) {
                        // replace the placeholder and write the text back at position 0
                        xwpfRun.setText(docText.replace("${name}", name), 0);
                    }
                }
            }

            // save the docs
            try (FileOutputStream out = new FileOutputStream(output)) {
                doc.write(out);
            }
        }
    }
}
```
Output
Figure: c:\test\output.docx
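The run-by-run replacement above only works when ${name} is stored inside a single run. Word often splits text that looks continuous into several runs (for example after editing or spell checking), so one run might contain "Hello ${na" and the next "me}, ...", and the placeholder is never matched. A simplified sketch, not a general-purpose template engine: the hypothetical `replaceInParagraph` method checks the paragraph's full text and, if the placeholder is present, removes every run except the first and writes the replaced text into it. The first run's formatting is kept, formatting from the other runs is lost, and it assumes the runs contain plain text without line breaks or tabs.

```java
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: placeholder replacement that tolerates placeholders split across runs
class ReplaceAcrossRuns {

    // returns true if the paragraph contained the placeholder and was rewritten
    static boolean replaceInParagraph(XWPFParagraph paragraph, String placeholder, String value) {
        String text = paragraph.getText();
        List<XWPFRun> runs = paragraph.getRuns();
        if (runs.isEmpty() || !text.contains(placeholder)) {
            return false;
        }
        // remove runs from the end so the remaining indexes stay valid
        for (int i = runs.size() - 1; i > 0; i--) {
            paragraph.removeRun(i);
        }
        // the surviving first run keeps its formatting and receives the whole text
        paragraph.getRuns().get(0).setText(text.replace(placeholder, value), 0);
        return true;
    }

    public static void main(String[] args) throws IOException {
        try (XWPFDocument doc = new XWPFDocument(
                Files.newInputStream(Paths.get("c:\\test\\template.docx")))) {
            for (XWPFParagraph p : doc.getParagraphs()) {
                replaceInParagraph(p, "${name}", "mkyong");
            }
            try (FileOutputStream out = new FileOutputStream("c:\\test\\output.docx")) {
                doc.write(out);
            }
        }
    }
}
```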
Further Reading
For more examples, refer to the official Apache POI XWPF examples.
Download Source Code
$ git clone https://github.com/mkyong/apache-poi-examples/
$ cd word