Java – Read and Write Microsoft Word with Apache POI

In this article, we’ll learn how to read and write Microsoft Word documents with the help of the Apache POI library.

Topics

  1. Apache POI – HWPF and XWPF
  2. Apache POI dependencies
  3. Create a word document
  4. Working with Paragraphs, Fonts, and Page breaks
  5. Document Header and Footer
  6. Add Image
  7. Create a table
  8. Read and parse document, paragraph, and table
  9. Update document

P.S Tested with Apache POI 4.1.2 and Java 8

1. Apache POI – HWPF and XWPF

Apache POI provides Java APIs for working with Microsoft Office documents. We will cover use cases involving paragraphs, fonts, headers, footers, tables, and images, as well as writing, reading, parsing, and updating a document. Before exploring the use cases, let’s cover the basic concepts of the Apache POI library.

1.1 HWPF (Horrible Word Processor Format)
These APIs are for the binary .doc format of Word 97 to 2003, with limited read-only support for the older Word 6 and Word 95 file formats.

1.2 XWPF (XML Word Processor Format)
These APIs are for the .docx format introduced with Word 2007.

Component          Description
XWPFDocument       Provides high-level APIs to create and edit .docx Word files
XWPFParagraph      Represents a paragraph in the document
XWPFRun            Defines a region of text with a common set of properties; also used to edit font properties
XWPFTable          Represents a table in the Word file
XWPFTableRow       Represents a row within a table
XWPFWordExtractor  Extracts the text from a Word file

P.S This article will focus on the latest XWPF APIs, working with the .docx documents.
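
To make the HWPF/XWPF split more concrete, here is a minimal sketch that tells the two container formats apart by their first bytes: a .docx file is a ZIP package, while a legacy .doc file is an OLE2 compound file. It uses only the JDK. The class and method names (WordFormatSniffer, detect) are made up for this illustration and are not part of Apache POI, and the signature only identifies the container, so other Office formats share the same leading bytes.

```java
import java.util.Arrays;

final class WordFormatSniffer {

    // ZIP local file header signature: every .docx (OOXML) package starts with it
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // OLE2 compound file signature: used by legacy binary .doc files
    private static final byte[] OLE2_MAGIC = {
            (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
            (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};

    // Returns the POI API family that matches the container signature
    static String detect(byte[] header) {
        if (startsWith(header, ZIP_MAGIC)) {
            return "XWPF";
        }
        if (startsWith(header, OLE2_MAGIC)) {
            return "HWPF";
        }
        return "UNKNOWN";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) {
        // First bytes of a typical ZIP-based .docx file
        byte[] header = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00};
        System.out.println(detect(header)); // prints XWPF
    }
}
```

In practice, the header bytes would come from the first few bytes of the file on disk.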

2. Apache POI dependencies

We only need the poi-ooxml dependency to work with Microsoft Word documents.

pom.xml

  <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi-ooxml</artifactId>
      <version>4.1.2</version>
  </dependency>

3. Create a word document

This example shows how to create a Microsoft Word .docx file. We use XWPFParagraph to create paragraphs and XWPFRun to create text.

CreateDocumentSimple.java

package com.mkyong.poi.word;

import org.apache.poi.xwpf.usermodel.ParagraphAlignment;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

import java.io.FileOutputStream;
import java.io.IOException;

public class CreateDocumentSimple {

    public static void main(String[] args) throws IOException {

        String fileName = "c:\\test\\hello.docx";

        try (XWPFDocument doc = new XWPFDocument()) {

            // create a paragraph
            XWPFParagraph p1 = doc.createParagraph();
            p1.setAlignment(ParagraphAlignment.CENTER);

            // set font
            XWPFRun r1 = p1.createRun();
            r1.setBold(true);
            r1.setItalic(true);
            r1.setFontSize(22);
            r1.setFontFamily("Times New Roman");
            r1.setText("I am first paragraph.");

            // save it to .docx file
            try (FileOutputStream out = new FileOutputStream(fileName)) {
                doc.write(out);
            }

        }

    }
}

Output

Figure: create a document

4. Working with paragraphs, fonts, and page breaks

We can use XWPFRun for setting the styling properties of the font. It contains methods for changing font properties such as color, font size, italic, bold, capitalized, embossed, and many more.

We are creating four paragraphs and setting different font properties, line breaks, and page breaks for each of them.

WorkingParagraphFonts.java

package com.mkyong.poi.word;

import org.apache.poi.xwpf.usermodel.ParagraphAlignment;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

import java.io.FileOutputStream;
import java.io.IOException;

public class WorkingParagraphFonts {

    public static void main(String[] args) throws IOException {

        try (XWPFDocument doc = new XWPFDocument()) {

            XWPFParagraph p1 = doc.createParagraph();
            p1.setAlignment(ParagraphAlignment.CENTER);

            // Set Text to Bold and font size to 22 for first paragraph
            XWPFRun r1 = p1.createRun();
            r1.setBold(true);
            r1.setItalic(true);
            r1.setFontSize(22);
            r1.setText("I am first paragraph. My Text is bold, italic, Courier and capitalized");
            r1.setFontFamily("Courier");

            XWPFParagraph p2 = doc.createParagraph();
            //Set color for second paragraph
            XWPFRun r2 = p2.createRun();
            r2.setText("I am second paragraph. My Text is Red in color and is embossed");
            r2.setColor("ff0000");
            r2.setEmbossed(true);

            XWPFParagraph p3 = doc.createParagraph();
            //Set strike for third paragraph and capitalization
            XWPFRun r3 = p3.createRun();
            r3.setStrikeThrough(true);
            r3.setCapitalized(true);
            r3.setText("I am third paragraph. My Text is strike through and is capitalized");

            XWPFParagraph p4 = doc.createParagraph();
            p4.setWordWrapped(true);
            p4.setPageBreak(true);  // new page break
            p4.setIndentationFirstLine(600);

            XWPFRun r4 = p4.createRun();
            r4.setFontSize(40);
            r4.setItalic(true);
            //r4.setTextPosition(100);
            r4.setText("Line 1");
            r4.addBreak();
            r4.setText("Line 2");
            r4.addBreak();
            r4.setText("Line 3");

            // save the docs
            try (FileOutputStream out = new FileOutputStream("c:\\test\\simple.docx")) {
                doc.write(out);
            }

        }

    }

}

Figure: Page 1

Figure: Page 2
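
Two of the values used above are easy to get wrong: setColor takes a six-digit RGB hex string without a leading '#', and setIndentationFirstLine takes a plain integer. The sketch below shows one way to build both values. It is plain JDK code, the class and method names are made up for this illustration, and it assumes the indentation unit is the twip (1/20 of a point), as in the underlying WordprocessingML format.

```java
final class RunStyleUnits {

    // Builds the "rrggbb" string used for run colors; each component must be 0-255
    static String rgbHex(int red, int green, int blue) {
        return String.format("%02x%02x%02x", red, green, blue);
    }

    // Assumption: paragraph indentation is measured in twips (1/20 of a point)
    static int pointsToTwips(double points) {
        return (int) Math.round(points * 20);
    }

    public static void main(String[] args) {
        System.out.println(rgbHex(255, 0, 0));  // prints ff0000
        System.out.println(pointsToTwips(30));  // prints 600
    }
}
```

Under that assumption, the value 600 used for the fourth paragraph corresponds to a first-line indent of 30 points.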

5. Document header and footer

This example creates the document’s header and footer for all pages.

CreateDocumentHeader.java

package com.mkyong.poi.word;

import org.apache.poi.wp.usermodel.HeaderFooterType;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFFooter;
import org.apache.poi.xwpf.usermodel.XWPFHeader;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class CreateDocumentHeader {

    public static void main(String[] args) throws IOException {

        try (XWPFDocument doc = new XWPFDocument()) {

            XWPFParagraph p = doc.createParagraph();
            XWPFRun r = p.createRun();
            r.setBold(true);
            r.setFontSize(30);
            r.setText("Create document header and footer!");

            // next page
            XWPFParagraph p2 = doc.createParagraph();
            p2.setWordWrapped(true);
            p2.setPageBreak(true);  // new page break

            XWPFRun r2 = p2.createRun();
            r2.setFontSize(40);
            r2.setItalic(true);
            r2.setText("New Page");

            // document header and footer
            XWPFHeader head = doc.createHeader(HeaderFooterType.DEFAULT);
            head.createParagraph()
                    .createRun()
                    .setText("This is document header");

            XWPFFooter foot = doc.createFooter(HeaderFooterType.DEFAULT);
            foot.createParagraph()
                    .createRun()
                    .setText("This is document footer");

            try (OutputStream os = new FileOutputStream(new File("c:\\test\\header.docx"))) {
                doc.write(os);
            }
        }

    }
}

Figure: Page 1

Figure: Page 2

6. Add an image

This example adds an image to the document.

AddImage.java

package com.mkyong.poi.word;

import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.util.Units;
import org.apache.poi.xwpf.usermodel.Document;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class AddImage {

    public static void main(String[] args) throws IOException, InvalidFormatException {

        String imgFile = "c:\\test\\google.png";

        try (XWPFDocument doc = new XWPFDocument()) {

            XWPFParagraph p = doc.createParagraph();
            XWPFRun r = p.createRun();
            r.setText(imgFile);
            r.addBreak();

            // add png image
            try (FileInputStream is = new FileInputStream(imgFile)) {
                r.addPicture(is,
                        Document.PICTURE_TYPE_PNG,    // png file
                        imgFile,
                        Units.toEMU(400),
                        Units.toEMU(200));            // 400x200 points
            }

            try (FileOutputStream out = new FileOutputStream("c:\\test\\images.docx")) {
                doc.write(out);
            }
        }

    }
}

Figure: add an image
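
The width and height passed to addPicture are in EMUs (English Metric Units). Units.toEMU converts from points, and Units.pixelToEMU converts from pixels, so the two give different sizes for the same number. The plain-JDK sketch below only illustrates the arithmetic behind those conversions; the class and method names are made up for this illustration, and the pixel factor assumes 96 DPI.

```java
final class EmuConversions {

    static final int EMU_PER_INCH = 914400;
    static final int EMU_PER_POINT = EMU_PER_INCH / 72;  // 12700
    static final int EMU_PER_PIXEL = EMU_PER_INCH / 96;  // 9525, assuming 96 DPI

    // Same arithmetic as converting points to EMUs
    static int pointsToEmu(double points) {
        return (int) Math.rint(points * EMU_PER_POINT);
    }

    // Same arithmetic as converting pixels to EMUs at 96 DPI
    static int pixelsToEmu(int pixels) {
        return pixels * EMU_PER_PIXEL;
    }

    public static void main(String[] args) {
        System.out.println(pointsToEmu(400));  // prints 5080000
        System.out.println(pixelsToEmu(400));  // prints 3810000
    }
}
```

A picture sized with 400 points is therefore about a third larger than one sized with 400 pixels.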

7. Create a table

This example shows how to insert a table into a Word document. We use the XWPFTable component to create and populate the table.

This example inserts a table containing 3 rows and 3 columns.

CreateTable.java

package com.mkyong.poi.word;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

import java.io.FileOutputStream;

public class CreateTable {

    public static void main(String[] args) throws Exception {

        try (XWPFDocument doc = new XWPFDocument()) {

            XWPFTable table = doc.createTable();

            //Creating first Row
            XWPFTableRow row1 = table.getRow(0);
            row1.getCell(0).setText("First Row, First Column");
            row1.addNewTableCell().setText("First Row, Second Column");
            row1.addNewTableCell().setText("First Row, Third Column");

            //Creating second Row
            XWPFTableRow row2 = table.createRow();
            row2.getCell(0).setText("Second Row, First Column");
            row2.getCell(1).setText("Second Row, Second Column");
            row2.getCell(2).setText("Second Row, Third Column");

            //create third row
            XWPFTableRow row3 = table.createRow();
            row3.getCell(0).setText("Third Row, First Column");
            row3.getCell(1).setText("Third Row, Second Column");
            row3.getCell(2).setText("Third Row, Third Column");

            // save to .docx file
            try (FileOutputStream out = new FileOutputStream("c:\\test\\table.docx")) {
                doc.write(out);
            }

        }

    }

}

Figure: create a table

P.S. Warning: when the file is opened in OpenOffice 4.1.7, the table created by these APIs may not be formatted properly.
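
When the table data is already available as a two-dimensional array, the row-by-row code can be shortened with createTable(rows, cols), which creates all cells up front. The following is a minimal sketch under that assumption; the helper name fill and the sample data are made up for this illustration, and the array is assumed to be non-empty and rectangular.

```java
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;

import java.io.IOException;

final class TableFromArray {

    // Hypothetical helper: creates a rows x cols table and fills it cell by cell
    static XWPFTable fill(XWPFDocument doc, String[][] data) {
        XWPFTable table = doc.createTable(data.length, data[0].length);
        for (int r = 0; r < data.length; r++) {
            for (int c = 0; c < data[r].length; c++) {
                table.getRow(r).getCell(c).setText(data[r][c]);
            }
        }
        return table;
    }

    public static void main(String[] args) throws IOException {
        String[][] data = {
                {"Name", "Language"},
                {"mkyong", "Java"}
        };
        try (XWPFDocument doc = new XWPFDocument()) {
            XWPFTable table = fill(doc, data);
            System.out.println("Rows: " + table.getNumberOfRows());
        }
    }
}
```

The document can then be saved with doc.write(out) exactly as in the example above.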

8. Read and parse document, paragraph, and table

The following examples show how to use Apache POI to extract and read text from documents.

8.1 We can extract the entire text from the document by using XWPFWordExtractor. The file simple.docx is generated from the example 4. Working with paragraphs, fonts, and page breaks.

ReadParseDocument1.java

package com.mkyong.poi.word;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class ReadParseDocument1 {

    public static void main(String[] args) throws IOException {

        String fileName = "c:\\test\\simple.docx";

        try (XWPFDocument doc = new XWPFDocument(
                Files.newInputStream(Paths.get(fileName)))) {

            XWPFWordExtractor xwpfWordExtractor = new XWPFWordExtractor(doc);
            String docText = xwpfWordExtractor.getText();
            System.out.println(docText);

            // find number of words in the document
            long count = Arrays.stream(docText.split("\\s+")).count();
            System.out.println("Total words: " + count);

        }

    }

}

Output

Terminal

I am first paragraph. My Text is bold, italic, Courier and capitalized
I am second paragraph. My Text is Red in color and is embossed
I am third paragraph. My Text is strike through and is capitalized
Line 1
Line 2
Line 3

Total words: 43
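
One caveat with counting words via split("\\s+"): if the extracted text starts with whitespace, the resulting array begins with an empty string, and an empty text still yields one element, so the count can be off by one. The plain-JDK sketch below trims the text first; the class and method names are made up for this illustration.

```java
final class WordCounter {

    // Counts whitespace-separated tokens, ignoring leading and trailing whitespace
    static long countWords(String text) {
        if (text == null) {
            return 0;
        }
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        System.out.println(countWords("\nLine 1\nLine 2\n"));  // prints 4
        System.out.println(countWords("   "));                 // prints 0
    }
}
```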

8.2 In this example, we use doc.getParagraphs() to get a list of the document’s paragraphs. Once we have the list, we can iterate over the individual paragraph elements and get their text.

ReadParseDocument2.java

package com.mkyong.poi.word;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class ReadParseDocument2 {

    public static void main(String[] args) throws IOException {

        String fileName = "c:\\test\\simple.docx";

        try (XWPFDocument doc = new XWPFDocument(
                Files.newInputStream(Paths.get(fileName)))) {

            // output the same as 8.1
            List<XWPFParagraph> list = doc.getParagraphs();
            for (XWPFParagraph paragraph : list) {
                System.out.println(paragraph.getText());
            }

        }

    }

}

8.3 The file table.docx is generated from the example 7. Create a table. We fetch all body elements in the file and iterate over them to check for the TABLE component. Once we find a TABLE component, we iterate over its rows and then over the cells of each row to extract the table’s values.

ReadParseDocumentTable.java

package com.mkyong.poi.word;

import org.apache.poi.xwpf.usermodel.IBodyElement;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Iterator;
import java.util.List;

public class ReadParseDocumentTable {

    public static void main(String[] args) throws IOException {

        String fileName = "c:\\test\\table.docx";

        try (XWPFDocument doc = new XWPFDocument(
                Files.newInputStream(Paths.get(fileName)))) {

            /*XWPFWordExtractor xwpfWordExtractor = new XWPFWordExtractor(doc);
            String docText = xwpfWordExtractor.getText();
            System.out.println(docText);*/

            Iterator<IBodyElement> docElementsIterator = doc.getBodyElementsIterator();

            //Iterate through the body elements and check for the table element type
            while (docElementsIterator.hasNext()) {
                IBodyElement docElement = docElementsIterator.next();
                if (docElement instanceof XWPFTable) {
                    // Cast the current element so that each table is printed exactly once
                    XWPFTable xwpfTable = (XWPFTable) docElement;
                    System.out.println("Total Rows : " + xwpfTable.getNumberOfRows());
                    for (int i = 0; i < xwpfTable.getRows().size(); i++) {
                        for (int j = 0; j < xwpfTable.getRow(i).getTableCells().size(); j++) {
                            System.out.println(xwpfTable.getRow(i).getCell(j).getText());
                        }
                    }
                }
            }

        }

    }

}

Output

Terminal

Total Rows : 3
First Row, First Column
First Row, Second Column
First Row, Third Column
Second Row, First Column
Second Row, Second Column
Second Row, Third Column
Third Row, First Column
Third Row, Second Column
Third Row, Third Column
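
If only the tables are of interest, doc.getTables() returns the top-level tables directly, without walking the other body elements. The sketch below collects the cell text into nested lists; the helper name readRows is made up for this illustration, and the table is built in memory so that the example does not depend on table.docx.

```java
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class TableReader {

    // Collects the text of every cell, one inner list per table row
    static List<List<String>> readRows(XWPFTable table) {
        List<List<String>> rows = new ArrayList<>();
        for (XWPFTableRow row : table.getRows()) {
            List<String> cells = new ArrayList<>();
            for (XWPFTableCell cell : row.getTableCells()) {
                cells.add(cell.getText());
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        try (XWPFDocument doc = new XWPFDocument()) {
            XWPFTable table = doc.createTable(2, 2);
            table.getRow(0).getCell(0).setText("A1");
            table.getRow(1).getCell(1).setText("B2");

            // getTables() returns the top-level tables of the document body
            for (XWPFTable t : doc.getTables()) {
                System.out.println(readRows(t));
            }
        }
    }
}
```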

9. Update document

In this example, we use a pre-populated template file, template.docx, which contains the text "Hello ${name}, Congratulations, you are learning something new!". We replace the string ${name} with a predefined input and save the updated contents to a new file, output.docx.

Figure: c:\test\template.docx

UpdateDocument.java

package com.mkyong.poi.word;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class UpdateDocument {

    public static void main(String[] args) throws IOException {

        UpdateDocument obj = new UpdateDocument();

        obj.updateDocument(
                  "c:\\test\\template.docx",
                  "c:\\test\\output.docx",
                  "mkyong");
    }

    private void updateDocument(String input, String output, String name)
        throws IOException {

        try (XWPFDocument doc = new XWPFDocument(
                Files.newInputStream(Paths.get(input)))
        ) {

            List<XWPFParagraph> xwpfParagraphList = doc.getParagraphs();
            //Iterate over paragraph list and check for the replaceable text in each paragraph
            for (XWPFParagraph xwpfParagraph : xwpfParagraphList) {
                for (XWPFRun xwpfRun : xwpfParagraph.getRuns()) {
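                    String docText = xwpfRun.getText(0);
                    // getText(0) returns null for runs without text, so guard against it
                    if (docText != null && docText.contains("${name}")) {
                        //replacement and setting position
                        docText = docText.replace("${name}", name);
                        xwpfRun.setText(docText, 0);
                    }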
                }
            }

            // save the docs
            try (FileOutputStream out = new FileOutputStream(output)) {
                doc.write(out);
            }

        }

    }

}

Output

Figure: c:\test\output.docx
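
A template often contains more than one placeholder. The sketch below shows one way to replace every ${key} in a string from a map of values, which could be applied to the text of each run. It is plain JDK code, the class and method names are made up for this illustration, and unknown keys are left untouched. Note that Word may split a placeholder across several runs, in which case a per-run replacement will not see it.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlaceholderReplacer {

    // Matches ${key} where key consists of letters, digits, or underscores
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    static String replaceAll(String text, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // Keep the original placeholder when no value is supplied for the key
            String replacement = values.getOrDefault(matcher.group(1), matcher.group());
            // quoteReplacement prevents '$' and '\' in values from being treated specially
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "Hello ${name}, Congratulations, you are learning something new!";
        Map<String, String> values = Collections.singletonMap("name", "mkyong");
        System.out.println(replaceAll(template, values));
        // prints Hello mkyong, Congratulations, you are learning something new!
    }
}
```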

Further Reading
For more examples, please refer to the official Apache POI XWPF examples.

About Author

Nikita Masand is a Software Engineer and a technical writer. She likes to write articles on Java, Spring boot, Data Structures and Algorithms, and System Design.