Jsoup – Get favicon from HTML page


There are several ways a web page can declare its favicon so that the web browser recognizes it:

Example 1


  <head>
	<link rel="icon" href="http://example.com/image.ico" />
  </head>

Example 2


  <head>
	<link rel="icon" href="http://example.com/image.png" />
  </head>

Example 3 – weird, but Google uses it.


  <head>
	<meta content="/images/google_favicon_128.png" itemprop="image" />
  </head>
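
Note that the value in Example 3 is a relative path, so it has to be resolved against the page URL before the icon can be downloaded. A minimal sketch using only `java.net.URI`; the class name `FaviconUrlResolver` and the page URL `https://www.google.com/` are assumptions made for illustration:

```java
import java.net.URI;

// Hypothetical helper: turns a favicon value found in the page into an absolute URL.
final class FaviconUrlResolver {

    // Resolves the favicon value against the URL of the page it was found on.
    // A value that is already absolute is returned unchanged.
    static String resolve(String pageUrl, String faviconValue) {
        return URI.create(pageUrl).resolve(faviconValue).toString();
    }

    public static void main(String[] args) {
        // Assumed page URL, used here only to demonstrate the resolution.
        String pageUrl = "https://www.google.com/";

        // Relative path, as in Example 3
        System.out.println(resolve(pageUrl, "/images/google_favicon_128.png"));
        // prints https://www.google.com/images/google_favicon_128.png

        // Absolute URL, as in Example 1
        System.out.println(resolve(pageUrl, "http://example.com/image.ico"));
        // prints http://example.com/image.ico
    }
}
```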

1. Jsoup Example

The following Jsoup snippets extract the favicon from each of the examples above.

  
	// Examples 1 and 2: <link> whose href contains ".ico" or ".png"
	Element link = doc.head().select("link[href~=.*\\.(ico|png)]").first();
	System.out.println(link.attr("href"));

	// Example 3: <meta itemprop="image" content="...">
	Element meta = doc.head().select("meta[itemprop=image]").first();
	System.out.println(meta.attr("content"));
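
In practice you usually don't know which form a page uses, and `first()` returns `null` when a selector matches nothing. The sketch below tries both selectors in turn and guards against the missing case. The class `FaviconFinder` and the method `findFavicon` are hypothetical names, not part of Jsoup, and the Jsoup library is assumed to be on the classpath:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper (not part of Jsoup): tries both selectors in order.
final class FaviconFinder {

    // Returns the raw favicon value from the page head,
    // or null when neither selector matches.
    static String findFavicon(Document doc) {
        // Examples 1 and 2: <link> whose href contains ".ico" or ".png"
        Element link = doc.head().select("link[href~=.*\\.(ico|png)]").first();
        if (link != null) {
            return link.attr("href");
        }

        // Example 3: <meta itemprop="image" content="...">
        Element meta = doc.head().select("meta[itemprop=image]").first();
        if (meta != null) {
            return meta.attr("content");
        }

        return null;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<link rel=\"icon\" href=\"http://example.com/image.ico\" />"
                + "</head><body>something</body></html>";
        System.out.println(findFavicon(Jsoup.parse(html)));

        // A page with no favicon markup yields null instead of throwing.
        System.out.println(findFavicon(Jsoup.parse("<html><head></head><body></body></html>")));
    }
}
```

Returning `null` keeps the sketch short; callers that prefer to avoid null checks could wrap the result in `Optional` instead.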

2. Unit Test

A TestNG unit test that checks both selectors against in-memory HTML pages.


package com.webmitta.analysis;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.testng.Assert;
import org.testng.annotations.Test;

@Test
public class TestFavIcon {

  Document document;

  private String PageProvider1() {

	StringBuilder html = new StringBuilder();

	html.append("<!DOCTYPE html>");
	html.append("<html lang=\"en\">");
	html.append("<head>");
	html.append("<link rel=\"icon\" href=\"http://example.com/image.ico\" />");
	html.append("</head>");
	html.append("<body>");
	html.append("something");
	html.append("</body>");
	html.append("</html>");

	return html.toString();

  }

  private String PageProvider2() {

	StringBuilder html = new StringBuilder();

	html.append("<!DOCTYPE html>");
	html.append("<html lang=\"en\">");
	html.append("<head>");
	html.append("<meta content=\"/images/google_favicon_128.png\" itemprop=\"image\">");
	html.append("</head>");
	html.append("<body>");
	html.append("something");
	html.append("</body>");
	html.append("</html>");

	return html.toString();
  }

  @Test
  void test_FavIcon1() {

	Document doc = Jsoup.parse(PageProvider1());

	Element element = doc.head().select("link[href~=.*\\.(ico|png)]").first();
	Assert.assertEquals(element.attr("href"),
		"http://example.com/image.ico");

  }

  @Test
  void test_FavIcon2() {

	Document doc = Jsoup.parse(PageProvider2());

	Element element = doc.head().select("meta[itemprop=image]").first();
	Assert.assertEquals(element.attr("content"),
		"/images/google_favicon_128.png");

  }

}

Output


[TestNG] Running:
  C:\Users\mkyong2002\AppData\Local\Temp\testng-eclipse--1846902117\testng-customsuite.xml

PASSED: test_FavIcon1
PASSED: test_FavIcon2


mkyong


Grabicon
11 years ago

This is a great tool, but favicons take 3-4 seconds to retrieve if you’re pulling them each time you need them. If you need them faster, http://grabicon.com caches them, and even allows you to request a specific size. It also generates unique icons for sites that don’t have one.

This provides a uniform user experience for web/mobile applications, because icons will always be the same size, and none will be missing.

fulltimekiller
11 years ago

I am using http://lutils.com/FaviconExtractor.aspx
paste the website url to the text field and click get icon

Easy to use