How to validate an HTML tag with a regular expression
HTML tag Regular Expression Pattern
<("[^"]*"|'[^']*'|[^'">])*>
Description
<          # start with the opening "<"
(          # start of group #1
"[^"]*"    # a string enclosed in double quotes, e.g. "string"
|          # ..or
'[^']*'    # a string enclosed in single quotes, e.g. 'string'
|          # ..or
[^'">]     # any one character except a single quote, a double quote, or ">"
)          # end of group #1
*          # group #1, zero or more times
>          # end with the closing ">"
An HTML tag starts with an opening "<", followed by zero or more items, each of which is a double-quoted string ("string"), a single-quoted string ('string'), or any single character other than a quote or ">". A lone double quote (as in "string), a lone single quote (as in 'string), or a ">" that is not enclosed in quotes is therefore not allowed inside the tag. Finally, the tag ends with a closing ">".
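The annotated breakdown above can also be compiled almost verbatim using Java's free-spacing mode, Pattern.COMMENTS, in which whitespace is ignored and "#" starts a comment that runs to the end of the line. The class name AnnotatedTagPattern is hypothetical; this is a minimal sketch that checks the annotated and compact spellings agree on a few of the sample tags, not part of the original validator.

```java
import java.util.regex.Pattern;

public class AnnotatedTagPattern {

    // Same regex as HTML_TAG_PATTERN, written in free-spacing mode (Pattern.COMMENTS).
    // Whitespace is ignored and '#' starts a comment that runs to the end of the line,
    // so every line must end with "\n" or the rest of the pattern would be commented out.
    static final Pattern ANNOTATED = Pattern.compile(
            "<              # start with the opening '<'\n"
          + "(              # start of group #1\n"
          + "    \"[^\"]*\" # a double-quoted string\n"
          + "  | '[^']*'    # ..or a single-quoted string\n"
          + "  | [^'\">]    # ..or one character that is not a quote or '>'\n"
          + ")*             # group #1, zero or more times\n"
          + ">              # end with the closing '>'\n",
            Pattern.COMMENTS);

    // The compact form used by the validator.
    static final Pattern COMPACT = Pattern.compile("<(\"[^\"]*\"|'[^']*'|[^'\">])*>");

    public static void main(String[] args) {
        String[] samples = {"<b>", "<input value='>'>", "<input value=> >"};
        for (String s : samples) {
            // Both spellings should give the same answer for every sample.
            System.out.println(s + " -> annotated: " + ANNOTATED.matcher(s).matches()
                    + ", compact: " + COMPACT.matcher(s).matches());
        }
    }
}
```

The compact string is easier to paste into a constant, while the free-spacing version keeps the explanation next to each part of the pattern.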
Java Regular Expression Example
package com.mkyong.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HTMLTagValidator {

    private static final String HTML_TAG_PATTERN = "<(\"[^\"]*\"|'[^']*'|[^'\">])*>";

    // Compile once; a compiled Pattern is immutable and safe to share between threads.
    private static final Pattern pattern = Pattern.compile(HTML_TAG_PATTERN);

    /**
     * Validate an HTML tag with a regular expression.
     * @param tag HTML tag to validate
     * @return true if the whole string is a valid HTML tag, false otherwise
     */
    public boolean validate(final String tag) {
        // A Matcher is not thread-safe, so create a new one for each call.
        Matcher matcher = pattern.matcher(tag);
        // matches() succeeds only if the entire input matches the pattern.
        return matcher.matches();
    }
}
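validate() relies on Matcher.matches(), which succeeds only when the entire input is a single tag, so a string such as Hello <b>world</b> returns false even though it contains tags. To pull tags out of surrounding text, Matcher.find() can be used instead. The class HTMLTagFinder and its findTags method are hypothetical names; this is a minimal sketch that reuses the same pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HTMLTagFinder {

    // Same pattern as HTML_TAG_PATTERN in HTMLTagValidator.
    private static final Pattern TAG = Pattern.compile("<(\"[^\"]*\"|'[^']*'|[^'\">])*>");

    // Returns every substring of text that the tag pattern matches, in order of appearance.
    public static List<String> findTags(String text) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        // find() scans for the next match anywhere in the input, unlike matches().
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        String text = "Hello <b>world</b>, see <a href='x.html'>this</a>";
        System.out.println(findTags(text));
    }
}
```

Because "<" is one of the characters allowed by [^'">], a stray "<" in plain text (for example "a < b") starts a match that runs on to the next ">", so this approach is only as reliable as the input is well-formed.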
HTML tags that match:
1. <b>, <input value='>'>
2. <input value='<'>, <b/>
3. <a href='http://www.google.com'>
4. <br>, <br/>
5. <input value="" id='test'>, <input value='' id='test'>
HTML tags that don't match:
1. <input value=" id='test'> (an unmatched double quote is not allowed)
2. <input value=' id='test'> (an unmatched single quote is not allowed)
3. <input value=> > (a ">" that is not enclosed in single or double quotes closes the tag, so the trailing " >" cannot match)
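The pattern only enforces the quoting rules described above; it does not check that a tag name is present or well-formed. The hypothetical class PatternLimits below is a minimal sketch showing inputs that are not real HTML tags but still match.

```java
import java.util.regex.Pattern;

public class PatternLimits {

    // Same pattern as HTML_TAG_PATTERN in HTMLTagValidator.
    static final Pattern TAG = Pattern.compile("<(\"[^\"]*\"|'[^']*'|[^'\">])*>");

    public static void main(String[] args) {
        // None of these are real HTML tags, but each satisfies the quoting rules:
        // "<>" uses group #1 zero times, and '<', spaces and digits are all allowed by [^'">].
        String[] inputs = {"<>", "< >", "<123>", "<a <b>"};
        for (String s : inputs) {
            System.out.println(s + " -> " + TAG.matcher(s).matches());
        }
    }
}
```

If tag names matter, a stricter pattern or a real HTML parser is a better fit than this regex.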
Unit Test – HTMLTagValidatorTest
package com.mkyong.regex;

import org.testng.Assert;
import org.testng.annotations.BeforeClass;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

/**
 * HTMLTag validator Testing
 * @author mkyong
 */
public class HTMLTagValidatorTest {

    private HTMLTagValidator htmlTagValidator;

    @BeforeClass
    public void initData() {
        htmlTagValidator = new HTMLTagValidator();
    }

    // Without an explicit name, a @DataProvider is referenced by its method name.
    @DataProvider
    public Object[][] ValidHTMLTagProvider() {
        return new Object[][] {
            new Object[] {"<b>"},
            new Object[] {"<input value='>'>"},
            new Object[] {"<input value='<'>"},
            new Object[] {"<b/>"},
            new Object[] {"<a href='http://www.google.com'>"},
            new Object[] {"<br>"},
            new Object[] {"<br/>"},
            new Object[] {"<input value=\"\" id='test'>"},
            new Object[] {"<input value='' id='test'>"}
        };
    }

    @DataProvider
    public Object[][] InvalidHTMLTagProvider() {
        return new Object[][] {
            new Object[] {"<input value=\" id='test'>"},
            new Object[] {"<input value=' id='test'>"},
            new Object[] {"<input value=> >"}
        };
    }

    @Test(dataProvider = "ValidHTMLTagProvider")
    public void ValidHTMLTagTest(String tag) {
        boolean valid = htmlTagValidator.validate(tag);
        System.out.println("HTMLTag is valid : " + tag + " , " + valid);
        Assert.assertTrue(valid);
    }

    @Test(dataProvider = "InvalidHTMLTagProvider", dependsOnMethods = "ValidHTMLTagTest")
    public void InValidHTMLTagTest(String tag) {
        boolean valid = htmlTagValidator.validate(tag);
        System.out.println("HTMLTag is valid : " + tag + " , " + valid);
        Assert.assertFalse(valid);
    }
}
Unit Test – Result
HTMLTag is valid : <b> , true
HTMLTag is valid : <input value='>'> , true
HTMLTag is valid : <input value='<'> , true
HTMLTag is valid : <b/> , true
HTMLTag is valid : <a href='http://www.google.com'> , true
HTMLTag is valid : <br> , true
HTMLTag is valid : <br/> , true
HTMLTag is valid : <input value="" id='test'> , true
HTMLTag is valid : <input value='' id='test'> , true
HTMLTag is valid : <input value=" id='test'> , false
HTMLTag is valid : <input value=' id='test'> , false
HTMLTag is valid : <input value=> > , false
PASSED: ValidHTMLTagTest("<b>")
PASSED: ValidHTMLTagTest("<input value='>'>")
PASSED: ValidHTMLTagTest("<input value='<'>")
PASSED: ValidHTMLTagTest("<b/>")
PASSED: ValidHTMLTagTest("<a href='http://www.google.com'>")
PASSED: ValidHTMLTagTest("<br>")
PASSED: ValidHTMLTagTest("<br/>")
PASSED: ValidHTMLTagTest("<input value="" id='test'>")
PASSED: ValidHTMLTagTest("<input value='' id='test'>")
PASSED: InValidHTMLTagTest("<input value=" id='test'>")
PASSED: InValidHTMLTagTest("<input value=' id='test'>")
PASSED: InValidHTMLTagTest("<input value=> >")
===============================================
com.mkyong.regex.HTMLTagValidatorTest
Tests run: 12, Failures: 0, Skips: 0
===============================================
===============================================
mkyong
Total tests run: 12, Failures: 0, Skips: 0
===============================================
Want to learn more about regular expressions? I highly recommend the classic book "Mastering Regular Expressions".