How to get URL content in Java
In this Java example, we show you how to get the content of a page from the URL “mkyong.com” and save it to a local file named “test.html”.
package com.mkyong;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class GetURLContent {

    public static void main(String[] args) {
        // save to this filename (the parent directory must already exist)
        String fileName = "/users/mkyong/test.html";
        Path file = Paths.get(fileName);

        try {
            // get URL content
            URLConnection conn = URI.create("https://mkyong.com").toURL().openConnection();

            // open the stream and put it into a BufferedReader (assumes the page is UTF-8);
            // try-with-resources closes reader and writer even if an exception occurs
            try (BufferedReader br = new BufferedReader(
                         new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8));
                 // creates the file if needed and overwrites any existing content
                 BufferedWriter bw = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {

                String inputLine;
                while ((inputLine = br.readLine()) != null) {
                    bw.write(inputLine);
                    bw.newLine(); // readLine() drops the line break, so write it back
                }
            }

            System.out.println("Done");
        } catch (IOException e) {
            // also catches MalformedURLException, which is a subclass of IOException
            e.printStackTrace();
        }
    }
}
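On Java 11 or later, the same download can be sketched with the java.net.http.HttpClient API instead of URLConnection. This is an alternative, not the approach used in the example: it follows redirects (including http to https, which HttpURLConnection does not do on its own), sets explicit timeouts, and writes the response bytes straight to the file, so line breaks and the page's encoding are kept as sent. The class name DownloadPage, the timeout values and the relative output path test.html are assumptions for this sketch.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Duration;

class DownloadPage {

    // Shared client: follows redirects (http -> https allowed, https -> http
    // refused) and gives up on connecting after 10 seconds.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    // Sends a GET request to uri and streams the raw response bytes into target.
    // Nothing is decoded or split into lines, so the file matches what was sent.
    static Path download(URI uri, Path target) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(30)) // fail if no response within 30 s
                .GET()
                .build();
        HttpResponse<Path> response = CLIENT.send(request,
                HttpResponse.BodyHandlers.ofFile(target,
                        StandardOpenOption.CREATE,
                        StandardOpenOption.WRITE,
                        StandardOpenOption.TRUNCATE_EXISTING));
        int status = response.statusCode();
        if (status < 200 || status > 299) {
            throw new IOException("HTTP " + status + " from " + uri);
        }
        return response.body();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // "test.html" is resolved against the current working directory.
        Path saved = download(URI.create("https://mkyong.com"), Path.of("test.html"));
        System.out.println("Saved to " + saved.toAbsolutePath());
    }
}
```

TRUNCATE_EXISTING is passed explicitly so that re-running the program over an older, longer test.html does not leave stale bytes at the end of the file.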
Yes, I know how to extract links from a URL, but in practice I don't know how to write code that extracts data from the obtained links. Please help me with this, anyone.
I have written a program that takes data from a certain API. I want to fetch data from the API continuously and save it to a file, appending the data (with some modifications) from each request to what was saved before, for more than a thousand requests. A quick reply would be appreciated.
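A minimal sketch for this, assuming the data from each request fits on one line of text: open the file in append mode for every record, so each result is added after the previous ones instead of replacing them. fetchAndModify is a hypothetical placeholder for your own API call and modification, and api-results.txt is an arbitrary file name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class AppendResults {

    // Appends one record followed by a newline, creating the file on first use.
    // APPEND means earlier records are never overwritten.
    static void appendRecord(Path file, String record) throws IOException {
        Files.writeString(file, record + "\n", StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Hypothetical stand-in for your own API call plus whatever modification
    // you apply to each response.
    static String fetchAndModify(int requestNumber) {
        return "request " + requestNumber + ": <response data here>";
    }

    public static void main(String[] args) throws IOException {
        Path out = Path.of("api-results.txt"); // hypothetical file name
        for (int i = 1; i <= 1000; i++) {
            appendRecord(out, fetchAndModify(i));
        }
        System.out.println("Appended 1000 records to " + out.toAbsolutePath());
    }
}
```

Reopening the file for each record is simple and keeps everything written so far if the program stops midway. For much larger volumes, one BufferedWriter kept open for the whole run (created with Files.newBufferedWriter and the same CREATE and APPEND options) avoids the repeated opens.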
Hello, I want to retrieve the page title from an MVC framework that uses the file tiles.xml to assign the title and set the body content.
I can only go through the URL to retrieve the title.
Any ideas, please?
Thanks in advance.
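Tiles assembles the page on the server, so the title it assigns should already be in the HTML returned by the URL. Assuming that, one rough approach is to download the page as in the example and pull out the first <title> element with a regular expression. PageTitle and extractTitle are hypothetical names, and a regex is not a real HTML parser, so unusual markup (for example a <title> inside an HTML comment) can fool it.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PageTitle {

    // First <title>...</title> element, case-insensitive, allowing attributes
    // on the tag and line breaks inside the title text.
    private static final Pattern TITLE = Pattern.compile(
            "<title\\b[^>]*>(.*?)</title\\s*>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the title text with runs of whitespace collapsed to one space,
    // or Optional.empty() if the page has no <title>.
    static Optional<String> extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(m.group(1).replaceAll("\\s+", " ").trim());
    }

    public static void main(String[] args) {
        // In real use, html is the page content downloaded from the URL.
        String html = "<html><head><title>\n  My Tiles Page\n</title></head><body></body></html>";
        System.out.println(extractTitle(html).orElse("(no title)")); // prints "My Tiles Page"
    }
}
```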
I need to get all the web pages of the website, not just the current page.
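Both questions above (following the extracted links, and fetching every page of a site) come down to a crawl: fetch a page, pull out its links, and repeat for each new link on the same host. The sketch below does a breadth-first crawl with a page limit. The link extraction is a rough regular expression over href attributes, not a full HTML parser, and the fetcher parameter is an assumption that lets you plug in your own download code and process each page's data inside it. SiteCrawler, extractLinks and crawl are hypothetical names, and the fetcher in main assumes pages are UTF-8.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SiteCrawler {

    // Rough href matcher for quoted values; stops at '#' to drop fragments.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    // Returns the links found in html, resolved against the page's own URI.
    static List<URI> extractLinks(URI page, String html) {
        List<URI> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                links.add(page.resolve(m.group(1).trim()));
            } catch (IllegalArgumentException e) {
                // skip href values that are not valid URIs (e.g. raw spaces)
            }
        }
        return links;
    }

    // Breadth-first crawl that stays on the start URI's host and stops after
    // maxPages pages. fetcher downloads one page and returns its HTML.
    static Set<URI> crawl(URI start, Function<URI, String> fetcher, int maxPages) {
        Set<URI> visited = new LinkedHashSet<>();
        Deque<URI> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            URI next = queue.poll();
            if (!visited.add(next)) {
                continue; // already processed
            }
            String html = fetcher.apply(next);
            for (URI link : extractLinks(next, html)) {
                boolean sameHost = start.getHost().equalsIgnoreCase(String.valueOf(link.getHost()));
                if (sameHost && !visited.contains(link)) {
                    queue.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Simple fetcher built on URL.openStream(); unreachable pages count as empty.
        Function<URI, String> fetcher = uri -> {
            try (InputStream in = uri.toURL().openStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            } catch (IOException e) {
                return "";
            }
        };
        for (URI page : crawl(URI.create("https://mkyong.com/"), fetcher, 20)) {
            System.out.println(page);
        }
    }
}
```

Links to other hosts are ignored, and mailto: or javascript: links are skipped because they have no host matching the start page.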
What is the JUnit test case for the above program?
I want to read only the text content of a web page, without the JavaScript, CSS, and HTML tags. How should I write the code?
I have used many patterns to replace all of these with a space, but it's not working.
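Stripping markup with one pattern usually fails because <script> and <style> elements contain text of their own that a tag-only pattern leaves behind. A rough sketch, assuming the page is already in a String: remove script and style elements (with their contents) and comments first, then the remaining tags, then decode a few common entities and collapse whitespace. HtmlToText and htmlToText are hypothetical names and only five entities are handled; an HTML parser such as jsoup (Jsoup.parse(html).text()) is more reliable than regular expressions for real pages.

```java
import java.util.regex.Pattern;

class HtmlToText {

    // <script>...</script> and <style>...</style>, including their contents.
    // (?i) = case-insensitive, (?s) = '.' also matches line breaks,
    // \1 = the closing tag must match the opening one.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    private static final Pattern COMMENT = Pattern.compile("(?s)<!--.*?-->");
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String htmlToText(String html) {
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = COMMENT.matcher(text).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        // Decode a few common entities; &amp; goes last so "&amp;lt;" becomes "&lt;".
        text = text.replace("&nbsp;", " ")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&");
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><style>p{color:red}</style></head>"
                + "<body><p>Hello&nbsp;<b>world</b></p></body></html>";
        System.out.println(htmlToText(html)); // prints "Hello world"
    }
}
```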
I need help with peer-to-peer networking on Android. Please help me.
Hi MKYONG
How do I test this program with a JUnit test? Please tell me the steps.
Hey, did you get the JUnit test case for this program? If so, please mail it to me ASAP… It's urgent.
Using this code I am able to get the source code of the site, but the source code I get is not complete; some parts of the site are missing.
Please give me a solution so that I get the full source code of the site.
I am also getting only part of the HTML source. I am using BufferedReader to read the InputStream.
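Two things may explain this. First, readLine() removes line breaks, so a loop that writes each line without adding one back produces the whole page as a single long line, which is easy to mistake for missing content. Second, anything the browser adds later with JavaScript, and resources such as images that are loaded separately, are not part of the HTML the server returns, so no plain download of the URL will contain them. For the first problem, a minimal sketch (Java 11 or later) copies the response byte for byte instead of line by line; SaveRawBytes and copyToFile are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class SaveRawBytes {

    // Copies every byte from in to target, replacing any existing file, and
    // returns the number of bytes written. Nothing is decoded or split into
    // lines, so the saved file is exactly what the server sent.
    static long copyToFile(InputStream in, Path target) throws IOException {
        try (InputStream source = in) {
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        URLConnection conn = URI.create("https://mkyong.com").toURL().openConnection();
        long bytes = copyToFile(conn.getInputStream(), Path.of("test.html"));
        System.out.println("Saved " + bytes + " bytes");
    }
}
```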
Hi,
I'm trying to save HTML content from web services to an SD card on Android. I have a list of URLs, each pointing to its own HTML page. How do I download all of them to the SD card on Android?
I am getting
java.io.IOException: Access is denied
Is there a way to bypass authentication?
Sorry, that's working; it looks like I had permission issues with creating the file.
How do I get it to check the website at regular intervals, for example, every five minutes?
Thanks!
Just put the code line
main(args);
at the end of the program.
Thanks, that was really good! You do a great job!
Hi,
Thanks, mkyong, but this code doesn't work for HTTPS sites.
Any idea how to go about it?
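One likely cause: if the request starts on http:// and the site redirects to https://, HttpURLConnection does not follow that redirect automatically, because it only follows redirects that keep the same protocol, so you get the 301/302 response instead of the page. A minimal sketch, assuming plain http/https URLs, that follows redirects by hand; FollowRedirects and open are hypothetical names and the timeout values are arbitrary.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class FollowRedirects {

    // Opens uri and follows up to maxRedirects redirects itself, including
    // http -> https hops that HttpURLConnection would not follow on its own.
    static HttpURLConnection open(URI uri, int maxRedirects) throws IOException {
        URI current = uri;
        for (int hop = 0; hop <= maxRedirects; hop++) {
            HttpURLConnection conn = (HttpURLConnection) current.toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // handle redirects ourselves
            conn.setConnectTimeout(10_000);
            conn.setReadTimeout(30_000);
            int status = conn.getResponseCode();
            boolean redirect = status == 301 || status == 302 || status == 303
                    || status == 307 || status == 308;
            if (!redirect) {
                return conn; // caller reads conn.getInputStream()
            }
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (location == null) {
                throw new IOException("HTTP " + status + " without Location header from " + current);
            }
            current = current.resolve(location); // also handles relative Location values
        }
        throw new IOException("More than " + maxRedirects + " redirects starting at " + uri);
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = open(URI.create("http://mkyong.com"), 5);
        System.out.println(conn.getResponseCode() + " " + conn.getURL());
        conn.disconnect();
    }
}
```

A relative Location header (such as /new) is resolved against the current URI; a Location value that is not a valid URI makes URI.resolve throw IllegalArgumentException.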
Thank you for this information.
Nice post,
thanks!
I want to write a program that gets my notifications from Facebook. Do you think the code above would work?
Not much comment on this, since I don't know how FB handles notifications. It looks like it uses a "push" technique to display them. You can try, but I don't think the simple solution above will work well; you may need to add extra Facebook authentication handling.
Thank you, you are right. I ended up using restFB.
Hi,
When I try the above program, I get the following error:
java.net.ConnectException: Connection timed out: connect
Please help
I'm sorry about my last reply; I misread your comment. A connection timeout usually occurs when your internet connection is unstable or the site is having trouble. I just tried the code a few minutes ago and had no problems.
Hello Jawahar, I just tried the above code and it seemed to work for me. The only part not included in the HTML document was the site's images. If you don't mind me asking, did you change the output path in String fileName = "/users/mkyong/test.html"; to your own?
Thank you, it works like a charm 🙂
I want to get data on who is hitting my website. How can I do that? Please help.
Hello Sir,
I am getting the error below when I execute the above program; please advise.
java.net.ConnectException: Connection timed out: connect
at java.net.DualStackPlainSocketImpl.connect0(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
at java.net.PlainSocketImpl.connect(Unknown Source)
at java.net.SocksSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at sun.net.NetworkClient.doConnect(Unknown Source)
at sun.net.www.http.HttpClient.openServer(Unknown Source)
at sun.net.www.http.HttpClient.openServer(Unknown Source)
at sun.net.www.http.HttpClient.<init>(Unknown Source)
at sun.net.www.http.HttpClient.New(Unknown Source)
at sun.net.www.http.HttpClient.New(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.connect(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at GetURLContent.main(GetURLContent.java:25)
You need to check your internet connectivity.
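Connectivity is the first thing to check. On office or campus networks, another common cause of both this error and the UnknownHostException reported below is that direct connections are blocked and traffic has to go through an HTTP proxy. A sketch, assuming you know your proxy's host and port: open the connection through an explicit java.net.Proxy, and set timeouts so a blocked connection fails after a known time instead of the operating system's default. ConnectWithTimeouts is a hypothetical class name, proxy.example.com:8080 is a placeholder, and the timeout values are arbitrary. Alternatively, the standard https.proxyHost and https.proxyPort system properties (http.proxyHost and http.proxyPort for plain http) set a proxy for the whole JVM.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URLConnection;

class ConnectWithTimeouts {

    // Opens a connection with explicit timeouts, optionally through an HTTP
    // proxy. Pass proxyHost = null for a direct connection. Nothing is sent
    // until the caller reads from the connection.
    static URLConnection open(URI uri, String proxyHost, int proxyPort) throws IOException {
        Proxy proxy = (proxyHost == null)
                ? Proxy.NO_PROXY
                : new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxyHost, proxyPort));
        URLConnection conn = uri.toURL().openConnection(proxy);
        conn.setConnectTimeout(10_000); // ms allowed to establish the connection
        conn.setReadTimeout(30_000);    // ms a single read may block
        return conn;
    }

    public static void main(String[] args) throws IOException {
        // Direct connection. To use a proxy instead, pass its host and port,
        // e.g. open(uri, "proxy.example.com", 8080) (placeholder host).
        URLConnection conn = open(URI.create("https://mkyong.com"), null, 0);
        try (InputStream in = conn.getInputStream()) {
            System.out.println("Read " + in.readAllBytes().length + " bytes");
        }
    }
}
```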
Hi,
When I execute the above code on my local machine (Windows 7) it works fine, but when I try to execute the same code elsewhere, it gives
java.net.UnknownHostException: https://mkyong.com
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:175)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
at java.net.Socket.connect(Socket.java:546)
at java.net.Socket.connect(Socket.java:495)
at sun.net.NetworkClient.doConnect(NetworkClient.java:178)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:409)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:530)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:240)
at sun.net.www.http.HttpClient.New(HttpClient.java:321)
at sun.net.www.http.HttpClient.New(HttpClient.java:338)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:935)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:876)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:801)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1139)
at GetURLContent.main(GetURLContent.java:22)
Can you suggest what the reason for this might be?
Thanks in advance.
Please connect your system to the Internet and then try again.
Thanks for this code. It would be a great help if you told me how to get the website content into a .txt file. I tried, but this code saves the HTML markup along with the content. Please help me out.
Thanks
I want the reverse case: read a file (HTML content) from the hard disk and display it as an HTML page on the web (I'm using Java EE, Hibernate, JSF 2.0, and GlassFish 3+ as the server).
Your help will be highly appreciated.
How do I get the program to do this regularly, for example every five minutes?
Use a timer.
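A minimal sketch of the timer approach with a ScheduledExecutorService. It is a safer choice than calling main(args) again at the end of main, as suggested earlier: that re-runs the download immediately with no pause, and because each call nests inside the previous one, it eventually overflows the stack. FetchEveryFiveMinutes, the helper name every and the printed placeholder task are assumptions; put the download code from the example inside the task.

```java
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class FetchEveryFiveMinutes {

    // Runs task immediately and then every `period`, on one background thread.
    // Exceptions are caught here because scheduleAtFixedRate cancels all later
    // runs if a single run throws.
    static ScheduledExecutorService every(long period, TimeUnit unit, Runnable task) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                e.printStackTrace(); // log and keep the schedule alive
            }
        }, 0, period, unit);
        return scheduler;
    }

    public static void main(String[] args) {
        // Placeholder task: replace the println with the download code.
        every(5, TimeUnit.MINUTES, () -> System.out.println("Fetching at " + LocalTime.now()));
        // The scheduler's thread is not a daemon thread, so the JVM keeps
        // running until shutdown() is called on the scheduler or the process is stopped.
    }
}
```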