In this post, I’ll show you how to scrape (fetch) the source code of any webpage using Selenium with Java.

Method Details – getPageSource()

java.lang.String getPageSource()

  • Get the source of the last loaded page. If the page has been modified after loading (for example, by Javascript) there is no guarantee that the returned text is that of the modified page. Please consult the documentation of the particular driver being used to determine whether the returned text reflects the current state of the page or the text last sent by the web server.
  • The page source returned is a representation of the underlying DOM: do not expect it to be formatted or escaped in the same way as the response sent from the web server. Think of it as an artist’s impression.

Returns:
The source of the current page

getPageSource() is part of the WebDriver interface. You can find more details in the Selenium Javadoc for WebDriver.

Example program to scrape the source code of the Amazon home page

package swdbasics;

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class GetPageSourceCode {

	public static void main(String[] args) throws InterruptedException
	{
		// tell Selenium where the chromedriver executable is located
		System.setProperty("webdriver.chrome.driver", "D:\\browser-drivers\\chromedriver.exe");
		// instantiate the ChromeDriver class (this opens a new Chrome window)
		WebDriver driver = new ChromeDriver();
		try {
			// launch the site
			driver.get("https://amazon.com");
			// maximize the browser window
			driver.manage().window().maximize();
			// fixed wait so that scripts on the page get some time to run
			Thread.sleep(4 * 1000);
			// fetch the source code of the current page
			String srcCode = driver.getPageSource();
			// print the source code
			System.out.println("*********************************");
			System.out.println(srcCode);
			System.out.println("*********************************");
			// wait for some time before closing
			Thread.sleep(3000);
		} finally {
			// quit the browser, even if one of the steps above throws
			driver.quit();
		}
	}

}
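
The source of a page like the Amazon home page is long, so reading it in the console is not very practical. Here is a minimal sketch of one way to save it to a file instead. The PageSourceSaver class and its save method are hypothetical names I made up for this example (they are not part of Selenium), and the hard-coded HTML string is only a stand-in for the value returned by driver.getPageSource().

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of Selenium
public class PageSourceSaver {

    // Writes the page source to the target file as UTF-8 (creating or
    // overwriting the file) and returns the size of the file in bytes.
    public static long save(String pageSource, Path target) throws IOException {
        Files.write(target, pageSource.getBytes(StandardCharsets.UTF_8));
        return Files.size(target);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for driver.getPageSource(); in the real program you would
        // pass the string returned by the driver instead.
        String srcCode = "<html><head><title>Demo</title></head><body></body></html>";
        // A temp file is used here only to keep the sketch self-contained
        Path target = Files.createTempFile("page-source", ".html");
        long bytes = save(srcCode, target);
        System.out.println("Wrote " + bytes + " bytes to " + target);
    }
}
```

In the example program above, you could call PageSourceSaver.save(driver.getPageSource(), Paths.get("amazon.html")) right after fetching the source. The file name is just an example, and Paths comes from java.nio.file.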

Happy Learning 🙂