In this post, I’ll show you how to scrape (fetch) the source code of any web page using Selenium with Java.
Method Details – getPageSource()
java.lang.String getPageSource()
- Get the source of the last loaded page. If the page has been modified after loading (for example, by Javascript) there is no guarantee that the returned text is that of the modified page. Please consult the documentation of the particular driver being used to determine whether the returned text reflects the current state of the page or the text last sent by the web server.
- The page source returned is a representation of the underlying DOM: do not expect it to be formatted or escaped in the same way as the response sent from the web server. Think of it as an artist’s impression.
Returns:
The source of the current page
getPageSource() is part of the WebDriver interface – you can get more details from the Selenium WebDriver Javadoc.
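Because the returned source may or may not include changes made by JavaScript after loading, one rough way to avoid a fixed sleep is to read the source repeatedly until two consecutive reads match. The sketch below is only an illustration of that idea, not part of Selenium: the class StablePageSource and the helper waitForStableSource are hypothetical names, and the Supplier stands in for driver::getPageSource so the example runs without a browser.

```java
import java.time.Duration;
import java.util.function.Supplier;

public class StablePageSource {

    // Hypothetical helper (not a Selenium API): polls the supplier until two
    // consecutive reads are equal or the timeout elapses, then returns the
    // last value that was read.
    static String waitForStableSource(Supplier<String> source, Duration timeout, Duration pollInterval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        String previous = source.get();
        while (System.nanoTime() < deadline) {
            Thread.sleep(pollInterval.toMillis());
            String current = source.get();
            if (current.equals(previous)) {
                return current; // unchanged between two reads: treat as stable
            }
            previous = current;
        }
        return previous; // timed out: return whatever was read last
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for driver::getPageSource: the fake "page" changes on the
        // first three reads and then stays the same.
        int[] reads = {0};
        Supplier<String> fakeDriver = () -> "<html>v" + Math.min(++reads[0], 3) + "</html>";

        String html = waitForStableSource(fakeDriver, Duration.ofSeconds(2), Duration.ofMillis(50));
        System.out.println(html); // prints <html>v3</html>
    }
}
```

With a real driver, the call would look like waitForStableSource(driver::getPageSource, Duration.ofSeconds(10), Duration.ofMillis(500)). This is a heuristic only: a page that keeps updating will simply run into the timeout, and when there is a specific element to wait for, Selenium's own explicit waits (WebDriverWait) are usually the better fit.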
Example program to scrape the source code of the Amazon home page
package swdbasics;

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class GetPageSourceCode {

    public static void main(String[] args) throws InterruptedException {

        // inform Selenium about the driver path
        System.setProperty("webdriver.chrome.driver", "D:\\browser-drivers\\chromedriver.exe");

        // instantiate the ChromeDriver class
        WebDriver driver = new ChromeDriver();

        // launch the site
        driver.get("https://amazon.com");

        // maximize the browser window
        driver.manage().window().maximize();

        // wait for some time
        Thread.sleep(4 * 1000);

        // get the source code of the current page
        String src_code = driver.getPageSource();

        // print the source code
        System.out.println("*********************************");
        System.out.println(src_code);
        System.out.println("***********************************");

        // wait for some time
        Thread.sleep(3000);

        // quit the browser
        driver.quit();
    }
}
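The source of a page like this can run to thousands of lines, so printing it to the console is not always practical. As a possible follow-up, the sketch below writes the string to a file instead. It uses only the Java standard library (Java 11 or later); the class PageSourceSaver and the helper savePageSource are hypothetical names, and the hard-coded HTML string stands in for the value returned by driver.getPageSource().

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class PageSourceSaver {

    // Hypothetical helper: writes the page source to the target file as UTF-8,
    // creating missing parent directories first. Returns the path written to.
    static Path savePageSource(String pageSource, Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        return Files.writeString(target, pageSource, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for: String src_code = driver.getPageSource();
        String src_code = "<html><head><title>Demo</title></head><body></body></html>";

        // A temporary file keeps the example free of side effects in the working directory.
        Path target = Files.createTempFile("page-source", ".html");
        Path saved = savePageSource(src_code, target);
        System.out.println("Saved " + Files.size(saved) + " bytes to " + saved);
    }
}
```

In the program above, the call would go right after getPageSource(), for example savePageSource(src_code, Path.of("amazon-home.html")), where the file name is only an example.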
Happy Learning 🙂