
Web Crawling

Zziny 2021. 10. 26. 21:03

1. Add the Jsoup dependency (org.jsoup:jsoup) from the Maven repository to pom.xml
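After adding the dependency, you can confirm that Jsoup is on the classpath by parsing an in-memory HTML string, which needs no network access. This is a minimal sketch. The class name JsoupSmokeTest, the helper titleOf, and the sample HTML are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical smoke test: if this compiles and runs, Jsoup is on the classpath.
public class JsoupSmokeTest {

    // Parses an HTML string in memory (no network access) and returns the <title> text.
    static String titleOf(String html) {
        Document doc = Jsoup.parse(html);
        return doc.title();
    }

    public static void main(String[] args) {
        // The HTML string is a hypothetical sample.
        System.out.println(titleOf("<html><head><title>Hello jsoup</title></head><body></body></html>"));
        // prints: Hello jsoup
    }
}
```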

 

2. Refer to the Jsoup API documentation:

https://jsoup.org/apidocs/
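The core types in the API docs fit together roughly as follows. Jsoup is the entry point, Document is the parsed page, select() takes a CSS selector and returns Elements, and each Element exposes methods such as text(), attr(), and absUrl(). This is a sketch that parses a hypothetical HTML string instead of fetching a page. The class JsoupApiTour, the method extractLinks, the ul.items markup, and the example.com base URI are all made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

// Hypothetical example class showing the core jsoup types:
// Jsoup (entry point), Document (parsed page), Element and Elements (selection results).
public class JsoupApiTour {

    // Collects "link text -> absolute URL" strings from a hypothetical list of links.
    static List<String> extractLinks(String html, String baseUri) {
        // Jsoup.parse(html, baseUri) parses a string; baseUri lets relative hrefs resolve.
        Document doc = Jsoup.parse(html, baseUri);
        // select() takes a CSS selector and returns every match as Elements.
        Elements links = doc.select("ul.items > li > a[href]");
        List<String> result = new ArrayList<>();
        for (Element link : links) {
            // text() returns the visible text; absUrl() resolves href against baseUri.
            result.add(link.text() + " -> " + link.absUrl("href"));
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<ul class='items'>"
                + "<li><a href='/a'>First</a></li>"
                + "<li><a href='/b'>Second</a></li>"
                + "</ul>";
        // prints "First -> https://example.com/a" and "Second -> https://example.com/b"
        extractLinks(html, "https://example.com/").forEach(System.out::println);
    }
}
```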


Usage examples

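import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Class name is hypothetical; the original snippets are methods of an unnamed class.
public class JsoupCrawlingTest {

    // Assumes SLF4J; the original does not show which logging library provides logger.
    private static final Logger logger = LoggerFactory.getLogger(JsoupCrawlingTest.class);

    // Example 1: log the title attribute of each <img> inside the links under div.webtoon_spot2.
    public void naverWebtoonTitles() throws IOException {
        // Fetch the page over HTTP and keep only the <body> element.
        Element bodyElement = Jsoup.connect("https://comic.naver.com/webtoon/weekday").get().body();
        // Each ">" requires a direct child at every step of the path.
        Elements imgTagList = bodyElement.select("#content > div.webtoon_spot2 > ul > li > div > a > img");

        for (Element element : imgTagList) {
            logger.debug(element.attr("title"));
        }
    }

    // Example 2: log the title attribute of each .rig_value02 element on the KHOA observation list page.
    public void khoaObservationValues() throws IOException {
        // The space in "li .rig_value02" matches .rig_value02 elements at any depth under an <li>.
        Elements bodyElements = Jsoup.connect("http://www.khoa.go.kr/oceangrid/koofs/kor/observation/obs_real_list.do")
                .get().select("li .rig_value02");

        for (Element element : bodyElements) {
            logger.debug(element.attr("title"));
        }
    }
}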
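The markup of both live pages may have changed since this post was written, so a selector can silently stop matching. One way to check the selector logic without the network is to run it against a hand-written HTML string. The sketch below does this. SAMPLE_HTML only imitates the structure the two selectors expect and is not copied from either real page. The class and method names, and the values "Webtoon A" and "12.3 m/s", are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical offline check of the two selectors used above.
public class SelectorOfflineDemo {

    // Hand-written sample HTML (hypothetical), shaped to match both selectors.
    static final String SAMPLE_HTML =
            "<div id='content'><div class='webtoon_spot2'><ul>"
            + "<li><div><a href='#'><img title='Webtoon A'></a></div></li>"
            + "<li><div><a href='#'><img title='Webtoon B'></a></div></li>"
            + "</ul></div></div>"
            + "<ul><li><span class='rig_value02' title='12.3'>12.3 m/s</span></li></ul>";

    // Child combinators (">"): every step must be a direct child of the previous one.
    static List<String> imgTitles(Document doc) {
        List<String> titles = new ArrayList<>();
        for (Element img : doc.select("#content > div.webtoon_spot2 > ul > li > div > a > img")) {
            titles.add(img.attr("title"));
        }
        return titles;
    }

    // Descendant combinator (space): .rig_value02 may sit at any depth under an <li>.
    static List<String> rigValues(Document doc) {
        List<String> values = new ArrayList<>();
        for (Element value : doc.select("li .rig_value02")) {
            // attr("title") reads the attribute; text() reads the visible text.
            values.add(value.attr("title") + " / " + value.text());
        }
        return values;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        System.out.println(imgTitles(doc)); // prints: [Webtoon A, Webtoon B]
        System.out.println(rigValues(doc)); // prints: [12.3 / 12.3 m/s]
    }
}
```

If attr("title") comes back empty on the real page, the element probably has no title attribute. In that case text() may be what you actually want.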