html_node() and html_nodes() find HTML tags (nodes) using CSS selectors or XPath expressions.

html_nodes(x, css, xpath)

html_node(x, css, xpath)



x

Either a document, a node set, or a single node.

css, xpath

Nodes to select. Supply one of css or xpath depending on whether you want to use a CSS or XPath 1.0 selector.
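As a minimal sketch of the two selector styles, the snippet below selects the same node once with css and once with xpath. The inline HTML document and its id and class names are made up for illustration and are not part of the original examples.

```r
library(rvest)

# Hypothetical inline document, used only for illustration.
doc <- read_html('<html><body>
  <div id="main"><p class="intro">First</p><p>Second</p></div>
</body></html>')

# Select with a CSS selector.
by_css <- html_nodes(doc, css = "div#main p.intro")

# Select the same node with an XPath 1.0 expression.
by_xpath <- html_nodes(doc, xpath = "//div[@id='main']/p[@class='intro']")

html_text(by_css)
html_text(by_xpath)
```

Both calls should return the single paragraph containing "First"; which form to use is largely a matter of which syntax expresses the selection more directly.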


html_node() returns a nodeset the same length as the input. html_nodes() flattens the output, so there is no direct way to map each output node back to the input element it came from.
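The difference in output length can be seen on a small node set. This is a sketch on a made-up list (the HTML is an assumption for illustration): one item has no bold text and another has several.

```r
library(rvest)

# Hypothetical list: item 1 has one <b>, item 2 has none, item 3 has three.
doc <- read_html("<ul>
  <li><b>A</b></li>
  <li>no bold here</li>
  <li><b>C</b><b>D</b><b>E</b></li>
</ul>")

items <- html_nodes(doc, "li")

# One result per input element; a missing node where nothing matches.
first_bold <- html_node(items, "b")

# All matches, flattened into a single node set.
all_bold <- html_nodes(items, "b")

length(items)       # 3
length(first_bold)  # 3
length(all_bold)    # 4

html_text(first_bold)  # "A" NA  "C"
html_text(all_bold)    # "A" "C" "D" "E"
```

Because html_node() keeps one slot per input element, its result can be lined up against items, for example as a column in a data frame. The flattened result of html_nodes() cannot.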


CSS selectors are particularly useful in conjunction with SelectorGadget (http://selectorgadget.com/), which makes it very easy to discover the selector you need. If you haven't used CSS selectors before, I'd recommend starting with the fun tutorial at http://flukeout.github.io/.

CSS selector support

CSS selectors are translated to XPath selectors by the selectr package, which is a port of the Python cssselect library.

It implements the majority of CSS3 selectors, as described in the W3C Selectors Level 3 specification. The exceptions are listed below:

  • Pseudo selectors that require interactivity are ignored: :hover, :active, :focus, :target, :visited.

  • The following pseudo classes don't work with the wild card element, *: *:first-of-type, *:last-of-type, *:nth-of-type, *:nth-last-of-type, *:only-of-type

  • It supports :contains(text)

  • You can use !=: [foo!=bar] is the same as :not([foo=bar]).

  • :not() accepts a sequence of simple selectors, not just a single simple selector.
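A short sketch of the translation step and two of the extensions listed above. It assumes the selectr package is attached so that css_to_xpath() can be called directly; the inline HTML and its class names are invented for illustration.

```r
library(rvest)
library(selectr)

# Inspect the XPath expression that a CSS selector is translated into.
css_to_xpath("div > p.note")

# Hypothetical document, used only for illustration.
doc <- read_html('<ul>
  <li class="keep">apple pie</li>
  <li class="skip">apple tart</li>
  <li>cherry pie</li>
</ul>')

# :contains(text) matches elements whose text contains the given string.
with_apple <- html_text(html_nodes(doc, "li:contains('apple')"))

# [class!=skip] should select the same nodes as :not([class=skip]).
not_equal <- html_text(html_nodes(doc, "li[class!=skip]"))
negated   <- html_text(html_nodes(doc, "li:not([class=skip])"))
identical(not_equal, negated)
```

Printing the result of css_to_xpath() is a convenient way to check how an unfamiliar selector will be interpreted before using it with html_nodes().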


library(rvest)

url <- paste0(
  "",
  ""
)
ateam <- read_html(url)
html_nodes(ateam, "center")
#> {xml_nodeset (1)}
#> [1] <center><table border="0" cellspacing="1" cellpadding="4" bgcolor="#dcdcd ...
html_nodes(ateam, "center font")
#> {xml_nodeset (1)}
#> [1] <font size="4">Domestic Total Gross: <b>$77,222,099</b></font>
html_nodes(ateam, "center font b")
#> {xml_nodeset (1)}
#> [1] <b>$77,222,099</b>

# html_nodes() is well suited to use with the pipe
ateam %>% html_nodes("center") %>% html_nodes("td")
#> {xml_nodeset (7)}
#> [1] <td align="center" colspan="2"><font size="4">Domestic Total Gross: <b>$7 ...
#> [2] <td valign="top">Distributor: <b><a href="/web/20190202054736/https://www ...
#> [3] <td valign="top">Release Date: <b><nobr><a href="/web/20190202054736/http ...
#> [4] <td valign="top">Genre: <b>Action</b>\n</td>\n
#> [5] <td valign="top">Runtime: <b>1 hrs. 57 min.</b>\n</td>
#> [6] <td valign="top">MPAA Rating: <b>PG-13</b>\n</td>\n
#> [7] <td valign="top">Production Budget: <b>$110 million</b>\n</td>
ateam %>% html_nodes("center") %>% html_nodes("font")
#> {xml_nodeset (1)}
#> [1] <font size="4">Domestic Total Gross: <b>$77,222,099</b></font>

td <- ateam %>% html_nodes("center") %>% html_nodes("td")
td
#> {xml_nodeset (7)}
#> [1] <td align="center" colspan="2"><font size="4">Domestic Total Gross: <b>$7 ...
#> [2] <td valign="top">Distributor: <b><a href="/web/20190202054736/https://www ...
#> [3] <td valign="top">Release Date: <b><nobr><a href="/web/20190202054736/http ...
#> [4] <td valign="top">Genre: <b>Action</b>\n</td>\n
#> [5] <td valign="top">Runtime: <b>1 hrs. 57 min.</b>\n</td>
#> [6] <td valign="top">MPAA Rating: <b>PG-13</b>\n</td>\n
#> [7] <td valign="top">Production Budget: <b>$110 million</b>\n</td>

# When applied to a list of nodes, html_nodes() returns all matching nodes
# beneath any of the elements, flattening results into a new nodelist.
td %>% html_nodes("font")
#> {xml_nodeset (1)}
#> [1] <font size="4">Domestic Total Gross: <b>$77,222,099</b></font>

# html_node() returns the first matching node. If there are no matching
# nodes, it returns a "missing" node
td %>% html_node("font")
#> {xml_nodeset (7)}
#> [1] <font size="4">Domestic Total Gross: <b>$77,222,099</b></font>
#> [2] <NA>
#> [3] <NA>
#> [4] <NA>
#> [5] <NA>
#> [6] <NA>
#> [7] <NA>

# To pick out an element or elements at specified positions, use [[ and [
ateam %>% html_nodes("table") %>% .[[1]] %>% html_nodes("img")
#> {xml_nodeset (6)}
#> [1] <img src=" ...
#> [2] <img src="// ...
#> [3] <img src="// ...
#> [4] <img src="/web/20190202054736im_/ ...
#> [5] <img src="/web/20190202054736im_/ ...
#> [6] <img src=" ...
ateam %>% html_nodes("table") %>% .[1:2] %>% html_nodes("img")
#> {xml_nodeset (6)}
#> [1] <img src=" ...
#> [2] <img src="// ...
#> [3] <img src="// ...
#> [4] <img src="/web/20190202054736im_/ ...
#> [5] <img src="/web/20190202054736im_/ ...
#> [6] <img src=" ...
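
Positional indexing with [[ and [ can also be tried without network access. The following is a sketch on a made-up document (the table ids and image file names are assumptions for illustration): [[ extracts a single node, while [ keeps a node set.

```r
library(rvest)

# Hypothetical page with three tables, used only for illustration.
doc <- read_html('<html><body>
  <table id="t1"><tr><td><img src="a.png"></td></tr></table>
  <table id="t2"><tr><td><img src="b.png"><img src="c.png"></td></tr></table>
  <table id="t3"><tr><td>no images</td></tr></table>
</body></html>')

tables <- html_nodes(doc, "table")

# [[ extracts one node; the search is then limited to that table.
imgs_first <- html_nodes(tables[[1]], "img")

# [ keeps a node set; matches from both tables are flattened together.
imgs_first_two <- html_nodes(tables[1:2], "img")

html_attr(imgs_first, "src")      # "a.png"
html_attr(imgs_first_two, "src")  # "a.png" "b.png" "c.png"
```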