Installation and Usage of Goquery

Installation

Execute:

go get github.com/PuerkitoBio/goquery

Import

import "github.com/PuerkitoBio/goquery"

Load the Page

Take the IMDb Popular Movies page as an example:

package main

import (
    "fmt"
    "log"
    "net/http"

    "github.com/PuerkitoBio/goquery"
)

func main() {
    res, err := http.Get("https://www.imdb.com/chart/moviemeter/")
    if err != nil {
        log.Fatal(err)
    }
    defer res.Body.Close()
    if res.StatusCode != 200 {
        log.Fatalf("status code error: %d %s", res.StatusCode, res.Status)
    }

Get the Document Object

    doc, err := goquery.NewDocumentFromReader(res.Body)
    if err != nil {
        log.Fatal(err)
    }
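    // Other ways to create a document:
    // doc, err := goquery.NewDocumentFromReader(strings.NewReader("<p>Example content</p>"))
    // doc, err := goquery.NewDocument(url) // deprecated; prefer http.Get with NewDocumentFromReader

    // Print the page title to confirm that the document was parsed.
    fmt.Println(doc.Find("title").Text())
}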

Select Elements

Element Selector

Select elements by tag name. For example, dom.Find("p") matches every p element. Find calls can be chained, with each call searching the descendants of the previous selection:

ele.Find("h2").Find("a")

Attribute Selector

Select elements by the presence or value of an attribute. Several matching operators are available:

Find("div[my]")        // Filter div elements with the my attribute
Find("div[my=zh]")     // Filter div elements whose my attribute is zh
Find("div[my!=zh]")    // Filter div elements whose my attribute is not equal to zh
Find("div[my|=zh]")    // Filter div elements whose my attribute is zh or starts with zh-
Find("div[my*=zh]")    // Filter div elements whose my attribute contains the string zh
Find("div[my~=zh]")    // Filter div elements whose my attribute contains the word zh
Find("div[my$=zh]")    // Filter div elements whose my attribute ends with zh
Find("div[my^=zh]")    // Filter div elements whose my attribute starts with zh

`parent > child` Selector

Select only the direct children of an element. For example, dom.Find("div>p") selects p elements whose parent is a div; p elements nested more deeply are not matched.

`element + next` Adjacent Selector

Use it when the target elements have no distinguishing feature of their own but the element just before them does. For example, dom.Find("p[my=a]+p") selects the p element that immediately follows a p element whose my attribute is a.

`element~next` Sibling Selector

Select every following sibling under the same parent, whether adjacent or not. For example, dom.Find("p[my=a]~p") selects all p elements that come after a p element whose my attribute is a and that share its parent.

ID Selector

It starts with # and matches the element with the given id. For example, dom.Find("#title") matches the element whose id is title. You can also restrict the match to a tag, as in dom.Find("p#title").

ele.Find("#title")

Class Selector

It starts with . and selects the elements that have the specified class name, for example dom.Find(".content1"). You can also restrict the match to a tag, as in dom.Find("div.content1").

ele.Find(".title")

Selector OR (,) Operation

Combine multiple selectors by separating them with commas. An element is selected if it matches any one of them. For example, Find("div,span").

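package main

import (
    "fmt"
    "log"
    "strings"

    "github.com/PuerkitoBio/goquery"
)

func main() {
    html := `<body>
                <div lang="zh">DIV1</div>
                <span>
                    <div>DIV5</div>
                </span>
            </body>`
    dom, err := goquery.NewDocumentFromReader(strings.NewReader(html))
    if err != nil {
        log.Fatalln(err)
    }
    dom.Find("div,span").Each(func(i int, selection *goquery.Selection) {
        // Html returns the inner HTML and an error, so check the error first.
        inner, err := selection.Html()
        if err != nil {
            log.Fatalln(err)
        }
        fmt.Println(inner)
    })
}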

Filters

`:contains` Filter

Filter elements that contain the specified text. For example, dom.Find("p:contains(a)") filters the p tags that contain a.

dom.Find("div:contains(DIV2)").Each(func(i int, selection *goquery.Selection) {
    fmt.Println(selection.Text())
})

`:has(selector)`

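Select elements that have at least one descendant matching the given selector. For example, dom.Find("div:has(p)") selects the div elements that contain a p element.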

`:empty`

Select elements that have no children at all, meaning no child elements and no text.

`:first-child` and `:first-of-type` Filters

Find("p:first-child") selects p elements that are the first child of their parent. Find("p:first-of-type") selects the first p element among its siblings, even when elements of other types come before it.

`:last-child` and `:last-of-type` Filters

The counterparts of :first-child and :first-of-type: they match the last child of a parent and the last element of its type among its siblings.

`:nth-child(n)` and `:nth-of-type(n)` Filters

:nth-child(n) selects an element that is the nth child of its parent; :nth-of-type(n) selects the nth element of its type among its siblings. Counting starts at 1.

`:nth-last-child(n)` and `:nth-last-of-type(n)` Filters

These count from the end, so n = 1 refers to the last element.

`:only-child` and `:only-of-type` Filters

Find(":only-child") filters the only child element in the parent element; Find(":only-of-type") filters the only element of the same type.

Get Content

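ele.Html() // inner HTML of the first matched element, plus an error
ele.Text() // combined text of all matched elements and their descendants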

Traversal

Use the Each method to traverse the selected elements:

ele.Find(".item").Each(func(index int, elA *goquery.Selection) {
    href, _ := elA.Attr("href")
    fmt.Println(href)
})

Built-in Functions

Array Positioning Functions

Eq(index int) *Selection
First() *Selection
Get(index int) *html.Node
Index...() int
Last() *Selection
Slice(start, end int) *Selection

Extended Functions

Add...()
AndSelf() (deprecated; use AddBack())
Union()

Filtering Functions

End()
Filter...()
Has...()
Intersection()
Not...()

Loop Traversal Functions

Each(f func(int, *Selection)) *Selection
EachWithBreak(f func(int, *Selection) bool) *Selection
Map(f func(int, *Selection) string) (result []string)

Document Modification Functions

After...()
Append...()
Before...()
Clone()
Empty()
Prepend...()
Remove...()
ReplaceWith...()
Unwrap()
Wrap...()
WrapAll...()
WrapInner...()

Attribute Manipulation Functions

Attr*(), RemoveAttr(), SetAttr()
AttrOr(e string, d string)
AddClass(), HasClass(), RemoveClass(), ToggleClass()
Html()
Length()
Size()
Text()

Node Search Functions

Contains()
Is...()

Document Tree Traversal Functions

Children...()
Contents()
Find...()
Next...() *Selection
NextAll() *Selection
Parent[s]...()
Prev...() *Selection
Siblings...()

Type Definitions

Document
Selection
Matcher

Helper Functions

NodeName
OuterHtml

Examples

Getting Started Example

package main

import (
    "fmt"
    "log"
    "strings"

    "github.com/PuerkitoBio/goquery"
)

func main() {
    html := `<html>
            <body>
                <h1 id="title">O Captain! My Captain!</h1>
                <p class="content1">
                O Captain! my Captain! our fearful trip is done,
                The ship has weather’d every rack, the prize we sought is won,
                The port is near, the bells I hear, the people all exulting,
                While follow eyes the steady keel, the vessel grim and daring;
                </p>
            </body>
            </html>`
    // Parse the HTML string into a goquery document.
    dom, err := goquery.NewDocumentFromReader(strings.NewReader(html))
    if err != nil {
        log.Fatalln(err)
    }
    // Print the text of every p element.
    dom.Find("p").Each(func(i int, selection *goquery.Selection) {
        fmt.Println(selection.Text())
    })
}

Example of Crawling IMDb Popular Movie Information

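package main

import (
    "fmt"
    "log"
    "net/http"

    "github.com/PuerkitoBio/goquery"
)

func main() {
    // goquery.NewDocument(url) is deprecated, so fetch the page with net/http.
    res, err := http.Get("https://www.imdb.com/chart/moviemeter/")
    if err != nil {
        log.Fatal(err)
    }
    defer res.Body.Close()
    if res.StatusCode != http.StatusOK {
        log.Fatalf("status code error: %d %s", res.StatusCode, res.Status)
    }

    doc, err := goquery.NewDocumentFromReader(res.Body)
    if err != nil {
        log.Fatal(err)
    }
    // The .titleColumn selector depends on the page's markup and may need updating.
    doc.Find(".titleColumn a").Each(func(i int, selection *goquery.Selection) {
        title := selection.Text()
        href, _ := selection.Attr("href")
        fmt.Printf("Movie Name: %s, Link: https://www.imdb.com%s\n", title, href)
    })
}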

The example above extracts the movie names and links from the IMDb popular movies page. In practice, adjust the selectors and processing logic to match the page you are scraping.
