Jiakai's Blog | Xiaohongshu RSS Solutions

Xiaohongshu RSS Solutions

2024-09-21

#xiaohongshu, #rss


Update (2025.7.7)

I no longer recommend scraping full text through Xiaohongshu RSS (remove the fulltext suffix from the RSS link). It is too easily targeted by anti-crawling measures, and having the account repeatedly hit captcha pop-ups is very annoying.

I'm not denying Xiaohongshu's convenience for searching Chinese-language information, but the platform's drawbacks can't be ignored: it is a walled garden, an Instagram clone…

Maybe someday I'll abandon Xiaohongshu RSS tinkering completely. After all, many shares on that platform aren't first-hand sources, and the posing girls are lacking too.

Update (2025.1.31)

Xiaohongshu's anti-crawling is strict. When using a cookie, I suggest rate limiting (add the CACHE_EXPIRE and CACHE_CONTENT_EXPIRE environment variables). Otherwise you get multiple images at first, later only the cover image, and then some routes just show Error. via: https://github.com/DIYgod/RSSHub/issues/17912
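
To see why a longer cache expiry works as a rate limit, here is a rough back-of-the-envelope sketch. It assumes, based on my reading of the RSSHub docs, that CACHE_EXPIRE is the route cache lifetime in seconds and that a cached route is not refetched until it expires. The function name and the example values are mine, not RSSHub's.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_fetches_per_day(cache_expire_seconds: int) -> int:
    """Upper bound on upstream fetches per route per day.

    Assumption: a route is refetched at most once per cache lifetime,
    no matter how often the RSS reader polls it.
    """
    if cache_expire_seconds <= 0:
        raise ValueError("cache_expire_seconds must be positive")
    return SECONDS_PER_DAY // cache_expire_seconds


# Example values are illustrative, not recommendations from RSSHub.
for label, seconds in [("5 minutes", 300), ("1 hour", 3600), ("6 hours", 21600)]:
    bound = max_fetches_per_day(seconds)
    print(f"CACHE_EXPIRE={seconds} ({label}): at most {bound} fetches/day per route")
```

Under that assumption, raising the expiry from 5 minutes to 1 hour cuts the worst case from 288 to 24 fetches per route per day.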

After I added a new cookie value, notes show multiple images again.


Update (2024.12.11)

While cleaning out my inbox at noon, I found someone asking me about image display issues with Xiaohongshu follow subscriptions.

After checking the RSSHub project's official docs and GitHub issues, I learned that self-deployed RSSHub instances can set the XIAOHONGSHU_COOKIE environment variable to enable full-text scraping of Xiaohongshu.

Effect shown:

Xiaohongshu full text scraping effect 1

Xiaohongshu full text scraping effect 2

The official RSSHub instance hasn't added XIAOHONGSHU_COOKIE for full-text scraping yet (as of December 11, 2024).

Steps:

Open the Xiaohongshu homepage, right-click and select Inspect to open the developer console, then refresh the page. In the Network tab, select the explore request, go to Headers -> Request Headers, and copy the long Cookie value.

Get Xiaohongshu cookie
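
Before pasting the copied value into a config file, a quick sanity check can catch a truncated or line-wrapped copy. This is only a minimal sketch: the helper names are hypothetical, `abRequestId` is the one cookie name that appears in this post, and `exampleName` is a made-up placeholder.

```python
def parse_cookie_header(raw: str) -> dict:
    """Split a 'name=value; name2=value2' Cookie header into a dict."""
    cookies = {}
    for part in raw.split(";"):
        part = part.strip()
        if not part or "=" not in part:
            continue
        # Split on the first '=' only, since values may contain '='.
        name, value = part.split("=", 1)
        cookies[name.strip()] = value.strip()
    return cookies


def looks_complete(raw: str, required=("abRequestId",)) -> bool:
    """True if the value is a single line and contains the required names."""
    if "\n" in raw or "\r" in raw:
        return False
    cookies = parse_cookie_header(raw)
    return all(name in cookies for name in required)


sample = "abRequestId=xxx; exampleName=yyy"  # placeholder values, not a real cookie
print(parse_cookie_header(sample))
print(looks_complete(sample))
```

Passing this check only shows the string is well-formed. It says nothing about whether Xiaohongshu still accepts the cookie.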

RSSHub docker-compose.yml:

services:
    rsshub:
        image: diygod/rsshub
        restart: always
        ports:
            - '127.0.0.1:1200:1200'
        environment:
            NODE_ENV: production
            CACHE_TYPE: redis
            REDIS_URL: 'redis://redis:6379/'
            PUPPETEER_WS_ENDPOINT: 'ws://browserless:3000'  # marked
            YOUTUBE_KEY: 'xxx'
        env_file: ".env"
        depends_on:
            - redis
            - browserless  # marked

    browserless:  # marked
        image: browserless/chrome  # marked
        restart: always  # marked
        ulimits:  # marked
          core:  # marked
            hard: 0  # marked
            soft: 0  # marked

    redis:
        image: redis:alpine
        restart: always
        volumes:
            - /root/stacks/rsshub/redis-data:/data

Create a file named .env in the same directory as docker-compose.yml.

.env file content (the cookie is one long string on a single line):

XIAOHONGSHU_COOKIE="abRequestId=xxx"

Finally, run docker compose down && docker compose up -d to recreate the RSSHub container.

Claude taught me this, so the implementation might not be elegant (the cookie I fed to Claude had a few letters changed; I didn't feed it the real cookie value).

Claude’s guidance

A URL like the one below subscribes to the full text of a user's Xiaohongshu notes.

Example: https://self-deployed-rsshub-link/xiaohongshu/user/xiaohongshu-userid/notes/fulltext
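
For several bloggers, the links can be generated instead of typed by hand. The sketch below just follows the URL pattern shown above. The function name, the example host, and the user ID are hypothetical. Leaving `fulltext` off gives the plain notes route without the fulltext suffix.

```python
from urllib.parse import quote


def build_notes_feed_url(base_url: str, user_id: str, fulltext: bool = False) -> str:
    """Build a notes feed URL following the pattern shown in this post."""
    path = f"/xiaohongshu/user/{quote(user_id, safe='')}/notes"
    if fulltext:
        path += "/fulltext"
    # Strip a trailing slash on the base so the result has no double slash.
    return base_url.rstrip("/") + path


# Hypothetical instance URL and user ID, for illustration only.
print(build_notes_feed_url("https://rsshub.example.com", "xiaohongshu-userid"))
print(build_notes_feed_url("https://rsshub.example.com/", "xiaohongshu-userid", fulltext=True))
```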

My email reply:

Email reply

Update (2024.10.30)

It turns out RSSHub supports Xiaohongshu notes. This morning I saw someone share a Xiaohongshu beauty list on Follow, the new project by RSSHub author DIYgod. I was stunned, because in my memory Xiaohongshu wasn't supported.

RSSHub supports Xiaohongshu

I checked the docs and tried it, and it really works. It also works on a self-deployed RSSHub instance. I hereby declare Distill Web Monitor retired, effective immediately. Everything below is obsolete; only consider the method below if RSSHub's Xiaohongshu routes start erroring someday.

RSSHub docs introduction to Xiaohongshu notes

Thanks to developers' pull requests, RSSHub's Xiaohongshu support was probably revived recently.


I also saw an open pull request that adds cookie support for Xiaohongshu. Xiaohongshu does have an anti-crawling mechanism, and frequent crawling triggers a captcha.

Developer trying to add cookie support to RSSHub Xiaohongshu

Update (2024.09.24)

If a Xiaohongshu blogger homepage opened by Distill Web Monitor hasn't closed after a while, a captcha has almost certainly appeared. Solve it manually by rotating the image.

Background

In the first half of the year I deployed RSSWorker, attracted by its README mentioning Xiaohongshu support. This filled the gap RSSHub had for Xiaohongshu.

I enthusiastically added RSS subscription links for some Xiaohongshu bloggers (including my roommate) to my RSS reader. However, daily checks showed no updates. I tried multiple RSS readers, from Tiny Tiny RSS to Miniflux and finally even the Android app Read You, with the same result: none showed Xiaohongshu's latest content.

But opening the RSSWorker-generated Xiaohongshu RSS directly did show new items.

RSSWorker-generated Xiaohongshu RSS shows new items

With Claude 3.5 Sonnet's help, I got a rough understanding of the author's Xiaohongshu RSS scraping code. The author extracts data from the JSON embedded in <script> tags, so the code already gets all the information Xiaohongshu provides to anonymous, logged-out visitors.

Xiaohongshu script tag provided JSON data
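
The general technique of pulling JSON out of a <script> tag can be sketched as follows. This is not RSSWorker's actual code. The marker `window.__INITIAL_STATE__=`, the sample HTML, and the field names are assumptions for illustration, as is the idea that the payload may contain bare `undefined` values that need replacing before parsing.

```python
import json
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> tag."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data


def extract_state(html: str, marker: str = "window.__INITIAL_STATE__="):
    """Return the parsed JSON assigned to `marker`, or None if absent."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    for script in collector.scripts:
        text = script.strip()
        if text.startswith(marker):
            payload = text[len(marker):].rstrip(";")
            # Bare `undefined` is valid JavaScript but not JSON. This crude
            # replacement would also alter the word inside string values.
            payload = re.sub(r"\bundefined\b", "null", payload)
            return json.loads(payload)
    return None


# Made-up page for illustration; real pages are far larger.
SAMPLE = (
    "<html><body><script>window.__INITIAL_STATE__="
    '{"user":{"notes":[{"title":"Note A","time":undefined}]}}'
    "</script></body></html>"
)
state = extract_state(SAMPLE)
print(state["user"]["notes"][0]["title"])
```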

A careful look at the Xiaohongshu RSS that RSSWorker generates shows that, although all items are scraped, each item has these issues:

Missing pubDate

Scraping a Xiaohongshu blogger's homepage without logging in can't get each note's publish time. A missing publish time may prevent RSS readers from updating correctly.

Every note's link points to the blogger's homepage

Scraping a Xiaohongshu blogger's homepage without logging in can't get each note's unique link. The lack of a unique link per item is another reason RSS readers don't update.
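
The second issue is easier to see with a toy model of a reader. The sketch below assumes a reader that identifies items by guid, falling back to link. Real readers differ in the details, and the function and sample data are hypothetical.

```python
def new_items(seen_keys: set, items: list) -> list:
    """Return items whose identity key has not been seen before.

    Assumption: the reader keys each item on its guid, or on its link
    when no guid is present.
    """
    fresh = []
    for item in items:
        key = item.get("guid") or item.get("link")
        if key not in seen_keys:
            seen_keys.add(key)
            fresh.append(item)
    return fresh


HOMEPAGE = "https://example.com/user/profile/some-user"  # placeholder link

seen = set()
first_poll = [
    {"title": "Note A", "link": HOMEPAGE},
    {"title": "Note B", "link": HOMEPAGE},
]
# Both items share one link, so only the first counts as new.
print([i["title"] for i in new_items(seen, first_poll)])

second_poll = [{"title": "Note C", "link": HOMEPAGE}] + first_poll
# The new note reuses the same link, so nothing looks new.
print([i["title"] for i in new_items(seen, second_poll)])
```

With a unique link per note, Note C would get its own key and show up as new on the second poll.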

Users have also reported Xiaohongshu RSS problems in the RSSWorker project's issues. My roommate has no pinned posts, yet the RSS reader still doesn't update.

User feedback on RSSWorker project Xiaohongshu RSS issues

Solution

My solution uses Distill Web Monitor, a web monitoring tool.

Distill Web Monitor

Every 24 hours it automatically opens the monitored Xiaohongshu homepage. If there is an update, the extension icon shows a red dot.

Monitored object has updates, Distill Web Monitor shows red dot

After installing this Chrome extension, pin it to the extensions area at the upper right of Chrome.

Open the Xiaohongshu blogger homepage you want to monitor. Click the Distill Web Monitor extension icon, then click Monitor parts of page.

Select a note's title with the mouse, choose XPath, then remove the index from the XPath expression it generates.

Monitor Xiaohongshu blogger homepage step 1

The right side now shows all note titles matched by the XPath. Click Save selections.

Monitor Xiaohongshu blogger homepage step 2

For Schedule checks, select 1 day or more. After all, social media accounts don't update that often.

You can also adjust other settings; for example, I removed the sound notification under Actions. Finally click Save.

Monitor Xiaohongshu blogger homepage step 3

Then roughly every 24 hours the extension automatically opens the blogger's homepage and checks for updates. If there is one, a red dot appears on the extension icon. The page opens and closes automatically.
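
Why removing the index from the XPath selects every note title can be shown with a small, made-up page. The markup and the expressions below are hypothetical, since the real XPath on Xiaohongshu is longer. The sketch uses the limited XPath support in Python's standard library.

```python
import re
import xml.etree.ElementTree as ET

# Made-up markup standing in for a blogger's note list.
PAGE = """
<div>
  <section><span>Note A</span></section>
  <section><span>Note B</span></section>
  <section><span>Note C</span></section>
</div>
"""


def drop_indexes(xpath: str) -> str:
    """Remove numeric position predicates such as [2] from an XPath."""
    return re.sub(r"\[\d+\]", "", xpath)


root = ET.fromstring(PAGE)
picked = "./section[2]/span"        # what clicking one title might produce
generalized = drop_indexes(picked)  # "./section/span"

print([e.text for e in root.findall(picked)])       # only the clicked title
print([e.text for e in root.findall(generalized)])  # every title
```

With the index in place the expression matches one note only, so a new note at a different position would go unnoticed. Without it, any change in the list of titles registers as an update.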

This is my current solution for getting Xiaohongshu blogger updates. He Tongxue was just for the demo, of course, and I'll delete that monitor afterwards. I mainly use this method to get my roommate's Xiaohongshu updates.

Generally, I use Xiaohongshu as a Chinese search engine; its content is sometimes much more useful than that of regular search engines. Today, for example, I searched for learning material there, and reading other people's exchanges and debates gave me new ideas for learning.

Xiaohongshu = Chinese search engine