Web page readable content extraction API: intelligently extracts the key elements of an article (title, author, body text, and more) from a web page.
1. Product features
- Intelligently extracts the readable content of a web page;
- Returns the HTML of the readable content;
- Accepts either the web page's HTML or its URL as input;
- Extracts multiple elements: article title, author, text direction, language, content, plain-text content (HTML tags removed, split by paragraph), article length, article excerpt, website name, and publication time;
- Parses pages in seconds and supports high concurrency;
- Data is continuously updated and maintained;
- All interfaces support HTTPS (TLS v1.0/v1.1/v1.2/v1.3);
- Fully compatible with Apple ATS (App Transport Security);
- Nationwide multi-node CDN deployment;
- Fast responses, with the API load-balanced across multiple servers;
- Interface call status and status monitoring.
2. API documentation
Interface details: https://www.gugudata.com/api/details/readability
Interface address: https://api.gugudata.com/websitetools/readability
Return format: application/json; charset=utf-8
Request method: POST
Request protocol: HTTPS
Request example: https://api.gugudata.com/websitetools/readability
Data preview: https://www.gugudata.com/preview/readability
Interface test: https://api.gugudata.com/websitetools/readability/demo
3. Request parameters
| Parameter name | Parameter type | Required | Default value | Remarks |
| :----: | :------: | :------: | :----------: | :----------: |
| appkey | string | Yes | YOUR_APPKEY | APPKEY obtained after payment |
| html | string | No | YOUR_VALUE | The HTML content of the web page to extract from. Supply either html or url |
| url | string | No | YOUR_VALUE | The URL of the web page to extract from. Supply either url or html. (If the origin site's anti-crawling measures prevent the page from being fetched, the content cannot be processed; this case is not handled) |
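A minimal sketch of how a request could be assembled from the parameters above, in Python. The source only states that the method is POST and names the parameters; sending them as a form-encoded body is an assumption here, and `build_request` is a hypothetical helper, not part of any official SDK. The request is built but not sent.

```python
import urllib.parse
import urllib.request

API_URL = "https://api.gugudata.com/websitetools/readability"


def build_request(appkey, *, html=None, url=None):
    """Build (but do not send) a POST request for the readability interface."""
    # The parameter table says to supply either html or url; this sketch
    # treats that as "exactly one of the two".
    if (html is None) == (url is None):
        raise ValueError("pass exactly one of html or url")
    fields = {"appkey": appkey}
    if html is not None:
        fields["html"] = html
    else:
        fields["url"] = url
    # Assumption: parameters travel as a form-encoded request body.
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(API_URL, data=body, method="POST")


req = build_request("YOUR_APPKEY", url="https://example.com/article")
# To actually send it: urllib.request.urlopen(req, timeout=10)
```

Validating the html/url choice locally avoids spending a call on a request that would likely come back as a parameter error.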
4. Return parameters
| Parameter name | Parameter type | Remarks |
| :--------------------------: | :------: | :----------: |
| DataStatus.RequestParameter | string | Interface request parameters |
| DataStatus.StatusCode | int | Status code returned by the interface |
| DataStatus.StatusDescription | string | Interface return status description |
| DataStatus.ResponseDateTime | string | Interface data return time |
| DataStatus.DataTotalCount | int | The total amount of data under this condition, generally used for paging calculations |
| Data.Title | string | Article title |
| Data.Byline | string | Article author |
| Data.Dir | string | Article text direction |
| Data.Lang | string | Article language |
| Data.Content | string | Article content |
| Data.TextContent | string | Article content (excluding HTML tags, split by paragraphs) |
| Data.Length | int | Article length |
| Data.Excerpt | string | Article summary |
| Data.SiteName | string | Website name |
| Data.PublishedTime | string[] | Article publication time |
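The return parameters above could be consumed roughly as follows. This sketch assumes the dotted names in the table map to two nested JSON objects, `DataStatus` and `Data`; the sample payload is invented for illustration and is not real API output, and `Article` and `parse_response` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    byline: str
    text: str
    length: int


def parse_response(raw: str) -> Article:
    """Turn a JSON response body into an Article, raising on a non-200 status."""
    payload = json.loads(raw)
    status = payload["DataStatus"]
    if status["StatusCode"] != 200:
        description = status.get("StatusDescription", "")
        raise RuntimeError(f"API error {status['StatusCode']}: {description}")
    data = payload["Data"]
    # Fields may be missing or null for some pages, so fall back to defaults.
    return Article(
        title=data.get("Title") or "",
        byline=data.get("Byline") or "",
        text=data.get("TextContent") or "",
        length=data.get("Length") or 0,
    )


# Invented sample payload, shaped after the table above.
sample = json.dumps({
    "DataStatus": {"StatusCode": 200, "StatusDescription": "Normal return"},
    "Data": {"Title": "Hello", "Byline": None, "TextContent": "Body text", "Length": 9},
})
article = parse_response(sample)
```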
5. Interface HTTP response status codes
| Status code | Status code explanation | Remarks |
| :----: | :----------: | :----------: |
| 200 | The interface responds normally | For the business status, see the interface custom status codes in section 6 |
| 403 | Request frequency exceeds limit | The CDN layer evaluates request frequency per IP address. Ordinary high-frequency requests do not trigger this status code |
6. Interface custom status codes
| Custom status code | Custom status code explanation | Remarks |
| :----------: | :---------------: | :----------: |
| 200 | Normal return | |
| 400 | Parameter error | |
| 402 | APPKEY error | Please check whether the passed APPKEY is the value obtained from the Developer Center |
| 403 | Account in arrears | Please watch for the SMS reminder sent when your order is about to expire |
| 429 | Request frequency limited | Cannot exceed 100 requests per second |
| 500 | Interface response error | |
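One way a client might act on the custom status codes above is sketched below. The code meanings come from the table; treating 429 and 500 as retryable and backing off exponentially is an assumption of this sketch, not documented behavior, and all names are hypothetical.

```python
# Meanings copied from the custom status code table.
STATUS_MEANINGS = {
    200: "Normal return",
    400: "Parameter error",
    402: "APPKEY error",
    403: "Account in arrears",
    429: "Request frequency limited",
    500: "Interface response error",
}

# Assumption: only rate limiting and server-side errors are worth retrying;
# parameter, key, and billing errors need a fix on the caller's side.
RETRYABLE = {429, 500}


def describe_status(code: int) -> str:
    """Return the documented meaning of a custom status code."""
    return STATUS_MEANINGS.get(code, "Unknown status code")


def should_retry(code: int) -> bool:
    """Decide whether repeating the same request could succeed."""
    return code in RETRYABLE


def backoff_delays(attempts: int, base: float = 0.5) -> list:
    """Exponential backoff delays in seconds: base, 2*base, 4*base, ..."""
    return [base * (2 ** i) for i in range(attempts)]
```

Since the documented limit is 100 requests per second, a short backoff after a 429 is usually enough to fall back under the limit.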
7. Sample request code by development language
Sample code is provided for C#, Go, Java, jQuery, Node.js, Objective-C, PHP, Python, Ruby, Swift, and others. Any other language can call the interface with an equivalent RESTful API request.
8. Frequently asked questions
- Q: Are responses cached?
A: All data is returned directly; some periodically updated data is cached for the duration of its update cycle.
- Q: How do I keep my key secure when making requests?
A: We recommend calling our API only from your application's back-end service, and having your application's front end send its requests to that back end. This architecture is also cleaner and easier to maintain.
- Q: Which development languages can use the interface?
A: Any development language that can make network requests, so you can quickly bring the data into your project.
- Q: Can the performance of the interface be guaranteed?
A: The interface back end uses the same architecture as the commercial projects we deliver to enterprises. You can check response performance and related information by calling the test interface.
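To illustrate the key-security answer above, here is a minimal sketch of the step a back-end service could perform before forwarding a front-end request to the API. The environment variable name `GUGUDATA_APPKEY` and the helper `upstream_fields` are hypothetical; the point is only that the key lives on the server and never comes from the client.

```python
import os


def upstream_fields(client_fields, env=os.environ):
    """Build the fields a back end forwards to the API on a client's behalf."""
    # Forward only the documented content parameters; anything else the
    # client sent, including an appkey, is dropped.
    fields = {k: v for k, v in client_fields.items() if k in ("html", "url")}
    # The key is read from server-side configuration (hypothetical variable name).
    fields["appkey"] = env["GUGUDATA_APPKEY"]
    return fields
```

For example, `upstream_fields({"url": "https://example.com/a"}, env={"GUGUDATA_APPKEY": "secret"})` yields the URL plus the server-held key, which the back end then sends to the API.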
Gugu Data is a professional data provider offering comprehensive data interfaces and business data analysis, making data the raw material of your production.
Drawing on the storage and performance optimization of hundreds of billions of records, and the related massive base-data support, that we have provided to enterprise customers over the past seven years, Gugu Data has abstracted some compliant general-purpose data and functions into product-level data APIs. These meet users' demand for basic data during product development, while reducing the storage and operations costs of massive data as well as the technical threshold and development effort of complex functions.
In addition to the data and functional interfaces already open, a massive amount of data is still being sorted, cleaned, integrated, and built. More data and cloud functional interface APIs will be opened to users in the future.
Currently open data interface APIs
- [Barcode Tool] Universal QR code generation
- [Barcode Tool] Wi-Fi wireless network QR code generation
- [Barcode Tool] Universal barcode generation
- [Image Recognition] Universal File Stream OCR to Text
- [Image Recognition] Universal OCR
- [Image Recognition] Universal Image OCR to Word
- [Image Recognition] HTML to PDF
- [Image Recognition] HTML to Word
- [Image Recognition] Markdown to PDF
- [Image Recognition] PDF parsing and formatting output
- [Area/Coordinates] Basic information on universities and colleges across the country
- [Area/Coordinates] Geographic coordinates inverse encoding
- [Area/Coordinates] IP address location
- [Area/Coordinates] National province, city, and street area information
- [Area/Coordinates] Geographic coordinate system conversion
- [Metadata/Dictionary] Provincial college entrance examination admission scores over the years
- [Metadata/Dictionary] Admission scores for college entrance examinations over the years
- [Metadata/Dictionary] Admission scores for majors in college entrance examinations over the years
- [Metadata/Dictionary] National University Major Data
- [News/Information] Software Development Technology Blog Headlines
- [News/Information] Get the text of any linked article
- [News/Information] Public account headline article
- [News/Information] Get the text image of any link
- [News/Information] Get the cover of the public account article
- [News/Information] Collection of humorous jokes
- [SMS/Voice] Mobile Phone Attribution Query
- [SMS/Voice] International mobile phone number check and correction
- [Text/Text] Chinese text segmentation
- [Text/Text] Chinese and English typesetting standardization
- [Text/Text] Millions of Chinese couplet data
- [Text/Text] International Standard Book Number ISBN
- [Text/Text] Simplified and Traditional Chinese conversion
- [Text/Text] Complete Collection of Tang Poems and Song Ci
- [Text/Text] Intelligent extraction of keyword summary
- [Text/Text] Text semantic similarity detection
- [Text/Text] NLP Chinese Intelligent Error Correction
- [Text/Text] Artificial intelligence couplet generation
- [Text/Text] NLP language detection
- [Weather/Air Quality] National Weather Forecast Information
- [Weather/Air Quality] National Real-time Air Quality Index
- [Weather/Air Quality] Sunrise and sunset times
- [Weather/Air Quality] Lunar Calendar and Twenty-Four Solar Terms
- [Website Tools] Get any site title and icon
- [Stock Quotes] US stock real-time market data
- [Stock Quotes] US Stock Historical Quotation Data
- [Stock Quotes] US stock intraday trading data
- [Stock Quotes] Basic financial data of US stocks over the years
- [Stock Quotes] Hong Kong Stock Real-time Quotation Data
- [Stock Quotes] Hong Kong Stock Historical Quotation Data
- [Stock Quotes] Hong Kong stock intraday trading data
- [Stock Quotes] Hong Kong Stock Listed Company Announcement
- [Stock Quotes] Three major financial statements of Hong Kong stocks over the years
- [Stock Quotes] A-share real-time market data
- [Stock Quotes] A-share historical market data
- [Stock Quotes] A-share intraday trading data
- [Stock Quotes] Three major financial statements of A shares over the years
- [Stock Quotes] China Stock Index Data
- [Stock Quotes] A-share stock information query
- [Stock Quotes] A-share financial indicators over the years
- [Stock Quotes] A-Share Index Component Data
- [Stock Quotes] A-Share Index Historical Data
- [Stock Quotes] A-share pre-market data
- [Stock Quotes] A-share transaction data
- [Stock Quotes] A-share trading calendar
- [Stock Quotes] Options real-time market data
- [Stock Quotes] Fund Basic Information List
- [Stock Quotes] A-share stock code
- [Stock Quotes] Index Fund Basic Information
- [Stock Quotes] Open-end fund net value real-time data
- [Stock Quotes] Open-end Fund Net Value Historical Data
- [Stock Quotes] Science and Technology Innovation Board Historical Quotation Data
- [Stock Quotes] US Stock Pink Sheets Real-time Quotation Data
- [Stock Quotes] Classified US stock real-time market data
- [Stock Quotes] Real-time data of public open-end funds
- [Stock Quotes] Historical data of public open-end funds
- [Stock Quotes] Exchange-traded fund real-time data
- [Stock Quotes] Historical data of exchange-traded funds
- [Stock Quotes] Exchange-traded fund intraday quotes
- [Stock Quotes] Open-end Fund Real-time Ranking
- [Stock Quotes] Open-ended exchange-traded funds ranking
- [Stock Quotes] A-Share Index Intraday Quotation Data
- [Stock Quotes] Open-end Fund Net Value Estimation Data
- [Stock Quotes] Hong Kong Stock Index Real-time Quotation Data
- [Stock Quotes] Hong Kong Stock Index Historical Quotation Data
- [Stock Quotes] Hong Kong Stock Basic Information Data
- [Stock Quotes] A-share capital flow ranking
- [Stock Quotes] A-share capital flow
- [Stock Quotes] A-share trading data
- [Stock Quotes] International Currency Exchange Rate
- [Sports/Competition] Olympic competition data over the years
- [Website Tools] Web page readable content extraction