Scraping/Crawling/Parsing HTML in React Native
Recently, while building an app in React Native, I came across a feature where I needed to parse a whole website to feed data into my app. My first thought was to use XPath, because I had used it in my college days, but I soon realised that it would be tedious for this task.
I started searching the internet for an easy, reliable and efficient way to implement this, but to my surprise there were few, if any, comprehensive solutions out there, especially for the level of parsing/scraping complexity I was looking at. I tried and tested several packages and approaches, and the one I finally found working efficiently for me is Cheerio. It is a JavaScript library that reimplements a subset of jQuery's core features.
Now that I had found the library, I thought I would have my work done in a few more minutes. But it wasn't so easy, at least for me, given the limited documentation and examples I could find for it.
So let's get right into the problem-solving part. We will divide it into 3 high-level parts:
- Fetching HTML content from the server.
- Extracting the data from that HTML.
- Link to a sample React Native project.
Fetching HTML content from server
The first step is to fetch the HTML that the page returns (the example URL is given after the sample code). For this purpose we will be using another library called Axios. You can also use fetch or any other library that can make an HTTP request.
Sample Code :-
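const axios = require('axios');

// Fetch a page and resolve with its HTML as a string.
const scrapeHTMLContent = async (url) => {
  // By default axios rejects on non-2xx responses, so network and
  // HTTP errors reach the caller's catch handler.
  const response = await axios.get(url);
  if (response.status !== 200) {
    throw new Error(`Unexpected status ${response.status}`);
  }
  return response.data;
};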
Here the parameter of the function is the URL from which you want to scrape the content.
This function sends a request to the URL using Axios and returns the HTML of the page. The URL which we will use for our example is:
https://www.amazon.in/?ref=gw_intl_in&pf_rd_p=4c740c5e-7140-40f3-8386-f749c0cf29ca&pf_rd_r=2HTAXTAEYHKAA1QJRFEV
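Since fetch can be used in place of Axios, here is a minimal sketch of what that could look like. The names fetchHTML and fetchImpl are my own, chosen only for illustration. The second parameter is an assumption that makes the function easy to try out with a stub instead of a real network call.

```javascript
// Sketch of a fetch-based alternative to the Axios version.
// `fetchHTML` and `fetchImpl` are hypothetical names used for illustration.
async function fetchHTML(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  // Unlike Axios, fetch does not reject on HTTP error statuses,
  // so the status has to be checked explicitly.
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  // text() resolves with the response body as a string of HTML.
  return response.text();
}
```

In the app you would call fetchHTML(url) without the second argument, so that the global fetch is used.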
Extracting the data from HTML
This is the second and probably the most important step in the whole process. We will identify a few key fields in the page at the above link and try to get their values.
Note: While working with Cheerio, you will see terms like nodes and attributes. Each HTML tag is represented as a node, and its attributes are the metadata attached to that node.
The first piece of data we will fetch is the title, and the next is the footer tag inside a div tag. Both are present in the response from the link we are using for the demo. A quick way to check this is to open the above link in a browser, right-click and view the page source. Here is the code which reads the data from the HTML.
Sample Code :-
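// Assumes a Cheerio build that works in your React Native setup.
const cheerio = require('cheerio');

const readData = (data) => {
  // Parse the HTML string into a queryable document.
  const $ = cheerio.load(data);
  // .text() returns the text content rather than the Cheerio object.
  const mainContent = $('title').text();
  console.log('Main content is ', mainContent);
  // 'div footer' matches footer elements nested inside a div.
  const footer = $('div footer').text();
  console.log('Footer content is ', footer);
};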
This is a very simple code snippet and each line is fairly self-explanatory. First we load the whole HTML which we got from the previous step. Next we look up the tags: first the 'title' tag, and then a 'footer' tag which is nested inside a 'div'.
Cheerio has several other features; please refer to its documentation for further details.
Link to sample react-native project:-
As mentioned at the beginning of this article, here is the link to the sample React Native project.
This project has been tested on iOS, but it should work fine on Android as well.
If you have any questions or comments, please feel free to reach out to me or post them in the comments section.
Stay tuned for more interesting articles. Please send your suggestions, or any topic you would like me to write about, to abhinandansahgal@gmail.com.