Web scraping by inspecting network traffic
How to scrape data from South African E-Tender portal
Last updated
As discussed in previous lessons, tender notices and supporting documents issued by public bodies should be added to the central E-Tender portal run by national government. The data uploaded here is hard to work with for two reasons. First, it is incomplete: many important pieces of information are stored in attached documents rather than in the site database. Second, you cannot easily download it for analysis.
In this topic, we will show you how to address the second of those challenges by scraping data from the website using its own search query URLs.
There's some terminology to understand.
Application Programming Interface (API): An API lets two software applications talk to each other. In this case, the API connects your web browser to the E-Tenders database. Search queries are passed to the database in the query string, which is the part of the URL that follows the question mark.
JavaScript Object Notation (JSON): In response to an API query, the database sends data to your browser in a format known as JSON. Like a CSV file, JSON stores records as plain text. A CSV file, however, is a flat grid of rows and columns, while JSON can nest lists and named fields inside one another, which makes it more flexible.
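To see which part of a request is the query, here is a small sketch using Python's standard `urllib.parse` module on one of the E-Tender request URLs recorded at the time of writing. It only splits the text of the URL; it never contacts the site.

```python
from urllib.parse import urlparse, parse_qs

# One of the E-Tender request URLs recorded at the time of writing.
url = "https://www.etenders.gov.za/Home/TenderOpportunities/?status=1&_=1654507040789"

parts = urlparse(url)
print(parts.path)   # the endpoint being queried
print(parts.query)  # everything after the question mark

# parse_qs turns the query string into a dict of lists,
# because a parameter name may legally appear more than once.
params = parse_qs(parts.query)
print(params["status"])  # which tender category is being requested
```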
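The difference between the two formats is easiest to see side by side. The sketch below uses Python's built-in `json` module on a small, made-up record. The field names (`tenderNumber`, `department`, `contact`) and values are hypothetical illustrations, not the portal's actual schema.

```python
import json

# A made-up record in JSON. The nested "contact" object has no
# direct equivalent in a single flat CSV row.
text = """
[
  {"tenderNumber": "EXAMPLE-001",
   "department": "Example Department",
   "contact": {"name": "A. Person", "email": "a.person@example.org"}}
]
"""

records = json.loads(text)        # JSON array -> Python list of dicts
first = records[0]
print(first["department"])        # a top-level field
print(first["contact"]["email"])  # a nested field, reached in two steps

# To fit this record into one CSV row, the nested fields must be
# flattened into separate columns such as "contact.name".
flat = {
    "tenderNumber": first["tenderNumber"],
    "department": first["department"],
    "contact.name": first["contact"]["name"],
    "contact.email": first["contact"]["email"],
}
# A naive comma join is safe here only because no value contains a comma;
# real conversions should use the csv module, which handles quoting.
print(",".join(flat.keys()))
print(",".join(flat.values()))
```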
Open the E-Tender portal at https://www.etenders.gov.za/ and click Browse Opportunities.
Notice that there are four categories to choose from: Currently Advertised, Awarded, Closed and Cancelled tenders.
What if you wanted to create a list of all currently advertised tenders which you can import to your spreadsheet software to analyse?
If you click Currently Advertised, a table of tender information should appear in the middle of the page. This table is populated by an API call, and we can find the exact request URL with the browser's developer tools: press F12 to open the Inspect panel.
You can also hover your mouse over the table, right-click and select Inspect.
In the Inspect window, click on the Network tab. Your screen should look something like this.
You may need to reload the page at this point.
Now click the Fetch/XHR button so that only data requests are listed. You should see the API request the page sends to collect the tender data.
Double-click that request under Name. Its name should start with ?status=1&_= and end with a long number (this appears to be a timestamp added to stop the browser caching the response, rather than a record count). Double-clicking opens the URL that returns the data in a new browser tab, where you should see something like this.
This is the JSON output sent in response to the API request. Here are the API request URLs for each category at the time of writing.
Currently advertised
https://www.etenders.gov.za/Home/TenderOpportunities/?status=1&_=1654507040789
Awarded
https://www.etenders.gov.za/Home/TenderOpportunities/?status=2&_=1654507040789
Closed
https://www.etenders.gov.za/Home/TenderOpportunities/?status=3&_=1654507040789
Cancelled
https://www.etenders.gov.za/Home/TenderOpportunities/?status=4&_=1654507040789
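The long number after `_=` in these URLs does not behave like a record count. Read as milliseconds since 1 January 1970 (UTC), which is what JavaScript's `Date.now()` returns, it converts to a date in June 2022, matching "at the time of writing". That suggests a cache-busting value of the kind some JavaScript libraries (jQuery, for example) append to requests. The sketch below checks the conversion and builds a URL with a fresh value. The helper name `request_url` is our own, and the assumption that the server accepts any `_` value is untested.

```python
import time
from datetime import datetime, timezone

# The value of the "_" parameter in the example URLs above.
underscore = 1654507040789

# Interpret it as milliseconds since the Unix epoch (UTC).
when = datetime.fromtimestamp(underscore / 1000, tz=timezone.utc)
print(when.isoformat())


def request_url(status: int) -> str:
    """Hypothetical helper: build a request URL with a fresh timestamp.

    Assumes the endpoint keeps the shape seen in the browser and that
    "_" is only a cache-buster, which is our inference, not documentation.
    """
    now_ms = int(time.time() * 1000)
    return (
        "https://www.etenders.gov.za/Home/TenderOpportunities/"
        f"?status={status}&_={now_ms}"
    )


print(request_url(1))
```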
Once you have the JSON data in your browser, you can save it to your local machine: right-click, choose Save in the menu and save the file as “your_name.json”.
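If you download these files regularly, the fetch-and-save step can be scripted instead of done by hand. This is a minimal sketch using only Python's standard library. It assumes the endpoints listed above still answer a plain GET request with JSON, which may change at any time. The output file names and the browser-like `User-Agent` header are our own choices (some servers reject Python's default agent; whether this one does is untested). The raw response is saved untouched, so the script does not depend on how the records are structured inside.

```python
import json
import time
import urllib.request

BASE = "https://www.etenders.gov.za/Home/TenderOpportunities/"

# Status codes as they appear in the URLs above; file names are our choice.
CATEGORIES = {
    1: "currently_advertised",
    2: "awarded",
    3: "closed",
    4: "cancelled",
}


def fetch(status: int, timeout: float = 60.0):
    """Download one category and return the parsed JSON."""
    # "_" mimics the millisecond timestamp the browser sends (assumed optional).
    url = f"{BASE}?status={status}&_={int(time.time() * 1000)}"
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)  # json.load accepts the binary response body


def save_all() -> None:
    """Fetch every category and write each to its own .json file."""
    for status, name in CATEGORIES.items():
        data = fetch(status)
        with open(f"{name}.json", "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False)
        time.sleep(1)  # short pause between requests to go easy on the server
```

Calling `save_all()` writes `currently_advertised.json`, `awarded.json`, `closed.json` and `cancelled.json` to the current folder.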
Most spreadsheet software can't open JSON files directly, so next you'll need to convert this data to a CSV file. Our favourite tool for this conversion is https://csvjson.com/.
Click JSON to CSV, upload the file you just saved, then click Convert.
After the conversion has finished, click Download to save the CSV file to your desktop.
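csvjson.com is the quickest route, but the conversion can also be scripted. Below is a sketch using Python's `json` and `csv` modules. It assumes the saved file holds either a list of records or an object that wraps the list under a `data` key; that key name is an assumption, so check your own file. Nested objects are flattened into dotted column names, and the columns are the union of all keys seen, since not every record need have every field. List values are written as their Python text form.

```python
import csv
import json


def flatten(record: dict, prefix: str = "") -> dict:
    """Turn {"a": {"b": 1}} into {"a.b": 1} so it fits in one CSV row."""
    out = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, prefix=f"{name}."))
        else:
            out[name] = value
    return out


def json_to_csv(json_path: str, csv_path: str) -> int:
    """Convert a saved JSON file to CSV; returns the number of rows written."""
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    # Assumption: records are the top-level list or are listed under "data".
    if isinstance(data, dict):
        data = data.get("data", [])
    rows = [flatten(r) for r in data]

    # Every column name seen in any row, in first-seen order.
    columns = list(dict.fromkeys(key for row in rows for key in row))

    # newline="" is what the csv module expects; "utf-8-sig" adds a
    # byte-order mark that helps Excel detect the encoding.
    with open(csv_path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)  # missing fields become empty cells
    return len(rows)
```

For example, `json_to_csv("your_name.json", "currently_advertised.csv")` produces a file you can import into a spreadsheet in the same way as the csvjson.com output.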
Importing a CSV into Google Sheets is easy. Create a new spreadsheet and call it Currently Advertised. Then, under the File menu, choose Import, select your CSV file, choose Replace current sheet and finally click Import data.
Now you have a spreadsheet with the details of all currently open tenders on the E-Tender portal, including department name, contact details and a brief description. You can do the same for closed, cancelled and awarded tenders too.
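Once the data is in CSV form, you can also start analysing it in a script. As an illustration, this sketch counts tenders per department with Python's `csv` module and `collections.Counter`. The function name is ours and the column name `department` is a placeholder: replace it with whatever the department column is actually called in your file.

```python
import csv
from collections import Counter


def tenders_per_department(csv_path: str, column: str = "department") -> Counter:
    """Count rows per value of `column` (a hypothetical column name)."""
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"no column named {column!r}; found {reader.fieldnames}")
        # Short rows yield None for missing fields, so fall back to "".
        return Counter((row[column] or "").strip() for row in reader)


# Example: print the ten departments with the most tenders.
# for dept, n in tenders_per_department("currently_advertised.csv").most_common(10):
#     print(f"{n:5d}  {dept}")
```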