Postman Scraping Tutorial with Ambee API

By: Melvin He (2023)

To web-scrape real-time environmental data from the Ambee API using Postman, follow these steps:

Step 1: Sign up for an account on Ambee (https://www.ambeedata.com/signup) and get an API key.

Step 2: Choose the dataset you want (air pollution, weather, fire, NDVI, etc.).

Step 3: Choose the endpoint you want to scrape data from.

You can choose endpoints that query by countryCode, postalCode, latitude & longitude, etc.

Step 4: Construct the API request URL by appending the endpoint path to the Ambee API base URL: https://api.ambeedata.com/ + endpoint path.

Ex) https://api.ambeedata.com/latest/by-lat-lng?lat=12&lng=77
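
If you'd rather build this URL in code than by hand, here is a minimal sketch using Python's requests library (the coordinates are just the example values from above):

import requests

BASE_URL = "https://api.ambeedata.com"
endpoint = "/latest/by-lat-lng"

# requests encodes the params dict into the "?lat=12&lng=77" query string for us
prepared = requests.Request("GET", BASE_URL + endpoint, params={"lat": 12, "lng": 77}).prepare()
print(prepared.url)  # https://api.ambeedata.com/latest/by-lat-lng?lat=12&lng=77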

Step 5: Download Postman and set the necessary headers in your request, including your API key (x-api-key), the content type (usually application/json), and any other parameters required by the endpoint you're using.

Step 6: Send a GET request to the API endpoint URL using your preferred programming language or tool. In Python, you can use libraries like requests, urllib, or Scrapy to do this.

You may want to save the code snippet Postman generates and copy it into an IDE (see Additional Steps below).
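
For reference, the Python snippet Postman generates for this request looks roughly like the sketch below (the exact output varies by Postman version); replace the x-api-key value with your own key from Step 1:

import requests

url = "https://api.ambeedata.com/latest/by-lat-lng?lat=12&lng=77"

payload = {}
headers = {
    'x-api-key': 'REPLACE WITH YOUR API KEY',
    'Content-type': 'application/json'
}

response = requests.request("GET", url, headers=headers, data=payload)
print(response.status_code)
print(response.text)  # raw JSON body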

Step 7: Parse the response to extract the data you need. The response will typically be in JSON format, so you can use built-in functions or libraries like json in Python to parse it.
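
As a small self-contained example of that parsing step — the stations / PM25 / AQI field names below match what the full script later in this tutorial assumes, but you should inspect the actual response for your endpoint:

import json
import requests

url = "https://api.ambeedata.com/latest/by-lat-lng?lat=12&lng=77"
headers = {'x-api-key': 'REPLACE WITH YOUR API KEY'}

data = json.loads(requests.get(url, headers=headers).text)  # or response.json()

# Readings are nested under a "stations" list; take the first station's values
station = data["stations"][0]
print(station["PM25"], station["AQI"])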

Step 8: Save the extracted data to a file or database or use it for further analysis.
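
Here is one minimal way to do that with Python's built-in csv module — the file name and the sample station values are placeholders, not real data:

import csv
import os

# "station" stands in for the dict parsed in Step 7 (placeholder values here)
station = {"CO": 0.5, "NO2": 10, "OZONE": 30, "PM10": 20, "PM25": 12, "SO2": 2, "AQI": 50}

fields = ["CO", "NO2", "OZONE", "PM10", "PM25", "SO2", "AQI"]
write_header = not os.path.exists("pollution_results.csv")  # hypothetical file name

# Append one row of readings, writing the header row only on first use
with open("pollution_results.csv", "a", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fields)
    if write_header:
        writer.writeheader()
    writer.writerow(station)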

Additional Steps:

Step 1: Consider the following CSV file, which I compiled into a public Google Sheet:

Step 2: Update the header row with the desired extracted data.

Header Row:
City,City (ASCII),Latitude,Longitude,Country,ISO3,Population,CO,NO2,OZONE,PM10,PM25,SO2,AQI,Continent,Region,Administrative Division,Capitol of Administrative Level

Desired extracted data (in this example):

CO,NO2,OZONE,PM10,PM25,SO2

Step 3: Copy it & download it as a CSV

You should typically avoid spaces in file names, but here is an example in case you do have them:

ex) 10000 Air Pollution Data.csv

Step 4: Feel Free to Grep the CSV to only include Regions of Your Choice

The command below searches the file "10000 Air Pollution Data.csv" for lines containing the string "USA" (the backslashes escape the spaces in the file name). The > symbol redirects grep's output, so every matching line is written to a new file called "filtered_usa.csv".

This is useful when you want to keep only the rows for a particular region from a large dataset.

grep 'USA' 10000\ Air\ Pollution\ Data.csv > filtered_usa.csv

Remember to add your header row back if you've grepped, since grep only keeps the matching data lines.
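
One way to do both at once (a sketch, assuming the same file names as above): write the header line first with head, then append the grep matches:

head -n 1 10000\ Air\ Pollution\ Data.csv > filtered_usa.csv
grep 'USA' 10000\ Air\ Pollution\ Data.csv >> filtered_usa.csv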

Step 5: Download and Open an IDE such as VSCode

Step 6: Create a Python script (try out something similar to the following):

import pandas as pd
import json
import requests

payload = {}
headers = {
    'x-api-key': 'REPLACE WITH YOUR API KEY'
}

# Columns to fill with the scraped readings
POLLUTANTS = ["CO", "NO2", "OZONE", "PM10", "PM25", "SO2", "AQI"]

df = pd.read_csv("/Users/<INSERT USER>/Desktop/CSCI 1951T/Air pollution CSV/filtered_usa.csv", header=0)
print(df.head(1))

indx = 0
for i in df.index:
    # Runs at most 1000 API calls
    if indx > 1000:
        break
    lat = df.at[i, "Latitude"]
    long = df.at[i, "Longitude"]
    url = "https://api.ambeedata.com/latest/by-lat-lng?lat=" + str(lat) + "&lng=" + str(long)
    response = requests.request("GET", url, headers=headers, data=payload)
    try:
        response = json.loads(response.text)
    except ValueError:
        # Body wasn't valid JSON; mark this row NULL and move on
        indx += 1
        print("Made it here (NULL):" + str(indx))
        for col in POLLUTANTS:
            df.at[i, col] = "NULL"
        continue
    if response.get("message") == "Data not available!":
        # Ambee has no station data for these coordinates; mark this row NULL
        indx += 1
        print("Made it here (NULL):" + str(indx))
        for col in POLLUTANTS:
            df.at[i, col] = "NULL"
        continue
    # Copy the first station's readings into the matching CSV columns
    station = response["stations"][0]
    for col in POLLUTANTS:
        df.at[i, col] = station[col]
    print("Made it here:" + str(indx))
    indx += 1

df.to_csv("USA_pollution.csv")

Step 7: Feel Free to Customize Parameters to Match the Type of Data You're Scraping
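
For example, scraping a different Ambee dataset reuses the same request pattern from the Step 6 script — only the endpoint path and the fields you extract change. The path below is a placeholder, not a verified endpoint; check Ambee's API documentation for the exact paths and response fields of the dataset you chose:

# Continues from the Step 6 script (reuses headers, payload, lat, long)
# ENDPOINT is an assumption for illustration; consult Ambee's docs for real paths
ENDPOINT = "/weather/latest/by-lat-lng"

url = "https://api.ambeedata.com" + ENDPOINT + "?lat=" + str(lat) + "&lng=" + str(long)
response = requests.request("GET", url, headers=headers, data=payload)
data = json.loads(response.text)
print(data)  # inspect the response to decide which fields to extract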