Getting TAP (Table Access Protocol) data into Pandas DataFrame format

There is very little documentation on this, which makes it a perfect topic for a post (especially for people who are new to working with astronomical data). TAP is a standard defined by the International Virtual Observatory Alliance (IVOA), and PyVO is a Python library that interfaces with sites that use this protocol, such as the NASA Exoplanet Archive.

Getting Started

Create a virtual environment if you have not already and activate it for your shell (bash is used here):

python -m venv .env
source .env/bin/activate

Install PyVO and Pandas libraries

pip3 install pyvo pandas -Uq

Create a new Python file, tap.py, and import both pandas and pyvo.

Code

Have the documentation for your data source ready. You need to find the TAP query URL for your data source and know which table from your data source you want to use. My examples will be using the NASA Exoplanet Archive (they have great docs!) but adapt this to your needs.

Create a new TAPService with the TAP query URL

service = pyvo.dal.TAPService("Insert TAP URL Here")

Select the data you want (specifically the table and columns)

results = service.search("SELECT * from table_name")

This query is written in ADQL (Astronomical Data Query Language), a variant of SQL for astronomical data. You can replace the asterisk * with a comma-separated list of the column names you want, or leave it there to get all the columns in your table.
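To make the column selection concrete, here is a minimal sketch of a helper that assembles an ADQL query string. The helper name `build_adql` is hypothetical (it is not part of PyVO), and the table `pscomppars` and the column names `pl_name`, `pl_rade`, and `pl_bmasse` are assumed from the NASA Exoplanet Archive; check your own data source's documentation for the names it actually exposes.

```python
def build_adql(table, columns=None, top=None):
    """Build a simple ADQL SELECT statement as a string.

    Hypothetical helper for illustration; it does no escaping or validation.
    """
    # Fall back to "*" (all columns) when no column list is given
    column_list = ", ".join(columns) if columns else "*"
    # ADQL uses TOP n to limit the number of rows returned
    top_clause = f"TOP {top} " if top is not None else ""
    return f"SELECT {top_clause}{column_list} FROM {table}"


# Assumed table and column names from the NASA Exoplanet Archive
query = build_adql("pscomppars", columns=["pl_name", "pl_rade", "pl_bmasse"], top=10)
print(query)
```

The resulting string can be passed to service.search(query) in place of the hard-coded query. Limiting rows with TOP is handy while you are still exploring a large table.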

Convert it into a Pandas DataFrame

table = results.to_table()
df = table.to_pandas(index=True)

Full example for a table from the Exoplanet Archive

import pyvo
import pandas as pd
service = pyvo.dal.TAPService("https://exoplanetarchive.ipac.caltech.edu/TAP")
results = service.search("SELECT * FROM pscomppars")
table = results.to_table()
df = table.to_pandas(index=True)
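If you end up running several queries, the same three steps can be wrapped in a small function. This is only a sketch: the name `query_to_dataframe` is my own, and it assumes the `service` argument is a pyvo.dal.TAPService (or anything with a compatible `search` method). The `maxrec` argument is passed straight through to PyVO's search call, where it caps the number of rows requested.

```python
def query_to_dataframe(service, query, maxrec=None):
    """Run an ADQL query on a TAP service and return a pandas DataFrame.

    Hypothetical convenience wrapper around the search -> to_table -> to_pandas
    chain shown above.
    """
    # search() runs the query synchronously and returns the TAP results
    results = service.search(query, maxrec=maxrec)
    # Convert to an astropy Table first, then to a pandas DataFrame
    return results.to_table().to_pandas()


# Example usage (needs network access, so it is left commented out):
# import pyvo
# service = pyvo.dal.TAPService("https://exoplanetarchive.ipac.caltech.edu/TAP")
# df = query_to_dataframe(service, "SELECT * FROM pscomppars", maxrec=100)
# print(df.shape)
```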

If you run this repeatedly (say, in a Jupyter Notebook), I would add a very simple cache like the following. It has the added benefit that the cached CSV file can be opened in other programs such as Excel, Google Sheets, and LibreOffice Calc.

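import os

import pandas as pd
import pyvo

CACHE_PATH = "./data/planets.csv"

try:
    # Reuse the cached copy if it exists
    df = pd.read_csv(CACHE_PATH)
except FileNotFoundError:
    service = pyvo.dal.TAPService("https://exoplanetarchive.ipac.caltech.edu/TAP")
    results = service.search("SELECT * FROM pscomppars")
    df = results.to_table().to_pandas(index=True)
    # Make sure the data directory exists before writing the cache
    os.makedirs(os.path.dirname(CACHE_PATH), exist_ok=True)
    # index=False keeps the row index from coming back as an extra unnamed column
    df.to_csv(CACHE_PATH, index=False)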
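The caching pattern above can be generalized so it works for any query. The following is a sketch, not a polished implementation: `load_cached` and `fetch_planets` are names I made up, and the cache never expires, so you have to delete the CSV file yourself to pick up new data. It writes the CSV with index=False so that the row index does not reappear as an extra unnamed column when the file is read back.

```python
import os
from typing import Callable

import pandas as pd


def load_cached(path: str, fetch: Callable[[], pd.DataFrame]) -> pd.DataFrame:
    """Return the DataFrame cached at `path`, calling `fetch` only on a cache miss."""
    if os.path.exists(path):
        return pd.read_csv(path)
    df = fetch()
    # Create the parent directory if the path has one and it does not exist yet
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    df.to_csv(path, index=False)
    return df


def fetch_planets() -> pd.DataFrame:
    """Download the pscomppars table from the NASA Exoplanet Archive."""
    import pyvo  # imported here so a cache hit does not require PyVO

    service = pyvo.dal.TAPService("https://exoplanetarchive.ipac.caltech.edu/TAP")
    return service.search("SELECT * FROM pscomppars").to_table().to_pandas()
```

With these two functions, df = load_cached("./data/planets.csv", fetch_planets) only contacts the archive the first time it runs.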

For those who are curious about what is going on in the background, here is my current understanding: I am pretty sure that PyVO just uses the requests library to fetch the data in VOTable format, which the astropy library then reads and converts into a pandas DataFrame.
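To give a feel for what such a request might look like, here is a sketch that builds the URL for a synchronous TAP query by hand. It assumes the conventions of the IVOA TAP standard (a sync endpoint under the base URL, with REQUEST, LANG, and QUERY parameters). I have not confirmed that PyVO sends exactly this request, so treat it as an illustration of the protocol rather than a description of PyVO's internals. The function name `build_sync_url` is hypothetical.

```python
from urllib.parse import urlencode


def build_sync_url(base_url: str, query: str) -> str:
    """Build the URL for a synchronous TAP query (illustrative only)."""
    # Parameter names follow the IVOA TAP standard for synchronous queries
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": query}
    return f"{base_url.rstrip('/')}/sync?{urlencode(params)}"


url = build_sync_url(
    "https://exoplanetarchive.ipac.caltech.edu/TAP",
    "SELECT TOP 5 pl_name FROM pscomppars",  # assumed table and column names
)
print(url)
```

The function only assembles the URL and does not send anything over the network.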

Thank you for reading.
