openFDA is designed primarily for real-time queries. Using combinations of the skip/limit parameters, you can page through a result set that has up to 26,000 hits.
This limit is in place to protect openFDA infrastructure and is sufficient in most cases;
however, sometimes it is desirable to navigate through a result set that exceeds 26,000
search matches.
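As a rough illustration of skip/limit paging, the sketch below prints one URL per page. It assumes limit is capped at 1000 and skip at 25000, which is where the 26,000-hit ceiling would come from. The function name page_urls and the way the base URL is passed in are hypothetical.

```shell
#!/usr/bin/env bash
# Print one URL per page for skip/limit paging.
# Assumption: limit <= 1000 and skip <= 25000, so at most 26,000 hits are reachable.
# page_urls is a hypothetical helper name, not part of openFDA.
page_urls() {
  local base="$1" limit="${2:-1000}" max_skip="${3:-25000}" skip
  for ((skip = 0; skip <= max_skip; skip += limit)); do
    # The base URL is passed as an argument, so its %-escapes are not treated as format codes.
    printf '%s&limit=%d&skip=%d\n' "$base" "$limit" "$skip"
  done
}
```

For example, `page_urls 'https://api.fda.gov/drug/event.json?search=receivedate:20040101'` would list the page URLs, which could then be fetched one at a time with curl -sg.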
If you are unable to narrow your search criteria to decrease the number of hits, consider the following strategies
to obtain data from large result sets:
Download the entire dataset in JSON format and use tools such as jq to extract the results you need or write custom code to do so.
Use the “Search After” feature, which lets you scroll through a result set of any size, up to the size of the dataset itself. The basic steps are as follows:
Execute your initial query that produces a large number of matches. Make sure not to include the skip parameter, because skip and search_after do not work together (technical explanation is here).
The initial query will return your first page of data. For example:
https://api.fda.gov/drug/event.json?search=patient.drug.openfda.product_type.exact:%22HUMAN%20PRESCRIPTION%20DRUG%22&limit=100&sort=receivedate:asc
Extract the Link HTTP header contained in the response. Detailed information about the purpose and structure of the Link header can be found here.
In short, this header will contain a rel="next" URL representing the query you need to use to obtain the next page of data. A missing header indicates you are already on the last page. An example of extracting the header value using curl:
bash-3.2$ curl -sIg 'https://api.fda.gov/drug/event.json?search=patient.drug.openfda.product_type.exact:%22HUMAN%20PRESCRIPTION%20DRUG%22&limit=100&sort=receivedate:asc' | grep "Link: "
Link: <https://api.fda.gov/drug/event.json?search=patient.drug.openfda.product_type.exact%3A%22HUMAN%20PRESCRIPTION%20DRUG%22&limit=100&sort=receivedate%3Aasc&skip=0&search_after=0%3D1068076800000%3B1%3Dsafetyreport%25234022687>; rel="next"
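Use the URL from the Link header to request the next page of data. Note that this URL carries the search_after query parameter.
Repeat until the Link header is no longer present in the response, which indicates you are on the last page.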
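The steps above can be sketched as a small shell loop. This is a minimal sketch, not an official client: the function names next_link and fetch_all_pages and the page-N.json output files are hypothetical, and the header match accepts both "Link:" and lowercase "link:" on the assumption that some responses (for example over HTTP/2) use lowercase header names.

```shell
#!/usr/bin/env bash
# Extract the rel="next" URL from HTTP response headers read on stdin.
# Prints nothing when no such Link header is present (last page).
next_link() {
  tr -d '\r' | sed -n 's/^[Ll]ink: *<\([^>]*\)>; *rel="next".*$/\1/p' | head -n 1
}

# Fetch every page, following Link headers until none is returned.
# Hypothetical helper: writes pages to page-1.json, page-2.json, ...
fetch_all_pages() {
  local url="$1" n=0 headers
  headers="$(mktemp)"
  while [ -n "$url" ]; do
    n=$((n + 1))
    # -s: silent, -f: fail on HTTP errors, -g: no URL globbing,
    # -D: write response headers to a file, -o: write the body to a file.
    curl -sfg -D "$headers" -o "page-$n.json" "$url" || break
    url="$(next_link < "$headers")"
  done
  rm -f "$headers"
}
```

Calling fetch_all_pages with the initial query URL (without a skip parameter) would download pages until a response arrives with no Link header. A short pause between requests may be worth adding if rate limits are a concern.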