Base modules


class entrezpy.base.query.EutilsQuery(eutil, tool, email, apikey=None, apikey_var=None, threads=None, qid=None)

EutilsQuery implements the base class for all entrezpy queries to E-Utils. It handles the information required by every query, e.g. base query URL, email address, allowed requests per second, apikey, etc. It declares the virtual method inquire(), which every query has to implement since requests differ among queries.

An NCBI API key is set as follows:

  • passed as argument during initialization
  • check environment variable passed as argument
  • check environment variable NCBI_API_KEY
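
The lookup above can be sketched as follows. This is a minimal illustration, assuming the list is a priority order from top to bottom; resolve_apikey and its environ argument are hypothetical names and not part of entrezpy.

```python
import os


def resolve_apikey(apikey=None, env_var=None, environ=None):
    """Return the first API key found, or None (hypothetical helper)."""
    # Allow a custom mapping so the lookup can be exercised without
    # touching the real process environment.
    environ = os.environ if environ is None else environ
    # 1. A key passed explicitly as argument is used as is.
    if apikey:
        return apikey
    # 2. Otherwise check the environment variable named by the caller.
    if env_var and environ.get(env_var):
        return environ[env_var]
    # 3. Finally check the NCBI_API_KEY environment variable.
    return environ.get("NCBI_API_KEY") or None
```

Passing the environment as a mapping keeps the sketch easy to test; the real method may behave differently when a named variable is missing.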

Upon initialization, the following steps are performed:

  • set unique query id
  • check for / set NCBI apikey
  • initialize entrezpy.requester.requester.Requester with allowed requests per second
  • assemble Eutil URL for the desired EUtils function
  • initialize Multithreading queue and register query at entrezpy.base.monitor.QueryMonitor for logging

Multithreading is handled using the nested classes entrezpy.base.query.EutilsQuery.RequestPool and entrezpy.base.query.EutilsQuery.ThreadedRequester.
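
A pool of worker threads fed by a queue could look roughly like the sketch below. It is not the entrezpy implementation: the class is a hypothetical stand-in, the send callable and the drain() method are assumptions, and the analyzer is reduced to a plain callable.

```python
import queue
import threading


class SketchRequestPool:
    """Hypothetical stand-in for a request pool with worker threads."""

    def __init__(self, num_threads, send):
        self.requests = queue.Queue()
        self.failed_requests = []
        self.lock = threading.Lock()
        self.send = send  # assumed callable that performs one request
        for _ in range(num_threads):
            # Daemon threads do not block interpreter shutdown.
            threading.Thread(target=self._work, daemon=True).start()

    def add_request(self, request, analyzer):
        """Queue one request together with its analyzer."""
        self.requests.put((request, analyzer))

    def drain(self):
        """Block until every queued request has been processed."""
        self.requests.join()

    def _work(self):
        while True:
            request, analyzer = self.requests.get()
            try:
                analyzer(self.send(request))
            except Exception:
                # Keep failed requests for later inspection.
                with self.lock:
                    self.failed_requests.append(request)
            finally:
                self.requests.task_done()
```

Failed requests are collected under a lock so that several workers can report failures at the same time.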

Inits EutilsQuery instance with eutil, tool, email, apikey, apikey_var, threads and qid.

  • eutil (str) – name of eutil function on EUtils server
  • tool (str) – tool name
  • email (str) – user email
  • apikey (str) – NCBI apikey
  • apikey_var (str) – environment variable storing NCBI apikey
  • threads (int) – set threads for multithreading
  • qid (str) – unique query id
  • id – unique query id
  • base_url – base URL for all Eutil requests
  • requests_per_sec (int) – default limit of requests/sec (set by NCBI)
  • max_requests_per_sec (int) – max. requests/sec with apikey (set by NCBI)
  • url (str) – full URL for Eutil function
  • contact (str) – user email (required by NCBI)
  • tool (str) – tool name (required by NCBI)
  • apikey (str) – NCBI apikey
  • num_threads (int) – number of threads to use
  • failed_requests (list) – store failed requests for analysis if desired
  • request_pool – entrezpy.base.query.EutilsQuery.RequestPool instance
  • request_counter (int) – requests counter for a EutilsQuery instance
base_url = ''

Base URL for all Eutil requests

inquire(parameter, analyzer)

Virtual function starting query. Each query requires its own implementation.

  • parameter (dict) – E-Utilities parameters
  • analyzer (entrezpy.base.analyzer.EutilsAnalyzer) – query response analyzer


Return type:



Virtual function testing and handling failed requests. These requests fail due to HTTP/URL issues and are stored in entrezpy.base.query.EutilsQuery.failed_requests

check_ncbi_apikey(apikey=None, env_var=None)

Checks and sets NCBI apikey.

  • apikey (str) – NCBI apikey
  • env_var (str) – environment variable storing NCBI apikey

Prepares request for sending to E-Utilities with the required query attributes.

Parameters:request (entrezpy.base.request.EutilsRequest) – entrezpy request instance
Returns:request instance with EUtils parameters
Return type:entrezpy.base.request.EutilsRequest
add_request(request, analyzer)

Adds one request and corresponding analyzer to the request pool.


Starts query monitoring

Parameters:query_parameters (entrezpy.base.parameter.EutilsParameter) – query parameters

Stops query monitoring


Updates query monitoring parameters if follow up requests are required.

Parameters:updated_query_parameters (entrezpy.base.parameter.EutilsParameter) – updated query parameters

Reports if at least one request failed.


Dump all attributes


Tests for request errors



class entrezpy.base.parameter.EutilsParameter(parameter=None)

EutilsParameter sets and checks parameters for each query. EutilsParameter is populated from a dictionary with valid E-Utilities parameters for the corresponding query. It declares virtual functions where necessary.

Simple helper functions are presented to test the common parameters db, WebEnv, query_key and usehistory.


usehistory is the parameter used for Entrez history queries and is set to True (use it) by default. It can be set to False to omit history server use.

haveExpectedRequests() tests if the number of requests has been calculated.

The virtual methods check() and dump() need their own implementation since they can vary between queries.


check() is expected to run after all parameters have been set.


parameter (dict) – Eutils query parameters

  • db (str) – Entrez database name
  • webenv (str) – WebEnv
  • querykey (int) – querykey
  • expected_request (int) – number of expected requests for the query
  • doseq (bool) – use id= parameter for each uid in POST

Check for required db parameter

Return type:bool

Check for required WebEnv parameter

Return type:bool

Check for required QueryKey parameter

Return type:bool

Check if history server should be used.

Return type:bool

Check for expected requests. Hints at an error if no requests are expected.

Return type:bool

Virtual function to run a check before starting the query. This is a crucial step and should abort upon failing.

Raises:NotImplementedError – if not implemented

Dump instance attributes

Return type:dict
Raises:NotImplementedError – if not implemented
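
A query-specific parameter class might be derived as sketched below. BaseParameter is a hypothetical stand-in used to keep the example self-contained (it is not the entrezpy class), and SearchParameter with its term attribute is an illustrative assumption.

```python
class BaseParameter:
    """Hypothetical stand-in for a parameter base class."""

    def __init__(self, parameter=None):
        parameter = parameter or {}
        self.db = parameter.get("db")
        self.webenv = parameter.get("WebEnv")
        self.querykey = parameter.get("query_key")
        # History server use is on unless explicitly disabled.
        self.usehistory = parameter.get("usehistory", True)

    def check(self):
        raise NotImplementedError("check() needs its own implementation")

    def dump(self):
        raise NotImplementedError("dump() needs its own implementation")


class SearchParameter(BaseParameter):
    """Hypothetical parameters for a search-like query."""

    def __init__(self, parameter=None):
        super().__init__(parameter)
        self.term = (parameter or {}).get("term")

    def check(self):
        # Run after all parameters are set; abort if the query cannot work.
        if self.db is None or self.term is None:
            raise ValueError("db and term are required")

    def dump(self):
        return {"db": self.db, "term": self.term,
                "usehistory": self.usehistory}
```

Raising in check() reflects the note that a failed check should abort the query before any request is sent.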


class entrezpy.base.request.EutilsRequest(eutil, db)

EutilsRequest is the base class for requests from entrezpy.base.query.EutilsQuery.

EutilsRequests are instantiated in entrezpy.base.query.EutilsQuery.inquire() before being added to the request pool by entrezpy.base.query.EutilsQuery.add_request(). Each EutilsRequest triggers an answer at the NCBI Entrez servers if no connection errors occur.

EutilsRequest stores the required information for POST requests. Its status can be queried from outside by entrezpy.base.request.EutilsRequest.get_observation(). EutilsRequest instances store information that is not present in the server response but is required by entrezpy.base.analyzer.EutilsAnalyzer to parse responses and errors correctly. Several instance attributes are not required for a POST request but help debugging.

Each request is automatically assigned an id to identify and trace requests using the query id and request id.

  • eutil (str) – eutil function for this request, e.g. efetch.fcgi
  • db (str) – database for request

Initializes a new request with initial attributes as part of a query in entrezpy.base.query.EutilsQuery.

  • tool (str) – tool name to which this request belongs
  • url (str) – full Eutil url
  • contact (str) – user email
  • apikey (str) – NCBI apikey
  • query_id (str) – entrezpy.base.query.EutilsQuery.query_id which initiated this request
  • status (int) – request status: 0 -> success, 1 -> fail, 2 -> queued
  • size (int) – size of request, e.g. number of UIDs
  • start_time (float) – start time of request in seconds since epoch
  • duration – duration for this request in seconds
  • doseq – set doseq parameter in entrezpy.request.Request.request()


status is work in progress.


Virtual function returning the POST parameters for the request from required attributes.

Return type:dict

Returns instance attributes required for every POST request.

Parameters:extend (dict) – parameters extending basic parameters
Returns:base parameters for POST request
Return type:dict
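
Assembling the base POST parameters and extending them could be sketched as below. The class and method names are hypothetical; email, tool, db and api_key are the parameter names used by NCBI E-utilities.

```python
class SketchRequest:
    """Hypothetical stand-in for a request; not the entrezpy class."""

    def __init__(self, db, tool, contact, apikey=None):
        self.db = db
        self.tool = tool
        self.contact = contact
        self.apikey = apikey

    def base_parameters(self, extend=None):
        """Return parameters sent with every POST request."""
        base = {"email": self.contact, "tool": self.tool, "db": self.db}
        if self.apikey:
            base["api_key"] = self.apikey
        if extend:
            # Request-specific parameters are layered on top of the base.
            base.update(extend)
        return base
```

A request subclass would then pass its own parameters, e.g. a search term, through extend.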

Set status if request succeeded


Set status if request failed

report_status(processed_requests=None, expected_requests=None)

Reports request status when triggered

Returns:full request id
Return type:str

Sets request error and HTTP/URL error message

Parameters:error (str) – HTTP/URL error

Starts time to measure request duration.


Calculates request duration


Dumps internal attributes for request.

Parameters:extend (dict) – extend dump with additional information
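
The bookkeeping described for a request (id, status and duration) can be illustrated with the sketch below. All names are hypothetical, and the "query id.request id" format of the full id is an assumption; only the status codes are taken from the attribute list above.

```python
import time


class RequestState:
    """Hypothetical stand-in tracking id, status and timing of a request."""

    SUCCESS, FAIL, QUEUED = 0, 1, 2  # status codes as listed above

    def __init__(self, query_id, request_id):
        self.query_id = query_id
        self.id = request_id
        self.status = self.QUEUED
        self.error = None
        self.start_time = None
        self.duration = None

    def full_id(self):
        # Assumed format: "<query id>.<request id>".
        return f"{self.query_id}.{self.id}"

    def start(self):
        self.start_time = time.time()  # seconds since epoch

    def finish(self, error=None):
        # Record the duration, then mark the request failed or succeeded.
        self.duration = time.time() - self.start_time
        self.error = error
        self.status = self.FAIL if error else self.SUCCESS
```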


class entrezpy.base.analyzer.EutilsAnalyzer

EutilsAnalyzer is the base class for an entrezpy analyzer. It prepares the response based on the requested format and checks for E-Utilities errors. The function parse() is invoked after every request by the corresponding query class, e.g. Esearcher. This allows analyzing data as it arrives without waiting until larger queries have been fetched. This approach allows implementing analyzers which can store already downloaded data to establish checkpoints or trigger other actions based on the received data.

Two virtual methods are the core and need their own implementation to support specific queries:


Responses from NCBI are not very well documented and functions will be extended as new errors are encountered.

Inits EutilsAnalyzer with unknown type of result yet. The result needs to be set upon receiving the first response by init_result().

  • hasErrorResponse (bool) – flag indicating error in response
  • result – result instance
known_fmts = {'json', 'text', 'xml'}

Store formats known to EutilsAnalyzer

init_result(response, request)

Virtual function to initialize the result instance. This allows setting attributes from the first response and request.

Parameters:response (dict or io.StringIO) – converted response from convert_response()
Raises:NotImplementedError – if implementation is missing
analyze_error(response, request)

Virtual function to handle error responses

Parameters:response (dict or io.StringIO) – converted response from convert_response()
Raises:NotImplementedError – if implementation is missing
analyze_result(response, request)

Virtual function to handle responses, i.e. parsing them and prepare them for entrezpy.base.result.EutilsResult

Parameters:response (dict or io.StringIO) – converted response from convert_response()
Raises:NotImplementedError – if implementation is missing
parse(raw_response, request)

Checks for errors and calls the parser for the raw response.


NotImplementedError – if request format is not in EutilsAnalyzer.known_fmts

convert_response(raw_response_decoded, request)

Converts raw_response into the expected format, deduced from request and set via the retmode parameter.


response in parseable format

Return type:

dict or io.StringIO

Using threads without locks randomly 'loses' the response, i.e. the raw response is emptied between requests. With locks, it works, but threading is not much faster than non-threading. It seems JSON is more prone to this than XML.
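
A lock-guarded conversion along these lines is sketched below. It is an illustration only: the function takes the retmode string directly instead of a request instance, and the module-level lock is an assumption about how the locking might be arranged.

```python
import io
import json
import threading

KNOWN_FMTS = {"json", "text", "xml"}
_convert_lock = threading.Lock()


def convert_response(raw_response_decoded, retmode):
    """Convert a decoded response into a dict (JSON) or io.StringIO."""
    if retmode not in KNOWN_FMTS:
        raise NotImplementedError(f"unknown format: {retmode}")
    # Serialize conversion so threads cannot interleave on the response.
    with _convert_lock:
        if retmode == "json":
            return json.loads(raw_response_decoded)
        # XML and text are handed on as a file-like object.
        return io.StringIO(raw_response_decoded)
```

Holding the lock for the whole conversion trades throughput for safety, which matches the observation that threading brings little speedup here.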
isErrorResponse(response, request)

Checks for error messages in the response from the Entrez servers and sets the flag hasErrorResponse.


error status

Return type:



Checks for errors in XML responses

Parameters:response (io.StringIO) – XML response
Returns:if XML response has error message
Return type:bool

Checks for errors in JSON responses. Not unified among Eutil functions.

Parameters:response (dict) – response
Returns:status if JSON response has error message
Return type:bool

Test if response has errors

Return type:bool

Return result

Returns:result instance
Return type:entrezpy.base.result.EutilsResult

Return follow-up parameters if available

Returns:Follow-up parameters
Return type:dict

Test for empty result

Return type:bool
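
How an analyzer subclass plugs into the parse flow might be sketched as follows. BaseAnalyzer is a hypothetical stand-in: calling init_result() from parse(), the is_error() check on an "error" key, and the "ids" key in the response are all assumptions made for illustration.

```python
class BaseAnalyzer:
    """Hypothetical stand-in for an analyzer base class."""

    def __init__(self):
        self.hasErrorResponse = False
        self.result = None

    def parse(self, response, request):
        # Invoked after every request, so data is handled as it arrives.
        if self.result is None:
            self.init_result(response, request)
        if self.is_error(response):
            self.hasErrorResponse = True
            self.analyze_error(response, request)
        else:
            self.analyze_result(response, request)

    def is_error(self, response):
        return "error" in response  # assumed JSON error convention

    def init_result(self, response, request):
        raise NotImplementedError

    def analyze_error(self, response, request):
        raise NotImplementedError

    def analyze_result(self, response, request):
        raise NotImplementedError


class UidAnalyzer(BaseAnalyzer):
    """Hypothetical analyzer collecting UIDs across responses."""

    def __init__(self):
        super().__init__()
        self.errors = []

    def init_result(self, response, request):
        self.result = []

    def analyze_error(self, response, request):
        self.errors.append(response["error"])

    def analyze_result(self, response, request):
        self.result.extend(response.get("ids", []))
```

Because results accumulate per response, such an analyzer could also write checkpoints after each call instead of waiting for the whole query.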


class entrezpy.base.result.EutilsResult(function, qid, db, webenv=None, querykey=None)

EutilsResult is the base class for an entrezpy result. It sets the required result attributes common to all results and declares virtual functions to interact with other entrezpy classes. Empty results are successful results since no query error has been received. entrezpy.base.result.EutilsResult.size() is important to

  • determine if and how many follow-up requests are required
  • determine if the result is empty
  • function (string) – EUtil function of the result
  • qid (string) – query id
  • db (string) – Entrez database name for result
  • webenv (string) – WebEnv of response
  • querykey (int) – querykey of response

Returns result size in the corresponding ResultSize unit

Return type:int
Raises:NotImplementedError – if implementation is missing

Dumps all instance attributes

Return type:dict
Raises:NotImplementedError – if implementation is missing

Assembles parameters for automated follow-ups. Use the query key from the first request by default.

Parameters:reqnum (int) – request number for which query_key should be returned
Returns:EUtils parameters
Return type:dict
Raises:NotImplementedError – if implementation is missing

Indicates empty result.

Return type:bool
Raises:NotImplementedError – if implementation is missing
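
A concrete result class could implement the virtual functions as sketched below. BaseResult is a hypothetical stand-in, and the method names other than size() are illustrative assumptions; WebEnv and query_key are the E-utilities history parameters.

```python
class BaseResult:
    """Hypothetical stand-in for a result base class."""

    def __init__(self, function, qid, db, webenv=None, querykey=None):
        self.function = function
        self.query_id = qid
        self.db = db
        self.webenv = webenv
        self.querykey = querykey

    def size(self):
        raise NotImplementedError

    def is_empty(self):
        raise NotImplementedError

    def follow_up_parameters(self, reqnum=0):
        raise NotImplementedError


class UidResult(BaseResult):
    """Hypothetical result holding a list of UIDs."""

    def __init__(self, qid, db, uids, webenv=None, querykey=None):
        super().__init__("esearch", qid, db, webenv, querykey)
        self.uids = list(uids)

    def size(self):
        return len(self.uids)

    def is_empty(self):
        # An empty result is still a successful result.
        return self.size() == 0

    def follow_up_parameters(self, reqnum=0):
        # reqnum is ignored here since only one query key is stored.
        return {"db": self.db, "WebEnv": self.webenv,
                "query_key": self.querykey}
```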