Base modules

Query

class entrezpy.base.query.EutilsQuery(eutil, tool, email, apikey=None, apikey_var=None, threads=None, qid=None)

EutilsQuery implements the base class for all entrezpy queries to E-Utils. It handles the information required by every query, e.g. base query url, email address, allowed requests per second, apikey, etc. It declares the virtual method inquire(), which every query must implement since requests differ among queries.

The NCBI API key is looked up in the following order:

  • the key passed as an argument during initialization
  • the environment variable whose name was passed as an argument
  • the environment variable NCBI_API_KEY
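
The lookup order above can be sketched as follows. This is a minimal illustration and not entrezpy's implementation: resolve_apikey is a hypothetical helper, and treating an empty value as missing is an assumption.

```python
import os


def resolve_apikey(apikey=None, env_var=None):
    """Return the first API key found, or None (hypothetical helper)."""
    # 1. A key passed explicitly takes precedence.
    if apikey:
        return apikey
    # 2. Next, the environment variable named by the caller.
    if env_var and os.environ.get(env_var):
        return os.environ[env_var]
    # 3. Finally, fall back to the default NCBI_API_KEY variable.
    return os.environ.get("NCBI_API_KEY")
```

Calling resolve_apikey() with no arguments therefore returns whatever NCBI_API_KEY holds, or None if it is unset.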

Upon initialization, the following steps are performed:

  • set unique query id
  • check for / set NCBI apikey
  • initialize entrezpy.requester.requester.Requester with allowed requests per second
  • assemble Eutil url for the desired EUtils function
  • initialize Multithreading queue and register query at entrezpy.base.monitor.QueryMonitor for logging

Multithreading is handled using the nested classes entrezpy.base.query.EutilsQuery.RequestPool and entrezpy.base.query.EutilsQuery.ThreadedRequester.
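
For orientation, the general queue-and-worker pattern behind such a request pool can be sketched with the standard library alone. run_pool and its arguments are hypothetical; the sketch does not reflect the internals of RequestPool or ThreadedRequester.

```python
import queue
import threading


def run_pool(jobs, worker, num_threads=2):
    """Run worker(job) for every job on num_threads threads (hypothetical)."""
    tasks = queue.Queue()
    results = []
    lock = threading.Lock()  # guards the shared results list

    def consume():
        while True:
            job = tasks.get()
            if job is None:  # sentinel: no more work for this thread
                tasks.task_done()
                return
            try:
                outcome = worker(job)
            except Exception as err:  # keep the thread alive on failure
                outcome = err
            with lock:
                results.append(outcome)
            tasks.task_done()

    threads = [threading.Thread(target=consume, daemon=True)
               for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for job in jobs:
        tasks.put(job)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker thread
    tasks.join()  # block until every queued item has been processed
    for thread in threads:
        thread.join()
    return results
```

Results arrive in completion order, not submission order, so callers that need a stable order have to sort or tag them.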

Initializes an EutilsQuery instance with eutil, tool, email, apikey, apikey_var, threads and qid.

Parameters:
  • eutil (str) – name of eutil function on EUtils server
  • tool (str) – tool name
  • email (str) – user email
  • apikey (str) – NCBI apikey
  • apikey_var (str) – environment variable storing NCBI apikey
  • threads (int) – set threads for multithreading
  • qid (str) – unique query id
Variables:
  • id – unique query id
  • base_url – base url for all Eutil requests
  • requests_per_sec (int) – default limit of requests/sec (set by NCBI)
  • max_requests_per_sec (int) – max. requests/sec with apikey (set by NCBI)
  • url (str) – full URL for Eutil function
  • contact (str) – user email (required by NCBI)
  • tool (str) – tool name (required by NCBI)
  • apikey (str) – NCBI apikey
  • num_threads (int) – number of threads to use
  • failed_requests (list) – store failed requests for analysis if desired
  • request_pool – entrezpy.base.query.EutilsQuery.RequestPool instance
  • request_counter (int) – requests counter for a EutilsQuery instance
base_url = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils'

Base url for all Eutil requests
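
As an illustration of how the full url attribute could be derived from base_url, here is a small sketch. assemble_url is a hypothetical helper, and joining with a single slash is an assumption about how the pieces fit together.

```python
BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def assemble_url(eutil, base_url=BASE_URL):
    """Join the base url and an eutil function name (hypothetical helper)."""
    # Strip stray slashes so exactly one separator ends up between the parts.
    return "/".join([base_url.rstrip("/"), eutil.lstrip("/")])
```

For example, assemble_url("esearch.fcgi") yields the address of the ESearch endpoint.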

inquire(parameter, analyzer)

Virtual function starting query. Each query requires its own implementation.

Parameters:
  • parameter (dict) – E-Utilities parameters
  • analyzer (entrezpy.base.analyzer.EutilsAnalyzer) – query response analyzer
Returns:

analyzer

Return type:

entrezpy.base.analyzer.EutilsAnalyzer
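
The virtual-method contract of inquire() can be sketched with stand-in classes. BaseQuery, EchoAnalyzer and EchoQuery are hypothetical and do not talk to NCBI; they only show that the base class refuses to run and that a subclass returns the analyzer it was given.

```python
class BaseQuery:
    """Stand-in for a query base class (not entrezpy's code)."""

    def inquire(self, parameter, analyzer):
        raise NotImplementedError(
            f"{type(self).__name__} must implement inquire()")


class EchoAnalyzer:
    """Hypothetical analyzer that records what it is asked to parse."""

    def __init__(self):
        self.seen = []

    def parse(self, raw_response, request):
        self.seen.append((request, raw_response))


class EchoQuery(BaseQuery):
    """Hypothetical query whose 'response' is just its own parameters."""

    def inquire(self, parameter, analyzer):
        # A real query would send requests to E-Utilities here.
        analyzer.parse(dict(parameter), request="request-0")
        return analyzer
```

Usage: EchoQuery().inquire({"db": "pubmed"}, EchoAnalyzer()) returns the analyzer, which can then be inspected for results.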

check_requests()

Virtual function testing and handling failed requests. These requests failed due to HTTP/URL issues and are stored in entrezpy.base.query.EutilsQuery.failed_requests.

check_ncbi_apikey(apikey=None, env_var=None)

Checks and sets NCBI apikey.

Parameters:
  • apikey (str) – NCBI apikey
  • env_var (str) – environment variable storing NCBI apikey
prepare_request(request)

Prepares a request for sending to E-Utilities with the required query attributes.

Parameters:request (entrezpy.base.request.EutilsRequest) – entrezpy request instance
Returns:request instance with EUtils parameters
Return type:entrezpy.base.request.EutilsRequest
add_request(request, analyzer)

Adds one request and corresponding analyzer to the request pool.

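Parameters:
  • request (entrezpy.base.request.EutilsRequest) – entrezpy request instance
  • analyzer (entrezpy.base.analyzer.EutilsAnalyzer) – query response analyzer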
monitor_start(query_parameters)

Starts query monitoring

Parameters:query_parameters (entrezpy.base.parameter.EutilsParameter) – query parameters
monitor_stop()

Stops query monitoring

monitor_update(updated_query_parameters)

Updates query monitoring parameters if follow up requests are required.

Parameters:updated_query_parameters (entrezpy.base.parameter.EutilsParameter) – updated query parameters
hasFailedRequests()

Reports if at least one request failed.

dump()

Dump all attributes

isGoodQuery()

Tests for request errors

Return type:bool

Parameter

class entrezpy.base.parameter.EutilsParameter(parameter=None)

EutilsParameter sets and checks the parameters for each query. EutilsParameter is populated from a dictionary with valid E-Utilities parameters for the corresponding query. It declares virtual functions where necessary.

Simple helper functions are provided to test the common parameters db, WebEnv, query_key and usehistory.

Note

usehistory is the parameter used for Entrez history queries and is set to True (use it) by default. It can be set to False to omit history server use.

haveExpectedRequests() tests if the number of requests has been calculated.

The virtual methods check() and dump() need their own implementation since they can vary between queries.

Warning

check() is expected to run after all parameters have been set.

Parameters:

parameter (dict) – Eutils query parameters

Variables:
  • db (str) – Entrez database name
  • webenv (str) – WebEnv
  • querykey (int) – querykey
  • expected_request (int) – number of expected requests for the query
  • doseq (bool) – use id= parameter for each uid in POST
haveDb()

Check for required db parameter

Return type:bool
haveWebenv()

Check for required WebEnv parameter

Return type:bool
haveQuerykey()

Check for required QueryKey parameter

Return type:bool
useHistory()

Check if history server should be used.

Return type:bool
haveExpectedRequets()

Check for expected requests. Hints at an error if no requests are expected.

Return type:bool
check()

Virtual function to run a check before starting the query. This is a crucial step and should abort upon failing.

Raises:NotImplementedError – if not implemented
dump()

Dump instance attributes

Return type:dict
Raises:NotImplementedError – if not implemented
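
The interplay of the helper tests and the virtual check() and dump() can be sketched with stand-in classes. SketchParameter and SearchParameter are hypothetical, as is the term attribute; raising ValueError on a failed check is an assumption about what "abort" means.

```python
class SketchParameter:
    """Stand-in for a parameter base class (not entrezpy's code)."""

    def __init__(self, parameter=None):
        parameter = parameter or {}
        self.db = parameter.get("db")
        self.webenv = parameter.get("WebEnv")
        self.querykey = parameter.get("query_key")
        # History use defaults to True, as the note above describes.
        self.usehistory = parameter.get("usehistory", True)

    def haveDb(self):
        return self.db is not None

    def haveWebenv(self):
        return self.webenv is not None

    def haveQuerykey(self):
        return self.querykey is not None

    def useHistory(self):
        return bool(self.usehistory)

    def check(self):
        raise NotImplementedError("check() must be implemented by subclasses")

    def dump(self):
        raise NotImplementedError("dump() must be implemented by subclasses")


class SearchParameter(SketchParameter):
    """Hypothetical subclass for a search-like query."""

    def __init__(self, parameter=None):
        super().__init__(parameter)
        self.term = (parameter or {}).get("term")
        self.check()  # run only after every attribute has been set

    def check(self):
        if not self.haveDb() or not self.term:
            raise ValueError("Both 'db' and 'term' are required")

    def dump(self):
        return {"db": self.db, "term": self.term,
                "usehistory": self.usehistory}
```

Calling check() as the last step of the subclass constructor keeps the rule from the warning above: all parameters are set before they are validated.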

Request

class entrezpy.base.request.EutilsRequest(eutil, db)

EutilsRequest is the base class for requests from entrezpy.base.query.EutilsQuery.

EutilsRequest instances are created in entrezpy.base.query.EutilsQuery.inquire() before being added to the request pool by entrezpy.base.query.EutilsQuery.add_request(). Each EutilsRequest triggers an answer from the NCBI Entrez servers if no connection errors occur.

EutilsRequest stores the required information for POST requests. Its status can be queried from outside by entrezpy.base.request.EutilsRequest.get_observation(). EutilsRequest instances also store information that is not present in the server response but is required by entrezpy.base.analyzer.EutilsAnalyzer to parse responses and errors correctly. Several instance attributes are not required for a POST request but help with debugging.

Each request is automatically assigned an id so that requests can be identified and traced by query id and request id.

Parameters:
  • eutil (str) – eutil function for this request, e.g. efetch.fcgi
  • db (str) – database for request

Initializes a new request with initial attributes as part of a query in entrezpy.base.query.EutilsQuery.

Variables:
  • tool (str) – tool name to which this request belongs
  • url (str) – full Eutil url
  • contact (str) – user email
  • apikey (str) – NCBI apikey
  • query_id (str) – entrezpy.base.query.EutilsQuery.query_id which initiated this request
  • status (int) – request status: 0 -> success, 1 -> fail, 2 -> queued
  • size (int) – size of request, e.g. number of UIDs
  • start_time (float) – start time of request in seconds since epoch
  • duration – duration for this request in seconds
  • doseq – set doseq parameter in entrezpy.request.Request.request()

Note

status is work in progress.

get_post_parameter()

Virtual function returning the POST parameters for the request from required attributes.

Return type:dict
Raises:NotImplementedError
prepare_base_qry(extend=None)

Returns instance attributes required for every POST request.

Parameters:extend (dict) – parameters extending basic parameters
Returns:base parameters for POST request
Return type:dict
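
A minimal sketch of how base POST parameters might be assembled and extended. base_post_parameters is a hypothetical free function, not the library method. The keys tool, email, db and api_key are the parameter names E-Utilities accepts, but which of them entrezpy places in the base set is an assumption.

```python
def base_post_parameters(tool, contact, db, apikey=None, extend=None):
    """Build parameters shared by every request (hypothetical helper)."""
    base = {"tool": tool, "email": contact, "db": db}
    # Only send an API key if one is available.
    if apikey:
        base["api_key"] = apikey
    # Request-specific parameters are merged last and may override the base.
    if extend:
        base.update(extend)
    return base
```

A request subclass could then build its POST body with a call such as base_post_parameters("mytool", "me@example.org", "pubmed", extend={"term": "virus"}).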
set_status_success()

Set status if request succeeded

set_status_fail()

Set status if request failed

report_status(processed_requests=None, expected_requests=None)

Reports request status when triggered

get_request_id()
Returns:full request id
Return type:str
set_request_error(error)

Sets request error and HTTP/URL error message

Parameters:error (str) – HTTP/URL error
start_stopwatch()

Starts the timer to measure request duration.

calc_duration()

Calculates request duration

dump_internals(extend=None)

Dumps internal attributes for request.

Parameters:extend (dict) – extend dump with additional information
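
The bookkeeping described in this section (request id, status codes, stopwatch) can be sketched as below. SketchRequest is a hypothetical stand-in; the dot separator in the request id and the initial status of "queued" are assumptions.

```python
import time


class SketchRequest:
    """Stand-in for request bookkeeping (not entrezpy's code)."""

    # Status codes as listed in the variables above.
    SUCCESS, FAIL, QUEUED = 0, 1, 2

    def __init__(self, query_id, request_num):
        self.query_id = query_id
        self.id = request_num
        self.status = self.QUEUED  # assumption: requests start as queued
        self.error = None
        self.start_time = None
        self.duration = None

    def get_request_id(self):
        # Assumption: query id and request id are joined by a dot.
        return f"{self.query_id}.{self.id}"

    def start_stopwatch(self):
        self.start_time = time.time()  # seconds since epoch

    def calc_duration(self):
        self.duration = time.time() - self.start_time

    def set_status_success(self):
        self.status = self.SUCCESS

    def set_status_fail(self):
        self.status = self.FAIL

    def set_request_error(self, error):
        self.error = error
```

A caller would start the stopwatch just before sending, call calc_duration() when the response arrives, and then set the status according to the outcome.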

Analyzer

class entrezpy.base.analyzer.EutilsAnalyzer

EutilsAnalyzer is the base class for an entrezpy analyzer. It prepares the response based on the requested format and checks for E-Utilities errors. The function parse() is invoked after every request by the corresponding query class, e.g. Esearcher. This allows analyzing data as it arrives without waiting until larger queries have been fetched. This approach allows implementing analyzers which can store already downloaded data to establish checkpoints or trigger other actions based on the received data.

Two virtual methods are the core and need their own implementation to support specific queries: analyze_error() and analyze_result().

Note

Responses from NCBI are not very well documented and functions will be extended as new errors are encountered.

Initializes EutilsAnalyzer with a result of as yet unknown type. The result needs to be set by init_result() upon receiving the first response.

Variables:
  • hasErrorResponse (bool) – flag indicating error in response
  • result – result instance
known_fmts = {'json', 'text', 'xml'}

Stores the formats known to EutilsAnalyzer

init_result(response, request)

Virtual function to initialize the result instance. This allows setting attributes from the first response and request.

Parameters:response (dict or io.StringIO) – converted response from convert_response()
Raises:NotImplementedError – if implementation is missing
analyze_error(response, request)

Virtual function to handle error responses

Parameters:response (dict or io.StringIO) – converted response from convert_response()
Raises:NotImplementedError – if implementation is missing
analyze_result(response, request)

Virtual function to handle responses, i.e. parsing them and preparing them for entrezpy.base.result.EutilsResult

Parameters:response (dict or io.StringIO) – converted response from convert_response()
Raises:NotImplementedError – if implementation is missing
parse(raw_response, request)

Checks for errors and calls the parser for the raw response.

Parameters:
Raises:

NotImplementedError – if request format is not in EutilsAnalyzer.known_fmts
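
The dispatch that parse() performs can be sketched as a template method. SketchAnalyzer and UidCollector are hypothetical; the simplified signature parse(response, retmode), the "error" key and the "uids" key are all assumptions made to keep the example self-contained.

```python
class SketchAnalyzer:
    """Stand-in for an analyzer base class (not entrezpy's code)."""

    known_fmts = {"json", "text", "xml"}

    def __init__(self):
        self.hasErrorResponse = False
        self.result = None

    def parse(self, response, retmode):
        if retmode not in self.known_fmts:
            raise NotImplementedError(f"Unknown format: {retmode}")
        if self.is_error(response):
            self.hasErrorResponse = True
            self.analyze_error(response)
            return
        if self.result is None:  # first good response: set up the result
            self.init_result(response)
        self.analyze_result(response)

    def is_error(self, response):
        return "error" in response  # assumption: errors use this key

    def init_result(self, response):
        raise NotImplementedError

    def analyze_error(self, response):
        raise NotImplementedError

    def analyze_result(self, response):
        raise NotImplementedError


class UidCollector(SketchAnalyzer):
    """Hypothetical analyzer that accumulates UIDs across responses."""

    def __init__(self):
        super().__init__()
        self.errors = []

    def init_result(self, response):
        self.result = []

    def analyze_result(self, response):
        self.result.extend(response["uids"])

    def analyze_error(self, response):
        self.errors.append(response["error"])
```

Because parse() runs once per response, the collector holds partial data after each call, which is what makes checkpointing during large queries possible.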

convert_response(raw_response_decoded, request)

Converts raw_response into the expected format, deduced from request and set via the retmode parameter.

Parameters:
Returns:

response in parseable format

Return type:

dict or io.StringIO

Note

Using threads without locks randomly ‘loses’ the response, i.e. the raw response is emptied between requests. With locks it works, but threading is then not much faster than running without threads. JSON seems more prone to this than XML.
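
A minimal sketch of a conversion that yields the documented return types, a dict for JSON and an io.StringIO otherwise. convert is a hypothetical helper that takes the retmode directly instead of deducing it from a request object.

```python
import io
import json


def convert(raw_response_decoded, retmode):
    """Convert decoded response text by retmode (hypothetical helper)."""
    if retmode == "json":
        # JSON is parsed eagerly into a dictionary.
        return json.loads(raw_response_decoded)
    # XML and plain text are wrapped in a file-like object for later parsing.
    return io.StringIO(raw_response_decoded)
```

Wrapping XML in a file-like object lets a downstream parser read it incrementally instead of requiring the whole document as one string.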
isErrorResponse(response, request)

Checks for error messages in the response from the Entrez servers and sets the flag hasErrorResponse.

Parameters:
Returns:

error status

Return type:

bool

check_error_xml(response)

Checks for errors in XML responses

Parameters:response (io.StringIO) – XML response
Returns:if XML response has error message
Return type:bool
check_error_json(response)

Checks for errors in JSON responses. Not unified among Eutil functions.

Parameters:response (dict) – response
Returns:status if JSON response has error message
Return type:bool
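
Since JSON error reporting is not unified among Eutil functions, one defensive approach is to search the whole response for an error key. The sketch below is an assumption-laden illustration, not entrezpy's check: has_json_error is hypothetical, and it assumes errors appear under a key spelled "error" in any letter case, at any nesting depth.

```python
def has_json_error(response):
    """Report whether any nested key is named 'error' (hypothetical helper)."""
    if isinstance(response, dict):
        for key, value in response.items():
            # Match the key itself, or recurse into nested structures.
            if key.lower() == "error" or has_json_error(value):
                return True
    elif isinstance(response, list):
        return any(has_json_error(item) for item in response)
    return False
```

A broad search like this can produce false positives if a database record legitimately contains a field named "error", so a per-function check is preferable where the response layout is known.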
isSuccess()

Test if response has errors

Return type:bool
get_result()

Return result

Returns:result instance
Return type:entrezpy.base.result.EutilsResult
follow_up()

Return follow-up parameters if available

Returns:Follow-up parameters
Return type:dict
isEmpty()

Test for empty result

Return type:bool

Result

class entrezpy.base.result.EutilsResult(function, qid, db, webenv=None, querykey=None)

EutilsResult is the base class for an entrezpy result. It sets the required result attributes common to all results and declares virtual functions to interact with other entrezpy classes. Empty results are successful results since no query error has been received. entrezpy.base.result.EutilsResult.size() is important to determine

  • if and how many follow-up requests are required
  • if the result is empty
Parameters:
  • function (string) – EUtil function of the result
  • qid (string) – query id
  • db (string) – Entrez database name for result
  • webenv (string) – WebEnv of response
  • querykey (int) – querykey of response
size()

Returns result size in the corresponding ResultSize unit

Return type:int
Raises:NotImplementedError – if implementation is missing
dump()

Dumps all instance attributes

Return type:dict
Raises:NotImplementedError – if implementation is missing

Assembles parameters for automated follow-ups. Use the query key from the first request by default.

Parameters:reqnum (int) – request number for which query_key should be returned
Returns:EUtils parameters
Return type:dict
Raises:NotImplementedError – if implementation is missing
isEmpty()

Indicates empty result.

Return type:bool
Raises:NotImplementedError – if implementation is missing
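
How size() and isEmpty() relate, and how a size can drive follow-up requests, is sketched below. SketchResult, UidResult and remaining_requests are hypothetical, and computing follow-ups from a total count and a batch size is an assumption about one possible use.

```python
import math


class SketchResult:
    """Stand-in for a result base class (not entrezpy's code)."""

    def __init__(self, function, qid, db, webenv=None, querykey=None):
        self.function = function
        self.query_id = qid
        self.db = db
        self.webenv = webenv
        self.querykey = querykey

    def size(self):
        raise NotImplementedError

    def isEmpty(self):
        raise NotImplementedError


class UidResult(SketchResult):
    """Hypothetical result holding the UIDs fetched so far."""

    def __init__(self, qid, db, total, uids=()):
        super().__init__("esearch", qid, db)
        self.total = total  # number of UIDs the server reported
        self.uids = list(uids)

    def size(self):
        return len(self.uids)

    def isEmpty(self):
        return self.size() == 0


def remaining_requests(total, fetched, batch_size):
    """Number of follow-up requests still needed (hypothetical helper)."""
    return math.ceil(max(total - fetched, 0) / batch_size)
```

With 250 reported UIDs, 100 already fetched and a batch size of 100, remaining_requests(250, 100, 100) gives 2 follow-up requests.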

Monitor