Base modules

Query

class entrezpy.base.query.EutilsQuery(eutil, tool, email, apikey=None, apikey_var=None, threads=None, qid=None)

EutilsQuery is the base class for all entrezpy queries to E-Utilities. It handles the information required by every query, e.g. the base query URL, email address, allowed requests per second and API key. It declares the virtual method inquire(), which every query must implement because its behavior differs among queries.

An NCBI API key is taken from the first of the following sources that provides one:

the apikey argument passed during initialization

the environment variable named by the apikey_var argument

the environment variable NCBI_API_KEY
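The lookup order above can be sketched in plain Python. The function name resolve_apikey is hypothetical and this is not entrezpy's check_ncbi_apikey(); it only assumes that an explicit key wins over the variable named by apikey_var, which in turn wins over NCBI_API_KEY.

```python
import os
from typing import Optional


def resolve_apikey(apikey: Optional[str] = None,
                   apikey_var: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: return the first NCBI API key found, or None."""
    # 1. A key passed explicitly always wins.
    if apikey:
        return apikey
    # 2. Next, the environment variable whose *name* was passed as apikey_var.
    if apikey_var and os.environ.get(apikey_var):
        return os.environ[apikey_var]
    # 3. Finally, fall back to the conventional NCBI_API_KEY variable.
    return os.environ.get("NCBI_API_KEY")
```

For example, after exporting MY_NCBI_KEY, resolve_apikey(apikey_var="MY_NCBI_KEY") returns its value unless an explicit apikey is also given.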

Upon initialization, the following steps are performed:

set a unique query id

check for and set the NCBI API key

initialize entrezpy.requester.requester.Requester with the allowed requests per second

assemble the Eutil URL for the desired E-Utilities function

initialize the multithreading queue and register the query at entrezpy.base.monitor.QueryMonitor for logging

Multithreading is handled using the nested classes entrezpy.base.query.EutilsQuery.RequestPool and entrezpy.base.query.EutilsQuery.ThreadedRequester.
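As a rough illustration of a request pool drained by worker threads, the sketch below uses the standard queue and threading modules. run_request_pool and the (request, handler) pairs are hypothetical stand-ins; entrezpy's RequestPool and ThreadedRequester may be organized differently.

```python
import queue
import threading
from typing import Callable, Iterable, Tuple

Task = Tuple[str, Callable[[str], None]]


def run_request_pool(tasks: Iterable[Task], num_threads: int = 2) -> None:
    """Hypothetical sketch: workers drain a shared queue of (request, handler) pairs."""
    pool: "queue.Queue[Task]" = queue.Queue()
    for task in tasks:
        pool.put(task)

    def worker() -> None:
        while True:
            try:
                request, handler = pool.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            handler(request)  # stand-in for "send request, then parse with analyzer"

    workers = [threading.Thread(target=worker) for _ in range(max(1, num_threads))]
    for w in workers:
        w.start()
    for w in workers:
        w.join()  # block until every queued request has been handled
```

All tasks are queued before the workers start, so a worker can simply exit once get_nowait() finds the queue empty.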

Initializes an EutilsQuery instance with eutil, tool, email, apikey, apikey_var, threads and qid.

Parameters:

eutil (str) – name of eutil function on EUtils server

tool (str) – tool name

email (str) – user email

apikey (str) – NCBI apikey

apikey_var (str) – environment variable storing NCBI apikey

threads (int) – number of threads for multithreading

qid (str) – unique query id

Variables:

id – unique query id

base_url – base URL for all E-Utilities requests

requests_per_sec (int) – default limit of requests/sec (set by NCBI)

max_requests_per_sec (int) – max. requests/sec with an apikey (set by NCBI)

url (str) – full URL for Eutil function

contact (str) – user email (required by NCBI)

tool (str) – tool name (required by NCBI)

apikey (str) – NCBI apikey

num_threads (int) – number of threads to use

failed_requests (list) – store failed requests for analysis if desired

request_pool – entrezpy.base.query.EutilsQuery.RequestPool instance

request_counter (int) – requests counter for a EutilsQuery instance
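NCBI allows 3 requests per second without an API key and 10 with one. A minimal sketch of how such limits translate into spacing between requests is shown below; min_interval and Throttle are hypothetical helpers, not entrezpy's Requester, and the throttle is not thread-safe as written (concurrent callers would need a lock).

```python
import time
from typing import Optional

REQUESTS_PER_SEC = 3       # NCBI limit without an API key
MAX_REQUESTS_PER_SEC = 10  # NCBI limit with an API key


def min_interval(apikey: Optional[str] = None) -> float:
    """Seconds to wait between two requests to stay within NCBI's limit."""
    rate = MAX_REQUESTS_PER_SEC if apikey else REQUESTS_PER_SEC
    return 1.0 / rate


class Throttle:
    """Hypothetical helper that spaces out successive calls to wait()."""

    def __init__(self, apikey: Optional[str] = None):
        self.interval = min_interval(apikey)
        self._last = float("-inf")  # no request sent yet

    def wait(self) -> None:
        # Sleep only for whatever part of the interval has not yet elapsed.
        delay = self._last + self.interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()
```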

base_url = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils'

Base URL for all E-Utilities requests
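A sketch of how the full URL and a POST body could be assembled from base_url, assuming the standard E-utilities parameter names tool, email and api_key. eutil_url and post_body are hypothetical helpers; urlencode(..., doseq=True) shows one way to emit a repeated id= field per UID, as the doseq parameter attribute describes.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def eutil_url(eutil: str) -> str:
    """Full URL for an E-Utilities function, e.g. 'esearch.fcgi'."""
    return f"{BASE_URL}/{eutil}"


def post_body(params: dict, tool: str, email: str,
              apikey: Optional[str] = None) -> bytes:
    """Hypothetical helper: URL-encoded POST body including identification fields."""
    data = dict(params, tool=tool, email=email)
    if apikey:
        data["api_key"] = apikey
    # doseq=True expands list values into repeated keys, e.g. id=1&id=2
    return urlencode(data, doseq=True).encode("utf-8")
```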

inquire(parameter, analyzer)

Virtual function starting query. Each query requires its own implementation.

Parameters:

parameter (dict) – E-Utilities parameters

analyzer (entrezpy.base.analyzer.EutilsAnalyzer) – query response analyzer

Returns:
analyzer

Return type:
entrezpy.base.analyzer.EutilsAnalyzer

check_requests()

Virtual function testing and handling failed requests. These requests fail due to HTTP/URL issues and are stored in entrezpy.base.query.EutilsQuery.failed_requests.

check_ncbi_apikey(apikey=None, env_var=None)

Checks and sets NCBI apikey.

Parameters:

apikey (str) – NCBI apikey

env_var (str) – environment variable storing NCBI apikey

prepare_request(request)

Prepares a request for sending to E-Utilities by adding the required query attributes.

Parameters: request (entrezpy.base.request.EutilsRequest) – entrezpy request instance

Returns: request instance with EUtils parameters

Return type: entrezpy.base.request.EutilsRequest

add_request(request, analyzer)

Adds one request and corresponding analyzer to the request pool.

Parameters:

request (entrezpy.base.request.EutilsRequest) – entrezpy request instance

analyzer – entrezpy analyzer instance

monitor_start(query_parameters)

Starts query monitoring

Parameters: query_parameters (entrezpy.base.parameter.EutilsParameter) – query parameters

monitor_stop()

Stops query monitoring

monitor_update(updated_query_parameters)

Updates query monitoring parameters if follow-up requests are required.

Parameters: updated_query_parameters (entrezpy.base.parameter.EutilsParameter) – updated query parameters

hasFailedRequests()

Reports if at least one request failed.

dump()

Dump all attributes

isGoodQuery()

Tests for request errors

Return type: bool

Parameter

class entrezpy.base.parameter.EutilsParameter(parameter=None)

EutilsParameter sets and checks parameters for each query. EutilsParameter is populated from a dictionary with valid E-Utilities parameters for the corresponding query. It declares virtual functions where necessary.

Simple helper functions are provided to test the common parameters db, WebEnv, query_key and usehistory.

Note

usehistory is the parameter used for Entrez history queries and is set to True (use it) by default. It can be set to False to omit history server use.

haveExpectedRequets() tests if the number of expected requests has been calculated.

The virtual methods check() and dump() need their own implementation since they can vary between queries.

Warning

check() is expected to run after all parameters have been set.

Parameters:
parameter (dict) – Eutils query parameters

Variables:

db (str) – Entrez database name

webenv (str) – WebEnv

querykey (int) – querykey

expected_request (int) – number of expected requests for the query

doseq (bool) – use id= parameter for each uid in POST

haveDb()

Check for required db parameter

Return type: bool

haveWebenv()

Check for required WebEnv parameter

Return type: bool

haveQuerykey()

Check for required QueryKey parameter

Return type: bool

useHistory()

Check if history server should be used.

Return type: bool

haveExpectedRequets()

Checks for expected requests. Indicates an error if no requests are expected.

Return type: bool

check()

Virtual function to run a check before starting the query. This is a crucial step and should abort upon failing.

Raises: NotImplementedError – if not implemented

dump()

Dump instance attributes

Return type: dict

Raises: NotImplementedError – if not implemented
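The split between shared helpers and the virtual check()/dump() methods can be sketched as follows. ParameterSketch and UidParameterSketch are hypothetical classes, and the rule enforced in check() (db plus either UIDs or WebEnv with query_key) is an assumed example, not the validation any particular entrezpy query performs.

```python
from typing import Optional


class ParameterSketch:
    """Hypothetical stand-in for the helpers EutilsParameter provides."""

    def __init__(self, parameter: Optional[dict] = None):
        parameter = parameter or {}
        self.db = parameter.get("db")
        self.webenv = parameter.get("WebEnv")
        self.querykey = parameter.get("query_key")
        # History server use is on unless explicitly disabled.
        self.usehistory = parameter.get("usehistory", True)

    def haveDb(self) -> bool:
        return self.db is not None

    def haveWebenv(self) -> bool:
        return self.webenv is not None

    def haveQuerykey(self) -> bool:
        return self.querykey is not None

    def useHistory(self) -> bool:
        return bool(self.usehistory)

    def check(self) -> None:
        raise NotImplementedError("check() must be implemented by each query")

    def dump(self) -> dict:
        raise NotImplementedError("dump() must be implemented by each query")


class UidParameterSketch(ParameterSketch):
    """Hypothetical subclass: needs db plus UIDs or a history reference."""

    def __init__(self, parameter: Optional[dict] = None):
        super().__init__(parameter)
        self.uids = list((parameter or {}).get("id", []))

    def check(self) -> None:
        # Meant to run only after all parameters have been set.
        if not self.haveDb():
            raise ValueError("missing required parameter: db")
        if not self.uids and not (self.haveWebenv() and self.haveQuerykey()):
            raise ValueError("need id, or WebEnv together with query_key")

    def dump(self) -> dict:
        return {"db": self.db, "WebEnv": self.webenv, "query_key": self.querykey,
                "id": self.uids, "usehistory": self.usehistory}
```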

Request

class entrezpy.base.request.EutilsRequest(eutil, db)

EutilsRequest is the base class for requests from entrezpy.base.query.EutilsQuery.

EutilsRequest instances are created in entrezpy.base.query.EutilsQuery.inquire() before being added to the request pool by entrezpy.base.query.EutilsQuery.add_request(). Each EutilsRequest triggers an answer from the NCBI Entrez servers if no connection errors occur.

EutilsRequest stores the information required for POST requests. Its status can be queried from outside by entrezpy.base.request.EutilsRequest.get_observation(). EutilsRequest instances store information that is not present in the server response but is required by entrezpy.base.analyzer.EutilsAnalyzer to parse responses and errors correctly. Several instance attributes are not required for a POST request but help debugging.

Each request is automatically assigned an id to identify and trace requests using the query id and request id.

Parameters:

eutil (str) – eutil function for this request, e.g. efetch.fcgi

db (str) – database for request

Initializes a new request with initial attributes as part of a query in entrezpy.base.query.EutilsQuery.

Variables:

tool (str) – tool name to which this request belongs

url (str) – full Eutil url

contact (str) – user email

apikey (str) – NCBI apikey

query_id (str) – entrezpy.base.query.EutilsQuery.query_id which initiated this request

status (int) – request status: 0 = success, 1 = fail, 2 = queued

size (int) – size of request, e.g. number of UIDs

start_time (float) – start time of request in seconds since epoch

duration – duration for this request in seconds

doseq – set doseq parameter in entrezpy.request.Request.request()

Note

status is work in progress.
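The bookkeeping attributes listed above (request id, status codes, start_time and duration) can be sketched with a small class. TrackedRequest, RequestStatus, the shared id counter and the "<query id>.<request id>" format are hypothetical; only the 0/1/2 status values follow the codes given for status.

```python
import itertools
import time
from enum import IntEnum
from typing import Optional


class RequestStatus(IntEnum):
    """Status codes as listed for the status attribute."""
    SUCCESS = 0
    FAIL = 1
    QUEUED = 2


class TrackedRequest:
    """Hypothetical stand-in for the bookkeeping part of EutilsRequest."""

    _ids = itertools.count(0)  # assumption: one counter shared by all requests

    def __init__(self, eutil: str, db: str, query_id: str):
        self.eutil = eutil
        self.db = db
        self.query_id = query_id
        self.id = next(TrackedRequest._ids)
        self.status = RequestStatus.QUEUED
        self.request_error: Optional[str] = None
        self.start_time: Optional[float] = None
        self.duration: Optional[float] = None

    def get_request_id(self) -> str:
        # Assumed format combining query id and request id.
        return f"{self.query_id}.{self.id}"

    def start_stopwatch(self) -> None:
        self.start_time = time.time()  # seconds since the epoch

    def calc_duration(self) -> None:
        self.duration = time.time() - self.start_time

    def set_status_success(self) -> None:
        self.status = RequestStatus.SUCCESS

    def set_request_error(self, error: str) -> None:
        self.request_error = error
        self.status = RequestStatus.FAIL
```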

get_post_parameter()

Virtual function returning the POST parameters for the request from required attributes.

Return type: dict

Raises: NotImplementedError – if not implemented

prepare_base_qry(extend=None)

Returns instance attributes required for every POST request.

Parameters: extend (dict) – parameters extending basic parameters

Returns: base parameters for POST request

Return type: dict

set_status_success()

Set status if request succeeded

set_status_fail()

Set status if request failed

report_status(processed_requests=None, expected_requests=None)

Reports request status when triggered

get_request_id()

Returns: full request id

Return type: str

set_request_error(error)

Sets request error and HTTP/URL error message

Parameters: error (str) – HTTP/URL error

start_stopwatch()

Starts the timer to measure request duration.

calc_duration()

Calculates request duration

dump_internals(extend=None)

Dumps internal attributes for request.

Parameters: extend (dict) – extend dump with additional information

Analyzer

class entrezpy.base.analyzer.EutilsAnalyzer

EutilsAnalyzer is the base class for an entrezpy analyzer. It prepares the response based on the requested format and checks for E-Utilities errors. The function parse() is invoked after every request by the corresponding query class, e.g. Esearcher. This allows analyzing data as it arrives without waiting until larger queries have been fetched. This approach allows implementing analyzers which can store already downloaded data to establish checkpoints or trigger other actions based on the received data.

Two virtual methods are the core and need their own implementation to support specific queries:

analyze_error()

analyze_result()

Note

Responses from NCBI are not very well documented and functions will be extended as new errors are encountered.

Initializes EutilsAnalyzer without a result, since the result type is not known yet. The result is set by init_result() upon receiving the first response.

Variables:

hasErrorResponse (bool) – flag indicating error in response

result – result instance

known_fmts = {'json', 'text', 'xml'}

Formats known to EutilsAnalyzer

init_result(response, request)

Virtual function to initialize the result instance. This allows setting attributes from the first response and request.

Parameters: response (dict or io.StringIO) – converted response from convert_response()

Raises: NotImplementedError – if implementation is missing

analyze_error(response, request)

Virtual function to handle error responses

Parameters: response (dict or io.StringIO) – converted response from convert_response()

Raises: NotImplementedError – if implementation is missing

analyze_result(response, request)

Virtual function to handle responses, i.e. parsing them and preparing them for entrezpy.base.result.EutilsResult

Parameters: response (dict or io.StringIO) – converted response from convert_response()

Raises: NotImplementedError – if implementation is missing

parse(raw_response, request)

Checks for errors and calls the parser for the raw response.

Parameters:

raw_response (urllib.request.Request) – response from entrezpy.requester.requester.Requester

request (entrezpy.base.request.EutilsRequest) – query request

Raises:
NotImplementedError – if request format is not in EutilsAnalyzer.known_fmts
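The control flow of parse() can be sketched as: reject unknown formats, convert the response, then dispatch to analyze_error() or analyze_result(). AnalyzerSketch is hypothetical and simplified: it receives the decoded text and the retmode directly instead of a raw response and a request, and its error check only looks for a top-level JSON "error" key.

```python
import io
import json
from typing import Union

Response = Union[dict, io.StringIO]


class AnalyzerSketch:
    """Hypothetical, simplified stand-in for the parse() flow."""

    known_fmts = {"json", "text", "xml"}

    def __init__(self):
        self.hasErrorResponse = False
        self.results: list = []
        self.errors: list = []

    def convert_response(self, raw_decoded: str, retmode: str) -> Response:
        # JSON becomes a dict; XML and text become a file-like object.
        if retmode == "json":
            return json.loads(raw_decoded)
        return io.StringIO(raw_decoded)

    def isErrorResponse(self, response: Response) -> bool:
        # Simplified: only detects a top-level "error" key in JSON.
        self.hasErrorResponse = isinstance(response, dict) and "error" in response
        return self.hasErrorResponse

    def parse(self, raw_decoded: str, retmode: str) -> None:
        if retmode not in self.known_fmts:
            raise NotImplementedError(f"unknown response format: {retmode}")
        response = self.convert_response(raw_decoded, retmode)
        if self.isErrorResponse(response):
            self.analyze_error(response)
        else:
            self.analyze_result(response)

    # Virtual in EutilsAnalyzer; here they only collect the responses.
    def analyze_error(self, response: Response) -> None:
        self.errors.append(response)

    def analyze_result(self, response: Response) -> None:
        self.results.append(response)
```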

convert_response(raw_response_decoded, request)

Converts raw_response into the expected format, deduced from request and set via the retmode parameter.

Parameters:

raw_response (urllib.request.Request) – response from entrezpy.requester.requester.Requester

request (entrezpy.base.request.EutilsRequest) – query request

Returns:
response in parseable format

Return type:
dict or io.StringIO

Note

Using threads without locks randomly ‘loses’ the response, i.e. the raw response is emptied between requests. With locks it works, but threading is not much faster than non-threading. JSON seems to be more prone to this than XML.
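One way to apply the lock mentioned in the note is to let only one thread read and decode a response at a time. convert_locked is a hypothetical helper; the sketch only shows the locking pattern and does not diagnose why responses are lost without it. Because the read is serialized, it also illustrates why threading then gains little speed.

```python
import io
import threading

_convert_lock = threading.Lock()


def convert_locked(raw_response) -> io.StringIO:
    """Hypothetical helper: read and decode a file-like response under a shared lock."""
    with _convert_lock:
        data = raw_response.read().decode("utf-8")
    # Parsing can happen outside the lock on the private copy.
    return io.StringIO(data)
```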

isErrorResponse(response, request)

Checks for error messages in the response from Entrez servers and sets the flag hasErrorResponse.

Parameters:

response (dict or io.StringIO) – parseable response from convert_response()

request (entrezpy.base.request.EutilsRequest) – query request

Returns:
error status

Return type:
bool

check_error_xml(response)

Checks for errors in XML responses

Parameters: response (io.StringIO) – XML response

Returns: whether the XML response contains an error message

Return type: bool

check_error_json(response)

Checks for errors in JSON responses. The error format is not unified among Eutil functions.

Parameters: response (dict) – response

Returns: whether the JSON response contains an error message

Return type: bool
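Since the error format differs between Eutil functions, an error check tends to probe several places. The sketch below assumes two patterns: an <ERROR> element anywhere in XML, and an "error"/"ERROR" key at the top level or one level down in JSON. Both functions are hypothetical stand-ins, not entrezpy's check_error_xml() and check_error_json().

```python
import io
import xml.etree.ElementTree as ET


def check_error_json(response: dict) -> bool:
    """Assumed patterns: top-level error key, or one nested in a result block."""
    if "error" in response or "ERROR" in response:
        return True
    # e.g. an error inside a function-specific block such as "esearchresult"
    return any(isinstance(v, dict) and "ERROR" in v for v in response.values())


def check_error_xml(response: io.StringIO) -> bool:
    """Assumed pattern: an <ERROR> element anywhere in the document."""
    try:
        for _event, elem in ET.iterparse(response, events=("end",)):
            if elem.tag == "ERROR":
                return True
        return False
    finally:
        response.seek(0)  # rewind so the actual parser can read the response again
```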

isSuccess()

Test if response has errors

Return type: bool

get_result()

Return result

Returns: result instance

Return type: entrezpy.base.result.EutilsResult

follow_up()

Return follow-up parameters if available

Returns: Follow-up parameters

Return type: dict

isEmpty()

Test for empty result

Return type: bool

Result

class entrezpy.base.result.EutilsResult(function, qid, db, webenv=None, querykey=None)

EutilsResult is the base class for an entrezpy result. It sets the required result attributes common to all results and declares virtual functions to interact with other entrezpy classes. Empty results are successful results since no query error has been received. entrezpy.base.result.EutilsResult.size() is important to determine:

if and how many follow-up requests are required

if it’s an empty result
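How a result size could drive follow-up requests is sketched below, using the E-utilities paging parameters retstart and retmax. follow_up_batches is a hypothetical helper and assumes records are fetched in order; an empty result yields no follow-ups.

```python
from typing import List, Tuple


def follow_up_batches(total: int, received: int, retmax: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: (retstart, retmax) pairs covering records not yet received."""
    if retmax <= 0:
        raise ValueError("retmax must be positive")
    # An empty result (total == 0) yields no batches: nothing to follow up.
    return [(start, min(retmax, total - start))
            for start in range(received, total, retmax)]
```

For example, 250 matching records of which the first 100 were already received need two further requests: (100, 100) and (200, 50).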

Parameters:

function (string) – EUtil function of the result

qid (string) – query id

db (string) – Entrez database name for result

webenv (string) – WebEnv of response

querykey (int) – querykey of response

size()

Returns result size in the corresponding ResultSize unit

Return type: int

Raises: NotImplementedError – if implementation is missing

dump()

Dumps all instance attributes

Return type: dict

Raises: NotImplementedError – if implementation is missing

get_link_parameter(reqnum=0)

Assembles parameters for automated follow-ups. Use the query key from the first request by default.

Parameters: reqnum (int) – request number for which query_key should be returned

Returns: EUtils parameters

Return type: dict

Raises: NotImplementedError – if implementation is missing
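A sketch of assembling follow-up parameters from history-server references, assuming one query_key is recorded per request. ResultSketch is hypothetical; the returned db, WebEnv and query_key keys are standard E-utilities parameters that let a follow-up query reuse the stored result instead of resending UIDs.

```python
from typing import List, Optional


class ResultSketch:
    """Hypothetical stand-in for an EutilsResult that saw several requests."""

    def __init__(self, function: str, qid: str, db: str,
                 webenv: Optional[str] = None,
                 querykeys: Optional[List[int]] = None):
        self.function = function
        self.query_id = qid
        self.db = db
        self.webenv = webenv
        # Assumption: one query_key recorded per request, in request order.
        self.querykeys = querykeys or []

    def get_link_parameter(self, reqnum: int = 0) -> dict:
        if not self.webenv or not self.querykeys:
            return {}  # no history reference to follow up on
        return {"db": self.db, "WebEnv": self.webenv,
                "query_key": self.querykeys[reqnum]}
```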

isEmpty()

Indicates empty result.

Return type: bool

Raises: NotImplementedError – if implementation is missing

Monitor