Base modules¶
Query¶
- class
entrezpy.base.query.
EutilsQuery
(eutil, tool, email, apikey=None, apikey_var=None, threads=None, qid=None)¶EutilsQuery implements the base class for all entrezpy queries to E-Utils. It handles the information required by every query, e.g. base query url, email address, allowed requests per second, apikey, etc. It declares the virtual method
inquire()
which needs to be implemented by every request since they differ among queries.
An NCBI API key will be set as follows:
- passed as argument during initialization
- check environmental variable passed as argument
- check environmental variable NCBI_API_KEY
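The lookup described in the list above can be sketched as follows. This is a minimal illustration and not the entrezpy implementation: the function name `resolve_apikey` and its `environ` argument (added so the lookup can be tried without touching the real environment) are hypothetical, and reading the list as an order of precedence is an assumption.

```python
import os


def resolve_apikey(apikey=None, env_var=None, environ=None):
    """Return the first API key found, or None if there is none."""
    environ = os.environ if environ is None else environ
    # 1. Key passed explicitly as argument
    if apikey:
        return apikey
    # 2. Key stored in the environment variable named by the caller
    if env_var and environ.get(env_var):
        return environ[env_var]
    # 3. Key stored in the default variable NCBI_API_KEY
    return environ.get("NCBI_API_KEY")
```

Passing a plain dictionary as `environ` makes the three cases easy to check in isolation.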
Upon initialization, the following parameters are set:
- set unique query id
- check for / set NCBI apikey
- initialize
entrezpy.requester.requester.Requester
with allowed requests per second
- assemble Eutil url for desired EUtils function
- initialize Multithreading queue and register query at
entrezpy.base.monitor.QueryMonitor
for logging
Multithreading is handled using the nested classes
entrezpy.base.query.EutilsQuery.RequestPool
and entrezpy.base.query.EutilsQuery.ThreadedRequester
.
Inits EutilsQuery instance with eutil, tool, email, apikey, apikey_var, threads and qid.
Parameters:
- eutil (str) – name of eutil function on EUtils server
- tool (str) – tool name
- email (str) – user email
- apikey (str) – NCBI apikey
- apikey_var (str) – environment variable storing NCBI apikey
- threads (int) – set threads for multithreading
- qid (str) – unique query id
Variables:
- id – unique query id
- base_url – base url for all Eutil requests
- requests_per_sec (int) – default limit of requests/sec (set by NCBI)
- max_requests_per_sec (int) – max. requests/sec with apikey (set by NCBI)
- url (str) – full URL for Eutil function
- contact (str) – user email (required by NCBI)
- tool (str) – tool name (required by NCBI)
- apikey (str) – NCBI apikey
- num_threads (int) – number of threads to use
- failed_requests (list) – store failed requests for analysis if desired
- request_pool –
entrezpy.base.query.EutilsQuery.RequestPool
instance
- request_counter (int) – request counter for an EutilsQuery instance
base_url
= 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils'¶Base url for all Eutil requests
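As an illustration of how the full URL for an Eutil function could be assembled from the base url, here is a small sketch. The helper name `assemble_url` is hypothetical, and joining with a single slash is an assumption. The function name `efetch.fcgi` is the example given for the eutil parameter of EutilsRequest.

```python
BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def assemble_url(eutil, base_url=BASE_URL):
    """Join the base url and the eutil function name with a single slash."""
    return "/".join([base_url.rstrip("/"), eutil.lstrip("/")])


efetch_url = assemble_url("efetch.fcgi")
```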
inquire
(parameter, analyzer)¶Virtual function starting query. Each query requires its own implementation.
Parameters:
- parameter (dict) – E-Utilities parameters
- analyzer (
entrezpy.base.analyzer.EutilsAnalyzer
) – query response analyzer
Returns: analyzer
Return type:
entrezpy.base.analyzer.EutilsAnalyzer
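The virtual method pattern described for inquire() can be sketched without any network access. All class names below (`BaseQuery`, `EchoQuery`, `EchoAnalyzer`) are hypothetical stand-ins and not part of entrezpy. The sketch only shows that the base class refuses to run and that an implementation hands the analyzer back to the caller.

```python
class BaseQuery:
    """Hypothetical stand-in for a query base class."""

    def inquire(self, parameter, analyzer):
        # Virtual method: every concrete query must override it
        raise NotImplementedError("inquire() requires an implementation")


class EchoAnalyzer:
    """Hypothetical analyzer that only records what it is given."""

    def __init__(self):
        self.seen = []

    def parse(self, raw_response, request):
        self.seen.append((request, raw_response))


class EchoQuery(BaseQuery):
    """Toy query: the 'response' is just the request rendered as a string."""

    def inquire(self, parameter, analyzer):
        request = dict(parameter)              # one request built from the parameters
        analyzer.parse(str(request), request)  # analyze the response as it arrives
        return analyzer                        # hand the analyzer back to the caller
```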
check_requests
()¶Virtual function testing and handling failed requests. These requests fail due to HTTP/URL issues and are stored in
entrezpy.base.query.EutilsQuery.failed_requests
check_ncbi_apikey
(apikey=None, env_var=None)¶Checks and sets NCBI apikey.
Parameters:
- apikey (str) – NCBI apikey
- env_var (str) – environmental variable storing NCBI apikey
prepare_request
(request)¶Prepares request for sending to E-Utilities with the required query attributes.
Parameters: request ( entrezpy.base.request.EutilsRequest
) – entrezpy request instance
Returns: request instance with EUtils parameters
Return type: entrezpy.base.request.EutilsRequest
add_request
(request, analyzer)¶Adds one request and corresponding analyzer to the request pool.
Parameters:
- request (
entrezpy.base.request.EutilsRequest
) – entrezpy request instance
- analyzer – entrezpy analyzer instance
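A request pool that pairs each request with its analyzer and processes the pairs in worker threads could look roughly like the sketch below. It uses only the standard library. The class layout, the `handler` callback and the `drain()` method are assumptions made for illustration and do not reproduce the nested RequestPool and ThreadedRequester classes.

```python
import queue
import threading


class RequestPool:
    """Hypothetical pool: worker threads pull (request, analyzer) pairs from a queue."""

    def __init__(self, num_threads, handler):
        self.requests = queue.Queue()
        self.handler = handler  # callable doing the work for one request
        self.workers = [threading.Thread(target=self._work, daemon=True)
                        for _ in range(num_threads)]
        for worker in self.workers:
            worker.start()

    def add_request(self, request, analyzer):
        self.requests.put((request, analyzer))

    def _work(self):
        while True:
            request, analyzer = self.requests.get()
            try:
                self.handler(request, analyzer)
            finally:
                self.requests.task_done()  # always mark the item as processed

    def drain(self):
        self.requests.join()  # block until every queued request is handled


results = []
lock = threading.Lock()


def handler(request, analyzer):
    with lock:  # protect the shared list
        results.append(analyzer(request))


pool = RequestPool(num_threads=2, handler=handler)
for n in range(5):
    pool.add_request(n, lambda r: r * 2)  # the "analyzer" here is a plain function
pool.drain()
```

The lock around the shared list mirrors the observation in the Analyzer section that unlocked threads can lose responses.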
monitor_start
(query_parameters)¶Starts query monitoring
Parameters: query_parameters ( entrezpy.base.parameter.EutilsParameter
) – query parameters
monitor_stop
()¶Stops query monitoring
monitor_update
(updated_query_parameters)¶Updates query monitoring parameters if follow up requests are required.
Parameters: updated_query_parameters ( entrezpy.base.parameter.EutilsParameter
) – updated query parameters
hasFailedRequests
()¶Reports if at least one request failed.
dump
()¶Dump all attributes
isGoodQuery
()¶Tests for request errors
Return type: bool
Parameter¶
- class
entrezpy.base.parameter.
EutilsParameter
(parameter=None)¶EutilsParameter sets and checks parameters for each query. EutilsParameter is populated from a dictionary with valid E-Utilities parameters for the corresponding query. It declares virtual functions where necessary.
Simple helper functions are presented to test the common parameters db, WebEnv, query_key and usehistory.
Note
usehistory
is the parameter used for Entrez history queries and is set to True (use it) by default. It can be set to False to omit history server use.
haveExpectedRequests()
tests if the number of requests has been calculated.
The virtual methods
check()
and dump()
need their own implementation since they can vary between queries.
Warning
check()
is expected to run after all parameters have been set.
Parameters: parameter (dict) – Eutils query parameters
Variables:
- db (str) – Entrez database name
- webenv (str) – WebEnv
- querykey (int) – querykey
- expected_request (int) – number of expected request for the query
- doseq (bool) – use id= parameter for each uid in POST
haveDb
()¶Check for required db parameter
Return type: bool
haveWebenv
()¶Check for required WebEnv parameter
Return type: bool
haveQuerykey
()¶Check for required QueryKey parameter
Return type: bool
useHistory
()¶Check if history server should be used.
Return type: bool
haveExpectedRequets
()¶Check for expected requests. Hints an error if no requests are expected.
Return type: bool
check
()¶Virtual function to run a check before starting the query. This is a crucial step and should abort upon failing.
Raises: NotImplementedError – if not implemented
dump
()¶Dump instance attributes
Return type: dict Raises: NotImplementedError – if not implemented
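To make the division of labour in this section concrete, the sketch below pairs a base class offering the simple helper tests with a subclass implementing check() and dump(). `BaseParameter` and `SearchParameter` are hypothetical names, the `term` attribute is an illustrative extra, and raising ValueError on a failed check is an assumption about how a query might abort.

```python
class BaseParameter:
    """Hypothetical stand-in mirroring the documented helper methods."""

    def __init__(self, parameter=None):
        parameter = parameter or {}
        self.db = parameter.get("db")
        self.webenv = parameter.get("WebEnv")
        self.querykey = parameter.get("query_key")
        self.usehistory = parameter.get("usehistory", True)  # on by default

    def haveDb(self):
        return self.db is not None

    def haveWebenv(self):
        return self.webenv is not None

    def haveQuerykey(self):
        return self.querykey is not None

    def useHistory(self):
        return bool(self.usehistory)

    def check(self):
        raise NotImplementedError("check() requires an implementation")

    def dump(self):
        raise NotImplementedError("dump() requires an implementation")


class SearchParameter(BaseParameter):
    """Toy subclass: a search needs a database and a term."""

    def __init__(self, parameter=None):
        super().__init__(parameter)
        self.term = (parameter or {}).get("term")

    def check(self):
        # Run after all parameters are set; abort before any request is sent
        if not self.haveDb() or not self.term:
            raise ValueError("db and term are required")

    def dump(self):
        return {"db": self.db, "term": self.term, "usehistory": self.usehistory}
```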
Request¶
- class
entrezpy.base.request.
EutilsRequest
(eutil, db)¶EutilsRequest is the base class for requests from
entrezpy.base.query.EutilsQuery
.
EutilsRequests are instantiated in
entrezpy.base.query.EutilsQuery.inquire()
before being added to the request pool by entrezpy.base.query.EutilsQuery.add_request()
. Each EutilsRequest triggers an answer at the NCBI Entrez servers if no connection errors occur.
EutilsRequest
stores the required information for POST requests. Its status can be queried from outside by entrezpy.base.request.EutilsRequest.get_observation()
. EutilsRequest instances store information which is not present in the server response and is required by entrezpy.base.analyzer.EutilsAnalyzer
to parse responses and errors correctly. Several instance attributes are not required for a POST request but help debugging.
Each request is automatically assigned an id to identify and trace requests using the query id and request id.
Parameters:
- eutil (str) – eutil function for this request, e.g. efetch.fcgi
- db (str) – database for request
Initializes a new request with initial attributes as part of a query in
entrezpy.base.query.EutilsQuery
.
Variables:
- tool (str) – tool name to which this request belongs
- url (str) – full Eutil url
- contact (str) – user email
- apikey (str) – NCBI apikey
- query_id (str) –
entrezpy.base.query.EutilsQuery.query_id
which initiated this request
- status (int) – request status: 0 -> success, 1 -> fail, 2 -> queued
- size (int) – size of request, e.g. number of UIDs
- start_time (float) – start time of request in seconds since epoch
- duration – duration for this request in seconds
- doseq – set doseq parameter in
entrezpy.request.Request.request()
Note
status
is work in progress.
get_post_parameter
()¶Virtual function returning the POST parameters for the request from required attributes.
Return type: dict Raises: NotImplementedError – if not implemented
prepare_base_qry
(extend=None)¶Returns instance attributes required for every POST request.
Parameters: extend (dict) – parameters extending basic parameters Returns: base parameters for POST request Return type: dict
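A rough sketch of how base parameters for a POST request might be assembled and extended is shown below. It is written as a free function with explicit arguments for brevity. That signature, and the rule that values in `extend` override the base values, are assumptions. The keys `email`, `tool`, `db` and `api_key` are the standard E-utilities parameter names.

```python
def prepare_base_qry(contact, tool, db, apikey=None, extend=None):
    """Build the parameters shared by every POST request, then add extras."""
    base = {"email": contact, "tool": tool, "db": db}
    if apikey:
        base["api_key"] = apikey  # only sent when a key is available
    if extend is not None:
        base.update(extend)       # request-specific parameters are merged last
    return base


params = prepare_base_qry("user@example.org", "demo", "pubmed",
                          extend={"retmode": "json"})
```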
set_status_success
()¶Set status if request succeeded
set_status_fail
()¶Set status if request failed
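The status codes listed under Variables (0 for success, 1 for fail, 2 for queued) can be modelled as below. `RequestStatus` is a hypothetical helper, and starting a new request in the queued state is an assumption. Since the documentation marks status as work in progress, treat this purely as an illustration.

```python
class RequestStatus:
    """Hypothetical holder for the documented status codes."""

    SUCCESS, FAIL, QUEUED = 0, 1, 2

    def __init__(self):
        self.status = self.QUEUED  # assumption: a new request starts as queued

    def set_status_success(self):
        self.status = self.SUCCESS

    def set_status_fail(self):
        self.status = self.FAIL
```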
report_status
(processed_requests=None, expected_requests=None)¶Reports request status when triggered
get_request_id
()¶
Returns: full request id Return type: str
set_request_error
(error)¶Sets request error and HTTP/URL error message
Parameters: error (str) – HTTP/URL error
start_stopwatch
()¶Starts timer to measure request duration.
calc_duration
()¶Calculates request duration
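The pair start_stopwatch() and calc_duration() amounts to recording a start time and subtracting it later. A minimal sketch, assuming `time.time()` as the clock (consistent with start_time being given in seconds since epoch) and using the hypothetical class name `Stopwatch`:

```python
import time


class Stopwatch:
    """Hypothetical helper mirroring start_stopwatch() and calc_duration()."""

    def __init__(self):
        self.start_time = None
        self.duration = None

    def start_stopwatch(self):
        self.start_time = time.time()  # seconds since the epoch, as a float

    def calc_duration(self):
        self.duration = time.time() - self.start_time
        return self.duration
```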
dump_internals
(extend=None)¶Dumps internal attributes for request.
Parameters: extend (dict) – extend dump with additional information
Analyzer¶
- class
entrezpy.base.analyzer.
EutilsAnalyzer
¶EutilsAnalyzer is the base class for an entrezpy analyzer. It prepares the response based on the requested format and checks for E-Utilities errors. The function parse() is invoked after every request by the corresponding query class, e.g. Esearcher. This allows analyzing data as it arrives without waiting until larger queries have been fetched. This approach allows implementing analyzers which can store already downloaded data to establish checkpoints or trigger other actions based on the received data.
Two virtual methods are the core and need their own implementation to support specific queries: analyze_error() and analyze_result().
Note
Responses from NCBI are not very well documented and functions will be extended as new errors are encountered.
Inits EutilsAnalyzer with a result of yet unknown type. The result needs to be set upon receiving the first response by
init_result()
.
Variables:
- hasErrorResponse (bool) – flag indicating error in response
- result – result instance
known_fmts
= {'json', 'text', 'xml'}¶Store formats known to EutilsAnalyzer
init_result
(response, request)¶Virtual function to initialize result instance. This allows setting attributes from the first response and request.
Parameters: response (dict or io.StringIO) – converted response from convert_response()
Raises: NotImplementedError – if implementation is missing
analyze_error
(response, request)¶Virtual function to handle error responses
Parameters: response (dict or io.StringIO) – converted response from convert_response()
Raises: NotImplementedError – if implementation is missing
analyze_result
(response, request)¶Virtual function to handle responses, i.e. parsing them and preparing them for
entrezpy.base.result.EutilsResult
Parameters: response (dict or io.StringIO) – converted response from convert_response()
Raises: NotImplementedError – if implementation is missing
parse
(raw_response, request)¶Checks for errors and calls the parser for the raw response.
Parameters:
- raw_response (
urllib.request.Request
) – response from entrezpy.requester.requester.Requester
- request (
entrezpy.base.request.EutilsRequest
) – query request
Raises: NotImplementedError – if request format is not in
EutilsAnalyzer.known_fmts
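The flow described for parse() (reject unknown formats, convert the response, then dispatch to the error or result handler) can be sketched as follows. Everything here is a simplified stand-in: `BaseAnalyzer` and `CountAnalyzer` are hypothetical, the request is modelled as a plain dict with a `retmode` key, the raw response is assumed to be an already decoded string, and the error test only looks for a top-level `error` key in JSON.

```python
import io
import json


class BaseAnalyzer:
    """Hypothetical stand-in for the flow: convert, check for errors, dispatch."""

    known_fmts = {"json", "text", "xml"}

    def __init__(self):
        self.hasErrorResponse = False
        self.result = None

    def parse(self, raw_response_decoded, request):
        if request["retmode"] not in self.known_fmts:
            raise NotImplementedError("unknown format: " + request["retmode"])
        if request["retmode"] == "json":
            response = json.loads(raw_response_decoded)
        else:
            response = io.StringIO(raw_response_decoded)
        if self.isErrorResponse(response, request):
            self.analyze_error(response, request)
        else:
            self.analyze_result(response, request)

    def isErrorResponse(self, response, request):
        # Simplified: only a JSON response with a top-level "error" key counts
        self.hasErrorResponse = isinstance(response, dict) and "error" in response
        return self.hasErrorResponse

    def analyze_error(self, response, request):
        raise NotImplementedError("analyze_error() requires an implementation")

    def analyze_result(self, response, request):
        raise NotImplementedError("analyze_result() requires an implementation")


class CountAnalyzer(BaseAnalyzer):
    """Toy analyzer that sums a 'count' field over all responses it receives."""

    def analyze_result(self, response, request):
        self.result = (self.result or 0) + int(response["count"])

    def analyze_error(self, response, request):
        self.error = response["error"]
```

Because parse() runs once per response, the toy analyzer accumulates its result as data arrives, which is the incremental behaviour the class description aims for.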
convert_response
(raw_response_decoded, request)¶Converts raw_response into the expected format, deduced from request and set via the retmode parameter.
Parameters:
- raw_response (
urllib.request.Request
) – response from entrezpy.requester.requester.Requester
- request (
entrezpy.base.request.EutilsRequest
) – query request
Returns: response in parseable format
Return type: dict or
io.StringIO
Note
Using threads without locks randomly ‘loses’ the response, i.e. the raw response is emptied between requests. With locks, it works, but threading is not much faster than non-threading. It seems JSON is more prone to this than XML.
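The conversion step on its own is small. The sketch below assumes the response has already been decoded to a string and takes the retmode directly instead of a request instance. Both are simplifications for illustration.

```python
import io
import json


def convert_response(raw_response_decoded, retmode):
    """Turn a decoded response string into a parseable object."""
    if retmode == "json":
        return json.loads(raw_response_decoded)  # dict
    return io.StringIO(raw_response_decoded)     # file-like, for XML and text


as_dict = convert_response('{"count": "3"}', "json")
as_stream = convert_response("<Count>3</Count>", "xml")
```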
isErrorResponse
(response, request)¶Checks for error messages in the response from Entrez servers and sets the flag
hasErrorResponse
.
Parameters:
- response (dict or
io.StringIO
) – parseable response from convert_response()
- request (
entrezpy.base.request.EutilsRequest
) – query request
Returns: error status
Return type: bool
check_error_xml
(response)¶Checks for errors in XML responses
Parameters: response ( io.StringIO
) – XML response
Returns: status if XML response has error message
Return type: bool
check_error_json
(response)¶Checks for errors in JSON responses. Not unified among Eutil functions.
Parameters: response (dict) – response Returns: status if JSON response has error message Return type: bool
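One possible shape for the two error checks is sketched below. The element name `ERROR` and the keys `error` and `ERROR` are assumptions about what E-utilities error responses look like. As noted above, the JSON layout is not unified among Eutil functions, so a real check would likely need more cases.

```python
import io
import xml.etree.ElementTree as ET


def check_error_xml(response):
    """Return True if any element named ERROR occurs in the XML stream."""
    for _, elem in ET.iterparse(response, events=("end",)):
        if elem.tag == "ERROR":
            return True
    return False


def check_error_json(response):
    """Return True if an error key occurs at the top level or one level below."""
    if "error" in response:
        return True
    return any(isinstance(value, dict) and "ERROR" in value
               for value in response.values())
```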
isSuccess
()¶Test if response has errors
Return type: bool
get_result
()¶Return result
Returns: result instance Return type: entrezpy.base.result.EutilsResult
follow_up
()¶Return follow-up parameters if available
Returns: Follow-up parameters Return type: dict
isEmpty
()¶Test for empty result
Return type: bool
Result¶
- class
entrezpy.base.result.
EutilsResult
(function, qid, db, webenv=None, querykey=None)¶EutilsResult is the base class for an entrezpy result. It sets the required result attributes common to all results and declares virtual functions to interact with other entrezpy classes. Empty results are successful results since no query error has been received.
entrezpy.base.result.EutilsResult.size()
is important to
- determine if and how many follow-up requests are required
- determine if it’s an empty result
Parameters:
- function (string) – EUtil function of the result
- qid (string) – query id
- db (string) – Entrez database name for result
- webenv (string) – WebEnv of response
- querykey (int) – querykey of response
size
()¶Returns result size in the corresponding ResultSize unit
Return type: int Raises: NotImplementedError – if implementation is missing
dump
()¶Dumps all instance attributes
Return type: dict Raises: NotImplementedError – if implementation is missing
get_link_parameter
(reqnum=0)¶Assembles parameters for automated follow-ups. Use the query key from the first request by default.
Parameters: reqnum (int) – request number for which query_key should be returned Returns: EUtils parameters Return type: dict Raises: NotImplementedError – if implementation is missing
isEmpty
()¶Indicates empty result.
Return type: bool Raises: NotImplementedError – if implementation is missing
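The virtual functions of a result class can be illustrated with a toy result that stores UIDs. `BaseResult` and `UidResult` are hypothetical names, the unit of size() (number of UIDs) is chosen for the example, and the parameters returned by get_link_parameter() are an assumption about what a follow-up request via the history server would reuse.

```python
class BaseResult:
    """Hypothetical stand-in for a result base class."""

    def __init__(self, function, qid, db, webenv=None, querykey=None):
        self.function = function
        self.query_id = qid
        self.db = db
        self.webenv = webenv
        self.querykey = querykey

    def size(self):
        raise NotImplementedError("size() requires an implementation")

    def isEmpty(self):
        raise NotImplementedError("isEmpty() requires an implementation")

    def get_link_parameter(self, reqnum=0):
        raise NotImplementedError("get_link_parameter() requires an implementation")


class UidResult(BaseResult):
    """Toy result storing a list of UIDs."""

    def __init__(self, qid, db, uids, webenv=None, querykey=None):
        super().__init__("esearch", qid, db, webenv, querykey)
        self.uids = list(uids)

    def size(self):
        return len(self.uids)    # unit: number of UIDs

    def isEmpty(self):
        return self.size() == 0  # empty, yet still a successful result

    def get_link_parameter(self, reqnum=0):
        # Parameters a follow-up request could reuse via the history server
        return {"db": self.db, "WebEnv": self.webenv, "query_key": self.querykey}
```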