Efetch modules

Efetcher

Inheritance diagram of entrezpy.efetch.efetcher
class entrezpy.efetch.efetcher.Efetcher(tool, email, apikey=None, apikey_var=None, threads=None, qid=None)

Bases: entrezpy.base.query.EutilsQuery

Efetcher implements Efetch E-Utilities queries [0]. It implements entrezpy.base.query.EutilsQuery.inquire() to fetch data from NCBI Entrez servers.

[0]: https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch
[1]: https://www.ncbi.nlm.nih.gov/books/NBK25497/table/chapter2.T._entrez_unique_identifiers_ui/?report=objectonly

Variables: result – entrezpy.base.result.EutilsResult
inquire(parameter, analyzer=<entrezpy.efetch.efetch_analyzer.EfetchAnalyzer object>)

Implements entrezpy.base.query.EutilsQuery.inquire() and configures fetch.

Note

Efetch prefers to know the number of UIDs to fetch, i.e. the number of UIDs given or retmax. If this information is missing, the maximum number of UIDs for the specific retmode and rettype is fetched.

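Parameters:
  • parameter (dict) – Efetch parameters
  • analyzer (entrezpy.base.analyzer.EutilsAnalyzer) – analyzer for Efetch responses, by default an EfetchAnalyzer instance
Returns: analyzer instance or None if request errors have been encountered
Return type: entrezpy.base.analyzer.EutilsAnalyzer or None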
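A minimal sketch of the fallback described in the note above, written as plain Python rather than entrezpy code. The function name expected_fetch_size is hypothetical; it assumes an explicit UID list takes precedence over retmax, and it only uses the per-retmode limits from EfetchParameter.req_limits, ignoring rettype.

```python
# Hypothetical sketch, not entrezpy's implementation.
# Per-retmode limits copied from EfetchParameter.req_limits.
REQ_LIMITS = {'json': 500, 'text': 10000, 'xml': 10000}


def expected_fetch_size(uids=None, retmax=None, retmode='xml'):
    """Return the number of UIDs a fetch is assumed to cover."""
    if uids:
        # Assumption: an explicit UID list defines the query size.
        return len(uids)
    if retmax is not None:
        return retmax
    # Fallback: the maximum number of UIDs allowed for this retmode.
    return REQ_LIMITS[retmode]


print(expected_fetch_size(uids=['1', '2', '3']))  # 3
print(expected_fetch_size(retmax=250))            # 250
print(expected_fetch_size(retmode='json'))        # 500
```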

EfetchParameter

Inheritance diagram of entrezpy.efetch.efetch_parameter
entrezpy.efetch.efetch_parameter.DEF_RETMODE = 'xml'

Default retmode for fetch requests

class entrezpy.efetch.efetch_parameter.EfetchParameter(param)

Bases: entrezpy.base.parameter.EutilsParameter

EfetchParameter implements checks and configures an Efetch query. A fetch query knows its size from the id parameter or from an earlier result stored on the Entrez history server, referenced by WebEnv and query_key. The default retmode (fetch format) is XML because all E-Utilities can return XML, but unfortunately not all of them can return JSON.

req_limits = {'json': 500, 'text': 10000, 'xml': 10000}

Maximum number of UIDs to fetch per request, by retmode

valid_retmodes = {'gene': {'text', 'xml'}, 'nuccore': {'text', 'xml'}, 'pmc': {'xml'}, 'poset': {'text', 'xml'}, 'protein': {'text', 'xml'}, 'pubmed': {'text', 'xml'}, 'sequences': {'text', 'xml'}}

Valid retmodes for fetch requests, by database

adjust_retmax(retmax)

Adjusts the retmax parameter. The order of the checks is crucial.

Parameters: retmax (int) – retmax value
Returns: adjusted retmax or None if all UIDs are fetched
Return type: int or None
check_retmode(retmode)

Checks for a valid retmode and a valid database/retmode combination

Parameters: retmode (str) – retmode parameter
Returns: retmode
Return type: str
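One way such a check could use the valid_retmodes table, sketched as a hypothetical stand-alone function rather than EfetchParameter.check_retmode(). It assumes a missing retmode falls back to DEF_RETMODE and that databases absent from the table are not restricted, and it raises ValueError where entrezpy may abort differently.

```python
# Hypothetical sketch, not entrezpy's implementation.
DEF_RETMODE = 'xml'
# Copied verbatim from EfetchParameter.valid_retmodes.
VALID_RETMODES = {
    'gene': {'text', 'xml'}, 'nuccore': {'text', 'xml'}, 'pmc': {'xml'},
    'poset': {'text', 'xml'}, 'protein': {'text', 'xml'},
    'pubmed': {'text', 'xml'}, 'sequences': {'text', 'xml'},
}


def check_retmode(db, retmode=None):
    """Return a retmode that is valid for db, or raise ValueError."""
    if retmode is None:
        return DEF_RETMODE  # assumed default when no retmode is given
    allowed = VALID_RETMODES.get(db)
    if allowed is None:
        return retmode  # assumed: unlisted databases are not restricted
    if retmode not in allowed:
        raise ValueError(f'retmode {retmode!r} is not valid for db {db!r}')
    return retmode
```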
adjust_reqsize(reqsize)

Adjusts request size for query

Parameters: reqsize (str or None) – request size parameter
Returns: adjusted request size
Return type: int
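A possible clamping rule for the request size, as a hypothetical helper. It assumes a missing reqsize means "use the limit" and that larger values are capped at req_limits for the retmode; since the parameter is documented as str or None, the value is converted to int first.

```python
# Hypothetical sketch; limits copied from EfetchParameter.req_limits.
REQ_LIMITS = {'json': 500, 'text': 10000, 'xml': 10000}


def adjust_reqsize(reqsize, retmode='xml'):
    """Return a request size no larger than the limit for retmode."""
    limit = REQ_LIMITS[retmode]
    if reqsize is None:
        return limit
    # The documented type is str or None, so convert before comparing.
    return min(int(reqsize), limit)
```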
calculate_expected_requests(qsize=None, reqsize=None)

Calculates and sets the expected number of requests. Uses internal parameters if none are provided.

Parameters:
  • qsize (int or None) – query size, i.e. expected number of data sets
  • reqsize (int) – number of data sets to fetch in one request
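The expected number of requests amounts to a ceiling division of the query size by the request size. A minimal integer-only sketch (hypothetical function, not the entrezpy method); it returns 0 for an empty query, which is the situation haveExpectedRequets() is documented to flag as a likely error.

```python
def expected_requests(qsize, reqsize):
    """Number of requests needed to fetch qsize data sets, reqsize per request."""
    if qsize <= 0 or reqsize <= 0:
        return 0
    # Integer ceiling division avoids floating point rounding.
    return (qsize + reqsize - 1) // reqsize


print(expected_requests(25000, 10000))  # 3
print(expected_requests(10000, 10000))  # 1
```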
haveDb()

Check for required db parameter

Return type: bool
haveExpectedRequets()

Checks for expected requests. If no requests are expected, this hints at an error.

Return type: bool
haveQuerykey()

Check for required QueryKey parameter

Return type: bool
haveWebenv()

Check for required WebEnv parameter

Return type: bool
useHistory()

Check if history server should be used.

Return type: bool
check()

Implements entrezpy.base.parameter.EutilsParameter.check to check for the minimum required parameters. Aborts if any check fails.
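To illustrate how the have*() and useHistory() checks might combine in check(), here is a hypothetical sketch over a plain parameter dict. It assumes the history server is used when both WebEnv and query_key are set, that a query needs db plus either id or the history parameters, and it raises ValueError instead of aborting.

```python
def use_history(params):
    """Assumed rule: use the history server if WebEnv and query_key are both set."""
    return bool(params.get('WebEnv')) and params.get('query_key') is not None


def check(params):
    """Raise ValueError if the minimum required parameters are missing."""
    if not params.get('db'):
        raise ValueError('missing db parameter')
    if not params.get('id') and not use_history(params):
        raise ValueError('need id, or WebEnv and query_key')
```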

dump()

Dump instance attributes

Return type: dict
Raises: NotImplementedError – if not implemented

EfetchAnalyzer

Inheritance diagram of entrezpy.efetch.efetch_analyzer
class entrezpy.efetch.efetch_analyzer.EfetchAnalyzer

Bases: entrezpy.base.analyzer.EutilsAnalyzer

EfetchAnalyzer implements a basic analysis of Efetch E-Utils responses. It stores results in an entrezpy.efetch.efetch_result.EfetchResult instance.

Note

This is a very superficial analyzer for documentation and educational purposes. In almost all cases a more specific analyzer has to be implemented by inheriting from entrezpy.base.analyzer.EutilsAnalyzer and implementing the virtual functions entrezpy.base.analyzer.EutilsAnalyzer.analyze_result() and entrezpy.base.analyzer.EutilsAnalyzer.analyze_error().

Variables: result – entrezpy.efetch.efetch_result.EfetchResult
init_result(response, request)

Initializes the result instance. Should be implemented properly by inheriting analyzers.

analyze_result(response, request)

Virtual function to handle responses, i.e. parse them and prepare them for entrezpy.base.result.EutilsResult

Parameters: response (dict or io.StringIO) – converted response from convert_response()
Raises: NotImplementedError – if implementation is missing
analyze_error(response, request)

Virtual function to handle error responses

Parameters: response (dict or io.StringIO) – converted response from convert_response()
Raises: NotImplementedError – if implementation is missing
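A minimal sketch of the kind of specific analyzer the note above recommends. To stay runnable without entrezpy it is a stand-alone class that only mirrors the analyze_result() and analyze_error() hooks; in entrezpy it would inherit from entrezpy.base.analyzer.EutilsAnalyzer instead. The class name is hypothetical, and it assumes FASTA text (retmode text, rettype fasta) arrives as io.StringIO.

```python
import io


class FastaHeaderAnalyzer:
    """Hypothetical analyzer collecting FASTA header lines across requests."""

    def __init__(self):
        self.headers = []  # header lines without the leading '>'
        self.errors = []   # raw text of error responses

    def analyze_result(self, response, request):
        # Iterating an io.StringIO yields one line at a time.
        for line in response:
            if line.startswith('>'):
                self.headers.append(line[1:].strip())

    def analyze_error(self, response, request):
        self.errors.append(response.getvalue())


analyzer = FastaHeaderAnalyzer()
analyzer.analyze_result(io.StringIO('>seq1 demo\nACGT\n>seq2 demo\nGGCC\n'), None)
print(analyzer.headers)  # ['seq1 demo', 'seq2 demo']
```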
norm_response(response, rettype=None)

Normalizes response for printing

Parameters: response (dict or io.StringIO) – efetch response
Returns: str or dict
isEmpty()

Test for empty result

Return type: bool
check_error_json(response)

Checks for errors in JSON responses. Error reporting is not unified among E-Utility functions.

Parameters: response (dict) – response
Returns: status, i.e. whether the JSON response has an error message
Return type: bool
check_error_xml(response)

Checks for errors in XML responses

Parameters: response (io.StringIO) – XML response
Returns: whether the XML response has an error message
Return type: bool
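A hypothetical sketch of both error checks. The error markers are assumptions, not taken from entrezpy: a top-level or first-level nested 'ERROR'/'error' key for JSON, and an ERROR element anywhere in the document for XML.

```python
import io
import xml.etree.ElementTree as ET

ERROR_KEYS = ('ERROR', 'error')  # assumed JSON error keys


def check_error_json(response):
    """True if an assumed error key appears at the top level or one level down."""
    if any(key in response for key in ERROR_KEYS):
        return True
    return any(isinstance(value, dict) and any(key in value for key in ERROR_KEYS)
               for value in response.values())


def check_error_xml(response):
    """True if the XML in the io.StringIO contains an <ERROR> element (assumed tag)."""
    root = ET.parse(response).getroot()
    response.seek(0)  # rewind so the stream can be parsed again later
    return root.tag == 'ERROR' or root.find('.//ERROR') is not None


print(check_error_xml(io.StringIO('<eFetchResult><ERROR>bad</ERROR></eFetchResult>')))  # True
```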
convert_response(raw_response_decoded, request)

Converts raw_response into the expected format, deduced from request and set via the retmode parameter.

Parameters:
  • raw_response_decoded – decoded raw response
  • request (entrezpy.base.request.EutilsRequest) – request that produced the response
Returns: response in parseable format
Return type: dict or io.StringIO

Note

Using threads without locks randomly ‘loses’ the response, i.e. the raw response is emptied between requests. With locks it works, but threading is then not much faster than non-threading. JSON seems more prone to this than XML.
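The conversion itself can be sketched as a hypothetical function that receives the retmode directly instead of deducing it from the request object: JSON is parsed into a dict and every other format is wrapped in io.StringIO, matching the documented return types.

```python
import io
import json


def convert_response(raw_response_decoded, retmode):
    """Hypothetical sketch: dict for JSON, io.StringIO for text and XML."""
    if retmode == 'json':
        return json.loads(raw_response_decoded)
    return io.StringIO(raw_response_decoded)


print(convert_response('{"count": 2}', 'json'))  # {'count': 2}
```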
follow_up()

Return follow-up parameters if available

Returns: follow-up parameters
Return type: dict
get_result()

Return result

Returns: result instance
Return type: entrezpy.base.result.EutilsResult
isErrorResponse(response, request)

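Checks for error messages in the response from Entrez servers and sets the flag hasErrorResponse.

Parameters:
  • response (dict or io.StringIO) – converted response
  • request (entrezpy.base.request.EutilsRequest) – request that produced the response
Returns: error status
Return type: bool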

isSuccess()

Tests if the response was successful, i.e. has no errors

Return type: bool
known_fmts = {'json', 'text', 'xml'}
parse(raw_response, request)

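Checks for errors and calls the parser for the raw response.

Parameters:
  • raw_response – raw response from the request
  • request (entrezpy.base.request.EutilsRequest) – request that produced the response
Raises: NotImplementedError – if the request format is not in EutilsAnalyzer.known_fmts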
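The overall flow of parse() can be outlined with a hypothetical stand-alone class: reject unknown formats, convert the raw response, check for errors, then hand the result to the error or result branch. It passes the retmode directly instead of a request object, and its error markers are assumptions; it is not entrezpy's implementation.

```python
import io
import json

KNOWN_FMTS = {'json', 'text', 'xml'}


class ParseFlowSketch:
    """Hypothetical outline of parse(); not entrezpy's implementation."""

    def __init__(self):
        self.hasErrorResponse = False
        self.results = []
        self.errors = []

    def convert_response(self, raw, retmode):
        return json.loads(raw) if retmode == 'json' else io.StringIO(raw)

    def is_error_response(self, response, retmode):
        # Assumed error markers: an 'ERROR' key (JSON) or an <ERROR> tag (text/XML).
        if retmode == 'json':
            self.hasErrorResponse = 'ERROR' in response
        else:
            self.hasErrorResponse = '<ERROR>' in response.getvalue()
        return self.hasErrorResponse

    def parse(self, raw, retmode):
        if retmode not in KNOWN_FMTS:
            raise NotImplementedError(f'unknown format: {retmode}')
        response = self.convert_response(raw, retmode)
        if self.is_error_response(response, retmode):
            self.errors.append(response)   # analyze_error() would run here
        else:
            self.results.append(response)  # analyze_result() would run here
```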

EfetchRequest

Inheritance diagram of entrezpy.efetch.efetch_request
class entrezpy.efetch.efetch_request.EfetchRequest(eutil, parameter, start, size)

Bases: entrezpy.base.request.EutilsRequest

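The EfetchRequest class implements a single request as part of an Efetch query. It stores and prepares the parameters for a single request. entrezpy.efetch.efetcher.Efetcher.inquire() calculates start and size for each request.

Parameters:
  • eutil (str) – E-Utility function for this request
  • parameter (entrezpy.efetch.efetch_parameter.EfetchParameter) – Efetch parameters
  • start (int) – index of the first UID to fetch in this request
  • size (int) – number of UIDs to fetch in this request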
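How start and size could be derived for consecutive requests, shown as a hypothetical generator (plan_requests is not an entrezpy function). Each pair covers the next slice of the query, and the last request takes whatever remains.

```python
def plan_requests(qsize, reqsize):
    """Yield (start, size) pairs covering qsize UIDs in chunks of reqsize."""
    for start in range(0, qsize, reqsize):
        yield start, min(reqsize, qsize - start)


print(list(plan_requests(25000, 10000)))
# [(0, 10000), (10000, 10000), (20000, 5000)]
```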
get_post_parameter()

Virtual function returning the POST parameters for the request from required attributes.

Return type: dict
Raises: NotImplementedError
dump()

Dumps instance attributes

calc_duration()

Calculates request duration

dump_internals(extend=None)

Dumps internal attributes for request.

Parameters: extend (dict) – extend dump with additional information
get_request_id()

Returns: full request id
Return type: str
prepare_base_qry(extend=None)

Returns instance attributes required for every POST request.

Parameters: extend (dict) – parameters extending basic parameters
Returns: base parameters for POST request
Return type: dict
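A hypothetical sketch of how base parameters and request-specific ones might be merged into one POST parameter dict. The keys tool, email, api_key, db, WebEnv, query_key, retstart, retmax and retmode are standard E-Utilities parameters; mapping start to retstart and size to retmax for a history-server query is an assumption about how a request slice is addressed, not taken from entrezpy.

```python
def prepare_base_qry(tool, email, apikey=None, extend=None):
    """Parameters sent with every request, optionally extended."""
    qry = {'tool': tool, 'email': email}
    if apikey:
        qry['api_key'] = apikey
    if extend:
        qry.update(extend)
    return qry


def efetch_post_parameter(tool, email, db, webenv, query_key, start, size,
                          retmode='xml', apikey=None):
    """POST parameters for one slice of a history-server query (assumed mapping)."""
    return prepare_base_qry(tool, email, apikey, extend={
        'db': db,
        'WebEnv': webenv,
        'query_key': query_key,
        'retstart': start,  # index of the first record in this slice
        'retmax': size,     # number of records in this slice
        'retmode': retmode,
    })
```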
report_status(processed_requests=None, expected_requests=None)

Reports request status when triggered

set_request_error(error)

Sets request error and HTTP/URL error message

Parameters: error (str) – HTTP/URL error
set_status_fail()

Set status if request failed

set_status_success()

Set status if request succeeded

start_stopwatch()

Starts the timer to measure the request duration.
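The timing helpers can be pictured with a small hypothetical class mirroring start_stopwatch() and calc_duration(). It uses time.perf_counter(), a monotonic clock suited to measuring durations; which clock entrezpy actually uses is not stated here.

```python
import time


class Stopwatch:
    """Hypothetical timing helper, not entrezpy's implementation."""

    def __init__(self):
        self.start_time = None
        self.duration = None

    def start_stopwatch(self):
        # Record a monotonic start point for the request.
        self.start_time = time.perf_counter()

    def calc_duration(self):
        # Elapsed seconds since start_stopwatch() was called.
        self.duration = time.perf_counter() - self.start_time
        return self.duration
```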