Conduit module

Conduit

class entrezpy.conduit.Conduit(email, apikey=None, apikey_envar=None, threads=None)

Conduit simplifies creating pipelines and queries for entrezpy. Conduit stores results from previous requests. This allows queries to be chained and results to be retrieved later when required, reducing the need to redownload data. Conduit can use multiple threads to speed up data download, but some external libraries can break under multithreading, e.g. SQLite3.

Query instances in Conduit.Pipeline pipelines are stored in the dictionary Conduit.queries, keyed by query id, and are accessible by all Conduit instances. A single Conduit.Pipeline stores only the query ids of its own queries.

Parameters:
  • email (str) – user email
  • apikey (str) – NCBI apikey
  • apikey_envar (str) – environment variable storing the NCBI apikey
  • threads (int) – number of threads for multithreading
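The SQLite3 caveat can be illustrated with the standard library alone. By default, Python's sqlite3 module refuses to let a connection created in one thread be used from another thread. This is a general illustration, not entrezpy code; whether a given analyzer runs into it depends on where it opens its database connection.

```python
import sqlite3
import threading

# Connection created in the main thread; check_same_thread defaults to True.
conn = sqlite3.connect(":memory:")
errors = []


def use_connection_from_worker():
    """Try to use the main thread's connection from a worker thread."""
    try:
        conn.execute("SELECT 1")
    except sqlite3.ProgrammingError as exc:
        # sqlite3 rejects cross-thread use of the connection.
        errors.append(exc)


worker = threading.Thread(target=use_connection_from_worker)
worker.start()
worker.join()
conn.close()

print(len(errors))  # 1
```

Passing check_same_thread=False to sqlite3.connect lifts this check, but the caller then becomes responsible for serializing access to the connection.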
queries = {}

Query storage

analyzers = {}

Analyzed query storage
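The storage layout described for Conduit.queries can be modeled in a few lines. This is a hypothetical, simplified sketch, not entrezpy's implementation: the class name MiniConduit, the (function, parameter) tuple layout, and the use of uuid4 for query ids are all assumptions for illustration. It shows a class-level dictionary shared by all instances, while each pipeline keeps only a list of ids.

```python
import uuid


class MiniConduit:
    """Hypothetical model of Conduit's shared query storage (not entrezpy code)."""

    # Class attribute: one dict shared by every MiniConduit instance.
    queries = {}

    class Pipeline:
        def __init__(self):
            # A pipeline keeps only the ids of its own queries.
            self.queries = []

        def add_query(self, function, parameter):
            query_id = uuid.uuid4().hex  # id scheme is an assumption
            MiniConduit.queries[query_id] = (function, parameter)
            self.queries.append(query_id)
            return query_id

    def new_pipeline(self):
        return MiniConduit.Pipeline()


first = MiniConduit()
second = MiniConduit()
pipe = first.new_pipeline()
qid = pipe.add_query("esearch", {"db": "pubmed", "term": "influenza"})

# The query is reachable through any instance; the pipeline holds only the id.
print(second.queries[qid][0])  # esearch
print(pipe.queries == [qid])   # True
```

Because queries lives on the class rather than on each instance, a query added through one instance is visible through all others. The sketch writes through MiniConduit.queries explicitly, since assigning self.queries = {} on an instance would shadow the shared dictionary instead of updating it.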

class Query(function, parameter, dependency=None, analyzer=None)

Entrezpy query for a Conduit pipeline. Conduit assembles pipelines from several Query() instances. If a dependency is given, the parameters of that earlier query are used as the basis, via resolve_dependency().

Parameters:
  • function (str) – Eutils function
  • parameter (dict) – function parameters
  • dependency (str) – query id from earlier query
  • analyzer (entrezpy.base.analyzer.EutilsAnalyzer) – analyzer instance for this query
resolve_dependency()

Resolves the dependency to obtain parameters from an earlier query. Parameters passed to this instance overwrite the dependency's parameters.
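The overwrite rule can be pictured as a dictionary merge. The sketch below uses a hypothetical helper, resolve_parameters, which is not part of entrezpy, and assumes that parameters are plain dicts as in the Query parameter description. The real method may carry over additional state from the earlier query.

```python
def resolve_parameters(dependency_params, own_params):
    """Hypothetical helper: start from the earlier query's parameters and let
    this query's own parameters overwrite any overlapping keys."""
    merged = dict(dependency_params)  # copy so the earlier query is untouched
    merged.update(own_params or {})   # own parameters win on key collisions
    return merged


earlier = {"db": "nucleotide", "term": "H3N2 [organism]", "retmax": 20}
mine = {"retmax": 10, "rettype": "fasta"}
print(resolve_parameters(earlier, mine))
# {'db': 'nucleotide', 'term': 'H3N2 [organism]', 'retmax': 10, 'rettype': 'fasta'}
```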

dump()
class Pipeline

The Pipeline class implements a query pipeline of several consecutive queries. New pipelines are obtained through Conduit.new_pipeline(). Query instances are stored in Conduit.queries, and the corresponding query ids in the pipeline's own queries attribute. Every added query returns its id, which can be used to retrieve it.

Variables: queries – queries for this Pipeline instance

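add_search(parameter=None, dependency=None, analyzer=None)

Adds an Esearch query.

Parameters:
  • parameter (dict) – Esearch parameters
  • dependency (str) – query id from an earlier query
  • analyzer (entrezpy.base.analyzer.EutilsAnalyzer) – analyzer instance for this query
Returns: Conduit query
Return type: ConduitQuery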

add_link(parameter=None, dependency=None, analyzer=None)

Adds an Elink query. Same signature as Conduit.Pipeline.add_search().

add_post(parameter=None, dependency=None, analyzer=None)

Adds an Epost query. Same signature as Conduit.Pipeline.add_search().

add_summary(parameter=None, dependency=None, analyzer=None)

Adds an Esummary query. Same signature as Conduit.Pipeline.add_search().

add_fetch(parameter=None, dependency=None, analyzer=None)

Adds an Efetch query. Same signature as Conduit.Pipeline.add_search(), but an analyzer is required because this step returns highly variable results.

add_query(query)

Adds a query to this pipeline and to the shared query storage.

Parameters: query (Conduit.Query) – Conduit query
Returns: query id of the added query
Return type: str
run(pipeline)

Runs the queries in a pipeline one by one and checks each for errors. If an error is encountered, the pipeline aborts.

Parameters: pipeline (Conduit.Pipeline) – Conduit pipeline
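The run-and-abort behavior can be sketched as a loop that stops at the first failing query. All names here (run_pipeline, PipelineAborted, the execute callable) are hypothetical, and signalling the abort with an exception is an assumption of this sketch; entrezpy may report and stop differently.

```python
class PipelineAborted(Exception):
    """Hypothetical error signalling that a pipeline stopped early."""


def run_pipeline(queries, execute):
    """Run queries in order and stop at the first failure.

    queries: list of (query_id, query) pairs in pipeline order.
    execute: callable returning True on success (assumption for this sketch).
    """
    completed = []
    for query_id, query in queries:
        if not execute(query):
            # Later queries may depend on this one, so none of them are run.
            raise PipelineAborted(f"query {query_id} failed; pipeline aborted")
        completed.append(query_id)
    return completed


calls = []


def fake_execute(query):
    calls.append(query)
    return query != "bad"


try:
    run_pipeline([("q1", "ok"), ("q2", "bad"), ("q3", "ok")], fake_execute)
except PipelineAborted as exc:
    print(exc)  # query q2 failed; pipeline aborted
print(calls)    # ['ok', 'bad']
```

Stopping at the first failure matters in a pipeline because a later query typically resolves its parameters from an earlier one through a dependency.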
check_query(query)

Checks whether a query was successful.

Parameters: query (Conduit.Query) – Conduit query
get_result(query_id)

Returns the stored result from a previous run.

Parameters: query_id (str) – query id
Returns: result from this query
Return type: entrezpy.base.result.EutilsResult
new_pipeline()

Returns a new Conduit pipeline.

Returns: Conduit pipeline
Return type: Conduit.Pipeline
search(query, analyzer=<class 'entrezpy.esearch.esearch_analyzer.EsearchAnalyzer'>)

Configures and runs an Esearch query. Analyzers are passed as class references and instantiated here.

Parameters:
  • query (Conduit.Query) – Conduit query
  • analyzer – reference to an analyzer class
Returns: analyzer
Return type: entrezpy.esearch.esearch_analyzer.EsearchAnalyzer
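Passing an analyzer as a class reference and instantiating it inside the method can be sketched as follows. DemoAnalyzer and this search function are hypothetical stand-ins, not entrezpy's classes; the sketch only shows the Python pattern of a class used as a default argument.

```python
class DemoAnalyzer:
    """Hypothetical stand-in for an analyzer class such as EsearchAnalyzer."""

    def __init__(self):
        self.responses = []

    def analyze(self, response):
        self.responses.append(response)


def search(query, analyzer=DemoAnalyzer):
    """Sketch: `analyzer` is a class, so each call builds a fresh instance."""
    instance = analyzer()  # instantiated here, not at the call site
    instance.analyze(f"response to {query}")
    return instance


a = search("q1")
b = search("q2")
print(a is b)       # False
print(b.responses)  # ['response to q2']
```

If the default had been an instance (analyzer=DemoAnalyzer()), Python would evaluate it once when the function is defined, and every call relying on the default would append to the same responses list. Passing the class avoids that shared state.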

summarize(query, analyzer=<class 'entrezpy.esummary.esummary_analyzer.EsummaryAnalyzer'>)

Configures and runs an Esummary query. Analyzers are passed as class references and instantiated here.

Parameters:
  • query (Conduit.Query) – Conduit query
  • analyzer – reference to an analyzer class
Returns: analyzer
Return type: entrezpy.esummary.esummary_analyzer.EsummaryAnalyzer

link(query, analyzer=<class 'entrezpy.elink.elink_analyzer.ElinkAnalyzer'>)

Configures and runs an Elink query. Analyzers are passed as class references and instantiated here.

Parameters:
  • query (Conduit.Query) – Conduit query
  • analyzer – reference to an analyzer class
Returns: analyzer
Return type: entrezpy.elink.elink_analyzer.ElinkAnalyzer

post(query, analyzer=<class 'entrezpy.epost.epost_analyzer.EpostAnalyzer'>)

Configures and runs an Epost query. Analyzers are passed as class references and instantiated here.

Parameters:
  • query (Conduit.Query) – Conduit query
  • analyzer – reference to an analyzer class
Returns: analyzer
Return type: entrezpy.epost.epost_analyzer.EpostAnalyzer

fetch(query, analyzer=<class 'entrezpy.efetch.efetch_analyzer.EfetchAnalyzer'>)

Runs an Efetch query. The analyzer needs to be added to the query.

Parameters:
  • query (Conduit.Query) – Conduit query
  • analyzer – reference to an analyzer class
Returns: analyzer
Return type: entrezpy.efetch.efetch_analyzer.EfetchAnalyzer