Linking within and between Entrezpy databasesΒΆ
Using multiple links in a Conduit pipeline requires to run an Esearch afterwards to keep track of the proper UIDs. This is a quirk of the E-Eutilties (Entrez-Direct uses the same trick).
- Search the Pubmed Enrez database
- Increase the number of possible UIDs by searching pubmed again using the first UIDs to find publications linked to initial search
- Link the Pubmed UIDs to
nuccore
UIDs- Fetch the found UIDs from
nuccore
The following code shows howto use multiple links within a Conduit pipeline.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | import entrezpy.conduit
w = entrezpy.conduit.Conduit(args.email)
find_genomes = w.new_pipeline()
sid = find_genomes.add_search({'db':'pubmed', 'term' : 'capsid AND infection', 'rettype':'count'})
lid1 = find_genomes.add_link({'cmd':'neighbor_history', 'db':'pubmed'}, dependency=sid)
lid1 = find_genomes.add_search({'rettype': 'count', 'cmd':'neighbor_history'}, dependency=lid1)
lid2 = find_genomes.add_link({'db':'nuccore', 'cmd':'neighbor_history'}, dependency=lid1)
lid2 = find_genomes.add_search({'rettype': 'count', 'cmd':'neighbor_history'}, dependency=lid2)
find_genomes.add_fetch({'retmode':'xml', 'rettype':'fasta'}, dependency=lid2)
a = w.run(find_genomes)
|
Lines 1 - 4: Analogoues as shown in Conduit pipelines
- Line 6: Addsa search query to the Conduit pipline in Entrez database pubmed
- without downloading UIDs and store it in
sid
- Line 8: Add a link query to the Conduit pipline to link the UIDs found in search
sid
topubmed
and store the result on the history server. Store the query in lid1- Line 9: Update the link results for later use and store in the history server.
- Overwrite
lid1
with the updated query. - Line 11: Link the pubmed UIDs to nuccore and store in the history server. Store
- the query in
lid2
. - Line 12: Update the link results for later use and store in the history server.
- Overwirte
lid2
with the updated query - Line 14: Add fetch step to Conduit pipeline with the last link result as
- dependency. Request the data as FASTA sequences in XML format (Tinyseq XML).
Line 15: Run the pipeline.