Title: | Search and Retrieve Scientific Publication Records from PubMed |
---|---|
Description: | Query NCBI Entrez and retrieve PubMed records in XML or text format. Process PubMed records by extracting and aggregating data from selected fields. A large number of records can be easily downloaded via this simple-to-use interface to the NCBI PubMed API. |
Authors: | Damiano Fantini |
Maintainer: | Damiano Fantini <[email protected]> |
License: | GPL-3 |
Version: | 3.1.3 |
Built: | 2025-01-27 03:09:27 UTC |
Source: | https://github.com/dami82/easypubmed |
Class easyPubMed defines objects that represent PubMed Query jobs and the corresponding results. Briefly, these objects are initialized using information that will guide the communication with the NCBI Entrez server. Also, easyPubMed objects are used to store raw and processed data retrieved from PubMed.
## S4 method for signature 'easyPubMed'
initialize(.Object, query_string, job_info)
.Object |
The easyPubMed object being built. |
query_string |
String (character vector of length 1) corresponding to the user-provided text of the query to be submitted to PubMed. |
job_info |
List, this should be the output of 'EPM_job_split()'. |
query
String (character vector of length 1) corresponding to the PubMed request submitted by the user.
meta
List including meta information about the PubMed Query job.
uilist
List including all unique identifiers corresponding to the PubMed records returned by the query. Can be empty.
raw
List including the raw data (in 'xml' or 'medline' format) retrieved from the NCBI eFetch server. Can be empty.
data
Data.frame including processed data based on the xml raw data retrieved from PubMed.
misc
List including additional information.
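The slot layout described above can be pictured with a toy S4 class. This is only a sketch: the class name `toyEPM` and the accessor `toyGetQuery()` are hypothetical stand-ins, not the package's own definitions, and the slot types are taken from the descriptions in this section.

```r
library(methods)

# Hypothetical stand-in class; NOT the class definition used by easyPubMed.
setClass(
  "toyEPM",
  slots = c(
    query  = "character",
    meta   = "list",
    uilist = "list",
    raw    = "list",
    data   = "data.frame",
    misc   = "list"
  )
)

# Hypothetical accessor, mirroring the getter style documented in this manual.
setGeneric("toyGetQuery", function(x) standardGeneric("toyGetQuery"))
setMethod("toyGetQuery", "toyEPM", function(x) x@query)

obj <- new(
  "toyEPM",
  query = 'Damiano Fantini[AU] AND "2018"[PDAT]',
  meta = list(), uilist = list(), raw = list(),
  data = data.frame(), misc = list()
)

toyGetQuery(obj)
length(obj@uilist)  # the identifier list is allowed to be empty
```

In the sketch, the 'uilist' and 'raw' slots start empty, matching the note that both can be empty before records are fetched.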
Damiano Fantini [email protected]
Fetch raw PubMed records from PubMed. Records can be downloaded in text or xml format and stored into a local object or written to local files.
epm_fetch( x, format = "xml", api_key = NULL, write_to_file = FALSE, outfile_path = NULL, outfile_prefix = NULL, store_contents = TRUE, encoding = "UTF-8", verbose = TRUE )
x |
An 'easyPubMed' object. |
format |
String, the desired format for the raw records. This argument must take one of the following values: 'c("uilist", "medline", "xml")' and defaults to '"xml"'. |
api_key |
String, corresponding to the NCBI API token (if available). NCBI token strings can be requested from NCBI. Record download will be faster if a valid NCBI token is used. This argument can be 'NULL'. |
write_to_file |
Logical of length 1. Shall raw records be written to a file on the local machine. It defaults to 'FALSE'. |
outfile_path |
Path to the folder on the local machine where files will be saved (if 'write_to_file' is 'TRUE'). It must point to an already existing directory. If 'NULL', the working directory will be used. |
outfile_prefix |
String, prefix that will be added to the name of each file written to the local machine. This argument is parsed only when 'write_to_file' is 'TRUE'. If 'NULL', an arbitrary prefix will be added (easypubmed_job_YYYYMMDDHHMM). |
store_contents |
Logical of length 1. Shall raw records be stored in the 'easyPubMed' object. It defaults to 'TRUE'. It may be convenient to switch this to 'FALSE' when downloading a large number of records. If 'store_contents' is 'FALSE', 'write_to_file' must be 'TRUE'. |
encoding |
String, the encoding of the records retrieved from PubMed. Typically, this is 'UTF-8'. |
verbose |
Logical, shall details about the progress of the operation be printed to console. |
an easyPubMed object.
Damiano Fantini, [email protected]
https://www.data-pulse.com/dev_site/easypubmed/
# Note: a time limit can be set in order to kill the operation when/if
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  x <- epm_query(query_string = 'Damiano Fantini[AU] AND "2018"[PDAT]')
  x <- epm_fetch(x = x, format = 'uilist')
  x
}, silent = TRUE)
setTimeLimit(elapsed = Inf)
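The time-limit guard used in the examples of this manual can be wrapped in a small helper. The sketch below uses base R only; `with_time_limit()` is a hypothetical name, and a loop of short `Sys.sleep()` calls stands in for an unresponsive server.

```r
# Hypothetical helper: evaluate `expr` under an elapsed-time limit and return
# `on_timeout` if the limit is reached (or if any other error occurs).
with_time_limit <- function(expr, seconds, on_timeout = NULL) {
  setTimeLimit(elapsed = seconds, transient = TRUE)
  # Always lift the limit again, even when `expr` fails.
  on.exit(setTimeLimit(elapsed = Inf, transient = FALSE), add = TRUE)
  tryCatch(expr, error = function(e) on_timeout)
}

# A loop of short sleeps stands in for a slow network call (about 2 seconds).
slow <- with_time_limit({
  for (i in 1:20) Sys.sleep(0.1)
  "done"
}, seconds = 0.2, on_timeout = "timed out")

# A fast expression completes well within the limit.
fast <- with_time_limit("done", seconds = 5, on_timeout = "timed out")
```

Resetting the limit in `on.exit()` plays the same role as the trailing `setTimeLimit(elapsed = Inf)` call in the examples.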
Read one or more text files including XML-decorated raw PubMed records and rebuild an 'easyPubMed' object. The function expects all files to be generated from the same query using 'easyPubMed>3.0' and the 'epm_fetch()' function setting 'write_to_file' to 'TRUE'. This function can import a fraction or all of the files resulting from a single query. Files resulting from non-compatible fetch jobs will be dropped.
epm_import_xml(x)
x |
Character vector, the paths to text files including XML-decorated raw PubMed records saved using 'easyPubMed>3.0'. |
an 'easyPubMed' object including raw XML PubMed records.
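When a fetch job writes several files, the character vector passed as 'x' can be assembled with `list.files()`. The sketch below assumes a `<prefix>_batch_NN.txt` naming scheme, inferred from the file name used in the example for this function; the files created here are dummy placeholders, and `epm_import_xml()` itself is not called.

```r
# Work in a throwaway directory so the sketch leaves no files behind.
out_dir <- file.path(tempdir(), "epm_batches")
dir.create(out_dir, showWarnings = FALSE)

# Dummy placeholders standing in for files written by a fetch job
# (assumed naming scheme: <prefix>_batch_NN.txt).
writeLines("<placeholder/>", file.path(out_dir, "test_batch_01.txt"))
writeLines("<placeholder/>", file.path(out_dir, "test_batch_02.txt"))
writeLines("unrelated", file.path(out_dir, "notes.txt"))

# Collect only the batch files that share the prefix of interest.
batch_files <- list.files(
  out_dir,
  pattern = "^test_batch_[0-9]+\\.txt$",
  full.names = TRUE
)
basename(batch_files)

# `batch_files` is the kind of character vector that could then be passed
# as `x` to epm_import_xml().
unlink(out_dir, recursive = TRUE)
```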
Damiano Fantini, [email protected]
https://www.data-pulse.com/dev_site/easypubmed/
# Note: a time limit can be set in order to kill the operation when/if
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  x <- epm_query(query_string = 'Damiano Fantini[AU] AND "2018"[PDAT]')
  x <- epm_fetch(x = x, format = 'xml', write_to_file = TRUE,
                 outfile_prefix = 'test', store_contents = FALSE)
  y <- epm_import_xml('test_batch_01.txt')
  tryCatch({unlink('test_batch_01.txt')}, error = function(e) { NULL })
  print(paste0(' Raw Record Num (fetched): ', getEPMMeta(x)$raw_record_num))
  print(paste0('Raw Record Num (read & rebuilt): ', getEPMMeta(y)$raw_record_num))
}, silent = TRUE)
setTimeLimit(elapsed = Inf)
Read the raw PubMed records stored in an 'easyPubMed' object, identify XML tags, extract information and cast it into a structured data.frame. The object must include raw records downloaded in the 'xml' format. Information about article title, authors, affiliations, journal name and abbreviation, publication date, references, and keywords is returned.
epm_parse( x, max_authors = 10, autofill_address = TRUE, compact_output = TRUE, include_abstract = TRUE, max_references = 150, ref_id_type = "doi", verbose = TRUE )
x |
An 'easyPubMed' object. The object must include raw records (n>0) downloaded in the 'xml' format. |
max_authors |
Numeric, maximum number of authors to retrieve. If this is set to -1, only the last author is extracted. If this is set to 1, only the first author is returned. If this is set to 2, the first and the last authors are extracted. If this is set to any other positive number (n), up to the leading (n-1) authors are retrieved together with the last author. If this is set to a number larger than the number of authors in a record, all authors are returned. Note that at least 1 author has to be retrieved, therefore a value of 0 is not accepted (coerced to -1). |
autofill_address |
Logical, shall author affiliations be propagated within each record to fill missing values. |
compact_output |
Logical, shall record data be returned in a compact format where each row is a single record and author names are collapsed together. If 'FALSE', each row corresponds to a single author of the publication and the record-specific data are recycled for all included authors (legacy approach). |
include_abstract |
Logical, shall abstract text be included in the output data.frame. If 'FALSE', the abstract text column is populated with a missing value. |
max_references |
Numeric, maximum number of references to return (for each PubMed record). |
ref_id_type |
String, must be one of the following values: 'c("pmid", "doi")'. Type of identifier used to describe citation references. |
verbose |
Logical, shall details about the progress of the operation be printed to console. |
an easyPubMed object including a data.frame ('data' slot) that stores information extracted from its raw XML PubMed records.
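The 'max_authors' rule described above can be restated as a small base R function. This is a sketch of the documented behavior only: `select_authors()` is a hypothetical helper that is not part of easyPubMed, and treating every negative value like -1 is an assumption.

```r
# Hypothetical helper, not part of easyPubMed: pick authors according to
# the `max_authors` rule described in the argument list.
select_authors <- function(authors, max_authors) {
  n_auth <- length(authors)
  if (n_auth == 0) return(character(0))
  if (max_authors == 0) max_authors <- -1       # 0 is coerced to -1
  if (max_authors < 0) return(authors[n_auth])  # last author only (assumed for all negatives)
  if (max_authors >= n_auth) return(authors)    # all authors are returned
  if (max_authors == 1) return(authors[1])      # first author only
  # Leading (max_authors - 1) authors plus the last author.
  c(authors[seq_len(max_authors - 1)], authors[n_auth])
}

authors <- c("A", "B", "C", "D", "E")
select_authors(authors, -1)  # "E"
select_authors(authors, 1)   # "A"
select_authors(authors, 2)   # "A" "E"
select_authors(authors, 4)   # "A" "B" "C" "E"
select_authors(authors, 10)  # "A" "B" "C" "D" "E"
```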
Damiano Fantini, [email protected]
https://www.data-pulse.com/dev_site/easypubmed/
# Note: a time limit can be set in order to kill the operation when/if
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  x <- epm_query(query_string = 'Damiano Fantini[AU] AND "2018"[PDAT]')
  x <- epm_fetch(x = x, format = 'xml')
  x <- epm_parse(x, include_abstract = FALSE, max_authors = 1)
  get_epm_data(x)
}, silent = TRUE)
setTimeLimit(elapsed = Inf)
Read a raw PubMed record, identify XML tags, extract information and cast it into a structured 'data.frame'. The expected input is an XML-tag-decorated string corresponding to a single PubMed record. Information about article title, authors, affiliations, journal name and abbreviation, publication date, references, and keywords is returned.
epm_parse_record( pubmedArticle, max_authors = 15, autofill_address = TRUE, compact_output = TRUE, include_abstract = TRUE, max_references = 1000, ref_id_type = "pmid" )
pubmedArticle |
String, this is an XML-tag-decorated raw PubMed record. |
max_authors |
Numeric, maximum number of authors to retrieve. If this is set to -1, only the last author is extracted. If this is set to 1, only the first author is returned. If this is set to 2, the first and the last authors are extracted. If this is set to any other positive number (n), up to the leading (n-1) authors are retrieved together with the last author. If this is set to a number larger than the number of authors in a record, all authors are returned. Note that at least 1 author has to be retrieved, therefore a value of 0 is not accepted (coerced to -1). |
autofill_address |
Logical, shall author affiliations be propagated within each record to fill missing values. |
compact_output |
Logical, shall record data be returned in a compact format where each row is a single record and author names are collapsed together. If 'FALSE', each row corresponds to a single author of the publication and the record-specific data are recycled for all included authors. |
include_abstract |
Logical, shall abstract text be included in the output data.frame. If 'FALSE', the abstract text column is populated with a missing value. |
max_references |
Numeric, maximum number of references to return (for each PubMed record). |
ref_id_type |
String, must be one of the following values: 'c("pmid", "doi")'. Type of identifier used to describe citation references. |
a data.frame including information extracted from a raw XML PubMed record.
Damiano Fantini, [email protected]
https://www.data-pulse.com/dev_site/easypubmed/
data(epm_samples)
x <- epm_samples$bladder_cancer_2018$demo_data_03$raw[[1]]
epm_parse_record(x)
Query PubMed (Entrez) via the PubMed API eSearch utility.
Calling this function results in submitting a query to the NCBI EUtils
server and then capturing and parsing the response.
The number of records expected to be returned by the query is
determined. If this number exceeds 10,000, the record retrieval job
is automatically split into a list of smaller, manageable sub-queries.
This function returns an "easyPubMed" object, which includes all
information required to retrieve PubMed records using the epm_fetch()
function.
epm_query(query_string, api_key = NULL, verbose = TRUE)
query_string |
String (character vector of length 1), corresponding to the query string. |
api_key |
String (character vector of length 1), corresponding to the NCBI API key. Can be 'NULL'. |
verbose |
Logical, shall progress information be printed to console. Defaults to 'TRUE'. |
This function will use "query_string" for querying PubMed. The Query Term can include one or multiple words, as well as the standard PubMed operators (AND, OR, NOT) and tags (e.g., [AU], [PDAT], [Affiliation], and so on).
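A query string that combines tagged terms with a Boolean operator can be assembled with base R string functions before it is passed as 'query_string'. The helper `tag_term()` below is hypothetical and exists only for illustration.

```r
# Hypothetical helper: attach a PubMed field tag to a search term.
tag_term <- function(term, tag) paste0(term, "[", tag, "]")

author <- tag_term("Damiano Fantini", "AU")
year   <- tag_term('"2018"', "PDAT")

# Combine the tagged terms with the AND operator.
qry <- paste(author, year, sep = " AND ")
qry
```

The resulting string is the same query used in the examples of this manual.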
An easyPubMed object which includes no PubMed records.
Damiano Fantini, [email protected]
https://www.data-pulse.com/dev_site/easypubmed/
# Note: a time limit can be set in order to kill the operation when/if
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  qry <- 'Damiano Fantini[AU] AND "2018"[PDAT]'
  epm_query(query_string = qry, verbose = FALSE)
}, silent = TRUE)
setTimeLimit(elapsed = Inf)
Execute a PubMed query using a full-length publication title as query string. Tokenization and stopword removal are performed automatically. The goal is to mimic a PubMed citation matching search. Because of this approach, a query by full-length title may return more than one record.
epm_query_by_fulltitle( fulltitle, field = "[Title]", api_key = NULL, verbose = TRUE )
fulltitle |
String (character vector of length 1) that corresponds to the full-length publication title used for querying PubMed (titles should be used as is, without adding trailing filter tags). |
field |
String (character vector of length 1). This indicates the PubMed record field where the full-length string (fulltitle) should be searched in. By default, this points to the 'Title' field. However, the field can be changed (always use fields supported by PubMed) as required by the user (for example, to attempt an exact-match query using a specific sentence included in the abstract of a record). |
api_key |
String (character vector of length 1), corresponding to the NCBI API key. Can be 'NULL'. |
verbose |
Logical, shall details about the progress of the operation be printed to console. |
an easyPubMed object.
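The tokenization and stopword-removal step mentioned in the description can be pictured with base R. This is a rough sketch, not the package's implementation: the stopword vector is a small illustrative subset (the package ships its own list in the 'epm_stopwords' dataset), and joining the remaining tokens with AND is an assumption about how such a query could be built.

```r
# Small illustrative stopword vector (an assumption, not the package's list).
toy_stopwords <- c("of", "the", "using", "a", "an", "and", "in")

title <- "Analysis of Mutational Signatures Using the mutSignatures R Library."

# Tokenize: lower-case, replace punctuation with spaces, split on whitespace.
tokens <- strsplit(gsub("[[:punct:]]+", " ", tolower(title)), "[[:space:]]+")[[1]]
tokens <- tokens[nzchar(tokens)]

# Drop stopwords, then tag each remaining token with the target field.
kept <- tokens[!tokens %in% toy_stopwords]
qry  <- paste(paste0(kept, "[Title]"), collapse = " AND ")
qry
```

Because only individual tokens are matched, several records may satisfy the resulting query, which is consistent with the note that more than one record can be returned.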
Damiano Fantini, [email protected]
https://www.data-pulse.com/dev_site/easypubmed/
# Note: a time limit can be set in order to kill the operation when/if
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  q <- 'Analysis of Mutational Signatures Using the mutSignatures R Library.'
  epm_query_by_fulltitle(q)
}, silent = TRUE)
setTimeLimit(elapsed = Inf)
Query PubMed using a list of PubMed record identifiers (PMIDs) as input. The list of identifiers is automatically split into a series of manageable-sized chunks (max n=50 PMIDs per chunk).
epm_query_by_pmid(pmids, api_key = NULL, verbose = TRUE)
pmids |
Vector (character or numeric), list of PubMed record identifiers (PMIDs). Values will be coerced to character. |
api_key |
String (character vector of length 1), corresponding to the NCBI API key. Can be 'NULL'. |
verbose |
Logical, shall details about the progress of the operation be printed to console. |
an easyPubMed object.
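The chunking behavior described for this function (at most 50 PMIDs per chunk) can be illustrated in base R. The helper `chunk_pmids()` is hypothetical, and the identifiers are arbitrary placeholders rather than records of interest; the package's internal splitting code may differ.

```r
# Hypothetical helper: split identifiers into chunks of at most `size`.
chunk_pmids <- function(pmids, size = 50) {
  pmids  <- as.character(pmids)                # identifiers are coerced to character
  groups <- ceiling(seq_along(pmids) / size)   # chunk index for each identifier
  unname(split(pmids, groups))
}

pmids  <- 34097001:34097120   # 120 arbitrary placeholder identifiers
chunks <- chunk_pmids(pmids)
lengths(chunks)               # 50 50 20
```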
Damiano Fantini, [email protected]
https://www.data-pulse.com/dev_site/easypubmed/
# Note: a time limit can be set in order to kill the operation when/if
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  my_pmids <- c(34097668, 34097669, 34097670)
  epm_query_by_pmid(my_pmids)
}, silent = TRUE)
setTimeLimit(elapsed = Inf)
This dataset includes a collection of sample data obtained from PubMed records and saved in different formats. This dataset is used to demonstrate specific functionalities of the 'easyPubMed' R library. Each element in the 'epm_samples' list corresponds to a different input or intermediate object.
data("epm_samples")
The dataset is formatted as a list including 4 elements:
* 'bladder_cancer_2018': List of 4
* 'bladder_cancer_40y': List of 1
* 'fx': List of 5
## Display some contents
data("epm_samples")
# Display Query String used for collecting the data
print(epm_samples$bladder_cancer_2018$demo_data_01)
Collection of 133 Stopwords that can be removed from query strings to improve the accuracy of exact-match PubMed queries.
data("epm_stopwords")
A character vector including all PubMed stopwords that are typically filtered out from queries.
Number of stopwords included, n=133.
## Display some contents
data("epm_stopwords")
head(epm_stopwords)
Retrieve PubMed records from Entrez following a search performed via the get_pubmed_ids() function. Data are downloaded in the XML or TXT format and are retrieved in batches of up to 5000 records.
fetch_pubmed_data( pubmed_id_list, retstart = 0, retmax = 500, format = "xml", encoding = "UTF8", api_key = NULL, verbose = TRUE )
pubmed_id_list |
An easyPubMed object. |
retstart |
Integer (>=0): this argument is ignored. |
retmax |
Integer (>=1): this argument is ignored. |
format |
String: element specifying the output format. The following values are allowed: c("xml", "medline", "uilist"). |
encoding |
String, the encoding of the records retrieved from PubMed. This argument is ignored and set to 'UTF-8'. |
api_key |
String, corresponding to the NCBI API token (if available). NCBI token strings can be requested from NCBI. Record download will be faster if a valid NCBI token is used. This argument can be NULL. |
verbose |
Logical, shall details about the progress of the operation be printed to console. |
The 'fetch_pubmed_data()' function is now obsolete. You should use the 'epm_fetch()' function instead. Please, have a look at the manual or the vignette. The 'fetch_pubmed_data()' function will be retired in the second half of 2024.
Character vector of length >= 1. If format is set to "xml" (default), a single String including all PubMed records (decorated with XML tags) is returned. If a different format is selected, a vector of strings is returned, where each element corresponds to a line of the output document.
Damiano Fantini [email protected]
https://www.data-pulse.com/dev_site/easypubmed/
https://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.T._valid_values_of__retmode_and/
## Example 01: retrieve PubMed record Unique Identifiers (uilist)
# Note: a time limit can be set in order to kill the operation when/if
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  q <- 'Damiano Fantini[AU] AND "2018"[PDAT]'
  x <- get_pubmed_ids(pubmed_query_string = q)
  y <- fetch_pubmed_data(x, format = "uilist")
  y
}, silent = TRUE)
setTimeLimit(elapsed = Inf)

## Not run:
## Example 02: retrieve data in XML format
q <- 'Damiano Fantini[AU] AND "2018"[PDAT]'
x <- epm_query(query_string = q)
y <- fetch_pubmed_data(x, format = "xml")
y
## End(Not run)
Retrieve PubMed records for an 'easyPubMed' object.
fetchEPMData(x, params)
## S4 method for signature 'easyPubMed,list'
fetchEPMData(x, params)
x |
an easyPubMed-class object. |
params |
list including parameters to tune the record retrieval job. For more info, see '?easyPubMed:::EPM_validate_fetch_params'. |
Obtain Processed Data that were extracted from a list of PubMed records. This is a wrapper function that calls the 'getEPMData()' method. This function returns contents from the 'data' slot.
get_epm_data(x)
x |
An 'easyPubMed' object. |
a 'data.frame' including processed data from an 'easyPubMed' object.
Damiano Fantini, [email protected]
https://www.data-pulse.com/dev_site/easypubmed/
# Note: a time limit can be set in order to kill the operation when/if
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  x <- epm_query(query_string = 'Damiano Fantini[AU] AND "2018"[PDAT]')
  x <- epm_fetch(x)
  x <- epm_parse(x, max_references = 5, max_authors = 5)
  get_epm_data(x)
}, silent = TRUE)
setTimeLimit(elapsed = Inf)
Request Meta Data from an 'easyPubMed' object. This is a wrapper function that calls the 'getEPMMeta()' method. This function returns contents from the 'meta' slot.
get_epm_meta(x)
x |
An 'easyPubMed' object. |
a list including meta data from an 'easyPubMed' object.
Damiano Fantini, [email protected]
https://www.data-pulse.com/dev_site/easypubmed/
# Note: a time limit can be set in order to kill the operation when/if
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  x <- epm_query(query_string = 'Damiano Fantini[AU] AND "2018"[PDAT]')
  get_epm_meta(x)
}, silent = TRUE)
setTimeLimit(elapsed = Inf)
Request Raw Data from an 'easyPubMed' object. This is a wrapper function that calls the 'getEPMRaw()' method. This function returns contents from the 'raw' slot.
get_epm_raw(x)
x |
An 'easyPubMed' object. |
a list including raw data from an 'easyPubMed' object.
Damiano Fantini, [email protected]
https://www.data-pulse.com/dev_site/easypubmed/
# Note: a time limit can be set in order to kill the operation when/if
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  x <- epm_query(query_string = 'Damiano Fantini[AU] AND "2018"[PDAT]')
  x <- epm_fetch(x)
  get_epm_raw(x)
}, silent = TRUE)
setTimeLimit(elapsed = Inf)
Request the list of unique PubMed Record Identifiers that are contained in an 'easyPubMed' object. This function is a wrapper function calling the 'getEPMUilist()' method. This function returns contents from the 'uilist' slot.
get_epm_uilist(x)
x |
An 'easyPubMed' object. |
a character vector including a list of unique record identifiers from an 'easyPubMed' object.
Damiano Fantini, [email protected]
https://www.data-pulse.com/dev_site/easypubmed/
# Note: a time limit can be set in order to kill the operation when/if
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  x <- epm_query(query_string = 'Damiano Fantini[AU] AND "2018"[PDAT]')
  x <- epm_fetch(x)
  get_epm_uilist(x)
}, silent = TRUE)
setTimeLimit(elapsed = Inf)
Query PubMed (Entrez) in a simple way via the PubMed API eSearch function.
Calling this function results in posting the query results on the PubMed
History Server. This allows later access to the resulting data via the
fetch_pubmed_data() function, or other easyPubMed functions.
NOTE: this function has become obsolete. You should use the epm_query()
function instead. Please, have a look at the manual or the vignette.
The get_pubmed_ids()
function will be retired in 2024.
get_pubmed_ids(pubmed_query_string, api_key = NULL)
pubmed_query_string |
String (character vector of length 1), corresponding to the query string used for querying PubMed. |
api_key |
String (character vector of length 1), corresponding to the NCBI API key. Can be NULL. |
This function will use the String provided as argument for querying PubMed via the eSearch function of the PubMed API. The Query Term can include one or multiple words, as well as the standard PubMed operators (AND, OR, NOT) and tags (e.g., [AU], [PDAT], [Affiliation], and so on). ESearch will post the UIDs resulting from the search operation onto the History server so that they can be used directly in a subsequent fetch_pubmed_data() call.
An easyPubMed object which includes no PubMed records.
Damiano Fantini, [email protected]
https://www.data-pulse.com/dev_site/easypubmed/
# Note: a time limit can be set in order to kill the operation when/if
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  qry <- 'Damiano Fantini[AU] AND "2018"[PDAT]'
  get_pubmed_ids(pubmed_query_string = qry)
}, silent = TRUE)
setTimeLimit(elapsed = Inf)
Retrieve processed data from an 'easyPubMed' object.
getEPMData(x)
## S4 method for signature 'easyPubMed'
getEPMData(x)
x |
an object of class 'easyPubMed'. |
Retrieve the list of record retrieval sub-jobs from an 'easyPubMed' object. Record retrieval sub-jobs are stored in a 'data.frame' and each row corresponds to an independent non-overlapping PubMed query. This 'data.frame' guides the record retrieval process. The 'data.frame' is obtained from the 'misc' slot of an 'easyPubMed' object.
getEPMJobList(x)
## S4 method for signature 'easyPubMed'
getEPMJobList(x)
x |
an object of class 'easyPubMed'. |
Retrieve meta data from an 'easyPubMed' object.
getEPMMeta(x)
## S4 method for signature 'easyPubMed'
getEPMMeta(x)
x |
an object of class 'easyPubMed'. |
Retrieve miscellaneous information stored in an 'easyPubMed' object.
getEPMMisc(x)
## S4 method for signature 'easyPubMed'
getEPMMisc(x)
x |
an object of class 'easyPubMed'. |
Retrieve the user-provided query string from an 'easyPubMed' object.
getEPMQuery(x)
## S4 method for signature 'easyPubMed'
getEPMQuery(x)
x |
an object of class 'easyPubMed'. |
Retrieve the raw PubMed record data stored in an 'easyPubMed' object.
getEPMRaw(x)
## S4 method for signature 'easyPubMed'
getEPMRaw(x)
x |
an object of class 'easyPubMed'. |
Retrieve the list of unique record identifiers (PMIDs) from an 'easyPubMed' object.
getEPMUilist(x)
## S4 method for signature 'easyPubMed'
getEPMUilist(x)
x |
an object of class 'easyPubMed'. |
Extract, parse and format information from raw PubMed records stored in an 'easyPubMed' object.
parseEPMData(x, params)
## S4 method for signature 'easyPubMed,list'
parseEPMData(x, params)
x |
an easyPubMed-class object |
params |
list including parameters to tune the record data parsing job. For more info, see '?easyPubMed:::EPM_validate_parse_params'. |
Print method of the easyPubMed Class.
## S4 method for signature 'easyPubMed'
print(x)
x |
the 'easyPubMed' object being printed. |
Attach (or replace) processed data to an 'easyPubMed' object.
setEPMData(x, y)
## S4 method for signature 'easyPubMed,data.frame'
setEPMData(x, y)
x |
an object of class 'easyPubMed'. |
y |
'data.frame' including processed data. |
Attach (or replace) the list of record retrieval sub-jobs to an 'easyPubMed' object. Record retrieval sub-jobs are stored in a data.frame and each row corresponds to an independent non-overlapping PubMed query. This 'data.frame' guides the record retrieval process. The 'data.frame' is written into the 'misc' slot of an 'easyPubMed' object.
setEPMJobList(x, y)
## S4 method for signature 'easyPubMed,data.frame'
setEPMJobList(x, y)
x |
an object of class 'easyPubMed'. |
y |
'data.frame' including the list of PubMed record retrieval sub-jobs. |
Attach (or replace) meta data to an 'easyPubMed' object.
setEPMMeta(x, y)
## S4 method for signature 'easyPubMed,list'
setEPMMeta(x, y)
x |
an object of class 'easyPubMed'. |
y |
list including meta data information. |
Attach (or replace) miscellaneous information to an 'easyPubMed' object.
setEPMMisc(x, y) ## S4 method for signature 'easyPubMed,list' setEPMMisc(x, y)
x |
an object of class 'easyPubMed'. |
y |
list including miscellaneous data and information. |
Attach (or replace) a user-provided query string to an 'easyPubMed' object.
setEPMQuery(x, y)
## S4 method for signature 'easyPubMed,character'
setEPMQuery(x, y)
x |
an object of class 'easyPubMed'. |
y |
string (character vector of length 1) corresponding to a PubMed query string. |
Attach (or replace) raw PubMed record data to an 'easyPubMed' object.
setEPMRaw(x, y)
## S4 method for signature 'easyPubMed,list'
setEPMRaw(x, y)
x |
an object of class 'easyPubMed'. |
y |
list of PubMed records (raw data). |
Attach (or replace) the list of unique record identifiers (PMIDs) to an 'easyPubMed' object.
setEPMUilist(x, y)
## S4 method for signature 'easyPubMed,list'
setEPMUilist(x, y)
x |
an object of class 'easyPubMed'. |
y |
list of unique PubMed record identifiers (PMIDs). |
Show method of the easyPubMed Class.
## S4 method for signature 'easyPubMed'
show(object)
object |
the 'easyPubMed' object being shown. |
Extract Publication Info from PubMed records and cast data into a data.frame where each row corresponds to a different author. It is possible to limit data extraction to first authors or last authors only, or get information about all authors of each PubMed record.
table_articles_byAuth(
  pubmed_data,
  included_authors = "all",
  max_chars = 500,
  autofill = TRUE,
  dest_file = NULL,
  getKeywords = TRUE,
  encoding = "UTF8"
)
pubmed_data |
PubMed data in XML format: typically, an XML file resulting from a batch_pubmed_download() call, or an XML object returned by a fetch_pubmed_data() call. |
included_authors |
Character: c("first", "last", "all"). Only includes information from the first, the last or all authors of a PubMed record. |
max_chars |
This argument is ignored. In this version of the function, the whole Abstract Text is returned. |
autofill |
Logical. If TRUE, missing affiliations are imputed according to the available values (from the same article). |
dest_file |
String (character of length 1). Name of the file that will be written for storing the output. If NULL, no file will be saved. |
getKeywords |
This argument is ignored. In this version of the function MeSH terms and codes (i.e., keywords) are parsed by default. |
encoding |
The encoding of an input/output connection can be specified by name (for example, "ASCII" or "UTF-8"), in the same way as it would be given to the function base::iconv(). See the iconv() help page to find out which encodings can be used on your platform. Here, we recommend using "UTF-8". |
The 'table_articles_byAuth()' function is now obsolete. Use the 'epm_parse()' function instead; see the manual or the vignette for details. The 'table_articles_byAuth()' function will be retired in the second half of 2024.
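For readers migrating away from 'table_articles_byAuth()', the following is a minimal sketch of what an 'epm_parse()'-based workflow might look like. It assumes that the package exports 'epm_query()' and 'get_epm_data()' alongside 'epm_fetch()' and 'epm_parse()', and that default arguments are acceptable; the wrapper name 'run_epm_workflow' is hypothetical. Check the manual pages of each function before relying on it.

```r
# Hypothetical wrapper around the newer workflow (a sketch, not the package's
# own code). Requires the easyPubMed package and network access when called.
# Assumption: epm_query() and get_epm_data() are exported by easyPubMed.
run_epm_workflow <- function(query_string) {
  if (!requireNamespace("easyPubMed", quietly = TRUE)) {
    stop("The easyPubMed package is required for this sketch.")
  }
  epm <- easyPubMed::epm_query(query_string)         # submit the query
  epm <- easyPubMed::epm_fetch(epm, format = "xml")  # download raw records
  epm <- easyPubMed::epm_parse(epm)                  # parse the raw XML data
  easyPubMed::get_epm_data(epm)                      # return processed data
}

# Example call (not run here because it contacts the NCBI server):
# df <- try(run_epm_workflow('Damiano Fantini[AU] AND "2018"[PDAT]'),
#           silent = TRUE)
```

The wrapper only chains the calls; wrapping the call in 'try()' mirrors the defensive style of the example further down this page.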
Data frame including the following fields: 'c("pmid", "doi", "title", "abstract", "year", "month", "day", "jabbrv", "journal", "keywords", "mesh", "lastname", "firstname", "address", "email")'.
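To illustrate what the 'included_authors' options mean for a one-row-per-author table, here is a small base R sketch on made-up data. It is not the package's implementation: the helper 'pick_authors' and the toy values are hypothetical, only three of the documented columns are used, and rows are assumed to be in author order within each 'pmid'.

```r
# Toy table mimicking three columns of the documented output (made-up values).
authors <- data.frame(
  pmid = c("111", "111", "111", "222", "222"),
  lastname = c("Rossi", "Smith", "Chen", "Garcia", "Khan"),
  firstname = c("Anna", "John", "Li", "Maria", "Omar"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the first, the last, or all author rows per record.
# Assumes rows are already sorted by author position within each pmid.
pick_authors <- function(df, included_authors = c("all", "first", "last")) {
  included_authors <- match.arg(included_authors)
  if (included_authors == "all") {
    return(df)
  }
  # duplicated() flags repeats; scanning from the end keeps the last row instead
  from_last <- included_authors == "last"
  keep <- !duplicated(df$pmid, fromLast = from_last)
  df[keep, , drop = FALSE]
}

first_only <- pick_authors(authors, "first")  # one row per pmid: first author
last_only <- pick_authors(authors, "last")    # one row per pmid: last author
```

With "all", the table is returned unchanged, so each record contributes as many rows as it has authors.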
Damiano Fantini [email protected]
https://www.data-pulse.com/dev_site/easypubmed/
# Note: a time limit can be set in order to kill the operation when/if
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  q0 <- 'Damiano Fantini[AU] AND "2018"[PDAT]'
  q1 <- easyPubMed::get_pubmed_ids(pubmed_query_string = q0)
  q2 <- fetch_pubmed_data(pubmed_id_list = q1)
  df <- table_articles_byAuth(q2, included_authors = 'first')
  df[, c('pmid', 'lastname', 'jabbrv', 'year', 'month', 'day')]
}, silent = TRUE)
setTimeLimit(elapsed = Inf)