9. Processing CEDAR Exports

9.1. Overview

This section provides instructions for processing CEDAR exports (queries, timber), so that they can be used to populate the iAM.AMR models.

This processing is performed using the sawmill R package. If you are not familiar with sawmill, please review the section on sawmill, and install it as per the instructions on that page before continuing.

Tip

This section should be read concurrently with the last step of your chosen installation procedure (Bootstrap or Standard). Please see the Installation and Use section of the sawmill GitHub repository’s README file.

9.2. Raw Timber

CEDAR timber should be in the form of an Excel (.xlsx) file, where each row represents an individual factor.

The following table is an example of a properly formatted input timber file (header row and one example factor row are shown).

Timber Example (header row and one factor row, shown transposed: each line gives a field name and its example value, in the left-to-right column order of the file)
RWID: 13826
ident_doi: 10.1016/j.vetmic.2007.05.025
ident_pmid: (blank)
name_bibtex: Thakur2007
ID_factor: 10180
AMR: tetracycline
factor_title: Antimicrobial-free production system type
factor_description: Intensive (indoor) vs extensive (outdoor) production system type. Indoor = single pen (space of 2.4 ft^2/pig at nursery and 7.4 ft^2/pig at finishing). Outdoor = uncovered barricaded area. Isolates taken at pre-evisceration stage.
host_01: Swine
host_02: Carcass
microbe_01: Salmonella
microbe_02: spp.
stage_allocate: Farm
stage_observe: Abattoir
group_exposed: Outdoor
group_referent: Indoor
res_format: Contingency Table
res_unit: Isolate
contable_a: 17
contable_b: (blank)
contable_c: 5
contable_d: (blank)
prevtable_a: (blank)
prevtable_b: (blank)
prevtable_c: (blank)
prevtable_d: (blank)
table_n_exp: 43
table_n_ref: 5
odds_ratio: (blank)
odds_ratio_lo: (blank)
odds_ratio_up: (blank)
odds_ratio_sig: (blank)
odds_ratio_confidence: 95
ID_meta: (blank)
meta_amr: (blank)
meta_type: (blank)

Attention

The left-to-right order and names of the fields in your input file must exactly match those shown above; otherwise, sawmill will raise an error.
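The header requirement above can be checked before running sawmill. The following is a minimal sketch in base R, not part of sawmill itself: first_header_mismatch is a hypothetical helper, and in practice the actual header would come from names() of the data frame read from your .xlsx file.

```r
# Expected field names, in the left-to-right order shown in the Timber Example.
expected_fields <- c(
  "RWID", "ident_doi", "ident_pmid", "name_bibtex", "ID_factor", "AMR",
  "factor_title", "factor_description", "host_01", "host_02",
  "microbe_01", "microbe_02", "stage_allocate", "stage_observe",
  "group_exposed", "group_referent", "res_format", "res_unit",
  "contable_a", "contable_b", "contable_c", "contable_d",
  "prevtable_a", "prevtable_b", "prevtable_c", "prevtable_d",
  "table_n_exp", "table_n_ref", "odds_ratio", "odds_ratio_lo",
  "odds_ratio_up", "odds_ratio_sig", "odds_ratio_confidence",
  "ID_meta", "meta_amr", "meta_type"
)

# Hypothetical helper (not a sawmill function): position of the first column
# whose name differs from the expected one, or 0 if the headers match.
first_header_mismatch <- function(actual, expected) {
  n <- max(length(actual), length(expected))
  # Pad the shorter vector with NA so missing or extra columns are caught too.
  a <- c(actual, rep(NA_character_, n - length(actual)))
  e <- c(expected, rep(NA_character_, n - length(expected)))
  differs <- is.na(a) | is.na(e) | a != e
  if (any(differs)) which(differs)[1] else 0L
}

# Example: a header in which ident_doi and ident_pmid have been swapped.
swapped <- expected_fields
swapped[2:3] <- swapped[3:2]
first_header_mismatch(swapped, expected_fields)
```

A result of 0 means the header matches; any other value is the column position to inspect first.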

Each field has an expected data type, as listed below. A description of each field is also provided.

Timber Specification
Field Name Expected Data Type Field Description
RWID text RefWorks ID for the reference
ident_doi text DOI for the reference
ident_pmid text PMID for the reference
name_bibtex text First author + publication date of reference
ID_factor numeric ID of the factor
AMR text Antimicrobial assayed for resistance for the factor in question
factor_title text Factor title
factor_description text Factor description
host_01 text Host involved (e.g. cattle, swine)
host_02 text Specific host involved (e.g. dairy cows, beef calves)
microbe_01 text Microbe genus involved (e.g. Escherichia, Salmonella)
microbe_02 text Microbe species involved (e.g. coli, spp.)
stage_allocate text Stage of production at which the intervention to the exposed group is made in the study
stage_observe text Stage of production at which the intervention’s effect is measured in the study
group_exposed text Description of the exposed group
group_referent text Description of the referent group
res_format text Set of result fields available in the study (e.g. Contingency Table, Prevalence Table)
res_unit text Unit to which the results apply (e.g. flock, isolate, pooled isolate)
contable_a numeric Contingency Table: # AMR+ in exposed group
contable_b numeric Contingency Table: # AMR- in exposed group
contable_c numeric Contingency Table: # AMR+ in referent group
contable_d numeric Contingency Table: # AMR- in referent group
prevtable_a numeric Prevalence Table: % AMR+ in exposed group
prevtable_b numeric Prevalence Table: % AMR- in exposed group
prevtable_c numeric Prevalence Table: % AMR+ in referent group
prevtable_d numeric Prevalence Table: % AMR- in referent group
table_n_exp numeric Total # [res_unit] in exposed group
table_n_ref numeric Total # [res_unit] in referent group
odds_ratio numeric Odds Ratio: Odds ratio value for the factor
odds_ratio_lo numeric Odds Ratio: Lower bound of the confidence interval
odds_ratio_up numeric Odds Ratio: Upper bound of the confidence interval
odds_ratio_sig text Odds Ratio: Significance value (p-value)
odds_ratio_confidence numeric Odds Ratio: Confidence level of the interval (e.g. 95%)
ID_meta text Meta-analysis: ID of the meta-analysis group to which the factor belongs (if applicable)
meta_amr text Meta-analysis: antimicrobial/class of antimicrobials to which resistance was assayed in the factors included in this particular meta-analysis group (if applicable)
meta_type text Meta-analysis: type/level of granularity of this particular meta-analysis group (e.g. within studies, across studies) (if applicable)

Attention

The type of data contained within each of the fields in your input file should match those outlined above, as processing errors can occur otherwise. Please see Warnings due to unexpected data types for more information.

9.3. Using sawmill

9.3.1. Changing default values of sawmill arguments

Tip

This sub-section is optional if you have chosen the Bootstrap installation.

Complete descriptions of sawmill’s arguments, and guidance on how to change them, can be found in the Sawmill Arguments section of the sawmill GitHub repository’s README.md file.

To change these arguments, open start_mill.R and mill.R. In each script, the default values are specified in a single line of code, as shown for mill.R in the following figure.

[Figure: Default arguments in sawmill’s mill.R script.]

The argument values can be changed directly in this line of code. For example, if you wanted to change the argument insensible_p_lo to 98, simply replace the 99 after the = sign with 98.
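As a rough illustration of what editing a default looks like in R, consider the sketch below. The function mill_sketch is a hypothetical stand-in, not sawmill’s real mill function; only the argument name insensible_p_lo and its default of 99 are taken from the text above. The real line in mill.R lists all of sawmill’s arguments.

```r
# Hypothetical stand-in for the line that sets the defaults in mill.R.
mill_sketch <- function(insensible_p_lo = 99) {
  insensible_p_lo
}
default_before <- mill_sketch()   # uses the original default

# The same line after replacing the 99 that follows the = sign with 98.
mill_sketch <- function(insensible_p_lo = 98) {
  insensible_p_lo
}
default_after <- mill_sketch()    # uses the edited default

c(before = default_before, after = default_after)
```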

Attention

You must click Install and Restart in the Build tab of RStudio for any changes to the code to take effect.

9.3.2. Adding meta-analysis groupings

Upon examining the processed timber, you may wish to group certain factors together for meta-analysis in the raw timber and rerun sawmill.

Attention

Meta-analysis is currently only supported for timber from CEDAR v2.

To add a meta-analysis grouping, make the following changes to the optional meta-analysis fields in the original, raw timber file:

  1. ID_meta: assign the same meta-analysis ID to all factors you wish to include in the grouping
  2. meta_amr: specify the antimicrobial or class of antimicrobials to which resistance is assayed
  3. meta_type: describe the type and level of granularity of the meta-analysis grouping

Tip

The actual meta-analysis ID assigned to a particular grouping is irrelevant, as long as it is consistent across all factors in the grouping.

The table below provides example values for each meta-analysis field, as they might appear for a factor in the raw timber.

Meta-analysis Example
ID_meta meta_amr meta_type
7 third-generation cephalosporin Within Study, Same Antimicrobial Class

All three meta-analysis fields (ID_meta, meta_amr, and meta_type) can simply be left blank for factors that should not be involved in meta-analysis calculations.
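If you would rather script this edit than type it into Excel, the three steps can be sketched in base R as follows. This is only an illustration on a toy data frame: add_meta_group is a hypothetical helper, the factor IDs other than 10180 are made up, and reading from or writing back to the .xlsx file is left out.

```r
# Toy stand-in for the raw timber; only the columns needed here are included.
timber <- data.frame(
  ID_factor = c(10180, 10181, 10182),
  ID_meta   = NA_character_,
  meta_amr  = NA_character_,
  meta_type = NA_character_,
  stringsAsFactors = FALSE
)

# Hypothetical helper: give the chosen factors the same meta-analysis ID,
# antimicrobial (or class), and grouping type. Other rows stay blank (NA).
add_meta_group <- function(timber, factor_ids, id_meta, meta_amr, meta_type) {
  rows <- timber$ID_factor %in% factor_ids
  timber$ID_meta[rows]   <- as.character(id_meta)
  timber$meta_amr[rows]  <- meta_amr
  timber$meta_type[rows] <- meta_type
  timber
}

timber <- add_meta_group(
  timber,
  factor_ids = c(10180, 10182),
  id_meta    = 7,
  meta_amr   = "third-generation cephalosporin",
  meta_type  = "Within Study, Same Antimicrobial Class"
)
timber
```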

9.3.3. Running sawmill

Please see the instructions in the Installation and Use section of the sawmill GitHub repository’s README.md file.

Prompts will appear in the Console as you follow the instructions from GitHub. Enter the information requested by the prompts and select the input timber file from its saved location on your computer.

Once sawmill is finished running, it will prompt you to save one or more output files. For each one, you will be prompted to select the save location on your computer.

Important

Save all output files with .csv extensions to prevent errors from occurring.

If errors or warnings appear, please see the following sub-sections.

Caution

You will likely rerun sawmill many times, as deciding which factors to include in a model is an iterative process. You will need to enter the command rm(list = ls()) into the Console before rerunning sawmill. This must be done once for every rerun. This way, variables saved during sawmill’s previous run will not carry over to the new one.
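The effect of this command can be seen in the small sketch below. It is wrapped in local() purely so that trying it out does not clear your own workspace; in the Console you would enter only the rm(list = ls()) line. The variable name is made up for illustration.

```r
remaining <- local({
  # Stands in for a variable saved during a previous run.
  leftover_from_last_run <- 42
  # The same command you would enter in the Console before a rerun.
  rm(list = ls())
  # Names still defined afterwards.
  ls()
})
length(remaining)
```

Note that ls() does not list objects whose names begin with a dot, so such objects are not removed by this command.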

9.3.4. Errors

Errors will stop sawmill from continuing to run, at whichever point in the pipeline they are raised.

An error message will appear in the Console, indicating which function caused the error. For example, if the error is raised in the build_chairs function, the message will look something like the following:

[Figure: Example error message.]

Please note that only the lines beginning with “Error” constitute the actual error message. Although the “Processed function…” lines are also in red text, they should be present in the case of a normal output (i.e. one without errors or warnings).

Important

In the event of an error, please send the error message and input timber file that produced it to the maintainer of sawmill’s GitHub repository.

9.3.5. Warnings

Warnings alert the user to potential problems with the code or input data.

Some warnings indicate that sawmill may run into an error at a later step in the processing pipeline, or that the current code or input data will produce incorrect output without further warning. Others are harmless, and sawmill will continue to execute normally.

Warnings do not stop the pipeline at the point they are raised, but they are still worth examining.

9.3.5.1. Warnings due to unexpected data types

If sawmill detects that one or more cells in the input timber file do not match the expected data types for their respective columns, a warning message will be generated for each mismatching cell. The warning messages are informative; they specify the exact cell addresses within your input file that contain data of the unexpected type.

These particular warnings will also generate a prompt asking whether you would like to stop the pipeline and fix your input data, or continue with processing anyway.

[Figure: Warning prompt.]

Caution

Electing to continue with processing when faced with this prompt can create unwanted/unexpected results, which you may not receive further warning about.

The type of warning received (Coercing or Expecting) can help you decide whether or not you should continue.

9.3.5.1.1. Coercing warnings

Coercing warnings appear when R is able to convert the affected cell(s) to the appropriate, expected data type(s).

Below is an example of a cell that is likely to produce a coercing warning. This value is in the odds_ratio_up column, so its data type should be numeric. While the value is a number, it is formatted as text (flagged by Excel in the upper left corner of the cell).

[Figure: Example of a cell that produces a coercing warning.]

Warning messages for coercing warnings appear in the Console and look something like that shown below. The Excel cell shown above produced one of these warnings (the one affecting AE524 / R524C31).

[Figure: Coercing warning examples.]
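The warning messages give each cell address in two forms, such as AE524 / R524C31: the familiar Excel A1 style, and the row/column (R1C1) style. The conversion between the two can be sketched in base R as follows; a1_to_r1c1 is a hypothetical helper written for this illustration, not a sawmill function.

```r
# Hypothetical helper: convert an Excel A1-style address to R1C1 style.
a1_to_r1c1 <- function(address) {
  col_letters <- toupper(gsub("[0-9]", "", address))
  row_number  <- as.integer(gsub("[^0-9]", "", address))
  # Read the letters as a base-26 number in which A = 1, ..., Z = 26.
  digits <- match(strsplit(col_letters, "")[[1]], LETTERS)
  col_number <- Reduce(function(acc, d) acc * 26L + d, digits, 0L)
  sprintf("R%dC%d", row_number, col_number)
}

a1_to_r1c1("AE524")  # "R524C31"
a1_to_r1c1("Z2")     # "R2C26"
```

The column number is also the position of the field in the timber header: column 31 is odds_ratio_up, which is why the cell in the example above is expected to be numeric.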

If only coercing warnings are present, you can safely choose to continue with processing when faced with the prompt.

9.3.5.1.2. Expecting warnings

Expecting warnings appear when R is not able to convert the affected cell(s) to the appropriate, expected data type(s).

Below is an example of a cell that is likely to produce an expecting warning. This value is in the prevtable_d column, so its data type should be numeric. However, a text string is present, and it cannot be converted to a numeric data type.

[Figure: Example of a cell that produces an expecting warning.]

Warning messages for expecting warnings appear in the Console and look something like that shown below. The Excel cell shown above produced this warning; it affects cell Z2 / R2C26.

[Figure: Expecting warning example.]
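The difference between the two warning types comes down to whether R can convert the cell’s contents to the expected type. The following base R sketch mimics that distinction for a column that should be numeric. It is a simplification: classify_numeric_cell is a hypothetical helper, the cell values are made up, and sawmill’s own detection happens while the Excel file is being read.

```r
# Hypothetical helper: classify text cells from a column that should be numeric.
classify_numeric_cell <- function(cell) {
  # as.numeric() returns NA (with a warning) for text it cannot convert.
  converted <- suppressWarnings(as.numeric(cell))
  ifelse(is.na(cell), "blank",
         ifelse(is.na(converted), "expecting", "coercing"))
}

cells <- c("4.25", "not reported", NA)
classify_numeric_cell(cells)
```

Here "4.25" is a number stored as text and can be converted (coercing), "not reported" cannot be converted (expecting), and the empty cell is neither.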

The implications of expecting warnings vary depending on the columns in which they occur.

If the affected cell(s) are in any of the columns specified in the table below, you should stop the pipeline and fix the affected cells. These fields have a direct effect on the odds ratio calculation, so in the event of unexpected data types in any of these, sawmill will typically deem the factor unusable, excluding the row from further processing and writing it to the scrap pile without warning.

Columns Which Affect Calculations
CEDAR v1 Field Name CEDAR v2 Field Name
result_format res_format
tbl_a contable_a
tbl_b contable_b
tbl_c contable_c
tbl_d contable_d
tbl_p prevtable_a
(none) prevtable_b
tbl_q prevtable_c
(none) prevtable_d
tbl_m1 table_n_exp
tbl_m2 table_n_ref
factor_or odds_ratio
or_lo odds_ratio_lo
or_up odds_ratio_up

If the affected cell(s) are in any of the other columns, however, sawmill will simply replace the cell with a value of NA. The factor will not be deleted, and the row will still appear in the processed timber. In cases like this, it is up to the user whether or not to continue with processing when faced with the prompt.

Attention

Output fields may still be affected by unexpected data types in these other columns. For instance, the url and html_link output columns are affected by ident_doi (v2)/docID (v1), and sometimes ident_pmid (v2). Also, the identifier output column is affected by ID_factor (v2)/ID (v1) and factor_title (v2)/title (v1).
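To see why a bad ident_doi value carries through to the output, consider how the url and html_link values in the example plank of section 9.4.2 relate to the DOI. The sketch below reproduces that pattern in base R. Both helpers are hypothetical, the treatment of a missing DOI is an assumption, and sawmill’s add_URL and add_HTMLink functions may work differently (for instance, when a PMID is available).

```r
# Hypothetical helpers that mirror the pattern seen in the example plank.
doi_to_url <- function(doi) {
  # Assumption: a missing DOI gives a missing URL.
  ifelse(is.na(doi), NA_character_, paste0("http://dx.doi.org/", doi))
}
url_to_html_link <- function(url) {
  ifelse(is.na(url), NA_character_,
         paste0('<a href="', url, '">Click Here</a>'))
}

# One valid DOI, and one cell that was replaced with NA.
url <- doi_to_url(c("10.1089/fpd.2005.2.304", NA))
html_link <- url_to_html_link(url)
data.frame(url = url, html_link = html_link)
```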

9.3.5.2. Other warnings

Every time you execute sawmill, you will likely see a message resembling the following in the Console, once the pipeline has finished and you have saved your processed timber.

[Figure: Generic warnings alert.]

If you follow the prompt by entering the following into the Console:

warnings()

you will see something closely resembling the following:

[Figure: Generic warning messages.]

This type of warning can be ignored. It occurs when the significance value (p-value) for the factor is calculated using Fisher’s exact test. Because the values passed to the test must be integers, any non-integer values are rounded to the nearest integer, and a warning is generated to notify the user that the rounding took place.
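The rounding warning can be reproduced directly with fisher.test from base R’s stats package. The sketch below passes a 2x2 table with non-integer cells and captures the warning text. The cell values are made up for illustration, and nothing here is taken from sawmill’s own code.

```r
# Illustrative 2x2 table with non-integer cells (values are made up).
tbl <- matrix(c(17.5, 26.5, 5.5, 0.5), nrow = 2, byrow = TRUE)

# Capture the warning text instead of leaving it for warnings() to report.
rounding_warning <- NULL
result <- withCallingHandlers(
  fisher.test(tbl),
  warning = function(w) {
    rounding_warning <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

rounding_warning   # states that 'x' has been rounded to integer
result$p.value     # the test still returns a p-value
```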

Attention

If the warning messages are of any other nature than those mentioned, please contact the maintainer of sawmill’s GitHub repository for assistance.

9.4. Evaluating the Processed Timber (Planks) and Other Outputs

This section outlines the fields that will be present in the processed timber .csv file. Each row now represents a plank of processed timber, or a factor usable for an iAM.AMR model.

An overview of additional output .csv files that may be produced is also provided.

9.4.1. The output .csv files

9.4.1.1. Processed timber

A processed timber file is produced for each successful run of sawmill.

Two types of planks (rows) are present in the following order, from top to bottom:

  1. Error-free factors for which an odds ratio and other outputs were successfully calculated
  2. Meta-analysis results for each meta-analysis grouping (each unique meta-analysis ID)

Note

Rows containing the results of a meta-analysis will look slightly different (for instance, some fields may have values of NA).

9.4.1.2. Scrap pile

This file is only provided as an output if there is at least one erroneous factor in the raw timber.

The scrap pile contains all erroneous factors, that is, factors for which an odds ratio and other key outputs were not successfully calculated.

Its fields are overall quite similar to those present in the raw timber, with two unique additions:

  1. exclude_sawmill: Flagged as TRUE, indicating that the factor was excluded from calculations by sawmill due to errors/missing data
  2. exclude_sawmill_reason: A more detailed description of why the factor was not usable

9.4.1.3. Full meta-analysis results

This file is only provided as an output if there is at least one meta-analysis grouping in the raw timber.

Each row represents the results from a single meta-analysis grouping, indicated by the value of ID_meta in the far-left column.

The main estimates produced by the meta-analysis calculation (odds ratio, standard error of the log(odds ratio), and p-value) are included in the processed timber. However, the full results produced by metafor (the meta-analysis R package used by sawmill) contain many more fields describing other parameters of the calculation.

For a full description of these parameters, please see pg. 241 of the metafor user guide, which is the Value list for rma.uni.

9.4.2. Planks

The following table is an example of processed timber.

While all fields present in the input timber are retained in the output, some will have new names. Sawmill renames some of the fields to improve uniformity between v1 and v2 outputs.

Example Output (header row and one plank, shown transposed: each line gives a field name and its example value, in the left-to-right column order of the file)
ID: 10002
RWID: 10723
identifier: R10002_Apramaycin_su
factor_title: Apramaycin sulfate, carbadox, and chlortetracycline hydroxchloride use
factor_description: All used as feed additives. Apramaycin sulfate: 150 g/ton as Apralan 7; 5 lb/pig at weaning. Carbadox: 50 g/ton as Mecadox 2.5; about 15 lb/pig. Chlortetracycline hydroxchloride: 250 g/ton; 14 days ad libitum.
ref_key: Kim2005
html_link: <a href="http://dx.doi.org/10.1089/fpd.2005.2.304">Click Here</a>
group_exposed: Apramaycin sulfate, carbadox, and chlortetracycline hydroxchloride use
group_referent: No use
odds_ratio: 0.986111111
se_log_or: 0.181571417
pval: 1
logOR: -0.013986242
ID_meta: NA
meta_amr: NA
meta_type: NA
AMR: tetracycline
host_01: Swine
host_02: Piglets
microbe_01: Escherichia
microbe_02: coli
stage_allocate: Farm
stage_observe: Farm
res_unit: Isolate
res_format: Prevalence Table
grain: prev_table_pos_tot
A: 142
B: 108
C: 140
D: 105
P: 56.6
R: 26.3
Q: 57.2
S: 38
nexp: 250
nref: 245
odds: NA
oddslo: NA
oddsup: NA
oddsig: NA
oddsci: 95
low_cell_count: FALSE
null_comparison: FALSE
insensible_prev_table: TRUE
doi: 10.1089/fpd.2005.2.304
pmid: NA
url: http://dx.doi.org/10.1089/fpd.2005.2.304
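The odds_ratio, se_log_or, and logOR values in the example plank can be reproduced from its A, B, C, and D counts with the textbook formulas for a 2x2 table. The following base R sketch does this. It is offered only as a check of the arithmetic: the variable names are made up, and although the formulas match the example values, sawmill’s internal implementation may differ (for instance, when a low cell count correction applies).

```r
# Contingency table counts from the example plank (fields A, B, C, and D).
amr_pos_exp <- 142   # A: AMR+ in exposed group
amr_neg_exp <- 108   # B: AMR- in exposed group
amr_pos_ref <- 140   # C: AMR+ in referent group
amr_neg_ref <- 105   # D: AMR- in referent group

# Cross-product odds ratio, its natural log, and the standard error of the log.
odds_ratio <- (amr_pos_exp * amr_neg_ref) / (amr_neg_exp * amr_pos_ref)
logOR      <- log(odds_ratio)
se_log_or  <- sqrt(1 / amr_pos_exp + 1 / amr_neg_exp +
                   1 / amr_pos_ref + 1 / amr_neg_ref)

round(c(odds_ratio = odds_ratio, se_log_or = se_log_or, logOR = logOR), 9)
```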

A description of each output field is provided below. The fields which are added by sawmill and thus only appear in the processed timber are also annotated with the function responsible for adding them.

Tip

The odds_ratio, se_log_or, and pval fields are added by the do_MA function in cases where the row contains the results of a meta-analysis.

Tip

The logOR field is only added if there is at least one meta-analysis grouping (one unique meta-analysis ID) in the raw timber.

Output Specification
Field Name Field Description Added to Output by Function
ID ID of the factor  
RWID RefWorks ID for the reference  
identifier Unique identifier for this factor, for Analytica add_ident
factor_title Factor title  
factor_description Factor description  
ref_key First author + publication date of reference  
html_link HTML link to the reference study add_HTMLink
group_exposed Description of the exposed group  
group_referent Description of the referent group  
odds_ratio Final odds ratio (either copied from odds field if provided, or calculated) build_chairs
se_log_or Standard error of the log(odds ratio) build_chairs
pval Significance value (p-value, either copied from oddsig field if the grain is odds_ratio, or calculated for other grains) build_horse
logOR Log of the final odds ratio do_MA
ID_meta Meta-analysis: ID of the meta-analysis group to which the factor belongs (if applicable)  
meta_amr Meta-analysis: antimicrobial/class of antimicrobials to which resistance was assayed in the factors included in this particular meta-analysis group (if applicable)  
meta_type Meta-analysis: type/level of granularity of this particular meta-analysis group (e.g. within studies, across studies) (if applicable)
AMR Antimicrobial assayed for resistance for the factor in question
host_01 Host involved (e.g. cattle, swine)
host_02 Specific host involved (e.g. dairy cows, beef calves)
microbe_01 Microbe genus involved (e.g. Escherichia, Salmonella)
microbe_02 Microbe species involved (e.g. coli, spp.)
stage_allocate Stage of production at which the intervention to the exposed group is made in the study  
stage_observe Stage of production at which the intervention’s effect is measured in the study  
res_unit Unit to which the results apply (e.g. flock, isolate, pooled isolate)
res_format Set of result fields available in the study (e.g. Contingency Table, Prevalence Table)
grain Set of result fields provided in the study (e.g. if A, B, C, and D are provided, the grain is con_table_pos_neg) check_grain
A Contingency Table: # AMR+ in exposed group  
B Contingency Table: # AMR- in exposed group  
C Contingency Table: # AMR+ in referent group  
D Contingency Table: # AMR- in referent group  
P Prevalence Table: % AMR+ in exposed group  
R Prevalence Table: % AMR- in exposed group  
Q Prevalence Table: % AMR+ in referent group  
S Prevalence Table: % AMR- in referent group  
nexp Total # [res_unit] in exposed group  
nref Total # [res_unit] in referent group  
odds Odds Ratio: Odds ratio value for the factor  
oddslo Odds Ratio: Lower bound of the confidence interval  
oddsup Odds Ratio: Upper bound of the confidence interval  
oddsig Odds Ratio: Significance value (p-value)  
oddsci Odds Ratio: Confidence level of the interval (e.g. 95%) add_CI
low_cell_count If TRUE, at least one of A, B, C, or D is less than or equal to the low_cell_threshold and a correction factor was applied to A, B, C, and D build_table
null_comparison If TRUE, both A and C are equal to 0 (meaning that no AMR+ observations were made) build_table
insensible_prev_table If TRUE, the grain is prev_table_pos_tot and the values in the prevalence table do not add up to 100 where they should polish_table
doi DOI for the reference  
pmid PMID for the reference  
url URL to the reference study add_URL

9.4.3. Checking the validation fields

The validation fields (low_cell_count, null_comparison, and insensible_prev_table) are present in the processed timber file.

9.4.3.1. Low cell count factors

When one or more of the four values in the 2x2 contingency table is equal to zero (more generally, less than or equal to the low_cell_threshold argument, as noted in the output specification), sawmill sets the low_cell_count field to TRUE. To avoid divide-by-zero errors, sawmill increments all four values by 0.5.

9.4.3.2. Null comparison factors

When the number of AMR+ observations in both the exposed and referent groups is equal to zero, sawmill sets the null_comparison field to TRUE. To avoid divide-by-zero errors, sawmill increments all four values by 0.5.

Any null comparison factors also have the low_cell_count field set to TRUE.
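The two flags and the 0.5 correction can be sketched in base R as follows. This is an illustration under stated assumptions rather than sawmill’s own code: flag_and_correct is a hypothetical helper, the default low_cell_threshold of 0 is assumed, and the example counts are made up.

```r
# Hypothetical helper: set both validation flags and apply the 0.5 correction.
flag_and_correct <- function(pos_exp, neg_exp, pos_ref, neg_ref,
                             low_cell_threshold = 0) {  # assumed default
  cells <- c(A = pos_exp, B = neg_exp, C = pos_ref, D = neg_ref)
  low_cell_count  <- any(cells <= low_cell_threshold)
  null_comparison <- pos_exp == 0 && pos_ref == 0
  # The correction is applied to all four cells, not only the low ones.
  if (low_cell_count) cells <- cells + 0.5
  odds_ratio <- unname((cells["A"] * cells["D"]) / (cells["B"] * cells["C"]))
  list(cells = cells, low_cell_count = low_cell_count,
       null_comparison = null_comparison, odds_ratio = odds_ratio)
}

one_zero  <- flag_and_correct(17, 26, 5, 0)    # only D is zero
null_case <- flag_and_correct(0, 10, 0, 12)    # no AMR+ in either group

one_zero$cells
c(low = null_case$low_cell_count, null = null_case$null_comparison)
```

Without the correction, the second example would give 0/0 for the odds ratio; with it, the odds ratio is finite.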

9.4.3.3. CEDAR v2: factors with an insensible_prev_table

Check your output .csv file for rows where the insensible_prev_table field is set to TRUE. These rows likely have data entry errors in the prevalence table columns, as this result indicates that (% AMR+ exposed) + (% AMR- exposed) does not come to approximately 100, and/or that (% AMR+ referent) + (% AMR- referent) does not come to approximately 100.
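The check described above can be sketched in base R as follows. The helper is hypothetical. The name insensible_p_lo and its default of 99 come from section 9.3.1, but its use as the lower bound on the sum is an assumption, and insensible_p_hi with a value of 101 is an assumed upper bound introduced for this sketch. The example values are the P, R, Q, and S fields of the example plank in section 9.4.2.

```r
# Hypothetical helper: TRUE if a pair of percentages does not sum to roughly 100.
is_insensible <- function(pct_pos, pct_neg,
                          insensible_p_lo = 99,     # assumed lower bound on the sum
                          insensible_p_hi = 101) {  # assumed upper bound on the sum
  total <- pct_pos + pct_neg
  total < insensible_p_lo | total > insensible_p_hi
}

# Example plank: P = 56.6, R = 26.3 (exposed); Q = 57.2, S = 38 (referent).
exposed_flag  <- is_insensible(56.6, 26.3)   # sum is 82.9
referent_flag <- is_insensible(57.2, 38)     # sum is 95.2

# The row is flagged if either group fails the check.
insensible_prev_table <- exposed_flag | referent_flag
insensible_prev_table
```

Both groups fail the check here, which agrees with the TRUE value of insensible_prev_table in the example plank.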