9. Processing CEDAR Exports¶

9.1. Overview¶

This section provides instructions for processing CEDAR exports (queries, timber), so that they can be used to populate the iAM.AMR models.

This processing is performed using the sawmill R package. If you are not familiar with sawmill, please review the section on sawmill, and install it as per the instructions on that page before continuing.

Tip

This section should be read concurrently with the last step of your chosen installation procedure (Bootstrap or Standard); please see the Installation and Use section of the README file in the sawmill GitHub repository.

9.2. Raw Timber¶

CEDAR timber should be in the form of an Excel (.xlsx) file, where each row represents an individual factor.

The following table is an example of a properly formatted input timber file (header row and one example factor row are shown).

Timber Example¶
RWID	ident_doi	ident_pmid	name_bibtex	ID_factor	AMR	factor_title	factor_description	host_01	host_02	microbe_01	microbe_02	stage_allocate	stage_observe	group_exposed	group_referent	res_format	res_unit	contable_a	contable_b	contable_c	contable_d	prevtable_a	prevtable_b	prevtable_c	prevtable_d	table_n_exp	table_n_ref	odds_ratio	odds_ratio_lo	odds_ratio_up	odds_ratio_sig	odds_ratio_confidence	ID_meta	meta_amr	meta_type
13826	10.1016/j.vetmic.2007.05.025		Thakur2007	10180	tetracycline	Antimicrobial-free production system type	Intensive (indoor) vs extensive (outdoor) production system type. Indoor = single pen (space of 2.4 ft^2/pig at nursery and 7.4 ft^2/pig at finishing). Outdoor = uncovered barricaded area. Isolates taken at pre-evisceration stage.	Swine	Carcass	Salmonella	spp.	Farm	Abattoir	Outdoor	Indoor	Contingency Table	Isolate	17		5						43	5					95

Attention

The left-to-right order and names of the fields in your input file must match that shown above exactly, otherwise sawmill will raise an error.
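
Before running sawmill, it can help to check the header row yourself. The following is a minimal sketch in base R, not part of sawmill: the helper name check_timber_header is hypothetical, and the expected names are copied from the Timber Example header row above.

```r
# Expected field names, in the order given in the Timber Example header row.
expected_fields <- c(
  "RWID", "ident_doi", "ident_pmid", "name_bibtex", "ID_factor", "AMR",
  "factor_title", "factor_description", "host_01", "host_02",
  "microbe_01", "microbe_02", "stage_allocate", "stage_observe",
  "group_exposed", "group_referent", "res_format", "res_unit",
  "contable_a", "contable_b", "contable_c", "contable_d",
  "prevtable_a", "prevtable_b", "prevtable_c", "prevtable_d",
  "table_n_exp", "table_n_ref", "odds_ratio", "odds_ratio_lo",
  "odds_ratio_up", "odds_ratio_sig", "odds_ratio_confidence",
  "ID_meta", "meta_amr", "meta_type"
)

# Hypothetical helper (not part of sawmill): report every position where the
# header of a timber file departs from the expected names or order.
check_timber_header <- function(field_names) {
  n <- max(length(field_names), length(expected_fields))
  found  <- field_names[seq_len(n)]       # padded with NA if too short
  wanted <- expected_fields[seq_len(n)]   # padded with NA if too long
  mismatch <- is.na(found) | is.na(wanted) | found != wanted
  data.frame(position = which(mismatch),
             expected = wanted[mismatch],
             found    = found[mismatch],
             stringsAsFactors = FALSE)
}

# Example: two adjacent columns swapped by mistake.
swapped <- expected_fields
swapped[c(19, 20)] <- swapped[c(20, 19)]
check_timber_header(swapped)
```

An empty result means the header matches. In practice the field names could come from the column names of the imported file, for example names(readxl::read_excel(path)), assuming the readxl package is installed.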

Each field has an expected data type, as dictated below. A description of each field is also provided.

Timber Specification¶
Field Name	Expected Data Type	Field Description
RWID	text	RefWorks ID for the reference
ident_doi	text	DOI for the reference
ident_pmid	text	PMID for the reference
name_bibtex	text	First author + publication date of reference
ID_factor	numeric	ID of the factor
AMR	text	Antimicrobial assayed for resistance for the factor in question
factor_title	text	Factor title
factor_description	text	Factor description
host_01	text	Host involved (e.g. cattle, swine)
host_02	text	Specific host involved (e.g. dairy cows, beef calves)
microbe_01	text	Microbe genus involved (e.g. Escherichia, Salmonella)
microbe_02	text	Microbe species involved (e.g. coli, spp.)
stage_allocate	text	Stage of production at which the intervention to the exposed group is made in the study
stage_observe	text	Stage of production at which the intervention’s effect is measured in the study
group_exposed	text	Description of the exposed group
group_referent	text	Description of the referent group
res_format	text	Set of result fields available in the study (e.g. Contingency Table, Prevalence Table)
res_unit	text	Unit to which the results apply (e.g. flock, isolate, pooled isolate)
contable_a	numeric	Contingency Table: # AMR+ in exposed group
contable_b	numeric	Contingency Table: # AMR- in exposed group
contable_c	numeric	Contingency Table: # AMR+ in referent group
contable_d	numeric	Contingency Table: # AMR- in referent group
prevtable_a	numeric	Prevalence Table: % AMR+ in exposed group
prevtable_b	numeric	Prevalence Table: % AMR- in exposed group
prevtable_c	numeric	Prevalence Table: % AMR+ in referent group
prevtable_d	numeric	Prevalence Table: % AMR- in referent group
table_n_exp	numeric	Total # [res_unit] in exposed group
table_n_ref	numeric	Total # [res_unit] in referent group
odds_ratio	numeric	Odds Ratio: Odds ratio value for the factor
odds_ratio_lo	numeric	Odds Ratio: Lower bound of the confidence interval
odds_ratio_up	numeric	Odds Ratio: Upper bound of the confidence interval
odds_ratio_sig	text	Odds Ratio: Significance value (p-value)
odds_ratio_confidence	numeric	Odds Ratio: Confidence level of the interval (e.g. 95%)
ID_meta	text	Meta-analysis: ID of the meta-analysis group to which the factor belongs (if applicable)
meta_amr	text	Meta-analysis: antimicrobial/class of antimicrobials to which resistance was assayed in the factors included in this particular meta-analysis group (if applicable)
meta_type	text	Meta-analysis: type/level of granularity of this particular meta-analysis group (e.g. within studies, across studies) (if applicable)

Attention

The type of data contained within each of the fields in your input file should match those outlined above, as processing errors can occur otherwise. Please see Warnings due to unexpected data types for more information.
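
As a rough pre-check for the numeric fields, the sketch below classifies each value in a column as blank, convertible to a number, or not convertible. It is written in base R and is not sawmill's own check; the function name classify_numeric_cells is hypothetical. Values that are numbers stored as text fall in the convertible group, while free text does not.

```r
# Hypothetical pre-check (not sawmill's own implementation): classify each
# value in a column that is expected to be numeric.
classify_numeric_cells <- function(values) {
  values <- as.character(values)
  blank <- is.na(values) | trimws(values) == ""
  # as.numeric() returns NA (with a warning) for text it cannot parse.
  converted <- suppressWarnings(as.numeric(values))
  status <- ifelse(blank, "blank",
                   ifelse(is.na(converted), "not convertible", "convertible"))
  data.frame(row = seq_along(values), value = values, status = status,
             stringsAsFactors = FALSE)
}

# Made-up values, as they might look if odds_ratio_up were imported as text.
classify_numeric_cells(c("1.42", "", "n/a", "3"))
```

Rows reported as "not convertible" are the ones worth fixing in the input file before processing.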

9.3. Using sawmill¶

9.3.1. Changing default values of sawmill arguments¶

Tip

This sub-section is optional if you have chosen the Bootstrap installation.

Complete descriptions of these arguments and guides as to how they should be changed can be found in the Sawmill Arguments section of the sawmill GitHub repository’s README.md file.

To change these arguments, open start_mill.R and mill.R. In each script, the default values are specified in a single line of code, as shown for mill.R in the following figure.

Default arguments in sawmill’s mill.R script.

The argument values can be changed directly in this line of code. For example, if you wanted to change the argument insensible_p_lo to 98, simply replace the 99 after the = sign with 98.

Attention

You must click Install and Restart in the Build tab of RStudio for any changes to the code to take effect.

9.3.2. Adding meta-analysis groupings¶

Upon examining the processed timber, you may wish to group certain factors together for meta-analysis in the raw timber and rerun sawmill.

Attention

Meta-analysis is currently only supported for timber from CEDAR v2.

To add a meta-analysis grouping, make the following changes to the optional meta-analysis fields in the original, raw timber file:

ID_meta: assign the same meta-analysis ID to all factors you wish to include in the grouping
meta_amr: specify the antimicrobial or class of antimicrobials to which resistance is assayed
meta_type: describe the type and level of granularity of the meta-analysis grouping

Tip

The actual meta-analysis ID assigned to a particular grouping is irrelevant, as long as it is consistent across all factors in the grouping.

The table below provides example values for each meta-analysis field, as they might appear for a factor in the raw timber.

Meta-analysis Example¶
ID_meta	meta_amr	meta_type
7	third-generation cephalosporin	Within Study, Same Antimicrobial Class

All three meta-analysis fields (ID_meta, meta_amr, and meta_type) can simply be left blank for factors that should not be involved in meta-analysis calculations.
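
The meta-analysis fields are normally edited directly in the Excel file. For readers who prefer to prepare the timber in R first, the following sketch shows the same edit on a data frame. The helper assign_meta_group and the factor IDs are hypothetical; the field values are those of the Meta-analysis Example above.

```r
# Minimal stand-in for raw timber; only the fields needed here are shown.
timber <- data.frame(
  ID_factor = c(101, 102, 103, 104),   # made-up factor IDs
  ID_meta   = NA_character_,
  meta_amr  = NA_character_,
  meta_type = NA_character_,
  stringsAsFactors = FALSE
)

# Hypothetical helper: place a set of factors in one meta-analysis grouping
# by giving them the same ID_meta, meta_amr, and meta_type.
assign_meta_group <- function(timber, factor_ids, id_meta, meta_amr, meta_type) {
  in_group <- timber$ID_factor %in% factor_ids
  timber$ID_meta[in_group]   <- as.character(id_meta)
  timber$meta_amr[in_group]  <- meta_amr
  timber$meta_type[in_group] <- meta_type
  timber
}

timber <- assign_meta_group(
  timber, factor_ids = c(101, 103), id_meta = 7,
  meta_amr  = "third-generation cephalosporin",
  meta_type = "Within Study, Same Antimicrobial Class"
)

# Factors left out of the grouping keep blank (NA) meta-analysis fields.
timber
```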

9.3.3. Running sawmill¶

Please see the instructions in the Installation and Use section of the sawmill GitHub repository’s README.md file.

Prompts will appear in the Console as you follow the instructions from GitHub. Enter the information requested by the prompts and select the input timber file from its saved location on your computer.

Once sawmill is finished running, it will prompt you to save one or more output files. For each one, you will be prompted to select the save location on your computer.

Important

Save all output files with .csv extensions to prevent errors from occurring.

If errors or warnings appear, please see the following sub-sections.

Caution

You will likely rerun sawmill many times, as deciding which factors to include in a model is an iterative process. You will need to enter the command rm(list = ls()) into the Console before rerunning sawmill. This must be done once for every rerun. This way, variables saved during sawmill’s previous run will not carry over to the new one.

9.3.4. Errors¶

Errors will stop sawmill from continuing to run, at whichever point in the pipeline they are raised.

An error message will appear in the Console, indicating which function caused the error. For example, if the error is raised in the build_chairs function, the message will look something like the following:

Example error message.

Please note that only the lines beginning with “Error” constitute the actual error message. Although the “Processed function…” lines are also in red text, they should be present in the case of a normal output (i.e. one without errors or warnings).

Important

In the event of an error, please send the error message and input timber file that produced it to the maintainer of sawmill’s GitHub repository.

9.3.5. Warnings¶

Warnings alert the user to potential problems with the code or input data.

Some warnings indicate that sawmill may run into an error at a later step in the processing pipeline, or that the current code or input data will produce an incorrect output without further warning. Others may mean nothing, and sawmill may continue to execute flawlessly.

Warnings do not stop the pipeline at the point they are raised, but they are still worth examining.

9.3.5.1. Warnings due to unexpected data types¶

If sawmill detects that one or more cells in the input timber file do not match the expected data types for their respective columns, a warning message will be generated for each mismatching cell. The warning messages are informative; they specify the exact cell addresses within your input file that contain data of the unexpected type.

These particular warnings will also generate a prompt asking whether you would like to stop the pipeline and fix your input data, or continue with processing anyway.

Warning prompt.

Caution

Electing to continue with processing when faced with this prompt can create unwanted/unexpected results, which you may not receive further warning about.

The type of warning received (Coercing or Expecting) can help you decide whether or not you should continue.

9.3.5.1.1. Coercing warnings¶

Coercing warnings appear when R is able to convert the affected cell(s) to the appropriate, expected data type(s).

Below is an example of a cell that is likely to produce a coercing warning. This value is in the odds_ratio_up column, so its data type should be numeric. While the value is a number, it is formatted as text (flagged by Excel in the upper left corner of the cell).

Example of a cell that produces a coercing warning.

Warning messages for coercing warnings appear in the Console and look something like that shown below. The Excel cell shown above produced one of these warnings (the one affecting AE524 / R524C31).

Coercing warning examples.
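
The warning messages give each cell address in two notations, such as AE524 / R524C31. If you only have the row and column numbers to hand, a small helper can translate them into the letter-based address Excel displays. This is an illustrative sketch in base R; the function names are hypothetical and not part of sawmill.

```r
# Convert a 1-based column number to Excel column letters (1 -> "A", 27 -> "AA").
col_to_letters <- function(col) {
  letters_out <- character(0)
  while (col > 0) {
    remainder <- (col - 1) %% 26
    letters_out <- c(LETTERS[remainder + 1], letters_out)
    col <- (col - 1) %/% 26
  }
  paste(letters_out, collapse = "")
}

# Convert a row/column address such as "R524C31" to the letter-based form.
r1c1_to_a1 <- function(address) {
  parts <- regmatches(address, regexec("^R([0-9]+)C([0-9]+)$", address))[[1]]
  if (length(parts) != 3) stop("not a row/column address: ", address)
  paste0(col_to_letters(as.integer(parts[3])), parts[2])
}

r1c1_to_a1("R524C31")  # "AE524"
r1c1_to_a1("R2C26")    # "Z2"
```

Column 31 is odds_ratio_up and column 26 is prevtable_d in the v2 timber layout, which is how an address can be traced back to a field name.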

If only coercing warnings are present, you can safely choose to continue with processing when faced with the prompt.

9.3.5.1.2. Expecting warnings¶

Expecting warnings appear when R is not able to convert the affected cell(s) to the appropriate, expected data type(s).

Below is an example of a cell that is likely to produce an expecting warning. This value is in the prevtable_d column, so its data type should be numeric. However, a text string is present, and it cannot be converted to a numeric data type.

Example of a cell that produces an expecting warning.

Warning messages for expecting warnings appear in the Console and look something like that shown below. The Excel cell shown above produced this warning; it affects cell Z2 / R2C26.

Expecting warning example.

The implications of expecting warnings vary depending on the columns in which they occur.

If the affected cell(s) are in any of the columns specified in the table below, you should stop the pipeline and fix the affected cells. These fields have a direct effect on the odds ratio calculation, so in the event of unexpected data types in any of these, sawmill will typically deem the factor unusable, excluding the row from further processing and writing it to the scrap pile without warning.

Columns Which Affect Calculations¶
CEDAR v1 Field Name	CEDAR v2 Field Name
result_format	res_format
tbl_a	contable_a
tbl_b	contable_b
tbl_c	contable_c
tbl_d	contable_d
tbl_p	prevtable_a
	prevtable_b
tbl_q	prevtable_c
	prevtable_d
tbl_m1	table_n_exp
tbl_m2	table_n_ref
factor_or	odds_ratio
or_lo	odds_ratio_lo
or_up	odds_ratio_up

If the affected cell(s) are in any of the other columns, however, sawmill will simply replace the cell with a value of NA. The factor will not be deleted, and the row will still appear in the processed timber. In cases like this, it is up to the user whether or not to continue with processing when faced with the prompt.

Attention

Output fields may still be affected by unexpected data types in these other columns. For instance, the url and html_link output columns are affected by ident_doi (v2)/docID (v1), and sometimes ident_pmid (v2). Also, the identifier output column is affected by ID_factor (v2)/ID (v1) and factor_title (v2)/title (v1).

9.3.5.2. Other warnings¶

Every time you execute sawmill, you will likely see a message resembling the following in the Console, once the pipeline has finished and you have saved your processed timber.

Generic warnings alert.

If you follow the prompt by entering the following into the Console:

warnings()

You will see something closely resembling the following:

Generic warning messages.

This type of warning can be ignored. It occurs when the significance value (p-value) for the factor is calculated using Fisher’s exact test. Since the values used in Fisher’s exact test must be rounded to the nearest integer, a warning is generated to notify the user that the rounding took place.
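
The behaviour can be reproduced outside sawmill with base R's fisher.test, which rounds non-integer cell values and issues a warning when it does so. The sketch below uses a made-up 2x2 table and captures the warning rather than letting it print; it illustrates the mechanism only and is not a copy of how sawmill calls the test.

```r
# A made-up 2x2 table with non-integer cells.
tbl <- matrix(c(17.5, 5.5, 26.5, 0.5), nrow = 2,
              dimnames = list(group = c("exposed", "referent"),
                              AMR   = c("pos", "neg")))

# Capture the warning instead of letting it print at the end of the run.
caught <- NULL
result <- withCallingHandlers(
  fisher.test(tbl),
  warning = function(w) {
    caught <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

caught          # message noting that the table was rounded to integers
result$p.value  # p-value computed from the rounded table
```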

Attention

If the warning messages are of any other nature than those mentioned, please contact the maintainer of sawmill’s GitHub repository for assistance.

9.4. Evaluating the Processed Timber (Planks) and Other Outputs¶

This section outlines the fields that will be present in the processed timber .csv file. Each row now represents a plank of processed timber, or a factor usable for an iAM.AMR model.

An overview of additional output .csv files that may be produced is also provided.

9.4.1. The output .csv files¶

9.4.1.1. Processed timber¶

A processed timber file is produced for each successful run of sawmill.

Two types of planks (rows) are present in the following order, from top to bottom:

Error-free factors for which an odds ratio and other outputs were successfully calculated
Meta-analysis results for each meta-analysis grouping (each unique meta-analysis ID)

Note

Rows containing the results of a meta-analysis will look slightly different (for instance, some fields may have values of NA).

9.4.1.2. Scrap pile¶

This file is only provided as an output if there is at least one erroneous factor in the raw timber.

The scrap pile contains all erroneous factors, that is, factors for which an odds ratio and other key outputs were not successfully calculated.

Its fields are overall quite similar to those present in the raw timber, with two unique additions:

exclude_sawmill: Flagged as TRUE, indicating that the factor was excluded from calculations by sawmill due to errors/missing data
exclude_sawmill_reason: A more detailed description of why the factor was not usable

9.4.1.3. Full meta-analysis results¶

This file is only provided as an output if there is at least one meta-analysis grouping in the raw timber.

Each row represents the results from a single meta-analysis grouping, indicated by the value of ID_meta in the far-left column.

The main estimates produced by the meta-analysis calculation (odds ratio, standard error of the log(odds ratio), and p-value) are included in the processed timber. However, the full results produced by metafor (the meta-analysis R package used by sawmill) contain many more fields describing other parameters of the calculation.

For a full description of these parameters, please see pg. 241 of the metafor user guide, which is the Value list for rma.uni.

9.4.2. Planks¶

The following table is an example of processed timber.

While all fields present in the input timber are retained in the output, some will have new names. Sawmill renames some of the fields to improve uniformity between v1 and v2 outputs.

Example Output¶
ID	RWID	identifier	factor_title	factor_description	ref_key	html_link	group_exposed	group_referent	odds_ratio	se_log_or	pval	logOR	ID_meta	meta_amr	meta_type	AMR	host_01	host_02	microbe_01	microbe_02	stage_allocate	stage_observe	res_unit	res_format	grain	A	B	C	D	P	R	Q	S	nexp	nref	odds	oddslo	oddsup	oddsig	oddsci	low_cell_count	null_comparison	insensible_prev_table	doi	pmid	url
10002	10723	R10002_Apramaycin_su	Apramaycin sulfate, carbadox, and chlortetracycline hydroxchloride use	All used as feed additives. Apramaycin sulfate: 150 g/ton as Apralan 7; 5 lb/pig at weaning. Carbadox: 50 g/ton as Mecadox 2.5; about 15 lb/pig. Chlortetracycline hydroxchloride: 250 g/ton; 14 days ad libitum.	Kim2005	<a href="http://dx.doi.org/10.1089/fpd.2005.2.304">Click Here</a>	Apramaycin sulfate, carbadox, and chlortetracycline hydroxchloride use	No use	0.986111111	0.181571417	1	-0.013986242	NA	NA	NA	tetracycline	Swine	Piglets	Escherichia	coli	Farm	Farm	Isolate	Prevalence Table	prev_table_pos_tot	142	108	140	105	56.6	26.3	57.2	38	250	245	NA	NA	NA	NA	95	FALSE	FALSE	TRUE	10.1089/fpd.2005.2.304	NA	http://dx.doi.org/10.1089/fpd.2005.2.304

A description of each output field is provided below. The fields which are added by sawmill and thus only appear in the processed timber are also annotated with the function responsible for adding them.

Tip

The odds_ratio, se_log_or, and pval fields are added by the do_MA function in cases where the row contains the results of a meta-analysis.

Tip

The logOR field is only added if there is at least one meta-analysis grouping (one unique meta-analysis ID) in the raw timber.

Output Specification¶
Field Name	Field Description	Added to Output by Function
ID	ID of the factor
RWID	RefWorks ID for the reference
identifier	Unique identifier for this factor, for Analytica	add_ident
factor_title	Factor title
factor_description	Factor description
ref_key	First author + publication date of reference
html_link	HTML link to the reference study	add_HTMLink
group_exposed	Description of the exposed group
group_referent	Description of the referent group
odds_ratio	Final odds ratio (either copied from odds field if provided, or calculated)	build_chairs
se_log_or	Standard error of the log(odds ratio)	build_chairs
pval	Significance value (p-value, either copied from oddsig field if the grain is odds_ratio, or calculated for other grains)	build_horse
logOR	Log of the final odds ratio	do_MA
ID_meta	Meta-analysis: ID of the meta-analysis group to which the factor belongs (if applicable)
meta_amr	Meta-analysis: antimicrobial/class of antimicrobials to which resistance was assayed in the factors included in this particular meta-analysis group (if applicable)
meta_type	Meta-analysis: type/level of granularity of this particular meta-analysis group (e.g. within studies, across studies) (if applicable)
AMR	Antimicrobial assayed for resistance for the factor in question
host_01	Host involved (e.g. cattle, swine)
host_02	Specific host involved (e.g. dairy cows, beef calves)
microbe_01	Microbe genus involved (e.g. Escherichia, Salmonella)
microbe_02	Microbe species involved (e.g. coli, spp.)
stage_allocate	Stage of production at which the intervention to the exposed group is made in the study
stage_observe	Stage of production at which the intervention’s effect is measured in the study
res_unit	Unit to which the results apply (e.g. flock, isolate, pooled isolate)
res_format	Set of result fields available in the study (e.g. Contingency Table, Prevalence Table)
grain	Set of result fields provided in the study (e.g. if A, B, C, and D are provided, the grain is con_table_pos_neg)	check_grain
A	Contingency Table: # AMR+ in exposed group
B	Contingency Table: # AMR- in exposed group
C	Contingency Table: # AMR+ in referent group
D	Contingency Table: # AMR- in referent group
P	Prevalence Table: % AMR+ in exposed group
R	Prevalence Table: % AMR- in exposed group
Q	Prevalence Table: % AMR+ in referent group
S	Prevalence Table: % AMR- in referent group
nexp	Total # [res_unit] in exposed group
nref	Total # [res_unit] in referent group
odds	Odds Ratio: Odds ratio value for the factor
oddslo	Odds Ratio: Lower bound of the confidence interval
oddsup	Odds Ratio: Upper bound of the confidence interval
oddsig	Odds Ratio: Significance value (p-value)
oddsci	Odds Ratio: Confidence level of the interval (e.g. 95%)	add_CI
low_cell_count	If TRUE, at least one of A, B, C, or D is less than or equal to the low_cell_threshold and a correction factor was applied to A, B, C, and D	build_table
null_comparison	If TRUE, both A and C are equal to 0 (meaning that no AMR+ observations were made)	build_table
insensible_prev_table	If TRUE, the grain is prev_table_pos_tot and the values in the prevalence table do not add up to 100 where they should	polish_table
doi	DOI for the reference
pmid	PMID for the reference
url	URL to the reference study	add_URL
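
As a sanity check when reading a plank, the odds_ratio, logOR, and se_log_or values in the Example Output row can be reproduced from its A, B, C, and D columns with the standard 2x2 formulas. The sketch below assumes those textbook formulas; it is not taken from sawmill's source code.

```r
# Values from the Example Output row (columns A, B, C, D).
A <- 142; B <- 108; C <- 140; D <- 105

odds_ratio <- (A * D) / (B * C)                     # cross-product ratio
log_or     <- log(odds_ratio)                       # natural log of the odds ratio
se_log_or  <- sqrt(1 / A + 1 / B + 1 / C + 1 / D)   # standard error of log(OR)

round(c(odds_ratio = odds_ratio, logOR = log_or, se_log_or = se_log_or), 6)
```

The rounded results agree with the odds_ratio (0.986111111), logOR (-0.013986242), and se_log_or (0.181571417) fields of the example row.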

9.4.3. Checking the validation fields¶

The validation fields (low_cell_count, null_comparison, and insensible_prev_table) are present in the processed timber file.

9.4.3.1. Low cell count factors¶

When one or more of the four values in the 2x2 contingency table is equal to zero, sawmill sets the low_cell_count field to TRUE. To avoid divide-by-zero errors, sawmill increments all four values by 0.5.

9.4.3.2. Null comparison factors¶

When the # AMR+ observations in both the exposed and referent groups are equal to zero, sawmill sets the null_comparison field to TRUE. To avoid divide-by-zero errors, sawmill increments all four values by 0.5.

Any null comparison factors also have the low_cell_count field set to TRUE.
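
The two flags and the 0.5 increment can be sketched as follows. This is an illustration of the rules described in these sub-sections, using a threshold of zero as stated here; the function name flag_and_correct is hypothetical, the counts are made up, and sawmill's own implementation may differ in detail.

```r
# Hypothetical sketch of the validation rules described above.
flag_and_correct <- function(a, b, c, d) {
  cells <- c(A = a, B = b, C = c, D = d)
  low_cell_count  <- any(cells == 0)        # at least one zero cell
  null_comparison <- a == 0 && c == 0       # no AMR+ observations at all
  if (low_cell_count) cells <- cells + 0.5  # increment all four values
  odds_ratio <- unname(cells["A"] * cells["D"] / (cells["B"] * cells["C"]))
  list(cells = cells, low_cell_count = low_cell_count,
       null_comparison = null_comparison, odds_ratio = odds_ratio)
}

# One zero cell: flagged as low cell count only.
flag_and_correct(17, 26, 5, 0)

# No AMR+ observations in either group: both flags are TRUE.
flag_and_correct(0, 30, 0, 25)
```

Without the increment, the first table would give an odds ratio of zero and an undefined standard error, and the second would give 0/0.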

9.4.3.3. CEDAR v2: factors with an insensible_prev_table¶

Check your output .csv file for rows where the insensible_prev_table field is set to TRUE. These rows likely have data entry errors in the prevalence table columns, as this result indicates that (% AMR+ exposed) + (% AMR- exposed) does not come to approximately 100, and/or that (% AMR+ referent) + (% AMR- referent) does not come to approximately 100.
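
The check can be sketched as below. The bounds are assumptions made for illustration: the lower bound of 99 echoes the insensible_p_lo default mentioned in the section on changing sawmill arguments, and the upper bound of 101 is made up. The function name is_insensible is hypothetical, and the sketch does not handle missing values.

```r
# Hypothetical check; lo and hi are assumed bounds for "approximately 100".
is_insensible <- function(p, r, q, s, lo = 99, hi = 101) {
  exposed_total  <- p + r   # % AMR+ plus % AMR- in the exposed group
  referent_total <- q + s   # % AMR+ plus % AMR- in the referent group
  out_of_range <- function(x) x < lo | x > hi
  out_of_range(exposed_total) | out_of_range(referent_total)
}

# Prevalence values from the Example Output row (P, R, Q, S).
is_insensible(p = 56.6, r = 26.3, q = 57.2, s = 38)   # TRUE

# A table in which each group sums to 100.
is_insensible(p = 40, r = 60, q = 25, s = 75)         # FALSE
```

In the Example Output row the exposed group sums to 82.9 and the referent group to 95.2, which is consistent with its insensible_prev_table value of TRUE.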