5. CEDAR¶
The Collection of Epidemiologically Derived Associations with Resistance (CEDAR) database stores
is the central repository of epidemiological data for the iAM.AMR project.
5.1. Introduction¶
5.1.1. What is a database?¶
A database is a structured set of data, organized a way that makes it easy to search for, select, and retrieve specific subsets or combinations of information. There is no one defining characteristic that makes a database a database, but a database is often differentiated from a simpler application by its formal structure, and rigidly defined data-relationships.
Tip
We often use the term database to refer to the sum of the data, the data structure, and the software used to create, manipulate, and access the database. However, we can more accurately refer to the data and its structure as the database, and the software as the database management system or DBMS.
5.1.2. Why use a database?¶
There are numerous benefits of using a database to store large amounts of complex data, most of which become evident when we contrast a database against a spreadsheet or flat-file.
Take a look at the table below, which includes demographic and political information for some of Canada’s largest cities (circa 2020). This represents a flat-file approach.
| City | Province | Population (2016) | Premier | Party |
|---|---|---|---|---|
| Toronto | Ontario | 5,429,524 | Doug Ford | OPC |
| Montreal | Quebec | 3,519,595 | François Legault | CAQ |
| Vancouver | BC | 2,264,823 | John Horgan | NDP |
| Calgary | Alberta | 1,237,656 | Jason Kenney | UCP |
| Edmonton | Alberta | 1,062,643 | Jason Kenney | UCP |
| Winnipeg | Manitoba | 711,925 | Brian Pallister | PCM |
| Quebec City | Quebec | 705,103 | François Legault | CAQ |
| Hamilton | Ontario | 693,645 | Doug Ford | OPC |
| Guelph | Ontario | 132,397 | Doug Ford | OPC |
There are two obvious drawbacks to this approach. The first is practical – this table contains a number of duplicate values, which increase the size of the table, and add opportunities for input error.
The second is more conceptual, in that this table has no singular purpose – if you had to title it, what would that title be? When we mix heterogenous data (i.e. demographic data with political data), we often lose clarity, and forget where we stored data, or with whom that data should be shared.
The alternative is a relational database, which involves organizing our complex data, and defining relationships between the disparate parts. Take a look at the tables below.
| ID | City | Population (2016) |
|---|---|---|
| 01 | Toronto | 5,429,524 |
| 02 | Montreal | 3,519,595 |
| 03 | Vancouver | 2,264,823 |
| 04 | Calgary | 1,237,656 |
| 04 | Edmonton | 1,062,643 |
| 05 | Winnipeg | 711,925 |
| 02 | Quebec City | 705,103 |
| 01 | Hamilton | 693,645 |
| 01 | Guelph | 132,397 |
| ID | Province | Premier | Party |
|---|---|---|---|
| 01 | Ontario | Doug Ford | OPC |
| 02 | Quebec | François Legault | CAQ |
| 03 | BC | John Horgan | NDP |
| 04 | Alberta | Jason Kenney | UCP |
| 05 | Manitoba | Brian Pallister | PCM |
Now each table has a singular theme or purpose, and is clear in the information it conveys. We have fewer error-prone entries (e.g. the names of the premieres), and fewer duplicate datapoints. And by matching the IDs in the table, we can recreate the main table if necessary, or share component parts without sharing the entire data collection. The benefits of a database approach are evident, even at this small of a scale.
5.1.3. What is the terminology?¶
A relational database is a collection of tables, linked together by relationships.
A table contains data, and consists of rows and columns. The rows – also known as tuples or records – are sets of data related to a single object. These sets consist of multiple, named elements of data, organized into columns – also known as attributes or fields.
A relationship defines how we match data between tables. Often, this matching is done via unique primary key or ID.
A form is a graphical user-interface for entering data into the database.
A query is a request we pass to the database to retrieve a specific subset of records and fields, constrained by criteria we specify.
5.1.4. How do we store a database?¶
Generally, databases are separated into two parts: a front-end and a back-end. The front-end consists of the user-interface, through which we enter, manipulate, and retrieve data. The back-end consists of the data itself, organized into tables and other data storage formats.
Tip
The front-end and back-end can be thought of as a web browser and website respectively; the distributed front-end is used to retrieve and display information from a centralized back-end.
This configuration allows multiple users to simultaneously access and work with the same, always up-to-date set of information. There is no explcit requirement for these parts to be seperate, however combining the files reduces multi-user capability.