How To Create A Simple Excel Data Set From the SEER Database

An application of the SEER*Stat software

Amr Ebied
Towards Data Science


Image made by author using Canva

In previous articles, I have highlighted the value of the Surveillance, Epidemiology, and End Results (SEER) database as a readily available source of population-based cancer data for researchers from all over the world.

In short, if you are an aspiring researcher who is passionate about cancer epidemiology, you can use the SEER database to extract data and use it, right away or later, to run statistical tests and derive valuable insights.

To hit the ground running with the SEER database, I advise you to watch the following video, in which I explain how to gain access to the database and how to download and install SEER*Stat, the program you will use to extract cancer-related data from it.

After reading and watching the above resources, you should have SEER*Stat downloaded and installed on your computer. You should also have received your username and password by email so that you can start extracting data.

What I intend to do is extract a small data set from the SEER database and transfer it into a Microsoft Excel file, so that it can be opened with statistical software (like JMP, R, or Python) for visualization or analysis.

But before running any statistical tests, we need our data in Excel first, and that is what this article will teach you.

The Analytic Sample

The first thing to do is specify the group of cancer cases that you want to extract data about. For demonstration purposes, and to keep the data set small (about 100 cases), I am going to add several conditions to my selection statement until I reach that target sample size.

So, after logging into SEER*Stat, let’s click on the “case listing” session. The screenshot below shows where to find it.

After you click on the “case listing” session, you will see the available databases from which you can extract your data. By default, the most recent database is highlighted unless you specify otherwise.

I will ignore the most recent database and choose the one titled “Incidence - SEER 18 Reg Research Data + Hurricane Katrina Impacted Louisiana Cases, Nov 2017 Sub (1973–2015)”. This title means that the data we are extracting came from 18 cancer registries, that it includes cases diagnosed between 1973 and 2015, and that it was submitted to the SEER database in November 2017.

After you click on the desired database, it will be highlighted in blue, and your computer will take a few seconds to connect to it. The suggested citation will also change to match the database that you selected. See the screenshot below to see the difference.

Now, you can click on the “selection” tab to start selecting your cancer cases.

After clicking on the “selection” tab, you will see the default selection screen below, as it looks before any selection criteria have been specified.

Here comes the most important step of our extraction process.

The analytic sample.

I am going to choose a sample with the following selection criteria:

  1. Pancreatic cancer cases with additional specifications.
  2. Female.
  3. Black.
  4. Diagnosed in 2010.
  5. Divorced.

This led to an analytic sample of 110 cases. Perfect!

This brings us to how we can feed SEER*Stat our selection criteria in order to get the sample that we targeted.
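
The five criteria combine with a logical AND: a case enters the sample only if it satisfies all of them. As a rough sketch of that logic outside SEER*Stat, the Python snippet below filters a handful of made-up records. The field names (`site`, `sex`, `race`, `year_dx`, `marital`) and the records themselves are hypothetical; they are neither SEER*Stat's internal format nor real SEER data.

```python
# Toy records with hypothetical field names and values (not real SEER data).
cases = [
    {"site": "Pancreas", "sex": "Female", "race": "Black", "year_dx": 2010, "marital": "Divorced"},
    {"site": "Pancreas", "sex": "Male", "race": "Black", "year_dx": 2010, "marital": "Divorced"},
    {"site": "Pancreas", "sex": "Female", "race": "Black", "year_dx": 2011, "marital": "Divorced"},
    {"site": "Stomach", "sex": "Female", "race": "Black", "year_dx": 2010, "marital": "Divorced"},
    {"site": "Pancreas", "sex": "Female", "race": "Black", "year_dx": 2010, "marital": "Married"},
    {"site": "Pancreas", "sex": "Female", "race": "Black", "year_dx": 2010, "marital": "Divorced"},
]

# The five selection criteria from the list above, one entry per criterion.
criteria = {
    "site": "Pancreas",
    "sex": "Female",
    "race": "Black",
    "year_dx": 2010,
    "marital": "Divorced",
}


def matches(case, criteria):
    """Return True only if the case satisfies every criterion (logical AND)."""
    return all(case[field] == value for field, value in criteria.items())


sample = [case for case in cases if matches(case, criteria)]
print(len(sample))  # number of toy records that satisfy all five criteria
```

Each criterion added to the dictionary can only shrink the sample, which is why stacking conditions is a convenient way to reach a small target size.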

Let’s start by choosing cases with pancreatic cancer.

From our last screenshot:

(1) Click the “Edit” tab.

(2) Click on the (+) sign next to “Site and Morphology”.

(3) Click “Site recode ICD-O-3/WHO 2008”. Make sure the operator is set to “is = to”.

(4) Scroll down the values until you reach “Pancreas”. Click on it.

(5) Click “OK”.

After clicking “OK”, you will find your selected criterion automatically added to the selection statement (see the screenshot below).

Building on the previous selection criterion, we are now going to choose “females”. In order to do that, please follow these instructions using the SEER*Stat software.

(1) Click the “Edit” tab.

(2) Click on the (+) sign next to “Race, Sex, Year Dx, Registry, County”.

(3) Click “Sex”. Make sure the operator is set to “is = to”.

(4) Click on “Female”.

(5) Click “OK”.

Then, we are going to choose “Black” using the same method and steps that we used to select “Female”. Notice that each time you click “OK” in step (5), the selection statement grows by one line, so it always has as many lines as the criteria you have fed into SEER*Stat.

Then, we will use the same method to choose “2010” as the year of diagnosis. Please look at the screenshot below to see the detailed steps.

Then, we will use the same method to choose “Divorced”, but this time we will find it within a folder named “Other”. Clicking the (+) sign to the left of this folder reveals several variables. Click on the one named “Marital status at diagnosis”, and “Divorced” will appear among the values to the right of the operator, as shown below.

The selection statement now contains all of the selection criteria that we listed earlier, as shown below.

At this point, we are done with the selection process and can move on to the “table” tab to start designing the data table that we will eventually turn into a Microsoft Excel data set.

So, let’s move on and click on the “table” tab. This will bring us to the following interface.

It works this way. We choose variables from the list of “Available Variables” at the bottom of the screenshot below. To choose a variable, open its yellow folder by clicking the (+) sign to the left of the folder, and then click the variable under that folder. With the variable selected, click the “Column” button at the far right of the screenshot, and the variable will be displayed in the upper left corner under the word “Column”.

Let’s demonstrate!

We are going to select 3 column variables for our data table:

  1. Patient ID (found in the folder “Other”).
  2. Age at diagnosis (found in the “Race and Age [Case data only]” folder).
  3. Survival months (found in the “Cause of Death [COD] and Follow-up” folder).

The screenshot below demonstrates the steps we should follow to choose “Patient ID” as a column variable.

We use the same method to choose “Age at diagnosis” as a column variable.

And the same method for “Survival months”.

After we are finished selecting column variables, the final design of the data table will be as follows.

Then, all that is left to get our data table is to give it a name in the “Title” section and click the yellow bolt-shaped button.

SEER*Stat will spend a few seconds compiling the data table.

As the circled text below shows, SEER*Stat is telling us that a total of 110 cases of pancreatic cancer fit our selection criteria.

And here we are! Our data table is finally there!

Of course, we can’t visualize or analyze this data table as is. It has to be transferred into an Excel file, where it can be analyzed directly or opened by another statistical package for analysis.

So, to transfer this data table into Excel, hover your mouse pointer over the Patient ID column title until the pointer changes into a small black arrow pointing downward. From there, left-click on the column title and you will see the whole column turn dark, which means it is highlighted. Keep the left button held down and drag to the right to include the other two columns.

When all three columns are highlighted, right-click and a drop-down menu will appear beside the highlighted table. Left-click “Copy”, and SEER*Stat will copy your data table.

Now, you are ready to transfer your data. All you have to do is open Excel and paste the data as shown in the screenshot below.
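
Once the data is in Excel, a quick way to bring it into Python is to save the worksheet as a CSV file and read it from there. The sketch below is a minimal illustration under several assumptions: the header row matches the three column variables chosen above, the age and survival values are plain (possibly zero-padded) integers, and the three rows shown are invented placeholders rather than real SEER records. An in-memory string stands in for the saved file; in practice you would open your own CSV instead.

```python
import csv
import io
import statistics

# Hypothetical stand-in for the worksheet saved as CSV.
# The rows are invented placeholders, not real SEER data.
exported = io.StringIO(
    "Patient ID,Age at diagnosis,Survival months\n"
    "1001,64,0007\n"
    "1002,71,0003\n"
    "1003,58,0015\n"
)


def load_cases(handle):
    """Read the exported table and convert the numeric columns to integers."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "patient_id": row["Patient ID"],
            "age": int(row["Age at diagnosis"]),
            # int() also accepts zero-padded text such as "0007".
            "survival_months": int(row["Survival months"]),
        })
    return rows


cases = load_cases(exported)
median_survival = statistics.median(c["survival_months"] for c in cases)
mean_age = statistics.mean(c["age"] for c in cases)

print(f"{len(cases)} cases loaded")
print(f"Median survival (months): {median_survival}")
print(f"Mean age at diagnosis: {mean_age:.1f}")
```

If your export contains non-numeric entries (for example, a label for unknown survival time), the `int()` conversions will raise an error, so those rows would need to be handled before summarizing.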
