The UCSC Cancer Browser allows researchers to interactively explore cancer genomics data and its associated clinical information. Data can be viewed in a variety of ways, including by value, chromosome location, clinical feature, biological pathway or geneset of interest. It is also possible to quickly perform and easily view statistical analysis on subsets of the data.
The data heatmap displays genome-wide data from copy number, transcriptome, protein, epigenetic, mutation, sh/siRNA, and PARADIGM pathway analysis studies as well as associated clinical information. The left column shows datasets that are currently in view along with a button to add more.
If viewing less than one chromosome, the "View in Genome Browser" button will become activated. Clicking on this will open a new window at the UCSC Genome Browser at the same genomic coordinates, allowing users to view other annotations.
Users can access our community of users by clicking the forum link. Sign in to save bookmarks, genesets and signatures as well as access any private datasets you are authorized to view. Click the "bookmarks" button to create a bookmark and share it with others.
Clicking on "PDF" will show a screen with just the heatmap images and titles. You can then use your browser's built-in "Print to PDF" function or take a screenshot. Note that if you are using Firefox, you will need to change the scale to 70% in Page Setup.
The dataset name and the number of samples are displayed in the dataset header. Each dataset is composed of two heatmaps: the left genomic heatmap and the right clinical feature heatmap.
Columns in the genomic heatmap represent individual probes that have been mapped to chromosome positions or gene names. All data is mapped to the human Mar. 2006 (NCBI36/hg18) assembly. Columns in the clinical heatmap represent clinical data associated with the genomic dataset with the names appearing at the bottom of the heatmap. No names will be displayed if there is not enough space for all names.
Mousing-over a genomic location or clinical feature will display the sample name and value in the detail box at the top.
Above the genomic heatmap, click the download icon to download all the files that make up this dataset, the drop-down menu to change the viewing mode, the color legend to adjust the heatmap colors, and the close icon in the upper right corner to close the dataset. To make the map smaller or bigger, click and drag the bottom edge up or down.
Above the clinical heatmap, click on the "Tools" button to see various tools available. These include More Features, Kaplan-Meier plots, Upload, Download, About Dataset (to view more information about a dataset), Signatures, and Subgroups & Statistics.
Data values are represented with the following default colors:
|Heatmap||Data Type||Color||Data Value|
|genomic||gene expression||red||> 0|
|genomic||other data types||red||> 0|
information (such as age)
(such as treatment)
There are three viewing modes: "Heatmap" and two summary modes, "Box plot" and "Proportions". In heatmap mode, data from the genomic and clinical heatmaps are connected by the sample ID. Thus, genomic and clinical data from one sample are displayed as one row across both heatmaps. Samples/rows are sorted vertically by the left-most clinical feature and then sub-sorted on all clinical features following. Color intensity in the genomic heatmap is proportional to the deviation from zero.
"Box plot" and "Proportions" modes are called summary modes because they show overall patterns in the dataset. In heatmap mode, each row across the genomic and clinical heatmaps represents a single sample, while in summary mode, individual clinical features as well as genomic data are sorted independently. When displaying subgroups in summary mode, each subgroup will be displayed in its own heatmap.
Example of box plot mode
When zoomed out in "Box plot" mode, the median is represented by a dot, along with the outer quartiles. When zoomed in, a box is drawn around the inner quartiles and the median becomes a line. Color intensity is proportional to the deviation from the median. More information on box plots can be found here.
Example of proportions mode
In "Proportions" mode, data for each clinical feature/gene/genomic location are arranged independently in ascending order. Color intensity is proportional to the deviation from zero, identical to "Heatmap" viewing mode. This view tends to highlight regions where there are many values that are above or below zero.
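As a rough illustration of how heatmap mode and a summary mode order the same data, the sketch below sorts a tiny made-up dataset both ways. The sample values, feature names and function names are hypothetical, ascending order in heatmap mode is an assumption, and none of this is the browser's own code.

```python
# Made-up samples: two clinical features and one genomic value each.
samples = {
    "S1": {"age": 61, "stage": 2, "geneA": 0.8},
    "S2": {"age": 45, "stage": 3, "geneA": -1.2},
    "S3": {"age": 61, "stage": 1, "geneA": 0.1},
}


def heatmap_order(samples, clinical_features):
    """Heatmap mode: order whole samples by the left-most clinical feature,
    then sub-sort on the features that follow (ascending is assumed here)."""
    return sorted(
        samples,
        key=lambda sample_id: tuple(samples[sample_id][f] for f in clinical_features),
    )


def proportions_column(samples, column):
    """Proportions mode: sort one column's values in ascending order,
    independently of every other column."""
    return sorted(values[column] for values in samples.values())


row_order = heatmap_order(samples, ["age", "stage"])
gene_column = proportions_column(samples, "geneA")
```

In the first call each sample stays intact as one row; in the second, the values in a column are no longer tied to the rows of any other column.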
Click on "Add Datasets" to access the dataset window. Use the search bar at the top to find datasets of interest or browse available data by toggling the groups displayed below. Data are organized into datasets by data provider, cancer type, and data type (e.g. gene expression or copy number variation). Click on the "Flat" toggle in the upper right corner to see the datasets in a grid-like format. Clicking on the column headers in either Grouped or Flat view will sort the datasets by that column.
Click on a dataset to open the heatmap. All displayed datasets have a checkmark next to them in the dataset window for easy reference. Once you have selected the datasets you wish to view, click "Return the Cancer Browser" to view the data.
To obtain more information about a particular dataset, click the information icon that appears to the left of the dataset name. To close a dataset, click the checkmark that appears to the left of the dataset name in the dataset window, or click the close icon at the top right of the heatmap or to the left of the heatmap thumbnail.
Users authorized to view restricted access datasets must sign in to see those datasets.
Users can zoom to a specific region of the genome by clicking and dragging on a region of interest. Click the zoom-out button to zoom out incrementally, or the bigger zoom-out button to zoom out to either the full chromosome or the full genome. It is also possible to zoom vertically by clicking and dragging on the clinical feature heatmap. When zoomed in, a scroll bar will appear to the right, allowing users to pan across the image. To zoom out, click the zoom-out button. It is possible to zoom on the genomic heatmap in both "Chromosomes" and "Genes" view. Vertical clinical heatmap zooming is only active when displayed in "Heatmap" mode, not in any summary mode.
While in chromosomes mode, you can go to a specific genomic position by entering it into the search bar, such as "chr1-chr4", "chr1:800000-900000", or "chr7:86217345-chr8:45710883", and hitting return. Users can also go directly to a gene by entering the first few characters of the gene name and selecting it from the suggestion list. The HUGO coordinates for the gene will be put in the search bar; hit return to go to those coordinates. To view more than one gene, go to genes view.
Underneath the position bar are two tracks to help provide genomic context: RefSeq Genes and the chromosome ideogram. The RefSeq Genes track display includes intron/exon structure; mouse over the RefSeq track to bring up a tooltip with the gene name. More information about this track can be found in the RefSeq Genes Track Settings of the Genome Browser. The chromosome ideogram below the RefSeq track shows centromeres in red and cytobands in grayscale.
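The three position formats quoted above can be told apart with a small parser. This is only a sketch of one way to read those strings; the regular expression and the `parse_position` name are assumptions made for illustration, and the browser's own search bar accepts other terms as well (gene names, for instance).

```python
import re

# Matches "chrA-chrB", "chrA:start-end" and "chrA:start-chrB:end".
_POSITION = re.compile(
    r"^(chr\w+)(?::(\d+))?-(?:(chr\w+)(?::(\d+))?|(\d+))$"
)


def parse_position(text):
    """Split a position string into start/end chromosome and coordinates.

    Coordinates that are not given (whole-chromosome ranges) come back as None.
    """
    match = _POSITION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized position: {text!r}")
    start_chrom, start, end_chrom, end_other, end_same = match.groups()
    end = end_other or end_same
    return {
        "start_chrom": start_chrom,
        "start": int(start) if start else None,
        "end_chrom": end_chrom or start_chrom,  # same chromosome if none given
        "end": int(end) if end else None,
    }
```

For example, `parse_position("chr1:800000-900000")` reports chr1 as both the start and end chromosome, since the end coordinate is given without a chromosome of its own.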
Click the color legend at the top right corner of a dataset to adjust the genomic heatmap coloring.
Possible color options are listed as less than zero, zero and greater than zero. Any value greater or lower than the values listed in the boxes will be colored the maximum color. A lower maximum makes it easier to see the differences between values close to zero and a higher maximum makes it easier to see the differences between values far from zero. Users may need to re-adjust the maximum after toggling "Normalize".
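The effect of the maximum can be pictured as a clamp on the color scale. The sketch below assumes a simple linear scale that is symmetric around zero; the function name is hypothetical and the browser's actual color mapping may differ.

```python
def color_fraction(value, maximum):
    """Return a signed fraction in [-1, 1] of the full color for one value.

    Negative results use the "less than zero" color, positive results the
    "greater than zero" color. Values beyond the maximum are clamped, so
    they all receive the full (maximum) color.
    """
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    return max(-1.0, min(1.0, value / maximum))
```

With a maximum of 2.0, a value of 0.5 gets a quarter of the full color; lowering the maximum to 1.0 raises that to half, which is why a lower maximum separates values close to zero more clearly.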
A particularly useful function is "Normalize", which subtracts the mean of each genomic location (i.e. data column) from each sample. This option is helpful for viewing datasets where all the genomic data are greater than zero, or less than zero, and thus, all the same color. For example, RNAseq datasets tend to have data all greater than zero; toggling "Normalize: Subtract column mean" will bring out the difference between samples.
Additionally, this option is helpful when there are vertical red/green/blue stripes at each genomic location or gene. This indicates that the vast majority of the data values for a specific gene are greater/lower than zero, which is often seen in many gene expression and DNA methylation datasets. This is due to the data being normalized on the whole dataset level and not on the individual gene level. Thus, the differences in data values are more reflective of the difference between genes rather than between samples. The same browser "mean normalization" can be used to make the differences between samples more easily seen.
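A minimal sketch of what subtracting the column mean does, using a made-up matrix in which rows are samples and columns are genomic locations. The function name is hypothetical and missing values are not handled.

```python
def subtract_column_mean(matrix):
    """Subtract each column's mean from every value in that column.

    matrix: list of rows (one per sample), each a list of numbers
    (one per genomic location).
    """
    n_rows = len(matrix)
    means = [sum(column) / n_rows for column in zip(*matrix)]
    return [[value - mean for value, mean in zip(row, means)] for row in matrix]


# All values are greater than zero, so every cell would get the same color.
raw = [
    [5.0, 2.0],
    [7.0, 4.0],
    [9.0, 6.0],
]
normalized = subtract_column_mean(raw)
```

After normalization each column is centered on zero, so samples below the column mean and samples above it receive different colors.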
Genes mode displays genomic data as individual genes. To view data as genes, toggle the views button in the top left corner of the window to "Genes". A set of genes our group has found interesting will be displayed by default, which may change. In genes mode, gene names are listed at the bottom of the genomic heatmap, along with the name of the geneset itself. It is possible to click and drag to zoom to genes within a geneset, similar to chromosome mode.
If more than one probe maps to a gene, the gene column will be divided into sub columns for each probe. If no probes map to the gene or the gene name is not recognized, it will be displayed as gray in heatmap mode or white if in a summary mode. Probe mapping for each dataset is individually curated at UCSC.
To view one or more genes of interest, enter the gene(s) into the search box at the top of the screen. For advanced options, including access to predefined genesets, click on the "Advanced Genes" button to the right of the search box.
There are three advanced ways to change the displayed geneset. The first is to select a predefined geneset from the "Favorites" pull down menu. Selecting a geneset from this menu will fill the HUGO gene name box with the list of genes. Then choose either "Replace" to replace the current active genesets or "Add" to view the new geneset side-by-side with the geneset currently in view.
The second way is to search for a geneset by its name or gene members. Enter at least three characters of the gene or geneset of interest into the search bar and hit return. The results will appear below the search bar. Clicking on a geneset will fill the HUGO gene name box with the genes in that geneset. Users can then either edit the list of genes or leave it as is and click either "Replace" or "Add".
The third way is to create a new geneset by typing gene names into the HUGO gene name box. Our software will attempt to autocomplete the gene name after the first few letters are typed. Note that gene names need to be separated by a comma, space or new line. After adding all genes of interest, enter a geneset name and click either "Replace" or "Add".
The displayed genesets are listed under the "Active Genesets" at the top. To hide a geneset, click the "x" to the far right of the geneset name. Clicking on a geneset in either the active list, Favorites list or in the search results will display the genes in that geneset in the HUGO gene name box.
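The separator rule (comma, space or new line) can be sketched as a one-line split. The function name is hypothetical; this only illustrates the rule and does not check that each name is a valid HUGO gene name.

```python
import re


def parse_gene_list(text):
    """Split typed gene names on commas, spaces and new lines."""
    return [name for name in re.split(r"[,\s]+", text.strip()) if name]


genes = parse_gene_list("TP53, ESR1 ERBB2\nGRB7")
```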
Any geneset made during a session is prefaced with the word "user", automatically saved and will be available under the "Favorites" drop down menu. Once a user closes a web browser tab, user genesets will be deleted if not signed in. To permanently save these genesets, users must sign in. After signing in, all genesets created will be automatically saved.
Every dataset has a set of clinical variables/features associated with it, some or all of which are visible by default. In heatmap viewing mode, both heatmaps are sorted by the left-most clinical feature/column and then sub-sorted on all columns following. To hide or reorder the clinical features, click on a clinical feature to bring up a context menu. Then click the option to hide or move the column to a different position in the sort order.
To display a hidden clinical feature, click the "More Features" button just above the clinical heatmap. This will open a window with all available clinical features for that dataset. Displayed features are colored gray while hidden features are in black; click on a clinical feature to add it to the clinical heatmap. Narrow the list of features by using the search bar at the top.
Subgroups are a way to view differences between groups of samples as well as perform statistical tests. Users can define sample subgroups using one or more clinical features or genomic signatures. To create subgroups, click on the "Tools" button and select "Subgroups & Statistics" from the menu.
Click on the clinical feature of interest under "Add subgroup" on the left side of the window. This will bring up two sections containing either slider bars, or a small list of values. The slider bars are for features with continuous values such as "age". The small list of values is for features with categorical values such as "estrogen receptor status". After selecting either a range, one value or multiple values, click "Add".
These subgroups will automatically appear as a new clinical feature called "Subgroup" in the clinical heatmap. If a sample is in both subgroups it will appear white, rather than red or green; samples falling into neither are colored black. When displaying subgroups in summary mode, each subgroup will be displayed in its own heatmap. Samples not in either subgroup will not be displayed.
Hints for creating subgroups:
- Try adding more than one feature to a subgroup. For example, create a subgroup that contains individuals who have had chemotherapy and are female.
- Try adding the same values to both subgroups. For instance, to compare females who had chemotherapy to other females who did not have chemotherapy, add "female" from the sex clinical feature to both subgroups and then chemotherapy positive to the red subgroup and chemotherapy negative to the green subgroup.
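The subgroup rules described above can be sketched as follows. The data layout and function names are hypothetical; the sketch assumes that a sample must match every feature listed for a subgroup and that slider ranges include both endpoints.

```python
def in_subgroup(sample, criteria):
    """True if the sample matches every feature listed for the subgroup.

    criteria maps a feature name to either a set of accepted categorical
    values or a (low, high) tuple for a continuous range.
    """
    for feature, accepted in criteria.items():
        value = sample.get(feature)
        if isinstance(accepted, tuple):
            low, high = accepted
            if value is None or not (low <= value <= high):
                return False
        elif value not in accepted:
            return False
    return True


def subgroup_color(sample, red, green):
    """Color of the "Subgroup" clinical feature for one sample."""
    is_red = in_subgroup(sample, red)
    is_green = in_subgroup(sample, green)
    if is_red and is_green:
        return "white"  # in both subgroups
    if is_red:
        return "red"
    if is_green:
        return "green"
    return "black"  # in neither subgroup


# The second hint: females with chemotherapy versus females without.
red = {"sex": {"female"}, "chemotherapy": {"positive"}}
green = {"sex": {"female"}, "chemotherapy": {"negative"}}
```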
After creating subgroups, choose a test and, if desired, multiple hypothesis correction from the drop down menu and check the "Show Statistics" box to display the statistical track. The statistical track is displayed under the genomic heatmap showing the logarithmic plot of p-values for each genomic position/gene, where the center line indicates a p-value of 1. The direction of the test is shown by whether the bar is above or below the center line. In the case of a t-test, a bar above the line indicates that the red subgroup is greater than the green subgroup; a bar below the line indicates that the green subgroup is greater than the red subgroup. The height of the bar (either above or below the center line) is proportional to the significance of the p-value (-log p); the higher the bar, the lower the p-value, and the stronger the statistical difference between the two subgroups. The bar is colored either red or green if the p-value is less than 0.05.
If the test cannot be performed, a shaded track with an error message will appear where the Statistical Track would be. Note that all statistical tests require there to be at least one sample in each subgroup.
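To make the geometry of the statistical track concrete, the sketch below turns one p-value into a signed bar height. It assumes a base-10 logarithm and that a significant bar takes the color of the greater subgroup; neither detail is stated above, and the function name is hypothetical.

```python
import math


def stat_bar(p_value, red_mean, green_mean, alpha=0.05):
    """Return (signed height, color) for one genomic position.

    Height is -log10(p), so p = 1 sits on the center line. The bar points
    up when the red subgroup is greater and down when the green subgroup
    is greater. Color is None unless p is below alpha.
    """
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    height = -math.log10(p_value)
    red_greater = red_mean > green_mean
    signed_height = height if red_greater else -height
    color = None
    if p_value < alpha:
        color = "red" if red_greater else "green"
    return signed_height, color
```

A p-value of 0.01 gives a bar twice as tall as a p-value of 0.1, which is what "the higher the bar, the lower the p-value" amounts to on a logarithmic scale.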
Parameters used when performing tests in the browser are listed here and can also be viewed by clicking on the icon next to "Show Statistics". Additional details on the application of specific tests can be found in the NIST Engineering Statistics Handbook.
A genomic signature, sometimes known as a gene expression, is an algebraic expression over a set of genes, such as "ESR1+0.5*ERBB2-GRB7". Once a signature is defined, the probe values for each sample are substituted for gene names and the algebraic expression is evaluated. Signatures are added as new clinical features, allowing users to sort, compare, subgroup, and perform statistical analysis on genomic data. Genomic signatures allow real-time evaluation of gene expression signatures such as those that predict chemotherapy sensitivity in breast cancer.
To open the signature menu, click the "Tools" button and select "Signatures" from the menu. Displayed signatures are listed under the "Active Signatures". To hide a signature click the "x" to the far right of the signature name. Genes missing from a signature will appear as "missing" in the tool tip and subgroup window. Missing genes are included as zero in the expression calculation. If the gene has multiple probes, the average value of the probes is used.
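The evaluation rules above (substitute probe values for gene names, treat missing genes as zero, average multiple probes) can be sketched for a single sample as follows. This is an illustration built on Python's `ast` module, not the browser's implementation; the function name and the probe values are made up, and gene names that contain a hyphen would be read as a subtraction by this sketch.

```python
import ast
import operator

_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate_signature(expression, probe_values):
    """Evaluate a signature for one sample.

    probe_values maps a gene name to the list of probe values measured for
    that gene in this sample. Missing genes count as zero.
    """
    def gene_value(name):
        probes = probe_values.get(name)
        if not probes:
            return 0.0  # missing gene
        return sum(probes) / len(probes)  # average over probes

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.Name):
            return gene_value(node.id)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return float(node.value)
        raise ValueError("unsupported token in signature")

    return walk(ast.parse(expression, mode="eval"))


# ERBB2 has two probes (averaged to 2.0); GRB7 is missing (counted as 0).
score = evaluate_signature(
    "ESR1+0.5*ERBB2-GRB7",
    {"ESR1": [2.0], "ERBB2": [1.0, 3.0]},
)
```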
There are three ways to display a signature. The first way is to select a signature from the "Favorites" pull down menu. Selecting a signature from this menu will fill the gene expression box with the signature. Users can then either edit the expression or leave it as is and click "Update".
The second way is to search for a signature by its name or gene component. Enter at least 3 characters of the gene or signature name of interest into the search bar and hit return. The results will be displayed in a list below the search bar. Clicking on a signature name will fill the gene expression box with the signature. Users can then either edit the expression or leave it as is and click "Update".
The third way is to make a signature by typing an algebraic expression over gene names into the text box. Note that only HUGO gene names, numbers and the following mathematical symbols: +, -, *, / are recognized. Our software will autocomplete the gene name after the first few letters of a gene have been typed. After entering a gene expression, enter a signature name and click "Update".
Note that users can always check the contents of any signature by clicking on the signature name. This will fill the gene expression box with the gene expression of that signature.
Any signature made during a session is prefaced with the word "user", automatically saved and will be available under the "Favorites" drop down menu. Once a user closes the tab, user signatures will be deleted if not signed in. To permanently save these signatures, users must sign in. After signing in, all signatures created will be automatically saved.
A very limited number of predefined signatures are currently available. More information on these signatures can be found by clicking the "i" icon next to "Active Signatures" at the top of the signatures settings, which will bring up the Signature Information page.
By defining a genomic signature as only one gene, the gene value will be added as a new feature in the clinical heatmap, allowing users to sort, subgroup and perform statistical tests using the value of a single gene.
Bookmarks are an easy way to save and share a view of the cancer browser. You do not need to be logged in to create a bookmark, though logging in allows you to save bookmarks to your account to access later. To bookmark a view, click the "Bookmark" button in the upper right corner and select "Create". This will generate a link to your current view and a link for sharing via email.
If you are logged in, you can instead choose "Create & Save to Account", which will give you the same links as above along with the ability to save it to your account with a name and notes. To update a saved bookmark, select it from the drop-down menu at the bottom of the window. Clicking "Save" saves the new link for that bookmark, along with any changes you made to the name or notes.
Bookmarks you have saved will appear in the bookmark menu under "My Bookmarks". You can edit the name or notes of these bookmarks by clicking the pencil icon next to the name. Update the name and/or notes in the window and click "Save". You can delete individual bookmarks by selecting "Remove from Account".
You can share bookmarks by clicking the share icon next to the bookmark name. This will give you a link to the bookmarked view and a link for sharing via email.
We have included several examples of the browser's capabilities at the bottom of the bookmark menu. Click on the name to see the example. To get step-by-step instructions on how to get to that particular view, click on the information icon.
You can reset your browser by selecting "Reset to Defaults" from the bookmark menu.
To download the processed files that make up a dataset, click on the download icon either to the left of the dataset name in the map header, or to the left of the dataset name in the dataset selection window. You may also click on the download button on the dataset details page.
Each dataset is composed of 4 core data files: the genomic data, the clinical matrix, the probeMap and the sampleMap. The genomic data is most often a matrix that contains all the genomic information associated with a dataset. In matrix files the columns are sample IDs and the rows are probe/gene names. The clinical matrix contains all the clinical feature information associated with a dataset. In this file the columns are clinical features and the rows are sample IDs.
The sampleMap is used to connect the genomic data to the clinical data. It is a two-column file that lists the sample IDs shared by the genomic data and the clinical matrix. The probeMap maps probes/gene names in the genomic data to genomic coordinates. The first five columns are the probe name, HUGO gene name or alias, chromosome, chromosome start and chromosome end.
Associated with each of the 4 core files is a .json file that contains metadata such as author, assembly, experimental platform, etc. These files can help the software and users interpret the core data files.
It is important to note that all genomic coordinates are one-based and inclusive. More information about the dataset specification can be found here.
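As a sketch of how the genomic matrix and the clinical matrix line up, the example below reads two tiny made-up files and joins them on sample ID. It assumes tab-delimited text with a header row; the file contents, column names and function names are hypothetical. For brevity the shared sample IDs are found by intersecting the two files rather than by reading a sampleMap.

```python
import csv
import io

# Genomic matrix: columns are sample IDs, rows are probes (made-up values).
GENOMIC = "probe\tS1\tS2\nprobeA\t0.5\t-1.0\nprobeB\t1.5\t0.0\n"
# Clinical matrix: columns are clinical features, rows are sample IDs.
CLINICAL = "sampleID\tage\nS1\t61\nS2\t45\nS3\t70\n"


def read_genomic(text):
    """Return {sample ID: {probe name: value}} from a genomic matrix."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    sample_ids = rows[0][1:]
    by_sample = {sample_id: {} for sample_id in sample_ids}
    for probe, *values in rows[1:]:
        for sample_id, value in zip(sample_ids, values):
            by_sample[sample_id][probe] = float(value)
    return by_sample


def read_clinical(text):
    """Return {sample ID: {feature name: value}} from a clinical matrix."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    id_column = reader.fieldnames[0]
    return {
        row[id_column]: {k: v for k, v in row.items() if k != id_column}
        for row in reader
    }


def combine(genomic, clinical):
    """Keep only the samples present in both files."""
    shared = genomic.keys() & clinical.keys()
    return {
        sample_id: {"genomic": genomic[sample_id], "clinical": clinical[sample_id]}
        for sample_id in sorted(shared)
    }


dataset = combine(read_genomic(GENOMIC), read_clinical(CLINICAL))
```

Note the transposition: samples are columns in the genomic matrix but rows in the clinical matrix, so one of the two has to be pivoted before the rows of both heatmaps can be matched.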
You can download any of our data to your own Galaxy installation. First install Galaxy and make yourself the administrator. Next, click on the "Admin" link at the top of the main Galaxy page and select "Search and browse tool sheds" from under "Tool Sheds" in the left side menu. Search the Test tool shed for our id: "cancer_browser". Select our tool shed and install it to Galaxy. When prompted with "Select existing tool panel section", add it to the "Get Data" tool panel.
From there navigate to the Get Data tool list and choose "UCSC Cancer Genomics Browser server". Select a dataset from our interface and click the download icon. Choose "Download to Galaxy".
You can download the current clinical heatmap in view by clicking the "Tools" button and selecting "Download" from the menu. Then select "Clinical data in view" to get just the data that you can see in the clinical heatmap. The file will be a TSV that can be opened by spreadsheet and other applications. This feature can be used to add data from one dataset to another in conjunction with uploading custom data.
Downloading the clinical cohort will include values from all datasets that are in the same cohort as the one selected, including samples not in view. Note that signatures and subgroups will not be included in clinical cohort downloads because they are calculated on the dataset only and not the rest of the cohort.
Downloading the full processed dataset will download all the files that make up the dataset. This is exactly the same as clicking the download icon to the left of the dataset name in the map header or next to the dataset name in the "Add dataset" window.
Users can add their own clinical data to the clinical heatmap. Click on the "Tools" button and select "Upload" from the menu. This will open a dialogue box where you can either enter the data by hand or copy and paste it from a spreadsheet. All tabs will become commas in the text box.
Data should be entered as key-value pairs where the first value is the patient or sample ID and the second value is the actual value. Here's an example you can try with a TCGA lung adenocarcinoma dataset:
Currently users can only enter numerical values. Categorical values will not be recognized (e.g. "positive", "negative", "group 1", etc). Multiple clinical features have to be added separately. Once you are finished, name the custom data feature and click "Update".
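The sketch below prepares pasted spreadsheet rows for the upload box and rejects values that are not numerical. It assumes one ID/value pair per line; the function name and the sample IDs are placeholders, not real identifiers from any dataset.

```python
def to_upload_lines(spreadsheet_text):
    """Convert tab-separated "ID<TAB>value" rows into "ID,value" lines.

    Raises ValueError if a row does not have exactly two cells or if the
    value is not a number (categorical values are not recognized).
    """
    lines = []
    for raw in spreadsheet_text.splitlines():
        if not raw.strip():
            continue  # skip blank rows
        cells = [cell.strip() for cell in raw.replace("\t", ",").split(",")]
        sample_id, value = cells  # fails unless there are exactly two cells
        float(value)  # fails for text such as "positive"
        lines.append(f"{sample_id},{value}")
    return "\n".join(lines)


# Placeholder IDs for illustration only.
upload_text = to_upload_lines("SAMPLE-01\t3.5\nSAMPLE-02\t7\n")
```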
If you are unsure of the sample or patient ID for a particular sample/patient, add either the _PATIENT_ID or _SAMPLE_ID clinical feature to the heatmap. Both of these special clinical features were created to show what IDs we recognize.
Generate a Kaplan-Meier Plot by clicking the "KM" button or selecting it from the "Tools" menu.
A Kaplan-Meier Plot is a visual estimate of the survival function of different groups of patients over time. Percent survival is on the Y-axis and time is on the X-axis; the steeper the curve, the worse the survival outcome is over time. Patient groups are determined by the left-most clinical feature, which is also the primary sort for both heatmaps. Only the first 30 groups will be displayed. If a feature is continuous, then the samples will be divided into 3 relatively equal sized groups. A vertical tick in the curve indicates that one or more patients were censored, meaning they either did not follow up, dropped out of the study, or the study ended before they had an outcome.
More information about KM plots can be found in this article.
Two survival phenotypes, "_EVENT overall survival indicator 1=death 0=censor" and "_TIME_TO_EVENT overall survival", have been curated for many datasets, including those from TCGA. The KM plot is generated using these two phenotypes if they exist; otherwise, you may select phenotypes using the Advanced Options. Please note a dataset may not come with survival information.
"_EVENT overall survival indicator 1=death 0=censor" is a binary variable recording an outcome (such as death, injury, onset of illness, recurrence of cancer). For example, _EVENT=alive or 0 means the event of death has not occurred, and _EVENT=deceased or 1 means that the event of death has occurred. If a subject does not have an event, it is called "censored", which can be due to the lack of an event, loss to follow-up or dropping out of the study.
"_TIME_TO_EVENT overall survival" is the time (such as days, months, or years) from a subject's entry into a study until a particular outcome occurred (such as death, recurrence, or metastasis).
There are two other pairs of survival phenotypes that have been curated for many datasets. The first pair is "_OS overall survival" and "_OS_IND overall survival indicator 1=death 0=censor", which are identical to "_TIME_TO_EVENT" and "_EVENT", respectively. The other pair is "_RFS recurrence free survival" and "_RFS_IND recurrence free survival indicator 1=new tumor; 0=otherwise", which show recurrence free survival.
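For readers who want to see how a time-to-event value and an event indicator turn into a survival curve, here is a minimal sketch of the standard Kaplan-Meier product-limit estimate for one group of patients. The function name and the example numbers are made up, and this is not necessarily how the browser computes its plot.

```python
def kaplan_meier(times, events):
    """Return [(time, survival fraction)] at each time an event occurred.

    times: time to event or to censoring for each patient.
    events: 1 if the event (for example death) occurred, 0 if censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # censored patients leave the risk set without a drop
    return curve


# Four patients; the second was censored at time 8.
curve = kaplan_meier(times=[5, 8, 8, 12], events=[1, 0, 1, 1])
```

Censored patients do not lower the curve themselves, but they shrink the number at risk, so later events produce larger steps.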
Often there are multiple datasets (CNV, gene expression, methylation) for one group of samples; we call these groups of samples "clinical cohorts". These cohorts always share clinical data: a patient marked as age 65 will be marked as such in both datasets. Because datasets in a cohort share clinical data, changing the clinical map for one dataset will also change the clinical map for all other datasets in a cohort.
Signing up for an account allows users to save bookmarks, genesets and signatures and also allows access to restricted datasets that they are authorized to view.
To sign up for an account, click "sign in" in the top bar of any screen. Then click on "Need an account?" and follow the instructions. You will need to verify your email address in order to activate your account. To sign in, click "sign in", enter your Username and Password, and click "login". To sign out, click the "sign out" button in the top right corner. If you have forgotten your password, please click "sign in", then click on "Forgot your password?" and follow the instructions.
If you believe you are supposed to have access to a restricted dataset but don't see it in the dataset settings, please contact us to begin the authorization process.
You can also get lists of your bookmarks, genesets and signatures through the "My Account" link that is visible once you've signed in. At the top is User Information that we have received from you; if any of this information is incorrect please contact us so that we may correct it.
The "My Account" page allows you to delete your saved bookmarks, signatures and genesets. Check the box next to the bookmark, signature or geneset you wish to delete and click "Delete". Note that after you delete one of these, there is no way to recover it.
You can also use the My Account page to change your password.
You can specify datasets, genomic positions, genesets and other parameters to display on the Cancer Browser by using key-value pairs added onto a base URL. Each link must start with the following base (note must be https):
Then parameters are passed in query string format, as a URL fragment identifier. All key-value pairs must be separated by the ampersand symbol (&). Please note that you must use URL encoding, which is especially important for genesets and signatures that tend to have spaces and plus signs. Note that after a link is followed and the page displayed, the parameters are deleted from the URL.
Anything not explicitly stated will use the default value. Note that you can set both the geneset and the genomic position in one link; this allows you to specify a geneset/position for when the user switches to another view.
All supported parameters are listed below. Note that they do not need to be in any particular order within the URL.
The dataset ID can be found by clicking on the "i" icon for a dataset and is usually located about half way down the page.
The mode corresponds to the different viewing modes. "heatmap" is Heatmap, "proportions" is Proportions and "tukey" is Box Plot.
This is typically a genomic position (e.g. chr1-chr2), but can be any term that is accepted in the position search bar.
The "displayas" corresponds to whether to view the data as chromosomes (chrom) or genes (geneset).
A predefined geneset name.
genes=<HUGO name>[,<HUGO name>...]
It is possible to create a user geneset by listing the genes to display in a comma separated list using the "genes" parameter.
It is also possible to create a user signature by listing the signature expression. Be sure to use appropriate URL encoding for signatures, including for plus signs, minus signs and spaces.
Name for your user signature, if desired. If none is specified, the signature will be given the name of "User signature".
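A link of this kind can be assembled with Python's standard library, as sketched below. The base URL is a placeholder rather than the real one, and joining the base and the parameters with "#" is an assumption drawn from the description of the parameters as a fragment identifier. Only the parameter names "mode", "displayas" and "genes", which appear above, are used.

```python
from urllib.parse import quote, urlencode

# Placeholder only: substitute the real https base URL of the Cancer Browser.
BASE_URL = "https://example.org/cancer-browser/"


def build_link(base_url, **params):
    """Append URL-encoded key-value pairs, joined by "&", as a fragment."""
    # quote_via=quote percent-encodes spaces as %20 and plus signs as %2B
    # instead of turning spaces into "+".
    fragment = urlencode(params, quote_via=quote)
    return f"{base_url}#{fragment}"


link = build_link(BASE_URL, mode="tukey", displayas="geneset", genes="TP53,ESR1")
```

Letting the library do the encoding avoids the most common mistake with genesets and signatures, which is leaving a space, comma or plus sign unencoded.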