Bioinformatics Instructions

1.       Choose a protein from your textbook. Write down the name of the protein, the book chapter, AND the page number, and turn in a hard copy to me.

2.       Obtain FIVE different sequences (from five different organisms) from each of the following two sites. In your presentation, discuss the differences and similarities between the two sites: ease of use; how the data are presented and organized (e.g., accession number, species name [whether common or scientific]); and what the data in the first line mean.

a.       Uniprot (

b.      NCBI (

Procedure: Uniprot

To use Uniprot: At the top of the home page, enter the name of your protein in the query box (make sure that the selector to the left of the search box reads ‘UniprotKB’) and click ‘search’.

Once the list populates, determine whether you need to make your search more specific. If you have more than 100 results, narrow the search. For example, search for HBG1 Gamma A rather than Hemoglobin. Both are hemoglobin; however, Gamma A is a specific subunit sequence from one of the chains.

Select 5 DIFFERENT organisms from the list, each with the SAME protein name you searched for, by clicking the radio button to the left of each entry.

After you choose your five proteins, at the top of the list hit Add to Basket. You will immediately see the five proteins show up in the basket in the upper right hand corner.

Click on the basket to open it, select all five proteins, and click download. Another window will pop up. Be sure that the format box reads FASTA (canonical), then click go. Your proteins will show up in a Notepad document. Each sequence will begin with >sp. Put a blank line between sequences, then name and save this document as a .txt file.

An example of Uniprot data:


The accession number (P06836), the abbreviated protein and species names (NEUM_BOVIN), the species (Bos taurus), and the protein name (GAP43) are listed on the first line. Point this out in your presentation.
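If you want to check your reading of the first line, the short Python sketch below pulls the same fields out of a Uniprot-style header. The header string is an illustrative example assembled from the fields named above (the protein description and the OX number are assumptions, not copied from your download), and the function name is made up for this handout.

```python
import re

# Layout assumed here: >db|accession|ENTRY_NAME Description OS=Species OX=TaxID GN=Gene
HEADER = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry_name>\S+)\s+"
    r"(?P<description>.*?)\s+OS=(?P<species>.*?)\s+OX=\d+"
    r"(?:\s+GN=(?P<gene>\S+))?"
)


def parse_uniprot_header(line):
    """Split one Uniprot-style FASTA header line into named fields."""
    match = HEADER.match(line.strip())
    if match is None:
        raise ValueError("Line does not look like a Uniprot FASTA header: " + line)
    return match.groupdict()


# Illustrative header only; compare it with the first line of your own file.
example = ">sp|P06836|NEUM_BOVIN Neuromodulin OS=Bos taurus OX=9913 GN=GAP43"
fields = parse_uniprot_header(example)
for name, value in fields.items():
    print(name + ": " + str(value))
```

Running this on each header line in your own file is a quick way to confirm that you really have five different species.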

Procedure: NCBI

To use NCBI: On the home page (where it says ‘All databases’), select ‘Protein’ from the drop-down menu.

Type the name of your protein (note: Use the same name you searched in Uniprot) in the query box. You may also search using the accession number.

You should find the EXACT SAME FIVE organisms you found in Uniprot in the list. These databases are linked. You are confirming they match and the sequences you are utilizing are verified by both sources. Match the accession number if necessary.

After you have located and selected each of them, select FASTA for each of them. This must be done one by one. As you open each FASTA for each organism, highlight the entire sequence.



Notice that the identifiers are also copied. The GI number (37781182), the GenBank accession number (AAO60065.1), the protein name (GAP43), and the species name (Bos taurus) are listed on the top line.

Open a new Notepad file (as you did previously) and copy each FASTA into the file. Put all 5 complete sequences into the same .txt file and save it under a different file name.

You should now have 2 different Notepad files: one entirely from Uniprot and one entirely from NCBI.

You will upload each of these files into one of the programs below.
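Before uploading, you can confirm that the two files really hold the same five sequences. The Python sketch below is one way to do that check; the function names are invented for this handout, and the short sequences are placeholders rather than real protein data.

```python
def read_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip the blank lines placed between sequences
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def compare_sequence_sets(text_a, text_b):
    """Return (sequences only in A, sequences only in B), ignoring headers."""
    seqs_a = {seq.upper() for _, seq in read_fasta(text_a)}
    seqs_b = {seq.upper() for _, seq in read_fasta(text_b)}
    return seqs_a - seqs_b, seqs_b - seqs_a


# Placeholder text standing in for the contents of your two .txt files.
uniprot_text = ">sp|X1|A_ONE first\nACDE\nFGHI\n\n>sp|X2|A_TWO second\nKLMN\n"
ncbi_text = ">id1 first [species one]\nACDEFGHI\n\n>id2 second [species two]\nKLMQ\n"

only_uniprot, only_ncbi = compare_sequence_sets(uniprot_text, ncbi_text)
print("Only in the Uniprot file:", sorted(only_uniprot))
print("Only in the NCBI file:", sorted(only_ncbi))
```

To run it on your own data, read each file into a string (for example with `open(your_file_name).read()`) and pass the two strings to `compare_sequence_sets`. Two empty results mean the files agree.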

3.       Do a multiple sequence alignment using Clustal Omega AND one of the following programs (either a, b, or c below) from the internet. Use the default parameters in the programs. Before doing your multiple sequence alignment, delete the accession numbers and other identifiers and change the name of each organism to its common name (for example, change Homo sapiens to human). In your presentation, compare and contrast how well the two different programs align the five sequences.

a.  MSA

b.  MAFFT

c.  MUSCLE



Procedure: Multiple Sequence Alignment

The above programs are all similar: I am highlighting Clustal Omega only for these instructions but make sure you also do MSA, MAFFT, or MUSCLE in addition to Clustal Omega.
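Step 3 asks you to strip the identifiers and use common names before aligning. You can do this by hand in Notepad; the Python sketch below shows the same edit done automatically. The name table, the function name, and the placeholder sequence are all assumptions made for illustration, so replace them with your own species.

```python
# Hypothetical lookup table: scientific name -> common name.
COMMON_NAMES = {
    "Homo sapiens": "human",
    "Bos taurus": "cow",
    "Rattus norvegicus": "rat",
}


def simplify_headers(fasta_text, common_names):
    """Replace each FASTA header with '>' plus the organism's common name."""
    output = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            for scientific, common in common_names.items():
                if scientific.lower() in line.lower():
                    output.append(">" + common)
                    break
            else:
                # No known species found in this header; stop rather than guess.
                raise ValueError("No known species in header: " + line)
        else:
            output.append(line)
    return "\n".join(output) + "\n"


# Illustrative header with a placeholder sequence (not real protein data).
original = ">sp|P06836|NEUM_BOVIN Neuromodulin OS=Bos taurus OX=9913 GN=GAP43\nACDEFGHIK\n"
cleaned = simplify_headers(original, COMMON_NAMES)
print(cleaned)
```

Because the function only looks for the scientific name somewhere in the header, the same sketch should work on both your Uniprot file and your NCBI file.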

Access the Clustal Omega homepage, (

Select browse, then locate the first Notepad (.txt) file you created from Uniprot. (I find it easiest if you save all files either to a specific folder or simply to your desktop.)

Once the file is in the browse box, scroll to the bottom of the page and select submit.

A new window will open as the data is being processed. Then you will see your first alignment file. Bookmark this page! Make sure that you label the bookmark so that you can distinguish that this is the alignment generated from the Uniprot search.

Open another browser window or a new tab and repeat the steps above for your other text file. The two alignments should be identical; if they are not, make note of the differences in the chains. You can print each of them, compare them, and include your findings in your presentation. More than likely, however, they will be the exact same sequences. You have now verified your sources for the alignment.

You will need to bookmark both pages as these screenshots will be slides in your presentation.



4.       Find domains using GeneDoc (can be downloaded free from the internet). In your presentation, discuss the similarities and differences in the domains across species. Use the information from Clustal Omega or from your other multiple sequence alignment program, but not both.

GeneDoc Procedure:

Access ( and scroll down to the download link. Select it and save the file. Install the program on your PC (the program CANNOT be installed on University computers!). If you cannot install GeneDoc, come to my lab and use the lab computer.

Once installation is complete, open the program. (A screen will open that is almost entirely blank.)

At the top, select file.

A drop-down menu will appear. Select import.

A small window will appear. The default settings should be FASTA and FILE. If not, select those options and select IMPORT.

Another file load window will appear. Select your text file from earlier and double click.

The alignment should populate almost immediately on your screen. Hit done so that the file box will close.

At the top of the GeneDoc screen select the icon labeled ‘C’.

Another drop box will appear.

Select the shading tab.

Set the percentage of conservation (shading levels) and the background colors (choose ‘fore’ and ‘back’ to change colors).

This is your GeneDoc alignment file. Save the file to your desktop or folder you have saved the text files in. You will need this screenshot for your presentation.
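To see what the shading levels represent, it can help to compute the conservation of each alignment column yourself. The Python sketch below reports, for each column, the percentage of sequences that share the most common residue. This is a simplified stand-in, not GeneDoc's own calculation, and the short aligned sequences are placeholders.

```python
from collections import Counter


def column_conservation(aligned):
    """Percent of sequences sharing the most common residue in each column.

    `aligned` is a list of equal-length aligned sequences with '-' for gaps.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("Aligned sequences must all have the same length")
    percentages = []
    for i in range(length):
        residues = [seq[i] for seq in aligned if seq[i] != "-"]
        if not residues:
            percentages.append(0.0)  # column made up entirely of gaps
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        percentages.append(100.0 * top_count / len(aligned))
    return percentages


# Placeholder alignment of five short sequences.
alignment = ["MKTA", "MKSA", "MR-A", "MKTA", "MKTG"]
conservation = column_conservation(alignment)
print(conservation)  # [100.0, 80.0, 60.0, 80.0]
```

Columns near 100% are the ones you would expect GeneDoc to shade most strongly; stretches of such columns are good candidates to discuss as conserved regions.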

5. Search for protein domains. Copy and paste the following html: . Insert one of your protein Fasta sequences into the text box and click ‘submit’. You should get a screen that looks like this:


[Screenshot: domain search results for query Q#1 (>rat)]

These are your domains. If you click on the box, there will also be an explanation of what each domain does. Include the explanation in your presentation.

6. Generate a score table

How to generate a scores table of your organism’s protein alignments:

Return to the Clustal Omega page you bookmarked earlier.

Select the results summary tab.

Scroll down and click on ‘Percent Identity Matrix’.

You should see the scores table on this screen.

Bookmark this page. (Name it scores table.) You will need this screenshot for your presentation.
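To understand what the numbers in the scores table mean, you can compute pairwise percent identity from an alignment yourself. The Python sketch below counts identical residues over the columns where neither sequence has a gap. This is one common definition and an assumption on my part; Clustal Omega may handle gaps differently, so small differences from its matrix are possible. The aligned sequences are placeholders.

```python
def percent_identity(seq_a, seq_b):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("Aligned sequences must have the same length")
    matches = compared = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # gap columns are left out of the comparison
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned):
    """Build a nested dict of pairwise percent identities from {name: sequence}."""
    names = list(aligned)
    return {
        first: {second: percent_identity(aligned[first], aligned[second]) for second in names}
        for first in names
    }


# Placeholder alignment keyed by common name.
aligned = {"human": "MKTAY", "cow": "MKSAY", "rat": "MR-AY"}
matrix = identity_matrix(aligned)
for name, row in matrix.items():
    print(name, ["%.1f" % row[other] for other in aligned])
```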

7.       Build a phylogenetic tree of your species.

Return to the Clustal Omega page you bookmarked earlier.

Select ‘Phylogenetic Tree’.

Bookmark this page. (Name it phylogenetic tree.) You will need this screenshot for your presentation.

In your presentation, discuss the relationship among the five species regarding how closely (or not) they are related (based on % identity) through evolution. Describe which one of the five species is the outgroup (if any) and define what an outgroup is. Do you have sister and/or basal taxa? Discuss the answers to these questions in your presentation.
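When you relate the tree to the % identity values, it may help to pick out the most similar pair and the most divergent species from your scores table. The Python sketch below does this for a made-up matrix: the species labels A to D and all of the numbers are invented for illustration. A low average identity only suggests a candidate outgroup; check it against the tree and against what you know about the organisms.

```python
from itertools import combinations


def closest_pair(matrix):
    """Return the pair of names with the highest percent identity."""
    pairs = combinations(sorted(matrix), 2)
    return max(pairs, key=lambda pair: matrix[pair[0]][pair[1]])


def most_divergent(matrix):
    """Return the name with the lowest mean identity to all the others."""
    def mean_identity(name):
        others = [matrix[name][other] for other in matrix if other != name]
        return sum(others) / len(others)
    return min(sorted(matrix), key=mean_identity)


# Invented percent identity values; substitute the numbers from your own table.
scores = {
    "A": {"A": 100.0, "B": 95.0, "C": 90.0, "D": 60.0},
    "B": {"A": 95.0, "B": 100.0, "C": 88.0, "D": 62.0},
    "C": {"A": 90.0, "B": 88.0, "C": 100.0, "D": 58.0},
    "D": {"A": 60.0, "B": 62.0, "C": 58.0, "D": 100.0},
}

pair = closest_pair(scores)
candidate = most_divergent(scores)
print("Most similar pair:", pair)
print("Most divergent species (candidate outgroup):", candidate)
```

In this invented example, A and B would be the likeliest sister taxa, and D would be the species to examine as a possible outgroup.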

8.       Build a three-dimensional model using or another program.

a.       Label areas of interest on your 3-D model. (This can also be done using Paint.)

b.      Point out the location of the active site, secondary structures, and other domains in your presentation.

How to put together all of the screenshots:

Open each of the above bookmarked windows. This process must be done one by one. So select the first window you bookmarked.

Once you have the screen up, make sure that it is maximized to take up your entire monitor.

Press the Prnt Scrn key on your keyboard.

Now open Microsoft Paint or similar program.

Hit Ctrl+V or right click and hit paste or select the paste option from the Paint menu.

The screenshot should appear.

Hit the select button from the Paint menu if you need to crop the image.

Once you have surrounded the portion you need to crop, right click and select crop.

Only the part of the screenshot you wish to include remains.

Hit CTRL+S and save the file (type will be .png by default and is fine) by the title and slide number you want to use. This will make it easier for you to keep the screenshots in the correct order for your presentation when you start inserting them. Ex. 1SearchResultsUniprot, 2SearchResultsNCBI, 3AlignmentFileClustal, and so forth.

9.       Present your findings as a PowerPoint presentation (not less than 5 minutes nor more than 6 minutes). For each program you use and each step that you do, you will need to show the data on a slide and add the source to your references. In addition, you will need a title slide and an introduction slide (these come before your data slides), a summary slide, and a slide with references (last slide). You should have a total of 15-16 slides. At a minimum, your title slide should have the name of the course, the name of the protein, and your name. Your introduction slide(s) should give some background about the protein, including its history, who discovered or isolated it and how, its function, its molecular weight, its cellular location, and any other characteristics that are pertinent to your discussion of the protein.

10. Order of slides (you will be graded on this specific order!):

Title slide (should contain information listed in the preceding paragraph)

Introduction slide(s) (should contain information requested above; no more than 2 slides; information should be in bullet points; do not use long paragraphs)

Species used


NCBI data of sequences (do not show selection page; show the sequences)

Uniprot data of sequences (do not show selection page; show the sequences)

Clustal Omega multiple sequence alignment

Other multiple sequence alignment program data



Scores table

Phylogenetic tree

Three dimensional model

Summary of research (not what you learned but what you discovered about your protein)

References (include ALL programs used and data cited)

How to insert screenshots into your PowerPoint presentation:

Open PowerPoint

Select insert

Select picture

Select your file

You may need to adjust the size of the image to make it easier to view, or even re-crop it to another size in Paint.

This project cannot be done at the last minute. I will be available for questions throughout the remainder of the semester. If you need help, I will be available after class until fall break. After that, you’re on your own! You will be graded according to how well you follow instructions AND how completely you do your research, organize it according to the instructions, and present your data!