Assignment Goal: To use the Internet-based Genes and Disease site (NCBI) to view the assignment of genes to chromosomes.
Assignment: Access the Genes and Disease site at http://www.ncbi.nlm.nih.gov/books/NBK22183/
Under “Contents”, select “Chromosome Map” (at the very bottom).
A karyotype will appear.
Click on a chromosome.
1. WHICH CHROMOSOME DID YOU CHOOSE? I chose chromosome 7.
Above the chromosome image you will see the number of genes and base pairs on that particular chromosome.
2 & 3. STATE THE NUMBER OF GENES AND BASE PAIRS ON THE CHROMOSOME YOU CHOSE. Chromosome 7 contains approximately 1800 genes and over 150 million base pairs.
Scan the chromosome map.
4. LIST ONE GENE WHICH IS LOCATED ON THIS CHROMOSOME: CFTR (human cystic fibrosis)
5. STATE THE NORMAL FUNCTION OF THE GENE YOU LISTED IN #4. This is possible by clicking on the gene you stated in #4. It is important that you state the NORMAL physiological function of the gene product you select. The CFTR gene provides instructions for making a protein that helps maintain the balance of salt and water on surfaces in the body, such as the lungs.
Introduction to BLAST
Assignment Goal: To use the Internet-based site BLAST, Basic Local Alignment Search Tool (NCBI), to search for similarities between nucleotide sequences.
Assignment: Access the BLAST site at http://blast.ncbi.nlm.nih.gov/Blast.cgi
Click on “Nucleotide Blast”
Assume that you found this nucleotide sequence when you cloned a piece of gene in the laboratory in which you work:
aattggaagc aaatgacatc acagcaggtc agagaaaaag ggttgagcgg caggcaccca gagtagtagg tctttggcat taggagcttg agcccagacg gccctagcag ggaccccagc
Enter the above sequence (you may copy and paste) into the “Enter Query Sequence” box at the top of the page. Under “Program Selection” near the bottom of the page, choose “somewhat similar sequence (blastn)”
Click the “BLAST” button at the bottom of the page to run the search.
Give some time for the results of your search to show up.
You will be given significant matches for the sequence that you entered.
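Before pasting a query into the BLAST form, it can help to check that the sequence is clean. The sketch below is only an illustration and not part of the NCBI site: it strips the spacing from the query given above, confirms that only nucleotide letters remain, and prints the sequence in FASTA format. The label `cloned_fragment` and the function names are hypothetical.

```python
# Query sequence exactly as given in the assignment (spaces included).
RAW_QUERY = (
    "aattggaagc aaatgacatc acagcaggtc agagaaaaag ggttgagcgg caggcaccca "
    "gagtagtagg tctttggcat taggagcttg agcccagacg gccctagcag ggaccccagc"
)


def clean_sequence(raw):
    """Remove whitespace, uppercase, and reject non-nucleotide characters."""
    seq = "".join(raw.split()).upper()
    unexpected = set(seq) - set("ACGTN")
    if unexpected:
        raise ValueError("unexpected characters: " + "".join(sorted(unexpected)))
    return seq


def to_fasta(label, seq, width=60):
    """Format a sequence as FASTA text with fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + label + "\n" + "\n".join(lines)


query = clean_sequence(RAW_QUERY)
fasta_text = to_fasta("cloned_fragment", query)  # hypothetical label
print(len(query), "nucleotides")
print(fasta_text)
```

The FASTA text can be pasted into the “Enter Query Sequence” box in place of the spaced version.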
6. WHAT IS THE TOP SEQUENCE DESCRIPTION MATCH FOR YOUR QUERY SEQUENCE? For this answer, you should give the description listed. Do not choose a Predicted sequence.
AATTGGAAG CAAATGACATC ACAGCAGGTC AGAGAAAAAG GGTTGAGCGG CAGGCACCCA
7. IS THIS A SEQUENCE FOR A PROTEIN OR ANOTHER PART OF THE GENE? IF IT IS “ANOTHER PART OF THE GENE”, EXPLAIN ITS PURPOSE. Yes, this sequence comes from a gene that codes for a protein.
8. WHAT DOES THE ENCODED PROTEIN ASSOCIATED WITH THE ABOVE SEQUENCE DO IN THE BODY? Search the PubMed site at www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed to answer this question. Under “Article types,” choose “Review”. CITE THE PAPER YOU USED TO DETERMINE THE PURPOSE OF THE ENCODED PROTEIN. The CFTR gene encodes the cystic fibrosis transmembrane conductance regulator protein. This protein is responsible for maintaining the levels of salt and water on surfaces of the body.
CITATION: Gillen, A. E., & Harris, A. (2012). Transcriptional regulation of CFTR gene expression. Frontiers in bioscience (Elite edition), 4(2), 587–592. https://doi.org/10.2741/401
Click on the top match to find the following.
9. A MUTATED FORM OF THIS GENE IS RESPONSIBLE FOR A WELL-STUDIED DISEASE. WHAT IS THAT DISEASE? You should be able to get this information from the description of the gene. You may need to “probe” the gene description. A mutated form of the CFTR gene is responsible for the well-studied disease cystic fibrosis. Cystic fibrosis occurs when the CFTR gene carries a defect or mutation. The disease causes problems with digestion and breathing; individuals with this genetic disorder often have thick, sticky mucus that can block the airways to the lungs, leading to infections and lung damage.
10. ON WHAT CHROMOSOME IS THE GENE LOCATED? You should be able to get this information by clicking on the description of the gene. This gene is located on chromosome 7.
11. Return to the original nucleotide sequence alignment descriptions. WHAT SPECIES (STATE THE SCIENTIFIC NAME) OTHER THAN HOMO SAPIENS ALSO HAS A 100% IDENTITY (Ident) FOR THIS SEQUENCE? USE THE TOP SEQUENCE LISTED, BUT DO NOT USE THE PREDICTED SEQUENCES.
Species including Pan paniscus, Pongo abelii, and Pan troglodytes have a 100% identity for the sequence.
12. WHAT IS THE COMMON NAME FOR THIS SPECIES?
Pan paniscus : Pygmy chimpanzee (bonobo)
Pongo abelii : Sumatran orangutan
Pan troglodytes : Chimpanzee
13. DOES IT SURPRISE YOU THAT THIS SPECIES ALSO HAS A 100% SIMILARITY IN IDENTITY? WHY OR WHY NOT?
No, it does not surprise me. In terms of the evolution of Homo sapiens, humans are most closely related to chimpanzees.
14. DESCRIBE THE FIRST MATCH THAT HAS LESS THAN 100% QUERY COVER BUT IS NOT PREDICTED OR HOMO SAPIENS. STATE THE SCIENTIFIC AND COMMON NAMES.
The first match with less than 100% query cover that was not predicted or Homo sapiens had the scientific name Nomascus leucogenys. Its common name is the northern white-cheeked crested gibbon.
15. Click on the description to answer this question. HOW MANY GAPS OCCUR BETWEEN THE TWO SEQUENCES (THE ONE YOU SUBMITTED AND THE FIRST ONE THAT HAS LESS THAN 100% QUERY COVER)? 42 gaps
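16. WHAT IS A GAP IN SEQUENCE ALIGNMENTS? (This is something you’ll have to search for elsewhere.) A gap is a position in an alignment where one sequence has no residue matching a residue in the other sequence. It means that one or more residues (nucleotides or amino acids) have been inserted into or deleted from one sequence relative to the other.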
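To make the idea of a gap concrete, the short sketch below counts gap positions and identical positions in a pair of aligned sequences, where a dash marks a gap. The two aligned strings are made-up toy data, not the BLAST result from question 15, and the function name is hypothetical.

```python
def alignment_stats(query, subject):
    """Return (gap positions, identical positions, percent identity)
    for two aligned sequences of equal length that use '-' for gaps."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    gaps = 0
    matches = 0
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            gaps += 1          # a residue is missing on one side
        elif q == s:
            matches += 1       # same residue in both sequences
    identity = 100.0 * matches / len(query)
    return gaps, matches, identity


# Toy alignment (invented for illustration only).
aligned_query = "ACGTTGCA--GT"
aligned_subject = "ACG-TGCATTGT"

gaps, matches, identity = alignment_stats(aligned_query, aligned_subject)
print("gaps:", gaps, "matches:", matches, "identity: %.1f%%" % identity)
```

In this toy case the subject is missing one nucleotide and the query is missing two, so three alignment columns contain a gap.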
You can also do BLAST searches using an accession number that has been assigned to a particular sequence when it has been entered into the database. Go back to the BLAST home page (www.ncbi.nlm.nih.gov/BLAST.cgi) and again choose “Nucleotide Blast”. Look up the following sequences using the given accession numbers. (Under “Program Selection” near the bottom of the page, choose “somewhat similar sequence (blastn)”. Again, click on the “BLAST” button at the bottom of the page after you have entered the accession number.)
FOR EACH, STATE WHAT THE GENE IS (#17-20). Give the description of the gene and gene product. You do not need to state the organism source.
17. NM_145556 – TARDBP gene; TARDBP stands for TAR DNA-binding protein. This gene provides instructions for making a transactive response DNA-binding protein.
18. NM_013444 – UBQLN2 gene; this gene is called ubiquilin 2. Its product directs misfolded proteins toward the proteasome and is a key component of protein homeostasis.
19. NM_001010850 – FUS gene; FUS stands for Fused in Sarcoma. This gene is responsible for promoting DNA transcription and protein production, as well as promoting cell growth.
20. KJ174530 – SOD-1 gene; this is the superoxide dismutase 1 gene. It provides instructions for making the enzyme superoxide dismutase.
21. Search Google to answer the following: WHAT DISEASE IS ASSOCIATED WITH MUTATIONS OF THE GENES REFERENCED IN #17-#20? WHAT IS A “COMMON NAME” OF THE DISEASE? (The name of a person; Hint, hint…We just finished the World Series…)
- TARDBP gene : Amyotrophic lateral sclerosis (ALS) : Lou Gehrig’s disease
- UBQLN2 gene : Amyotrophic lateral sclerosis (ALS), sometimes with frontotemporal dementia (FTD) : Lou Gehrig’s disease
- FUS gene : Amyotrophic lateral sclerosis (ALS) : Lou Gehrig’s disease
- SOD-1 gene : Amyotrophic lateral sclerosis (ALS) : Lou Gehrig’s disease
BLAST is possible because of the submission of DNA sequences to GenBank.
22. WHAT IS GENBANK? (You can do an Internet search to find this information.)
GenBank is a publicly available database of nucleotide sequences for more than 300,000 organisms that are named at the genus level or lower. Most of these sequences come from submissions by individual laboratories and from large-scale sequencing projects.
Introduction to Swiss-Prot to Study Protein Sequences
Assignment Goal: To use the Internet-based site ExPASy (Expert Protein Analysis System) to translate cDNA, and the Internet-based database UniProt KB/Swiss-Prot to access a complete polypeptide.
23. WHAT IS cDNA? How can we obtain cDNA in the lab?
cDNA (complementary DNA) is DNA that has been synthesized from an mRNA template, so it represents only the expressed sequence of a gene and lacks introns. We can obtain cDNA in the lab by isolating mRNA from cells and copying it into DNA with the enzyme reverse transcriptase.
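As a small illustration of the base-pairing logic behind cDNA (complementary DNA made from an mRNA template), the sketch below takes a short mRNA string and writes out the first cDNA strand that would be complementary to it, followed by the second strand, which matches the mRNA with T in place of U. The function names and the nine-base mRNA are invented for this example; the code models only the sequence relationship, not the laboratory procedure.

```python
# RNA base -> complementary DNA base.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """DNA strand complementary to the mRNA, written 5' to 3'."""
    complement = [RNA_TO_DNA_COMPLEMENT[base] for base in mrna.upper()]
    return "".join(reversed(complement))


def second_strand_cdna(mrna):
    """DNA strand with the same sequence as the mRNA (U replaced by T)."""
    return mrna.upper().replace("U", "T")


example_mrna = "AUGGUGCAU"  # hypothetical mRNA fragment
first = first_strand_cdna(example_mrna)
second = second_strand_cdna(example_mrna)
print("first strand :", first)
print("second strand:", second)
```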
Assignment: Access the BLAST site at www.ncbi.nlm.nih.gov/BLAST.cgi
Click on “Nucleotide Blast”
Enter the following sequence: ACATTTGCTTCTGACACAATTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAG
24. USING THE SAME PROGRAM YOU USED IN THE INTRODUCTION TO BLAST ABOVE, WHAT IS THE SEQUENCE MATCH?
Query 1 ACATTTGCTTCTGACACAATTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATC 60
Sbjct 1 ACATTTGCTTCTGACACAATTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATC 60
Query 61 TGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAG 120
Sbjct 61 TGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAG 120
Query 121 TTGGTGGTGAGGCCCTGGGCAG 142
Sbjct 121 TTGGTGGTGAGGCCCTGGGCAG 142
Now access the Expasy translate tool at https://web.expasy.org/translate/.
Enter the above DNA sequence.
Click “Translate Sequence”.
25. WHAT IS AN OPEN READING FRAME?
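An open reading frame is a continuous stretch of a DNA sequence that begins with a start codon and is not interrupted by a stop codon, so it has the potential to be translated into a protein.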
26. ALL OF THE PROPOSED OPEN READING FRAMES (HIGHLIGHTED IN RED) START WITH THE AMINO ACID “M”. FROM WHAT YOU KNOW ABOUT POLYPEPTIDES, WHAT IS “M”?
M stands for methionine (Met), the amino acid encoded by the standard start codon AUG (ATG in DNA).
27. WHICH 5’ TO 3’ FRAME IS MOST LIKELY TO BE AN OPEN READING FRAME? WHY DID YOU CHOOSE THAT FRAME? Frame 3 is most likely to be an open reading frame. I chose this frame because, of the three frames, it contains the longest stretch that begins with a potential start codon and is not interrupted by a stop codon.
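The frame comparison can be reproduced with a short script. The sketch below is a simplified stand-in for the Expasy tool, not its actual implementation: it translates the query in the three forward frames with the standard genetic code and reports, for each frame, the longest stretch that starts with M and runs to the next stop codon or to the end of the fragment. All function and variable names are assumptions made for illustration.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

DNA = (
    "ACATTTGCTTCTGACACAATTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATC"
    "TGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAG"
    "TTGGTGGTGAGGCCCTGGGCAG"
)


def translate(dna, offset):
    """Translate complete codons starting at the given offset (0, 1, or 2)."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def longest_orf(protein):
    """Longest stretch that starts with M and contains no stop ('*')."""
    best = ""
    for segment in protein.split("*"):
        start = segment.find("M")
        if start != -1 and len(segment) - start > len(best):
            best = segment[start:]
    return best


# Frames are numbered 1-3 to match the numbering used in the question.
orfs = {frame: longest_orf(translate(DNA, frame - 1)) for frame in (1, 2, 3)}
best_frame = max(orfs, key=lambda frame: len(orfs[frame]))

for frame, orf in orfs.items():
    print("Frame", frame, "longest ORF:", orf or "(none)", len(orf), "aa")
print("Most likely reading frame:", best_frame)
```

Because the query is only a fragment, the frame 3 stretch runs to the end of the sequence without reaching a stop codon.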
Amino Acid Sequence Comparisons
Assignment Goal: To use the Internet-based site Expasy SIM program to align two amino acid sequences. Knowing the sources of these sequences will allow one to determine the mutation and potential cause of a human disease.
Assignment: Access the Expasy site at https://web.expasy.org/sim/. Copy and paste each of the following sequences into the “Sequence” text boxes as User-entered sequence.
Person 1/Sequence 1:
MGAPACALALCVAVAIVAGASSESLGTEQRVVGRAAEVPGPEPGQQEQLVFGSGDAVELSCPPPGGGPMGPTVWVKDGTGLVPSERVLVGPQRLQVLNASHEDSGAYSCRQRLTQRVLCHFSVRVTDAPSSGDDEDGEDEAEDTGVDTGAPYWTRPERMDKKLLAVPAANTVRFRCPAAGNPTPSISWLKNGREFRGEHRIGGIKLRHQQWSLVMESVVPSDRGNYTCVVENKFGSIRQTYTLDVLERSPHRPILQAGLPANQTAVLGSDVEFHCKVYSDAQPHIQWLKHVEVNGSKVGPDGTPYVTVLKTAGANTTDKELEVLSLHNVTFEDAGEYTCLAGNSIGFSHHSAWLVVLPAEEELVEADEAGSVYAGILSYGVGFFLFILVVAAVTLCRLRSPPKKGLGSPTVHKISRFPLKRQVSLESNASMSSNTPLVRIARLSSGEGPTLANVSELELPADPKWELSRARLTLGKPLGEGCFGQVVMAEAIGIDKDRAAKPVTVAVKMLKDDATDKDLSDLVSEMEMMKMIGKHKNIINLLGACTQGGPLYVLVEYAAKGNLREFLRARRPPGLDYSFDTCKPPEEQLTFKDLVSCAYQVARGMEYLASQKCIHRDLAARNVLVTEDNVMKIADFGLARDVHNLDYYKKTTNGRLPVKWMAPEALFDRVYTHQSDVWSFGVLLWEIFTLGGSPYPGIPVEELFKLLKEGHRMDKPANCTHDLYMIMRECWHAAPSQRPTFKQLVEDLDRVLTVTSTDEYLDLSAPFEQYSPGGQDTPSSSSGDDSVFAHDLLPPAPPSSGGSRT
Person 2/Sequence 2:
MGAPACALALCVAVAIVAGASSESLGTEQRVVGRAAEVPGPEPGQQEQLVFGSGDAVELSCPPPGGGPMGPTVWVKDGTGLVPSERVLVGPQRLQVLNASHEDSGAYSCRQRLTQRVLCHFSVRVTDAPSSGDDEDGEDEAEDTGVDTGAPYWTRPERMDKKLLAVPAANTVRFRCPAAGNPTPSISWLKNGREFRGEHRIGGIKLRHQQWSLVMESVVPSDRGNYTCVVENKFGSIRQTYTLDVLERSPHRPILQAGLPANQTAVLGSDVEFHCKVYSDAQPHIQWLKHVEVNGSKVGPDGTPYVTVLKTAGANTTDKELEVLSLHNVTFEDAGEYTCLAGNSIGFSHHSAWLVVLPAEEELVEADEAGSVYAGILSYRVGFFLFILVVAAVTLCRLRSPPKKGLGSPTVHKISRFPLKRQVSLESNASMSSNTPLVRIARLSSGEGPTLANVSELELPADPKWELSRARLTLGKPLGEGCFGQVVMAEAIGIDKDRAAKPVTVAVKMLKDDATDKDLSDLVSEMEMMKMIGKHKNIINLLGACTQGGPLYVLVEYAAKGNLREFLRARRPPGLDYSFDTCKPPEEQLTFKDLVSCAYQVARGMEYLASQKCIHRDLAARNVLVTEDNVMKIADFGLARDVHNLDYYKKTTNGRLPVKWMAPEALFDRVYTHQSDVWSFGVLLWEIFTLGGSPYPGIPVEELFKLLKEGHRMDKPANCTHDLYMIMRECWHAAPSQRPTFKQLVEDLDRVLTVTSTDEYLDLSAPFEQYSPGGQDTPSSSSSGDDSVFAHDLLPPAPPSSGGSRT
Submit the sequences for comparison.
28. DO YOU SEE ANY DIFFERENCES BETWEEN THE TWO AMINO ACID SEQUENCES? (Look for the absence of an asterisk, which indicates the same amino acid in both sequences.)
Yes, there are differences between the two amino acid sequences.
29. IF YOU SAW DIFFERENCES, WHAT WERE THEY?
There are two differences between the amino acid sequences, at 361 and 781.
361 – Sequence 1 has a G and Sequence 2 has an R
781 – Sequence 1 has a gap (–) and Sequence 2 has an S
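The same two kinds of difference can be detected with Python's standard `difflib` module. The sketch below is not the SIM algorithm; it only compares short excerpts copied from the regions of Sequence 1 and Sequence 2 where the differences occur, and the excerpt boundaries and function name were chosen for illustration.

```python
from difflib import SequenceMatcher


def list_differences(seq1, seq2):
    """Return (kind, residues in seq1, residues in seq2) for each difference."""
    matcher = SequenceMatcher(None, seq1, seq2, autojunk=False)
    return [
        (tag, seq1[i1:i2], seq2[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Excerpt around the first difference (a substitution).
person1_region_a = "VYAGILSYGVGFFLFIL"
person2_region_a = "VYAGILSYRVGFFLFIL"

# Excerpt around the second difference (an extra residue in Sequence 2).
person1_region_b = "GGQDTPSSSSGDDSVF"
person2_region_b = "GGQDTPSSSSSGDDSVF"

substitution = list_differences(person1_region_a, person2_region_a)
insertion = list_differences(person1_region_b, person2_region_b)
print(substitution)
print(insertion)
```

A "replace" entry corresponds to a substituted amino acid, and an "insert" entry corresponds to a residue that is aligned against a gap in the other sequence.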
Return to the BLAST home page (http://blast.ncbi.nlm.nih.gov/Blast.cgi). Run a PROTEIN BLAST search to identify the polypeptide which you have been analyzing. (You may use either sequence.)
30. WHAT IS THE FUNCTION OF THIS PROTEIN?
The second sequence is identified as the product of the FGFR3 gene. The function of this gene is to regulate bone growth by limiting the formation of bone from cartilage within the body.
31. WHAT HUMAN DISEASE IS CAUSED BY A MUTATION IN THIS GENE?
Muenke Syndrome is caused by a mutation in this gene.
32. REFLECT ON ONE THING THAT YOU LEARNED FROM DOING THIS ASSIGNMENT. Please be honest. If you didn’t learn anything, admit it…
One thing that I learned from doing this assignment is how mutations in the sequences of particular genes and proteins lead to the very common diseases that were discussed during this assignment.