Computational Biology Resource Center
Transcription Factor Searches
MUSC non-Net-based Transcription Factor Searches
If you wish to scan your DNA sequence for transcription factor binding sites, you have two choices in GCG.

I. Here's the first process
1) Start GCG and "fetch" tfsites.dat. tfsites.dat is a compilation of known transcription factor binding sites along with their literature references. It is a bit outdated (who knows why it is supplied by GCG, and there was NOT a newer release available from NCBI): the current tfsites.dat file fetched this way is dated 1996. A 2003 version in GCG format (tfsites.gcg) may be retrieved from this site.
2) Select the sequence to be scanned and use findpatterns with the option -data=tfsites.dat, i.e.:
    findpatterns -dat=tfsites.dat filename.seq
3) The output file, e.g. filename.find, contains all the locations where the recognition sites for transcription factors have been mapped. You can scan through the file to see if anything looks useful to you.
4) There is no provision for finding the references to these known sites. This is a flaw in GCG. To get around this, I wrote a short unix shell script, with a huge amount of help from Karen Jesmer at CCIT, which reads your findpatterns output file and then gets the references that match your hits.
Send an email to Starr Hazard requesting the unix shell program starr.
5) Using "starr"
a) Run findpatterns with -dat=tfsites.dat.
b) Type sh, starr, the findpatterns output file and the reference file to be created, i.e.:
    sh starr filename.find filename.findref
c) Examine filename.findref for the references to the sites that your findpatterns search located.

II. Here's the second way
1) The tfsites.dat file may also be read by any of the map programs to locate the tfsites along your sequence. Type:
    mapplot -dat=tfsites.dat filename.seq
This will create a "digestion" map showing the places where tfsites recognition sites are found. Of course, the MAP program will use the tfsites.dat file as well. Type:
    map -dat=tfsites.dat filename.seq
7) OR you could use Dan Prestridge's SignalScan program. To use this you must add two lines to your .cshrc file, then save the modifications. Finally, type "source .cshrc" to activate the changes (or start a new shell, or log out and then log in again). Typing "signal" should start SignalScan. This is not a better program; it's just different. You can go back and look at the references, but only one at a time.
The two lines to add to your .cshrc file are available on request. Send an e-mail to Starr Hazard to get these two lines.
8) OR finally, refer to the Web resources listed below under "Net-based Transcription Factor and Promoter Search Services". These do not generally work better or faster, but they do give you hypertext links to the references and are therefore more convenient in that regard.
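For readers who want to see the idea behind the pattern scan and reference lookup described in section I, here is a minimal Python sketch. It is not GCG's findpatterns and not the starr script: it does not read tfsites.dat or a findpatterns output file, whose formats are not reproduced on this page. It assumes a hypothetical in-memory table of (name, pattern, reference) entries, with patterns written in IUPAC nucleotide codes. The names TOY_SITES, scan, to_regex and reverse_complement, the toy patterns and the placeholder references are all invented for illustration.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement table covering the ambiguity codes as well as A, C, G, T.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# Hypothetical site table: toy patterns and placeholder references only.
# A real run would load entries from a site file such as tfsites.dat instead.
TOY_SITES = [
    ("toy_site_1", "TATAWA", "placeholder reference 1"),
    ("toy_site_2", "GGGCGG", "placeholder reference 2"),
]


def reverse_complement(pattern):
    """Return the reverse complement of an IUPAC pattern."""
    return pattern.upper().translate(COMPLEMENT)[::-1]


def to_regex(pattern):
    """Translate an IUPAC pattern into a regular-expression string."""
    return "".join(IUPAC[base] for base in pattern.upper())


def scan(sequence, sites):
    """Find every site on both strands of the sequence.

    Returns a sorted list of (position, strand, name, matched, reference)
    tuples. Positions are 1-based and always refer to the leftmost base
    of the match on the strand that was supplied.
    """
    sequence = sequence.upper()
    hits = []
    for name, pattern, reference in sites:
        pattern = pattern.upper()
        strands = [("+", pattern)]
        rc = reverse_complement(pattern)
        if rc != pattern:  # a palindromic pattern is reported once only
            strands.append(("-", rc))
        for strand, pat in strands:
            # The lookahead lets overlapping occurrences be reported too.
            regex = re.compile("(?=(" + to_regex(pat) + "))")
            for match in regex.finditer(sequence):
                hits.append(
                    (match.start() + 1, strand, name, match.group(1), reference)
                )
    return sorted(hits)


if __name__ == "__main__":
    for hit in scan("ccTATAAAggCCGCCC", TOY_SITES):
        print(*hit, sep="\t")
```

Keeping the reference next to each pattern in the table means every hit is reported together with its reference, which is the convenience that the starr script adds on top of the findpatterns output.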
Net-based Transcription Factor and Promoter Search Services
- VISTA, which compares two genomic sequences, may be the best bet for deciphering which DNA in a non-coding region is regulatory.
- NIH ProScan
- SRS database search engine. The UnTranslatedRegion (UTR) database may be searched.
- Genomatix PromoterInspector: find promoters (register first)
- Berkeley: Martin Reese's Neural-network Promoter Predictor
- Genomatix MatInspector: you will have to register first!
- Genomatix FastM aims to identify distance-correlated transcription factor binding sites.
- Japan TFSEARCH
- UPenn TESS String-based search
- IFTI-USA TFsearch.
The IFTI site returns some nice graphics regarding the strength of the hits. This is useful for sorting voluminous data.
The NIH ProScan site is a Web implementation of Dan Prestridge's PROSCAN program.
revised by ESH August 14, 2012