The PROSITE database of protein domains, families and functional sites Release Notes Release 20.0, November 2006 |
(1) Introduction |
---|
PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them [More details].
PROSITE is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of profiles and patterns by providing additional information about functionally and/or structurally critical amino acids [More details].
This release of PROSITE contains 1449 documentation entries that describe 1331 patterns and 671 profiles, and 737 rules.
The following table shows the growth of the database since its creation in 1989.
Rel. | Date | Doc | Entries | Note |
---|---|---|---|---|
1.0 | 03/89 | 58 | 60 | Only released in PC/Gene (Version 5.16) |
2.0 | 03/89 | 129 | 132 | Only released in PC/Gene (Version 6.00) |
3.0 | 05/89 | ? | 160 | |
4.0 | 10/89 | ? | 202 | Printed release (EMBL Biocomputing document) |
5.0 | 04/90 | 296 | 338 | |
6.0 | 11/90 | 375 | 433 | |
7.0 | 05/91 | 441 | 508 | |
8.0 | 11/91 | 530 | 605 | |
9.0 | 06/92 | 580 | 689 | |
10.0 | 12/92 | 635 | 803 | |
11.0 | 10/93 | 715 | 927 | |
12.0 | 06/94 | 785 | 1029 | First release to include profiles |
13.0 | 11/95 | 889 | 1167 | |
14.0 | 12/97 | 997 | 1335 | |
15.0 | 06/98 | 1014 | 1352 | |
16.0 | 07/99 | 1034 | 1374 | |
17.0 | 12/01 | 1108 | 1501 | |
18.0 | 07/03 | 1200 | 1639 | |
19.0 | 04/05 | 1344 | 1841 | |
20.0 | 11/06 | 1449 | 2004 | Introduction of the ProRule section |
(2) Description of the changes made to PROSITE since release 19.0 |
---|
For more details on new implementations see:
ftp://www.expasy.org/databases/prosite/tools/
PROSITE profiles normally use two cut-off levels: a reliable cut-off (LEVEL=0) and a low-confidence cut-off (LEVEL=-1). The low-level cut-off usually covers the twilight zone, where a few true positives that cannot be separated from false positives might be present. The output of the pfsearch and pfscan programs marks strong matches (level 0) with '!' and weak matches (level -1) with '?'. This tagging in the match list can be used in post-processing to validate some true positives present in the twilight zone or to eliminate some false positives detected with a significant score.
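As a minimal sketch of how the two cut-off levels relate to the '!' and '?' tags, the following Python function classifies a score against two thresholds. The function name `tag_match`, its parameters and the threshold values used in the test are illustrative assumptions; they are not taken from pfsearch or pfscan.

```python
from typing import Optional


def tag_match(score: float, reliable_cutoff: float, low_cutoff: float) -> Optional[str]:
    """Return '!' for a strong match (level 0), '?' for a weak match
    (level -1), or None when the score is below both cut-offs.

    Hypothetical helper: the real programs read the cut-offs from the profile.
    """
    if low_cutoff > reliable_cutoff:
        raise ValueError("the low-confidence cut-off must not exceed the reliable cut-off")
    if score >= reliable_cutoff:
        return "!"  # strong match, level 0
    if score >= low_cutoff:
        return "?"  # weak match in the twilight zone, level -1
    return None     # not reported
```

Scores between the two thresholds are exactly the twilight-zone matches that post-processing may later validate or discard.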
We have already started to introduce some contextual information for the detection of repeat units, where a weak match can be promoted in some particular cases (see user-manual) and we have now generalized this approach to other contexts. To do so, we have introduced a new line type, PP (for Post Processing), that defines the conditions to retrieve matches in post processing.
Four different types of post-processing are defined, as described below:
PP /COMPETES_HIT_WITH: PS50001; PS50002; ...;
Overlapping matches between a profile and the one(s) listed in its PP line are in competition. For each region of the protein matched by competing profiles only the match with the highest normalized score is kept.
PP /COMPETES_SEQ_WITH: PS50001; PS50002; ...;
For each sequence matched by the two profiles only the one that produces the highest normalized score is kept to annotate the protein.
PP /PROMOTED_BY: PS50001; PS50002; ...;
Weak matches (?) with the profile containing the PP line are promoted by the presence in the protein of a strong match (!) with the profile(s) defined in the PP line.
PP /DEMOTED_BY: PS50001; PS50002; ...;
Strong matches (!) with the profile containing the PP line are demoted by the presence in the protein of a match with the profile(s) defined in the PP line.
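The four post-processing types above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used by the PROSITE tools: the `Match` class, the `Rules` dictionary and the function names are hypothetical; promotion and demotion are applied before competition; competition is treated as symmetric between a profile and the profiles listed in its PP line; weak matches take part in competition; and ties are left unresolved (both matches are kept).

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Match:
    profile: str   # accession of the matching profile, e.g. "PS50001"
    start: int     # first matched residue (inclusive)
    end: int       # last matched residue (inclusive)
    score: float   # normalized score
    strong: bool   # True for '!' (level 0), False for '?' (level -1)


# Hypothetical in-memory form of PP lines: profile -> directive -> listed profiles.
Rules = Dict[str, Dict[str, List[str]]]


def overlaps(a: Match, b: Match) -> bool:
    return a.start <= b.end and b.start <= a.end


def competes(rules: Rules, a: str, b: str, kind: str) -> bool:
    # Assumption: a PP line in either profile puts both profiles in competition.
    return b in rules.get(a, {}).get(kind, []) or a in rules.get(b, {}).get(kind, [])


def post_process(matches: List[Match], rules: Rules) -> List[Match]:
    """Apply PP directives to all matches found in one protein."""
    strong_profiles = {m.profile for m in matches if m.strong}
    all_profiles = {m.profile for m in matches}

    # 1. Promotion and demotion, both evaluated on the original match levels.
    adjusted = []
    for m in matches:
        pp = rules.get(m.profile, {})
        if not m.strong and strong_profiles & set(pp.get("PROMOTED_BY", [])):
            m = replace(m, strong=True)
        elif m.strong and all_profiles & set(pp.get("DEMOTED_BY", [])):
            m = replace(m, strong=False)
        adjusted.append(m)

    # Best normalized score of each profile, used for whole-sequence competition.
    best: Dict[str, float] = {}
    for m in adjusted:
        best[m.profile] = max(best.get(m.profile, m.score), m.score)

    def loses(m: Match) -> bool:
        # 2a. COMPETES_SEQ_WITH: the profile with the lower best score is dropped.
        for profile, top in best.items():
            if (profile != m.profile and top > best[m.profile]
                    and competes(rules, m.profile, profile, "COMPETES_SEQ_WITH")):
                return True
        # 2b. COMPETES_HIT_WITH: an overlapping, higher-scoring competitor wins.
        return any(
            other.profile != m.profile and other.score > m.score
            and overlaps(m, other)
            and competes(rules, m.profile, other.profile, "COMPETES_HIT_WITH")
            for other in adjusted
        )

    return [m for m in adjusted if not loses(m)]
```

The pairwise comparison in step 2b is deliberately simple; a match is dropped as soon as any overlapping competitor scores higher, without checking whether that competitor itself survives.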
The PP line is located just after the last MA line as shown in the following example:
MA   /I: E1=0; IE=-105; DE=-105;
PP   /COMPETES_HIT_WITH: PS51192; PS51193; PS51194;
NR   /RELEASE=50.1,223100;
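A PP line like the one in the example above can be split into its directive and the list of accession numbers with a small parser. The sketch below is one possible approach; the function name `parse_pp_line` and the regular expression are assumptions based only on the syntax shown in these notes.

```python
import re
from typing import List, Tuple

# "PP", whitespace, "/DIRECTIVE:", then a semicolon-separated accession list.
PP_LINE = re.compile(r"^PP\s+/(?P<directive>[A-Z_]+):\s*(?P<targets>.*)$")


def parse_pp_line(line: str) -> Tuple[str, List[str]]:
    """Return (directive, [accession, ...]) for a single PP line."""
    m = PP_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a PP line: {line!r}")
    # Drop the empty item left by the trailing ';'.
    targets = [t.strip() for t in m.group("targets").split(";") if t.strip()]
    return m.group("directive"), targets
```

Lines with another line code, such as MA or NR, raise `ValueError`, so a caller can filter an entry line by line.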
PROSITE is now complemented with a set of rules (ProRule, more details) which are used to give extra meaningful information when a match with a PROSITE profile or pattern is detected. Each rule is triggered by a PROSITE entry and contains information linked to the domain or protein family covered by the profile/pattern. This information can be general, e.g. always associated with the domain or protein family, or conditional, depending on the presence of particular residues in functionally or structurally critical positions. The rule(s) associated with a profile/pattern is cross-referenced in the profile/pattern entry in a new line type (PR line).
Example:
PR PRU00001;
The PR line is located just before the DO line as shown in the following example:
3D   1V87; 1WEO; 1WIM; 1X4J; 1Z6U; 2CSY; 2CT2;
PR   PRU00175;
DO   PDOC00449;
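To follow the cross-references from an entry to its rule and documentation, the lines can be grouped by their two-letter line code. The following sketch assumes each line starts with the code followed by semicolon-separated values, as in the 3D, PR and DO lines above; the function name `group_by_line_code` is hypothetical, and the simple splitting is not suitable for lines with a richer syntax such as MA lines.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_line_code(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect the semicolon-separated values of each line under its line code."""
    fields: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # The line code is everything before the first space.
        code, _, rest = line.strip().partition(" ")
        fields[code].extend(v.strip() for v in rest.split(";") if v.strip())
    return dict(fields)
```

With the three example lines as input, the rule accession is found under the key `"PR"` and the documentation accession under `"DO"`.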
Some information given by the rules is accessible to PROSITE users through our ScanProsite web page. The prorule.dat file containing all the rules is available on our ftp site under the PROSITE copyright conditions.
With the introduction of ProRule this line type has become obsolete and has thus been deleted.
The prosite.dat file included three types of entries: patterns, profiles and rules. To avoid confusion with ProRule, all rules have been deleted from prosite.dat.
The rules describing the prokaryotic membrane lipoprotein lipid attachment site and the nuclear localization signal have been replaced by profiles.
The PROSITE documentation reference blocks have been completed with the PubMed identifier, the digital object identifier (DOI) and the title of the article.
The new format is:
[ 1] Marshall R.D.
     Glycoproteins.
     Annu. Rev. Biochem. 41:673-702(1972).
     PubMed=4563441; DOI=10.1146/annurev.bi.41.070172.003325
The PubMed/DOI line is not restricted to 78 characters like other documentation lines.
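The identifiers on a PubMed/DOI line can be extracted with two regular expressions. This is a sketch assuming the `PubMed=...; DOI=...` layout shown above, with the DOI as the last item on the line; the function name `parse_xref_line` is hypothetical. The DOI is read up to the next whitespace rather than split on ';', because a DOI may itself contain a semicolon.

```python
import re
from typing import Dict

PUBMED = re.compile(r"PubMed=(\d+)")
DOI = re.compile(r"DOI=(\S+)")  # assumption: the DOI runs to the next whitespace


def parse_xref_line(line: str) -> Dict[str, str]:
    """Return the PubMed identifier and DOI found on the line, if present."""
    xrefs: Dict[str, str] = {}
    pubmed = PUBMED.search(line)
    if pubmed:
        xrefs["PubMed"] = pubmed.group(1)
    doi = DOI.search(line)
    if doi:
        xrefs["DOI"] = doi.group(1)
    return xrefs
```

Because the line is not limited to 78 characters, the whole line can be parsed in one pass without joining continuation lines.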
The first column of the prosite.lis file, which indicated whether a documentation entry was new ('+') or had been modified ('*') since the last major release, has been deleted. Each line now starts with the accession number of the documentation entry.
(3) Forthcoming changes |
---|
From the 28th of November 2006 onward the structure of the PROSITE ftp site will be modified as follows:
+-prosite--+   updated version of the PROSITE release (updated roughly
           |   every 2 weeks); this is the same version as the one
           |   that can be queried through http://www.expasy.org/prosite/.
           |
           |
           |--old_releases   gzipped 'tar' (archive) files of previous releases of PROSITE.
           |                 Each release is stored in a file with the name: prositeXX_X.tar.gz
           |                 where "XX_X" is the release number. Each release file contains
           |                 prosite.doc, prosite.dat and since release 20.0 prorule.dat.
           |
           |
           +--tools--+
                     +--ps_scan   binaries and source code of the reference PROSITE scanning tool
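The archive naming scheme for old releases can be illustrated with a small helper. The function name `archive_name` is hypothetical, and the sketch assumes that the dot in the release number is replaced by an underscore and that no zero-padding is applied; the notes do not say how single-digit release numbers are written.

```python
def archive_name(release: str) -> str:
    """Build the old_releases file name for a release number such as '19.0'."""
    major, sep, minor = release.partition(".")
    if not (sep and major.isdigit() and minor.isdigit()):
        raise ValueError(f"expected a release number like '19.0', got {release!r}")
    # Assumption: "XX_X" is the release number with '.' replaced by '_'.
    return f"prosite{major}_{minor}.tar.gz"
```

Under these assumptions, release 19.0 would be stored as prosite19_0.tar.gz in the old_releases directory.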
(4) Status of the PROSITE files |
---|
profile.txt | Description of the profile syntax |
psrelnot.htm | Release notes for the current release |
prosuser.htm | PROSITE user manual |
unirule.pdf | The UniRule complete manual describing the format of rules |
prosite.dat | Patterns and profiles databases (updated weekly) |
prorule.dat | Rules database (updated weekly) |
prosite.doc | Documentation database for each pattern and profile (updated weekly) |
prosite_alignments_tar.gz | Multiple sequence alignment files (updated weekly) |
prosite.lis | List of documentation entries (updated weekly) |
pautindex.txt | Authors index (updated weekly) |
psdelac.txt | Deleted accession number index (updated weekly) |
experts.txt | List of on-line experts for PROSITE and Swiss-Prot (updated weekly) |
jourlist.txt | List of cited journals in PROSITE (updated weekly) |
prosite_license.htm | PROSITE license conditions |
(5) FTP access to PROSITE |
---|
Organization | SIB Swiss Institute of Bioinformatics |
Address | ftp.expasy.org |
Directory | /databases/prosite/ |
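The address and directory above combine with the file names listed in section (4) into a full FTP URL. The sketch below only builds the URL string; the function name `ftp_url` is hypothetical, and whether the server is reachable is not checked.

```python
from posixpath import join

HOST = "ftp.expasy.org"
DIRECTORY = "/databases/prosite/"


def ftp_url(filename: str) -> str:
    """Return the FTP URL of a file in the PROSITE directory."""
    return f"ftp://{HOST}{join(DIRECTORY, filename)}"
```

The resulting string can be handed to any FTP-capable client, for example `urllib.request.urlretrieve` from the Python standard library.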
(6) References |
---|
If you want to refer to the PROSITE database please cite:
If you want to refer to the PROSITE methodology please cite:
If you want to refer to the scanprosite web page please cite:
If you want to refer to the ProRule database please cite:
If you want to refer to the stand-alone tool to scan PROSITE please cite:
(7) Acknowledgments |
---|
This release of PROSITE has been prepared by: