PROSITE Release notes for Release 20, October 2006

The PROSITE database of protein domains, families and functional sites
Release Notes

Release 20.0, November 2006

Table of contents

1   Introduction
2   Description of the changes made to PROSITE since release 19.0
3   Forthcoming changes
4   Status of the PROSITE files
5   FTP access to PROSITE
6   References
7   Acknowledgments

(1) Introduction

PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them.

PROSITE is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of profiles and patterns by providing additional information about functionally and/or structurally critical amino acids.

This release of PROSITE contains 1449 documentation entries that describe 1331 patterns and 671 profiles, as well as 737 rules.

Since release 19.0, 463 entries have been updated, and 105 documentation entries and 163 signatures have been added.

The following table shows the growth of the database since its creation in 1989.

Rel.	Date	Doc	Entries	Note
1.0	03/89	58	60	Only released in PC/Gene (Version 5.16)
2.0	03/89	129	132	Only released in PC/Gene (Version 6.00)
3.0	05/89	?	160
4.0	10/89	?	202	Printed release (EMBL Biocomputing document)
5.0	04/90	296	338
6.0	11/90	375	433
7.0	05/91	441	508
8.0	11/91	530	605
9.0	06/92	580	689
10.0	12/92	635	803
11.0	10/93	715	927
12.0	06/94	785	1029	First release to include profiles
13.0	11/95	889	1167
14.0	12/97	997	1335
15.0	06/98	1014	1352
16.0	07/99	1034	1374
17.0	12/01	1108	1501
18.0	07/03	1200	1639
19.0	04/05	1344	1841
20.0	11/06	1449	2004	Introduction of the ProRule section

(2) Description of the changes made to PROSITE since release 19.0

: 2.1 New version of ps_scan.pl, the PROSITE scanning tool

For more details on the new implementation, see:

ftp://www.expasy.org/databases/prosite/tools/

: 2.2 Introduction of a new line type for the post-processing retrieval of data

PROSITE profiles normally use two cut-off levels: a reliable cut-off (LEVEL=0) and a low-confidence cut-off (LEVEL=-1). The low-level cut-off usually covers the twilight zone, where a few true positives that cannot be separated from false positives might be present. The output of the pfsearch and pfscan programs indicates strong matches (level 0) with '!' and weak matches (level -1) with '?'. This tagging in the match list can be used in post-processing to validate some true positives present in the twilight zone or to eliminate some false positives detected with a significant score.

We had already started to introduce some contextual information for the detection of repeat units, where a weak match can be promoted in some particular cases (see the user manual), and we have now generalized this approach to other contexts. To do so, we have introduced a new line type, PP (for Post-Processing), that defines the conditions under which matches are retrieved in post-processing.

Four different types of post-processing are defined, as follows:

PP   /COMPETES_HIT_WITH: PS50001; PS50002; ...;

Overlapping matches between a profile and the one(s) listed in its PP line are in competition. For each region of the protein matched by competing profiles only the match with the highest normalized score is kept.

PP   /COMPETES_SEQ_WITH: PS50001; PS50002; ...;

For each sequence matched by the two profiles only the one that produces the highest normalized score is kept to annotate the protein.

PP   /PROMOTED_BY: PS50001; PS50002; ...;

Weak matches (?) with the profile containing the PP line are promoted by the presence in the protein of a strong match (!) with the profile(s) defined in the PP line.

PP   /DEMOTED_BY: PS50001; PS50002; ...;

Strong matches (!) with the profile containing the PP line are demoted by the presence in the protein of a match with the profile(s) defined in the PP line.
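
The four PP qualifiers described above can be illustrated with a minimal Python sketch. This is not the ps_scan implementation: the Match record, the post_process function and the dictionary layout of the PP data are hypothetical names introduced for illustration only. The sketch also assumes that "demoted" means a strong match becomes a weak one, that promotion and demotion are decided on the original match levels, and that a competition declared by either profile applies to both.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Match:
    profile: str   # PROSITE accession, e.g. "PS50001"
    start: int     # first matched position (inclusive)
    end: int       # last matched position (inclusive)
    score: float   # normalized score
    level: int     # 0 = strong match ('!'), -1 = weak match ('?')


def overlaps(a, b):
    """True if the two matches share at least one sequence position."""
    return a.start <= b.end and b.start <= a.end


def post_process(matches, pp):
    """Apply PP conditions to all matches found in ONE protein.

    pp maps a profile accession to {qualifier: [accessions]}, mirroring
    the PP lines of that profile (hypothetical in-memory layout).
    """
    strong = {m.profile for m in matches if m.level == 0}
    present = {m.profile for m in matches}

    # PROMOTED_BY / DEMOTED_BY, evaluated on the original levels.
    adjusted = []
    for m in matches:
        rules = pp.get(m.profile, {})
        if m.level == -1 and strong & set(rules.get("PROMOTED_BY", [])):
            m = replace(m, level=0)
        elif m.level == 0 and present & set(rules.get("DEMOTED_BY", [])):
            m = replace(m, level=-1)  # assumption: demotion yields a weak match
        adjusted.append(m)

    def competes(a, b, key):
        # Assumption: the competition is symmetric.
        return (b.profile in pp.get(a.profile, {}).get(key, [])
                or a.profile in pp.get(b.profile, {}).get(key, []))

    # COMPETES_SEQ_WITH: whole-sequence competition.
    # COMPETES_HIT_WITH: competition only between overlapping matches.
    kept = []
    for m in adjusted:
        beaten = any(
            o.profile != m.profile
            and o.score > m.score
            and (competes(m, o, "COMPETES_SEQ_WITH")
                 or (competes(m, o, "COMPETES_HIT_WITH") and overlaps(m, o)))
            for o in adjusted
        )
        if not beaten:
            kept.append(m)
    return kept
```

With this sketch, a weak PS50003 match whose PP data lists PS50001 under PROMOTED_BY is reported as strong when the protein also carries a strong PS50001 match, and an overlapping lower-scoring match of a competing profile is dropped. Matches with equal scores are both kept, which is a choice of this sketch rather than documented behaviour.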

The PP line is located just after the last MA line as shown in the following example:

MA   /I: E1=0; IE=-105; DE=-105;
PP   /COMPETES_HIT_WITH: PS51192; PS51193; PS51194;
NR   /RELEASE=50.1,223100;
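
As an illustration of the line layout shown above, the following Python sketch parses a single PP line into its qualifier and the list of referenced accessions. The function name parse_pp_line and the regular expression are assumptions made for this example; they are not part of any PROSITE tool.

```python
import re

# "PP", some whitespace, "/QUALIFIER:", then a semicolon-separated list.
PP_LINE = re.compile(r"^PP\s+/(?P<qualifier>[A-Z_]+):\s*(?P<targets>.*)$")


def parse_pp_line(line):
    """Return (qualifier, [accessions]) for a PP line, or None otherwise."""
    m = PP_LINE.match(line.rstrip())
    if m is None:
        return None
    targets = [t.strip() for t in m.group("targets").split(";") if t.strip()]
    return m.group("qualifier"), targets
```

Lines of other types, such as the MA and NR lines in the example, yield None and can be skipped by the caller.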

: 2.3 Introduction of the ProRule section

PROSITE is now complemented with a set of rules (ProRule), which are used to give additional meaningful information when a match with a PROSITE profile or pattern is detected. Each rule is triggered by a PROSITE entry and contains information linked to the domain or protein family covered by the profile/pattern. This information can be general, i.e. always associated with the domain or protein family, or conditional, i.e. depending on the presence of particular residues in functionally or structurally critical positions. The rule(s) associated with a profile/pattern are cross-referenced in the profile/pattern entry in a new line type (PR line).

Example:

PR   PRU00001;

The PR line is located just before the DO line as shown in the following example:

3D   1V87; 1WEO; 1WIM; 1X4J; 1Z6U; 2CSY; 2CT2;
PR   PRU00175;
DO   PDOC00449;
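
The cross-reference shown above can be read back from an entry with a few lines of Python. This is a minimal sketch: the function name rules_for_entry is hypothetical, and the handling of several accessions on one PR line is an assumption, since the examples above show only a single accession per line.

```python
def rules_for_entry(entry_lines):
    """Collect the ProRule accessions referenced by the PR lines of one entry."""
    accessions = []
    for line in entry_lines:
        if line.startswith("PR   "):
            # Assumption: accessions are separated and terminated by ';'.
            accessions += [a.strip() for a in line[5:].split(";") if a.strip()]
    return accessions
```

Applied to the three example lines above, the function returns a list containing only PRU00175; the 3D and DO lines are ignored.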

Some information given by the rules is accessible to PROSITE users through our ScanProsite web page. The prorule.dat file containing all the rules is available on our ftp site under the PROSITE copyright conditions.

: 2.4 Deletion of the CC FT_KEY line type

With the introduction of ProRule this line type has become obsolete and has thus been deleted.

: 2.5 Deletion of all rules from the prosite.dat file

The prosite.dat file included three types of entries: patterns, profiles and rules. To avoid confusion with ProRule, all rules have been deleted from prosite.dat.

The rules describing the prokaryotic membrane lipoprotein lipid attachment site and the nuclear localization signal have been replaced by profiles.

: 2.6 Change in the format of references

The PROSITE documentation reference blocks have been completed with the PubMed identifier, the digital object identifier (DOI) and the title of the article.

The new format is:

[ 1] Marshall R.D.
     Glycoproteins.
     Annu. Rev. Biochem. 41:673-702(1972).
     PubMed=4563441; DOI=10.1146/annurev.bi.41.070172.003325

The PubMed/DOI line is not restricted to 78 characters like other documentation lines.
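
A program reading the new reference format might extract the two identifiers as in the Python sketch below. The function name parse_xref_line is hypothetical. Because a DOI may itself contain characters such as ';', the sketch searches for the two keys instead of splitting the line on semicolons; it assumes that a DOI contains no whitespace and does not end with ';'.

```python
import re


def parse_xref_line(line):
    """Extract the PubMed identifier and DOI from a 'PubMed=...; DOI=...' line."""
    text = line.strip()
    pubmed = re.search(r"PubMed=(\d+)", text)
    doi = re.search(r"DOI=(\S+)", text)
    return {
        "PubMed": pubmed.group(1) if pubmed else None,
        # Drop a trailing separator, if any (assumption about the layout).
        "DOI": doi.group(1).rstrip(";") if doi else None,
    }
```

Either key is reported as None when it is absent from the line, so older references without a DOI can be handled by the same code.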

: 2.7 Change in the format of the prosite.lis file

The first column of the prosite.lis file, which was used to indicate whether a documentation entry was new ('+') or had been modified ('*') since the last major release, has been deleted. Each line now starts with the accession number of the documentation entry.

(3) Forthcoming changes

From the 28th of November 2006 onward the structure of the PROSITE ftp site will be modified as follows:

+-prosite--+                     updated version of the PROSITE release (updated roughly
           |                     every 2 weeks); this is the same version as the one
           |                     that can be queried through http://www.expasy.org/prosite/.
           |
           |--old_releases       gzipped 'tar' (archive) files of previous releases of PROSITE.
           |                     Each release is stored in a file with the name: prositeXX_X.tar.gz
           |                     where "XX_X" is the release number. Each release file contains
           |                     prosite.doc, prosite.dat and, since release 20.0, prorule.dat.
           |
           +--tools--+
                     +--ps_scan  binaries and source code of the reference PROSITE scanning tool

(4) Status of the PROSITE files

PROSITE is distributed with different data and documentation files. The following table lists the files that are currently available.

profile.txt	Description of the profile syntax
psrelnot.htm	Release notes for the current release
prosuser.htm	PROSITE user manual
unirule.pdf	The UniRule complete manual describing the format of rules
prosite.dat	Patterns and profiles databases (updated weekly)
prorule.dat	Rules database (updated weekly)
prosite.doc	Documentation database for each pattern and profile (updated weekly)
prosite_alignments_tar.gz	Multiple sequence alignment files (updated weekly)
prosite.lis	List of documentation entries (updated weekly)
pautindex.txt	Authors index (updated weekly)
psdelac.txt	Deleted accession number index (updated weekly)
experts.txt	List of on-line experts for PROSITE and Swiss-Prot (updated weekly)
jourlist.txt	List of cited journals in PROSITE (updated weekly)
prosite_license.htm	PROSITE license conditions

(5) FTP access to PROSITE

PROSITE is available for download on the following anonymous FTP server:

Organization	SIB Swiss Institute of Bioinformatics
Address	ftp.expasy.org
Directory	/databases/prosite/

(6) References

If you want to refer to the PROSITE database please cite:

Hulo N., Bairoch A., Bulliard V., Cerutti L., De Castro E., Langendijk-Genevaux P.S., Pagni M., Sigrist C.J.A.

The PROSITE database.

Nucleic Acids Res. 34:D227-D230(2006).

If you want to refer to the PROSITE methodology please cite:

Sigrist C.J.A., Cerutti L., Hulo N., Gattiker A., Falquet L., Pagni M., Bairoch A., Bucher P.

PROSITE: a documented database using patterns and profiles as motif descriptors.

Brief Bioinform. 3:265-274(2002).

PubMed: 12230035

If you want to refer to the ScanProsite web page please cite:

de Castro E., Sigrist C.J.A., Gattiker A., Bulliard V., Langendijk-Genevaux P.S., Gasteiger E., Bairoch A., Hulo N.

ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins.

Nucleic Acids Res. 34:W362-W365(2006).

If you want to refer to the ProRule database please cite:

Sigrist C.J.A., De Castro E., Langendijk-Genevaux P.S., Le Saux V., Bairoch A., Hulo N.

ProRule: a new database containing functional and structural information on PROSITE profiles.

Bioinformatics 21:4060-4066(2005).

If you want to refer to the stand-alone tool to scan PROSITE please cite:

Gattiker A., Gasteiger E., Bairoch A.

ScanProsite: a reference implementation of a PROSITE scanning tool.

Applied Bioinformatics 1:107-108(2002).

(7) Acknowledgments

This release of PROSITE has been prepared by:

Nicolas Hulo, Christian J.A. Sigrist, Virginie Bulliard, Petra Langendijk-Genevaux, Edouard de Castro, Lorenzo Cerutti, Corinne Lachaize and Amos Bairoch

Swiss-Prot group, SIB Swiss Institute of Bioinformatics.