Hofmann Laboratory

SBAL (Structure Based Amino Acid Sequence Alignment) is intended for multiple protein sequence alignments guided by secondary structure elements. The program provides automatic and semi-automatic alignment features, and also possesses manual editing capabilities.
Using sequences provided as individual PSIPRED output files, FASTA files, DSSP output files or PDB files, SBAL calculates a multiple sequence alignment with a position-specific global alignment algorithm that accounts for both structural and sequence homology.

Amino acid sequences can be imported from the following data formats:

DSSP
FASTA
PDB
PSIPRED

Add sequence
Delete sequence
Remove duplicates
Add alignment
Manual editing
Manual assignment of secondary structure
Automatic alignment of sequences with minimal user input
Position-specific alignment of sequences against user-selected templates
Threading of a sequence onto a single sequence within an alignment

Alignment score
Pairwise sequence identity
Distance matrix (AA identity, p-distance)
Alignment profile
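
To illustrate the pairwise sequence identity and p-distance measures listed above, here is a minimal sketch in Python. It is not SBAL's implementation. The function names are hypothetical, and the treatment of gaps (columns with a gap in either sequence are skipped) is an assumption, since SBAL's exact definition is not stated here.

```python
def pairwise_identity(a, b, gap="-"):
    """Fraction of identical residues over columns where neither sequence has a gap.

    Assumption: gapped columns are ignored; SBAL may count them differently.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if x == y:
            identical += 1
    # Two sequences with no comparable columns are reported as 0.0 identity.
    return identical / compared if compared else 0.0


def p_distance(a, b, gap="-"):
    """Proportion of differing residues, i.e. 1 minus the pairwise identity."""
    return 1.0 - pairwise_identity(a, b, gap)


def distance_matrix(sequences, gap="-"):
    """Symmetric matrix of p-distances for a list of aligned sequences."""
    return [[p_distance(a, b, gap) for b in sequences] for a in sequences]
```

For example, `pairwise_identity("ACDE", "ACDF")` compares four columns and finds three identical residues, so the identity is 0.75 and the p-distance is 0.25.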

SBAL alignments can be saved in four different formats:

SBAL
HTML
PDF
FASTA

Examples illustrating common applications for SBAL:

I have output files with secondary structure prediction results from PSIPRED
I have many AA sequences in individual files in FASTA format
I have an existing AA sequence alignment
I have an existing non-SBAL AA sequence alignment and PSIPRED secondary structure prediction
Introducing a manual domain annotation

I have output files with secondary structure prediction results from PSIPRED

1.	You can use PSIPRED output files in horizontal (file extension horiz) or vertical format (file extension ss2).
2.	Put all individual PSIPRED files to be analysed into one directory.
3.	PSIPRED files do not include titles for individual sequences. If you wish to have an individual title for each sequence, you need to provide an individual FASTA file (with the same root name) for each sequence. For example, to combine the secondary structure from the PSIPRED files 1.ss2, 2.ss2, 3.ss2 with a title for each sequence, you need to provide the files 1.fa, 2.fa, 3.fa in FASTA format. The title of each sequence will be read from the first line of its FASTA file.
4.	Start SBAL, and go to Tools - Auto-Alignment. Select the directory with the PSIPRED files and choose PSIPRED files (.ss2, .horiz) as source. If you want the sequences to be automatically aligned, check automatically align sequences. Then click Start.
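
Before starting the auto-alignment, it can help to check which PSIPRED files in the directory lack a FASTA file with the same root name. The following is a minimal sketch, independent of SBAL; the function name `missing_fasta_titles` is hypothetical.

```python
from pathlib import Path

# PSIPRED output extensions named in the steps above.
PSIPRED_SUFFIXES = {".ss2", ".horiz"}


def missing_fasta_titles(directory):
    """Return root names of PSIPRED files that have no matching <root>.fa file."""
    folder = Path(directory)
    roots = sorted(
        {p.stem for p in folder.iterdir()
         if p.is_file() and p.suffix in PSIPRED_SUFFIXES}
    )
    return [root for root in roots if not (folder / f"{root}.fa").is_file()]
```

A directory holding 1.ss2, 2.ss2, 1.fa would yield `["2"]`, meaning sequence 2 has no FASTA file to supply its title.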

I have many AA sequences in individual files in FASTA format

1.	All files must have a title in the first line, beginning with ">". Sequences start in the second line. The file extension needs to be "fa" (e.g. 1dk5.fa, 1abl.fa, etc).
2.	Since the FASTA format has no secondary structure information, SBAL will predict secondary structure using a single sequence prediction algorithm.
3.	Put all individual FASTA files to be analysed into one directory.
4.	Start SBAL, and go to Tools - Auto-Alignment. Select the directory with the FASTA files and choose FASTA files (*.fa) as source. If you want the sequences to be automatically aligned, check automatically align sequences. Then click Start.
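
The file requirements in step 1 can be checked with a short script before the files are handed to SBAL. This is a minimal sketch under the rules stated above (extension "fa", title line beginning with ">", sequence from the second line); the function name `fasta_problems` is hypothetical and the checks are not SBAL's own validation.

```python
from pathlib import Path


def fasta_problems(path):
    """Return a list of human-readable problems found in one FASTA file."""
    path = Path(path)
    problems = []
    if path.suffix != ".fa":
        problems.append("file extension is not .fa")
    lines = path.read_text().splitlines()
    if not lines or not lines[0].startswith(">"):
        problems.append("first line is not a title starting with '>'")
    if len(lines) < 2 or not lines[1].strip():
        problems.append("no sequence on the second line")
    return problems
```

An empty list means the file meets the three requirements; otherwise each entry names one requirement that is not met.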

I have an existing AA sequence alignment

Existing SBAL sequence alignments can be read in through File - Open Alignment; select your input file and choose SBAL format. Alignments in SBAL format contain amino acid sequence and secondary structure information.
Existing alignments can also be imported from the following formats: FASTA, MSF and Clustal, using File - Open Alignment. Since these formats do not contain secondary structure information, SBAL will perform single sequence secondary structure prediction for each sequence.

I have an existing non-SBAL AA sequence alignment and PSIPRED secondary structure prediction

Prerequisite 1	An existing alignment in one of the formats FASTA, MSF or Clustal.
Prerequisite 2	PSIPRED secondary structure prediction files in .ss2 or .horiz format; named for example 01.ss2, 02.ss2, 03.ss2, etc.
Prerequisite 3	FASTA files with the same root name as the PSIPRED files (i.e. 01.fa, 02.fa, 03.fa, etc).
Prerequisite 4	All files above need to be in the same directory.
Open the alignment using File - Open Alignment. SBAL will search all available FASTA files to find a match of the sequence identifier between the alignment and the individual FASTA files. The secondary structure information will then be read from the corresponding PSIPRED file. If no match can be found, SBAL will predict secondary structure using the built-in single-sequence prediction algorithm.
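
The lookup described above can be sketched in Python to preview which sequences would fall back to the built-in prediction. This is an illustration only: the function names are hypothetical, and the matching rule (the alignment identifier must equal the full FASTA title line without the leading ">") is an assumption, since SBAL's exact matching criteria are not documented here.

```python
from pathlib import Path


def fasta_title(path):
    """Return the title line of a FASTA file without the leading '>'."""
    with open(path) as handle:
        first = handle.readline().strip()
    return first[1:].strip() if first.startswith(">") else ""


def match_alignment_ids(alignment_ids, directory):
    """Map each alignment identifier to a PSIPRED file name, or None if unmatched.

    Assumption: identifiers are matched by exact equality with the FASTA title.
    """
    folder = Path(directory)
    title_to_root = {}
    for fa in sorted(folder.glob("*.fa")):
        title = fasta_title(fa)
        if title:
            title_to_root.setdefault(title, fa.stem)
    matches = {}
    for seq_id in alignment_ids:
        root = title_to_root.get(seq_id)
        psipred = None
        if root is not None:
            # Look for a PSIPRED file sharing the FASTA file's root name.
            for ext in (".ss2", ".horiz"):
                candidate = folder / f"{root}{ext}"
                if candidate.is_file():
                    psipred = candidate.name
                    break
        matches[seq_id] = psipred
    return matches
```

Identifiers mapped to `None` are the ones for which, under this assumed rule, no PSIPRED file would be found.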

Introducing a manual domain annotation

1.	Place the line cursor in the sequence above which the annotation line should appear.
2.	Use the menu item Edit - Add annotation to generate the annotation line.
3.	Enter the annotation text in the newly generated line and use the pipe character | to denote domain boundaries. Place the line cursor in the annotation line.
4.	Then use the menu item Edit - Convert to domain annotation.
5.	A popup window with the recognised domains appears.
6.	In the popup window, the background colour (first colour box) of the individual domains, as well as the text colour (second colour box) of the domain description can be selected by clicking on the colour boxes. The annotation text cannot be changed in the popup window.
7.	After confirming the choices, the individual domains are shown in the chosen colours. To change any settings, simply repeat from step 1.
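
To make the pipe convention from step 3 concrete, the sketch below splits an annotation line into domains with their column ranges. It is not SBAL's parser: the function name `split_domains` is hypothetical, and treating the text between two pipes as one domain label (with zero-based, inclusive column positions) is an assumption.

```python
def split_domains(annotation, boundary="|"):
    """Split an annotation line into (first_column, last_column, label) tuples.

    Columns are zero-based and inclusive; segments without text are skipped.
    """
    domains = []
    start = 0
    # Appending one boundary character closes the final segment.
    for i, ch in enumerate(annotation + boundary):
        if ch == boundary:
            label = annotation[start:i].strip()
            if label:
                domains.append((start, i - 1, label))
            start = i + 1
    return domains
```

For an annotation line such as `EF-hand  |linker|core`, this returns three segments labelled "EF-hand", "linker" and "core".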

When using this program, please cite:

Wang, C.K., Broder, U., Weeratunga, S.K., Gasser, R.B., Loukas, A., Hofmann, A. (2012) SBAL: a practical tool to generate and edit structure-based amino acid sequence alignments. Bioinformatics 28, 1026-1027.
