Once upon a time, in the early 1990s, I was in graduate school studying molecular biology. I developed a thesis project, which I never pursued and which, to my knowledge, no one has ever pursued.
The project would have been one of the first forays into bioinformatics. The idea was to use an artificial neural net (ANN) to predict protein function from primary structure. Basically, the inputs would be the amino acid sequences of known proteins, together with the properties of each of those amino acids, including:
- Polar / hydrophilic
- Non-polar / hydrophobic
- Sulfur containing
- Negatively charged at neutral pH / acidic
- Positively charged at neutral pH / basic
- Aromatic (and potentially stacking)
- Forms covalent cross-link (disulfide bond)
- C-Beta branching
- pK values
- pI values
- Ka values
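To make the input idea concrete, here is a minimal sketch of how each residue could be turned into a vector of yes/no properties. The property groupings below are one common textbook classification and are my assumption for illustration; sources differ on borderline residues such as tyrosine, cysteine, glycine and proline. The names `PROPERTY_SETS`, `encode_residue` and `encode_sequence` are hypothetical. The numeric properties (pK, pI, Ka) are left out here; they would be appended as extra real-valued columns taken from a reference table.

```python
# Illustrative property groupings (one-letter amino acid codes).
# These sets are an assumption for the sketch; textbooks differ on edge cases.
PROPERTY_SETS = {
    "polar": set("STNQYCDEKRH"),
    "nonpolar": set("AVLIMFWPG"),
    "sulfur": set("CM"),
    "acidic": set("DE"),          # negatively charged at neutral pH
    "basic": set("KR"),           # histidine omitted: mostly uncharged near pH 7
    "aromatic": set("FWY"),
    "disulfide": set("C"),        # can form a covalent cross-link
    "beta_branched": set("VIT"),
}
FEATURES = tuple(PROPERTY_SETS)   # fixed column order for the vectors


def encode_residue(aa):
    """Return a list of 0.0/1.0 flags, one per property in FEATURES."""
    aa = aa.upper()
    return [1.0 if aa in PROPERTY_SETS[name] else 0.0 for name in FEATURES]


def encode_sequence(seq):
    """Encode a primary structure as one feature vector per residue."""
    return [encode_residue(aa) for aa in seq]


if __name__ == "__main__":
    for residue, vector in zip("CDK", encode_sequence("CDK")):
        print(residue, dict(zip(FEATURES, vector)))
```

An unknown letter simply encodes as all zeros, which is a design shortcut rather than a recommendation.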
The outputs of the ANN would of course be the protein function, though it is possible that some structural predictions, of alpha helices or beta sheets for example, could also be outputs of such a system.
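As a rough picture of what "outputs" could mean, here is a sketch of a tiny one-hidden-layer network that maps a feature vector to a probability for each function class. Everything in it is an assumption for illustration: the class labels in `FUNCTION_CLASSES` are made up, the weights are random and untrained, biases are omitted, and the input is a single fixed-length vector rather than a whole variable-length sequence.

```python
import math
import random

# Hypothetical function labels, purely for illustration.
FUNCTION_CLASSES = ["enzyme", "transporter", "structural"]


def init_layer(n_in, n_out, rng):
    """Random weight matrix with n_out rows and n_in columns (no biases)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def forward(x, w_hidden, w_out):
    """One hidden tanh layer followed by a softmax over the output classes."""
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in w_hidden]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    peak = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(0)
    w_hidden = init_layer(8, 4, rng)        # 8 input features, 4 hidden units
    w_out = init_layer(4, len(FUNCTION_CLASSES), rng)
    features = [1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
    for label, p in zip(FUNCTION_CLASSES, forward(features, w_hidden, w_out)):
        print(f"{label}: {p:.3f}")
```

Structural predictions could be handled the same way by adding output classes such as helix or sheet, with the weights learned from known proteins rather than drawn at random.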
As for discovering the best ANN architecture, perhaps genetic algorithms could be used. The optimal architecture cannot be known in advance, so some sort of evolution and selection process would likely be the most efficient way to find it.
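The evolution-and-selection idea could be sketched roughly as follows. Here a "genome" is just a list of hidden-layer sizes, and the search uses truncation selection, one-point crossover and small mutations. All names and parameters (`evolve`, `mutate`, `crossover`, the size limits) are hypothetical, and `toy_fitness` is only a stand-in: in a real run the fitness would be something like the validation accuracy of a network trained with that architecture.

```python
import random


def random_genome(rng, max_layers=3, max_units=64):
    """A genome is a list of hidden-layer sizes, e.g. [32, 8]."""
    return [rng.randint(1, max_units) for _ in range(rng.randint(1, max_layers))]


def mutate(genome, rng, max_layers=3, max_units=64):
    """Nudge one layer size, and occasionally add or drop a layer."""
    child = list(genome)
    i = rng.randrange(len(child))
    child[i] = min(max_units, max(1, child[i] + rng.randint(-8, 8)))
    if len(child) < max_layers and rng.random() < 0.2:
        child.append(rng.randint(1, max_units))
    elif len(child) > 1 and rng.random() < 0.2:
        child.pop()
    return child


def crossover(a, b, rng, max_layers=3):
    """Join a prefix of one parent to a suffix of the other."""
    child = a[: rng.randint(0, len(a))] + b[rng.randint(0, len(b)):]
    return child[:max_layers] or [a[0]]


def evolve(fitness, generations=30, pop_size=20, seed=0):
    """Keep the fitter half each generation and refill with mutated offspring."""
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


def toy_fitness(genome):
    """Placeholder score, NOT a real measure of network quality."""
    return -abs(sum(genome) - 40) - 5 * len(genome)


if __name__ == "__main__":
    print("best architecture found:", evolve(toy_fitness))
```

Because the fitter half survives unchanged into the next generation, the best score never gets worse from one generation to the next; the cost is that every fitness evaluation in a real setting means training a network.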
Of course, there may be programs other than ANNs that could do this better or more efficiently. I suggest ANNs because they are able to generalize from examples and therefore make pattern predictions, which is what a program like this, with the outputs desired, requires.
If anyone thinks this worth pursuing, I encourage you to do so. I just ask for a courtesy 10% of any profits. :-)