BpForms is a toolkit for unambiguously describing the molecular structure (atoms and bonds) of DNA, RNA, and proteins, including non-canonical monomeric forms (subunits which compose polymers), crosslinks, nicks, and circular topologies. By concretely describing the molecular structure of biopolymers, BpForms aims to help epigenomics, transcriptomics, proteomics, systems biology, and synthetic biology researchers share and integrate information about DNA modification, post-transcriptional modification, post-translational modification, expanded genetic codes, and synthetic parts. In particular, BpForms was developed to help researchers collaboratively develop whole-cell computational models. See the use cases below for more information.
BpForms includes a grammar for describing biopolymer forms and three consensus alphabets of non-canonical monomeric forms of DNA nucleotide monophosphates, RNA nucleotide monophosphates, and protein amino acids. BpForms also includes four software tools for verifying descriptions of biopolymer forms and calculating properties such as their molecular structure, formula, molecular weight, and charge: this website, a JSON REST API, a command line interface, and a Python API. BpForms is available open-source under the MIT license.
BpForms can be combined with BcForms to concretely describe the primary structure of complexes.
BpForms has the following features:
The BpForms grammar extends the IUPAC/IUBMB notation commonly used to represent unmodified DNA, RNA, and proteins to describe non-canonical forms of DNA, RNA, and proteins:
BpForms descriptions of biopolymers consist of three parts separated by pipes ("|") (e.g., A{pSer}Y | circular | x-link: [...]):
The monomeric forms in the sequence of monomeric forms can be described in three ways:
The BpForms grammar is defined in Lark syntax, which is based on EBNF syntax.
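The pipe-separated layout described above can be illustrated with a small standard-library sketch. It is not the BpForms parser (which is generated from the Lark grammar); the function name `split_description` is hypothetical, and the sketch assumes that pipes inside square brackets or double quotes belong to an inline definition or crosslink rather than separating the top-level parts.

```python
def split_description(description):
    """Split a BpForms string on pipes that lie outside brackets and quotes."""
    parts, current = [], []
    depth, in_quotes = 0, False
    for ch in description:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes:
            if ch == "[":
                depth += 1
            elif ch == "]":
                depth -= 1
            elif ch == "|" and depth == 0:
                # A top-level pipe ends the current part.
                parts.append("".join(current).strip())
                current = []
                continue
        current.append(ch)
    parts.append("".join(current).strip())
    return parts


example = 'CAC | circular | x-link: [type: "disulfide" | l: 1 | r: 3]'
sequence, *attributes = split_description(example)
print(sequence)    # CAC
print(attributes)  # ['circular', 'x-link: [type: "disulfide" | l: 1 | r: 3]']
```

Tracking bracket depth and quoting keeps the pipes inside the crosslink definition from being mistaken for part separators.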
BpForms includes several alphabets. Each alphabet describes hundreds of monomeric forms.
As described above, monomeric forms with single-character codes can be indicated by their codes (e.g., C) and monomeric forms with multi-character codes can be indicated by their codes delimited by curly brackets (e.g., {m2A}).
Monomeric forms which are not defined in the alphabet can be defined "inline" within the sequence of monomeric forms.
Inline monomeric forms can be defined by enclosing multiple attributes separated by pipes ("|") in square brackets (e.g., [id: "a-short-name" | name: "a long name " | ...]).
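The three ways of writing a monomeric form (single-character code, code in curly brackets, inline definition in square brackets) suggest a simple tokenizer. The sketch below is an illustration only, not the BpForms implementation; `tokenize_sequence` and `_find_closing_bracket` are hypothetical names, and the sketch assumes that a `:` between monomeric forms is emitted as its own token.

```python
def _find_closing_bracket(text, start):
    """Return the index of the ']' that closes the '[' at text[start]."""
    depth, in_quotes = 0, False
    for j in range(start, len(text)):
        ch = text[j]
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes:
            if ch == "[":
                depth += 1
            elif ch == "]":
                depth -= 1
                if depth == 0:
                    return j
    raise ValueError("unterminated inline monomeric form")


def tokenize_sequence(sequence):
    """Split a sequence of monomeric forms into one token per form."""
    tokens = []
    i = 0
    while i < len(sequence):
        ch = sequence[i]
        if ch.isspace():
            i += 1
        elif ch == "{":
            # Multi-character code from the alphabet, e.g. {m2A}.
            end = sequence.index("}", i)
            tokens.append(sequence[i:end + 1])
            i = end + 1
        elif ch == "[":
            # Inline definition; brackets inside quoted SMILES are skipped.
            end = _find_closing_bracket(sequence, i)
            tokens.append(sequence[i:end + 1])
            i = end + 1
        else:
            # Single-character code (or another single-character token).
            tokens.append(ch)
            i += 1
    return tokens


print(tokenize_sequence("A{pSer}Y"))  # ['A', '{pSer}', 'Y']
```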
The inline monomeric forms support seven types of attributes. See below for more information about each attribute.
All of these attributes are optional. However, the structure, l-bond-atom, l-displaced-atom, r-bond-atom, and r-displaced-atom attributes are required to calculate the molecular structure, chemical formula, molecular weight, and charge of the polymer.
[id: "dI" | name: "hypoxanthine" | structure: "OC[C@H]1O[C@H](C[C@@H]1O)[N+]1(C=Nc2c1nc[nH]c2=O)C1CC(C(O1)COP(=O)([O-])[O-])O" ]
[id: "AA0305" | name: "N5-methyl-L-arginine" | structure: "OC(=O)[C@H](CCCN(C(=[NH2])N)C)[NH3+]" | l-bond-atom: N16-1 | l-displaced-atom: H16+1 | l-displaced-atom: H16 | r-bond-atom: C2 | r-displaced-atom: O1 | r-displaced-atom: H1 | comments: "Methylated form of L-arginine" ]
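As a rough illustration of how the attributes in these examples could be read, the sketch below collects the `key: value` pairs of an inline definition and reports which of the attributes needed for structure calculation are absent. It uses only the standard library and is not the BpForms parser; the function names are hypothetical, and the regular expression assumes attribute names are lowercase words joined by hyphens.

```python
import re

# key: "quoted value"   or   key: bare value (up to the next pipe or bracket)
ATTRIBUTE = re.compile(r'([a-z][a-z-]*):\s*(?:"([^"]*)"|([^|\]]+))')

# Attributes the text lists as required to calculate the polymer's structure.
REQUIRED_FOR_STRUCTURE = (
    "structure", "l-bond-atom", "l-displaced-atom",
    "r-bond-atom", "r-displaced-atom",
)


def parse_inline_form(text):
    """Map each attribute name to the list of values given for it."""
    attributes = {}
    for match in ATTRIBUTE.finditer(text):
        key, quoted, bare = match.groups()
        value = quoted if quoted is not None else bare.strip()
        attributes.setdefault(key, []).append(value)
    return attributes


def missing_for_structure(attributes):
    """List the required attributes that the definition does not provide."""
    return [name for name in REQUIRED_FOR_STRUCTURE if name not in attributes]


form = ('[id: "AA0305" | name: "N5-methyl-L-arginine" '
        '| structure: "OC(=O)[C@H](CCCN(C(=[NH2])N)C)[NH3+]" '
        '| l-bond-atom: N16-1 | l-displaced-atom: H16+1 | l-displaced-atom: H16 '
        '| r-bond-atom: C2 | r-displaced-atom: O1 | r-displaced-atom: H1 ]')
attributes = parse_inline_form(form)
print(attributes["l-displaced-atom"])     # ['H16+1', 'H16']
print(missing_for_structure(attributes))  # []
```

Values are kept in lists because attributes such as l-displaced-atom can appear more than once in a definition.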
The structure attribute describes the molecular structure of the monomeric form. This attribute must be a SMILES-encoded string, and the atoms should be canonically ordered (i.e., Open Babel canonical SMILES format). Each monomeric form can only have one structure attribute. This attribute is required to calculate the molecular structure of the polymer.
The text below illustrates how to describe the modified DNA nucleotide monophosphate 2'-deoxy-2-O-methylcytosine-5'-monophosphate, and the image below illustrates the molecule that the text specifies. The atom labels indicate the numbers of the atoms within the molecule. These numbers can be generated with Open Babel.
[id: "m2C" | name: "2-O-methylcytosine" | structure: "COC1=NC(=CCN1C1CC(C(O1)COP(=O)([O-])[O-])O)N" ]
The l-bond-atom and l-displaced-atom attributes describe bonds with preceding monomeric forms; the r-bond-atom and r-displaced-atom attributes describe bonds with succeeding monomeric forms.
The values of these attributes are the element of the atom, the position of the atom within the monomeric form, and the charge of the atom (e.g., N3+1). Open Babel can be used to display the numbers of the atoms within monomeric forms.
Each monomeric form can have one or more bonds and displaced atoms with the preceding and following monomeric forms. In addition, the number of left bond atoms of each monomeric form must equal the number of right bond atoms of the preceding monomeric form, and the number of right bond atoms of each monomeric form must equal the number of left bond atoms of the following monomeric form. The BpForms software verifies these constraints. These attributes are required to calculate the molecular structure of the polymer.
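The element, position, and charge encoding of these attribute values (e.g., N3+1) can be unpacked with a short regular expression. This is a minimal sketch under the assumption that an omitted charge means zero; `BondAtom` and `parse_bond_atom` are hypothetical names rather than part of the BpForms API.

```python
import re
from typing import NamedTuple


class BondAtom(NamedTuple):
    element: str   # chemical symbol, e.g. "N"
    position: int  # atom number within the monomeric form
    charge: int    # assumed to be 0 when no charge is written


# Element symbol, atom position, optional signed charge.
ATOM = re.compile(r"^([A-Z][a-z]?)(\d+)([+-]\d+)?$")


def parse_bond_atom(text):
    match = ATOM.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid atom description: {text!r}")
    element, position, charge = match.groups()
    return BondAtom(element, int(position), int(charge) if charge else 0)


print(parse_bond_atom("N3+1"))   # BondAtom(element='N', position=3, charge=1)
print(parse_bond_atom("N16-1"))  # BondAtom(element='N', position=16, charge=-1)
print(parse_bond_atom("C2"))     # BondAtom(element='C', position=2, charge=0)
```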
The example below illustrates how to describe the modified amino acid N5-methyl-L-arginine.
The blue atoms indicate atoms (N terminus) involved in left bonds to preceding monomeric forms; the dark blue N atom indicates the atom which bonds with preceding monomeric forms; the light blue H atom indicates atoms displaced by the formation of these bonds.
The green atoms indicate the atoms (C terminus) involved in right bonds to succeeding monomeric forms. The dark green C atom indicates the atom which bonds with succeeding monomeric forms; the light green H atom (not shown) indicates atoms which are displaced by the formation of these bonds.
[id: "AA0305" | name: "N5-methyl-L-arginine" | structure: "OC(=O)[C@H](CCCN(C(=[NH2])N)C)[NH3+]" | l-bond-atom: N16-1 | l-displaced-atom: H16+1 | l-displaced-atom: H16 | r-bond-atom: C2 | r-displaced-atom: O1 | r-displaced-atom: H1 ]
Through the inline monomeric forms, BpForms can represent two types of uncertainty in the molecular structure of forms of biopolymers:
BpForms can represent several types of metadata about inline monomeric forms:
The x-link polymer attribute can be used to indicate a bond between non-adjacent monomeric forms. For example, this attribute can be used to describe intrastrand crosslinks in DNA and disulfide bonds between cysteines in proteins.
Crosslinks can be described using our ontology of crosslinks or defined inline.
Polymers can have zero, one, or more crosslinks.
Crosslinks defined using our ontology can be described by enclosing attributes which indicate the monomeric forms involved in the crosslink within square brackets and delimiting the attributes with pipes (e.g., CAC | x-link: [type: "disulfide" | l: 1 | r: 3]).
The value of the type attribute must be the id of a crosslink in our ontology. See the crosslinks browser for a list of the defined crosslinks.
The values of the l and r attributes should be integers which indicate the positions of the monomeric forms involved in the crosslink. The left/right orientation of the monomeric forms must be matched to the definition of the crosslink in the ontology.
Users can also define crosslinks "inline" by enclosing attributes which indicate the atoms involved in the bond within square brackets and delimiting the attributes with pipes (e.g., | x-link: [l-bond-atom: 1C1 | r-bond-atom: 3C2 | ...]).
Each user-defined crosslink can be described with the following attributes:
Each user-defined crosslink can have one or more left and right bond atoms and zero or more left and right displaced atoms. Furthermore, each user-defined crosslink must have the same number of left and right bond atoms. This constraint is verified by the BpForms software.
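The constraints on user-defined crosslinks can be sketched as a small check. The code below is an illustration, not the verification performed by the BpForms software. It assumes, based on examples such as 1C1 and 3C2 above, that each atom value is the position of the monomeric form followed by the element, the atom position, and an optional charge; `parse_crosslink_atom` and `check_crosslink` are hypothetical names.

```python
import re

# Monomeric form position, element symbol, atom position, optional charge.
XLINK_ATOM = re.compile(r"^(\d+)([A-Z][a-z]?)(\d+)([+-]\d+)?$")


def parse_crosslink_atom(text):
    match = XLINK_ATOM.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid crosslink atom: {text!r}")
    monomer, element, position, charge = match.groups()
    return int(monomer), element, int(position), int(charge) if charge else 0


def check_crosslink(attributes):
    """Check a list of (name, value) pairs describing one inline crosslink."""
    left = [value for name, value in attributes if name == "l-bond-atom"]
    right = [value for name, value in attributes if name == "r-bond-atom"]
    errors = []
    if not left or not right:
        errors.append("at least one left and one right bond atom is required")
    if len(left) != len(right):
        errors.append("the numbers of left and right bond atoms differ")
    return errors


disulfide = [
    ("l-bond-atom", "1S11"), ("l-displaced-atom", "1H11"),
    ("r-bond-atom", "3S11"), ("r-displaced-atom", "3H11"),
]
print(check_crosslink(disulfide))    # []
print(parse_crosslink_atom("1S11"))  # (1, 'S', 11, 0)
```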
The example below illustrates how to describe a tripeptide with a disulfide bond. The blue line indicates the disulfide bond (crosslink). The green lines indicate the bonds between the successive amino acids. The black labels indicate the positions of monomeric forms within the sequence.
Ontology-defined crosslink
CAC | x-link: [type: "disulfide" | l: 1 | r: 3 ]
User-defined crosslink
CAC | x-link: [ l-bond-atom: 1S11 | l-displaced-atom: 1H11 | r-bond-atom: 3S11 | r-displaced-atom: 3H11 | comments: "Disulfide bond" ]
The : notation can be used to indicate a nick between adjacent monomeric forms.
The example below illustrates how to describe a tripeptide with a nick between the first and second residues and a disulfide bond between the first and third residues. Such a peptide could be generated by nicking the form of the peptide that doesn't contain the nick. The blue line indicates the disulfide bond (crosslink). The green lines indicate the bonds between the successive amino acids. The black labels indicate the positions of monomeric forms within the sequence.
Ontology-defined crosslink
C:AC | x-link: [type: "disulfide" | l: 1 | r: 3 ]
By default, BpForms describes linear polymers. The circular polymer attribute can be used to indicate that the polymer is circular (there is a bond between the last and first monomeric forms).
The example below illustrates how to describe the circular DNA dimer of the DNA nucleotides deoxyadenosine monophosphate and deoxycytosine monophosphate. The green lines indicate the bonds between successive nucleotides. The black labels indicate the positions of monomeric forms within the sequence.
AC | circular
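The combined effect of nicks and the circular attribute on the backbone can be sketched by listing which positions are bonded. This is an illustration only; `backbone_bonds` is a hypothetical helper that takes an already tokenized sequence in which `:` marks a nick, and it reports bonds as (left position, right position) pairs with positions counted from 1.

```python
def backbone_bonds(tokens, circular=False):
    """List the backbone bonds implied by a tokenized sequence."""
    bonds = []
    position = 0      # position of the most recently seen monomeric form
    previous = None   # position of the form before the current one
    nicked = False    # True when a ':' separates the two forms
    for token in tokens:
        if token == ":":
            nicked = True
            continue
        position += 1
        if previous is not None and not nicked:
            bonds.append((previous, position))
        previous = position
        nicked = False
    if circular and position > 1:
        # Circular polymers have a bond from the last form back to the first.
        bonds.append((position, 1))
    return bonds


print(backbone_bonds(list("CAC")))                 # [(1, 2), (2, 3)]
print(backbone_bonds(list("C:AC")))                # [(2, 3)]
print(backbone_bonds(list("AC"), circular=True))   # [(1, 2), (2, 1)]
```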
Each residue and atom represented by BpForms has a unique coordinate. The coordinate of each residue is its position within the residue sequence of its parent polymer. The coordinate of each atom is a tuple of the coordinate of its parent residue and its position within the canonical SMILES ordering of the atoms in its parent residue prior to incorporation into polymers (which can be displayed by Open Babel).
The example below illustrates the atom coordinates for the modified amino acid N5-methyl-L-arginine.
[id: "AA0305" | name: "N5-methyl-L-arginine" | structure: "OC(=O)[C@H](CCCN(C(=[NH2])N)C)[NH3+]" | l-bond-atom: N16-1 | l-displaced-atom: H16+1 | l-displaced-atom: H16 | r-bond-atom: C2 | r-displaced-atom: O1 | r-displaced-atom: H1 ]
To help quality control information about macromolecules, the BpForms user interfaces include methods for verifying the syntactic and semantic correctness of descriptions of biopolymers:
BpForms includes four software interfaces for verifying descriptions of biopolymers and calculating properties such as their molecular structures, formulae, molecular weights, and charges.
By concretely capturing the molecular structure of biopolymers, BpForms can facilitate a wide range of epigenomics, transcriptomics, proteomics, systems biology, and synthetic biology research.
BpForms can help researchers precisely communicate the structures of modified DNA, such as methylations that bacteria use to distinguish self from non-self DNA. We anticipate that this will be increasingly important as researchers continue to discover new types of modifications and begin to investigate their impact on the interactions of proteins with DNA.
Several chemotherapeutics, such as cisplatin, cause toxic side effects by damaging the DNA of healthy cells. Cells have several pathways with overlapping functions to repair DNA damage. This includes direct repair, base excision repair, nucleotide excision repair, and homologous recombination. Because chemotherapeutics cause a wide range of damage, and because cells have several pathways to repair DNA damage, it is challenging to assemble an integrated understanding of the repair of DNA damage caused by chemotherapeutics. BpForms can help researchers develop an integrated understanding of DNA repair by helping researchers concretely communicate the damage caused by each chemotherapeutic and the types of damage repaired by each pathway.
BpForms can help researchers precisely communicate the sequences of rRNA, tRNA, and other non-coding RNA; analyze RNA modifications; and improve the quality of reported sequences by identifying errors in the descriptions of modified RNA such as undefined monomeric forms and inconsistent bonds (e.g., 3' caps that are not located at the 3' position).
MODOMICS contains 732 curated sequences of rRNA and tRNA. We used BpForms together with MODOMICS to assess the metabolic cost of RNA modification in Escherichia coli. We found that E. coli tRNA have 7.8 ± 2.2 modifications per transcript that increase their mass by 166.2 ± 103.7 Da and charge by 0.40 ± 0.63 per transcript. This analysis also led us to add missing information about the origin of several of the monomeric forms derived from MODOMICS and correct three types of errors in the MODOMICS RNA sequences. The code is available at GitHub.
BpForms can help researchers precisely communicate the sequences of proteoforms, analyze modifications, and improve the quality of reported sequences by identifying errors in the descriptions of proteoforms, such as monomeric forms that are inconsistent with the unmodified sequence (e.g., selenocysteine modification of a non-cysteine amino acid) and inconsistent bonds (e.g., N,N,N-trimethyl-L-alanine (which has no N-terminus) located in the middle of a peptide).
The PRO database contains curated modifications of 2,312 human proteins. We used BpForms together with PRO to assess the metabolic cost of protein modification in humans. We found that human proteins have 1.7 ± 1.6 modifications per protein that increase their mass by 146.4 ± 154.3 Da per protein and decrease their charge by 3.05 ± 3.30 per protein. This analysis also led us to improve PRO by correcting four types of errors in the curated modifications. The code is available at GitHub.
BpForms can help modelers describe the semantic meaning of models by helping modelers precisely describe the species in models. Importantly, this precision makes it easier for other researchers to understand, reuse, extend, and compose models for other studies. BpForms can also help modelers build more comprehensive models by helping researchers identify gaps in models such as missing intermediate modification states of proteins and missing interactions between modification states. In particular, BpForms can help modelers identify the full combinatorial complexity of biochemistry that should be modeled. In addition, BpForms can help researchers increase the quality of models by helping identify errors such as element imbalances.
The Kholodenko model of the eukaryotic MAPK signaling cascade (DOI: 10.1046/j.1432-1327.2000.01197.x, BioModels: BIOMD0000000010) represents the biphosphorylation of Mek1/MAPKK by Mos/MAPKKK and the biphosphorylation of Erk2/MAPK by Mek1/MAPKK. Annotating the structures of the species in the model with BpForms enabled us to identify two gaps in the model: two additional intermediate phosphorylation forms of Mek1 and Erk2 and the reactions involving these species. BpForms also enabled us to identify several unbalanced reactions that do not capture phosphate donors.
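The kind of element imbalance mentioned here can be illustrated with a small bookkeeping sketch. It does not use BpForms or the Kholodenko model: it works on hand-written formulas, with free serine and phosphoserine (neutral forms) standing in for an unmodified and a phosphorylated residue, and ATP and ADP as the phosphate donor and its product. The function names are hypothetical.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count the atoms of each element in a simple formula such as C3H7NO3."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def imbalance(reactants, products):
    """Return products minus reactants for every element that does not balance."""
    left, right = Counter(), Counter()
    for formula in reactants:
        left.update(parse_formula(formula))
    for formula in products:
        right.update(parse_formula(formula))
    elements = sorted(set(left) | set(right))
    return {el: right[el] - left[el] for el in elements if right[el] != left[el]}


SER, PSER = "C3H7NO3", "C3H8NO6P"             # serine, phosphoserine
ATP, ADP = "C10H16N5O13P3", "C10H15N5O10P2"   # neutral formulas

# Phosphorylation written without a phosphate donor does not balance.
print(imbalance([SER], [PSER]))            # {'H': 1, 'O': 3, 'P': 1}
# Adding the donor and its product balances the reaction.
print(imbalance([SER, ATP], [PSER, ADP]))  # {}
```

With structures attached to every species, a check of this kind can be run over each reaction in a model.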
BpForms can help engineers precisely represent and communicate the structures of parts for synthetic organisms. In addition, BpForms could help engineers identify the dependencies and interfaces of parts which, in turn, could help engineers use parts in alternative hosts, compose parts, and share parts more reliably.
E. coli pyruvate dehydrogenase requires lipoate ligation at L43 of the active site of the E1 subunit (UniProt: P96104). By representing lipoate ligation of the E1 subunit, BpForms can help capture the dependence of E. coli pyruvate dehydrogenase on lipoate ligase. In turn, this could help engineers recognize that E. coli pyruvate dehydrogenase can only be used in other hosts that have a lipoate ligase, or that a lipoate ligase, such as LplA (UniProt: P32099), must be co-transformed with E. coli pyruvate dehydrogenase.
BpForms can be used in conjunction with several commonly used standards in genomics, transcriptomics, proteomics, systems biology, and synthetic biology. In addition, BpForms can easily be embedded within other documents such as Excel workbooks and comma-separated tables.
BpForms can be used to provide human-readable annotations of protein structures encoded in PDB files. BpForms can be embedded within REMARK records.
Protein: Bos taurus selenocysteine synthase Gpx1
...
REMARK 1
REMARK 1 >1GP1:A
REMARK 1 AAALAAAAPRTVYAFSARPLAGGEPFNLSSLRGKVLLIENVASL{SE7}GTTVRDYTQMNDLQRRLGPRGLVVLGFPCNQFGHQENAKNEEIL
REMARK 1 NCLKYVRPGGGFEPNFMLFEKCEVNGEKAHPLFAFLREVLPTPSDDATALMTDPKFITWSPVCRNDVSWNFEKFLVGPDGVPVRRYSRRFLTI
REMARK 1 DIEPDIETLLSQGASA
REMARK 1 >1GP1:B
REMARK 1 AAALAAAAPRTVYAFSARPLAGGEPFNLSSLRGKVLLIENVASL{SE7}GTTVRDYTQMNDLQRRLGPRGLVVLGFPCNQFGHQENAKNEEIV
REMARK 1 VLGFPCNQFGHQENAKNEEILNCLKYVRPGGGFEPNFMLFEKCEVNGEKAHPLFAFLREVLPTPSDDATALMTDPKFITWSPVCRNDVSWNFE
REMARK 1 KFLVGPDGVPVRRYSRRFLTIDIEPDIETLLSQGASA
...
Sets of BpForms can be encoded in FASTA files. Such files can be written with the bpforms.util.write_to_fasta function or packages such as BioPython.
Protein: multiple phosphorylated forms of H. sapiens MAPK
> y | MEK | Q02750
MPKKKPTPIQLNPAPDGSAVNGTSSAETNLEALQKKLEELELDEQQRKRLEAFLTQKQKVGELKDDDFEKISELGAGNGGVVFKVSHKPSGLVMARKLIH
LEIKPAIRNQIIRELQVLHECNSPYIVGFYGAFYSDGEISICMEHMDGGSLDQVLKKAGRIPEQILGKVSIAVIKGLTYLREKHKIMHRDVKPSNILVNS
RGEIKLCDFGVSGQLIDSMANSFVGTRSYMSPERLQGTHYSVQSDIWSMGLSLVEMAVGRYPIPPPDAKELELMFGCQVEGDAAETPPRPRTPGRPLSSY
GMDSRPPMAIFELLDYIVNEPPPKLPSGVFSLEFQDFVNKCLIKNPAERADLKQLMVHAFIKRSDAEEVDFAGWLCSTIGLNQPSTPTHAAGV
> yp | phosphorylated MEK | Q02750 | pS218
MPKKKPTPIQLNPAPDGSAVNGTSSAETNLEALQKKLEELELDEQQRKRLEAFLTQKQKVGELKDDDFEKISELGAGNGGVVFKVSHKPSGLVMARKLIH
LEIKPAIRNQIIRELQVLHECNSPYIVGFYGAFYSDGEISICMEHMDGGSLDQVLKKAGRIPEQILGKVSIAVIKGLTYLREKHKIMHRDVKPSNILVNS
RGEIKLCDFGVSGQLID{AA0037}MANSFVGTRSYMSPERLQGTHYSVQSDIWSMGLSLVEMAVGRYPIPPPDAKELELMFGCQVEGDAAETPPRPRTP
GRPLSSYGMDSRPPMAIFELLDYIVNEPPPKLPSGVFSLEFQDFVNKCLIKNPAERADLKQLMVHAFIKRSDAEEVDFAGWLCSTIGLNQPSTPTHAAGV
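FASTA files that hold BpForms strings, such as the one above, can be read back with a few lines of standard-library Python. The sketch below is not bpforms.util or BioPython; `read_fasta` is a hypothetical helper, and the sequences in the usage example are short fragments chosen for illustration rather than complete proteins.

```python
def read_fasta(text):
    """Map each FASTA header to its sequence, joining wrapped lines."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is None:
            raise ValueError("sequence data before the first header")
        else:
            # Joining lines also restores codes such as {AA0037}
            # if a writer happened to wrap a line inside the brackets.
            records[header] += line
    return records


fasta = """\
> y | MEK | Q02750
GQLIDSMA
NSF
> yp | phosphorylated MEK | Q02750 | pS218
GQLID{AA0037}MA
NSF
"""
for header, sequence in read_fasta(fasta).items():
    print(header, "->", sequence)
```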
BpForms can be used to concretely describe all of the DNA, RNA, and proteins involved in pathways encoded in BioPAX. BpForms can be used with the sequence child of DNAReference, RNAReference, and ProteinReference objects.
DNA: E. coli K-12 MG1655 Dam 6-methyladenine sites (701..914) involved in host recognition
... <bp:DNA> <bp:entityReference> <bp:DNAReference> <bp:sequence rdf:datatype="http://www.w3.org/2001/XMLSchema#string" rdf:about="http://edamontology.org/format_3909#dna"> ... TGATTTGCCGTGGCGAGAAAATGTCG{a}TCGCCATTATGGCCGGCGTATTAGAAGCGCGCGGTCACAAC GTTACTGTTATCG{a}TCCGGTCGAAAAACTGCTGGCAGTGGGGCATTACCTCGAATCTACCGTCGATAT TGCTGAGTCCACCCGCCGTATTGCGGCAAGCCGCATTCCGGCTG{a}TCACATGGTGCTGATGGCAGGTT ... </bp:sequence> </bp:DNAReference> </bp:entityReference> </bp:DNA> ...
RNA: Modifications of B. subtilis tRNAUGC involved in stability
... <bp:RNA> <bp:entityReference> <bp:RNAReference> <bp:sequence rdf:datatype="http://www.w3.org/2001/XMLSchema#string" rdf:about="http://edamontology.org/format_3909#rna"> GGAGCCUUAGCUCAGC{8U}GGGAGAGCGCCUGCUU{501U}GC{6A}CGCAGGAG{7G}UCAGCGG{5U}{9U}CGAUCCCGCUAGGCUCCA CCA </bp:sequence> </bp:RNAReference> </bp:entityReference> </bp:RNA> ...
Protein: Modifications of H. sapiens MAPK3 involved in signaling
... <bp:Protein> <bp:entityReference> <bp:ProteinReference> <bp:sequence rdf:datatype="http://www.w3.org/2001/XMLSchema#string" rdf:about="http://edamontology.org/format_3909#protein"> M{AA0041}AAAAQGGGGGEPRRTEGVGPGVPGEVEMVKGQPFDVGPRYTQLQYIGEGAYGMVSSAYDHVRKTRVAIKKISPFEHQTYCQRTL REIQILLRFRHENVIGIRDILRASTLEAMRDVYIVQDLMETDLYKLLKSQQLSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLINTTCD LKICDFGLARIADPEHDH{AA0038}GFL{AA0038}E{AA0039}VA{AA0038}RWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPI FPGKHYLDQLNHILGILGSPSQEDLNCIINMKARNYLQSLPSKTKVAWAKLFPKSDSKALDLLDRMLTFNPNKRITVEEALAHPYLEQYYDPT DEPVAEEPFTFAMELDDLPKERLKELIFQETARFQPGVLEAP </bp:sequence> </bp:ProteinReference> </bp:entityReference> </bp:Protein> ...
BpForms can be used to concretely describe the meaning of each component of a model encoded in CellML. BpForms can be used with the RDF element of component objects.
Protein: Phosphorylated Erk and Mek in signal transduction (DOI: 10.1038/msb.2009.4, Physiome Model Repository). See complete CellML file.
... <component cmeta:id="ypp" name="ypp"> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <rdf:Description rdf:about="#ypp"> <bpforms:ProteinForm xmlns:bpforms="https://bpforms.org"> MPKKKPTPIQLNPAPDGSAVNGTSSAETNLEALQKKLEELELDEQQRKRLEAFLTQKQKVGELKDDDFEKISELGAGNGGVVFKVSHKPSG LVMARKLIHLEIKPAIRNQIIRELQVLHECNSPYIVGFYGAFYSDGEISICMEHMDGGSLDQVLKKAGRIPEQILGKVSIAVIKGLTYLRE KHKIMHRDVKPSNILVNSRGEIKLCDFGVSGQLID{AA0037}MAN{AA0037}FVGTRSYMSPERLQGTHYSVQSDIWSMGLSLVEMAVG RYPIPPPDAKELELMFGCQVEGDAAETPPRPRTPGRPLSSYGMDSRPPMAIFELLDYIVNEPPPKLPSGVFSLEFQDFVNKCLIKNPAERA DLKQLMVHAFIKRSDAEEVDFAGWLCSTIGLNQPSTPTHAAGV </bpforms:ProteinForm> </rdf:Description> </rdf:RDF> </component> ...
BpForms can be used to concretely describe the meaning of each species in a model encoded in Systems Biology Markup Language (SBML). BpForms can be used with the annotation element of species objects.
Protein: Phosphorylated Cdc2 and Cdc12 in the yeast cell cycle (DOI: 10.1073/pnas.88.16.7328, BioModels: BIOMD0000000005). See complete SBML file.
... <species name="cdc2k-p" metaid="cdc2k"> <annotation> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <rdf:Description rdf:about="#cdc2k-p"> <bpforms:ProteinForm xmlns:bpforms="https://bpforms.org"> MENYQKVEKIGEG{AA0038}{AA0039}GVVYKARHKLSGRIVAMKKIRLEDESEGVPSTAIREISLLKEVNDENNRSNCVRLLDI LHAESKLYLVFEFLDMDLKKYMDRISETGATSLDPRLVQKFTYQLVNGVNFCHSRRIIHRDLKPQNLLIDKEGNLKLADFGLARSFGVPLRN Y{AA0038}HEIVTLWYRAPEVLLGSRHYSTGVDIWSVGCIFAEMIRRSPLFPGDSEIDEIFKIFQVLGTPNEEVWPGVTLLQDYKSTFPRW KRMDLHKVVPNGEEDAIELLSAMLVYDPAHRISAKRALQQNYLRDFH </bpforms:ProteinForm> </rdf:Description> </rdf:RDF> </annotation> </species> ...
Protein: Phosphorylated Mos/Raf1, Mek1, and Erk2 in the eukaryote MAPK cascade (DOI: 10.1046/j.1432-1327.2000.01197.x, BioModels: BIOMD0000000010). See complete SBML file.
... <species name="Erk2-PP" metaid="_584615"> <annotation> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <rdf:Description rdf:about="#_584615"> <bpforms:ProteinForm xmlns:bpforms="https://bpforms.org"> MAAAGAASNPGGGPEMVRGQAFDVGPRYINLAYIGEGAYGMVCSAHDNVNKVRVAIKKISPFEHQTYCQRTLREIKILLRFKHENIIGINDI IRAPTIEQMKDVYIVQDLMETDLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDHT GFL{AA0038}E{AA0039}VATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINLK ARNYLLSLPHKNKVPWNRLFPNADPKALDLLDKMLTFNPHKRIEVEAALAHPYLEQYYDPSDEPVAEAPFKFEMELDDLPKETLKELIFEET ARFQPGY </bpforms:ProteinForm> </rdf:Description> </rdf:RDF> </annotation> </species> ...
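Annotations embedded this way can be pulled back out of a model with a standard XML parser. The sketch below uses Python's xml.etree.ElementTree on a shortened fragment modelled on the example above; the sequence is truncated for illustration, `extract_protein_forms` is a hypothetical helper, and the fragment leaves out the SBML namespace that a complete SBML file would declare (in a real file, the species tag would need the namespace-qualified name).

```python
import xml.etree.ElementTree as ET

# Namespace used for the bpforms prefix in the examples above.
BPFORMS_NS = "https://bpforms.org"

SBML_FRAGMENT = """
<species name="Erk2-PP" metaid="_584615">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about="#_584615">
        <bpforms:ProteinForm xmlns:bpforms="https://bpforms.org">
          GFL{AA0038}E{AA0039}VATRWY
        </bpforms:ProteinForm>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>
"""


def extract_protein_forms(xml_text):
    """Map each species name to the BpForms string in its annotation."""
    root = ET.fromstring(xml_text.strip())
    forms = {}
    for species in root.iter("species"):
        for form in species.iter("{%s}ProteinForm" % BPFORMS_NS):
            # Whitespace from line wrapping is not part of the sequence.
            forms[species.get("name")] = "".join((form.text or "").split())
    return forms


print(extract_protein_forms(SBML_FRAGMENT))
# {'Erk2-PP': 'GFL{AA0038}E{AA0039}VATRWY'}
```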
BpForms can be used to describe the meaning of each DNA, RNA, and protein molecule in genetic designs encoded in the Synthetic Biology Open Language (SBOL). BpForms can be used with the elements attribute of Sequence objects.
The following URIs should be used to indicate the encodings for the sequences of DNA, RNA, and protein molecules.
RNA: Modified B. subtilis tRNAILE 69 (SynBioHub: BO_28687). See complete SBOL file.
... <sbol:Sequence> <sbol:elements> GGGCCUGUAGCUCAGC{8U}GG{8U}{8U}AGAGCGCACGCCUGAU{62A}AGCGUGAG{7G}UCGAUGG{5U}{9U}CGAGUCCAUUCAGGCCCACCA </sbol:elements> <sbol:encoding rdf:resource="http://edamontology.org/format_3909#rna"/> </sbol:Sequence> ...
Protein: Lipoate-ligated acetyltransferase component PdhC of B. subtilis pyruvate dehydrogenase complex (SynBioHub: BO_32431). See complete SBOL file.
... <sbol:Sequence> <sbol:elements> MAFEFKLPDIGEGIHEGEIVKWFVKPNDEVDEDDVLAEVQND{AA0118}AVVEIPSPVKGKVLELKVEEGTVATVGQTIITFDAPGYEDLQFKGSDE SDDAKTEAQVQSTAEAGQDVAKEEQAQEPAKATGAGQQDQAEVDPNKRVIAMPSVRKYAREKGVDIRKVTGSGNNGRVVKEDIDSFVNGGAQEAAPQE TAAPQETAAKPAAAPAPEGEFPETREKMSGIRKAIAKAMVNSKHTAPHVTLMDEVDVTNLVAHRKQFKQVAADQGIKLTYLPYVVKALTSALKKFPVL NTSIDDKTDEVIQKHYFNIGIAADTEKGLLVPVVKNADRKSVFEISDEINGLATKAREGKLAPAEMKGASCTITNIGSAGGQWFTPVINHPEVAILGI GRIAEKAIVRDGEIVAAPVLALSLSFDHRMIDGATAQNALNHIKRLLNDPQLILMEA </sbol:elements> <sbol:encoding rdf:resource="http://edamontology.org/format_3909#protein"/> </sbol:Sequence> ...
Below are several resources which can be helpful for determining the sequences of natural biopolymers and designing the sequences of synthetic biopolymers.
To suggest new residues or modifications to the existing residues, please use this GitHub issue template, submit a GitHub pull request, or contact us by email.
Please provide as much information as possible about each residue using the YAML alphabet format. Please see the Git repository for examples.
To contribute an additional alphabet, please use this GitHub issue template, submit a GitHub pull request, or contact us by email.
Please provide as much information as possible about each residue using the YAML alphabet format. Please see the Git repository for examples.