
What is CD-CODE?


CD-CODE 2.0: An enhanced condensate knowledgebase integrating pathobiology, modulators, and host–pathogen interactions

CD-CODE (CrowDsourcing COndensate Database and Encyclopedia) is a continuously updated, versatile, semi-manually curated crowdsourcing database of biomolecular condensates and their constituents (proteins and nucleic acids). It includes a repository of infectious condensates, condensate modulating drugs (c-mods) and condensate aberrations across diverse disease contexts (condensatopathies), as well as an encyclopedia of the scientific terms used to describe them. Biomolecular condensates are micron-scale compartments that selectively concentrate biomolecules, mainly proteins and nucleic acids, but lack surrounding lipid membranes.


CD-CODE is a semi-manually curated and annotated database that aggregates information from primary literature and from other protein and liquid-liquid phase separation (LLPS) databases.

We extract information from the literature on which proteins are members of biomolecular condensates and provide experimental evidence for each condensate-protein relationship. We rely on the condensates.com literature curation engine to continuously assemble all relevant published papers.

The standardisation of terms and condensate nomenclature supports our understanding of condensate proteomes across organisms.

CD-CODE is a "living database" designed for fast, dynamic addition and review of information about condensates and proteins by contributing users, and it is open to any expert researcher who wishes to contribute. Our user management system supports three types of users: viewers, contributors and maintainers. Viewers can read and download the curated information. Contributors can suggest edits to the existing information and propose new condensate and protein entries. Maintainers curate the changes and accept or reject contributors' suggestions; contributors are then notified about the status and can engage in further discussion.
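The review workflow described above can be pictured with a small sketch. This is only an illustration under stated assumptions, not CD-CODE's actual implementation: the names `UserType`, `Status`, `Suggestion`, `suggest` and `review` are hypothetical, and the sketch assumes that only contributors submit suggestions and only maintainers review them.

```python
from dataclasses import dataclass, field
from enum import Enum


class UserType(Enum):
    VIEWER = "viewer"
    CONTRIBUTOR = "contributor"
    MAINTAINER = "maintainer"


class Status(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class Suggestion:
    """Hypothetical record of a proposed edit or new entry."""
    author: str
    description: str
    status: Status = Status.PENDING
    notifications: list = field(default_factory=list)


def suggest(user_type, author, description):
    # Assumption: in this sketch only contributors may submit suggestions.
    if user_type is not UserType.CONTRIBUTOR:
        raise PermissionError("only contributors can suggest edits in this sketch")
    return Suggestion(author, description)


def review(user_type, suggestion, accept):
    # Maintainers accept or reject; the author is then notified of the status.
    if user_type is not UserType.MAINTAINER:
        raise PermissionError("only maintainers can accept or reject suggestions")
    suggestion.status = Status.ACCEPTED if accept else Status.REJECTED
    suggestion.notifications.append(
        f"{suggestion.author}: suggestion {suggestion.status.value}"
    )
    return suggestion


# Hypothetical usage: a contributor proposes an entry, a maintainer accepts it.
s = suggest(UserType.CONTRIBUTOR, "alice", "add protein_x to condensate_1")
review(UserType.MAINTAINER, s, accept=True)
```

Only accepted suggestions would then become part of the database; a viewer calling `suggest` raises `PermissionError`.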

If you are interested in becoming a contributor, please join us here.

How to cite us?

If you use CD-CODE in a scientific publication, we would appreciate a citation of the following papers:

2025

"CD-CODE 2.0: an enhanced condensate knowledgebase integrating pathobiology, condensate modulating drugs, and host–pathogen interactions"
Ksenia Kuznetsova, Maxim Scheremetjew, Jialin Yin, HongKee Moon, Diego A Vargas, Anna Hadarovich, Natasha Lewis, Carsten Hoege, Chi Fung Willis Chow, David Kuster, Jik Nijssen, Alberto Hernandez-Armendariz, Jonathan C Savage, Yu Wei, Silja Zedlitz, Hari Raj Singh, Soumyadeep Ghosh, Allysa P Kemraj, Lena Hersemann, Anthony A Hyman, Diana M Mitrea, Agnes Toth-Petroczy, Nucleic Acids Research (2025).
https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkaf1104/8317324

DOI: 10.1093/nar/gkaf1104

2023

"CD-CODE: crowdsourcing condensate database and encyclopedia"
Nadia Rostam, Soumyadeep Ghosh, Chi Fung Willis Chow, Anna Hadarovich, Cedric Landerer, Rajat Ghosh, HongKee Moon, Lena Hersemann, Diana M. Mitrea, Isaac A. Klein, Anthony A. Hyman, Agnes Toth-Petroczy, Nat Methods (2023).
https://www.nature.com/articles/s41592-023-01831-0

DOI: 10.1038/s41592-023-01831-0

Definitions at CD-CODE

Biomolecular condensates: Membrane-less organelles that selectively concentrate biomolecules, mainly proteins and nucleic acids, in vivo.


Infectious condensates: Biomolecular condensates that form upon infection of a host organism by a pathogen (virus or bacterium), annotated with their host and pathogen protein and nucleic acid components. Biomolecular and infectious condensates are annotated according to their protein constituents and the nucleic acids (RNA and DNA) they contain.


Synthetic condensates: Condensates whose formation is demonstrated in in vitro experiments, usually via liquid-liquid phase separation.


Markers: Proteins that are used to define and label/mark a specific biomolecular condensate. Markers are often drivers.

Proteins in CD-CODE are divided into two groups:

1. Drivers: Proteins which fulfill at least one of the following criteria:

  • Induce the formation of a condensate.
  • Are essential for the integrity of a condensate.

2. Members: Proteins which are part of a condensate but whose role is unknown.

It is important to note that a particular protein may be a driver in one condensate and a member in another one.
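Because a protein's role depends on the condensate, one way to picture the classification is to key roles by (protein, condensate) pairs rather than by protein alone. The following is a minimal sketch under that assumption; `Role`, `classify` and the protein and condensate names are hypothetical and do not reflect CD-CODE's actual data model.

```python
from enum import Enum


class Role(Enum):
    DRIVER = "driver"
    MEMBER = "member"


def classify(induces_formation: bool, essential_for_integrity: bool) -> Role:
    """A protein is a driver if it meets at least one of the two criteria;
    otherwise it is a member (part of the condensate, role unknown)."""
    if induces_formation or essential_for_integrity:
        return Role.DRIVER
    return Role.MEMBER


# Roles are stored per (protein, condensate) pair, so the same protein
# can be a driver in one condensate and a member in another.
roles = {
    ("protein_x", "condensate_1"): classify(True, False),
    ("protein_x", "condensate_2"): classify(False, False),
}
```

Here the hypothetical `protein_x` is a driver of `condensate_1` but only a member of `condensate_2`.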


C-mods: A condensate-modifying compound (c-mod), also referred to as a condensate modulator or condensate-modulating drug, is a molecule that modifies the properties of a biomolecular condensate (PMID: 35974095).

C-mods in CD-CODE are divided into four groups based on their mode of action (MOA) (PMID: 35974095):

  • Inducer: A compound or biomolecule that promotes formation of a condensate.
  • Dissolver: A compound or biomolecule that promotes dissolution of a condensate.
  • Localizer: A compound or biomolecule that promotes relocalization of a biomolecule from or to a condensate.
  • Morpher: A compound or biomolecule that promotes changes in the morphology of a biomolecular condensate.

Condensatopathy: An aberrant condensate phenotype associated with disease pathophysiology, supported by experimental evidence in relevant cellular, animal and/or clinical models. A condensatopathy can be viewed as a biomarker: it could be a driver or an effect of the disease, but in either case it is strongly associated with, and reports on, the diseased state (PMID: 35974095).


Nucleic acid information: We annotate nucleic acid (RNA and DNA) components within condensates based on experimental evidence and note whether they have been shown to associate with specific protein members of the condensate. For additional RNA classifications in condensates, we recommend consulting the RPS database https://rps.renlab.cn (PMID: 39460625).


Confidence score for a protein belonging to a condensate (max. 5 stars):

Number of stars: Description

  • 1 star: PubMed reference annotated
  • 2 stars: High throughput experiment (e.g. mass spectrometry)
  • 3 stars: In vitro
  • 4 stars: In cellulo
  • 5 stars: In vivo

Confidence score for a condensate (max. 5 stars): median of the confidence scores of its member proteins.
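The median rule can be illustrated with a short sketch. Several parts are assumptions rather than documented CD-CODE behaviour: the evidence types are taken to map to 1 to 5 stars in the order listed above, a protein's score is taken to be its highest-ranked evidence type, and the handling of an even number of proteins (where the median can be a half-star value) is left to Python's `statistics.median`. All names are hypothetical.

```python
from statistics import median

# Assumed mapping: the five evidence types, in the order listed, to 1-5 stars.
EVIDENCE_STARS = {
    "pubmed_reference": 1,
    "high_throughput": 2,
    "in_vitro": 3,
    "in_cellulo": 4,
    "in_vivo": 5,
}


def protein_confidence(evidence_types):
    """Assumption: a protein's score is its highest-ranked evidence type."""
    return max(EVIDENCE_STARS[e] for e in evidence_types)


def condensate_confidence(protein_scores):
    """Median of the confidence scores of the condensate's proteins."""
    if not protein_scores:
        raise ValueError("a condensate needs at least one scored protein")
    return median(protein_scores)


# Hypothetical condensate with three annotated proteins.
proteins = {
    "protein_a": ["pubmed_reference", "in_vivo"],
    "protein_b": ["high_throughput"],
    "protein_c": ["in_vitro", "in_cellulo"],
}
scores = [protein_confidence(ev) for ev in proteins.values()]
condensate_score = condensate_confidence(scores)
```

With protein scores of 5, 2 and 4, the median, and hence the condensate score in this sketch, is 4.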

CD-CODE information flow

Figure: CD-CODE information flow. Users can view and search the data, or register as contributors and edit the content of the database. The maintainers ensure quality control, and only approved edits become part of the dynamically updated database. Maintainers also keep the database up to date by manually curating newly released research.

Currently, CD-CODE contains curated data on more than 700 condensates and more than 10,000 proteins, all of which can be viewed by users.

Join us and become a contributor or maintainer.

Limitations of CD-CODE

The main limitation of CD-CODE is that it contains only experimentally validated entries and is therefore heavily biased towards researchers' choices of which condensates to study and in which organisms. We have deliberately not included any predictions, since inferring whether a condensate exists in other organisms, and whether an orthologous protein would be part of an orthologous condensate, is non-trivial. Any missing information could mean that 1) the protein or condensate has not been studied yet; 2) a research paper exists but its information has not been added to the database yet; or 3) the condensate truly does not exist or the protein truly does not belong to a given condensate. The database is continuously undergoing curation, and users can refer to the scores and experimental evidence to guide their decision-making.

License

CD-CODE is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
