MIP

Overview

MIP is an AI-native research platform for biology. It combines a domain-specialized reasoning engine with real-time access to 40+ scientific databases, a live literature index, and a containerized compute environment, giving you a single surface to ask questions, analyze data, run pipelines, and generate structured outputs across the full breadth of biological research.
The platform is built around three layers: a reasoning layer that understands biology at the level of genes, proteins, pathways, variants, drugs, and disease mechanisms; a data layer that connects to authoritative public databases and your own uploaded datasets; and a compute layer that can execute code, run autonomous multi-step pipelines, and produce publication-quality results.
MIP is a research and clinical decision-support tool. All outputs are intended to assist qualified professionals and should be reviewed by a domain expert before being used in any clinical or regulatory context.

Who MIP is for

MIP is built for researchers, clinicians, and teams working across biology and biomedicine. This includes:
  • Molecular and cell biologists investigating gene function, protein interactions, and disease mechanisms
  • Clinical geneticists and genetic counselors interpreting variants and generating reports
  • Bioinformaticians running pipelines, validating data, and exploring multi-omics datasets
  • Drug discovery scientists mapping targets, pathways, and compound activity
  • Translational researchers bridging bench findings with clinical and population data
  • Lab directors and principal investigators overseeing analysis across projects and team members
Whether you are working on a single gene or running a genome-wide screen, MIP adapts to the scope and depth of your question.

Architecture

MIP’s architecture is designed to keep reasoning, data, and compute tightly integrated so you never have to context-switch between tools.
1. Reasoning layer

A domain-specialized AI engine that understands biological entities and relationships: genes, variants, proteins, pathways, diseases, drugs, and phenotypes. It decomposes complex questions into tool calls, searches the right databases, interprets results, and synthesizes answers with citations. Multiple specialized agents (genomics, literature, structural biology, pharmacology) are composed dynamically based on your question.

2. Data layer

Real-time connections to 40+ public databases (see full list below), your uploaded datasets (VCFs, expression matrices, custom files), and a live PubMed/preprint literature index. The reasoning layer queries these sources on every request; results are never cached or stale.

3. Compute layer

A containerized execution environment with Python and R, pre-installed scientific libraries (pandas, NumPy, Biopython, scanpy, pydeseq2, matplotlib, and 15+ others), and support for long-running background jobs. Code runs in isolated sandboxes with results streamed back inline or available asynchronously.

4. Output layer

Structured artifacts (reports, spreadsheets, visualizations, code files, protein viewers) generated from any analysis and versioned for review, export, or sharing.

Genomic variant analysis

MIP includes a full variant interpretation workflow for clinical and research genomics:
| Step | What happens |
| --- | --- |
| Data ingestion | Upload VCF files aligned to the GRCh37 (hg19) or GRCh38 (hg38) genome assembly. The platform auto-detects the assembly and validates input. |
| Annotation | Variants are annotated using Ensembl VEP 115 with RefSeq transcripts, population frequencies from gnomAD, clinical classifications from ClinVar, and in silico predictions from SIFT, PolyPhen-2, and AlphaMissense. See the Annotation Pipeline for technical details. |
| Classification | Each variant receives an automated ACMG/AMP pathogenicity classification based on 28 standard criteria. Classifications can be refined interactively using AI-enhanced evidence gathering. |
| Interpretation | Annotated variants are available for querying, filtering, and analysis through the AI chat interface, variant browser, and code execution environment. |
| Reporting | Structured clinical reports can be generated on demand with findings, ACMG evidence, and case summaries. |
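How MIP auto-detects the genome assembly is not described here. One common heuristic, sketched below, reads the chromosome 1 length from the `##contig` lines of the VCF header, since that length differs between GRCh37 (249,250,621 bp) and GRCh38 (248,956,422 bp). The function name `detect_assembly` is hypothetical and is not part of any MIP API.

```python
import re

# Chromosome 1 length is a fixed property of each assembly,
# so it can serve as a fingerprint.
CHR1_LENGTHS = {
    249_250_621: "GRCh37",
    248_956_422: "GRCh38",
}

CONTIG_RE = re.compile(r"^##contig=<(.*)>\s*$")


def detect_assembly(vcf_lines):
    """Return 'GRCh37', 'GRCh38', or None if the header gives no usable hint."""
    for line in vcf_lines:
        if not line.startswith("##"):
            break  # metadata lines end at the #CHROM column header
        match = CONTIG_RE.match(line)
        if not match:
            continue
        # Naive key=value split; assumes no quoted commas in the contig line.
        fields = dict(
            part.split("=", 1) for part in match.group(1).split(",") if "=" in part
        )
        if fields.get("ID") in ("1", "chr1") and fields.get("length", "").isdigit():
            return CHR1_LENGTHS.get(int(fields["length"]))
    return None


header = [
    "##fileformat=VCFv4.2",
    "##contig=<ID=chr1,length=248956422>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]
print(detect_assembly(header))  # GRCh38
```

A VCF without `##contig` lines yields `None` here; a fuller check would need another signal, such as reference alleles at known positions.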
Supported sequencing applications:
| Application | Description |
| --- | --- |
| Whole Exome Sequencing (WES) | Coding regions, typically 30,000-100,000 variants |
| Whole Genome Sequencing (WGS) | Full genome coverage, up to 1 million variants |
| Targeted gene panels | Custom or curated gene sets, typically 100-5,000 variants |
| Chromosomal Microarray Analysis (CMA) | Copy number variant detection from array-based platforms |

Multi-omics and custom data

Beyond genomics, MIP supports analysis of any biological dataset you bring:
  • Transcriptomics — RNA-seq count matrices, differential expression, pathway enrichment
  • Proteomics — Protein quantification, structure exploration via AlphaFold and PDB
  • Metabolomics — Metabolite identification, pathway mapping
  • Single-cell — scanpy-based workflows for clustering, trajectory analysis, marker identification
  • Custom datasets — Upload CSVs, TSVs, or other tabular data and analyze with natural language or code
MIP doesn’t require a specific input format for non-genomic data. Describe what you have, upload it, and the reasoning layer figures out how to work with it.
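As an illustration of the kind of code you might run on an uploaded count table, the sketch below computes a per-gene log2 fold change between two groups of samples using only the Python standard library. The column names (`gene`, `ctrl_1`, ...) and the function `log2_fold_changes` are assumptions for the example, and the calculation is deliberately simple: it does no normalization or statistical testing, which a real differential expression analysis (for example with pydeseq2) would add.

```python
import csv
import io
import math
from statistics import mean


def log2_fold_changes(csv_text, control, treated, pseudocount=1.0):
    """Map each gene to log2((mean treated + p) / (mean control + p))."""
    results = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        ctrl = mean(float(row[sample]) for sample in control)
        trt = mean(float(row[sample]) for sample in treated)
        # The pseudocount avoids division by zero for unexpressed genes.
        results[row["gene"]] = math.log2((trt + pseudocount) / (ctrl + pseudocount))
    return results


# Toy count matrix: one row per gene, one column per sample.
counts = """gene,ctrl_1,ctrl_2,trt_1,trt_2
GENE_A,10,20,62,64
GENE_B,100,100,100,100
"""

lfc = log2_fold_changes(counts, ["ctrl_1", "ctrl_2"], ["trt_1", "trt_2"])
print(lfc)  # {'GENE_A': 2.0, 'GENE_B': 0.0}
```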

Data sources

MIP integrates evidence from 40+ public databases across six categories. All sources are queried in real time.
| Database | Category | Used for |
| --- | --- | --- |
| PubMed | Literature | Peer-reviewed publications, systematic reviews, case reports |
| bioRxiv / medRxiv | Literature | Preprints and emerging research |
| ClinVar | Clinical | Variant clinical significance, review status, conditions |
| OMIM | Clinical | Gene-disease relationships and inheritance patterns |
| ClinGen | Clinical | Gene-disease validity and dosage sensitivity |
| Orphanet | Clinical | Rare disease classifications and epidemiology |
| gnomAD | Population | Allele frequencies across global populations |
| dbSNP | Population | Variant rsID identifiers |
| GWAS Catalog | Population | Genome-wide association study results |
| Ensembl VEP | Annotation | Consequence prediction, transcript mapping, HGVS nomenclature |
| RefSeq | Annotation | Transcript definitions and gene models |
| UniProt | Protein | Protein sequence, function, domains, PTMs, active sites |
| PDB | Structure | Experimentally determined protein structures |
| AlphaFold | Structure | AI-predicted protein structures |
| AlphaMissense | Prediction | Deep learning-based missense pathogenicity scores |
| SIFT | Prediction | Sequence homology-based functional impact |
| PolyPhen-2 | Prediction | Structure and sequence-based functional impact |
| ChEMBL | Drug discovery | Bioactivity data, compound-target relationships |
| Open Targets | Drug discovery | Target-disease associations, tractability |
| DrugBank | Pharmacology | Drug mechanisms, interactions, pharmacokinetics |
| PharmGKB | Pharmacogenomics | Drug-gene interactions and dosing guidelines |
| KEGG | Pathways | Metabolic and signaling pathway maps |
| Reactome | Pathways | Curated biological pathway knowledge base |
| Gene Ontology | Annotation | Functional annotation, biological process, molecular function |
| MyGene.info | Gene | Gene metadata aggregation across multiple sources |
| NCBI Gene | Gene | Gene summaries, nomenclature, orthologs |
| HPO | Phenotype | Human Phenotype Ontology terms and gene associations |
| Phen2Gene | Phenotype | Phenotype-to-gene matching with ranked scores |
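Several of the sources above expose public REST APIs, so you can also query them directly from code when you want to check a result against the primary source. The sketch below builds request URLs for two of them: the Ensembl REST VEP endpoint for HGVS notation and the MyGene.info v3 query endpoint. This is not how MIP's own connectors are implemented, which is not documented here; the helper names are hypothetical, and `fetch_json` is defined but not called so that the example runs without network access.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

ENSEMBL_REST = "https://rest.ensembl.org"
MYGENE = "https://mygene.info/v3"


def vep_hgvs_url(hgvs, species="human"):
    """URL for Ensembl VEP consequence prediction of one HGVS-notated variant."""
    return f"{ENSEMBL_REST}/vep/{species}/hgvs/{quote(hgvs, safe=':')}"


def mygene_query_url(symbol, fields=("symbol", "name", "entrezgene")):
    """URL for a MyGene.info gene lookup by official symbol (human only)."""
    params = urlencode(
        {"q": f"symbol:{symbol}", "species": "human", "fields": ",".join(fields)}
    )
    return f"{MYGENE}/query?{params}"


def fetch_json(url, timeout=30):
    """GET a URL and parse the JSON body; requires network access."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


print(vep_hgvs_url("ENST00000366667:c.803C>T"))
print(mygene_query_url("CDK2"))
```

With network access, `fetch_json(mygene_query_url("CDK2"))` would return the parsed response. Both services apply their own usage limits, so batch requests where their APIs allow it.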
If there is a database or data source you need integrated, let us know at contact@purna.ai. We actively prioritize requests from researchers.

Output types

MIP produces structured, versionable outputs from any analysis:
  • HGVS nomenclature, consequence terms, transcript mapping, population frequencies, in silico predictions, and automated ACMG/AMP classifications with per-criterion evidence. See ACMG Classification.
  • Variant interpretation reports, findings summaries, case overviews, and ACMG evidence reports. See Clinical Reports.
  • Color-coded protein sequence viewers with domain annotations, PTMs, active sites, transmembrane regions, and feature tracks. Single protein and comparison modes with sequence alignment.
  • Summarized findings from PubMed searches, contradiction analysis, evidence tables, and citation-backed answers to specific research questions.
  • Multi-agent debate outputs including refined hypotheses, agent consensus, unresolved tensions, minority dissent, and aggregated literature evidence. See Consilium.
  • Publication-quality figures generated from natural language or code: bar, line, scatter, pie, heatmaps, volcano plots, and more.
  • Tables, statistical outputs, and programmatic analysis from Python or R running in a sandboxed compute environment. See Code Execution.
  • Tabular data in CSV or spreadsheet format, downloadable and shareable.

Limitations

  • MIP is a decision-support tool. It does not make autonomous clinical decisions and is not intended to replace the judgment of qualified professionals.
  • Automated ACMG/AMP classifications are rule-based starting points. All classifications should be reviewed by a trained professional before clinical use.
  • Variant annotations reflect the versions of reference databases at the time of processing. Re-annotation may be needed as databases are updated.
  • AI-generated analysis should be verified against primary sources. MIP provides citations to make this straightforward.
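To make "rule-based" concrete, the sketch below implements the evidence-combining rules published in the 2015 ACMG/AMP guideline: counts of met criteria at each strength level map to one of five classes. It illustrates the published combining table only and is not MIP's implementation. The function name `classify` is hypothetical, and strength-modified codes such as `PM2_Supporting` are not handled.

```python
from collections import Counter


def classify(criteria):
    """Combine met ACMG/AMP criteria codes (e.g. 'PVS1', 'PM2') into a class."""
    # Strip the trailing number to get the strength bucket: PVS, PS, PM, PP, BA, BS, BP.
    n = Counter(code.rstrip("0123456789") for code in criteria)
    pvs, ps, pm, pp = n["PVS"], n["PS"], n["PM"], n["PP"]
    ba, bs, bp = n["BA"], n["BS"], n["BP"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm == 1 and pp == 1) or pp >= 2))
        or ps >= 2
        or (ps == 1 and (pm >= 3 or (pm == 2 and pp >= 2) or (pm == 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs == 1 and pm == 1)
        or (ps == 1 and 1 <= pm <= 2)
        or (ps == 1 and pp >= 2)
        or pm >= 3
        or (pm == 2 and pp >= 2)
        or (pm == 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs >= 1 and bp >= 1) or bp >= 2

    path_call = (
        "Pathogenic" if pathogenic else "Likely pathogenic" if likely_pathogenic else None
    )
    benign_call = "Benign" if benign else "Likely benign" if likely_benign else None

    if path_call and benign_call:
        return "Uncertain significance"  # contradictory evidence
    return path_call or benign_call or "Uncertain significance"


print(classify(["PVS1", "PS3"]))  # Pathogenic
print(classify(["PVS1", "PM2"]))  # Likely pathogenic
print(classify(["PM2", "PP3"]))   # Uncertain significance
print(classify(["BP4", "BP7"]))   # Likely benign
```

The rules only combine evidence; deciding whether each criterion is actually met for a given variant is where expert review matters most.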

Security and privacy

  • All data is encrypted in transit (TLS) and at rest.
  • Patient and research data is stored in your organization’s isolated environment and is never shared across accounts.
  • Access to individual cases and projects is controlled through role-based permissions configurable in Team settings.
  • All data access and mutations are logged to an immutable audit trail.
Never include patient-identifying information such as names, medical record numbers, or contact details in case names, notes, or shared conversations. Use anonymized identifiers for all cases.
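One way to produce anonymized case identifiers before data reaches MIP is a keyed hash of your internal record number, as in the minimal sketch below. The function name, the `CASE-` prefix, and the 12-character length are assumptions for illustration, not a MIP convention. The secret key must be generated and stored by your organization outside of MIP, because anyone holding it can re-derive an identifier from a known record number.

```python
import hashlib
import hmac


def pseudonymous_case_id(record_number, secret_key, prefix="CASE"):
    """Derive a stable, non-reversible case label from an internal record number."""
    digest = hmac.new(
        secret_key, record_number.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    # Truncate for readability; 12 hex characters keeps collisions unlikely
    # for typical case volumes.
    return f"{prefix}-{digest[:12].upper()}"


# Placeholder key for the example only; load a real key from a secrets manager.
key = b"replace-with-a-long-random-secret"

case_id = pseudonymous_case_id("MRN-0012345", key)
print(case_id)
```

The same record number and key always give the same identifier, so cases can be re-linked internally without the record number appearing in case names or notes.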