Coordinating and Bioinformatics unit
The Coordinating and Bioinformatics unit is responsible for creating the software and informatics infrastructure for the consortium and for facilitating the efforts of the mouse engineering centers. This page describes the infrastructure built for the consortium as well as software developed for the scientific community.

Lab Personnel

Mike Aufiero
Systems Analyst
Jiaqi Li
PhD Student
Brianna Perez
Curator
Lili Liang
Senior Research Assistant

Biostatisticians/Bioinformatics

Ashok Sharma
Associate Professor
Hongyan Xu
Professor
Tae Lee
Systems Analyst


Infrastructure Information
MMPC IT Infrastructure
Our programming paradigm is to develop software systems based on an n-tier architecture, in which the presentation layer, business logic and data layer are built as separate software systems. These systems are designed to minimize maintenance while providing a robust, scalable model for future growth and for interaction at the national level with other organism databases. The systems were designed using the Unified Modeling Language (UML), and the designs are available to the general public. The two UML modeling tools we use are Rational Rose and PowerDesigner.
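To make the layer separation concrete, here is a minimal Python sketch of the idea; it is not MMPC code, which is written in C#. All class and function names (StrainRepository, StrainService, render_strains) are hypothetical, and an in-memory dictionary stands in for the relational database. Each tier talks only to the tier directly below it, so the storage or the display format can change without touching the business rules.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Strain:
    strain_id: int
    name: str
    background: str


class StrainRepository:
    """Data layer (hypothetical): the only tier that knows how rows are stored.
    An in-memory dict stands in for the relational database."""

    def __init__(self) -> None:
        self._rows: Dict[int, dict] = {}

    def insert(self, strain_id: int, name: str, background: str) -> None:
        self._rows[strain_id] = {"strain_id": strain_id, "name": name,
                                 "background": background}

    def all_rows(self) -> List[dict]:
        return list(self._rows.values())


class StrainService:
    """Business-logic layer (hypothetical): applies rules and maps rows to
    domain objects without knowing about storage details or display."""

    def __init__(self, repo: StrainRepository) -> None:
        self._repo = repo

    def strains_on_background(self, background: str) -> List[Strain]:
        wanted = background.lower()  # case-insensitive match is a business rule
        return [Strain(**row) for row in self._repo.all_rows()
                if row["background"].lower() == wanted]


def render_strains(strains: List[Strain]) -> str:
    """Presentation layer (hypothetical): formats results as tab-separated text."""
    return "\n".join(f"{s.strain_id}\t{s.name}\t{s.background}" for s in strains)
```

For example, after inserting two made-up strains on backgrounds "Background X" and "Background Y", `render_strains(StrainService(repo).strains_on_background("background y"))` returns only the second strain's line.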

MMPC WebServices
MMPC offers a broad array of WebServices that implement methods related to client accounts, the order process and catalog services. Documentation, available as a PDF document, is provided to help the MMPCs understand the nomenclature and usage of these services.



MMPC Data Model
The core relational data model for the MMPC was created using SQL Server 2000 and was based on a number of existing schemas covering our key subject areas: animal models, genotypes (including array experiment data), histopathology and phenotype assays. The Mouse Models of Human Cancer Consortium (MMHCC) and the Jackson Laboratory were particularly helpful and shared several successful models. The DiaComp data model has since been migrated to SQL Server 2005 and modified to include the MMPC (National Mouse Metabolic Phenotyping Centers) data schema. The current version of the database covers several domains, including DiaComp and MMPC administration, models, strains, publications, external database references, experiments, phenotype assays, microarray data, histology, images and dataset persistence. The current data model has 250 tables, 55 functions, 994 stored procedures, 141 data views and a total of 9,344 lines of code.

MMPC Administration Data Model

MMPC Science Data Model

* Note: The links above require Internet Explorer 5.0 or later to view the data model with zoom capability. Make sure to accept the ActiveX warning to start the viewer. The viewer's navigation drop-down box links to the different data schemas; select a schema and click Go next to the list to load it.
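As an illustration of how subject areas such as strains, experiments and phenotype assays relate in a relational model, the sketch below builds a tiny schema with one data view in Python's built-in sqlite3. It is a hypothetical, heavily reduced stand-in, not the MMPC SQL Server schema; every table, column and view name here is an assumption, and SQLite has no stored procedures, so plain queries take their place.

```python
import sqlite3

# Hypothetical, heavily reduced schema: three of the subject areas named above
# (strains, experiments, phenotype assays) plus one data view.
SCHEMA = """
CREATE TABLE strain (
    strain_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE experiment (
    experiment_id INTEGER PRIMARY KEY,
    strain_id     INTEGER NOT NULL REFERENCES strain(strain_id),
    title         TEXT NOT NULL
);
CREATE TABLE phenotype_assay (
    assay_id      INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiment(experiment_id),
    assay_name    TEXT NOT NULL,
    value         REAL
);
CREATE VIEW strain_assay_summary AS
SELECT s.name AS strain, a.assay_name, COUNT(*) AS n, AVG(a.value) AS mean_value
FROM strain AS s
JOIN experiment AS e ON e.strain_id = s.strain_id
JOIN phenotype_assay AS a ON a.experiment_id = e.experiment_id
GROUP BY s.name, a.assay_name;
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def assay_summary(conn: sqlite3.Connection) -> list:
    """Read the view: one row per (strain, assay) with count and mean value."""
    return conn.execute(
        "SELECT strain, assay_name, n, mean_value FROM strain_assay_summary "
        "ORDER BY strain, assay_name").fetchall()
```

The view plays the role the page assigns to data views: it lets callers read a joined summary without knowing how the underlying tables are linked.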

MMPC Object Model
The MMPC Object Model (MMPC-OM) created for the consortium fully describes the activities of the MMPC and provides an object-oriented (OOP) API for accessing the data generated by the consortium. The MMPC-OM was designed using PowerDesigner and UML, written in C# and compiled as a .NET DLL. The object model contains both administrative and domain-specific classes; however, only the data-centric classes are available to the public. The domain classes include object-specific classes (e.g., Model, Strain, Experiment, Protocol) as well as DataManager and SearchCriteria classes used to retrieve data from the system. The DataManager classes are specific to each of the data types maintained by the MMPC; for example, the StrainMgr class provides methods to retrieve strain-specific data. The SearchCriteria classes are also data-type specific and are used by the DataManager classes to query the database with type-specific parameters; for example, the StrainSearchCriteria class provides queryable properties specific to the strain data in the system.
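The DataManager/SearchCriteria pairing can be sketched as follows. This is a Python analogue, not the C# MMPC-OM: the class names StrainMgr and StrainSearchCriteria come from the text above, but their fields (name_like, background), the method names (build_query, get_strains) and the table layout are hypothetical. The criteria object only holds optional filters; the manager turns whichever filters are set into a parameterized SQL query.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Strain:
    strain_id: int
    name: str
    background: str


@dataclass
class StrainSearchCriteria:
    # Hypothetical strain-specific filters; None means "do not filter on this".
    name_like: Optional[str] = None
    background: Optional[str] = None


class StrainMgr:
    """Data manager for Strain objects (Python analogue; method names are hypothetical)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    @staticmethod
    def build_query(c: StrainSearchCriteria) -> Tuple[str, list]:
        # Only the criteria that are set become WHERE clauses; values are
        # passed as parameters rather than spliced into the SQL text.
        clauses, params = [], []
        if c.name_like is not None:
            clauses.append("name LIKE ?")
            params.append(f"%{c.name_like}%")
        if c.background is not None:
            clauses.append("background = ?")
            params.append(c.background)
        sql = "SELECT strain_id, name, background FROM strain"
        if clauses:
            sql += " WHERE " + " AND ".join(clauses)
        return sql + " ORDER BY strain_id", params

    def get_strains(self, c: StrainSearchCriteria) -> List[Strain]:
        sql, params = self.build_query(c)
        return [Strain(*row) for row in self._conn.execute(sql, params)]
```

Keeping the filters in a separate criteria class means new query parameters can be added without changing the manager's public method signature.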

The MMPC object model base was modified to add the MMPC (National Mouse Metabolic Phenotyping Centers) schema. The object model is now common to both consortia and contains classes that serve the DiaComp and MMPC web portals.

To provide the broadest possible access to the data, we are also creating a WebService that exposes specific portions of the MMPC-OM to the public. Specifically, the WebService will provide access to all of the object-specific classes as well as the DataManager and SearchCriteria classes. This gives programmers a mechanism for creating local MMPC-OM objects in other languages. The current version of the MMPC-OM has 185 object classes.
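Creating local objects in another language works because the service sends object state in a language-neutral format that the client maps back onto its own classes. The page does not describe the wire format; the sketch below assumes an XML payload, as is typical of .NET web services, and the element names and the to_xml/from_xml helpers are hypothetical. It shows a Python client rebuilding a Strain object from such a payload.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class Strain:
    strain_id: int
    name: str
    background: str


def to_xml(strain: Strain) -> str:
    """Serialize one object as <Strain><field>value</field>...</Strain> (assumed layout)."""
    root = ET.Element("Strain")
    for f in fields(strain):
        ET.SubElement(root, f.name).text = str(getattr(strain, f.name))
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> Strain:
    """Rebuild a local Strain object from the XML payload, restoring field types."""
    root = ET.fromstring(text)
    return Strain(
        strain_id=int(root.findtext("strain_id")),
        name=root.findtext("name"),
        background=root.findtext("background"),
    )
```

A round trip such as `from_xml(to_xml(s)) == s` is a simple check that client and server agree on the format.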


Software Applications
ParaKMeans
ParaKMeans is a high-performance, parallel-processing implementation of the K-Means clustering algorithm. We designed the software so that it can be deployed on most Windows operating systems. The applications are written for the .NET Framework v1.1 in the C# programming language. The parallelism comes from using a web service to perform the distance calculations and cluster assignments; because of this, at least one computer must have Internet Information Services (IIS 5.0 or later) installed and running. The parallel K-Means algorithm used in this application is based on the work of Ben Zhang, Meichun Hsu and George Forman.
If you make use of the program presented here, please cite the following article:

Kraj P, Sharma A, Garge N, Podolsky R, McIndoe RA: ParaKMeans: Implementation of a parallelized K-means algorithm suitable for general laboratory use. BMC Bioinformatics 2008;9:200.
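The division of labour described for ParaKMeans can be illustrated with one common way to parallelize K-Means: each worker assigns its share of the points to the nearest centroid and returns only per-cluster sums and counts, and a coordinator combines these into new centroids. The Python sketch below is not ParaKMeans code; a thread pool stands in for the web service, and the function names and the strided data split are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partial_stats(chunk: List[Point], centroids: List[Point]):
    """Worker step: assign each point to its nearest centroid and return the
    per-cluster sums and counts (the only data sent back to the coordinator)."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in chunk:
        j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def parallel_kmeans(points: List[Point], centroids: List[Point],
                    n_workers: int = 4, max_iter: int = 100) -> List[Point]:
    chunks = [points[i::n_workers] for i in range(n_workers)]  # strided split
    k, dim = len(centroids), len(centroids[0])
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(max_iter):
            results = list(pool.map(lambda ch: partial_stats(ch, centroids), chunks))
            # Coordinator step: combine the workers' statistics into new centroids.
            new = []
            for j in range(k):
                n = sum(cnt[j] for _, cnt in results)
                if n == 0:
                    new.append(centroids[j])  # keep an empty cluster's centroid
                    continue
                new.append(tuple(sum(s[j][d] for s, _ in results) / n for d in range(dim)))
            if new == centroids:  # assignments stable, so centroids unchanged
                break
            centroids = new
    return centroids
```

Because workers return a k-by-dim table of sums rather than their points, the traffic per iteration does not grow with the size of the dataset.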
HPCluster
Clustering is an unsupervised exploratory technique applied to microarray data to find similar data structures or expression patterns. Because of the high I/O costs and the large distance matrices involved, most clustering algorithms fail on large datasets (30,000+ genes/200+ arrays). We propose a new two-stage algorithm that partitions the high-dimensional space associated with microarray data using hyperplanes. The first stage is based on the BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) algorithm, and the second stage is a conventional k-Means clustering technique. Because the first stage traverses the data in a single scan, performance improves substantially. The data reduction achieved in the first stage lowers the memory requirements, allowing us to cluster 44,460 genes without failure, and significantly decreases the time to completion compared with popular k-Means programs. The algorithm has been implemented in a software tool (HPCluster), written in C# (.NET 1.1), designed to cluster gene expression data.
If you make use of the program presented here, please cite the following article:

Sharma A, Podolsky R, Zhao J, McIndoe RA: A modified hyperplane clustering algorithm allows for efficient and accurate clustering of extremely large datasets. Bioinformatics 2009;25:1152-1157.
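The two-stage idea can be sketched in a simplified form: a single pass summarizes the data into compact clustering features (point count plus linear sum), and k-means then runs on those summaries instead of the raw points. This Python sketch is an assumption-laden illustration, not HPCluster: it uses a flat list with a fixed distance threshold instead of BIRCH's CF tree, omits the squared-sum term, and does not reproduce HPCluster's hyperplane partitioning. The names CF, stage1 and stage2 are hypothetical.

```python
import math
from typing import List, Optional

Vec = List[float]


class CF:
    """Simplified clustering feature: point count and linear sum.
    (BIRCH's CF also stores a squared sum, used for radius tests; omitted here.)"""

    def __init__(self, p: Vec) -> None:
        self.n = 1
        self.ls = list(p)

    def centroid(self) -> Vec:
        return [x / self.n for x in self.ls]

    def add(self, p: Vec) -> None:
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, p)]


def stage1(points: List[Vec], threshold: float) -> List[CF]:
    """Single scan: absorb each point into the nearest CF whose centroid is
    within `threshold`, otherwise start a new CF. A flat list replaces BIRCH's CF tree."""
    cfs: List[CF] = []
    for p in points:
        best: Optional[CF] = min(cfs, key=lambda cf: math.dist(cf.centroid(), p), default=None)
        if best is not None and math.dist(best.centroid(), p) <= threshold:
            best.add(p)
        else:
            cfs.append(CF(p))
    return cfs


def stage2(cfs: List[CF], init: List[Vec], max_iter: int = 100) -> List[Vec]:
    """Weighted k-means on the CF summaries: each CF counts as n points at its centroid."""
    centers = [list(c) for c in init]
    dim = len(centers[0])
    for _ in range(max_iter):
        sums = [[0.0] * dim for _ in centers]
        counts = [0] * len(centers)
        for cf in cfs:
            c = cf.centroid()
            j = min(range(len(centers)), key=lambda k: math.dist(centers[k], c))
            counts[j] += cf.n
            sums[j] = [a + b for a, b in zip(sums[j], cf.ls)]
        new = [[s / counts[j] for s in sums[j]] if counts[j] else centers[j]
               for j in range(len(centers))]
        if new == centers:
            break
        centers = new
    return centers
```

The memory saving comes from stage 2 touching only the CF list, whose length depends on the threshold rather than on the number of genes.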
ParaSAM
Significance Analysis of Microarrays (SAM) is a permutation-based method that determines significance by estimating the false discovery rate (FDR). SAM is freely available as an Excel plug-in and as an R package. However, for large datasets its memory requirements are high and the algorithm fails. To overcome these memory limitations, we have developed a parallelized version of the SAM algorithm called ParaSAM. This high-performance, multithreaded application requires no programming experience to run and is designed to give the general scientific community an easy-to-manage client-server Windows application. The parallelism comes from the use of web services to perform the permutations. The software is written in C# (.NET 1.1) and is designed in a modular fashion to provide flexibility both in deployment and in the user interface. Our results indicate that ParaSAM is not only faster than the serial versions but can also analyze extremely large datasets that cannot be analyzed on a single PC.
If you make use of the program presented here, please cite the following article:

Sharma A, Zhao J, Podolsky R, McIndoe RA: ParaSAM: A parallelized version of the significance analysis of microarrays algorithm. Bioinformatics 2010.
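Because each permutation is independent, the permutation step parallelizes naturally, which is where ParaSAM gets its speed-up. The Python sketch below is a simplified permutation FDR in the spirit of SAM, not ParaSAM itself: it computes a SAM-style relative difference per gene, uses a fixed cutoff on its absolute value and the median permuted count, rather than SAM's Δ-based comparison of observed and expected order statistics, and does not estimate s0 or the proportion of true nulls. A thread pool stands in for the web services; all function names are hypothetical.

```python
import math
import random
import statistics
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def d_stats(data: List[Sequence[float]], labels: Sequence[int], s0: float) -> List[float]:
    """SAM-style relative difference per gene (row): (mean1 - mean0) / (s + s0),
    where s is the pooled standard error and s0 a small fudge factor."""
    g1 = [i for i, lab in enumerate(labels) if lab == 1]
    g0 = [i for i, lab in enumerate(labels) if lab == 0]
    n1, n0 = len(g1), len(g0)
    out = []
    for row in data:
        x1 = [row[i] for i in g1]
        x0 = [row[i] for i in g0]
        m1, m0 = sum(x1) / n1, sum(x0) / n0
        ss = sum((x - m1) ** 2 for x in x1) + sum((x - m0) ** 2 for x in x0)
        s = math.sqrt((1 / n1 + 1 / n0) * ss / (n1 + n0 - 2))
        out.append((m1 - m0) / (s + s0))
    return out


def null_counts(data, labels, s0, cutoff, n_perm, seed) -> List[int]:
    """One worker's share of the permutations: how many genes pass the cutoff
    under each random relabelling of the samples."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_perm):
        perm = list(labels)
        rng.shuffle(perm)
        counts.append(sum(abs(d) >= cutoff for d in d_stats(data, perm, s0)))
    return counts


def sam_fdr(data, labels, cutoff, s0=0.1, n_perm=200, n_workers=4) -> Tuple[int, float]:
    """Return (genes called significant, estimated FDR) for a fixed |d| cutoff."""
    if n_perm < 1:
        raise ValueError("n_perm must be at least 1")
    called = sum(abs(d) >= cutoff for d in d_stats(data, labels, s0))
    # Split the permutations across workers as evenly as possible.
    shares = [n_perm // n_workers + (w < n_perm % n_workers) for w in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(null_counts, data, labels, s0, cutoff, k, seed)
                   for seed, k in enumerate(shares) if k > 0]
        nulls = [c for f in futures for c in f.result()]
    if called == 0:
        return 0, 0.0
    return called, min(1.0, statistics.median(nulls) / called)
```

Each worker needs only the data and a seed, and returns a short list of counts, so the per-worker memory stays bounded regardless of how many permutations are requested.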

2017 National MMPC. All Rights Reserved.