Infrastructure
The Cyber-Infrastructure Development Core Function delivers all aspects of the technical system needed to support the project goals. It works closely with the Scientific Integration Core Function to ensure that the system designed and developed serves real scientific research needs. Its responsibilities include:
- Establishing suitable development tools and processes.
- Re-engineering the core building blocks of SkyServer to make it extensible to new science domains, and to ensure it remains maintainable and sustainable for the next decade.
- Purchasing upgrades to existing hardware, and additional storage, to provide the physical environment needed to support the project's large-scale data objectives.
- Updating all existing subsystems to use and operate with these re-engineered building blocks, providing a new baseline system that operates in a far more integrated and centrally managed fashion, with extended capabilities for exposing services to a wider scientific audience.
- Developing a newly engineered SciDrive “drop-box” interface, integral to the core system, that integrates seamlessly with the CASJobs/MyDB system and can form the basis for later Long-Tail science community support (see the upload sketch after this list).
- Extending the re-engineered infrastructure to provide a framework for “numerical laboratories”: an environment in which a user can access a large science-domain database stored in the SkyServer environment, run a complex, large-scale analysis that may generate many terabytes of intermediate data, and then direct the output of that analysis back to their own user space for subsequent analysis or integration with other data sets. This provides an experimental ‘loop’ of successive, incremental analysis and assessment (see the workflow sketch after this list).
- Developing integrated support for GPU and ‘large-memory’ numerical processing (see the GPU sketch after this list).
- Implementing a series of architectural changes to the SkyServer environment to support significant scalability, meeting current and future needs for processing data sets so large that it is not feasible to move the data from one location to another. Supporting activities include piloting and migrating existing technologies developed within the group, scaling out the physical storage infrastructure and the server clusters, scaling out and parallelizing the database storage architecture, parallelizing the processing architecture, parallelizing data transfer and ingest, and implementing new approaches to data indexing and storage to maximize search and retrieval performance (see the zone-indexing sketch after this list).
- Implementing a framework for system logging that takes the current, highly successful logging infrastructure to the level required to support the integrated environments and the new, diverse data sets and processing requirements (see the structured-logging sketch after this list).
- Investigating and implementing new technologies for the design and development of user interfaces, maximizing the potential of client-side processing to improve the user experience, reduce the load on server processing and network bandwidth, and provide capabilities for advanced data visualization that are not possible with current technologies.
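To make the SciDrive “drop-box” integration concrete, the sketch below shows how a client might push a file into a user's drop-box space and then surface it as a MyDB table. This is a minimal sketch under stated assumptions: the endpoint URLs and both helper functions are hypothetical placeholders, not the actual SciDrive or CASJobs REST APIs.

```python
import requests

SCIDRIVE_URL = "https://example.org/scidrive"  # hypothetical endpoint
CASJOBS_URL = "https://example.org/casjobs"    # hypothetical endpoint

def scidrive_put(token, path, local_file):
    """Upload a local file into the user's drop-box space (hypothetical API)."""
    with open(local_file, "rb") as f:
        resp = requests.put(f"{SCIDRIVE_URL}/files/{path}",
                            headers={"X-Auth-Token": token}, data=f)
    resp.raise_for_status()

def load_into_mydb(token, path, table):
    """Ask the service to expose the uploaded file as a MyDB table (hypothetical API)."""
    resp = requests.post(f"{CASJOBS_URL}/mydb/import",
                         headers={"X-Auth-Token": token},
                         json={"source": path, "table": table})
    resp.raise_for_status()

# Drop a catalogue into the drop-box, then work with it as a table in MyDB.
scidrive_put(token="...", path="catalogs/mysample.csv", local_file="mysample.csv")
load_into_mydb(token="...", path="catalogs/mysample.csv", table="MySample")
```

The design point is that the same file becomes visible in both worlds, as a file in SciDrive and as a relational table in MyDB, without the user staging the data twice.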
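The “numerical laboratories” workflow can be pictured as the skeleton below: query the shared domain database, run a large analysis whose intermediates live on scratch storage, write the reduced result back to the user's own space, assess, refine, and repeat. Every service call here is stubbed with a trivial local stand-in; none of these names are part of a defined API.

```python
# Skeleton of the numerical-laboratory loop. Each service call is stubbed
# with a trivial local stand-in; in the real system these would hit the
# domain database, a batch compute cluster, and the user's MyDB space.

def run_query(query):              # stand-in: stream rows from the domain database
    return list(range(10))

def analyze(rows):                 # stand-in: large-scale analysis; intermediates
    return [r * r for r in rows]   # may occupy many TB of scratch storage

def save_to_mydb(result, table):   # stand-in: write reduced output to user space
    print(f"saved {len(result)} rows to {table}")

def numerical_laboratory(query, max_rounds=3):
    for round_no in range(max_rounds):
        rows = run_query(query)
        result = analyze(rows)
        save_to_mydb(result, table=f"lab_round_{round_no}")
        # assess the result, then refine the query and loop again
        query = f"{query} /* refined after round {round_no} */"
    return result

numerical_laboratory("SELECT ...")
```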
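For GPU support, one plausible pattern is host-to-device offload of array computation, sketched here with CuPy. The workload is a stand-in, and the sketch assumes a CUDA-capable node with CuPy installed.

```python
import numpy as np
import cupy as cp  # assumes a CUDA-capable node with CuPy installed

# Stand-in workload: summary statistics over a large numeric column.
host = np.random.rand(10_000_000).astype(np.float32)

dev = cp.asarray(host)   # copy the host array into GPU memory
mean = cp.mean(dev)      # computed on the device
std = cp.std(dev)

print(float(mean), float(std))  # pull the scalar results back to the host
```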
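On indexing for search performance, one approach in this spirit, familiar from the group's “zones” technique for spatial matching, divides the sky into fixed-height declination zones so that a neighbour search becomes a scan over a handful of zone buckets. The sketch below is a simplified in-memory version; a production form would live in the database as a zone table with a clustered index.

```python
import math
from collections import defaultdict

ZONE_HEIGHT = 0.1  # degrees; chosen near the typical search radius

def zone_id(dec):
    """Map a declination to a fixed-height zone."""
    return int(math.floor((dec + 90.0) / ZONE_HEIGHT))

def build_index(objects):
    """Bucket (ra, dec) pairs by zone so a search touches few buckets."""
    index = defaultdict(list)
    for ra, dec in objects:
        index[zone_id(dec)].append((ra, dec))
    return index

def neighbours(index, ra, dec, radius):
    """Scan only the zones that can contain matches, then filter."""
    cos_dec = max(math.cos(math.radians(dec)), 1e-9)
    hits = []
    for z in range(zone_id(dec - radius), zone_id(dec + radius) + 1):
        for ra2, dec2 in index.get(z, []):
            # Coarse box cut; a production version would finish with an
            # exact great-circle distance test.
            if abs(dec2 - dec) <= radius and abs(ra2 - ra) <= radius / cos_dec:
                hits.append((ra2, dec2))
    return hits
```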
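Finally, the logging framework is easiest to picture as structured, centrally collectable event records rather than free-form text, so that logs from every subsystem and data set stay queryable at scale. The sketch below uses Python's standard logging module; the field names are illustrative, not a defined schema.

```python
import json, logging, time, uuid

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per event so logs stay machine-queryable."""
    def format(self, record):
        return json.dumps({
            "ts": time.time(),
            "event_id": str(uuid.uuid4()),
            "subsystem": record.name,       # e.g. "skyserver.search"
            "level": record.levelname,
            "message": record.getMessage(),
            # illustrative context fields, not a defined schema:
            "dataset": getattr(record, "dataset", None),
            "user": getattr(record, "user", None),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("skyserver.search")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("radial query served", extra={"dataset": "DR10", "user": "anon"})
```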