Research Data in the Cloud
Although I haven’t really dabbled with AWS, I do know anecdotally that it appears to be gaining ground among the library computing community for hosting (meta)data sets and experimental projects. I have also heard rumors of adoption of AWS by my employer.
Having been a frequent reader of Deepak Singh’s business|bytes|genes|molecules blog over the last year, I was interested to see him hired by Amazon as business development manager of Amazon Web Services. On December 4, 2008, Deepak posted to the AWS blog the announcement of Public Data Sets on AWS. PDS on AWS is a data sharing experiment that takes advantage of Amazon’s in-the-cloud storage and computing services.
Just two weeks after Deepak’s post, Clint Boulton at eWeek confirmed that Google had axed its own Research Datasets project along with other projects of questionable value to Google’s bottom line. While sharing data across the web, publicly or not, will surely become more common among researchers, milking copious amounts of ad revenue from that sharing is less likely.
The storage and computation of large datasets appear to be more in line with the AWS business model, and perhaps Amazon has the lead on scalable architecture to support cloud computing. Even if large numbers of researchers and research projects store and crunch their data on the web, that in itself won’t score big in the social web scene. Programmers, analysts, and machines are more likely to be interfacing directly with the data than are the research investigators themselves.
It’s yet to be seen what Microsoft’s strategy for data storage might be in the recently released Azure platform, but they obviously have eyes on the educational and research markets. Products like live@edu and SharePoint are increasing Microsoft’s reach into the academic computing world.
The Microsoft Research group quietly released a beta version of its own repository software, running on .NET and SQL Server of course, but this isn’t just a reformulation of DSpace using Microsoft ingredients:
“The platform focuses on the management of research assets-such as people, papers, lectures, workflows, data, and tags-as well as the semantic relationships between them.”
Sounds like they’re paying attention. And they’re beginning to appeal in ecumenical fashion to the larger research community by offering things like OfficeSWORD and taking part in discussions about open research repositories.
What is most interesting from the perspectives of the library and the university’s research office is how these services will redefine our notion of the “institutional repository”. On the one hand, many IT services such as web hosting and email have been commoditized to the point that institutions, especially smaller publicly funded campuses, are unable to resist the cost savings and agility that come from hosted services like live@edu. Why not commit fully to the .NET architecture and have your institutional repository software and data hosted on Azure as well?
On the other hand, why not take advantage of AWS’ flexibility and scalability for storing data or running our repository application?
Regardless of the platform(s) we choose, our notion of “institutional repository” is going to be stretched as we want to aggregate data and services from multiple platforms. How will our DSpace service reflect our data stored in AWS? The building blocks are already in place to support more complex relationships between our repositories, services, and data. The time has finally come to put them to work.