Research Data in the Cloud
Although I haven’t really dabbled with AWS, I know anecdotally that it is gaining ground in the library computing community for hosting (meta)data sets and experimental projects. I have also heard rumors that my employer is adopting AWS.
Having been a frequent reader of Deepak Singh’s business|bytes|genes|molecules blog over the last year, I was interested to see him hired by Amazon as business development manager of Amazon Web Services. On December 4, 2008, Deepak announced Public Data Sets on AWS on the AWS blog. PDS on AWS is a data sharing experiment that takes advantage of Amazon’s in-the-cloud storage and computing services.
Just two weeks after Deepak’s post, Clint Boulton at eWeek confirmed that Google had axed its own Research Datasets project along with other projects of questionable value to Google’s bottom line. While sharing data across the web, publicly or not, will surely become more common among researchers, milking copious amounts of ad revenue from that sharing is less likely.
The storage and computation of large datasets appear to be more in line with the AWS business model, and perhaps Amazon has the lead on scalable architecture to support cloud computing. Even if large numbers of researchers and research projects store and crunch their data on the web, that in itself won’t score big in the social web scene. Programmers, analysts, and machines are more likely to be interfacing directly with the data than are the research investigators themselves.
It’s yet to be seen what Microsoft’s strategy for data storage might be in the recently released Azure platform, but they obviously have eyes on the educational and research markets. Products like live@edu and SharePoint are increasing Microsoft’s reach into the academic computing world.
The Microsoft Research group quietly released a beta version of its own repository software, running on .NET and SQL Server of course, but this isn’t just a reformulation of DSpace using Microsoft ingredients:
“The platform focuses on the management of research assets-such as people, papers, lectures, workflows, data, and tags-as well as the semantic relationships between them.”
Sounds like they’re paying attention. And they’re beginning to appeal in ecumenical fashion to the larger research community by offering things like OfficeSWORD and taking part in discussions about open research repositories.
What is most interesting from the perspectives of the library and the university’s research office is how these services will redefine our notion of the “institutional repository”. On one hand, many IT services such as web hosting and email have been commoditized to the point that institutions, especially smaller publicly funded campuses, are unable to resist the cost savings and agility that come from hosted services like live@edu. Why not commit fully to the .NET architecture and have your institutional repository software and data hosted on Azure as well?
On the other hand, why not take advantage of AWS’ flexibility and scalability for storing data or running our repository application?
Regardless of the platform(s) we choose, our notion of “institutional repository” is going to be stretched as we aggregate data and services from multiple platforms. How will our DSpace service reflect our data stored in AWS? The building blocks are already in place to support more complex relationships between our repositories, services, and data. The time has finally come to put them to work.