Posts filed under 'libraries'
MuseGlobal and Adhere Solutions recently announced a federated search extender, the All Access Connector, for the Google Search Appliance and Google Mini. Sol at Federated Search Blog raises some good questions about how relevancy is calculated for search results. One point is that Google’s PageRank probably won’t fare well in the enterprise. He says it this way in a previous post:
…the popular search engines perform full text searches of unstructured text but enterprise content is much more structured than content in the Internet at large, it often contains fielded data in databases, and it is often hierarchically organized. Federated search vendors that want to sell into the enterprise need to consider this important difference.
True. However, Google isn’t new to enterprise search and they’re quick to point out that the algorithms they use for web content aren’t the same as for the GSA. Nevertheless, I am curious to know if it’s Google or MuseGlobal doing the relevancy math.
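To make the "relevancy math" question concrete, here is a rough sketch of one generic way a federated layer could merge results: rescale each source's scores to a shared range, then sort the combined list. I have no idea whether the GSA or the All Access Connector does anything like this; the source names, scores, and the `merge_results` helper are all hypothetical.

```python
from typing import Dict, List, Tuple

Result = Tuple[str, float]  # (title, raw score reported by the source)


def normalize(results: List[Result]) -> List[Result]:
    """Min-max rescale one source's raw scores into the range 0..1."""
    if not results:
        return []
    scores = [score for _, score in results]
    low, high = min(scores), max(scores)
    if high == low:
        # Every hit scored the same, so treat them all as equally relevant.
        return [(title, 1.0) for title, _ in results]
    return [(title, (score - low) / (high - low)) for title, score in results]


def merge_results(by_source: Dict[str, List[Result]]) -> List[Tuple[str, str, float]]:
    """Combine per-source result lists into one list ranked by normalized score."""
    merged = []
    for source, results in by_source.items():
        for title, score in normalize(results):
            merged.append((source, title, score))
    # Highest normalized score first; ties broken by title for stable output.
    return sorted(merged, key=lambda row: (-row[2], row[1]))
```

The weakness is easy to see: a source with one mediocre hit still gets a top score after rescaling, which is exactly why it matters who is doing the math.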
Sol also makes an interesting prediction about the impact the product will have on the market:
For better or worse, I think this offering will get many potential customers to view federated search as a commodity. Thus, it will force the high-end federated search vendors to work even harder than they do now to differentiate themselves from their low-end competitors. I can see it now: prospective customers will start using Google as a reference for product comparisons and will expect vendors to provide cheap and simple solutions.
My information, including an article at Information Today, says the AAC will run, in most cases, $50,000 or more over two years. That’s in addition to the cost of the Google appliance. I’m not sure which competitors or price tags Sol considers low-end in the federated search space. I wouldn’t consider this low-end. In my experience, such a price point might actually hit a sweet spot where only a couple of vendors exist now, especially for organizations that have already invested in Google search.
May 23rd, 2008
Lots of great Open Source and Community Source work was showcased at JA-SIG this week. Here’s a list, in no particular order, of the most interesting, most relevant projects for me:
collections management and online access application for museums, archives and digital collections.
software for writing and reading rich media documents in a networked environment.
rich media analytics for humanists and artists.
DSpace repository using the Manakin XMLUI. A comprehensive digital library of public policy research.
guidelines for the interaction of tools with learning/course management systems. This is really about decoupling functionality from any single LMS. It would create a more pluggable model, enabling faculty or students to be application producers and Learning Management Systems and other applications to be consumers.
collaborative project for developing and distributing a library of sharable customizable user interfaces designed to improve the user experience of web applications. Fluid is not only developing component libraries, but is also churning out research, education, and outreach about how to design user experiences.
discover who at Cornell is working on a particular research topic; what they’ve taught or published recently; where facilities might be and what online tools are available to expedite research. Powered by RDF and Semantic Web technologies.
April 30th, 2008
Mark Diggory
Look & Feel
Branding
- Repository
- Communities
- Collections
- Items
Visualization
- Interpret metadata
- Link metadata
- can serialize metadata to JSON
Share
Tiers
- Style Tier
- Simple themes
- XHTML + CSS
- Theme Tier
- Complex themes
- XSL + XHTML + CSS
- Aspect Tier
- Introducing new content into pipeline
- Introducing new functionality
- Cocoon + Java
Resources
Documentation
- DSpace manual
- Theme writing tutorial
- Mailing Lists
Cocoon
- DSpace will use Spring-based Cocoon in future
- Understand the Cocoon Pipeline. Manakin imposes another model on top of Cocoon (themes, styles, aspects)
- DRI Schema - Abstract representation of a repository page
- Metadata elements
- Structural elements
- defines logical structure for rendering content
Aspects:
- Applied to all pages (even if they don’t add anything to page)
- DRI abstracts away characteristics to be rendered later in HTML (“highlighting” for bold, italics, etc.)
- DRI -> XHTML default template in Manakin (base XSL library). Custom XSL overrides templates in base.
- Aspects apply transforms to the DRI
- Base XSL library:
- Package
- Structural display
- Metadata handlers - generally broken up into Lists and Views
- SummaryList
- SummaryView
- DetailedList
- DetailedView
- Have access to all the Request Objects and methods throughout the Aspect chain.
- Themes should ideally be packaged up as webapp overlays
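A minimal sketch of the aspect-chain idea from the notes above, written in Python rather than the Cocoon + Java that Manakin actually uses. The element names are a simplified stand-in for a DRI document, and the aspect functions and `run_aspect_chain` are hypothetical names for illustration only.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

Aspect = Callable[[ET.Element], ET.Element]


def navigation_aspect(doc: ET.Element) -> ET.Element:
    """Add a navigation entry to the page's options section."""
    options = doc.find("options")
    item = ET.SubElement(options, "item")
    item.text = "Browse"
    return doc


def passthrough_aspect(doc: ET.Element) -> ET.Element:
    """Still runs for this page, but adds nothing to it."""
    return doc


def run_aspect_chain(doc: ET.Element, aspects: List[Aspect]) -> ET.Element:
    """Apply every aspect in order; each one receives the previous one's output."""
    for aspect in aspects:
        doc = aspect(doc)
    return doc


# A bare page passes through both aspects before any theme would render it.
page = ET.fromstring("<document><meta/><body/><options/></document>")
result = run_aspect_chain(page, [passthrough_aspect, navigation_aspect])
```

The point of the sketch is only the shape: every aspect sees every page, and the abstract document is transformed step by step before a theme turns it into XHTML.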
April 30th, 2008
Mark Diggory, MIT
Upgrading Version 1.4.2 to Version 1.5.x
Pre-session chat/gripes about inadequacy/orphan status of stats module.
1.5.1 coming out soon
Code Reorg
- Build downloads a dependency
- pom.xml represents the modular, distributed dependency model. For parents and dependencies, if artifacts aren’t found on the local server, Maven will look for them in the central repository (cloud) and pull them down for the build process.
- Use distribution package (not source). Only reason for using source package is to make significant modifications to build process or Java Virtual Machine requirements. Instead customizations should be done against the
- Each module is a Maven Project. Can provide “overlays” for modules. Modify code in “target\”? Target files are what get built to WAR
Configuration
- New Configurability
- Stackable Authentication
- Configurable Browse
- Configurable Submission
- Separate New Module Configurations
- Maintain configuration files in CVS
- for upgrade, use CVS to compare local config file to original 1.4.1 file, then copy those properties over to appropriate place in 1.5 (contrary to original 1.5 documentation)
- Stackable authentication changes
- in config, org.dspace.eperson is changed to org.dspace.authenticate
- Configurable Browse
- Database schema changes (new/dropped tables and columns) - more intelligent about how it manages the datastore in the db
- Consider contributing to DSpace documentation
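The compare-then-copy step for configuration files can be sketched roughly as follows. This is not a DSpace tool, just a hedged Python illustration: `parse_properties` and `local_changes` are hypothetical helpers, and the parser assumes plain `key = value` lines with no backslash line continuations.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key = value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def local_changes(original: str, local: str) -> Dict[str, str]:
    """Return local properties that differ from, or are missing in, the original."""
    base = parse_properties(original)
    mine = parse_properties(local)
    return {key: value for key, value in mine.items() if base.get(key) != value}
```

Feeding it the stock config file and the local one yields just the properties that were customized, which is the list to carry over by hand into the new version's config.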
Planning
- Backup everything often
- database
- (sql db dump) /usr/bin/pg_dump --create --oids -U postgres -f backup.sql dspace
- customizations, configuration, app directory
- ${assetstore.dir}…${assetstore.dir(N)}
- more…
- assetstore
- disaster recovery
- Track customizations
- MIT created package import support for OpenCourseware content packages.
- Map migration path
- Ask questions!
- Practice a lot
- MIT does upgrade repeatedly to ensure everything works before going to production
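For repeated practice runs, the database dump from the notes above could be wrapped in a small script. A minimal sketch, assuming pg_dump lives at the path given in the notes; `build_pg_dump_command` and `run_backup` are hypothetical helper names, and the defaults simply mirror the command as noted.

```python
import subprocess
from typing import List


def build_pg_dump_command(database: str, user: str, outfile: str) -> List[str]:
    """Build the argument list for the pg_dump call in the notes above."""
    return [
        "/usr/bin/pg_dump",
        "--create",  # include a CREATE DATABASE statement in the dump
        "--oids",    # as in the notes; newer PostgreSQL releases dropped this option
        "-U", user,
        "-f", outfile,
        database,
    ]


def run_backup(database: str = "dspace", user: str = "postgres",
               outfile: str = "backup.sql") -> None:
    """Run the dump and raise CalledProcessError if pg_dump exits non-zero."""
    subprocess.run(build_pg_dump_command(database, user, outfile), check=True)
```

Passing the arguments as a list avoids shell quoting problems, and `check=True` means a failed dump stops the script instead of silently leaving no backup.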
Upgrade:
- Building w/ Maven
- Installing w/ Ant
- Upgrading Database
- Rebuilding Search/Browse
Development
- Eclipse setups available on http://wiki.dspace.org
- Maven plugins for Eclipse
- Process (Mark demos upgrade)
- drop in customized JSPs from 1.4 to dspace1.5/dspace/modules/jspui/src/main/webapp/layout
- add in config changes from 1.4 one at a time
- terminal: navigate to dspace/ and use Maven to build
- build.xml works differently, [ant update] now updates more directories. Can add entries to backup all directories (config.bak, bin.bak, lib.bak, webapps.bak directories) before it builds new ones
- install with Ant
- can configure Tomcat to point to WARs in webapps/ instead of copying files over to Tomcat
- update database using postgres/bin/psql
- Events system logs events like editing, addition of bitstreams
- Tim Donohue has tutorial for Configurable Submission system
- 1.5 branch on SVN repository is probably a better bet for getting bug fixes, build process fixes, etc. than the release on the web site, i.e., most 1.5.1 changes are already in the 1.5 branch
- SWORD, LNI can be used to ingest packages from FTP “drop-box” via remote client. Enables remote or batch import without having direct access to the server.
April 30th, 2008
I’m at JA-SIG, St. Paul. It’s winding down today with some sessions, a BarCamp and a uCamp. I’m looking forward to the uCamp. Overall, it has been a good conference, probably not as relevant for me personally as the Open Repositories Conference, but still very useful. And it’s inspiring to see these different projects and developer groups talking to each other and learning from each other.
I’ve had the privilege of hanging out with Mark Diggory a bit as well as other DSpace cohorts and some of the Fedora guys. The camaraderie between the Fedora and DSpace folks is encouraging. It’s a relief to know that I’m not the only one who admires Fedora’s content model and wonders why DSpace should try to reinvent that with its “2.0” vision versus adopting Fedora as a storage and web services layer and benefiting from a shared developer base. As one of the Fedora stakeholders put it, we could really turn the heat up on Microsoft by taking advantage of the best of both platforms.
Community Source and Open Source software development is thriving in the academic space. Collaborate or die!
I’ll be posting my notes from JA-SIG 2008 over the next couple of days. They’ll be raw, probably incoherent and fraught with errors, but there you are.
April 30th, 2008
I just came across Cleveland Public Library’s site featured on drupalib. They’ve done some very nice design work. Their use of “Premium” as a paradigm for describing research databases is both catchy and sensible.
April 9th, 2008
I wish this was around when I was working with a Millennium system. Of course, it still would have been hard to use since we were in a Microsoft-only shop. I wonder if it’s adaptable to Voyager?
January 30th, 2008
Recently, a Medlib-er asked for examples of how medical librarians were using Microsoft Sharepoint. The majority of respondents said they had created sites or pages for their library in Sharepoint, duplicating the usual stuff found on library web sites: ILL forms, links to the public catalog, and other sites - essentially reconstructing the library’s public web site in the Intranet, or even just linking to it.
I don’t mean to disparage the efforts of my cohort. Hospital and corporate librarians tend to be lone rangers with little time, resources, and permission to push the envelope. At least they did something. I’m convinced, though, that we can do better than that.
At the academic medical campus where I work, we’ve had a (non-Sharepoint) staff and student portal for some time. The library has worked closely with developers to incorporate some library services into the portal. From my brief experience, though, University staff only pay attention to the portal every two weeks when it’s time to print their timesheets. Students visit maybe a little more frequently to check their campus accounts. Ultimately, though, there’s no reason for anyone to visit the portal in order to get work done.
Sharepoint, as collaboration space, I hope will be different. My goal is to insert library services into the flow of work and study. Not in a “hey, look at us” or “eat your spinach” kind of way, but invisibly and naturally. I’ve spent a little time envisioning how we might accomplish that. I hope to spend a lot more time over the next year.
Here are my early thoughts:
Identify the stages and flow of research, work, and study on campus that might take place in Sharepoint.
Find areas where there’s been an observable, neglected need and suggest how the library might help, e.g., metadata, text analysis, categorization, training.
Build small, modular web parts, connectors, and widgets that faculty, staff, and students can include in their own spaces.
Don’t make people come to the Library’s Sharepoint site to do something.
Don’t waste time recreating the Library’s web site in Sharepoint.
Don’t just link to the web site.
Share openly.
I got some serendipitous affirmation and inspiration today while following up on a medical student’s request. Upon entering med school, our medical students receive digital versions of recommended textbooks. This student wanted to know, reasonably enough, if there was an add-on for incorporating Stedman’s Medical Dictionary (which he already owned in digital copy) into Microsoft Word or, even better, OneNote - a popular tablet pc notetaking application among our students [1].
While searching for available options, I ran across a presentation by Carl Nolan, head of the medical research services project involving Microsoft and NHS. Here’s an excerpt from an article by Microsoft:
Microsoft has invested £40 million in the Common User Interface programme - a series of projects to help the NHS get the most out of its IT investment. One of these projects has been looking for ways to build medical research services into the software that NHS staff already use every day.
These are exactly the kinds of services I would like to see us implement at KUMC. I hope they’re sharing.
Note: What I ultimately found was that for $100 you can buy the Stedman’s Medical Spellchecker which adds a custom dictionary to MS Office apps. But that’s only spellchecking. What if I want to look up the definition of a new term? Ideally, I’d want the spellchecking dictionary feature wrapped into a single service-package with the full dictionary available in the Research Services Task Pane. Instead, both Microsoft and LWW seem to make that impossible.
January 30th, 2008
I was catching up on Lorcan Dempsey’s thoughts before attending his talks at KU today and tomorrow. Lorcan, thanks for pointing me to John Wilkins’ blog - really more essay than blog. While reading John’s salient and well-structured thoughts on metasearch and library systems, I find myself nodding and thinking “exactly”. “We must not try to do what the network can do for us”. Read on….
December 5th, 2007
I’m working on a presentation one of my colleagues is giving next week. I had the idea to insert a well-known piece of TV history. Thanks to sites like YouTube, Google Video, and MySpace, the video I wanted was easily found. Unfortunately, we’re using PowerPoint, which presented its own set of challenges, but that’s another story. Ideally, I wanted to cue the video to start at a specific spot in my clip. Sure, the whole clip is in itself entertaining, but I’m trying more and more to keep presentations direct, to the point, and tasteful.
After some searching, I discovered that Google Video now supports timecode linking. Google Video uses the Flash Player and Adobe’s marketing name for timecode links is Cue Points. Well, it turns out that this method works nicely if I want to show the video surrounded by the usual Google search box, links, and video information. I didn’t. I wanted to embed a video object that would preserve my cueing either in a web page or directly within my PowerPoint.
Finally, I discovered Angsuman Chakraborty’s post on how to achieve just what I wanted using embedded Google Video. Google makes the embedding easy, but not so much the cueing. It took me a good hour of searching and URL hacking before I realized that it would have to be done in the FlashVars property of the Flash player if at all. That turned out to be the key not only to solving my original problem, but also to creating a successful search in Google.
I don’t frequently bemoan the weaknesses of brute-force search, as it’s been referred to in the digital library community. Frankly, I don’t experience those weaknesses much. However, this was certainly one of those times.
January 18th, 2007