Federated Search for Google Search Appliance

MuseGlobal and Adhere Solutions recently announced a federated search extender, the All Access Connector, for the Google Search Appliance and Google Mini. Sol at Federated Search Blog raises some good questions about how relevancy is calculated for search results. One point is that Google’s PageRank probably won’t fare well in the enterprise. He says it this way in a previous post:

…the popular search engines perform full text searches of unstructured text but enterprise content is much more structured than content in the Internet at large, it often contains fielded data in databases, and it is often hierarchically organized. Federated search vendors that want to sell into the enterprise need to consider this important difference.

True. However, Google isn’t new to enterprise search and they’re quick to point out that the algorithms they use for web content aren’t the same as for the GSA. Nevertheless, I am curious to know if it’s Google or MuseGlobal doing the relevancy math.

Sol also makes an interesting prediction about the impact the product will have on the market:

For better or worse, I think this offering will get many potential customers to view federated search as a commodity. Thus, it will force the high-end federated search vendors to work even harder than they do now to differentiate themselves from their low-end competitors. I can see it now: prospective customers will start using Google as a reference for product comparisons and will expect vendors to provide cheap and simple solutions.

My information, including an article at Information Today, says the AAC will run, in most cases, upwards of $50,000 over two years. That’s in addition to the cost of the Google appliance. I’m not sure which competitors or price tags Sol considers low-end in the federated search space. I wouldn’t consider this low-end. In my experience, such a price point might actually hit a sweet spot where only a couple of vendors exist now, especially for organizations that have already invested in Google search.

May 23rd, 2008

Best of JA-SIG 2008

Lots of great Open Source and Community Source work was showcased at JA-SIG this week. Here’s a list, in no particular order, of the most interesting, most relevant projects for me:

OpenCollection

collections management and online access application for museums, archives and digital collections.

Sophie

software for writing and reading rich media documents in a networked environment.

SEASR/NEMA

rich media analytics for humanists and artists.

Policy Archive, policyarchive.org

DSpace repository using the Manakin XMLUI. A comprehensive digital library of public policy research.

IMS Learning Tools Interoperability (LTI) v2.0

guidelines for the interaction of tools with learning/course management systems. This is really about decoupling functionality from any single LMS. It would create a more pluggable model, enabling faculty or students to be application producers and Learning Management Systems and other applications to be consumers.

Fluid Project

collaborative project for developing and distributing a library of sharable customizable user interfaces designed to improve the user experience of web applications. Fluid is not only developing component libraries, but is also churning out research, education, and outreach about how to design user experiences.

VIVO at Cornell

discover who at Cornell is working on a particular research topic; what they’ve taught or published recently; where facilities might be and what online tools are available to expedite research. Powered by RDF and Semantic Web technologies.

April 30th, 2008

Manakin XML UI for DSpace 1.5

Mark Diggory

Look & Feel

  • CSS, HTML layout

Branding

  • Repository
  • Communities
  • Collections
  • Items

Visualization

  • Interpret metadata
  • Link metadata
    • can serialize metadata to JSON
  • Explain metadata

Share

Tiers

  1. Style Tier
    1. Simple themes
    2. XHTML + CSS
  2. Theme Tier
    1. Complex themes
    2. XSL + XHTML + CSS
  3. Aspect Tier
    • Add new features
      • Introducing new content into pipeline
      • Introducing new functionality
    • Cocoon + Java

Resources

Documentation

  • DSpace manual
  • Theme writing tutorial
  • Mailing Lists

Cocoon

  • DSpace will use Spring-based Cocoon in future
  • Understand the Cocoon Pipeline. Manakin imposes another model on top of Cocoon (themes, styles, aspects)
  • DRI Schema - Abstract representation of a repository page
  • Metadata elements
    • References out to METS
  • Structural elements
    • TEI (Lite)
  • defines logical structure for rendering content

Aspects:

  • Applied to all pages (even if they don’t add anything to page)
  • DRI abstracts away characteristics to be rendered later in HTML (“highlighting” for bold, italics, etc.)
    • DRI -> XHTML default template in Manakin (base XSL library). Custom XSL overrides templates in base.
  • Aspects apply transforms to the DRI
    • Base XSL library:
      • Package
      • Structural display
      • Metadata handlers - generally broken up into Lists and Views
        • SummaryList
        • SummaryView
        • DetailedList
        • DetailedView
    • Have access to all the Request Objects and methods throughout the Aspect chain.
  • Themes should ideally be packaged up as webapp overlays

April 30th, 2008

Upgrading DSpace

Mark Diggory, MIT

Upgrading Version 1.4.2 to Version 1.5.x

Pre-session chat/gripes about inadequacy/orphan status of stats module.

1.5.1 coming out soon

Code Reorg

  • Separates Java code into functional units (API, OAI, JSP-UI)
  • Reorgs resources by web-app service (OAI, JSP-UI)
  • Adds new web-app services (SWORD, LNI, XML-UI)
  • Allows for better customization (Overlays)
  • These services are all committer supported going forward
  • \src is broken up into each service. dspace\src does NOT contain primary source code
  • Distribution does NOT contain any JARs. This is where Maven comes in. “Convention Driven”. Enforces src code conventions. Dependency resolution (think CPAN for Perl). Build Modularization.
    • Disadvantages of Maven: learning curve, distributed configuration, requires network (Internet) access, larger project than Ant (many sub-projects and plugins)

DSpace Release and Build Process

  • Build downloads a dependency
  • pom.xml represents the modular, distributed dependency model. For parents and dependencies, if artifacts aren’t found in the local repository, Maven will look for them in the central repository (cloud) and pull them down for the build process.
  • Use distribution package (not source). Only reason for using source package is to make significant modifications to build process or Java Virtual Machine requirements. Instead customizations should be done against the
  • Each module is a Maven Project. Can provide “overlays” for modules. Modify code in “target\”? Target files are what get built to WAR

Configuration

  • New Configurability
    • Stackable Authentication
    • Configurable Browse
    • Configurable Submission
    • Separate New Module Configurations
  • Maintain configuration files in CVS
    • for upgrade, use CVS to compare local config file to original 1.4.1 file, then copy those properties over to appropriate place in 1.5 (contrary to original 1.5 documentation)
  • Stackable authentication changes
    • in config, org.dspace.eperson is changed to org.dspace.authenticate
  • Configurable Browse
  • Database schema changes (new/dropped tables and columns) - more intelligent about how it manages the datastore in the db
  • Consider contributing to DSpace documentation
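
The config-comparison step above can be sketched in a few lines of Python. This is only an illustration, not part of the session: the function names are made up, the sample property values are invented, and the parser assumes simple one-line key = value entries (no backslash continuations). It lists the properties a local file changed relative to the stock file, and flags any that still mention the old org.dspace.eperson package so they can be reviewed by hand.

```python
OLD_PACKAGE = "org.dspace.eperson"  # package name the notes say moved in 1.5


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def local_changes(stock_text, local_text):
    """Return local properties that are new or differ from the stock file."""
    stock = parse_properties(stock_text)
    local = parse_properties(local_text)
    return {k: v for k, v in local.items() if stock.get(k) != v}


def needs_review(changes):
    """Return the changed keys that still reference the old package name."""
    return sorted(k for k, v in changes.items()
                  if OLD_PACKAGE in k or OLD_PACKAGE in v)


# Invented sample contents standing in for the stock and local config files.
stock_cfg = """
dspace.url = http://localhost:8080/dspace
handle.prefix = 123456789
"""
local_cfg = """
dspace.url = http://repo.example.edu
handle.prefix = 123456789
plugin.sequence.org.dspace.eperson.AuthenticationMethod = org.dspace.eperson.PasswordAuthentication
"""

changes = local_changes(stock_cfg, local_cfg)
for key in sorted(changes):
    print(key, "=", changes[key])
print("review:", needs_review(changes))
```

The script deliberately stops at reporting. Copying each property into the right place in the 1.5 config is still a manual step, as the notes describe.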

Planning

  • Backup everything often
    • database
      • (sql db dump) /usr/bin/pg_dump --create --oids -U postgres -f backup.sql dspace
    • customizations, configuration, app directory
      • ${assetstore.dir}…${assetstore.dir(N)}
      • more…
    • assetstore
    • disaster recovery
  • Track customizations
    • MIT created package import support for OpenCourseware content packages.
  • Map migration path
  • Ask questions!
  • Practice a lot
    • MIT does upgrade repeatedly to ensure everything works before going to production
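
For the "backup everything often" point, here is a small Python sketch that builds the database dump command from the notes with a dated output file, so repeated practice runs don't overwrite each other. The function name and the file-naming scheme are my own assumptions; the pg_dump path, flags, user, and database name are the ones in the notes. It only builds the argument list and does not run anything.

```python
from datetime import date


def pg_dump_command(db_name="dspace", db_user="postgres", day=None,
                    pg_dump="/usr/bin/pg_dump"):
    """Build the pg_dump argument list with a dated output file name."""
    day = day or date.today()
    # File-naming scheme is an assumption made for this sketch.
    outfile = f"backup-{db_name}-{day.isoformat()}.sql"
    # Flags are copied from the notes; --oids was dropped in PostgreSQL 12,
    # so remove it when running against a newer server.
    return [pg_dump, "--create", "--oids", "-U", db_user, "-f", outfile, db_name]


cmd = pg_dump_command(day=date(2008, 4, 30))
print(" ".join(cmd))
```

To actually take the dump, the list can be handed to subprocess.run(cmd, check=True) on the database host.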

Upgrade:

  • Building w/ Maven
  • Installing w/ Ant
  • Upgrading Database
  • Rebuilding Search/Browse

Development

  • Eclipse setups available on http://wiki.dspace.org
  • Maven plugins for Eclipse
  • Process (Mark demos upgrade)
    • drop in customized JSPs from 1.4 to dspace1.5/dspace/modules/jspui/src/main/webapp/layout
    • add in config changes from 1.4 one at a time
    • terminal: navigate to dspace/ and use Maven to build
    • build.xml works differently, [ant update] now updates more directories. Can add entries to backup all directories (config.bak, bin.bak, lib.bak, webapps.bak directories) before it builds new ones
    • install with Ant
    • can configure Tomcat to point to WARs in webapps/ instead of copying files over to Tomcat
    • update database using postgres/bin/psql
  • Events system logs events like editing, addition of bitstreams
  • Tim Donohue has tutorial for Configurable Submission system
  • 1.5 branch on SVN repository is probably a better bet for getting bug fixes, build process fixes, etc. than the release on the web site, i.e., most 1.5.1 changes are already in the 1.5 branch
  • SWORD, LNI can be used to ingest packages from FTP “drop-box” via remote client. Enables remote or batch import without having direct access to the server.

April 30th, 2008

JA-SIG 2008

I’m at JA-SIG, St. Paul.  It’s winding down today with some sessions, a BarCamp and a uCamp.  I’m looking forward to the uCamp.  Overall, it has been a good conference, probably not as relevant for me personally as the Open Repositories Conference, but still very useful.  And it’s inspiring to see these different projects and developer groups talking to each other and learning from each other.

I’ve had the privilege of hanging out with Mark Diggory a bit as well as other DSpace cohorts and some of the Fedora guys.  The camaraderie between the Fedora and DSpace folks is encouraging. It’s a relief to know that I’m not the only one who admires Fedora’s content model and wonders why DSpace should try to reinvent that with its “2.0” vision versus adopting Fedora as a storage and web services layer and benefiting from a shared developer base.  As one of the Fedora stakeholders put it, we could really turn the heat up on Microsoft by taking advantage of the best of both platforms.

Community Source and Open Source software development is thriving in the academic space.  Collaborate or die!

I’ll be posting my notes from JA-SIG 2008 over the next couple of days.  They’ll be raw, probably incoherent and fraught with errors, but there you are.

April 30th, 2008

kudos: Cleveland Public Library site

I just came across Cleveland Public Library’s site featured on drupalib.  They’ve done some very nice design work.  Their use of “Premium” as a paradigm for describing research databases is both catchy and sensible.

April 9th, 2008

Tip: Add .Mac iCal to Google Calendar

If Google Calendar is your primary calendaring spot but you also need to keep track of events in a .Mac group, you can subscribe GCal to a .Mac calendar like so:
1. Visit your .Mac group calendar
2. In the left-hand column there’s a “Subscribe” link and a “Download” link.  Copy the URL for the “Download” link.
3. Go to your Google calendar
4. In the left-hand column that shows your list of calendars, click the little drop-down arrow next to “Add” and select “Add by URL”
5. Paste the URL in the Public Calendar field and save.  You’re now subscribed to the .ics output of your Mac group calendar.

I found lots of posts out there for doing the reverse.  It took me a few trials and errors to find the right URL to use, so hopefully I’ve saved you some trouble.  Google Calendar doesn’t seem to like the webcal:// protocol that Apple prefers, so you have to use the http:// version.
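
If you only have the webcal:// form of the link, the swap to http:// is easy to script. Here is a minimal Python sketch; the function name and the example URL are made up for illustration, and it assumes the same calendar is served over plain http at the same host and path.

```python
from urllib.parse import urlsplit, urlunsplit


def to_http(url):
    """Rewrite a webcal:// URL to http://; leave any other URL untouched."""
    parts = urlsplit(url)
    if parts.scheme.lower() == "webcal":
        parts = parts._replace(scheme="http")
    return urlunsplit(parts)


# Hypothetical calendar address, not a real .Mac URL.
print(to_http("webcal://ical.example.com/group/calendar.ics"))
```

The result is what goes into the “Add by URL” field in step 5.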

January 30th, 2008

Drupal for OPAC

I wish this had been around when I was working with a Millennium system. Of course, it still would have been hard to use since we were in a Microsoft-only shop. I wonder if it’s adaptable to Voyager?

January 30th, 2008

Medical Research Services in Sharepoint

Recently, a Medlib-er asked for examples of how medical librarians were using Microsoft Sharepoint. The majority of respondents said they had created sites or pages for their library in Sharepoint, duplicating the usual stuff found on library web sites: ILL forms, links to the public catalog, and other sites - essentially reconstructing the library’s public web site in the Intranet, or even just linking to it.

I don’t mean to disparage the efforts of my cohort. Hospital and corporate librarians tend to be lone rangers with little time, resources, and permission to push the envelope. At least they did something. I’m convinced, though, that we can do better than that.

At the academic medical campus where I work, we’ve had a (non-Sharepoint) staff and student portal for some time. The library has worked closely with developers to incorporate some library services into the portal. From my brief experience, though, University staff only pay attention to the portal every two weeks when it’s time to print their timesheets. Students visit maybe a little more frequently to check their campus accounts. Ultimately, though, there’s no reason for anyone to visit the portal in order to get work done.

Sharepoint, as collaboration space, I hope will be different. My goal is to insert library services into the flow of work and study. Not in a “hey, look at us” or “eat your spinach” kind of way, but invisibly and naturally. I’ve spent a little time envisioning how we might accomplish that. I hope to spend a lot more time over the next year.

Here are my early thoughts:

  • Identify the stages and flow of research, work, and study on campus that might take place in Sharepoint.
  • Find areas where there’s been an observable, neglected need and suggest how the library might help, e.g., metadata, text analysis, categorization, training.
  • Build small, modular web parts, connectors, and widgets that faculty, staff, and students can include in their own spaces.
  • Don’t make people come to the Library’s Sharepoint site to do something.
  • Don’t waste time recreating the Library’s web site in Sharepoint.
  • Don’t just link to the web site.
  • Share openly.

I got some serendipitous affirmation and inspiration today while following up on a medical student’s request. Upon entering med school, our medical students receive digital versions of recommended textbooks. This student wanted to know, reasonably enough, if there was an add-on for incorporating Stedman’s Medical Dictionary (which he already owned in digital copy) into Microsoft Word or, even better, OneNote - a popular tablet PC note-taking application among our students [1].

While searching for available options, I ran across a presentation by Carl Nolan, head of the medical research services project involving Microsoft and NHS.  Here’s an excerpt from an article by Microsoft:

Microsoft has invested £40 million in the Common User Interface programme - a series of projects to help the NHS get the most out of its IT investment. One of these projects has been looking for ways to build medical research services into the software that NHS staff already use every day.

These are exactly the kinds of services I would like to see us implement at KUMC. I hope they’re sharing.

Note: What I ultimately found was that for $100 you can buy the Stedman’s Medical Spellchecker, which adds a custom dictionary to MS Office apps. But that’s only spellchecking. What if I want to look up the definition of a new term? Ideally, I’d want the spellchecking dictionary feature wrapped into a single service-package with the full dictionary available in the Research Services Task Pane. Instead, both Microsoft and LWW seem to make that impossible.

January 30th, 2008

Report on Emerging Technologies Released

The 2007 Horizon Report has been released by the New Media Consortium.

The annual Horizon Report describes the continuing work of the NMC’s Horizon Project, a research-oriented effort that seeks to identify and describe emerging technologies likely to have a large impact on teaching, learning, or creative expression within higher education….

The core of the report describes six areas of emerging technology that will impact higher education within three adoption horizons over the next one to five years. To identify these areas, the project draws on an ongoing conversation among knowledgeable persons in the fields of business, industry, and education; on published resources, current research and practice; and on the expertise of the NMC and ELI communities….

Learn more about the Horizon Project and contribute to future editions at http://horizon.nmc.org/wiki/Main_Page.  NMC is a community of hundreds of leading universities, colleges, museums, and research centers exploring the use of media and emerging technologies in higher education.

December 15th, 2007
