Dipping into Alfresco

For our office, I needed to set up another collaboration site for our team on a project.  I had already used MS Sharepoint on a previous three-year project.  The interesting thing is that the users never warmed up to Sharepoint on that project.  Yes, Sharepoint is customizable, and I did a minimal amount with logo changes and the like.  But I think folks just found the usability a little bumpy.  The main place for collaboration on that project?  It ended up being a plain old FTP site (which we always had) with one virtual directory/site as an upload area, and another virtual directory/site as the read-only repository.  Of course, that meant I had to move the permanent documents and files to the ‘download-only’ site myself.

Enter ‘open source’ solutions for collaboration.  There are a few out there that do collaboration and ECM/WCM, like Drupal and Alfresco.  Based on the reviews and forums, I chose Alfresco.  Alfresco comes in two editions for end users: the Enterprise paid-for model that is stable and comes with support, and the ‘community’ version that is pure open source with GPL and the whole nine yards.  The community version is a little bit on the bleeding edge.  For instance, the ‘beta’ that is available for download now is 3.4e, whereas the somewhat stable version is 3.4d (which I am using).  All in all, I am very impressed with the collection of code, which is essentially Java-based with ‘Spring Surf’ as the framework.

What are the main reasons I chose Alfresco?  The main reason is the Sharepoint compatibility.  Your Microsoft Office apps do not know the difference.  It works seamlessly from within the Office apps, or from the Alfresco server using “inline” editing.  Another reason is cost.  Well, that is a gray area.  Sometimes when using open source with no official support, you are on your own and a production environment can be at risk of strange things happening (this is the case with Alfresco too; more on that later).  But the cost of the Community Edition is nothing.  The hardware is up to you (I have a 64-bit Linux box running Ubuntu 10.04).  To get basic Sharepoint functionality, I would have had to get Windows Server 2008 Web Edition (the minimum entry cost-wise) and configure Sharepoint Foundation (Sharepoint Services) to work with external token-based security logins, as I am not using our Active Directory.  Why?  Because this project, like the last, is a statewide project that involves people outside our organization.  Sharepoint does work that way using form-based authentication, but it wasn’t easy out of the box.  I set up a testbed version of Alfresco with the binary installer and it worked with external users from the get-go.

Up and running:  It took two installs to get Alfresco running properly.  The install uses a binary ‘.bin’ file that apparently works on many versions of Linux (did I mention Alfresco is also available for Microsoft x86 and x64 boxes?).  My first install failed mainly because I already had Tomcat on my box and it conflicted horribly.  Another reason was my SMTP server, which was Zimbra.  So I uninstalled Alfresco, Zimbra, Tomcat, and MySQL.  I now had a clean Ubuntu server which still had Apache2 (I also deleted all virtual web sites for good measure).  Then I reinstalled Alfresco using the defaults (I chose to have it install MySQL).  Once installed, it resided in the /opt/alfresco directory.  I liked this idea better anyway, especially since this box would only be used for this collaboration site/server.  All pertinent files are contained in one directory tree.  Nightly backups simply back up the /opt/alfresco directory and everything gets saved, like a bare-metal backup of a complete server.  If something goes horribly wrong, a simple replacement of that directory tree brings everything back the way you want it.  If you set it up as a service, then you have a script in your /etc/init.d directory to start it up, and that is the only other area outside the tree involved in getting Alfresco running.  If you don’t want it running at boot, you run it from the script (/opt/alfresco/alfresco.sh start).
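
The nightly backup of the directory tree can be sketched roughly as follows.  This is a minimal sketch, not the script running on this server: the function name backup_alfresco, the /var/backups/alfresco destination, and the archive naming are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of a nightly backup of the Alfresco directory tree.
# backup_alfresco, the default destination and the archive name are hypothetical.
backup_alfresco() {
    src_dir="${1:-/opt/alfresco}"            # install directory named in the post
    dest_dir="${2:-/var/backups/alfresco}"   # assumed backup location
    stamp=$(date +%Y%m%d)
    archive="$dest_dir/alfresco-$stamp.tar.gz"

    mkdir -p "$dest_dir" || return 1
    # -C makes the paths inside the archive relative to the parent directory,
    # so the archive unpacks as a single "alfresco" tree.
    tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" || return 1
    echo "$archive"
}
```

Calling backup_alfresco with no arguments archives /opt/alfresco and prints the path of the archive it wrote.  Since the bundled MySQL keeps its data files inside the same tree, stopping Alfresco before archiving is probably the safer choice; that is an assumption, not something established here.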

Next I had to get my SMTP server installed.  I chose Postfix for simplicity.  There are a number of things you must do to get your email server to work with Alfresco for mailing out invites (the default model for getting users onto the share site).  These are mostly taken care of in the global properties file (alfresco-global.properties).  Depending on your install, the best way to find it (in Linux) is “find / -name 'alfresco-global*'”.  You must tell the server about your email setup.  Like this:

##Email Outgoing ####
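
As a rough sketch of what an outbound email block can look like, the snippet below appends Alfresco’s mail.* properties to a properties file.  Every value is a placeholder rather than the configuration used on this server, and the helper name write_mail_settings is made up for illustration.

```shell
#!/bin/sh
# Sketch only: append outbound-mail settings to a properties file.
# write_mail_settings is a hypothetical helper; all values are placeholders.
write_mail_settings() {
    props_file="$1"   # path to alfresco-global.properties
    cat >> "$props_file" <<'EOF'
mail.host=localhost
mail.port=25
mail.username=anonymous
mail.password=
mail.encoding=UTF-8
mail.from.default=alfresco@example.org
mail.smtp.auth=false
EOF
}
```

Alfresco reads this file at startup, so the server needs a restart before new values take effect.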


The interesting thing about those settings is that the default “from” does not work.  You have to go into the actual template in your repository to make that change, and you have to log into the share site as ‘admin’ to do it.  This is the file you want to change, under ‘data dictionary’ in the repository:

Email Invite

That should take care of email invites.  There is one other thing that is catastrophic if not taken care of.  In all the install and setup blogs, I did not see this mentioned, and it is pretty major.  It may be a moot point after the next version, but version 3.4d remains broken in the install binary.  It is this:  all goes well for your share site (days, weeks even) until someone uploads a PDF that messes with the Java library.  Apparently not all PDFs cause the issue, but all it took was one on our site.  This can actually bring your site to its knees.  After that nothing works, including Tomcat.  If I had known about this beforehand, I would not have had a live site come down (it took 5 hours to find and solve the issue).

What you get is an Apache page that says the ‘service is temporarily unavailable’ ‘due to maintenance downtime or capacity problems’.  It happens after the upload, as soon as any user navigates to the ‘document library’.  Boom!  Everything is hosed.

The real tricky part is getting it working right again.  If it is early in the day, a restore of the Alfresco tree is a simple cure, but the underlying problem still has to be corrected.  In my case it was the afternoon and all the morning’s data would have been lost.  Restarting Tomcat throws errors because it died without getting rid of the ‘pid’ file in the Tomcat ‘/bin’ directory.  To get it back on its feet, I first did a proper shutdown by stopping Alfresco with the alfresco.sh script (it reported errors).  Then I deleted all files in /opt/alfresco/tomcat/temp and deleted catalina.pid and catalina.out in the /opt/alfresco/tomcat/bin directory.  After that, running the start script can get the site up and running again.  Of course, the site goes down horribly the moment a user goes into the repository again.  Boy, I wish I had known this before the site went live.

Anyway, it has to do with two files, the pdfbox and fontbox files, which relate to the Java JDK files in the library.
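
The emergency restart described above can be sketched as a small script.  This is a sketch under assumptions: the function names are made up, and the paths are the ones from this install (/opt/alfresco), so adjust them for yours.

```shell
#!/bin/sh
# Sketch of the emergency restart after the crash.
# clear_stale_tomcat_files and emergency_restart are hypothetical names.
clear_stale_tomcat_files() {
    alf_home="${1:-/opt/alfresco}"
    # Remove the temp files left behind by the crashed Tomcat.
    rm -rf "$alf_home"/tomcat/temp/*
    # Remove the stale pid file and the old catalina.out from tomcat/bin.
    rm -f "$alf_home/tomcat/bin/catalina.pid" "$alf_home/tomcat/bin/catalina.out"
}

emergency_restart() {
    alf_home="${1:-/opt/alfresco}"
    # The stop is expected to report errors after a crash, so its status is ignored.
    "$alf_home/alfresco.sh" stop || true
    clear_stale_tomcat_files "$alf_home"
    "$alf_home/alfresco.sh" start
}
```

This only gets the site back up; it does not cure the underlying problem.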
Alfresco 3.4d shipped with version 1.2 of PDFbox and Fontbox.  A number of people on the net said that installing or upgrading to ‘openjdk-1.6’, rather than Sun or other versions, fixes it.  Well, that is exactly what I had, so that wasn’t the issue for me.  It is the actual PDFbox and Fontbox .jar files.  Here’s the weird thing:  replacing them with the current version 1.6 does not fix it.  In fact, it really screws up Tomcat and all the log files show tons of errors.  (I like to delete the current log files after each fix so I get pure output from the latest fix.)  The key for me was replacing the .jar files with version 1.3.  That fixed everything, and all is stable again.  You get them here (go to the older release section for 1.3):

http://pdfbox.apache.org/download.html

You simply replace the old 1.2 versions with the 1.3 versions (do not rename the new files; leave them as 1.3), but you must remove or delete the 1.2 versions.  I left the 1.2 versions in place with ‘.old’ appended to the name and it didn’t work, so I moved them to an inert directory.  Where are they in the Tomcat tree?  Right here:  /opt/alfresco/tomcat/webapps/alfresco/WEB-INF/lib
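
The swap itself can be sketched like this, with Alfresco stopped first.  It is a sketch under assumptions: swap_pdf_jars is a made-up helper, and it assumes the jar files are named in the form pdfbox-<version>.jar and fontbox-<version>.jar, which may not match your install exactly.

```shell
#!/bin/sh
# Sketch of the jar swap. swap_pdf_jars is a hypothetical helper, and the
# pdfbox-*.jar / fontbox-*.jar name patterns are assumptions.
swap_pdf_jars() {
    lib_dir="$1"     # e.g. /opt/alfresco/tomcat/webapps/alfresco/WEB-INF/lib
    new_dir="$2"     # directory holding the downloaded 1.3 jars
    attic_dir="$3"   # "inert" directory outside the Tomcat tree

    mkdir -p "$attic_dir" || return 1
    # Move the old jars out completely; renaming them in place is not enough.
    for jar in "$lib_dir"/pdfbox-*.jar "$lib_dir"/fontbox-*.jar; do
        if [ -e "$jar" ]; then
            mv "$jar" "$attic_dir"/ || return 1
        fi
    done
    # Copy the replacements in under their own, unchanged names.
    cp "$new_dir"/pdfbox-*.jar "$new_dir"/fontbox-*.jar "$lib_dir"/
}
```

Moving the old files to a separate directory, rather than deleting them, keeps a way back if the new jars misbehave.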

I hope this helps you before you get into trouble like I did.