
HTML Tidy wrapper for .NET

This is a managed .NET wrapper for the open-source, cross-platform Tidy library, an HTML/XHTML/XML markup parser & cleaner originally created by Dave Raggett.

I'm not going to explain Tidy's "raison d'être" - please read Dave Raggett's original web page for more information, or the SourceForge project that has taken over maintenance of the library.

Download: Tidy.1244 (revision 1244, 29 July 2009)
Compressed (zipped) file, 336.2 KB

Sample Usage

Here's a quick'n'dirty example using a simple console app - written on my Mac using Mono, no less!

Note: always remember to call .Dispose() on your Document instance (or wrap it in a "using" statement), so the interop layer can clean up any unmanaged resources (memory, file handles, etc.) when it's done cleaning.

using System;
using Mark.Tidy;

public class Test
{
  public static void Main(string[] args)
  {
    using (Document doc = new Document("<hTml><title>test</tootle><body>asd</body>"))
    {
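      // Suppress warning output and other non-essential messages.
      doc.ShowWarnings = false;
      doc.Quiet = true;
      // Emit XHTML rather than HTML.
      doc.OutputXhtml = true;
      // Fix the malformed markup, then serialise the result to a string.
      doc.CleanAndRepair();
      string parsed = doc.Save();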
      Console.WriteLine(parsed);
    }
  }
}

results in:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
<title>test</title>
</head>
<body>
asd
</body>
</html>
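
If a "using" block doesn't fit the shape of your code, the same cleanup guarantee can be had with try/finally. The following is a minimal sketch using only the Document members shown in the sample above; the Clean helper and the DisposeExample class are hypothetical names for illustration, not part of the wrapper.

```csharp
using System;
using Mark.Tidy;

public class DisposeExample
{
    // Hypothetical helper (not part of Mark.Tidy): cleans a markup string
    // and always releases the unmanaged Tidy resources afterwards.
    public static string Clean(string html)
    {
        Document doc = new Document(html);
        try
        {
            doc.Quiet = true;
            doc.CleanAndRepair();
            return doc.Save();
        }
        finally
        {
            // Runs even if CleanAndRepair() or Save() throws.
            doc.Dispose();
        }
    }

    public static void Main(string[] args)
    {
        Console.WriteLine(Clean("<p>unclosed paragraph"));
    }
}
```

This is what a "using" statement expands to, so prefer "using" wherever it fits.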

This wrapper is written in C#, and makes use of .NET platform invoke (p/invoke) functionality to interoperate with the Tidy library (written in portable ANSI C).
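
To give a feel for what p/invoke looks like, here is a rough sketch of binding a few native Tidy functions by hand. It is not the wrapper's actual source: the NativeTidy and PInvokeDemo classes are hypothetical, and the library name "libtidy.dll" is assumed from the file name shipped in the download. tidyCreate, tidyRelease and tidyReleaseDate are functions from Tidy's public C API.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical declarations for illustration only.
internal static class NativeTidy
{
    // Assumed to match the native library file name next to the assembly.
    private const string Lib = "libtidy.dll";

    // TidyDoc tidyCreate(void): an opaque handle, held as an IntPtr.
    [DllImport(Lib)]
    internal static extern IntPtr tidyCreate();

    // void tidyRelease(TidyDoc): frees the document and its resources.
    [DllImport(Lib)]
    internal static extern void tidyRelease(IntPtr tdoc);

    // Returns a pointer to a static C string owned by the library,
    // so it is declared as IntPtr rather than string.
    [DllImport(Lib)]
    internal static extern IntPtr tidyReleaseDate();
}

public static class PInvokeDemo
{
    public static void Main()
    {
        IntPtr doc = NativeTidy.tidyCreate();
        try
        {
            string date = Marshal.PtrToStringAnsi(NativeTidy.tidyReleaseDate());
            Console.WriteLine("libtidy release date: " + date);
        }
        finally
        {
            // The native handle is not garbage collected; release it explicitly.
            NativeTidy.tidyRelease(doc);
        }
    }
}
```

The explicit tidyRelease call is the kind of cleanup that Document's Dispose() presumably performs on your behalf.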

Included in the download alongside the wrapper library (Mark.Tidy.dll) are 32-bit and 64-bit builds of the Tidy library (libtidy.dll) for the Windows platform. Copy the DLL that matches your process's bitness into your app's assembly folder alongside the wrapper library.

I'm not using Windows!

Thanks to the platform-agnostic nature of ANSI C, and the excellent work of the people at the Mono Project, you can use this wrapper library anywhere that Mono is supported, assuming you can find (or build) a version of the underlying Tidy library for your platform. That shouldn't be too hard - it's a default part of a standard Mac OS X install, for example; it probably is for most Linux distributions as well.

Under Mono, you might need to re-map the p/invoke calls to the appropriate library in your .NET application's *.config file - or you might find it just works (as it seems to on my Mac laptop). See this page on DLL mapping for more information on achieving this.
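
As a rough illustration, a Mono DLL map entry in the application's *.config file might look like the fragment below. The "dll" value must match the library name used in the wrapper's p/invoke declarations; "libtidy.dll" is assumed here from the shipped file name. The target file names are also assumptions and vary by platform and distribution, so check what your system actually installs.

```xml
<configuration>
  <!-- Assumed names: adjust "target" to the Tidy library on your system. -->
  <dllmap dll="libtidy.dll" target="libtidy.dylib" os="osx" />
  <dllmap dll="libtidy.dll" target="libtidy.so" os="linux" />
</configuration>
```

Mono reads these entries when it resolves the native library, so no change to the wrapper or a rebuild should be needed.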

The API

At this stage I've just created a basic mapping of each of Tidy's configuration options to a property of the main Document object. I've renamed a few things here & there, but it should be pretty easy to figure out what each property does: the XML documentation file included in the download gives the original Tidy option name for each property. You can read the Tidy configuration documentation here.

The Future

At some point I'll add a nicer ".NET-style" API layer over the top, as the current one is a bit clunky (although perfectly usable).

Release notes

r1244

  • Fixed a bug on cleanup when a Document object was initialised with a stream of 2 bytes or fewer. Reported by Eqbal Sajadi.

r1037

  • Initial release.
