I am learning C#, coming from VBA (Excel and Access). I am having great moments with a grown-up language, particularly with LINQ.
One thing that confuses me is this: One would assume that Microsofts flagship language, C#, would easily integrate with Microsofts flagship product, Office. How come people use third party libraries like EPPlus to talk to Excel?
I get that with EPPlus, you don’t need to have MS Office installed, so it’s useful for servers running Linux. What about desktop apps? Does anyone use Microsoft.Office.Interop.Excel? What would you recommend?
Sorry if this question is below the usual level of this sub, idk where else to post. Thx!
EDIT: I put all suggestions from the comments in one … Excel spreadsheet!
It depends….
We have some projects that depend heavily on Interop.Excel, but not for manipulating or creating Excel files on disk, but to control and manipulate the running Excel instance, in most cases as an Addin.
The Interop.Excel namespace is much slower than most other XLSX libraries you can find on nuget.org – this is because, if you create an Excel.Application object, there is a full Excel app running in background set to visible=false.
If your goal is to read or write XLSX files and not to manipulate a running Excel instance, I recommend, that you use another 3party library.
Thanks, I did not realise this. I’ve had a lot of performance problems creating Excel files with Interop
I tend to use ClosedXML if I’m doing work with Excel. It does only do Excel files, though, so you’d have to find something else if you want to work with Word/PP/whatever.
I’ll second that be careful with large XLSX files as they tend to eat up a lot more memory than their file size.
ClosedXML works great. Also worth taking a look at ExcelDataReader if you want a simple lightweight solution for parsing excel and csv files with a single library.
Awesome, this looks like what I need! I found EPPlus quite bloated if all I want is reading/writing a range into a Sheet. Thank you!
Here’s the deal, it’s not as simple as you think and I like stories. Lots of people are covering, “What do you use?” I want to cover, “Why is it this way?”
VBA and VB were unmanaged languages. Sure, the runtime did your allocation/deallocation for you, but the way they worked is closer to how COM worked than how C# works. VBA in particular was designed around the COM APIs for Office Interop.
As a managed language, it’s much more complicated for C# to use Office’s COM interop APIs. Interop is sort of like a middleman. You have to whisper what you want to do in its ear, then it calls the unmanaged Office libraries and tells them what you said. Then the interop middleman listens to what they say and explains the result to you. It’s slow and clunky.
But MS sort of optimistically hoped it was temporary. .NET was originally “COM 2”. MS really wanted it to replace COM. Unfortunately they were a little too optimistic about that timeline. We see hints of it today but we’re still a far cry from the scenario where new Windows API is written in .NET and C/COM have to use interop to use it. We may never get there.
Anyway.
Office interop created a LOT of problems for MS’s strategy starting in the mid-2000s and heading in to the 2010s. The problem wasn’t just clunky .NET interop. The rise of x64 computers made everything more complicated since COM, being C-based, couldn’t support two architectures with one library. They thought .NET’s AnyCPU support was going to be a solution to this, but again they were a little too optimistic and deployment of Office Interop products can be a nightmare. Worse, the rise of Macs and iOS/Android made it clear that having cross-platform access to Office was becoming dramatically more important, but COM is a Windows-only technology (and for a long time, so was .NET.)
So they basically gave up on the COM interop API being the sole way to get things done in Office. They abandoned (and documented) their complex binary formats and introduced new, openly-specified XML formats for all Office documents. This allowed people to do something they’d never been able to do before: create and edit Office documents without Office installed on a computer. That meant they didn’t need sophisticated new APIs for Linux, Mac, iOS, etc, because anyone (with a lot of effort) could create a library that does what Office Interop did.
Even better: they didn’t necessarily need to develop the no-Office libraries themselves. There was a lot of demand, so companies and individuals did it themselves with no compensation from Microsoft. It’s always nice when you can get free labor. MS still maintains the older interop libraries for people who already have solutions, but it’s not the preferred way to go about it.
I don’t know that they are on record as saying it but I think they see the major use cases as:
If you just want to produce Office documents in an automated fashion without the user interacting with Office, it’s faster and easier to use a third-party API to do so.
If you want to only partially automate an Office workflow and have the user interact with Office for part of that workflow, it has always been preferred to use VBA as the interop between C# and COM is a bit clunky.
openly-specified XML formats for all Office documents
Slight clarification… They are zip archives with mostly XML files in them.
So if someone was thinking, “oh, I don’t even need a library, I can just use the XML APIs to change things!”, that’s probably not a good idea. You can rename an XLSX (or DocX, etc.) to .zip and browse around, and you’ll find that it’s quite a bit more complex than just an XML file and unless automating Office docs is your entire job, you don’t want to sink down that rabbit hole.
Even without the interop between C# and COM using ActiveX / COM Automation was always slow, clunky and liable to leave you with invisible orphaned apps using resources till the PC ground to a halt. Writing the modern Office formats with a third party library is much better.
Stories are always appreciated! Very interesting, thank you!
I do not recommend Microsoft.Office.Interop.Excel
I would recommend OpenXML… specifically, I use the ClosedXML implementation
https://github.com/ClosedXML/ClosedXML
I second this. Microsoft even says don’t use Interop with .NET in many applications that aren’t meant for non-interactive client applications. They suggest OpenXML. ClosedXML just makes it a bit easier to use OpenXML.
I’m on it, it was mentioned above 😉 Really my favourite so far.
I have used VSTO and C#
https://docs.microsoft.com/en-us/visualstudio/vsto/create-vsto-add-ins-for-office-by-using-visual-studio?view=vs-2019
I know right, you’d think so? Wish they’d just integrate office with python somehow.
Anyway, for super-fast IO to generate Excel files without needing Excel installed I use DocumentFormat.OpenXml. It’s quite granular and a bit of a learning curve, but you can figure it out. It’s essentially manipulating the compressed xml that the files are built on (if you change the .xlsx to .zip and unzip it you have access to the xml directly).
This will not work on .xls files, however.
For interop there’s VSTO “Office/Sharepoint development” workload in your Visual Studio installer which lets you use excel addin projects templates to get started more easily.
Thanks, that’s helpful!