Replicating The Success Of The Openpandora (Discussion V2.0)


7 replies to this topic

Poll: Xilinx Zynq-7000 Dual-Core A9 + built-in FPGA for OpenPandora v2 and more (13 members have cast votes)

What would you pay for a 7in hand-held with a Zynq 7030 Dual ARM A9 CPU+FPGA?

  1. below $100 (1 vote, 7.69%)
  2. $100 to $150 (5 votes, 38.46%)
  3. $150 to $200 (1 vote, 7.69%)
  4. $200 to $250 (3 votes, 23.08%)
  5. $250+ (3 votes, 23.08%)

What would you pay for an FSF Hardware-Endorsed 12in Laptop with a Zynq 7030 Dual ARM A9 CPU+FPGA?

  1. below $150 (3 votes, 23.08%)
  2. $150 to $200 (2 votes, 15.38%)
  3. $200 to $250 (2 votes, 15.38%)
  4. $250 to $300 (4 votes, 30.77%)
  5. $300 to $350 (1 vote, 7.69%)
  6. $350+ (1 vote, 7.69%)

Would you be prepared to sponsor an FSF Hardware-Endorsed Modular Design with a Zynq 7030 and 1GB DDR3 RAM (KickStarter-style)?

  1. yes (4 votes, 25.00%)
  2. no (7 votes, 43.75%)
  3. yes - through Kickstarter itself (3 votes, 18.75%)
  4. yes - through the present OpenPandora Team (2 votes, 12.50%)

#1 lkcl

    GP32 User

  • Member
  • 55 posts

Posted 20 August 2011 - 10:24 PM

i hope that posting here is still ok. the original discussions are here:
http://www.gp32x.com...a-netbooklaptop
http://www.gp32x.com...280x800-laptop/

i am still looking for CPUs that would captivate people's interest, and am also looking for projects that are willing to collaborate and thus share both hardware development costs and volume pricing.

i believe i may have found a modern CPU that has the potential to be of interest to a very wide range of people, for both Commercial as well as Free Software Development purposes. it's the Xilinx Zynq-7000 series:
http://www.eetimes.c...M-based-devices

note, right at the bottom, that the anticipated pricing for the entry-level device is under $USD 15, and that's for a 28nm Dual-Core 800MHz Cortex A9 with some really quite acceptable "hard" peripherals (2x USB2, 2x Gigabit Ethernet, CAN-bus, I2C, SPI, 2x SD/MMC) as well as a lowly 25k logic cells of on-board FPGA. by the time you get to the mid-range (the 7030), you get 125k logic cells, 4x PCI-e 2.0 lanes and 4x Serial Transceivers capable of 6.6Gbit/sec each. a very rough guess on the pricing for the 7030 would be something between $25 and $30 (even in mass-volume).

What about peripherals?

here's the kicker: there's no LCD driver, there's no SATA-II controller, there's no HDMI, DVI or VGA, there's no on-board 3D GPU, and there's no proprietary MPEG or other Video or Audio encoding or decoding engine.

but, far from making the CPU completely unacceptable, it's actually the complete opposite, because the on-board FPGA can do all of those tasks, and a lot more besides. thus, it makes this CPU perfect for both FSF Hardware-endorseable purposes as well as for a wide range of commercial uses.
http://www.fsf.org/n...sement-criteria

so you want an SATA-II controller? use two of the Programmable Serial Transceivers, and program the FPGA to talk to the SATA device.

you want an RGB-TTL LCD output? use 30 pins directly off the FPGA side, drive them at 75MHz and knock yourself out.

you want an HDMI or a DVI output? use a Texas Instruments TFP410 driver, use another 30 pins to drive it from the FPGA side, and you're done. http://www.ti.com/lit/gpn/tfp410 although, given that the 7030 has 4x Serial Transceivers, if you didn't also want SATA-II you could probably generate the HDMI or DVI signals directly (HDMI needs 3 Serial Transceivers, leaving only one for SATA-II, which would not be enough: doing both would require 5 in total).
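the transceiver budget above can be tallied quickly (note that driving HDMI/DVI TMDS lanes from the serial transceivers is the post's assumption, not a documented Xilinx use-case):

```python
# tally of serial-transceiver lanes needed for HDMI + SATA-II at once
# (HDMI-over-transceiver is an assumption here, not a Xilinx spec)

ZYNQ_7030_TRANSCEIVERS = 4

hdmi_lanes = 3          # three TMDS data lanes
sata_lanes = 2          # one TX lane + one RX lane

needed = hdmi_lanes + sata_lanes
print(needed, needed <= ZYNQ_7030_TRANSCEIVERS)   # 5 False: can't do both
```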

you want to add a Microphone? you can use any one of the 12-bit ADC converters. a Speaker? just use a Class D Amplifier (PCM), and the FPGA is more than fast enough to turn DMA audio data into PCM on one of its outputs.
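to make the Class-D idea concrete, here's a rough Python model - purely illustrative, the real thing would be a few lines of HDL - of a first-order sigma-delta modulator turning PCM samples into the 1-bit drive signal a Class D amplifier wants:

```python
# Illustrative model (not actual FPGA logic) of how the fabric could
# turn PCM samples into a 1-bit Class-D drive signal: a first-order
# sigma-delta modulator, oversampling each sample.

def sigma_delta(pcm, oversample=16):
    """Convert PCM samples in [-1.0, 1.0] to a 1-bit stream."""
    bits = []
    integrator = 0.0
    for sample in pcm:
        for _ in range(oversample):
            # accumulate the error between the input and the fed-back bit
            integrator += sample - (1.0 if integrator > 0 else -1.0)
            bits.append(1 if integrator > 0 else 0)
    return bits

stream = sigma_delta([0.5] * 100)
# the density of 1s tracks the input level: for a constant +0.5 input,
# about 75% of the output bits are high ((1 + 0.5) / 2)
print(sum(stream) / len(stream))
```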

but this is getting ahead, and is a little technical - i just wanted to illustrate that the fact that there are no "major" peripherals is not a disadvantage, it's actually an advantage, because it's possible to implement all of those with the FPGA.

What about 3D Graphics?

so, whilst peripherals are covered, things like 3D are not. or... are they? i looked at the OpenGraphics Project, and it turns out that they are using a Spartan 3 XC3S4000, which may turn out to be roughly equivalent to the FPGA on-board the Zynq 7030.

so what does that mean? it means that a system based around a Xilinx Zynq-7000 ARM-FPGA CPU Hybrid could quite possibly serve not only as an OpenGraphics 3D Video Card, but also as the sole CPU of an actual computer, and still have 3D graphics, thanks to the OGP Project!

there are two critical things to point out here. the first is that, with the right modular hardware design, a module containing a Zynq ARM+FPGA CPU plus 1GB of RAM could be a "plugin component" onto both an OGP PCI-e card as well as being used in all other projects envisaged (Laptop, Hand-held, Engineering Board, Server etc.). the second is that, since the dual CPUs already have NEON and there is already MesaGL software support (Free Software) for the ARM CPU, it is not inconceivable to offload work bit-by-bit to the FPGA and so gain better and better performance. Gallium3D would be the ideal starting point, here. http://en.wikipedia.org/wiki/Gallium3D

in fact, given that Gallium3D can run on LLVM, and LLVM has supported Xilinx FPGAs (MicroBlaze) since version 2.7, i would hazard a guess that actually, the job is virtually completed before it's even begun! http://llvm.org/rele...leaseNotes.html (update: it's not :) fortunately, the Zynq-7030 has 4 PCI-e lanes: it is therefore annoying to consider, but feasible, to use a standard PCI express 3D Chipset)

The GNU Radio Project

there is one other technology area which may find the Zynq-7000 CPU particularly exciting. non-free wireless firmware is particularly galling to the Free Software Community, especially for things like the Open Base Station Project and the Open Handset Projects. both of these projects use the GNU Radio Project as the basis for experimentation and, sometimes, actual live deployment.

however, the hardware is prohibitively expensive, mostly because it's classed as "Test Equipment" in order to be sold without a license, but also because it contains, like the OGP Project, a rather beefy FPGA. i have reached out to the GNU Radio Project, to see if there is someone there who could evaluate the Zynq-7000 series for use in GNU Radio. the fact that the Zynq-7000s have 12-bit ADC converters directly wired to the FPGA could mean that it would work very well as a BaseBand processor.

Handhelds, Games Consoles, Tablets, Laptops

These are just different form-factor chassis into which a module, with the Zynq-7000 CPU and 1GB of RAM, could be directly plugged. Once a module with all the "hard work" exists (CPU+RAM), different kinds of chassis can be done, if they are run-of-the-mill enough, as either 2-layer or 4-layer boards, in around 2 weeks.

Of course, getting the case-work done is a different matter, as anyone who has an OpenPandora knows. the lesson learned there is to avoid that entirely by re-using existing case-work. Luckily, OpenPandora v1.0 already has existing casework!

In China, there are Industrial Flea Markets: multi-storey buildings divided by floor into different types of devices and components. The ground floors are dedicated to finished products. The 1st floor is usually peripherals. From the 2nd floor up you get discrete components - resistors and ICs in any quantity you need. On the top floor you get empty cases by the bucket-load, from discarded designs that were successful last year (or not). Licensing the designs for these cases would save vast amounts of time and money. There are thousands to choose from, so any product can be made at a perfectly reasonable price, with very little risk.

Conclusion

The Xilinx Zynq-7000 looks like it could be a game-changer. Not only is it attractive to Free Software Developers, but its price, performance and flexibility also make it worthwhile to build competitive mass-market devices around. It's basic, having no "modern" interfaces, only staples such as USB2 and Gigabit Ethernet, but the FPGA side more than makes up for that - including for Video Decode as well as 3D Graphics.

So the real question is: who would be interested to see a device made around this CPU, and if so, what kind of device, and, most importantly, what would you be prepared to pay to make it happen?

(I have to contact him to check, but I already have one person who is willing to put up $USD 1,000 towards the creation of a Laptop that meets the FSF Hardware-Endorsement Criteria. All other CPUs I have heard about that meet the FSF Criteria are simply not worthwhile dealing with (too slow, too few features), but the Zynq-7000 not only meets the FSF Criteria but is also one of the most exciting CPUs I've ever encountered)

Edited by lkcl, 21 August 2011 - 08:06 PM.


#2 lkcl

    GP32 User

  • Member
  • 55 posts

Posted 20 August 2011 - 11:05 PM

here are some links (which will be added to as they occur) to threads on the various free software projects mentioned above. hopefully the developers on each list will pick up the topic(s) to help evaluate whether use of the Zynq 7030 CPU is actually practical or even exciting.


Edited by lkcl, 21 August 2011 - 04:08 PM.


#3 Exophase

    Exophase is bad. Nothing good will ever come of him.

  • GP Guru
  • 5464 posts
  • Location: Cleveland OH

Posted 22 August 2011 - 03:35 AM

This FPGA will be cool for a lot of embedded applications but looks like a bad choice for a general purpose computing device, especially a laptop. You seem to think that the FPGA portion is capable of magic, and quite frankly you're dreaming if you think the equivalent of "25k cells" (or "125k cells" for that matter) can accomplish any of those big peripherals you want, let alone all of them. Sure, it'll probably work for high speed transceivers like SATA and even give you good analog functionality but you won't get the big logic functionality which is absolutely vital in a laptop today. (also, 12-bit ADCs would give you a shitty microphone)

Look at the CPU; by 28nm standards it's nothing special - only 800MHz and only 512KB of L2 cache, which will suggest to you that they chose a size optimized build over a performance optimized one. And yet this is precisely what they decided to do a hard macro for, which should suggest to you that the available FPGA fabric can't compete if you were to implement a CPU in it instead. Now consider that most high end Cortex-A9 SoCs these days actually spend more die space on their GPUs and video encode/decode than the CPU cores and L2 cache. And even if you had the space, FPGAs can't clock nearly as high as fixed function implementation equivalents. And the perf/W is worse.

Furthermore, even if GPU or video IP could feasibly fit on these things, and I'm going to go further and give it the benefit of assuming we're okay with configuring just one of these at a time (definitely NOT okay with unloading the logic for all those transceivers though), the available IP out there isn't going to be nearly as advanced as high end SoCs. OpenGraphics is really not very special at all and this is before you hobble it with something much less than 128-bit dedicated memory. Their target frame rate for Quake 3 is not very impressive. Anyway, I saw that you made the comparison of being able to fit their IP based on comparing "logic cell" counts; what you don't realize is that this is not a fixed metric but varies depending on FPGA design. This is why the "equivalent gates" per "logic cells" ratios are so different between the devices you compared. Furthermore, most likely a lot of work would have to be done to port from their FPGA to this. It's not nearly as trivial as you make it out to be.

Even the advantage of flexibility the FPGA offers is going to be mostly rendered moot by being put in a laptop. The benefit here is the ability to customize hardware to fit your needs. The laptop would have its I/Os affixed to connectors already; you would have greatly reduced capability to pick and choose what you want it to be doing.

Since we're talking 28nm (probably TSMC) and just-announced-today I don't expect this to be out anytime soon. By the time it is other options will probably be available, that'll offer everything you need for a laptop (SATA, HDMI etc) while having better CPU cores (dual core A15) and much better GPU and video than you could ever hope to get out of an FPGA like this. FPGAs are great for custom hardware but they're not a good solution for implementing a bunch of stuff you can get in a real hardware package.

Edited by Exophase, 22 August 2011 - 03:51 AM.


#4 lkcl

    GP32 User

  • Member
  • 55 posts

Posted 22 August 2011 - 11:37 AM

This FPGA will be cool for a lot of embedded applications but looks like a bad choice for a general purpose computing device, especially a laptop. You seem to think that the FPGA portion is capable of magic, and quite frankly you're dreaming if you think the equivalent of "25k cells" (or "125k cells" for that matter) can accomplish any of those big peripherals you want, let alone all of them. Sure, it'll probably work for high speed transceivers like SATA and even give you good analog functionality but you won't get the big logic functionality which is absolutely vital in a laptop today. (also, 12-bit ADCs would give you a shitty microphone)


i looked up vivante's GC400 - it's implemented in 850k gates. that's only 40% of the estimated-equivalent 1.9M of the Zynq-7030's 125k cells budget. and that's OpenGL ES 2.0 and more. so i'd say it's definitely doable.
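a quick sanity-check of those numbers (both figures are the post's own estimates; "equivalent gates" for FPGA logic cells is a marketing-style metric, and the GC400 figure gets corrected to ~1.23M later in the thread):

```python
# gate-budget arithmetic using the post's own (estimated) figures
gc400_gates = 850_000       # GC400 gate count as quoted in the post
zynq_equiv  = 1_900_000     # assumed gate-equivalent of 125k logic cells

ratio = gc400_gates / zynq_equiv
print(f"{ratio:.0%}")                    # ~45% of the assumed budget

# with the 1.23M figure quoted later in the thread, the fit tightens:
print(f"{1_230_000 / zynq_equiv:.0%}")   # ~65%
```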

Look at the CPU; by 28nm standards it's nothing special - only 800MHz and only 512KB of L2 cache, which will suggest to you that they chose a size optimized build over a performance optimized one. And yet this is precisely what they decided to do a hard macro for, which should suggest to you that the available FPGA fabric can't compete if you were to implement a CPU in it instead.


look at the posts from the llvm-dev team member, nick - you don't implement a full CPU, you implement a partial one, that cooperates with and is guided by the main one. nick even points out that using LLVM->FPGA you could likely experiment on the design of that cut-down CPU from a high-level language. the thread starts here http://lists.cs.uiuc...ust/042560.html i'll edit this later when the message arrives for display in the archives.

Now consider that most high end Cortex-A9 SoCs these days actually spend more die space on their GPUs and video encode/decode than the CPU cores and L2 cache. And even if you had the space, FPGAs can't clock nearly as high as fixed function implementation equivalents. And the perf/W is worse.


yeees... but absolute total keep-up-with-the-joneses isn't the goal here. even with some absolute basic 3D acceleration (even if it just offloaded vector multiplication for example) something that would provide screamingly-good performance for 800x600 use on an OpenPandora v2.0 would do "good enough" performance for the majority of desktop uses at 1280x768 or larger.

it's a different goal.... that could not even be remotely considered if the Zynq-7000 series did not exist.

Furthermore, even if GPU or video IP could feasibly fit on these things, and I'm going to go further and give it the benefit of assuming we're okay with configuring just one of these at a time (definitely NOT okay with unloading the logic for all those transceivers though), the available IP out there isn't going to be nearly as advanced as high end SoCs. OpenGraphics is really not very special at all and this is before you hobble it with something much less than 128-bit dedicated memory. Their target frame rate for Quake 3 is not very impressive. Anyway, I saw that you made the comparison of being able to fit their IP based on comparing "logic cell" counts; what you don't realize is that this is not a fixed metric but varies depending on FPGA design. This is why the "equivalent gates" per "logic cells" ratios are so different between the devices you compared. Furthermore, most likely a lot of work would have to be done to port from their FPGA to this. It's not nearly as trivial as you make it out to be.


i realise that. however, having seen the Vivante GC400 gate-count (850k, where the 7030 is equivalent to 1.9M), i'm not counting on it, but i'd imagine that a 120% margin of wiggle-room would be enough!

Even the advantage of flexibility the FPGA offers is going to be mostly rendered moot by being put in a laptop. The benefit here is the ability to customize hardware to fit your needs. The laptop would have its I/Os affixed to connectors already; you would have greatly reduced capability to pick and choose what you want it to be doing.


ah. no. that's where you're wrong. you have to bear in mind that i've been working on this for over 18 months; part of the planning is a modular architecture where the CPU, RAM, NAND Flash and some of the standard peripherals (HDMI, USB-OTG) are on a separate user-removable hot-swappable card.

so no, the laptop would NOT have its I/Os affixed to connectors already: you would NOT have reduced capability to pick and choose what you want it to be doing.

see below for some of the details about the split strategy.

Since we're talking 28nm (probably TSMC) and just-announced-today I don't expect this to be out anytime soon. By the time it is other options will probably be available, that'll offer everything you need for a laptop (SATA, HDMI etc) while having better CPU cores (dual core A15) and much better GPU and video than you could ever hope to get out of an FPGA like this.


i haven't described everything, here, exo - there's a lot going on behind the scenes.

there are always going to be "better CPUs with better GPUs". if you don't throw a stake in the ground, pick one and run with it, you will always be chasing after pipe dreams. if on the other hand you throw a stake in the ground at last year's CPU, you end up developing something that nobody wants by the time it's done.

that's where the strategy that myself and my associates have been working on comes into play. the "laptop" is just a shell. the "games console" is just a shell. it takes a 2in x 4in CPU Card, containing the CPU, RAM, NAND Flash, HDMI out, Micro-SD and USB-OTG. the other end has a connector with SATA-II, 10/100 Ethernet, USB2, 24-pin RGB/TTL, I2C and some GPIO.

this "splitting" of the devices means that the development of a new CPU Card can take place within 6-10 weeks, yet there is absolutely no "total unit redesign" required. nobody needs to consider unscrewing their device in order to take advantage of the new CPU Card. just pop it out, and sell it on ebay.

so with this strategy, the nightmare of the OpenPandora casework fiasco need not be gone through all over again.

*and* the same CPU card that goes into your OpenPandora v2.0 can also be swapped out into a 12in Laptop Chassis.

FPGAs are great for custom hardware but they're not a good solution for implementing a bunch of stuff you can get in a real hardware package.


actually, in terms of FSF Hardware Endorsement Criteria, precisely the opposite is true. follow the chain here. with FSF Hardware Endorsement comes the possibility of FSF funding and PR.

i realise there's a lot going on here, exo. there are a dozen different possibilities that i'm pursuing: this thread is just a glimpse into some of them.

#5 lkcl

    GP32 User

  • Member
  • 55 posts

Posted 22 August 2011 - 07:09 PM

sorry exo i missed this earlier:

(also, 12-bit ADCs would give you a shitty microphone)


ah c'mon, exo - have you seen the specs on 3G / GSM modules such as the ones from Telit? they only do 12-bit sampling @ 8kHz because the GSM protocol can't get the data to the other end any better than that :)
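the arithmetic behind that point (back-of-envelope only - real codecs and ADC front-ends complicate the picture):

```python
# ideal quantisation SNR and raw bit-rate for 12-bit @ 8kHz sampling
bits, fs = 12, 8_000

ideal_snr_db = 6.02 * bits + 1.76    # standard ideal-ADC SNR formula
raw_rate = bits * fs                 # raw PCM bit-rate before coding

print(f"{ideal_snr_db:.1f} dB")      # 74.0 dB - ample for telephony voice
print(f"{raw_rate} bit/s")           # 96000, far above the ~13 kbit/s
                                     # that leaves the GSM full-rate codec
```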

#6 Exophase

    Exophase is bad. Nothing good will ever come of him.

  • GP Guru
  • 5464 posts
  • Location: Cleveland OH

Posted 22 August 2011 - 07:40 PM

i looked up vivante's GC400 - it's implemented in 850kgates. that's only 40% of the estimated-equivalent 1.9m of the Zynq-7030's 125k cells budget. and that's OpenGL ES 2.0 and more. so i'd say it's definitely doable.


You're a bit off..

Synthesis Gate Count (ND2D1 gates): 1.23 M / 1.2 M
http://www.vivanteco..._mvr.html#GC400

Note that in the real world synthetic gate counts and "equivalent gates" on an FPGA often don't match up so well. Logic cells are pretty high level elements that tend to be capable of a lot more work than they're being used for, so in an ideal configuration could be used as a lot more "equivalent gates" than most real synthesis would allow for. I don't know what FPGA marketing is like, if that number is idealized or reflects real projects.

Of course we also don't know other important factors like how many SRAM blocks are on this. And there's an alleged price for the much smaller 25k cell version, undoubtedly the 125k one will cost significantly more..

I also take the Vivante numbers with some salt without any performance benchmarks of it in the wild. And I would hope this goes without saying, but for a design of the same logic complexity you wouldn't achieve nearly the same clock speed on an FPGA, so if you're expecting to take a design like that and do something similar on this... even the 28nm over 40nm advantage wouldn't buy you nearly enough...

look at the posts from the llvm-dev team member, nick - you don't implement a full CPU, you implement a partial one, that cooperates with and is guided by the main one. nick even points out that using LLVM->FPGA you could likely experiment on the design of that cut-down CPU from a high-level language. the thread starts here http://lists.cs.uiuc...ust/042560.html i'll edit this later when the message arrives for display in the archives.


You totally missed my point. This isn't about implementing a CPU to perform GPU functions, it's about relative die space. On high end SoCs good GPUs and video encode together take way more die space than CPUs. Yet it is the CPU that Xilinx chose to make a hard macro for, implying that they didn't have enough fabric to implement a CPU of that complexity. Which would mean they don't have enough fabric to implement a competitive GPU..

Compiling shaders straight to HDL, which is what you seem to be implying, is a neat idea, but has several problems:

1) Hardware description languages and software languages like GLSL don't actually match that well and of course there's no kind of backend for this out there
2) It would place an awkward hard limit on shader size that's not at all constant or obvious but instead proportional to the complexity of the shader's instructions
3) Most importantly, shader compilation is a real time activity that has to be fast because drivers are modifying shaders real-time to accommodate state changes.. if you've ever synthesized something for an FPGA you'd know it's extremely time consuming to do a proper job of it.

yeees... but absolute total keep-up-with-the-joneses isn't the goal here. even with some absolute basic 3D acceleration (even if it just offloaded vector multiplication for example) something that would provide screamingly-good performance for 800x600 use on an OpenPandora v2.0 would do "good enough" performance for the majority of desktop uses at 1280x768 or larger.

it's a different goal.... that could not even be remotely considered if the Zynq-7000 series did not exist.


What is the goal, purely to make FSF-pundits feel warm and fuzzy at the absolute expense of functionality? Or is it something end users at large are supposed to care about? This isn't about being "the best", it's about relative metrics like perf/W and perf/$, of course with SOME performance baseline.

I don't know how you expect "screamingly good performance" for a Pandora 2; I'd be surprised if you even matched the Pandora 1 in GPU performance.. with an FPGA you can't keep up; the real Pandora 2 would have such a big advantage in selecting from real SoCs.

If you think that a GPU just needs to offload shader stuff then you've missed why GPUs exist as they do. There's a lot of important fixed function hardware that CPUs are still not all that good at. Offload triangle setup, rasterization, and texturing to a Cortex-A9 with NEON and your performance will become a sad joke.

i realise that. however, having seen the Vivante GC400 gate-count (850k, where the 7030 is equivalent to 1.9M), i'm not counting on it, but i'd imagine that a 120% margin of wiggle-room would be enough!


You know what, come back to me with open IP that matches Vivante's claims and we'll talk.

ah. no. that's where you're wrong. you have to bear in mind that i've been working on this for over 18 months; part of the planning is a modular architecture where the CPU, RAM, NAND Flash and some of the standard peripherals (HDMI, USB-OTG) are on a separate user-removable hot-swappable card.

so no, the laptop would NOT have its I/Os affixed to connectors already: you would NOT have reduced capability to pick and choose what you want it to be doing.

see below for some of the details about the split strategy.


I get that idea, and yes it's great for putting out a few different products. What I'm saying is, how does the end user benefit from this design after he has purchased one of these products? Once they have whatever configuration they're locked into what IP they can use, which already sounds like it's going to be a development mess for anyone who does want to target more than one configuration...

i haven't described everything, here, exo - there's a lot going on behind the scenes.

there are always going to be "better CPUs with better GPUs". if you don't throw a stake in the ground, pick one and run with it, you will always be chasing after pipe dreams. if on the other hand you throw a stake in the ground at last year's CPU, you end up developing something that nobody wants by the time it's done.


Dude, you say yourself you've spent 18 months thinking about this, and now you're thinking about jumping on something that was announced a few days ago, is highly risky, and would at BEST need a huge set of new HDL development work to make it into a product. Don't tell ME about finalizing designs and not chasing pipedreams. When I talk about better 28nm processors I'm not talking about something "just around the corner", I'm talking about the contemporaries of this FPGA.. no, I'm talking about things that'll probably be available before it is.

that's where the strategy that myself and my associates have been working on comes into play. the "laptop" is just a shell. the "games console" is just a shell. it takes a 2in x 4in CPU Card, containing the CPU, RAM, NAND Flash, HDMI out, Micro-SD and USB-OTG. the other end has a connector with SATA-II, 10/100 Ethernet, USB2, 24-pin RGB/TTL, I2C and some GPIO.


Yes, I realize that, and IMO it's the best idea you've actually presented here. Mind you, you're not the only ones doing this - HardKernel is, for instance.

this "splitting" of the devices means that the development of a new CPU Card can take place within 6-10 weeks, yet there is absolutely no "total unit redesign" required. nobody needs to consider unscrewing their device in order to take advantage of the new CPU Card. just pop it out, and sell it on ebay.


.. you want the card to be accessible externally to the devices? Now that I'm skeptical about. Not quite as skeptical as your ability to spin cards in 6-10 weeks though ;p

actually, in terms of FSF Hardware Endorsement Criteria, precisely the opposite is true. follow the chain here. with FSF Hardware Endorsement comes the possibility of FSF funding and PR.

i realise there's a lot going on here, exo. there are a dozen different possibilities that i'm pursuing: this thread is just a glimpse into some of them.


Yes, okay, follow the possibility of the FSF funding you for using an FPGA. I'd be pretty amazed if that actually happened. But don't mind me, I'm not really in with the FSF religion.

#7 lkcl

    GP32 User

  • Member
  • 55 posts

Posted 22 August 2011 - 11:50 PM

acchhh... exophase, i have to cut this into two: the forum says i have "exceeded the allowed number of quoted blocks of text", can you believe it? pah.

You're a bit off..

Synthesis Gate Count (ND2D1 gates) 1.23 M 1.2 M
http://www.vivanteco..._mvr.html#GC400


*click*... i am, aren't i? :) where the hell did i get 850k from? i think i've been staring at too many web site specs yesterday...

Note that in the real world synthetic gate counts and "equivalent gates" on an FPGA often don't match up so well. Logic cells are pretty high level elements that tend to be capable of a lot more work than they're being used for, so in an ideal configuration could be used as a lot more "equivalent gates" than most real synthesis would allow for. I don't know what FPGA marketing is like, if that number is idealized or reflects real projects.

Of course we also don't know other important factors like how many SRAM blocks are on this. And there's an alleged price for the much smaller 25k cell version, undoubtedly the 125k one will cost significantly more..


i spoke to one of the distributors today: he tells me that xilinx have already dropped the 25k version. something about matching the corresponding series 7 FPGAs, which have been slightly re-jigged, and the zynq-7000 series match closely with those. so, bye bye series 7 25k, bye bye z7010: hello series 7 50k, hello z70NN. whatever - you get it :)

but yes, it does put the 7030 into doubt somewhat. i'll find out in a few weeks... when i want to know _now_! :)

(snip...)

You totally missed my point. This isn't about implementing a CPU to perform GPU functions, it's about relative die space. On high end SoCs good GPUs and video encode together take way more die space than CPUs. Yet it is the CPU that Xilinx chose to make a hard macro for, implying that they didn't have enough fabric to implement a CPU of that complexity. Which would mean they don't have enough fabric to implement a competitive GPU..


... i did, didn't i? so yes, granted: there would not be enough fabric to implement a competitive GPU (unless the FPGA is way more die space than the GPU)... but implementing a _competitive_ GPU isn't the goal: the goal is to do the best job possible to get _some_ "acceptable" performance, with a hybrid CPU-FPGA mix, that passes for a saleable mass-volume low-cost product.

that's a very very different goal from "keeping up with the joneses, to sell based on absolute raw performance, beat the hell out of the absolute latest and greatest".

Compiling shaders straight to HDL, which is what you seem to be implying, is a neat idea, but has several problems:

1) Hardware description languages and software languages like GLSL don't actually match that well and of course there's no kind of backend for this out there
2) It would place an awkward hard limit on shader size that's not at all constant or obvious but instead proportional to the complexity of the shader's instructions
3) Most importantly, shader compilation is a real time activity that has to be fast because drivers are modifying shaders real-time to accommodate state changes.. if you've ever synthesized something for an FPGA you'd know it's extremely time consuming to do a proper job of it.


yehhh... i think this occurred to a couple of other people who've considered this, as well, and the general feeling has been that it would be better to follow in the footsteps of existing GPU design methodology. there's a really good post on the phoronix site, but it boils down to this: you emulate a micro CPU (pre-compiled into the FPGA) which gets fed by the main CPU with pre-translated instructions. so the CPU performs the GLSL-to-microcode translation, and the FPGA "executes" it. then, any caching of GLSL-to-microcode - esp. compiled shaders - can be carried out by the CPU.

i'd agree absolutely that to try to use the FPGA as a "direct" target for the shader compilation would be... absolutely insane :)

http://phoronix.com/...-A9-FPGA-Hybrid

ok... couple other points - please excuse me if i skip them, exo - they're valid questions: i'd just like to keep this a little shorter, if that's ok.

you have to bear in mind that i've been working on this for over 18 months: part of the planning is a modular architecture where the CPU, RAM, NAND Flash and some of the standard peripherals (HDMI, USB-OTG) are on a separate user-removable hot-swappable card.

so no, the laptop would NOT have its I/Os affixed to connectors already: you would NOT have reduced capability to pick and choose what you want it to be doing.

see below for some of the details about the split strategy.


I get that idea, and yes it's great for putting out a few different products. What I'm saying is, how does the end user benefit from this design after he has purchased one of these products?


1) by being able to quite literally take his "computer" in the top shirt pocket with him. work at home, take CPU card out, drive to work, put it into big wide-screen LCD on desk, and carry on. finish work, pop it out, drive home, plug it into "Games Console" chassis. play games. get bored of playing games, want to surf internet, pop it out, put CPU card into large laptop.

2) by being able to upgrade to a newer faster better CPU card - when they come along. *NOT* having to sell the entire device.

3) by being able to upgrade to a newer better chassis. or a smaller one. or whatever you choose.

4) by being able to still have your entire device operational, by putting the CPU card into a "spare" chassis, *WITHOUT* having to send back the entire device just because the screen broke. or the keyboard.

so there are so many advantages it's hard to see why this hasn't been done before. i do know some rather cynical commercial greed-based reasons and i'm rather counting on companies who have such corporate greed at heart _staying_ that way :)


Once they have whatever configuration they're locked into what IP they can use, which already sounds like it's going to be a development mess for anyone who does want to target more than one configuration...


why would they be "locked in"? i don't understand.

but - yes: doing the initial chassis designs, whilst writing linux kernel code that covers all the possible supported devices on one side of the main connector and all the supported CPUs on the other, could get quite hair-raising :)

and if one 3rd-party company decides to create a non-compliant CPU card and doesn't get the electrical connections right - jaeeesuz, we're into a mess. but no more so than if some idiot makes a device which doesn't fully conform to e.g. the PCI bus - it's no different in that regard. it's a grouping of existing standards (ETH, SATA, USB, RGB/TTL, I2C): how hard can that be? :)

but yes: the other detail is that power-up brings the I2C bus up first. read an I2C EEPROM, which says what the configuration of the motherboard is. then, using the latest linux kernels, it's a simple matter of picking the right "device tree" to load.

how hard is that? :)
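the EEPROM-then-device-tree step above can be sketched as follows. everything here is hypothetical: the chassis IDs, the .dtb filenames, and the sysfs EEPROM path are all invented for illustration (a real AT24-style EEPROM does expose its contents via sysfs on linux, but the bus address would depend on the board).

```python
# invented mapping: one ID byte in the chassis EEPROM -> device tree blob
CHASSIS_DTBS = {
    0x01: "laptop-12in.dtb",
    0x02: "games-console.dtb",
    0x03: "desktop-dock.dtb",
}

def read_chassis_id(eeprom_path="/sys/bus/i2c/devices/0-0050/eeprom"):
    """read the first byte of the chassis EEPROM (path is an assumption)."""
    with open(eeprom_path, "rb") as f:
        return f.read(1)[0]

def select_dtb(chassis_id):
    """pick the device tree for this chassis, or fail loudly."""
    try:
        return CHASSIS_DTBS[chassis_id]
    except KeyError:
        raise RuntimeError("unknown chassis id 0x%02x" % chassis_id)
```

the bootloader (or an early-boot script) would then hand the selected .dtb to the kernel, and the same CPU card image boots unmodified in any compliant chassis.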

i haven't described everything, here, exo - there's a lot going on behind the scenes.

there are always going to be "better CPUs with better GPUs". if you don't throw a stake in the ground, pick one and run with it, you will always be chasing after pipe dreams. if on the other hand you throw a stake in the ground at last year's CPU, you end up developing something that nobody wants by the time it's done.


Dude, you say yourself you've spent 18 months thinking about this, and now you're thinking about jumping about something that was announced a few days ago, is highly risky, and would have at BEST a huge set of new HDL development work necessary to make it into a product. Don't tell ME about finalizing designs and not chasing pipedreams. When I talk about better 28nm processors I'm not talking about something "just around the corner", I'm talking about the contemporaries of this FPGA.. no, I'm talking about things that'll probably be available before it is.


*lol* ok ok i know. i got all overexcited about this particular processor, when i'm also talking behind the scenes to a factory that has access to some existing CPUs. the problem with the present CPU that this factory has been using is that it's limited to a maximum of 512mb RAM. beh.

#8 lkcl

Posted 22 August 2011 - 11:51 PM

that's where the strategy that my associates and i have been working on comes into play. the "laptop" is just a shell. the "games console" is just a shell. it takes a 2in x 4in CPU Card, containing the CPU, RAM, NAND Flash, HDMI out, Micro-SD and USB-OTG. the other end has a connector with SATA-II, 10/100 Ethernet, USB2, 24-pin RGB/TTL, I2C and some GPIO.


Yes, I realize that, and IMO it's the best idea you've actually presented here. Mind you, you're not the only ones doing this - HardKernel is, for instance.


ahh, yes - i know them, and love what they've done. they have two modules: the S5PC110 (1ghz Cortex A8) and the S5PC210 (Exynos 1ghz Dual A9). the problems are:

  • hardkernel modules are not user hot-swappable. the design we've done is.
  • hardkernel's modules are not interchangeable (you can't take the S5PC110 module and plug it into the latest chassis, nor plug the S5PC210 module into the ODroid-T)

there do exist companies who have managed to get quite significant interoperability (usually in 200-pin SO-DIMM form-factor), but they're definitely not user-serviceable parts! directinsight i believe is one of them who have done this.

but the significant difference is that whilst the majority of these are 200-pin, or in some cases 400 (such as the triton OMAP4430 module), the arrangement we've designed is... just 68 pins. it's pushing it pretty tight, but that's where a good chunk of that 18 months has gone: coming up with use-cases; sorting out how to do audio (USB), and finding a USB-powered audio IC that's actually affordable; sorting out how to do touchscreens (I2C), and finding an I2C touchscreen controller rather than an SPI one; etc. etc. can you get away with USB-to-SDcard, if you want a larger SD/MMC slot on the main chassis? yes. how much does a USB-to-SDcard IC cost in mass-volume? $USD 1. great - i'll take it :)

then we also have gone over a large range of mass-produced devices, thinking "does this limited number of pins still work if we convert device of type X to a hot-swappable user-removable CPU card?" and in every case but things like "multi-disk NAS Storage Boxes" and "Blade Servers with multiple Gigabit Ethernet ports", the answer has been "yes".

think of the number of low-cost mass-market devices which need only one USB2 (you can add a hub on the motherboard), one SATA-II, one 10/100 Ethernet, and an LCD panel or a VGA/DVI on the motherboard, and it all fits together. and don't forget the HDMI, Micro-SD and USB-OTG on the "front-facing" side of the CPU card!
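to show roughly how that 68-pin budget could work out, here's a back-of-the-envelope tally. every per-interface pin count below is my own rough estimate, NOT the actual allocation from the design being discussed - the point is only that the grouping of standards plausibly fits.

```python
# hypothetical pin budget for the chassis-side connector.
# all counts are estimates, not the real design's allocation.
PIN_BUDGET = {
    "SATA-II":       7,   # two differential pairs plus grounds
    "10/100 ETH":    8,   # MDI pairs and magnetics-side signals
    "USB2":          4,   # VBUS, D+, D-, GND
    "RGB/TTL data": 24,   # 24-bit parallel LCD data bus
    "LCD control":   4,   # pixel clock, HSYNC, VSYNC, DE
    "I2C":           2,   # SCL, SDA
    "GPIO":          8,
    "power/ground": 11,
}

total = sum(PIN_BUDGET.values())
assert total <= 68, "over budget!"
print("%d of 68 pins used" % total)
```

with these (invented) numbers the tally lands at exactly 68 - which gives a feel for why the text calls the connector "pushing it pretty tight".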

this "splitting" of the devices means that the development of a new CPU Card can take place within 6-10 weeks, yet there is absolutely no "total unit redesign" required. nobody needs to consider unscrewing their device in order to take advantage of the new CPU Card: just pop the old one out, and sell it on ebay.


.. you want the card to be accessible externally to the devices? Now that I'm skeptical about.


*lol*. ok... not so much "accessible externally" as "user-removable, pop it in the top pocket". this happens all the time with PCI-e modem cards, so why not the CPU card? think about it for a minute - mull the idea over... it goes "hmmmm" in the brain, doesn't it? :)

remember: embedded CPUs now - these ones in phones - they're ridiculously small, and would be quite happy in a 2in x 4in metal shielded case, along with a couple of RAM ICs and some NAND flash. an intel CPU? not a chance. but an ARM CPU? yeah...

Not quite as skeptical as your ability to spin cards in 6-10 weeks though ;p


ok :) if you try doing things "from scratch", yeah, forget it - and kiss goodbye to a good $100k to $250k of your budget, too.

_but_... if you talk to a design house that _already_ has an existing design, or already has access to the SoC manufacturer's Engineering Board CAD/CAM PCB files, cutting out the bits of the board from that design which have the PMIC, CPU, DDR RAM and the NAND Flash on it, and transplanting those to a 2in x 4in board with practically nothing else on it, that's *not* hard.

and if anyone tries to tell you differently, for a particular CPU, then don't use their engineering services for that CPU! go find another designer who _does_ have experience with that CPU, who has a proven working design that they have, preferably, done a number of cut/paste jobs on before.

my friend spoke to the guys at simtec last week, to ask them if they'd be interested in doing a CPU card with a completely open license on the PCB CAD/CAM files (a la openmoko). they said yes... but only if it was one of the CPUs that they have already done a board for.

the only exception to this rule is for the TI OMAP CPUs, which are so horrendously complex it doesn't matter _what_ level of experience you have, they're still a bitch to design PCBs around. hmm... funny that - the OpenPandora taking such a long time to develop... :) fortunately, TI have done something amazing, and released ORCAD files for beagleboard, pandaboard, beagleboard-xm and so on, and freescale are doing likewise. but, yeah, part-digression there.

actually, in terms of FSF Hardware Endorsement Criteria, precisely the opposite is true. follow the chain here: with FSF Hardware Endorsement comes the possibility of FSF funding and PR.

i realise there's a lot going on here, exo. there are a dozen different possibilities that i'm pursuing: this thread is just a glimpse into some of them.


Yes, okay, follow the possibility of the FSF funding you for using an FPGA. I'd be pretty amazed if that actually happened. But don't mind me, I'm not really in with the FSF religion.


*sigh* you know what? i am a fool, at heart. there i was, all excited about the combination of an FPGA and a Dual-Core Cortex A9, and i forgot to check two things. a) can you get samples now? (answer: no - they'll be in general circulation Q1 2012.) b) is there a Free Software toolchain for Xilinx FPGAs?

if you know anyone who knows the answer to b), it would help enormously... in 6 months' time.

but in the meantime, i'm still on the case with the split CPU / chassis strategy, so if the zynq-7030 doesn't work out, it's still possible to continue with multiple chassis designs as well as alternative CPU cards, using CPUs that exist *now* but are still attractive - just as we both realise is the only sensible thing to do. i can't tell you how many offers i've received for tablets with WonderMedia (VIA) ARM9 or RockChip ARM11 CPUs.

ok. so. thank you for picking holes: i threw this out there in an over-enthusiastic way, but i maintain that the core strategy is still sound. ambitious, yes. covering absolutely all bases, no. can't have everything :)

thoughts?